Advanced Topics

Custom Topologies

Romana uses an advanced, topology-aware IPAM module to assign IP addresses to endpoints (pods or VMs). The topology awareness of Romana’s IPAM allows endpoint IPs to align with the topology of your network, which in turn makes it possible for Romana to aggregate routes effectively. This has many operational and security advantages:

  • Ability to use native L3 routing, allowing network equipment to work at its best
  • Greatly reduced number of routes in your networking hardware
  • Stable routing configurations, fewer route updates required
  • No “leaking” of endpoint routes into the networking fabric

Key to Romana’s topology aware IPAM is the topology configuration. In this configuration you model the underlying network topology for your cluster.

Terminology

Some useful terminology:

  • Network: Romana’s IPAM chooses endpoint IP addresses from one or more address ranges (CIDRs). Each of those is called a “network” in the Romana topology configuration.
  • Address block: Romana manages IP addresses in small blocks. These blocks typically contain 8, 16 or 32 addresses. If an IP address is needed on a host, Romana assigns one of those blocks there, then uses up the block’s addresses for any further endpoints on the host before assigning a new block. Block sizes are specified as network mask lengths, such as “29” (which means a /29 CIDR for the block); see the worked example after this list. You will see this parameter in the topology configuration. It affects some networking internals, such as the number of routes created on hosts or ToRs. For the most part you don’t need to worry about it and can just leave it at “29”.
  • Tenant: This may be an OpenStack tenant, or a Kubernetes namespace.
  • Group: This is a key concept of Romana’s IPAM. All hosts within a group will use endpoint addresses that share the same network prefix. That’s why Romana’s “groups” are also called “prefix groups”. This is an important consideration for topology aware addressing and route aggregation.
  • Prefix group: See “group”.
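
For example, with the default values used throughout this page, the block arithmetic works out as follows (this snippet only illustrates the numbers, it is not a separate configuration):

{
    "cidr"      : "10.111.0.0/16",  # a /16 network holds 2^(32-16) = 65536 endpoint addresses
    "block_mask": 29                # each block is a /29, i.e. 2^(32-29) = 8 addresses,
                                    # so this network can hold 65536 / 8 = 8192 blocks
}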

Examples

To make it easy for you to get started, we have put together this page with examples for common configurations. The configurations are specified in JSON. To explain individual lines, we have added occasional comments (starting with ‘#’). Since JSON does not natively support comments, you will need to strip those out before using any of these sample config files.
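
Since the comments all start with ‘#’, one simple way to strip them is a sed one-liner like the following (a rough sketch; it assumes there are no literal ‘#’ characters inside JSON string values, and the file names are placeholders):

sed -e 's/[[:space:]]*#.*$//' topology-with-comments.json > topology.json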

Single, flat network

Use this configuration if you have hosts on a single network segment: all hosts can reach each other directly and no router is needed to forward packets. An example would be hosts in a single AWS subnet.

Note that in the configuration we usually don’t list the actual hosts. As nodes/hosts are added to a cluster, Romana automatically selects the ‘group’ to which each host will be assigned.

{
    "networks": [                           # 'networks' or CIDRs from which Romana chooses endpoint addresses
        {
            "name"      : "my-network",     # each network needs a unique name...
            "cidr"      : "10.111.0.0/16",  # ... and a CIDR.
            "block_mask": 29                # size of address blocks for this network, safe to leave at "/29"
        }
    ],
    "topologies": [                         # list of topologies Romana knows about, just need one here
        {
            "networks": [                   # specify the networks to which this topology applies
                "my-network"
            ],
            "map": [                        # model the network's prefix groups
                {                           # if only one group is specified, it will use entire network CIDR
                    "groups": []            # just one group, all hosts will be added here
                }
            ]
        }
    ]
}

Single, flat network with host-specific prefixes

Same as above, but this time we want each host to have its own ‘prefix group’: All endpoints on a host should share the same prefix. This is useful if you wish to manually set routes in other parts of the network, so that traffic to pods can be delivered to the correct host.

Note that Romana automatically calculates prefixes for each prefix group: The available overall address space is carved up based on the number of groups. The example below shows this in the comments.

When a host is added to a cluster, Romana assigns hosts to (prefix) groups in round-robin fashion. Therefore, if the number of defined groups is at least as high as the number of hosts in your cluster, each host will live in its own prefix group.

{
    "networks": [
        {
            "name"      : "my-network",
            "cidr"      : "10.111.0.0/16",
            "block_mask": 29
        }
    ],
    "topologies": [
        {
            "networks": [ "my-network" ],
            "map": [                         # add at least as many groups as you will have hosts
                { "groups": [] },            # endpoints get addresses from 10.111.0.0/18
                { "groups": [] },            # endpoints get addresses from 10.111.64.0/18
                { "groups": [] },            # endpoints get addresses from 10.111.128.0/18
                { "groups": [] }             # endpoints get addresses from 10.111.192.0/18
            ]
        }
    ]
}

Using multiple networks

Sometimes you may have multiple, smaller address ranges available for your pod or VM addresses. Romana can seamlessly use all of them. We show this using the single, flat network topology from the first example.

{
    "networks": [
        {
            "name"      : "net-1",
            "cidr"      : "10.111.0.0/16",
            "block_mask": 29
        },
        {
            "name"      : "net-2",              # unique names for each network
            "cidr"      : "192.168.3.0/24",     # can be non-contiguous CIDR ranges
            "block_mask": 31                    # each network can have different block size
        }
    ],
    "topologies": [
        {
            "networks": [ "net-1", "net-2" ],   # list all networks that apply to the topology
            "map": [
                { "groups": [] }                # endpoints get addresses from both networks
            ]
        }
    ]
}

Using multiple topologies

It is possible to define multiple topologies, which Romana handles at the same time. The following example shows this. We have a total of three networks. One topology (all hosts in the same prefix group) is used for two of the networks. The third network is used by a second topology, which gives each host its own prefix group (assuming the cluster has no more than four nodes).

{
    "networks": [
        {
            "name"      : "net-1",
            "cidr"      : "10.111.0.0/16",
            "block_mask": 29
        },
        {
            "name"      : "net-2",
            "cidr"      : "10.222.0.0/16",
            "block_mask": 28
        },
        {
            "name"      : "net-3",
            "cidr"      : "172.16.0.0/16",
            "block_mask": 30
        }
    ],
    "topologies": [
        {
            "networks": [ "net-1", "net-2" ],
            "map": [
                { "groups": [] }                # endpoints get addresses from 10.111.0.0/16 and 10.222.0.0/16
            ]
        },
        {
            "networks": [ "net-3" ],
            "map": [
                { "groups": [] },               # endpoints get addresses from 172.16.0.0/18
                { "groups": [] },               # endpoints get addresses from 172.16.64.0/18
                { "groups": [] },               # endpoints get addresses from 172.16.128.0/18
                { "groups": [] }                # endpoints get addresses from 172.16.192.0/18
            ]
        }
    ]
}

Restricting tenants to networks

Romana can ensure that tenants are given addresses from specific address ranges. This allows separation of traffic in the network, using traditional CIDR based filtering and security policies.

This is accomplished via a new element: A tenants spec can be provided with each network definition.

Note that Romana does NOT influence the placement of new pods/VMs. This is done by the environment (Kubernetes or OpenStack) independently of Romana. Therefore, unless you have specified particular tenant-specific placement options in the environment, it is usually a good idea to re-use the same topology (or at least a topology that covers all cluster hosts) for each tenant.

{
    "networks": [
        {
            "name"      : "production",
            "cidr"      : "10.111.0.0/16",
            "block_mask": 29,
            "tenants"   : [ "web", "app", "db" ]
        },
        {
            "name"      : "test",
            "cidr"      : "10.222.0.0/16",
            "block_mask": 32,
            "tenants"   : [ "qa", "integration" ]
        }
    ],
    "topologies": [
        {
            "networks": [ "production", "test" ],
            "map": [
                { "groups": [] }
            ]
        }
    ]
}

Deployment in a multi-rack data center

The topology file is used to model your network. Let’s say you wish to deploy a cluster across four racks in your data center. Let’s assume each rack has a ToR and that ToRs can communicate with each other. Under each ToR (in each rack) there are multiple hosts.

As nodes/hosts are added to your cluster, you should provide labels in the metadata of each host, which assist Romana in placing the host in the correct, rack-specific prefix group. Both Kubernetes and OpenStack allow you to define labels for nodes. You can choose whatever label names and values you wish; just make sure they express the rack of the host and are identical in the environment (Kubernetes or OpenStack) and in the Romana topology configuration.

In this example, we use rack as the label. We introduce a new element to the Romana topology configuration: The assignment spec, which can be part of each group definition.
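
For example, on Kubernetes a node could be labeled like this (the node name is just a placeholder; the label key must match the assignment spec used in the topology below):

kubectl label node node-07 rack=rack-1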

Note that such a multi-rack deployment would usually also involve the installation of the Romana route publisher, so that ToRs can be configured with the block routes to the hosts in the rack.

{
    "networks": [
        {
            "name"      : "my-network",
            "cidr"      : "10.111.0.0/16",
            "block_mask": 29
        }
    ],
    "topologies": [
        {
            "networks": [ "my-network" ],
            "map": [
                {
                    "assignment": { "rack": "rack-1" },   # all nodes with label 'rack == rack-1'...
                    "groups"    : []                      # ... are assigned by Romana to this group
                },
                {
                    "assignment": { "rack": "rack-2" },
                    "groups"    : []
                },
                {
                    "assignment": { "rack": "rack-3" },
                    "groups"    : []
                },
                {
                    "assignment": { "rack": "rack-4" },
                    "groups"    : []
                }
            ]
        }
    ]
}

Deployment in a multi-zone, multi-rack data center

Larger clusters may be spread over multiple data centers, or multiple spines within a data center. Romana can manage multi-level (hierarchical) prefix groups, so that the routes across the DCs or spines can be aggregated into a single route.

The following example shows a cluster deployed across two “zones” (DCs or spines), with four racks in one zone and two racks in the other. We use multiple labels (“zone” in addition to “rack”) in order to assign nodes to prefix groups.

{
    "networks": [
        {
            "name"      : "my-network",
            "cidr"      : "10.111.0.0/16",
            "block_mask": 29
        }
    ],
    "topologies": [
        {
            "networks": [ "my-network" ],
            "map": [
                {
                    "assignment": { "zone" : "zone-A" },
                    "groups"    : [                              # addresses from 10.111.0.0/17
                        {
                            "assignment": { "rack": "rack-3" },
                            "groups"    : []                     # addresses from 10.111.0.0/19
                        },
                        {
                            "assignment": { "rack": "rack-4" },
                            "groups"    : []                     # addresses from 10.111.32.0/19
                        },
                        {
                            "assignment": { "rack": "rack-7" },
                            "groups"    : []                     # addresses from 10.111.64.0/19
                        },
                        {
                            "assignment": { "rack": "rack-9" },
                            "groups"    : []                     # addresses from 10.111.96.0/19
                        }
                    ]
                },
                {
                    "assignment": { "zone" : "zone-B" },
                    "groups"    : [                              # addresses from 10.111.128.0/17
                        {
                            "assignment": { "rack": "rack-17" },
                            "groups"    : []                     # addresses from 10.111.128.0/18
                        },
                        {
                            "assignment": { "rack": "rack-22" },
                            "groups"    : []                     # addresses from 10.111.192.0/18
                        }
                    ]
                }
            ]
        }
    ]
}

Route Publisher Add-on

For Kubernetes clusters installed in datacenters, it is useful to enable the Romana Route Publisher add-on. It is used to automatically announce routes for Romana addresses to your BGP- or OSPF-enabled router, removing the need to configure these manually.

Because the routes are for prefixes instead of precise /32 endpoint addresses, the rate and volume of routes to publish are reduced.

Configuration

The Romana Route Publisher uses BIRD to announce routes from the node to other network elements. Configuration is separated into two parts:

  • a static bird.conf to describe the basic configuration of BIRD, ending with an include
  • a dynamic publisher.conf that is used to generate a config containing routes for Romana addresses

When the pod first launches, BIRD is launched using the static configuration. Then, when new blocks of Romana addresses are allocated to a node, the dynamic configuration is generated with routes for those blocks, and BIRD is given a signal to reload its configuration.

If your deployment requires custom configuration per node or per subnet, there is a naming convention for the files that supports this (see the example directory layout after the list below).

For both config files, the route publisher first looks for a “best match” extension to the file name. For example, when loading x.conf on a node with IP 192.168.20.30/24, it looks for the following, in order:

  • x.conf.192.168.20.30 (IP suffix for node-specific config)
  • x.conf.192.168.20.0 (Network address suffix, for subnet-specific config)
  • x.conf
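
The directory from which the route publisher configmap is created (see Installation below) might therefore look like this; the file names are only an illustration of the naming convention:

bird.conf                        # base BIRD config, shared by all nodes
publisher.conf                   # default template for generated routes
publisher.conf.192.168.20.0      # template for nodes in the 192.168.20.0/24 subnet
publisher.conf.192.168.35.0      # template for nodes in the 192.168.35.0/24 subnet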

Examples

bird.conf (for both BGP and OSPF)

router id from 192.168.0.0/16;

protocol kernel {
    scan time 60;
    import none;
    export all;
}

protocol device {
    scan time 60;
}

include "conf.d/*.conf";
  • Make sure the CIDR specified for router id matches your cluster nodes.
  • The protocol kernel and protocol device can be modified, or just deleted if not necessary.
  • Add any additional, global BIRD configuration to this file (eg: debugging, timeouts, etc)
  • The include line is the hook to load the generated dynamic config. It should be in your bird.conf exactly as specified.

publisher.conf for OSPF

protocol static romana_routes {
    {{range .Networks}}
    route {{.}} reject;
    {{end}}
}

protocol ospf OSPF {
  export where proto = "romana_routes";
  area 0.0.0.0 {
    interface "eth0" {
      type broadcast;
    };
  };
}
  • The first section, protocol static romana_routes, is used by the romana-route-publisher to generate the dynamic config.
  • The second section, protocol ospf OSPF, should contain the export entry and area blocks matching your environment.
  • The interface names will need to be modified to match the node’s actual interfaces
  • Add any additional, protocol-specific BIRD configuration to this file

publisher.conf for BGP

protocol static romana_routes {
    {{range .Networks}}
    route {{.}} reject;
    {{end}}
}

protocol bgp BGP {
    export where proto = "romana_routes";
    direct;
    local as {{.LocalAS}};
    neighbor 192.168.20.1 as {{.LocalAS}};
}
  • The first section, protocol static romana_routes, is used by the romana-route-publisher to generate the dynamic config.
  • The second section, protocol bgp BGP, should be changed to match your specific BGP configuration.
  • Add any additional, protocol-specific BIRD configuration to this file
  • The neighbor address will likely be different for each subnet. To handle this, you can use multiple publisher.conf files with the appropriate network address suffixes, eg:
  • publisher.conf.192.168.20.0
  • publisher.conf.192.168.35.0

Installation

First, the configuration files need to be loaded into a configmap.

  1. Put all the files into a single directory
  2. cd to that directory
  3. Run kubectl -n kube-system create configmap route-publisher-config --from-file=. (the . indicates the current directory); an optional check is shown below
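
To confirm that the files were loaded, you can optionally inspect the configmap with a standard kubectl command:

kubectl -n kube-system describe configmap route-publisher-config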

Next, download the YAML file from here to your master node.

Then, load the Romana Route Publisher add-on by running this command on your master node.

kubectl apply -f romana-route-publisher.yaml

Verification

Check that route publisher pods are running correctly

$ kubectl -n kube-system get pods --selector=romana-app=route-publisher
NAME                           READY     STATUS    RESTARTS   AGE
romana-route-publisher-22rjh   2/2       Running   0          1d
romana-route-publisher-x5f9g   2/2       Running   0          1d

Check the logs of the bird container inside the pods

$ kubectl -n kube-system logs romana-route-publisher-22rjh bird
Launching BIRD
bird: Chosen router ID 192.168.XX.YY according to interface XXXX
bird: Started

Other messages you may see in this container:

bird: Reconfiguration requested by SIGHUP
bird: Reconfiguring
bird: Adding protocol romana_routes
bird: Adding protocol OSPF
bird: Reconfigured

Check the logs of the publisher container inside the pods

$ kubectl -n kube-system logs romana-route-publisher-22rjh publisher
Checking if etcd is running...ok.
member 8e9e05c52164694d is healthy: got healthy result from http://10.96.0.88:12379
cluster is healthy
Checking if romana daemon is running...ok.
Checking if romana networks are configured...ok. one network configured.
Checking for route publisher template....ok
Checking for pidfile from bird...ok
Launching Romana Route Publisher

Other messages you may see in this container:

20XX/YY/ZZ HH:MM:SS Starting bgp update at 65534 -> : with 2 networks
20XX/YY/ZZ HH:MM:SS Finished bgp update

These are normal, even if OSPF is being used.

Romana VIPs

Kubernetes users running on premises who want an easy way to expose their services outside the cluster on their datacenter network can use external-IPs.

Although external-IPs are simple, they represent a single point of failure for the service and require manual allocation and configuration on the nodes. When there are many to configure, this can be tedious and prone to error. Romana VIPs are a solution to these problems.

Romana VIPs are defined by an annotation in a service spec. Romana then automatically brings up that IP on a node. Romana chooses a node with a pod running locally to avoid network latency within the cluster. When a node with a Romana VIP fails, Romana will bring up the VIP on a new node, providing failover for external services.

Romana VIPs are useful for exposing services on datacenter LANs that only need simple kubeproxy load balancing across pods. Romana VIPs can also be used to expose individual pods when a stable IP is required, such as Cassandra and other Big Data applications. Romana VIPs work in conjunction with Romana DNS, which can be deployed as a service discovery mechanism for individual pods exposed outside of a cluster.

Romana VIP failover requires that all nodes be on the same network segment. Addresses for Romana VIPs must be manually provisioned on the network.

Example configuration

The example below shows a RomanaIP (192.168.99.101) configured on a node for the nginx service by adding the romanaip annotation to the spec.

...
kind: Service
metadata:
  name: nginx
  annotations:
    romanaip: '{"auto": false, "ip": "192.168.99.101"}'
...

The complete service spec is available here.
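
In case that spec is not at hand, a minimal complete Service using the annotation might look roughly like this (the selector and port values are assumptions for illustration, not taken from the original spec):

apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    romanaip: '{"auto": false, "ip": "192.168.99.101"}'
spec:
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP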

Romana DNS

Romana DNS adds DNS support for Romana VIPs. It is a drop-in replacement for kube-dns.

Installation

On the master node of the Kubernetes cluster

  • Make a note of the number of replicas for kube-dns using the following command:

    echo `kubectl get deploy -n kube-system kube-dns -o jsonpath="{.spec.replicas}"`
    
  • Now set the replicas for kube-dns to zero using the following command:

    kubectl scale deploy -n kube-system kube-dns --replicas=0
    
  • Wait until the kube-dns replicas are zero (around a minute or so)

On all nodes (i.e. master and compute nodes) of the Kubernetes cluster

  • Remove the earlier Docker image and replace it with the Romana one using the commands below:

    docker rmi gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.5
    docker pull pani/romanadns
    docker tag pani/romanadns:latest gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.5
    
  • Now return to the master node for the remaining commands

On the master node of the Kubernetes cluster

  • Assuming you had 2 replicas before (from the first step above), restore the replica count for kube-dns as follows:

    kubectl scale deploy -n kube-system kube-dns --replicas=2
    
  • Wait a minute or so for the pods to come up, and Romana DNS is up and running.

DNS Testing

  • Run dig to check that DNS is working properly, using this command:

    dig @10.96.0.10 +short romana.kube-system.svc.cluster.local
    
  • Download this sample nginx YAML file and then use the following command to create an nginx service with a RomanaIP in it:

    kubectl create -f nginx.yml
    
  • This should create the nginx service with a RomanaIP, which should be reflected in the dig result below:

    dig @10.96.0.10 +short nginx.default.svc.cluster.local
    

Sample DNS Results

$ dig @10.96.0.10 +short romana.kube-system.svc.cluster.local
10.96.0.99
192.168.99.10
$ dig @10.96.0.10 +short nginx.default.svc.cluster.local
10.116.0.0
10.99.181.64
192.168.99.101