Advanced Topics¶
Custom Topologies¶
Romana uses an advanced, topology-aware IPAM module to assign IP addresses to endpoints (pods or VMs). The topology awareness of Romana’s IPAM allows endpoint IPs to align with the topology of your network, which in turn makes it possible for Romana to aggregate routes effectively. This has many operational and security advantages:
- Ability to use native L3 routing, allowing network equipment to work at its best
- Greatly reduced number of routes in your networking hardware
- Stable routing configurations, fewer route updates required
- No “leaking” of endpoint routes into the networking fabric
Key to Romana’s topology-aware IPAM is the topology configuration, in which you model the underlying network topology for your cluster.
Terminology¶
Some useful terminology:
- Network: Romana’s IPAM chooses endpoint IP addresses from one or more address ranges (CIDRs). Each of those is called a “network” in the Romana topology configuration.
- Address block: Romana manages IP addresses in small blocks, typically containing 8, 16 or 32 addresses. When an IP address is needed on a host, Romana assigns one of those blocks to the host, then uses up the block’s addresses for any further endpoints on the host before assigning a new block. Block sizes are specified as network mask lengths, such as “29” (which means a /29 CIDR for each block). You will see this parameter in the topology configuration. It affects some networking internals, such as the number of routes created on hosts or top-of-rack (ToR) switches. For the most part you don’t need to worry about it and can just leave it at “29”.
- Tenant: This may be an OpenStack tenant, or a Kubernetes namespace.
- Group: This is a key concept of Romana’s IPAM. All hosts within a group will use endpoint addresses that share the same network prefix. That’s why Romana’s “groups” are also called “prefix groups”. This is an important consideration for topology aware addressing and route aggregation.
- Prefix group: See “group”.
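The block-size arithmetic can be illustrated with Python’s standard ipaddress module (a sketch for intuition, not Romana code): a block_mask of 29 yields /29 blocks of 8 addresses each.

```python
import ipaddress

# A block_mask of 29 means each address block is a /29 CIDR.
block = ipaddress.ip_network("10.111.0.0/29")
print(block.num_addresses)  # 8 addresses per block

# Number of /29 blocks that fit in a /16 network: 2^(29 - 16)
network = ipaddress.ip_network("10.111.0.0/16")
print(2 ** (29 - network.prefixlen))  # 8192
```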
Examples¶
To make it easy to get started, we have put together this page with examples for common configurations. The configurations are specified in JSON. To explain individual lines, we have added occasional comments (starting with ‘#’). Since JSON does not natively support comments, you will need to strip them out before using any of these sample config files.
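A small helper along these lines can strip the comments before parsing (a naive sketch; it assumes ‘#’ never appears inside a JSON string):

```python
import json

def strip_comments(text: str) -> str:
    """Remove '#' comments so the annotated samples parse as plain JSON."""
    return "\n".join(line.split("#", 1)[0] for line in text.splitlines())

sample = '''
{
    "networks": [
        { "name": "my-network", "cidr": "10.111.0.0/16", "block_mask": 29 }  # one network
    ]
}
'''
config = json.loads(strip_comments(sample))
print(config["networks"][0]["cidr"])  # 10.111.0.0/16
```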
Single, flat network¶
Use this configuration if you have hosts on a single network segment: all hosts can reach each other directly, and no router is needed to forward packets. An example would be hosts in a single AWS subnet.
Note that in the configuration we usually don’t list the actual hosts. As nodes/hosts are added to a cluster, Romana automatically selects the ‘group’ to which each host is assigned.
{
"networks": [ # 'networks' or CIDRs from which Romana chooses endpoint addresses
{
"name" : "my-network", # each network needs a unique name...
"cidr" : "10.111.0.0/16", # ... and a CIDR.
"block_mask": 29 # size of address blocks for this network, safe to leave at "/29"
}
],
"topologies": [ # list of topologies Romana knows about, just need one here
{
"networks": [ # specify the networks to which this topology applies
"my-network"
],
"map": [ # model the network's prefix groups
{ # if only one group is specified, it will use entire network CIDR
"groups": [] # just one group, all hosts will be added here
}
]
}
]
}
Single, flat network with host-specific prefixes¶
Same as above, but this time we want each host to have its own ‘prefix group’: All endpoints on a host should share the same prefix. This is useful if you wish to manually set routes in other parts of the network, so that traffic to pods can be delivered to the correct host.
Note that Romana automatically calculates prefixes for each prefix group: The available overall address space is carved up based on the number of groups. The example below shows this in the comments.
When a host is added to a cluster, Romana assigns hosts to (prefix) groups in round-robin fashion. Therefore, if the number of defined groups is at least as high as the number of hosts in your cluster, each host will live in its own prefix group.
{
"networks": [
{
"name" : "my-network",
"cidr" : "10.111.0.0/16",
"block_mask": 29
}
],
"topologies": [
{
"networks": [ "my-network" ],
"map": [ # add at least as many groups as you will have hosts
{ "groups": [] }, # endpoints get addresses from 10.111.0.0/18
{ "groups": [] }, # endpoints get addresses from 10.111.64.0/18
{ "groups": [] }, # endpoints get addresses from 10.111.128.0/18
{ "groups": [] } # endpoints get addresses from 10.111.192.0/18
]
}
]
}
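The per-group prefixes shown in the comments above can be reproduced with Python’s ipaddress module. This is an illustrative sketch for power-of-two group counts, not Romana’s actual allocator:

```python
import ipaddress
import math

def group_prefixes(cidr, num_groups):
    """Carve a network CIDR into equal prefixes, one per prefix group."""
    net = ipaddress.ip_network(cidr)
    extra_bits = math.ceil(math.log2(num_groups))  # 4 groups -> 2 extra bits
    return [str(subnet) for subnet in net.subnets(prefixlen_diff=extra_bits)]

print(group_prefixes("10.111.0.0/16", 4))
# ['10.111.0.0/18', '10.111.64.0/18', '10.111.128.0/18', '10.111.192.0/18']
```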
Using multiple networks¶
Sometimes you may have multiple, smaller address ranges available for your pod or VM addresses. Romana can seamlessly use all of them. We show this using the single, flat network topology from the first example.
{
"networks": [
{
"name" : "net-1",
"cidr" : "10.111.0.0/16",
"block_mask": 29
},
{
"name" : "net-2", # unique names for each network
"cidr" : "192.168.3.0/24", # can be non-contiguous CIDR ranges
"block_mask": 31 # each network can have different block size
}
],
"topologies": [
{
"networks": [ "net-1", "net-2" ], # list all networks that apply to the topology
"map": [
{ "groups": [] } # endpoints get addresses from both networks
]
}
]
}
Using multiple topologies¶
It is possible to define multiple topologies, which Romana handles at the same time. The following example shows this. We have a total of three networks. One topology (all hosts in the same prefix group) is used for two of the networks. The third network is used by a second topology, which gives each host its own prefix group (assuming the cluster does not have more than four nodes).
{
"networks": [
{
"name" : "net-1",
"cidr" : "10.111.0.0/16",
"block_mask": 29
},
{
"name" : "net-2",
"cidr" : "10.222.0.0/16",
"block_mask": 28
},
{
"name" : "net-3",
"cidr" : "172.16.0.0/16",
"block_mask": 30
}
],
"topologies": [
{
"networks": [ "net-1", "net-2" ],
"map": [
{ "groups": [] } # endpoints get addresses from 10.111.0.0/16 and 10.222.0.0/16
]
},
{
"networks": [ "net-3" ],
"map": [
{ "groups": [] }, # endpoints get addresses from 172.16.0.0/18
{ "groups": [] }, # endpoints get addresses from 172.16.64.0/18
{ "groups": [] }, # endpoints get addresses from 172.16.128.0/18
{ "groups": [] } # endpoints get addresses from 172.16.192.0/18
]
}
]
}
Restricting tenants to networks¶
Romana can ensure that tenants are given addresses from specific address ranges. This allows separation of traffic in the network, using traditional CIDR based filtering and security policies.
This is accomplished via a new element: a tenants spec can be provided with each network definition.
Note that Romana does NOT influence the placement of new pods/VMs. This is done by the environment (Kubernetes or OpenStack) independently of Romana. Therefore, unless you have specified particular tenant-specific placement options in the environment, it is usually a good idea to reuse the same topology (or at least a topology that covers all cluster hosts) for each tenant.
{
"networks": [
{
"name" : "production",
"cidr" : "10.111.0.0/16",
"block_mask": 29,
"tenants" : [ "web", "app", "db" ]
},
{
"name" : "test",
"cidr" : "10.222.0.0/16",
"block_mask": 32,
"tenants" : [ "qa", "integration" ]
}
],
"topologies": [
{
"networks": [ "production", "test" ],
"map": [
{ "groups": [] }
]
}
]
}
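The effect of the tenants spec can be sketched as a simple lookup. Here networks_for_tenant is a hypothetical helper used only for illustration, not part of Romana:

```python
def networks_for_tenant(networks, tenant):
    """Return the names of networks whose 'tenants' list includes the tenant."""
    return [n["name"] for n in networks if tenant in n.get("tenants", [])]

networks = [
    {"name": "production", "cidr": "10.111.0.0/16", "tenants": ["web", "app", "db"]},
    {"name": "test", "cidr": "10.222.0.0/16", "tenants": ["qa", "integration"]},
]
print(networks_for_tenant(networks, "qa"))   # ['test']
print(networks_for_tenant(networks, "web"))  # ['production']
```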
Deployment in a multi-rack data center¶
The topology file is used to model your network. Let’s say you wish to deploy a cluster across four racks in your data center. Let’s assume each rack has a ToR and that ToRs can communicate with each other. Under each ToR (in each rack) there are multiple hosts.
As nodes/hosts are added to your cluster, you should provide labels in the metadata of each host to assist Romana in placing the host in the correct, rack-specific prefix group. Both Kubernetes and OpenStack allow you to define labels for nodes. You can choose whatever label names and values you wish; just make sure they identify the rack of the host and are identical in the environment (Kubernetes or OpenStack) as well as in the Romana topology configuration.
In this example, we use rack as the label. We introduce a new element to the Romana topology configuration: the assignment spec, which can be part of each group definition.
Note that such a multi-rack deployment would usually also involve the installation of the Romana route publisher, so that ToRs can be configured with the block routes to the hosts in the rack.
{
"networks": [
{
"name" : "my-network",
"cidr" : "10.111.0.0/16",
"block_mask": 29
}
],
"topologies": [
{
"networks": [ "my-network" ],
"map": [
{
"assignment": { "rack": "rack-1" }, # all nodes with label 'rack == rack-1'...
"groups" : [] # ... are assigned by Romana to this group
},
{
"assignment": { "rack": "rack-2" },
"groups" : []
},
{
"assignment": { "rack": "rack-3" },
"groups" : []
},
{
"assignment": { "rack": "rack-4" },
"groups" : []
}
]
}
]
}
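The label-based placement above can be sketched as follows. select_group is a hypothetical helper illustrating the matching rule, not Romana’s implementation:

```python
def select_group(host_labels, topology_map):
    """Pick the first group whose 'assignment' spec matches the host's labels.
    A group without an assignment spec matches any host."""
    for group in topology_map:
        spec = group.get("assignment", {})
        if all(host_labels.get(key) == value for key, value in spec.items()):
            return group
    return None

topology_map = [
    {"assignment": {"rack": "rack-1"}, "groups": []},
    {"assignment": {"rack": "rack-2"}, "groups": []},
]
host = {"rack": "rack-2", "hostname": "node-7"}
print(select_group(host, topology_map)["assignment"])  # {'rack': 'rack-2'}
```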
Deployment in a multi-zone, multi-rack data center¶
Larger clusters may be spread over multiple data centers, or multiple spines in the data center. Romana can manage multi-hierarchy prefix groups, so that the routes across the DCs or spines can be aggregated into a single route.
The following example shows a cluster deployed across two “zones” (DCs or spines), with four racks in one zone and two racks in the other. We use multiple labels (“zone” in addition to “rack”) in order to assign nodes to prefix groups.
{
"networks": [
{
"name" : "my-network",
"cidr" : "10.111.0.0/16",
"block_mask": 29
}
],
"topologies": [
{
"networks": [ "my-network" ],
"map": [
{
"assignment": { "zone" : "zone-A" },
"groups" : [ # addresses from 10.111.0.0/17
{
"assignment": { "rack": "rack-3" },
"groups" : [] # addresses from 10.111.0.0/19
},
{
"assignment": { "rack": "rack-4" },
"groups" : [] # addresses from 10.111.32.0/19
},
{
"assignment": { "rack": "rack-7" },
"groups" : [] # addresses from 10.111.64.0/19
},
{
"assignment": { "rack": "rack-9" },
"groups" : [] # addresses from 10.111.96.0/19
}
]
},
{
"assignment": { "zone" : "zone-B" },
"groups" : [ # addresses from 10.111.128.0/17
{
"assignment": { "rack": "rack-17" },
"groups" : [] # addresses from 10.111.128.0/18
},
{
"assignment": { "rack": "rack-22" },
"groups" : [] # addresses from 10.111.192.0/18
}
]
}
]
}
]
}
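The nested prefixes in the comments above follow from splitting the CIDR once per hierarchy level: first among zones, then among each zone’s racks. A sketch with Python’s ipaddress module (illustrative only; assumes power-of-two counts rounded up):

```python
import ipaddress
import math

def carve(cidr, racks_per_zone):
    """Split a CIDR among zones, then split each zone's share among its racks.
    racks_per_zone[i] is the number of racks in zone i."""
    net = ipaddress.ip_network(cidr)
    zone_bits = math.ceil(math.log2(len(racks_per_zone)))
    zones = list(net.subnets(prefixlen_diff=zone_bits))
    return [
        [str(s) for s in zone.subnets(prefixlen_diff=math.ceil(math.log2(n)))]
        for zone, n in zip(zones, racks_per_zone)
    ]

result = carve("10.111.0.0/16", [4, 2])  # zone-A: 4 racks, zone-B: 2 racks
print(result[0])  # ['10.111.0.0/19', '10.111.32.0/19', '10.111.64.0/19', '10.111.96.0/19']
print(result[1])  # ['10.111.128.0/18', '10.111.192.0/18']
```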
Route Publisher Add-on¶
For Kubernetes clusters installed in datacenters, it is useful to enable the Romana Route Publisher add-on. It is used to automatically announce routes for Romana addresses to your BGP- or OSPF-enabled router, removing the need to configure these manually.
Because the routes are for prefixes instead of precise /32 endpoint addresses, the rate and volume of routes to publish is reduced.
Configuration¶
The Romana Route Publisher uses BIRD to announce routes from the node to other network elements. Configuration is separated into two parts:
- a static bird.conf to describe the basic configuration of BIRD, ending with an include statement
- a dynamic publisher.conf that is used to generate a config containing routes for Romana addresses
When the pod first launches, BIRD is started with the static configuration. Then, when new blocks of Romana addresses are allocated to a node, the dynamic configuration is generated with routes for those blocks, and BIRD is given a signal to reload its configuration.
If you require custom configuration per node or per subnet, there is a naming convention for the files that can be used to support this. Both config files will look for a “best match” extension to the name first. When loading x.conf on a node with IP 192.168.20.30/24, it will look for, in order:
- x.conf.192.168.20.30 (IP suffix, for node-specific config)
- x.conf.192.168.20.0 (network address suffix, for subnet-specific config)
- x.conf
Examples¶
bird.conf (for both BGP and OSPF)
router id from 192.168.0.0/16;
protocol kernel {
scan time 60;
import none;
export all;
}
protocol device {
scan time 60;
}
include "conf.d/*.conf";
- Make sure the CIDR specified for router id matches your cluster nodes.
- The protocol kernel and protocol device sections can be modified, or just deleted if not necessary.
- Add any additional, global BIRD configuration to this file (eg: debugging, timeouts, etc)
- The include line is the hook to load the generated dynamic config. It should be in your bird.conf exactly as specified.
publisher.conf for OSPF
protocol static romana_routes {
{{range .Networks}}
route {{.}} reject;
{{end}}
}
protocol ospf OSPF {
export where proto = "romana_routes";
area 0.0.0.0 {
interface "eth0" {
type broadcast;
};
};
}
- The first section, protocol static romana_routes, is used by the romana-route-publisher to generate a dynamic config.
- The second section, protocol ospf OSPF, should contain the export entry, and area blocks to match your environment.
- The interface names will need to be modified to match the node’s actual interfaces
- Add any additional, protocol-specific BIRD configuration to this file
publisher.conf for BGP
protocol static romana_routes {
{{range .Networks}}
route {{.}} reject;
{{end}}
}
protocol bgp BGP {
export where proto = "romana_routes";
direct;
local as {{.LocalAS}};
neighbor 192.168.20.1 as {{.LocalAS}};
}
- The first section, protocol static romana_routes, is used by the romana-route-publisher to generate a dynamic config.
- The second section, protocol bgp BGP, should be changed to match your specific BGP configuration.
- Add any additional, protocol-specific BIRD configuration to this file
- The neighbor address will likely be different for each subnet. To handle this, you can use multiple publisher.conf files with the appropriate network address suffixes, eg:
- publisher.conf.192.168.20.0
- publisher.conf.192.168.35.0
Installation¶
First, the configuration files need to be loaded into a configmap.
- Put all the files into a single directory
- cd to that directory
- Run kubectl -n kube-system create configmap route-publisher-config --from-file=. (the trailing . indicates the current directory)
Next, download the YAML file from here to your master node.
Then, load the Romana Route Publisher add-on by running this command on your master node.
kubectl apply -f romana-route-publisher.yaml
Verification¶
Check that route publisher pods are running correctly
$ kubectl -n kube-system get pods --selector=romana-app=route-publisher
NAME READY STATUS RESTARTS AGE
romana-route-publisher-22rjh 2/2 Running 0 1d
romana-route-publisher-x5f9g 2/2 Running 0 1d
Check the logs of the bird container inside the pods
$ kubectl -n kube-system logs romana-route-publisher-22rjh bird
Launching BIRD
bird: Chosen router ID 192.168.XX.YY according to interface XXXX
bird: Started
Other messages you may see in this container:
bird: Reconfiguration requested by SIGHUP
bird: Reconfiguring
bird: Adding protocol romana_routes
bird: Adding protocol OSPF
bird: Reconfigured
Check the logs of the publisher container inside the pods
$ kubectl -n kube-system logs romana-route-publisher-22rjh publisher
Checking if etcd is running...ok.
member 8e9e05c52164694d is healthy: got healthy result from http://10.96.0.88:12379
cluster is healthy
Checking if romana daemon is running...ok.
Checking if romana networks are configured...ok. one network configured.
Checking for route publisher template....ok
Checking for pidfile from bird...ok
Launching Romana Route Publisher
Other messages you may see in this container:
20XX/YY/ZZ HH:MM:SS Starting bgp update at 65534 -> : with 2 networks
20XX/YY/ZZ HH:MM:SS Finished bgp update
These are normal, even if OSPF is being used.
Romana VIPs¶
Kubernetes users running on premises who want an easy way to expose their services outside a cluster on their datacenter network can use external-IPs.
Although external-IPs are simple, they represent a single point of failure for the service and require manual allocation and configuration on the nodes. When there are many to configure, this can be tedious and prone to error. Romana VIPs are a solution to these problems.
Romana VIPs are defined by an annotation in a service spec. Romana then automatically brings up that IP on a node. Romana chooses a node with a pod running locally to avoid network latency within the cluster. When a node with a Romana VIP fails, Romana will bring up the VIP on a new node, providing failover for external services.
Romana VIPs are useful for exposing services on datacenter LANs that only need simple kubeproxy load balancing across pods. Romana VIPs can also be used to expose individual pods when a stable IP is required, such as Cassandra and other Big Data applications. Romana VIPs work in conjunction with Romana DNS, which can be deployed as a service discovery mechanism for individual pods exposed outside of a cluster.
Romana VIP failover requires that all nodes be on the same network segment. Addresses for Romana VIPs must be manually provisioned on the network.
Example configuration¶
The example below shows a RomanaIP (192.168.99.101) configured on a node for the nginx service by adding the romanaip annotation to the spec.
...
kind: Service
metadata:
name: nginx
annotations:
romanaip: '{"auto": false, "ip": "192.168.99.101"}'
...
The complete service spec is available here
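The annotation value is itself a small JSON document. A controller-side sketch of reading it (illustrative only; field names are taken from the example above):

```python
import json

# Value of the 'romanaip' annotation on the service spec.
annotation = '{"auto": false, "ip": "192.168.99.101"}'
vip = json.loads(annotation)

# With "auto" set to false, the IP is taken verbatim from the annotation.
if not vip["auto"]:
    print(vip["ip"])  # 192.168.99.101
```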
Romana DNS¶
Romana DNS adds DNS support for Romana VIPs. It is a drop-in replacement for kube-dns.
Installation¶
On the master node of the Kubernetes cluster
Make a note of the number of replicas for kube-dns using the following command:
echo `kubectl get deploy -n kube-system kube-dns -o jsonpath="{.spec.replicas}"`
Now set the replicas for kube-dns to zero using the following command:
kubectl scale deploy -n kube-system kube-dns --replicas=0
Wait until the kube-dns replicas reach zero (around a minute or so)
On all nodes (i.e. master and compute nodes) of the Kubernetes cluster
Remove the earlier Docker image and replace it with the Romana one using the commands below:
docker rmi gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.5
docker pull pani/romanadns
docker tag pani/romanadns:latest gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.5
Now return to the master node for the remaining commands.
On the master node of the Kubernetes cluster
Assuming you had 2 replicas before (from the first step above), restore the replica count for kube-dns as follows:
kubectl scale deploy -n kube-system kube-dns --replicas=2
Wait a minute or so for the pods to come up, and Romana DNS will be up and running.
DNS Testing¶
Run dig to check that DNS is working properly:
dig @10.96.0.10 +short romana.kube-system.svc.cluster.local
Download this sample nginx yaml file and then use the following command to create an nginx service with a RomanaIP in it:
kubectl create -f nginx.yml
This should create the nginx service with a RomanaIP, which should be reflected in the dig result below:
dig @10.96.0.10 +short nginx.default.svc.cluster.local
Sample DNS Results¶
$ dig @10.96.0.10 +short romana.kube-system.svc.cluster.local
10.96.0.99
192.168.99.10
$ dig @10.96.0.10 +short nginx.default.svc.cluster.local
10.116.0.0
10.99.181.64
192.168.99.101