Kamailio High Availability Using Keepalived

This is a quick post on how to use keepalived to set up high availability on two kamailio machines. In this setup there will be a “primary” and a “secondary” node. Each node will run kamailio and keepalived, along with a “shared” (sometimes referred to as “floating”) IP address.
The idea is that each node runs health checks on itself and on the other node. If a node detects that kamailio is down, it moves the “floating” IP address to itself and becomes the MASTER node.
After the IP address is moved, all SIP traffic is automatically directed to the new MASTER node. Both kamailio nodes should run nearly identical routing configs to ensure traffic is routed properly after fail-over. This setup was tested on Debian 9 with kamailio 5.1.0.

We will use the following example IP address setup:

node 1 IP Address: 1.2.3.4
node 2 IP Address: 5.6.7.8
“floating” IP Address: 5.5.5.5

In order for this example to work, you need to enable binding to a non-local IP address. Run the following on each node to enable the setting:
$ echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind

To permanently save this setting, add the following line to a file such as /etc/sysctl.d/99-non-local-bind.conf:
net.ipv4.ip_nonlocal_bind = 1
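
Apply the sysctl settings without a reboot and verify the value took effect:

$ sysctl --system
$ sysctl net.ipv4.ip_nonlocal_bind   # should print: net.ipv4.ip_nonlocal_bind = 1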

We want kamailio to bind to the floating IP address and to answer SIP OPTIONS pings from the other node. So you will want something like this in your kamailio.cfg:

#!substdef "!NODE01!1.2.3.4!g"
#!substdef "!NODE02!5.6.7.8!g"

# bind to local IP and "floating" IP
listen=udp:1.2.3.4:5060 # on node 2, use 5.6.7.8 here
listen=udp:5.5.5.5:5060

# main request routing logic
route {

  # allow pings from other node
  if (is_method("OPTIONS") && ($si == "NODE01" || $si == "NODE02")) {
    options_reply();
    exit; # stop here so the ping doesn't fall through to the main routing logic
  }

  ...

}
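
Note that options_reply() comes from kamailio's options module, so make sure it is loaded along with your other modules:

loadmodule "options.so"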

Start kamailio and make sure it’s running.

Next we will set up keepalived. To install everything we need, run:
$ apt-get install libipset3 sipsak keepalived
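
sipsak (installed above) is what the health checks use: it sends a SIP OPTIONS request and exits 0 on a successful reply. You can try it by hand against a running kamailio:

$ timeout 2 sipsak -s sip:1.2.3.4:5060
$ echo $?   # 0 means kamailio answered the OPTIONS ping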

We will be using sipsak with a custom bash script to report the “health” of each node. Add the following bash script on node 1 at /etc/keepalived/node01.sh:

#!/bin/bash

node01=1.2.3.4
node02=5.6.7.8
return_code=0 # success (node 1 also stays MASTER if both checks fail)

# check local instance
timeout 2 sipsak -s sip:$node01:5060
exit_status=$?
if [[ $exit_status -eq 0 ]]; then
  echo "sip ping successful to node01 [$node01]"
  exit $return_code
fi

# local instance failed, check remote
timeout 2 sipsak -s sip:$node02:5060
exit_status=$?
if [[ $exit_status -eq 0 ]]; then
  echo "sip ping successful to node02 [$node02]"
  return_code=1
fi

echo "return code [$return_code]"

exit $return_code

Make sure the file is executable:
$ chmod +x /etc/keepalived/node01.sh
The script on node 1 checks the health of kamailio locally first (by sending a SIP OPTIONS ping); if that fails, it checks the health of node 2. If node 1 is healthy, the script reports success to keepalived by returning 0. If node 1 is not healthy and node 2 is healthy, it reports a failure to keepalived by returning 1.

And add the following bash script on node 2 at /etc/keepalived/node02.sh:

#!/bin/bash

node01=1.2.3.4
node02=5.6.7.8
return_code=1 # fail (node 2 stays BACKUP by default)

# check remote instance first
timeout 2 sipsak -s sip:$node01:5060
exit_status=$?
if [[ $exit_status -eq 0 ]]; then
  echo "sip ping successful to node01 [$node01]"
  exit $return_code
fi

# remote instance failed, check local
timeout 2 sipsak -s sip:$node02:5060
exit_status=$?
if [[ $exit_status -eq 0 ]]; then
  echo "sip ping successful to node02 [$node02]"
  return_code=0
fi

echo "return code [$return_code]"

exit $return_code

Make sure the file is executable:
$ chmod +x /etc/keepalived/node02.sh
The script on node 2 checks the health of kamailio on node 1 (remotely) first; if that fails, it checks the health of node 2 (locally). If node 1 is healthy, it reports a “local” failure to keepalived by returning 1. If node 1 is not healthy and node 2 is healthy, it reports success to keepalived by returning 0.

keepalived promotes a node to MASTER based on the return code reported by these scripts. You can get more creative and add more logic if you’d like, but for the purposes of this post, I kept it simple.
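
For example, if you want to guard against a single missed ping causing a flap, keepalived's vrrp_script block also supports fall and rise counters:

vrrp_script check_sip {
  script "/etc/keepalived/node01.sh"
  interval 6 # check every 6 seconds
  fall 2     # require 2 consecutive failures before failing the check
  rise 2     # require 2 consecutive successes before passing again
}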

Last, we will configure keepalived itself by editing the following file on node 1: /etc/keepalived/keepalived.conf

vrrp_script check_sip {
  script "/etc/keepalived/node01.sh"
  interval 6 # check every 6 seconds
}

vrrp_instance VI_SBC {
  state MASTER
  priority 150 # higher priority, so node 1 reclaims MASTER when healthy
  interface eth0
  virtual_router_id 51
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 11111
  }
  virtual_ipaddress {
    5.5.5.5 dev eth0
  }
  track_script {
    check_sip
  }
}

And on node 2: /etc/keepalived/keepalived.conf

vrrp_script check_sip {
  script "/etc/keepalived/node02.sh"
  interval 6 # check every 6 seconds
}

vrrp_instance VI_SBC {
  state BACKUP
  priority 100
  interface eth0
  virtual_router_id 51
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 11111
  }
  virtual_ipaddress {
    5.5.5.5 dev eth0
  }
  track_script {
    check_sip
  }
}

The settings above tell keepalived to run the health-check script every 6 seconds. Whichever node holds MASTER brings up the floating IP (5.5.5.5) on eth0.

Start the keepalived service on each node and watch the syslog. As long as kamailio is running and responding to SIP OPTIONS pings on node 1, node 1 will be MASTER. You can test by stopping kamailio on node 1 and watching the IP move to node 2. Use the following command to check for the IP address:
$ ip addr show
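
On the current MASTER, the floating IP should appear as a secondary address on eth0, and keepalived logs its state transitions to syslog:

$ ip addr show eth0 | grep 5.5.5.5
$ tail -f /var/log/syslog | grep -i keepalived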


RTPEngine with Kamailio as Load-balancer and IP Gateway

This post explains how to set up Kamailio as an SBC and IP gateway. We are using Debian 8 in this example. It uses Kamailio’s dispatcher module to distribute calls to Asterisk (a sample dispatcher.list is shown below). It uses RTPEngine to proxy media to and from the public internet, across the LAN, to Asterisk.
This is a powerful setup, as you can easily scale out using a single public IP address.
Here is the IP layout we will be implementing:

Kamailio Public (eth0): 7.6.5.4
Kamailio Private (eth1): 10.10.10.254
Asterisk machine 1: 10.10.10.3
Asterisk machine 2: 10.10.10.4
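
For reference, the dispatcher module reads its destinations from a plain-text file. A minimal dispatcher.list for the two Asterisk machines above might look like this (assuming destination set 1 and the default file location):

# /etc/kamailio/dispatcher.list
# format: <set id> <destination URI>
1 sip:10.10.10.3:5060
1 sip:10.10.10.4:5060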

Set up Debian and iptables to act as an IP gateway:

# edit /etc/sysctl.conf and enable IP forwarding
net.ipv4.ip_forward=1
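
# apply the change immediately without a reboot
sysctl -p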

# add iptables rules
iptables -F
iptables -t nat -F

iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT
iptables -P FORWARD ACCEPT

iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i eth1 -s 10.10.10.0/255.255.255.0 -j ACCEPT
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
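
These iptables rules will not survive a reboot on their own. On Debian you can persist them with the iptables-persistent package, for example:

apt-get install iptables-persistent
iptables-save > /etc/iptables/rules.v4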

Install RTPEngine and Kamailio with the RTPEngine module.


Download kamailio.cfg

Download sip.conf

Download /etc/default/ngcp-rtpengine-daemon

Kamailio: High-Availability/Failover with Corosync and Pacemaker on Debian 7

In this setup, we will have 2 Kamailio servers, referred to as ‘nodes’. One will be active and one will be on standby. There is a third ‘floating’ IP that is moved to whichever node is active. Kamailio should be configured to use the floating IP. In this example, the nodes are:

kam01: 10.10.10.18
kam02: 10.10.10.19

Floating IP: 10.10.10.200

First, add both nodes to /etc/hosts:

10.10.10.18    kam01
10.10.10.19    kam02

Install corosync and pacemaker:

apt-get install ntp corosync pacemaker -y

Generate /etc/corosync/authkey:

corosync-keygen

Copy the corosync authkey to kam02:

scp -P 22 /etc/corosync/authkey root@kam02:/etc/corosync

Enable corosync on kam01 & kam02:

sed -i "s/START=no/START=yes/g" /etc/default/corosync

Enable pacemaker service in corosync:

cat > /etc/corosync/service.d/pcmk << EOF
service {
  name: pacemaker
  ver: 1
}
EOF

Add corosync config:

cat >/etc/corosync/corosync.conf<<EOF
#
totem {
        version: 2
        transport: udpu
        interface {
                member {
                        memberaddr: 10.10.10.18
                }
                member {
                        memberaddr: 10.10.10.19
                }
                ringnumber: 0
                bindnetaddr: 10.10.10.0
                mcastport: 5405
        }
}

logging {
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
EOF

Start corosync and pacemaker:

service corosync start
service pacemaker start
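
Before configuring any resources, check that both nodes have joined the cluster:

crm status
# or a one-shot view from pacemaker itself:
crm_mon -1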

Configure the cluster using crm; changes will propagate to the other node. Make sure to disable stonith:

crm configure property stonith-enabled=false
crm configure primitive FAILOVER-IP ocf:heartbeat:IPaddr2 params ip="10.10.10.200" nic="eth0" cidr_netmask="24" op monitor interval="10s"
crm configure primitive KAM-HA lsb:kamailio op monitor interval="30s"
crm configure group KAM-HA-GROUP FAILOVER-IP KAM-HA
crm configure colocation KAM-HA-GROUP-COLO inf: FAILOVER-IP KAM-HA
crm configure order KAM-HA-ORDER inf: FAILOVER-IP KAM-HA
crm configure property no-quorum-policy=ignore
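
You can review the resulting configuration at any time with:

crm configure show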

You need no-quorum-policy=ignore for a two-node cluster. If you mess up during the crm configure step, you can start over using these commands:

# TO CLEAR CONFIG AND START OVER
crm configure property stop-all-resources=true
crm configure erase

It’s sometimes useful to see which resource agents are available; you can check with these commands:

# TO LIST RESOURCE AGENTS
crm ra list lsb
crm ra list systemd # (if using systemd init system)
crm ra list ocf heartbeat
crm ra list ocf pacemaker

If you need to migrate the services and floating IP to the other node, you can run:

crm resource migrate KAM-HA-GROUP kam02
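
Note that migrate works by adding a location constraint pinning the group to kam02. Once you are done, clear the constraint so the cluster can manage placement on its own again:

crm resource unmigrate KAM-HA-GROUP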

If you need to edit specific parameters in the config, you can export it to XML, make the changes, and re-import it:

# ADVANCED EDITING
cibadmin --query > tmp.xml
vi tmp.xml
cibadmin --replace --xml-file tmp.xml