Tuesday, April 10, 2012

Handling nginx Failover With KeepAlived


How do you configure keepalived to release and obtain the VIP (virtual IP) when nginx is dead, down, or the system is rebooted for a kernel upgrade?

Edit /usr/local/etc/keepalived/keepalived.conf and add the following section to check whether nginx is alive or dead:
# vi /usr/local/etc/keepalived/keepalived.conf
Updated file on both lb0 and lb1:
vrrp_script chk_http_port {
    script "/usr/bin/killall -0 nginx"
    interval 2
    weight 2
}
vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101
    authentication {
        auth_type PASS
        auth_pass Add-Your-Password-Here
    }
    track_script {
        chk_http_port
    }
    virtual_ipaddress {
        202.54.1.1/29 dev eth1
    }
}
 
Save and close the file. Restart keepalived:
# /etc/init.d/keepalived restart
If nginx dies for any reason, keepalived will release the VIP on the master and the backup server will become active. When the master LB0 comes back online, LB1 will return to the backup state.
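To understand the health check itself: signal 0 does not actually send a signal; `killall -0 nginx` merely tests whether a process named nginx exists. A minimal sketch of that semantics, using a `sleep` process as a stand-in for the nginx master:

```shell
# Signal 0 tests process existence without delivering a real signal.
# A long-running sleep stands in for the nginx master process here.
sleep 60 &
pid=$!

kill -0 "$pid" && echo "process $pid is alive"    # health check passes

kill "$pid"                   # simulate nginx dying
wait "$pid" 2>/dev/null       # reap it so the PID is really gone

kill -0 "$pid" 2>/dev/null || echo "process $pid is gone"   # health check fails
```

This is exactly the success/failure signal keepalived's `vrrp_script` acts on: a zero exit status adds the configured weight to the node's priority, a non-zero one removes it.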


Software Load-Balancing & Failover with NGINX and keepalived

Introduction

This article describes how to set up software load-balancing and failover for web applications (in this case for a larger installation of VoiceObjects: 4 machines with 8 production instances each, plus one staging instance). Two additional machines will host the failover / load-balancing solution.
The load should be evenly balanced (round-robin) between the instances. Initially no health checks on individual machines or instances are planned.
Failover capability is achieved by having the currently active machine claim a virtual IP (VIP) on the network. In case of a failure this VIP is passed to the backup machine.
All traffic will reach the machine on port 80. Thus port-forwarding is needed for the production load, plus additional URL rewriting for the staging load.

Setup Overview


Figure 1: General Setup overview

Load-Balancing: nginx

nginx is a lightweight software load balancer / reverse proxy / web server for Linux operating systems.
As this article focuses only on the load-balancing capabilities, the other functionality is not covered here. Please see the Resources section for links to more information.
To set up simple load balancing, configure the following in nginx.conf:

http {
    # define all instances here
    upstream VoiceObjects {
        server 172.22.23.92:8099;
        server 172.22.23.92:8100;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://VoiceObjects;
        }
    }
}
This will forward calls reaching nginx on port 80 to the ports 8099 and 8100 on the given machine.

nginx - Automatic reloading of configuration

http://wiki.codemongers.com/NginxCommandLine
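In short, nginx can re-read its configuration without dropping active connections: the master process is signalled, old workers finish their requests and new workers start with the new configuration. A sketch (assumes nginx is installed and already running on the machine):

```shell
# Validate the new configuration first; only reload if it parses cleanly.
if command -v nginx >/dev/null 2>&1 && nginx -t 2>/dev/null; then
    nginx -s reload
    # equivalent: signal the master process directly
    # kill -HUP "$(cat /var/run/nginx.pid)"
else
    echo "nginx not available; skipping reload"
fi
```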

nginx - Number of parallel sessions

nginx needs to be set up for the maximum number of possible parallel sessions. There are two parameters controlling this value: the number of parallel worker processes of nginx and the number of parallel connections per worker.
The maximum number of parallel sessions can be calculated as follows:
no. of parallel sessions = worker_processes * worker_connections
Please see also this article on the web for more information: http://articles.slicehost.com/2007/12/13/ubuntu-gutsy-nginx-configuration-1
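With the values suggested in the following sections (worker_processes 4, worker_connections 1024), the arithmetic works out to:

```shell
worker_processes=4
worker_connections=1024
echo "max parallel sessions: $((worker_processes * worker_connections))"
# prints: max parallel sessions: 4096
```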

worker_processes

Default:
worker_processes 1;
nginx can have more than one worker process running at the same time. To take advantage of SMP and to improve performance, the number of worker processes can be increased:
worker_processes 4;
The most efficient value for this setting should be determined on the systems involved.

worker_connections

Sets the number of connections that each worker can handle. This is a good default setting:
events {
    worker_connections 1024;
}
Note: The worker_connections setting is placed inside the 'events' module.

nginx: Routing by incoming URL

In the example project, there will be two types of incoming requests - production calls and calls to a single "pilot" instance on each machine. See the following graph for a detailed overview:




Figure 2: Detailed setup overview (with Pilot)
This means that traffic needs to be routed to 2 different upstream definitions. To make this possible, the incoming URLs must differ so that nginx can distinguish between the two targets. Other ways of distinguishing requests would be possible but have been discarded for the sake of easy implementation.

 The VoiceObjects installations will need to reflect this URL pattern. Please see the nginx documentation for information on how to rewrite complete URLs.
The following setup will forward all requests for http://<VIP>/VoiceObjects/* to the main cluster (upstream VoiceObjects) and incoming calls for http://<VIP>/VOServer to the "Pilot" instances (upstream VOServer):

http {
    upstream VoiceObjects {
        server 172.22.23.92:8099;
        server 172.22.23.92:8100;
    }

    upstream VOServer {
        server 172.22.23.42:8099;
    }

    server {
        listen 80;
        access_log /var/log/nginx/host.access.log main;

        # the location to pass requests to -> simulate 9th (staging) instance at TMUK
        # will send the URL http://<IP>/VOServer/* to the "kschmitte" upstream
        location ^~ /VOServer/ {
            proxy_pass http://VOServer;
        }

        # all traffic not starting with /VOServer/ will be routed to the
        # instances in upstream "VoiceObjects"
        location / {
            proxy_pass http://VoiceObjects;
        }
    }
}

Failover: keepalived

Keepalived is part of the LVS (Linux Virtual Server) project. It performs health checks and keeps a directory of clustered machines to handle failover situations. It is also capable of load-balancing incoming requests (see below).


# Configuration File for keepalived
global_defs {
    # each load balancer should have a different ID
    # this will be used in SMTP alerts, so you should make
    # each router easily identifiable
    lvs_id LB1
}

# health check for keepalived
vrrp_script chk_nginx {           # requires keepalived-1.1.13
    #script "killall -0 nginx"    # cheaper than pidof
    script "pidof nginx"
    interval 2                    # check every 2 seconds
    weight 2                      # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0

    # each virtual router id must be unique per instance name
    virtual_router_id 51

    # MASTER and BACKUP state are determined by the priority
    # even if you specify MASTER as the state, the state will
    # be voted on by priority (so if your state is MASTER but
    # your priority is lower than the router with BACKUP, you
    # will lose the MASTER state)
    priority 101

    # check if we are still running
    track_script {
        chk_nginx
    }

    # these are the IP addresses that keepalived will set up on
    # this machine; without this block, keepalived will not set up
    # and take down the IP addresses
    virtual_ipaddress {
        172.22.23.200
    }
}
This configuration snippet will enable keepalived to connect to the nginx process to check if it is still running. Additionally (when deployed with a different lvs_id and a lower priority) this enables keepalived to switch the VIP between two machines on failover.
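For the second machine the configuration is almost identical; a sketch of only the lines that differ on the backup (the `lvs_id` LB2 and the priority value 100 are assumptions, chosen simply to differ from LB1 and to be lower than the master's 101):

```
global_defs {
    lvs_id LB2          # must differ from the master's LB1
}

vrrp_instance VI_1 {
    state BACKUP        # start in backup state
    priority 100        # lower than the master's 101
    # interface, virtual_router_id, track_script and
    # virtual_ipaddress stay identical to the master
}
```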

Setup description for 2 Load-Balancing machines with failover

After the general setup discussion, here is the description & configuration for the setup outlined above: 4 machines with 8 production and 1 staging instance each, and 2 load-balancing machines in a failover (hot standby) setup.

Keepalived for Load-Balancing

Keepalived is also capable of load balancing.

As keepalived currently lacks the ability to change the port of an incoming request, its load-balancing capabilities could not be used in this context.

The advantage of keepalived's load-balancing, however, is that the existence of the target system can be tested regularly, so targets can be excluded from the load balancing automatically.

It is also possible to change the balancing method; e.g. weighted round robin is available.
Simple round-robin load-balancing to two instances could look like this:

 


# for load-balancing
virtual_server 172.22.23.200 8099 {
    delay_loop 30
    lb_algo rr
    lb_kind NAT
    persistence_timeout 50
    protocol TCP

    real_server 172.22.23.92 8099 {
        connect_port 8099
        weight 1
        HTTP_GET {
            url {
                path /VoiceObjects/Resources/readme.txt
                status_code 200
                connect_port 8099
            }
            connect_timeout 2
            nb_get_retry 2
            delay_before_retry 1
        }
    }

    real_server 172.22.23.42 8099 {
        connect_port 8100
        weight 1
        HTTP_GET {
            url {
                path /VoiceObjects/Resources/readme.txt
                status_code 200
                connect_port 8099
            }
            connect_timeout 2
            nb_get_retry 2
            delay_before_retry 1
        }
    }
}
Please note that changing the port is not feasible; thus two machines need to be addressed.
This configuration decides whether an instance is available based on the availability of the file readme.txt in the Resources folder of VoiceObjects.
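The same check can be reproduced by hand with curl, which helps when debugging why keepalived marks an instance as down (IP, port and path are taken from the configuration above; adjust them to your environment):

```shell
url="http://172.22.23.92:8099/VoiceObjects/Resources/readme.txt"
# -m 2 mirrors connect_timeout 2; -w prints only the HTTP status code
code=$(curl -s -o /dev/null -m 2 -w '%{http_code}' "$url")
if [ "$code" = "200" ]; then
    echo "instance available (HTTP $code)"
else
    echo "instance unavailable (HTTP $code)"
fi
```

Anything other than a 200 response (including a connection timeout, reported as 000) means keepalived would take the real_server out of rotation.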
 


 

1 comment:

  1. Say the master nginx on LB0 goes down, e.g. if I do the following:

    On LB0,

    ~]# ps aux | grep nginx
    root 3411 0.0 0.0 65736 1304 ? Ss 14:43 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
    nginx 3412 0.0 0.1 76304 3672 ? S 14:43 0:00 nginx: worker process
    nginx 3413 0.0 0.1 76124 3452 ? S 14:43 0:00 nginx: worker process
    nginx 3414 0.0 0.0 65740 1800 ? S 14:43 0:00 nginx: cache manager process
    nginx 3415 0.0 0.0 65740 1692 ? S 14:43 0:00 nginx: cache loader process
    root 3494 0.0 0.0 103304 888 pts/0 R+ 14:43 0:00 grep nginx

    Then do a kill,

    ~]# kill -9 3411 3412 3413


    ~]# ps aux | grep nginx
    nginx 3414 0.0 0.0 65740 1800 ? S 14:43 0:00 nginx: cache manager process
    nginx 3415 0.0 0.0 65740 1692 ? S 14:43 0:00 nginx: cache loader process
    root 3528 0.0 0.0 103304 876 pts/0 R+ 14:44 0:00 grep nginx

    There are still some processes running, but nginx is not responding; keepalived cannot detect this, so the IP failover to LB1 does not happen.
