Drupal in the Cloud: Deploying on Rackspace with Nginx and Boost

Lately, we have been involved in a project where our clients needed a site capable of serving a large number of anonymous users and a reasonable number of concurrently logged-in users. To reach these goals, we looked to the cloud. We first added as much caching as possible, since this is relatively simple and goes a long way. We then built a distributed system. This post describes how we got it to work. A diagram of our architecture is attached, and the various configurations are summarized at the bottom.

First, the anonymous user caching. Anonymous users all view the same content, so if we cache a static HTML page, we can serve it without involving PHP at all. We use the Boost module to generate these static pages, and nginx to serve them while proxying all other requests to Apache. Since nginx can scale without much of a memory hit, it is much better to use nginx to serve large amounts of static files and let Apache handle the logged-in users and new page requests. For anonymous users, the bottleneck now becomes the network: in a localhost test, ab records well over 10 thousand requests per second served by a 2 GB Rackspace instance.

On to logged-in caching. We use APC as an opcode cache. This saves the server from recompiling the PHP code on every page load. Moreover, the whole thing fits easily in RAM (we typically give APC 128M of RAM). This drastically decreases CPU usage, so logged-in users can browse the site much faster. But we can still only handle a limited number of them, and we can do a bit better. Instead of querying MySQL every time Drupal reads from its cache tables, we can store these tables in memory. This is where memcached and the cacherouter module come in.
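
The idea of keeping cache entries in memory instead of in MySQL can be sketched as a simple cache-aside lookup. This is only an illustration in Python, not the cacherouter implementation: `InMemoryCache` is a hypothetical stand-in for a memcached client, and `load_from_db` is a hypothetical function representing the expensive MySQL query.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for a memcached client (get/set with optional expiry)."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[object, Optional[float]]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() >= expires_at:
            # The entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: object, ttl: Optional[float] = None) -> None:
        expires_at = time.time() + ttl if ttl is not None else None
        self._store[key] = (value, expires_at)


def cached_lookup(
    cache: InMemoryCache,
    key: str,
    load_from_db: Callable[[str], object],
    ttl: Optional[float] = None,
) -> object:
    """Return the cached value, falling back to the database only on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # the slow path we want to avoid
        cache.set(key, value, ttl)
    return value
```

With this pattern, only the first request for a given key reaches the database; later requests are answered from memory until the entry expires or is evicted.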

Now, if you've looked at the nginx conf below, you might have noticed that it is also acting as a load balancer. We have Drupal on multiple nodes. The first step in achieving this was putting MySQL on a separate node (this does require hardening it) and having Apache live on different machines. To make sure that user-uploaded files and "boosted" cache files are available on all Apache servers, we use glusterfs to replicate files across all machines. We also use glusterfs to replicate the code base so that changes can be made quickly. However, because glusterfs slows down file operations, we rsync the code base from glusterfs to the local file system, so the PHP code is not run from glusterfs.
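
The rsync step could look roughly like the following Python sketch. The paths in the usage note are hypothetical (the post does not say where the glusterfs mount or the local code base live), and the rsync flags shown are one reasonable choice, not necessarily the ones we used.

```python
import subprocess
from typing import List


def build_rsync_command(source: str, destination: str, dry_run: bool = False) -> List[str]:
    """Build an rsync command that mirrors `source` into `destination`."""
    # -a preserves permissions and timestamps; --delete removes files
    # that no longer exist in the source.
    command = ["rsync", "-a", "--delete"]
    if dry_run:
        command.append("--dry-run")
    # A trailing slash makes rsync copy the directory's contents
    # rather than the directory itself.
    command.append(source.rstrip("/") + "/")
    command.append(destination)
    return command


def sync_code_base(source: str, destination: str) -> None:
    """Run the sync; raises CalledProcessError if rsync exits with an error."""
    subprocess.run(build_rsync_command(source, destination), check=True)
```

For example, `sync_code_base("/mnt/glusterfs/code", "/var/www/drupal")` would copy the replicated code base to local disk, where Apache and APC can read it without going through glusterfs. Both paths are made up for the example.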

Putting it all together: the architecture. The attached diagram shows the architecture. We deploy all our servers on Rackspace hosting, starting with an Ubuntu Karmic image. There are three types of nodes: load balancers and static file servers, which we'll refer to as nginx nodes; server nodes with Apache, which we'll refer to as apache nodes; and the database node(s), which we'll refer to as mysql nodes.

The nginx nodes have nginx, memcached and glusterfs installed. They serve static files from a shared folder on a glusterfs mount. Any request that is not cached and not found among the static files is proxied to the pool of apache nodes. The memcached daemon is part of a pool in which the apache nodes also participate, and which cacherouter uses to distribute cached MySQL queries and the cache tables. The nginx nodes can be replicated for high availability, since the files they serve are replicated in real time via glusterfs.

The apache nodes have Apache with mod_php and PHP 5.2 installed, as well as glusterfs, APC and memcached. We can spin up new instances quickly and add them to the pool: once glusterfs is mounted, a new node quickly syncs the files it needs from the other nodes and is ready to receive its share of requests. All the apache nodes talk to the mysql node for the database. The mysql node can also be replicated for high availability.

Deploying rapidly: what is the point of having a distributed architecture in the cloud if we cannot scale quickly? We use Puppet to quickly configure a newly spun-up node and add it to the nginx or apache pool.

Wrapping it up: we should be able to follow up soon with a post on performance. Testing we have done so far indicates that the system does scale up quite well. We have also compared rackspace hosting to ec2, and the numbers show that rackspace is much faster for drupal, mostly due to the network latency. We will soon have numbers and graphs to show it all.

Configuring apc: we set the memory size to 128M with a single bin.

Configuring cacherouter: we use version 6.x-1.x-dev (rather than 6.x-1.0-rc1).
* The dev version had some bug fixes for the memcached engine at the time we installed it.
Append the following to your Drupal settings.php:

<?php
# Cacherouter
$conf['cache_inc'] = './sites/all/modules/cacherouter/cacherouter.inc';
$conf['cacherouter'] = array(
  'default' => array(
    'engine' => 'memcached',
    'servers' => array(
      'web01',
      'web02',
      'web03',
    ),
    'shared' => TRUE,
    'prefix' => '',
    'path' => '',
    'static' => FALSE,
    'fast_cache' => FALSE,
  ),
);
?>
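
To give an intuition for how a pool of memcached servers shares the cache, here is a simplified Python sketch in which each key is hashed to exactly one server. This is an assumption-laden illustration: real memcached clients choose their own hashing scheme (often consistent hashing), and `pick_server` is a hypothetical helper, not part of cacherouter.

```python
import zlib
from typing import List


def pick_server(key: str, servers: List[str]) -> str:
    """Map a cache key to one server in the pool (simple modulo hashing)."""
    if not servers:
        raise ValueError("server pool is empty")
    # crc32 is deterministic, so every web node computes the same
    # server for the same key without any coordination.
    index = zlib.crc32(key.encode("utf-8")) % len(servers)
    return servers[index]
```

Because every node computes the same mapping, an entry written by one apache node can be read back by any other node in the pool. Note that with plain modulo hashing, adding or removing a server remaps most keys, which is one reason real clients tend to prefer consistent hashing.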

Configuring boost: most of boost's default settings are fine. We turned on gzip and enabled css and js caching. We also ignore the htaccess rules, since we use nginx to serve the html files.

Configuring nginx (version 0.7.62):
In nginx.conf, in the "http" section:

<?php
  upstream apaches {
    #ip_hash;
    server web01;
    server web02;
    server web03;
  }
?>
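
The two balancing strategies hinted at in the upstream block can be sketched in Python. With `ip_hash` commented out, nginx falls back to round-robin over the listed servers; with `ip_hash` enabled, requests from the same client network stick to one backend (nginx uses the first three octets of the IPv4 address as the hashing key). The hash function below is an illustrative stand-in, not the one nginx uses, and both helper names are hypothetical.

```python
import itertools
import zlib
from typing import Iterator, List


def round_robin(backends: List[str]) -> Iterator[str]:
    """Yield backends in turn, forever (equal weights assumed)."""
    if not backends:
        raise ValueError("backend pool is empty")
    return itertools.cycle(backends)


def sticky_backend(client_ip: str, backends: List[str]) -> str:
    """Pick a backend from the first three octets of an IPv4 address."""
    if not backends:
        raise ValueError("backend pool is empty")
    prefix = ".".join(client_ip.split(".")[:3])
    # Illustrative hash only; nginx has its own hashing for ip_hash.
    return backends[zlib.crc32(prefix.encode("ascii")) % len(backends)]
```

Round-robin spreads load evenly but may send consecutive requests from one user to different Apache servers, while the sticky variant keeps a client on one server at the cost of a less even distribution.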

in the host conf, in the "server" section:

<?php
server {
  listen   80;

  proxy_set_header Host $http_host;

  gzip  on;
  gzip_static on;
  gzip_proxied any;

  gzip_types text/plain text/html text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

  set $myroot /var/www;

  #charset koi8-r;

  # deny access to files beginning with a dot (.htaccess, .git, ...)
  location ~ /\. {
    deny all;
  }

  # deny access to Drupal source, SQL dumps and version-control files
  location ~ \.(engine|inc|info|install|module|profile|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)$|^(code-style\.pl|Entries.*|Repository|Root|Tag|Template)$ {
    deny all;
  }

  # build up a flag string; "GDQF" means the request can be served from the boost cache
  set $boost "";
  set $boost_query "_";

  # G: the request is a GET
  if ( $request_method = GET ) {
    set $boost G;
  }

  # D: no DRUPAL_UID cookie, so the user is anonymous
  if ( $http_cookie !~ "DRUPAL_UID" ) {
    set $boost "${boost}D";
  }

  # Q: the query string is empty
  if ( $query_string = "" ) {
    set $boost "${boost}Q";
  }

  # F: a cached html page exists for this request
  if ( -f $myroot/cache/normal/$http_host$request_uri$boost_query$query_string.html ) {
    set $boost "${boost}F";
  }

  if ( $boost = GDQF ) {
    rewrite ^.*$ /cache/normal/$http_host/$request_uri$boost_query$query_string.html break;
  }

  # same check for cached css
  if ( -f $myroot/cache/perm/$http_host$request_uri$boost_query$query_string.css ) {
    set $boost "${boost}F";
  }

  if ( $boost = GDQF ) {
    rewrite ^.*$ /cache/perm/$http_host/$request_uri$boost_query$query_string.css break;
  }

  # same check for cached js
  if ( -f $myroot/cache/perm/$http_host$request_uri$boost_query$query_string.js ) {
    set $boost "${boost}F";
  }

  if ( $boost = GDQF ) {
    rewrite ^.*$ /cache/perm/$http_host/$request_uri$boost_query$query_string.js break;
  }

  # static files: serve from disk if present, otherwise proxy to apache
  location ~* \.(txt|jpg|jpeg|css|js|gif|png|bmp|flv|pdf|ps|doc|mp3|wmv|wma|wav|ogg|mpg|mpeg|mpg4|htm|zip|bz2|rar|xls|docx|avi|djvu|mp4|rtf|ico)$ {
    root $myroot;
    expires max;
    add_header Vary Accept-Encoding;
    if (-f $request_filename) {
      break;
    }
    if (!-f $request_filename) {
      proxy_pass http://apaches;
      break;
    }
  }

  # cached html and xml: serve from disk but tell clients not to cache
  location ~* \.(html(\.gz)?|xml)$ {
    add_header Cache-Control no-cache,no-store,must-revalidate;
    root $myroot;
    if (-f $request_filename) {
      break;
    }
    if (!-f $request_filename) {
      proxy_pass http://apaches;
      break;
    }
  }

  # everything else goes to the apache pool
  location / {
    access_log  /var/log/nginx/localhost.proxy.log proxy;
    proxy_pass http://apaches;
  }
}

?>
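
The chain of `if` blocks that decides whether a request can be answered from the boost cache may be easier to follow as a single function. The Python sketch below mirrors the html case only, under the same conditions as the config (GET request, no DRUPAL_UID cookie, empty query string, cached file present). The function name and signature are hypothetical, and this is a reading aid rather than anything nginx or Boost runs.

```python
import os
from typing import Optional

BOOST_QUERY = "_"  # separator boost places between the path and the query string


def boost_cache_path(
    root: str,
    host: str,
    method: str,
    request_uri: str,
    query_string: str,
    cookie_header: str,
) -> Optional[str]:
    """Return the cached html file to serve, or None to proxy to Apache."""
    if method != "GET":
        return None  # only GET requests are cacheable
    if "DRUPAL_UID" in cookie_header:
        return None  # logged-in users always go to Apache
    if query_string != "":
        return None  # the config only serves pages without a query string
    candidate = f"{root}/cache/normal/{host}{request_uri}{BOOST_QUERY}{query_string}.html"
    return candidate if os.path.isfile(candidate) else None
```

For instance, an anonymous GET for `/about` on `example.com` would be served from `<root>/cache/normal/example.com/about_.html` if that file exists; any other outcome falls through to the Apache pool.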

Configuring glusterfs (version 3.0.3):
There are two files. glusterfsd.vol defines the local "brick". glusterfs.vol describes how to mount and use the bricks.

glusterfsd.vol

<?php
# Generated by Puppet

volume posix
        type storage/posix
        option directory ####
end-volume

volume locks
        type features/locks
        option mandatory-locks on
        subvolumes posix
end-volume

volume iothreads
        type performance/io-threads
        option thread-count 16
        subvolumes locks
end-volume

volume server-tcp
        type protocol/server
        subvolumes iothreads
        option transport-type tcp
        option auth.login.iothreads.allow ####
        option auth.login.####.password ####
        option transport.socket.listen-port 6996
        option transport.socket.nodelay on
end-volume
?>

glusterfs.vol

<?php
# Generated by Puppet

volume vol-0
        type protocol/client
        option transport-type tcp
        option remote-host ####
        option transport.socket.nodelay on
        option remote-port 6996
        option remote-subvolume iothreads
        option username ####
        option password ####
end-volume

... # 1 per apache node + 1 per nginx node

volume vol-3
        type protocol/client
        option transport-type tcp
        option remote-host ####
        option transport.socket.nodelay on
        option remote-port 6996
        option remote-subvolume iothreads
        option username ####
        option password ####
end-volume

volume mirror-0
        type cluster/replicate
        subvolumes vol-0 vol-1 vol-2 vol-3
        option read-subvolume vol-0
end-volume

volume writebehind
        type performance/write-behind
        option cache-size 4MB
        # option flush-behind on        # olecam: increasing the performance of handling lots of small files
        subvolumes mirror-0
end-volume

volume iothreads
        type performance/io-threads
        option thread-count 16 # default is 16
        subvolumes writebehind
end-volume

volume iocache
        type performance/io-cache
        option cache-size 412MB
        option cache-timeout 30
        subvolumes iothreads
end-volume

volume statprefetch
        type performance/stat-prefetch
        subvolumes iocache
end-volume
?>
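
Since the client file repeats one `protocol/client` volume per node, it lends itself to being generated. Our files are generated by Puppet; the Python sketch below is not that template, only an illustration of the same idea. The function name, the example hosts and the credentials are hypothetical, while the option names and values are copied from the file above.

```python
from typing import List

CLIENT_TEMPLATE = """volume vol-{index}
        type protocol/client
        option transport-type tcp
        option remote-host {host}
        option transport.socket.nodelay on
        option remote-port 6996
        option remote-subvolume iothreads
        option username {username}
        option password {password}
end-volume
"""


def render_client_volumes(hosts: List[str], username: str, password: str) -> str:
    """Render one client volume per host plus the replicate volume on top."""
    if not hosts:
        raise ValueError("at least one host is required")
    blocks = [
        CLIENT_TEMPLATE.format(index=i, host=host, username=username, password=password)
        for i, host in enumerate(hosts)
    ]
    names = " ".join(f"vol-{i}" for i in range(len(hosts)))
    mirror = (
        "volume mirror-0\n"
        "        type cluster/replicate\n"
        f"        subvolumes {names}\n"
        "        option read-subvolume vol-0\n"
        "end-volume\n"
    )
    # Blank lines between blocks keep the generated file readable.
    return "\n".join(blocks + [mirror])
```

Adding a node then amounts to appending its address to the host list and regenerating the file on every machine; the performance translators (write-behind, io-threads, io-cache, stat-prefetch) would be appended unchanged after `mirror-0`.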

Comments

Hi,

Great write up. Could you just expand on this statement a bit: "...rsync it to the file system since it slows down file operations...". Does that mean you're using gluster to replicate but then using rsync locally on each application instance to copy the contents of gluster to another local mount?

Thanks,

Hi,

Yes, we use rsync to copy the code base to the file system. We do this because accessing the code base from glusterfs slows down the parsing, even with APC, unless you turn off the option of doing a stat on each file before serving it.

A better option might be to simply mount the directories we need in glusterfs (more bricks are required however) and sync the code base using git.

Hi,

Great article, but the auth.login module is not working in the glusterfs protocol/server config.
The auth.addr module works without any issues.

Hi,
Great and informative article. Thanks for sharing. If possible, would you mind sharing the cloud server image (OS + Apache + PHP + glusterfs) that you use with puppet? That way it would be easier for others to get started with the setup you suggested, similar to the images people share on Amazon EC2.

thanks
Ajay Gallewale

This is a really interesting article. I was wondering how you were handling sessions for your logged-in users with the nginx load balancers distributing requests across multiple Apache servers.