Serving static content from Cloud Files using Ruby

I recently moved my site skylinesaustralia.com from hosting in the States back to hosting in Australia, and while it is reasonably affordable, one thing I can't afford is a burst in bandwidth. I average 600GB outbound per month and I only have a 600GB allowance with my server, so I decided to serve my gallery and post attachment images from Rackspace's Cloud Files service. Cloud Files is much like Amazon's S3 service, only faster and for a similar amount of money.

My issue was that my gallery on SAU is 100GB in size, so mounting the Cloud Files 'container' locally with FUSE or something similar is far too slow, and without mounting it I cannot use rsync. Even if I did manage to mount it, I had no way of doing an 'immediate' sync when files were uploaded.

After some thinking and chatting with a very cluey sysadmin at work, I looked at inotify. Basically, inotify (and inotify-tools) alerts you to changes to files and directories, making it possible to write scripts that react to changes in the file system. Awesome!

Rackspace provides a bunch of very nice API bindings for all sorts of languages. I used the Ruby one, but the PHP one is also great.

So, my approach was to monitor a directory for changes, capture each change, take the captured file name and location, and push the file via the Cloud Files API. The files can then be viewed via the container's CDN URL. This is easily the trickiest part to get right, but it is the easiest to set up and requires very little integration to work for any site.
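For the CDN side of that, here's a minimal sketch (using the same cloudfiles gem as the script below, and my container name from later in the post; the credentials are placeholders) of how you'd CDN-enable a container and get the public URL to serve files from:

require 'rubygems'
require 'cloudfiles'

# Connect with your Cloud Files credentials (placeholders here)
cf = CloudFiles::Connection.new("username", "APIKey")

# Grab the container that will hold the uploads, creating it if needed
name = "skylinesaustralia.com"
container = cf.container_exists?(name) ? cf.container(name) : cf.create_container(name)

# Publish it to the CDN (if it isn't already) and print the base URL
container.make_public unless container.public?
puts container.cdn_url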

The first thing I did was write a little Ruby script to upload files to a container on Cloud Files. I wanted this script to remain generic so that I could use it to push my database backups to Cloud Files also.

require 'rubygems'
require 'cloudfiles'

# Log into the Cloud Files system
cf = CloudFiles::Connection.new("username", "APIKey")

if ARGV.length < 3 then
    print "Usage: \n"
    print "pushToCloud.rb <container> <remotefile> <localfile> \n"
    print "remote file MUST contain relative path under the container!\n"
else
    container = cf.container(ARGV[0])
    if container.object_exists?(ARGV[1]) then
        # object (file) already exists - nothing to do
    else
        # object does not exist - create it, then load the local file's data into it
        newfile = container.create_object(ARGV[1], true)
        newfile.load_from_filename(ARGV[2])
    end
end
end

This is very simple: it connects to Cloud Files, checks whether the object (file) exists and, if it doesn't, creates a new object and writes the file's data to it. There are a few things to note here:

  1. An 'object' is the name for anything in Cloud Files; a file or a directory can be an 'object'.
  2. Objects don't have a path; they are named with their path, i.e. /var/log/my.log gets sent as '/var/log/my.log' - the path is part of the name. (At least this is how I understand it.)
  3. Creating an object doesn't put any data in it; you need to create the object and then write data to it (you can see this in the code above, and in the short sketch below).
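To make points 2 and 3 concrete, here is a minimal sketch of the create-then-write flow (the object name is just an example path, not a real file from my gallery):

require 'rubygems'
require 'cloudfiles'

cf = CloudFiles::Connection.new("username", "APIKey")
container = cf.container("skylinesaustralia.com")

# The object's name carries its full relative path under the container;
# the 'true' flag asks the gem to create the pseudo-directory entries for that path
obj = container.create_object("forums/uploads/2010/example.jpg", true)

# Creating the object stores no data - writing it is a separate step
obj.load_from_filename("/home/skylines/html/forums/uploads/2010/example.jpg")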

Now that I have my Ruby script, I will write a small bash script:

#!/bin/bash
function checkExists {
    if [ ! -e "$1" ]
    then
        sleep 5
        checkExists "$1"
    fi
}

inotifywait -mr --timefmt '%d/%m/%y-%H:%M' --format '%T %w %f' -e modify,moved_to,create,delete /home/skylines/html/forums/uploads | while read -r date dir file; do

    # I only want everything after /home/skylines/html/
    cloudpath="${dir:20}${file}"
    localpath="${dir}${file}"
    checkExists "$localpath"
    ruby /home/cbiggins/bin/pushToCloud.rb skylinesaustralia.com "$cloudpath" "$localpath"
    echo "${date} ruby /home/cbiggins/bin/pushToCloud.rb skylinesaustralia.com $cloudpath $localpath" >> /var/log/pushToCloud.log
done

So, this very simple script uses inotifywait to monitor my uploads directory for changes (modify, moved_to, create and delete events) and writes the date, path and file name to stdout; the while loop then reads that output into the date, directory and file variables. We build the Cloud Files path and the local path and pass them to my Ruby script. I am also a big advocate of logging everything, so I write each push to a log file as well. Note the checkExists function: if this bash script reacts before the file has finished uploading, the file isn't yet available to be pushed to Cloud Files and we end up with errors, so the function simply sleeps for 5 seconds and tries again until the file exists.
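As an aside, if you'd rather keep the whole pipeline in Ruby, the rb-inotify gem wraps the same kernel facility; here's a rough, untested sketch of an equivalent watcher using the same paths as above:

require 'rubygems'
require 'rb-inotify'

WATCH_DIR = '/home/skylines/html/forums/uploads'
PREFIX    = '/home/skylines/html/'

notifier = INotify::Notifier.new

# Watch the uploads tree for the same events as the bash version
notifier.watch(WATCH_DIR, :modify, :moved_to, :create, :delete, :recursive) do |event|
  local = event.absolute_name
  next unless File.exist?(local)   # skip deletes and files that aren't there yet

  remote = local.sub(PREFIX, '')   # strip the leading path, like ${dir:20} above
  system('ruby', '/home/cbiggins/bin/pushToCloud.rb',
         'skylinesaustralia.com', remote, local)
end

notifier.run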

As you can see, pushing files to the 'Cloud' is extremely easy, and tools like inotify and inotify-tools make it super simple to monitor and act on file system changes.

If you have any suggestions or questions, please don't hesitate to leave a comment below.

Thanks!
