NGINX Versus Apache Round 2


This is round 2 of my benchmarking of NGINX and Apache for use in serving Skylines Australia. See round 1 here.

After my first small tests I thought I'd try something a bit more conclusive. My site never gets just a single request at a time, so that was not a realistic test. Also, as I was testing from home with many devices on the network, any of the results could have been affected by something updating itself.

So, in round 2 I tried to make the servers catch on fire. Hopefully Rackspace are not reading this.

Almost the same servers are involved, 2 x 2GB web servers and 1 x 4GB database server. I added a 512MB testing server to run Apache bench from.

From the start I had SO much trouble with Apache. It was so uninterested in serving multiple requests that even after I changed the max connections, keepalive and child process settings, I had to reduce my requests-at-a-time in ApacheBench from 1000 to 500 to 50. Apache needed so much massaging.
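For the curious, the massaging was all in the prefork MPM and keep-alive settings; roughly this sort of thing (the directive names are standard Apache 2.2, but the values here are illustrative only);

<IfModule prefork.c>
    StartServers          8
    MinSpareServers       5
    MaxSpareServers      20
    ServerLimit         256
    MaxClients          256
    MaxRequestsPerChild 4000
</IfModule>

KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5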

NGINX was incredible. Zero config changes (besides upping XCache to 64MB), and the results speak for themselves;

Apache;

[root@SAU-TESTER ~]# ab -c 50 -n 500 http://184.106.112.92/forums/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking 184.106.112.92 (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Finished 500 requests
Server Software:        Apache/2.2.3
Server Hostname:        184.106.112.92
Server Port:            80
Document Path:          /forums/
Document Length:        173443 bytes
Concurrency Level:      50
Time taken for tests:   215.897595 seconds
Complete requests:      500
Failed requests:        4
(Connect: 0, Length: 4, Exceptions: 0)
Write errors:           0
Non-2xx responses:      3
Total transferred:      86442895 bytes
HTML transferred:       86211790 bytes
Requests per second:    2.32 [#/sec] (mean)
Time per request:       21589.760 [ms] (mean)
Time per request:       431.795 [ms] (mean, across all concurrent requests)
Transfer rate:          391.00 [Kbytes/sec] received
Connection Times (ms)
min  mean[+/-sd] median   max
Connect:        0    6 134.0      0    2996
Processing:   678 18480 48230.1   1616  211618
Waiting:      601 13411 38606.1   1533  209398
Total:        678 18486 48228.2   1617  211618
Percentage of the requests served within a certain time (ms)
50%   1617
66%   3500
75%   7076
80%   8648
90%  37570
95%  178587
98%  196671
99%  207034
100%  211618 (longest request)
NGINX;
[root@SAU-TESTER ~]# ab -c 50 -n 500 http://184.106.78.142/forums/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking 184.106.78.142 (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Finished 500 requests
Server Software:        nginx/0.6.39
Server Hostname:        184.106.78.142
Server Port:            80
Document Path:          /forums/
Document Length:        173520 bytes
Concurrency Level:      50
Time taken for tests:   66.748262 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      86986500 bytes
HTML transferred:       86760000 bytes
Requests per second:    7.49 [#/sec] (mean)
Time per request:       6674.826 [ms] (mean)
Time per request:       133.497 [ms] (mean, across all concurrent requests)
Transfer rate:          1272.65 [Kbytes/sec] received
Connection Times (ms)
min  mean[+/-sd] median   max
Connect:        0    0   2.0      0      25
Processing:   991 6361 975.6   6501    7724
Waiting:      834 6302 977.7   6439    7653
Total:        991 6362 975.5   6503    7724
Percentage of the requests served within a certain time (ms)
50%   6503
66%   6619
75%   6725
80%   6816
90%   7075
95%   7176
98%   7267
99%   7393
100%   7724 (longest request)
I should also note that while the load on the Apache box reached 50, the NGINX box briefly nudged the low side of 3 before ducking back into the 2’s. Seriously awesome.
The important numbers;
APACHE;
Requests per second:    2.32 [#/sec] (mean)
Time per request:       21589.760 [ms] (mean)
Time per request:       431.795 [ms] (mean, across all concurrent requests)
Transfer rate:          391.00 [Kbytes/sec] received
Longest request:  211618
NGINX;
Requests per second:    7.49 [#/sec] (mean)
Time per request:       6674.826 [ms] (mean)
Time per request:       133.497 [ms] (mean, across all concurrent requests)
Transfer rate:          1272.65 [Kbytes/sec] received
Longest request: 7724

The transfer rate concerned me, so I did a quick ping from the test server;
Average to Apache server: 0.366ms
Average to NGINX server: 0.280ms

So, the connection to the NGINX box was a little better, but that still doesn't explain the differences in load, failed requests (4 for Apache, 0 for NGINX) or time per request (maybe? Not so sure about this.)

I'm sold. I'll set up an NGINX test server now to ensure my site runs perfectly on it and then deploy that on the next server. I'm excited.

NGINX Versus Apache Round 1


Before any of you read this, keep in mind that I am not a sys admin. I can do server stuff, but I build websites. I don't consider myself an authority on this subject, nor do I expect you to treat this as gospel.

I am going to be moving Skylines Australia to a faster (and cheaper) server shortly. I've been doing a lot of research into possible ways I can make the site run quicker, and one of the areas of interest for me has been the possibility of using NGINX with PHP-FPM and XCache for a swifter browsing experience. As I knew very little about NGINX and was reading a lot of conflicting information about using it, I thought I'd do some tests.

The test was simple and has been labelled Round 1 as I ran out of time.

I spun up an 8GB memory CentOS 5.6 machine on Rackspace for the database, a 2GB memory machine for Apache and a 2GB memory machine for NGINX. 3 machines in total.

Apache and NGINX were installed with PHP but no caching, and both read from the database machine. This was to make sure that the web serving boxes could utilise their entire systems without having to compete with MySQL.

I then wrote a small script to load the site’s index page from my Macbook (pretty simple stuff);

(I’m aware that I could have reduced my code with functions and whatnot, it is just a quick and dirty script so I’m not too concerned about reusability.)
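Roughly, the idea was just to time repeated fetches of the index page; a minimal sketch of it (the URL and iteration count are placeholders, and it assumes the curl extension is available);

<?php
// Fetch the index page a handful of times and print how long each fetch took
$url = 'http://xxx.xxx.xxx.xxx/index.php';

for ($i = 0; $i < 10; $i++) {
    $start = microtime(true);

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    curl_close($ch);

    printf("%.2f\n", microtime(true) - $start);
}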

The results before installing any caching were already interesting and surprising.

APACHE BEST:
3.55
3.61
3.65
3.68
3.69
APACHE WORST:
6.33
5.22
4.04
3.96
3.73
APACHE AVERAGE: 4.146

NGINX BEST:
3.81
5.36
5.73
5.77
5.81
NGINX WORST:
6.28
6.16
6.03
5.93
5.84
NGINX AVERAGE: 5.672

So, surprisingly (for me, at least) Apache was quicker.

I then installed XCache and Memcached. I did very limited config with them both (which I have now lost, so can’t demonstrate.) and ran the script again.

APACHE BEST:
3.64
3.78
3.83
3.92
3.92
4.02
APACHE WORST:
5.8
5.35
5.25
5.24
4.02
3.92
APACHE AVERAGE: 4.475
NGINX BEST:
3.53
3.56
3.64
3.76
3.76
3.96
NGINX WORST:
6.53
6.18
6.12
5.95
3.96
3.76
NGINX AVERAGE: 4.699

So, while Apache’s average actually increased by .3 seconds, NGINX’s dropped by around 1 second. But, Apache was still quicker.

Now, this test is far from conclusive. The images for all servers have been saved to Cloud Files, so when I get some time I'll spin the same servers up, try to configure XCache and Memcached a little better, and then run the script over 100 iterations to get a more definitive result. As it stands though, NGINX has given me no reason to consider switching from Apache.

If you have any ideas for more conclusive tests I can perform, I'd love to hear them. Keep in mind that I am testing my specific site, so I don't want to be directed to other sites that have performed similar tests.

EDIT: See round 2.

Cloud hosting for Australian based sites


One of the conundrums I have with owning Skylines Australia is its hosting.
It's currently hosted in an Australian data centre on a mega machine (16GB memory) and runs well but costs a FORTUNE! The other issue I have is I never know how big to go with my servers, so I go HUGE in case I under-spec and then have problems with load.

I recently started looking into cloud hosting with Amazon or Rackspace. I've always been dubious because 90% of my end users are in Australia and I want to make sure the site is super fast for those users. As bandwidth in Australia has always been excessively expensive, the cloud providers haven't come here yet (well, I'm guessing that is the reason.)

As our ads on Skylines Australia haven't been performing well lately, I need to find a cheap alternative soon.

Today I spun up a few instances on Rackspace Cloud Servers and Amazon EC2 (Elastic Compute Cloud) and RDS (Relational Database Service). Here are my findings.

Rackspace Cloud Servers

This was SO easy to set up; a few clicks and the server was running. It also has persistent storage, unlike EC2, which means that once the server is set up, that's it, nothing more to do. Its security is much easier to handle too.
Performance: Not great. 292ms ping to my router. Far too slow. Unfortunately, this ruled Rackspace out.
Price: Great. About 50% of what I currently pay for similar specs.

Amazon RDS

I am throwing RDS in here as a bit of an FYI. I initially thought I'd put Skylines Australia's database on RDS and just host the web server in EC2, but RDS seems to be designed for small databases, as anything over 1GB needs to be converted to a flat file (are you serious?), which means my uncompressed 7GB is not going to be easy. Scratch that.

Amazon EC2

EC2 has always looked good, but everything on the AWS site and console looks and feels difficult. It's complicated to even use the calculator to work out what it's going to cost. The other issue is that EBS (Elastic Block Storage) is an additional service, and if you don't get that, any time you reboot you'll lose your data. It's not expensive, but it's another unnecessary complication. The great thing about Amazon is they are expecting to open an Australian-based location next year, which will be fantastic.
Performance (Singapore): OK, but nothing special. Ping was 192ms, but the trace sent the hops through Tokyo, which is the wrong direction for Australia.
Performance (Tokyo): Not bad! 132ms! It's still 100ms more than my ping to the current host though, so we'll need to work on other ways to make the site faster.
Price: Similar to Rackspace, about 50% of what I currently pay.

I also pay for a second server that I replicate to for backups which can be spun up for a tiny cost with either Amazon or Rackspace. ($25 per month for a ‘micro’ instance with Amazon.)

Hopefully this has been helpful to you.

An introduction to distance-based searching in Sphinx


A friend of mine emailed the other day to ask how I implemented distance-based searching and ordering on my old employer's web sites. Instead of simply replying to him, I thought I'd write up a blog post: it's been a fairly long time since I posted about anything, and as I can't see myself doing anything like that again in the near future, it would be a good way to help retain the information.

Concepts

The concepts we'll use are fairly simple. You will use several tools and technologies together to provide this service to your users. Keep in mind, this is not a HOW TO on installing and configuring Sphinx; this post assumes you can handle that part on your own. I am happy to help with problems in the comments though.

MySQL

MySQL will be used to store your locations. I won't be going through versions, table creation, column creation or storage types etc. It really doesn't matter, as Sphinx will be handling all the spatial calculations.

Sphinx

Sphinx is a search engine. It powers a lot of big sites and I have been using it to power the search at Skylines Australia for many years. A new version 2 has recently been released (albeit still in beta) but we will be using 0.9.9 in this post.

Data

Now, geodata can be an expensive thing to obtain, but the wonderful free resource geonames.org has downloads for many locations, which makes it affordable and easy. Australia's file is here. Thanks, Geonames!
The file is simply a tab-separated file and is easy to parse (beyond the scope of this post.) But because I am so nice, I have converted it to SQL for you all here. :)

The Setup

MySQL Setup

Create a database and import that linked SQL file above. You’ll now have a table named suburbs in that database. This is pretty much all you need to do for MySQL.
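From the shell, that's just something like this (the database and file names are placeholders);

$ mysql -u root -p -e "CREATE DATABASE geodata"
$ mysql -u root -p geodata < suburbs.sql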

Sphinx Setup

Once Sphinx is installed (pretty easy on most systems; Red Hat-based OSes can use yum, Debian can use apt, Macs can use MacPorts or maybe Homebrew), you will need to copy sphinx.conf.dist to sphinx.conf (in the directory it is in, or you can move it to /etc/sphinx.conf) and open it in a text editor.
This is a very basic config with only minimal changes. Things to note are the RADIANS() function calls in the SQL query and that we store the radians as float attributes. Anything you want to filter on in Sphinx needs to be registered as an attribute. This config should be pretty straightforward.
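A minimal sketch of the relevant source and index sections looks something like this (the database name and credentials are placeholders; keep the stock searchd settings from sphinx.conf.dist);

source suburbs
{
    type       = mysql
    sql_host   = localhost
    sql_user   = user
    sql_pass   = pass
    sql_db     = geodata

    # Store lat/lon as radians so Sphinx can do the spatial maths
    sql_query  = SELECT id, suburb, state, postcode, RADIANS(lat) AS lat, RADIANS(lon) AS lon FROM suburbs

    sql_attr_uint  = postcode
    sql_attr_float = lat
    sql_attr_float = lon

    # Used by the command line 'search' utility to show the full row
    sql_query_info = SELECT * FROM suburbs WHERE id=$id
}

index suburbs
{
    source = suburbs
    path   = /var/data/suburbs
}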

Now we can start searchd and index the suburbs;

$ searchd
$ sudo indexer --all --rotate

Hopefully you had no errors, or if you did, they were only permissions errors (or you didn't create the /var/data directory.) If there were no errors, let's move on.
To test the index, let's do a quick search using the Sphinx command line utility;
$ search -i suburbs 'manly'
This says: search the index (-i) suburbs with the term 'manly'. Hopefully you see something like this;


Elwood /var/data: search -i suburbs 'manly'
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/opt/local/etc/sphinx/sphinx.conf'...
index 'suburbs': query 'manly ': returned 6 matches of 6 total in 0.019 sec

displaying matches:
1. document=159, weight=1, postcode=1655, lat=-0.589921, lon=2.640392
id=159
suburb=Manly
state=New South Wales
postcode=1655
lat=-33.80000000000000
lon=151.28330000000000

This shows you what Sphinx is returning. In the first row (prefixed with 1) you can see the unique document ID (the first column in our SQL statement in the config), the weight (there are various ways to weight records in Sphinx, which I won't get into here) and the attributes we registered (postcode, lat, lon.) The following rows are provided by MySQL using our second query in the config (sql_query_info = SELECT * FROM suburbs WHERE id=$id.)

So, now that it's working, we'll write some PHP so we can utilise the Sphinx API.

PHP Setup

Set yourself up a virtual host or put your code somewhere you can access it; you can use PHP over the command line if you'd like.
Extract the Sphinx archive (you may need to obtain this from the Sphinx site if your package management doesn't provide it; using MacPorts, the archive was in "/opt/local/var/macports/distfiles/sphinx/") and go to sphinx-0.9.9/api. You'll need to copy the sphinxapi.php file into the directory where your test app is.
NOTE: There are other Sphinx APIs available, including a PEAR one (http://pear.php.net/sphinx.)
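A minimal sketch of a distance-based query using the bundled sphinxapi.php looks something like this (the anchor point, port and 20km radius are only examples);

<?php
require('sphinxapi.php');

// Example anchor point (Sydney CBD), converted to radians to match the
// RADIANS() values we stored as attributes in the index.
$lat = deg2rad(-33.8671);
$lon = deg2rad(151.2071);

$cl = new SphinxClient();
$cl->SetServer('localhost', 9312);                  // older builds listen on 3312
$cl->SetMatchMode(SPH_MATCH_ALL);
$cl->SetGeoAnchor('lat', 'lon', $lat, $lon);        // exposes @geodist, in metres
$cl->SetFilterFloatRange('@geodist', 0.0, 20000.0); // only suburbs within 20km
$cl->SetSortMode(SPH_SORT_EXTENDED, '@geodist ASC');

$result = $cl->Query('', 'suburbs');
print_r($result['matches']);

SetGeoAnchor is what makes the magic @geodist attribute available to filter and sort on.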

The first few results look like this;


[186] => Array
    (
        [weight] => 1
        [attrs] => Array
            (
                [postcode] => 2000
                [lat] => -0.591083705425
                [lon] => 2.63908100128
                [@geodist] => 1.90258026123
            )
    )

[188] => Array
    (
        [weight] => 1
        [attrs] => Array
            (
                [postcode] => 2000
                [lat] => -0.591083705425
                [lon] => 2.63908100128
                [@geodist] => 1.90258026123
            )
    )

[189] => Array
    (
        [weight] => 1
        [attrs] => Array
            (
                [postcode] => 2000
                [lat] => -0.591083705425
                [lon] => 2.63908100128
                [@geodist] => 1.90258026123
            )
    )

And that's it. Once you have Sphinx set up and your indexes written, it can be seriously powerful and can provide many great searching possibilities for your application.

This has been a very shallow dive into Sphinx and was only intended as a simple showcase of how easy it is to have Sphinx do great things.

I'll be happy to answer any questions (that I can) in the comments.

Reading and writing files in Adobe Air


The biggest fan (and only user) of my Adobe Air Tea Timer is Michael. Michael is a big fan of the Tea Timer, but he's also its biggest critic, as you can see in the review he left it on Softpedia. Three stars for GUI and four stars for features? Heart breaking.

I figured that to gain at least another .5 stars from Mickey, I'd have to try a few new features.

The first is saving and loading preferences in a file, so that you can have your favorite brew time lengths ready for next time.

I found it really hard to find any decent documentation on this. The main issue I struggled with was that you cannot write files in the application directory; no matter how hard I tried, it just wouldn't happen. Then, after my 400th search, I found a post on the Adobe blog stating that you can't. So, what you need to do is write files in the application's storage directory. This took a long time to find and resolve.

In a nutshell, what we do is;

  1. Determine if the preferences file exists in the application storage directory.
  2. If it doesn’t exist, copy one from the application directory.
  3. Write the current settings to that file.
  4. Read the settings on each subsequent load.

Start with checking for the file, copying the empty one and returning it.
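A minimal sketch of that step (the helper name is only illustrative, and it assumes AIRAliases.js is loaded so the air.* shortcuts exist, plus an empty prefs.xml template shipped in the application directory);

// Hypothetical helper: return the prefs file in the app storage directory,
// copying the bundled empty template across on first run.
function getPrefsFile() {
    var prefs = air.File.applicationStorageDirectory.resolvePath("prefs.xml");
    if (!prefs.exists) {
        var template = air.File.applicationDirectory.resolvePath("prefs.xml");
        template.copyTo(prefs, true);
    }
    return prefs;
}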

Next we write the prefs.
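Again only as a sketch, writing is just a FileStream opened in WRITE mode;

// Hypothetical helper: overwrite the prefs file with an XML string.
function writePrefs(xmlString) {
    var stream = new air.FileStream();
    stream.open(getPrefsFile(), air.FileMode.WRITE);
    stream.writeUTFBytes(xmlString);
    stream.close();
}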

Lastly, we read the prefs.
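And reading is the reverse, handing the XML to the DOM parser discussed below;

// Hypothetical helper: read the prefs file and return it as a DOM document.
function readPrefs() {
    var stream = new air.FileStream();
    stream.open(getPrefsFile(), air.FileMode.READ);
    var xmlString = stream.readUTFBytes(stream.bytesAvailable);
    stream.close();
    return new DOMParser().parseFromString(xmlString, "text/xml");
}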

It's worth noting the use of the DOM parser in the final gist. This allows us to interact with the XML in the prefs file as we would interact with any DOM document (such as X/HTML.) This means I can save the prefs in a structure that makes it easy to add / remove data with minimal effort (and no need to parse the file manually.)

The full javascript file for the Tea Timer app can be viewed on Github here. This post wasn't intended as an in-depth how-to, but more of a reference for people wanting to interact with files who find the docs suck.

Simple Adobe Air app


Discouraged by the lack of decent Tea Timers, I decided it was time to broaden my horizons and build one myself. What better way to learn Air?

I started by getting the Adobe Air SDK and reading through the Hello World tutorial. It's all pretty simple.

My xml:
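A rough sketch of what an AIR 1.5 application descriptor looks like (the id, filename and version values are placeholders, not the real Tea Timer ones);

<?xml version="1.0" encoding="utf-8"?>
<application xmlns="http://ns.adobe.com/air/application/1.5">
    <!-- Placeholder values for illustration only -->
    <id>com.example.TeaTimer</id>
    <filename>TeaTimer</filename>
    <version>1.0</version>
    <initialWindow>
        <content>index.html</content>
        <visible>true</visible>
        <width>300</width>
        <height>200</height>
    </initialWindow>
</application>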

My html/js:
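And a stripped-down shell of the HTML/JS side, purely as a sketch (this is not the actual Tea Timer markup or timing logic);

<html>
<head>
    <title>Tea Timer</title>
    <script type="text/javascript" src="AIRAliases.js"></script>
    <script type="text/javascript">
        // Hypothetical example: fire an alert when the brew time is up
        function startTimer(minutes) {
            setTimeout(function () {
                alert("Tea's ready!");
            }, minutes * 60 * 1000);
        }
    </script>
</head>
<body>
    <button onclick="startTimer(3)">Brew for 3 minutes</button>
</body>
</html>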

Testing the app using the SDK compiler adl was great and the app did exactly what I needed it to do. But when it came to packaging it as a .air app, I was almost pulling my hair out. The documentation for using the SDK to sign and package .air files sucks, so out of sheer frustration I downloaded an entire 85MB IDE just so I could package my app. This was the only part of the entire process that I hated. Adobe definitely needs to work on this aspect of it; one of their articles explained how to do it by purchasing a $300 certificate. Not a great option.
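For reference, the SDK-only route boils down to something like this, using a self-signed certificate instead of the $300 one (file names here are placeholders);

# Run the app straight from the descriptor for testing
adl application.xml

# Create a self-signed certificate, then sign and package the .air file
adt -certificate -cn TeaTimer 2048-RSA cert.p12 mypassword
adt -package -storetype pkcs12 -keystore cert.p12 TeaTimer.air application.xml index.html AIRAliases.js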

Anyhoo, here's the code and the app.

Super simple tweet script in Ruby


I was trying to work out how to get some social network coverage for Skylines Australia, and I decided I'd write a small script that would tweet any popular topics.

Obtaining the popular topics was a piece of cake: an ALTER on my topics table to flag a topic as 'already popular', and a query that grabs any topic with more than X posts and a last post within the last few hours (the query below uses a four-hour window).
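The flag column is just a one-off ALTER, something along these lines (the exact column definition is a guess; only the column name comes from the query);

ALTER TABLE ibf_topics ADD COLUMN sau_is_popular TINYINT(1) NOT NULL DEFAULT 0;

And the query;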

SELECT tid, title, posts FROM ibf_topics WHERE (last_post BETWEEN unix_timestamp()-14400 AND unix_timestamp()) AND posts > 50 AND sau_is_popular = 0 LIMIT 1

Easy enough.

But tweeting them was more difficult. There are lots and lots of gems for Twitter, but not all of them have been updated to work with Twitter's OAuth. When I found one that had been, the documentation was a little lacking and it was not clear how to obtain the credentials for OAuth.

That's when this awesome command line script came to my aid.

The end result is a simple little Ruby script on a four-hourly cron that tweets popular topics. Sweet!


require "rubygems"
require "mysql"
require "twitter"
require "cgi"
require "yajl"

dbh = Mysql.real_connect("localhost", "user", "pass", "db")

res = dbh.query("SELECT tid, title, posts FROM ibf_topics WHERE (last_post BETWEEN unix_timestamp()-14400 AND unix_timestamp()) AND posts > 50 AND sau_is_popular = 0 LIMIT 1")

TWITTER_CONSUMER_KEY = 'x'
TWITTER_CONSUMER_SECRET = 'x'
TWITTER_ACCESS_TOKEN = 'x'
TWITTER_ACCESS_SECRET = 'x'

Twitter.configure do |config|
    config.consumer_key = TWITTER_CONSUMER_KEY
    config.consumer_secret = TWITTER_CONSUMER_SECRET
    config.oauth_token = TWITTER_ACCESS_TOKEN
    config.oauth_token_secret = TWITTER_ACCESS_SECRET
end

client = Twitter::Client.new

while row = res.fetch_hash do
    status = "Skylines Australia trending topic: "
    topic = CGI::unescapeHTML(row["title"])

    # We have 60 chars in our status, so that leaves us with up to 100 for the topic title
    if (topic.length > 100)
        topic = topic[0,97] + "..."
    end

    status = status + topic + " http://skya.us/st=" + row["tid"]

    # Post a status update
    client.update(status)

    dbh.query("UPDATE ibf_topics SET sau_is_popular = 1 WHERE tid = " + row["tid"])
end

res.free

Hopefully this helps you.

You can see the status updates at Twitter.com/skyaus

By the way, I wrote a small URL shortener which helps also, but you could use the bit.ly API or something else if you'd like.

Never stop challenging yourself


As a developer, it's super easy to fall into a rhythm and do the same things, the same way, forever. This doesn't help make you a better developer; in fact, the least skilled and hardest developers to work with are generally the guys who are closed off to the prospect of 'a better way'.

You don’t need to work in a team to challenge yourself on approaches to problems, design styles, methodologies or even small, tiny, one-line code changes. Of course, working around other developers who you respect makes a huge difference and can be extremely beneficial to not only your technical knowledge but your ability to stay open minded and accept that your answer is quite possibly not the best one for the current problem.

I think that a lot of developers who work solo, especially in their first few years of their career, can get caught in an ‘I know best’ rut and are not willing or open to different points of view.

If you want to get better, you need to be challenged, and that can be as mentally draining as a brainstorming session, or as easy as writing a small script to test the differences between coding techniques or function calls.

Today, while working with my boss, who I admire and respect for his technical prowess and general speed of uptake, I noticed that he takes a slightly different approach when using PHP variables in a MySQL INSERT query. In this particular scenario, there is a slight possibility that the variable will occasionally be an empty string, which would cause syntax errors in the query. My boss wrapped the variable in an intval() call to default the value to 0 in case the string was empty. This was far different from my approach, which has always been to use a shorthand if statement to test the string and assign a value.

The intval() call in this scenario looks more elegant, but the shorthand statement is just something I have always done. I decided to put the two to the test by writing an unnecessary test on speed.

The first test used a shorthand statement with no function call;

<?php
$str = '';
$func_total = 0;
$sh_total = 0;

for ($i = 0; $i <= 1000; $i++) {

    $func_start = microtime(true);
    $val = intval($str);
    $func_total += (microtime(true) - $func_start);

    $sh_start = microtime(true);
    $val = ($str ? 0 : 1);
    $sh_total += (microtime(true) - $sh_start);
}

print 'Function call total: ' . $func_total . PHP_EOL;
print 'Short hand total: ' . $sh_total . PHP_EOL;

Function call total: 0.00370979309082
Short hand total: 0.00205898284912

Shorthand is quicker.

And the second used a strlen() call;

<?php
$str = '';
$func_total = 0;
$sh_total = 0;

for ($i = 0; $i <= 1000; $i++) {

    $func_start = microtime(true);
    $val = intval($str);
    $func_total += (microtime(true) - $func_start);

    $sh_start = microtime(true);
    $val = (strlen($str) > 0 ? 0 : 1);
    $sh_total += (microtime(true) - $sh_start);
}

print 'Function call total: ' . $func_total . PHP_EOL;
print 'Short hand total: ' . $sh_total . PHP_EOL;

And shorthand was still quicker;

Function call total: 0.00477647781372
Short hand total: 0.00359439849854

The test, as you can see, simply executes the code 1000 times and records a total time elapsed for each approach. It's silly, but it was purely for my own curiosity. While the shorthand is faster, the intval() call is much nicer and I will likely opt to use it in future.

This is just one simple, quick way to challenge yourself and your methods for writing code. If you don't challenge yourself and let others challenge you, you have no hope of ever becoming a better developer.

Serving static content from Cloud Files using Ruby


I recently moved my site skylinesaustralia.com from hosting in the States back to hosting in Australia, and while it is reasonably affordable, one thing I can't afford is a burst in bandwidth. I average 600GB outbound per month and I only have a 600GB allowance with my server, so I decided to serve my gallery and post attachment images from Rackspace's Cloud Files service. The Cloud Files service is much like Amazon's S3 service, only faster and for a similar amount of money.

My issue with this was that my gallery on SAU is 100GB in size, so mounting the Cloud Files 'container' locally using FUSE or something similar is far too slow, and without mounting it I cannot use rsync. Even if I did manage to mount it, I had no way of doing an 'immediate' sync when files were uploaded.

After some thinking and chatting with a very cluey sys admin at work, I looked at inotify. Basically, inotify (and inotify-tools) alerts you to changes to files and directories, making it possible to write scripts based on changes to the file system. Awesome!

Rackspace provides a bunch of very nice API interfaces for all sorts of languages. I used Ruby, but the PHP one is also great.

So, my approach was to monitor a directory for changes, capture the changes, use the file name and location captured and push the changes via the Cloud Files API. Then, we can view the files using the Cloud Files CDN URL. This is easily the trickiest to get right, but the easiest to set up and requires very little integration to work for any site.

The first thing I did was write a little Ruby script to upload files to a container on Cloud Files. I wanted this script to remain generic so that I could use it to push my database backups to Cloud Files also.

require 'rubygems'
require 'cloudfiles'

# Log into the Cloud Files system
cf = CloudFiles::Connection.new("username", "APIKey")

if ARGV.empty? then
    print "Usage: \n"
    print "pushToCloud.rb <container> <remotefile> <localfile> \n"
    print "remote file MUST contain relative path under the container!\n"
else
    container = cf.container(ARGV[0])
    if container.object_exists?(ARGV[1]) then
        # object (file) exists
    else
        # object does not exist...
        newfile = container.create_object(ARGV[1], true)
        newfile.load_from_filename(ARGV[2])
    end
end

This is very simple: it connects to Cloud Files, checks if the object (file) exists and, if not, creates a new object and writes data to it. There are a few things to note here;

  1. An 'object' is the name of anything in Cloud Files; a file or a directory can be an 'object'
  2. Objects don't have a path; they are named with their path, i.e. /var/log/my.log gets sent as '/var/log/my.log' - the path is part of the name. (At least this is how I understand it)
  3. Objects can be created without being written to, which means you need to create an object and then write data to it (you can see this in the code above.)

Now that I have my Ruby script, I will write a small bash script;

#!/bin/sh
function checkExists {
    if [ ! -e "$1" ]
    then
        sleep 5
        checkExists $1
    fi
}

inotifywait -mr --timefmt '%d/%m/%y-%H:%M' --format '%T %w %f' -e modify,moved_to,create,delete /home/skylines/html/forums/uploads | while read date dir file; do

    cloudpath=${dir:20}${file}
    # I only want everything after /home/skylines/html/
    localpath=${dir}${file}
    checkExists $localpath
    ruby /home/cbiggins/bin/pushToCloud.rb skylinesaustralia.com $cloudpath $localpath
    echo "${date} ruby /home/cbiggins/bin/pushToCloud.rb skylinesaustralia.com $cloudpath $localpath" >> /var/log/pushToCloud.log
done

So, this very simple script uses inotifywait to monitor my uploads directory for changes (modify, moved_to, create and delete events) and writes the path and file to stdout; my while loop then grabs that output and reads it into the date, directory and file. We build the Cloud Files and local paths and pass them to my Ruby script. I am also a big advocate of logging everything, so I write my changes to a log file as well. Note the checkExists function: if this bash script gets called before the file has finished uploading, then it's not available to be pushed to Cloud Files and we end up with errors, so this function just sleeps for 5 seconds if the file is not there and tries again until it exists.

As you can see, pushing files to the 'Cloud' is extremely easy and apps like inotify and inotify-tools make it super simple to monitor and perform actions based on file system changes.

If you have any suggestions or questions, please don't hesitate to leave a comment below.

Thanks!

Don't get caught by differing super globals in PHP


Recently I discovered that scripts I had written on my Mac would not run on our Linux servers. It wasn't until I had a good look at the contents of the $_SERVER super global that I noticed a few anomalies in the contents of mine compared with the ones I had been used to for so long…

Here is the main difference between my $_SERVER super global and Mike's.

PHP 5.2.9 on Fedora 10 (Mike's dev machine)
["REQUEST_URI"]=> string(9) “/test.php”

PHP 5.3.0 on Mac OS X (10.6)
["REQUEST_URI"]=> string(28) “http://fliquid.dev/blog.html”

As you can see, the differences are sufficient to cause a lot of problems for a lot of scripts. A regex written to match REQUEST_URI (in this example) on one machine would fail when my script goes live, and likewise any script that Mike writes would fail when I run it.

To get around this, we will likely write a small 'compat' (compatibility) library that detects which style of super global is in use and normalises the values, so we only need to write one version of the code.
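Something along these lines is the sort of helper I mean (a sketch only, not the eventual library);

<?php
// Normalise REQUEST_URI to a path, whether the environment reports
// "/test.php" or a full "http://host/test.php" style value.
function compat_request_uri()
{
    $uri = $_SERVER['REQUEST_URI'];
    if (preg_match('#^https?://#', $uri)) {
        $uri = parse_url($uri, PHP_URL_PATH);
    }
    return $uri;
}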

This is just a little 'heads up' to reduce the amount of hair pulling that can occur when scripts are pushed to a *nix production environment after being developed on a Mac.
