Making sure you only have one instance of a script running

One of my tasks in my current job is to build large indexes for our Sphinx powered search engine. This sounds pretty simple, but when you have 2,000,000 records that have, in turn, one-to-many relationships with other records, you can be processing a huge amount of data at any given time.

The script that I wrote to gather the data in preparation for indexing was a little heavy on memory, and it became a huge problem whenever one run had not completed in time for the next cron job. Every extra instance made all the others run slower, so the whole thing snowballed with no end in sight and we’d have to kill all the processes manually. That’s not a great solution for many reasons; when data integrity is of the utmost importance, you can’t just go killing your scripts willy-nilly.

I found a great little class written by Chris Hope that touches a file and writes the script’s process ID (PID) to it. Any other instance can then check that file: if the PID inside belongs to a process that is still running, the new instance dies; otherwise it can continue. Also, if the file exists but the PID inside it is no longer valid (i.e. the script was killed and had no opportunity to remove the file), the new instance can still run. It’s a two-stage check.

I have made a few small adaptations to the original class and my version is here:

    class pid {
        protected $filename;
        public $already_running = false;

        function __construct($directory) {
            $this->filename = $directory . '/' . basename($_SERVER['PHP_SELF']) . '.pid';
            if(is_writable($this->filename) || is_writable($directory)) {
                if(file_exists($this->filename)) {
                    // Stage two: is the PID in the file still a live process?
                    $pid = (int)trim(file_get_contents($this->filename));
                    if(file_exists('/proc/' . $pid)) {
                        $this->already_running = true;
                    }
                }
            } else {
                die("Cannot write to pid file '$this->filename'. Program execution halted.\n");
            }

            if(!$this->already_running) {
                // No live instance found, so claim the pid file for this process.
                $pid = getmypid();
                file_put_contents($this->filename, $pid);
            }
        }

        public function kill() {
            // Make sure this script owns the file before we delete it...
            $pid = (int)trim(file_get_contents($this->filename));
            if(file_exists('/proc/' . $pid) && $pid == getmypid()) {
                unlink($this->filename);
            }
        }
    }

It can be used like so:

    class newclass {
        private $pid;

        public function __construct() {
            if ($this->checkPid()) {
                // Continue...
            }
        }

        private function killPid() {
            // Remove the pid file once the work is done.
            $this->pid->kill();
        }

        private function checkPid() {
            $this->pid = new pid('/tmp');
            if ($this->pid->already_running) {
                print 'Already running. Exiting.' . PHP_EOL;
                exit;
            } else {
                return true;
            }
        }
    }

The main difference is that I don’t use __destruct(), as I wanted to be able to call a kill() method explicitly. The other notable difference is that my version does not use posix_kill() and instead checks that the process directory in /proc exists. Keep in mind that this will not work on Windows or Mac operating systems, as neither provides a Linux-style /proc directory. The reason I made this change is that posix_kill() requires the POSIX PHP extension.
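If you would like the best of both approaches, the liveness check can be pulled out into a small helper. What follows is only a sketch: pid_is_running() is a hypothetical name that is not part of either version of the class, and it assumes that sending signal 0 with posix_kill() is an acceptable test when the POSIX extension happens to be loaded, falling back to the /proc check otherwise.

```php
<?php
// Hypothetical helper, not part of the original class.
// Returns true if a process with the given PID appears to be alive.
function pid_is_running($pid)
{
    $pid = (int) $pid;
    if ($pid <= 0) {
        // An empty or corrupt pid file casts to 0, which is never a valid PID.
        return false;
    }

    if (function_exists('posix_kill')) {
        // Signal 0 delivers nothing; it only checks that the process can be signalled.
        // Note: this reports false for processes owned by another user.
        return posix_kill($pid, 0);
    }

    // Fallback: Linux-style /proc lookup, as used in the class above.
    return file_exists('/proc/' . $pid);
}

var_dump(pid_is_running(getmypid()));
```

The class could call a helper like this in place of its two file_exists('/proc/' . $pid) checks, which would keep the platform-specific part in one place.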

Thanks to Chris Hope for the original class.

Sphinx Full Text Search, What I have learned

For those of you that do not know what Sphinx is, check out their site here.


I started writing a post on Sphinx a few weeks ago, only to realise that the way I was approaching Sphinx was wrong and that I had to rethink my strategy. Anybody that follows my personal blog will see that I had to rewrite my Sphinx implementation several times because of my approach to the problem.

Let me just say that I was initially attracted to Sphinx a few years ago, when my site Skylines Australia kept hitting table-locks due to MyISAM. We needed MyISAM for Full Text searches, or at least we did, until we discovered Sphinx. Once our indexes were built, we were searching very quickly across 4.5 million posts with far less overhead than MySQL. It also meant we could change our table engine to InnoDB so we could get around the table-level-locks.

The issue is that we have a lot of data here at work and lots of search inputs. Think ‘keyword’ and ‘category’ in ‘location’. We initially approached it like so:

  1. Search the location index, get a location id
  2. Search the Category index, get a category id
  3. Search the Main index using the location id and category id as filters.

Now, that’s ok, but what if you have hierarchical categories and locations? For example, searching for ‘Mechanics’ needs to include ‘Motor Mechanics’, ‘Bike Mechanics’, ‘Boat Mechanics’ etc. We had one search that needed to search across 1800 categories! The other issue we ran into is that we needed proximity searches. If there were fewer than n results at the Suburb level, we would expand to the Area, then the Region, all the way to the State. This meant that we would need to run the three searches 4 times (12 in total) so we could keep our results separate. Couple this with 1.9 million searchable rows, 25,000 locations and 2000 categories, and searches were taking a LONG time to complete with Sphinx.

So.. Back to the drawing board.

My boss mentioned that Sphinx is built to search documents and that maybe we need to make a ‘document’ structure for each of our records. I was a little dubious at first.. Here is the idea;

As Sphinx searches for matches and does not actually return the data it has found (only the data’s “document id” so you can query MySQL for the exact match) it means that our data in Sphinx does not need to be readable to humans, it can be a mashup of multiple data sources.

  1. We get all the items we want searchable and all their keywords, put them into a single column named ‘document’ and create a new table in MySQL.
  2. We get all the categories that this item must appear in: not just its exact categories (like ‘Motor Mechanics’) but all of their parents as well, so we need fewer search terms. We take the category IDs, prefix them with ‘cat_’ and append them to the ‘document’ column.
  3. We create a new column called ‘location’ and in that column we store the IDs for the location the item is in, prefixing each ‘part’ of the location with its ‘type’, i.e. ‘suburb_’, ‘area_’, ‘region_’, ‘state_’, etc.

Once we are done, we have a row that looks like this:
Id: Actual id of the item row (relating to its original table)
Document: ‘Jims Mechanics quality mechanics we do good work cat_199 cat_432 cat_909 cat_2 cat_93’
Location: ‘Manly Sydney NSW Australia suburb_1022 area_300 region_23 state_3’
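To make the row format concrete, here is a minimal sketch of how such a row could be assembled in PHP. The function name build_search_row() and its parameters are hypothetical; the post does not show the real build script, so treat this purely as an illustration of the prefixing scheme.

```php
<?php
// Hypothetical helper: builds one 'document'/'location' row for indexing.
// $categoryIds - the item's categories plus all of their parents
// $locationIds - location IDs keyed by type, e.g. array('suburb' => 1022)
function build_search_row($id, $text, array $categoryIds, $locationName, array $locationIds)
{
    $catTokens = array();
    foreach ($categoryIds as $catId) {
        $catTokens[] = 'cat_' . (int) $catId;
    }

    $locTokens = array();
    foreach ($locationIds as $type => $locId) {
        $locTokens[] = $type . '_' . (int) $locId;
    }

    return array(
        'id'       => (int) $id,
        'document' => trim($text . ' ' . implode(' ', $catTokens)),
        'location' => trim($locationName . ' ' . implode(' ', $locTokens)),
    );
}

$row = build_search_row(
    7,
    'Jims Mechanics quality mechanics we do good work',
    array(199, 432, 909, 2, 93),
    'Manly Sydney NSW Australia',
    array('suburb' => 1022, 'area' => 300, 'region' => 23, 'state' => 3)
);
print_r($row);
```

Because the tokens are just words as far as Sphinx is concerned, a category or location filter becomes an ordinary full-text match instead of a separate lookup.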

Once we build this table and index it, we can query the index using the PECL Sphinx library (much better than the supplied Sphinx PHP API) and Sphinx’s ‘EXTENDED2’ query type like so:
‘@document $keywords @document (cat_1|cat_2|cat_3) @location suburb_1’
AND is implied with Sphinx, so the above query is saying match the keywords, any of the 3 categories AND the location. More about the cool EXTENDED2 query syntax can be found in the Sphinx Docs.
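The query string itself is easy to build from the search inputs. The sketch below uses a hypothetical helper, build_extended_query(), and assumes the ‘document’ and ‘location’ field names from the example row; it does not escape the user’s keywords, which a real implementation would need to do before handing the string to Sphinx.

```php
<?php
// Hypothetical helper: assembles an EXTENDED2-style query string.
// Terms are separated by spaces, which Sphinx treats as an implicit AND.
function build_extended_query($keywords, array $categoryIds, $locationToken)
{
    $parts = array();

    if (trim($keywords) !== '') {
        $parts[] = '@document ' . trim($keywords);
    }

    if (count($categoryIds) > 0) {
        $tokens = array();
        foreach ($categoryIds as $id) {
            $tokens[] = 'cat_' . (int) $id;
        }
        // Any one of the categories may match, hence the OR group.
        $parts[] = '@document (' . implode('|', $tokens) . ')';
    }

    if ($locationToken !== '') {
        $parts[] = '@location ' . $locationToken;
    }

    return implode(' ', $parts);
}

$query = build_extended_query('mechanics', array(1, 2, 3), 'suburb_1');
echo $query . PHP_EOL;
```

Widening a proximity search then only means swapping the location token (for example from a suburb_ token to an area_ token) and running the same single query again.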

Basically, because we restructured our data to match Sphinx and didn’t try to make Sphinx work in a way it wasn’t designed to, we saved lots and lots of queries, made it perform hundreds of times better (think thousandths of a second to perform a search) and have relieved MySQL of a lot of potentially damaging work. If you do anything with Full Text or require indexes for anything at all, I’d highly recommend Sphinx. It may seem a little odd at first and hard to bend to your needs, but persevere and try to rethink your situation; it will be worth it in the end.


Setting up a Facebook style link preview image

Have you ever noticed that when you add a link in Facebook, it generates a list of images from that link so you can choose one to display? It can really help to give an idea of what the link is about and helps to brighten things up by adding some colour.

I’ve recently implemented similar functionality on one of my sites so I thought I’d share some of it. The whole process is quite long and involved and you may want to do things differently depending on your needs, so in this article we will just be focussing on the initial steps to build an array of image URLs in PHP. The idea behind this functionality is really very simple: look at a page and extract all of the images matching certain criteria. In this case we’ll look for any images whose width and height are at least those of the thumbnail we want to display.

So let’s get started…


$link = '';
$width = 120;
$height = 90;
$regex = '/<img[^\/]+src="([^"]+\.(jpe?g|gif|png))/';

In the lines above we’re setting up a few variables. The $link variable could really come from anywhere; it is the link that we’re going to pull images from. $width and $height are the minimum size of the images we’ll use. They can be set to whatever size you like depending on the thumbnail size you need. Lastly, we’re defining a regular expression that will be used to locate the src attribute of image tags. We’re looking for any img tag whose src ends in a JPG, GIF or PNG extension.
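To see what the regular expression does and does not pick up, it can be run against a small piece of markup. The HTML string below is made up for illustration; only the $regex value comes from the code above.

```php
<?php
$regex = '/<img[^\/]+src="([^"]+\.(jpe?g|gif|png))/';

// Made-up sample markup: two matching images and one SVG that should be skipped.
$html = '<p><img class="a" src="/images/logo.png" alt="x"> '
      . '<img src="photo.jpg"> <img src="icon.svg"></p>';

preg_match_all($regex, $html, $m, PREG_PATTERN_ORDER);

// $m[1] holds the captured src values, in document order.
$found = $m[1];
print_r($found);
```

Two limitations are worth knowing about: the pattern is case sensitive, so an upper-case extension such as .JPG is skipped, and because of the [^\/]+ part it will not match a tag that has a slash anywhere between <img and src (for example inside an alt attribute that comes first).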


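function rel2abs($url, $host) {
    if (substr($url, 0, 4) == 'http') {
        // Already an absolute URL.
        return $url;
    } else {
        $hparts = explode('/', $host);

        if ($url[0] == '/') {
            // Root-relative: keep only the "scheme://host" part of $host.
            return implode('/', array_slice($hparts, 0, 3)) . $url;
        } else if ($url[0] != '.') {
            // Path-relative: append to $host exactly as given.
            return implode('/', $hparts) . '/' . $url;
        }
    }
    // URLs starting with "." (./ or ../) are not handled.
    return false;
}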

Here we are defining a function named rel2abs that is intended to convert a relative image src attribute (or any URL) into an absolute URL using a $host variable that is passed to it. We will use the original $link variable for this purpose. The function used here is very simplistic to keep the code short. It may not catch all possible cases. I would recommend expanding on this function a bit to make it more comprehensive before using it.
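As a starting point for that expansion, here is one possible, still simplified, variant built on parse_url(). The name rel2abs_full() is hypothetical and the function is a sketch rather than a complete URL resolver: it handles protocol-relative URLs and ./ and ../ segments, but it drops any trailing slash and does not treat query strings specially.

```php
<?php
// Hypothetical, more thorough variant of rel2abs(). Returns false on a bad base URL.
function rel2abs_full($url, $base)
{
    // Already absolute (has a scheme such as http:// or https://).
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        return $url;
    }

    $b = parse_url($base);
    if ($b === false || !isset($b['scheme'], $b['host'])) {
        return false;
    }
    $root = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');

    // Protocol-relative URL: reuse the scheme of the base page.
    if (substr($url, 0, 2) === '//') {
        return $b['scheme'] . ':' . $url;
    }

    if ($url !== '' && $url[0] === '/') {
        // Root-relative path.
        $path = $url;
    } else {
        // Relative to the directory of the base page.
        $basePath = isset($b['path']) ? $b['path'] : '/';
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $url;
    }

    // Resolve "." and ".." segments.
    $out = array();
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.' && $segment !== '') {
            $out[] = $segment;
        }
    }

    return $root . '/' . implode('/', $out);
}

echo rel2abs_full('../img/a.png', 'http://example.com/dir/sub/page.html') . PHP_EOL;
```

Unlike the short version, this resolves relative paths against the directory of the page rather than against the full page URL, so a link such as http://example.com/dir/page.html no longer produces http://example.com/dir/page.html/img.png.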


if (($data = file_get_contents($link)) && preg_match_all($regex, $data, $m, PREG_PATTERN_ORDER)) {
    if (isset($m[1]) && is_array($m[1])) {
        $thumbs = array();
        foreach (array_unique($m[1]) as $url) {
            // Keep images that are at least thumbnail size (with a 10px tolerance).
            if (
                ($url = rel2abs($url, $link)) &&
                ($i = getimagesize($url)) &&
                $i[0] >= ($width-10) &&
                $i[1] >= ($height-10)
            ) {
                $thumbs[] = $url;
            }
        }
    }
}


This is the code that does all the work. The PHP function file_get_contents is used to retrieve the HTML content of the link, preg_match_all is then used with the previously defined regular expression to find all the images. Once all the images are found we then loop over them to determine if they match our size requirements and if so, we add them into the $thumbs array.

What you’re left with at the end of this code is an array of absolute URLs pointing to images that were on the linked page. From here it is up to you what you do with the images. You might want to display them so the user can choose one, or perhaps you’ve got something else in mind.

That’s all for now.

Milk framework example –

Yesterday the first fully functional site using the new Fliquid Studios Milk framework was launched. This site is quite basic as far as the capabilities of the framework go and will no doubt implement some of the more advanced features further down the track.

Some of the base features it does make good use of include:

  • A large number of base controls including form controls, layout controls etc.
  • Automatic Javascript & CSS concatenation and compression.
  • Good caching capabilities
  • CSS Sprite capabilities
  • Themes for controls

The site was developed relatively quickly, alongside the actual development of the framework. Many new features were added over the last couple of weeks before the launch, which is a testament to the ease of development provided by Milk.

Creating a virtual development server using Virtualbox

For those of you who read Installing Fedora 10 on Windows XP using VirtualBox, this is another post along the same lines as that one, except this one is for configuring your virtual machine as a development server for when you are forced to code on your Windows host.


I am a bit of a snob when it comes to development. I simply refuse to develop on a Windows host: the app will end up being deployed on a nix system, so it should be developed on a nix system. I also think using a Windows host for development takes away from a lot of the raw development involved with using nix systems. And, with the amount of quality development and virtualization tools available to us these days, we can develop on anything, using any tool.

If you haven’t set up a virtual machine using Virtualbox and your choice of Linux OS on a Windows machine, then read through my other post before continuing with this one. Also note, you will need to have Apache configured and running on your virtual machine.

Now, what I am going to do is configure VirtualBox to accept connections to itself on port 80 and port 22, so we can develop on one machine (Windows) but using a Linux server.

Configure the network

The first thing we need to do is get our machine on our local network and a proper IP address for it. Open up the settings for your virtual machine and click on the ‘network’ tab, then select ‘Adapter 2’ and change ‘Attached To’ to ‘Host Interface’. This will allow your virtual machine to get a DHCP assigned IP address and actually put it on the network.


Save your changes and boot your VM.

Note: In later versions of VirtualBox (ie, 2.2.4) your Network adapter screen may look like the following, in which case just copy the settings I have used here.

[Screenshot: Alternate VM Network]

Configuring Apache / Port 80

Now we need to make sure our Guest Linux OS can accept connections on port 80. Fedora 10 blocks these connections by default. To open them, edit the file /etc/sysconfig/iptables using vim, emacs, gedit, pico, whatever. Find the line accepting port 22 connections that looks like this:

-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT

And add a line below it like this;

-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT

Now restart iptables;

$ service iptables restart

Forwarding port 80

The next step is to forward your port 80 to your virtual machine. You can forward any port you like, for example, you might have a web server on your Host machine and not want to forward port 80, so you can choose port 8080. I want a seamless virtual server environment, so I chose port 80.

Open up a command prompt (Start > Run > ‘cmd’ + Enter) and change to your Virtual Box directory;

cd C:\Program Files\Sun\xVM VirtualBox

Now we will get acquainted with the command line tool, ‘VBoxManage.exe’.

To forward port 80 on your host to port 80 on your guest, type the following (make sure you substitute “Fedora 10” with your VM’s name);
NOTE: Leo in the comments has recommended changing ‘pcnet’ to ‘e1000’ for Intel network cards!

VBoxManage.exe setextradata "Fedora 10" VBoxInternal/Devices/pcnet/0/LUN#0/Config/Apache/HostPort 80

VBoxManage.exe setextradata "Fedora 10" VBoxInternal/Devices/pcnet/0/LUN#0/Config/Apache/GuestPort 80

VBoxManage.exe setextradata "Fedora 10" VBoxInternal/Devices/pcnet/0/LUN#0/Config/Apache/Protocol TCP

Then we can view our changes using the following command;

VBoxManage.exe getextradata "Fedora 10" enumerate
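Each forwarding rule is just three keys that share a common path, so if you need several rules the commands can be generated rather than typed. The PHP sketch below only builds the command strings shown above; vbox_forward_commands() is a hypothetical helper name, and the rule name (‘Apache’ here) is simply the label used in the key path.

```php
<?php
// Hypothetical helper: returns the three setextradata commands for one rule.
// $adapter is 'pcnet' by default; use 'e1000' for Intel network cards.
function vbox_forward_commands($vm, $rule, $hostPort, $guestPort, $adapter = 'pcnet')
{
    $base = 'VBoxInternal/Devices/' . $adapter . '/0/LUN#0/Config/' . $rule . '/';
    $values = array(
        'HostPort'  => (int) $hostPort,
        'GuestPort' => (int) $guestPort,
        'Protocol'  => 'TCP',
    );

    $commands = array();
    foreach ($values as $key => $value) {
        $commands[] = 'VBoxManage.exe setextradata "' . $vm . '" ' . $base . $key . ' ' . $value;
    }
    return $commands;
}

$commands = vbox_forward_commands('Fedora 10', 'Apache', 80, 80);
echo implode(PHP_EOL, $commands) . PHP_EOL;
```

Calling it with a host port of 8080 and a guest port of 80 would give you the alternative mentioned earlier, where the host keeps its own web server on port 80.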


Browsing to your virtual server

At this stage you should be able to browse to your virtual server using any of the name-based or IP-based virtual hosts on your VM.
For example, browsing to localhost on your host machine should display the default virtual host on your guest.

However, if you have name based virtual hosts on your guest (as I do) then you need to add those names to your windows ‘hosts’ file so it knows to look for them locally and not on the internet. This is very easy to do.

Open the file c:\Windows\System32\drivers\etc\hosts and add the names of all your name-based hosts after ‘localhost’ on the existing localhost line, so that the line ends with, eg; localhost host1 host2 mysite

Then, after saving, these hosts on your guest should load in your browser from the guest machine.

Setup ssh server

If you want to edit the files on your guest from your host using SFTP through an IDE (like Netbeans, Zend Studio, Eclipse) or an app like WinSCP, then keep reading.

Having an SFTP connection to a server is a great way to integrate your remote files into your local development environment and makes editing much easier and faster. SFTP is the SSH File Transfer Protocol; despite the name it is not FTP, and it requires only an SSH connection to the server to work. We need to make sure ssh is running on our Guest;

$ service sshd start

Now, we want to make sure it starts when the machine starts;

chkconfig sshd on

We now need to follow the same procedure to forward the host port 22 to the guest port 22 as we did for port 80 above;
NOTE: Leo in the comments has recommended changing ‘pcnet’ to ‘e1000’ for Intel network cards!

VBoxManage.exe setextradata "Fedora 10" VBoxInternal/Devices/pcnet/0/LUN#0/Config/ssh/HostPort 22

VBoxManage.exe setextradata "Fedora 10" VBoxInternal/Devices/pcnet/0/LUN#0/Config/ssh/GuestPort 22

VBoxManage.exe setextradata "Fedora 10" VBoxInternal/Devices/pcnet/0/LUN#0/Config/ssh/Protocol TCP

Now you will be able to ssh to the guest from your host. You can test this in PuTTY (or anything you’d like) by connecting to your host machine on port 22 and logging into your guest machine with your normal user details.
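If you would rather check both forwarded ports from a script on the host, a few lines of PHP will do it. This is only a sketch: port_is_open() is a hypothetical helper, and it assumes the forwarded ports are reachable on the host as localhost, as described in the browsing section above.

```php
<?php
// Hypothetical helper: returns true if a TCP connection to $host:$port can be opened.
function port_is_open($host, $port, $timeout = 2.0)
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning PHP raises when the connection is refused.
    $conn = @fsockopen($host, (int) $port, $errno, $errstr, $timeout);
    if ($conn === false) {
        return false;
    }
    fclose($conn);
    return true;
}

// Check the two ports forwarded in this post.
foreach (array('Apache' => 80, 'ssh' => 22) as $name => $port) {
    $state = port_is_open('localhost', $port) ? 'reachable' : 'not reachable';
    echo $name . ' (port ' . $port . '): ' . $state . PHP_EOL;
}
```

A ‘not reachable’ result for a port usually means either the forwarding keys are wrong (check them with the getextradata command) or the guest firewall is still blocking that port.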

That’s it. You now have a fully integrated virtual server for development. If you have any questions, suggestions, problems or concerns, please let me know in the comments.

Javascript Event Library Updated

Just a small post to announce that the Javascript Event Library has been updated with a few new methods and smaller size (reduced variable length, removed semicolons).

The updated code can be viewed and downloaded via our Projects page.

If you encounter any bugs or would like to collaborate in its development, please let me know.

5 development techniques to improve software quality

Most of these things you should really already be doing and if you’re not then I’d suggest starting to do so right away.

1. Step back and plan
It’s often hard not to just jump straight in to coding, especially with a project you are excited about. Try to resist that urge by stepping back and taking a bit of time to think about things before you start typing. Think about the problem you are trying to solve and any difficulties that may arise, and come up with a potential solution. Even better than thinking about the problem and solution, write it down somewhere. If you start coding before you’ve completely got your head around the problem there’s a very good chance you will end up with little more than a mess.

After all if you start coding a solution before you know what that solution is how are you going to know when you arrive at that solution?

2. Document before coding
Documentation is another aspect of coding that is too often overlooked. Some developers believe it is not their job to document, others just don’t get around to it because of time constraints. Documentation of functions/methods and how particular sections of code are supposed to work can help with debugging and avoid the old “What was I/he/she thinking with this code?” situation.

I suggest that whenever you create a new class/function/method (or any other relevant piece of code) create the skeleton of it ONLY, then write the documentation. Only once the documentation is done should you think about implementing the guts of the function. This will ensure documentation doesn’t get forgotten and you will have a much clearer picture of what you are trying to achieve before you set out.
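As an illustration of the skeleton-first approach, this is roughly what a function might look like at the point where the documentation is finished and the implementation has not been started. The function fit_within() and its behaviour are made up for this example.

```php
<?php
/**
 * Scale a width/height pair down so that it fits inside a bounding box,
 * preserving the original aspect ratio.
 *
 * @param int $width     Original width in pixels (must be greater than 0).
 * @param int $height    Original height in pixels (must be greater than 0).
 * @param int $maxWidth  Width of the bounding box in pixels.
 * @param int $maxHeight Height of the bounding box in pixels.
 *
 * @return array Two integers: array(newWidth, newHeight).
 *
 * @throws InvalidArgumentException If any argument is zero or negative.
 */
function fit_within($width, $height, $maxWidth, $maxHeight)
{
    // Skeleton only: the body is written after the documentation above is agreed on.
    throw new LogicException('fit_within() is documented but not implemented yet.');
}
```

Writing the docblock first forces decisions such as the return format and the error behaviour to be made up front, which is exactly the planning step described in technique 1.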

3. Adopt a coding standard and stick to it.
Use consistent indentation, layout, naming conventions etc across all of your code. It not only makes the code easier for you and others to read and modify, it also makes it much easier to debug when you come back to look at it in a few months’ time. This is easily one of the most important things you can do to improve the quality of your code but is also one of the most forgotten about or ignored.

4. Write test plans and make sure they are used
A lot of the time testing is performed very minimally or in a haphazard way. This is bad because it leaves you open to missing areas while testing, or to not discovering obvious problems because the tester does not understand how the software will be used. This is unfortunately always a risk but it can be greatly minimised by writing various test plans and ensuring they are actually used. Test plans can be quite complex and extensive or simple use cases that are followed by the tester. Either way they are an excellent way of showing what has been tested and over time they can be expanded to cover more of your software.

Similarly to the writing of documentation for each class/function/method when they are created you should additionally create a set of tests for every function as or before it is coded. Again the tests can be expanded over time and are a great way of performing quick regression checks.

5. Reviews
This is by no means a new concept but it is definitely one that is under-utilised. Developers are often afraid of peer reviews because they don’t like their code being criticised and other developers can often be quite harsh. Don’t take it personally. It can be very useful to have someone else’s eyes spot things you missed, and after a few reviews your code will more than likely start to improve in general.

In addition to peer reviews, review your own code. Go back and have a quick read of what you’ve done in 1 week, 1 month, 4 months and 12 months later if you can. You’re almost guaranteed to come up with a better solution than you did before. Although you may not be able to implement your new ideas in that old project you may be able to use them in future.

I am a big believer in continually trying to improve my skills, knowledge and most of all coding standards. By using the techniques above (as well as others) the quality of software can be improved and over time so will your skills. Let us know your thoughts on the techniques listed above. Do you use them already? Do you think they will work for you or do you hate the idea altogether?

Also, I would love to hear of any other techniques people use relating to improving software quality. Leave a comment and let us know.

Resizing images in PHP with GD and Imagick

One task that is quite often required in websites and web applications is resizing of images for thumbnails or any other purpose. In PHP there are a number of different ways you can approach image resizing, each of which has its own advantages and disadvantages. In this post we will be using the GD and Imagick PHP extensions to resize images and comparing the differences.

When creating thumbnails there are a few decisions to make before you start coding. You need to decide on the width and height of your thumbnails, whether you want your images to be cropped to fit your specified dimensions or not, and which image format to use. In the examples below we will be showing how to resize both using the “crop” method and without cropping.

So let’s launch into this article with a very brief look at the GD PHP extension.

GD is the more commonly used extension for PHP. It is generally easy to install (`yum install php-gd` on Fedora, CentOS etc or `sudo apt-get install php5-gd` on Ubuntu etc). It has quite extensive documentation with examples and as it is more commonly used there’s a lot more web content available for it.

The disadvantages of GD are that it is slower and more memory intensive. Overall it has slightly less functionality and can be more complex to use.
Let’s take a look at a quick resizing example. In this example we have a jpeg image that we are going to resize to be 150×150 pixels. We are not worried about the format of the image so we won’t be changing it.

function resize_image($file, $w, $h, $crop=FALSE) {
    list($width, $height) = getimagesize($file);
    $r = $width / $height;
    if ($crop) {
        // Trim the source area (anchored at the top-left corner) so that
        // it has the same aspect ratio as the requested thumbnail.
        if ($r > $w/$h) {
            $width = (int) round($height*$w/$h);
        } else {
            $height = (int) round($width*$h/$w);
        }
        $newwidth = $w;
        $newheight = $h;
    } else {
        // Fit inside $w x $h while keeping the original aspect ratio.
        if ($w/$h > $r) {
            $newwidth = (int) round($h*$r);
            $newheight = $h;
        } else {
            $newheight = (int) round($w/$r);
            $newwidth = $w;
        }
    }
    $src = imagecreatefromjpeg($file);
    $dst = imagecreatetruecolor($newwidth, $newheight);
    // Copy the (possibly trimmed) source area into the new, smaller image.
    imagecopyresampled($dst, $src, 0, 0, 0, 0, $newwidth, $newheight, $width, $height);

    return $dst;
}

$img = resize_image('/path/to/some/image.jpg', 150, 150);

Resizing images with GD requires that you create a new image and copy the re-sampled section to it. This means that any EXIF data contained in the image will be lost. Note that in the code above we are not actually doing anything with the image. You could either output it directly to the browser or save it to a file using the imagejpeg function.
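The arithmetic in the non-cropping branch is easier to follow with concrete numbers. The sketch below isolates just that calculation in a hypothetical helper, fit_dimensions(), so it can be run without GD or an image file; the rounding to whole pixels is my own addition.

```php
<?php
// Hypothetical helper: the "fit without cropping" size calculation on its own.
// $width x $height is the source image, $w x $h is the requested box.
function fit_dimensions($width, $height, $w, $h)
{
    $r = $width / $height; // aspect ratio of the source image

    if ($w / $h > $r) {
        // The box is relatively wider than the image: height is the limit.
        return array((int) round($h * $r), $h);
    }
    // Otherwise the width is the limit.
    return array($w, (int) round($w / $r));
}

// A 600x400 landscape image in a 150x150 box is limited by width.
list($landscapeW, $landscapeH) = fit_dimensions(600, 400, 150, 150);
echo $landscapeW . 'x' . $landscapeH . PHP_EOL; // 150x100

// A 400x600 portrait image in the same box is limited by height.
list($portraitW, $portraitH) = fit_dimensions(400, 600, 150, 150);
echo $portraitW . 'x' . $portraitH . PHP_EOL; // 100x150
```

In both cases the longer side ends up at 150 pixels and the shorter side is scaled by the same factor, which is why the resulting thumbnail is not necessarily 150×150 unless you crop.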

On the other hand, the Imagick PECL extension has its own advantages and disadvantages.

Imagick is the less frequently used PECL extension for the ImageMagick project. ImageMagick is a free suite of tools used for image creation and manipulation supporting over 100 different image formats. It is typically a command line tool but also has interfaces available for a number of different programming languages (including PHP).

The Imagick extension essentially provides an API for all of the functionality available in the `convert` command line tool. Imagick is fast, uses less memory and has some powerful functionality available. Quite often Imagick is a lot easier to use (once you figure out how), your code may end up smaller and cleaner.

The down side of using this extension is that the documentation is extremely limited and there are next to no examples available on the web. Installation can often be a painful task as well although it should just be a matter of running the command `pecl install imagick`.

So now let’s move on to an example. As above we are resizing a jpeg image to 150×150 pixels.

function resize_image($file, $w, $h, $crop=FALSE) {
    $img = new Imagick($file);
    if ($crop) {
        // Scale and crop so the result is exactly $w x $h.
        $img->cropThumbnailImage($w, $h);
    } else {
        // TRUE = best fit: keep the aspect ratio within $w x $h.
        $img->thumbnailImage($w, $h, TRUE);
    }

    return $img;
}

resize_image('/path/to/some/image.jpg', 150, 150);

As you can see the Imagick code is much less complex. It doesn’t duplicate the image so it uses less memory and doesn’t lose the image EXIF data. Again note that in the code we are not actually doing anything with the image. You could save it to a file using the writeImage method or output it directly to the browser.

As mentioned, depending on your purpose there are advantages and disadvantages to using both GD and Imagick. It is really just a matter of deciding what you need to do and determining which will work best for you.

As usual all comments are welcomed. I’d love to hear your thoughts and preferences relating to using either GD or Imagick. Additionally if you have any specific image manipulation requests let us know and we will do our best to answer your questions.

That’s all for now.

10 Sites that every web developer should know

This is a simple post listing some sites that I use very regularly. It is not an exhaustive list, by any means, but it might contain some sites that you didn’t know about. If you like any or have any sites to add to the list, please leave a comment.

Whats My Agent String?

Ok, so this is a small plug for our small Agent String displaying site. It’s sometimes irritating to try and grab the agent strings from the browsers you are trying to support, especially if a client has a strange agent string and you are trying to get them to tell you what it is. This just takes a step out of the process.

Browser Shots

This is a brilliant site for rendering your code in many different browsers and seeing screen shots of how it looks. It’s not overly helpful for debugging, but if you are confident your code supports the latest browsers, you can double check here.


Xenocode is a cool web-based virtualization hub. It allows you to run a variety of applications (including all major browsers) from your browser without needing to install anything (oh, except their Firefox plugin). It’s a great service. It’s a little hard to find, but you can also download the executable files from their site, meaning you can run the virtual browsers locally without needing to be online (as I have).


Alltop is an information resource of the current ‘hot topics’ online. It has a ‘tech’ category with hundreds of subcategories like ‘php’ and ‘web design’. You can also create your own account and have your own custom Alltop page with news from areas that you think are relevant to you (which saves you having to browse around to find it). See my Alltop page here.


Nettuts is a blog for general web-relevant tutorials, from designing WordPress themes to building fancy UI features with jQuery. This site is a must-subscribe-to for any self-respecting web worker.

Delicious – Programming

A great resource of the latest bookmarks on Delicious tagged with ‘programming’. You can also follow @delicious_prog on twitter for the best bookmarked sites tweeted to you. I am not sure if this is officially a Delicious service or not.

Clean CSS

Clean CSS is simply a site for cleaning up CSS. It has an options pane that allows you to make changes to the code’s layout so that it fits with your existing coding conventions. If you have ever taken over a project from less-than-impressive coders, then this site will save you lots of time while making the project nicer to work on.

Smashing Magazine


While Smashing Magazine is primarily a design resource, they do have many good development posts that pop up from time to time. This blog is a fantastic way to keep yourself up to date with the latest in web design trends.

Devshed Forums

The Devshed forums are a great resource for all things tech. The users are friendly and due to their reputation system, it makes it fun to provide your help so you can gain reputation. Well worth looking at when you have that niggling problem that will just not go away.


Github is a free (and paid) git repository host with a social networking angle. Public repositories are free but if you want to make them private then you have to pay. A lot of big projects are on Github including railsBlueprint, haml, YUI (Yahoo User Interface) and scriptaculous. If you have a need for a certain script, it’s worth searching on here. Also, if you have a simple script or a work in progress, Github makes it very easy to manage your code. See my Github profile here.

UPDATE: Sorry about the broken links folks, all fixed now.

I have just used JSTidy and I have to say, it’s very VERY cool. If you have any compressed js that you need to return to a readable / editable format, this is just the tool for you.

Installing Fedora 10 on Windows XP using VirtualBox

A lot of developers like to have multiple operating systems available, either for development testing or to keep office chores on one and development on the other. Personally, I like Windows for email, organisation and gaming, but hate developing on Windows; I prefer to develop in a Linux environment. Mike, on the other hand, likes to have Linux as his primary OS but likes the option of a Windows machine every now and then for testing in IE or using Windows-only applications. For a long time the tried and tested method of running multiple OSes on a single machine was to do the ol’ dual boot action, which was ok, but became tedious if you forgot something from one OS and had to reboot a few times to get it. Virtualisation is not new by any means, but what is relatively new is the free availability of an extremely easy to use virtual machine client such as VirtualBox. It makes setting up virtual machines extremely easy and takes away the need for dual booting for ever, especially when you can get such amazing performance from your guest operating system.

I will be installing Fedora 10 (guest) on a Windows XP (host).

What’s needed;
VirtualBox (I am using 2.1.4 in this post)
Fedora 10

Firstly, I’ll go step-by-step (oooh baby) to install VirtualBox and get the Live CD version of Fedora up and running under XP.

Once you have downloaded the VirtualBox installer and obtained a copy of the Fedora 10 Live CD iso, we simply need to run the VirtualBox setup and create a new virtual machine. (A Live CD is a CD that will boot and run a copy of Linux without needing to be installed; it runs entirely from memory. We do get an option to install to HDD once it's booted, which we will do.)

Install VirtualBox

Installing VirtualBox is as easy as installing any other application. Just follow the prompts. As it needs to install a bunch of extra network controllers and what not, you might get a bunch of warnings / confirmations from Windows about whether you would like to continue. If you don’t continue you may as well stop reading now. :)

Once VirtualBox is installed, open the app and click on the ‘New’ button.

Click next to start moving through the wizard.

Name your Virtual Machine and select its type from the drop downs.

Select the amount of memory to allocate to your Virtual Machine. I usually choose a minimum of 512MB or it gets just a little too slow to use. (You can also increase your video memory allocation at a later stage.)

On this next screen we need to create the hard drive for the VM. What's needed is a file that VirtualBox will present to the VM as a hard drive.
Click ‘New’ and then click ‘Next’ to start the virtual disk wizard.

This next screen is up to you. I like to know that if I need extra space on the VM, the disk will grow as it's needed, but if you are short on space or want to restrict it, you can simply have a fixed-size image. Whatever you prefer.

Name the image and select its base size (default of 8.0GB is fine).

Click Finish to confirm your image settings.

You will be sent back to the Virtual Hard Disk screen again, just click next.

Now you’re pretty much done. Just click through to the end of the wizard and you have created your Virtual Machine. What we need to do now is tell our new VM to use the Live CD iso disk image when we boot up. Go into the settings of your VM and click ‘CD/DVD-ROM’ and check the ‘Mount CD/DVD Drive’. Put a check in ‘ISO Image File’ and click the little folder icon on the right hand side.
Once that opens, click the ‘add’ button and select your iso file for Fedora 10.

Click ‘Select’ and you’ll return to the ‘CD/DVD-ROM’ screen.

Click ‘OK’ to exit the settings.

Now comes the exciting part. Click ‘Start’ above your Fedora 10 VM and watch the magic. You might get a notification about the host key (Right CTRL) or the colour settings (24 instead of 32 bit), just check ‘do not show this message again’ and click ‘ok’.

When Fedora has booted, you’ll notice it's just running Fedora without installing anything; this is the Live CD at work. Click the icon on the desktop that says ‘Install to hard drive’.

Install Fedora 10

Follow the prompts through the installer. All of these settings can be changed later so don’t worry too much. Make sure you disable ‘System clock uses UTC’, as it will conflict with the clock on the Windows host (which runs on local time) and you’ll get all sorts of weird times displaying in your guest.

After entering your root password, you will get a prompt stating that the partition cannot be read and asking whether you would like to format it. Click ‘Yes’. You will then come to the partitioning screen. Some people like to create separate partitions for /home and /boot and all the others, but personally, as this is for development only (ie, nothing really saved on there), I just leave it as default and click next.

Now we just need to sit back and wait for the installer to do its thing.

When the installer is complete, click on ‘System’ and ‘Shut down’ – do not reboot as the Live CD image is still ‘in’ the machine and you will just boot back into that.

Once the machine is shut down, you can go back into the CD/DVD-ROM settings and uncheck ‘Mount CD/DVD Drive’ (or you could leave that checked, select ‘Host CD/DVD Drive’ and check ‘Enable Passthrough’ so that your system CD/DVD drive works in the guest OS). Once that is done, start the VM again.
Fedora will boot into a ‘Welcome’ setup wizard.

Keep clicking forward until you reach the Create User screen. If you are new to Linux, then you should be aware that you need to have at least one other non-root account which you will use on a day to day basis, only switching to root for certain actions. Create your user account, select the date and time and then (if you want) send your hardware profile.

Once complete, you will get the login screen. Log in with the user you created.

Installing KDE

If you are new to Linux, you may not know that there are many different ‘desktop environments’ available. The two most popular are GNOME (installed by default) and KDE. I much prefer KDE, so before I do anything in my new guest OS, I will install that.
To install KDE:
Click ‘Applications’ > ‘System Tools’ > ‘Terminal’.
Then type the following (each new line represents ‘enter’)

$ su -
{Enter the root password}
$ yum -y install kdebase kde-workspace

This is about 90MB.

Once it is installed, we need to log out (‘Administration’ > ‘Logout’) and restart the X environment by holding CTRL+ALT and hitting Backspace.
When X comes back up, we can log back in, but we want to be on the main login screen (not the ‘this screen is locked’ screen). Down the bottom, select KDE from the ‘sessions’ list, then log in.
Once you log in, you’ll see that KDE is super sexy!
Now we need to update our system and install some other bits to make our VirtualBox experience even better.

Update the system

In our ‘F’ menu, click into ‘search’, type ‘Term’ and open the terminal.

$ su -
{Enter the root password}
$ yum -y update

Once that has downloaded and installed all updates (337 updates for me at 400MB) we’ll start getting the system ready for the really cool VirtualBox features and we’ll also get a better resolution, too.
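If you would rather see whether there is anything to update before committing to a big download, yum has a check-update command that reports through its exit status (100 means updates are available, 0 means none, 1 means an error). Here is a small sketch of how that could be read; the function name explain_check_update is made up for this example and is not part of yum.

```shell
#!/bin/sh
# Hypothetical helper: turn the exit status of `yum check-update` into a
# short message. 0 = nothing to update, 100 = updates available,
# anything else is treated as an error.
explain_check_update() {
    case "$1" in
        0)   echo "system is up to date" ;;
        100) echo "updates are available" ;;
        *)   echo "yum reported an error (status $1)" ;;
    esac
}

# Only try this where yum actually exists.
if command -v yum >/dev/null 2>&1; then
    status=0
    yum check-update >/dev/null 2>&1 || status=$?
    explain_check_update "$status"
fi
```

This only reports; nothing gets installed until you run the update itself.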

In the terminal, we need to install a few new items.

Installing VirtualBox Guest Additions


$ yum -y install kernel-headers kernel-devel gcc

Note: thanks to Carlos in the comments for pointing out that kernel-devel is also needed. It is included in the command above.
After it has installed, reboot the machine (it's not Windows, I know, but we did just change the kernel files).
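The reboot matters because the Guest Additions build kernel modules, and for that the kernel-devel files have to match the kernel you are actually running. Here is a rough check you could run after rebooting. It assumes the usual Fedora layout, where kernel-devel puts its files under /usr/src/kernels/ in a directory named after the kernel release. The function name have_kernel_devel and its optional second argument are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a kernel source tree exists for the given
# kernel release. The second argument (base directory) is optional and
# defaults to /usr/src/kernels, which is an assumption about where
# kernel-devel installs its files.
have_kernel_devel() {
    release="$1"
    base="${2:-/usr/src/kernels}"
    [ -d "$base/$release" ]
}

running="$(uname -r)"
if have_kernel_devel "$running"; then
    echo "kernel-devel found for $running"
else
    echo "no kernel-devel for $running (install it, or reboot into the updated kernel)"
fi
```

If it reports nothing found, the Guest Additions install in the next step will most likely fail when it tries to build its modules.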
When it has booted back up and you have logged in, un-capture your mouse with right CTRL, then in the VirtualBox window click ‘Devices’ > ‘Install Guest Additions’. This mounts a directory within your guest file system, but you won't be able to see it until you open it with a file browser, so open the F menu, go to File Manager (Dolphin) and browse to /media/VBOXADDITIONS* (whatever version you have). Then in your terminal…

$ su -
{Enter the root password}
$ cd /media/VBOXADDITIONS*
$ sh

Wait till the install has finished and reboot again (I know, I know…)
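If you are not sure what there is to run on the Guest Additions CD, you can list what is on it. As far as I know the Linux installers on that CD have names ending in .run, but treat that as an assumption and check with your own eyes. The function name list_run_scripts is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print every *.run file found directly inside the
# given directory, one per line. Prints nothing if there are none.
list_run_scripts() {
    dir="$1"
    for f in "$dir"/*.run; do
        # When the glob matches nothing the shell leaves the pattern as-is,
        # so only print entries that really exist.
        [ -e "$f" ] && echo "$f"
    done
    return 0
}

# Assumes the CD is mounted under /media as described above.
for cd in /media/VBOXADDITIONS*; do
    list_run_scripts "$cd"
done
```

Pick the one that matches your guest's architecture.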

Now it starts getting sexy. On boot up, you’ll receive a message about mouse pointer integration, which means that you can mouse in and out of your guest without needing to capture and release your mouse from VirtualBox. You’ll also likely have a better resolution, and you will have cut’n’paste between host and guest (although in my experience, this functionality can be a little flaky). Right CTRL + F will make your guest full screen, and if you click on ‘Machine’ (in the VirtualBox top menu) and select ‘Auto resize guest display’, your display will resize when you resize or full screen the window, which is cool.
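If none of that happens, one rough way to see whether the Guest Additions kernel modules actually loaded is to look through lsmod for them. The module names differ between VirtualBox versions, so this sketch just assumes they start with "vbox" and prints whatever matches. The function name vbox_modules is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read lsmod-style output on stdin and print the names
# of modules that start with "vbox" (the name is the first column).
vbox_modules() {
    awk '$1 ~ /^vbox/ { print $1 }'
}

# Only run the real check where lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | vbox_modules
fi
```

No output means nothing matching is loaded, in which case it is worth re-running the Guest Additions install and reading its messages.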


You can also do a lot of other configuration tweaks to VirtualBox based on what you want out of it, but for me, this is all I (and most other devs) need. Head on over to the VirtualBox site and check out their wiki if you want to explore some of the other features (like port forwarding to the guest, which is also cool but a bugger to get working).

Now you have a fully functioning Fedora 10 install on your Windows machine. You can install / remove / develop / whatever you want. If it's in full screen, you won't even notice Windows is there (which is always nice).

Following these instructions, you can install any OS inside any other OS. Windows in Mac, FreeBSD in Fedora, CentOS in Mac. Whatever you want. VirtualBox really does make virtualisation seriously easy and available to anybody, with only minimal tech knowledge needed.

For developers, you might want to start by installing a few of the development packages available to you.


Do you use virtualisation or dual booting? Is there anything you liked or didn't like about this post? Please leave a comment.