Recently I discovered that scripts I had written on my Mac would not run on our Linux servers. It wasn’t until I had a good look at the contents of the $_SERVER super global that I noticed a few anomalies with the contents of mine compared with others that I have been used to for so long…
Here is the main difference between my $_SERVER superglobal and Mike’s.
PHP 5.2.9 on Fedora 10 (Mike’s dev machine)
["REQUEST_URI"]=> string(9) "/test.php"
PHP 5.3.0 on Mac OS X (10.6)
["REQUEST_URI"]=> string(28) "http://fliquid.dev/blog.html"
As you can see, the difference is enough to cause a lot of problems for a lot of scripts. A regex written to match one style of REQUEST_URI (in this example) would fail against the other, so a script I write could break when it goes live, and so could any script of Mike’s that I run.
To get around this, we will likely write a small ‘compat’ (compatibility) library that detects which style of superglobal is in use and normalises it, so we only need to write one version of the code.
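As a rough idea of what such a compat helper might look like, here is a minimal sketch. The function name compat_request_uri() is made up for illustration, and it only handles the REQUEST_URI difference shown above; it is not the library we ended up writing.

```php
<?php
// Hypothetical helper: returns REQUEST_URI as a path (plus query string),
// whether the server supplied "/test.php" or "http://host/test.php".
function compat_request_uri(array $server)
{
    $uri = isset($server['REQUEST_URI']) ? $server['REQUEST_URI'] : '/';

    // Only touch values that start with a scheme, e.g. "http://".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $uri)) {
        return $uri;
    }

    $parts = parse_url($uri);
    if ($parts === false) {
        return $uri; // Leave anything unparseable untouched.
    }

    $path = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    return $path;
}
```

Calling compat_request_uri($_SERVER) at the top of a script would then give the same style of value on both machines.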
This is just a little ‘heads up’ to reduce the amount of hair pulling that can occur when scripts are pushed to a *nix production environment after being developed on a Mac.
One of my tasks in my current job is to build large indexes for our Sphinx powered search engine. This sounds pretty simple, but when you have 2,000,000 records that have, in turn, one-to-many relationships with other records, you can be processing a huge amount of data at any given time.
The script I wrote to gather the data for indexing was a little heavy on memory, and it became a huge problem whenever it had not finished in time for the next cron job. Every extra instance made the others run slower, so the runs snowballed with no end in sight and we had to kill all the processes manually. That’s not a great solution for many reasons, and when data integrity is of the utmost importance you can’t just go killing your scripts willy-nilly.
I found a great little class written by Chris Hope at electrictoolbox.com that touches a file and writes the script’s process ID (PID) to it. Any other instance can then read that file: if the PID inside belongs to a process that is still running, the new instance dies. If the file exists but the PID inside it is no longer valid (i.e. the script was killed and had no opportunity to remove the file), the new instance can still run. It’s a two-stage check.
I have made a few small adaptations to the original class, and my version is here:
<?php
class pid
{
    protected $filename;
    public $already_running = false;

    public function __construct($directory)
    {
        $this->filename = $directory . '/' . basename($_SERVER['PHP_SELF']) . '.pid';

        if (is_writable($this->filename) || is_writable($directory)) {
            if (file_exists($this->filename)) {
                // A PID file only counts if the process it names is still alive.
                $pid = (int) trim(file_get_contents($this->filename));
                if (file_exists('/proc/' . $pid)) {
                    $this->already_running = true;
                }
            }
        } else {
            die("Cannot write to pid file '$this->filename'. Program execution halted.\n");
        }

        if (!$this->already_running) {
            // Claim the lock by recording our own PID.
            $pid = getmypid();
            file_put_contents($this->filename, $pid);
        }
    }

    public function kill()
    {
        // Make sure this script owns the file before we delete it...
        if (!file_exists($this->filename)) {
            return;
        }
        $pid = (int) trim(file_get_contents($this->filename));
        if (file_exists('/proc/' . $pid) && $pid == getmypid()) {
            unlink($this->filename);
        }
    }
}
It can be used like so:
<?php
class newclass
{
    private $pid;

    public function __construct()
    {
        if ($this->checkPid()) {
            // Continue...
            $this->killPid();
        }
    }

    private function killPid()
    {
        // Removes the PID file, provided this process owns it.
        $this->pid->kill();
    }

    private function checkPid()
    {
        $this->pid = new pid('/tmp');
        if ($this->pid->already_running) {
            print 'Already running. Exiting.' . PHP_EOL;
            exit;
        } else {
            return true;
        }
    }
}
The main difference is that I don’t use __destruct(), as I wanted to be able to call a kill() method explicitly. The other notable difference is that my version does not use posix_kill() and instead checks that the process directory exists in /proc. Keep in mind that this will not work on Windows or Mac OS X, as neither provides a Linux-style /proc filesystem. The reason I made this change is that posix_kill() requires the PHP POSIX extension.
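If you need the check to work on more than just Linux, one option is to combine the two approaches. The sketch below is not part of my class: pid_is_running() is a hypothetical helper that uses posix_kill() when the POSIX extension happens to be loaded and falls back to /proc otherwise.

```php
<?php
// Hypothetical helper: true if a process with this PID appears to be running.
function pid_is_running($pid)
{
    $pid = (int) $pid;
    if ($pid <= 0) {
        return false; // 0 and negatives address process groups, not one process.
    }
    if (function_exists('posix_kill')) {
        // Signal 0 performs the existence check without sending anything.
        // Caveat: this also returns false when the process exists but
        // belongs to another user.
        return posix_kill($pid, 0);
    }
    // Fallback for Linux systems without the POSIX extension.
    return file_exists('/proc/' . $pid);
}
```

The permission caveat matters if your cron jobs run as different users, so treat this as a starting point rather than a drop-in replacement.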
For those of you that do not know what Sphinx is, check out their site here.
I started writing a post on Sphinx a few weeks ago, only to realise that the way I was approaching Sphinx was wrong and that I had to rethink my strategy. Anybody that follows my personal blog will see that I had to rewrite my Sphinx implementation several times because of my approach to the problem.
Let me just say that I was initially attracted to Sphinx a few years ago, when my site Skylines Australia kept hitting table-locks due to MyISAM. We needed MyISAM for Full Text searches, or at least we did, until we discovered Sphinx. Once our indexes were built, we were searching very quickly across 4.5 million posts with far less overhead than MySQL. It also meant we could change our table engine to InnoDB so we could get around the table-level-locks.
The issue is that we have a lot of data here at work and lots of search inputs. Think ‘keyword’ and ‘category’ in ‘location’. We initially approached it like so:
Search the location index, get a location id
Search the Category index, get a category id
Search the Main index using the location id and category id as filters.
Now, that’s OK, but what if you have hierarchical categories and locations? For example, searching for ‘Mechanics’ needs to include ‘Motor Mechanics’, ‘Bike Mechanics’, ‘Boat Mechanics’ etc. We had one search that needed to search across 1800 categories! The other issue we ran into was that we needed proximity searches. If there were fewer than n results at the Suburb level, we would expand to the Area, then the Region, all the way to the State. This meant that we would need to run the searches 4 times (or 12 in total) so we could keep our results separate. Couple this with 1.9 million searchable rows, 25,000 locations and 2000 categories, and searches were taking a LONG time to complete with Sphinx.
So, back to the drawing board.
My boss mentioned that Sphinx is built to search documents and that maybe we needed to make a ‘document’ structure for each of our records. I was a little dubious at first. Here is the idea:
As Sphinx searches for matches and does not actually return the data it has found (only the data’s “document id” so you can query MySQL for the exact match) it means that our data in Sphinx does not need to be readable to humans, it can be a mashup of multiple data sources.
We get all the items we want searchable and all their keywords, put them into a single column named ‘document’ and create a new table in MySQL.
We get all the categories that this item must appear in, so not just the exact categories (like ‘Motor Mechanics’) but all of their parents as well, which means we have fewer search terms. We get the category IDs and prefix them with ‘cat_’.
We create a new column called ‘location’ and in that column we put the IDs for the location the item is in, prefixing each ‘part’ of the location with its ‘type’, i.e. ‘suburb_’, ‘area_’, ‘region_’, ‘state_’, etc.
Once we are done, we have a row that looks like this:
Id: Actual id of the item row (relating to its original table)
Document: ‘Jims Mechanics quality mechanics we do good work cat_199 cat_432 cat_909 cat_2 cat_93’
Location: ‘Manly Sydney NSW Australia suburb_1022 area_300 region_23 state_3’
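To make the flattening step a bit more concrete, here is a minimal sketch of how one of these rows could be assembled. The function name and the array keys (‘name’, ‘keywords’, ‘category_ids’, ‘location_names’, ‘location_ids’) are assumptions for illustration only; our real import script works against our own schema.

```php
<?php
// Hypothetical helper: flattens one item into the 'document' and 'location'
// columns described above. The array keys are assumptions for illustration.
function build_search_row(array $item)
{
    // Prefix every category id (the item's categories plus all parents).
    $cats = array();
    foreach ($item['category_ids'] as $id) {
        $cats[] = 'cat_' . (int) $id;
    }

    // Prefix each location id with its type, e.g. suburb_1022.
    $locs = array();
    foreach ($item['location_ids'] as $type => $id) {
        $locs[] = $type . '_' . (int) $id;
    }

    return array(
        'id'       => (int) $item['id'],
        'document' => trim($item['name'] . ' ' . $item['keywords'] . ' ' . implode(' ', $cats)),
        'location' => trim($item['location_names'] . ' ' . implode(' ', $locs)),
    );
}
```

Each returned array maps straight onto one row of the new MySQL table that Sphinx indexes.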
Once we build this table and index it, we can query the index using the PECL Sphinx library (much better than the supplied Sphinx PHP API) and Sphinx’s ‘EXTENDED2’ match mode like so:
'@document $keywords @document (cat_1|cat_2|cat_3) @location suburb_1'
AND is implied with Sphinx, so the above query is saying match the keywords, any of the 3 categories AND location. More about the cool EXTENDED2 query syntax can be found in the Sphinx Docs.
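For what it’s worth, putting that query string together in PHP is only a few lines. The sketch below uses a made-up function name and simply rebuilds the example above; it does not talk to Sphinx itself.

```php
<?php
// Hypothetical helper: assembles the extended query string shown above.
function build_sphinx_query($keywords, array $categoryIds, $locationToken)
{
    $cats = array();
    foreach ($categoryIds as $id) {
        $cats[] = 'cat_' . (int) $id;
    }

    $query = '@document ' . trim($keywords);
    if ($cats) {
        // The pipe means OR, so any one of the categories may match.
        $query .= ' @document (' . implode('|', $cats) . ')';
    }
    // The field blocks are ANDed together implicitly.
    return $query . ' @location ' . $locationToken;
}
```

Keywords typed in by a user would normally need escaping before they go into a query like this; the PECL client has an escapeString() method that should help there.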
Basically, because we restructured our data to suit Sphinx instead of trying to make Sphinx work in a way it wasn’t designed for, we saved lots and lots of queries, made it perform hundreds of times better (think thousandths of a second to perform a search) and relieved MySQL of a lot of potentially damaging work. If you do anything with Full Text or require indexes for anything at all, I’d highly recommend Sphinx. It may seem a little odd at first and hard to bend to your needs, but persevere and try to rethink your situation; it will be worth it in the end.
Have you ever noticed in Facebook when you add a link that it generates a list of images from that link so you can choose one to display? It can really help to give an idea of what the link is about and helps to brighten things up by adding some colour.
I’ve recently implemented similar functionality on one of my sites so I thought I’d share some of it. The whole process is quite long and involved and you may want to do things differently depending on your needs, so in this article we will just be focussing on the initial steps to build an array of image URLs in PHP. The idea behind this functionality is really very simple: look at a page and extract all of the images matching certain criteria. In this case we’ll look for any images at least as large, in both width and height, as the thumbnail we want to display.
Basically in the lines above we’re setting up a few variables. The $link variable could really come from anywhere, it is the link that we’re going to pull images from. $width and $height are the minimum size of the images we’ll use – This can be set to whatever size you like depending on the thumbnail size you need. Lastly, we’re defining a regular expression that will be used to locate the src attribute of image tags. We’re looking for any img tag that has a JPG, GIF or PNG.
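The original listing is not reproduced here, so as a stand-in here is a minimal sketch of what that setup might look like. The URL, the sizes and the exact regular expression are all assumptions for illustration.

```php
<?php
// Assumed values for illustration; none of these come from the original listing.
$link   = 'http://example.com/some/page.html'; // page to pull images from
$width  = 100;                                  // minimum image width in pixels
$height = 100;                                  // minimum image height in pixels

// Capture the src of any <img> tag pointing at a JPG, GIF or PNG file.
$regex = '/<img[^>]+src\s*=\s*["\']([^"\']+\.(?:jpe?g|gif|png))["\']/i';
```

Note that this pattern is deliberately simple: it will miss image URLs that carry a query string after the file extension, and unquoted src attributes.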
Here we are defining a function named rel2abs that is intended to convert a relative image src attribute (or any URL) into an absolute URL using a $host variable that is passed to it. We will use the original $link variable for this purpose. The function used here is very simplistic to keep the code short. It may not catch all possible cases. I would recommend expanding on this function a bit to make it more comprehensive before using it.
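A simplistic version of such a function could look something like the sketch below. It assumes $host is a full absolute URL (such as the $link of the page), and it ignores things like ‘../’ segments and port numbers, so treat it as a starting point only.

```php
<?php
// Simplistic sketch: resolves $url against $host (the page URL).
// It does not handle "../" segments, ports, or other edge cases.
function rel2abs($url, $host)
{
    // Already absolute, e.g. "http://..." or "https://...".
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        return $url;
    }

    $parts  = parse_url($host);
    $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
    $root   = $scheme . '://' . $parts['host'];

    // Protocol-relative, e.g. "//cdn.example.com/a.png".
    if (substr($url, 0, 2) === '//') {
        return $scheme . ':' . $url;
    }
    // Root-relative, e.g. "/img/a.png".
    if (substr($url, 0, 1) === '/') {
        return $root . $url;
    }
    // Otherwise treat it as relative to the directory of the page.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $url;
}
```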
This is the code that does all the work. The PHP function file_get_contents is used to retrieve the HTML content of the link, preg_match_all is then used with the previously defined regular expression to find all the images. Once all the images are found we then loop over them to determine if they match our size requirements and if so, we add them into the $thumbs array.
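Here is a rough sketch of that loop, wrapped in a hypothetical collect_thumbs() function. To keep it easy to try out without a network connection, the URL resolver and the size lookup are passed in as callbacks; in real use those would be something like rel2abs() and PHP’s getimagesize(), and $html would come from file_get_contents($link).

```php
<?php
// Sketch only: $toAbsolute and $sizeOf are injected so the loop can be tried
// without network access.
function collect_thumbs($html, $minWidth, $minHeight, $toAbsolute, $sizeOf = 'getimagesize')
{
    $regex  = '/<img[^>]+src\s*=\s*["\']([^"\']+\.(?:jpe?g|gif|png))["\']/i';
    $thumbs = array();

    if (!preg_match_all($regex, $html, $matches)) {
        return $thumbs; // No images found.
    }

    foreach (array_unique($matches[1]) as $src) {
        $url  = call_user_func($toAbsolute, $src);
        $size = call_user_func($sizeOf, $url); // array(width, height, ...) or false

        // Keep only images that meet both minimum dimensions.
        if ($size !== false && $size[0] >= $minWidth && $size[1] >= $minHeight) {
            $thumbs[] = $url;
        }
    }
    return $thumbs;
}
```

Bear in mind that calling getimagesize() on remote URLs means downloading at least part of every image, so on pages with a lot of images this can be slow.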
What you’re left with at the end of this code is an array of absolute URLs pointing to images that were on the link page. From here it is up to you what you do with the images. It might be that you want to display them for the user to choose one somehow or perhaps you’ve got something else in mind….
Yesterday the first fully functional site using the new Fliquid Studios Milk framework was launched, www.martialartslife.net. This site is quite basic as far as the capabilities of the framework go and will no doubt implement some of the more advanced features further down the track.
Some of the base features it does make good use of include:
A large number of base controls including form controls, layout controls etc.
Automatic Javascript & CSS concatenation and compression.
Good caching capabilities
CSS Sprite capabilities
Themes for controls
The site was developed relatively quickly alongside the actual development of the framework. Many new features were added over the last couple of weeks before the launch, which is a testament to the ease of development provided by Milk.
For those of you who read Installing Fedora 10 on Windows XP using VirtualBox, this is another post along the same lines as that one, except this one is for configuring your virtual machine as a development server for when you are forced to code on your Windows host.
I am a bit of a snob when it comes to development. I simply refuse to develop on a Windows host, as I think that if the app will end up being deployed on a *nix system, it should be developed on a *nix system. I also think using a Windows host for development takes away from a lot of the raw development involved with using *nix systems. And, with the amount of quality development and virtualization tools available to us these days, we can develop on anything, using any tool.
If you haven’t set up a virtual machine using VirtualBox and your choice of Linux OS on a Windows machine, then read through my other post before continuing with this one. Also note, you will need to have Apache configured and running on your virtual machine.
Now, what I am going to do is configure VirtualBox to accept connections to itself on port 80 and port 22, so we can develop on one machine (Windows) but using a Linux server.
Configure the network
The first thing we need to do is get our machine on our local network and a proper IP address for it. Open up the settings for your virtual machine and click on the ‘network’ tab, then select ‘Adapter 2’ and change ‘Attached To’ to ‘Host Interface’. This will allow your virtual machine to get a DHCP assigned IP address and actually put it on the network.
Save your changes and boot your VM.
Note: In later versions of VirtualBox (ie, 2.2.4) your Network adapter screen may look like the following, in which case just copy the settings I have used here.
Configuring Apache / Port 80
Now we need to make sure our guest Linux OS can accept connections on port 80. Fedora 10 blocks these connections by default. To open them, edit the file /etc/sysconfig/iptables using vim, emacs, gedit, pico, whatever. Find the line accepting port 22 connections, which looks like this:
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
And add a line below it like this:
-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
Now restart iptables;
$ service iptables restart
Forwarding port 80
The next step is to forward your port 80 to your virtual machine. You can forward any port you like, for example, you might have a web server on your Host machine and not want to forward port 80, so you can choose port 8080. I want a seamless virtual server environment, so I chose port 80.
Open up a command prompt (Start > Run > ‘cmd’ + Enter) and change to your VirtualBox directory:
cd C:\Program Files\Sun\xVM VirtualBox
Now we will get acquainted with the command line tool, ‘VBoxManage.exe’.
To forward port 80 on your host to port 80 on your guest, type the following (make sure you substitute “Fedora 10” with your VM’s name). NOTE: Leo in the comments has recommended changing ‘pcnet’ to ‘e1000’ for Intel network cards!
Then we can view our changes using the following command;
VBoxManage.exe getextradata "Fedora 10" enumerate
Browsing to your virtual server
At this stage you should be able to browse to your virtual server using any of the name based or home based virtual hosts on your vm. ie, browsing to localhost on your host machine should display the default virtual host on your guest.
However, if you have name based virtual hosts on your guest (as I do) then you need to add those names to your Windows ‘hosts’ file so it knows to look for them locally and not on the internet. This is very easy to do.
Open the file c:\Windows\System32\drivers\etc\hosts and add the names of all your name based hosts after ‘localhost’ on the line starting with 127.0.0.1, eg;
127.0.0.1 localhost host1 host2 mysite
Then, after saving, these hosts on your guest should load in your browser from the guest machine.
Setup ssh server
If you want to edit the files on your guest from your host using SFTP through an IDE (like NetBeans, Zend Studio, Eclipse) or an app like WinSCP, then keep reading.
Having an SFTP connection to a server is a great way to integrate your remote files into your local development environment and makes editing much easier and faster. SFTP is a file transfer protocol that runs over SSH, so it requires only an SSH connection to the server to work. We need to make sure ssh is running on our guest:
$ service sshd start
Now, we want to make sure it starts when the machine starts;
chkconfig sshd on
We now need to follow the same procedure to forward the host port 22 to the guest port 22 as we did for port 80 above. NOTE: Leo in the comments has recommended changing ‘pcnet’ to ‘e1000’ for Intel network cards!
Now you will be able to ssh to the guest from your host. You can test this in PuTTY (or anything you’d like) by connecting to 127.0.0.1 on port 22 and logging into your guest machine with your normal user details.
That’s it. You now have a fully integrated virtual server for development. If you have any questions, suggestions, problems or concerns, please let me know in the comments.
Just a small post to announce that the Javascript Event Library has been updated with a few new methods and smaller size (reduced variable length, removed semicolons).
The updated code can be viewed and downloaded via our Projects page.
If you encounter any bugs or would like to collaborate in its development, please let me know.
Most of these things you should really already be doing and if you’re not then I’d suggest starting to do so right away.
1. Step back and plan
It’s often hard not to just jump straight in to coding, especially with a project you are excited about. Try to resist that urge by stepping back and taking a bit of time to think about things before you start typing. Think about the problem you are trying to solve and any difficulties that may arise, and come up with a potential solution. Even better than thinking about the problem and solution, write it down somewhere. If you start coding before you’ve completely got your head around the problem there’s a very good chance you will end up with little more than a mess.
After all, if you start coding a solution before you know what that solution is, how are you going to know when you arrive at it?
2. Document before coding
Documentation is another aspect of coding that is too often overlooked. Some developers believe it is not their job to document, others just don’t get around to it because of time constraints. Documentation of functions/methods and how particular sections of code are supposed to work can help with debugging and avoid the old “What was I/he/she thinking with this code?” situation.
I suggest that whenever you create a new class/function/method (or any other relevant piece of code) create the skeleton of it ONLY, then write the documentation. Only once the documentation is done should you think about implementing the guts of the function. This will ensure documentation doesn’t get forgotten and you will have a much clearer picture of what you are trying to achieve before you set out.
3. Adopt a coding standard and stick to it.
Use consistent indentation, layout, naming conventions etc. across all of your code. It not only makes it easier for you and others to read and modify but it will also make it much easier to debug when you come back to look at the code in a few months’ time. This is easily one of the most important things you can do to improve the quality of your code but is also one of the most forgotten about or ignored.
4. Write test plans and make sure they are used
A lot of the time testing is performed very minimally or in a haphazard way. This is bad because it leaves you open to missing areas while testing, or to not discovering obvious problems because the tester does not understand how the software will be used. This is unfortunately always a risk but it can be greatly minimised by writing various test plans and ensuring they are actually used. Test plans can be quite complex and extensive or simple use cases that are followed by the tester. Either way they are an excellent way of showing what has been tested and over time they can be expanded to cover more of your software.
Similarly to the writing of documentation for each class/function/method when they are created you should additionally create a set of tests for every function as or before it is coded. Again the tests can be expanded over time and are a great way of performing quick regression checks.
5. Reviews
This is by no means a new concept but it is definitely one that is under-utilised. Developers are often afraid of peer reviews because they don’t like their code being criticised and other developers can often be quite harsh. Don’t take it personally, it can be very useful to have someone else’s eyes spot things you missed and after a few reviews your code will more than likely start to improve in general.
In addition to peer reviews, review your own code. Go back and reread what you’ve written 1 week, 1 month, 4 months and 12 months later if you can. You’re almost guaranteed to come up with a better solution than you did before. Although you may not be able to implement your new ideas in that old project you may be able to use them in future.
Conclusion
I am a big believer in continually trying to improve my skills, knowledge and most of all coding standards. By using the techniques above (as well as others) the quality of software can be improved and over time so will your skills. Let us know your thoughts on the techniques listed above. Do you use them already? Do you think they will work for you or do you hate the idea altogether?
Also, I would love to hear of any other techniques people use relating to improving software quality. Leave a comment and let us know.
One task that is quite often required in websites and web applications is resizing of images for thumbnails or any other purpose. In PHP there are a number of different ways you can approach image resizing, each of which has its own advantages and disadvantages. In this post we will be using the GD and Imagick PHP extensions to resize images and comparing the differences.
When creating thumbnails there are a few decisions that are required before you start coding. You need to decide on the width and height of your thumbnails, you need to decide whether you want your images to be cropped to fit your specified dimensions or not and you need to decide on an image format. In the examples below we will be showing how to resize both using the “crop” method and without cropping.
So let’s launch into this article with a very brief look at the GD PHP extension.
GD
GD is the more commonly used extension for PHP. It is generally easy to install (`yum install php-gd` on Fedora, CentOS etc. or `sudo apt-get install php5-gd` on Ubuntu etc.). It has quite extensive documentation with examples and as it is more commonly used there’s a lot more web content available for it.
The disadvantages of GD are that it is slower and more memory intensive. Overall it has slightly less functionality and can be more complex to use.
Let’s take a look at a quick resizing example. In this example we have a jpeg image that we are going to resize to be 150×150 pixels. We are not worried about the format of the image so we won’t be changing it.
Resizing images with GD requires that you create a new image and copy the re-sampled section to it. This means that any EXIF data contained in the image will be lost. Note that in the code above we are not actually doing anything with the image. You could either output it directly to the browser or save it to a file using the imagejpeg function.
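The GD listing referred to above is not reproduced here, so here is a minimal sketch of the approach just described. Both function names are made up for illustration, and the crop behaviour (scale to fill, then trim equally from both sides) is my assumption of a sensible default rather than the original code.

```php
<?php
// Pure helper: works out which source region to copy and the output size.
// Returns array(src_x, src_y, src_w, src_h, dst_w, dst_h).
function resize_geometry($sw, $sh, $w, $h, $crop = false)
{
    if ($crop) {
        // Fill the whole box, trimming the overflow equally from both sides.
        $scale = max($w / $sw, $h / $sh);
        $cw = (int) round($w / $scale);
        $ch = (int) round($h / $scale);
        return array((int) (($sw - $cw) / 2), (int) (($sh - $ch) / 2), $cw, $ch, $w, $h);
    }
    // Fit inside the box, preserving the aspect ratio.
    $scale = min($w / $sw, $h / $sh);
    return array(0, 0, $sw, $sh, max(1, (int) round($sw * $scale)), max(1, (int) round($sh * $scale)));
}

function resize_image_gd($file, $w, $h, $crop = false)
{
    $src = imagecreatefromjpeg($file);
    if ($src === false) {
        return false; // Not a readable JPEG.
    }
    list($sx, $sy, $sw, $sh, $dw, $dh) = resize_geometry(imagesx($src), imagesy($src), $w, $h, $crop);

    // GD needs a fresh canvas; the resampled region is copied onto it.
    $dst = imagecreatetruecolor($dw, $dh);
    imagecopyresampled($dst, $src, 0, 0, $sx, $sy, $dw, $dh, $sw, $sh);
    return $dst;
}
```

The returned image could then be passed to imagejpeg() to save it or send it to the browser.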
On the other hand, the Imagick PECL extension has its own advantages and disadvantages.
Imagick
Imagick is the less frequently used PECL extension for the ImageMagick project. ImageMagick is a free suite of tools used for image creation and manipulation supporting over 100 different image formats. It is typically a command line tool but also has interfaces available for a number of different programming languages (including PHP).
The Imagick extension essentially provides an API for all of the functionality available in the `convert` command line tool. Imagick is fast, uses less memory and has some powerful functionality available. Quite often Imagick is a lot easier to use (once you figure out how), your code may end up smaller and cleaner.
The down side of using this extension is that the documentation is extremely limited and there are next to no examples available on the web. Installation can often be a painful task as well although it should just be a matter of running the command `pecl install imagick`.
So now let’s move on to an example. As above we are resizing a jpeg image to 150×150 pixels.
<?php
function resize_image($file, $w, $h, $crop = FALSE) {
    $img = new Imagick($file);
    if ($crop) {
        // Scale and crop so the result is exactly $w x $h.
        $img->cropThumbnailImage($w, $h);
    } else {
        // Best fit: scale to fit inside $w x $h, keeping the aspect ratio.
        $img->thumbnailImage($w, $h, TRUE);
    }
    return $img;
}
resize_image('/path/to/some/image.jpg', 150, 150);
As you can see the Imagick code is much less complex. It doesn’t duplicate the image so it uses less memory and doesn’t lose the image EXIF data. Again note that in the code we are not actually doing anything with the image. You could save it to a file using the writeImage method or output it directly to the browser.
Conclusion
As mentioned, depending on your purpose there are advantages and disadvantages to using both GD and Imagick. It is really just a matter of deciding what you need to do and determining which will work best for you.
As usual all comments are welcomed. I’d love to hear your thoughts and preferences relating to using either GD or Imagick. Additionally if you have any specific image manipulation requests let us know and we will do our best to answer your questions.
This is a simple post listing some sites that I use very regularly. It is not an exhaustive list, by any means, but it might contain some sites that you didn’t know about. If you like any or have any sites to add to the list, please leave a comment.
OK, so this is a small plug for our small Agent String displaying site. It’s sometimes irritating to try and grab the agent strings from the browsers you are trying to support, especially if a client has a strange agent string and you are trying to get them to tell you what it is. This just takes a step out of the process.
This is a brilliant site to render your code in many different browsers and then you can see screen shots of how it looks. It’s not overly helpful for debugging but if you are confident your code supports the latest browsers, you can double check here.
Xenocode is a cool web based virtualization hub. It allows you to run a variety of applications (including all major browsers) from your browser without needing to install anything (oh, except their Firefox plugin). It’s a great service. It’s a little hard to find, but you can also download the executable files from their site, meaning you can run the virtual browsers locally without needing to be online (as I have).
Alltop is an information resource of the current ‘hot topics’ online. It has a ‘tech’ category with hundreds of subcategories like ‘php’ and ‘web design’. You can also create your own account and have your own Alltop custom page with news from areas that you think are relevant to you (which saves you having to browse around to find it). See my Alltop page here.
Nettuts is a blog for general web-relevant tutorials, from designing WordPress themes to building fancy UI features with jQuery. This site is a must-subscribe-to for any self respecting web worker.
A great resource of the latest bookmarks on Delicious tagged with ‘programming’. You can also follow @delicious_prog on twitter for the best bookmarked sites tweeted to you. I am not sure if this is officially a Delicious service or not.
Clean CSS is simply a site for cleaning up CSS. It has an options pane that allows you to make changes to the code’s layout so that it fits with your existing coding conventions. If you have ever taken over a project with less-than-impressive coders, then this site will save you lots of time while making the project nicer to work on.
While Smashing Magazine is primarily a design resource, they do have many good development posts that pop up from time to time. This blog is a fantastic way to keep yourself up to date with the latest in web design trends.
The Devshed forums are a great resource for all things tech. The users are friendly and due to their reputation system, it makes it fun to provide your help so you can gain reputation. Well worth looking at when you have that niggling problem that will just not go away.
GitHub is a free (and paid) git repository host with a social networking angle. Public repositories are free but if you want to make them private then you have to pay. A lot of big projects are on GitHub including rails, Blueprint, haml, YUI (Yahoo User Interface) and scriptaculous. If you have a need for a certain script, it’s worth searching on here. Also, if you have a simple script or a work in progress, GitHub makes it very easy to manage your code. See my GitHub profile here.
UPDATE: Sorry about the broken links folks, all fixed now.
UPDATE #2:
I have just used JSTidy and I have to say, it’s very VERY cool. If you have any compressed JS that you need to return to a readable / editable format, this is just the tool for you.