An introduction to distance-based searching in Sphinx

A friend of mine emailed the other day to ask how I implemented distance based searching and ordering on my old employer’s web sites. Instead of simply replying to him, I thought I’d write up a blog post as its been a fairly long time since I posted about anything and as I cant see myself doing anything like that again in the near future, it would be a good way to help retain the information.

Concepts

The concepts we’ll use are fairly simple. You will use several tools and technologies together to provide this service to your users. Keep in mind, this is not a HOW TO on installing and configuring Sphinx. This post assumes you can handle this part of the tasks on your own. I am happy to help with problems in the comments though.

MySQL

MySQL will be used to store your locations. I wont be going through versions, table creation, column creation or storage types etc. It really doesn’t matter as Sphinx will be handling all the spatial calculations.

Sphinx

Sphinx is a search engine. It powers a lot of big sites and I have been using it to power the search at Skylines Australia for many years. A new version 2 has recently been released (albeit still in beta) but we will be using 0.9.9 in this post.

Data

Now, geodata can be an expensive thing to obtain, but a wonderful free resource geonames.org have downloads for many locations which makes it affordable and easy. Australia’s file is here. Thanks Geonames!
The file is simply a tab separated file and is easy to parse (beyond the scope of this post.) But because I am so nice, I have converted it to sql for you all here. :)

The Setup

MySQL Setup

Create a database and import that linked SQL file above. You’ll now have a table named suburbs in that database. This is pretty much all you need to do for MySQL.

Sphinx Setup

Once Sphinx is installed (pretty easy based on your system; Redhat based OS can use Yum, Debian can use apt, Mac can use Macports or maybe Homebrew) you will need to copy the sphinx.conf.dist to sphinx.conf (in the directory it is in, or you can move it to /etc/sphinx.conf) and open it in a text editor.
This is a very basic config with only minimal changes. Things to note are the RADIANS() function calls in the sql query and that we store the radians as float attributes. Anything you want to filter on in Sphinx needs to be registered as an attribute. This config should be pretty straight forward.

Now we can start searchd and index the suburbs

$ searchd
$ sudo indexer --all --rotate

Hopefully you had no errors, or if you did, they were only permissions errors (or you didnt create the /var/data directory.) If no errors, lets move on.
To test the index, lets do a quick search using the Sphinx command line utility.
$ search -i suburbs ‘manly’
This is saying search the index (-i) suburbs with the term ‘manly’ and hopefully you see something like this;


Elwood /var/data: search -i suburbs 'manly'
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/opt/local/etc/sphinx/sphinx.conf'...
index 'suburbs': query 'manly ': returned 6 matches of 6 total in 0.019 sec

displaying matches:
1. document=159, weight=1, postcode=1655, lat=-0.589921, lon=2.640392
id=159
suburb=Manly
state=New South Wales
postcode=1655
lat=-33.80000000000000
lon=151.28330000000000

This row shows you what Sphinx is returning (The first row prefixed with 1) you can see the unique document ID (the first column in our sql statement in the config) the weight (various ways to weight records in Sphinx, wont get into that here) and also the attributes we registered (postcode, lat, lon.) The following rows are provided by MySQL using our second query in the config (sql_query_info = SELECT * FROM suburbs WHERE id=$id.)

So, now that its working, we’ll write some PHP so we can utilise the Sphinx API.

PHP Setup

Set yourself up a virtual host or put your code somewhere where you can access it, you can use php over the command line if you’d like.
In the Sphinx archive (you may need to obtain this from the Sphinx site if your package management doesn’t provide it. Using Macports, the archive was in “/opt/local/var/macports/distfiles/sphinx/”.) extract the archive and go to sphinx-0.9.9/api. You’ll need to copy the sphinxapi.php file into the directory where your test app is.
NOTE: There are other Sphinx API’s available including a pear one (http://pear.php.net/sphinx.)

The first few results look like this;


[186] => Array
(
[weight] => 1
[attrs] => Array
(
[postcode] => 2000
[lat] => -0.591083705425
[lon] => 2.63908100128
[@geodist] => 1.90258026123
)
)
[188] => Array
(
[weight] => 1
[attrs] => Array
(
[postcode] => 2000
[lat] => -0.591083705425
[lon] => 2.63908100128
[@geodist] => 1.90258026123
)
)
[189] => Array
(
[weight] => 1
[attrs] => Array
(
[postcode] => 2000
[lat] => -0.591083705425
[lon] => 2.63908100128
[@geodist] => 1.90258026123
)
)

And thats it. Once you have set Sphinx up and write your indexes, it can be seriously powerful and can provide many great searching possibilities for your application.

This has been a very shallow dive into Sphinx and was only intended as a simple showcase of how easy it is to have Sphinx do great things.

Will be happy to answer any questions (that I can) in the comments.

  • Many thanks to what the article had to share, but I have seen some of your questions. Can I ask you a few questions is no, please give me your email address.

  • Him Eng

    Yeah, Thank so much for you share useful article for us.
    I would like to ask you a question. I want to get id and suburb when I run PHP. Could you tell me how to do that?

  • adrian

    nice post very imformative. though the link to the sql file is broken

  • nx8

    I am glad to catch idea from your article. It has
    information I have been searching for a long time. Thanks so much.

  • ViperB

    Also - $sphinx->SetFilterFloatRange('@geodist',floatval(0),floatval(10000));
    Else it gives an assert warning. Other than that, great article!

blog comments powered by Disqus