I recently created Tera-WURFL Explorer to let people browse the WURFL, search for devices and upload images to the WURFL images collection. I originally used MySQL’s FULLTEXT index for device search, but quickly realized that it did not suit my needs. The main problem is that it does not index words shorter than the minimum length set in my.cnf (ft_min_word_len), and that setting can only be changed server-wide. This was not a good option on a large virtual host setup since it would affect every FULLTEXT index on the server; also, if you do change it, you need to rebuild every FULLTEXT index in every database to prevent data corruption.
I did some research on search engines and eventually settled on Sphinx - mainly because it has a cool name, but also because there are some big-name success stories from companies like Craigslist who switched to it and never looked back.
Here’s how I installed it on Ubuntu 9.10:
First, you need to install the dependencies and download sphinx, then extract the archive, build it and install it:
apt-get install g++ libmysql++-dev
cd /tmp
wget http://www.sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz
tar -zxvf sphinx-0.9.9.tar.gz
cd sphinx-0.9.9
./configure --prefix=/usr/local/sphinx
make
make install
Now all the sphinx-related files are in
Next, I created a system user and group called
adduser --system --group sphinx
Note: on RedHat-like systems, you can use
adduser -r -n sphinx
Next, I created an init script for it. I recommend downloading my init.d script and installing it like this:
wget http://www.tera-wurfl.com/blog_assets/searchd
mv searchd /etc/init.d/
chown root:root /etc/init.d/searchd
chmod 755 /etc/init.d/searchd
This script adds the following functionality:
# Start the Sphinx service
service searchd start
# Stop Sphinx
service searchd stop
# Check if Sphinx is running
service searchd status
# Reindex every Sphinx index (works while started or stopped)
service searchd reindex
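For reference, a reindex that works whether searchd is started or stopped can be approximated by hand with the stock indexer binary. This is only a sketch of the idea, not the contents of the init script: the reindex_all function name is made up for illustration, the indexer path assumes the /usr/local/sphinx prefix from the configure step, and the pgrep check is just one way to guess whether searchd is running.

```shell
#!/bin/sh
# Sketch: rebuild every Sphinx index with the stock indexer binary.
# reindex_all is a hypothetical helper, not part of Sphinx or the init script.
reindex_all() {
    indexer="$1"
    [ -x "$indexer" ] || { echo "indexer not found at $indexer" >&2; return 1; }
    if pgrep -x searchd >/dev/null 2>&1; then
        # searchd is running: build new index files and ask it to swap them in
        "$indexer" --all --rotate
    else
        # searchd is stopped (or pgrep is unavailable): rebuild in place
        "$indexer" --all
    fi
}

# Assumed install prefix; override with INDEXER=/path/to/indexer
reindex_all "${INDEXER:-/usr/local/sphinx/bin/indexer}" || echo "reindex did not run"
```

Without --config, indexer falls back to its default config location, so this only works once the configuration step further down is done.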
Now we’ll add sphinx to the startup and use the config option to set up sphinx to run as the
update-rc.d searchd defaults
service searchd config
Note: on RedHat-like systems you can use
chkconfig --add searchd
Lastly, you need to configure sphinx. I would copy the default config file and edit that one:
cp /usr/local/sphinx/etc/sphinx.conf.dist /usr/local/sphinx/etc/sphinx.conf
You can follow along with the comments in the file, or jump on the documentation site and figure out what all the settings do.
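As a rough starting point, a stripped-down config for a single MySQL-backed index might look like the sketch below. The database credentials, the devices table and its columns, and the index name are all hypothetical placeholders, not the actual Tera-WURFL Explorer schema. The min_word_len = 1 line is there because indexing short words was the reason for leaving FULLTEXT in the first place. The script writes to a scratch file so you can review it before copying it into place as sphinx.conf.

```shell
#!/bin/sh
# Sketch only: write a minimal Sphinx 0.9.9 config to a scratch file for review.
# Credentials, database, table and column names are hypothetical placeholders.
CONF="${CONF:-/tmp/sphinx.conf.sketch}"

cat > "$CONF" <<'EOF'
source devices
{
    type = mysql
    sql_host = localhost
    sql_user = sphinx_reader
    sql_pass = changeme
    sql_db = wurfl
    # The first column must be a unique unsigned integer document ID
    sql_query = SELECT id, brand_name, model_name FROM devices
}

index devices
{
    source = devices
    path = /usr/local/sphinx/var/data/devices
    charset_type = utf-8
    # Index even one-character words
    min_word_len = 1
}

searchd
{
    listen = 9312
    log = /usr/local/sphinx/var/log/searchd.log
    query_log = /usr/local/sphinx/var/log/query.log
    pid_file = /usr/local/sphinx/var/log/searchd.pid
}
EOF

echo "Wrote $CONF"
```

The paths under var/ assume the /usr/local/sphinx prefix used in the configure step; adjust them if you installed somewhere else.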
Now everything is set up and should work properly!
If you followed my directions and put the tarball in /tmp, the sphinx PHP and Python APIs and some examples are in /tmp/sphinx-0.9.9/api/. You should put a copy of the PHP or Python API somewhere else on the system so you can use it from your applications.
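Copying the client libraries out of /tmp could look something like the sketch below. The copy_sphinx_api function name and the /usr/local/sphinx/api destination are assumptions for illustration; put the files wherever your applications can include them. It only copies sphinxapi.php and sphinxapi.py, the PHP and Python clients shipped in the 0.9.9 tarball.

```shell
#!/bin/sh
# Sketch: copy the PHP and Python client libraries to a permanent location.
# copy_sphinx_api and the default destination are hypothetical choices.
copy_sphinx_api() {
    src="$1"
    dest="$2"
    copied=0
    [ -d "$src" ] || { echo "no API directory at $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    for f in sphinxapi.php sphinxapi.py; do
        if [ -f "$src/$f" ]; then
            cp "$src/$f" "$dest/" && copied=$((copied + 1))
        fi
    done
    echo "copied $copied file(s) to $dest"
}

# Source path assumes the tarball was extracted in /tmp as described above
copy_sphinx_api "${API_SRC:-/tmp/sphinx-0.9.9/api}" "${API_DEST:-/usr/local/sphinx/api}" \
    || echo "nothing copied"
```

Writing under /usr/local usually needs root, so run it with the same privileges you used for make install.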
To see my use of the Sphinx search engine, take a look at this site: http://www.tera-wurfl.com/explore/browse/