Archive for December, 2012

Monthly Donations: LibreOffice and The Pegasus Foundation

Wednesday, December 26th, 2012

It’s the jolly festive time of year and I’m mostly spending time with my family and consuming lots of pierogi, but at this very moment I’m taking a break from that to take care of the monthly donations.

This time I’m donating to LibreOffice, the free and open source office suite. It’s the Phoenix that arose from the ashes of OpenOffice.org, when the latter fell prey to the behemoth Oracle, and it’s yet another open source application that I use almost every day. The Document Foundation, which is the organization behind LibreOffice, is currently collecting funds for their 2013 budget, and my little contribution is going to support that.

My second donation is to The Pegasus Foundation, based in Warsaw, dedicated to saving horses bound for slaughter. The Foundation is in operation since 2002 and has rescued more than 170 horses, giving them a new life in children hippotherapy centers or allowing them to live out their remaining years in a safe place.

Happy Birthday Perl

Tuesday, December 18th, 2012

Happy 25th Birthday, Perl! I made you a card:

perl -C -e'print map+(v9635 x8^unpack B8).(v10)[++$n%9],(unpack
    u,"MK__^=^WW_S_WK)FNO\$3F5U\$7BJJ.=>U51S-WK)GNM>UF=W%[K[N>?___SW__")=~/./g'

Building a Search Web App with Dancer and Sphinx

Friday, December 14th, 2012

In this article, we’ll develop a basic search application using Dancer and Sphinx. Sphinx is an open source search engine that’s fairly easy to use, but powerful enough to be deployed in high-traffic sites, such as Craigslist and Dailymotion.

In keeping with this year’s Dancer Advent Calendar trend, the example app will be built on Dancer 2, but it should work just as well with Dancer 1.

Alright, let’s get to work.

The Data

Our web application will be used to search through documents stored in a MySQL database. We’ll use a simple table with the following structure:

CREATE TABLE documents (
    id int NOT NULL AUTO_INCREMENT,
    title varchar(200) NOT NULL,
    contents_text text NOT NULL,
    contents_html text NOT NULL,
    PRIMARY KEY (id)
);

Each document has an unique ID, a title, and contents, stored as both plain text and as HTML. We need the two formats for different purposes — HTML will be used to display the document in the browser, while plain text will be fed to the indexing mechanism of the search engine (because we do not want to index the HTML tags, obviously).

We can populate the database with any kind of document data — for my test version, I used a simple script to fill the database with POD documentation extracted from Dancer distribution. The script is included at the end of this article, in case you’d like to use it yourself.

Installation and Configuration of Sphinx

Sphinx can be installed pretty easily, using one of the pre-compiled .rpm or .deb packages, or the source tarball. These are available at the download page at SphinxSearch.com — grab the one that suits you and follow the installation instructions.

When Sphinx is installed, it needs to be configured before we can play with it. Its main configuration file is usually located at /etc/sphinx/sphinx.conf. For our purposes, a very basic setup will do — we’ll put the following in the sphinx.conf file:

source documents
{
    type        = mysql
    sql_host    = localhost
    sql_user    = user
    sql_pass    = hunter1
    sql_db      = docs

    sql_query   = \
        SELECT id, title, contents_text FROM documents
}

index test
{
    source          = documents
    charset_type    = utf-8
    path            = /usr/local/sphinx/data/test
}

This defines one source, which is what Sphinx uses to gather data, and one index, which will be created by processing the collected data and will then be queried when we perform the searches. In our case, the source is the documents database that we just created. Thesql_query directive defines the SELECT query that Sphinx will use to pull the data, and it includes all the fields from the documents table, except contents_html — like we said, HTML is not supposed to be indexed.

That’s all that we need to start using Sphinx. After we make sure the searchd daemon is running, we can proceed with indexing the data. We call indexer with the name of the index:

$ indexer test

It should spit out some information about the indexing operation, and when it’s done, we can do our first search:

$ search "plugin"

index 'test': query 'plugin ': returned 8 matches of 8 total in 0.002 sec

displaying matches:
1. document=19, weight=2713
2. document=44, weight=2694
3. document=20, weight=1713
4. document=2, weight=1672
5. document=1, weight=1640
6. document=13, weight=1640
7. document=27, weight=1601
8. document=28, weight=1601

Apparently, there are 8 documents in the Dancer documentation with the word plugin, and the one with the ID of 19 is the highest ranking result. Let’s see which document that is:

mysql> SELECT title FROM documents WHERE id = 19;
+----------------------------------------------------+
| title                                              |
+----------------------------------------------------+
| Dancer::Plugin - helper for writing Dancer plugins |
+----------------------------------------------------+

It’s the documentation for Dancer::Plugin, and it makes total sense that this is the first result for the word plugin. Sphinx setup is thus ready and we can get to the web application part of our little project.

(more…)

  • Archives

  • Categories

  • Meta

  • Latest Tweets


    Warning: Illegal string offset 'last_access' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 334

    Warning: Illegal string offset 'time_limit' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 334

    Warning: Illegal string offset 'last_access' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 336

    Warning: Illegal string offset 'twitter_api' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 234

    Warning: Illegal string offset 'user_token' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 262

    Warning: Illegal string offset 'user_secret' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 263

    Warning: Illegal string offset 'consumer_key' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 264

    Warning: Illegal string offset 'consumer_secret' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 265

    Warning: Illegal string offset 'twitter_username' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 270

    Warning: Illegal string offset 'show_retweets' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 272

    Warning: Illegal string offset 'exclude_replies' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 275

    Warning: Illegal string offset 'twitter_data' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 282

    Warning: Illegal string offset 'twitter_data' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 350

    Warning: Illegal string offset 'twitter_data' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 351
    Twitter outputted an error:
    .
    Warning: Illegal string offset 'time_format' in /usr/local/www/odyniec.net/public/blog/wp-content/plugins/twitget/twitget.php on line 484
  • Follow odyniec on Twitter