Archive for May, 2008

Favorite Things

Snowboarding, rally driving and Primus. Too good!

HaskellDB Performance

Things did not go well with HaskellDB when I ran my query across a large data set. MySQL as a query optimizer, sucked. Postgres seems to be doing better.

I started with a mysql database containing 11 years of day data from the ASX. This includes the closing price (and some other info) of all stocks traded on the Australian Stock Exchange (ASX) since 1997.

The tables are organized with unique keys and, based on some testing, include appropriate indexes.Things still ran slowly.

I tried tuning mysql. Then, based on the slow queries logged, tried running them by hand to see if I was missing any indexes. No such luck. The query just didn’t perform.

I simplified the query and ran it again (see below). Now it ran fast, but doesn’t help with the generated query. Looks like MySQL’s optimizer doesn’t optimize enough.

MySQL timing:

  • Query 1: 6.25s (anywhere from 24s - 6s)
  • Query 2: 0.15s

Next step. Install Postgres.

Postgres timing:

  • Query 1: 0.03s
  • Query 2: 0.03s

To avoid the cache impacting the results, both servers were stopped and re-started between each query run. With caching, Postgres gets down to 0.2s. MySQL will still have 6s/0.3s. Tests were run on a MacBook with 2Gb of memory.

Both queries were run with the command line tool of the respective database engines, mysql and psql. The end_of_day table has 4,749,025 records and the stock table has 25,863 records.

Query 1 – HaskellDB generated


SELECT closing_price2 as closing_price,
       trade_date2 as trade_date
FROM (SELECT stock_id as stock_id2,
             trade_date as trade_date2,
             closing_price as closing_price2
      FROM end_of_day as T1) as T1,
     (SELECT stock_id as stock_id1,
             ticker as ticker1
      FROM stock as T1) as T2
WHERE stock_id1 = stock_id2
  AND ticker1 IN ('FXJ','NWS','WAN')
  AND trade_date2 = '2008-02-14 00:00:00';

Query 2 – Hand written


SELECT e.closing_price as closing_price,
       e.trade_date as trade_date
FROM
    stock s, end_of_day e
WHERE
    s.stock_id = e.stock_id
    AND s.ticker IN ('FXJ','NWS','WAN')
    AND e.trade_date = '2008-02-14 00:00:00';

So don’t use HaskellDB and MySQL if you are doing joins across large tables. Use Postgres instead.

The Haskell query:


indexStock db tkrs stockDate = do
  let q = do
      s < - table S.stock
      e <- table E.end_of_day
      restrict (s!S.stock_id .==. e!E.stock_id .&&.
              s!S.ticker `_in` tickers .&&.
              e!E.trade_date .==. constant stockDate)
      r <- project (closing_price << e!E.closing_price #
                    trade_date << e!E.trade_date)
      return r
      where
        tickers = map (constant) tkrs
  rs <- query db q
  return $ sum $ map (\r -> read (r!closing_price)) rs

Rocks have their place

One day, a Tibetan Lama was speaking to a group of monks and, to make a point, pulled out a large jar, set it on the table in front of him, produced a few fist-sized rocks, and placed them into the jar.

When no more rocks would fit inside, he asked, ‘Is this jar full?’. Everyone said, ‘Yes’. He reached under the table and pulled out a bucket of gravel, poured some in and shook the jar, the gravel worked between the rocks.

Again he asked: ‘Is this jar full?’ The monks were catching on. ‘Probably not,’ one answered. ‘Good!’ he replied and reached under the table and brought out a bucket of sand. He dumped the sand into the jar until it filled all the crevices.

Once more he asked: ‘Is this jar full?’ ‘No!’ the monks shouted. ‘Good!’ He said and grabbed a pitcher of water and poured until the jar was filled to the brim. Then asked, ‘What is the point of this illustration?’

One young monk responded, ‘The point is, no matter how full your day you can always fit some more things in.’ ‘No,’ the speaker replied, ‘the point is that if you don’t put the big rocks in first, you’ll never get them all in at all. What are the priorities in your life?’

– from the art of looking sideways, p141.

Bonus quote …

Computers are useless. They can only give you answers.

Pablo Picasso

HaskellDB Basics

HaskellDB is a domain specific embedded language for specifying database queries and operations.

The upside of this is the full Haskell language for writing queries and the Haskell compiler and type checker for validating database code. The (big) downside is the error messages are horribly cryptic; although once is compiles it is likely to be correct.

The original version of HaskellDB was re-written to be more portable and usable in more situations. Unfortunately, the documentation is out of step with the current code.
Installation is particularly tricky.

The following is an example query:


stockInfo :: Database -> String
             -> IO [(CompanyName, TickerSymbol)]
stockInfo db val = do
  let q = do
      s < - table stock
      restrict (fromNull (constant "") (s!company_name)
                `like` constant ("%"++val++"%"))
      r <- project (company_name << s!company_name #
                    ticker << s!ticker)
      return r
  rs <- query db q
  return $ map (\r ->
    (maybe "" id (r!company_name), r!ticker)) rs

To run the query, a database connection detail is passed, along with the search string. In this case, find all company names and stock tickers where the company name contains ‘news’.


vals :: IO [(CompanyName, TickerSymbol)]
vals = withDB $ \db -> stockInfo db "news"

The result is:


[("APN NEWS & MEDIA LIMITED","APN"),
 ("WEST AUSTRALIAN NEWSPAPERS HOLDINGS LIMITED","WAN"),
 ("NEWS CORPORATION","NWS"),
 ("NEWSAT LIMITED","NWT")]

The database connection can be abstracted into a separate module:


module StockConnection where

import Database.HaskellDB
import Database.HaskellDB.HSQL.MySQL

opts = MySQLOptions {
        server="localhost",
        db="stocks",
        uid="stock_user",
        pwd="stock_pass"}

withDB :: (Database -> IO a) -> IO a
withDB = mysqlConnect opts

If you want more information, try:

This approach to database programming is included in Microsoft’s .NET platform, called LINQ. Eric Meijer is the link between the two.

Innovation

From the delightful Jessica Hagy:

Where would you rather be?

Programming Fonts Revisited

Fixed width fonts don’t play nice with non-Western character sets.

Recently was editing some code in Vim that included strings in Vietnamese. Ignoring the bit where code and strings shouldn’t be in the same file, the font should at least display properly.

Not good.

Bitstream Vera Sans Mono, my default fixed width font for everything, did not work.
Playing around a bit, I found Courier New worked perfectly for English and Vietnamese.

Not wanting to have to swap fonts based on language, I tried out the other language I use a bit for work, Mandarin Chinese.

Nope.

So I’m now in the unfortunate situation of needing to pick a font for my programming editor based on the language of the strings mixed in with the code. Not cool.

My current fonts (on Windows) now look like:

  • Bitstream Vera Sans Mono – English
  • Courier New – English + Vietnamese
  • NSimSun – English + Mandarin

Any suggestions on a good fixed width font with characters for other languages?