pseudofish

powered by text files

haskell

HaskellDB Performance

Things did not go well with HaskellDB when I ran my query across a large data set. MySQL as a query optimizer, sucked. Postgres seems to be doing better.

I started with a mysql database containing 11 years of day data from the ASX. This includes the closing price (and some other info) of all stocks traded on the Australian Stock Exchange (ASX) since 1997.

The tables are organized with unique keys and, based on some testing, include appropriate indexes.Things still ran slowly.

I tried tuning mysql. Then, based on the slow queries logged, tried running them by hand to see if I was missing any indexes. No such luck. The query just didn’t perform.

I simplified the query and ran it again (see below). Now it ran fast, but doesn’t help with the generated query. Looks like MySQL’s optimizer doesn’t optimize enough.

MySQL timing:

Next step. Install Postgres.

Postgres timing:

To avoid the cache impacting the results, both servers were stopped and re-started between each query run. With caching, Postgres gets down to 0.2s. MySQL will still have 6s/0.3s. Tests were run on a MacBook with 2Gb of memory.

Both queries were run with the command line tool of the respective database engines, mysql and psql. The end_of_day table has 4,749,025 records and the stock table has 25,863 records.

Query 1 – HaskellDB generated

SELECT closing_price2 as closing_price,
       trade_date2 as trade_date
FROM (SELECT stock_id as stock_id2,
             trade_date as trade_date2,
             closing_price as closing_price2
      FROM end_of_day as T1) as T1,
     (SELECT stock_id as stock_id1,
             ticker as ticker1
      FROM stock as T1) as T2
WHERE stock_id1 = stock_id2
  AND ticker1 IN ('FXJ','NWS','WAN')
  AND trade_date2 = '2008-02-14 00:00:00';

Query 2 – Hand written

SELECT e.closing_price as closing_price,
       e.trade_date as trade_date
FROM
    stock s, end_of_day e
WHERE
    s.stock_id = e.stock_id
    AND s.ticker IN ('FXJ','NWS','WAN')
    AND e.trade_date = '2008-02-14 00:00:00';

So don’t use HaskellDB and MySQL if you are doing joins across large tables. Use Postgres instead.

The Haskell query:

indexStock db tkrs stockDate = do
  let q = do
      s < - table S.stock
      e <- table E.end_of_day
      restrict (s!S.stock_id .==. e!E.stock_id .&&.
                s!S.ticker `_in` tickers .&&.
                e!E.trade_date .==. constant stockDate)
      r <- project (closing_price << e!E.closing_price #
                    trade_date << e!E.trade_date)
      return r
      where
        tickers = map (constant) tkrs
rs <- query db q
return $ sum $ map (\\r -> read (r!closing_price)) rs