Archive for Haskell

HaskellDB Performance

Things did not go well with HaskellDB when I ran my query across a large data set. MySQL as a query optimizer, sucked. Postgres seems to be doing better.

I started with a mysql database containing 11 years of day data from the ASX. This includes the closing price (and some other info) of all stocks traded on the Australian Stock Exchange (ASX) since 1997.

The tables are organized with unique keys and, based on some testing, include appropriate indexes.Things still ran slowly.

I tried tuning mysql. Then, based on the slow queries logged, tried running them by hand to see if I was missing any indexes. No such luck. The query just didn’t perform.

I simplified the query and ran it again (see below). Now it ran fast, but doesn’t help with the generated query. Looks like MySQL’s optimizer doesn’t optimize enough.

MySQL timing:

  • Query 1: 6.25s (anywhere from 24s - 6s)
  • Query 2: 0.15s

Next step. Install Postgres.

Postgres timing:

  • Query 1: 0.03s
  • Query 2: 0.03s

To avoid the cache impacting the results, both servers were stopped and re-started between each query run. With caching, Postgres gets down to 0.2s. MySQL will still have 6s/0.3s. Tests were run on a MacBook with 2Gb of memory.

Both queries were run with the command line tool of the respective database engines, mysql and psql. The end_of_day table has 4,749,025 records and the stock table has 25,863 records.

Query 1 – HaskellDB generated


SELECT closing_price2 as closing_price,
       trade_date2 as trade_date
FROM (SELECT stock_id as stock_id2,
             trade_date as trade_date2,
             closing_price as closing_price2
      FROM end_of_day as T1) as T1,
     (SELECT stock_id as stock_id1,
             ticker as ticker1
      FROM stock as T1) as T2
WHERE stock_id1 = stock_id2
  AND ticker1 IN ('FXJ','NWS','WAN')
  AND trade_date2 = '2008-02-14 00:00:00';

Query 2 – Hand written


SELECT e.closing_price as closing_price,
       e.trade_date as trade_date
FROM
    stock s, end_of_day e
WHERE
    s.stock_id = e.stock_id
    AND s.ticker IN ('FXJ','NWS','WAN')
    AND e.trade_date = '2008-02-14 00:00:00';

So don’t use HaskellDB and MySQL if you are doing joins across large tables. Use Postgres instead.

The Haskell query:


indexStock db tkrs stockDate = do
  let q = do
      s < - table S.stock
      e <- table E.end_of_day
      restrict (s!S.stock_id .==. e!E.stock_id .&&.
              s!S.ticker `_in` tickers .&&.
              e!E.trade_date .==. constant stockDate)
      r <- project (closing_price << e!E.closing_price #
                    trade_date << e!E.trade_date)
      return r
      where
        tickers = map (constant) tkrs
  rs <- query db q
  return $ sum $ map (\r -> read (r!closing_price)) rs

HaskellDB Basics

HaskellDB is a domain specific embedded language for specifying database queries and operations.

The upside of this is the full Haskell language for writing queries and the Haskell compiler and type checker for validating database code. The (big) downside is the error messages are horribly cryptic; although once is compiles it is likely to be correct.

The original version of HaskellDB was re-written to be more portable and usable in more situations. Unfortunately, the documentation is out of step with the current code.
Installation is particularly tricky.

The following is an example query:


stockInfo :: Database -> String
             -> IO [(CompanyName, TickerSymbol)]
stockInfo db val = do
  let q = do
      s < - table stock
      restrict (fromNull (constant "") (s!company_name)
                `like` constant ("%"++val++"%"))
      r <- project (company_name << s!company_name #
                    ticker << s!ticker)
      return r
  rs <- query db q
  return $ map (\r ->
    (maybe "" id (r!company_name), r!ticker)) rs

To run the query, a database connection detail is passed, along with the search string. In this case, find all company names and stock tickers where the company name contains ‘news’.


vals :: IO [(CompanyName, TickerSymbol)]
vals = withDB $ \db -> stockInfo db "news"

The result is:


[("APN NEWS & MEDIA LIMITED","APN"),
 ("WEST AUSTRALIAN NEWSPAPERS HOLDINGS LIMITED","WAN"),
 ("NEWS CORPORATION","NWS"),
 ("NEWSAT LIMITED","NWT")]

The database connection can be abstracted into a separate module:


module StockConnection where

import Database.HaskellDB
import Database.HaskellDB.HSQL.MySQL

opts = MySQLOptions {
        server="localhost",
        db="stocks",
        uid="stock_user",
        pwd="stock_pass"}

withDB :: (Database -> IO a) -> IO a
withDB = mysqlConnect opts

If you want more information, try:

This approach to database programming is included in Microsoft’s .NET platform, called LINQ. Eric Meijer is the link between the two.

Genetic Algorithms in Haskell

I started with the idea of playing around with Genetic Algorithms (GAs) in Haskell.The links I found were to two research papers, not two implementations. Each research paper takes a different approach to developing a GA library, and includes some code as an example.

The two approaches differ mainly in the representation of the genome for the GA. The trade-offs are outlined below.

An Array of Bits

The genome is represented by a bit array :: (Fitness, [Bool]). This is the approach taken by the authors of “A Genetic Algorithm Framework Using Haskell”.

Advantages

  • easy expression of GA algorithms

Disadvantages

  • challenging to encode/decode problem space to a bit array :: [Bool]

My memory of this from Uni is that we spent lots of time simply constructing an appropriate representation in a bit array (in C++), and then building useful fitness functions. The GA framework then did all the heavy lifting as a black box.

I’d really hope there was a more elegant way of representing the Genome using general types. In nature, the genome is essentially a giant array of 2 bit values. GATTACA. However, it takes the entire planet to calculate the fitness function.

Type Classes

The genome is represented by a generic type. The basic operators are then expressed using Haskell type classes. This is the approach taken by the authors of “Genetic algorithms in Haskell with polytypic programming

Advantages

  • logical mapping of problem to genome

Disadvantages

  • requires polytypic language extensions to generalise algorithms
  • polytypic extension requires additional compilation step
  • current polytypic extension limits haskell features used

The polytypic solution is quite elegant in allowing functions such as mutate to be written once and apply to many types. That said, it does introduce some restrictions.

The challenge is to abstract the combinators of Genomes without requiring polytypic extensions. Otherwise, consumers of the library will need to write appropriate library functions specific to their types. This may be more time consuming than mapping the problem space to an array of bits.

My theory is that a well written GA library should be possible that caters to the majority cases for data structures without needing to resort to polytypic extensions. This should allow the consumer of the library to utilize the full power of Haskell without spending time writing too much GA specific code.

The History of Haskell

The history of Haskell has recently been published. It is a rather long read (55 pages), but a rewarding one. It covers how Haskell formed, decisions around why language constructs were introduced and surveys where it is at today.

Haskell was the second functional programming language I learnt. Miranda, my first, was the language used for first year programming at Melbourne Uni. Reading back on how Haskell split and evolved from Miranda filling in a lot of gaps for me.

There is also discussion on how Haskell has matured, stats on community growth, and case studies of commercial use. There is even a glimpse of the future:

One day, Haskell will be no more than a distant memory. But we believe that, when that day comes, the ideas and techniques that it nurtured will prove to have been of enduring value through their influence on languages of the future.

Haskell Quotes

Fear leads to uncertainty. Uncertainty leads to doubt. Doubt leads to theorem proving.

- monochrom in the Haskell Weekly News.

The Haskell Weekly News is my main source of new Haskell information, but the bit I love about it is the quotes section. Comedic gold, if you spend time thinking about type systems.

Remember, kids: if you program in a language with side effects, the terrorists win.

- Tommah in the Haskell Weekly News.