Favorite Things
Snowboarding, rally driving and Primus. Too good!
Things did not go well with HaskellDB when I ran my query across a large data set. MySQL as a query optimizer, sucked. Postgres seems to be doing better.
I started with a mysql database containing 11 years of day data from the ASX. This includes the closing price (and some other info) of all stocks traded on the Australian Stock Exchange (ASX) since 1997.
The tables are organized with unique keys and, based on some testing, include appropriate indexes.Things still ran slowly.
I tried tuning mysql. Then, based on the slow queries logged, tried running them by hand to see if I was missing any indexes. No such luck. The query just didn’t perform.
I simplified the query and ran it again (see below). Now it ran fast, but doesn’t help with the generated query. Looks like MySQL’s optimizer doesn’t optimize enough.
MySQL timing:
Next step. Install Postgres.
Postgres timing:
To avoid the cache impacting the results, both servers were stopped and re-started between each query run. With caching, Postgres gets down to 0.2s. MySQL will still have 6s/0.3s. Tests were run on a MacBook with 2Gb of memory.
Both queries were run with the command line tool of the respective database engines, mysql and psql. The end_of_day table has 4,749,025 records and the stock table has 25,863 records.
Query 1 – HaskellDB generated
SELECT closing_price2 as closing_price,
trade_date2 as trade_date
FROM (SELECT stock_id as stock_id2,
trade_date as trade_date2,
closing_price as closing_price2
FROM end_of_day as T1) as T1,
(SELECT stock_id as stock_id1,
ticker as ticker1
FROM stock as T1) as T2
WHERE stock_id1 = stock_id2
AND ticker1 IN ('FXJ','NWS','WAN')
AND trade_date2 = '2008-02-14 00:00:00';
Query 2 – Hand written
SELECT e.closing_price as closing_price,
e.trade_date as trade_date
FROM
stock s, end_of_day e
WHERE
s.stock_id = e.stock_id
AND s.ticker IN ('FXJ','NWS','WAN')
AND e.trade_date = '2008-02-14 00:00:00';
So don’t use HaskellDB and MySQL if you are doing joins across large tables. Use Postgres instead.
The Haskell query:
indexStock db tkrs stockDate = do
let q = do
s < - table S.stock
e <- table E.end_of_day
restrict (s!S.stock_id .==. e!E.stock_id .&&.
s!S.ticker `_in` tickers .&&.
e!E.trade_date .==. constant stockDate)
r <- project (closing_price << e!E.closing_price #
trade_date << e!E.trade_date)
return r
where
tickers = map (constant) tkrs
rs <- query db q
return $ sum $ map (\r -> read (r!closing_price)) rs
One day, a Tibetan Lama was speaking to a group of monks and, to make a point, pulled out a large jar, set it on the table in front of him, produced a few fist-sized rocks, and placed them into the jar.
When no more rocks would fit inside, he asked, ‘Is this jar full?’. Everyone said, ‘Yes’. He reached under the table and pulled out a bucket of gravel, poured some in and shook the jar, the gravel worked between the rocks.
Again he asked: ‘Is this jar full?’ The monks were catching on. ‘Probably not,’ one answered. ‘Good!’ he replied and reached under the table and brought out a bucket of sand. He dumped the sand into the jar until it filled all the crevices.
Once more he asked: ‘Is this jar full?’ ‘No!’ the monks shouted. ‘Good!’ He said and grabbed a pitcher of water and poured until the jar was filled to the brim. Then asked, ‘What is the point of this illustration?’
One young monk responded, ‘The point is, no matter how full your day you can always fit some more things in.’ ‘No,’ the speaker replied, ‘the point is that if you don’t put the big rocks in first, you’ll never get them all in at all. What are the priorities in your life?’
– from the art of looking sideways, p141.
Bonus quote …
Computers are useless. They can only give you answers.
HaskellDB is a domain specific embedded language for specifying database queries and operations.
The upside of this is the full Haskell language for writing queries and the Haskell compiler and type checker for validating database code. The (big) downside is the error messages are horribly cryptic; although once is compiles it is likely to be correct.
The original version of HaskellDB was re-written to be more portable and usable in more situations. Unfortunately, the documentation is out of step with the current code.
Installation is particularly tricky.
The following is an example query:
stockInfo :: Database -> String
-> IO [(CompanyName, TickerSymbol)]
stockInfo db val = do
let q = do
s < - table stock
restrict (fromNull (constant "") (s!company_name)
`like` constant ("%"++val++"%"))
r <- project (company_name << s!company_name #
ticker << s!ticker)
return r
rs <- query db q
return $ map (\r ->
(maybe "" id (r!company_name), r!ticker)) rs
To run the query, a database connection detail is passed, along with the search string. In this case, find all company names and stock tickers where the company name contains ‘news’.
vals :: IO [(CompanyName, TickerSymbol)]
vals = withDB $ \db -> stockInfo db "news"
The result is:
[("APN NEWS & MEDIA LIMITED","APN"),
("WEST AUSTRALIAN NEWSPAPERS HOLDINGS LIMITED","WAN"),
("NEWS CORPORATION","NWS"),
("NEWSAT LIMITED","NWT")]
The database connection can be abstracted into a separate module:
module StockConnection where
import Database.HaskellDB
import Database.HaskellDB.HSQL.MySQL
opts = MySQLOptions {
server="localhost",
db="stocks",
uid="stock_user",
pwd="stock_pass"}
withDB :: (Database -> IO a) -> IO a
withDB = mysqlConnect opts
If you want more information, try:
This approach to database programming is included in Microsoft’s .NET platform, called LINQ. Eric Meijer is the link between the two.