Calling Mahout from Clojure

Mahout is a set of libraries for running machine learning processes, such as recommendation, clustering and categorisation.

The libraries work against an abstract data model, which can be backed by anything from a file to a full Hadoop cluster. This means you can start by playing around with small data sets in files or a local database, and later move to a Hadoop cluster or a custom data store.

After a bit of research, Mahout turned out not to be too complex to call from any JVM language. When you compile and install Mahout, the libraries are installed into your local Maven cache. This makes it very easy to include them in any JVM project.

To help work through the various features, I’m reading the early access edition of Mahout in Action. I am trying out the examples in Clojure as I read through.

To get started, the following steps will set up a Clojure project to work with Mahout:

  1. Build and install Mahout

    The process installs the Mahout jars into your local Maven repository, making them accessible to lein.

  2. Create a new project

    lein new mahoutexample

  3. Add dependencies to project.clj.

    For example:

    :dependencies [
      [org.clojure/clojure "1.2.1"]
      [org.apache.mahout/mahout-core "0.5"]
      [org.apache.mahout/mahout-math "0.5"]
      [org.apache.mahout/mahout-utils "0.5"]]

    Then load the dependencies into your project:

    lein deps

  4. Add your code

    For example, ch02.clj shows calling a recommendation engine.

    The time-consuming part is adding the right import statements.

  5. Evaluate from the REPL

    This is where it gets fun. Clojure makes it easy to try out different data sets and learning algorithms to rapidly iterate.

  6. [Optional] Add a run task

    In project.clj, you can assign a class as the main class. Lein will attempt to find a function called “-main” and call that.

    lein run
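
Steps 4 to 6 can be sketched roughly as follows. This is a minimal sketch, not the ch02.clj from the repository. It assumes a comma-separated preference file of userID,itemID,rating lines at the hypothetical path intro.csv, and it uses the user-based recommender classes from Mahout's Taste API. The function name, the neighbourhood size of 2 and the argument handling in -main are assumptions for illustration.

```clojure
(ns mahoutexample.core
  (:import (java.io File)
           (org.apache.mahout.cf.taste.impl.model.file FileDataModel)
           (org.apache.mahout.cf.taste.impl.similarity PearsonCorrelationSimilarity)
           (org.apache.mahout.cf.taste.impl.neighborhood NearestNUserNeighborhood)
           (org.apache.mahout.cf.taste.impl.recommender GenericUserBasedRecommender)))

(defn recommend
  "Returns [item-id estimated-preference] pairs for user-id,
  using the preference data in the file at path."
  [path user-id how-many]
  (let [model        (FileDataModel. (File. path))
        similarity   (PearsonCorrelationSimilarity. model)
        ;; neighbourhood size of 2 is an arbitrary choice for a tiny data set
        neighborhood (NearestNUserNeighborhood. 2 similarity model)
        recommender  (GenericUserBasedRecommender. model neighborhood similarity)]
    ;; doall forces the lazy seq so the work happens inside this call
    (doall
      (for [item (.recommend recommender (long user-id) (int how-many))]
        [(.getItemID item) (.getValue item)]))))

(defn -main
  "Entry point for lein run. The default file name is hypothetical."
  [& args]
  (doseq [[item-id score] (recommend (or (first args) "intro.csv") 1 1)]
    (println item-id score)))
```

From the REPL, calling (recommend "intro.csv" 1 1) returns the pairs directly, which makes it easy to swap in other similarity or neighbourhood classes. For the run task, adding :main mahoutexample.core to project.clj should be enough for lein run to find -main.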

As an example, I’ve ported the first chapter of the book to Clojure and it is available on GitHub.

Some other articles on using parts of Mahout with Clojure can be found on the opus artificem probat blog.

Book Reviews

I’ve read a few books this year, thanks to Kindle for iPhone & Mac as well as a few actual printed books either found or borrowed from work’s extensive library (one of my favourite parts of our Stockholm office).

These are my highlights.

Paradigms of AI Programming

After re-discovering Lisp style programming with Clojure, this book came high on my to-read list and I was not disappointed.

The book covers a wide range of classic AI problems, and walks through solutions in Common Lisp. This makes the book partly about solving AI problems, and partly an advanced book on Lisp programming.

I loved it for both aspects.

Be warned, however: Peter Norvig is a master of his tools, and when reading along you can be seduced into thinking each next step is obvious. It is worth trying to implement some of these algorithms yourself, or even trying to solve Sudoku.

This book won’t make you an expert on the latest AI techniques, but it will add to the class of problems you can solve, and make you a better programmer.

Into the Plex

A great insight into the history and workings of Google. There are lots of examples of how Google goes about solving problems and lots of behind the scenes information.

One section I found interesting was on staff allocation:

Around 2005, Google determined a simple formula to distribute its engineering talent: 70-20-10. Seventy percent of its engineers would work in either search or ads. Twenty percent would focus on key products such as applications. The remaining 10 percent would work on wild cards, which often emerged from the 20 percent time where people could choose their own projects.

Animal Spirits

After finishing Robert Shiller’s course on finance via iTunes University, I picked up his book Animal Spirits.

This is an approachable book that doesn’t assume too much finance background, and covers how behavioural finance fits into a macroeconomic framework. Or rather, how human emotions mess up existing approaches to making decisions for the broad economy.

If you’re interested in how government should be setting policy, or how human psychology impacts economics, this is well worth a read.

Toyota Production System

The Toyota Way

I started by reading Toyota Production System, as the Toyota Way was already loaned out. This turned out to be fortuitous.

The Toyota Production System is written by Taiichi Ohno, who was responsible for putting in place most of the practices that define their production system. The book is translated from the Japanese, and reads like the writings of a Zen master. Everything is very clear and concise with no superfluous words or explanations.

I found it enlightening. However, it is worth pausing often throughout the book to ponder the implications of what has been said, and what it assumes. This is a book to read to understand the fundamentals and the attitude required to implement a similar approach in your business.

The Toyota Way is an easier read, and based on the time Jeffrey Liker has spent studying Toyota. He had access to a range of plants in the US and interviewed many of the key Toyota people. His book is written as a series of stories that act as a set of case studies for how the Toyota Way was applied to a range of engineering challenges.

I particularly loved the chapters explaining how Toyota created the Lexus brand and the Prius.

To get a good understanding of how Toyota works, it is worth reading both books. Having previously read David Anderson’s book, I now feel much clearer on how Kanban fits into a wider corporate picture.

The Time Traveller’s Wife

This was left by a previous occupant of the apartment we rented for the summer in Paris. It was a great holiday read.

Suspend disbelief for a while, and just let the story flow. The characters are well thought out and carry the story well.

Prolog in Clojure

After reading Peter Norvig’s classic, Paradigms of AI Programming, I decided to try porting his implementation of Prolog in Lisp to Clojure.

The initial results are available on GitHub.

The interpreter can solve simple logic problems:

(<- (witch ?x) (burns ?x) (female ?x))
(<- (burns ?x) (wooden ?x))
(<- (wooden ?x) (floats ?x))
(<- (floats ?x) (sameweight Duck ?x))

(<- (female Girl))
(<- (sameweight Duck Girl))

(?- (witch Girl))

Porting this turned out to be an interesting challenge. There were a few Lisp functions that weren’t available in Clojure, along with some subtle changes in idiom.

This gave me an opportunity to dive further into Clojure and think about how to implement the algorithms and data structures in a way that made sense, yet was still close to the original.
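
The step that lets a query such as (?- (witch Girl)) match the head of a clause such as (witch ?x) is unification. As a rough illustration only, and not the code in the repository, a minimal unifier in the style of the book might look like the sketch below. It assumes variables are symbols beginning with ?, keeps bindings in a plain map, and leaves out the occurs check. The function names are illustrative.

```clojure
(defn variable?
  "True for symbols such as ?x, following the convention used above."
  [x]
  (and (symbol? x) (.startsWith (name x) "?")))

(declare unify)

(defn- unify-variable
  "Binds variable v to x, following any existing bindings first."
  [v x bindings]
  (cond
    (contains? bindings v) (unify (bindings v) x bindings)
    (and (variable? x) (contains? bindings x)) (unify v (bindings x) bindings)
    :else (assoc bindings v x)))

(defn unify
  "Returns a map of bindings that makes x and y equal, or nil on failure."
  ([x y] (unify x y {}))
  ([x y bindings]
   (cond
     (nil? bindings) nil                      ; an earlier step already failed
     (= x y) bindings                         ; identical terms need no new bindings
     (variable? x) (unify-variable x y bindings)
     (variable? y) (unify-variable y x bindings)
     (and (sequential? x) (sequential? y) (seq x) (seq y))
     (unify (rest x) (rest y)                 ; unify the heads, then the tails
            (unify (first x) (first y) bindings))
     :else nil)))

(unify '(witch ?x) '(witch Girl))   ;=> {?x Girl}
(unify '(female ?x) '(male Girl))   ;=> nil
```

Using an immutable map for the bindings means a failed branch can simply be dropped, with no undo step needed when the interpreter backtracks.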

There is still much to do, as the book further develops the interpreter with user interaction and then refactors it into a compiler. Should keep me busy for a while longer!

Note: Please don’t consider this production code. If you want to look deeper at logic programming, consider either a real Prolog implementation, the core.logic library in Clojure or check out Mercury.

Stockholm Tourist Sights

This is my current list of things to do and see if you happen to be visiting Stockholm for a weekend or a bit longer. The list is based on things we’ve seen, or that friends have found.

To get up-to-date tips, contact me directly or via Twitter (@gmwils).

Fjäderholmarna, Stockholm Archipelago

Sights

  • Vasa Museum — an old ship that has been preserved due to failure to launch. Interesting museum on maritime history. There are sometimes queues to get in, so may be worth a morning visit.
  • Nordic Museum — provides a history of the Nordic region, and is just near the Vasa Museum.
  • Skansen — an open air museum / zoo / cultural centre. This whole park is quite large, so it is worth spending a full day, as there is lots to do. A good chance to see reindeer and moose in Stockholm.
  • Fotografiska — a photography museum / gallery on Södermalm. Three floors of photography, with a nice café with a view of Stockholm on the top floor.
  • Modern Art Museum — meant to be quite good, but still on my todo list.
  • Gamla Stan - well worth half a day wandering around: see the palace, explore the streets. Almost everything is overpriced, but then it is the main tourist district. It also has some of the better restaurants in town.
  • Södermalm — one of the nicer areas of Stockholm. The shore on the south of the island is a popular place for hanging out. There are lots of nice restaurants near Slussen on the northern side too. The cliffs above Slussen provide a good vantage point to see Gamla Stan (old town) and the city in general.
  • Östermalm — a nice area in between the city centre and Djurgården island (where the Vasa & Skansen are). Lots of nice cafés to pass the day in.
  • Boat trips — there are ferries that shuttle between the various islands and make a nice way to get from Skeppsholmen to Djurgården.
  • Stockholm Archipelago — this city is all about islands, and there are a number of boat cruises that leave for the wider Archipelago. We visited Fjäderholmarna, which is small and close and made for a nice place to wander around. The article has some info on boats; they also depart from near Östermalm.

Restaurants

  • Rival Hotel — the restaurant here is worth a visit, a short walk from Mariatorget station
  • Gondolen — quite a cool location for a restaurant, suspended underneath a bridge near Slussen station. We went for drinks at their bar, which gives a good view of Stockholm. If you want a seat at the restaurant, it is worth booking ahead.
  • Vete-Katten — downtown is mostly shops; however, if you’re wandering around and looking for somewhere for lunch, Vete-Katten has a great range of traditional Swedish cakes.
  • Den Gyldene Freden — modern take on traditional Swedish food. We found the service to be friendly.
  • Sturehof — interesting restaurant in Östermalm.
  • Pelikan — Swedish home style food.
  • Oliver Twist (House of Ale) — cozy pub in the midst of Södermalm with a good range of local beers and hearty pub food. Bookings recommended, although we’ve been lucky before.
  • Djuret — means “the animal” in Swedish, and they only serve one animal at a time. Interesting concept, and has been recommended to me.

For suggestions of coffee places on Södermalm, check out this article.

Transport

To get around, it is worth getting tickets for the underground train line (Tunnelbana, marked by large signs with a T). It runs pretty much everywhere and will make it easier to see the city.

To transfer from the airport, a taxi works out cost-effective if there is more than one person travelling. Otherwise, the train system is fast and easy to navigate.

Swedish News

For news in English about what is happening in Sweden, try The Local.

For news in Swedish, try DN.se, and for Swedish restaurant reviews try På Stan (with Google Translate to help).

Language Basics

Almost everyone here speaks perfect English as their second language, so knowing some Swedish isn’t important. That said, a few pleasantries always help.

These few phrases will get you far:

  • Hej (pron. “Hey!”) — hello, or goodbye, general greeting.
  • Hej då (pron. “Hey door”) — goodbye.
  • Tack — thanks, and please. Pointing at something and saying tack, or saying tack after receiving something is common. You’ll hear this word a lot.
  • Tack så mycket (pron. “Tack so mickey”) — thank you very much, often used at the end of transactions.

CS229 Machine Learning at Stanford (iTunes U)

Andrew Ng’s course at Stanford on Machine Learning provides a great overview of the different types and applications of current machine learning algorithms. The course is available for free, either on YouTube or by subscribing in iTunes University.

Be warned, this is a course on theory, so is taught through mathematics rather than programming. You will develop a deeper understanding, but it may hurt your brain.

The section notes provide some assistance in reviewing the key concepts that you’ll need.

I’ve watched the course, but I keep forgetting which lesson covered which topic. To remember, I’ve put together a brief summary below:

iTunes University is a great way to keep up to date with a range of topics. Worth checking out to see what else you can find.