Mahout is a set of libraries for running machine learning processes, such as recommendation, clustering and categorisation.
The libraries work against an abstract model that can be anything from a file to a full Hadoop cluster. This means you can start playing around with small data sets in files, then move on to a local database, a Hadoop cluster or a custom data store.
After a bit of research, it turned out not to be too complex to call from any JVM language. When you compile and install Mahout, the libraries are installed into your local Maven cache. This makes it very easy to include them in any JVM-based project.
To get started, the following steps will set up a Clojure project to work with Mahout:
Install Mahout
The install process puts the Mahout jars into your local Maven repository, making them accessible to lein.
Create a new project
lein new mahoutexample
Add dependencies to project.clj.
:dependencies [[org.clojure/clojure "1.2.1"]
               [org.apache.mahout/mahout-core "0.5"]
               [org.apache.mahout/mahout-math "0.5"]
               [org.apache.mahout/mahout-utils "0.5"]]
Then load the dependencies into your project:
lein deps
Add your code
For example, ch02.clj shows calling a recommendation engine.
The time-consuming part is adding the right import statements.
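To give a feel for the imports involved, here is a minimal sketch of a user-based recommender. It is not the actual ch02.clj: the namespace name, the recommend function and its arguments are my own assumptions for illustration, and it expects a CSV file of userID,itemID,preference lines. The Mahout classes themselves come from the Taste part of mahout-core.

```clojure
(ns mahoutexample.ch02
  ;; Every Mahout class used below has to be imported explicitly.
  (:import [java.io File]
           [org.apache.mahout.cf.taste.impl.model.file FileDataModel]
           [org.apache.mahout.cf.taste.impl.neighborhood NearestNUserNeighborhood]
           [org.apache.mahout.cf.taste.impl.recommender GenericUserBasedRecommender]
           [org.apache.mahout.cf.taste.impl.similarity PearsonCorrelationSimilarity]))

(defn recommend
  "Hypothetical helper: returns up to how-many recommended items for
  user-id, using the userID,itemID,preference lines in the file at csv-path."
  [csv-path user-id how-many]
  (let [model        (FileDataModel. (File. csv-path))
        similarity   (PearsonCorrelationSimilarity. model)
        ;; neighbourhood of the 2 most similar users (an arbitrary choice)
        neighborhood (NearestNUserNeighborhood. (int 2) similarity model)
        recommender  (GenericUserBasedRecommender. model neighborhood similarity)]
    ;; recommend takes a primitive long user ID and an int item count
    (.recommend recommender (long user-id) (int how-many))))
```

Calling (recommend "intro.csv" 1 1), where intro.csv is a placeholder for your own data file, returns a list of RecommendedItem objects that you can inspect with .getItemID and .getValue.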
Evaluate from the REPL
This is where it gets fun. Clojure makes it easy to try out different data sets and learning algorithms to rapidly iterate.
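As a rough sketch of that kind of iteration, the snippet below builds the same user-based recommender with different similarity measures so the results can be compared side by side at the REPL. The namespace, the function names and the similarities map are assumptions made up for this example. The three similarity classes are the ones shipped in Mahout's Taste package.

```clojure
(ns mahoutexample.repl
  (:import [org.apache.mahout.cf.taste.model DataModel]
           [org.apache.mahout.cf.taste.similarity UserSimilarity]
           [org.apache.mahout.cf.taste.impl.neighborhood NearestNUserNeighborhood]
           [org.apache.mahout.cf.taste.impl.recommender GenericUserBasedRecommender]
           [org.apache.mahout.cf.taste.impl.similarity
            EuclideanDistanceSimilarity
            LogLikelihoodSimilarity
            PearsonCorrelationSimilarity]))

;; Hypothetical lookup table: keyword -> function that builds a similarity
;; measure from a data model.
(def similarities
  {:pearson        (fn [^DataModel m] (PearsonCorrelationSimilarity. m))
   :euclidean      (fn [^DataModel m] (EuclideanDistanceSimilarity. m))
   :log-likelihood (fn [^DataModel m] (LogLikelihoodSimilarity. m))})

(defn recommend-with
  "Builds a user-based recommender over model using the similarity returned
  by make-similarity and a neighbourhood of n users, then asks it for up to
  how-many items for user-id."
  [^DataModel model make-similarity n user-id how-many]
  (let [^UserSimilarity similarity (make-similarity model)
        neighborhood (NearestNUserNeighborhood. (int n) similarity model)
        recommender  (GenericUserBasedRecommender. model neighborhood similarity)]
    (.recommend recommender (long user-id) (int how-many))))

(defn compare-similarities
  "Returns a map of similarity keyword -> recommendations for user-id."
  [model user-id how-many]
  (into {}
        (for [[k make-similarity] similarities]
          [k (recommend-with model make-similarity 2 user-id how-many)])))
```

Because the data model is passed in rather than created inside the functions, you can load a file once with FileDataModel, keep it in a var, and re-evaluate compare-similarities as often as you like while tweaking the neighbourhood size or the set of measures.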
[Optional] Add a run task
As an example, I’ve ported the first chapter of the book to Clojure and it is available on GitHub.
For other articles on using parts of Mahout with Clojure, see the opus artificem probat blog: