Archive for Technology

Emacs or Vim

After over 12 years of using Vi(m), and a brief fling with TextMate, I started using Emacs as my primary editor.

Emacs

The switch has been very positive. The learning curve has been relatively steep, as my expectations of a text editor are quite high.

Emacs’s strength (and weakness) is that it is incredibly extensible. Where I’m finding Emacs a win over Vim is that I don’t have to leave Emacs to get things done. With Vim, I tend to use a mix of terminal windows and the editor.

I began by using the Emacs Starter Kit to get going with my Emacs configuration. This made it quicker to get moving, but added a lot of things I ended up not needing. After getting more comfortable with elisp, I started from scratch and rebuilt my .emacs.d folder.

To get up to speed quickly, I also purchased the tutorial video from PeepCode. This certainly helped, as Emacs is a mental shift coming from Vim.

The big benefit I have found with Emacs is the extension packages. These can be installed from the ELPA repository, and include a range of different modes.

Some of my favourite modes include:

  • paredit - essential for any Lisp; it keeps your brackets balanced.
  • deft - simplified note taking. (I sync it via Dropbox.)
  • magit - comprehensive Git workflow within Emacs
  • markdown-mode - my current default for writing notes, although I’m leaning towards org-mode now.
  • org-mode - highly capable note taking mode, with export options to everything. You can use it to write a book, create slides, or manage your todo list.

On the Mac, I’ve been using Aquamacs for most of my text editing, and used MacPorts to install the command-line client.

I have yet to try Emacs on Windows, as I haven’t been using Windows much at all. There is a release available here.

After cleaning up my .emacs.d configuration, I’ve now started using Emacs on Linux servers I use regularly. For temporary servers, I’ll fall back to Vim, as Emacs is often not installed.

If you want to improve your Emacs skills, follow @emacs_knight.

Vim

To get started with Vim, it is worth reading The Vim Learning Curve is a Myth.

Over the years, the popular mythology around vim has become that it’s insanely difficult to learn; a task to be attempted by only those with the thickest of neck-beards. I’ve heard dozens of times from folks who are convinced it will take them months to reach proficiency.

These beliefs are false.

My feeling is that Vim is unrivalled for the simple task of text editing. Even after a day or two of learning, you will be faster.

Where things get a bit more complicated is when you start to realise that text editing isn’t the whole story for a text editor.

Platform support is superb. The first thing I do to a Windows machine is install Vim. It is rare for an application with a Unix heritage to be so comfortable on Windows.

Historically, I hadn’t been a fan of MacVim, the graphical Mac version. It is much better in recent releases, and load times have improved considerably.

Integration with external tools is where I feel Vim is lacking a bit. With the number of developers migrating from TextMate to Vim, this gap is being rapidly addressed. However, Vim isn’t as extensible as some other editors.

I really noticed this when trying out Clojure programming. If you are working in a REPL-based environment, Vim has a way to go.

Closing Thoughts

If you aren’t using either of them, it’s not too late to start. You can’t make a bad choice.

If you are already using either Emacs or Vim, enjoy!

Both are great editors that allow you to be incredibly productive at working with text. Try learning a new feature each week. You use your text editor so often that a small improvement is a major payoff.

Calculating earlier dates using a shell script

Mongo has a database size limit in 32-bit mode, so I want to purge items older than a certain cutoff date. I decided it would be easy to write a simple shell script to run the query.

The tricky part was calculating dates in shell. This is what I ended up with:

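#!/bin/sh

MONTHS_AGO=3
# Approximate a month as 31 days
DATE_AGO_EPOCH=$(( $(date +%s) - $MONTHS_AGO * 31 * 24 * 3600 ))
OSTYPE=$(uname)
FORMAT='+%Y-%m-%d'

if [ "Linux" = "$OSTYPE" ] ; then
  # GNU date: offset from the Unix epoch by the computed number of seconds
  DATE_AGO_ISO=$(date -d "1970-01-01 00:00 UTC + $DATE_AGO_EPOCH seconds" "$FORMAT")
else
  # BSD/macOS date: -r takes seconds since the epoch
  DATE_AGO_ISO=$(date -r "$DATE_AGO_EPOCH" "$FORMAT")
fi

# \$lt is escaped so the shell does not expand it as an (empty) variable
DB_CMD="db.items.find( { publish_date : { \$lt : \"$DATE_AGO_ISO\" } } ).count()"
# DB_CMD="db.items.remove( { publish_date : { \$lt : \"$DATE_AGO_ISO\" } } )"

echo "$DB_CMD" | mongo pz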

This provides a simple way of purging old items that can be called from cron. Once the count looks right, switch to the commented-out remove command.
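For comparison, here is a minimal Python sketch of the same calculation. It keeps the script's assumption that a month is 31 days. The function names cutoff_date and mongo_command are illustrative only. Because the query is built as an ordinary Python string, the $lt operator never goes through shell variable expansion.

```python
from datetime import datetime, timedelta, timezone


def cutoff_date(months_ago, now=None):
    """Return the YYYY-MM-DD date roughly `months_ago` months before `now`.

    As in the shell script, a month is approximated as 31 days.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - timedelta(days=31 * months_ago)).strftime("%Y-%m-%d")


def mongo_command(cutoff, remove=False):
    """Build the mongo shell command that counts (or removes) items older than `cutoff`."""
    query = '{ publish_date : { $lt : "%s" } }' % cutoff
    if remove:
        return "db.items.remove( %s )" % query
    return "db.items.find( %s ).count()" % query


if __name__ == "__main__":
    cmd = mongo_command(cutoff_date(3))
    print(cmd)
    # To run it, pipe the command into the mongo shell, for example:
    # subprocess.run(["mongo", "pz"], input=cmd, text=True, check=True)
```

Passing remove=True produces the db.items.remove(...) variant. As with the script, it is worth running the count first.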

Calling Mahout from Clojure

Mahout is a set of libraries for running machine learning processes, such as recommendation, clustering and categorisation.

The libraries work against an abstract data model that can be backed by anything from a file to a full Hadoop cluster. This means you can start by playing around with small data sets in files, then move to a local database, a Hadoop cluster or a custom data store.

After a bit of research, calling Mahout from any JVM language turned out not to be too complex. When you compile and install Mahout, the libraries are installed into your local Maven cache. This makes it very easy to include them in any JVM-based project.

To help work through the various features, I’m reading the early access edition of Mahout in Action. I am trying out the examples in Clojure as I read through.

To get started, the following steps will set up a Clojure project to work with Mahout:

  1. Build and install Mahout

    The process installs the Mahout jars into your local Maven repository, making them accessible to Leiningen (lein).

  2. Create a new project

    lein new mahoutexample
  3. Add dependencies to project.clj.

    For example:

    :dependencies [
      [org.clojure/clojure "1.2.1"]
      [org.apache.mahout/mahout-core "0.5"]
      [org.apache.mahout/mahout-math "0.5"]
      [org.apache.mahout/mahout-utils "0.5"]]

    Then load the dependencies into your project:

    lein deps
  4. Add your code

    For example, ch02.clj shows calling a recommendation engine.

    The time-consuming part is adding the right import statements.

  5. Evaluate from the REPL

    This is where it gets fun. Clojure makes it easy to try out different data sets and learning algorithms to rapidly iterate.

  6. [Optional] Add a run task

    In project.clj, you can set :main to the namespace that should be run. Lein will look for a function called “-main” in that namespace and call it.

    lein run

As an example, I’ve ported the first chapter of the book to Clojure, and it is available on GitHub.
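The following small, self-contained Python sketch gives a feel for what a user-based recommender does. It is not Mahout's API or implementation, only a toy under simple assumptions:

  • similarity is the Pearson correlation between two users' ratings;
  • the neighbourhood is the N most similar users with positive similarity;
  • an unseen item's estimate is the similarity-weighted average of the neighbours' ratings.

The input is assumed to be comma-separated userID,itemID,preference lines. The sample ratings are made up.

```python
import math
from collections import defaultdict


def load_prefs(lines):
    """Parse 'userID,itemID,preference' lines into {user: {item: preference}}."""
    prefs = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, item, value = line.split(",")
        prefs[int(user)][int(item)] = float(value)
    return dict(prefs)


def pearson(a, b):
    """Pearson correlation over the items both users rated (0.0 when undefined)."""
    common = [item for item in a if item in b]
    n = len(common)
    if n < 2:
        return 0.0
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def recommend(prefs, user, neighbours=2, how_many=1):
    """Return [(item, estimated preference)] for items `user` has not rated."""
    # Rank the other users by similarity; keep the top N with positive similarity.
    ranked = sorted(((pearson(prefs[user], prefs[other]), other)
                     for other in prefs if other != user), reverse=True)
    nearest = [(sim, other) for sim, other in ranked[:neighbours] if sim > 0]
    # Candidate items: rated by at least one neighbour but not by this user.
    candidates = {item for _, other in nearest for item in prefs[other]
                  if item not in prefs[user]}
    estimates = {}
    for item in candidates:
        rated = [(sim, prefs[other][item]) for sim, other in nearest if item in prefs[other]]
        weight = sum(sim for sim, _ in rated)
        estimates[item] = sum(sim * value for sim, value in rated) / weight
    return sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)[:how_many]


if __name__ == "__main__":
    sample = """
    1,101,5.0
    1,102,3.0
    1,103,1.0
    2,101,5.0
    2,102,3.0
    2,103,1.0
    2,104,4.0
    3,101,1.0
    3,102,3.0
    3,103,5.0
    3,104,2.0
    3,105,5.0
    4,101,4.0
    4,102,3.0
    4,103,2.0
    4,105,3.0
    """.splitlines()
    print(recommend(load_prefs(sample), 1, neighbours=2, how_many=2))
```

Mahout lets you swap each of these pieces (data model, similarity, neighbourhood, recommender) independently. That pluggability is what makes trying different combinations from the REPL worthwhile.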

The opus artificem probat blog has some other articles on using parts of Mahout with Clojure.

CS229 Machine Learning at Stanford (iTunes U)

Andrew Ng’s course at Stanford on Machine Learning provides a great overview of the different types and applications of current machine learning algorithms. The course is available for free, either on YouTube or by subscribing in iTunes University.

Be warned: this is a course on theory, so it is taught through mathematics rather than programming. You will develop a deeper understanding, but it may hurt your brain.

The section notes provide some assistance in reviewing the key concepts that you’ll need.

I’ve watched the course, but I keep forgetting which lesson covered which topic. To remember, I’ve put together a brief summary below:

iTunes University is a great way to keep up to date with a range of topics. Worth checking out to see what else you can find.

Analysis of data with MongoDB and R

For my photography news site, I’ve ended up with a data store of a lot of blog articles in MongoDB. To try and look for patterns, I want to run some analysis. R looks like the right tool for the job.

Integration Installation

To link R to Mongo, use the RMongo library. Behind the scenes, this uses an R-to-Java bridge and the Java MongoDB driver.

Installation is reasonably simple:

  1. Validate

    $ cd ..
    $ R CMD check RMongo
  2. Build

    $ R CMD build RMongo
  3. Install

    $ R CMD INSTALL RMongo*.tar.gz

These steps assume you already have R installed and available from the command line, and that the R CMD commands are run from the directory that contains the RMongo source folder.

Query for Data

Now from within R, you can connect to a local MongoDB. The mongoDbConnect function takes some additional arguments if your database is remote.

> library("RMongo")
> mongo <- mongoDbConnect("db")
> result <- dbGetQuery(mongo, "items", "", 0, 10)
> result$publish_date
 [1] "2011-03-03" "2011-03-03" "2011-04-20" "2011-01-24" "2011-01-24" "2011-04-11" "2011-03-29"
 [8] "2011-03-28" "2011-03-22" "2011-03-14"

To some extent, it is as simple as that. You can use more complex queries to extract the data that you want.

The main query function, dbGetQuery, takes five arguments:

  • database connection
  • collection name
  • query
  • skip - how many objects to skip
  • limit - total number of objects to return
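Because skip and limit work together, a large collection can be read in pages by calling dbGetQuery repeatedly with an increasing skip. The Python helper below is only a sketch of that arithmetic. The name page_offsets is hypothetical and not part of RMongo. Each (skip, limit) pair it returns would be passed as the last two arguments of dbGetQuery.

```python
def page_offsets(total, page_size):
    """Return (skip, limit) pairs that cover `total` documents in pages of `page_size`."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # The final page is shortened so the pairs never reach past `total`.
    return [(skip, min(page_size, total - skip)) for skip in range(0, total, page_size)]


if __name__ == "__main__":
    # 25 documents read 10 at a time
    print(page_offsets(25, 10))
```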

For example, extract all fields from an items collection where an item was published during April 2011:

> result <- dbGetQuery(mongo, "items", "{'publish_date' : { '$gte' : '2011-04-01', '$lt' : '2011-05-01'}}")

To limit data returned to specific fields, use dbGetQueryForKeys instead:

> result <- dbGetQueryForKeys(mongo, "items", "{'publish_date' : { '$gte' : '2011-04-01', '$lt' : '2011-05-01'}}", "{'publish_date' : 1, 'rank' : 1}")
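Because publish_date is stored as a YYYY-MM-DD string, string comparison with $gte and $lt sorts in the same order as the dates themselves. That is why a month range like the one above works.

A small Python sketch can generate these query and key strings for any month. month_range_query and keys_spec are illustrative names. The output is standard double-quoted JSON rather than the single-quoted form above. It is an assumption, worth checking, that RMongo's JSON parsing accepts both.

```python
import json
from datetime import date


def month_range_query(field, year, month):
    """JSON query matching `field` values within one calendar month.

    Assumes dates are stored as 'YYYY-MM-DD' strings, so string order equals date order.
    """
    start = date(year, month, 1)
    # First day of the following month (December rolls over into January).
    end = date(year + month // 12, month % 12 + 1, 1)
    return json.dumps({field: {"$gte": start.isoformat(), "$lt": end.isoformat()}})


def keys_spec(*fields):
    """JSON field selector such as {"publish_date": 1, "rank": 1}."""
    return json.dumps({name: 1 for name in fields})


if __name__ == "__main__":
    print(month_range_query("publish_date", 2011, 4))
    print(keys_spec("publish_date", "rank"))
```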

Plotting of Results

First, summarise the results into a table:

> res <- transform(result, day = weekdays(as.Date(publish_date)))
> res.days <- factor(res$day, levels = c('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'), ordered=TRUE)
> table(res.days)
res.days
   Sunday    Monday   Tuesday Wednesday  Thursday    Friday  Saturday
       30        74       123       121       131       128        86

Then, plot it as a bar chart:

> barplot(t(as.matrix(table(res.days))), main='Items per day of week')

Which gives the following graphic:

Items per day

This makes it easy to see how many articles were available per day across April.
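For anyone working outside R, the same day-of-week tally can be sketched in Python. counts_by_weekday and text_bars are hypothetical helpers. The days are ordered Sunday to Saturday to match the R factor levels above, and the input is expected to be the same YYYY-MM-DD strings as publish_date.

```python
from collections import Counter
from datetime import date

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def counts_by_weekday(iso_dates):
    """Count 'YYYY-MM-DD' dates per weekday, ordered Sunday..Saturday like the R factor."""
    # isoweekday() is Monday=1 .. Sunday=7, so % 7 maps Sunday to 0 and Saturday to 6.
    counts = Counter(DAYS[date.fromisoformat(d).isoweekday() % 7] for d in iso_dates)
    return [(day, counts[day]) for day in DAYS]


def text_bars(rows):
    """Render (label, count) pairs as a crude text bar chart."""
    return "\n".join(f"{day:<9} {'#' * n} {n}" for day, n in rows)


if __name__ == "__main__":
    rows = counts_by_weekday(["2011-04-01", "2011-04-03", "2011-04-04", "2011-04-08"])
    print(text_bars(rows))
```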

While I am still scratching the surface, the combination of R with MongoDB looks to be quite useful.