Alt.Tekpub: Moving to MongoDB

Tuesday, February 2 2012 javascript alt-tekpub

I've chosen to use MongoDB - now what?

Exploring NoSQL

When we moved from .NET to Rails we started out with MongoDB using MongoMapper. I loved it a lot back then - incredibly easy to work with and quite fast. But Rails wasn't built with NoSQL in mind and frankly it seemed like I was trying to solve the wrong problem as ActiveRecord is built with a relational system in mind.

We used MongoMapper - and I really like it - but it forces you to declare your Schema upfront and that's the reverse mindset for using a document database. None of this is really a bad thing - but it didn't make me glow about using MongoDB. MongoMapper's abstraction is pretty complete: you really don't know you're using Mongo very much... which is kind of not the point.

Welcome Back

The whole reason I moved from MongoDB back to PostGres was simply to play with all the things Rails has to offer and let it work with me. That's a key to working with Rails: just let it do its thing and you worry about building your business/site. It's a compelling pitch - and so that's what we did.

I feel the reverse is now the case with MongoDB and Node: I'm in JS Land, so let's rock it.


OK, so I've installed MongoDB on my Mac using Homebrew:

brew install mongodb  

Instead of launching it automatically when my machine starts, I created an alias so I could start it with whatever options I wanted. When you install with homebrew it puts the data directory in its own little protected spot... which is dandy but I don't want it there. So I created an alias for starting mongo:

alias mm='mongod run --rest --dbpath /Users/rob/mongo/data'  

This simply telling MongoDB to start, to make sure the RESTful interface is working and then to use my "/mongo/data" directory to store its data. I type "mm" in the console and we're happy.

But what are we doing?


I've got all the tables in Tekpub you would imagine:

... plus a smattering of others. Looking at this, there are no surprises... but how would I "model" this with a document database?

There is no single answer and the good news is that I can change my mind later if I want to. One thing I have done in the past is to think of a documents as hierarchical in a DDD sort of way. The concept of Aggregate Roots really helps here - and so does Common Sense and Freeing Your Mind.

Consider an Order - which I'll rename here and now to be "Invoice". What "things" does an Invoice have within it? Off the top of my head (keeping in mind I do digital deliver, no need for addresses and so on):

When thinking about an Agg Root - you might ask "which of these do I care about on its own?". Or, put another way: which things can stand on their own without the context of the Invoice? To me, that's a Customer and a Transaction - although I'm not entirely sold on the idea that a Transaction means anything without the context of the Invoice.

Also - the Customer reference is argumentative. If I sell a video to "Jim Cheese" but later he changes his name to "Beef Stick" - do I change the Invoice? My accountant would probably say "no" since an Invoice is a matter of record and should reflect precisely the data that went into executing it (Jim could also change companies).

Given this - perhaps I'm not looking for a Customer reference here - maybe just some information about who bought it. This is where DDD kind of kicks in - asking business questions and defining our model based on the business requirements.

Puts business hat on

What I really need here is a way to be sure I can show users what they've bought - and it's important that the invoice reflect what they've bought using the information that they provided at the time of sale - in other words whatever's in my books. I can't have disparity as my accountant will not be happy. Also - users only rarely change their email - so typically that's not a concern, but it is a concern nonetheless.

Right - so the model needs to allow for an invoice listing for Customers and the Invoice needs to be a snapshot. By definition, then, we have at least 2 aggregate roots: Customers and Invoices, and they'll be two of our documents.

Next up is pretty simple: in our PostGres db a Production has a one to many with Episodes. An episode makes no sense outside of the context of a Production - so thats our next Agg Root.

Finally we have Coupons. We want to know who used them, we also want users to be able to know which ones they've used. Also, the use of a Coupon generates an Invoice so we'll need to track that relationship as well.Let's get to it!


No, not a Rails-style migration... I'm creating one for moving data from one system to another (PostGres to MongoDB). I'm on a Mac (which doesn't matter) and I like Rake for one-off tasks - so that's what I'll use.

mkdir ~/Projects/mongo_move  
cd ~/Projects/mongo_move  
touch Gemfile  
touch Rakefile.rb  
git initgit add .  
git commit -am "Initial"  

This created our project directory. The first thing I want to do is figure out how to move the data. Now I could have opened up my Rails app and used a Rake task with ActiveRecord (this is actually where I started) but that involves... well all of Rails which isn't bad, but I don't need it for moving data around.

For that matter there are a number of importers and data movers for MongoDB - but they won't transform things precisely as I want. I've done ETL enough to know that customization is the name of the game and I'll write a Rake task in 1/10th the time it will take to customize a tool (given that my data is so small).

Since I'm working closely to each system, I don't need a lot of abstraction here. Thank [Diety] for Sequel - if you've never used it it's just like Massive, but for Ruby. For MongoDB - again I could use MongoMapper or Mongoid, but I don't want all that abstraction - and I know Mongo's API pretty well - so I'm just going to use the straight up ruby mongo gem:

source :rubyforge  
gem 'sequel'  
gem 'pg'  
gem 'mongo'  
gem 'bson_ext'  

Note: "bsonext" is simply a perf-enhancer for the ruby mongo driver._

Run "bundle install" and we're off to the races....


The simplest thing to do, right now, is to add Coupons. There's nothing fancy I need to do with them - just insert the table directly. The first thing to do is set the connection in Sequel, and then initialize my Mongo collection:

require 'rubygems'  
require 'sequel'  
require 'mongo'  
def connect  
def init_collection(name)  
  conn =  
  db = conn['tekpub']  
  coll = db[name]  
  #delete them  

Sequel is amazingly simple to use. To pull out the coupons all I need to do is:

DB = connect  
#You can do full-scale models with validations with Sequel,  
#Or you can ignore the abstraction completely  
coupons =["SELECT * FROM coupons"].all  

"coupons" in the above code is going to a be a Hash Array - so if I was to iterate over it I could play with a coupon hash and pull out a coupon value using "coupon[:code]" for example. If I wanted dot-notation I could create a Coupon class using Sequel - but I don't need that... hashes will work fine.

And it turns out that the MongoDB driver likes hashes too! To insert something into MongoDB, I simply need to connect to a database, specify a collection, and then jam in what I want:

#uses our helper method above  
coll = init_collection("coupons")  
coll.insert({:code => "some_code", :description => "a test"})  

And that's it. Sort of - it turns out that if you use "coll.insert" with a hash array - it will batch insert them all for you... so here comes the punch line...

task :coupons do  
DB = connect  
coll = init_collection("coupons")  
coll.insert(DB["SELECT * FROM coupons"].all)  
puts "Done - #{coll.count} coupons"  

That's pretty slick. Here's what my coupon document looks like:

OK Not Always That Easy

Coupons were simple, but what about Invoices? And those Embedded Documents? This could be tricky. What's even more tricky is that MongoDB doesn't like Decimals, nor does it like Date types (without the time). So to get around this, I'll have to write some SQL. I could use Ruby to tweak the types, but SQL is simple enough for what I need:

inv_sql = "SELECT  
email as send_to,  
CAST(created_at as timestamp) as invoiced_at,  
CAST(created_at as timestamp) as due_at,  
'' as po,  
email as bill_to,  
orders.authorization, --authorization is a reserved word  
CAST(discount as float4) as discount,  
CAST(subtotal as float4) as subtotal,  
'paid' as status  
FROM orders"  

At first glance this is really "Table-y", meaning I could probably do more with the document style of MongoDB, but for now this will work. I'm casting decimal fields as floats and date fields as timestamps. I can loop over the results of this and add the items to the hash:

DB[inv_sql].each do |i|  
  puts "Adding order #{i[:number]}"    
  #get the items    
  i[:items] = DB["SELECT       
  CAST(price as float4) as price,      
  CAST(discount as float4) as discount,      
  FROM order_items WHERE order_id = #{i[:id]}"].all    

This will pop a document in the database that looks something like this (dummy data):

You'll notice the query I used here with MongoDB: "db.invoices.findOne({id:10491})" - that ID there has nothing to do with MongoDB's ID mechanism (which is called an ObjectID and I'll go into that later). That is a "Legacy ID" and it's ultra damn important that it stays with this transition - this is how the data is related!

Now you might think "but you don't need that with your document structure" and I would say "you trust me way too much". Documents are easy and it doesn't hurt to throw the extra value in there - so do yourself a favor and keep all the legacy keys.


The final thing I need to do is link my Customers to their Invoices. There are a number of ways to do this - the first is a formal DBReference, which is (basically) a pointer from one document to another using DBRef. I don't need that - for a few reasons.

If you read what Mongo's docs say:

DBRef is a more formal specification for creating references between documents. DBRefs (generally) include a collection name as well as an object id. Most developers only use DBRefs if the collection can change from one document to the next. If your referenced collection will always be the same, the manual references outlined above are more efficient.

I'm cool with efficiency. This makes things easier! Once I've rolled in my invoices and my customers, I can now "knit" them:

task :customers_invoices do  
  customers = init_collection("customers", false)  
  invoices = init_collection("invoices", false)  
  customers.find.each do |c|    
    puts "Loading invoices for #{c['email']}"    
    #get the invoices    
    customer_invoices = invoices.find({:email => c["email"]})    
    links ={|doc| doc["_id"]}    
    customers.update({"_id" => c["_id"]}, {"$set" => {"invoices" => links}})  

My Customer now looks like this:


That's how many lines of code it took to roll in my entire Tekpub database. And I'm sure I'm not done yet - but it doesn't matter, rerolling a document is incredibly easy with MongoDB.Next time, I'll get NodeJS up and rolling and we'll query us some MongoDB!