
Getting closer to pushing a first release, so I thought I'd try to expand on what Biggy is and why I made it.

Why Biggy?

I believe viewing all data as the same is a bit of a mistake. Some data needs to be available all the time, quickly - other data might sit inside your database for years and get pulled out very rarely. To put things in concrete terms, let's consider an online store:

  • Product and Customer information need to be at the ready for logging in and catalog display
  • Order information is displayed occasionally - much less than Catalog/Customer information
  • Log data is examined rarely

Our store is a simple process - the "input" data (Products and Customers) generate "output", or "record" data. The input data changes fairly often - Customers logging in and changing things, store owners changing prices, etc.

The output data doesn't change much - in analytical terms this is called "slowly changing over time" - you might go in and tweak an order here and there, but mostly it's a matter of historical record and should never be changed.

To me the "input" stuff (Products/Customers) is perfect for a looser, document structure. The output stuff should not only be in a relational structure - it should be denormalized, ready for analytical export to CSV or some other reporting system.

This is why I made Biggy: I wanted the best of both worlds. I want speed, I want fast writes, I want LINQ and the ability to store things in the simplest manner possible.

What Biggy Does

Let's sell something. Let's assume I have a Postgres database (Biggy was built with Postgres first) called "Tekpub" and I want to sell videos:

public class Product {
  public int ID { get; set; }
  public string Sku { get; set; }
  public int CategoryID { get; set; }
  public string Name { get; set; }
  public string Description { get; set; }
  public decimal Price { get; set; }
}

A simple object. Since we're going to be doing list operations on this class, it's a good idea to override Equals():

public class Product {
  //..
  public override bool Equals(object o) {
    var p = o as Product;
    return p != null && p.Sku == this.Sku;
  }

  // Equals and GetHashCode should be overridden together
  public override int GetHashCode() {
    return Sku == null ? 0 : Sku.GetHashCode();
  }
}

Perfect - now let's store this in our Postgres database:

//open a list using the tekpub connection string and the products table
var products = new PGList<Product>("tekpub", "products");  
var newProduct = new Product();  
//fill it in...
products.Add(newProduct);  

This will:

  • serialize newProduct down to JSON
  • create a table called "products" in my database with two columns: "id" and "body" - "id" because, by convention, Biggy looks at the properties and uses one named "id" or "productID" (a sketch of that convention follows this list)
  • shove the JSON into the new record and also add newProduct to the List<Product> backing store behind products.
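
Here's a minimal sketch of that key-by-convention idea - this is illustrative reflection code of my own, not Biggy's actual internals:

using System;
using System.Linq;
using System.Reflection;

public static class Conventions {
  // Find the key property: "ID" or "<TypeName>ID", case-insensitively
  public static PropertyInfo GetKeyProperty<T>() {
    var type = typeof(T);
    return type.GetProperties().FirstOrDefault(p =>
      p.Name.Equals("id", StringComparison.OrdinalIgnoreCase) ||
      p.Name.Equals(type.Name + "id", StringComparison.OrdinalIgnoreCase));
  }
}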

This is the power behind Biggy - it synchronizes data on disk to a list in memory. Since all the data is held in memory you're free to query as you wish using LINQ:

var hanselmanVids = products.Where(x => x.CategoryID == 5);  

This is lightning fast and there's no SQL translation to deal with - it just pulls those records out of memory. More on this in a bit.
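
And since it's just LINQ to Objects, you can compose queries however you like. A quick illustrative example (the category ID is made up):

//compose Where, OrderBy, Take... it's all in-memory LINQ to Objects
var cheapestHanselmanVids = products
  .Where(x => x.CategoryID == 5)
  .OrderBy(x => x.Price)
  .Take(10)
  .ToList();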

Data Storage

There are currently 3 ways to store documents:

  • On disk in a JSON file (one per record type T)
  • In a Postgres database using the built-in JSON data type
  • In SQL Server using regular text storage

We're currently working on more storage options, such as MongoDB and Azure Table Storage.
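
For the file-backed flavor, usage looks much the same. This is a sketch assuming the JSON-file list is the BiggyList<T> mentioned later in this post, and that the default constructor picks a conventional file path - check the README for the exact signature:

//JSON-file storage: one file per record type, same list semantics
var products = new BiggyList<Product>();
products.Add(new Product {
  ID = 1,
  Sku = "VID-001",
  Name = "Sample Video",
  Price = 12.00m
});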

Going Further

Let's say I've loaded up 10,000 products and now I want users to be able to search. Seems like it might be hard - but remember you're querying in memory so you don't have to worry about your DBA kicking your butt for doing this:

var hanselmanVids = products.Where(x => x.Description.Contains("Hanselman"));  

But we can do better. Let's tweak our class to have FullText query ability:

public class Product {
  public int ID { get; set; }
  public string Sku { get; set; }
  public int CategoryID { get; set; }
  public string Name { get; set; }

  [FullText]
  public string Description { get; set; }
  public decimal Price { get; set; }
}

Note: this doesn't work with file storage.

By flagging a property this way (you can use the attribute on any number of columns), Biggy splits out the text in that column and stores it separately. If you're using Postgres the text is indexed on the fly using to_tsvector; with SQL Server it gets stored as text and you have to set up the indexing yourself (which I'm trying to solve, by the way).
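
To give a sense of the mechanics, here's a minimal sketch of how an attribute like [FullText] can be declared and discovered - my own illustration, not Biggy's source:

using System;
using System.Linq;
using System.Reflection;

//marks a string property whose text should be indexed for full text search
[AttributeUsage(AttributeTargets.Property)]
public class FullTextAttribute : Attribute { }

public static class FullTextScanner {
  // Collect the properties flagged for full text indexing
  public static PropertyInfo[] GetFullTextProperties<T>() {
    return typeof(T).GetProperties()
      .Where(p => p.GetCustomAttributes(typeof(FullTextAttribute), true).Any())
      .ToArray();
  }
}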

To query the data you can now:

var hanselmanVids = products.FullText("Hanselman");  

This might not seem all that remarkable - but keep in mind that this is a document store and you're querying it with a powerful full text engine.

Speed

I'm crudely "benchmarking" our reads and writes - keep in mind this stuff varies by machine and so on. Here's what I have so far:

[Image: Biggy benchmarks]

Notice the reads from the 100K records? Yep, that's ZERO milliseconds - too low to even register. Most of the web applications I've written spend their time reading from the database - and occasionally writing. Why not make that fast and easy?

LINQ is one of the neatest things about C# - using it to query a super fast in-memory store is pretty dang fun!

A Note On Memory

You might have some alarms going off - which is a good thing. If you pull all the data into memory, doesn't that take time? Yes, it does, as you can see from the image above. 1,000 records load quite fast (about 30ms), but you don't want to be instantiating your Biggy bits all the time.

This means you'll want to have one instance of your data store open while your app is up. Biggy will read the data in, then you get to play with it from there. There's an example in the README on GitHub of how you can set up your MVC app with a static property (a sketch follows below). It's simple stuff, but yes, there are possible issues.
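
Here's a minimal sketch of that idea - the class and property names are my own, so check the README for the real example:

//hold one list per record type for the life of the app
public static class Store {
  private static PGList<Product> _products;

  public static PGList<Product> Products {
    get {
      // Loaded once; every request after that hits memory
      if (_products == null) {
        _products = new PGList<Product>("tekpub", "products");
      }
      return _products;
    }
  }
}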

One is threading. If you're using the FileStore it's possible to have write collisions and locks. For this reason it's a good idea to use the FileStore (BiggyList) for things that are written very rarely - or stuff that might be updated by very few people.
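
If you do write from multiple threads, one simple mitigation is to serialize the writes yourself - an assumption on my part, not something Biggy requires:

//funnel all writes through one lock so the file is touched
//by one thread at a time
public static class SafeWrites {
  private static readonly object _writeLock = new object();

  public static void AddProduct(Product p) {
    lock (_writeLock) {
      Store.Products.Add(p);
    }
  }
}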

In terms of server memory - I like how Karl Seguin puts it:

I do feel that some developers have lost touch with how little space data can take. The Complete Works of William Shakespeare takes roughly 5.5MB of storage

Azure web sites (free tier) give you 1GB of RAM to play with - it'll be a long time before you run out. All of Tekpub's data (customers, orders, logs, etc.) capped out at 6.5MB...

There's More, And I'll Keep Writing

I'm exhausted. I haven't gone on a tear like this in years but to see this come together has been extraordinarily fun. There's a lot more to touch on here, including:

  • Non-document lists. We have 'em - if you want to use a regular relational structure with Biggy, you can. PGList and SqlServerList will do just that.
  • Non-in-memory stuff - we have that too. Under the covers it's all "just Massive", so you can use PGTable and SqlServerTable to read and write (with dynamics) as you need to (a quick sketch follows this list).
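
To give a flavor of that last point, here's a sketch of the Massive-style dynamic access. I'm assuming PGTable takes the same connection string/table arguments as PGList, and the method name follows Massive's dynamic API - treat the exact signatures as assumptions until you check the docs:

//straight to the database, no in-memory list - records come back as dynamics
var table = new PGTable("tekpub", "products");

//All() is Massive's classic read-everything call
var all = table.All();
foreach (var record in all) {
  Console.WriteLine(record.sku);
}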

Onward! Time for a beer...


Rob Conery

I am the Co-founder of Tekpub.com, Creator of This Developer's Life, an Author, Speaker, and sometimes a little bit opinionated.

