Late last week, I had the privilege of attending no:sql(east) in Atlanta, GA, representing SugarCRM. At the "Cocktail Welcome" at Tap on Wednesday, I spoke at length with many people from very interesting backgrounds, including some who run heavy-duty backend systems at Twitter, Microsoft Bing, Aol, and elsewhere, as well as people from small companies and startups. My bike ride home left me with a bunch of stitches in my chin, which meant missing the first few sessions on Thursday, but I did finally get to the conference and learned about lots of different projects and products focused on new ways to work with data.
Some say that the NoSQL "movement" is just a bunch of people who hate using SQL, but the definition that most conference attendees seemed to agree on is that "NoSQL means Not Only SQL". In other words, there is a time and place for relational databases. A relational database is typically the way to start solving a problem at small scale, until you fully understand what problem you are actually trying to solve, but at some point it often becomes time to perform certain kinds of analysis using non-SQL tools.
Several categories of tools were presented: schema-free graph databases, key-value stores, processing systems modeled after Google's BigTable and MapReduce, and tools that run on top of these (and combinations of these) to make them easier to use for certain audiences. The majority of tools seemed to be some implementation of a key-value store using HTTP as the transport and JSON or something similar as the wire format. All the presenters gave great presentations, and the vibe in the air was that this was the best conference that many people in the audience had been to. It certainly didn't hurt that the subject matter was trendy, shiny, and new!
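To make the "key-value store over HTTP with JSON" pattern concrete, here is a minimal sketch in Python. It is not modeled on any particular product from the conference: the class name `TinyKVStore`, the `handle` method, and the status codes chosen are all my own assumptions, and the "HTTP" part is reduced to a method and a path so the example runs without a network.

```python
import json


class TinyKVStore:
    """Hypothetical in-memory key-value store that speaks JSON, HTTP-style."""

    def __init__(self):
        self._data = {}

    def handle(self, method, path, body=None):
        """Dispatch one request and return (status_code, json_text)."""
        key = path.lstrip("/")  # the URL path doubles as the key
        if method == "PUT":
            # Store the decoded JSON document under the key.
            self._data[key] = json.loads(body)
            return 201, json.dumps({"ok": True})
        if method == "GET":
            if key in self._data:
                return 200, json.dumps(self._data[key])
            return 404, json.dumps({"error": "not found"})
        if method == "DELETE":
            if key in self._data:
                del self._data[key]
                return 200, json.dumps({"ok": True})
            return 404, json.dumps({"error": "not found"})
        return 405, json.dumps({"error": "method not allowed"})


store = TinyKVStore()
store.handle("PUT", "/user/42", '{"name": "Ada"}')
status, body = store.handle("GET", "/user/42")
print(status, body)
```

The appeal of this shape is that any language with an HTTP client and a JSON parser can talk to the store, which probably explains why so many of the tools converged on it.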
My award for coolest graphs goes to Microsoft's Yuan Yu for his real-time coding demo (in Visual Studio!) of DryadLINQ's automatic clustering functionality and its status graphs, which show data and processing flow through a cluster as it performs computation. Unfortunately, all of the Dryad code is still encumbered in some way at Microsoft, but they're "working on it". My award for most unexpected goes to Tim Anglade's presentation on 'Tin', which is basically a method of storing sequential data in a set of text files with predictable names and accessing them via HTTP. It sounds simple, but it ran NASDAQ's Market Relay Service for some time, serving petabytes of stock information every day! The neo4j presentation was the most interesting to me because I like the idea of graph databases and the presenter was fantastic ("Sweden's best export!" according to someone in the conference IRC channel). Hopefully I'll stumble across a project in the future that needs a graph database, as I don't have anything on my plate currently that could use one.
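The 'Tin' idea mentioned above (sequential data in text files with predictable names) is easy to sketch. What follows is only my guess at the general technique, not Tin's actual layout: the helper names, the fixed `RECORDS_PER_FILE`, and the zero-padded file naming are all assumptions, and it presumes records are appended in order with no gaps.

```python
import os

RECORDS_PER_FILE = 1000  # assumption: every file holds a fixed number of records


def chunk_path(root, stream, seq):
    """Compute the file that holds record number `seq` from arithmetic alone."""
    chunk = seq // RECORDS_PER_FILE
    return os.path.join(root, stream, f"{chunk:08d}.txt")


def append_record(root, stream, seq, line):
    """Append one record; assumes records arrive in order, without gaps."""
    path = chunk_path(root, stream, seq)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")


def read_record(root, stream, seq):
    """Fetch one record by opening only the file its number maps to."""
    with open(chunk_path(root, stream, seq), encoding="utf-8") as f:
        lines = f.read().splitlines()
    return lines[seq % RECORDS_PER_FILE]
```

Because the file name can be computed from the sequence number, no index or query engine is needed, and the files can be served by any plain HTTP server or cache.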
Of all the tools presented, the one I'm most looking forward to playing with at SugarCRM is Pig, which makes certain kinds of data analysis very simple to run and allows for local processing as well as offloading jobs to a (Hadoop) cluster. We have some large datasets on large servers (32 GB of RAM sorts of things), and we're prevented from doing a lot of things by how slowly MySQL performs certain kinds of queries on these datasets. Hopefully I'll have something to post in a few weeks that uses Pig or one of the other tools I learned about last week, and maybe we'll get a Hadoop cluster up and running.