Sleepless Dev: 2012

Sunday, September 30, 2012

MongoDB Database: Replica Set, Autosharding, Journaling, Architecture (Part 2)

See part 1 of MongoDB Architecture...

Journaling: Is durability overvalued if RAM is the new Disk? Data Safety versus durability

It may seem strange to some that journaling was added to MongoDB as late as version 1.8. Journaling is only now the default on 64-bit operating systems, as of MongoDB 2.0. Prior to that, if the data was very important, you typically used replication to make sure write operations were copied to a replica before proceeding. The thought was that one server might go down, but two servers are very unlikely to go down at the same time. Unless somebody backs a truck into a high-voltage utility pole, causing all of your air conditioning equipment to stop working long enough for all of your servers to overheat at once, but that never happens (it happened to Rackspace and Amazon). And if you were worried about this, you would have replication across availability zones, but I digress.

At one point MongoDB did not have single-server durability; now it does with the addition of journaling. But this is far from a moot point. The general thought from the MongoDB community was, and maybe still is, that to achieve Web Scale, durability was a thing of the past. After all, memory is the new disk. If you could get the data onto a second server or two, then the chances of them all going down at once are very, very low. How often do servers go down these days? What are the chances of two servers going down at once? Whether or not it is valid to say that durability is overvalued, there was much fun made about this at MongoDB's expense (rated R, Mature 17+).

As you recall, MongoDB uses memory-mapped files for its storage engine, so it could be a while before the data in memory gets synced to disk by the operating system. Thus if you did have several machines go down at once (which should be very rare), complete recoverability would be impossible. There were workarounds with tradeoffs. For example, to get around this (now a non-issue) or minimize it, you could force MongoDB to do an fsync of the data in memory to the file system, but as you guessed, even with a RAID level four and a really awesome server, that can get slow quick. The moral of the story is that MongoDB has journaling as well as many other options, so you can decide on the best engineering tradeoff among data safety, raw speed, and scalability. You get to pick. Choose wisely.
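To make the memory-mapped file idea concrete, here is a minimal sketch using Python's standard mmap module. It is not how MongoDB itself is implemented; the file name and the 4096-byte size are made up for illustration. It only shows the general pattern: a write lands in mapped memory first, and an explicit flush asks the operating system to push it to disk.

```python
import mmap
import os
import tempfile

# Create a small file to stand in for a data file (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "toy.datafile")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

with open(path, "r+b") as f:
    mapped = mmap.mmap(f.fileno(), 4096)
    # This write changes the mapped pages in memory; the operating
    # system decides when those pages are written back to disk.
    mapped[0:5] = b"hello"
    # flush() asks the operating system to write the changed pages now.
    mapped.flush()
    mapped.close()

# Read the file back through the normal file API.
with open(path, "rb") as f:
    on_disk = f.read(5)
print(on_disk)
```

Flushing after every write is the safe but slow choice, which is the tradeoff described above.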

The reality is that no solution offers "complete" reliability, and if you are willing to allow for some loss (which you can with some data), you can get enormous improvements in speed and scale. Let's face it: your virtual farm game data is just not as important as Wells Fargo's bank transactions. I know your mom will get upset when she loses the virtual tractor she bought for her virtual farm with her virtual money, but unless she pays real money she will likely get over it. I've lost a few posts on Twitter over the years, and I have not sued once. If your servers have an uptime of 99 percent and you block/replicate to three servers, then the probability of them all being down at once is 0.01 x 0.01 x 0.01 = 0.000001, or 1 in 1,000,000 (assuming the failures are independent). Of course the uptime of modern operating systems (Linux) is much higher than this, so one in 100,000,000 or more is possible with just three servers. Amazon EC2 offers discounts if they can't maintain an SLA of 99.95% (other cloud providers have even higher SLAs). If you were worried about geographic problems, you could replicate to another availability zone or geographic area connected with a high-speed WAN. How much speed and reliability do you need? How much money do you have?
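The back-of-the-envelope arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes server failures are independent, which shared power or cooling can violate, and the function name all_down_probability is made up for illustration.

```python
def all_down_probability(uptime, servers):
    """Chance that every server is down at the same moment.

    Assumes failures are independent, which is optimistic when
    servers share power, cooling, or a network switch.
    """
    downtime = 1.0 - uptime
    return downtime ** servers


for uptime in (0.99, 0.9995):
    p = all_down_probability(uptime, 3)
    print("uptime %.2f%% x 3 servers: about 1 in %.0f" % (uptime * 100, 1 / p))
```

Plug in your own uptime figures and replica counts to see how quickly the odds change.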

An article on when to use MongoDB journaling versus older recommendations would be a welcome addition. Generally it seems journaling is mostly a requirement for very sensitive financial data and single-server solutions. Your results may vary, and don't trust my math. It has been a few years since I got a B+ in statistics, and I am no expert on the SLAs of modern commodity servers (the above was just spitballing).

If you have ever used a single non-clustered RDBMS for a production system that relied on frequent backups and a transaction log (journaling) for data safety, raise your hand. OK, if you raised your hand, then you just may not need autosharding or replica sets. To start with MongoDB, just use a single server with journaling turned on. If you require speed, you can configure MongoDB journaling to batch writes to the journal (which is the default). This is a good model to start out with and probably very much like quite a few applications you have already worked on (assuming that most applications don't need high availability). The difference is, of course, that if your application is later deemed to need high availability, read scalability, or write scalability, MongoDB has you covered. Also, setting up high availability seems easier on MongoDB than on other, more established solutions.

Figure 3: Simple setup with journaling and single server ok for a lot of applications

If you can afford two other servers and your app reads more than it writes, you can get improved high availability and increased read scalability with replica sets. If your application is write intensive then you might need autosharding. The point is you don't have to be Facebook or Twitter to use MongoDB. You can even be working on a one-off dinky application. MongoDB scales down as well as up.

Autosharding

Replica sets are good for failover and speeding up reads, but to speed up writes, you need autosharding. According to a talk by Roger Bodamer on Scaling with MongoDB, 90% of projects do not need autosharding. Conversely almost all projects will benefit from replication and high availability provided by replica sets. Also once MongoDB improves its concurrency in version 2.2 and beyond, it may be the case that 97% of projects don't need autosharding.

Sharding allows MongoDB to scale horizontally. Sharding is also called partitioning. You assign each of your servers a portion of the data to hold, or the system does this for you. MongoDB can automatically change partitions for optimal data distribution and load balancing, and it allows you to elastically add new nodes (MongoDB instances). How to set up autosharding is beyond the scope of this introductory article. Autosharding can support automatic failover (along with replica sets). There is no single point of failure. Remember, 90% of deployments don’t need sharding, but if you do need scalable writes (apps like Foursquare, Twitter, etc.), then autosharding was designed to work with minimal impact on your client code.

There are three main process actors for autosharding: mongod (database daemon), mongos, and the client driver library. Each mongod instance gets a shard. Mongod is the process that manages databases and collections. Mongos is a router: it routes writes to the correct mongod instance for autosharding. Mongos also works out which shards will have data for a query. To the client driver, mongos looks more or less like a mongod process (autosharding is transparent to the client drivers).

Figure 4: MongoDB Autosharding

Autosharding increases write and read throughput, and helps with scale out. Replica sets are for high availability and read throughput. You can combine them as shown in figure 5.

Figure 5: MongoDB Autosharding plus Replica Sets for scalable reads, scalable writes, and high availability

You shard on an indexed field in a document. Mongos collaborates with config servers (mongod instances acting as config servers), which hold the shard topology (where the key ranges live). Shards are just normal mongod instances. Config servers hold metadata about the cluster and are also mongod instances.
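The idea of looking up which shard owns a key range can be sketched in a few lines of Python. This is a toy, not the mongos implementation: the range boundaries, the shard names, and the route function are all made up for illustration, and a real topology comes from the config servers rather than from constants.

```python
import bisect

# Hypothetical shard topology: each entry is the lowest key of a range,
# and the shard at the same position owns keys from there up to the
# next boundary. The list of range starts must stay sorted.
RANGE_STARTS = ["", "h", "p"]
RANGE_SHARDS = ["shard0", "shard1", "shard2"]


def route(shard_key):
    """Return the shard whose key range contains shard_key."""
    index = bisect.bisect_right(RANGE_STARTS, shard_key) - 1
    return RANGE_SHARDS[index]


for name in ("Diana Hightower", "Lucas Hightower", "Rick Hightower"):
    print(name, "->", route(name.lower()))
```

A binary search over sorted range boundaries keeps the lookup cheap even with many ranges.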

Shards are further broken down into 64 MB pieces called chunks. A chunk is 64 MB worth of documents for a collection. Config servers record which shard each chunk lives in. The autosharding happens by moving these chunks around and distributing them among individual shards. The mongos processes have a balancer routine that wakes up every so often and checks how many chunks a particular shard has. If a particular shard has too many chunks (nine more chunks than another shard), then mongos starts to move data from one shard to another to balance the data capacity amongst the shards. Once the data is moved, the config servers are updated in a two-phase commit (updates to shard topology are only allowed if all three config servers are up).
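The balancing rule can be sketched as a small simulation. This is a toy under stated assumptions, not MongoDB's balancer: the function name balance and the stopping rule (stop as soon as the gap drops below the threshold) are made up for illustration, and only the threshold of nine chunks comes from the description above.

```python
def balance(chunks_per_shard, threshold=9):
    """Move one chunk at a time from the fullest shard to the emptiest
    until no shard has `threshold` or more chunks than another.

    chunks_per_shard maps a shard name to its chunk count. Returns the
    new counts and the list of (from_shard, to_shard) moves.
    """
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    counts = dict(chunks_per_shard)
    moves = []
    while True:
        fullest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        if counts[fullest] - counts[emptiest] < threshold:
            return counts, moves
        counts[fullest] -= 1
        counts[emptiest] += 1
        moves.append((fullest, emptiest))


counts, moves = balance({"shard0": 20, "shard1": 2})
print(counts, len(moves))
```

Each move in the list stands for one chunk migration, after which the real system would update the config servers.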

The config servers contain a versioned shard topology and are the gatekeepers for autosharding balancing. This topology maps which shard has which keys. The config servers are like DNS servers for shards. The mongos process uses config servers to find where shard keys live. Mongod instances are shards that can be replicated using replica sets for high availability. Mongos and config server processes do not need to be on their own servers and can live on a primary box of a replica set, for example. For sharding you need at least three config servers, and shard topologies cannot change unless all three are up at the same time. This ensures consistency of the shard topology. The full autosharding topology is shown in figure 6. An excellent talk on the internals of MongoDB sharding was done by Kristina Chodorow, author of Scaling MongoDB, at OSCON 2011 if you would like to know more.

Figure 6: MongoDB Autosharding full topology for large deployment including Replica Sets, Mongos routers, Mongod Instance, and Config Servers

Python MongoDB Install guide and getting started tutorial

Installing and setting up MongoDB with Python (Mongo DB tutorial for Python)

See setup guide to see how to install MongoDB.

Setting up Python and MongoDB is quite easy since Python has its own package manager.

To install the MongoDB library for Python on Mac OS X, you would do the following:

$ sudo env ARCHFLAGS='-arch i386 -arch x86_64' python -m easy_install pymongo

To install the MongoDB library for Python on Linux or Windows, do one of the following:

$ easy_install pymongo

$ pip install pymongo

If you don't have easy_install on your Linux box you may have to do some sudo apt-get install python-setuptools or sudo yum install python-setuptools iterations, although it seems to be usually installed with most Linux distributions these days. If easy_install or pip is not installed on Windows, try reformatting your hard disk and installing a real OS, or if that is too inconvenient go here. The key here is to install pymongo.

Once you have it all set up, you can create some code that is equivalent to the first console examples, as shown in figure 1.

Figure 1: Python code listing part 1

Python does have literals for maps, so working with Python is much closer to the JavaScript/console experience from earlier than Java is. Like Java, there are libraries for Python that work with MongoDB (MongoEngine, MongoKit, and more). Even executing queries is very close to the JavaScript experience, as shown in figure 2.

Figure 2: Python code listing part 2

Here is the complete listing to make the cut and paste crowd (like me), happy.

Listing: Complete Python listing

import pymongo
from bson.objectid import ObjectId


connection = pymongo.Connection()

db = connection["tutorial"]
employees = db["employees"]

employees.insert({"name": "Lucas Hightower", 'gender':'m', 'phone':'520-555-1212', 'age':8})

# List every employee in the collection.
cursor = employees.find()
for employee in cursor:
    print employee

# Find one employee by name.
print employees.find({"name":"Rick Hightower"})[0]

# Find employees younger than 35.
cursor = employees.find({"age": {"$lt": 35}})
for employee in cursor:
    print "under 35: %s" % employee


diana = employees.find_one({"_id":ObjectId("4f984cce72320612f8f432bb")})
print "Diana %s" % diana

If you would like to learn more about MongoDB consider the following resources:

Excellent article on MongoDB written by Rick Hightower, one of the founders of Mammatus Technology
MongoDB Training for Java Developers by Mammatus Technology
MongoDB Training for PHP Developers by Mammatus Technology
MongoDB Training for Python Developers by Mammatus Technology
The official MongoDB tutorial at MongoDB.org.

The output for the Python example is as follows:

{u'gender': u'm', u'age': 42.0, u'_id': ObjectId('4f964d3000b5874e7a163895'), u'name': u'Rick Hightower', u'phone':
u'520-555-1212'} 

{u'gender': u'f', u'age': 30, u'_id': ObjectId('4f984cae72329d0ecd8716c8'), u'name': u'Diana Hightower', u'phone':
u'520-555-1212'} 

{u'gender': u'm', u'age': 8, u'_id': ObjectId('4f9e111980cbd54eea000000'), u'name': u'Lucas Hightower', u'phone':
u'520-555-1212'}

MongoDB PHP install guide and tutorial

Installing MongoDB to work with PHP and Apache (tutorial)

Installing and setting up MongoDB to work with PHP

See installing MongoDB to install MongoDB.

Node.js, Ruby, and Python, in that order, are the trendsetter crowd in our industry circa 2012. Java is the corporate crowd, and PHP is the workhorse of the Internet: the "get it done" crowd. You can't have a decent NoSQL solution without having good PHP support. Installing MongoDB to work with PHP is simple.

To install MongoDB support with PHP use pecl as follows:

$ sudo pecl install mongo

Add the mongo.so module to php.ini.

extension=mongo.so

Then assuming you are running it on apache, restart as follows:

$ apachectl stop
$ apachectl start

Figure 1 shows our roughly equivalent code listing in PHP.

Figure 1 PHP code listing

The output for figure 1 is as follows:

Output:
array ( '_id' => MongoId::__set_state(array( '$id' => '4f964d3000b5874e7a163895', )), 'name' => 'Rick Hightower', 
'gender' => 'm', 'phone' => '520-555-1212', 'age' => 42, )

array ( '_id' => MongoId::__set_state(array( '$id' => '4f984cae72329d0ecd8716c8', )), 'name' => 'Diana Hightower', 'gender' => 'f', 
'phone' => '520-555-1212', 'age' => 30, )

array ( '_id' => MongoId::__set_state(array( '$id' => '4f9e170580cbd54f27000000', )), 'gender' => 'm', 'age' => 8, 'name' => 'Lucas Hightower', 
'phone' => '520-555-1212', )

The other half of the equation is in figure 2.

Figure 2 PHP code listing

The output for figure 2 is as follows:

Output
Rick? 
array ( '_id' => MongoId..., 'name' => 'Rick Hightower', 'gender' => 'm', 
'phone' => '520-555-1212', 'age' => 42, )
Diana? 
array ( '_id' => MongoId::..., 'name' => 'Diana Hightower', 'gender' => 'f', 
'phone' => '520-555-1212', 'age' => 30, )
Diana by id? 
array ( '_id' => MongoId::..., 'name' => 'Diana Hightower', 'gender' => 'f', 
'phone' => '520-555-1212', 'age' => 30, )

Here is the complete PHP listing.

PHP complete listing

<?php

// Connect and pick the database and collection.
$m = new Mongo();
$db = $m->selectDB("tutorial");
$employees = $db->selectCollection("employees");

// List every employee.
$cursor = $employees->find();
foreach ($cursor as $employee) {
  echo var_export ($employee, true) . "<br />";
}

// Find by name.
$cursor=$employees->find( array( "name" => "Rick Hightower"));
echo "Rick? <br /> " . var_export($cursor->getNext(), true);

// Find employees younger than 35.
$cursor=$employees->find(array("age" => array('$lt' => 35)));
echo "Diana? <br /> " . var_export($cursor->getNext(), true);

// Find by object id.
$cursor=$employees->find(array("_id" => new MongoId("4f984cce72320612f8f432bb")));
echo "Diana by id? <br /> " . var_export($cursor->getNext(), true);
?>

If you like Object mapping to documents you should try the poorly named MongoLoid for PHP.

Mongo DB is wrong. It is MongoDB. Always put the DB next to Mongo.

MongoDB Install guide and tutorial for Java

Installing MongoDB to work with Java (MongoDB Java Tutorial)

Java and MongoDB

See install guide for a quick MongoDB Tutorial.

Pssst! Here is a dirty little secret. Don't tell your Node.js friends or Ruby friends this. More Java developers use MongoDB than Ruby and Node.js developers do. They just are not as loud about it. Using MongoDB with Java is very easy.

The language driver for Java seems to be a straight port of something written with JavaScript in mind, and the usability suffers a bit because Java does not have literals for maps/objects like JavaScript does. Thus an API written for a dynamic language does not quite fit Java. There is room for a lot of usability improvement in the MongoDB Java language driver (hint, hint). There are alternatives to using just the straight MongoDB language driver, but I have not picked a clear winner (mjorm, morphia, and Spring Data MongoDB support). I'd love just some usability improvements in the core driver without the typical Java annotation fetish, perhaps a nice Java DAO DSL (see section on criteria DSL if you follow the link).

Setting up Java and MongoDB

Let's go ahead and get started then with Java and MongoDB.

Download latest mongo driver from github (https://github.com/mongodb/mongo-java-driver/downloads), then put it somewhere, and then add it to your classpath as follows:

$ mkdir -p ~/tools/mongodb/lib
$ cp mongo-2.7.3.jar ~/tools/mongodb/lib

The steps below assume you are using Eclipse, but if you are not, by now you know how to translate these instructions to your IDE anyway. The short story is to put the mongo jar file on your classpath. You can put the jar file anywhere, but I like to keep mine in ~/tools/.

If you are using Eclipse, it is best to create a classpath variable so other projects can use the same variable and not go through the trouble. Create a new Eclipse Java project in a new workspace. Now right-click your new project, open the project properties, and go to Java Build Path->Libraries->Add Variable->Configure Variable, as shown in figure 7.

Figure 7: Adding Mongo jar file as a classpath variable in Eclipse

For Eclipse from the "Project Properties->Java Build Path->Libraries", click "Add Variable", select "MONGO", click "Extend…", select the jar file you just downloaded.

Figure 8: Adding Mongo jar file to your project

Once you have it all setup, working with Java and MongoDB is quite easy as shown in figure 9.

Figure 9 Using MongoDB from Eclipse

The above is roughly equivalent to the console/JavaScript code that we were doing earlier. The BasicDBObject is a type of Map with some convenience methods added. The DBCursor is like a JDBC ResultSet. You execute queries with DBCollection. There is no query syntax, just finder methods on the collection object. The output from the above is:

Out:
{ "_id" : { "$oid" : "4f964d3000b5874e7a163895"} , "name" : "Rick
Hightower" , "gender" : "m" , "phone" : "520-555-1212" ,
"age" : 42.0}
{ "_id" : { "$oid" : "4f984cce72320612f8f432bb"} , "name" : "Diana
Hightower" , "gender" : "f" , "phone" : "520-555-1212" ,
"age" : 30}

Once you create some documents, querying for them is quite simple, as shown in figure 10.

Figure 10: Using Java to query MongoDB

The output from figure 10 is as follows:

Rick?
{ "_id" : { "$oid" : "4f964d3000b5874e7a163895"} , "name" : "Rick
Hightower" , "gender" : "m" , "phone" : "520-555-1212" ,
"age" : 42.0}
Diana?
{ "_id" : { "$oid" : "4f984cae72329d0ecd8716c8"} , "name" : "Diana
Hightower" , "gender" : "f" , "phone" : "520-555-1212" ,
"age" : 30}

Diana by object id?
{ "_id" : { "$oid" : "4f984cce72320612f8f432bb"} , "name" : "Diana
Hightower" , "gender" : "f" , "phone" : "520-555-1212" ,
"age" : 30}

Just in case anybody wants to cut and paste any of the above, here it is again all in one go in the following listing.

Listing: Complete Java Listing

package com.mammatustech.mongo.tutorial;

import org.bson.types.ObjectId;

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.Mongo;
import com.mongodb.DB;

public class Mongo1Main {
 public static void main (String [] args) throws Exception {
  Mongo mongo = new Mongo();
  DB db = mongo.getDB("tutorial");
  DBCollection employees = db.getCollection("employees");
  employees.insert(new BasicDBObject().append("name", "Diana Hightower")
    .append("gender", "f").append("phone", "520-555-1212").append("age", 30));
  DBCursor cursor = employees.find();
  while (cursor.hasNext()) {
   DBObject object = cursor.next();
   System.out.println(object);
  }
  
  //> db.employees.find({name:"Rick Hightower"})
  cursor=employees.find(new BasicDBObject().append("name", "Rick Hightower"));
  System.out.printf("Rick?\n%s\n", cursor.next());
  
  //> db.employees.find({age:{$lt:35}}) 
  BasicDBObject query = new BasicDBObject();
  query.put("age", new BasicDBObject("$lt", 35));
  cursor=employees.find(query);
  System.out.printf("Diana?\n%s\n", cursor.next());
  
  //> db.employees.findOne({_id : ObjectId("4f984cce72320612f8f432bb")})
  DBObject dbObject = employees.findOne(new BasicDBObject().append("_id", 
    new ObjectId("4f984cce72320612f8f432bb")));
  System.out.printf("Diana by object id?\n%s\n", dbObject);
 }
}

Please note that the above is completely missing any error checking or resource cleanup. You will need to do some, of course (try/catch/finally, close the connection, you know, that sort of thing).

MongoDB install guide and getting started guide

Getting up and running with MongoDB is fairly easy. MongoDB is a good way to get introduced to the NoSQL world too. In less than five minutes, you can be messing around with the console app and learning MongoDB. This short tutorial and install guide shows how to use JavaScript and commands from the MongoDB terminal.

Installing MongoDB: Guide to getting started with MongoDB and install guide

Now mix in some code samples to try out along with the concepts.

To install MongoDB, go to their download page, then download and untar/unzip the download to ~/mongodb-platform-version/. Next you want to create the directory that will hold the data and create a mongodb.config file (/etc/mongodb/mongodb.config) that points to said directory as follows:

Listing: Installing MongoDB

$ sudo mkdir -p /etc/mongodb/data


$ cat /etc/mongodb/mongodb.config 
dbpath=/etc/mongodb/data

The /etc/mongodb/mongodb.config has one line dbpath=/etc/mongodb/data that tells mongo where to put the data. Next, you need to link mongodb to /usr/local/mongodb and then add it to the path environment variable as follows:

Listing: Setting up MongoDB on your path

$ sudo ln -s  ~/mongodb-platform-version/  /usr/local/mongodb
$ export PATH=$PATH:/usr/local/mongodb/bin

Run the server passing the configuration file that we created earlier.

Listing: Running the MongoDB server

$ mongod --config /etc/mongodb/mongodb.config

Short tutorial on using MongoDB

Mongo comes with a nice console application called mongo that lets you execute commands and JavaScript. JavaScript to Mongo is what PL/SQL is to Oracle's database. Let's fire up the console app and poke around.

Firing up the mongo console application

$ mongo
MongoDB shell version: 2.0.4
connecting to: test
…
> db.version()
2.0.4
>

One of the nice things about MongoDB is the self describing console. It is easy to see what commands a MongoDB database supports with the db.help() as follows:

Client: mongo db.help()

> db.help()
DB methods:
db.addUser(username, password[, readOnly=false])
db.auth(username, password)
db.cloneDatabase(fromhost)
db.commandHelp(name) returns the help for the command
db.copyDatabase(fromdb, todb, fromhost)
db.createCollection(name, { size : ..., capped : ..., max : ... } )
db.currentOp() displays the current operation in the db
db.dropDatabase()
db.eval(func, args) run code server-side
db.getCollection(cname) same as db['cname'] or db.cname
db.getCollectionNames()
db.getLastError() - just returns the err msg string
db.getLastErrorObj() - return full status object
db.getMongo() get the server connection object
db.getMongo().setSlaveOk() allow this connection to read from the nonmaster member of a replica pair
db.getName()
db.getPrevError()
db.getProfilingStatus() - returns if profiling is on and slow threshold 
db.getReplicationInfo()
db.getSiblingDB(name) get the db at the same server as this one
db.isMaster() check replica primary status
db.killOp(opid) kills the current operation in the db
db.listCommands() lists all the db commands
db.logout()
db.printCollectionStats()
db.printReplicationInfo()
db.printSlaveReplicationInfo()
db.printShardingStatus()
db.removeUser(username)
db.repairDatabase()
db.resetError()
db.runCommand(cmdObj) run a database command.  if cmdObj is a string, turns it into { cmdObj : 1 }
db.serverStatus()
db.setProfilingLevel(level,{slowms}) 0=off 1=slow 2=all
db.shutdownServer()
db.stats()
db.version() current version of the server
db.getMongo().setSlaveOk() allow queries on a replication slave server
db.fsyncLock() flush data to disk and lock server for backups
db.fsyncUnock() unlocks server following a db.fsyncLock()

Notice that some of the commands refer to concepts we discussed earlier. Now let's create a collection of employees and do some create, read, and update operations on it.

Create Employee Collection

> use tutorial;
switched to db tutorial
> db.getCollectionNames();
[ ]
> db.employees.insert({name:'Rick Hightower', gender:'m', phone:'520-555-1212', age:42});
Mon Apr 23 23:50:24 [FileAllocator] allocating new datafile /etc/mongodb/data/tutorial.ns, ...

The use command switches to a database. If that database does not exist, it will be lazily created the first time we access it (write to it). The db object refers to the current database. The current database does not have any document collections to start with (this is why db.getCollectionNames() returns an empty list). To create a document collection, just insert a new document. Collections, like databases, are lazily created when they are actually used. You can see that two collections were created when we inserted our first document into the employees collection, as follows:

> db.getCollectionNames();
[ "employees", "system.indexes" ]

The first collection is our employees collection and the second collection is used to hold onto indexes we create.

To list all employees you just call the find method on the employees collection.

> db.employees.find()
{ "_id" : ObjectId("4f964d3000b5874e7a163895"), "name" : "Rick Hightower", 
    "gender" : "m", "phone" : "520-555-1212", "age" : 42 }

The above is the query syntax for MongoDB. There is not a separate SQL like language. You just execute JavaScript code, passing documents, which are just JavaScript associative arrays, err, I mean JavaScript objects. To find a particular employee, you do this:

> db.employees.find({name:"Bob"})

He quit so to find another employee, you would do this:

> db.employees.find({name:"Rick Hightower"})
{ "_id" : ObjectId("4f964d3000b5874e7a163895"), "name" : "Rick Hightower", "gender" : "m", "phone" : "520-555-1212", "age" : 42 }

The console just prints out the document right to the screen. I don't feel that old. At least I am not 100 as shown by this query:

> db.employees.find({age:{$lt:100}})
{ "_id" : ObjectId("4f964d3000b5874e7a163895"), "name" : "Rick Hightower", "gender" : "m", "phone" : "520-555-1212", "age" : 42 }

Notice that to get employees younger than 100, you pass a document with a subdocument: the key is the operator ($lt), and the value is the operand (100). Mongo supports all of the operators you would expect, like $lt for less than, $gt for greater than, etc. If you know JavaScript, it is easy to inspect fields of a document, as follows:

> db.employees.find({age:{$lt:100}})[0].name
Rick Hightower
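To show how a query document with an operator subdocument can be read, here is a toy matcher in Python. It is only a sketch of the idea and not how MongoDB evaluates queries: the matches function and the OPERATORS table are made up for illustration, and only a few comparison operators are covered.

```python
import operator

# Only a handful of comparison operators are sketched here.
OPERATORS = {
    "$lt": operator.lt,
    "$lte": operator.le,
    "$gt": operator.gt,
    "$gte": operator.ge,
}


def matches(document, query):
    """Return True if document satisfies every condition in query."""
    for field, condition in query.items():
        if field not in document:
            return False
        value = document[field]
        if isinstance(condition, dict):
            # Operator subdocument such as {"$lt": 100}.
            if not all(OPERATORS[op](value, operand)
                       for op, operand in condition.items()):
                return False
        elif value != condition:
            # A plain value means an equality test.
            return False
    return True


employees = [
    {"name": "Rick Hightower", "age": 42},
    {"name": "Diana Hightower", "age": 30},
]
print([e["name"] for e in employees if matches(e, {"age": {"$lt": 35}})])
```

Because the query is just a dictionary, it can be built up in code like any other data structure.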

If we were going to query, sort or shard on employees.name, then we would need to create an index as follows:

> db.employees.ensureIndex({name:1}); //ascending index, descending would be -1

Indexing by default is a blocking operation, so if you are indexing a large collection, it could take several minutes and perhaps much longer. This is not something you want to do casually on a production system. There are options to build indexes as a background task and to set up a unique index, there are complications around indexing on replica sets, and much more. If you are running queries that rely on certain indexes to be performant, you can check to see if an index exists with db.employees.getIndexes(). You can also see a list of indexes as follows:

> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "tutorial.employees", "name" : "_id_" }

By default all documents get an object id. If you don't give an object an _id, it will be assigned one by the system (like a criminal suspect gets a lawyer). You can use that _id to look up an object with findOne as follows:

> db.employees.findOne({_id : ObjectId("4f964d3000b5874e7a163895")})
{ "_id" : ObjectId("4f964d3000b5874e7a163895"), "name" : "Rick Hightower", 
   "gender" : "m", "phone" : "520-555-1212", "age" : 42 }

To finish this MongoDB tutorial, continue with the tutorial/install guide for using MongoDB with Java, with PHP, or with Python.

Thursday, September 27, 2012

MongoDB as a Gateway Drug to NoSQL

MongoDB's combination of features, simplicity, community, and documentation makes it successful. The product itself has high availability, journaling (which is not always a given with NoSQL solutions), replication, auto-sharding, map reduce, and an aggregation framework (so you don't have to use map-reduce directly for simple aggregations). MongoDB can scale reads as well as writes.

NoSQL, in general, has been reported to be more agile than full RDBMS/SQL due to problems with schema migration of SQL-based systems. Having been on large RDBMS systems and witnessing the trouble and toil of doing SQL schema migrations, I can tell you that this is a real pain to deal with. RDBMS/SQL often requires a lot of upfront design or a lot of schema migration later. In this way, NoSQL is viewed as more agile in that it allows the applications to worry about differences in versioning instead of forcing schema migration and larger upfront designs. The MongoDB crowd says that MongoDB has a dynamic schema, not no schema (sort of like the dynamic language versus untyped language argument from Ruby, Python, etc. developers).
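Letting the application handle version differences might look like the following sketch in Python. The document shapes are hypothetical: it assumes early employee documents stored a single "phone" string and later ones store a "phones" list, and the function name phone_numbers is made up for illustration.

```python
def phone_numbers(employee):
    """Return an employee's phone numbers regardless of document version.

    Hypothetical history: early documents stored a single "phone"
    string, later ones store a "phones" list.
    """
    if "phones" in employee:
        return list(employee["phones"])
    if "phone" in employee:
        return [employee["phone"]]
    return []


old_style = {"name": "Rick Hightower", "phone": "520-555-1212"}
new_style = {"name": "Diana Hightower", "phones": ["520-555-1212", "520-555-1213"]}
print(phone_numbers(old_style))
print(phone_numbers(new_style))
```

Both shapes can live in the same collection, so old documents can be upgraded lazily or not at all.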

MongoDB does not seem to require a lot of ramp-up time. Their early success may be attributed to the quality and ease of use of their client drivers, which were more of an afterthought for other NoSQL solutions ("Hey, here is our REST or XYZ wire protocol, deal with it yourself"). Compared to other NoSQL solutions, it has been said that MongoDB is easier to get started with. Also, with MongoDB many DevOps things come cheaply or free. This is not to say that there are never any problems or that one should not do capacity planning. MongoDB has become for many an easy on-ramp for NoSQL, a gateway drug if you will.

MongoDB was built to be fast. Speed is a good reason to pick MongoDB. Raw speed shaped the architecture of MongoDB. Data is stored in memory using memory-mapped files. This means that the virtual memory manager, a very highly optimized system function of modern operating systems, does the paging/caching. MongoDB also pads areas around documents so that they can be modified in place, making updates less expensive. MongoDB uses a binary protocol instead of REST like some other implementations. Also, data is stored in a binary format instead of text (JSON, XML), which can speed up writes and reads.
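The two ideas in that paragraph, memory-mapped files and padding for in-place updates, can be sketched with Python's standard `mmap` module. This is an illustration of the general technique only, not MongoDB's storage engine: the fixed 32-byte slot size, the zero-byte padding, and the `write_record`/`read_record` helpers are all assumptions made up for the example.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 32  # hypothetical fixed slot size, including padding


def write_record(mm, slot, payload):
    """Overwrite one slot in place, padding the unused bytes with zeros."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload no longer fits in its padded slot")
    start = slot * RECORD_SIZE
    mm[start:start + RECORD_SIZE] = payload.ljust(RECORD_SIZE, b"\x00")


def read_record(mm, slot):
    """Read one slot back and strip the zero padding."""
    start = slot * RECORD_SIZE
    return bytes(mm[start:start + RECORD_SIZE]).rstrip(b"\x00")


fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, RECORD_SIZE * 4)  # room for four slots
    with mmap.mmap(fd, RECORD_SIZE * 4) as mm:
        write_record(mm, 1, b"name=Rick")
        # The record grows, but still fits in the padding, so it is
        # rewritten in place and no other slot has to move.
        write_record(mm, 1, b"name=Rick;lang=Java")
        result = read_record(mm, 1)
        mm.flush()  # ask the OS to write dirty pages back to the file
finally:
    os.close(fd)
    os.remove(path)
```

Writes go to memory pages that the operating system maps onto the file. When those pages reach disk is up to the OS unless the program asks for a flush.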

Another reason MongoDB may do well is that it is easy to scale out reads and writes with replica sets and autosharding. You might expect that if MongoDB is so great there would be a lot of big names using it, and there are: MTV, Craigslist, Disney, Shutterfly, Foursquare, bit.ly, The New York Times, Barclays, The Guardian, SAP, Forbes, National Archives UK, Intuit, GitHub, LexisNexis and many more.

If you would like to learn more about MongoDB consider the following resources:

Excellent article on MongoDB written by Rick Hightower, one of the founders of Mammatus Technologies
MongoDB Training for Java Developers by Mammatus Technology
MongoDB Training for PHP Developers by Mammatus Technology
MongoDB Training for Python Developers by Mammatus Technology
The official MongoDB tutorial at MongoDB.org.
This post on MongoDB Caveats and Warnings.

Introduction to NoSQL Architecture with MongoDB

Using MongoDB is a good way to get started with NoSQL. Learning MongoDB introduces concepts that are common in other NoSQL solutions.

From no NoSQL to sure why not

The first time I heard of something that could actually be classified as NoSQL was from Warner Onstine, who is currently working on some CouchDB articles for InfoQ. Warner was going on and on about how great CouchDB was. This was before the term NoSQL was coined. I was skeptical, and had just been on a project that was converted from an XML document database back to Oracle due to issues with the XML database implementation. I did the conversion. I did not pick the XML database solution, or decide to convert it to Oracle. I was just the consultant guy on the project (circa 2005) who did the work after the guy who picked the XML database moved on and the production issues started to happen.

This was my first document database, and the experience fed my skepticism and distrust of databases that were not an established RDBMS (Oracle, MySQL, etc.). But this one incident did not create the skepticism. Let me explain.

First there were all of the Object Oriented Database (OODB) folks, preaching for years that it was going to be the next big thing. It has not happened yet. I hear 2013 will be the year of the OODB, just like 1997 was going to be. Then there were the XML database people preaching something very similar, which did not seem to happen either, at least not at the pervasive scale at which NoSQL is happening.

My take was: ignore this document-oriented approach and NoSQL, and see if it goes away. To be successful, it needs some community behind it, some clear use case wins, and some corporate muscle/marketing, and I will wait until then. Sure, the big guys need something like Dynamo and BigTable, but I assumed it was a niche. Then there were BigTable, MapReduce, Google App Engine, and Dynamo in the news with white papers. Then Hadoop, Cassandra, MongoDB, Membase, HBase, and the constant small but growing drum beat of change and innovation. Even skeptics have limits.

Then in 2009, Eric Evans coined the term NoSQL to describe the growing list of open-source distributed databases. Now there is this NoSQL movement, three years in and counting. Like Ajax, giving something a name seems to inspire its growth, or perhaps we don't name movements until there is already a groundswell. Either way, having a name like NoSQL with a common vision is important to changing the world, and you can see the community, use case wins, and corporate marketing muscle behind NoSQL. It has gone beyond the buzz stage. 2009 was also the year of the first project I worked on that had mass scale-out requirements and used something classified as part of NoSQL.

MongoDB was released by 10gen in 2009, when the NoSQL movement was in full swing. Somehow MongoDB managed to move to the front of the pack in terms of mindshare, followed closely by Cassandra and others (see figure 1). MongoDB is listed as a top job trend on Indeed.com, #2 to be exact (behind HTML 5 and before iOS), which is fairly impressive given that MongoDB was a relatively late comer to the NoSQL party.

Figure 1: MongoDB leads the NoSQL pack

MongoDB takes early lead in NoSQL adoption race.

MongoDB is a distributed, document-oriented, schema-less storage solution. Its closest cousins seem to be CouchDB and Couchbase. MongoDB uses JSON-style documents to represent, query and modify data. Internally, data is stored in BSON (binary JSON). MongoDB supports many clients/languages, namely Python, PHP, Java, Ruby, C++, etc. This article is going to introduce key MongoDB concepts and then show basic CRUD/Query examples in JavaScript (part of the MongoDB console application), Java, PHP and Python.
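To give a feel for what "JSON-style documents to represent and query data" means, here is a small pure-Python sketch that runs without a MongoDB server. The `employees` data and the `matches`/`find` helpers are hypothetical and only mimic the simplest equality form of a query document. MongoDB's real query language is far richer, and this is not how the server or its drivers are implemented.

```python
import json

# A "collection" is modeled here as a plain list of JSON-style documents.
employees = [
    {"name": "Rick", "dept": "Engineering", "langs": ["Java", "Python"]},
    {"name": "Diana", "dept": "Sales", "langs": []},
    {"name": "Whitney", "dept": "Engineering", "langs": ["PHP"]},
]


def matches(doc, query):
    """True if doc has every field named in query, with an equal value."""
    return all(field in doc and doc[field] == value
               for field, value in query.items())


def find(collection, query):
    """Return the documents that match a query-by-example document."""
    return [doc for doc in collection if matches(doc, query)]


# The query is itself a document: "give me documents that look like this".
engineers = find(employees, {"dept": "Engineering"})
print(json.dumps(engineers[0], indent=2))
```

Notice that documents in the same collection do not need identical fields, and that the query has the same shape as the data it is looking for.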

Disclaimer: I have no ties with the MongoDB community and no vested interests in their success or failure. I am not an advocate. I merely started to write about MongoDB because they seem to be the most successful, seem to have the most momentum for now, and in many ways typify the very diverse NoSQL market. MongoDB's success is largely due to having easy-to-use, familiar tools. I'd love to write about CouchDB, Cassandra, Couchbase, Redis, HBase or any number of NoSQL solutions if there were just more hours in the day, or stronger coffee, or if coffee somehow extended how much time I had. Redis seems truly fascinating.

MongoDB seems to have the right mix of features and ease of use, and has become a prototypical example of what a NoSQL solution should look like. MongoDB can be used as a sort of base of knowledge for understanding other solutions (compare/contrast). This article is not an endorsement. That said, if you want to get started with NoSQL, MongoDB is a great choice.

If you would like to learn more about MongoDB consider the following resources:

Excellent article on MongoDB written by Rick Hightower, one of the founders of Mammatus Technologies
MongoDB Training for Java Developers by Mammatus Technology
MongoDB Training for PHP Developers by Mammatus Technology
MongoDB Training for Python Developers by Mammatus Technology
The official MongoDB tutorial at MongoDB.org.
This post on MongoDB as a gateway drug to NoSQL.
