Friday, April 25, 2014

Well put! When should you use MongoDB vs Couchbase versus Redis...

I saw this GEM on google+ By Riyad Kalla..... My thoughts are in parens and bold and red.

Riyad Kalla

Shared publicly  -  Oct 3, 2011
"Should I use MongoDB or CouchDB (or Redis)?"

I see this question asked a lot online. Fortunately, once you get familiar enough with each of the NoSQL solutions and their ins and outs, strengths and weaknesses, it becomes much clearer when you would use one over the other.

From the outside, so many of them look the same, especially Mongo and Couch. Below I will try and break down the big tie-breakers that will help you decide between them.

[**] Querying - If you need the ability to dynamically query your data like SQL, MongoDB provides a query syntax that will feel very similar to you. CouchDB is getting a query language in the form of UNQL in the next year or so, but it is very much under development and we have no knowledge of the impact on view generation and query speed it will have yet, so I cannot recommend this yet.

[**] Master-Slave Replication ONLY - MongoDB provides (great) support for master-slave replication across the members of what they call a "replica set". Unfortunately you can only write to the master in the set and read from all.

If you need multiple masters in a Mongo environment, you have to set up sharding in addition to replica sets and each shard will be its own replica set with the ability to write to each master in each set. Unfortunately this leads to much more complex setups and you cannot have every server have a full copy of the data set (which can be handy/critical for some geographically dispersed systems - like a CDN or DNS service).

(I have seen this. It is gets complex fast.)

[**] Read Performance - Mongo employs a custom binary protocol (and format) providing at least a magnitude times faster reads than CouchDB at the moment. There is work in the CouchDB community to try and add a binary format support in addition to JSON, but it will still be communicated over HTTP.

(What about CouchBase? Their reads are very fast. Also... BSON is as$. Fast binary my .... )

[**] Provides speed-oriented operations like upserts and update-in-place mechanics in the database.

[**] Master-Master Replication - Because of the append-only style of commits Couch does every modification to the DB is considered a revision making conflicts during replication much less likely and allowing for some awesome master-master replication or what Cassandra calls a "ring" of servers all bi-directionally replicating to each other. It can even look more like a fully connected graph of replication rules.

[**] Reliability of the actual data store backing the DB. Because CouchDB records any changes as a "revision" to a document and appends them to the DB file on disk, the file can be copied or snapshotted at any time even while the DB is running and you don't have to worry about corruption. It is a really resilient method of storage. 

[**] Replication also supports filtering or selective replication by way of filters that live inside the receiving server and help it decide if it wants a doc or not from another server's changes stream (very cool).

Using an EC2 deployment as an example, you can have a US-WEST db replicate to US-EAST, but only have it replicate items that meet some criteria. like the most-read stories for the day or something so your east-coast mirror is just a cache of the most important stories that are likely getting hit from that region and you leave the long-tail (less popular) stories just on the west coast server. 

(** Nice feature. Replication of only things that matter most. I can see where this would be important.
MongoDB is fundamentally broken for high-speed writes. If you have to stream and save a lot of data, MongoDB is not it. If you have a typical db app, MongoDB is great. I give them a D+ on tech and an A++ on tooling.

Another example of this is say you use CouchDB to store data about your Android game in the app store that everyone around the world plays. Say you have 5 million registered users, but only 100k of them play regularly. Say that you want to duplicate the most activeaccounts to 10 other servers around the globe, so the users that play all the time get really fast response times when they login and update scores or something. You could have a big CouchDB setup in the west code, and then much smaller/cheaper ones spread out across the world in a few disparate VPS servers that all use a filtered replication from your west coast master to only duplicate the most active players.

(Good example!)

[**] Mobile platform support. CouchDB actually has installs for iOS and Android. When you combine the ability to run Couch on your mobile devices AND have it bidirectionally sync back to a master DB when it is online and just hold the results when it is offline, it is an awesome combination.

(Comparing apples and oranges. CouchBase would be a better comparison to MongoDB. I know it is derived from CouchDB. CouchBase is fast and scalable and has the performance the tuned the hell out of the CouchDB core. To me it makes little sense to compare CouchDB to MongoDB. I would consider CouchBase on a large write intensive app. I would not consider MongoDB or CouchDB for many high user count apps.)

[**] Queries are written using map-reduce functions. If you are coming from SQL this is a really odd paradigm at first, but it will click relatively quickly and when it does you'll see some beauty to it. These are some of the best slides describing the map-reduce functionality in couch I've read:

[**] Every mutation to the data in the database is considered a "revision" and creates a duplicate of the doc. This is excellent for redundancy and conflict resolution but makes the data store bigger on disk. Compaction is what removes these old revisions.

[**] HTTP REST JSON interaction only. No binary protocol (yet - vote for http://ubjson.org support), everything is callable and testable from the command line with curl or your browser. Very easy to work with.

These are the biggest features, I would say if you need dynamic query support or raw speed is critical, then Mongo. If you need master-master replication or sexy client-server replication for a mobile app that syncs with a master server every time it comes online, then you have to pick Couch. At the root these are the sort of "deal maker/breaker" divides.

Fortunately once you have your requirements well defined, selecting the right NoSQL data store becomes really easy. If your data set isn't that stringent on its requirements, then you have to look at secondary features of the data stores to see if one speaks to you... for example, do you like that CouchDB only speaks in raw JSON and HTTP which makes it trivial to test/interact with it from the command line? Maybe you hate the idea of compaction or map-reduce, ok then use Mongo. Maybe you fundamentally like the design of one over the other, etc.etc.

If anyone has questions about Redis or others, let me know and I'll expand on how those fit into this picture. 

You hear Redis described a lot as a "data structure" server, and if you are like me this meant nothing to you for the longest time until you sat down and actually used Redis, then it probably clicked.

If you have a problem you are trying to solve that would be solved really elegantly with a List, Set, Hash or Sorted Set you need to take a look at Redis. The query model is simple, but there are a ton of operations on each of the data structure types that allow you to use them in really robust ways... like checking the union between two sets, or getting the members of a hash value or using the Redis server itself as a sub/pub traffic router (yea, it supports that!)

Redis is not a DB in the classic sense. If you are trying to decide between MySQL and MongoDB because of the dynamic query model, Redis is not the right choice. In order to map you data to the simple name/value structure in Redis and the simplified query approach, you are going to spend a significant amount of time trying to mentally model that data inside of Redis which typically requires a lot of denormalization.

(Really? How is that different then moving from RDBMS and BSON/JSON storage. BSON/JSON storage being a map. Seems like rich map... with a lot more operations.)

If you are trying to deploy a robust caching solution for your app and memcache is too simple, Redis is probably perfect.

If you are working on a queuing system or a messaging system... chances are Redis is probably perfect.

If you have a jobs server that grinds through prioritized jobs and you have any number of nodes in your network submitting work to the job server constantly along with a priority, look no further, Redis is the perfect fit. Not only can you use simple Sorted Sets for this, but the performance will be fantastic (binary protocol) 

(Hmmmm.. Binary is faster but not usually that big of a difference for most apps.  IO / persistence. Journaling. Etc. Thread handling, async I/O events. etc. JSON is pretty easy to parse. And you can get within 50% to 80% of the fastest binary serialization. I know. I have done it.)

AND you get the added win of the database-esque features Redis has like append only logging and flushing changes to disk as well as replication if you ever grew your jobs server beyond a single node.

A better comparison against Redis might be hazelcast or perhaps hazelcast + vertx, but here you are dealing with Java Collection API instead of wire protocol, but the rest is similar. Queue, Collections, Maps, Lists, etc.)

NOTE: Redis clustering has been in the works for a long time now. Salvatore has been doing some awesome work with it and it is getting close to launch so if you have a particularly huge node distribute and want a robust/fast processing cluster based on Redis, you should be able to set that up soon.

(Vaporware opensource software. :)
Redis has some of these more classic database features, but it is not targeted at competing with Mongo or Couch or MySQL. It really is a robust, in-memory data structure server AND if you happen to need very well defined data structures to solve your problem, then Redis is the tool you want. If your data is inherently "document" like in nature, sticking it in a document store like Mongo or CouchDB may just make a lot more mental-model sense for you.

NOTE: I wouldn't underestimate how important mentally understanding your data model is. Look at Redis, and if you are having a hard time mapping your data to a List, Set or Hash you probably shouldn't use it. 

(Isn't BSON, JSON, MongoDB documents just a map. Tell me how a Redis map is different than a MongoDB to model you data model?)

Also note that you WILL use many more data structures than you might realize and it may feel odd... for example a twitter app may have a single user as a hash, and then they might have a number of lists of all the people they follow and follow them -- you would need this duplication to make querying in either direction fast. This was one of the hardest or most unnatural things I experienced when trying to solve more "classic DB" problems with Redis and what helped me decide when to use it and when not to use it.

I would say that Redis compliments most systems (whether it is a classic SQL or NoSQL deployment) wonderfully in most cases in a caching capacity or queue capacity.

If you are working on a simple application that just needs to store and retrieve simple values really quickly, then Redis is a perfectly valid fit. For example, if you were working on a high performance algorithmic trading and you were pulling ticker prices out of a firehose and needing to store them at an insane rate so they could be processed, Redis is exactly the kind of datastore you would want to turn to for that -- definitely not Mongo, CouchDB or MySQL.

(CouchBase can do some damn fast writes and reads. )
IMPORTANT: Redis's does not have a great solution to operating in environments where the data set is bigger than ram and as of right now solutions for "data bigger than ram" has been abandoned. For the longest time this was one of the gripes with Redis and Salvatore (http://twitter.com/#!/antirez) solved this problem with the VM approach. This was quickly deprecated by the diskstore approach after some reported failures and unhappiness with how VM was behaving.

Last I read (a month ago?) was that Salvatore wasn't entirely happy with how diskstore turned out either and that attempts should really be made to keep Redis's data set entirely in memory when possible for the best experience.

(Hazelcast is the same way more or less wrt to it should be in memory.)
I say this not because it is impossible, but just to be aware of the "preferred" way to use Redis so if you have a 100GB data set and were thinking of throwing it on a machine with 4GB of ram, you probably don't want to do that. It may not perform or behave like you want it to.

ADDENDUM: I answered the question about Couch vs Mongo again on Quora with some more tech details if you are interested:

Ok.. I think CouchBase is a data store that I trust. To me the question for mainstream seems to be MongoDB, CouchBase, and Redis, and maybe CouchDB for client side / mobile sync... sort of the Lotus Notes of NoSQL. Most people leave out Hazelcast, Inispan, etc....


Popularity wise, I can see why the comparision exists:

(For some of the apps that I have worked on CouchBase, Cassandra, Redis, Hazelcast, Vertx were the only ones of the lot I could consider. And we picked Hazelcast, and Vertx for the last app. We have also had good experience with Redis and we strongly considered CouchBase. In a couple of instances, we rolled our own with great results. MongoDB was tried and failed in a high-speed writer even after getting tons of consulting. I am not against MongoDB, but if you have a high speed write, high volume app, you should think twice.)

My thoughts on mongoDB FUD

Repeated inilne here:

Some brilliant MongoDB analysis algorithm to compare OracleCoherence and MongoDB.

Then I googled "Coherence sucks"... 1.7 Million hits while "MongoDB sucks" only has 43,000 hits.

Clearly MongoDB 40x less sucky than Coherence!

If you can't find a blog complaining about some tech, then you are not looking very hard.

MongoDB has some warts. It fixed some. Some are being fixed. It is not right for every application.  Developers have to read the documents before using it but the same can be said for all other solutions. 

I was in a meeting the other day. We wanted to use MySQL to stage some data so QA could see the results for testing. It is high-speed writes app.  A dev wanted to use MongoDB... I was like NOOOO! Then he said MySQL would not scale... I WAS LIKE ARGH! Sometimes, I want to punch myself in the eye when I hear people regurgitate crap they read.

But spilling this FUD does not help either.  There are many apps where MongoDB would work out fine.  There are many where MySQL, Oracle or Cassandra would be a much better choice. Read. Learn. Don't listened to people who have a vested interests (or don't only listen to them).

How many projects I have been on where someone says, why is this listing so slow. How many records are in the table? 10,000,000. Ok what are you sorting on? Last name? I see. What indexes are on this table? BLANK STARE? PUNCH ME IN THE NUTS! F! F! F! Really? Do I go on TSS and say how slow SQL is? No! Knowing the tool is the developer's job! Writing a tool that is fool proof is impossible. 

The same things they say about MongoDB, 20 years ago the MAINFRAME DB guys were saying about Oracle DB. I think Mongo's success has more to do with marketing and ease-of-use than technical merit, but this does not mean it does not have a place or some merit. The same could be said about a lot of companies. Cough! cough!

Here is a point to take home. If you don't fsync on every call, you could lose data. If you do fsync on every call, you will not scale! (For Java devs, outputstream.flush()). If you use a single machine, have a DBA who is monitoring things from time to time, and doing backups, it is probably ok for a lot of apps. If you can't afford downtime or data loss, you need replication. As reliable as disks are, they fail. You can mitigate failure with RAID and hot swap-able backups. All replication has failure edge cases.

There is no magic in the world. If you replicate the data, then you have a better chance of it not getting loss.

If you pick a DB that is geared for high-speed reads and low writes, and try to scale it out to do high-speed writes, you will hurt yourself. At a minimum you will spend a lot more money on hardware. Last I checked MongoDB uses MMap and MMap uses virtual memory, which gets paged in/out by the OS in the most efficient manner possible, which is actually quite good, but.... If you store a giant BTree in memory and then ask the OS to page it in and out of memory to HD in small pieces by nature this does a lot of disk seeking. It also by nature is going to get behind, and the longer it gets behind the more data loss you will have if you use a single machine. This is why the last five versions of MongoDB come with journaling turned on by default because if you are running MongoDB on a single machine, you have to periodically flush or you have a high probability of data loss in an outage due the hysteresis of the eventual consistency model. If you replicate to at least two machines, you can reduce the data loss (also if you use the it-must-exist-on-the-other-box-option of data safety that comes with MongoDB).

Data safety is an application requirement. It has a cost.  Less = faster. More = less fast (or more expensive to go fast). You have to balance that cost with how safe you want the data. Can you afford a few virtual tractor sales getting lost on Farmville then dial back the data safety a bit? Security is a continuum like data safety. It is more secure to keep your servers in a closet with no Internet access. See.... Clearly since it is more secure to do it this way... all apps must be done with servers locked in a closet with no Internet. Of course not. My point is that the each app has a different tolerance for security as it does data safety and availability. There is not a one size fits all solution because each problem has its own requirements. Hell banks don't use distributed two phase commits not because they don't want consistency but because they need some level of performance and it is easier to manually correct the 1 in 10,000,000 times something goes wrong than buy and engineer the hardware to handle a two phase commit at the scale they need. Engineering tradeoffs. Period.

Disks are still slow at seeking after all of these years. Disks are really fast at reading / writing long sequences (300MB per second, higher if you use RAID 0 20 disk machine) but bad at seeking around the disk for small bits. This is why Cassandra and Couchbase are so good at high-speed writes, and MongoDB not so great (yet, but I think of 100,000,000 reasons why it will get better). You can speed up MongoDB write by sharding, but that is not a free lunch (got to pick a good shard key, setup is a lot harder, mongos, auto-shard, loss of operational predictability, etc.). MySQL and Oracle can be setup to be very fast at high-speed writes.

LevelDB is good at high-speed writes. It is more or less a write only database that uses bloom filters (is my data in the gak), perfectly balanced BTrees written into gak blocks and the equiv. of GC for perfectly balanced long sequences of gak (gak gets compressed and filtered into other longer sequences of gak, it never gets updated, just consolidated and deleted/merged into large sequences of gak, this allow it to prune / consolidate / merge quickly -- same as google tablets). LevelDB is good at high speed writes, but mediocre at high-speed reads (not bad, but not great).

How can you mitigate MongoDBs disk seeking weakness? Use SSD would help so would shards. SSD have no issues seeking. Shards spread the writes across many machines. Still though are you sure MongoDB is right for this app? Have you compared / contrasted the way it works to CouchBase, Cassandra, MySQL, Oracle, LevelDB? Do you know how much data you are going to handle? Do you know how many write per second and at what size? Have you benchmarked it? Have you done load testing? When you deploy it, are you monitoring it? How far behind is the eventual consistency?

If it gets too far behind, what is your mitigation plan? At some level of writes, all DBs become hard to deal with MySQL, Oracle, NoSQL, NewSQL, etc. There is no free lunch and magic web scale sauce... there exists physical limitation of hardware. Physics will eventually stop you. Oracle DB can and does handle petabytes of data so NoSQL fan boys step off. You don't have to use a fully normalized RDBMS, you can skip some normalization, and setup really high internal caching, this would get you close to NoSQL ease of use with some very tried and true products. MySQL is actually pretty damn amazing tech as is Oracle DB. MongoDB has its place too. :) LevelDB, MariaDB, Drizzle, etc. they don't get enough attention. MongoDB and Redis get all of the NoSQL glory. They are not the only two NoSQL solutions by a long shot.

More of my thoughts on the MongoDB architecture as I consider MongoDB as sort of a reference implementations for NoSQL, but feel it is better off to go with CouchBase or Cassandra for most people.

Introduction to NoSQL Architecture with MongoDB

Introduction to NoSQL Architecture with MongoDB

Using MongoDB is a good way to get started with NoSQL. Using MongoDB concepts introduces concepts that are common in other NoSQL solutions. 

From no NoSQL to sure why not

The first time I heard of something that actually could be classified as NoSQL was from Warner Onstine, he is currently working on some CouchDB articles for InfoQ. Warner was going on and on about how great CouchDB was. This was before the term NoSQL was coined. I was skeptical, and had just been on a project that was converted from an XML Document Database back to Oracle due to issues with the XML Database implementation. I did the conversion. I did not pick the XML Database solution, or decide to convert it to Oracle. I was just the consultant guy on the project (circa 2005) who did the work after the guy who picked the XML Database moved on and the production issues started to happen.

This was my first document database. This bred skepticism and distrust of databases that were not established RDBMS (Oracle, MySQL, etc.). This incident did not create the skepticism. Let me explain.
First there were all of the Object Oriented Database (OODB) folks for years preaching how it was going to be the next big thing. It did not happen yet. I hear 2013 will be the year of the OODB just like it was going to be 1997. Then there were the XML Database people preaching something very similar, which did not seem to happen either at least at the pervasive scale that NoSQL is happening.

My take was, ignore this document oriented approach and NoSQL, see if it goes away. To be successful, it needs some community behind it, some clear use case wins, and some corporate muscle/marketing, and I will wait until then. Sure the big guys need something like Dynamo and BigTable, but it is a niche I assumed. Then there was BigTable, MapReduce, Google App Engine, Dynamo in the news with white papers. Then Hadoop, Cassandra, MongoDB, Membase, HBase, and the constant small but growing drum beat of change and innovation. Even skeptics have limits.
Then in 2009, Eric Evans coined the term NoSQL to describe the growing list of open-source distributed databases. Now there is this NoSQL movement-three years in and counting. LikeAjax, giving something a name seems to inspire its growth, or perhaps we don't name movements until there is already a ground swell. Either way having a name like NoSQL with a common vision is important to changing the world, and you can see the community, use case wins, and corporate marketing muscle behind NoSQL. It has gone beyond the buzz stage. Also in 2009 was the first project that I worked on that had mass scale out requirements that was using something that is classified as part of NoSQL.
2009 was when MongoDB was released from 10Gen, the NoSQL movement was in full swing. Somehow MongoDB managed to move to the front of the pack in terms of mindshare followed closely by Cassandra and others (see figure 1). MongoDB is listed as a top job trend onIndeed.com, #2 to be exact (behind HTML 5 and before iOS), which is fairly impressive given MongoDB was a relativly latecomer to the NoSQL party.

Figure 1: MongoDB leads the NoSQL pack
MongoDB takes early lead in NoSQL adoption race.
MongoDB is a distributed document-oriented, schema-less storage solution similar to CouchBase and CouchDB. MongoDB uses JSON-style documents to represent, query and modify data. Internally data is stored in BSON (binary JSON). MongoDB's closest cousins seem to be CouchDB/Couchbase. MongoDB supports many clients/languages, namely, Python, PHP, Java, Ruby, C++, etc. This article is going to introduce key MongoDB concepts and then show basic CRUD/Query examples in JavaScript (part of MongoDB console application), Java, PHP and Python.
Disclaimer: I have no ties with the MongoDB community and no vested interests in their success or failure. I am not an advocate. I merely started to write about MongoDB because they seem to be the most successful, seem to have the most momentum for now, and in many ways typify the very diverse NoSQL market. MongoDB success is largely due to having easy-to-use, familiar tools. I'd love to write about CouchDB, Cassandra, CouchBase, Redis, HBase or number of NoSQL solution if there was just more hours in the day or stronger coffee or if coffee somehow extended how much time I hadRedis seems truly fascinating.

MongoDB seems to have the right mix of features and ease-of-use, and has become a prototypical example of what a NoSQL solution should look like. MongoDB can be used as sort of base of knowledge to understand other solutions (compare/contrast). This article is not an endorsement. Other than this, if you want to get started with NoSQL, MongoDB is a great choice.

MongoDB as a Gateway Drug to NoSQL

MongoDB combinations of features, simplicity, community, and documentation make it successful. The product itself has high availability, journaling (which is not always a given with NoSQL solutions), replication, auto-sharding, map reduce, and an aggregation framework (so you don't have to use map-reduce directly for simple aggregations). MongoDB can scale reads as well as writes.

NoSQL, in general, has been reported to be more agile than full RDBMS/ SQL due to problems with schema migration of SQL based systems. Having been on large RDBMS systems and witnessing the trouble and toil of doing SQL schema migrations, I can tell you that this is a real pain to deal with. RDBMS / SQL often require a lot of upfront design or a lot schema migration later. In this way, NoSQL is viewed to be more agile in that it allows the applications worry about differences in versioning instead of forcing schema migration and larger upfront designs. To the MongoDB crowd, it is said that MongoDB has dynamic schema not no schema (sort of like the dynamic language versus untyped language argument from Ruby, Python, etc. developers).
MongoDB does not seem to require a lot of ramp up time. Their early success may be attributed to the quality and ease-of-use of their client drivers, which was more of an afterthought for other NoSQL solutions ("Hey here is our REST or XYZ wire protocol, deal with it yourself"). Compared to other NoSQL solution it has been said that MongoDB is easier to get started. Also with MongoDB many DevOps things come cheaply or free. This is not that there are never any problems or one should not do capacity planning. MongoDB has become for many an easy on ramp for NoSQL, a gateway drug if you will.

MongoDB was built to be fast. Speed is a good reason to pick MongoDB. Raw speed shaped architecture of MongoDB. Data is stored in memory using memory mapped files. This means that the virtual memory manager, a very highly optimized system function of modern operating systems, does the paging/caching. MongoDB also pads areas around documents so that they can be modified in place, making updates less expensive. MongoDB uses a binary protocol instead of REST like some other implementations. Also, data is stored in a binary format instead of text (JSON, XML), which could speed writes and reads.
Another reason MongoDB may do well is because it is easy to scale out reads and writes with replica sets and autosharding. You might expect if MongoDB is so great that there would be a lot of big names using them, and there are like: MTV, Craigslist, Disney, Shutterfly, Foursqaure, bit.ly, The New York Times, Barclay’s, The Guardian, SAP, Forbes, National Archives UK, Intuit, github, LexisNexis and many more.

MongoDB Database: Replica Set, Autosharding, Journaling, Architecture Part 2

See part 1 of MongoDB Architecture...

Journaling: Is durability overvalued if RAM is the new Disk? Data Safety versus durability

It may seem strange to some that journaling was added as late as version 1.8 to MongoDB. Journaling is only now the default for 64 bit OS for MongoDB 2.0. Prior to that, you typically used replication to make sure write operations were copied to a replica before proceeding if the data was very important. The thought being that one server might go down, but two servers are very unlikely to go down at the same time. Unless somebody backs a truck over a high voltage utility poll causing all of your air conditioning equipment to stop working long enough for all of your servers to overheat at once, but that never happens (it happened to Rackspace and Amazon). And if you were worried about this, you would have replication across availability zones, but I digress.

At one point MongoDB did not have single server durability, now it does with addition of journaling. But, this is far from a moot point. The general thought from MongoDB community was and maybe still is that to achieve Web Scale, durability was thing of the past. After allmemory is the new disk. If you could get the data on second server or two, then the chances of them all going down at once is very, very low. How often do servers go down these days? What are the chances of two servers going down at once? The general thought from MongoDB community was (is?) durability is overvalued and was just not Web Scale. Whether this is a valid point or not, there was much fun made about this at MongoDB's expense (rated R, Mature 17+).

As you recall MongoDB uses memory mapped file for its storage engine so it could be a while for the data in memory to get synced to disk by the operating system. Thus if you did have several machines go down at once (which should be very rare), complete recoverability would be impossible. There were workaround with tradeoffs, for example to get around this (now non issue) or minimize this issue, you could force MongoDB to do an fsync of the data in memory to the file system, but as you guessed even with a RAID level four and a really awesome server that can get slow quick. The moral of the story is MongoDB has journaling as well as many other options so you can decide what the best engineering tradeoff in data safety, raw speed and scalability. You get to pick. Choose wisely.

The reality is that no solution offers "complete" reliability, and if you are willing to allow for some loss (which you can with some data), you can get enormous improvements in speed and scale. Let's face it your virtual farm game data is just not as important as Wells Fargo's bank transactions. I know your mom will get upset when she loses the virtual tractor she bought for her virtual farm with her virtual money, but unless she pays real money she will likely get over it. I've lost a few posts on twitter over the years, and I have not sued once. If your servers have an uptime of 99 percent and you block/replicate to three servers than the probability of them all going down at once is (0.000001) so the probability of them all going down is 1 in 1,000,000. Of course uptime of modern operating systems (Linux) is much higher than this so one in 100,000,000 or more is possible with just three servers. Amazon EC2 offers discounts if they can't maintain an SLA of 99.95% (other cloud providers have even higher SLAs). If you were worried about geographic problems you could replicate to another availability zone or geographic area connected with a high-speed WAN. How much speed and reliability do you need? How much money do you have?

An article on when to use MongoDB journaling versus older recommendations will be a welcome addition. Generally it seems journaling is mostly a requirement for very sensitive financial data and single server solutions. Your results may vary, and don't trust my math, it has been a few years since I got a B+ in statistics, and I am no expert on SLA of modern commodity servers (the above was just spit balling).

If you have ever used a single non-clustered RDBMS system for a production system that relied on frequent backups and transaction log (journaling) for data safety, raise your hand. Ok, if you raised your hand, then you just may not need autosharding or replica sets. To start with MongoDB, just use a single server with journaling turned on. If you require speed, you can configure MongoDB journaling to batch writes to the journal (which is the default). This is a good model to start out with and probably very much like quite a few application you already worked on (assuming that most application don't need high availability). The difference is, of course, if later your application deemed to need high availability, read scalability, or write scalability, MongoDB has your covered. Also setting up high availability seems easier on MongoDB than other more established solutions.
Figure 3: Simple setup with journaling and single server ok for a lot of applications

Simple Non-Sharded, non replicated installation

If you can afford two other servers and your app reads more than it writes, you can get improved high availability and increased read scalability with replica sets. If your application is write intensive then you might need autosharding. The point is you don't have to be Facebook or Twitter to use MongoDB. You can even be working on a one-off dinky application. MongoDB scales down as well as up.


Replica sets are good for failover and speeding up reads, but to speed up writes, you need autosharding. According to a talk by Roger Bodamer on Scaling with MongoDB, 90% of projects do not need autosharding. Conversely almost all projects will benefit from replication and high availability provided by replica sets. Also once MongoDB improves its concurrency in version 2.2 and beyond, it may be the case that 97% of projects don't need autosharding. 
Sharding allows MongoDB to scale horizontally. Sharding is also called partitioning. You partition each of your servers a portion of the data to hold or the system does this for you. MongoDB can automatically change partitions for optimal data distribution and load balancing, and it allows you to elastically add new nodes (MongoDB instances). How to setup autosharding is beyond the scope of this introductory article. Autosharding can support automatic failover (along with replica sets). There is no single point of failure. Remember 90% of deployments don’t need sharding, but if you do need scalable writes (apps like Foursquare, Twitter, etc.), then autosharding was designed to work with minimal impact on your client code.

There are three main process actors for autosharding: mongod (database daemon), mongos, and the client driver library. Each mongod instance gets a shard. Mongod is the process that manages databases, and collections. Mongos is a router, it routes writes to the correct mongod instance for autosharding. Mongos also handles looking for which shards will have data for a query. To the client driver, mongos looks like a mongod process more or less (autosharding is transparent to the client drivers).

Figure 4: MongoDB Autosharding

Sharding with MongoDB
Autosharding increases write and read throughput, and helps with scale out. Replica sets are for high availability and read throughput. You can combine them as shown in figure 5.

Figure 5: MongoDB Autosharding plus Replica Sets for scalable reads, scalable writes, and high availability

MongoDB Autosharding for Scalable Reads, Scalable Writes and High Availability
You shard on an indexed field in a document. Mongos collaborates with config servers(mongod instances acting as config servers), which have the shard topology (where do the key ranges live). Shards are just normal mongod instances. Config servers hold meta-data about the cluster and are also mongodb instances. 

Shards are further broken down into 64 MB chunks called chunks. A chunk is 64 MB worth of documents for a collection. Config servers hold which shard the chunks live in. The autosharding happens by moving these chunks around and distributing them into individual shards. The mongos processes have a balancer routine that wakes up so often, it checks to see how many chunks a particular shard has. If a particular shard has too many chunks (nine more chunks than another shard), then mongos starts to move data from one shard to another to balance the data capacity amongst the shards. Once the data is moved then the config servers are updated in a two phase commit (updates to shard topology are only allowed if all three config servers are up).

The config servers contain a versioned shard topology and are the gatekeeper for autosharding balancing. This topology maps which shard has which keys. The config servers are like DNS server for shards. The mongos process uses config servers to find where shard keys live. Mongod instances are shards that can be replicated using replica sets for high availability. Mongos and config server processes do not need to be on their own server and can live on a primary box of a replica set for example. For sharding you need at least three config servers, and shard topologies cannot change unless all three are up at the same time. This ensures consistency of the shard topology. The full autosharding topology is show in figure 6. An excellent talk on the internals of MongoDB sharding was done by Kristina Chodorow, author of Scaling MongoDB, at OSCON 2011 if you would like to know more.

Figure 6: MongoDB Autosharding full topology for large deployment including Replica Sets, Mongos routers, Mongod Instance, and Config Servers 
MongoDB Autosharding full topology for large deployment including Replica Sets, mongos routers, mongod instance, client drivers and config servers

Anyway... NoSQL and distributed storage and scalability is on my mind. 

SlumberDB - Simple Java Key / Value Store using MySQL, LevelDB, Kryo and JSON serialization

Slumber DB - Key Value Store 

The JSON/Java database for REST and Websocket storage.
Boon is the fastest JSON serialization for the JVM. Kryo is the fastest Java serialization for the JVM.
This project marries Boon/Kryo with LevelDB, MySQL, RocksDB, and LMDB to provide simple key/value storage.
The focus is not on data grid usage, but just as a data safe reliable key/value store for Java.
We are at 95% plus code coverage. We care about providing a quality, simple fast storage mechanism with an easy to use interface for Java.

Show less

No comments:

Post a Comment

Kafka and Cassandra support, training for AWS EC2 Cassandra 3.0 Training