This page discusses the options for elastic persistence and how they compare.
Main MongoDB vs CouchDB differences
- MongoDB uses binary JSON (BSON) and native language drivers; CouchDB uses JSON with a REST interface.
- MongoDB has master-slave replication; CouchDB has master-master replication. MongoDB replicates changes from a primary to a secondary database server, whereas CouchDB provides full two-way replication, covering scenarios such as working offline and syncing both ways once you're online again.
- MongoDB provides ad hoc query support; CouchDB doesn't.
- MongoDB does updates in place, which means existing documents are modified. CouchDB uses MVCC and creates a new version of a document on every update. MongoDB write operations are therefore typically faster, while CouchDB can do more fine-grained replication and master-master replication because of this document versioning feature.
- MongoDB uses asynchronous write operations by default, whereas CouchDB ensures that write operations are synced. MongoDB can ensure syncing both to disk and to a minimum number of nodes in the replication set by using the getLastError command. Because CouchDB write operations are always synced, there's no need for a shutdown process, which means that a kill -9 doesn't prevent CouchDB from starting up normally again. MongoDB does have a shutdown process to verify integrity etc., so after a kill -9 you are advised to run a repair command before starting the MongoDB server again.
- MongoDB has out-of-the-box sharding and clustering (via replication sets) support; CouchDB doesn't. There are frameworks like CouchDB Lounge which do provide sharding and clustering support.
- Comparing MongoDB and CouchDB --> http://nosql.mypopescu.com/post/298557551/couchdb-vs-mongodb
- Comparing MongoDB, CouchDB and RavenDB --> http://nosql.mypopescu.com/post/978742866/document-databases-compared-couchdb-mongodb-ravendb
- Comparing MongoDB and CouchDB --> http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
- Comparing lots of NoSQL databases --> http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
- BSON format (binary JSON), so more efficiency in data transfer and storage
- Apache license
- Supports updates in-place, which means that no new documents have to be created but the existing document is modified. This improves the write performance a lot.
- Ad hoc queries, which means that indexes are not mandatory because in a lot of cases MongoDB can find an optimal query path.
- Language specific drivers
- Only atomic transactions
- Master-Slave replication
- Out-of-the-box clustering and sharding support
- MongoDB is led by 10gen
- Active community with around 20 posts on user list per day
- Main users: Github, bit.ly, Foursquare, sourceforge (http://www.mongodb.org/display/DOCS/Production+Deployments)
- Supported on CloudFoundry
- Replication set, http://www.mongodb.org/display/DOCS/Replica+Sets. Replication sets are asynchronous master/slave replications to increase availability. A replication set consists of at least 3 servers, 3 full servers or 2 full servers and one arbiter.
- Primary. A replication set always selects a primary (or master), which is the entry point for writing data. When a primary crashes another primary is automatically selected.
- Secondary. A secondary receives replication messages from the primary to keep it synchronized. When a primary crashes, one of the secondaries becomes the primary. Most MongoDB database drivers support an option to enable read operations on a slave. This is not enabled by default. When you allow reads from a secondary, there's a possibility that newly written data has not yet been synced from the primary.
- Arbiter. An arbiter is a node in a replication set that only participates in elections (who becomes the primary). An arbiter never becomes a primary and data is not replicated to this node. An arbiter is needed when the set would otherwise have an even number of voting servers (for example 2 full servers), to prevent a tie in elections.
- Sharding, http://www.mongodb.org/display/DOCS/Sharding+Introduction. Sharding is the partitioning of data among multiple servers in an order-preserving manner.
With sharding one can divide collections into data chunks that are managed on different database servers. These data chunks are defined by shard keys, which look like index definitions. When a shard reaches a specified maximum size limit (for example 1 GB), the data chunks managed by this shard get divided over the other shards automatically. Typically a shard consists of a replication set of database servers.
- Routing processes (mongos). A routing or coordination process that makes the cluster of servers look like one system to a database driver. You can have multiple mongos processes running in parallel, as they get their data from a config server and don't have any relationship with each other. The overhead of a mongos is minimal.
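The chunk mechanism described above can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions, not MongoDB's implementation: the class `RangeShardedCollection`, the shard names, and the document-count limit `max_chunk_docs` (standing in for a size limit such as 1 GB) are all hypothetical, and a new chunk is simply handed to the shard that currently owns the fewest chunks.

```python
from bisect import bisect_right


class RangeShardedCollection:
    """Toy model of order-preserving sharding: each chunk covers a key range."""

    def __init__(self, shard_names, max_chunk_docs=4):
        self.shard_names = list(shard_names)
        self.max_chunk_docs = max_chunk_docs  # hypothetical stand-in for a size limit
        self.split_points = []  # sorted lower bounds of chunks 1..n
        self.chunks = [{"shard": self.shard_names[0], "docs": {}}]

    def _chunk_index(self, key):
        # Number of split points <= key is the index of the chunk owning the key.
        return bisect_right(self.split_points, key)

    def shard_for(self, key):
        return self.chunks[self._chunk_index(key)]["shard"]

    def insert(self, key, doc):
        idx = self._chunk_index(key)
        self.chunks[idx]["docs"][key] = doc
        if len(self.chunks[idx]["docs"]) > self.max_chunk_docs:
            self._split(idx)

    def find(self, key):
        return self.chunks[self._chunk_index(key)]["docs"].get(key)

    def _split(self, idx):
        # Split the chunk at its median key and move the upper half elsewhere.
        docs = self.chunks[idx]["docs"]
        keys = sorted(docs)
        split_key = keys[len(keys) // 2]
        upper = {k: docs.pop(k) for k in keys if k >= split_key}
        target = min(self.shard_names, key=self._chunk_count)
        self.split_points.insert(idx, split_key)
        self.chunks.insert(idx + 1, {"shard": target, "docs": upper})

    def _chunk_count(self, shard):
        return sum(1 for c in self.chunks if c["shard"] == shard)


coll = RangeShardedCollection(["shard-a", "shard-b"])
for user_id in range(1, 11):
    coll.insert(user_id, {"user_id": user_id})
print(coll.split_points, [c["shard"] for c in coll.chunks])
```

Because the chunks are contiguous key ranges, a lookup only needs a binary search over the split points to find the owning shard, which is roughly the job the routing process performs for a driver.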
- getLastError, http://www.mongodb.org/display/DOCS/getLastError+Command, http://www.mongodb.org/display/DOCS/Verifying+Propagation+of+Writes+with+getLastError. By default MongoDB write operations are asynchronous and don't give back a response code. To know that a write operation has succeeded or to know the error, the getLastError command can be used. In addition the getLastError command provides functionality to ensure that the data is being replicated over a specified number of servers in the replication set. With the fsync parameter of the getLastError command you can also ensure that the data is fsynced to the drive.
- MongoDB was designed with a multi-server architecture in mind. The idea is to increase availability and durability by adding more servers to a replication set and/or to a shard. Single-server durability was not a priority until the addition of journalling in the recent release. To ensure that important data is replicated to at least x servers in a replication set, you have to invoke the getLastError command.
- When you execute a kill -9 on a MongoDB server, you'll have to run the repair command to get it started cleanly again. With journalling enabled and getLastError used for important write operations, this seems to be no problem. CouchDB has a crash-only design with no additional shutdown process, so you can do a kill -9 without causing a problem. CouchDB will start up normally again.
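The role of the arbiter in elections can be made concrete with a small majority check. This is a minimal sketch, assuming that a new primary needs the votes of a strict majority of all members in the set. The function `elect`, the member names, and the rule of picking the alphabetically first data-bearing candidate are illustrative assumptions, not MongoDB's actual election algorithm.

```python
def elect(members, reachable):
    """Return the name of a new primary, or None if no election can succeed.

    members:   dict mapping member name -> "data" or "arbiter"
    reachable: set of member names that are up and can see each other
    """
    voters_up = [name for name in members if name in reachable]
    # A strict majority of ALL members is required, not just of the reachable ones.
    if len(voters_up) <= len(members) // 2:
        return None
    # Arbiters vote but hold no data, so they can never become primary.
    candidates = sorted(name for name in voters_up if members[name] == "data")
    return candidates[0] if candidates else None  # tie-break rule is an assumption


two_servers = {"db1": "data", "db2": "data"}
with_arbiter = {"db1": "data", "db2": "data", "arb": "arbiter"}

# db1 crashes in both setups:
print(elect(two_servers, {"db2"}))          # None: 1 of 2 votes is not a majority
print(elect(with_arbiter, {"db2", "arb"}))  # db2: 2 of 3 votes is a majority
```

With only two full servers, the survivor of a crash cannot reach a majority on its own, which is why the minimal set is either 3 full servers or 2 full servers plus an arbiter.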
- How to work with update commands? Write side? Is our idea valid?
- How to test scalability?
- How to deal with relations?
- How to handle schema updates in the code?
- How to make a versioned update like [update ... where oid = ... and version = 2] and then check how many documents were updated.
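The versioned update in the last question is a form of optimistic locking. The sketch below shows the idea on a plain in-memory dictionary rather than a real database; the function `update_if_version` and the `version` field name are hypothetical. In MongoDB terms the equivalent would presumably be an update whose query matches both the id and the expected version, after which the number of updated documents is checked.

```python
def update_if_version(collection, oid, expected_version, changes):
    """Apply changes only if the stored version matches; return documents updated."""
    doc = collection.get(oid)
    if doc is None or doc["version"] != expected_version:
        return 0  # someone else updated (or removed) the document first
    doc.update(changes)
    doc["version"] = expected_version + 1
    return 1


collection = {"42": {"name": "old name", "version": 2}}

first = update_if_version(collection, "42", 2, {"name": "new name"})
second = update_if_version(collection, "42", 2, {"name": "stale write"})
print(first, second, collection["42"])
```

A result of 0 tells the caller that its copy of the document was stale, so it can reload the document and retry or report a conflict.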
- Journalling support was added in version 1.7.5. We hear mixed feedback about the maturity and the functionality of this journalling support; could you share your standpoints with us?
- Is journalling support already at enterprise level, and do they expect to improve the support?
- When journalling is enabled, is a repair still needed in case of an immediate shutdown (kill -9)?
- Are there best practices for cluster configuration? For example, what time interval should you specify between master and slaves?
- The getLastError command seems to be very important for validating that an fsync has been performed or that x slaves have been synchronized. Is that also emphasized at MongoSF?
- Good replication set configuration post --> http://blog.boxedice.com/2010/08/03/automating-partitioning-sharding-and-failover-with-mongodb/
- Replication sets on AWS --> http://jectbd.com/?p=1579
- Two phase commit with MongoDB --> http://www.mongodb.org/display/DOCS/two-phase+commit
- Nice presentation about replication sets --> http://www.slideshare.net/mongodb/replica-sets
- MongoDB issues --> http://leifw.wickland.net/2011/04/weird-buggy-and-disappointing-behavior.html
- Journalling explained --> http://www.mongodb.org/display/DOCS/Journaling
- MongoDB on amazon --> http://www.mongodb.org/display/DOCS/Amazon+EC2
- MongoDB is a supported service on CloudFoundry --> http://www.mongodb.org/display/DOCS/VMware+CloudFoundry
- BSON (binary JSON) instead of JSON, which optimizes the storage and communication of documents
- Good support for queries
- High read and write performance
- Less durability (no crash-only design) because writes are not immediately synced with the file system. This means that the write performance is higher, but data can get lost after a crash. From version 1.7.5 journalling is supported which provides more durability, but this seems to be work-in-progress.
- JSON format
- Apache license
- MVCC (Multi-Version Concurrency Control), which means that updates always result in new documents with a version+1. This leads to lower write performance, but contributes to the crash-only design of CouchDB. This approach also enables versioned updates: you can check whether the document still has the same version identifier to detect whether there was another update.
- No ad hoc queries, so indexes have to be created
- General REST interface that can be used for CRUD operations. There are several driver projects which make it easier to use the REST interface.
- Only atomic transactions
- Master-Master replication, which means that it's very well suited for offline work getting synced to the database server.
- No out-of-the-box sharding. There are additional frameworks for CouchDB available which do provide sharding support.
- No out-of-the-box clustering support. There are additional frameworks for CouchDB available which do provide clustering support.
- Apache project, the main supporting company is CouchBase. Cloudant provides CouchDB hosting and has created the BigCouch project.
- User mailing list has around 5-10 posts per day
- Main users: BBC, WWF, Ubuntu One, and others (http://www.couchbase.com/customers/case-studies, http://wiki.apache.org/couchdb/CouchDB_in_the_wild)
- Nice documentation book at http://guide.couchdb.org/
- Master/master replication. CouchDB can do replication in a lot of different use cases:
- Replicate a whole database
- Get the changes between two dates
- Replicate the changed collections and documents.
- Support for offline work. So replicate all changes made on a mobile phone to the server and replicate all server changes to the mobile phone.
- Replication with conflict resolution.
- Replication. Replication is supported out-of-the-box by the CouchDB API, so you can replicate one-way between two databases via the API. This replication can be made continuous; however, at a server restart continuous replication is aborted and must be started again manually. Frameworks like CouchDB Lounge provide support for creating clusters with replication enabled.
- BigCouch is a fork of the CouchDB project by Cloudant (https://github.com/cloudant/bigcouch), which provides out-of-the-box support for creating clusters without breaking the CouchDB REST interface API. Seems to be a lot like the replication set functionality of MongoDB, but with CouchDB all nodes are equal masters, so requests can be handled by all nodes of a cluster.
- Sharding is not supported out-of-the-box, but there are frameworks like Lounge and Pillow that provide sharding support for CouchDB --> http://nosql.mypopescu.com/post/683838234/scaling-couchdb
- Durable crash-only design, which makes sure no data gets lost after a crash
- Versioning is natively supported with MVCC
- Master-master replication for offline work
- No additional drivers needed due to REST interface
- JSON and REST interface make it less optimal in network traffic
- Query support is more limited than with MongoDB
- Binary format
- BSD license
- Writes (and also reads) are very fast, because Redis keeps a large in-memory set. The in-memory set is synced to disk at certain intervals (for example every 60 seconds).
- Key-value pair database
- Native language drivers
- Transaction support
- Master-Slave replication
- Sponsored by VMware
- User list has between 1 and 5 posts a day
- Main users: Github, Stack Overflow, Guardian (http://redis.io/topics/whos-using-redis)
- Supported on CloudFoundry
- Redis on CloudFoundry --> http://blog.springsource.com/2011/04/27/getting-started-redis-spring-cloud-foundry/
- Binary format
- Apache license
- Cassandra is designed to run on multiple servers, so its writing mechanism is mostly designed to replicate write actions across the network of servers.
- Key-value pair database, query options are limited
- Thrift communication protocol (binary)
- Apache project, Datastax provides support for Cassandra. In the PMC are employees of Facebook, Twitter, LinkedIn and Datastax
- Active community with around 20 posts per day on the user list
- Main users: Facebook, Twitter (not for the tweets), Rackspace, Cisco WebEx
- Introduction --> http://www.mikeperham.com/2010/03/13/cassandra-internals-writing/
- Architecture --> http://wiki.apache.org/cassandra/ArchitectureInternals