CouchDB, a NoSQL database
Traditionally, relational databases have been the primary way of storing, sorting and searching data, and for most purposes they are very good at it. However, in the last few years, with the growth of cloud computing and sites such as Facebook pushing relational databases to their limits, people have started to look for alternatives. According to some people, the problem with relational databases is that they have been hard to scale horizontally (that is, to spread over several servers with a linear, or close to linear, performance increase). A few databases in a cluster is no problem, but when it comes to several terabytes of data that should be searched in real time, they are simply not enough. CouchDB and the other NoSQL databases aim to provide a truly horizontally scalable database.
The opinion that relational databases cannot scale in the same way is not shared by everyone; a good read about this is Dennis Forbes' “Getting Real about NoSQL and the SQL-Isn’t-Scalable Lie”. However, I think there is definitely a place where NoSQL can prove to be very useful, and it is an interesting technology, so I will let the experts fight about this and instead give you more insight into what NoSQL really is all about.
NoSQL is an umbrella term for a wide variety of data stores, which all have in common that they do not store data in a relational way. Some examples of NoSQL databases are CouchDB, MongoDB, Amazon SimpleDB and Google BigTable. These are some of the properties they have in common:
1. No schema
The data is stored in one big hashtable-like data structure. No schema is needed.
2. No more joins
Joins are slow in general, and when they are spread out over several servers it gets even worse. In CouchDB there are no joins; instead, data should be duplicated. This might sound odd if you, like me, were taught back in college that normalization is of the highest importance and that you should really, really avoid duplicating data in your database. This is still true for most relational databases, but keep in mind that when relational databases were invented, disk space was expensive and normalization was a great way to save a few bytes. Storage is anything but expensive today, which is why NoSQL emphasizes that data should instead be duplicated.
3. Eventual consistency
When you update the database, there is no longer a guarantee that all subsequent queries will get the updated value immediately; it might take some time, depending on the system load. For some systems, such as banking systems, this kind of behavior would be a big no-no. But for most large web sites this is no problem, since the data is going to be cached in one way or another anyway.
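The schema-free and join-free properties above can be sketched with plain Python dictionaries standing in for documents. This is only an illustration, not CouchDB code: the field names (type, author, tags), the document ids and the render_byline helper are all made up for the example.

```python
import json

# Two "documents" in the same store. They do not share a schema:
# the second one has a field ("tags") that the first one lacks.
documents = {
    "post-1": {
        "type": "post",
        "title": "Hello NoSQL",
        # The author's name is duplicated into each post instead of
        # being looked up through a join against an "authors" table.
        "author": {"id": "user-7", "name": "Alice"},
    },
    "post-2": {
        "type": "post",
        "title": "Schemas are optional",
        "author": {"id": "user-7", "name": "Alice"},
        "tags": ["couchdb", "nosql"],
    },
}


def render_byline(doc_id):
    """Build a byline from a single key lookup; no join is needed."""
    doc = documents[doc_id]
    return f'{doc["title"]} by {doc["author"]["name"]}'


if __name__ == "__main__":
    print(render_byline("post-2"))
    print(json.dumps(documents["post-2"], indent=2))
```

The price of the duplication shows up on writes: if the author changes her name, every post that embeds it has to be updated, whereas a normalized relational design would change a single row.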
CouchDB is a document database, accessible via a RESTful JSON API. Everything stored in the database is a “document” and is stored in a flat address space. There are no schemas; the documents are stored and retrieved as JSON objects.
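To make the “RESTful JSON API” part a bit more concrete, here is a minimal sketch of what storing and fetching a document looks like over HTTP, using only the Python standard library. It assumes a CouchDB server on localhost at the default port 5984 and an already existing database; the database name (blog), the document id (post-1) and the helper function names are my own and purely illustrative. Depending on how the server is configured, requests may also need credentials, which this sketch leaves out.

```python
import json
import urllib.request

# Assumption: a CouchDB server on its default port on this machine.
BASE_URL = "http://localhost:5984"


def build_put_document(db, doc_id, doc):
    """Build a PUT /{db}/{doc_id} request whose body is the JSON document."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{db}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_get_document(db, doc_id):
    """Build a GET /{db}/{doc_id} request that fetches the document as JSON."""
    return urllib.request.Request(url=f"{BASE_URL}/{db}/{doc_id}", method="GET")


def send(request):
    """Send a request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only build and print the request here; nothing is sent over the network.
    put = build_put_document("blog", "post-1", {"title": "Hello CouchDB"})
    print(put.get_method(), put.full_url)
    print(put.data.decode("utf-8"))
```

With a server running, send(build_put_document(...)) would store the document and send(build_get_document("blog", "post-1")) would return it as a Python dictionary. The document id is simply part of the URL, which is what the flat address space amounts to in practice.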
CouchDB is written in Erlang and runs on all systems that the Erlang runtime supports (Linux, Windows, OS X and other Unix systems). I have tested CouchDB on Linux and OS X. This is a screenshot of the CouchDBX GUI running on OS X:
The JSON representation of the same document:
In my next post I will show you some hands-on action with CouchDB.