1) Q: How is the data stored in MongoDB?
A: MongoDB stores data as BSON, a binary-encoded serialization of JSON-like documents. BSON is designed to be lightweight, traversable, and efficient. Like JSON, it supports embedding objects and arrays within other objects and arrays. See bsonspec.org for the spec and more information in general. In a relational database we talk about tuples (or rows); in Mongo the unit is a document, and the notion of a table is replaced by the notion of a collection. You can think of your database as a set of collections, each holding a number of documents; each document aggregates a concept from your domain and stores it as a tree. So the flat, row-based structure of a common database becomes a tree-like structure.
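To make the "document as a tree" idea concrete, here is a minimal sketch in Python. The order document, its field names, and the depth helper are hypothetical illustrations, not part of MongoDB or any driver; a plain dict stands in for the BSON document.

```python
import json

# A hypothetical "order" aggregate kept as one document instead of
# being spread over several relational tables.
order = {
    "_id": 1001,
    "customer": {"name": "Ann", "city": "London"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def depth(node):
    """Return the nesting depth of a document tree (scalars have depth 0)."""
    if isinstance(node, dict):
        return 1 + max((depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((depth(v) for v in node), default=0)
    return 0


print(json.dumps(order, indent=2))  # the JSON-like view of the document
print(depth(order))                 # document -> items array -> item object
```

In a relational schema the same order would typically span an orders table, a customers table, and an order_items table joined by keys.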
2) Q: I’ve heard that MongoDB does not have transactions and the write mechanism is not safe. How could you comment on that?
A: First thing to consider is “safe” mode or “getLastError()”.
If you issue a “safe” write, you know that the database has received the insert and applied the write. However, MongoDB only flushes to disk every 60 seconds, so the server can fail before the data has reached the disk.
Second thing to consider is “journaling” (v1.8+). With journaling turned on, data is flushed to the journal every 100ms, so you have a smaller window of time before failure. The drivers have an “fsync” option (the exact name may differ between drivers) that goes one step further than “safe”: it waits for acknowledgement that the data has been flushed to the disk (i.e. the journal file). However, this only covers one server. What happens if the hard drive on the server just dies? Well, you need a second copy. An interesting thing is that “journaling” is the default option from 2.0+.
Third thing to consider is replication. The drivers support a “W” parameter that says “replicate this data to N nodes” before returning. If the write does not reach “N” nodes before a certain timeout, then the write fails (an exception is thrown). However, you have to configure “W” correctly based on the number of nodes in your replica set. Again, because a hard drive could fail, even with journaling, you’ll want to look at replication. Then there’s replication across data centers, which is too long to get into here.
The last thing to consider is your requirement to “roll back”. From my understanding, MongoDB does not have this “roll back” capability. If you’re doing a batch insert, the best you’ll get is an indication of which elements failed.
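The “W” behaviour described above can be pictured with a toy model. This is only a sketch in plain Python, not driver code: the function name, the exception, and the per-node latencies are all hypothetical, and a real driver talks to the replica set instead of reading a list.

```python
class WriteConcernTimeout(Exception):
    """Raised when fewer than w nodes acknowledge before the timeout."""


def replicated_write(ack_latencies_ms, w, wtimeout_ms):
    """Toy model of a write issued with a "W" parameter.

    ack_latencies_ms holds the (hypothetical) time each node needs to
    acknowledge the write. Returns how many nodes acknowledged in time.
    """
    if w > len(ack_latencies_ms):
        # A W larger than the replica set can never be satisfied.
        raise ValueError("w exceeds the number of nodes in the replica set")
    acked = sum(1 for latency in ack_latencies_ms if latency <= wtimeout_ms)
    if acked < w:
        raise WriteConcernTimeout(f"{acked} of {w} required nodes acknowledged")
    return acked


# Three-node set: two nodes answer quickly, one lags behind.
latencies = [5, 20, 900]
print(replicated_write(latencies, w=2, wtimeout_ms=100))
```

With the same latencies, asking for w=3 and a 100 ms timeout raises WriteConcernTimeout, which mirrors the exception the answer mentions.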
3) Q: What makes the sharding and partition-tolerance possible? How do you keep track of IDs and make sure they are unique?
A: Each document in a Mongo collection has a MongoId (a BSON ObjectID) stored in its _id field. The rules described below take into account the machine the document was created on, so each id is unique throughout the distributed database.
BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. Note that the timestamp and counter fields must be stored big endian unlike the rest of BSON. This is because they are compared byte-by-byte and we want to ensure a mostly increasing order.
TimeStamp. This is a unix style timestamp. It is a signed int representing the number of seconds before or after January 1st 1970 (UTC).
Machine. This is the first three bytes of the (md5) hash of the machine host name, or of the mac/network address, or the virtual machine id.
Pid. This is 2 bytes of the process id (or thread id) of the process generating the object id.
Increment. This is an ever incrementing value, or a random number if a counter can’t be used in the language/runtime.
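The layout above can be sketched in a few lines of Python. This is an illustration of the described format, not the code any driver uses: the function name and its parameters are hypothetical, the host name is chosen as the machine-id source, and the byte order of the pid field is an assumption, since the text only fixes big endian for the timestamp and counter.

```python
import hashlib
import os
import socket
import struct
import time


def make_object_id(counter, timestamp=None, hostname=None, pid=None):
    """Build a 12-byte id: 4-byte time, 3-byte machine, 2-byte pid, 3-byte counter."""
    timestamp = int(time.time()) if timestamp is None else timestamp
    hostname = socket.gethostname() if hostname is None else hostname
    pid = os.getpid() if pid is None else pid

    ts_bytes = struct.pack(">i", timestamp)                 # signed, big endian
    machine = hashlib.md5(hostname.encode("utf-8")).digest()[:3]
    pid_bytes = struct.pack(">H", pid & 0xFFFF)             # byte order assumed
    counter_bytes = (counter & 0xFFFFFF).to_bytes(3, "big")  # big endian
    return ts_bytes + machine + pid_bytes + counter_bytes


oid = make_object_id(counter=1)
print(oid.hex(), len(oid))
```

Because the timestamp comes first and is big endian, comparing two ids byte-by-byte orders them by creation second, and the counter breaks ties for ids generated by the same process within that second.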
4) Q: MongoDB is a non-relational DB. In relational databases we have a well-known and well-defined set of mathematical rules for designing the DB in order to avoid anomalies. What is the right design for a Mongo database?
A: There are several facts you should keep in mind:
Number 1 – Unlike with RDBMSs, there is no single right strategy for dividing your aggregates into collections in Mongo. Each design is case-dependent.
Number 2 – Despite the “no right way” remark in number 1, there is a general recommendation to keep your documents smaller than 10 MB. MongoDB itself rejects any document larger than its maximum BSON document size of 16 MB.
Number 3 – The consistency part is left to you. RDBMSs come with a foreign key constraint mechanism; in Mongo, enforcing such constraints is now the developer’s headache, and the lack of transactions adds to it. This is the cost formally described by the CAP theorem. With Mongo we get both partition tolerance and availability. Consistency, which becomes the developer’s headache, is the price you pay.
Credits: Amdaris LLP