Adv DB: Document DBs

Main concepts

Data models are how we see, interact with, and transform our data in a system like a database (MUSE, 2015). A data model to a dev person is an ERD, whereas a metamodel is what is used to describe how a database organizes data in four key ways: Key-values, document, column-family, and graph databases (although graph databases are not aggregates) (Sadalage & Fowler, 2012).

In relational data models, tuples are a set of values (divided and stored information) that cannot be nested, nor placed within another, so all operations must be thought of as reading or writing tuples.  For aggregate data models, we want to do more complex things (like key values, column family and documents) rather than just dealing with tuples (Sadalage & Fowler, 2012). Aggregates are related sets of data that we would like to treat as a unit (MUSE, 2015). Relationships between units/aggregates are captured in relational mapping, and a relational or graph database has no idea that the aggregate exists, also known as “aggregate-ignorant” (Sadalage & Fowler, 2012).

Let’s consider a UPS.  For transactions like amazon.com or ebay.com, we need to know only the shipping address if we are a distributor, but paypal.com or your bank cares about the billing address to give you credit into your account.  UPS must collect both.  Thus, UPS, in their relational models they may have in an ERD with two Entities called: Billing Address and Shipping Address.  Naturally, we can group these into one unit (aggregate) called: Address with an indicator/key to state which address is which.  Thus, I can query the key for shipping addresses.

Finally, atomic operations are supported on a single aggregate at a time, and ACID is not followed for transactions across multiple aggregates at a time (Sadalage & Fowler, 2012).

Document Databases

A document database is able to look into the structure of a unit because we need to use a query, which can return a subset/part of the aggregate (Sadalage & Fowler, 2012). You can think of this as either a chapter or a section in a document (MUSE, 2015).  It can be limited by the size restrictions, but also in what can be placed (structure and type).  People can blur the line between this and key-value databases by placing an ID field, but for the most part, you will query a document database rather than look up a key or ID (Sadalage & Fowler, 2012).

Pros and Cons of Aggregate Data model

Aggregate ignorance allows for manipulation of data, replication, and sharding because if not, we would have to search every unit, yet manipulation of data, replication, and sharding can be easier when done in these units.  Thus it can help in some cases and not in others.  Also, there is no correct or right way on where aggregate boundaries should or shouldn’t exist, which can add to the complexity in understanding a data model.  It is great if we want to run our transactions on as little nodes as possible on a cluster, and dealing with units is easier on a cluster (Sadalage & Fowler, 2012).  It is not great for mapping out the relationships of units of different formats (MUSE, 2015).

References: