The three keys pattern for enterprise data design: NoSQL Modeling for Relational Databases
In the beginning, Codd proposed Third Normal Form, and it was good. Today, we need something better...
Relational databases are great. They provide a robust and scalable representation of data in a way that people easily understand, and can manipulate with a minimum of training. However, the usefulness of that data is only as valuable as how the data is modeled.
In this post, the "three keys pattern" for entity design will be presented. We will attempt to show how it allows relational databases to continue to scale horizontally and vertically for the internet age while retaining the simple query characteristics that are so appealing.
This key is only used for human reference. It should be indexed for lookup, but not used outside of a dedicated reference and type table. I call this column UniqueId because the name is readable, and the value is also readable. The dedicated reference table would look something like this: UID, UniqueId, Type; where type could be a class name or enumeration of entity types.
Relational databases are great. They provide a robust and scalable representation of data in a way that people easily understand, and can manipulate with a minimum of training. However, the usefulness of that data is only as valuable as how the data is modeled.
In this post, the "three keys pattern" for entity design will be presented. We will attempt to show how it allows relational databases to continue to scale horizontally and vertically for the internet age while retaining the simple query characteristics that are so appealing.
Model Complete Entities, not Tables
Much of the data we model is hierarchal in nature: parent-child, master-detail, and so forth... When this data is modeled in a database, the levels of the hierarchy are split, and the relationships described. Other data represents the edges between primary entities: uses of, instances of, etc... These relationships may have additional attributes; for example: a purchase order line has attributes of quantity and cost, but has a reference to the product.
It can be problematic to model both types of relationships in a relational database when we rely on foreign keys and all keys look the same.
Key number one - the local key
The local key is a surrogate key. Surrogate keys are important in modeling relational data. They minimize the amount of redundant data that must be copied where natural keys are used, while improving performance by being compact. Typically surrogate keys are generated by the database by a sequence (either global or local to the table). It is the primary key to the table.
Every table representing elements in an entity should have a surrogate key, this key should be simply named "Id". Many of you are cringing right now because your standard is {TableName}Id, or some variation of that. Naming the surrogate key "Id" provides the most simple pattern available to anyone who accesses the table. Foreign keys within an aggregate should be named {TableName}Id.
Pure relationship tables (many to many) do not need an Id field, as soon as you add an additional attribute to the relationship table (beyond tracking fields), then you also need an Id.
Use the local key to link together the tables within an aggregate, tables that you query and update (from an API) as a group. Imagine that you were storing the entire aggregate in one XML or JSON field, the data contained in that field is your aggregate.
The local key should never leave your database, and should not extend beyond an aggregate. It is a local key. Designed for terseness and speed. I recommend using logical deletes, but if you must allow for physical deletes, then enabling a cascading delete within an aggregate is the way to go.
Congratulations, you've created a database in third normal form. What now?
Key number two - the universal key
Sitting right next to the local key on the main aggregate table is another key, the universal key. This key is a universally global identifier; typically a guid or uniqueidentifier. It is automatically generated and immutable. I like to call this column UId.
The universal key is used to identify one single instance of data, no matter where it is created.
The universal key is used to model the edges between aggregates. In small (single database) systems, check constraints may be used to enforce integrity, but the real beauty of the universal key is that it becomes a reference for all external systems, whether or not they are relational databases.
Key number three - the human-readable key
The third key is a "natural" key. People are really bad at remembering (or typing) universal identifiers. However, a more readable key can be created for use by humans. It is always bad practice to put logic into this key. I recommend using a re-encoding of the unique identifier using 0-9 and upper-case A-Z (without O or I to avoid confusion with 0 and 1) and a checksum digit. This gives you 10+24 characters that are easily distinguished, and can be spoken over the telephone or written compactly.This key is only used for human reference. It should be indexed for lookup, but not used outside of a dedicated reference and type table. I call this column UniqueId because the name is readable, and the value is also readable. The dedicated reference table would look something like this: UID, UniqueId, Type; where type could be a class name or enumeration of entity types.
Comments