One thing that I’ve had to get my head around when learning about DynamoDB is what exactly partition and sort keys are, and how they tie into the more familiar concept of a primary key. DynamoDB, as you may know, is a hosted NoSQL database that can store heterogeneous items of varying structures; imagine a system that stores miscellaneous JSON documents, each with a unique key, and you’ll get the idea (though it’s not strictly a JSON document store).
The DynamoDB documentation talks of two concepts, a partition key and a sort key, and it’s not immediately obvious how these relate to the more common concept of a primary key that you’d get in a traditional database. Really it’s quite simple, but first let’s clarify what each term means.
- A primary key is an attribute that uniquely identifies an item in the database.
- A composite primary key serves the same purpose as a primary key but is made up of several attributes. It is needed when no single attribute can identify an item on its own.
- A partition key is an attribute whose value is hashed by DynamoDB and then used to determine which partition within the DynamoDB storage system will be used to store the item. It is not intrinsically meant as an identifying ‘primary’ key.
- A sort key is a second attribute that DynamoDB uses to sort the items within each partition. A sort key is optional, but it is needed when the partition key attribute is not unique, and this is a crucial point when it comes to understanding DynamoDB.
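The two roles are easier to see in a toy, in-memory model. The sketch below is only an illustration under stated assumptions. The SHA-256 hash, the fixed count of 4 partitions and the names `ToyTable`, `partition_for` and `NUM_PARTITIONS` are all hypothetical, and the attribute names `customer_id` and `order_date` are made up. DynamoDB’s real hash function and partition management are internal and not exposed. The model shows two things: the partition key decides *where* an item goes, and the sort key decides the *order* of items that share a partition key.

```python
import hashlib
from collections import defaultdict

# Assumption: a fixed partition count; real DynamoDB adds and splits partitions itself.
NUM_PARTITIONS = 4


def partition_for(partition_key_value):
    """Hash a partition key value to a partition number.

    SHA-256 is a hypothetical stand-in; DynamoDB's internal hash is not exposed.
    """
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


class ToyTable:
    """Hypothetical in-memory table with a partition key and a sort key."""

    def __init__(self, pk_name, sk_name):
        self.pk_name = pk_name
        self.sk_name = sk_name
        # partition number -> items stored there, kept in key order
        self.partitions = defaultdict(list)

    def _key(self, item):
        # The full key: (partition key value, sort key value)
        return (item[self.pk_name], item[self.sk_name])

    def put(self, item):
        part = self.partitions[partition_for(item[self.pk_name])]
        # An item with the same full key replaces the old one.
        part[:] = [i for i in part if self._key(i) != self._key(item)]
        part.append(item)
        # Items sharing a partition key end up together, ordered by sort key.
        part.sort(key=self._key)

    def query(self, pk_value):
        """Return every item with this partition key, in sort-key order."""
        part = self.partitions[partition_for(pk_value)]
        return [i for i in part if i[self.pk_name] == pk_value]


table = ToyTable("customer_id", "order_date")
table.put({"customer_id": "c1", "order_date": "2024-03-01", "total": 20})
table.put({"customer_id": "c1", "order_date": "2024-01-15", "total": 5})
table.put({"customer_id": "c2", "order_date": "2024-02-10", "total": 12})
print([i["order_date"] for i in table.query("c1")])  # ['2024-01-15', '2024-03-01']
```

Several partition key values can hash to the same partition, so `query` filters on the exact partition key value after locating the partition.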
So, we have a DynamoDB table and we need to be able to uniquely identify its items, which means we need a primary key. What do we use?
- If the partition key attribute (which, remember, is really about deciding which partition to use) is enough to uniquely identify each item, then we can also use it as the primary key. It then does double duty: it determines the partition and acts as the identifier itself.
- However, if the partition key attribute is not unique enough to be used as the primary key, you’ll need to find an additional attribute to combine with the partition key to produce a composite primary key.
- This additional attribute (with which you are creating the composite primary key) is the table’s sort key.
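The choice above boils down to a uniqueness check. Does the partition key attribute alone identify every item, or does it need a second attribute alongside it? The following Python sketch tests both options against a few made-up order records. The function names `is_unique` and `choose_key_schema` and all attribute names are hypothetical, chosen for illustration.

```python
def is_unique(items, attributes):
    """True if the given tuple of attribute names identifies every item uniquely."""
    seen = set()
    for item in items:
        key = tuple(item[a] for a in attributes)
        if key in seen:
            return False
        seen.add(key)
    return True


def choose_key_schema(items, partition_key, candidate_sort_keys):
    """Return (partition_key,) if it is unique on its own; otherwise the first
    (partition_key, sort_key) pair that is unique, or None if none works."""
    if is_unique(items, (partition_key,)):
        return (partition_key,)
    for sort_key in candidate_sort_keys:
        if is_unique(items, (partition_key, sort_key)):
            return (partition_key, sort_key)
    return None


orders = [
    {"customer_id": "c1", "order_date": "2024-01-15", "status": "shipped"},
    {"customer_id": "c1", "order_date": "2024-03-01", "status": "shipped"},
    {"customer_id": "c2", "order_date": "2024-02-10", "status": "pending"},
]

# customer_id repeats, so it can't be the primary key on its own.
# status doesn't help either, but order_date makes each pair unique.
print(choose_key_schema(orders, "customer_id", ["status", "order_date"]))
# ('customer_id', 'order_date')
```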
So, in a single sentence: a DynamoDB table’s partition key is also its primary key, except when the table also has a sort key, in which case the partition key and sort key together form a composite primary key.
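This rule shows up directly in a table’s key schema. The partition key is declared with `KeyType` `HASH`, the optional sort key with `KeyType` `RANGE`, and the declared keys together make up the primary key. The sketch below only builds parameter dictionaries in the shape that boto3’s `create_table` accepts and makes no AWS call. The helper name `key_schema_params`, the table and attribute names, and the assumption that every key attribute is a string (`"S"`) are all hypothetical.

```python
def key_schema_params(table_name, partition_key, sort_key=None):
    """Build create_table-style parameters for a simple or composite primary key.

    Only key attributes go in AttributeDefinitions; other item attributes are
    not declared up front. Key attribute types are assumed to be strings ("S").
    """
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]  # partition key
    attribute_definitions = [{"AttributeName": partition_key, "AttributeType": "S"}]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})  # sort key
        attribute_definitions.append({"AttributeName": sort_key, "AttributeType": "S"})
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attribute_definitions,
        "BillingMode": "PAY_PER_REQUEST",
    }


# Simple primary key: the partition key alone identifies each item.
users_table = key_schema_params("Users", "user_id")

# Composite primary key: partition key + sort key together identify each item.
orders_table = key_schema_params("Orders", "customer_id", sort_key="order_date")

# With AWS credentials configured, these could be passed on, e.g.:
# boto3.client("dynamodb").create_table(**orders_table)
```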
I hope that has cleared things up. It’s a little annoying that AWS’s documentation tends to lead with partition and sort keys rather than framing things in terms of primary keys, so it can be hard to make the jump.
The question now is: why would you choose a non-unique, non-primary attribute as a partition key, and so create the need for a sort key? Stay tuned for the answer…