Thursday 18 June 2020

DynamoDB - Good To Remember Stuff

This post is a collection of interesting pointers on how DynamoDB works. I wanted to have them in a single place as a ready reference.
DynamoDB is a key-value database.
A key-value database stores data as a collection of key-value pairs in which a key
serves as a unique identifier. Both keys and values can be anything, ranging from
simple objects to complex compound objects. Key-value databases are highly 
partitionable and allow horizontal scaling at scales that other types of databases
cannot achieve.

Tables in DynamoDB

  • No fixed schema, unlike an RDBMS table - you only declare the primary key when creating the table. (The CreateTable action takes attribute definitions only for the key attributes of the table and its indexes; all other attributes are defined implicitly when items are written.)
  • A DynamoDB "Item" is the equivalent of a table row.
  • There is no need for a commit after adding/updating/deleting an item. A successful write is permanent as soon as it returns.
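To make the "only the primary key" point concrete, here is a minimal sketch that builds the parameters for a CreateTable request. The table and attribute names ("Orders", "customer_id", "order_date") are made up for illustration, and the helper `build_create_table_params` is hypothetical. The dictionary follows the shape that the boto3 client's `create_table` call accepts, but nothing here talks to AWS.

```python
def build_create_table_params(table_name, partition_key, sort_key=None):
    """Build CreateTable parameters that declare only the key attributes.

    partition_key and sort_key are (name, type) tuples, where type is one of
    the DynamoDB scalar type codes "S" (string), "N" (number) or "B" (binary).
    """
    key_schema = [{"AttributeName": partition_key[0], "KeyType": "HASH"}]
    attribute_definitions = [
        {"AttributeName": partition_key[0], "AttributeType": partition_key[1]}
    ]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key[0], "KeyType": "RANGE"})
        attribute_definitions.append(
            {"AttributeName": sort_key[0], "AttributeType": sort_key[1]}
        )
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        # Only key attributes appear here; non-key attributes are never declared.
        "AttributeDefinitions": attribute_definitions,
        "BillingMode": "PAY_PER_REQUEST",
    }


# Hypothetical table: one partition key plus one sort key.
params = build_create_table_params(
    "Orders", ("customer_id", "S"), ("order_date", "S")
)
```

With boto3 installed and credentials configured, these parameters could be passed as `client.create_table(**params)`. Items written later can carry any other attributes without a table change.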

What's inside the Item?

  • Each Item has the primary key and zero or more attributes. There is no limit on the number of attributes (although the item size cannot exceed 400 KB).
  • The primary key can be composed of a single attribute, the partition key (also called the hash key), or of two attributes, the partition key and the sort key (also called the range key).
A simple primary key or hash key, composed of one attribute known as the partition key. 
DynamoDB uses the partition key's value as input to an internal hash 
function. The output from the hash function determines the partition (physical
storage internal to DynamoDB) in which the item will be stored.
In a table that has only a partition key, no two items can have the same 
partition key value.
  • When two attributes form the primary key, two items can have the same partition key, but the sort (or range) key has to be different. If a sort key exists, items with the same partition key are stored together, sorted by the sort key.
  • What is the benefit of having a sort key? As DynamoDB is a key-value store, every query must name the partition key in the form "key = value". So if we have only a partition key, we can only do equality-based lookups on it. The sort key does not have this constraint: with a sort key we can also apply non-equality conditions.
In a query, the sort key condition can use =, <, <=, >, >=, BETWEEN or begins_with, so the sort key allows us to efficiently retrieve a range of records (I personally love 'begins_with').
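The effect of the sort key can be sketched with a small in-memory model. This is not how DynamoDB is implemented; `ToyTable` and the sample keys are hypothetical and only illustrate why keeping items sorted within a partition key makes prefix and range conditions cheap.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a composite primary key."""

    def __init__(self):
        # partition key -> list of (sort_key, item), kept sorted by sort key
        self._partitions = defaultdict(list)

    def put_item(self, pk, sk, item):
        rows = self._partitions[pk]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, sk)
        if i < len(rows) and rows[i][0] == sk:
            rows[i] = (sk, item)  # same full primary key: overwrite the item
        else:
            rows.insert(i, (sk, item))

    def query_begins_with(self, pk, prefix):
        """Equality on the partition key, prefix match on the sort key."""
        rows = self._partitions.get(pk, [])
        keys = [k for k, _ in rows]
        start = bisect.bisect_left(keys, prefix)
        matches = []
        for key, item in rows[start:]:
            if not key.startswith(prefix):
                break  # sorted order: no later key can match the prefix
            matches.append(item)
        return matches

    def query_between(self, pk, low, high):
        """Equality on the partition key, inclusive range on the sort key."""
        rows = self._partitions.get(pk, [])
        keys = [k for k, _ in rows]
        lo = bisect.bisect_left(keys, low)
        hi = bisect.bisect_right(keys, high)
        return [item for _, item in rows[lo:hi]]


table = ToyTable()
table.put_item("cust-1", "2020-05-30", {"total": 10})
table.put_item("cust-1", "2020-06-18", {"total": 25})
table.put_item("cust-1", "2020-06-02", {"total": 7})
table.put_item("cust-2", "2020-06-05", {"total": 99})

june_orders = table.query_begins_with("cust-1", "2020-06")
early_orders = table.query_between("cust-1", "2020-05-01", "2020-06-02")
```

Both queries touch only the rows of one partition key and stop as soon as the sorted keys leave the requested range, which is the property that makes sort-key conditions efficient.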

Storing of Data and Retrieval

  • The data in DynamoDB is stored in partitions (10 GB storage units at the time of writing, but that could have changed).
A partition is an allocation of storage for a table, backed by solid state drives
(SSDs) and automatically replicated across multiple Availability Zones within an
AWS Region. Partition management is handled entirely by DynamoDB — you never have
to manage partitions yourself.
  • DynamoDB allocates additional partitions if an existing partition fills up or the existing partitions cannot support a change to the table's provisioned throughput.
  • DynamoDB's performance is closely tied to the partitions: a single partition supports a maximum of 3000 read capacity units (RCUs) and 1000 write capacity units (WCUs).
  • The read and write capacity of a table is divided evenly among the table's physical partitions. Therefore, a partition key design that doesn't distribute I/O requests evenly can create "hot" partitions that result in throttling and use your provisioned I/O capacity inefficiently.
The optimal usage of a table's provisioned throughput depends not only on the 
workload patterns of individual items, but also on the partition key design. 
This doesn't mean that you must access all partition key values to achieve an 
efficient throughput level, or even that the percentage of accessed partition 
key values must be high. It does mean that the more distinct partition key 
values that your workload accesses, the more those requests will be spread 
across the partitioned space. In general, you will use your provisioned throughput
more efficiently as the ratio of partition key values accessed to the total 
number of partition key values increases.
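A rough sketch of the hot-partition effect, under stated assumptions: DynamoDB's real hash function is internal, so MD5 is used here purely as a stand-in, and the partition count, key names and request counts are invented for illustration.

```python
import hashlib
from collections import Counter


def partition_for(partition_key, num_partitions):
    """Map a partition key to a partition number.

    MD5 is only a stand-in; DynamoDB's actual hash function is not public.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def requests_per_partition(request_keys, num_partitions):
    """Count how many requests land on each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in request_keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


NUM_PARTITIONS = 4

# Skewed workload: every request hits the same partition key value.
skewed_requests = ["tenant-1"] * 1000
# Spread workload: requests touch many distinct partition key values.
spread_requests = [f"tenant-{i}" for i in range(1000)]

skewed_load = requests_per_partition(skewed_requests, NUM_PARTITIONS)
spread_load = requests_per_partition(spread_requests, NUM_PARTITIONS)
```

In the skewed case all 1000 requests fall on one partition while the other three sit idle, so only about a quarter of an evenly divided capacity is usable. In the spread case the load is shared across all partitions.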

Local Secondary Indexes

  • The DynamoDB equivalent of an SQL index. The goal of creating an index on a table is to make it faster to search through the table and find the row or rows that we want.
A secondary index lets you query the data in the table using an alternate key,
in addition to queries against the primary key.
A local secondary index has the same partition key as the table,
but a different sort key.
A local secondary index is "local" in the sense that every partition of a 
local secondary index is scoped to a base table partition that has the same
partition key value.
Local secondary indexes are created at the same time that you create a table.
You cannot add a local secondary index to an existing table, nor can you 
delete any local secondary indexes that currently exist.
  • An index holds only the attributes projected into it, so if a query asks for anything else, DynamoDB needs to do an additional read to fetch the data from the base table.
To retrieve any additional attributes, DynamoDB must perform additional read operations
against the base table. These extra reads are known as fetches, and they can increase
the total amount of provisioned throughput required for a query. From an application's point of view, fetching additional attributes from the base table is automatic and transparent, so there is no need to rewrite any application logic. However, such fetching can greatly reduce the performance advantage of using a local secondary index.
To avoid extra read costs, DynamoDB can copy (project) additional attributes into the index, so that an index read returns all the values needed from the record. The projection can be configured as KEYS_ONLY (none), INCLUDE (some named attributes) or ALL. (The partition key and sort key of the table are always projected into the index.)

  • When you delete a table, any local secondary indexes on that table are also deleted.
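The projection rules above can be sketched as a small check that tells whether a query against a local secondary index would need fetches from the base table. The index definition follows the shape used in a CreateTable request (`IndexName`, `KeySchema`, `Projection`), but the index name, attribute names and both helper functions are hypothetical.

```python
# Hypothetical local secondary index: same partition key as the table,
# different sort key, one extra attribute copied into the index.
status_index = {
    "IndexName": "by_status",
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},
        {"AttributeName": "status", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["total"]},
}

TABLE_KEYS = ["customer_id", "order_date"]


def projected_attributes(index, table_keys):
    """Return the attribute names stored in the index, or None for ALL."""
    projection = index["Projection"]
    if projection["ProjectionType"] == "ALL":
        return None
    # Table keys and index keys are always present in the index.
    attrs = set(table_keys) | {k["AttributeName"] for k in index["KeySchema"]}
    if projection["ProjectionType"] == "INCLUDE":
        attrs |= set(projection.get("NonKeyAttributes", []))
    return attrs


def needs_base_table_fetch(requested, index, table_keys):
    """True if any requested attribute is missing from the index."""
    stored = projected_attributes(index, table_keys)
    if stored is None:
        return False
    return not set(requested) <= stored


covered = needs_base_table_fetch(["status", "total"], status_index, TABLE_KEYS)
uncovered = needs_base_table_fetch(["status", "address"], status_index, TABLE_KEYS)
```

Here `covered` is False because both attributes live in the index, while `uncovered` is True because "address" would have to be fetched from the base table.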

Global Secondary Index

  • The DynamoDB equivalent of a single-table view in SQL. A global secondary index has its own partition key and an optional sort key, both of which can differ from those of the table.
  • The partition key and sort key of the table are always projected into the index. Additional attributes can be included as needed.
  • Updates to a GSI happen asynchronously, so reads from it are eventually consistent.
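The asynchronous update behaviour can be illustrated with a toy model. It is only a sketch of the idea, not of DynamoDB's internals: `ToyTableWithGsi`, its explicit `propagate` step and the sample items are all hypothetical. In the real service the propagation happens in the background, without any call from the application.

```python
from collections import deque


class ToyTableWithGsi:
    """Toy table whose secondary index is updated after the base table."""

    def __init__(self, index_key):
        self.index_key = index_key
        self.items = {}   # primary key -> item (the base table)
        self.index = {}   # index key value -> set of primary keys
        self._pending = deque()  # index updates not yet applied

    def put_item(self, pk, item):
        old = self.items.get(pk)
        self.items[pk] = item  # the base table is updated immediately
        self._pending.append((pk, old, item))  # the index catches up later

    def propagate(self):
        """Apply all queued index updates (stands in for background work)."""
        while self._pending:
            pk, old, new = self._pending.popleft()
            if old is not None and self.index_key in old:
                self.index.get(old[self.index_key], set()).discard(pk)
            # Items without the index key attribute are not indexed at all.
            if self.index_key in new:
                self.index.setdefault(new[self.index_key], set()).add(pk)

    def query_index(self, value):
        return sorted(self.index.get(value, set()))


orders = ToyTableWithGsi(index_key="status")
orders.put_item("order-1", {"status": "OPEN"})

stale_read = orders.query_index("OPEN")   # index has not caught up yet
orders.propagate()
fresh_read = orders.query_index("OPEN")   # index now reflects the write
```

A read that arrives between the write and the propagation sees the old index state, which is what "eventually consistent" means for a GSI query.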
