December 2023
Intermediate to advanced
464 pages
12h 35m
English
Keep the number of partitions fixed so that the mapping of data to partitions stays the same when the size of the cluster changes.
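The intent above can be sketched in Python as a two-step lookup: a key maps to a partition, and a separate table maps partitions to nodes. This is a minimal illustration under stated assumptions, not the book's implementation. The partition count of 64, the MD5-based `stable_hash`, the round-robin `assign_partitions`, and the `add_node` rebalancing rule are all hypothetical choices. What it shows is that a key's partition depends only on the fixed partition count, so growing the cluster changes only which node owns some partitions.

```python
import hashlib
from collections import Counter

# Assumption: the partition count is fixed when the cluster is created
# and never changes afterwards, regardless of how many nodes there are.
NUM_PARTITIONS = 64


def stable_hash(key: str) -> int:
    # Deterministic across processes, unlike Python's built-in hash() for str.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def partition_for(key: str) -> int:
    # Depends only on NUM_PARTITIONS, never on the cluster size.
    return stable_hash(key) % NUM_PARTITIONS


def assign_partitions(nodes: list[str]) -> dict[int, str]:
    # Hypothetical initial placement: spread partitions round-robin over nodes.
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def add_node(table: dict[int, str], new_node: str) -> dict[int, str]:
    # Hypothetical rebalancing: move whole partitions from overloaded
    # nodes to the new node until it holds its fair share.
    new_table = dict(table)
    owners = set(table.values()) | {new_node}
    target = len(table) // len(owners)
    counts = Counter(new_table.values())
    for p in sorted(new_table):
        if counts[new_node] >= target:
            break
        owner = new_table[p]
        if counts[owner] > target:
            new_table[p] = new_node
            counts[owner] -= 1
            counts[new_node] += 1
    return new_table


def node_for(key: str, table: dict[int, str]) -> str:
    # Two-step lookup: key -> partition (fixed), partition -> node (table).
    return table[partition_for(key)]


if __name__ == "__main__":
    table = assign_partitions(["node-a", "node-b", "node-c"])
    grown = add_node(table, "node-d")
    moved = sum(1 for p in table if table[p] != grown[p])
    print(moved)  # prints 16: only whole partitions move, keys keep their partition
```

Adding a fourth node here reassigns 16 of the 64 partitions and leaves every key in the partition it was already in; only the partition-to-node table is updated.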
To split data across a set of cluster nodes, each data item needs to be mapped to one of them. There are two requirements for this mapping:
The distribution should be uniform.
It should be possible to know which cluster node stores a particular data item without making a request to all the nodes.
Consider a key-value store, which is a good proxy for many storage systems. Both requirements can be fulfilled by taking a hash of the key and using the modulo operation to map it to a cluster node. So if we have a three-node cluster, we can map keys Alice, Bob, Mary, and ...
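The hash-and-modulo mapping described above can be sketched in Python. This is a hedged illustration, not the book's code: the MD5-based `stable_hash`, the node names, and the `moved_fraction` helper are assumptions added here. It shows that any node can compute a key's owner locally, without asking the other nodes, and it also measures how many keys change owner when the cluster is resized.

```python
import hashlib


def stable_hash(key: str) -> int:
    # Deterministic across processes, unlike Python's built-in hash() for str.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def node_index(key: str, cluster_size: int) -> int:
    # hash(key) mod N picks one of N nodes; any node can compute this locally.
    return stable_hash(key) % cluster_size


def moved_fraction(keys: list[str], old_size: int, new_size: int) -> float:
    # Fraction of keys whose node changes when the cluster is resized.
    moved = sum(1 for k in keys if node_index(k, old_size) != node_index(k, new_size))
    return moved / len(keys)


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
    for key in ["Alice", "Bob", "Mary"]:
        print(key, "->", nodes[node_index(key, len(nodes))])
    sample = [f"key-{i}" for i in range(10_000)]
    # Expected to be close to 0.75 for a uniform hash (see note below).
    print(f"{moved_fraction(sample, 3, 4):.2f}")
```

With a uniform hash, a key stays on the same node only when h mod 3 equals h mod 4, which holds for 3 of the 12 possible values of h mod 12. So about three quarters of the keys move when a fourth node is added, which is the remapping that keeping the number of partitions fixed is meant to avoid.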