
Sam Horradarn
sirahd's activity

Xet is on the Hub

How can we find the chunk content using chunk hash?
Chunks are produced via content-defined chunking (CDC), and each chunk's hash is computed from its content, so two chunks with the same content will always share the same hash. This removes the need to store a separate mapping of chunk hash -> chunk content: if two chunks share the same hash, we know they have identical content.
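To make this concrete, here is a minimal sketch of content-defined chunking followed by content hashing. It is not Xet's actual implementation: the gear-style rolling hash, the `MASK` and `MIN_CHUNK` values, the use of SHA-256, and the names `cdc_chunks` and `chunk_hash` are all assumptions chosen for illustration.

```python
import hashlib
import random

# Hypothetical parameters for illustration only; not Xet's real settings.
MASK = (1 << 6) - 1   # cut when the low 6 bits of the rolling hash are zero
MIN_CHUNK = 16        # never cut a chunk shorter than this many bytes

# Fixed table of pseudo-random values, one per possible byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data at boundaries chosen by the content, not by fixed offsets."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        # Gear-style rolling hash: shift, then mix in the current byte.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        if i + 1 - start >= MIN_CHUNK and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def chunk_hash(chunk: bytes) -> str:
    """The hash depends only on the chunk's bytes."""
    return hashlib.sha256(chunk).hexdigest()
```

Because `chunk_hash` is a pure function of the bytes, running `cdc_chunks` over two files that contain the same region will tend to produce some identical chunks, and those chunks get identical hashes without any lookup table.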
The CAS system only stores "block_hash -> block_content". Where is the mapping from chunk to block stored?
This is explained in the "key chunks" section in the blog post above. Essentially we only store a tiny subset of chunk -> block by leveraging spatial locality in the file. Trying to store every mapping of chunk -> block can get impractical very quickly.
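As a rough sketch of the idea (not the actual design), the following keeps a chunk -> block entry only for a small subset of chunks and relies on locality to recover the rest. The selection rule (`hash % KEY_EVERY == 0`), the `SparseIndex` class, and the per-block chunk list are assumptions made up for this example.

```python
import hashlib

KEY_EVERY = 4  # hypothetical: roughly 1 in 4 chunks is treated as a key chunk


def chunk_hash(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()


def is_key_chunk(h: bytes) -> bool:
    # Assumed rule: a chunk is a "key chunk" if its hash hits a fixed residue.
    return int.from_bytes(h[:8], "big") % KEY_EVERY == 0


class SparseIndex:
    def __init__(self):
        self.key_to_block = {}  # key chunk hash -> block id (sparse)
        self.block_chunks = {}  # block id -> ordered chunk hashes in that block

    def add_block(self, block_id, chunks):
        hashes = [chunk_hash(c) for c in chunks]
        self.block_chunks[block_id] = hashes
        for h in hashes:
            if is_key_chunk(h):  # only key chunks enter the index
                self.key_to_block.setdefault(h, block_id)

    def find_known(self, chunks):
        """Return hashes of incoming chunks found via key-chunk hits."""
        hashes = [chunk_hash(c) for c in chunks]
        known = set()
        for h in hashes:
            block_id = self.key_to_block.get(h)
            if block_id is not None:
                # A key chunk matched: its neighbours in the same block
                # are likely to match too, so pull in the whole block list.
                known.update(self.block_chunks[block_id])
        return [h for h in hashes if h in known]
```

The trade-off in this sketch is that a run of chunks containing no key chunk will be missed, in exchange for an index that is a fraction of the size of a full chunk -> block table.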
What do the shards store? Is it "file_name, shard_id, chunk_hash, block_hash"?
You can think of a shard as storing a mapping from each file (identified by its file hash) to the list of chunks that make up that file.
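A toy version of that mental model might look like the following. The real shard format is not described here, so the dictionary layout, the fixed-size chunking (used only for brevity in place of CDC), and the way the file hash is derived are all assumptions.

```python
import hashlib

CHUNK_SIZE = 8  # fixed-size chunks for brevity; stands in for CDC


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Toy shard: file hash -> ordered list of chunk hashes.
shard: dict[str, list[str]] = {}


def add_file(content: bytes) -> str:
    chunks = [content[i:i + CHUNK_SIZE]
              for i in range(0, len(content), CHUNK_SIZE)]
    chunk_hashes = [sha(c) for c in chunks]
    # Assumption: derive the file hash from the ordered chunk hashes.
    file_hash = sha("".join(chunk_hashes).encode())
    shard[file_hash] = chunk_hashes
    return file_hash


# Two files that share their first 8 bytes.
file_a = add_file(b"AAAAAAAABBBBBBBB")
file_b = add_file(b"AAAAAAAACCCCCCCC")
```

Here `shard[file_a]` and `shard[file_b]` begin with the same chunk hash, so the shared chunk would only need to be stored once even though it appears in both files.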
I hope this helps explain our underlying tech better!
