Sam Horradarn

sirahd

AI & ML interests

None yet

Recent Activity

updated a model 8 days ago

xet-team/SmolVLM-256M-Instruct-test

upvoted an article 16 days ago

Xet is on the Hub

published an article 16 days ago

Xet is on the Hub



sirahd's activity

updated a model 8 days ago

xet-team/SmolVLM-256M-Instruct-test

Image-Text-to-Text • Updated 8 days ago

upvoted an article 16 days ago

Article

Xet is on the Hub

and 5 others • 16 days ago • 41

published an article 16 days ago

Article

Xet is on the Hub

and 5 others • 16 days ago • 41

updated a model about 1 month ago

sirahd/test-xet-migration-2

Updated Feb 20

published a model about 1 month ago

sirahd/test-xet-migration-2

Updated Feb 20

commented on From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub about 1 month ago

How can we find the chunk content using the chunk hash?

Chunk boundaries are determined via content-defined chunking (CDC), and each chunk's hash is computed from its content, so two chunks with the same content always share the same hash. This removes the need to store a chunk hash -> chunk content mapping: if two chunks share the same hash, we know they have identical content.
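A toy sketch of the idea, for illustration only: a Gear-style rolling hash picks chunk boundaries from the bytes themselves, and each chunk is then hashed by content. The gear table, the MIN_CHUNK/MAX_CHUNK/BOUNDARY_MASK values, and the use of SHA-256 are assumptions made for this example, not Xet's actual parameters or hash function.

```python
import hashlib
import random

# Toy parameters chosen for illustration; they are NOT Xet's real settings.
MIN_CHUNK = 16            # never cut before this many bytes
MAX_CHUNK = 256           # always cut once a chunk reaches this many bytes
BOUNDARY_MASK = 0x3F << 58  # cut when the top 6 bits are zero (~1 in 64 positions)
MASK64 = (1 << 64) - 1

# Fixed pseudo-random table for a Gear-style rolling hash (seeded for reproducibility).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries.

    Each step shifts the hash left by one bit and adds a table value, so after
    64 steps a byte's contribution is shifted out: the hash depends only on the
    most recent 64 bytes, and boundaries follow the content, not fixed offsets.
    """
    chunks: list[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunk_hash(chunk: bytes) -> str:
    """Hash computed purely from the chunk's bytes (SHA-256 as a stand-in)."""
    return hashlib.sha256(chunk).hexdigest()


if __name__ == "__main__":
    original = random.Random(1).randbytes(10_000)
    edited = b"new header" + original  # insert bytes at the front
    known = {chunk_hash(c) for c in cdc_chunks(original)}
    edited_hashes = [chunk_hash(c) for c in cdc_chunks(edited)]
    shared = sum(1 for x in edited_hashes if x in known)
    print(f"{shared}/{len(edited_hashes)} chunks of the edited file already exist")
```

Because boundaries resynchronize shortly after the inserted bytes, only the first chunk or so of the edited file should get a new hash; with fixed-size chunks, every chunk after the insertion point would shift and change.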

The CAS system only stores "block_hash -> block_content". Where is the chunk -> block mapping stored?

This is explained in the "key chunks" section of the blog post above. Essentially, we store only a small subset of the chunk -> block mappings and leverage spatial locality within the file to find the rest. Storing every chunk -> block mapping quickly becomes impractical.
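A toy sketch of indexing only a subset of chunks: only "key chunks" are recorded in the index, and a single hit on a key chunk reveals every neighboring chunk stored in the same block. The selection rule (hash divisible by KEY_CHUNK_MODULUS), the modulus value, the class ToyDedupIndex, and the way block_hash is derived are all hypothetical choices for illustration, not the blog post's actual scheme.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical rule: a chunk is a "key chunk" if its hash is divisible by this
# modulus. 4 is for illustration; a real index would sample far fewer chunks.
KEY_CHUNK_MODULUS = 4


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def is_key_chunk(chunk_hash: str) -> bool:
    return int(chunk_hash, 16) % KEY_CHUNK_MODULUS == 0


@dataclass
class ToyDedupIndex:
    # block_hash -> ordered chunk hashes stored in that block
    blocks: dict[str, list[str]] = field(default_factory=dict)
    # only key chunks are indexed: key chunk hash -> block_hash
    key_index: dict[str, str] = field(default_factory=dict)

    def add_block(self, chunk_hashes: list[str]) -> str:
        # Hypothetical block hash: hash of the concatenated chunk hashes.
        block_hash = sha("".join(chunk_hashes).encode())
        self.blocks[block_hash] = list(chunk_hashes)
        for ch in chunk_hashes:
            if is_key_chunk(ch):
                self.key_index[ch] = block_hash
        return block_hash

    def already_stored(self, chunk_hashes: list[str]) -> set[str]:
        """Return which of chunk_hashes are known, discovered only via key chunks."""
        wanted = set(chunk_hashes)
        found: set[str] = set()
        for ch in chunk_hashes:
            block_hash = self.key_index.get(ch)
            if block_hash is not None:
                # One key-chunk hit exposes every neighbor stored in the same block.
                found.update(c for c in self.blocks[block_hash] if c in wanted)
        return found
```

The index holds roughly one entry per KEY_CHUNK_MODULUS chunks instead of one per chunk. The trade-off is that a run of previously stored chunks containing no key chunk at all would go undetected.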

What do the shards store? Is it "file_name, shard_id, chunk_hash, block_hash"?

You can think of the shards as storing mappings from a file (identified by its file hash) to the list of chunks that make up that file.
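A toy sketch of a shard as a mapping from file hash to the ordered list of chunk hashes, plus reconstruction of the file from that list. Assumptions for illustration: fixed-size splitting stands in for CDC, a plain dict stands in for fetching chunk bytes out of CAS blocks, and the file hash is derived from the chunk hashes. ToyShard and its methods are hypothetical names.

```python
import hashlib


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def split_fixed(data: bytes, size: int = 4) -> list[bytes]:
    # Fixed-size chunks only to keep the sketch short; the real system uses CDC.
    return [data[i:i + size] for i in range(0, len(data), size)]


class ToyShard:
    """Maps file hash -> ordered list of chunk hashes that make up the file."""

    def __init__(self) -> None:
        self.files: dict[str, list[str]] = {}

    def add_file(self, data: bytes, chunk_store: dict[str, bytes]) -> str:
        chunks = split_fixed(data)
        chunk_hashes = [sha(c) for c in chunks]
        for ch, c in zip(chunk_hashes, chunks):
            chunk_store.setdefault(ch, c)  # identical chunks are stored once
        # Assumption: the file hash is derived from its ordered chunk hashes.
        file_hash = sha("".join(chunk_hashes).encode())
        self.files[file_hash] = chunk_hashes
        return file_hash

    def reconstruct(self, file_hash: str, chunk_store: dict[str, bytes]) -> bytes:
        # Look up the file's chunk list, then fetch and concatenate chunks in order.
        return b"".join(chunk_store[ch] for ch in self.files[file_hash])


store: dict[str, bytes] = {}
shard = ToyShard()
fh = shard.add_file(b"abcdabcdabcd", store)
print(len(shard.files[fh]), len(store))  # 3 chunk references, 1 stored chunk
```

The shard only records which chunks, in which order, form each file; the chunk bytes themselves live elsewhere, so repeated content costs one stored chunk no matter how many files reference it.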

I hope this helps explain our underlying tech better!

upvoted an article about 1 month ago

Article

From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub

and 3 others • Feb 12 • 55

updated a dataset 6 months ago

sirahd/test

Viewer • Updated Oct 17, 2024 • 14 • 14