The Quick Hash
The Quick Hash is a specialized sample-based hash function, designed primarily for rapidly identifying files.
While a standard cryptographic hash (like SHA-256) must read every byte of a file, a Quick Hash uses only a small, deterministic sample of the file's content.
Because file hashing is generally an I/O-bound process, generating a Quick Hash for a large file is usually much faster than calculating a full cryptographic hash.
How It Works
The Quick Hash is generated through a multi-step sampling process designed to be both fast and deterministic (meaning the exact same file will always produce the exact same quick hash).
- File Size Validation: The process only runs on files within a specific size range. By default, the file must be at least 32 MiB and no larger than 1 PiB.
- Deterministic Seeding: The sampling process is seeded with a number generated from the file's exact size in bytes. This guarantees that for any two identical files, the exact same sample locations will be chosen every time.
- Chunk Sampling: The hasher selects a number of 1 MiB chunks to read. This number scales with the file's size, from a minimum of 8 chunks up to a maximum of 1,024.
- Final Hash: The hasher seeks directly to each sample location, reads the 1 MiB chunk of data, and feeds it into a SHA-256 hash instance. The final Quick Hash is the digest of all combined samples.
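The steps above can be sketched in Python as follows. This is a minimal illustration, not Dorsal's implementation: the size limits, chunk size, chunk-count bounds, and use of SHA-256 come from the description above, but the scaling rule in `chunk_count`, the use of Python's `random` module for seeding, and all function names are assumptions made for the example. Its output will not match a real Quick Hash.

```python
import hashlib
import os
import random
from typing import List, Optional

MIB = 1024 * 1024
CHUNK_SIZE = MIB                    # 1 MiB per sample (from the text)
MIN_SIZE = 32 * MIB                 # default lower bound (from the text)
MAX_SIZE = 1024 ** 5                # default upper bound, 1 PiB (from the text)
MIN_CHUNKS, MAX_CHUNKS = 8, 1024    # chunk-count bounds (from the text)


def chunk_count(size: int) -> int:
    """ASSUMPTION: one chunk per 4 MiB of file, clamped to [8, 1024].

    The real scaling rule is not documented here; only the bounds are.
    """
    return max(MIN_CHUNKS, min(MAX_CHUNKS, size // (4 * MIB)))


def sample_offsets(size: int) -> List[int]:
    """ASSUMPTION: offsets come from a PRNG seeded with the file size.

    The same size always yields the same offsets (for a given Python
    version), which is the property the seeding step relies on.
    """
    rng = random.Random(size)
    slots = size // CHUNK_SIZE      # number of whole 1 MiB slots in the file
    picked = rng.sample(range(slots), chunk_count(size))
    return sorted(slot * CHUNK_SIZE for slot in picked)


def quick_hash_sketch(path: str) -> Optional[str]:
    """Return a hex digest of the sampled chunks, or None if out of range."""
    size = os.path.getsize(path)
    if not MIN_SIZE <= size <= MAX_SIZE:
        return None                 # file size validation
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for offset in sample_offsets(size):
            f.seek(offset)          # jump straight to the sample location
            digest.update(f.read(CHUNK_SIZE))
    return digest.hexdigest()
```

With the documented bounds of 8 to 1,024 chunks of 1 MiB each, a Quick Hash reads between 8 MiB and 1 GiB of data no matter how large the file is, which is where the speed advantage on very large files comes from.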
Use Cases and Limitations
When to Use a Quick Hash
- Finding Duplicates: For a fast "first pass" on a massive directory to identify potential duplicate files.
- Large File Triage: Quickly identifying large video files, disk images, or scientific datasets where a full hash would be too slow.
- As an Identifier: It is available as the quick_hash field in the Dorsal File Record.
Important Limitations
The Quick Hash should not be used in place of a secure hash.
- Risk of Collisions: Because it does not read the entire file, it is possible for two different files to have the same Quick Hash. This makes it unsuitable as a globally unique identifier.
- Not for Similarity: The Quick Hash is not a similarity hash. For finding similar files, Dorsal supports the TLSH hash (available via dorsal file hash --tlsh).
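Because a matching Quick Hash only indicates a potential duplicate, a "first pass" is usually followed by a full hash of the candidates. The sketch below shows one way to structure that two-pass check. The `quick_hash` parameter is a hypothetical callable that takes a path and returns a Quick Hash string (or `None` when the file is outside the supported size range); the helper names are illustrative and not part of Dorsal.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional


def full_sha256(path: str) -> str:
    """Hash every byte of the file, reading it in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()


def confirmed_duplicates(
    paths: Iterable[str],
    quick_hash: Callable[[str], Optional[str]],  # hypothetical callable
) -> List[List[str]]:
    """Group files by Quick Hash, then confirm each group with SHA-256."""
    # Pass 1: cheap grouping. Files with no Quick Hash are skipped.
    candidates: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        key = quick_hash(path)
        if key is not None:
            candidates[key].append(path)

    # Pass 2: full hash, only for files that share a Quick Hash.
    confirmed: List[List[str]] = []
    for group in candidates.values():
        if len(group) < 2:
            continue
        by_full: Dict[str, List[str]] = defaultdict(list)
        for path in group:
            by_full[full_sha256(path)].append(path)
        confirmed.extend(g for g in by_full.values() if len(g) > 1)
    return confirmed
```

Only files that already share a Quick Hash are read in full, so the expensive pass touches a small fraction of a large directory while still ruling out Quick Hash collisions.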
Generating a Quick Hash
You can generate a Quick Hash for a single file using the dorsal file hash command with the --quick flag.
For a large file (over 32 MiB):
You can generate a Quick Hash for a single file by using the get_quick_hash function:
The output is a 64-character string (a SHA-256 digest in hexadecimal).