Local File Record Cache
Dorsal uses a local SQLite database to cache File Records.
This means scanning a file will be faster the second time, as the record is retrieved from the cache.
The cache is a single file, cache.db, located in the .dorsal configuration directory in your home directory (e.g. /home/yourname/.dorsal/cache.db).
How it Works
- When you scan a file, Dorsal checks the cache using a composite key of:
  - File Path (absolute path)
  - Date Modified (mtime from the filesystem)
- If a record matches both, the cached result is returned.
- If the file has been modified (different mtime) or moved (different path), Dorsal treats it as a new file, runs the full extraction pipeline, and at the end writes the result back to the cache.
- Subsequent scans of the same file will then retrieve the record from the cache.
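The lookup described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not Dorsal's implementation: the table name, column names, and the helper functions open_cache, cache_key, and scan are all hypothetical, and the real layout of cache.db may differ.

```python
import os
import sqlite3


def open_cache(db_path=":memory:"):
    """Open a toy cache database (hypothetical schema, not Dorsal's)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        " path TEXT NOT NULL,"
        " mtime_ns INTEGER NOT NULL,"
        " record TEXT NOT NULL,"
        " PRIMARY KEY (path, mtime_ns))"
    )
    return conn


def cache_key(file_path):
    """Build the composite key: absolute path plus filesystem mtime."""
    abs_path = os.path.abspath(file_path)
    return abs_path, os.stat(abs_path).st_mtime_ns


def scan(conn, file_path, extract):
    """Return (record, was_cached). `extract` stands in for the full pipeline."""
    path, mtime_ns = cache_key(file_path)
    row = conn.execute(
        "SELECT record FROM records WHERE path = ? AND mtime_ns = ?",
        (path, mtime_ns),
    ).fetchone()
    if row is not None:
        # Both parts of the key matched: serve the cached record.
        return row[0], True
    # New, moved, or modified file: run the pipeline, then write back.
    record = extract(path)
    conn.execute(
        "INSERT OR REPLACE INTO records (path, mtime_ns, record) VALUES (?, ?, ?)",
        (path, mtime_ns, record),
    )
    conn.commit()
    return record, False
```

Because the modification time is part of the key, editing a file changes the key and produces a cache miss without any explicit invalidation step.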
Bypassing or Overwriting the Cache
Sometimes you want to force a re-scan, for example after updating an Annotation Model or when debugging an extractor.
When you force a re-scan, you can either skip the cache completely or write the new result back to it:
- Use the --skip-cache flag to bypass the cache completely. This forces a scan and doesn't touch the cache.
- Use the --overwrite-cache flag to both force a scan (skip reading the cache) and write the result back to the cache. This is a full refresh for that file record in the cache.
- Set use_cache=False in supported classes or functions to bypass the cache completely. This forces a scan and doesn't touch the cache:
from dorsal import LocalFile
# Forces a full scan, even if the record is in the cache
lf = LocalFile("./documents/mydocument.pdf", use_cache=False)
- Set overwrite_cache=True in supported classes or functions to both force a scan (skip reading the cache) and write the result back to the cache. This is a full refresh for that file record in the cache.
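The difference between the three modes (normal, skip, overwrite) can be illustrated with a toy model. This is only a sketch of the semantics described above: scan_with_cache is a hypothetical function, a plain dict keyed by path stands in for the cache, and letting use_cache=False win when both options are passed is an assumption rather than documented Dorsal behavior.

```python
def scan_with_cache(path, cache, extract, use_cache=True, overwrite_cache=False):
    """Toy model of the cache modes; `cache` is a dict, `extract` the pipeline."""
    if not use_cache:
        # Skip mode: no read and no write, the cache is left untouched.
        return extract(path)
    if not overwrite_cache and path in cache:
        # Normal mode, cache hit: return the stored record.
        return cache[path]
    # Normal-mode miss, or overwrite mode: run the scan and write back.
    record = extract(path)
    cache[path] = record
    return record


cache = {"report.pdf": "stale record"}
fresh = lambda path: "fresh record"

scan_with_cache("report.pdf", cache, fresh)                        # served from the cache
scan_with_cache("report.pdf", cache, fresh, use_cache=False)       # rescans, cache unchanged
scan_with_cache("report.pdf", cache, fresh, overwrite_cache=True)  # rescans and refreshes the cache
```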
Configuration
You can control the cache behavior globally using the Dorsal configuration file or by setting environment variables.
Precedence Order (highest priority first):
- Runtime Arguments (--skip-cache, use_cache=False)
- Environment Variables
- Configuration File (dorsal.toml)
- Defaults (enabled=true)
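The precedence order can be read as a chain of fallbacks. The sketch below shows one way such a resolution could work. It is not Dorsal's code: resolve_cache_enabled and parse_bool are hypothetical names, the config dict with an "enabled" key is an assumed stand-in for whatever dorsal.toml actually contains, and treating every value other than false or 0 as enabled is also an assumption.

```python
import os


def parse_bool(text):
    """Treat 'false' or '0' as disabled; anything else as enabled (assumption)."""
    return text.strip().lower() not in ("false", "0")


def resolve_cache_enabled(runtime=None, environ=None, config=None):
    """Resolve the setting: runtime argument > environment > config file > default."""
    if runtime is not None:
        # 1. An explicit runtime argument always wins.
        return runtime
    environ = os.environ if environ is None else environ
    raw = environ.get("DORSAL_CACHE_ENABLED")
    if raw is not None:
        # 2. Environment variable.
        return parse_bool(raw)
    if config and "enabled" in config:
        # 3. Value loaded from the configuration file (hypothetical key).
        return bool(config["enabled"])
    # 4. Built-in default.
    return True
```

For example, with DORSAL_CACHE_ENABLED=0 in the environment and the cache enabled in the config file, this sketch resolves to disabled, because the environment variable outranks the file.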
Environment Variables
| Variable | Type | Default | Description |
|---|---|---|---|
| DORSAL_CACHE_ENABLED | bool | true | Set to false or 0 to disable all reading/writing to the cache. |
| DORSAL_CACHE_COMPRESSION | bool | true | Set to false to store uncompressed JSON instead of zlib-compressed blobs. |
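To make the compression setting concrete, here is a rough sketch of the two storage forms using Python's standard json and zlib modules. The functions encode_record and decode_record are hypothetical and the sample record is invented; how Dorsal actually serializes records is not specified here.

```python
import json
import zlib


def encode_record(record, compress=True):
    """Serialize a record to JSON bytes, optionally zlib-compressed."""
    raw = json.dumps(record).encode("utf-8")
    return zlib.compress(raw) if compress else raw


def decode_record(blob, compressed=True):
    """Reverse encode_record; `compressed` must match how the blob was stored."""
    raw = zlib.decompress(blob) if compressed else blob
    return json.loads(raw.decode("utf-8"))


record = {"name": "mydocument.pdf", "tags": ["pdf"] * 200}  # invented sample data
blob = encode_record(record)                   # zlib-compressed blob
plain = encode_record(record, compress=False)  # uncompressed JSON bytes
restored = decode_record(blob)                 # equal to `record`
```

Uncompressed JSON is easier to inspect directly in the database, while compressed blobs take less disk space for large or repetitive records.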
Config File
Cache settings can also be set in your dorsal.toml config file.
Managing the Cache
To clear, prune, or build the cache, use the CLI tools.
- CLI Guide: Cache Commands (show, prune, clear, build)