Welcome to the DorsalHub Documentation!
DorsalHub is your file metadata platform.
DorsalHub makes it easy to securely search and manage file metadata records from anywhere.
Built with developers in mind, DorsalHub is powered by Dorsal - a local-first metadata generation toolkit.
The Quick Start guide below will get you started with Dorsal.
What is Dorsal?
Dorsal is a Python library and command line tool for generating, validating, and managing structured file metadata.
Dorsal is:
- Local First: Metadata extraction happens locally on your machine, not in the cloud.
- Strictly Validated: All metadata records are checked against strict JSON Schemas and Pydantic models.
- Extensible: Support your own file types and metadata annotation needs by integrating your own models.
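To make the "Strictly Validated" point concrete, here is a minimal standard-library sketch that checks a tiny record by hand. It only illustrates the idea and is not Dorsal's schema: the function name validate_base_record and the three rules are assumptions made for this example, loosely modelled on the file/base fields shown later in this guide.

```python
import re

# Hypothetical rules for illustration only; Dorsal's real JSON Schemas
# and Pydantic models are not reproduced here.
SHA256_HEX = re.compile(r"^[0-9a-f]{64}$")


def validate_base_record(record: dict) -> list:
    """Return a list of human-readable problems (empty means valid)."""
    problems = []
    if not SHA256_HEX.match(str(record.get("hash", ""))):
        problems.append("hash must be 64 lowercase hex characters")
    name = record.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name must be a non-empty string")
    size = record.get("size")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(size, int) or isinstance(size, bool) or size < 0:
        problems.append("size must be a non-negative integer")
    return problems


good = {
    "hash": "3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368",
    "name": "PDFSPEC.pdf",
    "size": 1512313,
}
print(validate_base_record(good))             # no problems reported
print(validate_base_record({"hash": "abc"}))  # one problem per failed rule
```

Returning a list of problems rather than raising on the first failure makes it easy to report everything wrong with a record in one pass.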
Quick Start
This guide covers:
- Installing Dorsal
- Scanning your first file
- Authenticating with DorsalHub (Optional)
- Pushing metadata to DorsalHub (Requires Authentication)
Install Dorsal
Dorsal is available on PyPI as dorsalhub.
uv is a popular Python package installer and resolver, known for being fast. See: Installing uv
Once installation is complete, verify the install by running dorsal --version:
Your output should resemble this, showing you the version of Dorsal which is installed:
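If you would rather check from Python, the standard library can report whether a distribution is installed. This sketch relies only on importlib.metadata and the PyPI name dorsalhub given above. The helper name installed_version is made up for this example and is not part of Dorsal.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "dorsalhub" is the PyPI distribution name given above.
found = installed_version("dorsalhub")
print(found if found else "dorsalhub is not installed in this environment")
```

This checks the active Python environment only, so run it with the same interpreter you installed Dorsal into.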
Scan a File
At its heart, Dorsal is a toolkit for creating and managing structured metadata records from your files. It ships with offline metadata extractors for a number of file types, including PDFs, Office documents, video, and audio files.
The quickest way to get started is in the terminal where you just installed Dorsal.
- Locate a file you'd like to scan, and copy its path.
- Use the dorsal file scan command with the path to that file:
  When the scan completes, you should see something similar to this:
📄 Scanning metadata for PDFSPEC.pdf
╭───────────────────────────────── File Record: PDFSPEC.pdf ─────────────────────────────────╮
│                                                                                            │
│  Hashes                                                                                    │
│    SHA-256: 3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368               │
│    BLAKE3: 9abdfb32750a278d5ca550b876e94a72cd8eec82d0e506a127dfb94bd56ca4b2                │
│    TLSH: T13465D67BB4C61D6DF893CA46571C579B8B0D71533BAEA58604BDAF0AC6338029AC3F41          │
│                                                                                            │
│  File Info                                                                                 │
│    Full Path: /dev/test/docs/PDFSPEC.pdf                                                   │
│    Modified: 2025-04-09 15:09:05                                                           │
│    Name: PDFSPEC.pdf                                                                       │
│    Size: 1 MiB                                                                             │
│    Media Type: application/pdf                                                             │
│                                                                                            │
│  Tags                                                                                      │
│    No tags found.                                                                          │
│                                                                                            │
│  Pdf Info                                                                                  │
│    author: Tim Bienz, Richard Cohn, James R. Meehan                                        │
│    title: Portable Document Format Reference Manual (v 1.2)                                │
│    creator: FrameMaker 5.1.1                                                               │
│    producer: Acrobat Distiller 3.0 for Power Macintosh                                     │
│    subject: Description of the PDF file format                                             │
│    keywords: Acrobat PDF                                                                   │
│    version: 1.2                                                                            │
│    page_count: 394                                                                         │
│    creation_date: 1996-11-12T03:08:43                                                      │
│    modified_date: 1996-11-12T07:58:15                                                      │
│                                                                                            │
│                                                                                            │
╰────────────────────────────────────────────────────────────────────────────────────────────╯
This panel shows the core metadata fields for this record.
- You can export the record to JSON straight from the CLI by adding the --json flag.
  This outputs the JSON to stdout, so you can redirect it to a file or pipe it to other tools:
The JSON output is a fully-validated File Record 👇
Example File Record: PDFSPEC.pdf
{
  "hash": "3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368",
  "validation_hash": "9abdfb32750a278d5ca550b876e94a72cd8eec82d0e506a127dfb94bd56ca4b2",
  "annotations": {
    "file/base": {
      "record": {
        "hash": "3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368",
        "name": "PDFSPEC.pdf",
        "extension": ".pdf",
        "size": 1512313,
        "media_type": "application/pdf",
        "media_type_prefix": "application"
      },
      "source": {
        "type": "Model",
        "model": "dorsal/base",
        "version": "1.0.0"
      }
    },
    "file/pdf": {
      "record": {
        "author": "Tim Bienz, Richard Cohn, James R. Meehan",
        "title": "Portable Document Format Reference Manual (v 1.2)",
        "creator": "FrameMaker 5.1.1",
        "producer": "Acrobat Distiller 3.0 for Power Macintosh",
        "subject": "Description of the PDF file format",
        "keywords": "Acrobat PDF",
        "version": "1.2",
        "page_count": 394,
        "creation_date": "1996-11-12T03:08:43",
        "modified_date": "1996-11-12T07:58:15"
      },
      "private": true,
      "source": {
        "type": "Model",
        "model": "dorsal/pdf",
        "version": "1.0.0",
        "variant": "pypdfium2"
      }
    }
  },
  "tags": [],
  "source": "disk",
  "local_attributes": {
    "date_modified": "2025-04-09 15:09:05.533199+01:00",
    "date_accessed": "2025-11-28 10:37:08.225267+00:00",
    "date_created": "2025-07-17 11:07:52.875623+01:00",
    "file_path": "/dev/test/docs/PDFSPEC.pdf",
    "file_size_bytes": 1512313,
    "file_permissions_mode": 33279,
    "inode": 3940649675394997,
    "number_of_links": 1
  },
  "local_filesystem": {
    "full_path": "/dev/test/docs/PDFSPEC.pdf",
    "date_created": "2025-07-17T11:07:52.875623+01:00",
    "date_modified": "2025-04-09T15:09:05.533199+01:00"
  }
}
Notice the file/pdf key under annotations stores a separate object housing PDF-specific fields.
For more information on the dorsal file commands, see the full CLI Guide: Files
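Because the JSON goes to stdout, a saved record can be read back with nothing more than Python's json module. The sketch below inlines a trimmed copy of the example record so it runs without Dorsal installed. The helper name pdf_summary is hypothetical, and reading file_permissions_mode as a raw st_mode value is an assumption based on the number shown in the example.

```python
import json
import stat

# A trimmed copy of the example record, inlined so the sketch is
# self-contained. In practice this text would come from saved CLI output.
RECORD_JSON = """
{
  "hash": "3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368",
  "annotations": {
    "file/base": {"record": {"name": "PDFSPEC.pdf", "size": 1512313}},
    "file/pdf": {
      "record": {
        "title": "Portable Document Format Reference Manual (v 1.2)",
        "page_count": 394
      }
    }
  },
  "local_attributes": {"file_permissions_mode": 33279}
}
"""


def pdf_summary(record: dict) -> str:
    """Build a one-line summary from the file/base and file/pdf annotations."""
    base = record["annotations"]["file/base"]["record"]
    pdf = record["annotations"]["file/pdf"]["record"]
    return f'{base["name"]}: {pdf["title"]} ({pdf["page_count"]} pages)'


record = json.loads(RECORD_JSON)
print(pdf_summary(record))

# Assumption: file_permissions_mode is an st_mode integer, which
# stat.filemode renders in the familiar "ls -l" style.
print(stat.filemode(record["local_attributes"]["file_permissions_mode"]))
```

Each annotation lives under its own key, so code that only needs PDF fields can ignore every other annotation in the record.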
- LocalFile is a Python class which you can use to create, update and manage a metadata record for a single file.
  To scan a file, create a new LocalFile instance with the file path:
- LocalFile exposes some base metadata fields as top-level attributes:
  - hash: The file's SHA-256 hash, e.g. "3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368"
  - name: The file's name, e.g. "PDFSPEC.pdf"
  - extension: The file's extension, e.g. ".pdf"
  - size: The file's size in bytes, e.g. 1512313
  - size_text: The file's size in human-readable text, e.g. "1 MiB"
  - media_type: The media type, e.g. "application/pdf"
  You can access these fields as attributes on the LocalFile instance:
  Accessing attributes on a LocalFile instance
  Output:
- You can visualize the entire record by calling the to_dict or to_json methods:
  Display the full file record in Python
  The JSON printout is a fully-validated File Record, with the same structure as the Example File Record for PDFSPEC.pdf shown earlier in this guide.
  Notice the file/pdf key under annotations stores a separate object housing PDF-specific fields.
  For more information on the Python API and the LocalFile class, see the Python API Docs
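To show what the base fields listed above represent, the following sketch computes comparable values with the standard library alone. It is not Dorsal's implementation. The names base_fields and human_size are invented for this example, the media type is guessed from the file extension only, and the whole-number rounding in size_text is an assumption chosen to match the "1 MiB" shown for 1512313 bytes.

```python
import hashlib
import mimetypes
import os
import tempfile


def human_size(num_bytes: int) -> str:
    """Render a byte count with binary units, rounded to a whole number."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{round(value)} {unit}"
        value /= 1024


def base_fields(path: str) -> dict:
    """Compute hash, name, extension, size, size_text and media_type."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    size = os.path.getsize(path)
    name = os.path.basename(path)
    return {
        "hash": digest.hexdigest(),
        "name": name,
        "extension": os.path.splitext(name)[1],
        "size": size,
        "size_text": human_size(size),
        "media_type": mimetypes.guess_type(name)[0],
    }


# Demonstrate on a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "notes.txt")
    with open(sample, "wb") as handle:
        handle.write(b"hello")
    fields = base_fields(sample)
    print(fields)
```

Hashing in fixed-size chunks keeps memory use flat regardless of file size, which matters when scanning large video or audio files.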
Authenticate (Optional)
While Dorsal is a capable offline tool, connecting it to DorsalHub unlocks its full potential.
- To authenticate, first generate an API Key on DorsalHub.
  - Visit the Manage API Keys page and click the "New API Key" button.
- Authenticate Dorsal:
  There are two ways to authenticate:
  - Use the dorsal auth login command in your terminal
  - Set the DORSAL_API_KEY environment variable
- Run dorsal auth login:
- Paste your API key when prompted:
- Dorsal is now authenticated in both the Python API and Command Line Interface.
- Your API Key is stored in Dorsal's global configuration file (e.g. /home/user/.dorsal/dorsal.toml).
- For more information on the dorsal auth commands, see the CLI Guide: Authentication
Set the environment variable DORSAL_API_KEY to your API Key.
The command for setting environment variables varies by operating system and shell.
- macOS / Linux:
- Windows (PowerShell):
- Windows (Command Prompt):
Setting the DORSAL_API_KEY environment variable will authenticate Dorsal within the current terminal session.
Once the environment variable is set, you can confirm you are logged in by running the dorsal auth whoami command:
This command prints your current logged-in status:
Verifying session with DorsalHub...
╭─────────── Authenticated User ───────────╮
│                                          │
│  User ID: 1230321                        │
│  Name: yourname                          │
│  Email: your.email@example.com           │
│  Account Status: Member                  │
│                                          │
╰──────────────────────────────────────────╯
This confirms that Dorsal is now authenticated.
Session Only
Setting an environment variable this way authenticates Dorsal for your current terminal session only. You will need to set it again if you open a new terminal.
For a persistent login, use dorsal auth login to save the API Key to the global config file.
API Key Safety
Your API Key is a secure credential, just like a password. You must store it safely and never share it.
For more information about API Keys, including safety tips, see API Keys.
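When a script depends on the environment variable, it can help to confirm the variable is present without ever printing the key itself. This is a minimal sketch using only os.environ. The helper name masked_api_key is hypothetical, and it checks only that the variable is set, not that DorsalHub accepts the key.

```python
import os
from typing import Mapping, Optional


def masked_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a masked form of DORSAL_API_KEY, or None when it is unset."""
    source = os.environ if env is None else env
    key = source.get("DORSAL_API_KEY", "").strip()
    if not key:
        return None
    if len(key) <= 4:
        # Too short to reveal any part of it safely.
        return "*" * len(key)
    # Show only the last four characters so the full key stays out of logs.
    return "*" * (len(key) - 4) + key[-4:]


status = masked_api_key()
if status:
    print("DORSAL_API_KEY is set:", status)
else:
    print("DORSAL_API_KEY is not set in this session")
```

Passing a mapping instead of reading os.environ directly keeps the helper easy to test, for example masked_api_key({"DORSAL_API_KEY": "abcd1234efgh"}).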
Push a Record to DorsalHub
Note
This step requires API-Key authentication.
- Use the dorsal file push command to create and securely publish a structured metadata record to DorsalHub.
  When complete it will show something like:
📡 Preparing to push metadata for PDFSPEC.pdf as a private record...
╭──────────────────────────────── Push Complete ────────────────────────────────╮
│ The file record was successfully pushed to DorsalHub.                         │
│                                                                               │
│ SHA256 Hash: 3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368 │
╰───────────────────────────────────────────────────────────────────────────────╯
Note that while the metadata record is pushed to DorsalHub, the file itself never leaves your machine.
DorsalHub is Private by Default
When you run dorsal file push "docs/PDFSPEC.pdf", you are telling the server to create a private record about that file.
Private records are only visible to you.
To make a public record, add the --public argument to the command:
- View it Online
Head over to your DorsalHub Dashboard to see the newly indexed file and its extracted metadata.
- The LocalFile.push method uploads the entire File Record LocalFile.model (as a validated FileRecordStrict object) to the DorsalHub API.
  Indexing to DorsalHub
  Note that while the metadata record is pushed to DorsalHub, the file itself never leaves your machine.
DorsalHub is Private by Default
When you call LocalFile.push, you are telling the server to create a private record about that file.
Private records are only visible to you.
To make a public record, add the public=True argument to the push method:
- View it Online
Head over to your DorsalHub Dashboard to see the newly indexed file and its extracted metadata.
Custom Extractors
Dorsal is for more than just core file metadata. You can create custom Annotation Models in Python to extract specific data from your files.
Annotation Models can be as simple or as complex as you like. For example, a simple model to count words in a text file:
Simple Word Counting Annotation Model
This model can be registered to your local Model Pipeline to run automatically every time you scan a text file.
Check out the tutorial: Introduction to Annotation Models
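The word-counting idea can be sketched in plain Python. This is not the Annotation Model interface itself: the base class and registration steps belong to the tutorial linked above, and the function names and the shape of the returned dictionary here are assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path


def count_words(text: str) -> int:
    """Count whitespace-separated words."""
    return len(re.findall(r"\S+", text))


def word_count_annotation(path: str) -> dict:
    """Read a UTF-8 text file and return a small record with its word count."""
    text = Path(path).read_text(encoding="utf-8")
    return {"word_count": count_words(text)}


# Demonstrate on a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("Dorsal counts words\nacross lines.", encoding="utf-8")
    result = word_count_annotation(str(sample))
    print(result)
```

Keeping the counting logic in its own function separates the part that reads the file from the part that produces the annotation, which makes the logic easy to test on strings.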
Next Steps
You've indexed your first file! Here's where to go next:
- ⌨️ Learn the CLI
  Learn how to manage files, add tags, create collections, and more, directly from your terminal.
- 🐍 Learn the Python API
  Integrate Dorsal into your applications for custom metadata workflows, analysis, and automation.
- 🖥️ Explore DorsalHub
  Get oriented with the DorsalHub website. View and organize your indexed files from your dashboard.
- 🧑‍💻 I want to contribute...
  Dorsal is open source and provided under the Apache 2.0 license. Report an issue or suggest new features on our GitHub repository.