Welcome to the DorsalHub Documentation!
DorsalHub is your file metadata platform.
Built with developers in mind, DorsalHub is powered by the open-source Dorsal library.
The Quick Start guide below will get you started with Dorsal.
Quick Start 🚀
This guide covers:
- Installing Dorsal
- Authenticating with DorsalHub (Optional)
- Extracting metadata from a file
- Pushing metadata to DorsalHub (Requires Authentication)
Install Dorsal 💾
- Dorsal is available on PyPI as dorsalhub. Install it with pip: pip install dorsalhub
- Once installation is complete, verify it by running dorsal --version. The command prints the version of Dorsal that is installed.
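The same check can be made from Python instead of the shell, since the standard library can report the version of an installed distribution. The sketch below assumes only the PyPI name dorsalhub given above; the helper name installed_version is illustrative and is not part of Dorsal.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("dorsalhub")
    print(found if found is not None else "dorsalhub is not installed")
```

Returning None rather than raising keeps the check usable in setup scripts that want to decide for themselves how to react to a missing install.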
Authenticate with DorsalHub (Optional) 🔑
While Dorsal is a capable offline tool, connecting it to DorsalHub unlocks its full potential.
- To authenticate, first generate an API Key on DorsalHub.
  - Visit the Manage API Keys page and click the "New API Key" button.
- Authenticate Dorsal. There are two ways to authenticate:
  - Use the dorsal auth login command in your terminal
  - Set the DORSAL_API_KEY environment variable
To authenticate with the dorsal auth login command:
- Run dorsal auth login.
- Paste your API key when prompted.
- Dorsal is now authenticated!
- Your API Key is stored in Dorsal's global configuration file (e.g. /home/user/.dorsal/dorsal.toml).
- For more information on the dorsal auth commands, see the CLI Guide: Authentication
To authenticate with the environment variable instead, set DORSAL_API_KEY to your API Key. The command for setting environment variables varies by operating system and shell:
- macOS / Linux: export DORSAL_API_KEY="your-api-key"
- Windows (PowerShell): $env:DORSAL_API_KEY = "your-api-key"
- Windows (Command Prompt): set DORSAL_API_KEY=your-api-key
Setting the DORSAL_API_KEY environment variable authenticates Dorsal within the current terminal session.
Once the environment variable is set, you can confirm you are logged in by running the dorsal auth whoami command, which prints your current login status:
Verifying session with DorsalHub...
╭─────────── Authenticated User ───────────╮
│                                          │
│  User ID: 1230321                        │
│  Name: yourname                          │
│  Email: your.email@example.com           │
│  Account Status: Member                  │
│                                          │
╰──────────────────────────────────────────╯
This confirms that Dorsal is now authenticated.
Session Only
Setting an environment variable this way authenticates Dorsal for your current terminal session only. You will need to set it again if you open a new terminal.
For a persistent login, use dorsal auth login to save the API Key to the global config file.
API Key Safety
Your API Key is a secure credential, just like a password. Store it safely and never share it.
For more information about API Keys, including safety tips, see API Keys.
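Scripts that call Dorsal can fail early, with a clear message, when the key is missing from the session. This is a minimal sketch that assumes only the DORSAL_API_KEY variable name from this guide; the helper names get_api_key and mask are illustrative and are not part of Dorsal.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the API key from the environment, or None if unset or blank."""
    key = env.get("DORSAL_API_KEY", "").strip()
    return key or None


def mask(key: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key never lands in logs."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


if __name__ == "__main__":
    key = get_api_key()
    if key is None:
        raise SystemExit("DORSAL_API_KEY is not set in this terminal session")
    print(f"DORSAL_API_KEY is set: {mask(key)}")
```

This only inspects the environment variable. It does not look at a key saved by dorsal auth login in the global configuration file, so a None result here does not by itself mean Dorsal is unauthenticated.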
Extract some metadata 🚀
At its heart, Dorsal is a toolkit for creating and managing structured metadata records from your files. It ships with offline metadata extractors for a number of file types, including PDFs, Office documents, video files, and audio files.
The quickest way to get started is right from the command line:
- Locate a file you'd like to scan, and copy its path.
- Use the dorsal file scan command with the path to that file. When the scan completes, you should see something similar to this:
📄 Scanning metadata for PDFSPEC.pdf
╭───────────────────────────────── File Record: PDFSPEC.pdf ─────────────────────────────────╮
│                                                                                            │
│  Hashes                                                                                    │
│    SHA-256: 3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368               │
│    BLAKE3: 9abdfb32750a278d5ca550b876e94a72cd8eec82d0e506a127dfb94bd56ca4b2                │
│    TLSH: T13465D67BB4C61D6DF893CA46571C579B8B0D71533BAEA58604BDAF0AC6338029AC3F41          │
│                                                                                            │
│  File Info                                                                                 │
│    Full Path: /dev/test/docs/PDFSPEC.pdf                                                   │
│    Modified: 2025-04-09 15:09:05                                                           │
│    Name: PDFSPEC.pdf                                                                       │
│    Size: 1 MiB                                                                             │
│    Media Type: application/pdf                                                             │
│                                                                                            │
│  Tags                                                                                      │
│    No tags found.                                                                          │
│                                                                                            │
│  Pdf Info                                                                                  │
│    author: Tim Bienz, Richard Cohn, James R. Meehan                                        │
│    title: Portable Document Format Reference Manual (v 1.2)                                │
│    creator: FrameMaker 5.1.1                                                               │
│    producer: Acrobat Distiller 3.0 for Power Macintosh                                     │
│    subject: Description of the PDF file format                                             │
│    keywords: Acrobat PDF                                                                   │
│    version: 1.2                                                                            │
│    page_count: 394                                                                         │
│    creation_date: 1996-11-12T03:08:43                                                      │
│    modified_date: 1996-11-12T07:58:15                                                      │
│                                                                                            │
│                                                                                            │
╰────────────────────────────────────────────────────────────────────────────────────────────╯
This panel shows the core metadata fields for this record.
- You can export the record to JSON straight from the CLI by adding the --json flag. This outputs the JSON to stdout, so you can redirect it to a file or pipe it to other tools.
The JSON output is a fully-validated File Record 👇
Example File Record: PDFSPEC.pdf
{
  "hash": "3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368",
  "validation_hash": "9abdfb32750a278d5ca550b876e94a72cd8eec82d0e506a127dfb94bd56ca4b2",
  "annotations": {
    "file/base": {
      "record": {
        "hash": "3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368",
        "name": "PDFSPEC.pdf",
        "extension": ".pdf",
        "size": 1512313,
        "media_type": "application/pdf",
        "media_type_prefix": "application"
      },
      "source": {
        "type": "Model",
        "model": "dorsal/base",
        "version": "1.0.0"
      }
    },
    "file/pdf": {
      "record": {
        "author": "Tim Bienz, Richard Cohn, James R. Meehan",
        "title": "Portable Document Format Reference Manual (v 1.2)",
        "creator": "FrameMaker 5.1.1",
        "producer": "Acrobat Distiller 3.0 for Power Macintosh",
        "subject": "Description of the PDF file format",
        "keywords": "Acrobat PDF",
        "version": "1.2",
        "page_count": 394,
        "creation_date": "1996-11-12T03:08:43",
        "modified_date": "1996-11-12T07:58:15"
      },
      "private": true,
      "source": {
        "type": "Model",
        "model": "dorsal/pdf",
        "version": "1.0.0",
        "variant": "pypdfium2"
      }
    }
  },
  "tags": [],
  "source": "disk",
  "local_attributes": {
    "date_modified": "2025-04-09 15:09:05.533199+01:00",
    "date_accessed": "2025-11-28 10:37:08.225267+00:00",
    "date_created": "2025-07-17 11:07:52.875623+01:00",
    "file_path": "/dev/test/docs/PDFSPEC.pdf",
    "file_size_bytes": 1512313,
    "file_permissions_mode": 33279,
    "inode": 3940649675394997,
    "number_of_links": 1
  },
  "local_filesystem": {
    "full_path": "/dev/test/docs/PDFSPEC.pdf",
    "date_created": "2025-07-17T11:07:52.875623+01:00",
    "date_modified": "2025-04-09T15:09:05.533199+01:00"
  }
}
Notice that the file/pdf key under annotations stores a separate object housing PDF-specific fields.
For more information on the dorsal file commands, see the full CLI Guide: Files
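Because the --json output is ordinary JSON, any standard parser can consume it. The sketch below reads fields from a record with the key layout of the PDFSPEC.pdf example. The helper names annotation_record and scan_to_record are illustrative and are not part of Dorsal. scan_to_record additionally assumes that --json may follow the file path and that stdout contains only the JSON document; neither detail is confirmed by this guide.

```python
import json
import subprocess
from typing import Any, Dict

# Abridged copy of the example File Record for PDFSPEC.pdf.
SAMPLE = """
{
  "hash": "3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368",
  "annotations": {
    "file/base": {
      "record": {"name": "PDFSPEC.pdf", "size": 1512313, "media_type": "application/pdf"}
    },
    "file/pdf": {
      "record": {"title": "Portable Document Format Reference Manual (v 1.2)", "page_count": 394}
    }
  },
  "tags": []
}
"""


def annotation_record(file_record: Dict[str, Any], key: str) -> Dict[str, Any]:
    """Return the "record" object stored under one annotation key, or {} if absent."""
    return file_record.get("annotations", {}).get(key, {}).get("record", {})


def scan_to_record(path: str) -> Dict[str, Any]:
    """Run the scan and parse its output (assumed argument order, see lead-in)."""
    result = subprocess.run(
        ["dorsal", "file", "scan", path, "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


record = json.loads(SAMPLE)
base = annotation_record(record, "file/base")
pdf = annotation_record(record, "file/pdf")
print(f'{base["name"]}: {pdf["page_count"]} pages, {base["size"]} bytes')
```

Looking annotations up by key and falling back to an empty object means the same code works for files that have no file/pdf annotation, such as audio or video files.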
Index it to DorsalHub
Note
This step requires API-Key authentication.
- Use the dorsal file push command to create and publish a structured metadata record to DorsalHub. When complete, it will show something like:
📡 Preparing to push metadata for PDFSPEC.pdf as a private record...
╭──────────────────────────────── Push Complete ────────────────────────────────╮
│ The file record was successfully pushed to DorsalHub.                         │
│                                                                               │
│ SHA256 Hash: 3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368 │
╰───────────────────────────────────────────────────────────────────────────────╯
DorsalHub is Private by Default
When you run dorsal file push "docs/PDFSPEC.pdf", you are telling the server to create a private record about that file. Private records are only visible to you.
To make a public record, add the --public argument to the command.
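The push confirmation identifies the record by the file's SHA-256 hash, the same value listed under Hashes in the scan output. To check that value independently of Dorsal, the hash can be computed with the Python standard library. This is a minimal sketch; the function name sha256_of_file is illustrative and is not part of Dorsal.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) > 1:
    print(sha256_of_file(sys.argv[1]))
```

Reading in chunks keeps memory use flat for large video or audio files. The printed digest should match the SHA256 Hash line in the push confirmation for the same file.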
View it Online ✨
Head over to your DorsalHub Dashboard to see the newly indexed file and its extracted metadata.
Next Steps
You've indexed your first file! Here's where to go next:
- ⌨️ Learn the CLI
  Learn how to manage files, add tags, create collections, and more, directly from your terminal.
- 🐍 Learn the Python API
  Integrate Dorsal into your applications for custom metadata workflows, analysis, and automation.
- 🖥️ Explore DorsalHub
  Get oriented with the DorsalHub website. View and organize your indexed files from your dashboard.
- 🧑‍💻 I want to contribute...
  Dorsal is open source and provided under the Apache 2.0 license. Report an issue or suggest new features on our GitHub repository.