Welcome to the DorsalHub Documentation!

DorsalHub is your file metadata platform.

Built with developers in mind, DorsalHub is powered by the open-source Dorsal library.

The Quick Start guide below will get you started with Dorsal.


Quick Start 🚀

This guide covers:

  1. Installing Dorsal
  2. Authenticating with DorsalHub (Optional)
  3. Extracting metadata from a file
  4. Pushing metadata to DorsalHub (Requires Authentication)

Install Dorsal 💾

  1. Dorsal is available on PyPI as dorsalhub:

    pip install dorsalhub
    
  2. Once installation is complete, verify the install by running dorsal --version:

    dorsal --version
    

    You should see output resembling the following, showing the installed version of Dorsal:

    dorsal CLI version: 0.1.0
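
If you would rather check the install from Python, a minimal sketch using only the standard library might look like the following. It assumes the distribution name is dorsalhub, as in the pip command above. The helper name installed_version is illustrative and not part of Dorsal, and the package version it reports is not guaranteed to match the string printed by dorsal --version.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "dorsalhub") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("dorsalhub is not installed; run: pip install dorsalhub")
    else:
        print(f"dorsalhub {found} is installed")
```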
    

Authenticate with DorsalHub (Optional) 🔑

While Dorsal is a capable offline tool, connecting it to DorsalHub unlocks its full potential.

  1. To authenticate, first generate an API Key on DorsalHub.

  2. Authenticate Dorsal:

    There are two ways to authenticate:

    • Use the dorsal auth login command in your terminal
    • Set the DORSAL_API_KEY environment variable

    Using dorsal auth login

    • Run dorsal auth login:

      dorsal auth login
      
    • Paste your API key when prompted:

      API Key: ***********************
      🔑 Verifying key with DorsalHub...
      
      ╭──────────  Login Successful ─────────────╮
      │                                          │
      │  User ID:        1230321                 │
      │  Name:           yourname                │
      │  Email:          your.email@example.com  │
      │  Account Status: Member                  │
      │                                          │
      ╰──────────────────────────────────────────╯
      
    • Dorsal is now authenticated!

    • Your API Key is stored in Dorsal's global configuration file (e.g. /home/user/.dorsal/dorsal.toml).

    • For more information on the dorsal auth commands, see the CLI Guide: Authentication

    Using the DORSAL_API_KEY environment variable

    Set the environment variable DORSAL_API_KEY to your API Key.

    The command for setting environment variables varies by operating system and shell.

    • macOS / Linux:

      export DORSAL_API_KEY="YourAPIKey"
      
    • Windows (PowerShell):

      $env:DORSAL_API_KEY="YourAPIKey"
      
    • Windows (Command Prompt):

      set DORSAL_API_KEY=YourAPIKey
      

    Setting the DORSAL_API_KEY environment variable will authenticate Dorsal within the current terminal session.

    Once the environment variable is set, you can confirm you are logged in by running the dorsal auth whoami command:

    dorsal auth whoami
    

    This command prints your current logged-in status:

    Verifying session with DorsalHub...
    
    ╭───────────  Authenticated User ──────────╮
    │                                          │
    │  User ID:        1230321                 │
    │  Name:           yourname                │
    │  Email:          your.email@example.com  │
    │  Account Status: Member                  │
    │                                          │
    ╰──────────────────────────────────────────╯
    

    This confirms that Dorsal is now authenticated.

    Session Only

    Setting an environment variable this way authenticates Dorsal for your current terminal session only. You will need to set it again if you open a new terminal.

    For a persistent login, use dorsal auth login to save the API Key to the global config file.

    API Key Safety

    Your API Key is a secret credential, just like a password. Store it safely and never share it.

    For more information about API Keys, including safety tips, see API Keys.
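
A script that depends on the environment variable can check for it without ever printing the key in full. The sketch below uses only the standard library. The helper name masked_api_key and the masking rule (show the last four characters of keys longer than eight characters) are illustrative choices, not part of Dorsal.

```python
import os
from typing import Mapping, Optional


def masked_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a masked form of DORSAL_API_KEY, or None if it is unset or empty."""
    key = env.get("DORSAL_API_KEY", "").strip()
    if not key:
        return None
    # Reveal the last four characters only when the key is long enough
    # that doing so does not expose most of it.
    visible = key[-4:] if len(key) > 8 else ""
    return "*" * (len(key) - len(visible)) + visible


if __name__ == "__main__":
    masked = masked_api_key()
    if masked is None:
        print("DORSAL_API_KEY is not set in this session")
    else:
        print(f"DORSAL_API_KEY is set: {masked}")
```

This only confirms that the variable is present in the current session. To confirm that DorsalHub accepts the key, run dorsal auth whoami.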

Extract some metadata 🚀

At its heart, Dorsal is a toolkit for creating and managing structured metadata records from your files. It ships with offline metadata extractors for several file types, including PDFs, Office documents, video, and audio files.

The quickest way to get started is right from the command line:

  1. Locate a file you'd like to scan, and copy its path.

  2. Use the dorsal file scan command with the path to that file:

    dorsal file scan "docs/PDFSPEC.pdf"
    

    When the scan completes, you should see something similar to this:

    📄 Scanning metadata for PDFSPEC.pdf
    ╭───────────────────────────────── File Record: PDFSPEC.pdf ─────────────────────────────────╮
    │                                                                                            │
    │    Hashes                                                                                  │
    │       SHA-256:  3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368           │
    │        BLAKE3:  9abdfb32750a278d5ca550b876e94a72cd8eec82d0e506a127dfb94bd56ca4b2           │
    │          TLSH:  T13465D67BB4C61D6DF893CA46571C579B8B0D71533BAEA58604BDAF0AC6338029AC3F41   │
    │                                                                                            │
    │    File Info                                                                               │
    │     Full Path:  /dev/test/docs/PDFSPEC.pdf                                                 │
    │      Modified:  2025-04-09 15:09:05                                                        │
    │          Name:  PDFSPEC.pdf                                                                │
    │          Size:  1 MiB                                                                      │
    │    Media Type:  application/pdf                                                            │
    │                                                                                            │
    │    Tags                                                                                    │
    │        No tags found.                                                                      │
    │                                                                                            │
    │    Pdf Info                                                                                │
    │            author:  Tim Bienz, Richard Cohn, James R. Meehan                               │
    │             title:  Portable Document Format Reference Manual (v 1.2)                      │
    │           creator:  FrameMaker 5.1.1                                                       │
    │          producer:  Acrobat Distiller 3.0 for Power Macintosh                              │
    │           subject:  Description of the PDF file format                                     │
    │          keywords:  Acrobat PDF                                                            │
    │           version:  1.2                                                                    │
    │        page_count:  394                                                                    │
    │     creation_date:  1996-11-12T03:08:43                                                    │
    │     modified_date:  1996-11-12T07:58:15                                                    │
    │                                                                                            │
    │                                                                                            │
    ╰────────────────────────────────────────────────────────────────────────────────────────────╯
    

    This panel shows the core metadata fields for this record.

  3. You can export the record to JSON straight from the CLI by adding the --json flag:

    dorsal file scan "docs/PDFSPEC.pdf" --json
    

    This outputs the JSON to stdout, so you can redirect it to a file or pipe it to other tools:

    dorsal file scan "docs/PDFSPEC.pdf" --json > "example.json"
    

    The JSON output is a fully validated File Record 👇

    Example File Record: PDFSPEC.pdf

    {
      "hash": "3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368",
      "validation_hash": "9abdfb32750a278d5ca550b876e94a72cd8eec82d0e506a127dfb94bd56ca4b2",
      "annotations": {
        "file/base": {
          "record": {
            "hash": "3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368",
            "name": "PDFSPEC.pdf",
            "extension": ".pdf",
            "size": 1512313,
            "media_type": "application/pdf",
            "media_type_prefix": "application"
          },
          "source": {
            "type": "Model",
            "model": "dorsal/base",
            "version": "1.0.0"
          }
        },
        "file/pdf": {
          "record": {
            "author": "Tim Bienz, Richard Cohn, James R. Meehan",
            "title": "Portable Document Format Reference Manual (v 1.2)",
            "creator": "FrameMaker 5.1.1",
            "producer": "Acrobat Distiller 3.0 for Power Macintosh",
            "subject": "Description of the PDF file format",
            "keywords": "Acrobat PDF",
            "version": "1.2",
            "page_count": 394,
            "creation_date": "1996-11-12T03:08:43",
            "modified_date": "1996-11-12T07:58:15"
          },
          "private": true,
          "source": {
            "type": "Model",
            "model": "dorsal/pdf",
            "version": "1.0.0",
            "variant": "pypdfium2"
          }
        }
      },
      "tags": [],
      "source": "disk",
      "local_attributes": {
        "date_modified": "2025-04-09 15:09:05.533199+01:00",
        "date_accessed": "2025-11-28 10:37:08.225267+00:00",
        "date_created": "2025-07-17 11:07:52.875623+01:00",
        "file_path": "/dev/test/docs/PDFSPEC.pdf",
        "file_size_bytes": 1512313,
        "file_permissions_mode": 33279,
        "inode": 3940649675394997,
        "number_of_links": 1
      },
      "local_filesystem": {
        "full_path": "/dev/test/docs/PDFSPEC.pdf",
        "date_created": "2025-07-17T11:07:52.875623+01:00",
        "date_modified": "2025-04-09T15:09:05.533199+01:00"
      }
    }
    
    Notice that the file/pdf key under annotations holds a separate object containing the PDF-specific fields.

    For more information on the dorsal file commands, see the full CLI Guide: Files
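
Once the record is saved as example.json, it can be read with any JSON parser. The sketch below uses only the Python standard library and the field names shown in the example record above. It rests on three assumptions: that the top-level hash is the SHA-256 of the file contents (it matches the SHA-256 shown in the scan panel), that file_permissions_mode is a raw st_mode value as returned by os.stat, and that example.json is UTF-8 encoded. The function names are illustrative and not part of Dorsal.

```python
import hashlib
import json
import stat
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pdf_fields(record: dict) -> dict:
    """Return the PDF-specific fields, or an empty dict if file/pdf is absent."""
    return record.get("annotations", {}).get("file/pdf", {}).get("record", {})


def hash_matches(record: dict, path: Path) -> bool:
    """Compare the record's top-level hash with a freshly computed SHA-256."""
    return record.get("hash") == sha256_of_file(path)


def permission_string(record: dict) -> str:
    """Render file_permissions_mode in ls style (assumes it is a raw st_mode)."""
    mode = record.get("local_attributes", {}).get("file_permissions_mode", 0)
    return stat.filemode(mode)


if __name__ == "__main__":
    record_path = Path("example.json")
    if record_path.exists():
        record = json.loads(record_path.read_text(encoding="utf-8"))
        print("Annotations:", ", ".join(record.get("annotations", {})))
        for name, value in pdf_fields(record).items():
            print(f"{name}: {value}")
        print("Permissions:", permission_string(record))
        source = Path(record.get("local_attributes", {}).get("file_path", ""))
        if source.is_file():
            print("Hash matches file on disk:", hash_matches(record, source))
```

For the example record, the file_permissions_mode value 33279 renders as -rwxrwxrwx under the st_mode assumption.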

Index it to DorsalHub

Note

This step requires API-Key authentication.

  1. Use the dorsal file push command to create and publish a structured metadata record to DorsalHub.

    dorsal file push "docs/PDFSPEC.pdf"
    

    When the push completes, you should see something like:

    📡 Preparing to push metadata for PDFSPEC.pdf as a private record...
    ╭─────────────────────────────  Push Complete ──────────────────────────────────╮
    │ The file record was successfully pushed to DorsalHub.                         │
    │                                                                               │
    │ SHA256 Hash: 3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368 │
    ╰───────────────────────────────────────────────────────────────────────────────╯
    

    DorsalHub is Private by Default

    When you run dorsal file push "docs/PDFSPEC.pdf", you are telling the server to create a private record about that file.

    Private records are only visible to you.

    To make a public record, add the --public flag to the command:

    dorsal file push "docs/PDFSPEC.pdf" --public
    
  2. View it Online ✨

    Head over to your DorsalHub Dashboard to see the newly indexed file and its extracted metadata.
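
To push several files in one go, the dorsal file push command can be wrapped in a short script. The sketch below calls the CLI through subprocess and uses only the command and the --public flag shown above. It assumes the dorsal executable is on your PATH and that it exits with a non-zero status when a push fails, which this guide does not confirm. The function names push_command and push_all are illustrative and not part of Dorsal.

```python
import shutil
import subprocess
from typing import List, Sequence


def push_command(path: str, public: bool = False) -> List[str]:
    """Build the argument list for `dorsal file push`, optionally with --public."""
    command = ["dorsal", "file", "push", path]
    if public:
        command.append("--public")
    return command


def push_all(paths: Sequence[str], public: bool = False) -> List[str]:
    """Push each path in turn and return the paths whose push exited non-zero."""
    if shutil.which("dorsal") is None:
        raise RuntimeError("dorsal CLI not found on PATH")
    failed = []
    for path in paths:
        # Assumption: a non-zero exit status signals a failed push.
        result = subprocess.run(push_command(path, public))
        if result.returncode != 0:
            failed.append(path)
    return failed


if __name__ == "__main__":
    print(" ".join(push_command("docs/PDFSPEC.pdf")))
```

Because public defaults to False, the wrapper keeps the private-by-default behaviour described above. Pass public=True only for records you intend to publish.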


Next Steps

You've indexed your first file! Here's where to go next:

  • ⌨️ Learn the CLI


    Learn how to manage files, add tags, create collections, and more, directly from your terminal.

    See the CLI Guide

  • 🐍 Learn the Python API


    Integrate Dorsal into your applications for custom metadata workflows, analysis, and automation.

    See the Python Guide

  • 🖥️ Explore DorsalHub


    Get oriented with the DorsalHub website. View and organize your indexed files from your dashboard.

    Go to the main website

  • 🧑‍💻 I want to contribute...


    Dorsal is open source and provided under the Apache 2.0 license. Report an issue or suggest new features on our GitHub repository.

    View Source on GitHub