Working with Remote File Records

This chapter covers working with File Records which exist on DorsalHub.

We will cover:

  • Retrieving and viewing file records
  • Adding and removing tags
  • Deleting file records

Authentication

This guide requires API-Key authentication with DorsalHub.

See the Quick Start Guide for details on how to authenticate Dorsal.


The DorsalFile Class

When working with file records stored on DorsalHub, most tasks can be achieved using the DorsalFile class.

DorsalFile instances represent file records on DorsalHub, and contain core metadata as well as tags and annotations.

To use DorsalFile, import the class and create an instance by pointing it to a valid hash for a file record on DorsalHub:

Creating a DorsalFile instance
from dorsal import DorsalFile

# Triggers an API request
df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")
  • On Success: The API returns a File Record (specifically a FileRecordDateTime object). This is stored as the model attribute of the DorsalFile instance.

  • On Failure: If the hash is not found, a DorsalClientError is raised.
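
Because a failed lookup costs an API request, it can be worth checking the shape of a hash before constructing a DorsalFile. The sketch below is a client-side pre-check only and is not part of the Dorsal library: the helper name looks_like_sha256 is hypothetical, and whether DorsalHub accepts uppercase hex is an assumption.

```python
import re

# A SHA-256 digest is 32 bytes, which is 64 hexadecimal characters.
_SHA256_HEX = re.compile(r"[0-9a-fA-F]{64}")


def looks_like_sha256(value: object) -> bool:
    """Return True if `value` is a string of exactly 64 hex characters.

    Hypothetical helper: it only checks the format, not whether a
    file record with this hash actually exists on DorsalHub.
    """
    return isinstance(value, str) and _SHA256_HEX.fullmatch(value) is not None


candidate = "137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b"
is_valid = looks_like_sha256(candidate)
print(is_valid)
```

A well-formed hash can still be missing from DorsalHub, so the DorsalClientError case described above still needs handling.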

Cool! But how do I get the file hash?

The SHA-256 hash of a file is its primary key on DorsalHub.

There are a few ways to get it:

  • If you have the file, you can run dorsal file scan or dorsal file hash in the CLI, or use LocalFile to read it. The file's hash is a top-level field.

  • If you have ever indexed the file to DorsalHub, you can run dorsal search in the CLI with the name of the file. You can also search directly on DorsalHub.

  • If you don't have access to the file and you've never indexed it, you can try the Global Search on DorsalHub, which searches publicly indexed files (note: this is a premium feature).
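
If you have the file but not the Dorsal CLI to hand, the same SHA-256 value can be computed with Python's standard library. This is a minimal sketch using hashlib rather than Dorsal's own tooling; the function name sha256_of_file and the chunk size are illustrative choices.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1024 * 1024) -> str:
    """Return the lowercase hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never held in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is 64 hexadecimal characters, the same format as the hash passed to DorsalFile in the examples above.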

Accessing Metadata

Base Metadata

Just like LocalFile in the previous chapter, the DorsalFile instance exposes some base metadata as top-level attributes:

  • hash: The file's SHA-256 hash e.g. "137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b"
  • name: The file's name e.g. "Things fall apart.epub"
  • extension: The file's extension e.g. ".epub"
  • size: The file's size in bytes e.g. 174908
  • size_text: The file's size in human-readable text e.g. "171 KiB"
  • media_type: The media type e.g. "application/epub+zip"

You can access these fields as attributes on the DorsalFile instance:

Accessing attributes on a DorsalFile instance
from dorsal import DorsalFile

df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")

# Accessing base file properties
print(f"File Name: {df.name}")
print(f"File Size: {df.size_text}")
print(f"Media Type: {df.media_type}")
print(f"SHA-256: {df.hash}")

Output:

File Name: Things fall apart.epub
File Size: 171 KiB
Media Type: application/epub+zip
SHA-256: 137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b
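
The size and size_text values above are related by binary (1024-based) units: 174908 bytes is roughly 170.8 KiB, which rounds to "171 KiB". The sketch below reproduces that conversion for illustration. The function format_size is hypothetical, and Dorsal's actual rounding rules may differ.

```python
def format_size(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KiB = 1024 bytes).

    Illustrative only: rounds to the nearest whole unit.
    """
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024
    return f"{value:.0f} {unit}"


print(format_size(174908))  # 174908 / 1024 is about 170.8, so this prints "171 KiB"
```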

Filetype Metadata

As with the LocalFile class in the previous chapter, filetype-specific metadata (originally extracted using one of the Core Annotation Models) can be found by accessing its entry in the annotations object directly, or by using a named top-level attribute:

Show Core Ebook Annotation Record
from dorsal import DorsalFile

df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")

print(df.ebook.model_dump_json(indent=2))
Example Ebook File Record
{
  "hash": "137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b",
  "annotations": {
    "file/base": {
      "record": {
        "hash": "137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b",
        "name": "Things fall apart.epub",
        "extension": ".epub",
        "size": 174908,
        "media_type": "application/epub+zip",
        "media_type_prefix": "application"
      },
      "source": {
        "type": "Model",
        "model": "file/base",
        "version": "1.0.0"
      }
    },
    "file/ebook": {
      "record": {
        "title": "Things fall apart",
        "authors": [
          "Chinua Achebe"
        ],
        "contributors": [],
        "publisher": "Penguin",
        "subjects": [
          "Fiction"
        ],
        "description": "SUMMARY:\nTHINGS FALL APART tells two overlapping, intertwining stories, both of which center around Okonkwo, a \"strong man\" of an Ibo village in Nigeria. The first of these stories traces Okonkwo's fall from grace with the tribal world in which he lives, and in its classical purity of line and economical beauty it provides us with a powerful fable about the immemorial conflict between the individual and society. The second story, which is as modern as the first is ancient, and which elevates the book to a tragic plane, concerns the clash of cultures and the destruction of Okonkwo's world through the arrival of aggressive, proselytizing European missionaries. These twin dramas are perfectly harmonized, and they are modulated by an awareness capable of encompassing at once the life of nature, human history, and the mysterious compulsions of the soul. THINGS FALL APART is the most illuminating and permanent monument we have to the modern African experience as seen from within.",
        "language": "English",
        "language_code": "eng",
        "locale_code": "en",
        "isbn": "9780141186887",
        "other_identifiers": [
          "1937a3ba-99b1-47bf-8225-efeafca7136f"
        ],
        "tools": [
          "calibre (0.7.12) [http://calibre-ebook.com]"
        ],
        "cover_path": "cover.jpeg",
        "publication_date": "2001-08-14T23:00:00Z"
      },
      "private": true,
      "source": {
        "type": "Model",
        "model": "dorsal/ebook",
        "version": "1.0.0"
      }
    }
  },
  "tags": [
    {
      "id": "6929f7bfeba560c999a562c5",
      "name": "language",
      "value": "English",
      "value_code": "eng",
      "private": false,
      "hidden": false,
      "upvotes": 1,
      "downvotes": 0,
      "origin": "DorsalHub"
    }
  ],
  "date_created": "2025-11-28T19:24:27.248000Z",
  "date_modified": "2025-11-28T19:24:27.248000Z"
}

In addition to core metadata, all tags and annotations that exist for the file record on DorsalHub are available on the DorsalFile instance.

Accessing Tags

Just like the LocalFile in the previous chapter, tags are available on the DorsalFile as the tags attribute, which is an array of FileTag objects:

Display the first tag
from dorsal import DorsalFile

df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")

if df.tags:
    tag = df.tags[0]  # Access the first tag in the File Record
    print("First tag:")
    print(tag.model_dump_json(indent=2))  # Display the tag as JSON

Output:

First tag:
{
  "id": "6929f7bfeba560c999a562c5",
  "name": "language",
  "value": "English",
  "value_code": "eng",
  "private": true,
  "hidden": false,
  "upvotes": 1,
  "downvotes": 0,
  "origin": "DorsalHub"
}
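
Since tags is a plain array, picking out tags by name is ordinary list filtering. The sketch below assumes only that each tag exposes name and value attributes, as in the JSON above. The helper find_tags is hypothetical, and the sample objects are stand-ins so the example runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional


def find_tags(tags: Optional[Iterable[Any]], name: str) -> List[Any]:
    """Return every tag whose `name` matches, in the original order."""
    return [tag for tag in (tags or []) if tag.name == name]


# Stand-in objects shaped like FileTag (illustration only, not real FileTag instances).
sample_tags = [
    SimpleNamespace(id="6929f7bfeba560c999a562c5", name="language", value="English"),
    SimpleNamespace(id="690e4551289584473575f9b1", name="current_page", value=44),
]

language_values = [tag.value for tag in find_tags(sample_tags, "language")]
print(language_values)
```

With a real instance, the equivalent call would be find_tags(df.tags, "language").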

Accessing User-Contributed Annotations

  • On DorsalFile instances, only the core annotations are populated on the annotations attribute.
  • User-contributed annotations are available as Annotation Stubs on the annotation_stubs attribute.
  • We can retrieve these Annotation Stubs by calling the DorsalFile.get_annotation method.

Annotation Stubs

A single File Record on DorsalHub may have hundreds of annotations added by multiple different users. To download them all would be inefficient.

A Stub is a lightweight Python object containing only summary metadata (the id, source, user_id, and date_modified), as well as a download method to retrieve the annotation content.

DorsalFile.annotation_stubs is a dictionary that maps each schema ID to a list of stubs for that schema.

Let's inspect our DorsalFile instance and count the non-core annotations:

Count Non-Core Annotations
from dorsal import DorsalFile

df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")

# For each annotation stub array, print the schema ID and the count
for schema_id, stubs in df.annotation_stubs.items():
    print(f"'{schema_id}' annotation count: {len(stubs)}")

Which could output something like:

'open/classification' annotation count: 1
'open/document-extraction' annotation count: 2
'open/generic' annotation count: 6
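
The same dictionary shape makes it easy to rank schemas by how many annotations they hold. This sketch works on any mapping of schema ID to a list, so it uses placeholder objects instead of real stubs. The helper name count_stubs is hypothetical.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def count_stubs(annotation_stubs: Mapping[str, Sequence[object]]) -> Dict[str, int]:
    """Map each schema ID to the number of stubs listed under it."""
    return {schema_id: len(stubs) for schema_id, stubs in annotation_stubs.items()}


# Placeholder data mirroring the counts shown above (not real Annotation Stubs).
sample_stubs = {
    "open/classification": [object()],
    "open/document-extraction": [object(), object()],
    "open/generic": [object() for _ in range(6)],
}

counts = count_stubs(sample_stubs)
total = sum(counts.values())
# Schemas ordered from most to fewest annotations.
ranked: List[Tuple[str, int]] = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print(total, ranked[0])
```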

Let's assume we are interested in the open/classification annotation.

First, let's look at the summary information for it.

Inspect a Non-Core Annotation
from dorsal import DorsalFile

df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")

annotations = df.get_annotation("open/classification")
if annotations:
    target = annotations[0]
    print(target.info)

This will output a Python dictionary, containing fields such as "source":

{
    'id': UUID('0b2c8d3a-cf1d-4753-aff9-a2afc8d4fcdb'),
    'source': {'type': 'Manual', 'detail': 'MyLanguageClassifier v1.2'},
    'user_id': 1000001,
    'date_modified': datetime.datetime(2025, 11, 5, 16, 4, 19, 278000, tzinfo=TzInfo(0)),
    'url': '/files/3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368/annotations/0b2c8d3a-cf1d-4753-aff9-a2afc8d4fcdb'
}
  • id: The globally unique identifier for the annotation. It forms part of the download URL.
  • source: Information about where the annotation came from.
  • user_id: The identifier for the user who added the annotation. This is visible for all public annotations.
  • date_modified: The time the annotation was added or last modified.
  • url: The download URL for the annotation.
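
When a schema has several stubs, the summary fields are enough to choose one without downloading any content. The sketch below picks the most recently modified stub. It assumes only that each stub has an info dictionary with a date_modified datetime, as shown above. The helper most_recent_stub is hypothetical, and the sample stubs are stand-ins.

```python
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import Any, Optional, Sequence


def most_recent_stub(stubs: Optional[Sequence[Any]]) -> Optional[Any]:
    """Return the stub with the latest `date_modified`, or None if there are none."""
    if not stubs:
        return None
    return max(stubs, key=lambda stub: stub.info["date_modified"])


# Stand-in stubs (illustration only); real stubs would come from df.get_annotation(...).
older = SimpleNamespace(
    info={"date_modified": datetime(2025, 11, 5, 16, 4, 19, tzinfo=timezone.utc)}
)
newer = SimpleNamespace(
    info={"date_modified": datetime(2025, 11, 28, 19, 24, 27, tzinfo=timezone.utc)}
)

latest = most_recent_stub([older, newer])
print(latest.info["date_modified"].isoformat())
```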

The annotation stub object also includes a download helper method so we can retrieve the full content of the annotation.

Download a Non-Core Annotation
from dorsal import DorsalFile

df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")

annotations = df.get_annotation("open/classification")
if annotations:
    target = annotations[0]
    full_annotation = target.download()
    print(full_annotation.model_dump_json(indent=2))

Here's what a full annotation looks like:

{
  "record": {
    "labels": [
      {
        "label": "eng",
        "score": 0.95
      }
    ],
    "vocabulary": [
      "eng",
      "fra",
      "deu"
    ]
  },
  "private": true,
  "source": {
    "type": "Manual",
    "detail": "207c778fbb4ddcf11aea662b"
  },
  "schema_version": "1.0.0"
}
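
Once downloaded, a classification annotation is ordinary structured data. As a sketch, the helper below pulls the highest-scoring label out of a dictionary shaped like the JSON above. The name top_label is hypothetical, and it is an assumption that labels may carry a numeric score; a missing score is treated as 0.0 here.

```python
import json
from typing import Optional, Tuple


def top_label(annotation: dict) -> Optional[Tuple[str, Optional[float]]]:
    """Return (label, score) for the highest-scoring label, or None if there are no labels."""
    labels = annotation.get("record", {}).get("labels", [])
    if not labels:
        return None
    best = max(labels, key=lambda item: item.get("score", 0.0))
    return best["label"], best.get("score")


# Sample data copied from the annotation shown above.
sample_annotation = json.loads(
    """
    {
      "record": {
        "labels": [{"label": "eng", "score": 0.95}],
        "vocabulary": ["eng", "fra", "deu"]
      }
    }
    """
)

best_label = top_label(sample_annotation)
print(best_label)
```

With a real annotation, json.loads(full_annotation.model_dump_json()) should yield a dictionary of this shape.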

Managing Tags

The DorsalFile class has some useful methods for managing remote tags.

Note: all of these update operations modify the File Record on DorsalHub.

Tagging Permissions

  • You can tag and annotate files you have indexed.
  • If you have never indexed a file, you will not be able to tag or annotate it.
  • You can only delete your own tags.

Adding Tags

  • The example below shows how to tag a file record via a DorsalFile instance:

    Tag a Remote File Record
    from dorsal import DorsalFile
    
    df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")
    
    df.add_private_tag(
        name="current_page", value=44
    )
    
  • Now let's view the tag we just added:

    View a Remote File Record Tag
    from dorsal import DorsalFile
    
    df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")
    
    for tag in df.tags:
        if tag.name == "current_page":
            print(tag.model_dump_json(indent=2))
            break
    
  • This prints the following JSON representation of a FileTag:

    {
      "id": "690e4551289584473575f9b1",
      "name": "current_page",
      "value": 44,
      "value_code": null,
      "private": true,
      "hidden": false,
      "upvotes": 1,
      "downvotes": 0,
      "origin": "DorsalHub"
    }
    
  • For a full guide to the tag fields, see the Tagging System article.
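
If a script may run more than once, it can check for an existing tag before adding another. The sketch below uses only the tags attribute and the add_private_tag(name=..., value=...) call shown above. The helper ensure_private_tag and the StandInFile class are hypothetical, and the stand-in exists only so the example runs offline. Whether DorsalHub itself rejects duplicate tags, and whether tags refreshes after an add, are not stated here, so treat both as open questions.

```python
from types import SimpleNamespace
from typing import Any


def ensure_private_tag(df: Any, name: str, value: Any) -> bool:
    """Add a private tag unless the same private name/value pair already exists.

    Returns True when a tag was added and False when it was skipped.
    """
    for tag in (df.tags or []):
        if tag.name == name and tag.value == value and tag.private:
            return False
    df.add_private_tag(name=name, value=value)
    return True


class StandInFile:
    """Offline stand-in for DorsalFile (illustration only, not the real class)."""

    def __init__(self) -> None:
        self.tags = []

    def add_private_tag(self, name: str, value: Any) -> None:
        self.tags.append(SimpleNamespace(name=name, value=value, private=True))


record = StandInFile()
first = ensure_private_tag(record, "current_page", 44)   # adds the tag
second = ensure_private_tag(record, "current_page", 44)  # skipped as a duplicate
print(first, second, len(record.tags))
```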

Deleting Tags

  • You can only delete your own tags.

  • You delete tags by their ID.

  • In the above example, the tag ID is "690e4551289584473575f9b1"

  • The example below shows how to delete a file tag via the DorsalFile instance:

    Delete a Remote File Record Tag
    from dorsal import DorsalFile
    
    df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")
    
    tag_id = "690e4551289584473575f9b1" 
    
    df.delete_tag(tag_id=tag_id)
    
  • If this is successful, there won't be any output, but the df instance will quietly refresh itself after the deletion is completed. You will immediately see that the tag you targeted is no longer in the tags array attribute.

  • If the operation fails for any reason (e.g. the tag ID is incorrect, or you are not the tag owner) you will get a NotFoundError:

    dorsal.common.exceptions.NotFoundError: Resource not found: Tag with id '690e4551289584473575f9b1' not found.
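
Because deletion works by ID, removing tags by name means looking up the IDs first. The sketch below does that using only the tags attribute and delete_tag(tag_id=...) from the examples above. The helper delete_tags_named and the StandInFile class are hypothetical, and the stand-in only mimics the refresh behaviour so the example runs offline.

```python
from types import SimpleNamespace
from typing import Any, List


def delete_tags_named(df: Any, name: str) -> List[str]:
    """Delete every tag called `name` and return the IDs that were deleted."""
    # Snapshot the IDs first: the instance refreshes after each deletion,
    # so df.tags may change while we are working through the list.
    tag_ids = [tag.id for tag in (df.tags or []) if tag.name == name]
    for tag_id in tag_ids:
        df.delete_tag(tag_id=tag_id)
    return tag_ids


class StandInFile:
    """Offline stand-in for DorsalFile (illustration only, not the real class)."""

    def __init__(self, tags: List[Any]) -> None:
        self.tags = list(tags)

    def delete_tag(self, tag_id: str) -> None:
        self.tags = [tag for tag in self.tags if tag.id != tag_id]


record = StandInFile(
    [
        SimpleNamespace(id="6929f7bfeba560c999a562c5", name="language"),
        SimpleNamespace(id="690e4551289584473575f9b1", name="current_page"),
    ]
)
deleted = delete_tags_named(record, "current_page")
remaining = [tag.name for tag in record.tags]
print(deleted, remaining)
```

This sketch does not catch NotFoundError, so a tag owned by another user would stop the loop at that point.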
    

➡️ Continue to 3. Automation & Analysis