Working with Remote File Records
This chapter covers working with File Records that exist on DorsalHub.
We will cover:
- Retrieving and viewing file records
- Adding and removing tags
- Deleting file records
Authentication
This guide requires API-Key authentication with DorsalHub.
See the Quick Start Guide for details on how to authenticate Dorsal.
The DorsalFile Class
When working with file records stored on DorsalHub, most tasks can be achieved using the DorsalFile class.
DorsalFile instances represent file records on DorsalHub, and contain core metadata as well as tags and annotations.
To use DorsalFile, import the class and create an instance by pointing it to a valid hash for a file record on DorsalHub:
from dorsal import DorsalFile
# Triggers an API request
df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")
- On Success: The API returns a File Record (specifically a FileRecordDateTime object). This is stored as the model attribute of the DorsalFile instance.
- On Failure: If the hash is not found, a DorsalClientError is raised.
Cool! But how do I get the file hash?
The SHA-256 hash of a file is its primary key on DorsalHub.
There are a few ways to get it:
- If you have the file, you can run dorsal file scan or dorsal file hash in the CLI, or use LocalFile to read it. The file's hash is a top-level field.
- If you have ever indexed the file to DorsalHub, you can run dorsal search in the CLI with the name of the file. You can also search directly on DorsalHub.
- If you don't have access to the file, and you've never indexed it, you can try the Global Search on DorsalHub, which you can use to search through publicly indexed files. (Note: this is a premium feature.)
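If you have the file locally but would rather not use the CLI, the hash can also be computed with Python's standard library. This is a minimal sketch, assuming DorsalHub's key is the plain lowercase hex SHA-256 digest of the file's bytes (as the example hashes in this chapter suggest). The helper name sha256_of_file is our own and is not part of Dorsal.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the lowercase hex SHA-256 digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is 64 hexadecimal characters long, which is the same shape as the hash passed to DorsalFile above.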
Accessing Metadata
Base Metadata
Just like LocalFile in the previous chapter, the DorsalFile instance exposes some base metadata as top-level attributes:
- hash: The file's SHA-256 hash, e.g. "137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b"
- name: The file's name, e.g. "Things fall apart.epub"
- extension: The file's extension, e.g. ".epub"
- size: The file's size in bytes, e.g. 174908
- size_text: The file's size in human-readable text, e.g. "171 KiB"
- media_type: The media type, e.g. "application/epub+zip"
You can access these fields as attributes on the DorsalFile instance:
from dorsal import DorsalFile
df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")
# Accessing base file properties
print(f"File Name: {df.name}")
print(f"File Size: {df.size_text}")
print(f"Media Type: {df.media_type}")
print(f"SHA-256: {df.hash}")
Output:
File Name: Things fall apart.epub
File Size: 171 KiB
Media Type: application/epub+zip
SHA-256: 137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b
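To make the relationship between size and size_text concrete, here is a rough sketch of how a byte count can be turned into binary-unit text. The function human_readable_size is hypothetical and is not Dorsal's implementation; the rounding rule (nearest whole unit) is an assumption chosen because it reproduces the example value above.

```python
def human_readable_size(size_bytes: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    value = float(size_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if value < 1024:
            # Round to the nearest whole unit (an assumed convention).
            return f"{value:.0f} {unit}"
        value /= 1024
    return f"{value:.0f} TiB"


print(human_readable_size(174908))  # prints "171 KiB"
```

Use the size attribute when you need exact arithmetic, and size_text only for display.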
Filetype Metadata
As with the LocalFile class in the previous chapter, filetype-specific metadata (originally extracted using one of the Core Annotation Models) can be found by accessing its entry in the annotations object directly, or by using a named top-level attribute:
from dorsal import DorsalFile
df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")
print(df.ebook.model_dump_json(indent=2))
Example Ebook File Record
{
  "hash": "137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b",
  "annotations": {
    "file/base": {
      "record": {
        "hash": "137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b",
        "name": "Things fall apart.epub",
        "extension": ".epub",
        "size": 174908,
        "media_type": "application/epub+zip",
        "media_type_prefix": "application"
      },
      "source": {
        "type": "Model",
        "model": "file/base",
        "version": "1.0.0"
      }
    },
    "file/ebook": {
      "record": {
        "title": "Things fall apart",
        "authors": [
          "Chinua Achebe"
        ],
        "contributors": [],
        "publisher": "Penguin",
        "subjects": [
          "Fiction"
        ],
        "description": "SUMMARY:\nTHINGS FALL APART tells two overlapping, intertwining stories, both of which center around Okonkwo, a \"strong man\" of an Ibo village in Nigeria. The first of these stories traces Okonkwo's fall from grace with the tribal world in which he lives, and in its classical purity of line and economical beauty it provides us with a powerful fable about the immemorial conflict between the individual and society. The second story, which is as modern as the first is ancient, and which elevates the book to a tragic plane, concerns the clash of cultures and the destruction of Okonkwo's world through the arrival of aggressive, proselytizing European missionaries. These twin dramas are perfectly harmonized, and they are modulated by an awareness capable of encompassing at once the life of nature, human history, and the mysterious compulsions of the soul. THINGS FALL APART is the most illuminating and permanent monument we have to the modern African experience as seen from within.",
        "language": "English",
        "language_code": "eng",
        "locale_code": "en",
        "isbn": "9780141186887",
        "other_identifiers": [
          "1937a3ba-99b1-47bf-8225-efeafca7136f"
        ],
        "tools": [
          "calibre (0.7.12) [http://calibre-ebook.com]"
        ],
        "cover_path": "cover.jpeg",
        "publication_date": "2001-08-14T23:00:00Z"
      },
      "private": true,
      "source": {
        "type": "Model",
        "model": "dorsal/ebook",
        "version": "1.0.0"
      }
    }
  },
  "tags": [
    {
      "id": "6929f7bfeba560c999a562c5",
      "name": "language",
      "value": "English",
      "value_code": "eng",
      "private": false,
      "hidden": false,
      "upvotes": 1,
      "downvotes": 0,
      "origin": "DorsalHub"
    }
  ],
  "date_created": "2025-11-28T19:24:27.248000Z",
  "date_modified": "2025-11-28T19:24:27.248000Z"
}
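If you have a record like this as JSON text, the nesting can be navigated with the standard json module. The sketch below uses a trimmed copy of the example record (most fields removed for brevity) and assumes the layout shown above: annotations keyed by schema ID, each wrapping its payload in a "record" object, and tags as a flat list.

```python
import json

# A trimmed copy of the example record; only a few fields are kept.
raw_record = """
{
  "hash": "137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b",
  "annotations": {
    "file/base": {"record": {"name": "Things fall apart.epub", "size": 174908}},
    "file/ebook": {"record": {"title": "Things fall apart", "authors": ["Chinua Achebe"]}}
  },
  "tags": [{"name": "language", "value": "English", "private": false}]
}
"""

record = json.loads(raw_record)

# Annotations are keyed by schema ID; each entry wraps its payload in "record".
ebook = record["annotations"]["file/ebook"]["record"]
title = ebook["title"]
authors = ", ".join(ebook["authors"])

# Tags are a flat list, so a comprehension is enough to index them by name.
tags_by_name = {tag["name"]: tag["value"] for tag in record["tags"]}

print(f"{title} by {authors}")
print(tags_by_name)
```

In day-to-day use the named attributes on DorsalFile are more convenient; working with the raw JSON is mainly useful when records are stored or passed between systems.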
As well as core metadata, all annotations and tags that exist for the file record on DorsalHub are also available on the DorsalFile instance.
Accessing Tags
Just like the LocalFile in the previous chapter, tags are available on the DorsalFile as the tags attribute, which is an array of FileTag objects:
from dorsal import DorsalFile
df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")
if df.tags:
    tag = df.tags[0]  # Access the first tag in the File Record
    print("First tag:")
    print(tag.model_dump_json(indent=2))  # Display the tag as JSON
Output:
First tag:
{
  "id": "6929f7bfeba560c999a562c5",
  "name": "language",
  "value": "English",
  "value_code": "eng",
  "private": true,
  "hidden": false,
  "upvotes": 1,
  "downvotes": 0,
  "origin": "DorsalHub"
}
Accessing User-Contributed Annotations
- On DorsalFile instances, only the core annotations are populated on the annotations attribute.
- User-contributed annotations are available as Annotation Stubs on the annotation_stubs attribute.
- We can retrieve these Annotation Stubs by calling the DorsalFile.get_annotation method.
Annotation Stubs
A single File Record on DorsalHub may have hundreds of annotations added by multiple different users. To download them all would be inefficient.
A Stub is a lightweight Python object containing only summary metadata (the id, source, user_id, and date_modified), as well as a download method to retrieve the annotation content.
DorsalFile.annotation_stubs is a dictionary:
- Its keys are Validation Schema IDs
- Its values are arrays of FileAnnotationStub objects
Let's inspect our DorsalFile instance and count the non-core annotations:
from dorsal import DorsalFile
df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")
# For each annotation stub array, print the schema ID and the count
for schema_id, stubs in df.annotation_stubs.items():
    print(f"'{schema_id}' annotation count: {len(stubs)}")
Which could output something like:
'open/classification' annotation count: 1
'open/document-extraction' annotation count: 2
'open/generic' annotation count: 6
Let's assume we are interested in the open/classification annotation.
First, let's look at the summary information for it.
from dorsal import DorsalFile
df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")
annotations = df.get_annotation("open/classification")
if annotations:
    target = annotations[0]
    print(target.info)
This will output a Python dictionary containing fields such as "source":
{
    'id': UUID('0b2c8d3a-cf1d-4753-aff9-a2afc8d4fcdb'),
    'source': {'type': 'Manual', 'detail': 'MyLanguageClassifier v1.2'},
    'user_id': 1000001,
    'date_modified': datetime.datetime(2025, 11, 5, 16, 4, 19, 278000, tzinfo=TzInfo(0)),
    'url': '/files/3383fb2ab568ca7019834d438f9a14b9d2ccaa2f37f319373848350005779368/annotations/0b2c8d3a-cf1d-4753-aff9-a2afc8d4fcdb'
}
- id: The globally unique identifier for the annotation. Forms part of the download URL.
- source: Information about where the annotation came from.
- user_id: The identifier for the user who added the annotation. This is visible for all public annotations.
- date_modified: The time the annotation was added.
- url: The download URL for the annotation.
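When a schema has several annotations, these summary fields are enough to decide which one to download. The sketch below works on plain dictionaries shaped like the info output above rather than on real stub objects, so it runs without a DorsalHub connection. The second entry and both helper functions (newest, from_user) are invented for illustration.

```python
from datetime import datetime, timezone
from uuid import UUID

# Stand-in data: plain dictionaries shaped like the `info` output.
# The second entry is invented for illustration only.
stub_infos = [
    {
        "id": UUID("0b2c8d3a-cf1d-4753-aff9-a2afc8d4fcdb"),
        "source": {"type": "Manual", "detail": "MyLanguageClassifier v1.2"},
        "user_id": 1000001,
        "date_modified": datetime(2025, 11, 5, 16, 4, 19, 278000, tzinfo=timezone.utc),
    },
    {
        "id": UUID("11111111-2222-3333-4444-555555555555"),
        "source": {"type": "Manual", "detail": "HypotheticalClassifier v0.1"},
        "user_id": 1000002,
        "date_modified": datetime(2025, 10, 1, 9, 0, 0, tzinfo=timezone.utc),
    },
]


def newest(infos):
    """Return the summary with the latest date_modified, or None if empty."""
    return max(infos, key=lambda info: info["date_modified"], default=None)


def from_user(infos, user_id):
    """Keep only the summaries contributed by the given user."""
    return [info for info in infos if info["user_id"] == user_id]


latest = newest(stub_infos)
print(latest["source"]["detail"])
print(len(from_user(stub_infos, 1000002)))
```

Filtering on the summaries first means only the annotations you actually need are fetched in full.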
The annotation stub object also includes a download helper method so we can retrieve the full content of the annotation.
from dorsal import DorsalFile
df = DorsalFile("137cb58b6e9afd606fd3b0a953aeba6caf0b0f1dc3998e1b889216faa26c037b")
annotations = df.get_annotation("open/classification")
if annotations:
    target = annotations[0]
    full_annotation = target.download()
    print(full_annotation.model_dump_json(indent=2))
Here's what a full annotation looks like:
{
  "record": {
    "labels": [
      {
        "label": "eng",
        "score": 0.95
      }
    ],
    "vocabulary": [
      "eng",
      "fra",
      "deu"
    ]
  },
  "private": true,
  "source": {
    "type": "Manual",
    "detail": "207c778fbb4ddcf11aea662b"
  },
  "schema_version": "1.0.0"
}
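Once an annotation has been downloaded, its record can be processed like any other data. As a rough sketch, the snippet below picks the highest-scoring label from a dictionary shaped like the "record" object above. The top_label helper and its min_score threshold are our own assumptions and are not part of Dorsal or of the open/classification schema.

```python
from typing import Optional

# The "record" portion of the example annotation, as a plain dictionary.
annotation_record = {
    "labels": [{"label": "eng", "score": 0.95}],
    "vocabulary": ["eng", "fra", "deu"],
}


def top_label(record: dict, min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label at or above min_score, else None."""
    candidates = [item for item in record["labels"] if item["score"] >= min_score]
    if not candidates:
        return None
    best = max(candidates, key=lambda item: item["score"])
    return best["label"]


print(top_label(annotation_record))  # prints "eng"
```

Returning None when nothing clears the threshold keeps low-confidence classifications from being treated as definite answers.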
Managing Tags
The DorsalFile class has some useful methods for managing remote tags:
- DorsalFile.add_private_tag: add a new private tag to the File Record on DorsalHub.
- DorsalFile.add_public_tag: add a new public tag to the File Record on DorsalHub.
- DorsalFile.delete_tag: delete a tag from the File Record on DorsalHub.
All of DorsalFile's update operations modify the File Record on DorsalHub.
Notes:
Tagging Permissions
- You can tag and annotate files you have indexed.
- If you have never indexed a file, you will not be able to tag or annotate it.
- You can only delete your own tags.
Adding Tags
- The example below shows how to tag a file record via a DorsalFile instance:
- Now let's view the tag we just added:
- This prints the following JSON representation of a FileTag:
- For a full guide to the tag fields, see the Tagging System article.
Deleting Tags
- You can only delete your own tags.
- You delete tags by their ID.
- In the above example, the tag ID is "690e4551289584473575f9b1".
- The example below shows how to delete a file tag via the DorsalFile instance:
- If this is successful, there won't be any output, but the df instance will quietly refresh itself after the deletion is completed. You will immediately see that the tag you targeted is no longer in the tags array attribute.
- If the operation fails for any reason (e.g. the tag ID is incorrect, or you are not the tag owner) you will get a NotFoundError.
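Because deletion is keyed on the tag ID, it helps to look the ID up from the tag's name and value rather than copying it by hand. The sketch below does this over plain dictionaries shaped like the tag JSON shown earlier in this chapter. The find_tag_ids helper is hypothetical, and the name and value attached to the second tag are invented, since the source only gives its ID.

```python
# Stand-in tag data shaped like the FileTag JSON. The second tag's name and
# value are invented for illustration; only its ID appears in this chapter.
tags = [
    {"id": "6929f7bfeba560c999a562c5", "name": "language", "value": "English"},
    {"id": "690e4551289584473575f9b1", "name": "hypothetical_label", "value": "draft"},
]


def find_tag_ids(tag_list, name, value=None):
    """Return the IDs of tags matching name and, if given, value."""
    return [
        tag["id"]
        for tag in tag_list
        if tag["name"] == name and (value is None or tag["value"] == value)
    ]


print(find_tag_ids(tags, "hypothetical_label"))
```

Each returned ID is what you would then hand to DorsalFile.delete_tag. An empty list means there is nothing to delete, which lets you skip the request instead of triggering a NotFoundError.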