Annotations
An Annotation is a metadata sub-record, which forms part of the overall File Record.
Unlike a Tag (which is a single key-value label), an Annotation is a structured record, typically containing multiple pieces of information about the file.
All Annotations have a predictable structure, conforming to a named schema.
Because Annotations are strictly validated, they are easy to use in automated scripts and workflows.
Annotation Structure
Every annotation includes three top-level fields: file_hash, record, and source.
Example of an open/classification annotation:
{
  "file_hash": "abcd1234efab5678abcd1234efab5678abcd1234efab5678abcd1234efab5678",
  "record": {
    "labels": [
      { "label": "legal-contract", "score": 0.98 }
    ],
    "vocabulary": ["invoice", "legal-contract", "memo"]
  },
  "source": {
    "type": "Model",
    "model": "DocTypeClassifier",
    "version": "1.0.0",
    "variant": "large"
  }
}
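As a minimal sketch, a script might confirm that the three top-level fields are present before processing an annotation any further. The helper name `missing_top_level_fields` is hypothetical and not part of any documented API. It checks presence only, not conformance to a schema.

```python
import json

# The three top-level fields named in this section.
REQUIRED_FIELDS = ("file_hash", "record", "source")


def missing_top_level_fields(annotation: dict) -> list[str]:
    """Return the required top-level fields that are absent (hypothetical helper)."""
    return [name for name in REQUIRED_FIELDS if name not in annotation]


raw = """
{
  "file_hash": "abcd1234efab5678abcd1234efab5678abcd1234efab5678abcd1234efab5678",
  "record": {
    "labels": [{"label": "legal-contract", "score": 0.98}],
    "vocabulary": ["invoice", "legal-contract", "memo"]
  },
  "source": {"type": "Model", "model": "DocTypeClassifier"}
}
"""
annotation = json.loads(raw)

print(missing_top_level_fields(annotation))      # []
print(missing_top_level_fields({"record": {}}))  # ['file_hash', 'source']
```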
file_hash
Every annotation is linked to a specific file. The SHA-256 hash is the unique identifier for the file to which the annotation pertains.
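The sketch below shows one way to compute such a hash with Python's standard `hashlib` module. It assumes that `file_hash` is the lowercase hexadecimal SHA-256 digest of the file's raw bytes, which matches the 64-character value in the example above but is not confirmed here. The function name `file_sha256` is illustrative.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 digest of a file as 64 lowercase hex characters."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `file_sha256(Path("contract.pdf"))` (a hypothetical file name) would return the string to compare against an annotation's `file_hash`.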
record
This is the content of the annotation. Its structure is defined by the schema it is validated against.
In the example above, the record conforms to the open/classification schema. It contains a list of labels and scores, as well as a vocabulary listing all possible labels.
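A short sketch of how a script might read an open/classification record: it picks the highest-scoring label and lists any labels that fall outside the vocabulary. The field names come from the example above. The helper names are hypothetical, and the assumption that every label must appear in the vocabulary is inferred from the description rather than from the schema itself.

```python
from typing import Optional

record = {
    "labels": [{"label": "legal-contract", "score": 0.98}],
    "vocabulary": ["invoice", "legal-contract", "memo"],
}


def top_label(record: dict) -> Optional[str]:
    """Return the label with the highest score, or None if there are no labels."""
    labels = record.get("labels", [])
    if not labels:
        return None
    best = max(labels, key=lambda entry: entry["score"])
    return best["label"]


def labels_outside_vocabulary(record: dict) -> list[str]:
    """Return labels that do not appear in the record's vocabulary."""
    vocabulary = set(record.get("vocabulary", []))
    return [
        entry["label"]
        for entry in record.get("labels", [])
        if entry["label"] not in vocabulary
    ]


print(top_label(record))                  # legal-contract
print(labels_outside_vocabulary(record))  # []
```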
source
This provides information about the annotation itself:
type: How the annotation was created. This will be either "Manual" or "Model".
  "Manual": The annotation was added directly by a script or user.
  "Model": The annotation was generated by the model named in the "model" field, as part of an Annotation Pipeline.
Depending on the type of the annotation, additional fields may be included:
model: ("Model"-type) The name of the Annotation Model that generated the record (e.g. "DocTypeClassifier").
version: ("Model"-type) Optional. The version of the named model that generated the record (e.g. "1.0.0").
variant: ("Model"-type) Optional. Additional detail about the model that generated the record (e.g. "large", "nano").
detail: ("Manual"-type) Describes the source of a manually added annotation (e.g. "human-reviewer-2", "Backup-2025-02-01.sql").
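The field rules above can be sketched as a small check on a source object. This is an illustration only, not the actual validator. It assumes that "model" is required for "Model"-type sources, that "detail" is optional, and that fields belonging to the other type are rejected. None of these rules is confirmed by this section, and the name `source_problems` is hypothetical.

```python
# Fields permitted for each source type, as described above.
MODEL_FIELDS = {"type", "model", "version", "variant"}
MANUAL_FIELDS = {"type", "detail"}


def source_problems(source: dict) -> list[str]:
    """Return human-readable problems found in a source object (hypothetical helper)."""
    kind = source.get("type")
    problems = []
    if kind == "Model":
        allowed = MODEL_FIELDS
        # Assumption: a "Model" source must name the model that produced the record.
        if "model" not in source:
            problems.append('"Model" source is missing "model"')
    elif kind == "Manual":
        allowed = MANUAL_FIELDS
    else:
        return [f"unknown source type: {kind!r}"]
    # Assumption: fields outside the permitted set are treated as errors.
    for name in sorted(set(source) - allowed):
        problems.append(f'unexpected field for "{kind}" source: "{name}"')
    return problems


print(source_problems({"type": "Model", "model": "DocTypeClassifier", "version": "1.0.0"}))  # []
print(source_problems({"type": "Model"}))  # ['"Model" source is missing "model"']
```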
Validation and Schemas
Annotations are all strictly validated against a known, named schema.
See: Validation Schemas