Annotations

An Annotation is a metadata sub-record, which forms part of the overall File Record.

Unlike a Tag (which is a single key-value label), an Annotation is a structured record, typically containing multiple pieces of information about the file.

All Annotations have a predictable structure, conforming to a named schema.

Their strictly-validated nature makes it easy to include Annotations as part of automated scripts and workflows.

Annotation Structure

Every annotation includes three top-level fields: file_hash, record, and source.

Example of an open/classification annotation:

{
  "file_hash": "abcd1234efab5678abcd1234efab5678abcd1234efab5678abcd1234efab5678",
  "record": {
    "labels": [
      { "label": "legal-contract", "score": 0.98 }
    ],
    "vocabulary": ["invoice", "legal-contract", "memo"]
  },
  "source": {
    "type": "Model",
    "model": "DocTypeClassifier",
    "version": "1.0.0",
    "variant": "large"
  }
}

file_hash

Every annotation is linked to a specific file. The SHA-256 hash is the unique identifier of the file to which the annotation pertains.
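As an illustration, a file_hash value could be computed with a short Python sketch like the one below. It assumes the hash is taken over the file's raw bytes and written as lowercase hexadecimal, as the 64-character value in the example suggests. The function name sha256_of_file is hypothetical and is not part of any documented API.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents.

    Assumption: file_hash is the hash of the raw file bytes,
    encoded as lowercase hexadecimal.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in fixed-size chunks keeps memory use constant regardless of file size.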

record

This is the content of the annotation. Its structure is defined by the schema it is validated against.

In the example above, the record conforms to the open/classification schema. It contains a list of labels with their scores, as well as a vocabulary listing all possible labels.
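A script consuming such a record might pick out the highest-scoring label. The following is a minimal sketch based only on the fields shown in the example. The helper name top_label is hypothetical, and the check that every label appears in the vocabulary is an assumption drawn from the description of vocabulary as "all possible labels".

```python
def top_label(record):
    """Return (label, score) for the highest-scoring entry in a record.

    Assumes the open/classification layout shown in the example:
    a "labels" list of {"label", "score"} objects and a "vocabulary" list.
    """
    labels = record["labels"]
    if not labels:
        raise ValueError("record contains no labels")
    best = max(labels, key=lambda item: item["score"])
    # Assumption: a label outside the vocabulary indicates a malformed record.
    if best["label"] not in record["vocabulary"]:
        raise ValueError(f"label {best['label']!r} is not in the vocabulary")
    return best["label"], best["score"]


record = {
    "labels": [{"label": "legal-contract", "score": 0.98}],
    "vocabulary": ["invoice", "legal-contract", "memo"],
}
print(top_label(record))
```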

source

This provides information about the annotation itself:

  • type: This tells us how the annotation was created. This will be either "Manual" or "Model".
    • "Manual": The annotation was added directly by a script or user.
    • "Model": The annotation was generated by the model named in the "model" field, as part of an Annotation Pipeline.

Depending on the type of the annotation, additional fields may be included:

  • model: ("Model"-type) The name of the Annotation Model that generated the record (e.g. "DocTypeClassifier")
  • version: ("Model"-type) Optional. The version of the named model that generated the record (e.g. "1.0.0")
  • variant: ("Model"-type) Optional. Additional detail about the model that generated the record (e.g. "large", "nano")
  • detail: ("Manual"-type) Describes the source of a manually-added annotation (e.g. "human-reviewer-2", "Backup-2025-02-01.sql")
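The field rules above can be sketched as a small Python check. This is illustrative only and is not the actual validator: the function name check_source is hypothetical, and it assumes that "model" is required for "Model"-type sources, that "detail" is optional for "Manual"-type sources, and that fields belonging to the other type are not allowed.

```python
# Fields permitted alongside "type" for each source type (assumed, see lead-in).
ALLOWED_FIELDS = {
    "Model": {"model", "version", "variant"},
    "Manual": {"detail"},
}


def check_source(source):
    """Return a list of problems found in a source object (empty if none)."""
    kind = source.get("type")
    if kind not in ALLOWED_FIELDS:
        return [f'type must be "Manual" or "Model", got {kind!r}']

    problems = []
    # Assumption: a "Model"-type source must name the model that produced it.
    if kind == "Model" and "model" not in source:
        problems.append('"Model"-type source is missing the "model" field')

    # Assumption: fields that belong to the other type are rejected.
    unexpected = sorted(set(source) - ALLOWED_FIELDS[kind] - {"type"})
    for name in unexpected:
        problems.append(f'unexpected field "{name}" for "{kind}"-type source')
    return problems


model_source = {
    "type": "Model",
    "model": "DocTypeClassifier",
    "version": "1.0.0",
    "variant": "large",
}
manual_source = {"type": "Manual", "detail": "human-reviewer-2"}
print(check_source(model_source))
print(check_source(manual_source))
```

Returning a list of problems rather than raising on the first one lets a calling script report every issue in a single pass.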

Validation and Schemas

Annotations are all strictly validated against a known, named schema.

See: Validation Schemas