AnnotationModel
dorsal.common.model.AnnotationModel
The abstract base class for all Annotation Models in the pipeline.
An Annotation Model processes a file and returns a structured dictionary of metadata (an Annotation).
See: AnnotationModel docs
1. The Input Contract (Attributes)
When the ModelRunner instantiates your model, it automatically populates instance attributes before calling main().
Attributes:
| Name | Type | Description |
|---|---|---|
| file_path | str | The absolute path to the file on disk. |
| media_type | str \| None | The IANA Media Type (e.g., 'application/pdf') identified by the Base model. |
| extension | str \| None | The file extension (lowercase, e.g., '.docx'). |
| size | int \| None | The file size in bytes. |
| hash | str \| None | The SHA-256 hash of the file. |
| name | str \| None | The filename (e.g., 'report.pdf'). |
| follow_symlinks | bool | Defaults to |
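A minimal sketch of the input contract that runs without dorsal installed. `AnnotationModel` here is a hypothetical stand-in that only declares the documented attributes, and `populate()` is a hypothetical helper playing the role of the ModelRunner; dorsal's real runner, its media-type detection, and its `follow_symlinks` handling are not modelled. The subclass reads only attributes that were filled in before `main()` is called.

```python
from __future__ import annotations

import hashlib
import os
import tempfile


class AnnotationModel:
    """Hypothetical stand-in for dorsal's base class: input attributes only."""

    file_path: str
    media_type: str | None = None
    extension: str | None = None
    size: int | None = None
    hash: str | None = None
    name: str | None = None

    def main(self) -> dict | None:
        raise NotImplementedError


def populate(model: AnnotationModel, file_path: str, media_type: str | None) -> None:
    """Hypothetical runner step: fill the input attributes before main() is called."""
    abs_path = os.path.abspath(file_path)
    with open(abs_path, "rb") as fh:
        model.hash = hashlib.sha256(fh.read()).hexdigest()  # SHA-256 hex digest
    model.file_path = abs_path
    model.media_type = media_type  # dorsal's Base model detects this; passed in here
    model.name = os.path.basename(abs_path)
    model.extension = os.path.splitext(abs_path)[1].lower() or None
    model.size = os.path.getsize(abs_path)


class FileFacts(AnnotationModel):
    """Hypothetical subclass that only reads pre-populated attributes."""

    def main(self) -> dict | None:
        return {"name": self.name, "extension": self.extension, "size": self.size}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Report.PDF")
    with open(path, "wb") as fh:
        fh.write(b"%PDF-1.7")
    model = FileFacts()
    populate(model, path, "application/pdf")
    result = model.main()

print(result)  # {'name': 'Report.PDF', 'extension': '.pdf', 'size': 8}
```

Because the runner owns attribute population, the subclass never needs to open the file just to learn its name, size, or extension.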
2. The Output Contract (Return Values)
Your subclass must implement the main() method, which must return:
- dict: A dictionary containing the extracted metadata. It is validated against the schema ID configured for this model in the pipeline.
- None: Indicates that the model ran successfully but found no relevant data, or that it encountered a handled error.
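The two return values can be illustrated with a small sketch. `AnnotationModel` is again a hypothetical minimal stand-in (its `__init__(file_path)` signature is an assumption), and `TodoCounter` is a hypothetical model; schema validation, which dorsal's pipeline performs on the returned dict, is not reproduced here.

```python
from __future__ import annotations

import os
import tempfile


class AnnotationModel:
    """Hypothetical minimal stand-in for dorsal's base class."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def main(self) -> dict | None:
        raise NotImplementedError


class TodoCounter(AnnotationModel):
    """Hypothetical model that counts "TODO" markers in a UTF-8 text file."""

    def main(self) -> dict | None:
        with open(self.file_path, encoding="utf-8") as fh:
            count = sum(line.count("TODO") for line in fh)
        if count == 0:
            return None  # ran fine, but there is nothing relevant to report
        # dorsal's pipeline would validate this dict; no validation happens here
        return {"todo_count": count}


with tempfile.TemporaryDirectory() as tmp:
    notes = os.path.join(tmp, "notes.txt")
    clean = os.path.join(tmp, "clean.txt")
    with open(notes, "w", encoding="utf-8") as fh:
        fh.write("TODO: parse headers\nfix later TODO\n")
    with open(clean, "w", encoding="utf-8") as fh:
        fh.write("nothing to see here\n")
    found = TodoCounter(notes).main()    # {'todo_count': 2}
    missing = TodoCounter(clean).main()  # None
```

Returning None for "no data" keeps empty annotations out of the output without treating the file as a failure.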
3. Identity (Class Attributes)
To ensure annotations are traceable and unique, your subclass must define:
- id (str): A global identifier (e.g., "github:dorsalhub/pdf-model").
- version (str): Semantic version of the model logic (e.g., "1.0.0").
- variant (str, optional): Specific engine or config used (e.g., "v2-large").
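A sketch of the identity attributes on a subclass. The base class, the identifier `github:example/pdf-page-counter`, the variant `pypdf`, and the `identity_key()` format are all hypothetical; `is_semver()` is a simplified MAJOR.MINOR.PATCH check, not a full Semantic Versioning validator.

```python
from __future__ import annotations

import re


class AnnotationModel:
    """Hypothetical stand-in: only the identity class attributes are modelled."""

    id: str
    version: str
    variant: str | None = None


class PdfPageCounter(AnnotationModel):
    id = "github:example/pdf-page-counter"  # hypothetical identifier
    version = "1.0.0"
    variant = "pypdf"  # hypothetical engine name


def is_semver(version: str) -> bool:
    """Simplified MAJOR.MINOR.PATCH check (ignores pre-release/build tags)."""
    return re.fullmatch(r"\d+\.\d+\.\d+", version) is not None


def identity_key(model_cls: type[AnnotationModel]) -> str:
    """Hypothetical key format combining id, version and optional variant."""
    key = f"{model_cls.id}@{model_cls.version}"
    return f"{key}+{model_cls.variant}" if model_cls.variant else key


print(identity_key(PdfPageCounter))  # github:example/pdf-page-counter@1.0.0+pypdf
```

Keeping these as class attributes means the identity is available without instantiating the model or touching a file.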
4. Error Handling
Do not raise exceptions for expected failures. Call self.set_error("Reason") and return None.
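A sketch of the graceful-failure pattern. The stand-in `set_error()` stores the message on a hypothetical `error` attribute and logs it at INFO level; both the attribute name and the log level are assumptions, since only "sets and logs" is documented. `PdfOnlyModel` is a hypothetical subclass.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("annotation_sketch")


class AnnotationModel:
    """Hypothetical stand-in for dorsal's base class."""

    def __init__(self, file_path: str, media_type: str | None = None) -> None:
        self.file_path = file_path
        self.media_type = media_type
        self.error: str | None = None  # hypothetical attribute name

    def set_error(self, message: str) -> None:
        # Record the graceful error and log it (log level is an assumption).
        self.error = message
        logger.info("%s: %s", self.file_path, message)

    def main(self) -> dict | None:
        raise NotImplementedError


class PdfOnlyModel(AnnotationModel):
    """Hypothetical model that only handles PDFs."""

    def main(self) -> dict | None:
        if self.media_type != "application/pdf":
            # Expected failure: report it and return None rather than raising.
            self.set_error(f"Unsupported media type: {self.media_type}")
            return None
        return {"is_pdf": True}


model = PdfOnlyModel("notes.txt", media_type="text/plain")
result = model.main()
print(result, model.error)  # None Unsupported media type: text/plain
```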
The constructor initializes the model and sets file_path.
__init_subclass__
Sets 'id' and 'version' if they are not provided.
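Python's `__init_subclass__` hook runs once for every new subclass, which makes it a natural place to fill in missing class attributes. The sketch below shows that mechanism on a hypothetical stand-in; the fallback values (module-qualified class name for `id`, `"0.0.0"` for `version`) are placeholders, because the actual defaults are not documented here.

```python
from __future__ import annotations


class AnnotationModel:
    """Hypothetical stand-in showing an __init_subclass__ defaulting hook."""

    id: str | None = None
    version: str | None = None

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Inherited or explicitly defined values count as "provided".
        if getattr(cls, "id", None) is None:
            cls.id = f"{cls.__module__}.{cls.__qualname__}"  # hypothetical default
        if getattr(cls, "version", None) is None:
            cls.version = "0.0.0"  # hypothetical default


class Explicit(AnnotationModel):
    id = "github:example/explicit-model"
    version = "2.1.0"


class Implicit(AnnotationModel):
    pass


print(Explicit.id, Explicit.version)  # github:example/explicit-model 2.1.0
print(Implicit.version)               # 0.0.0
```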
log_debug
Logs a debug message with standardized model context.
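The exact context that log_debug adds is not documented here. This sketch assumes a hypothetical prefix of `[id@version] file_path:` built with the standard `logging` module, and uses an in-memory handler only so the emitted message can be inspected.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("annotation_sketch")


class AnnotationModel:
    """Hypothetical stand-in showing a context-prefixed debug logger."""

    id = "github:example/demo-model"  # hypothetical identifier
    version = "1.0.0"

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def log_debug(self, message: str) -> None:
        # Hypothetical context format: "[id@version] file_path: message"
        logger.debug("[%s@%s] %s: %s", self.id, self.version, self.file_path, message)


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory for inspection."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

AnnotationModel("/data/report.pdf").log_debug("parsed 3 pages")
print(handler.messages[-1])  # [github:example/demo-model@1.0.0] /data/report.pdf: parsed 3 pages
```

Passing arguments to `logger.debug` instead of pre-formatting the string defers formatting until a handler actually accepts the record.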
main
The main entry point for the annotation model. Subclasses must implement this method.
- On success: return a dictionary of the annotation data.
- On graceful failure: call self.set_error("reason") and return None.
- On critical failure: raise an Exception (e.g., a missing dependency).
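The three outcomes above can be exercised with a hypothetical runner. Everything here is a stand-in: `ExampleModel` picks its outcome from the file suffix purely for demonstration, `error` is a hypothetical attribute name, and `run()` catches critical exceptions only so the outcome can be inspected. dorsal's ModelRunner may handle them differently.

```python
from __future__ import annotations


class AnnotationModel:
    """Hypothetical stand-in for dorsal's base class."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path
        self.error: str | None = None  # hypothetical attribute name

    def set_error(self, message: str) -> None:
        self.error = message

    def main(self) -> dict | None:
        raise NotImplementedError


class ExampleModel(AnnotationModel):
    """Hypothetical model that exercises all three documented outcomes."""

    def main(self) -> dict | None:
        if self.file_path.endswith(".bin"):
            self.set_error("Binary files are not supported")  # graceful failure
            return None
        if self.file_path.endswith(".xyz"):
            # Critical failure, e.g. an optional parser library is not installed.
            raise ImportError("hypothetical parser dependency is missing")
        return {"path_length": len(self.file_path)}  # success


def run(model: AnnotationModel) -> tuple[str, object]:
    """Hypothetical runner that classifies the outcome of main()."""
    try:
        result = model.main()
    except Exception as exc:  # critical: caught here only to report it
        return ("critical", exc)
    if result is None:
        return ("no_result", model.error)
    return ("success", result)


for path in ("notes.txt", "blob.bin", "scan.xyz"):
    print(path, run(ExampleModel(path)))
```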
set_error
Sets a graceful error message for the model and logs it.
Call this and return None from main() for non-critical, expected failures (e.g., the file is not the right type).