Model Development & Testing
dorsal.testing
RunModelResult
pydantic-model
Bases: BaseModel
The standardized result object returned by ModelRunner execution steps.
This object encapsulates the output of a single Annotation Model, including its generated data, source identity, execution timing, and any errors encountered.
Fields:
- name (str)
- source (AnnotationModelSource)
- record (dict[str, Any] | None)
- schema_id (DatasetID | None)
- schema_version (str | None)
- time_taken (int | float | None)
- error (str | None)
error
pydantic-field
A descriptive error message if the model failed, crashed, or if a dependency was not met.
record
pydantic-field
The generated annotation record (dict). None if the model failed, was skipped, or produced no output.
schema_id
pydantic-field
The validation schema/model ID against which this record was validated.
schema_version
pydantic-field
The version of the schema/model against which this record was validated.
source
pydantic-field
Structured metadata identifying the model source (ID, version, variant).
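A minimal sketch of how a caller might branch on these fields. `ResultStandIn` is a hypothetical dataclass that only mirrors the documented field names so the snippet runs without Dorsal installed (the real `RunModelResult` is a Pydantic model), and `summarize` is an illustrative helper, not part of the library.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ResultStandIn:
    """Hypothetical stand-in mirroring the documented RunModelResult fields."""
    name: str
    source: Any = None
    record: Optional[dict] = None
    schema_id: Optional[str] = None
    schema_version: Optional[str] = None
    time_taken: Optional[float] = None
    error: Optional[str] = None


def summarize(result) -> str:
    """Return a one-line status for a result (illustrative helper)."""
    # An error message means the model failed, crashed, or a dependency was not met.
    if result.error is not None:
        return f"{result.name}: failed ({result.error})"
    # No error but no record: the model was skipped or produced no output.
    if result.record is None:
        return f"{result.name}: no record produced"
    return f"{result.name}: {len(result.record)} field(s), schema={result.schema_id}"


ok = ResultStandIn(name="demo", record={"label": "invoice"}, schema_id="open/generic")
bad = ResultStandIn(name="demo", error="dependency not met")
print(summarize(ok))
print(summarize(bad))
```

Checking `error` before `record` keeps the failure case distinct from the "no output" case, since both leave `record` as None.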
get_json_schema_validator
Prepares a configured jsonschema validator instance for a given schema.
This function first performs structural validation (metaschema check) to ensure the input schema adheres to the rules of the JSON Schema specification (Draft 2020-12).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| schema | dict | The JSON Schema (as a dictionary) to validate against. | required |
| strict | bool | If True, performs an added "liveness" check to ensure the schema contains actual validation keywords (e.g., 'type', 'properties'). Defaults to False. | False |
Returns:
| Type | Description |
|---|---|
| JsonSchemaValidatorType | A callable jsonschema validator instance. |
Raises:
| Type | Description |
|---|---|
| TypeError | If the input schema is not a dictionary. |
| ValueError | If the input schema is empty, or if |
| SchemaFormatError | If the input schema is structurally invalid (fails the metaschema check, e.g., 'type' is not a string/array). |
| DorsalError | For unexpected errors during initialization. |
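The input checks described for this function can be sketched with the standard library alone. This is not the library's implementation: `precheck_schema` and `VALIDATION_KEYWORDS` are hypothetical names, the keyword set is an assumption, and the metaschema check is omitted. Raising `ValueError` when the strict check fails is also an assumption.

```python
# Assumed set of keywords that count as "actual validation" in strict mode.
VALIDATION_KEYWORDS = frozenset(
    {"type", "properties", "items", "required", "enum", "const",
     "allOf", "anyOf", "oneOf", "$ref"}
)


def precheck_schema(schema, strict: bool = False) -> None:
    """Reject inputs that cannot be a usable JSON Schema (illustrative sketch)."""
    if not isinstance(schema, dict):
        raise TypeError("schema must be a dictionary")
    if not schema:
        raise ValueError("schema must not be empty")
    # "Liveness": at least one top-level key must be a validation keyword.
    if strict and not (schema.keys() & VALIDATION_KEYWORDS):
        raise ValueError("schema contains no validation keywords")


precheck_schema({"type": "object"}, strict=True)   # passes
precheck_schema({"title": "metadata only"})        # passes when strict is False
```

A schema such as `{"title": "metadata only"}` is structurally valid but accepts every document, which is the case the strict check is meant to catch.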
Source code in venv/lib/python3.13/site-packages/dorsal/common/validators/json_schema.py
get_open_schema
Loads a built-in Dorsal 'open/' validation schema by its short name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | OpenSchemaName | The short name of the open schema (e.g., "generic", "llm-output"). Provides autocompletion in supported editors. | required |
Returns:
| Type | Description |
|---|---|
| dict | The JSON schema as a Python dictionary. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the name is not a valid, known schema. |
Source code in venv/lib/python3.13/site-packages/dorsal/file/schemas.py
get_open_schema_validator
Gets the pre-built, cached JsonSchemaValidator instance for a Dorsal 'open/' schema by its short name.
Source code in venv/lib/python3.13/site-packages/dorsal/file/validators/open_schema.py
make_file_extension_dependency
Helper function to create a file extension dependency configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| extensions | Sequence[str] | A sequence (list or tuple) of file extensions (e.g., [".pdf", ".txt"]). | required |
| silent | bool | If False, raises an error if the dependency isn't met. | True |
Source code in venv/lib/python3.13/site-packages/dorsal/file/dependencies.py
make_media_type_dependency
Helper function to create a media type dependency configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| include | Sequence[str] \| None | A sequence (list or tuple) of media types (e.g., ["application/pdf"]). | None |
| exclude | Sequence[str] \| None | A sequence (list or tuple) of media types to explicitly exclude. | None |
| pattern | str \| Pattern \| None | A regex pattern to match against the media type. | None |
| silent | bool | If False, raises an error if the dependency isn't met. | True |
Source code in venv/lib/python3.13/site-packages/dorsal/file/dependencies.py
run_model
run_model(
annotation_model,
file_path,
*,
schema_id=None,
validation_model=None,
dependencies=None,
options=None
)
Tests a single AnnotationModel in isolation.
- FileCoreAnnotationModel retrieves base metadata.
- (Optional) Checks your model's dependencies.
- Runs your model.
- Returns the result of your model's execution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| annotation_model | | The custom AnnotationModel class you want to test (e.g., | required |
| file_path | str | The absolute path to the file to test against. | required |
| schema_id | str \| None | (Optional) The target schema ID (e.g., "open/generic"). If this is an "open/" schema, the standard validator will be used automatically. | None |
| validation_model | Type[BaseModel] \| JsonSchemaValidator \| None | (Optional) A custom Pydantic model or JsonSchemaValidator. This overrides the automatic validator from 'schema_id'. | None |
| dependencies | list[ModelRunnerDependencyConfig] \| ModelRunnerDependencyConfig \| None | (Optional) A list of dependency configs to check before running. | None |
| options | dict[str, Any] \| None | (Optional) A dictionary of options to pass to the model's | None |
Returns:
| Type | Description |
|---|---|
| RunModelResult | A RunModelResult object containing your model's output or any errors. |
Raises:
| Type | Description |
|---|---|
| ValueError | If 'schema_id' is an "open/" schema and a 'validation_model' is also provided, as this is an ambiguous configuration. |
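The interaction between `schema_id` and `validation_model` can be sketched as a small decision function. `choose_validator` is a hypothetical helper written for illustration, not a Dorsal function; it returns a label instead of a real validator. What happens when neither argument is given is not stated in this reference, so the sketch assumes no validation in that case.

```python
def choose_validator(schema_id=None, validation_model=None):
    """Return a (kind, value) pair describing the validator choice (illustrative)."""
    is_open = schema_id is not None and schema_id.startswith("open/")
    if is_open and validation_model is not None:
        # Documented as an ambiguous configuration.
        raise ValueError("'open/' schema_id and validation_model were both provided")
    if is_open:
        return ("standard", schema_id)
    if validation_model is not None:
        return ("custom", validation_model)
    return ("none", None)  # assumption: nothing to validate against


print(choose_validator(schema_id="open/generic"))
print(choose_validator(validation_model="MyValidator"))
```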
Source code in venv/lib/python3.13/site-packages/dorsal/testing.py
dorsal.file.configs.model_runner
DependencyConfig
pydantic-model
Bases: BaseModel
- type (str): The primary identifier of the dependency.
- checker (CallableImportPath): Defines the path to the dependency check function.
  - The dependency check function returns a boolean indicating whether the dependency was met.
  - The dependency check function always takes, as input, the list of prior model outputs (which is always at least the base file model result, at index 0).
- silent (bool):
  - When set to False, raises a DependencyNotMetError exception whenever the dependency is not met.
  - When set to True, no exception is raised in the case of a dependency not being met.
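The checker contract and the effect of `silent` can be sketched as follows. Everything here is a local stand-in: `DependencyNotMetError` is redefined so the snippet runs on its own, and `has_base_result` and `dependency_allows_run` are hypothetical names. The sketch assumes a checker receives only the list of prior outputs; the real checkers may take further arguments.

```python
class DependencyNotMetError(Exception):
    """Local stand-in for the exception named in the text."""


def has_base_result(prior_outputs: list) -> bool:
    """Example checker: met when the base file model result exists at index 0."""
    return len(prior_outputs) > 0 and prior_outputs[0] is not None


def dependency_allows_run(checker, prior_outputs: list, silent: bool) -> bool:
    """Evaluate one dependency; raise only when it is unmet and silent is False."""
    met = bool(checker(prior_outputs))
    if not met and not silent:
        raise DependencyNotMetError(f"{checker.__name__} was not met")
    return met


# With silent=True an unmet dependency simply reports False.
print(dependency_allows_run(has_base_result, [], silent=True))
```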
Fields:
FileExtensionDependencyConfig
pydantic-model
Bases: DependencyConfig
This dependency configures which file extensions to execute a model for.
Fields:
FileSizeDependencyConfig
pydantic-model
Bases: DependencyConfig
This dependency configures a model to run based on file size.
Fields:
- type (Literal['file_size'])
- checker (CallableImportPath)
- silent (bool)
- min_size (int | None)
- max_size (int | None)
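A size dependency with optional bounds might be evaluated as in the sketch below. `size_in_range` is a hypothetical helper; treating the bounds as inclusive byte counts, and `None` as "no limit", are assumptions that this reference does not confirm.

```python
from typing import Optional


def size_in_range(size: int,
                  min_size: Optional[int] = None,
                  max_size: Optional[int] = None) -> bool:
    """Return True when size falls within the configured bounds (assumed inclusive)."""
    if min_size is not None and size < min_size:
        return False
    if max_size is not None and size > max_size:
        return False
    return True


print(size_in_range(2048, min_size=1024, max_size=4096))
```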
FilenameDependencyConfig
pydantic-model
MediaTypeDependencyConfig
pydantic-model
Bases: DependencyConfig
This dependency configures which Media Types to execute a model for.
You can define the match rule for Media Type using any combination of pattern, include, or exclude.
- silent (bool = True): By default, when not met, the MediaTypeDependencyConfig does not raise. Set to False if you want it to raise an exception.
- pattern (str or re.Pattern): Match the media type using a regular expression. If the media type matches, the model executes.
- include: If the Media Type is in this sequence, the model executes.
- exclude: Exclusion rule. If the media type is in this sequence (even if it matches via pattern or include), it is blocked.
Fields:
- type (Literal['media_type'])
- checker (CallableImportPath)
- silent (bool)
- pattern (str | Pattern | None)
- include (set[MediaTypePartString] | None)
- exclude (set[MediaTypePartString] | None)
ModelRunnerPipelineStep
pydantic-model
Bases: BaseModel
Single step in the ModelRunner execution pipeline.
- annotation_model: Two-part path to an Annotation Model.
- dependencies: Rules to trigger execution.
- validation_model: Path to validation logic.
- schema_id: Unique dataset ID.
- options: Runtime options for the model.
- ignore_linter_errors: Skip strict linting if True.
- deactivated: (Optional) If True, this step is skipped. Defaults to False.
Fields:
- annotation_model (CallableImportPath)
- dependencies (list[ModelRunnerDependencyConfig] | None)
- validation_model (CallableImportPath | dict | None)
- schema_id (DatasetID)
- options (dict[str, Any] | None)
- ignore_linter_errors (bool)
- deactivated (bool)
RunModelResult
pydantic-model
Bases: BaseModel
The standardized result object returned by ModelRunner execution steps.
This object encapsulates the output of a single Annotation Model, including its generated data, source identity, execution timing, and any errors encountered.
Fields:
- name (str)
- source (AnnotationModelSource)
- record (dict[str, Any] | None)
- schema_id (DatasetID | None)
- schema_version (str | None)
- time_taken (int | float | None)
- error (str | None)
error
pydantic-field
A descriptive error message if the model failed, crashed, or if a dependency was not met.
record
pydantic-field
The generated annotation record (dict). None if the model failed, was skipped, or produced no output.
schema_id
pydantic-field
The validation schema/model ID against which this record was validated.
schema_version
pydantic-field
The version of the schema/model against which this record was validated.
source
pydantic-field
Structured metadata identifying the model source (ID, version, variant).
check_extension_dependency
Check whether the extension is within the scope of the annotation model.
Source code in venv/lib/python3.13/site-packages/dorsal/file/configs/model_runner.py
check_media_type_dependency
Check whether the media type is within the scope of the annotation model.
- Performs exact string matches or regex matches on the full and partial media type.
- Checks both full and partial media type in the following order:
  - If in exclude, return False.
  - If in include or matching pattern, return True.
  - If include or pattern were provided, then it failed. Return False.
  - If neither include nor pattern were provided, then it passed. Return True.
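The rule order above can be sketched in plain Python. `media_type_in_scope` is a hypothetical function, not the library's checker. Two assumptions are made: the "partial" media type is taken to be the top-level type before the slash, and regex matching uses `re.search`.

```python
import re


def media_type_in_scope(media_type, include=None, exclude=None, pattern=None) -> bool:
    """Apply the exclude / include / pattern rules in the documented order."""
    # Assumption: "partial" means the top-level type, e.g. "image" for "image/png".
    candidates = {media_type, media_type.split("/", 1)[0]}
    # 1. Exclusion always wins.
    if exclude and candidates & set(exclude):
        return False
    # 2. An include hit or a pattern match passes.
    if include and candidates & set(include):
        return True
    if pattern is not None and any(re.search(pattern, c) for c in candidates):
        return True
    # 3. A rule was provided but nothing matched: fail.
    # 4. No include or pattern was provided at all: pass.
    return include is None and pattern is None


print(media_type_in_scope("image/png", pattern=r"^image/"))
```

Evaluating `exclude` first is what lets a broad `pattern` or `include` be narrowed by a short block list.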
Source code in venv/lib/python3.13/site-packages/dorsal/file/configs/model_runner.py
check_name_dependency
Check whether the file's name matches the regex pattern.
Source code in venv/lib/python3.13/site-packages/dorsal/file/configs/model_runner.py
check_size_dependency
Check whether the file size is within the scope of the annotation model.