Helpers
dorsal.file.helpers
ClassificationLabel
build_classification_record

    build_classification_record(
        labels,
        vocabulary=None,
        score_explanation=None,
        vocabulary_url=None,
    )
Builds a valid 'open/classification' annotation record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `labels` | `list[str \| ClassificationLabel]` | A list of simple strings (e.g., `["cat", "dog"]`) or dictionaries (e.g., `[{"label": "cat", "score": 0.95}]`). | required |
| `vocabulary` | `list[str] \| None` | Optional list of all possible labels. | `None` |
| `score_explanation` | `str \| None` | Optional string explaining the 'score' field. | `None` |
| `vocabulary_url` | `str \| None` | Optional URL pointing to an external vocabulary. | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | A dictionary structured to match the 'open/classification' schema. |
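As a rough illustration of the dictionary form of `labels`, the sketch below turns raw model scores into `{"label": ..., "score": ...}` entries before calling the builder. The names `to_label_inputs` and `classify_to_record`, the 0.5 threshold, and the "Softmax probability" wording are illustrative assumptions and not part of dorsal; the builder call itself needs dorsal installed.

```python
from typing import Any


def to_label_inputs(predictions: dict[str, float], threshold: float = 0.5) -> list[dict[str, Any]]:
    """Turn {label: score} model output into the documented dict form.

    Hypothetical pre-processing helper; not part of dorsal.
    """
    kept = [
        {"label": label, "score": score}
        for label, score in predictions.items()
        if score >= threshold
    ]
    # Highest-scoring label first.
    return sorted(kept, key=lambda item: item["score"], reverse=True)


def classify_to_record(predictions: dict[str, float]) -> dict[str, Any]:
    # Imported lazily so the helper above works without dorsal installed.
    from dorsal.file.helpers import build_classification_record

    return build_classification_record(
        labels=to_label_inputs(predictions),
        vocabulary=sorted(predictions),  # every label the model can emit
        score_explanation="Softmax probability",  # example wording only
    )
```

Plain strings such as `["cat", "dog"]` can be passed as `labels` directly when no scores are available.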
Source code in venv/lib/python3.13/site-packages/dorsal/file/helpers.py
build_embedding_record
Builds a valid 'open/embedding' annotation record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `vector` | `list[float]` | The embedding vector. | required |
| `model` | `str \| None` | Optional name of the model used. | `None` |
| `attributes` | `dict[str, Any] \| None` | Optional arbitrary metadata (max 16 items, flat). | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | A dictionary structured to match the 'open/embedding' schema. |
build_generic_record
Builds a valid 'open/generic' annotation record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Union[str, int, float, bool, None]]` | A flat dictionary of key-value pairs. | required |
| `description` | `str \| None` | A description of the data (max 256 chars). Can be None. | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | A dictionary structured to match the 'open/generic' schema. |
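Because `data` must be a flat dictionary of scalar values, nested metadata has to be flattened first. The sketch below shows one way to do that with dotted keys; the dotted-key convention, `flatten`, and `metadata_to_record` are assumptions for illustration, not dorsal behaviour, and truncating the description to 256 characters is just one way to respect the documented limit.

```python
from typing import Any, Union

Scalar = Union[str, int, float, bool, None]
MAX_DESCRIPTION_CHARS = 256  # limit stated in the parameter table


def flatten(nested: dict[str, Any], prefix: str = "") -> dict[str, Scalar]:
    """Collapse nested dicts into dotted keys; stringify other non-scalar values."""
    flat: dict[str, Scalar] = {}
    for key, value in nested.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif value is None or isinstance(value, (str, int, float, bool)):
            flat[name] = value
        else:
            flat[name] = str(value)
    return flat


def metadata_to_record(metadata: dict[str, Any], description: str) -> dict[str, Any]:
    # Imported lazily so flatten() works without dorsal installed.
    from dorsal.file.helpers import build_generic_record

    return build_generic_record(
        data=flatten(metadata),
        description=description[:MAX_DESCRIPTION_CHARS],
    )
```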
build_llm_output_record

    build_llm_output_record(
        model,
        response_data,
        prompt=None,
        language=None,
        score=None,
        score_explanation=None,
        generation_params=None,
        generation_metadata=None,
    )
Builds a valid 'open/llm-output' annotation record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str` | The ID or name of the generative model used. | required |
| `response_data` | `str \| dict[str, Any]` | The generative output (string or simple dict). | required |
| `prompt` | `str \| None` | Optional prompt provided to the model. | `None` |
| `language` | `str \| None` | Optional 3-letter ISO 639-3 language code. | `None` |
| `score` | `float \| None` | Optional confidence or evaluation score in the range [-1, 1]. | `None` |
| `score_explanation` | `str \| None` | Optional explanation of what the score represents. | `None` |
| `generation_params` | `dict[str, Any] \| None` | Optional dict of parameters sent to the API. | `None` |
| `generation_metadata` | `dict[str, Any] \| None` | Optional dict of metadata returned by the API. | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | A dictionary structured to match the 'open/llm-output' schema. |
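To make the `score` range and the `language` format concrete, here is a minimal sketch that clamps a raw score into [-1, 1] and checks that a language code has the three-lowercase-letter shape of ISO 639-3. The helpers `clamp_score`, `looks_like_iso639_3`, and `summary_to_record`, the model ID, and the parameter values are all illustrative assumptions; the shape check does not confirm that a code is actually registered.

```python
from typing import Any


def clamp_score(score: float) -> float:
    """Clamp a raw score into the documented [-1, 1] range."""
    return max(-1.0, min(1.0, float(score)))


def looks_like_iso639_3(code: str) -> bool:
    """Shape check only: three lowercase ASCII letters."""
    return len(code) == 3 and code.isascii() and code.isalpha() and code.islower()


def summary_to_record(prompt: str, response_text: str, raw_score: float) -> dict[str, Any]:
    # Imported lazily so the helpers above run without dorsal installed.
    from dorsal.file.helpers import build_llm_output_record

    return build_llm_output_record(
        model="example-model-v1",  # hypothetical model ID
        response_data=response_text,
        prompt=prompt,
        language="eng",
        score=clamp_score(raw_score),
        score_explanation="Evaluator rating clamped to [-1, 1]",
        generation_params={"temperature": 0.2, "max_tokens": 256},  # example values
        generation_metadata={"finish_reason": "stop"},  # example values
    )
```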
build_location_record

    build_location_record(
        longitude,
        latitude,
        id=None,
        timestamp=None,
        camera_make=None,
        camera_model=None,
        bbox=None,
        properties=None,
    )
Builds a valid 'open/geolocation' record for a simple Point.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `longitude` | `float` | The longitude coordinate. | required |
| `latitude` | `float` | The latitude coordinate. | required |
| `id` | `str \| int \| float \| None` | Optional unique identifier for the feature. | `None` |
| `timestamp` | `str \| None` | Optional ISO 8601 timestamp. | `None` |
| `camera_make` | `str \| None` | Optional make of the camera/sensor. | `None` |
| `camera_model` | `str \| None` | Optional model of the camera/sensor. | `None` |
| `bbox` | `list[float] \| None` | Optional Bounding Box array (RFC 7946). | `None` |
| `properties` | `dict[str, Any] \| None` | Optional dictionary of additional properties (GeoJSON 'properties'). Must not exceed 100 items. | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | A dictionary structured to match the 'open/geolocation' schema (GeoJSON Feature). |
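Note that the function takes `longitude` first, matching GeoJSON position order. The sketch below range-checks a coordinate pair and computes an RFC 7946 style bounding box (`[west, south, east, north]`) from a set of points. The helpers `is_valid_lon_lat`, `bbox_of`, and `photo_location_record`, plus the camera and property values, are hypothetical; the bounding box helper ignores antimeridian crossing.

```python
from datetime import datetime
from typing import Any


def is_valid_lon_lat(longitude: float, latitude: float) -> bool:
    """Range check for WGS 84 degrees."""
    return -180.0 <= longitude <= 180.0 and -90.0 <= latitude <= 90.0


def bbox_of(points: list[tuple[float, float]]) -> list[float]:
    """[west, south, east, north] for (lon, lat) pairs; ignores antimeridian crossing."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return [min(lons), min(lats), max(lons), max(lats)]


def photo_location_record(longitude: float, latitude: float, taken_at: datetime) -> dict[str, Any]:
    # Imported lazily so the helpers above run without dorsal installed.
    from dorsal.file.helpers import build_location_record

    if not is_valid_lon_lat(longitude, latitude):
        raise ValueError("coordinates out of range (check the longitude/latitude order)")
    return build_location_record(
        longitude=longitude,
        latitude=latitude,
        timestamp=taken_at.isoformat(),  # ISO 8601 string
        camera_make="ExampleCam",  # hypothetical value
        properties={"source": "exif"},  # hypothetical value
    )
```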
build_regression_point

    build_regression_point(
        value,
        *,
        statistic=None,
        quantile_level=None,
        interval_lower=None,
        interval_upper=None,
        score=None,
        timestamp=None,
        attributes=None
    )
Constructs a validated dictionary for a single regression data point.
This helper is designed to be used when building complex datasets (like time-series or multi-point forecasts) where you need to generate a list of points before wrapping them in a full record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `float \| None` | The predicted or sampled value. Can be `None`. | required |
| `statistic` | `str` | The statistical nature of this value. Must be one of: | `None` |
| `quantile_level` | `float` | If | `None` |
| `interval_lower` | `float` | The lower bound of the confidence interval or prediction interval. | `None` |
| `interval_upper` | `float` | The upper bound of the confidence interval or prediction interval. | `None` |
| `score` | `float` | A quality or confidence score for this specific point (0.0 to 1.0). | `None` |
| `timestamp` | `str \| datetime` | The specific time this prediction applies to. If a | `None` |
| `attributes` | `dict` | Arbitrary metadata relevant to this specific point (e.g., | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | A dictionary representing a single regression point, ready to be included in the `points` list. |
Examples:
1. Basic Point
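2. Point with Confidence Interval

    from dorsal.file.helpers import build_regression_point

    # A mean estimate with lower and upper interval bounds
    p = build_regression_point(
        value=105.5,
        interval_lower=100.0,
        interval_upper=110.0,
        statistic="mean"
    )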
3. Building a Time-Series List

    from dorsal.file.helpers import build_regression_point

    # One point per (date, price) pair
    data = [("2025-01-01", 50.0), ("2025-01-02", 55.5)]
    points = [
        build_regression_point(value=price, timestamp=date)
        for date, price in data
    ]
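The interval arguments are usually derived from data rather than typed by hand. As a sketch of that step, the helper below computes a sample mean with a normal-approximation 95% interval and returns keyword arguments for `build_regression_point`. `mean_with_interval` and `point_from_samples` are hypothetical names, the 1.96 multiplier is a modelling assumption, and `"mean"` is used for `statistic` because it appears in the example above.

```python
import math
import statistics
from typing import Any


def mean_with_interval(samples: list[float], z: float = 1.96) -> dict[str, Any]:
    """Keyword arguments for a 'mean' point with a normal-approximation interval."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate an interval")
    mean = statistics.fmean(samples)
    # Half-width = z * standard error of the mean.
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return {
        "value": mean,
        "statistic": "mean",
        "interval_lower": mean - half_width,
        "interval_upper": mean + half_width,
    }


def point_from_samples(samples: list[float]) -> dict[str, Any]:
    # Imported lazily so the helper above runs without dorsal installed.
    from dorsal.file.helpers import build_regression_point

    return build_regression_point(**mean_with_interval(samples))
```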
build_regression_record

    build_regression_record(
        points,
        *,
        target=None,
        unit=None,
        producer=None,
        score_explanation=None,
        attributes=None
    )
Builds a full open/regression record from a list of point dictionaries.
Use this function when you have manually constructed a list of points (e.g.
using build_regression_point in a loop) and want to wrap them in the
standard record structure with global metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `points` | `list[dict]` | A list of point dictionaries. | required |
| `target` | `str` | The name of the variable being predicted (e.g., 'house_price', 'temperature', 'credit_score'). | `None` |
| `unit` | `str` | The unit of measurement (e.g., 'USD', 'celsius', 'kg'). | `None` |
| `producer` | `str` | The creator (model, tool, or author) of this regression data. | `None` |
| `score_explanation` | `str` | A description of what the `score` values represent. | `None` |
| `attributes` | `dict` | Arbitrary metadata relevant to the entire record. | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | A complete dictionary valid against the `open/regression` schema. |
Examples:
Constructing a Time-Series Record

    from dorsal.file.helpers import build_regression_point, build_regression_record

    # 1. Create points
    points = [
        build_regression_point(value=10, timestamp="2025-01-01"),
        build_regression_point(value=12, timestamp="2025-01-02")
    ]
    # 2. Build record
    record = build_regression_record(
        points=points,
        target="daily_active_users",
        producer="AnalyticsBot v1"
    )
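When the source data carries `datetime` objects and per-point scores, it can help to normalise the rows before building points. The sketch below sorts rows chronologically and converts each timestamp to an ISO 8601 string; `to_point_kwargs`, `forecast_record`, and the target, unit, producer, and explanation values are illustrative assumptions, and the builder calls need dorsal installed.

```python
from datetime import datetime, timezone
from typing import Any


def to_point_kwargs(rows: list[tuple[datetime, float, float]]) -> list[dict[str, Any]]:
    """Convert (when, value, score) rows into keyword arguments, oldest first."""
    ordered = sorted(rows, key=lambda row: row[0])
    return [
        {"value": value, "timestamp": when.isoformat(), "score": score}
        for when, value, score in ordered
    ]


def forecast_record(rows: list[tuple[datetime, float, float]]) -> dict[str, Any]:
    # Imported lazily so the helper above runs without dorsal installed.
    from dorsal.file.helpers import build_regression_point, build_regression_record

    points = [build_regression_point(**kwargs) for kwargs in to_point_kwargs(rows)]
    return build_regression_record(
        points=points,
        target="temperature",
        unit="celsius",
        producer="ExampleForecaster v0",  # hypothetical producer name
        score_explanation="Model confidence between 0.0 and 1.0",
    )
```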
build_single_point_regression_record

    build_single_point_regression_record(
        value,
        *,
        target=None,
        unit=None,
        producer=None,
        score_explanation=None,
        statistic=None,
        quantile_level=None,
        interval_lower=None,
        interval_upper=None,
        score=None,
        timestamp=None,
        attributes=None
    )
Convenience helper to build a full open/regression record containing exactly one point.
This function abstracts away the points array structure for the common use case
of a single scalar prediction or measurement. It combines arguments for both the
record (e.g. target) and the point (e.g. value).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `float \| None` | The predicted or sampled value. | required |
| `target` | `str` | The name of the variable being predicted. | `None` |
| `unit` | `str` | The unit of measurement. | `None` |
| `producer` | `str` | The creator of this data. | `None` |
| `score_explanation` | `str` | Description of the score metric. | `None` |
| `statistic` | `str` | The statistical nature of the value. | `None` |
| `quantile_level` | `float` | Level for quantile statistics. | `None` |
| `interval_lower` | `float` | Lower bound of confidence interval. | `None` |
| `interval_upper` | `float` | Upper bound of confidence interval. | `None` |
| `score` | `float` | Quality score for the point. | `None` |
| `timestamp` | `str \| datetime` | Time of the prediction. | `None` |
| `attributes` | `dict` | Attributes for the point. | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | A complete dictionary valid against the `open/regression` schema, containing a single item in the `points` array. |
Examples:
Simple Prediction
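A simple prediction might look like the sketch below, which also shows how the combined arguments divide into record-level and point-level groups according to the signatures on this page. `split_arguments`, `simple_prediction`, and `two_step_equivalent` are hypothetical names, and the claim that the two-step form produces the same record as the convenience helper is an assumption that has not been checked against the library.

```python
from typing import Any

# Record-level argument names, taken from the build_regression_record signature.
RECORD_KEYS = {"target", "unit", "producer", "score_explanation"}


def split_arguments(kwargs: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate record-level arguments from point-level ones."""
    record = {k: v for k, v in kwargs.items() if k in RECORD_KEYS}
    point = {k: v for k, v in kwargs.items() if k not in RECORD_KEYS}
    return record, point


def simple_prediction() -> dict[str, Any]:
    # Imported lazily so split_arguments() runs without dorsal installed.
    from dorsal.file.helpers import build_single_point_regression_record

    return build_single_point_regression_record(
        value=250_000.0,
        target="house_price",
        unit="USD",
    )


def two_step_equivalent(kwargs: dict[str, Any]) -> dict[str, Any]:
    # Assumed to match the convenience helper; not verified against dorsal.
    from dorsal.file.helpers import build_regression_point, build_regression_record

    record, point = split_arguments(kwargs)
    return build_regression_record(points=[build_regression_point(**point)], **record)
```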
build_transcription_record
Builds a simple 'open/audio-transcription' record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | The full transcribed text. | required |
| `language` | `str \| None` | Optional 3-letter ISO 639-3 language code. | `None` |
| `track_id` | `str \| int \| None` | Optional identifier for the audio track. | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | A dictionary structured to match the 'open/audio-transcription' schema. |
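Since `text` is the full transcribed text, output that arrives in segments needs to be joined first. The sketch below shows one way to do that; `join_segments` and `segments_to_record` are hypothetical helpers, the whitespace handling is an assumption about what a clean transcript looks like, and `"eng"` and `track_id=0` are example values.

```python
import re
from typing import Any


def join_segments(segments: list[str]) -> str:
    """Join transcript segments into one string, collapsing runs of whitespace."""
    return re.sub(r"\s+", " ", " ".join(segments)).strip()


def segments_to_record(segments: list[str]) -> dict[str, Any]:
    # Imported lazily so join_segments() runs without dorsal installed.
    from dorsal.file.helpers import build_transcription_record

    return build_transcription_record(
        text=join_segments(segments),
        language="eng",  # ISO 639-3 code for English
        track_id=0,  # example track identifier
    )
```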