Core Annotation Models
Core Annotation Models extract core metadata fields from files.
Core metadata fields describe something meaningful about a certain kind of file. Examples include page_count, which is relevant for certain document types such as PDF, and size, which is relevant for files of any type.
Most Core Annotation Models are specific to one or more file types, such as PDF documents or Zip Archives.
This page details the Core Annotation Models currently available in Dorsal and the file types they support.
A code example is provided for each model. In practice, however, you would almost always run the models as part of a pipeline orchestrated by the ModelRunner class.
| Model | Schema ID | Extracts | Runs On |
|---|---|---|---|
| FileCoreAnnotationModel | file/base | General metadata (file hashes, name, size, media type). | All files |
| EbookAnnotationModel | file/ebook | Ebook metadata (e.g. title, authors, isbn). | Ebooks (only EPUB currently supported) |
| OfficeDocumentAnnotationModel | file/office | Office document metadata (e.g. author, page_count, sheets). | Office documents (.docx, .xlsx, .pptx) |
| MediaInfoAnnotationModel | file/mediainfo | Media file metadata (e.g. codecs, duration, dimensions). | Audio/video/image file formats |
| PDFAnnotationModel | file/pdf | PDF-specific metadata (e.g. pages, creator, dates). | PDF documents |
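The pairing of models and file types described on this page can be read as a lookup from media type to schema ID. The sketch below is only an illustration of that reading: `expected_schema_ids` is a hypothetical helper, not part of Dorsal, and it says nothing about how ModelRunner actually selects models.

```python
# Media types taken from the per-model "File Extension | Media Type" tables on this page.
OFFICE_MEDIA_TYPES = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def expected_schema_ids(media_type: str) -> list[str]:
    """Hypothetical helper: schema IDs this page suggests for a given media type."""
    schema_ids = ["file/base"]  # FileCoreAnnotationModel runs on all files
    if media_type == "application/pdf":
        schema_ids.append("file/pdf")
    elif media_type == "application/epub+zip":
        schema_ids.append("file/ebook")
    elif media_type in OFFICE_MEDIA_TYPES:
        schema_ids.append("file/office")
    elif media_type.split("/", 1)[0] in {"audio", "video", "image"}:
        schema_ids.append("file/mediainfo")
    return schema_ids


print(expected_schema_ids("video/quicktime"))  # ['file/base', 'file/mediainfo']
```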
dorsal.file.annotation_models.base.model.FileCoreAnnotationModel
Bases: AnnotationModel
Annotation model for extracting core file metadata.
- Calculate file hashes (SHA256, TLSH, BLAKE3 and QUICK).
- Determine basic file attributes: name, extension, size and media type.
- This model is designed for use in the ModelRunner.
- Its main method outputs a dictionary conforming to FileCoreValidationModel.
Source code in venv/lib/python3.13/site-packages/dorsal/common/model.py
main
Main execution method for the FileCoreAnnotationModel.
Orchestrates the extraction of all fundamental file metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| calculate_similarity_hash | bool | If True, the TLSH similarity hash will be calculated and included in the results. | False |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] \| None | A dictionary containing the extracted file metadata, conforming to the structure expected by FileCoreValidationModel. Returns None if a recoverable error specific to this model's logic occurs (critical OS/IO errors propagate). |
Raises:
| Type | Description |
|---|---|
| (FileNotFoundError, IOError, OSError) | If critical issues occur during file access (e.g., for hashing, size, media type determination). These are expected to be caught by the ModelRunner. |
Source code in venv/lib/python3.13/site-packages/dorsal/file/annotation_models/base/model.py
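When main is called directly rather than through the ModelRunner, the caller has to deal with both outcomes described above: a None result and a raised file access error. A minimal sketch of that handling follows. `run_model_safely` and `StubModel` are hypothetical names introduced here for illustration; `StubModel` only mimics the constructor-plus-main shape of the models on this page so that the example runs without Dorsal installed.

```python
from pathlib import Path
from typing import Any, Optional


class StubModel:
    """Stand-in with the same shape as the models on this page (constructor + main)."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def main(self, **options: Any) -> Optional[dict]:
        # Mimic the documented behaviour: file access errors are raised, not swallowed.
        size = Path(self.file_path).stat().st_size
        return {"size": size}


def run_model_safely(model_cls: type, file_path: str, **options: Any) -> Optional[dict]:
    """Hypothetical wrapper: return the annotation, or None when file access fails."""
    try:
        return model_cls(file_path).main(**options)
    except OSError as exc:  # FileNotFoundError is a subclass; IOError is an alias of OSError
        print(f"{model_cls.__name__} could not read {file_path!r}: {exc}")
        return None


result = run_model_safely(StubModel, "./does-not-exist.bin")
if result is None:
    print("No annotation produced")
```

Because the model class is passed in as an argument, the same wrapper could be tried with any class that follows the pattern shown in the code examples on this page.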
Code Example
FileCoreAnnotationModel is the most general of the Core Annotation Models, running on every file as the first step in the default Annotation Model pipeline.
It extracts general file metadata, including file hashes, name, size and media type.
In the example below, the FileCoreAnnotationModel is run for a single file:
from dorsal.file.annotation_models.base.model import FileCoreAnnotationModel
model = FileCoreAnnotationModel("./big_buck_bunny_1080p_h264.mov") # Path to file on local system
model.main(calculate_similarity_hash=True) # `main` returns the annotation as a dictionary
Output
{
"hash": "dc2146a2b1172def56730143ad80cd1825b7fad15f1fc9c23a4e7d01a741ac11",
"similarity_hash": "T1AE7933F5A7329607815E36F8EB025F12DC44FC931E3E976A339B12B91E853256C63B18",
"quick_hash": "9810dd75bba04f061f0ea52021930edd2d24fc895850b8eeeb659d17e13b64a6",
"all_hashes": [
{
"id": "SHA-256",
"value": "dc2146a2b1172def56730143ad80cd1825b7fad15f1fc9c23a4e7d01a741ac11"
},
{
"id": "BLAKE3",
"value": "6e2705bec1ae55bbf3ddd0d44305bf83fa847339bb32f05494a307b7ff223ac4"
},
{
"id": "TLSH",
"value": "T1AE7933F5A7329607815E36F8EB025F12DC44FC931E3E976A339B12B91E853256C63B18"
},
{
"id": "QUICK",
"value": "9810dd75bba04f061f0ea52021930edd2d24fc895850b8eeeb659d17e13b64a6"
}
],
"name": "big_buck_bunny_1080p_h264.mov",
"extension": ".mov",
"size": 725106140,
"media_type": "video/quicktime"
}
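To make the fields in this output more concrete, here is a standard-library sketch that computes a comparable subset of them. It is not Dorsal's implementation: `sketch_core_metadata` is a hypothetical helper, the media type is guessed from the file extension only, and the BLAKE3, TLSH and QUICK hashes are left out because they are not available in the standard library.

```python
import hashlib
import mimetypes
from pathlib import Path


def sketch_core_metadata(path: str, chunk_size: int = 1024 * 1024) -> dict:
    """Hypothetical helper: approximate a few file/base fields with the stdlib."""
    file_path = Path(path)
    sha256 = hashlib.sha256()
    with file_path.open("rb") as handle:
        # Hash in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha256.update(chunk)
    # Extension-based guess only; the file content is not inspected here.
    media_type, _ = mimetypes.guess_type(file_path.name)
    return {
        "hash": sha256.hexdigest(),
        "name": file_path.name,
        "extension": file_path.suffix.lower(),
        "size": file_path.stat().st_size,
        "media_type": media_type or "application/octet-stream",
    }
```

Calling `sketch_core_metadata("./big_buck_bunny_1080p_h264.mov")` would return a dictionary with the same key names as the corresponding fields above, although values such as media_type may differ from what the real model reports.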
dorsal.file.annotation_models.ebook.model.EbookAnnotationModel
Bases: AnnotationModel
Extracts metadata from ebook files. Only the EPUB format is currently supported.
Source code in venv/lib/python3.13/site-packages/dorsal/common/model.py
main
Extracts metadata by dispatching to the correct format-specific parser.
Returns:
| Type | Description |
|---|---|
| dict[str, Any] \| None | |
Source code in venv/lib/python3.13/site-packages/dorsal/file/annotation_models/ebook/model.py
Code Example
EbookAnnotationModel extracts metadata from Ebook files. Currently only EPUB format is supported. Fields extracted include title, authors, publisher and isbn.
In the example below, the EbookAnnotationModel is run for a single file:
from dorsal.file.annotation_models.ebook.model import EbookAnnotationModel
model = EbookAnnotationModel("./books/Stephenson - The Diamond Age.epub") # Path to file on local system
model.main() # `main` returns the annotation as a dictionary
Output
{
'title': 'The Diamond Age',
'authors': ['Neal Stephenson'],
'contributors': [],
'publisher': 'Random House Publishing Group',
'language': 'English',
'subjects': [],
'description': None,
'rights': 'Copyright 2003',
'isbn': '9780553898200',
'other_identifiers': [],
'cover_path': None,
'tools': [],
'publication_date': datetime.datetime(2003, 6, 18, 0, 0),
'creation_date': None,
'modification_date': None
}
| File Extension | Media Type |
|---|---|
| .epub | application/epub+zip |
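For readers curious where fields such as title and authors come from: an EPUB is a ZIP archive whose META-INF/container.xml points to a package (OPF) document carrying Dublin Core metadata. The sketch below reads a few of those elements with the standard library. It is an illustration under that assumption, not Dorsal's parser, and `sketch_epub_metadata` is a hypothetical name.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def sketch_epub_metadata(path: str) -> dict:
    """Hypothetical helper: read basic Dublin Core fields from an EPUB file."""
    with zipfile.ZipFile(path) as archive:
        # container.xml names the package document (the "rootfile").
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            raise ValueError("container.xml does not declare a rootfile")
        package = ET.fromstring(archive.read(rootfile.attrib["full-path"]))

    def texts(tag: str) -> list[str]:
        # Collect the non-empty text of every dc:<tag> element in the package.
        found = package.iterfind(f".//dc:{tag}", DC_NS)
        return [el.text.strip() for el in found if el.text and el.text.strip()]

    titles = texts("title")
    publishers = texts("publisher")
    return {
        "title": titles[0] if titles else None,
        "authors": texts("creator"),
        "publisher": publishers[0] if publishers else None,
    }
```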
dorsal.file.annotation_models.office_document.model.OfficeDocumentAnnotationModel
Bases: AnnotationModel
Extracts metadata from Microsoft Office formats (OOXML: .docx, .xlsx, .pptx). This model acts as a dispatcher, calling the correct stdlib-based parser.
Source code in venv/lib/python3.13/site-packages/dorsal/common/model.py
main
Dispatches to the correct format-specific parser based on media_type.
Source code in venv/lib/python3.13/site-packages/dorsal/file/annotation_models/office_document/model.py
Code Example
OfficeDocumentAnnotationModel extracts metadata from Microsoft Office formats (OOXML: .docx, .xlsx, .pptx). It acts as a dispatcher, calling the correct format-specific parser based on the file's media type.
It extracts common properties like author, title, and creation date, as well as format-specific details, such as page/word counts for Word documents, sheet names for Excel, and slide counts for PowerPoint.
In the example below, the OfficeDocumentAnnotationModel is run for a single .docx file:
from dorsal.file.annotation_models.office_document.model import OfficeDocumentAnnotationModel
model = OfficeDocumentAnnotationModel("./reports/Q3_Report.docx") # Path to file on local system
model.main() # `main` returns the annotation as a dictionary
Output (for a .docx file)
{
'author': 'Jane Doe',
'last_modified_by': 'John Smith',
'title': 'Q3 Financial Report',
'subject': 'Quarterly Earnings',
'keywords': ['finance', 'report', 'Q3'],
'revision': 3,
'creation_date': datetime.datetime(2025, 10, 1, 9, 0, 0),
'modified_date': datetime.datetime(2025, 10, 5, 14, 30, 15),
'application_name': 'Microsoft Office Word',
'application_version': '16.0300',
'template': 'CompanyReport.dotx',
'structural_parts': [
'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml',
'application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml',
'application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml',
],
'has_comments': True,
'custom_properties': {
'Department': 'Finance',
'Status': 'Draft'
},
'language': 'English',
'language_code': 'eng',
'locale_code': 'en-US',
'default_font': 'Calibri',
'all_fonts': ['Calibri', 'Times New Roman'],
'is_password_protected': False,
'word': {
'page_count': 15,
'word_count': 3450,
'char_count': 18970,
'paragraph_count': 120,
'has_track_changes': True,
'hyperlinks': ['https://dorsalhub.com'],
'embedded_images': ['word/media/image1.png']
},
'excel': None,
'powerpoint': None
}
| File Extension | Media Type |
|---|---|
| .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| .xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
| .pptx | application/vnd.openxmlformats-officedocument.presentationml.presentation |
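The common properties listed above (author, title, creation date) live in the core properties part of an OOXML package, which is itself a ZIP archive. The sketch below reads that part with the standard library. It assumes the part sits at its conventional location, docProps/core.xml, and it leaves dates as the strings stored in the file; `sketch_office_core_properties` is a hypothetical helper, not Dorsal's parser.

```python
import zipfile
import xml.etree.ElementTree as ET

NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output key -> qualified element name inside docProps/core.xml.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "title": "dc:title",
    "subject": "dc:subject",
    "creation_date": "dcterms:created",
    "modified_date": "dcterms:modified",
}


def sketch_office_core_properties(path: str) -> dict:
    """Hypothetical helper: read core properties from a .docx, .xlsx or .pptx file."""
    with zipfile.ZipFile(path) as archive:
        try:
            raw = archive.read("docProps/core.xml")  # conventional location (assumption)
        except KeyError:
            return {}  # the package has no core properties part at that location
    root = ET.fromstring(raw)
    result = {}
    for key, tag in FIELDS.items():
        element = root.find(tag, NAMESPACES)
        result[key] = element.text if element is not None else None
    return result
```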
dorsal.file.annotation_models.pdf.model.PDFAnnotationModel
Bases: AnnotationModel
Extract metadata from PDF files using pypdfium2.
Source code in venv/lib/python3.13/site-packages/dorsal/common/model.py
main
Extract, normalize, and return metadata from the PDF file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| password | str \| None | Optional password from pipeline config. | None |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] \| None | Dictionary of normalized PDF metadata if successful. None if the PDF cannot be read or essential metadata extraction fails. |
Raises:
| Type | Description |
|---|---|
| ImportError | If |
Source code in venv/lib/python3.13/site-packages/dorsal/file/annotation_models/pdf/model.py
Code Example
PDFAnnotationModel extracts metadata from PDF documents, such as page count, creator, version, and creation dates, using the pypdfium2 library.
In the example below, the PDFAnnotationModel is run for a single file:
from dorsal.file.annotation_models.pdf.model import PDFAnnotationModel
model = PDFAnnotationModel("./PDFSPEC.pdf") # Path to file on local system
model.main() # `main` returns the annotation as a dictionary
Output
{
'creator': 'FrameMaker 5.1.1',
'producer': 'Acrobat Distiller 3.0 for Power Macintosh',
'creation_date': datetime.datetime(1996, 11, 12, 3, 8, 43),
'author': 'Tim Bienz, Richard Cohn, James R. Meehan',
'modified_date': datetime.datetime(1996, 11, 12, 7, 58, 15),
'title': 'Portable Document Format Reference Manual (v 1.2)',
'subject': 'Description of the PDF file format',
'keywords': 'Acrobat PDF',
'version': '1.2',
'page_count': 394
}
| File Extension | Media Type |
|---|---|
| .pdf | application/pdf |
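The datetime values in the output above are a normalized form of PDF date strings, which are conventionally written as `D:YYYYMMDDHHmmSS` followed by an optional time zone offset. The sketch below shows one way such a string could be turned into a datetime. It is a hypothetical helper (`parse_pdf_date`), not the normalization Dorsal performs, and it ignores any time zone offset for brevity.

```python
import re
from datetime import datetime
from typing import Optional

# Year is required; every later component is optional, as in PDF date strings.
_PDF_DATE = re.compile(r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?")


def parse_pdf_date(raw: str) -> Optional[datetime]:
    """Hypothetical helper: parse a PDF date string into a naive datetime."""
    match = _PDF_DATE.match(raw.strip())
    if match is None:
        return None
    year, month, day, hour, minute, second = match.groups()
    try:
        # Missing month/day default to 1, missing time components to 0.
        return datetime(
            int(year), int(month or 1), int(day or 1),
            int(hour or 0), int(minute or 0), int(second or 0),
        )
    except ValueError:  # e.g. month 13 or day 32
        return None


print(parse_pdf_date("D:19961112030843"))  # 1996-11-12 03:08:43
```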
dorsal.file.annotation_models.mediainfo.model.MediaInfoAnnotationModel
Bases: AnnotationModel
Extract metadata from media files using the pymediainfo library.
This model parses the output of MediaInfo (obtained as JSON) and organizes it into a structured dictionary with a main "General" track and lists for other track types (Video, Audio, Text, etc.).
Source code in venv/lib/python3.13/site-packages/dorsal/common/model.py
main
Extract, normalize, and structure metadata from the media file.
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | None | |
Source code in venv/lib/python3.13/site-packages/dorsal/file/annotation_models/mediainfo/model.py
Code Example
MediaInfoAnnotationModel extracts technical metadata from a wide variety of audio, video, and image files using the underlying MediaInfo library (via pymediainfo).
In the example below, the MediaInfoAnnotationModel is run for a single file:
from dorsal.file.annotation_models.mediainfo.model import MediaInfoAnnotationModel
model = MediaInfoAnnotationModel("./big_buck_bunny_1080p_h264.mov") # Path to file on local system
model.main()
Output
{
"Audio_Codec_List": "AAC LC",
"AudioCount": 1,
"Audio_Channels_Total": 6,
"Audio_Format_List": "AAC LC",
"Audio_Format_WithHint_List": "AAC LC",
"Audio_Language_List": "English",
"CodecID_Compatible": "qt ",
"CodecID": "qt ",
"CodecID_String": "qt 2005.03 (qt )",
"CodecID_Url": "http://www.apple.com/quicktime/download/standalone.html",
"CodecID_Version": "2005.03",
"Count": 359,
"DataSize": 724711422,
"Duration": 596.462,
"Duration_String1": "9 min 56 s 462 ms",
"Duration_String2": "9 min 56 s",
"Duration_String3": "00:09:56.462",
"Duration_String4": "00:09:56:11",
"Duration_String5": "00:09:56.462 (00:09:56:11)",
"Duration_String": "9 min 56 s",
"Encoded_Date": "2008-05-27 18:40:35 UTC",
"Encoded_Library": "Apple QuickTime 7.4.1",
"Encoded_Library_Name": "Apple QuickTime",
"Encoded_Library_String": "Apple QuickTime 7.4.1",
"Encoded_Library_Version": "7.4.1",
"FileExtension": "mov",
"File_Modified_Date_Local": "2024-05-16 16:58:01",
"File_Modified_Date": "2024-05-16 15:58:01 UTC",
"FileNameExtension": "big_buck_bunny_1080p_h264.mov",
"FileName": "big_buck_bunny_1080p_h264",
"FileSize": "725106140",
"FileSize_String1": "692 MiB",
"FileSize_String2": "692 MiB",
"FileSize_String3": "692 MiB",
"FileSize_String4": "691.5 MiB",
"FileSize_String": "692 MiB",
"FooterSize": 0,
"Format_Commercial": "MPEG-4",
"Format_Extensions": "braw mov mp4 m4v m4a m4b m4p m4r 3ga 3gpa 3gpp 3gp 3gpp2 3g2 k3g jpm jpx mqv ismv isma ismt f4a f4b f4v",
"Format": "MPEG-4",
"Format_Profile": "QuickTime",
"Format_String": "MPEG-4",
"FrameCount": 14315,
"FrameRate": 24.0,
"FrameRate_String": "24.000 FPS",
"HeaderSize": 394718,
"InternetMediaType": "video/mp4",
"IsStreamable": "Yes",
"Other_Codec_List": "QuickTime TC",
"OtherCount": 1,
"Other_Format_List": "QuickTime TC",
"Other_Format_WithHint_List": "QuickTime TC",
"Other_Language_List": "English",
"OverallBitRate": 9725429.0,
"OverallBitRate_String": "9 725 kb/s",
"StreamCount": 1,
"StreamKindID": 0,
"StreamKind": "General",
"StreamKind_String": "General",
"StreamSize": 395728,
"StreamSize_Proportion": "0.00055",
"StreamSize_String1": "386 KiB",
"StreamSize_String2": "386 KiB",
"StreamSize_String3": "386 KiB",
"StreamSize_String4": "386.5 KiB",
"StreamSize_String5": "386 KiB (0%)",
"StreamSize_String": "386 KiB (0%)",
"Tagged_Date": "2008-05-27 18:43:05 UTC",
"Video_Codec_List": "AVC",
"VideoCount": 1,
"Video_Format_List": "AVC",
"Video_Format_WithHint_List": "AVC",
"Video_Language_List": "English",
"extra": {
"com_apple_quicktime_player_movie_audio_gain": "1.000",
"com_apple_quicktime_player_movie_audio_treble": "0.000",
"com_apple_quicktime_player_movie_audio_bass": "0.000",
"com_apple_quicktime_player_movie_audio_balance": "0.000",
"com_apple_quicktime_player_movie_audio_pitchshift": "0.000",
"com_apple_quicktime_player_movie_audio_mute": "(Binary)",
"com_apple_quicktime_player_movie_visual_brightness": "0.000",
"com_apple_quicktime_player_movie_visual_color": "1.000",
"com_apple_quicktime_player_movie_visual_tint": "0.000",
"com_apple_quicktime_player_movie_visual_contrast": "1.000"
},
"Audio": [
{
"BitRate": 448000.0,
"BitRate_Mode": "CBR",
"BitRate_Mode_String": "Constant",
"BitRate_String": "448 kb/s",
"ChannelLayout": "C L R Ls Rs LFE",
"ChannelPositions": "Front: L C R, Side: L R, LFE",
"ChannelPositions_String2": "3/2/0.1",
"Channels": 6,
"Channels_String": "6 channels",
"CodecID": "mp4a-40-2",
"Compression_Mode": "Lossy",
"Compression_Mode_String": "Lossy",
"Count": 285,
"Delay_DropFrame": "No",
"Delay": 0.0,
"Delay_Source": "Container",
"Delay_Source_String": "Container",
"Delay_String3": "00:00:00.000",
"Delay_String5": "00:00:00.000",
"Duration": 596.462,
"Duration_String1": "9 min 56 s 462 ms",
"Duration_String2": "9 min 56 s",
"Duration_String3": "00:09:56.462",
"Duration_String5": "00:09:56.462",
"Duration_String": "9 min 56 s",
"Encoded_Date": "2008-05-27 18:40:12 UTC",
"Format_AdditionalFeatures": "LC",
"Format_Commercial": "AAC",
"Format_Info": "Advanced Audio Codec Low Complexity",
"Format": "AAC",
"Format_String": "AAC LC",
"FrameCount": 27959,
"FrameRate": 46.875,
"FrameRate_String": "46.875 FPS (1024 SPF)",
"ID": "3",
"ID_String": "3",
"Language": "en",
"Language_String1": "English",
"Language_String2": "en",
"Language_String3": "eng",
"Language_String4": "en",
"Language_String": "English",
"SamplesPerFrame": 1024.0,
"SamplingCount": 28630176,
"SamplingRate": 48000.0,
"SamplingRate_String": "48.0 kHz",
"Source_Duration": 596.48,
"Source_Duration_String1": "9 min 56 s 480 ms",
"Source_Duration_String2": "9 min 56 s",
"Source_Duration_String3": "00:09:56.480",
"Source_Duration_String5": "00:09:56.480",
"Source_Duration_String": "9 min 56 s",
"Source_FrameCount": 27960,
"Source_StreamSize": 32627874,
"Source_StreamSize_Proportion": "0.04500",
"Source_StreamSize_String1": "31 MiB",
"Source_StreamSize_String2": "31 MiB",
"Source_StreamSize_String3": "31.1 MiB",
"Source_StreamSize_String4": "31.12 MiB",
"Source_StreamSize_String5": "31.1 MiB (4%)",
"Source_StreamSize_String": "31.1 MiB (4%)",
"StreamCount": 1,
"StreamKindID": 0,
"StreamKind": "Audio",
"StreamKind_String": "Audio",
"StreamOrder": "2",
"StreamSize": 32626892,
"StreamSize_Proportion": "0.04500",
"StreamSize_String1": "31 MiB",
"StreamSize_String2": "31 MiB",
"StreamSize_String3": "31.1 MiB",
"StreamSize_String4": "31.12 MiB",
"StreamSize_String5": "31.1 MiB (4%)",
"StreamSize_String": "31.1 MiB (4%)",
"Tagged_Date": "2008-05-27 18:43:05 UTC",
"Video_Delay": 0.0,
"Video_Delay_String3": "00:00:00.000",
"Video_Delay_String5": "00:00:00.000"
}
],
"Other": [
{
"Count": 195,
"Duration": 596.458,
"Duration_String1": "9 min 56 s 458 ms",
"Duration_String2": "9 min 56 s",
"Duration_String3": "00:09:56.458",
"Duration_String4": "00:09:56:11",
"Duration_String5": "00:09:56.458 (00:09:56:11)",
"Duration_String": "9 min 56 s",
"Format_Commercial": "QuickTime TC",
"Format": "QuickTime TC",
"Format_String": "QuickTime TC",
"FrameCount": 14315,
"FrameRate_Den": 1,
"FrameRate": 24.0,
"FrameRate_Num": 24,
"FrameRate_String": "24.000 FPS",
"ID": "2",
"ID_String": "2",
"Language": "en",
"Language_String1": "English",
"Language_String2": "en",
"Language_String3": "eng",
"Language_String4": "en",
"Language_String": "English",
"StreamCount": 1,
"StreamKindID": 0,
"StreamKind": "Other",
"StreamKind_String": "Other",
"StreamOrder": "1",
"TimeCode_DropFrame": "No",
"TimeCode_FirstFrame": "00:00:00:00",
"TimeCode_LastFrame": "00:09:56:10",
"TimeCode_Stripped": "Yes",
"TimeCode_Stripped_String": "Yes",
"Type": "Time code",
"extra": {
"Encoded_Date": "2008-04-21 20:24:31 UTC",
"Tagged_Date": "2008-05-27 18:43:05 UTC"
}
}
],
"Video": [
{
"BitDepth": 8,
"BitDepth_String": "8 bits",
"BitRate": 9282573.0,
"BitRate_String": "9 283 kb/s",
"BitsPixel_Frame": 0.187,
"ChromaSubsampling": "4:2:0",
"ChromaSubsampling_Position": "Type 2",
"ChromaSubsampling_String": "4:2:0 (Type 2)",
"CodecID_Info": "Advanced Video Coding",
"CodecID": "avc1",
"ColorSpace": "YUV",
"colour_description_present": "Yes",
"colour_description_present_Source": "Container / Stream",
"colour_primaries": "BT.709",
"colour_primaries_Source": "Container / Stream",
"colour_range": "Limited",
"colour_range_Source": "Stream",
"Count": 391,
"Delay_DropFrame": "No",
"Delay": 0.0,
"Delay_Settings": "DropFrame=No / 24HourMax=No / IsVisual=No",
"Delay_Source": "Container",
"Delay_Source_String": "Container",
"Delay_String3": "00:00:00.000",
"Delay_String4": "00:00:00:00",
"Delay_String5": "00:00:00.000 (00:00:00:00)",
"DisplayAspectRatio": 1.778,
"DisplayAspectRatio_String": "16:9",
"Duration": 596.458,
"Duration_String1": "9 min 56 s 458 ms",
"Duration_String2": "9 min 56 s",
"Duration_String3": "00:09:56.458",
"Duration_String4": "00:09:56:11",
"Duration_String5": "00:09:56.458 (00:09:56:11)",
"Duration_String": "9 min 56 s",
"Encoded_Date": "2008-04-21 20:24:31 UTC",
"Format_Commercial": "AVC",
"Format_Info": "Advanced Video Codec",
"Format_Level": "4.1",
"Format": "AVC",
"Format_Profile": "Main",
"Format_Settings_CABAC": "No",
"Format_Settings_CABAC_String": "No",
"Format_Settings_GOP": "M=2, N=24",
"Format_Settings": "2 Ref Frames",
"Format_Settings_RefFrames": 2,
"Format_Settings_RefFrames_String": "2 frames",
"Format_Settings_SliceCount": 8,
"Format_Settings_SliceCount_String": "8 slices per frame",
"Format_String": "AVC",
"Format_Url": "http://developers.videolan.org/x264.html",
"FrameCount": 14315,
"FrameRate_Den": 1,
"FrameRate": 24.0,
"FrameRate_Mode": "CFR",
"FrameRate_Mode_String": "Constant",
"FrameRate_Num": 24,
"FrameRate_String": "24.000 FPS",
"Height": 1080,
"Height_String": "1 080 pixels",
"ID": "1",
"ID_String": "1",
"InternetMediaType": "video/H264",
"Language": "en",
"Language_String1": "English",
"Language_String2": "en",
"Language_String3": "eng",
"Language_String4": "en",
"Language_String": "English",
"matrix_coefficients": "BT.709",
"matrix_coefficients_Source": "Container / Stream",
"PixelAspectRatio": 1.0,
"Rotation": "0.000",
"Sampled_Height": 1080,
"Sampled_Width": 1920,
"ScanType": "Progressive",
"ScanType_String": "Progressive",
"Stored_Height": 1088,
"StreamCount": 1,
"StreamKindID": 0,
"StreamKind": "Video",
"StreamKind_String": "Video",
"StreamOrder": "0",
"StreamSize": 692083520,
"StreamSize_Proportion": "0.95446",
"StreamSize_String1": "660 MiB",
"StreamSize_String2": "660 MiB",
"StreamSize_String3": "660 MiB",
"StreamSize_String4": "660.0 MiB",
"StreamSize_String5": "660 MiB (95%)",
"StreamSize_String": "660 MiB (95%)",
"Tagged_Date": "2008-05-27 18:43:05 UTC",
"transfer_characteristics": "BT.709",
"transfer_characteristics_Source": "Container / Stream",
"Width": 1920,
"Width_String": "1 920 pixels",
"extra": {
"CodecConfigurationBox": "avcC"
}
}
],
"creatingLibrary": {
"name": "MediaInfoLib",
"version": 24.12,
"url": "https://mediaarea.net/MediaInfo"
}
}
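The full output is large, so in practice you will often want only a handful of fields. The sketch below condenses an annotation shaped like the one above into a short summary. `summarize_media` is a hypothetical helper, and the sample dictionary is a hand-trimmed copy of values from the output shown here rather than a fresh run of the model.

```python
def summarize_media(annotation: dict) -> dict:
    """Hypothetical helper: condense a MediaInfo-style annotation into a few fields."""
    # Track lists may be missing for some files (e.g. no "Video" key for audio files).
    video = (annotation.get("Video") or [{}])[0]
    audio = (annotation.get("Audio") or [{}])[0]
    return {
        "container": annotation.get("Format"),
        "duration_seconds": annotation.get("Duration"),
        "resolution": (video.get("Width"), video.get("Height")) if video else None,
        "video_format": video.get("Format"),
        "audio_format": audio.get("Format"),
        "audio_channels": audio.get("Channels"),
    }


sample = {
    "Format": "MPEG-4",
    "Duration": 596.462,
    "Video": [{"Format": "AVC", "Width": 1920, "Height": 1080}],
    "Audio": [{"Format": "AAC", "Channels": 6}],
}
print(summarize_media(sample))
```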
Supported Video Formats
| File Extension | Media Type |
|---|---|
| .mp4 | video/mp4 |
| .mkv | video/x-matroska |
| .avi | video/x-msvideo |
| .mov | video/quicktime |
| .webm | video/webm |
| .wmv | video/x-ms-wmv |
| .flv | video/x-flv |
| .mpeg | video/mpeg |
| .mpg | video/mpeg |
| .m4v | video/x-m4v |
...and many others. See: MediaInfo - Supported Formats
Supported Audio Formats
| File Extension | Media Type |
|---|---|
| .mp3 | audio/mpeg |
| .wav | audio/wav |
| .flac | audio/flac |
| .ogg | audio/ogg |
| .m4a | audio/mp4 |
| .wma | audio/x-ms-wma |
| .aac | audio/aac |
...and many others. See: MediaInfo - Supported Formats
Supported Image Formats
The MediaInfoAnnotationModel provides basic image metadata (dimensions, format, color depth). A specialized EXIFModel for deeper photo metadata is on the roadmap.
| File Extension | Media Type |
|---|---|
| .jpg | image/jpeg |
| .jpeg | image/jpeg |
| .png | image/png |
| .gif | image/gif |
| .bmp | image/bmp |
| .tiff | image/tiff |
| .webp | image/webp |
| .ico | image/vnd.microsoft.icon |
...and many others. See: MediaInfo - Supported Formats
Contribute
Is there a file type you'd like to see a Core Annotation Model for? Dorsal is an open-source project!
We encourage you to open a feature request on GitHub and tell us what you need!