FileCoreValidationModel

dorsal.file.validators.base.FileCoreValidationModel pydantic-model

Bases: BaseModel

Base Pydantic model for validating the core metadata extracted by FileCoreAnnotationModel. This record forms the foundational data about a file.

Fields:

  • hash (SHA256Hash)
  • quick_hash (QuickHash | None)
  • similarity_hash (TLSHash | None)
  • name (TString255)
  • extension (FileExtension | None)
  • size (int)
  • media_type (MediaTypeString)
  • all_hashes (list[FileCoreValidationModelHash] | None)
  • all_hash_ids (dict[HashFunctionId, str] | None)

Validators:

populate_and_validate_all_hash_ids pydantic-validator

populate_and_validate_all_hash_ids()

Populates all_hash_ids, overwrites top-level hashes, and performs validation checks:

  • Populates all_hash_ids from all_hashes.
  • Overwrites quick_hash and similarity_hash from all_hashes.
  • Ensures SHA256 and BLAKE3 hashes are present.
  • Verifies no duplicate hash IDs.
  • Confirms the primary self.hash matches the SHA256 value in all_hashes.

Source code in venv/lib/python3.13/site-packages/dorsal/file/validators/base.py
@model_validator(mode="after")
def populate_and_validate_all_hash_ids(self) -> Self:
    """
    Populates `all_hash_ids`, overwrites top-level hashes, and performs validation checks:
    - Populates `all_hash_ids` from `all_hashes`.
    - Overwrites `quick_hash` and `similarity_hash` from `all_hashes`.
    - Ensures SHA256 and BLAKE3 hashes are present.
    - Verifies no duplicate hash IDs.
    - Confirms the primary `self.hash` matches the SHA256 value in `all_hashes`.
    """
    if self.all_hashes is not None:
        temp_all_hash_ids: dict[HashFunctionId, str] = {}
        for hash_record in self.all_hashes:
            if hash_record.id in temp_all_hash_ids:
                raise ValueError(f"Duplicate hash ID '{hash_record.id}' found in record.all_hashes.")
            temp_all_hash_ids[hash_record.id] = hash_record.value
        self.all_hash_ids = temp_all_hash_ids

        if "QUICK" in self.all_hash_ids:
            self.quick_hash = self.all_hash_ids["QUICK"]

        if "TLSH" in self.all_hash_ids:
            self.similarity_hash = self.all_hash_ids["TLSH"]

        if len(set(self.all_hash_ids.values())) < len(self.all_hash_ids):
            logger.warning(
                "Potential issue: Identical hash values found for different hash IDs in all_hashes. Record hash: %s",
                self.hash,
            )

        if "SHA-256" not in self.all_hash_ids:
            raise ValueError("SHA-256 file hash missing from record.all_hashes.")
        if "BLAKE3" not in self.all_hash_ids:
            raise ValueError("BLAKE3 file hash missing from record.all_hashes.")
        if self.hash != self.all_hash_ids["SHA-256"]:
            raise ValueError("Record 'hash' (primary SHA256) does not match SHA-256 value in record.all_hashes.")
    return self
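
To make the order of the checks in the validator above easier to follow, here is a minimal, stdlib-only sketch that mirrors them on plain (id, value) pairs. It is not part of dorsal: the names collect_hash_ids, error_for, and REQUIRED_IDS are hypothetical, the error messages are shortened, and the BLAKE3 value is a placeholder string rather than a real digest. The QUICK/TLSH overwrite and the duplicate-value warning are left out for brevity.

```python
import hashlib

# Hash IDs that the validator requires whenever all_hashes is provided.
REQUIRED_IDS = ("SHA-256", "BLAKE3")


def collect_hash_ids(primary_hash, all_hashes):
    """Mirror the validator's checks on a list of (hash_id, value) pairs."""
    hash_ids = {}
    for hash_id, value in all_hashes:
        # 1. Reject a hash ID that appears more than once.
        if hash_id in hash_ids:
            raise ValueError(f"Duplicate hash ID '{hash_id}' found.")
        hash_ids[hash_id] = value
    # 2. Require SHA-256 first, then BLAKE3, as in the source.
    for required in REQUIRED_IDS:
        if required not in hash_ids:
            raise ValueError(f"{required} file hash missing.")
    # 3. The primary hash must equal the SHA-256 entry.
    if primary_hash != hash_ids["SHA-256"]:
        raise ValueError("Primary hash does not match SHA-256 value.")
    return hash_ids


def error_for(primary_hash, all_hashes):
    """Return the error message raised for the input, or None if it passes."""
    try:
        collect_hash_ids(primary_hash, all_hashes)
    except ValueError as exc:
        return str(exc)
    return None


sha256 = hashlib.sha256(b"example").hexdigest()
placeholder_blake3 = "0" * 64  # placeholder only, not a real BLAKE3 digest

valid = [("SHA-256", sha256), ("BLAKE3", placeholder_blake3)]
ok = collect_hash_ids(sha256, valid)

print(sorted(ok))
print(error_for(sha256, [("SHA-256", sha256)]))
print(error_for(sha256, valid + [("BLAKE3", placeholder_blake3)]))
print(error_for("not-the-sha256", valid))
```

In the real model these checks run only when all_hashes is not None, so a record that supplies only the top-level hash field skips them entirely.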