Python Guide
Welcome to the Dorsal Python Guide!
Learn how to use Dorsal to annotate files, manage file records, and create your own custom annotation models.
The File Record
Throughout this guide, you'll become familiar with the File Record. This is one of the fundamental building blocks of Dorsal.
A single File Record is a highly structured, typed and validated object that contains the metadata for one file.
It's what you will create using LocalFile (Chapter 1) and fetch using DorsalFile (Chapter 2).
We will cover this as we go, but if you want to get a clearer picture of its structure, you can read the full File Record article.
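To make "structured, typed and validated" a little more concrete, here is a minimal sketch of the general idea using only the standard library. It is not Dorsal's File Record schema or API. The class `SimpleFileRecord`, the helper `record_from_path`, and the four fields (name, size, media type, SHA-256 hash) are hypothetical and chosen purely for illustration. The real File Record is richer, so see the File Record article for its actual structure.

```python
"""A hypothetical, simplified stand-in for a file-metadata record.

NOT Dorsal's File Record schema or API: it only illustrates the idea of a
structured, typed, validated object that holds the metadata for one file.
"""
import hashlib
import mimetypes
import os
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleFileRecord:
    # Hypothetical fields, chosen for illustration only.
    name: str         # file name without its directory
    size: int         # size in bytes
    media_type: str   # guessed from the file extension
    sha256: str       # hex digest of the file contents

    def __post_init__(self) -> None:
        # Validate on construction so an invalid record can never exist.
        if not self.name:
            raise ValueError("name must be non-empty")
        if self.size < 0:
            raise ValueError("size must be non-negative")
        if not re.fullmatch(r"[0-9a-f]{64}", self.sha256):
            raise ValueError("sha256 must be a 64-character lowercase hex digest")


def record_from_path(path: str) -> SimpleFileRecord:
    """Build a SimpleFileRecord by reading a file from the local disk."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 64 KiB chunks so large files don't need to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    media_type, _ = mimetypes.guess_type(path)
    return SimpleFileRecord(
        name=os.path.basename(path),
        size=os.path.getsize(path),
        media_type=media_type or "application/octet-stream",
        sha256=digest.hexdigest(),
    )
```

Calling `record_from_path("report.pdf")` on a local file would return a frozen record whose fields are typed and already checked. Trying to construct a record with, say, a negative size raises `ValueError` instead of silently producing bad metadata.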
Chapters
This guide is split into six chapters, plus a demo:
| Chapter | Description |
|---|---|
| 1. Working with Files | Use the LocalFile class to extract and manage metadata for files on your local machine. |
| 2. Working with Remote File Records | Use the DorsalFile class to fetch, update, and manage your file metadata on DorsalHub. |
| 3. Automation & Analysis | Use dorsal.api for batch processing, file deduplication and analysis. |
| 4. Custom Annotation Models Part 1: Hello, Word! | Build a basic word counting model to learn how to test and integrate your own code into a pipeline. |
| 5. Custom Annotation Models Part 2: Classification | Build a document classifier to learn about file pre-processing and schema validation. |
| 6. Custom Annotation Models Part 3: Entity Extraction | Build a machine-learning powered entity extractor using LayoutLM. |
| Demo: Document Summarization | Build a model that sends PDF/Docx content to ChatGPT for automatic summarization. |
Prerequisites
- A working knowledge of Python: Familiarity with Python or another programming language will help you follow this guide.
- dorsal is installed: You should have a Python environment (e.g. an IDE or Jupyter Notebook) in which dorsal is installed.
- You have some files: Any kind of file (documents, images, etc.) on your computer that you can work with.
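If you're not sure whether your environment meets the second prerequisite, a quick check like the sketch below can help. It uses only the standard library's `importlib.util.find_spec`. It assumes the package's import name is `dorsal`, and the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    # find_spec returns None when the module cannot be located on sys.path.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the package is imported as "dorsal".
    print("dorsal installed:", is_installed("dorsal"))
```

This only confirms the module can be found. It doesn't import it or check its version.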
Let's Begin!
The best place to start is at the beginning, with Chapter 1: Working with Files.