
Page classification now defaults to multi-label (multiple classes per page)

Page classification now defaults to multi-label mode, allowing pages to receive multiple classification labels simultaneously.

Key Highlights

  • A single page can be classified as multiple page types (e.g., account_info AND transactions)
  • Better handling of complex documents like bank statements and legal docs
  • Backward compatible: multi-class mode is still available via configuration

What's new

Page classification now defaults to multi-label mode, allowing each page to be assigned multiple page classes simultaneously. Previously, pages could only receive one classification label (multi-class mode). Multi-class classification remains available as a configuration option.

Why it matters

  • Complex documents often mix several content types on a single page, for example:
    • Bank statements: the first page contains both account info AND transaction data
    • Legal documents: pages mix contract terms, signatures, and exhibits
  • Better extraction targeting: extract account_info AND transactions from the same page

Highlights

  • A single page can receive multiple relevant classifications
  • More accurate representation of complex document structure
  • Improved downstream extraction accuracy
  • Backward compatible: multi-class mode is still available

Technical details

Multi-class (old default): each page gets exactly one label; classes are mutually exclusive.
Multi-label (new default): each page can get multiple labels; classes are not mutually exclusive.

Example: A bank statement's first page that contains account details AND the start of transaction history now gets both account_info and transactions labels instead of forcing a choice.
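The difference between the two modes can be illustrated with a minimal Python sketch. It assumes the classifier produces a confidence score per page class and that multi-label mode keeps every class at or above a threshold; the post does not describe how the service actually decides, so the scores, the 0.5 threshold, and the function names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical per-class scores; the post does not describe how the
# service computes or thresholds its page classifications.

def classify_multiclass(scores: Dict[str, float]) -> List[str]:
    """Mutually exclusive: return only the single highest-scoring class."""
    return [max(scores, key=scores.get)]

def classify_multilabel(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Non-exclusive: return every class whose score meets the threshold."""
    return [label for label, score in scores.items() if score >= threshold]

# Illustrative scores for a bank statement's first page.
first_page = {"account_info": 0.91, "transactions": 0.78, "summary": 0.12}

print(classify_multiclass(first_page))  # ['account_info']
print(classify_multilabel(first_page))  # ['account_info', 'transactions']
```

In this sketch the multi-class version always returns exactly one label, forcing a choice, while the multi-label version keeps both relevant labels (and could return an empty list if no class clears the threshold).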

How to use

Multi-label classification works automatically with existing page classification prompts:

# Assumes DocumentAI and PageClassificationConfig are already imported from the SDK
doc_ai = DocumentAI()

result = doc_ai.parse_and_wait(
    file="bank_statement.pdf",
    page_classification=PageClassificationConfig(
        page_classes=["account_info", "transactions", "summary"]
        # Multi-label is now the default; no config change needed
    ),
)

# Pages can now have multiple classifications
for page in result.pages:
    classifications = page.classifications  # May contain more than one label
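Because a page may now carry several labels, downstream code that routes pages to extraction should expect the same page to appear under more than one label. The sketch below assumes `page.classifications` is a list of label strings (the post does not specify its element type) and uses a hypothetical `Page` dataclass in place of the SDK's page object, so it runs without the SDK.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Page:
    # Hypothetical stand-in for an SDK page object; only the
    # `classifications` attribute mirrors the snippet above.
    page_number: int
    classifications: List[str] = field(default_factory=list)

def pages_by_label(pages: List[Page]) -> Dict[str, List[int]]:
    """Map each label to the page numbers that carry it.

    With multi-label output, one page can appear under several labels.
    """
    index: Dict[str, List[int]] = defaultdict(list)
    for page in pages:
        for label in page.classifications:
            index[label].append(page.page_number)
    return dict(index)

pages = [
    Page(1, ["account_info", "transactions"]),
    Page(2, ["transactions"]),
    Page(3, ["summary"]),
]
print(pages_by_label(pages))
# {'account_info': [1], 'transactions': [1, 2], 'summary': [3]}
```

Here page 1 is sent to both the account_info and transactions extractors, which is the "extract account_info AND transactions from the same page" case described above.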

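To revert to multi-class behavior, set the classification mode explicitly (per the Status section below, choosing the mode is still coming to the API and SDK):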

page_classification=PageClassificationConfig(
    page_classes=["account_info", "transactions", "summary"],
    classification_mode="multiclass",  # Explicitly select multi-class
)

Status

✅ Multi-label by default is live now. Existing prompts automatically benefit from multi-label classification. 🚧 Choosing between multi-label and multi-class is coming to the API and SDK soon.