Document management interface

Document Processor is an AI-driven platform that automatically classifies your documents and extracts information from them. Upload documents, then analyze their content with custom classifiers and extractors.

Getting Started

1. Upload Your Documents

Use the Files List panel on the left to upload documents (PDF, Word, text files, etc.). Your uploaded files will appear in the list where you can select them for processing.

2. Create Classifiers

Go to the Classifiers tab to create document classification rules:

Classifier Sets: Groups of related classification categories
Categories: Individual classification types (e.g., "Invoice", "Contract", "Report")
Terms: Keywords and phrases that help identify each category
Weights & Distance: Fine-tune how strictly terms must match

Using Wildcards in Terms

Make your classifier terms more flexible with wildcards:

* (asterisk): Matches any word or number
Example: "invoice *" matches "invoice 123", "invoice total", "invoice amount"
? (question mark): Matches any single word
Example: "contract ?" matches "contract date", "contract terms" but not "contract end date"
# (hash): Matches any number
Example: "total #" matches "total 1500", "total 99.99" but not "total amount"
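The wildcard rules above can be sketched as a translation from a term to a regular expression. This is a minimal illustration, not Document Processor's actual matcher. It assumes that each wildcard stands for exactly one whitespace-separated token, that "?" accepts only alphabetic words, that a term must cover the whole phrase, and that matching ignores case. The names term_to_regex and term_matches are hypothetical.

```python
import re

# Assumed token patterns; the source does not define the exact grammar.
WILDCARD_PATTERNS = {
    "*": r"\S+",            # any single word or number
    "?": r"[^\W\d_]+",      # any single alphabetic word
    "#": r"\d+(?:\.\d+)?",  # an integer or decimal number
}


def term_to_regex(term: str) -> re.Pattern:
    """Translate a classifier term with wildcards into a compiled regex."""
    parts = [WILDCARD_PATTERNS.get(token, re.escape(token)) for token in term.split()]
    # Tokens are separated by one or more whitespace characters.
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


def term_matches(term: str, phrase: str) -> bool:
    """Return True when the whole phrase matches the term."""
    return term_to_regex(term).fullmatch(phrase.strip()) is not None


# The examples from the list above.
print(term_matches("invoice *", "invoice 123"))
print(term_matches("contract ?", "contract end date"))
print(term_matches("total #", "total amount"))
```

Under these assumptions the three calls report a match, no match, and no match, in line with the examples in the list.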

Run classifiers against your files to automatically categorize documents.

3. Build Extractors

Visit the Extractors tab to create information extraction tools:

Prompt: Describe what information you want to extract
Fields: Define specific data points to extract (names, dates, amounts, etc.)
Testing: Run extractors against selected files to pull out structured data

Extractors are well suited to pulling out key information such as invoice amounts, contract dates, or contact details.

Pro Tips

Start Simple: Begin with basic classifiers and extractors, then refine them based on results
Use Multiple Files: Test your tools against several documents to ensure accuracy
Iterate: Adjust terms, prompts, and fields based on the results you see
Collapse Results: Use the arrow buttons to hide result panels when editing
Bulk Operations: Select multiple files to process them all at once

Typical Workflow

1. Upload a batch of documents you want to process
2. Create a classifier to categorize document types
3. Run the classifier to see how well it identifies your document categories
4. Create extractors for each document type to pull out specific information
5. Test and refine your extractors until they capture the data you need
6. Process new documents using your trained classifiers and extractors

Need Help?

If you encounter any issues or have questions about using Document Processor, don't hesitate to reach out for support. The system is designed to learn and improve with your feedback.

curl -X PUT -u "your-api-key:your-api-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "# My Document Title\n\nThis is the content of my markdown document.\n\n## Section 1\n\nSome more content here."
  }' \
  service/file/markdown
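The same markdown upload can be prepared from Python with the standard library. The sketch below only builds the request and does not send it. BASE_URL is a hypothetical placeholder, because the example above shows only the relative path service/file/markdown, and build_markdown_upload is an illustrative name rather than part of any official client.

```python
import base64
import json
import urllib.request

# Hypothetical placeholder: the source gives only the relative path
# "service/file/markdown", so the host is an assumption.
BASE_URL = "https://example.invalid/"


def build_markdown_upload(api_key: str, api_secret: str, markdown: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request shown in the curl example."""
    credentials = f"{api_key}:{api_secret}".encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Equivalent of curl's -u flag: HTTP Basic authentication.
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
    }
    body = json.dumps({"content": markdown}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "service/file/markdown",
        data=body,
        headers=headers,
        method="PUT",
    )


request = build_markdown_upload(
    "your-api-key",
    "your-api-secret",
    "# My Document Title\n\nThis is the content of my markdown document.",
)
# Sending would be: urllib.request.urlopen(request)
```

Using json.dumps for the body handles the escaping of newlines in the markdown text, which is easy to get wrong when the JSON is written by hand.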

curl -X POST -u "your-api-key:your-api-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "extractor_id": 1,
    "file_id": 123,
    "web_hook": "https://your-domain.com/api/extraction-webhook",
    "csrf_token": "abc123-secure-token-xyz789"
  }' \
  service/extractor
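One way to prepare the body of this request is to generate a fresh token for each extraction and remember it until the webhook arrives. The sketch below assumes that any unguessable string is acceptable as csrf_token, which the source does not state. build_extraction_request and pending_tokens are hypothetical names.

```python
import json
import secrets


def build_extraction_request(extractor_id: int, file_id: int, web_hook: str) -> tuple[dict, str]:
    """Return the JSON payload for an asynchronous extraction and its token."""
    # Assumption: the service accepts any opaque string as csrf_token.
    token = secrets.token_urlsafe(32)
    payload = {
        "extractor_id": extractor_id,
        "file_id": file_id,
        "web_hook": web_hook,
        "csrf_token": token,
    }
    return payload, token


payload, token = build_extraction_request(
    1, 123, "https://your-domain.com/api/extraction-webhook"
)
# Remember which file each token belongs to so the webhook can be checked later.
pending_tokens = {token: 123}
body = json.dumps(payload)  # string to send as the POST body
```

Keeping the token on your side lets the webhook receiver reject callbacks that do not correspond to a request you started.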

{
  "result": {
    "confidence": 1,
    "found": true,
    "explanation": "Explanation ...",
    "extracted_data": {
      "field_1": {
        "value": "value 1",
        "citation": ["Quote from document supporting field_1"]
      },
      "field_2": {
        "value": "value 2",
        "citation": ["Quote 1", "Quote 2"]
      }
    }
  },
  "file_name": "document.pdf",
  "document_id": 123,
  "csrf_token": "abc123-secure-token-xyz789",
  "marked_pdf_available": true,
  "marked_pdf_path": "/path/to/document.marked.1.pdf"
}
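A receiver for a payload of this shape might first check the echoed csrf_token and then flatten the extracted fields. This is a sketch under the assumption that the token in the payload is the one sent with the original request. read_webhook is a hypothetical helper, and the surrounding web framework is left out.

```python
import hmac
import json


def read_webhook(raw_body: str, expected_token: str) -> dict:
    """Verify the echoed csrf_token, then return {field name: value}."""
    payload = json.loads(raw_body)
    received = str(payload.get("csrf_token", ""))
    # Constant-time comparison avoids leaking how much of the token matched.
    if not hmac.compare_digest(received.encode("utf-8"), expected_token.encode("utf-8")):
        raise ValueError("csrf_token mismatch")
    result = payload["result"]
    if not result.get("found"):
        return {}
    return {name: field["value"] for name, field in result["extracted_data"].items()}


sample = json.dumps({
    "result": {
        "confidence": 1,
        "found": True,
        "explanation": "Explanation ...",
        "extracted_data": {
            "field_1": {"value": "value 1", "citation": ["Quote"]},
            "field_2": {"value": "value 2", "citation": ["Quote 1", "Quote 2"]},
        },
    },
    "file_name": "document.pdf",
    "document_id": 123,
    "csrf_token": "abc123-secure-token-xyz789",
})

values = read_webhook(sample, "abc123-secure-token-xyz789")
```

A payload with the wrong token raises ValueError before any field is read, so untrusted callbacks never reach the rest of the application.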

Aspect	Asynchronous (POST)	Synchronous (GET)
Response	{"status": "started"}	Complete results immediately
Webhook	Required	Not needed
Use Case	High volume, fire-and-forget	Interactive, immediate results
PDF Markup	✅ Generated	✅ Generated

{
  "extractor_id": 1,
  "document_id": 123,
  "file_name": "document.pdf",
  "extraction_result": {
    "confidence": 1,
    "found": true,
    "explanation": "Successfully extracted the following information...",
    "extracted_data": {
      "field_1": {
        "value": "extracted value 1",
        "citation": ["Quote from document supporting field_1"]
      },
      "field_2": {
        "value": "extracted value 2",
        "citation": ["Quote 1", "Quote 2"]
      }
    }
  },
  "marked_pdf_available": true,
  "marked_pdf_path": "/path/to/document.marked.1.pdf",
  "success": true
}
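Unlike the webhook payload, this response nests the data under extraction_result and carries a success flag. The sketch below reads a response of that shape into typed records. It assumes that a missing or false success value signals a failed extraction, since the error format is not shown here. ExtractedField and parse_sync_response are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    citations: list[str]


def parse_sync_response(raw_body: str) -> list[ExtractedField]:
    """Turn a synchronous extraction response into a list of fields."""
    data = json.loads(raw_body)
    # Assumption: a missing or false "success" means the extraction failed.
    if not data.get("success"):
        raise RuntimeError("extraction did not succeed")
    extracted = data["extraction_result"].get("extracted_data", {})
    return [
        ExtractedField(name, field["value"], list(field.get("citation", [])))
        for name, field in extracted.items()
    ]


sample = json.dumps({
    "extractor_id": 1,
    "document_id": 123,
    "extraction_result": {
        "found": True,
        "extracted_data": {
            "field_1": {"value": "extracted value 1", "citation": ["Quote"]},
            "field_2": {"value": "extracted value 2", "citation": ["Quote 1", "Quote 2"]},
        },
    },
    "success": True,
})

fields = parse_sync_response(sample)
```

Keeping the citations next to each value makes it straightforward to show a reviewer the quotes that support an extracted field.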

{
  "file_id": 123,
  "pdf_available": true,
  "marked_versions": [
    {
      "extractor_id": 1,
      "extractor_name": "Invoice Extractor",
      "file_size": 245760
    },
    {
      "extractor_id": 2,
      "extractor_name": "Contract Extractor",
      "file_size": 238950
    }
  ],
  "total_marked_versions": 2
}
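A client could use a status response like this to decide which marked PDFs to request. The sketch below indexes the marked versions by extractor name. It assumes that pdf_available set to false means no marked versions exist, and marked_sizes_by_extractor is a hypothetical helper. The unit of file_size is not stated in the source, so the values are passed through unchanged.

```python
import json


def marked_sizes_by_extractor(raw_body: str) -> dict[str, int]:
    """Map each extractor name to the file_size of its marked PDF."""
    status = json.loads(raw_body)
    # Assumption: no marked versions exist when pdf_available is false.
    if not status.get("pdf_available"):
        return {}
    return {
        version["extractor_name"]: version["file_size"]
        for version in status.get("marked_versions", [])
    }


sample = json.dumps({
    "file_id": 123,
    "pdf_available": True,
    "marked_versions": [
        {"extractor_id": 1, "extractor_name": "Invoice Extractor", "file_size": 245760},
        {"extractor_id": 2, "extractor_name": "Contract Extractor", "file_size": 238950},
    ],
    "total_marked_versions": 2,
})

sizes = marked_sizes_by_extractor(sample)
```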

API Credentials

API Reference

API Documentation

📋 Quick Start Workflow
File Upload
Markdown Upload: Request Parameters, Usage Notes, Example Request, Example Response
List Available Classifiers: Response Format, Example Request
List Available Extractors: Response Format, Example Request
Classification
Extraction (Asynchronous): Request Parameters, Webhook Details, CSRF Token Usage, Marked PDF Generation, Example Request, Example Webhook Payload
Extraction (Synchronous): Path Parameters, When to Use, Comparison with Asynchronous Extraction, Example Request, Example Response, Error Responses
Marked PDF Download: Usage, Example
Marked PDF Status: Response Format, Example
File Cleanup