All endpoints below use HTTP Basic Authentication, with your API key as the username and your API secret as the password:
File Upload
POST /service/file
Upload a document file for processing. Returns the file ID.
curl -X POST -u "your-api-key:your-api-secret" \
-F "file=@document.pdf" \
service/file
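For clients that do not use curl, the `-u` flag can be reproduced by building the `Authorization` header by hand. The following is a minimal sketch of standard HTTP Basic Authentication (RFC 7617); the helper name `basic_auth_header` is illustrative and not part of the service.

```python
import base64


def basic_auth_header(api_key: str, api_secret: str) -> dict:
    """Build the Authorization header for HTTP Basic Authentication."""
    # Basic auth sends "username:password" encoded as base64.
    credentials = f"{api_key}:{api_secret}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("your-api-key", "your-api-secret")
print(headers["Authorization"])
```

The resulting dictionary can be passed as request headers to whichever HTTP client you use.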
Markdown Upload
PUT /service/file/markdown
Upload Markdown content as a document. The first line of the content is used as the filename.
Request Parameters:
- content (required): The Markdown content to upload
Usage Notes:
- First line becomes the filename (markdown formatting like # is stripped)
- The filename is automatically sanitized and given a .md extension
- Content is processed through the same extraction pipeline as file uploads
- Returns document ID and generated filename
- Compatible with SmolPageBot Chrome extension for easy web page capture
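To anticipate the filename before uploading, the derivation can be approximated locally. The sketch below is a guess at the rule, not the service's actual implementation: it assumes heading markers are dropped and every run of non-alphanumeric characters becomes a single hyphen, which matches the one documented example (`# My Document Title` becoming `My-Document-Title.md`). The `untitled` fallback is hypothetical.

```python
import re


def derive_filename(content: str) -> str:
    """Approximate the filename derived from the first line of Markdown."""
    first_line = content.split("\n", 1)[0]
    # Drop leading heading markers such as "#" or "##".
    title = re.sub(r"^\s*#+\s*", "", first_line)
    # Assumed sanitization: runs of other characters collapse to one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-")
    # Hypothetical fallback for an empty first line.
    return f"{slug or 'untitled'}.md"


print(derive_filename("# My Document Title\n\nBody text."))  # prints My-Document-Title.md
```

Treat the filename returned by the API as authoritative; this helper is only useful for previews.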
Example Request:
curl -X PUT -u "your-api-key:your-api-secret" \
-H "Content-Type: application/json" \
-d '{
"content": "# My Document Title\n\nThis is the content of my markdown document.\n\n## Section 1\n\nSome more content here."
}' \
service/file/markdown
Example Response:
{
"id": 124,
"filename": "My-Document-Title.md"
}
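Hand-escaping newlines and quotes inside the `-d` string is error-prone for longer documents. A minimal sketch of building the request body programmatically, assuming the endpoint accepts exactly the JSON shape shown in the example request (the helper name is illustrative):

```python
import json


def build_markdown_body(markdown: str) -> bytes:
    """Serialize Markdown text into the JSON body for the upload request."""
    # json.dumps escapes newlines, quotes, and backslashes for us.
    return json.dumps({"content": markdown}).encode("utf-8")


markdown = '# My Document Title\n\nA line with "quotes" and a \\ backslash.'
body = build_markdown_body(markdown)

# Round-trip check: decoding the body yields the original text unchanged.
print(json.loads(body)["content"] == markdown)  # prints True
```

The bytes can be sent as the body of the PUT request with `Content-Type: application/json`.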
List Available Classifiers
GET /service/classifiers
Get a list of all classifier sets available to your account. Use the returned IDs with the classification endpoint.
Response Format:
[
{
"id": 1,
"name": "Document Type Classifier"
},
{
"id": 2,
"name": "Invoice vs Receipt Classifier"
}
]
Example Request:
curl -u "your-api-key:your-api-secret" \
service/classifiers
List Available Extractors
GET /service/extractors
Get a list of all extractors available to your account. Use the returned IDs with the extraction endpoints.
Response Format:
[
{
"id": 1,
"name": "Invoice Data Extractor"
},
{
"id": 2,
"name": "Contract Terms Extractor"
},
{
"id": 3,
"name": "Contact Information Extractor"
}
]
Example Request:
curl -u "your-api-key:your-api-secret" \
service/extractors
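Because the classifier and extractor lists share the same shape (objects with `id` and `name`), a single lookup helper covers both. This is a small sketch that works on the sample response above; `find_id_by_name` is an illustrative name, not an API function.

```python
import json


def find_id_by_name(items: list, name: str):
    """Return the id of the first entry whose name matches, or None."""
    for item in items:
        if item.get("name") == name:
            return item.get("id")
    return None


# Sample data copied from the extractor list response format.
extractors = json.loads("""
[
  {"id": 1, "name": "Invoice Data Extractor"},
  {"id": 2, "name": "Contract Terms Extractor"},
  {"id": 3, "name": "Contact Information Extractor"}
]
""")
print(find_id_by_name(extractors, "Contract Terms Extractor"))  # prints 2
```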
Classification
GET /service/classifier/{classifier_id}/{file_id}
Classify a document using the specified classifier. Use the classifiers endpoint above to get available classifier IDs.
curl -u "your-api-key:your-api-secret" \
service/classifier/1/123
Extraction (Asynchronous)
POST /service/extractor
Extract information from a document using the specified extractor. This operation runs asynchronously in the background and returns results via webhook.
Request Parameters:
- extractor_id (required): The ID of the extractor to use (get from extractors endpoint above)
- file_id (required): The ID of the uploaded document file
- web_hook (required): URL where extraction results will be sent via POST
- csrf_token (optional): CSRF protection token for webhook validation
Webhook Details:
When extraction completes, results are sent to your webhook URL via POST request:
- The webhook receives the extracted data as JSON
- Include a CSRF token in the request so you can verify webhook authenticity
- Ensure your webhook endpoint can handle POST requests
- Your webhook should return a 200 status to acknowledge receipt
CSRF Token Usage:
The CSRF token provides security verification for webhook calls:
- Generate a unique token for each extraction request
- Store the token to validate incoming webhook calls
- The system will include this token in the webhook payload
- Verify the token matches before processing webhook data
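The token lifecycle above can be sketched as follows. This is one possible approach under stated assumptions: tokens are kept in an in-memory dictionary keyed by document ID (a real deployment would likely use a database or cache), and the function names are illustrative.

```python
import hmac
import secrets

# Assumption: in-memory store mapping document ID -> expected token.
_pending_tokens: dict = {}


def issue_token(document_id: int) -> str:
    """Generate and remember a unique token for one extraction request."""
    token = secrets.token_urlsafe(32)
    _pending_tokens[document_id] = token
    return token


def verify_token(document_id: int, received) -> bool:
    """Check that the token in a webhook payload matches the stored one."""
    expected = _pending_tokens.get(document_id)
    if expected is None or not isinstance(received, str):
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Keying by document ID means a second request for the same document replaces the earlier token; key by your own request identifier if you run several extractors on one file concurrently.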
Marked PDF Generation:
When extraction finds data with citations, the system automatically creates a marked-up PDF:
- Citations from extracted fields are highlighted in yellow
- Non-PDF documents are automatically converted to PDF first
- Each extractor creates its own marked version
- Webhook payload includes marked_pdf_available flag
- Use marked PDF endpoints to download highlighted versions
Example Request:
curl -X POST -u "your-api-key:your-api-secret" \
-H "Content-Type: application/json" \
-d '{
"extractor_id": 1,
"file_id": 123,
"web_hook": "https://your-domain.com/api/extraction-webhook",
"csrf_token": "abc123-secure-token-xyz789"
}' \
service/extractor
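A request body like the one above can also be assembled in code, which makes it easy to catch a malformed webhook URL before the request is sent. The sketch below assumes only the four documented parameters; the URL check is a client-side convenience, not a rule stated by the service.

```python
import json
from typing import Optional
from urllib.parse import urlparse


def build_extraction_payload(extractor_id: int, file_id: int, web_hook: str,
                             csrf_token: Optional[str] = None) -> str:
    """Return the JSON body for an asynchronous extraction request."""
    parsed = urlparse(web_hook)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("web_hook must be an absolute http(s) URL")
    payload = {
        "extractor_id": extractor_id,
        "file_id": file_id,
        "web_hook": web_hook,
    }
    # csrf_token is optional, so it is only included when provided.
    if csrf_token is not None:
        payload["csrf_token"] = csrf_token
    return json.dumps(payload)


print(build_extraction_payload(
    1, 123, "https://your-domain.com/api/extraction-webhook",
    csrf_token="abc123-secure-token-xyz789",
))
```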
Example Webhook Payload:
Your webhook will receive a POST request similar to:
{
"result": {
"confidence": 1,
"found": true,
"explanation": "Explanation ...",
"extracted_data": {
"field_1": {
"value": "value 1",
"citation": ["Quote from document supporting field_1"]
},
"field_2": {
"value": "value 2",
"citation": ["Quote 1", "Quote 2"]
}
}
},
"file_name": "document.pdf",
"document_id": 123,
"csrf_token": "abc123-secure-token-xyz789",
"marked_pdf_available": true,
"marked_pdf_path": "/path/to/document.marked.1.pdf"
}
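On the receiving side, the handler needs to parse this body and decide which status code to return. The following sketch shows only that decision logic, independent of any web framework. Returning 400 for malformed bodies is an assumption; the documentation specifies only that 200 signals successful receipt. CSRF token verification, described under CSRF Token Usage, would run before the data is processed.

```python
import json


def parse_webhook_body(raw_body: bytes):
    """Return (status_code, summary) for an incoming webhook POST body."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Covers invalid JSON and undecodable bytes.
        return 400, None
    if not isinstance(payload, dict) or not isinstance(payload.get("result"), dict):
        return 400, None
    result = payload["result"]
    extracted = result.get("extracted_data") or {}
    summary = {
        "document_id": payload.get("document_id"),
        "found": result.get("found", False),
        # Keep just the value of each field; citations are dropped here.
        "fields": {
            name: field.get("value")
            for name, field in extracted.items()
            if isinstance(field, dict)
        },
        "marked_pdf_available": payload.get("marked_pdf_available", False),
    }
    return 200, summary
```

In a real endpoint, the first element of the returned tuple becomes the HTTP response status.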
Extraction (Synchronous)
GET /service/extractor/{extractor_id}/{document_id}
Extract information from a document using the specified extractor. This operation runs synchronously and returns results immediately in the response.
Path Parameters:
- extractor_id (required): The ID of the extractor to use (get from extractors endpoint above)
- document_id (required): The ID of the uploaded document file
When to Use:
- Simple integrations that need immediate results
- Interactive applications where users wait for results
- Testing and debugging extractions
- When webhook setup is not feasible
Comparison with Asynchronous Extraction:
| Aspect | Asynchronous (POST) | Synchronous (GET) |
| --- | --- | --- |
| Response | {"status": "started"} | Complete results immediately |
| Webhook | Required | Not needed |
| Use Case | High volume, fire-and-forget | Interactive, immediate results |
| PDF Markup | ✅ Generated | ✅ Generated |
Example Request:
curl -u "your-api-key:your-api-secret" \
service/extractor/1/123
Example Response:
{
"extractor_id": 1,
"document_id": 123,
"file_name": "document.pdf",
"extraction_result": {
"confidence": 1,
"found": true,
"explanation": "Successfully extracted the following information...",
"extracted_data": {
"field_1": {
"value": "extracted value 1",
"citation": ["Quote from document supporting field_1"]
},
"field_2": {
"value": "extracted value 2",
"citation": ["Quote 1", "Quote 2"]
}
}
},
"marked_pdf_available": true,
"marked_pdf_path": "/path/to/document.marked.1.pdf",
"success": true
}
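To work with the citations in this response, it helps to flatten them into a mapping from field name to supporting quotes. A minimal sketch, assuming the response has the shape shown above (note that the synchronous response nests data under `extraction_result`, whereas the webhook payload uses `result`):

```python
def citations_by_field(response: dict) -> dict:
    """Map each extracted field name to its list of citation quotes."""
    data = response.get("extraction_result", {}).get("extracted_data", {})
    return {name: list(field.get("citation", [])) for name, field in data.items()}


# Trimmed copy of the example response.
response = {
    "extractor_id": 1,
    "document_id": 123,
    "extraction_result": {
        "found": True,
        "extracted_data": {
            "field_1": {"value": "extracted value 1",
                        "citation": ["Quote from document supporting field_1"]},
            "field_2": {"value": "extracted value 2",
                        "citation": ["Quote 1", "Quote 2"]},
        },
    },
    "success": True,
}
print(citations_by_field(response)["field_2"])  # prints ['Quote 1', 'Quote 2']
```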
Error Responses:
- 404: Extractor or document not found
- 400: Document has no content available for extraction
- 500: Extraction process failed
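A client can translate these status codes into readable messages with a small lookup. The sketch below covers only the three documented codes; treating every 2xx status as success and the wording for unknown codes are assumptions.

```python
# Meanings copied from the documented error responses.
ERROR_MEANINGS = {
    404: "Extractor or document not found",
    400: "Document has no content available for extraction",
    500: "Extraction process failed",
}


def explain_status(status: int) -> str:
    """Return a human-readable description of a response status code."""
    if 200 <= status < 300:
        return "OK"
    # Undocumented codes fall through to a generic message.
    return ERROR_MEANINGS.get(status, f"Unexpected status {status}")


print(explain_status(404))  # prints Extractor or document not found
```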
Marked PDF Download
GET /service/marked-pdf/{extractor_id}/{file_id}
Download a marked-up PDF with highlighted citations from an extraction. The PDF contains visual highlights showing where extracted information was found in the original document.
Usage:
- Only available after successful extraction with citations
- Automatically converts non-PDF documents to PDF before marking
- Returns 404 if no marked version exists
- Each extractor creates its own marked version
Example:
curl -u "your-api-key:your-api-secret" \
--output "marked_document.pdf" \
service/marked-pdf/1/123
Marked PDF Status
GET /service/marked-pdf-status/{file_id}
Check which marked PDF versions are available for a document across all your extractors.
Response Format:
{
"file_id": 123,
"pdf_available": true,
"marked_versions": [
{
"extractor_id": 1,
"extractor_name": "Invoice Extractor",
"file_size": 245760
},
{
"extractor_id": 2,
"extractor_name": "Contract Extractor",
"file_size": 238950
}
],
"total_marked_versions": 2
}
Example:
curl -u "your-api-key:your-api-secret" \
service/marked-pdf-status/123
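The status response can drive downloads of every available marked version. The sketch below builds the download paths from the sample response, using the `/service/marked-pdf/{extractor_id}/{file_id}` pattern documented above; the function name is illustrative.

```python
def marked_pdf_paths(status: dict) -> list:
    """Build the download path for each marked version in a status response."""
    file_id = status["file_id"]
    return [
        f"/service/marked-pdf/{version['extractor_id']}/{file_id}"
        for version in status.get("marked_versions", [])
    ]


# Sample data copied from the response format above.
status = {
    "file_id": 123,
    "pdf_available": True,
    "marked_versions": [
        {"extractor_id": 1, "extractor_name": "Invoice Extractor", "file_size": 245760},
        {"extractor_id": 2, "extractor_name": "Contract Extractor", "file_size": 238950},
    ],
    "total_marked_versions": 2,
}
for path in marked_pdf_paths(status):
    print(path)
```

Each path can then be requested with the same credentials as the other endpoints.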
File Cleanup
DELETE /service/file/{file_id}
Remove an uploaded file from the system. This also deletes any associated marked PDF versions.
curl -X DELETE -u "your-api-key:your-api-secret" \
service/file/123