Custom extraction rules allow you to tailor the output of JUHE API's data extraction services to your specific needs. This guide will show you how to create, apply, and manage custom rules for extracting precise information from various data sources.
What Are Custom Extraction Rules?
Custom extraction rules are user-defined templates that specify exactly what data should be extracted from documents, images, or structured content. They enable you to:
- Extract only the information that matters to your application
- Define the format of extracted data
- Create consistent data structures across different source formats
- Automate content processing workflows
When to Use Custom Rules
Custom extraction rules are particularly valuable when:
- You need specific data points from complex documents (invoices, receipts, contracts)
- You want to normalize data from different sources into a consistent format
- You're processing domain-specific content with unique terminology
- You need to extract data that standard models might miss
Creating Custom Rules
Custom rules are defined using a JSON schema that describes the fields you want to extract. You can create and manage your rules through the JUHE API dashboard or via the API.
Rule Structure
Here's the basic structure of a custom extraction rule:
{
  "name": "Rule Name",
  "description": "What this rule extracts",
  "version": "1.0",
  "fields": [
    {
      "name": "field_name",
      "type": "field_type",
      "description": "Description of the field",
      "required": true|false
    },
    // Additional fields...
  ]
}
Field Types
The following field types are supported:
| Type | Description | Examples |
|---|---|---|
| string | Text values | Names, addresses, descriptions |
| number | Numeric values | Prices, quantities, measurements |
| date | Date values | Issue dates, due dates |
| boolean | True/false values | Status indicators |
| array | Lists of values | Line items, categories |
| object | Nested structures | Address components, person details |
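Before submitting a rule, it can help to check locally that every field has a name and one of the types listed above. The following is a minimal sketch of such a check, not the validation the JUHE API itself performs; `validateRule` and `checkField` are hypothetical helper names, and the nesting keys (`fields` for objects, `items.fields` for arrays) follow the invoice example in this guide.

```javascript
// Field types listed in the table above.
const SUPPORTED_TYPES = new Set(["string", "number", "date", "boolean", "array", "object"]);

// Returns a list of human-readable problems; an empty list means the rule
// passed this local pre-flight check (hypothetical helper, not a JUHE API call).
function validateRule(rule) {
  const problems = [];
  if (typeof rule.name !== "string" || rule.name === "") {
    problems.push("rule: missing name");
  }
  if (!Array.isArray(rule.fields) || rule.fields.length === 0) {
    problems.push("rule: fields must be a non-empty array");
    return problems;
  }
  rule.fields.forEach((field, i) => checkField(field, `fields[${i}]`, problems));
  return problems;
}

function checkField(field, path, problems) {
  if (typeof field.name !== "string" || field.name === "") {
    problems.push(`${path}: missing name`);
  }
  if (!SUPPORTED_TYPES.has(field.type)) {
    problems.push(`${path}: unsupported type "${field.type}"`);
  }
  // Object fields nest their children under "fields"; array fields describe
  // their elements under "items".
  if (field.type === "object" && Array.isArray(field.fields)) {
    field.fields.forEach((f, i) => checkField(f, `${path}.fields[${i}]`, problems));
  }
  if (field.type === "array" && field.items && Array.isArray(field.items.fields)) {
    field.items.fields.forEach((f, i) => checkField(f, `${path}.items.fields[${i}]`, problems));
  }
}
```

Calling `validateRule(rule)` before the rule is sent catches typos such as `"type": "money"` early, without a round trip to the service.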
Example: Invoice Extraction Rule
{
  "name": "Invoice Extractor",
  "description": "Extract key information from invoice documents",
  "version": "1.0",
  "fields": [
    {
      "name": "invoice_number",
      "type": "string",
      "description": "The invoice identification number",
      "required": true
    },
    {
      "name": "issue_date",
      "type": "date",
      "description": "The date when the invoice was issued",
      "required": true
    },
    {
      "name": "due_date",
      "type": "date",
      "description": "The payment due date",
      "required": false
    },
    {
      "name": "total_amount",
      "type": "number",
      "description": "The total invoice amount",
      "required": true
    },
    {
      "name": "currency",
      "type": "string",
      "description": "The currency code (e.g., USD, EUR)",
      "required": true
    },
    {
      "name": "vendor",
      "type": "object",
      "description": "Vendor information",
      "required": true,
      "fields": [
        {
          "name": "name",
          "type": "string",
          "description": "Vendor name",
          "required": true
        },
        {
          "name": "address",
          "type": "string",
          "description": "Vendor address",
          "required": false
        },
        {
          "name": "tax_id",
          "type": "string",
          "description": "Vendor tax ID",
          "required": false
        }
      ]
    },
    {
      "name": "line_items",
      "type": "array",
      "description": "Individual line items on the invoice",
      "required": false,
      "items": {
        "type": "object",
        "fields": [
          {
            "name": "description",
            "type": "string",
            "description": "Item description"
          },
          {
            "name": "quantity",
            "type": "number",
            "description": "Quantity of items"
          },
          {
            "name": "unit_price",
            "type": "number",
            "description": "Price per unit"
          },
          {
            "name": "amount",
            "type": "number",
            "description": "Total line amount"
          }
        ]
      }
    }
  ]
}
Creating Custom Rules via Dashboard
- Log in to your JUHE API dashboard at juheapi.com/dashboard
- Navigate to AI & Machine Learning → Custom Extraction Rules
- Click Create New Rule
- Fill out the rule details:
  - Name and description
  - Document type (invoice, receipt, contract, etc.)
  - Field definitions
- Test your rule with sample documents
- Save and publish your rule
Creating Custom Rules via API
You can also create rules programmatically:
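const rule = {
  name: "Receipt Extractor",
  description: "Extract key information from receipts",
  version: "1.0",
  fields: [
    // Field definitions...
  ]
};

fetch('https://hub.juheapi.com/ai/extraction/rules', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    // API_KEY must hold your JUHE API key
    'Authorization': `Bearer ${API_KEY}`
  },
  body: JSON.stringify(rule)
})
  .then(response => {
    // Treat non-2xx responses as failures rather than parsing them as a rule
    if (!response.ok) {
      throw new Error(`Rule creation failed with HTTP ${response.status}`);
    }
    return response.json();
  })
  .then(data => {
    console.log('Rule created with ID:', data.rule_id);
  })
  .catch(error => {
    console.error('Could not create rule:', error);
  });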
Applying Custom Rules
Once a rule has been created, you can apply it to extract data from documents:
Using the Dashboard
- Navigate to AI & Machine Learning → Document Processing
- Upload your document or provide a URL
- Select your custom rule from the dropdown
- Click Process
Using the API
curl -X POST "https://hub.juheapi.com/ai/document/extract" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@invoice.pdf" \
  -F "rule_id=your_custom_rule_id"
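The same request can be made from JavaScript. The sketch below mirrors the curl call above using `fetch` and `FormData`, and assumes a runtime where `fetch`, `FormData`, and `Blob` are globals (Node.js 18+ or a browser). The helper names `buildExtractRequest` and `extractWithRule` are hypothetical; only the URL and the `file` and `rule_id` form fields come from the curl example.

```javascript
// Builds the request without sending it, so it can be inspected offline.
// Hypothetical helper; the URL and form field names mirror the curl example.
function buildExtractRequest(fileBytes, fileName, ruleId, apiKey) {
  const form = new FormData();
  form.append("file", new Blob([fileBytes]), fileName);
  form.append("rule_id", ruleId);
  return {
    url: "https://hub.juheapi.com/ai/document/extract",
    options: {
      method: "POST",
      // No Content-Type header is set here: fetch derives it, including the
      // multipart boundary, from the FormData body.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// Sends the request and returns the parsed JSON body, throwing on HTTP errors.
async function extractWithRule(fileBytes, fileName, ruleId, apiKey) {
  const { url, options } = buildExtractRequest(fileBytes, fileName, ruleId, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Extraction request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Splitting request construction from sending is a design choice for this sketch: it keeps the network call in one place and lets the request be checked without contacting the service.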
API Response
{
  "status": "success",
  "data": {
    "invoice_number": "INV-2023-0042",
    "issue_date": "2023-09-15",
    "due_date": "2023-10-15",
    "total_amount": 1250.00,
    "currency": "USD",
    "vendor": {
      "name": "Acme Supplies Inc.",
      "address": "123 Business St., Commerce City, CA 90210",
      "tax_id": "US-987654321"
    },
    "line_items": [
      {
        "description": "Office Chair",
        "quantity": 5,
        "unit_price": 150.00,
        "amount": 750.00
      },
      {
        "description": "Desk Lamp",
        "quantity": 10,
        "unit_price": 50.00,
        "amount": 500.00
      }
    ]
  },
  "confidence_score": 0.92
}
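A response like the one above can be sanity-checked before it is passed downstream. The sketch below looks at `status` and `confidence_score` and cross-checks the line-item arithmetic (here 5 × 150.00 = 750.00, 10 × 50.00 = 500.00, and 750.00 + 500.00 = 1250.00). The function name `reviewExtraction` and the 0.8 cutoff are assumptions made for illustration, not documented JUHE API values, and the totals check assumes an invoice without separate tax, shipping, or discount lines.

```javascript
// Flags responses that deserve a manual look. The 0.8 default is an
// arbitrary illustrative cutoff, not a documented JUHE API threshold.
function reviewExtraction(response, minConfidence = 0.8) {
  const issues = [];
  if (response.status !== "success") {
    issues.push(`status is "${response.status}"`);
    return issues;
  }
  if (response.confidence_score < minConfidence) {
    issues.push(`confidence ${response.confidence_score} is below ${minConfidence}`);
  }
  const items = response.data.line_items || [];
  // Compare in integer cents to avoid floating-point noise.
  const toCents = (x) => Math.round(x * 100);
  let sum = 0;
  items.forEach((item, i) => {
    if (toCents(item.quantity * item.unit_price) !== toCents(item.amount)) {
      issues.push(`line_items[${i}]: quantity x unit_price does not match amount`);
    }
    sum += toCents(item.amount);
  });
  // Assumes the total is the plain sum of line amounts (no tax or shipping).
  if (items.length > 0 && sum !== toCents(response.data.total_amount)) {
    issues.push("line item amounts do not add up to total_amount");
  }
  return issues;
}
```

An empty result means none of these checks fired; anything else is a prompt to review the document by hand rather than proof that the extraction is wrong.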
Best Practices for Custom Rules
- Start Simple: Begin with a few essential fields before adding complexity
- Test Thoroughly: Test your rules with a variety of documents to ensure accuracy
- Use Required Fields Wisely: Only mark fields as required if they're truly necessary
- Provide Clear Descriptions: Good descriptions help the model understand what to extract
- Check Confidence Scores: Pay attention to confidence scores to identify potential extraction issues
- Iterate Based on Results: Refine your rules based on extraction performance
Advanced Features
Field Validation
You can add validation rules to ensure extracted data meets specific criteria:
{
  "name": "price",
  "type": "number",
  "description": "Product price",
  "validation": {
    "min": 0,
    "max": 10000
  }
}
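To make the effect of `min` and `max` concrete, here is a small client-side sketch of the same range check. It illustrates the idea rather than the service's own implementation: `withinRange` is a hypothetical helper, and treating both bounds as inclusive is an assumption.

```javascript
// Returns true when value satisfies the optional min/max bounds of a field's
// "validation" object. Bounds are treated as inclusive, which is an assumption.
function withinRange(value, validation = {}) {
  if (typeof value !== "number" || Number.isNaN(value)) return false;
  if (validation.min !== undefined && value < validation.min) return false;
  if (validation.max !== undefined && value > validation.max) return false;
  return true;
}

const priceField = { name: "price", type: "number", validation: { min: 0, max: 10000 } };
console.log(withinRange(49.99, priceField.validation)); // true
console.log(withinRange(-5, priceField.validation));    // false
console.log(withinRange(10000, priceField.validation)); // true (inclusive upper bound)
```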
Regular Expressions
For string fields, you can specify regex patterns for more precise extraction:
{
  "name": "product_code",
  "type": "string",
  "description": "Product identification code",
  "pattern": "^[A-Z]{3}-\\d{4}$"
}
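A pattern is easier to trust once it has been tried against a few sample values. The sketch below parses a field definition from JSON text and tests the pattern with JavaScript's `RegExp`. Note that inside JSON a backslash must itself be escaped, so the regex `\d` is written `\\d`. Whether the service's regex engine behaves exactly like JavaScript's is an assumption.

```javascript
// The field definition as literal JSON text. String.raw keeps the
// backslashes exactly as written, so "\\d" here is what the JSON file contains.
const ruleJson = String.raw`{
  "name": "product_code",
  "type": "string",
  "pattern": "^[A-Z]{3}-\\d{4}$"
}`;

// JSON.parse turns the escaped "\\d" into the regex token \d.
const field = JSON.parse(ruleJson);
const pattern = new RegExp(field.pattern);

console.log(field.pattern);             // ^[A-Z]{3}-\d{4}$
console.log(pattern.test("ABC-1234"));  // true
console.log(pattern.test("abc-1234"));  // false (lowercase letters)
console.log(pattern.test("ABC-12345")); // false (five digits)
```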
Conditional Fields
Some fields are only relevant when another field has a particular value. In the example below, the shipping address applies only when the shipping method is "physical":
{
  "name": "shipping_address",
  "type": "string",
  "description": "Shipping address",
  "conditions": [
    {
      "field": "shipping_method",
      "operator": "equals",
      "value": "physical"
    }
  ]
}
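One way to read the `conditions` array is as a predicate over the other extracted values. The sketch below evaluates it that way. It is an interpretation, not the service's documented semantics: `conditionsMet` is a hypothetical helper, only the `equals` operator shown above is handled, and combining multiple conditions with logical AND is an assumption.

```javascript
// Returns true when the field should be considered relevant for this data.
function conditionsMet(field, data) {
  const conditions = field.conditions || [];
  // Assumes every condition must hold (logical AND); the guide does not say
  // how multiple conditions combine.
  return conditions.every((c) => {
    // Only "equals" appears in this guide, so anything else is rejected
    // rather than guessed at.
    if (c.operator !== "equals") {
      throw new Error(`Unsupported operator: ${c.operator}`);
    }
    return data[c.field] === c.value;
  });
}

const shippingAddress = {
  name: "shipping_address",
  type: "string",
  conditions: [{ field: "shipping_method", operator: "equals", value: "physical" }],
};

console.log(conditionsMet(shippingAddress, { shipping_method: "physical" })); // true
console.log(conditionsMet(shippingAddress, { shipping_method: "digital" }));  // false
```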
Troubleshooting
Low Confidence Scores
If your extractions have low confidence scores:
- Check if your field descriptions are clear
- Ensure document quality is sufficient
- Consider providing sample documents for training
Missing Fields
If certain fields are consistently missing:
- Check if the information is actually present in the document
- Make the field description more specific
- Provide example values in the field description
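To find out which fields are consistently missing, it helps to compare each response against the rule that produced it. The sketch below walks a rule's `fields` and reports required fields that are absent from the extracted data, descending into nested objects. `findMissingRequired` is a hypothetical helper, and treating `null` and empty strings as missing is an assumption about how absent values are returned.

```javascript
// Returns dotted paths of required fields that have no value in data.
function findMissingRequired(fields, data, prefix = "") {
  const missing = [];
  for (const field of fields) {
    const path = prefix + field.name;
    const value = data ? data[field.name] : undefined;
    // Assumption: null and "" count as "not extracted".
    const absent = value === undefined || value === null || value === "";
    if (absent) {
      if (field.required) missing.push(path);
      continue;
    }
    // Descend into nested object fields such as "vendor".
    if (field.type === "object" && Array.isArray(field.fields)) {
      missing.push(...findMissingRequired(field.fields, value, path + "."));
    }
  }
  return missing;
}

const ruleFields = [
  { name: "invoice_number", type: "string", required: true },
  { name: "due_date", type: "date", required: false },
  {
    name: "vendor", type: "object", required: true,
    fields: [
      { name: "name", type: "string", required: true },
      { name: "tax_id", type: "string", required: false },
    ],
  },
];

const extracted = { invoice_number: "INV-2023-0042", vendor: { tax_id: "US-987654321" } };
console.log(findMissingRequired(ruleFields, extracted)); // [ 'vendor.name' ]
```

Tallying these paths over a batch of documents shows which field descriptions most need attention.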
Incorrect Extractions
If fields are being extracted with incorrect values:
- Refine your field descriptions
- Use validation rules to constrain the expected format
- Consider using regex patterns for structured data
Next Steps
Now that you understand custom extraction rules, explore these related topics:
- Batch Processing - Process multiple documents efficiently
- Error Handling - Handle extraction errors gracefully
- Best Practices - General best practices for using JUHE API