Custom Extraction Rules

Custom extraction rules let you tailor the output of JUHE API's data extraction services to your specific needs. This guide shows how to create, apply, and manage custom rules for extracting precise information from various data sources.

What Are Custom Extraction Rules?

Custom extraction rules are user-defined templates that specify exactly what data should be extracted from documents, images, or structured content. They enable you to:

Extract only the information that matters to your application
Define the format of extracted data
Create consistent data structures across different source formats
Automate content processing workflows

When to Use Custom Rules

Custom extraction rules are particularly valuable when:

You need specific data points from complex documents (invoices, receipts, contracts)
You want to normalize data from different sources into a consistent format
You're processing domain-specific content with unique terminology
You need to extract data that standard models might miss

Creating Custom Rules

Custom rules are defined using a JSON schema that describes the fields you want to extract. You can create and manage your rules through the JUHE API dashboard or via API.

Rule Structure

Here's the basic structure of a custom extraction rule. Each entry in "fields" describes one value to extract, and its "required" flag is either true or false:

{
  "name": "Rule Name",
  "description": "What this rule extracts",
  "version": "1.0",
  "fields": [
    {
      "name": "field_name",
      "type": "field_type",
      "description": "Description of the field",
      "required": true
    }
  ]
}

Field Types

The following field types are supported:

Type	Description	Example
string	Text values	Names, addresses, descriptions
number	Numeric values	Prices, quantities, measurements
date	Date values	Issue dates, due dates
boolean	True/false values	Status indicators
array	Lists of values	Line items, categories
object	Nested structures	Address components, person details

Example: Invoice Extraction Rule

{
  "name": "Invoice Extractor",
  "description": "Extract key information from invoice documents",
  "version": "1.0",
  "fields": [
    {
      "name": "invoice_number",
      "type": "string",
      "description": "The invoice identification number",
      "required": true
    },
    {
      "name": "issue_date",
      "type": "date",
      "description": "The date when the invoice was issued",
      "required": true
    },
    {
      "name": "due_date",
      "type": "date",
      "description": "The payment due date",
      "required": false
    },
    {
      "name": "total_amount",
      "type": "number",
      "description": "The total invoice amount",
      "required": true
    },
    {
      "name": "currency",
      "type": "string",
      "description": "The currency code (e.g., USD, EUR)",
      "required": true
    },
    {
      "name": "vendor",
      "type": "object",
      "description": "Vendor information",
      "required": true,
      "fields": [
        {
          "name": "name",
          "type": "string",
          "description": "Vendor name",
          "required": true
        },
        {
          "name": "address",
          "type": "string",
          "description": "Vendor address",
          "required": false
        },
        {
          "name": "tax_id",
          "type": "string",
          "description": "Vendor tax ID",
          "required": false
        }
      ]
    },
    {
      "name": "line_items",
      "type": "array",
      "description": "Individual line items on the invoice",
      "required": false,
      "items": {
        "type": "object",
        "fields": [
          {
            "name": "description",
            "type": "string",
            "description": "Item description"
          },
          {
            "name": "quantity",
            "type": "number",
            "description": "Quantity of items"
          },
          {
            "name": "unit_price",
            "type": "number",
            "description": "Price per unit"
          },
          {
            "name": "amount",
            "type": "number",
            "description": "Total line amount"
          }
        ]
      }
    }
  ]
}
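
Before submitting a rule like the one above, it can help to check it locally for typos in type names or missing field names. The sketch below is one possible client-side check, not part of the JUHE API: validateRule and checkFields are hypothetical helpers, and the only things taken from this guide are the six supported types and the nesting keys ("fields" for objects, "items" for arrays).

```javascript
// Field types listed in the Field Types table of this guide.
const SUPPORTED_TYPES = new Set(["string", "number", "date", "boolean", "array", "object"]);

// Hypothetical helper: returns a list of problems found in a rule definition.
function validateRule(rule) {
  const problems = [];
  if (typeof rule.name !== "string" || rule.name === "") {
    problems.push("rule.name must be a non-empty string");
  }
  if (!Array.isArray(rule.fields)) {
    problems.push("rule.fields must be an array");
    return problems;
  }
  checkFields(rule.fields, "fields", problems);
  return problems;
}

// Walks a list of field definitions, recursing into nested objects and arrays.
function checkFields(fields, path, problems) {
  fields.forEach((field, i) => {
    const here = `${path}[${i}]`;
    if (typeof field.name !== "string" || field.name === "") {
      problems.push(`${here}.name must be a non-empty string`);
    }
    if (!SUPPORTED_TYPES.has(field.type)) {
      problems.push(`${here}.type "${field.type}" is not a supported type`);
    }
    // Objects nest their children under "fields"; arrays describe elements under "items".
    if (field.type === "object" && Array.isArray(field.fields)) {
      checkFields(field.fields, `${here}.fields`, problems);
    }
    if (field.type === "array" && field.items && Array.isArray(field.items.fields)) {
      checkFields(field.items.fields, `${here}.items.fields`, problems);
    }
  });
}

const ruleProblems = validateRule({
  name: "Receipt Extractor",
  fields: [
    { name: "total", type: "number" },
    { name: "paid_on", type: "datetime" }, // "datetime" is not in the supported list
  ],
});
console.log(ruleProblems); // one problem reported, for the "datetime" type
```

The check only covers structure. Whether the service accepts a rule still depends on server-side validation that this guide does not describe.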

Creating Custom Rules via Dashboard

Log in to your JUHE API dashboard at juheapi.com/dashboard
Navigate to AI & Machine Learning → Custom Extraction Rules
Click Create New Rule
Fill out the rule details:
- Name and description
- Document type (invoice, receipt, contract, etc.)
- Field definitions
Test your rule with sample documents
Save and publish your rule

Creating Custom Rules via API

You can also create rules programmatically:

const rule = {
  name: "Receipt Extractor",
  description: "Extract key information from receipts",
  version: "1.0",
  fields: [
    // Field definitions...
  ]
};

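// API_KEY is assumed to hold your JUHE API key
fetch('https://hub.juheapi.com/ai/extraction/rules', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${API_KEY}`
  },
  body: JSON.stringify(rule)
})
  .then(response => {
    // Surface HTTP errors instead of treating an error body as a created rule
    if (!response.ok) {
      throw new Error(`Rule creation failed with HTTP ${response.status}`);
    }
    return response.json();
  })
  .then(data => {
    console.log('Rule created with ID:', data.rule_id);
  })
  .catch(error => {
    console.error(error);
  });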

Applying Custom Rules

Once a rule is created, you can apply it to extract data from documents:

Using the Dashboard

Navigate to AI & Machine Learning → Document Processing
Upload your document or provide a URL
Select your custom rule from the dropdown
Click Process

Using the API

curl -X POST "https://hub.juheapi.com/ai/document/extract" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@invoice.pdf" \
  -F "rule_id=your_custom_rule_id"
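
The same request can be sketched in JavaScript, for readers following the fetch example used elsewhere in this guide. This is a minimal sketch assuming Node.js 18 or later, where fetch, FormData, and Blob are available globally. buildExtractRequest is a hypothetical helper that only assembles the request, so it can be inspected before anything is sent; the endpoint and the form field names ("file", "rule_id") are the ones from the curl command above.

```javascript
// Endpoint taken from the curl example in this guide.
const EXTRACT_URL = "https://hub.juheapi.com/ai/document/extract";

// Hypothetical helper: builds the fetch() arguments without sending anything.
function buildExtractRequest(fileBytes, fileName, ruleId, apiKey) {
  const form = new FormData();
  form.append("file", new Blob([fileBytes]), fileName);
  form.append("rule_id", ruleId);
  return {
    url: EXTRACT_URL,
    options: {
      method: "POST",
      // No Content-Type header here: fetch sets multipart/form-data
      // together with the required boundary when the body is FormData.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// Usage sketch (inside an async function):
//   const { readFile } = require("node:fs/promises");
//   const bytes = await readFile("invoice.pdf");
//   const { url, options } = buildExtractRequest(bytes, "invoice.pdf", "your_custom_rule_id", API_KEY);
//   const response = await fetch(url, options);
//   if (!response.ok) throw new Error(`Extraction failed with HTTP ${response.status}`);
//   const result = await response.json();
```

Keeping request construction separate from sending makes the helper easy to unit test without network access.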

API Response

{
  "status": "success",
  "data": {
    "invoice_number": "INV-2023-0042",
    "issue_date": "2023-09-15",
    "due_date": "2023-10-15",
    "total_amount": 1250.00,
    "currency": "USD",
    "vendor": {
      "name": "Acme Supplies Inc.",
      "address": "123 Business St., Commerce City, CA 90210",
      "tax_id": "US-987654321"
    },
    "line_items": [
      {
        "description": "Office Chair",
        "quantity": 5,
        "unit_price": 150.00,
        "amount": 750.00
      },
      {
        "description": "Desk Lamp",
        "quantity": 10,
        "unit_price": 50.00,
        "amount": 500.00
      }
    ]
  },
  "confidence_score": 0.92
}
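
Numeric fields in a response like this can be cross-checked against each other. The sketch below uses a hypothetical helper, checkInvoiceTotals, to verify that each line item's amount equals quantity times unit price and that the line items add up to total_amount. It assumes the invoice has no tax, shipping, or discount lines; real invoices often do, in which case a mismatch is a prompt for review rather than proof of an extraction error.

```javascript
// Hypothetical helper: returns a list of arithmetic inconsistencies in extracted invoice data.
// The tolerance absorbs floating-point rounding on currency values.
function checkInvoiceTotals(data, tolerance = 0.01) {
  const issues = [];
  const items = data.line_items || [];
  items.forEach((item, i) => {
    const expected = item.quantity * item.unit_price;
    if (Math.abs(expected - item.amount) > tolerance) {
      issues.push(`line_items[${i}]: ${item.quantity} x ${item.unit_price} != ${item.amount}`);
    }
  });
  const sum = items.reduce((acc, item) => acc + item.amount, 0);
  if (items.length > 0 && Math.abs(sum - data.total_amount) > tolerance) {
    issues.push(`line items sum to ${sum}, but total_amount is ${data.total_amount}`);
  }
  return issues;
}

// Values copied from the sample response in this guide.
const sampleInvoice = {
  total_amount: 1250.0,
  line_items: [
    { description: "Office Chair", quantity: 5, unit_price: 150.0, amount: 750.0 },
    { description: "Desk Lamp", quantity: 10, unit_price: 50.0, amount: 500.0 },
  ],
};
console.log(checkInvoiceTotals(sampleInvoice)); // prints [] because 750 + 500 = 1250
```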

Best Practices for Custom Rules

Start Simple: Begin with a few essential fields before adding complexity
Test Thoroughly: Test your rules with a variety of documents to ensure accuracy
Use Required Fields Wisely: Only mark fields as required if they're truly necessary
Provide Clear Descriptions: Good descriptions help the model understand what to extract
Check Confidence Scores: Pay attention to confidence scores to identify potential extraction issues
Iterate Based on Results: Refine your rules based on extraction performance
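
One way to act on confidence scores is to route low-scoring results to manual review. The sketch below assumes the response shape shown in the API Response section (a top-level status and confidence_score). The 0.8 cutoff and the needsReview helper are illustrative assumptions, not values documented by JUHE API.

```javascript
// Assumed cutoff; tune it against your own documents.
const REVIEW_THRESHOLD = 0.8;

// Hypothetical helper: true when a result should be checked by a person.
function needsReview(result, threshold = REVIEW_THRESHOLD) {
  if (result.status !== "success") return true;
  // A missing or malformed score is treated as untrusted.
  if (typeof result.confidence_score !== "number") return true;
  return result.confidence_score < threshold;
}

console.log(needsReview({ status: "success", confidence_score: 0.92 })); // false
console.log(needsReview({ status: "success", confidence_score: 0.61 })); // true
```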

Advanced Features

Field Validation

You can add validation rules to ensure extracted data meets specific criteria:

{
  "name": "price",
  "type": "number",
  "description": "Product price",
  "validation": {
    "min": 0,
    "max": 10000
  }
}
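
To illustrate what such a constraint means, the sketch below applies the same min and max bounds to a value on the client side. How the service itself enforces validation is not described in this guide, so checkNumberField is a hypothetical helper, and treating both bounds as inclusive is an assumption.

```javascript
// Hypothetical helper: returns an error message, or null when the value is acceptable.
// Both bounds are treated as inclusive (an assumption).
function checkNumberField(field, value) {
  if (typeof value !== "number" || Number.isNaN(value)) {
    return `${field.name} is not a number`;
  }
  const { min, max } = field.validation || {};
  if (min !== undefined && value < min) return `${field.name} is below min ${min}`;
  if (max !== undefined && value > max) return `${field.name} is above max ${max}`;
  return null;
}

const priceField = {
  name: "price",
  type: "number",
  validation: { min: 0, max: 10000 },
};

console.log(checkNumberField(priceField, 49.99)); // null
console.log(checkNumberField(priceField, -5));    // "price is below min 0"
```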

Regular Expressions

For string fields, you can specify regex patterns for more precise extraction:

{
  "name": "product_code",
  "type": "string",
  "description": "Product identification code",
  "pattern": "^[A-Z]{3}-\\d{4}$"
}

Conditional Fields

Some fields are only relevant when another field has a particular value:

{
  "name": "shipping_address",
  "type": "string",
  "description": "Shipping address",
  "conditions": [
    {
      "field": "shipping_method",
      "operator": "equals",
      "value": "physical"
    }
  ]
}
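
The intended semantics can be sketched as a small evaluator. isFieldApplicable is a hypothetical helper. Only the "equals" operator appears in this guide, so any other operator is rejected rather than guessed at, and requiring every condition to hold (logical AND) is an assumption.

```javascript
// Hypothetical helper: decides whether a conditional field applies,
// given the values extracted so far.
function isFieldApplicable(field, extracted) {
  const conditions = field.conditions || [];
  // Assumption: all conditions must hold. A field with no conditions always applies.
  return conditions.every((condition) => {
    if (condition.operator === "equals") {
      return extracted[condition.field] === condition.value;
    }
    throw new Error(`Unsupported operator: ${condition.operator}`);
  });
}

const shippingAddressField = {
  name: "shipping_address",
  type: "string",
  conditions: [{ field: "shipping_method", operator: "equals", value: "physical" }],
};

console.log(isFieldApplicable(shippingAddressField, { shipping_method: "physical" })); // true
console.log(isFieldApplicable(shippingAddressField, { shipping_method: "digital" }));  // false
```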

Troubleshooting

Low Confidence Scores

If your extractions have low confidence scores:

Check if your field descriptions are clear
Ensure document quality is sufficient
Consider providing sample documents for training

Missing Fields

If certain fields are consistently missing:

Check if the information is actually present in the document
Make the field description more specific
Provide example values in the field description
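
To find out which fields go missing most often, you can compare each response against the rule that produced it. The sketch below lists required fields that are absent or empty in the extracted data, descending into nested objects. findMissingRequired is a hypothetical helper, and treating null and empty strings as missing is an assumption about how absent values are returned.

```javascript
// Hypothetical helper: returns dotted paths of required fields that are
// absent, null, or empty in the extracted data.
function findMissingRequired(fields, data, prefix = "") {
  const missing = [];
  for (const field of fields) {
    const path = prefix + field.name;
    const value = data == null ? undefined : data[field.name];
    if (value === undefined || value === null || value === "") {
      if (field.required) missing.push(path);
      continue;
    }
    // Descend into nested object fields, as in the vendor example.
    if (field.type === "object" && Array.isArray(field.fields)) {
      missing.push(...findMissingRequired(field.fields, value, path + "."));
    }
  }
  return missing;
}

const ruleFields = [
  { name: "invoice_number", type: "string", required: true },
  { name: "due_date", type: "date", required: false },
  {
    name: "vendor",
    type: "object",
    required: true,
    fields: [
      { name: "name", type: "string", required: true },
      { name: "tax_id", type: "string", required: false },
    ],
  },
];

const extractedData = { vendor: { tax_id: "US-987654321" } };
console.log(findMissingRequired(ruleFields, extractedData));
// prints [ 'invoice_number', 'vendor.name' ]
```

Collecting these paths over a batch of documents shows which field descriptions most need to be made more specific.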

Incorrect Extractions

If fields are being extracted with incorrect values:

Refine your field descriptions
Use validation rules to constrain the expected format
Consider using regex patterns for structured data

Next Steps

Now that you understand custom extraction rules, explore these related topics:

Batch Processing - Process multiple documents efficiently
Error Handling - Handle extraction errors gracefully
Best Practices - General best practices for using JUHE API
