Custom Extraction Rules

Custom extraction rules allow you to tailor the output of JUHE API's data extraction services to your specific needs. This guide will show you how to create, apply, and manage custom rules for extracting precise information from various data sources.

What Are Custom Extraction Rules?

Custom extraction rules are user-defined templates that specify exactly what data should be extracted from documents, images, or structured content. They enable you to:

  • Extract only the information that matters to your application
  • Define the format of extracted data
  • Create consistent data structures across different source formats
  • Automate content processing workflows

When to Use Custom Rules

Custom extraction rules are particularly valuable when:

  • You need specific data points from complex documents (invoices, receipts, contracts)
  • You want to normalize data from different sources into a consistent format
  • You're processing domain-specific content with unique terminology
  • You need to extract data that standard models might miss

Creating Custom Rules

Custom rules are defined using a JSON schema that describes the fields you want to extract. You can create and manage your rules through the JUHE API dashboard or through the API.

Rule Structure

Here's the basic structure of a custom extraction rule. The "fields" array holds one object per value to extract, and "required" takes a boolean (true or false):

{
  "name": "Rule Name",
  "description": "What this rule extracts",
  "version": "1.0",
  "fields": [
    {
      "name": "field_name",
      "type": "field_type",
      "description": "Description of the field",
      "required": true
    }
  ]
}

Field Types

The following field types are supported:

Type      Description         Example
string    Text values         Names, addresses, descriptions
number    Numeric values      Prices, quantities, measurements
date      Date values         Issue dates, due dates
boolean   True/false values   Status indicators
array     Lists of values     Line items, categories
object    Nested structures   Address components, person details
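
Before submitting a rule, it can help to check locally that every field uses one of the type names listed above. The following is only a sketch: `SUPPORTED_TYPES` and `findUnsupportedTypes` are hypothetical helper names, not part of the JUHE API, and the walk assumes nested definitions live under "fields" (for objects) and "items" (for arrays), as in the examples in this guide.

```javascript
// Field types listed in the table above.
const SUPPORTED_TYPES = new Set(["string", "number", "date", "boolean", "array", "object"]);

// Hypothetical helper: returns the paths of fields whose type is not in the list.
function findUnsupportedTypes(fields, prefix = "") {
  const problems = [];
  for (const field of fields) {
    const path = prefix + field.name;
    if (!SUPPORTED_TYPES.has(field.type)) {
      problems.push(path);
    }
    // Objects nest their children under "fields".
    if (Array.isArray(field.fields)) {
      problems.push(...findUnsupportedTypes(field.fields, path + "."));
    }
    // Arrays describe their elements under "items".
    if (field.items && Array.isArray(field.items.fields)) {
      problems.push(...findUnsupportedTypes(field.items.fields, path + "[]."));
    }
  }
  return problems;
}

const draftFields = [
  { name: "invoice_number", type: "string" },
  { name: "total_amount", type: "money" }, // not a supported type
  { name: "vendor", type: "object", fields: [{ name: "name", type: "text" }] }
];

console.log(findUnsupportedTypes(draftFields));
// prints [ 'total_amount', 'vendor.name' ]
```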

Example: Invoice Extraction Rule

{
  "name": "Invoice Extractor",
  "description": "Extract key information from invoice documents",
  "version": "1.0",
  "fields": [
    {
      "name": "invoice_number",
      "type": "string",
      "description": "The invoice identification number",
      "required": true
    },
    {
      "name": "issue_date",
      "type": "date",
      "description": "The date when the invoice was issued",
      "required": true
    },
    {
      "name": "due_date",
      "type": "date",
      "description": "The payment due date",
      "required": false
    },
    {
      "name": "total_amount",
      "type": "number",
      "description": "The total invoice amount",
      "required": true
    },
    {
      "name": "currency",
      "type": "string",
      "description": "The currency code (e.g., USD, EUR)",
      "required": true
    },
    {
      "name": "vendor",
      "type": "object",
      "description": "Vendor information",
      "required": true,
      "fields": [
        {
          "name": "name",
          "type": "string",
          "description": "Vendor name",
          "required": true
        },
        {
          "name": "address",
          "type": "string",
          "description": "Vendor address",
          "required": false
        },
        {
          "name": "tax_id",
          "type": "string",
          "description": "Vendor tax ID",
          "required": false
        }
      ]
    },
    {
      "name": "line_items",
      "type": "array",
      "description": "Individual line items on the invoice",
      "required": false,
      "items": {
        "type": "object",
        "fields": [
          {
            "name": "description",
            "type": "string",
            "description": "Item description"
          },
          {
            "name": "quantity",
            "type": "number",
            "description": "Quantity of items"
          },
          {
            "name": "unit_price",
            "type": "number",
            "description": "Price per unit"
          },
          {
            "name": "amount",
            "type": "number",
            "description": "Total line amount"
          }
        ]
      }
    }
  ]
}
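
To see at a glance which values a rule like this one treats as mandatory, you can walk the definition and collect every field marked "required": true. This is a minimal sketch, not JUHE API functionality: `requiredPaths` is a hypothetical helper, and the field list is a trimmed copy of the invoice rule above.

```javascript
// Hypothetical helper: collects dotted paths of every field marked "required": true.
function requiredPaths(fields, prefix = "") {
  const paths = [];
  for (const field of fields) {
    const path = prefix + field.name;
    if (field.required === true) {
      paths.push(path);
    }
    if (Array.isArray(field.fields)) {
      paths.push(...requiredPaths(field.fields, path + "."));
    }
    if (field.items && Array.isArray(field.items.fields)) {
      paths.push(...requiredPaths(field.items.fields, path + "[]."));
    }
  }
  return paths;
}

// Trimmed copy of the invoice rule's field definitions.
const invoiceFields = [
  { name: "invoice_number", type: "string", required: true },
  { name: "due_date", type: "date", required: false },
  {
    name: "vendor", type: "object", required: true,
    fields: [
      { name: "name", type: "string", required: true },
      { name: "tax_id", type: "string", required: false }
    ]
  },
  {
    name: "line_items", type: "array", required: false,
    items: { type: "object", fields: [{ name: "description", type: "string" }] }
  }
];

console.log(requiredPaths(invoiceFields));
// prints [ 'invoice_number', 'vendor', 'vendor.name' ]
```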

Creating Custom Rules via Dashboard

  1. Log in to your JUHE API dashboard at juheapi.com/dashboard
  2. Navigate to AI & Machine Learning > Custom Extraction Rules
  3. Click Create New Rule
  4. Fill out the rule details:
    • Name and description
    • Document type (invoice, receipt, contract, etc.)
    • Field definitions
  5. Test your rule with sample documents
  6. Save and publish your rule

Creating Custom Rules via API

You can also create rules programmatically:

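// Placeholder: replace with your own API key
const API_KEY = 'YOUR_API_KEY';

const rule = {
  name: "Receipt Extractor",
  description: "Extract key information from receipts",
  version: "1.0",
  fields: [
    // Field definitions...
  ]
};

fetch('https://hub.juheapi.com/ai/extraction/rules', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${API_KEY}`
  },
  body: JSON.stringify(rule)
})
  .then(response => {
    // Surface HTTP errors instead of treating an error body as a created rule
    if (!response.ok) {
      throw new Error(`Rule creation failed with HTTP ${response.status}`);
    }
    return response.json();
  })
  .then(data => {
    console.log('Rule created with ID:', data.rule_id);
  })
  .catch(error => {
    console.error(error);
  });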

Applying Custom Rules

Once a rule is created, you can apply it to documents through the dashboard or the API.

Using the Dashboard

  1. Navigate to AI & Machine Learning > Document Processing
  2. Upload your document or provide a URL
  3. Select your custom rule from the dropdown
  4. Click Process

Using the API

curl -X POST "https://hub.juheapi.com/ai/document/extract" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@invoice.pdf" \
  -F "rule_id=your_custom_rule_id"
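
If you are calling the endpoint from JavaScript rather than curl, the same multipart request can be assembled with the standard FormData and Blob globals (available in browsers and, as far as we know, in Node.js 18 and later). The sketch below only builds the request so that it can be inspected without network access. `buildExtractRequest` is a hypothetical helper name; the URL and the "file" and "rule_id" form fields are taken from the curl command above.

```javascript
// Hypothetical helper: builds the same multipart request as the curl command, without sending it.
function buildExtractRequest(fileBytes, fileName, ruleId, apiKey) {
  const form = new FormData();
  form.append("file", new Blob([fileBytes]), fileName);
  form.append("rule_id", ruleId);
  return {
    url: "https://hub.juheapi.com/ai/document/extract",
    options: {
      method: "POST",
      // No Content-Type header here: fetch derives it, including the
      // multipart boundary, from the FormData body.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form
    }
  };
}

// Placeholder bytes stand in for the contents of invoice.pdf.
const extractRequest = buildExtractRequest(
  new Uint8Array([37, 80, 68, 70]),
  "invoice.pdf",
  "your_custom_rule_id",
  "YOUR_API_KEY"
);

console.log(extractRequest.options.method, extractRequest.options.body.get("rule_id"));
// prints: POST your_custom_rule_id
```

To send the request, pass the two parts to fetch: `fetch(extractRequest.url, extractRequest.options)`.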

API Response

{
  "status": "success",
  "data": {
    "invoice_number": "INV-2023-0042",
    "issue_date": "2023-09-15",
    "due_date": "2023-10-15",
    "total_amount": 1250.00,
    "currency": "USD",
    "vendor": {
      "name": "Acme Supplies Inc.",
      "address": "123 Business St., Commerce City, CA 90210",
      "tax_id": "US-987654321"
    },
    "line_items": [
      {
        "description": "Office Chair",
        "quantity": 5,
        "unit_price": 150.00,
        "amount": 750.00
      },
      {
        "description": "Desk Lamp",
        "quantity": 10,
        "unit_price": 50.00,
        "amount": 500.00
      }
    ]
  },
  "confidence_score": 0.92
}
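
A response like this can be sanity-checked before it enters your own system. The sketch below makes several assumptions: `reviewExtraction` is a hypothetical helper, the 0.8 confidence threshold is an arbitrary example value, and the cross-check that line amounts add up to "total_amount" holds for the sample above but may not hold for invoices that include tax or shipping.

```javascript
// Hypothetical threshold; choose one that suits your own accuracy requirements.
const MIN_CONFIDENCE = 0.8;

// Returns a list of human-readable issues; an empty list means nothing was flagged.
function reviewExtraction(response) {
  if (response.status !== "success") {
    return [`unexpected status: ${response.status}`];
  }
  const issues = [];
  if (response.confidence_score < MIN_CONFIDENCE) {
    issues.push(`low confidence: ${response.confidence_score}`);
  }
  const items = response.data.line_items || [];
  if (items.length > 0) {
    // Compare in cents to avoid floating-point rounding noise.
    const cents = (value) => Math.round(value * 100);
    const lineTotal = items.reduce((sum, item) => sum + cents(item.amount), 0);
    if (lineTotal !== cents(response.data.total_amount)) {
      issues.push("line item amounts do not add up to total_amount");
    }
  }
  return issues;
}

// Trimmed copy of the sample response above.
const sampleResponse = {
  status: "success",
  data: {
    invoice_number: "INV-2023-0042",
    total_amount: 1250.0,
    line_items: [
      { description: "Office Chair", quantity: 5, unit_price: 150.0, amount: 750.0 },
      { description: "Desk Lamp", quantity: 10, unit_price: 50.0, amount: 500.0 }
    ]
  },
  confidence_score: 0.92
};

console.log(reviewExtraction(sampleResponse));
// prints []
```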

Best Practices for Custom Rules

  1. Start Simple: Begin with a few essential fields before adding complexity
  2. Test Thoroughly: Test your rules with a variety of documents to ensure accuracy
  3. Use Required Fields Wisely: Only mark fields as required if they're truly necessary
  4. Provide Clear Descriptions: Good descriptions help the model understand what to extract
  5. Check Confidence Scores: Pay attention to confidence scores to identify potential extraction issues
  6. Iterate Based on Results: Refine your rules based on extraction performance

Advanced Features

Field Validation

You can add validation rules to ensure extracted data meets specific criteria:

{
  "name": "price",
  "type": "number",
  "description": "Product price",
  "validation": {
    "min": 0,
    "max": 10000
  }
}
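
To make the effect of "min" and "max" concrete, here is a client-side sketch that applies the same bounds to a value. It is illustrative only: `checkRange` is a hypothetical helper, and treating both bounds as inclusive is an assumption, since this guide does not say how the service handles values exactly at the limit.

```javascript
const priceField = {
  name: "price",
  type: "number",
  description: "Product price",
  validation: { min: 0, max: 10000 }
};

// Hypothetical helper: true when the value is a number within the field's bounds.
// Assumption: both bounds are inclusive; a missing bound means "no limit".
function checkRange(field, value) {
  const { min = -Infinity, max = Infinity } = field.validation || {};
  return typeof value === "number" && value >= min && value <= max;
}

console.log(checkRange(priceField, 49.99)); // prints true
console.log(checkRange(priceField, -5));    // prints false
console.log(checkRange(priceField, 25000)); // prints false
```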

Regular Expressions

For string fields, you can specify regex patterns for more precise extraction:

{
  "name": "product_code",
  "type": "string",
  "description": "Product identification code",
  "pattern": "^[A-Z]{3}-\\d{4}$"
}
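
It is worth trying a pattern against a few sample values before adding it to a rule. The sketch below does this with JavaScript's built-in RegExp, assuming the service interprets this pattern the same way JavaScript does (the guide does not name the regex dialect). The sample strings are made up for illustration.

```javascript
const productCodeField = {
  name: "product_code",
  type: "string",
  // In a JavaScript string literal, as in JSON, "\\d" stands for the regex token \d.
  pattern: "^[A-Z]{3}-\\d{4}$"
};

const productCodePattern = new RegExp(productCodeField.pattern);

for (const sample of ["ABC-1234", "abc-1234", "ABCD-123", "XYZ-00017"]) {
  console.log(sample, productCodePattern.test(sample));
}
// prints:
// ABC-1234 true
// abc-1234 false
// ABCD-123 false
// XYZ-00017 false
```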

Conditional Fields

Some fields are only relevant when another field has a particular value. In this example, a shipping address is extracted only when the shipping method is "physical":

{
  "name": "shipping_address",
  "type": "string",
  "description": "Shipping address",
  "conditions": [
    {
      "field": "shipping_method",
      "operator": "equals",
      "value": "physical"
    }
  ]
}
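
The logic of a condition list can be sketched in a few lines. Two assumptions are made here that the guide does not confirm: all conditions must hold for the field to apply, and "equals" is the only operator handled because it is the only one shown above. `isFieldApplicable` is a hypothetical helper, not a JUHE API function.

```javascript
const shippingAddressField = {
  name: "shipping_address",
  type: "string",
  description: "Shipping address",
  conditions: [{ field: "shipping_method", operator: "equals", value: "physical" }]
};

// Hypothetical helper: true when every condition on the field is satisfied
// by the values extracted so far. Fields without conditions always apply.
function isFieldApplicable(field, extracted) {
  const conditions = field.conditions || [];
  return conditions.every((condition) => {
    if (condition.operator === "equals") {
      return extracted[condition.field] === condition.value;
    }
    throw new Error(`Operator not handled in this sketch: ${condition.operator}`);
  });
}

console.log(isFieldApplicable(shippingAddressField, { shipping_method: "physical" })); // prints true
console.log(isFieldApplicable(shippingAddressField, { shipping_method: "digital" }));  // prints false
```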

Troubleshooting

Low Confidence Scores

If your extractions have low confidence scores:

  • Check if your field descriptions are clear
  • Ensure document quality is sufficient
  • Consider providing sample documents for training

Missing Fields

If certain fields are consistently missing:

  • Check if the information is actually present in the document
  • Make the field description more specific
  • Provide example values in the field description

Incorrect Extractions

If fields are being extracted with incorrect values:

  • Refine your field descriptions
  • Use validation rules to constrain the expected format
  • Consider using regex patterns for structured data

Next Steps

Now that you understand custom extraction rules, explore these related topics:

  • Batch Processing - Process multiple documents efficiently
  • Error Handling - Handle extraction errors gracefully
  • Best Practices - General best practices for using JUHE API