
DAHL

DAHL is a benchmark for evaluating hallucination in long-form biomedical text generation, curated by Seoul National University. It comprises 8,573 questions across 29 categories, sourced from biomedical research papers in PubMed Central (PMC). The questions were generated automatically and then filtered manually for quality and answerability. To assess a large language model, DAHL decomposes each response into atomic units and checks the factual accuracy of each unit, which gives a finer-grained view of hallucination than traditional multiple-choice tasks. Its primary applications are in biomedical and clinical research, where factual conflicts in generated text must be addressed.

Updated 11/14/2024
arXiv

Description

DAHL Dataset Overview

Dataset Construction

  • Source: Research papers crawled from PMC.
  • Generation: Questions created with gpt-4-1106-preview, then manually filtered for quality.

Evaluation Procedure

  • Automated Evaluation Pipeline: Consists of two stages:
    1. Segment responses into atomic units.
    2. Verify factuality of each atomic unit.
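
The two stages above can be illustrated with a minimal Python sketch. It is not the repository's implementation: the real pipeline relies on external models for both stages, whereas here sentence splitting stands in for segmentation and a caller-supplied function stands in for fact verification. The function names, and the assumption that the final score is the mean per-response fraction of atomic units judged factual, are illustrative only.

```python
import re
from typing import Callable, List


def split_into_atomic_units(response: str) -> List[str]:
    """Stage 1 (toy stand-in): treat each sentence as one atomic unit."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p for p in parts if p]


def factuality_ratio(units: List[str], is_factual: Callable[[str], bool]) -> float:
    """Stage 2 (toy stand-in): fraction of units the checker accepts."""
    if not units:
        return 0.0  # assumption: an empty response contributes a score of 0
    return sum(1 for u in units if is_factual(u)) / len(units)


def dahl_style_score(responses: List[str], is_factual: Callable[[str], bool]) -> float:
    """Assumed aggregation: mean of the per-response factuality ratios."""
    if not responses:
        raise ValueError("at least one response is required")
    ratios = [
        factuality_ratio(split_into_atomic_units(r), is_factual) for r in responses
    ]
    return sum(ratios) / len(ratios)


# Hypothetical data: a lookup set plays the role of the fact checker.
known_facts = {
    "Aspirin inhibits COX enzymes.",
    "Insulin lowers blood glucose.",
}
responses = [
    "Aspirin inhibits COX enzymes. Aspirin cures all cancers.",
    "Insulin lowers blood glucose.",
]
score = dahl_style_score(responses, lambda unit: unit in known_facts)
```

In this toy run the first response has one of two units accepted and the second has one of one, so the mean is 0.75. Scoring per unit rather than per response is what lets a long answer receive partial credit.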

Installation & Usage

  • Installation:

    git clone https://github.com/seemdog/DAHL.git
    cd DAHL
    
  • Response Generation:

    • HuggingFace Model:
      python generate_response_hf.py --model meta-llama/Meta-Llama-3-8B-Instruct --temperature 0.6 --max_new_tokens 256
      
    • OpenAI Model:
      python generate_response_gpt.py --model gpt-4o --api_key YOUR_API_KEY --temperature 0.6
      
  • Evaluation:

    cd evaluate
    sh run.sh model_to_evaluate openAI_API_key perplexityAI_API_key model_to_use_perplexityAI
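
The OpenAI generation command above takes the API key as a literal argument. As an optional convenience that is not part of the DAHL repository, a small helper could read the key from an environment variable and assemble the same argument list. The function name `build_gpt_command` and the variable name `OPENAI_API_KEY` are assumptions; only the script name and the `--model`, `--api_key`, and `--temperature` flags come from the usage shown above.

```python
import os
from typing import List


def build_gpt_command(
    model: str,
    temperature: float,
    env_var: str = "OPENAI_API_KEY",  # assumed variable name, not from the repo
) -> List[str]:
    """Assemble the generate_response_gpt.py invocation shown above,
    taking the API key from the environment instead of typing it inline."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return [
        "python",
        "generate_response_gpt.py",
        "--model", model,
        "--api_key", api_key,
        "--temperature", str(temperature),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues and keeps the key out of shell history.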
    

Result Storage

  • Final DAHL Score: Saved in a .txt file.

Citation

  • Citation: To be determined (TBD).


Topics

Biomedical
Model Evaluation

Source

Organization: arXiv

Created: 11/14/2024
