Back to datasets
Dataset assetOpen Source CommunityBiomedicalModel Evaluation

DAHL

DAHL is a long‑form biomedical text generation hallucination evaluation benchmark curated by Seoul National University. It comprises 8,573 questions across 29 categories sourced from PubMed Central biomedical research papers. Questions were automatically generated and manually filtered to ensure high quality and answerability. DAHL evaluates large language models' hallucination in the biomedical domain by decomposing model responses into atomic units for factual accuracy assessment, offering a deeper evaluation than traditional multiple‑choice tasks. Its primary applications lie in biomedical and clinical research to address factual conflicts in generated texts.

Source
arXiv
Created
Nov 14, 2024
Updated
Nov 14, 2024
Signals
300 views
Availability
Linked source ready
Overview

Dataset description and usage context

DAHL Dataset Overview

Dataset Construction

  • Source: Generated from research papers crawled from PMC.
  • Generation: Questions created with gpt‑4‑1106‑preview and manually filtered for high quality.

Evaluation Procedure

  • Automated Evaluation Pipeline: Consists of two stages:
    1. Segment responses into atomic units.
    2. Verify factuality of each atomic unit.

Installation & Usage

  • Installation:

    git clone https://github.com/seemdog/DAHL.git
    cd DAHL
    
  • Response Generation:

    • HuggingFace Model:
      python generate_response_hf.py --model meta‑llama/Meta‑Llama‑3‑8B‑Instruct --temperature 0.6 --max_new_tokens 256
      
    • OpenAI Model:
      python generate_response_gpt.py --model gpt‑4o --api_key YOUR_API_KEY --temperature 0.6
      
  • Evaluation:

    cd evaluate
    sh run.sh model_to_evaluate openAI_API_key perplexityAI_API_key model_to_use_perplexityAI
    

Result Storage

  • Final DAHL Score: Saved in a .txt file.

Citation

  • Citation: To be determined (TBD).
Need downstream help?

Pair the dataset with AI analysis and content workflows.

Once the source passes your review, move straight into summarization, transformation, report drafting, or presentation generation with the JuheAI toolchain.

Explore AI studio