Dataset asset | Open Source Community | Quality Assessment | Document Classification

pubmed-en-quality-annotations-7

This dataset lists five features: id, french_translation, educational_score, domain, and document_type. The domain and document_type features are categorical, with three and four categories respectively. The dataset is split into a training set of 358,199 samples and a validation set of 39,800 samples. The total download size is 245,314,153 bytes, and the overall size is 438,787,962 bytes.
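
The split and size figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in the description; the variable names are illustrative, and reading the "overall size" as the full on-disk size of the prepared dataset is an assumption, not something the listing states.

```python
# Figures copied from the dataset description; names are illustrative.
TRAIN_SAMPLES = 358_199
VALIDATION_SAMPLES = 39_800
DOWNLOAD_BYTES = 245_314_153
OVERALL_BYTES = 438_787_962  # assumed to be the full on-disk size

total_samples = TRAIN_SAMPLES + VALIDATION_SAMPLES
validation_share = VALIDATION_SAMPLES / total_samples
download_ratio = DOWNLOAD_BYTES / OVERALL_BYTES

print(f"total samples: {total_samples:,}")            # total samples: 397,999
print(f"validation share: {validation_share:.1%}")    # validation share: 10.0%
print(f"download / overall: {download_ratio:.1%}")    # download / overall: 55.9%
```

In other words, the validation set is roughly a 10% holdout, and the download is a bit over half the overall size.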

Source
huggingface
Created
Dec 12, 2024
Updated
Dec 12, 2024
Dataset Overview

Dataset Information

  • Features:
    • id: data type int32.
    • french_translation: data type string.
    • educational_score: data type int32.
    • domain: data type class_label, with the following categories:
      • 0: biomedical
      • 1: clinical
      • 2: other
    • document_type: data type class_label, with the following categories:
      • 0: Study
      • 1: Other
      • 2: Review
      • 3: Clinical case
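
Because domain and document_type are stored as integer class labels, a consumer usually needs to map them back to the names listed above. A minimal sketch follows, assuming each row is a plain dictionary keyed by the feature names; the helper `decode_row` and the example row (including its score value and text) are hypothetical and not taken from the dataset.

```python
# Label names in index order, as given in the feature list above.
DOMAIN_NAMES = ["biomedical", "clinical", "other"]
DOCUMENT_TYPE_NAMES = ["Study", "Other", "Review", "Clinical case"]


def decode_row(row: dict) -> dict:
    """Return a copy of `row` with integer class labels replaced by names."""
    decoded = dict(row)
    decoded["domain"] = DOMAIN_NAMES[row["domain"]]
    decoded["document_type"] = DOCUMENT_TYPE_NAMES[row["document_type"]]
    return decoded


# Hypothetical row for illustration only; the values are made up.
example_row = {
    "id": 1,
    "french_translation": "Texte d'exemple.",
    "educational_score": 3,
    "domain": 1,
    "document_type": 3,
}

decoded = decode_row(example_row)
print(decoded["domain"], "/", decoded["document_type"])  # clinical / Clinical case
```

An out-of-range label raises `IndexError` here, which may be useful as a cheap sanity check when validating rows.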