Dataset asset, Open Source Community, Natural Language Processing, Intent Recognition

ATIS dataset

The ATIS (Airline Travel Information System) dataset contains 4,978 training sentences and 850 evaluation sentences. It is used to train and evaluate natural language understanding (NLU) pipelines, covering tokenization, featurization, intent classification, and entity recognition and extraction.
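
To make the pipeline stages concrete, here is a minimal sketch of how one annotated utterance might be represented and tokenized. The sentence, the `atis_flight` intent label, the entity names, and the `NLUExample` and `Entity` classes are illustrative assumptions in the style of ATIS annotations; they are not records or code taken from this dataset's source.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity
    label: str   # entity type (hypothetical label name)


@dataclass
class NLUExample:
    text: str
    intent: str
    entities: list = field(default_factory=list)

    def tokens(self):
        # Whitespace tokenization: the simplest possible tokenizer.
        return self.text.split()

    def entity_values(self):
        # Map each entity label to the substring its span covers.
        return {e.label: self.text[e.start:e.end] for e in self.entities}


# Hypothetical example in the style of an ATIS flight query.
example = NLUExample(
    text="show me flights from boston to denver",
    intent="atis_flight",
    entities=[
        Entity(21, 27, "fromloc.city_name"),
        Entity(31, 37, "toloc.city_name"),
    ],
)

print(example.tokens())
print(example.entity_values())
```

An intent classifier would predict the `intent` field from the tokens, while an entity extractor would predict the labeled spans.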

Source: GitHub
Created: Aug 23, 2022
Updated: Dec 24, 2022

Dataset Overview

Dataset Name

  • ATIS dataset

Dataset Purpose

  • Used for training and evaluating natural language understanding (NLU) models

Dataset Composition

  • Training set contains 4,978 sentences
  • Evaluation set contains 850 sentences
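
The split sizes above imply the following proportions. This is a small arithmetic sketch using only the two counts stated here; the variable names are chosen for illustration.

```python
train_sentences = 4978
eval_sentences = 850

total = train_sentences + eval_sentences
train_share = train_sentences / total
eval_share = eval_sentences / total

# prints: total=5828, train=85.4%, eval=14.6%
print(f"total={total}, train={train_share:.1%}, eval={eval_share:.1%}")
```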

Dataset Sample

  • Sample image shows a portion of the dataset

Model Configuration and Results

Intent Classifier

  • Model 1: DIET, 256‑bit binary transformer, the highest-scoring of the three models
  • Model 2: Linear SVM
  • Model 3: MITIE language model
  • Performance Metrics (values listed in model order: DIET, Linear SVM, MITIE):
    • Weighted average precision: 0.96, 0.88, 0.94
    • Weighted average recall: 0.96, 0.89, 0.94
    • Weighted average F1 score: 0.96, 0.88, 0.93
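
The page does not say how these weighted averages were computed. Assuming the usual definition, where each label's score is weighted by its share of the true examples (as with `average="weighted"` in scikit-learn), a minimal standard-library sketch looks like this. The function name `weighted_prf` and the toy labels are hypothetical and do not come from the dataset.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Return support-weighted precision, recall, and F1 over the labels in y_true."""
    support = Counter(y_true)          # number of true examples per label
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / n
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = n / total             # label's share of the true examples
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    return precision, recall, f1


# Toy labels for illustration only.
y_true = ["flight"] * 4 + ["airfare"] * 2
y_pred = ["flight", "flight", "flight", "airfare", "airfare", "flight"]

p, r, f = weighted_prf(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Under this definition the weighted recall equals plain accuracy, because each label's recall is multiplied by exactly its share of the true examples.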

Entity Extractor

  • Model 1: DIET, used for both intent classification and entity extraction
  • Model 2: CRF, scores lower than DIET on all three metrics
  • Model 3: MITIE entity extractor, scores between DIET and CRF
  • Performance Metrics (values listed in model order: DIET, CRF, MITIE):
    • Weighted average precision: 0.96, 0.90, 0.95
    • Weighted average recall: 0.94, 0.89, 0.92
    • Weighted average F1 score: 0.94, 0.89, 0.93
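
The reported scores can be collected into a small lookup for comparison. This sketch assumes that each metric line lists its three values in the same order as the models (Model 1, Model 2, Model 3), which is consistent with the descriptions above; the dictionary layout and helper names are illustrative.

```python
# Weighted-average scores as reported on this page, keyed by model name.
intent_scores = {
    "DIET":       {"precision": 0.96, "recall": 0.96, "f1": 0.96},
    "Linear SVM": {"precision": 0.88, "recall": 0.89, "f1": 0.88},
    "MITIE":      {"precision": 0.94, "recall": 0.94, "f1": 0.93},
}
entity_scores = {
    "DIET":  {"precision": 0.96, "recall": 0.94, "f1": 0.94},
    "CRF":   {"precision": 0.90, "recall": 0.89, "f1": 0.89},
    "MITIE": {"precision": 0.95, "recall": 0.92, "f1": 0.93},
}


def rank_by(scores, metric):
    """Return model names sorted from highest to lowest on the given metric."""
    return sorted(scores, key=lambda model: scores[model][metric], reverse=True)


print("Intent, by F1:", rank_by(intent_scores, "f1"))
print("Entity, by F1:", rank_by(entity_scores, "f1"))
```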