Dataset asset | Open Source Community | Natural Language Processing | Intent Recognition
ATIS dataset
The ATIS (Airline Travel Information System) dataset provides 4,978 training sentences and 850 evaluation sentences. It is used to train and evaluate natural language understanding (NLU) pipelines that cover tokenization, featurization, intent classification, and entity recognition and extraction.
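The four NLU stages named above can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not any of the models evaluated below: the training sentences, the intent names `flight` and `airfare`, and the city list are invented for illustration, and a nearest-centroid classifier (cosine similarity over bag-of-words counts) plus a gazetteer lookup stand in for DIET, SVM, CRF, or MITIE.

```python
import math
import re
from collections import Counter

# Hypothetical ATIS-style examples, invented for illustration (not taken from the dataset).
TRAIN = [
    ("show me flights from boston to denver", "flight"),
    ("list flights from dallas to atlanta", "flight"),
    ("what is the fare from boston to denver", "airfare"),
    ("how much is a ticket to atlanta", "airfare"),
]

# Hypothetical gazetteer used by the toy entity extractor.
CITIES = {"boston", "denver", "dallas", "atlanta"}


def tokenize(text):
    """Stage 1: lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def featurize(tokens):
    """Stage 2: bag-of-words counts as a sparse feature vector."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train_centroids(examples):
    """Stage 3 (training): sum feature vectors per intent.

    Cosine similarity ignores vector length, so the sum points the same way as the mean.
    """
    centroids = {}
    for text, intent in examples:
        centroids.setdefault(intent, Counter()).update(featurize(tokenize(text)))
    return centroids


def classify_intent(text, centroids):
    """Stage 3 (inference): return the intent whose centroid is most similar."""
    feats = featurize(tokenize(text))
    return max(centroids, key=lambda intent: cosine(feats, centroids[intent]))


def extract_entities(text):
    """Stage 4: gazetteer lookup; label a city by the preposition just before it."""
    tokens = tokenize(text)
    entities = []
    for i, tok in enumerate(tokens):
        if tok in CITIES:
            prev = tokens[i - 1] if i > 0 else ""
            role = {"from": "fromloc", "to": "toloc"}.get(prev, "city")
            entities.append((role, tok))
    return entities


centroids = train_centroids(TRAIN)
print(classify_intent("flights from dallas to denver", centroids))  # flight
print(extract_entities("flights from dallas to denver"))  # [('fromloc', 'dallas'), ('toloc', 'denver')]
```

Each stage is a separate function so that any one of them (for example the classifier) could be swapped for a stronger model without touching the others.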
Source
GitHub
Created
Aug 23, 2022
Updated
Dec 24, 2022
Dataset Overview
Dataset Name
- ATIS dataset
Dataset Purpose
- Used for training and evaluating natural language understanding (NLU) models
Dataset Composition
- Training set contains 4,978 sentences
- Evaluation set contains 850 sentences
Dataset Sample
- Sample image shows a portion of the dataset
Model Configuration and Results
Intent Classifier
- Model 1: DIET (Dual Intent and Entity Transformer), a 256-bit binary transformer; outperforms the other two models
- Model 2: Linear SVM
- Model 3: MITIE language model
- Performance Metrics (values listed in the order DIET, Linear SVM, MITIE):
  - Weighted average precision: 0.96, 0.88, 0.94
  - Weighted average recall: 0.96, 0.89, 0.94
  - Weighted average F1 score: 0.96, 0.88, 0.93
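The listing does not say how the weighted averages were computed. Assuming the usual support-weighted convention (as in scikit-learn's `classification_report`), each per-class score is weighted by that class's number of true examples. The sketch below computes this from scratch on hypothetical labels and predictions (the intent names and counts are invented, not ATIS results).

```python
from collections import Counter


def per_class_prf(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treating it as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_average(y_true, y_pred):
    """Average each metric over classes, weighting by support (true count per class).

    Classes that appear only in y_pred have zero support, so they get zero weight.
    """
    support = Counter(y_true)
    total = len(y_true)
    sums = [0.0, 0.0, 0.0]
    for label, n in support.items():
        for i, value in enumerate(per_class_prf(y_true, y_pred, label)):
            sums[i] += value * n / total
    return tuple(sums)


# Hypothetical gold labels and predictions for 10 utterances.
y_true = ["flight"] * 6 + ["airfare"] * 2 + ["ground_service"] * 2
y_pred = ["flight"] * 6 + ["flight", "airfare"] + ["ground_service", "flight"]

print(tuple(round(x, 3) for x in weighted_average(y_true, y_pred)))  # (0.85, 0.8, 0.781)
```

Two properties of this convention are worth noting. Weighted recall always equals plain accuracy (here 8 of 10 correct). Weighted F1 is the weighted mean of per-class F1 scores rather than the harmonic mean of weighted precision and recall (here 0.781 rather than about 0.824), which is consistent with an F1 of 0.93 sitting below a precision and recall of 0.94.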
Entity Extractor
- Model 1: DIET, used for both intent classification and entity extraction
- Model 2: CRF, performs worse than DIET
- Model 3: MITIE entity extractor, performs between DIET and CRF
- Performance Metrics (values listed in the order DIET, CRF, MITIE):
  - Weighted average precision: 0.96, 0.90, 0.95
  - Weighted average recall: 0.94, 0.89, 0.92
  - Weighted average F1 score: 0.94, 0.89, 0.93
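Entity extraction output is often compared at the span level. The sketch below assumes BIO-tagged tokens (the common encoding for ATIS slot labels such as `B-fromloc.city_name`) and scores exact (type, start, end) matches. The sentence and tags are illustrative, and this span-matching scheme is one common convention, not necessarily how the entity scores above were computed.

```python
def bio_to_spans(tokens, tags):
    """Group BIO tags into (entity_type, start, end, text) spans; end is exclusive.

    A stray I- tag with no open span of the same type starts a new span (lenient decoding).
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((etype, start, i, " ".join(tokens[start:i])))
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def span_prf(gold_spans, pred_spans):
    """Exact-match precision, recall, and F1 over (type, start, end) triples."""
    gold = {s[:3] for s in gold_spans}
    pred = {s[:3] for s in pred_spans}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


tokens = "flights from new york to las vegas".split()
gold_tags = ["O", "O", "B-fromloc.city_name", "I-fromloc.city_name",
             "O", "B-toloc.city_name", "I-toloc.city_name"]
# Hypothetical prediction that truncates "new york" to "new".
pred_tags = ["O", "O", "B-fromloc.city_name", "O",
             "O", "B-toloc.city_name", "I-toloc.city_name"]

gold = bio_to_spans(tokens, gold_tags)
pred = bio_to_spans(tokens, pred_tags)
print(gold)  # [('fromloc.city_name', 2, 4, 'new york'), ('toloc.city_name', 5, 7, 'las vegas')]
print(span_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Exact span matching is stricter than token-level scoring: the truncated "new" earns no credit at all here, even though one of its two tokens is tagged correctly.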