facebook/lama
The LAMA dataset is used to probe factual and commonsense knowledge in pre‑trained language models. It provides four configurations (google_re, trex, conceptnet, and squad), each with its own set of fields. The dataset is monolingual (English only). It was created to assess what a language model already knows, without fine‑tuning. The data are drawn from Google-RE, T-REx, ConceptNet, and SQuAD. Each record contains a cleaned sentence in which the answer is replaced by a mask token ([MASK]), together with the corresponding answer; some configurations also include negated sentences.
Description
Dataset Overview
Dataset Name: LAMA: LAnguage Model Analysis
Purpose: Probe and analyze factual and commonsense knowledge contained in pre‑trained language models.
Composition: Data from Google-RE, T-REx (a subset of Wikidata triples), ConceptNet, and SQuAD.
Language: English (en)
License: CC‑BY‑4.0
Multilinguality: Monolingual
Size Categories:
- <1K
- 1K‑10K
- 10K‑100K
- 1M‑10M
Task Types:
- Text Retrieval
- Text Classification
Task IDs:
- Fact‑checking Retrieval
- Text Scoring
Configurations: conceptnet, google_re, squad, trex
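Each configuration is loaded separately by name. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository id `facebook/lama` from the title of this card is still loadable with the installed version (some versions may require extra arguments or may not support this dataset). The `load_lama` helper and its `loader` parameter are hypothetical names introduced here for illustration.

```python
from typing import Any, Callable, Optional

# Configuration names as listed on this card.
LAMA_CONFIGS = ("conceptnet", "google_re", "squad", "trex")


def load_lama(config: str, loader: Optional[Callable[..., Any]] = None) -> Any:
    """Load one LAMA configuration after validating its name.

    `loader` is a hypothetical injection point (useful for offline testing).
    When it is omitted, `datasets.load_dataset` is used.
    """
    if config not in LAMA_CONFIGS:
        raise ValueError(
            f"unknown config {config!r}; expected one of {LAMA_CONFIGS}"
        )
    if loader is None:
        # Imported lazily so the helper can be defined without the library.
        from datasets import load_dataset
        loader = load_dataset
    # Equivalent to: load_dataset("facebook/lama", "trex"), and so on.
    return loader("facebook/lama", config)
```

Calling `load_lama("trex")` would then request the trex configuration; an unrecognized name fails early with a `ValueError` rather than triggering a download.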
Structure
Data Instances:
- trex: uuid, obj_uri, obj_label, sub_uri, sub_label, predicate_id, ...
- conceptnet: uuid, sub, obj, pred, obj_label, ...
- squad: id, sub_label, obj_label, ...
- google_re: uuid, pred, sub, obj, evidences, judgments, ...
Splits: No explicit splits provided.
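To illustrate how records of this shape can be used for probing, here is a minimal sketch of a precision-at-k check. It assumes each record carries a masked sentence under a field named `masked_sentence` (an assumption; the field lists above are abbreviated) and the answer under `obj_label`. The `predict` callable stands in for a language model, and the toy records and toy predictor are invented for illustration and are not taken from the dataset.

```python
from typing import Callable, Dict, Iterable, List

MASK = "[MASK]"


def precision_at_k(
    records: Iterable[Dict[str, str]],
    predict: Callable[[str], List[str]],
    k: int = 1,
) -> float:
    """Fraction of records whose answer is among the top-k predictions.

    `predict` takes a masked sentence and returns candidate fillers,
    best first. Records without a mask token are skipped.
    """
    hits = 0
    total = 0
    for rec in records:
        sentence = rec["masked_sentence"]  # assumed field name
        if MASK not in sentence:
            continue
        total += 1
        top_k = [c.lower() for c in predict(sentence)[:k]]
        if rec["obj_label"].lower() in top_k:
            hits += 1
    return hits / total if total else 0.0


# Toy data, invented for illustration only.
toy_records = [
    {"masked_sentence": "Paris is the capital of [MASK].", "obj_label": "France"},
    {"masked_sentence": "Dante was born in [MASK].", "obj_label": "Florence"},
]

# Hypothetical stand-in for a masked language model.
_toy_answers = {
    "Paris is the capital of [MASK].": ["France", "Italy"],
    "Dante was born in [MASK].": ["Rome", "Florence"],
}


def toy_predict(sentence: str) -> List[str]:
    return _toy_answers.get(sentence, [])


print(precision_at_k(toy_records, toy_predict, k=1))  # 0.5
print(precision_at_k(toy_records, toy_predict, k=2))  # 1.0
```

Matching here is a case-insensitive exact comparison against a single-token answer, which is a simplification; a real evaluation would also need to restrict predictions to the model's vocabulary.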
Creation
Source Data: Aggregated from existing datasets, cleaned and adapted for probing.
Annotation: A mix of crowd‑sourced, expert‑generated, and machine‑generated annotations.
Usage Notes
Social Impact: Designed to evaluate language‑model understanding.
Bias Discussion: Crowd‑sourced data may contain biases.
Known Limitations: Limited documentation of original fields.
Source
Organization: hugging_face
Created: Unknown