facebook/lama

The LAMA (LAnguage Model Analysis) dataset is used to probe and analyze the factual and commonsense knowledge held by pre‑trained language models. It has four configurations (google_re, trex, conceptnet, and squad), each with its own set of fields, drawn from Google RE, TRex, ConceptNet, and SQuAD respectively. The dataset is English‑only and monolingual. Each configuration provides cleaned sentences in which the answer is replaced by a mask token ([MASK]), together with the corresponding answer; some configurations also include negative sentences.
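As a rough illustration of how masked sentences and answers can be used for probing, the sketch below scores a model by checking whether the expected answer appears among its top‑k guesses for the [MASK] slot. The record keys (`masked_sentence`, `obj_label` as the answer), the example records, and the `toy_predict` function are assumptions made for this sketch; they are not taken from the dataset or from any particular model API.

```python
from typing import Callable, Dict, List

MASK = "[MASK]"


def precision_at_k(
    records: List[Dict[str, str]],
    predict: Callable[[str], List[str]],
    k: int = 1,
) -> float:
    """Return the fraction of records whose answer is in the top-k guesses."""
    if not records:
        return 0.0
    hits = 0
    for rec in records:
        sentence = rec["masked_sentence"]  # assumed key name
        if MASK not in sentence:
            raise ValueError(f"no {MASK} token in: {sentence!r}")
        guesses = [g.lower() for g in predict(sentence)[:k]]
        if rec["obj_label"].lower() in guesses:
            hits += 1
    return hits / len(records)


# Hypothetical records written for this example, not copied from LAMA.
EXAMPLES = [
    {"masked_sentence": "Dante was born in [MASK].", "obj_label": "Florence"},
    {"masked_sentence": "The capital of France is [MASK].", "obj_label": "Paris"},
]


def toy_predict(sentence: str) -> List[str]:
    """Stand-in for a masked language model: returns ranked candidate tokens."""
    table = {
        "Dante was born in [MASK].": ["Rome", "Florence", "Italy"],
        "The capital of France is [MASK].": ["Paris", "Lyon", "France"],
    }
    return table.get(sentence, [])


print(precision_at_k(EXAMPLES, toy_predict, k=1))  # 0.5
print(precision_at_k(EXAMPLES, toy_predict, k=2))  # 1.0
```

In practice `toy_predict` would be replaced by a call into a real masked language model that returns its ranked predictions for the masked position.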

Source

hugging_face

Created

Nov 28, 2025

Updated

Jan 18, 2024

Dataset Overview

Dataset Name: LAMA: LAnguage Model Analysis

Purpose: Probe and analyze factual and commonsense knowledge contained in pre‑trained language models.

Composition: Subsets of Google_RE, TRex (itself a subset of Wikidata triples), ConceptNet, and SQuAD.

Language: English (en)

License: CC‑BY‑4.0

Multilinguality: Monolingual

Size Categories:

<1K
1K‑10K
10K‑100K
1M‑10M

Task Types:

Text Retrieval
Text Classification

Task IDs:

Fact‑checking Retrieval
Text Scoring

Configurations: conceptnet, google_re, squad, trex
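A minimal sketch of selecting one of these configurations with the Hugging Face `datasets` library follows. It assumes the library is installed and that the dataset is reachable under the identifier `facebook/lama`; whether the call works unchanged may depend on the installed `datasets` version. The helper name `load_lama` is hypothetical.

```python
# The four configuration names listed on this card.
LAMA_CONFIGS = ("conceptnet", "google_re", "squad", "trex")


def load_lama(config: str):
    """Load one LAMA configuration, rejecting unknown names early."""
    if config not in LAMA_CONFIGS:
        raise ValueError(
            f"unknown configuration {config!r}; expected one of {LAMA_CONFIGS}"
        )
    # Imported lazily so the name check above works without the library.
    from datasets import load_dataset

    # The second positional argument selects the configuration.
    return load_dataset("facebook/lama", config)


if __name__ == "__main__":
    # Downloads data on first use; trex is the largest configuration.
    data = load_lama("squad")
    print(data)
```

Validating the name before importing keeps the failure message short and local when a configuration is misspelled.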

Structure

Data Instances:

trex: uuid, obj_uri, obj_label, sub_uri, sub_label, predicate_id, ...
conceptnet: uuid, sub, obj, pred, obj_label, ...
squad: id, sub_label, obj_label, ...
google_re: uuid, pred, sub, obj, evidences, judgments, ...
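To make the trex fields more concrete, the sketch below groups records by `predicate_id` and collects the (`sub_label`, `obj_label`) pairs for each relation. It uses only fields named in the list above; the sample records and their `uuid` values are invented for illustration, and the helper name `group_by_relation` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_relation(
    records: List[Dict[str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each predicate_id to its list of (subject, object) label pairs."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for rec in records:
        pair = (rec["sub_label"], rec["obj_label"])
        groups[rec["predicate_id"]].append(pair)
    return dict(groups)


# Illustrative records only; real records carry more fields than shown here.
SAMPLE = [
    {"uuid": "example-1", "predicate_id": "P19",
     "sub_label": "Dante", "obj_label": "Florence"},
    {"uuid": "example-2", "predicate_id": "P19",
     "sub_label": "Mozart", "obj_label": "Salzburg"},
    {"uuid": "example-3", "predicate_id": "P36",
     "sub_label": "France", "obj_label": "Paris"},
]

for relation, pairs in group_by_relation(SAMPLE).items():
    print(relation, len(pairs), pairs)
```

Grouping by relation is a common first step when reporting probing results per relation rather than as a single aggregate number.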

Splits: No explicit splits provided.

Creation

Source Data: Aggregated from existing datasets, cleaned and adapted for probing.

Annotation: Mixed crowd‑sourced, expert‑generated, and machine‑generated annotations.

Usage Notes

Social Impact: Designed to evaluate language‑model understanding.

Bias Discussion: Crowd‑sourced data may contain biases.

Known Limitations: Limited documentation of original fields.
