Dataset asset, Open Source Community, Language Model Analysis, Knowledge Probing

facebook/lama

The LAMA dataset is used to probe and analyze the factual and commonsense knowledge stored in pre‑trained language models. It ships four configurations (google_re, trex, conceptnet, and squad), each with its own set of fields, drawn from Google‑RE, T‑REx, ConceptNet, and SQuAD respectively. The dataset is English‑only. Each record pairs a cleaned sentence containing a mask token ([MASK]) with the expected answer; some configurations also include negated sentences.

Source
hugging_face
Created
Nov 28, 2025
Updated
Jan 18, 2024

Dataset Overview

Dataset Name: LAMA: LAnguage Model Analysis

Purpose: Probe and analyze factual and commonsense knowledge contained in pre‑trained language models.

Composition: Data from Google‑RE, T‑REx (a subset of Wikidata triples), ConceptNet, and SQuAD.

Language: English (en)

License: CC‑BY‑4.0

Multilinguality: Monolingual

Size Categories:

  • <1K
  • 1K‑10K
  • 10K‑100K
  • 1M‑10M

Task Types:

  • Text Retrieval
  • Text Classification

Task IDs:

  • Fact‑checking Retrieval
  • Text Scoring

Configurations: conceptnet, google_re, squad, trex
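
Each configuration is selected by name when loading. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository id is `facebook/lama` as shown at the top of this page. The helper name `load_lama` and the default split name `"train"` are assumptions for illustration (this card lists no explicit splits), and depending on the installed `datasets` version additional arguments may be needed.

```python
# Configuration names as listed on this card.
LAMA_CONFIGS = ("conceptnet", "google_re", "squad", "trex")


def load_lama(config: str, split: str = "train"):
    """Load one LAMA configuration by name (hypothetical helper).

    The split name "train" is an assumption; adjust it if the hosted
    dataset exposes something different.
    """
    if config not in LAMA_CONFIGS:
        raise ValueError(
            f"unknown LAMA configuration {config!r}; expected one of {LAMA_CONFIGS}"
        )
    # Imported lazily so the name check above works even without the library.
    from datasets import load_dataset

    return load_dataset("facebook/lama", config, split=split)
```

Calling `load_lama("squad")` would download that configuration on first use; an unrecognized name fails fast with a `ValueError` before any network access.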

Structure

Data Instances:

  • trex: uuid, obj_uri, obj_label, sub_uri, sub_label, predicate_id, ...
  • conceptnet: uuid, sub, obj, pred, obj_label, ...
  • squad: id, sub_label, obj_label, ...
  • google_re: uuid, pred, sub, obj, evidences, judgments, ...
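
To illustrate how these fields are typically used, here is a small sketch of scoring a cloze-style probe. It assumes each record carries a `masked_sentence` field (one of the fields elided by "..." above, so treat the name as an assumption) alongside `obj_label`. The sample records are hand-written for illustration and are not taken from the dataset, and `toy_predict` is a stand-in for a real language model. Case-insensitive string matching is a simplification, not necessarily the scoring rule used by the dataset authors.

```python
from typing import Callable, Iterable, Mapping

MASK = "[MASK]"


def precision_at_1(
    records: Iterable[Mapping[str, str]],
    predict: Callable[[str], str],
) -> float:
    """Fraction of records whose single predicted token equals obj_label."""
    hits = 0
    total = 0
    for rec in records:
        sentence = rec["masked_sentence"]  # assumed field name
        if sentence.count(MASK) != 1:
            continue  # skip records without exactly one mask token
        total += 1
        guess = predict(sentence)
        if guess.strip().lower() == rec["obj_label"].strip().lower():
            hits += 1
    return hits / total if total else 0.0


# Hand-written records that mimic the field layout (not real dataset rows).
SAMPLE = [
    {"sub_label": "Paris", "obj_label": "France",
     "masked_sentence": "Paris is the capital of [MASK]."},
    {"sub_label": "Dante", "obj_label": "Florence",
     "masked_sentence": "Dante was born in [MASK]."},
]


def toy_predict(sentence: str) -> str:
    """Stand-in for a model: answers one sample correctly, one incorrectly."""
    return "France" if "capital" in sentence else "Rome"


score = precision_at_1(SAMPLE, toy_predict)
print(score)  # 0.5
```

In practice `predict` would wrap a masked language model's top-ranked token for the `[MASK]` position; keeping it as a plain callable makes the scoring loop independent of any particular model library.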

Splits: No explicit splits provided.

Creation

Source Data: Aggregated from existing datasets, cleaned and adapted for probing.

Annotation: A mix of crowd‑sourced, expert‑generated, and machine‑generated annotations.

Usage Notes

Social Impact: Designed to evaluate language‑model understanding.

Bias Discussion: Crowd‑sourced data may contain biases.

Known Limitations: Limited documentation of original fields.
