
facebook/lama

LAMA is a probe for analyzing the factual and commonsense knowledge contained in pre-trained language models. It provides four configurations (google_re, trex, conceptnet, and squad), each with its own set of fields and each derived from the corresponding source: Google-RE, T-REx, ConceptNet, and SQuAD. The dataset is monolingual (English only). Each example pairs a cleaned sentence containing a mask token ([MASK]) with the expected answer; some configurations also include negated sentences.

Updated 1/18/2024
hugging_face

Description

Dataset Overview

Dataset Name: LAMA: LAnguage Model Analysis

Purpose: Probe and analyze factual and commonsense knowledge contained in pre‑trained language models.

Composition: Data drawn from Google-RE, T-REx (a subset of Wikidata triples), ConceptNet, and SQuAD.

Language: English (en)

License: CC‑BY‑4.0

Multilinguality: Monolingual

Size Categories:

  • <1K
  • 1K‑10K
  • 10K‑100K
  • 1M‑10M

Task Types:

  • Text Retrieval
  • Text Classification

Task IDs:

  • Fact‑checking Retrieval
  • Text Scoring

Configurations: conceptnet, google_re, squad, trex
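
As a rough illustration of how the four configurations listed above might be selected, the sketch below wraps `load_dataset` from the Hugging Face `datasets` library. The helper name `load_lama` and the constant `LAMA_CONFIGS` are hypothetical, and the Hub id "facebook/lama" is assumed from the title of this page. The actual download is not exercised here: it needs network access, and depending on the installed `datasets` version it may need extra arguments.

```python
# Configuration names as listed on this page.
LAMA_CONFIGS = ("conceptnet", "google_re", "squad", "trex")


def load_lama(config: str):
    """Load one LAMA configuration (hypothetical helper, not part of any library)."""
    if config not in LAMA_CONFIGS:
        raise ValueError(
            f"unknown config {config!r}; expected one of {LAMA_CONFIGS}"
        )
    # Imported lazily so the validation above works without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the dataset is published on the Hub under "facebook/lama".
    return load_dataset("facebook/lama", config)
```

With the library installed, `load_lama("trex")` would be the expected call; an unknown name such as `load_lama("wikidata")` fails early with a `ValueError` instead of a network error.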

Structure

Data Fields (by configuration):

  • trex: uuid, obj_uri, obj_label, sub_uri, sub_label, predicate_id, ...
  • conceptnet: uuid, sub, obj, pred, obj_label, ...
  • squad: id, sub_label, obj_label, ...
  • google_re: uuid, pred, sub, obj, evidences, judgments, ...
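
To make the field listing more concrete, here is a minimal sketch of how a single trex-style record could be scored in a cloze probe. The record is invented for illustration: `uuid`, `sub_label`, `obj_label`, and `predicate_id` follow the listing above, while `masked_sentence`, the helper names, and every value are assumptions rather than real dataset content. Precision@k is used here as one common way to score such probes, not necessarily the only metric applied to LAMA.

```python
from typing import Dict, List

MASK = "[MASK]"

# Hypothetical record; all values are made up for illustration.
example: Dict[str, str] = {
    "uuid": "example-0001",
    "sub_label": "Paris",
    "obj_label": "France",
    "predicate_id": "P1376",
    "masked_sentence": "Paris is the capital of [MASK].",  # assumed field name
}


def fill_mask(record: Dict[str, str], token: str) -> str:
    """Replace the first [MASK] in the sentence with a candidate token."""
    return record["masked_sentence"].replace(MASK, token, 1)


def precision_at_k(record: Dict[str, str], ranked: List[str], k: int = 1) -> float:
    """Return 1.0 if the gold answer is among the top-k predictions, else 0.0."""
    gold = record["obj_label"].strip().lower()
    top_k = [p.strip().lower() for p in ranked[:k]]
    return 1.0 if gold in top_k else 0.0


# A model's ranked guesses for the masked position (invented here).
guesses = ["Italy", "France"]
print(fill_mask(example, guesses[0]))
print(precision_at_k(example, guesses, k=1), precision_at_k(example, guesses, k=2))
```

Averaging `precision_at_k` over all records of a configuration would give a single score per model; the case-insensitive comparison is a simplification chosen for this sketch.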

Splits: No explicit splits provided.

Creation

Source Data: Aggregated from existing datasets, cleaned and adapted for probing.

Annotation: Mixed crowd‑sourced, expert‑generated, and machine‑generated annotations.

Usage Notes

Social Impact: Intended to help evaluate how much factual and commonsense knowledge pre-trained language models capture.

Bias Discussion: Crowd‑sourced data may contain biases.

Known Limitations: Limited documentation of original fields.


Topics

Language Model Analysis
Knowledge Probing

Source

Organization: hugging_face

Created: Unknown
