High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

facebook/lama

The LAMA dataset is used to probe and analyze the factual and commonsense knowledge held in pre‑trained language models. It ships in four configurations (google_re, trex, conceptnet, and squad), each with its own set of fields. The dataset is monolingual (English only). It was created to assess language‑model understanding without reference translations. The underlying data sources are Google-RE, T-REx, ConceptNet, and SQuAD. Each record contains a cleaned sentence in which the answer is replaced by a mask token ([MASK]), together with the corresponding answer; some configurations also include negated sentences.
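To make the mask-and-answer format concrete, here is a minimal Python sketch of how a cloze-style probe over such records might be scored. The two records and the ranked predictions are hypothetical toy data, not rows from the dataset, and the field names `masked_sentence` and `obj_label` are assumptions about the record layout that may differ between configurations. Precision at k is used here as an illustrative metric.

```python
from typing import Dict, List

MASK = "[MASK]"


def fill_mask(masked_sentence: str, candidate: str) -> str:
    """Replace the first [MASK] token with a candidate answer."""
    return masked_sentence.replace(MASK, candidate, 1)


def precision_at_k(
    records: List[Dict[str, str]],
    predictions: List[List[str]],
    k: int = 1,
) -> float:
    """Fraction of records whose gold answer is among the top-k predictions."""
    if not records:
        return 0.0
    hits = 0
    for record, ranked in zip(records, predictions):
        if record["obj_label"] in ranked[:k]:
            hits += 1
    return hits / len(records)


# Hypothetical toy records (field names are assumed, not confirmed).
records = [
    {"masked_sentence": "Paris is the capital of [MASK].", "obj_label": "France"},
    {"masked_sentence": "Dante was born in [MASK].", "obj_label": "Florence"},
]

# Hypothetical ranked outputs of a language model, best guess first.
predictions = [
    ["France", "Europe"],
    ["Italy", "Florence"],
]

p_at_1 = precision_at_k(records, predictions, k=1)
p_at_2 = precision_at_k(records, predictions, k=2)
filled = fill_mask(records[0]["masked_sentence"], predictions[0][0])
```

In practice the ranked predictions would come from a masked language model queried with each `masked_sentence`; the scoring step stays the same.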

Source: Hugging Face
