Dataset asset, Open Source Community, Natural Language Processing, News Articles

fine-tuned/NFCorpus-512-192-gpt-4o-2024-05-13-43315

The "news articles" dataset is a synthetically generated collection designed to support the development of domain‑specific embedding models for retrieval tasks.

Source: Hugging Face
Created: Nov 28, 2025
Updated: May 28, 2024
Overview

Dataset description and usage context

NFCorpus-512-192-gpt-4o-2024-05-13-43315 Dataset

Overview

  • Name: news articles
  • License: Apache-2.0
  • Language: English
  • Task Categories:
    • Feature Extraction
    • Sentence Similarity
  • Tags:
    • sentence-transformers
    • feature-extraction
    • sentence-similarity
    • mteb
    • News
    • Articles
    • Journalism
    • Media
    • Current Events
  • Size Category: n<1K

Dataset Description

This dataset is a synthetic collection created specifically to enable the development of domain‑specific embedding models, primarily for retrieval tasks.

Associated Model

The dataset is used to train the NFCorpus-512-192-gpt-4o-2024-05-13-43315 model.

Usage

To train or evaluate models with this dataset, load it via the Hugging Face datasets library:

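from datasets import load_dataset

# Download the dataset from the Hugging Face Hub; the result is keyed by split name
dataset = load_dataset("fine-tuned/NFCorpus-512-192-gpt-4o-2024-05-13-43315")
# Print the first record of the "test" split
print(dataset["test"][0])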
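
The snippet above indexes a split by name, which fails if that split is absent. As a minimal sketch, assuming only that the loaded object behaves like a mapping from split names to indexable rows, the hypothetical helper below (`first_record` is not part of the datasets library) falls back to the first available split. The stand-in data and its field names (`query`, `pos`) are made up for illustration and do not describe this dataset's actual columns.

```python
from collections.abc import Mapping
from typing import Any


def first_record(splits: Mapping[str, Any], preferred: str = "test") -> tuple[str, Any]:
    """Return (split_name, first_row), preferring `preferred` when it exists."""
    if not splits:
        raise ValueError("no splits available")
    # Fall back to the first available split if the preferred one is missing.
    name = preferred if preferred in splits else next(iter(splits))
    rows = splits[name]
    if len(rows) == 0:
        raise ValueError(f"split {name!r} is empty")
    return name, rows[0]


# In-memory stand-in for the loaded dataset, so the sketch runs offline.
# The split name and field names here are hypothetical.
fake_splits = {
    "train": [{"query": "example query", "pos": "example passage"}],
}
split_name, row = first_record(fake_splits)
print(split_name, row)
```

With a real download, the object returned by `load_dataset` could be passed in place of `fake_splits`, since it is also keyed by split name.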