fine-tuned/NFCorpus-512-192-gpt-4o-2024-05-13-43315
The dataset "news articles" is a generated dataset designed to support the development of domain‑specific embedding models for retrieval tasks.
Source: hugging_face
Created: Nov 28, 2025
Updated: May 28, 2024
NFCorpus-512-192-gpt-4o-2024-05-13-43315 Dataset
Overview
- Name: news articles
- License: Apache-2.0
- Language: English
- Task Categories:
  - Feature Extraction
  - Sentence Similarity
- Tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - mteb
  - News
  - Articles
  - Journalism
  - Media
  - Current Events
- Size Category: n<1K
Dataset Description
This dataset is a synthetic collection created specifically to enable the development of domain‑specific embedding models, primarily for retrieval tasks.
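To make the retrieval use case concrete, here is a minimal sketch of how dense embeddings are typically used for retrieval: documents are ranked by the cosine similarity between their embedding vectors and the query's. The vectors are toy, hypothetical values rather than outputs of any real model, and the helper names `cosine` and `rank_documents` are illustrative only.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices sorted by descending cosine similarity to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical 3-dimensional embeddings (not produced by any real model)
query = [0.9, 0.1, 0.0]
docs = [
    [0.0, 1.0, 0.0],  # unrelated to the query
    [1.0, 0.0, 0.1],  # close to the query
    [0.5, 0.5, 0.0],  # partially related
]
print(rank_documents(query, docs))  # [1, 2, 0]
```

In a real pipeline the vectors would come from an embedding model, and large corpora would use an approximate nearest-neighbour index instead of scoring every document.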
Associated Model
The dataset is used to train the NFCorpus-512-192-gpt-4o-2024-05-13-43315 model.
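The card does not state which training objective was used. As an assumption for illustration only, a common choice for retrieval embedding models is an in-batch-negatives contrastive loss (the idea behind sentence-transformers' `MultipleNegativesRankingLoss`): each query's paired document is its positive, and the other documents in the batch act as negatives. This pure-Python sketch computes that loss from a precomputed similarity matrix; the function name and the `scale` default of 20 are assumptions, not documented details of this model's training.

```python
import math


def in_batch_negatives_loss(sim: list[list[float]], scale: float = 20.0) -> float:
    """Mean cross-entropy over a batch.

    sim[i][j] is the similarity between query i and document j. Document i is
    the positive for query i; every other document in the row acts as a negative.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [scale * s for s in row]
        m = max(logits)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / len(sim)


# Hypothetical similarities for a batch of two (query, document) pairs
aligned = [[0.9, 0.1], [0.2, 0.8]]     # each query is closest to its own document
misaligned = [[0.1, 0.9], [0.8, 0.2]]  # each query is closest to the wrong document
print(in_batch_negatives_loss(aligned) < in_batch_negatives_loss(misaligned))  # True
```

Larger batches supply more negatives per query at no extra labeling cost, which is why this objective suits small synthetic datasets of (query, positive) pairs.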
Usage
To train or evaluate models with this dataset, load it via the Hugging Face datasets library:
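from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (returns a DatasetDict keyed by split name)
dataset = load_dataset("fine-tuned/NFCorpus-512-192-gpt-4o-2024-05-13-43315")
# Split names are strings: print the first example of the "test" split
print(dataset["test"][0])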
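For evaluation, retrieval quality is commonly summarized with ranking metrics such as recall@k and mean reciprocal rank (MRR). The following sketch computes both from ranked lists of document IDs. The IDs and the layout of `runs` are hypothetical, since the card does not describe the dataset's columns; adapting it to real data would require mapping the actual fields onto queries and relevant documents.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant IDs that appear among the top-k ranked IDs."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / position of the first relevant ID (1-based), or 0.0 if none is retrieved."""
    for pos, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / pos
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (ranked IDs, relevant IDs) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Hypothetical rankings for three queries (document IDs are made up)
runs = [
    (["d3", "d1", "d7"], {"d1"}),  # first relevant hit at rank 2
    (["d2", "d5"], {"d2"}),        # first relevant hit at rank 1
    (["d4", "d6"], {"d9"}),        # relevant document not retrieved
]
print(mean_reciprocal_rank(runs))  # 0.5
print(recall_at_k(["d3", "d1", "d7"], {"d1", "d7"}, k=2))  # 0.5
```

MRR rewards placing the first relevant document near the top, while recall@k measures how many relevant documents fall within a fixed cutoff; reporting both gives a fuller picture of ranking quality.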