Lauler/flan-norwegian

This dataset contains the feature fields inputs, targets, task, and index, together with _nor and _backtranslation variants of the input and target fields, which, going by the dataset name, appear to hold Norwegian translations and their back-translations. It is split into training, validation, and test sets containing 2,771,562, 23,860, and 734,178 examples respectively. The total dataset size is 12,154,335,861 bytes, with a download size of 5,880,786,502 bytes.

Source

Hugging Face

Created

Nov 28, 2025

Updated

Apr 19, 2024

Dataset Overview

Dataset Features

inputs: type large_string
targets: type large_string
task: type large_string
index: type int64
inputs_nor: type large_string
inputs_backtranslation: type large_string
targets_nor: type large_string
targets_backtranslation: type large_string
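
As a minimal sketch, the feature list above can be written down as a column-to-type mapping and used to sanity-check a row before further processing. The mapping of large_string to str and int64 to int, the helper name schema_problems, and the example row are assumptions made for illustration; none of them come from the dataset itself.

```python
from typing import Any, Dict, List, Mapping

# Column name -> expected Python type, mirroring the feature list above.
# Assumption: large_string values arrive as str and int64 values as int.
EXPECTED_COLUMNS: Dict[str, type] = {
    "inputs": str,
    "targets": str,
    "task": str,
    "index": int,
    "inputs_nor": str,
    "inputs_backtranslation": str,
    "targets_nor": str,
    "targets_backtranslation": str,
}


def schema_problems(row: Mapping[str, Any]) -> List[str]:
    """Return a list of schema problems for one row (empty if none found)."""
    problems = []
    # Check every expected column, in the order listed above.
    for name, expected in EXPECTED_COLUMNS.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], expected):
            actual = type(row[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    # Report columns that the feature list does not mention.
    for name in row:
        if name not in EXPECTED_COLUMNS:
            problems.append(f"unexpected column: {name}")
    return problems


# Hypothetical row with placeholder values; not taken from the dataset.
example_row = {
    "inputs": "<original prompt>",
    "targets": "<original answer>",
    "task": "<task name>",
    "index": 0,
    "inputs_nor": "<inputs_nor value>",
    "inputs_backtranslation": "<inputs_backtranslation value>",
    "targets_nor": "<targets_nor value>",
    "targets_backtranslation": "<targets_backtranslation value>",
}

print(schema_problems(example_row))
```

A check like this is cheap to run on the first few rows of each split and catches renamed or missing columns early.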

Dataset Splits

Training set (train):
- Number of examples: 2771562
- Data size: 9331534698 bytes
Validation set (validation):
- Number of examples: 23860
- Data size: 85141364 bytes
Test set (test):
- Number of examples: 734178
- Data size: 2737659799 bytes

Dataset Size

Download size: 5880786502 bytes
Total dataset size: 12154335861 bytes
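
The figures above can be cross-checked with a few lines of arithmetic. The sketch below copies the numbers from this page, confirms that the per-split data sizes add up to the reported total, and derives each split's share of the examples and the download-to-total size ratio. The variable names are illustrative, and reading the gap between download size and total size as compression of the stored files is an assumption.

```python
# Figures copied from the split and size listings on this page.
SPLITS = {
    "train": {"examples": 2_771_562, "bytes": 9_331_534_698},
    "validation": {"examples": 23_860, "bytes": 85_141_364},
    "test": {"examples": 734_178, "bytes": 2_737_659_799},
}
DOWNLOAD_BYTES = 5_880_786_502
TOTAL_BYTES = 12_154_335_861

total_examples = sum(split["examples"] for split in SPLITS.values())
summed_bytes = sum(split["bytes"] for split in SPLITS.values())

# The per-split data sizes should add up to the reported total size.
sizes_consistent = summed_bytes == TOTAL_BYTES

# Fraction of all examples that falls into each split.
example_share = {
    name: split["examples"] / total_examples for name, split in SPLITS.items()
}

# Download size relative to total size (assumed to reflect file compression).
download_ratio = DOWNLOAD_BYTES / TOTAL_BYTES

print(f"total examples: {total_examples:,}")
print(f"split sizes sum to reported total: {sizes_consistent}")
for name, share in example_share.items():
    print(f"{name}: {share:.2%} of examples")
print(f"download / total size: {download_ratio:.3f}")
```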

Data File Configuration

Default configuration (default):
- Training path: data/train-*
- Validation path: data/validation-*
- Test path: data/test-*
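
One possible way to work with this configuration is sketched below. The glob patterns are copied from the default configuration above; the helper names split_for_path and load_split are hypothetical. load_split assumes the third-party Hugging Face datasets package is installed and that the repository is reachable under the ID Lauler/flan-norwegian; the import happens lazily so the pattern matching can be used without it.

```python
from fnmatch import fnmatch
from typing import Optional

REPO_ID = "Lauler/flan-norwegian"

# Split name -> glob pattern, copied from the default configuration above.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def split_for_path(path: str) -> Optional[str]:
    """Return the split whose pattern matches a repository file path, if any."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


def load_split(split: str, streaming: bool = True):
    """Load one split via the `datasets` library (assumed to be installed).

    With streaming=True, rows are read lazily instead of downloading the
    whole split first.
    """
    if split not in DATA_FILES:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(DATA_FILES)}")
    from datasets import load_dataset  # third-party; imported only when needed

    return load_dataset(REPO_ID, split=split, streaming=streaming)
```

Calling load_split("validation") would be a reasonable first step, since the validation split is by far the smallest of the three; the call needs network access.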
