High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.


Browse by Category

json-training

This dataset supports fine-tuning of small yet capable models (e.g., Qwen2 0.5B and SmolLM 135M/360M) that struggle to generate JSON-structured output. Each record has three fields: `query` (the user's plain-text query), `schema` (the desired output JSON schema), and `response` (an LLM response that conforms to the schema). The data were synthesized by large language models such as Llama 3.1 8B and Claude 3.5 Sonnet and will be updated regularly.
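To make the record layout concrete, here is a minimal sketch of how one record might be turned into a prompt/target pair for supervised fine-tuning. The sample record, the prompt wording, and the helper names `build_example` and `missing_required_keys` are hypothetical. The sketch also assumes that `schema` and `response` are stored as JSON-encoded strings, which the description does not confirm.

```python
import json

# Hypothetical record: the field names come from the dataset description,
# but the values and their string encoding are assumptions for illustration.
record = {
    "query": "Book a table for two at 7 pm under the name Ada.",
    "schema": json.dumps({
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "party_size": {"type": "integer"},
            "time": {"type": "string"},
        },
        "required": ["name", "party_size", "time"],
    }),
    "response": json.dumps({"name": "Ada", "party_size": 2, "time": "19:00"}),
}


def build_example(rec):
    """Turn one record into a (prompt, target) pair for fine-tuning."""
    prompt = (
        "Answer the query with JSON that conforms to the schema.\n"
        f"Schema: {rec['schema']}\n"
        f"Query: {rec['query']}\n"
    )
    return prompt, rec["response"]


def missing_required_keys(rec):
    """Shallow sanity check: top-level required keys absent from the response."""
    schema = json.loads(rec["schema"])
    response = json.loads(rec["response"])
    return [key for key in schema.get("required", []) if key not in response]


prompt, target = build_example(record)
problems = missing_required_keys(record)
```

The check above only looks at top-level `required` keys. It is not full JSON Schema validation, so a dedicated validator would be needed to verify types and nested structure.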

huggingface
