json-training
This dataset is intended to support fine‑tuning of small yet powerful models (e.g., Qwen2 0.5B and SmolLM 135M/360M) that struggle with JSON‑structured data generation tasks. It contains three fields—`query`, `schema`, and `response`—representing the user's plain‑text query, the desired output JSON schema, and an LLM response that conforms to the schema. The data were synthesized by large language models such as Llama 3.1 8B and Claude 3.5 Sonnet and will be updated regularly.
Description
JSON Training Data
数据集概述
该数据集旨在为小型但功能强大的模型(如Qwen2 0.5B和SmolLM 135M/360M)提供微调数据,特别是在JSON结构化数据生成方面。这些模型在处理JSON输出时表现不佳,因此需要专门的数据集进行微调。
数据收集
数据完全由大型语言模型(LLMs)合成生成,主要使用Llama 3.1 8B生成,并由Claude 3.5 Sonnet贡献约2000个示例。
数据字段
数据集包含以下字段:
query:用户的纯文本查询,无结构化组件。schema:期望的输出JSON模式。response:符合schema的LLM对query的示例响应。
用户可以根据需要将这些字段转换为任何格式进行微调,例如将模式放入系统提示中,或将模式注入用户消息中。
AI studio
Generate PPTs instantly with Nano Banana Pro.
Generate PPT NowAccess Dataset
Please login to view download links and access full dataset details.
Topics
Source
Organization: huggingface
Created: 8/21/2024
Power Your Data Analysis with Premium AI Models
Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.
Enjoy a free trial and save 20%+ compared to official pricing.