JUHE API Marketplace

Dataset Catalog

Browse trusted datasets for evaluation, enrichment, and production use.

Category: Automotive Industry

IndustryCorpus_automobile

Data Processing, Automotive Industry

This dataset was built to address the shortage of industry-specific training data: insufficient volume, low quality, and a lack of domain expertise. Twenty-two industry data-processing operators were applied to more than 100 TB of open-source data, yielding a high-quality 3.4 TB multi-industry Chinese-English pre-training dataset made up of 1 TB of Chinese text and 2.4 TB of English text. The Chinese portion is annotated with 12 label types. The data cover 18 industry categories (e.g., medical, education, literature, finance) and went through rule-based filtering, model-based filtering, and document-level deduplication. The corpus is partitioned into 18 industry-specific subsets; this entry is the automotive subset.
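The description names rule-based filtering and document-level deduplication but does not say which rules or which deduplication method were used. The following is a minimal sketch of what those two stages can look like, assuming a toy length/symbol-ratio rule and exact deduplication by SHA-256 hash after whitespace and case normalization. The thresholds `MIN_CHARS` and `MAX_SYMBOL_RATIO` and the functions `passes_rules`, `normalize`, and `filter_and_dedup` are hypothetical illustrations, not the dataset's actual operators, and model-based filtering is not shown.

```python
import hashlib
import re
from typing import Iterable, Iterator

# Hypothetical thresholds: the dataset card does not publish its actual rules.
MIN_CHARS = 200
MAX_SYMBOL_RATIO = 0.3


def passes_rules(text: str) -> bool:
    """Toy rule-based filter: reject very short or symbol-heavy documents."""
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        return False
    # CJK characters count as alphanumeric in Python, so Chinese text is not penalized.
    symbols = sum(1 for ch in stripped if not (ch.isalnum() or ch.isspace()))
    return symbols / len(stripped) <= MAX_SYMBOL_RATIO


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_and_dedup(docs: Iterable[str]) -> Iterator[str]:
    """Yield documents that pass the rules, dropping exact duplicates after normalization."""
    seen: set[str] = set()
    for doc in docs:
        if not passes_rules(doc):
            continue
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # an earlier document had the same normalized content
        seen.add(digest)
        yield doc


# Usage: the second document differs only in spacing and case, the third is too short.
docs = [
    "Electric vehicle battery management " * 10,
    "electric vehicle battery  management " * 10,
    "too short",
]
kept = list(filter_and_dedup(docs))
print(len(kept))  # 1
```

Exact hashing only catches documents that are identical after normalization; production pipelines often add near-duplicate detection (for example MinHash), which this sketch does not attempt.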

Source: Hugging Face · Updated: Jul 26, 2024 · 184 views
Copyright © 2026 JUHEDATA HK LIMITED - All rights reserved