
Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.



Stevens-VLP16-Dataset

LiDAR
Autonomous Driving

This dataset was captured with a Velodyne VLP-16 mounted on a Clearpath Jackal UGV on the Stevens Institute of Technology campus, with the VLP-16 rotating at 10 Hz. It contains more than 20,000 scans and multiple loop closures. TF transforms are provided by LeGO-LOAM, and the entire mapping process was also recorded on video.

Source: GitHub

3D_Lane_Synthetic_Dataset

Autonomous Driving
3D Lane Detection

This is a synthetic dataset designed to facilitate the development and evaluation of 3D lane detection methods. It extends the [Apollo Synthetic Dataset](http://apollo.auto/synthetic.html), with construction strategy and evaluation methods based on the ECCV 2020 paper: Gen‑LaneNet: A Generalized and Scalable 3D Lane Detection Approach.

Source: GitHub

Acti

Autonomous Driving
Cybersecurity

The Acti dataset, created by Beihang University, focuses on mining cybersecurity threat intelligence entities and their relations for autonomous driving vehicles. It contains 908 real automotive cybersecurity reports, comprising 3,678 sentences, 8,195 security entities, and 4,852 semantic relations. Data were collected from the National Vulnerability Database and specific automotive threat intelligence platforms, and annotated using a BIOES joint labeling scheme. The dataset is primarily used for modeling automotive cybersecurity threat intelligence, aiming to extract valuable information from large volumes of cybersecurity data for proactive defense.
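To illustrate the BIOES (Begin, Inside, Outside, End, Single) joint labeling scheme mentioned above, here is a minimal sketch. The sentence, entity types, and tags are invented for demonstration and are not taken from the Acti dataset itself:

```python
# Hypothetical BIOES tagging of a short automotive-security sentence.
# Entity types ("Attack", "Component") are illustrative placeholders.
tokens = ["CAN", "bus", "flooding", "disables", "the", "ECU"]

# "CAN bus flooding" spans three tokens: B- begins, I- continues, E- ends.
# "ECU" is a single-token entity (S-); remaining tokens are outside (O).
tags = ["B-Attack", "I-Attack", "E-Attack", "O", "O", "S-Component"]

def decode_bioes(tokens, tags):
    """Recover (entity_text, entity_type) spans from BIOES tags."""
    entities, span = [], []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("S-"):
            entities.append((tok, tag[2:]))
        elif tag.startswith("B-"):
            span = [tok]
        elif tag.startswith("I-"):
            span.append(tok)
        elif tag.startswith("E-"):
            span.append(tok)
            entities.append((" ".join(span), tag[2:]))
            span = []
    return entities

print(decode_bioes(tokens, tags))
# [('CAN bus flooding', 'Attack'), ('ECU', 'Component')]
```

Compared with plain BIO, the explicit E- and S- tags make entity boundaries unambiguous, which helps joint entity-and-relation extraction of the kind this dataset targets.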

Source: arXiv

OpenDriveLab/DriveLM

Autonomous Driving
Visual Question Answering

The DriveLM dataset supports perception, prediction, planning, behavior and motion tasks through graph‑structured question‑answer pairs. It consists of two parts: DriveLM‑nuScenes and DriveLM‑CARLA. DriveLM‑nuScenes is built on the nuScenes dataset, while DriveLM‑CARLA is collected from the CARLA simulator. Currently, only the training split of DriveLM‑nuScenes is publicly available. The dataset includes a series of questions and answers together with the associated images.
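A graph-structured QA record of this kind can be sketched as a set of question nodes linked by parent references, forming a perception → prediction → planning chain. The field names and QA content below are illustrative placeholders, not the official DriveLM schema:

```python
# Hypothetical sketch of one graph-structured QA frame; the schema and
# example content are invented for illustration.
qa_nodes = [
    {"id": "q0", "stage": "perception",
     "question": "What objects are in front of the ego vehicle?",
     "answer": "A pedestrian crossing the road.",
     "parent": None},
    {"id": "q1", "stage": "prediction",
     "question": "What will the pedestrian do next?",
     "answer": "Continue crossing to the far sidewalk.",
     "parent": "q0"},
    {"id": "q2", "stage": "planning",
     "question": "What should the ego vehicle do?",
     "answer": "Slow down and yield to the pedestrian.",
     "parent": "q1"},
]

def reasoning_chain(nodes, leaf_id):
    """Walk parent links from a leaf node back to the root question."""
    by_id = {n["id"]: n for n in nodes}
    chain, node = [], by_id[leaf_id]
    while node is not None:
        chain.append(node["stage"])
        node = by_id.get(node["parent"])
    return list(reversed(chain))

print(reasoning_chain(qa_nodes, "q2"))
# ['perception', 'prediction', 'planning']
```

The parent links are what distinguish this format from flat VQA: a planning answer can be traced back through the prediction and perception questions that justify it.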

Source: Hugging Face

DriveMLLM

Autonomous Driving
Spatial Understanding

The DriveMLLM dataset, created by the Institute of Automation, Chinese Academy of Sciences and other institutions, focuses on spatial understanding tasks in autonomous driving scenarios. It contains 880 forward‑camera images covering absolute and relative spatial reasoning tasks, accompanied by rich natural‑language questions. Built upon the nuScenes dataset, the images were strictly selected and annotated to ensure clear visibility of objects and explicit spatial relationships. DriveMLLM aims to evaluate and improve multimodal large language models' spatial reasoning abilities in autonomous driving, addressing complex spatial relation understanding.

Source: arXiv

FB-SSEM-dataset

Autonomous Driving
Image Processing

The FB‑SSEM dataset is a synthetic dataset comprising surround‑view fisheye camera images and BEV (bird’s‑eye‑view) maps generated from simulated ego‑vehicle motion sequences.

Source: GitHub

LoT-nuScenes

Autonomous Driving
Accident Simulation

LoT‑nuScenes is a virtual long‑tail scenario dataset for parallel vision and parallel vehicles. Built in the CARLA simulator, it contains accident scenarios under various conditions, including six types of motor vehicle accidents and one pedestrian accident, combined with three extreme weather conditions, three time periods, and five location categories. The dataset follows the nuScenes format, equipped with multi‑sensor and 360° views, filling the gap in accident scenario data and providing a long‑tail standardized distribution.
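The scenario matrix described above can be enumerated directly: seven accident types (six motor-vehicle plus one pedestrian) crossed with three weather conditions, three time periods, and five location categories. Only the counts come from the dataset description; the category names below are invented placeholders:

```python
# Hypothetical enumeration of the LoT-nuScenes scenario combinations.
# All category names are placeholders; only the counts follow the
# dataset description (7 accident types x 3 weather x 3 times x 5 locations).
from itertools import product

accidents = [f"motor_accident_{i}" for i in range(1, 7)] + ["pedestrian_accident"]
weather = ["heavy_rain", "dense_fog", "snow"]        # placeholder names
times = ["day", "dusk", "night"]                     # placeholder names
locations = [f"location_{i}" for i in range(1, 6)]   # placeholder names

scenarios = list(product(accidents, weather, times, locations))
print(len(scenarios))  # 7 * 3 * 3 * 5 = 315 combinations
```

This kind of factored design is what gives the dataset broad coverage of rare (long-tail) accident conditions from a small number of building blocks.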

Source: GitHub

TSEC-Dataset

Autonomous Driving
Video Analysis

TSEC‑Dataset was developed for training and testing video captioning methods in driving scenarios, aiming to describe key events involving the ego vehicle, road environment, and other traffic participants. The dataset aggregates videos from various sources, including on‑board cameras, public datasets, and traffic‑accident videos downloaded from BiliBili and YouTube, to capture diverse traffic scenes. Videos are segmented into independent clips containing 1‑3 key events, totaling 8,000 video clips with a cumulative duration of 11.5 hours.

Source: GitHub