NTU Dataset
The NTU dataset is a multi‑video collection that records 60 different human actions, each captured by three cameras from distinct viewpoints. The data files contain per‑frame skeletal coordinates.
Source
GitHub
Created
Jul 24, 2020
Updated
Jul 24, 2020
Overview
Dataset description and usage context
Dataset Description
The NTU dataset is a collection of videos covering 60 human action classes. Each action was recorded by three cameras positioned at different viewpoints.
List of Recorded Actions
- Drinking water
- Eating/snacking
- Brushing teeth
- Combing hair
- Dropping objects
- Picking up objects
- Throwing
- Sitting down
- Standing up from a seated position
- Clapping
- Reading
- Writing
- Tearing paper
- Putting on a jacket
- Removing a jacket
- Putting on shoes
- Removing shoes
- Wearing glasses
- Taking off glasses
- Wearing a hat/cap
- Removing a hat/cap
- Cheering
- Waving
- Kicking something
- Putting something into a pocket / taking something out of a pocket
- Jumping on one foot
- Jumping
- Making a phone call / answering a phone call
- Using a phone/tablet
- Typing on a keyboard
- Pointing with a finger
- Taking a selfie
- Checking the time (from a watch)
- Rubbing hands
- Nodding / bowing
- Shaking head
- Rubbing face
- Saluting
- Pressing palms together
- Crossing hands in front (signifying stop)
- Sneezing / coughing
- Staggering
- Falling down
- Touching head (headache)
- Touching chest (stomachache/heart pain)
- Touching back (back pain)
- Touching neck (neck pain)
- Nausea or vomiting
- Using a fan (hand or paper) / feeling hot
- Hitting / slapping someone
- Kicking someone
- Pushing someone
- Patting someone's back
- Pointing at someone with a finger
- Hugging someone
- Giving something to someone
- Touching someone's pocket
- Shaking hands
- Walking towards each other
- Walking away from each other
File Naming Convention
Each file name encodes the following information:
- Setup number
- Camera ID
- Performer ID
- Replication number
- Action class label
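The fields above can be pulled out of a file name with a short parser. This is a sketch under an assumption: the page does not spell out the exact format, so the code below assumes the common NTU-style pattern `SsssCcccPpppRrrrAaaa` (e.g. `S001C002P003R002A013.skeleton`), with three digits per field.

```python
import re

# Hypothetical pattern: assumes the common NTU-style file name
# SsssCcccPpppRrrrAaaa, e.g. "S001C002P003R002A013.skeleton".
# The exact convention is not stated on this page.
NAME_RE = re.compile(
    r"S(?P<setup>\d{3})C(?P<camera>\d{3})"
    r"P(?P<performer>\d{3})R(?P<replication>\d{3})A(?P<action>\d{3})"
)

def parse_name(filename: str) -> dict:
    """Extract setup, camera, performer, replication, and action fields."""
    m = NAME_RE.search(filename)
    if m is None:
        raise ValueError(f"unrecognized filename: {filename}")
    return {k: int(v) for k, v in m.groupdict().items()}

print(parse_name("S001C002P003R002A013.skeleton"))
# {'setup': 1, 'camera': 2, 'performer': 3, 'replication': 2, 'action': 13}
```

Parsing the action field this way makes it easy to map each file to its class label when building a training set.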
File Content
Each file contains per‑frame skeletal coordinates. Only the first three coordinate columns are retained by the accompanying code when training neural networks.
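Keeping only the first three columns can be sketched as below. Note the assumptions: the page does not describe the exact file layout, so this sketch assumes each line holds one joint's values as whitespace‑separated numbers, with the coordinates of interest in the first three columns.

```python
from io import StringIO

# Hypothetical reader: assumes each line holds one joint's values as
# whitespace-separated numbers, and that the coordinates to keep are
# the first three columns. The real NTU file layout may differ.
def load_xyz(lines):
    rows = []
    for line in lines:
        values = [float(v) for v in line.split()]
        rows.append(values[:3])  # keep only the first three columns
    return rows

# Example with an in-memory file instead of a real dataset file:
fake = StringIO("0.1 0.2 0.3 9 9\n0.4 0.5 0.6 9 9\n")
print(load_xyz(fake))  # [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
```

The resulting per‑joint triples can then be stacked per frame into the tensor shape a skeleton‑based action‑recognition model expects.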