FEMNIST
FEMNIST is an image classification dataset with 62 classes (10 digits, 26 lowercase letters, 26 uppercase letters). Images are 28×28 pixels (optionally upscaled to 128×128 pixels), and the data covers 3,500 users. The dataset is derived by partitioning EMNIST by writer, so that each user's data consists of characters written by a single author.
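The per-writer partitioning can be illustrated with a minimal sketch. The record layout `(writer_id, image, label)` and the function name `partition_by_writer` are assumptions made for illustration; they are not taken from the dataset's own tooling.

```python
from collections import defaultdict


def partition_by_writer(samples):
    """Group (writer_id, image, label) records into one shard per writer.

    Hypothetical helper: each shard plays the role of one federated user.
    """
    shards = defaultdict(list)
    for writer_id, image, label in samples:
        shards[writer_id].append((image, label))
    return dict(shards)


# Toy records: images are stand-in strings rather than 28x28 pixel arrays.
samples = [
    ("w01", "img_a", 3),
    ("w02", "img_b", 17),
    ("w01", "img_c", 42),
]
shards = partition_by_writer(samples)
```

Each key of `shards` then corresponds to one user, and that user's list holds only characters from a single author.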
Description
Dataset Overview
1. FEMNIST
- Type: Image dataset
- Details: Contains 62 classes (10 digits, 26 lowercase letters, 26 uppercase letters), image size 28×28 pixels, optionally adjustable to 128×128 pixels, with a total of 3,500 users.
- Task: Image classification
2. Sentiment140
- Type: Text dataset containing tweets
- Details: 660,120 users
- Task: Sentiment analysis
- Format: Includes training and test sets; data stored in JSON format, each user’s data includes tweet content and sentiment label.
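As a rough sketch of reading per-user JSON data of this kind: the key names `users`, `user_data`, `x`, and `y` below are assumptions chosen for illustration, not a confirmed schema, and the tweets are made up.

```python
import json

# Made-up sample; the key names are assumed, not taken from the dataset files.
raw = """
{
  "users": ["alice", "bob"],
  "user_data": {
    "alice": {"x": ["great day", "so tired"], "y": [1, 0]},
    "bob": {"x": ["love this"], "y": [1]}
  }
}
"""


def load_user_examples(text):
    """Return {user: [(tweet, sentiment_label), ...]} from a JSON string."""
    data = json.loads(text)
    result = {}
    for user in data["users"]:
        record = data["user_data"][user]
        # Pair each tweet with its label, keeping the original order.
        result[user] = list(zip(record["x"], record["y"]))
    return result


examples = load_user_examples(raw)
```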
3. Shakespeare
- Type: Text dataset containing Shakespeare dialogues
- Details: 1,129 users (reduced to 660 users)
- Task: Next‑character prediction
- Format: Text format containing dialogue content.
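Next-character prediction can be framed as turning raw text into (context, next character) pairs. The sketch below assumes a fixed context length; `next_char_pairs` and `context_len` are hypothetical names, and the window size used here is arbitrary.

```python
def next_char_pairs(text, context_len):
    """Slide a window over text, yielding (context, next_char) training pairs."""
    pairs = []
    for i in range(len(text) - context_len):
        context = text[i:i + context_len]
        target = text[i + context_len]
        pairs.append((context, target))
    return pairs


pairs = next_char_pairs("To be, or not", context_len=5)
```

For this 13-character string and a window of 5, the first pair is `("To be", ",")`.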
4. CelebA
- Type: Image dataset
- Details: 9,343 users (excluding celebrities with fewer than 5 images)
- Task: Image classification (smiling vs. non‑smiling)
5. Synthetic Dataset
- Type: Synthetic dataset
- Details: Users can customize the number of devices, number of classes, dimensions, etc.
- Task: Classification
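To make the customizable parameters concrete, here is a simplified stand-in generator. It is not the dataset's actual generation procedure: the Gaussian class centers, the noise level, and every name (`make_synthetic`, `samples_per_device`, `seed`) are assumptions for illustration.

```python
import random


def make_synthetic(num_devices, num_classes, dim, samples_per_device, seed=0):
    """Generate toy per-device classification data (simplified stand-in)."""
    rng = random.Random(seed)
    devices = []
    for _ in range(num_devices):
        # Each device draws its own class centers, so devices are non-identical.
        centers = [[rng.gauss(0, 1) for _ in range(dim)]
                   for _ in range(num_classes)]
        data = []
        for _ in range(samples_per_device):
            label = rng.randrange(num_classes)
            # Feature vector = class center plus small Gaussian noise.
            features = [c + rng.gauss(0, 0.1) for c in centers[label]]
            data.append((features, label))
        devices.append(data)
    return devices


devices = make_synthetic(num_devices=3, num_classes=4, dim=5,
                         samples_per_device=10)
```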
6. Reddit
- Type: Text dataset containing Reddit comments
- Details: 1,660,820 users, total of 56,587,343 comments
- Task: Next‑word prediction
7. CIFAR-10 / CIFAR-100
- Type: Image classification dataset
- Details: Each dataset contains 60,000 color images of size 32×32 pixels, split into 50,000 training and 10,000 test images; CIFAR-10 has 10 classes and CIFAR-100 has 100 classes.
- Task: Image classification
8. FedVision - Street Dataset
- Type: Real‑world object detection dataset
- Details: Contains 956 samples in 7 classes, partitioned across 5 or 20 devices
- Task: Object detection
- Format: Includes image data and training labels, stored in JSON format.
9. EMNIST
- Type: Extended MNIST dataset containing English letters and digits
- Details: Divided into 6 subsets: By_Class, By_Merge, Balanced, Digits, Letters, and MNIST
- Task: Classification
10. MovieLens
- Type: Structured dataset
- Details: Contains user ratings for movies, along with movie attributes; ratings are on a 5-point scale
- Task: Recommendation system
- Format: Includes ratings.dat, users.dat, and movies.dat
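A minimal sketch of grouping ratings by user, assuming `ratings.dat` uses the `UserID::MovieID::Rating::Timestamp` line layout of the MovieLens 1M release (an assumption about which release is meant). The helper name `parse_ratings` is hypothetical.

```python
from collections import defaultdict


def parse_ratings(lines):
    """Parse '::'-delimited rating lines into {user_id: [(movie_id, rating, ts)]}."""
    by_user = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, movie_id, rating, timestamp = line.split("::")
        by_user[int(user_id)].append((int(movie_id), int(rating), int(timestamp)))
    return dict(by_user)


# In-memory sample lines in the assumed layout.
sample_lines = [
    "1::1193::5::978300760",
    "1::661::3::978302109",
    "2::1357::5::978298709",
]
ratings = parse_ratings(sample_lines)
```

Grouping by user ID gives one natural way to treat each user as a separate client.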
11. Credit
- Type: Structured dataset
- Details: Contains user attributes such as gender, education level, etc. Credit 1 includes 150,000 samples with 10 attributes; Credit 2 includes 30,000 samples with 25 attributes
- Task: Classification (predict whether a user will default on repayment)
12. ModelNet
- Type: Image classification dataset
- Details: Contains 12,311 3D models from 40 categories, rendered from various viewpoints
- Task: Image classification
- Processing: Requires converting the CAD models to images using the open-source software Blender
13. PersonaChat
- Type: Dialogue dataset
- Details: Naturally non-i.i.d.; partitioned by assigned persona into 17,568 clients
14. KWS
- Type: Speech command dataset
- Task: Limited‑vocabulary speech recognition
15. Flickr
- Type: Personalized image aesthetic dataset
- Task: Personalized image classification
Source
Organization: github
Created: 11/17/2020