
FEMNIST

FEMNIST is an image classification dataset with 62 classes (10 digits, 26 lowercase letters, 26 uppercase letters) and 28×28-pixel images (optionally upscaled to 128×128 pixels), spread across 3,500 users. It is built by partitioning the EMNIST dataset by writer, so that each user's data consists of characters written by a single author.

Updated 7/7/2022
github

Description

Dataset Overview

1. FEMNIST

  • Type: Image dataset
  • Details: Contains 62 classes (10 digits, 26 lowercase letters, 26 uppercase letters), with 28×28-pixel images (optionally resized to 128×128 pixels) and a total of 3,500 users.
  • Task: Image classification
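
The per-writer partitioning described above can be illustrated with a minimal sketch. It assumes each sample arrives as a `(writer_id, image, label)` tuple; that record layout, the function name `partition_by_writer`, and the toy data are hypothetical and are not taken from the dataset's own tooling.

```python
from collections import defaultdict


def partition_by_writer(samples):
    """Group (writer_id, image, label) records into one client per writer.

    The tuple layout is an assumption made for illustration only.
    """
    clients = defaultdict(lambda: {"x": [], "y": []})
    for writer_id, image, label in samples:
        clients[writer_id]["x"].append(image)
        clients[writer_id]["y"].append(label)
    return dict(clients)


# Toy records: each image is a flattened 28x28 placeholder (784 zeros),
# and labels fall in the range 0..61 for the 62 classes.
blank = [0.0] * (28 * 28)
samples = [
    ("writer_a", blank, 3),
    ("writer_b", blank, 40),
    ("writer_a", blank, 61),
]

clients = partition_by_writer(samples)
print(sorted(clients))                # ['writer_a', 'writer_b']
print(len(clients["writer_a"]["y"]))  # 2
```

Because every client holds the handwriting of exactly one author, the resulting partition is naturally non-i.i.d., which is the property that makes the dataset useful for federated learning experiments.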

2. Sentiment140

  • Type: Text dataset containing tweets
  • Details: 660,120 users
  • Task: Sentiment analysis
  • Format: Includes training and test sets; data stored in JSON format, each user’s data includes tweet content and sentiment label.

3. Shakespeare

  • Type: Text dataset containing Shakespeare dialogues
  • Details: 1,129 users (reduced to 660 users)
  • Task: Next‑character prediction
  • Format: Text format containing dialogue content.
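
As a rough illustration of the next-character prediction task, the sketch below turns a piece of text into (context, next character) training pairs with a sliding window. The function name `next_char_pairs` and the window length are hypothetical choices, not the dataset's actual preprocessing.

```python
def next_char_pairs(text, seq_len):
    """Return (context, next_char) pairs using a sliding window.

    Each context is seq_len characters long; the target is the
    character that immediately follows it.
    """
    return [
        (text[i:i + seq_len], text[i + seq_len])
        for i in range(len(text) - seq_len)
    ]


pairs = next_char_pairs("To be, or not", seq_len=5)
print(pairs[0])    # ('To be', ',')
print(len(pairs))  # 8
```

In a federated setting, a window like this would be applied separately to each user's lines so that every client keeps its own pairs.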

4. CelebA

  • Type: Image dataset
  • Details: 9,343 users (excluding celebrities with fewer than 5 images)
  • Task: Image classification (smiling vs. non‑smiling)

5. Synthetic Dataset

  • Type: Synthetic dataset
  • Details: Users can customize the number of devices, number of classes, dimensions, etc.
  • Task: Classification
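
To show what customizing the number of devices, classes, and dimensions might look like, here is a generic sketch of a synthetic federated classification generator. It is not the generation procedure used by this dataset; the function `make_synthetic`, its parameters, and the Gaussian-cluster model are all assumptions made for illustration.

```python
import random


def make_synthetic(num_devices, num_classes, dim, samples_per_device, seed=0):
    """Generate toy per-device classification data (illustrative only).

    Each device draws its own class centers, so the data differs
    from device to device (non-i.i.d.).
    """
    rng = random.Random(seed)
    devices = {}
    for d in range(num_devices):
        # One random center per class, private to this device.
        centers = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_classes)
        ]
        x, y = [], []
        for _ in range(samples_per_device):
            label = rng.randrange(num_classes)
            # A sample is its class center plus small Gaussian noise.
            x.append([c + rng.gauss(0.0, 0.1) for c in centers[label]])
            y.append(label)
        devices[f"device_{d}"] = {"x": x, "y": y}
    return devices


data = make_synthetic(num_devices=3, num_classes=4, dim=5, samples_per_device=10)
print(len(data))                       # 3
print(len(data["device_0"]["x"][0]))   # 5
```

Fixing the seed keeps the generated data reproducible across runs, which helps when comparing federated algorithms on the same synthetic split.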

6. Reddit

  • Type: Text dataset containing Reddit comments
  • Details: 1,660,820 users, total of 56,587,343 comments
  • Task: Next‑word prediction

7. CIFAR 10 / CIFAR 100

  • Type: Image classification dataset
  • Details: Each dataset contains 60,000 color images of size 32×32 pixels, with 10 classes in CIFAR 10 and 100 classes in CIFAR 100, split into 50,000 training and 10,000 test images.
  • Task: Image classification

8. FedVision - Street Dataset

  • Type: Real‑world object detection dataset
  • Details: Contains 956 samples across 7 classes, partitioned across 5 or 20 devices
  • Task: Object detection
  • Format: Includes image data and training labels, stored in JSON format.

9. EMNIST

  • Type: Extended MNIST dataset containing English letters and digits
  • Details: Divided into 6 subsets: By_Class, By_Merge, Balanced, Digits, Letters, and MNIST
  • Task: Classification

10. MovieLens

  • Type: Structured dataset
  • Details: Contains user ratings for movies, along with movie attributes; ratings are on a 5‑point scale
  • Task: Recommendation system
  • Format: Includes ratings.dat, users.dat, and movies.dat
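
A minimal sketch of reading `ratings.dat` and grouping ratings by user follows. It assumes the `UserID::MovieID::Rating::Timestamp` line layout used by the MovieLens 1M release; the listing above does not say which MovieLens version is meant, so treat the delimiter and field order as assumptions. The function name `parse_ratings` and the sample lines are invented for illustration.

```python
from collections import defaultdict


def parse_ratings(lines):
    """Group (movie_id, rating) pairs by user.

    Assumes 'UserID::MovieID::Rating::Timestamp' lines, as in the
    MovieLens 1M layout; other releases use different formats.
    """
    per_user = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, movie_id, rating, _timestamp = line.split("::")
        per_user[int(user_id)].append((int(movie_id), int(rating)))
    return dict(per_user)


# Invented sample lines in the assumed format, not real dataset rows.
sample = [
    "1::10::5::978300760",
    "1::20::3::978302109",
    "2::30::4::978298709",
]

by_user = parse_ratings(sample)
print(by_user[1])  # [(10, 5), (20, 3)]
print(by_user[2])  # [(30, 4)]
```

Grouping by user in this way gives one natural client per user for a federated recommendation experiment. To read the real file, pass an open file handle in place of `sample`.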

11. Credit

  • Type: Structured dataset
  • Details: Contains user attributes such as gender, education level, etc. Credit 1 includes 150,000 samples with 10 attributes; Credit 2 includes 30,000 samples with 25 attributes
  • Task: Classification (predict whether a user will default on repayment)

12. ModelNet

  • Type: Image classification dataset
  • Details: Contains 12,311 3D CAD models from 40 categories, rendered as images from various viewpoints
  • Task: Image classification
  • Processing: Requires converting the CAD models to images using the open‑source software Blender

13. PersonaChat

  • Type: Dialogue dataset
  • Details: Naturally non‑i.i.d.; partitioned by assigned persona into 17,568 clients

14. KWS

  • Type: Speech command dataset
  • Task: Limited‑vocabulary speech recognition

15. Flickr

  • Type: Personalized image aesthetic dataset
  • Task: Personalized image classification

Topics

Image Classification
Character Recognition

Source

Organization: github

Created: 11/17/2020
