Semi-Aves, Semi-Fungi, Semi-CUB, Semi-iNat

These four datasets are benchmarks for evaluating semi‑supervised learning on fine‑grained image classification. They cover birds (Semi‑Aves), fungi (Semi‑Fungi), a semi‑supervised version of the CUB‑200‑2011 bird dataset (Semi‑CUB), and natural species (Semi‑iNat).

Updated 12/22/2021
Description

Dataset Overview

Dataset List

  1. Semi‑Aves: Dataset from the Semi‑Aves Challenge used in the CVPR 2020 FGVC7 workshop.
  2. Semi‑Fungi: Built from the 2018 FGVCx Fungi Classification Challenge for the CVPR 2018 FGVC5 workshop.
  3. Semi‑CUB: Constructed from the Caltech‑UCSD Birds‑200‑2011 dataset.
  4. Semi‑iNat: New dataset for the CVPR 2021 FGVC8 workshop Semi‑iNat Challenge.

Dataset Structure

Split information for each dataset is stored in data/${dataset}/${split}.txt, where ${split} is one of:

  • l_train: labeled in‑domain data
  • u_train_in: unlabeled in‑domain data
  • u_train_out: unlabeled out‑of‑domain data
  • u_train: union of u_train_in and u_train_out
  • val: validation set
  • l_train_val: union of l_train and val
  • test: test set

Each line lists a filename and its corresponding class label.
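
The layout above can be read with a few lines of Python. This is a minimal sketch, not the repository's loader: it assumes each line holds a filename and a label separated by whitespace, with the label as the last field. The helper names (split_file, parse_split_lines, load_split, is_union) are hypothetical.

```python
from pathlib import Path

# Split names as listed above.
SPLITS = ("l_train", "u_train_in", "u_train_out", "u_train",
          "val", "l_train_val", "test")


def split_file(dataset, split, root="data"):
    """Build the path data/${dataset}/${split}.txt for a known split name."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return Path(root) / dataset / f"{split}.txt"


def parse_split_lines(lines):
    """Parse 'filename label' lines into (filename, label) pairs.

    Assumption: fields are whitespace-separated and the label is the last
    field. Labels are kept as strings because their exact format is not
    specified in this description.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        filename, label = line.rsplit(None, 1)
        samples.append((filename, label))
    return samples


def load_split(dataset, split, root="data"):
    """Read one split file from disk and parse it."""
    with open(split_file(dataset, split, root), encoding="utf-8") as f:
        return parse_split_lines(f)


def is_union(whole, part_a, part_b):
    """Check that `whole` contains exactly the samples of the two parts."""
    return set(whole) == set(part_a) | set(part_b)
```

For example, is_union(load_split("semi_aves", "u_train"), load_split("semi_aves", "u_train_in"), load_split("semi_aves", "u_train_out")) checks the documented relationship between the three unlabeled splits, assuming the files are present locally.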

Storage

  • Semi‑Aves: data stored in data/semi_aves.
  • Semi‑Fungi and Semi‑CUB: images stored in data/semi_fungi/images and data/cub/images respectively.
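
A small lookup table can keep these locations in one place. The sketch below is illustrative only: the dictionary keys reuse the directory names given above, Semi-iNat is left out because its location is not stated here, and it assumes that filenames in the split files are relative to the listed directory. The image_path helper is hypothetical.

```python
from pathlib import Path

# Image locations as listed above. Semi-iNat is omitted because its
# location is not stated in this description.
IMAGE_ROOTS = {
    "semi_aves": Path("data/semi_aves"),
    "semi_fungi": Path("data/semi_fungi/images"),
    "cub": Path("data/cub/images"),
}


def image_path(dataset, filename):
    """Join a split-file filename onto the dataset's image root.

    Assumption: filenames in split files are relative to the image root.
    """
    if dataset not in IMAGE_ROOTS:
        raise KeyError(f"no image root known for dataset {dataset!r}")
    return IMAGE_ROOTS[dataset] / filename
```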

Notes

  • Semi‑Fungi: Images resized so the longest side is 300 px.
  • Semi‑Aves: Provides additional cross‑validation splits. Labels for the unlabeled data are withheld to prevent label leakage.
  • Semi‑Aves and Semi‑Fungi: Species name files are included.
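
The Semi-Fungi resizing rule (longest side equal to 300 px) comes down to simple arithmetic on the image dimensions. The sketch below computes the target size only. The function name, the rounding to the nearest pixel, and the scaling of images smaller than 300 px are assumptions, since the description does not say how those cases were handled.

```python
def fit_longest_side(width, height, longest=300):
    """Return (new_width, new_height) with the longer side scaled to `longest`.

    The aspect ratio is preserved up to rounding to whole pixels.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longer = max(width, height)
    # Round to the nearest pixel, but never collapse a side to zero.
    new_width = max(1, round(width * longest / longer))
    new_height = max(1, round(height * longest / longer))
    return new_width, new_height
```

A 1200 x 800 image maps to 300 x 200, and a 600 x 900 image maps to 200 x 300.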

Training and Evaluation

CVPR Papers

Code is provided for supervised training, self‑training, pseudo‑labeling, and curriculum pseudo‑labeling. It is based on an existing PyTorch implementation.
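
To make the pseudo-labeling idea concrete, here is a minimal, framework-free sketch of its selection step: the model's predictions on unlabeled images are kept as training labels only when the predicted probability is high enough. This is not the repository's code. The function names and the 0.95 threshold are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(logits_batch, threshold=0.95):
    """Select confident predictions on unlabeled samples.

    Returns a list of (sample_index, predicted_class, confidence) for every
    sample whose top probability is at least `threshold` (an assumed value).
    """
    selected = []
    for i, logits in enumerate(logits_batch):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((i, probs.index(confidence), confidence))
    return selected
```

The selected samples would then be added to the labeled pool with their predicted classes. Raising the threshold trades fewer pseudo-labels for cleaner ones.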

BMVC Papers

Hierarchical supervision based on coarse labels is added for semi‑supervised learning. Commands and parameters for hierarchical training are supplied.
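
One common way to use coarse labels is to sum the predicted probabilities of all fine classes that belong to the same coarse group, then penalize the model when the true coarse group gets low probability. The sketch below illustrates that idea only. Whether the BMVC code uses exactly this loss is not stated here, and the function names and the fine_to_coarse mapping are hypothetical.

```python
import math


def coarse_probs(fine_probs, fine_to_coarse, num_coarse):
    """Sum fine-class probabilities into their coarse groups.

    fine_to_coarse[k] is the coarse-group index of fine class k.
    """
    out = [0.0] * num_coarse
    for k, p in enumerate(fine_probs):
        out[fine_to_coarse[k]] += p
    return out


def coarse_nll(fine_probs, fine_to_coarse, num_coarse, coarse_label):
    """Negative log-likelihood of the true coarse label.

    The probability is clamped away from zero to avoid log(0).
    """
    p = coarse_probs(fine_probs, fine_to_coarse, num_coarse)[coarse_label]
    return -math.log(max(p, 1e-12))
```

With fine probabilities [0.5, 0.25, 0.25] and the mapping [0, 0, 1], the coarse probabilities are [0.75, 0.25]. A sample known only to belong to coarse group 0 therefore incurs a loss of -log(0.75).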

Pre‑trained Models

Supervised models, MoCo pre‑trained models, and MoCo + supervised models are available for Semi‑Aves and Semi‑Fungi. Models can be downloaded via the provided links.

Related Competitions

  • Semi‑iNat 2021 and Semi‑Aves 2020 competitions: each has a challenge website, a Kaggle page, and a technical report.

Topics

Semi-supervised Learning
Image Classification

Source

Organization: github

Created: 4/1/2021
