MinDat-Mineral-Image-Dataset

A dataset of over 500,000 labeled mineral images sourced from mindat.org. It includes two CSV files that store the image URLs and the cleaned label information.

Updated 9/22/2023

Description

MinDat-Mineral-Image-Dataset Overview

Basic Dataset Information

  • Dataset Name: MinDat-Mineral-Image-Dataset
  • Volume: Over 500,000 mineral images
  • Format: Contains two CSV files
    • img_url_list.csv: Contains image URLs and their original labels
    • img_url_list_converted.csv: Contains image URLs with cleaned labels; entries for unlabeled images have been removed
  • Source: Scraped from mindat.org
  • Processing Time:
    • CSV file generation takes ~10 hours
    • Image download takes ~24 hours (assuming network speed >10 Mbps)
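
As a minimal sketch of how the converted CSV might be consumed, the snippet below counts images per label. The column name `label` and the presence of a header row are assumptions; the actual layout of img_url_list_converted.csv is not documented here, so adjust them to match the file.

```python
import csv
from collections import Counter


def count_labels(csv_path, label_column="label"):
    """Return a Counter mapping each label to its number of rows.

    `label_column` is an assumed column name, not one confirmed
    by the dataset; the file is assumed to start with a header row.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            label = (row.get(label_column) or "").strip()
            if label:  # ignore rows that carry no label
                counts[label] += 1
    return counts
```

Calling `count_labels("img_url_list_converted.csv").most_common(10)` would then list the ten most frequent labels, which is a quick way to inspect class imbalance before training.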

Dataset Generation Process

  1. Run make_url_list.py to fetch all image URLs and save them to the img_urls directory.
  2. Run the concat_url_files script to merge URL files into img_url_list.csv.
  3. Run convert_img_url_list.py to clean labels and generate img_url_list_converted.csv.
  4. Run download_images.py to download all images to the specified directory.
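
Steps 2 and 4 above can be sketched roughly as follows. This is not the repository's code: the function names `merge_url_files`, `fetch_bytes`, and `download_images` are hypothetical, and the sketch assumes the per-page URL files are header-less `*.csv` files with the image URL in the first column.

```python
import csv
import urllib.request
from pathlib import Path


def merge_url_files(src_dir, out_path):
    """Concatenate every *.csv file in src_dir into one CSV (step 2)."""
    out_path = Path(out_path)
    # List the parts first so the output file is never merged into itself.
    parts = sorted(p for p in Path(src_dir).glob("*.csv")
                   if p.resolve() != out_path.resolve())
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for part in parts:
            with open(part, newline="", encoding="utf-8") as f:
                for row in csv.reader(f):
                    if row:  # skip blank lines
                        writer.writerow(row)
                        rows_written += 1
    return rows_written


def fetch_bytes(url, timeout=30):
    """Fetch the raw bytes behind a URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_images(csv_path, out_dir, fetch=fetch_bytes):
    """Save the image behind each URL (first column) into out_dir (step 4).

    Files that already exist are skipped, so an interrupted run can resume.
    Returns the number of files newly written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for index, row in enumerate(csv.reader(f)):
            if not row:
                continue
            url = row[0]
            # Assumed naming scheme: row index plus the URL's file extension.
            suffix = Path(url.split("?")[0]).suffix or ".jpg"
            target = out / f"{index:07d}{suffix}"
            if target.exists():
                continue
            try:
                target.write_bytes(fetch(url))
                saved += 1
            except (OSError, ValueError):
                # Network errors and malformed URLs are skipped, not fatal.
                continue
    return saved
```

The `fetch` parameter is injectable so the download loop can be exercised without network access; given the roughly 24-hour download time, the skip-if-exists check is the part most worth keeping in any real implementation.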

Dataset Characteristics

  • Some images have extremely high resolution; the full set of images totals around 400 GB.
  • During label cleaning, variant labels such as “Capped Quartz, Chalcedony Quartz” were simplified to “Quartz”.
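
The label simplification described above could look something like the sketch below. The actual rules in convert_img_url_list.py are not given here, so this is only one plausible approach: `simplify_label` and the caller-supplied `base_names` list are hypothetical, and the matching handles single-word base names only.

```python
import re


def simplify_label(label, base_names):
    """Map a variant label to a base mineral name found inside it.

    `base_names` is a hypothetical list of single-word mineral names
    supplied by the caller. If none of them occurs in the label as a
    whole word, the label is returned unchanged (whitespace stripped).
    """
    words = re.findall(r"\w+", label.lower())
    for base in base_names:
        if base.lower() in words:
            return base
    return label.strip()
```

With `base_names=["Quartz"]`, both `simplify_label("Capped Quartz", ...)` and `simplify_label("Chalcedony Quartz", ...)` return `"Quartz"`, while a label such as `"Pyrite"` passes through unchanged.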

Topics

Mineral Recognition
Image Dataset

Source

Organization: github

Created: 6/25/2017
