MinDat-Mineral-Image-Dataset
A dataset containing over 500,000 mineral images, each labeled, sourced from mindat.org. The dataset includes two CSV files that store image URLs and cleaned label information.
Updated 9/22/2023
Description
MinDat-Mineral-Image-Dataset Overview
Basic Dataset Information
- Dataset Name: MinDat-Mineral-Image-Dataset
- Volume: Over 500,000 mineral images
- Format: Two CSV files
  - img_url_list.csv: Contains image URLs and their original labels
  - img_url_list_converted.csv: Contains image URLs and cleaned labels, with unlabeled images removed
- Source: Scraped from mindat.org
- Processing Time:
  - CSV file generation takes ~10 hours
  - Image download takes ~24 hours (assuming a network speed above 10 Mbps)
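As a rough sanity check on the download estimate, the short sketch below relates transfer time to bandwidth for the roughly 400 GB total size reported for this dataset. It assumes decimal gigabytes and a fully sustained link with no protocol overhead or server-side throttling, so real downloads will likely take longer. The function names are illustrative and are not part of the original scripts.

```python
def transfer_hours(size_gb: float, mbps: float) -> float:
    """Hours needed to move size_gb (decimal gigabytes) at mbps (megabits per second)."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (mbps * 1e6)
    return seconds / 3600


def required_mbps(size_gb: float, hours: float) -> float:
    """Sustained megabits per second needed to move size_gb within the given hours."""
    bits = size_gb * 1e9 * 8
    return bits / (hours * 3600) / 1e6


if __name__ == "__main__":
    # Idealized figures: no overhead, retries, or rate limiting are modeled.
    print(f"400 GB at 10 Mbps: {transfer_hours(400, 10):.1f} hours")
    print(f"400 GB in 24 hours: {required_mbps(400, 24):.1f} Mbps sustained")
```

Under these assumptions, a link running at exactly 10 Mbps would need about 89 hours, so the ~24 hour figure corresponds to a sustained rate of roughly 37 Mbps.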
Dataset Generation Process
- Run make_url_list.py to fetch all image URLs and save them to the img_urls directory.
- Run the concat_url_files script to merge the URL files into img_url_list.csv.
- Run convert_img_url_list.py to clean the labels and generate img_url_list_converted.csv.
- Run download_images.py to download all images to the specified directory.
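The final download step can be pictured with a minimal sketch like the one below. It is not the repository's download_images.py. It assumes the converted CSV has columns named url and label, which are hypothetical names chosen for illustration, so check the actual header before using it. Files are grouped into one folder per label and existing files are skipped, so an interrupted run can be resumed.

```python
import csv
import hashlib
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def http_get(url: str) -> bytes:
    """Fetch the raw bytes of one image over HTTP(S)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def local_path(out_dir: Path, label: str, url: str) -> Path:
    """Build a per-label file path; the URL hash keeps file names unique."""
    suffix = Path(urlparse(url).path).suffix or ".jpg"
    name = hashlib.sha1(url.encode("utf-8")).hexdigest()[:16] + suffix
    safe = "".join(c if c.isalnum() else "_" for c in label).strip("_")
    return out_dir / (safe or "unknown") / name


def download_all(csv_path, out_dir, fetch=http_get) -> int:
    """Download every row of the CSV and return how many new files were saved."""
    out_dir = Path(out_dir)
    saved = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        # "url" and "label" are assumed column names, not confirmed by the source.
        for row in csv.DictReader(f):
            target = local_path(out_dir, row["label"], row["url"])
            if target.exists():
                continue  # already downloaded on a previous run
            try:
                data = fetch(row["url"])
            except OSError:
                continue  # network or HTTP error: skip this image, keep going
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            saved += 1
    return saved
```

A call such as download_all("img_url_list_converted.csv", "images") would then populate the images directory. The fetch parameter exists so the network call can be replaced, for example to add rate limiting out of courtesy to the source site.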
Dataset Characteristics
- Some images have extremely high resolution, and the full dataset totals around 400 GB.
- During label cleaning, variant labels such as “Capped Quartz, Chalcedony Quartz” were simplified to “Quartz”.
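One way to picture this kind of label simplification is the sketch below. It is an illustration and not the logic of convert_img_url_list.py: it assumes a known set of base mineral names (the small BASE_MINERALS set here is hypothetical) and collapses any label containing one of them to that base name, leaving other labels unchanged.

```python
# Hypothetical base names for illustration; a real list would be much longer.
BASE_MINERALS = {"quartz", "calcite", "fluorite"}


def simplify_label(label: str, base_minerals=BASE_MINERALS) -> str:
    """Return the base mineral if a word of the label matches one, else the label as-is."""
    cleaned = label.strip()
    # Scan from the last word, since variants tend to be written "<modifier> <mineral>".
    for word in reversed(cleaned.replace(",", " ").split()):
        if word.lower() in base_minerals:
            return word.capitalize()
    return cleaned


if __name__ == "__main__":
    for raw in ["Capped Quartz", "Chalcedony Quartz", "Pyrite"]:
        print(f"{raw} -> {simplify_label(raw)}")
```

A word-matching rule like this is simple but coarse: it would miss variety names that do not contain the base mineral's name, so an explicit mapping table may be a better fit for such cases.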
Topics
Mineral Recognition
Image Dataset
Source
Organization: github
Created: 6/25/2017

