MinDat-Mineral-Image-Dataset

A dataset of over 500,000 labeled mineral images sourced from mindat.org. It is distributed as two CSV files that store the image URLs and the cleaned label information.

Updated 9/22/2023

Description

MinDat-Mineral-Image-Dataset Overview

Basic Dataset Information

Dataset Name: MinDat-Mineral-Image-Dataset
Volume: Over 500,000 mineral images
Format: Contains two CSV files
- img_url_list.csv: Contains image URLs and their original labels
- img_url_list_converted.csv: Contains image URLs and their cleaned labels, with unlabeled images removed
Source: Scraped from mindat.org
Processing Time:
- CSV file generation takes ~10 hours
- Image download takes ~24 hours (assuming network speed >10 Mbps)
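
As a rough cross-check of the download estimate, the sketch below relates transfer size, time, and sustained throughput. It assumes the roughly 400 GB total reported for the dataset, decimal units (1 GB = 1000 MB), and no protocol overhead or server-side rate limiting; the function names are illustrative and not part of the dataset's scripts.

```python
def required_mbps(total_gb: float, hours: float) -> float:
    """Sustained throughput (megabits/s) needed to move total_gb in the given hours."""
    total_megabits = total_gb * 1000 * 8  # decimal GB -> megabits
    return total_megabits / (hours * 3600)


def hours_at(total_gb: float, mbps: float) -> float:
    """Hours needed to move total_gb at a sustained rate of mbps."""
    return total_gb * 1000 * 8 / mbps / 3600


print(f"{required_mbps(400, 24):.1f} Mbps")  # rate needed for ~400 GB in 24 h
print(f"{hours_at(400, 10):.1f} h")          # time for ~400 GB at exactly 10 Mbps
```

Under these assumptions, the 24-hour figure corresponds to a link noticeably faster than the 10 Mbps floor, so actual download time will depend heavily on the connection.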

Dataset Generation Process

Run make_url_list.py to fetch all image URLs and save them to the img_urls directory.
Run the concat_url_files script to merge URL files into img_url_list.csv.
Run convert_img_url_list.py to clean labels and generate img_url_list_converted.csv.
Run download_images.py to download all images to the specified directory.
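
The merge step (the second step above) can be sketched as follows. This is a minimal illustration, not the repository's actual concat_url_files script: it assumes the per-batch files in img_urls are CSV files that each begin with the same header row, and the function signature is hypothetical.

```python
import csv
from pathlib import Path


def concat_url_files(src_dir: str, out_path: str) -> int:
    """Merge every CSV in src_dir into out_path; return the number of data rows written."""
    out_resolved = Path(out_path).resolve()
    # Collect the part files first, and never read the output file as an input.
    parts = sorted(
        p for p in Path(src_dir).glob("*.csv") if p.resolve() != out_resolved
    )
    rows_written = 0
    header_written = False
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for part in parts:
            with open(part, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty part files
                if not header_written:
                    writer.writerow(header)  # keep the header only once
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

A call such as concat_url_files("img_urls", "img_url_list.csv") would then produce the merged file; using the csv module rather than raw line concatenation keeps labels that contain commas intact.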

Dataset Characteristics

Some images have extremely high resolution; the downloaded images total around 400 GB.
During label cleaning, variant labels such as “Capped Quartz, Chalcedony Quartz” were simplified to “Quartz”.
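
The kind of simplification described above might look like the sketch below. The actual rules used by convert_img_url_list.py are not documented here, so this is an assumption: it treats the last word of each comma-separated variant as the base mineral and keeps it only if it appears in a caller-supplied set of known names. The function name and the base_minerals parameter are hypothetical.

```python
def simplify_label(label: str, base_minerals: set[str]) -> str:
    """Collapse a variant label to its base mineral when one is recognised."""
    # A raw label may list several variants separated by commas.
    for variant in label.split(","):
        words = variant.strip().split()
        # Treat the last word as the base mineral, e.g. "Capped Quartz" -> "Quartz".
        if words and words[-1].title() in base_minerals:
            return words[-1].title()
    return label.strip()  # leave unrecognised labels unchanged


known = {"Quartz"}
print(simplify_label("Capped Quartz, Chalcedony Quartz", known))
print(simplify_label("Fluorite", known))
```

Labels whose last word is not in the known set pass through unchanged, which makes it easy to review what the rule did not cover.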

Topics

Mineral Recognition

Image Dataset

Source

Organization: github

Created: 6/25/2017
