Shopping MMLU
Shopping MMLU is a large‑scale multi‑task online‑shopping benchmark dataset created by Amazon. It is designed to comprehensively evaluate large language models (LLMs) on multiple shopping‑related tasks. The dataset comprises 57 tasks covering four core shopping skills—concept understanding, knowledge reasoning, user‑behavior alignment, and multilingual capability—totaling 20,799 questions. It was constructed from authentic Amazon data and reformulated into text‑generation tasks to suit LLM solutions. Shopping MMLU is primarily intended for online‑shopping assistants, aiming to improve the shopping experience by reducing task‑specific engineering effort and enabling interactive user dialogues.
Description
Shopping MMLU Dataset Overview
Dataset Introduction
- Name: Shopping MMLU
- Description: An online‑shopping multi‑task benchmark for large language models (LLMs), covering four primary shopping skills: shopping concept understanding, shopping knowledge reasoning, user‑behavior alignment, and multilingual ability.
- Venue: Accepted to the NeurIPS 2024 Datasets and Benchmarks Track; also used as the basis of the Amazon KDD Cup 2024 challenge.
Dataset Structure
- Data folder: `data`
- Skill-wise evaluation code: `skill_wise_eval`
- Task-wise evaluation code: `task_wise_eval`
Data Formats
- Task Types: Five different task types, stored in two file formats:
  - Multiple-choice tasks: `.csv` files with three columns: `question`, `choices`, `answer`.
  - All other tasks: `.json` files containing two fields: `input_field` and `target_field`.
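Based on the two formats above, loading a task file might look like the following sketch. The inline sample data and the stringified-list encoding of the `choices` column are assumptions for illustration, not taken from the actual files:

```python
import ast
import csv
import io
import json

# Sketch: parse a multiple-choice task (.csv with question/choices/answer).
# The inline sample stands in for a real file under the data directory.
csv_text = (
    "question,choices,answer\n"
    '"Which brand makes the Echo Dot?","[\'Amazon\', \'Google\', \'Apple\']",0\n'
)
row = next(csv.DictReader(io.StringIO(csv_text)))
choices = ast.literal_eval(row["choices"])  # stringified list (assumption)
answer = choices[int(row["answer"])]

# Sketch: parse a generation-style task (.json with input_field/target_field).
json_text = '[{"input_field": "Describe this product.", "target_field": "A compact smart speaker."}]'
example = json.loads(json_text)[0]
prompt, target = example["input_field"], example["target_field"]
```

In practice you would point `csv.DictReader` / `json.load` at the files unpacked into the `data` directory rather than at inline strings.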
Data Download
- Download Method: Download the `data.zip` archive and unzip it into the `data` directory.
Evaluation Methods
Dependencies
- Main Libraries:
  - transformers==4.37.0
  - torch==2.1.2+cu121
  - pandas==2.0.3
  - evaluate==0.4.1
  - sentence_transformers==2.2.2
  - rouge_score
  - sacrebleu
  - sacrebleu[jp]
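Assuming a standard pip environment, the pinned versions could be installed roughly as follows (a sketch; note that the `+cu121` torch build is served from the PyTorch wheel index, not PyPI):

```shell
pip install "transformers==4.37.0" "pandas==2.0.3" "evaluate==0.4.1" \
    "sentence_transformers==2.2.2" rouge_score sacrebleu "sacrebleu[jp]"
pip install "torch==2.1.2+cu121" --index-url https://download.pytorch.org/whl/cu121
```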
Single‑Task Evaluation
- Example: Evaluate the Vicuna-7B-v1.5 model on the `asin_compatibility` multiple-choice task:

```shell
cd task_wise_eval/
python3 hf_multi_choice.py --test_subject asin_compatibility --model_name vicuna2
```
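The repo's script handles the actual scoring; as a rough illustration only (not the benchmark's official parser), multiple-choice generations can be scored by extracting the first in-range option index a model emits and comparing it to the gold index:

```python
import re

def extract_choice(generation: str, num_choices: int) -> int:
    """Return the first in-range digit in the model output, else -1.
    (Illustrative heuristic, not the repo's exact logic.)"""
    for match in re.finditer(r"\d", generation):
        idx = int(match.group())
        if 0 <= idx < num_choices:
            return idx
    return -1

def accuracy(generations, golds, num_choices=4):
    hits = sum(extract_choice(g, num_choices) == a
               for g, a in zip(generations, golds))
    return hits / len(golds)

preds = ["The answer is 2.", "0", "I think option 3 fits best."]
golds = [2, 1, 3]
score = accuracy(preds, golds)  # 2 of 3 correct
```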
Skill‑Level Evaluation
- Example: Evaluate the Vicuna-7B-v1.5 model on the `skill1_concept` skill:

```shell
cd skill_wise_eval/
python3 hf_skill_inference.py --model_name vicuna2 --filename skill1_concept --output_filename <your_filename>
python3 skill_evaluation.py --data_filename skill1_concept --output_filename vicuna2_<your_filename>
```
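The dependency list (`rouge_score`, `sacrebleu`, `sentence_transformers`) suggests that generation-style tasks are scored with text-overlap metrics such as ROUGE-L. As a self-contained sketch of that family of metrics (the repo's exact computation may differ), here is a minimal ROUGE-L F1 built on longest-common-subsequence length:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens (illustrative, not the official scorer)."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    lcs = lcs_len(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

score = rouge_l_f1("a compact smart speaker", "a compact smart speaker by amazon")
```

With 4 matching tokens out of 4 predicted and 6 reference tokens, precision is 1.0, recall is 2/3, and the F1 is 0.8.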
References
- Paper: Detailed information can be found in the arXiv paper.
- KDD Cup Challenge: More details are available on the KDD Cup 2024 website.
Source
Organization: arXiv
Created: 10/28/2024