Dataset asset · Open Source Community · Human‑Machine Collaboration · Planning and Reasoning

PARTNR

PARTNR is a benchmark dataset created by FAIR at Meta for studying planning and reasoning in human‑robot collaboration. It contains 100,000 natural‑language tasks spanning 60 houses and 5,819 unique objects, designed to simulate cooperative everyday household activities. The data were generated through a semi‑automated pipeline that pairs large language models (LLMs) with a simulated environment, and the tasks deliberately exercise spatial constraints, temporal constraints, and heterogeneous agent capabilities. PARTNR is intended to advance human‑robot collaboration on complex tasks and to expose current model shortcomings in coordination, task tracking, and error recovery.

Source: arXiv
Created: Nov 1, 2024
Updated: Nov 1, 2024
Overview

Dataset description and usage context

PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi‑Agent Tasks

Overview

  • Dataset Name: PARTNR
  • Domain: Planning and reasoning in embodied multi‑agent tasks
  • Main Abstractions (sketched in code after this list):
    • Agent: Represents a robot or a human capable of acting in the environment.
    • Planner: Represents a centralized or decentralized planner.
    • Tool: Abstracts a capability that allows an agent to sense or interact with the environment.
    • Skill: Low‑level abilities that agents can use to interact with the environment.
  • Dataset Generation: Produced with a semi‑automated pipeline that uses large language models (LLMs).
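
To make these abstractions concrete, here is a minimal Python sketch of how they might fit together. All class and method names are illustrative assumptions, not the actual habitat‑llm API.

    # Hypothetical sketch of the Agent/Planner/Tool/Skill abstractions;
    # names do not correspond to real habitat-llm classes.
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class Skill:
        """Low-level ability, e.g. navigate, pick, or place."""
        name: str
        execute: Callable[[], bool]  # returns True on success

    @dataclass
    class Tool:
        """Exposes a sensing or interaction capability to a planner."""
        name: str
        skill: Skill

    @dataclass
    class Agent:
        """A robot or human with its own set of tools."""
        uid: int
        tools: Dict[str, Tool] = field(default_factory=dict)

    class Planner:
        """Centralized planner coordinating several agents."""
        def __init__(self, agents: List[Agent]):
            self.agents = agents

        def step(self, task: str) -> None:
            # A real planner would query an LLM with the task and the
            # current world state; this stub only shows the interface.
            pass

A decentralized variant would give each agent its own Planner instance, which is the centralized/decentralized distinction drawn above.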

Code Organization

  • habitat‑llm:
    • Agent: Represents a robot or a human.
    • Tools: Abstracts sensing or interaction capabilities.
    • Planner: Represents centralized and decentralized planners.
    • LLM: Contains abstractions for Llama and GPT APIs.
    • WorldGraph: Hierarchical world graph that represents rooms, furniture, and objects (see the sketch after this list).
    • Perception: Simulated perception pipeline that sends local detections to the world model.
    • Examples: Demonstrations and evaluation scripts for showcasing or analyzing planner performance.
    • EvaluationRunner: Abstracts the execution of a planner.
    • Conf: Hydra configuration files for all classes.
    • Utils: Various utility functions required by the code base.
    • Tests: Unit tests.
  • scripts:
    • hitl_analysis: Scripts for analyzing and replaying human‑in‑the‑loop interaction trajectories.
    • prediviz: Visualization and annotation tools for PARTNR tasks.
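
The WorldGraph above is described as a hierarchy over rooms, furniture, and objects. The following minimal containment‑hierarchy sketch illustrates the idea; the class and method names are hypothetical and do not match the actual habitat‑llm WorldGraph API.

    # Hypothetical containment hierarchy: rooms hold furniture,
    # furniture holds objects. Not the real habitat-llm WorldGraph.
    from collections import defaultdict

    class WorldGraph:
        def __init__(self):
            self.children = defaultdict(set)  # parent -> children
            self.parent = {}                  # child -> parent

        def add(self, parent: str, child: str) -> None:
            self.children[parent].add(child)
            self.parent[child] = parent

        def room_of(self, node: str) -> str:
            # Walk upward until reaching a top-level (room) node.
            while node in self.parent:
                node = self.parent[node]
            return node

    g = WorldGraph()
    g.add("kitchen", "counter")
    g.add("counter", "apple")
    assert g.room_of("apple") == "kitchen"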

Information Flow

  • EnvironmentInterface: Reads observations from each agent and forwards them to the perception module.
  • Perception Module: Processes observations and updates the world graph.
  • Planner: Uses the world graph and task description to select tools and interact with the environment.
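
Taken together, these components form a perceive-update-plan-act loop. The following schematic sketch illustrates that loop; every name is a placeholder standing in for the real habitat‑llm classes.

    # Schematic of the information flow described above; all names
    # here are hypothetical, not the habitat-llm API.
    def run_episode(env, perception, planner, world_graph, task, max_steps=100):
        for _ in range(max_steps):
            # EnvironmentInterface reads per-agent observations.
            observations = env.get_observations()
            # Perception converts observations into detections and
            # updates the hierarchical world graph.
            world_graph.update(perception.process(observations))
            # The planner selects tool calls from the world graph and
            # the natural-language task description.
            actions = planner.plan(world_graph, task)
            if not actions:  # an empty plan signals task completion
                break
            env.apply(actions)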

Installation

  • Setup instructions are not reproduced here; see the accompanying habitat‑llm code repository.

Quick Start

  • Dataset Splits: train_2k, val, train, val_mini (a file-loading sketch follows the examples below)
  • Examples:
    • Decentralized Multi‑Agent ReAct Summary:
      python -m habitat_llm.examples.planner_demo --config-name baselines/decentralized_zero_shot_react_summary.yaml \
          habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz" \
          evaluation.agents.agent_0.planner.plan_config.llm.inference_mode=hf \
          evaluation.agents.agent_1.planner.plan_config.llm.inference_mode=hf \
          evaluation.agents.agent_0.planner.plan_config.llm.generation_params.engine=meta-llama/Meta-Llama-3-8B-Instruct \
          evaluation.agents.agent_1.planner.plan_config.llm.generation_params.engine=meta-llama/Meta-Llama-3-8B-Instruct
      
    • Centralized Multi‑Agent ReAct Summary:
      python -m habitat_llm.examples.planner_demo --config-name baselines/centralized_zero_shot_react_summary.yaml \
          habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz" \
          evaluation.planner.plan_config.llm.inference_mode=hf \
          evaluation.planner.plan_config.llm.generation_params.engine=meta-llama/Meta-Llama-3-8B-Instruct
      
    • Single‑Agent ReAct Summary:
      python -m habitat_llm.examples.planner_demo --config-name baselines/single_agent_zero_shot_react_summary.yaml \
          habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz" \
          evaluation.agents.agent_0.planner.plan_config.llm.inference_mode=hf \
          evaluation.agents.agent_0.planner.plan_config.llm.generation_params.engine=meta-llama/Meta-Llama-3-8B-Instruct
      
    • Heuristic Planner:
      python -m habitat_llm.examples.planner_demo --config-name baselines/heuristic_full_obs.yaml \
          habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz"
      
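
Each split ships as a gzipped JSON episode file, as the data_path arguments above show. The following minimal sketch inspects one such file; the top-level "episodes" key is an assumption about the schema rather than documented fact.

    # Peek at a PARTNR episode file. The "episodes" key is assumed;
    # inspect the file to confirm the actual schema.
    import gzip
    import json

    path = "data/datasets/partnr_episodes/v0_0/val_mini.json.gz"
    with gzip.open(path, "rt") as f:
        data = json.load(f)

    episodes = data.get("episodes", [])
    print(f"{len(episodes)} episodes loaded from {path}")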

Results

  • Use python scripts/read_results.py <output_dir>/<dataset_name> to inspect progress and results.

Test Set

  • Run the following command to test dataset viability and initial step success rate:
    HYDRA_FULL_ERROR=1 python -m habitat_llm.examples.verify_episodes \
        --config-name examples/planner_multi_agent_demo_config.yaml \
        hydra.run.dir="." \
        evaluation=centralized_evaluation_runner_multi_agent \
        habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz" \
        mode=data \
        world_model.partial_obs=False \
        evaluation.type="centralized" \
        num_proc=5
    

License

  • License: MIT