Dataset asset · Open Source Community · Human‑Machine Collaboration · Planning and Reasoning

PARTNR

PARTNR is a benchmark dataset created by FAIR at Meta for studying planning and reasoning in human‑robot collaboration. It contains 100,000 natural‑language tasks spanning 60 houses and 5,819 unique objects, designed to simulate cooperative everyday household activities. The data were generated through a semi‑automated pipeline that pairs large language models (LLMs) with a simulated environment, and the tasks deliberately exercise spatial constraints, temporal constraints, and heterogeneous agent capabilities. PARTNR is intended to advance human‑robot collaboration on complex tasks and to expose current model shortcomings in coordination, task tracking, and error recovery.

Source: arXiv
Created: Nov 1, 2024
Updated: Nov 1, 2024
Overview

Dataset description and usage context

PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi‑Agent Tasks

Overview

  • Dataset Name: PARTNR
  • Domain: Planning and reasoning in embodied multi‑agent tasks
  • Main Abstractions (sketched in code after this list):
    • Agent: Represents a robot or a human capable of acting in the environment.
    • Planner: Represents a centralized or decentralized planner.
    • Tool: Abstracts a capability that allows an agent to sense or interact with the environment.
    • Skill: Low‑level abilities that agents can use to interact with the environment.
  • Dataset Generation: Produced with a semi‑automated pipeline that uses large language models (LLMs).
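
To make these abstractions concrete, here is a minimal Python sketch of how they might fit together. All class and method names are illustrative assumptions, not the actual habitat‑llm API.

    # Hypothetical sketch of the Agent/Planner/Tool/Skill abstractions;
    # names do not correspond to real habitat-llm classes.
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class Skill:
        """Low-level ability, e.g. navigate, pick, or place."""
        name: str
        execute: Callable[[], bool]  # returns True on success

    @dataclass
    class Tool:
        """Exposes a sensing or interaction capability to a planner."""
        name: str
        skill: Skill

    @dataclass
    class Agent:
        """A robot or human with its own set of tools."""
        uid: int
        tools: Dict[str, Tool] = field(default_factory=dict)

    class Planner:
        """Centralized planner coordinating several agents."""
        def __init__(self, agents: List[Agent]):
            self.agents = agents

        def step(self, task: str) -> None:
            # A real planner would query an LLM with the task and the
            # current world state; this stub only shows the interface.
            pass

A decentralized variant would give each agent its own Planner instance, which is the centralized/decentralized distinction drawn above.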

Code Organization

  • habitat‑llm:
    • Agent: Represents a robot or a human.
    • Tools: Abstracts sensing or interaction capabilities.
    • Planner: Represents centralized and decentralized planners.
    • LLM: Contains abstractions for Llama and GPT APIs.
    • WorldGraph: Hierarchical world graph that represents rooms, furniture, and objects (see the sketch after this list).
    • Perception: Simulated perception pipeline that sends local detections to the world model.
    • Examples: Demonstrations and evaluation scripts for showcasing or analyzing planner performance.
    • EvaluationRunner: Abstracts the execution of a planner.
    • Conf: Hydra configuration files for all classes.
    • Utils: Various utility functions required by the code base.
    • Tests: Unit tests.
  • scripts:
    • hitl_analysis: Scripts for analyzing and replaying human‑in‑the‑loop interaction trajectories.
    • prediviz: Visualization and annotation tools for PARTNR tasks.
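
The WorldGraph above is described as a hierarchy over rooms, furniture, and objects. The following minimal containment‑hierarchy sketch illustrates the idea; the class and method names are hypothetical and do not match the actual habitat‑llm WorldGraph API.

    # Hypothetical containment hierarchy: rooms hold furniture,
    # furniture holds objects. Not the real habitat-llm WorldGraph.
    from collections import defaultdict

    class WorldGraph:
        def __init__(self):
            self.children = defaultdict(set)  # parent -> children
            self.parent = {}                  # child -> parent

        def add(self, parent: str, child: str) -> None:
            self.children[parent].add(child)
            self.parent[child] = parent

        def room_of(self, node: str) -> str:
            # Walk upward until reaching a top-level (room) node.
            while node in self.parent:
                node = self.parent[node]
            return node

    g = WorldGraph()
    g.add("kitchen", "counter")
    g.add("counter", "apple")
    assert g.room_of("apple") == "kitchen"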

Information Flow

  • EnvironmentInterface: Reads observations from each agent and forwards them to the perception module.
  • Perception Module: Processes observations and updates the world graph.
  • Planner: Uses the world graph and task description to select tools and interact with the environment.
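
Taken together, these components form a perceive-update-plan-act loop. The following schematic sketch illustrates that loop; every name is a placeholder standing in for the real habitat‑llm classes.

    # Schematic of the information flow described above; all names
    # here are hypothetical, not the habitat-llm API.
    def run_episode(env, perception, planner, world_graph, task, max_steps=100):
        for _ in range(max_steps):
            # EnvironmentInterface reads per-agent observations.
            observations = env.get_observations()
            # Perception converts observations into detections and
            # updates the hierarchical world graph.
            world_graph.update(perception.process(observations))
            # The planner selects tool calls from the world graph and
            # the natural-language task description.
            actions = planner.plan(world_graph, task)
            if not actions:  # an empty plan signals task completion
                break
            env.apply(actions)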

Installation

  • Setup instructions are not reproduced here; see the accompanying habitat‑llm code repository.

Quick Start

  • Dataset Splits: train_2k, val, train, val_mini (a file-loading sketch follows the examples below)
  • Examples:
    • Decentralized Multi‑Agent ReAct Summary:
      python -m habitat_llm.examples.planner_demo --config-name baselines/decentralized_zero_shot_react_summary.yaml \
          habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz" \
          evaluation.agents.agent_0.planner.plan_config.llm.inference_mode=hf \
          evaluation.agents.agent_1.planner.plan_config.llm.inference_mode=hf \
          evaluation.agents.agent_0.planner.plan_config.llm.generation_params.engine=meta-llama/Meta-Llama-3-8B-Instruct \
          evaluation.agents.agent_1.planner.plan_config.llm.generation_params.engine=meta-llama/Meta-Llama-3-8B-Instruct
      
    • Centralized Multi‑Agent ReAct Summary:
      python -m habitat_llm.examples.planner_demo --config-name baselines/centralized_zero_shot_react_summary.yaml \
          habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz" \
          evaluation.planner.plan_config.llm.inference_mode=hf \
          evaluation.planner.plan_config.llm.generation_params.engine=meta-llama/Meta-Llama-3-8B-Instruct
      
    • Single‑Agent ReAct Summary:
      python -m habitat_llm.examples.planner_demo --config-name baselines/single_agent_zero_shot_react_summary.yaml \
          habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz" \
          evaluation.agents.agent_0.planner.plan_config.llm.inference_mode=hf \
          evaluation.agents.agent_0.planner.plan_config.llm.generation_params.engine=meta-llama/Meta-Llama-3-8B-Instruct
      
    • Heuristic Planner:
      python -m habitat_llm.examples.planner_demo --config-name baselines/heuristic_full_obs.yaml \
          habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz"
      
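
Each split ships as a gzipped JSON episode file, as the data_path arguments above show. The following minimal sketch inspects one such file; the top-level "episodes" key is an assumption about the schema rather than documented fact.

    # Peek at a PARTNR episode file. The "episodes" key is assumed;
    # inspect the file to confirm the actual schema.
    import gzip
    import json

    path = "data/datasets/partnr_episodes/v0_0/val_mini.json.gz"
    with gzip.open(path, "rt") as f:
        data = json.load(f)

    episodes = data.get("episodes", [])
    print(f"{len(episodes)} episodes loaded from {path}")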

Results

  • Use python scripts/read_results.py <output_dir>/<dataset_name> to inspect progress and results.

Test Set

  • Run the following command to test dataset viability and initial step success rate:
    HYDRA_FULL_ERROR=1 python -m habitat_llm.examples.verify_episodes \
        --config-name examples/planner_multi_agent_demo_config.yaml \
        hydra.run.dir="." \
        evaluation=centralized_evaluation_runner_multi_agent \
        habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz" \
        mode=data \
        world_model.partial_obs=False \
        evaluation.type="centralized" \
        num_proc=5
    

License

  • License: MIT