conglu/vd4rl
The V‑D4RL benchmark provides pixel‑based analogues of the D4RL benchmark tasks, derived from the dm_control suite, and extends two state‑of‑the‑art online pixel‑based continuous‑control algorithms, DrQ‑v2 and DreamerV2, to the offline setting. It includes data of varying difficulty across multiple environments such as walker_walk, cheetah_run, and humanoid_walk, along with corresponding benchmarks and algorithm evaluations.
V‑D4RL Dataset Overview
Dataset Structure
The dataset is stored under the vd4rl_data directory with the following layout:
vd4rl_data
├── main
│   ├── walker_walk
│   │   ├── random
│   │   │   ├── 64px
│   │   │   └── 84px
│   │   └── medium_replay
│   │       ...
│   ├── cheetah_run
│   │   ...
│   └── humanoid_walk
│       ...
├── distracting
│   ...
└── multitask
    ...
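Since the layout follows a regular suite/environment/type/resolution pattern, data paths can be assembled programmatically. A minimal sketch; the dataset_dir helper below is hypothetical, not part of the repository:

```python
from pathlib import Path

def dataset_dir(root: str, suite: str, env: str, dtype: str, res: int) -> Path:
    # Hypothetical helper: builds a path such as
    # vd4rl_data/main/walker_walk/random/64px from its components.
    return Path(root) / suite / env / dtype / f"{res}px"

print(dataset_dir("vd4rl_data", "main", "walker_walk", "random", 64))
# → vd4rl_data/main/walker_walk/random/64px
```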
Benchmarks
Environment Setup
Each environment folder contains a conda_env.yml file specifying the required dependencies. Create the environment with:
conda env create -f conda_env.yml
A Dockerfile is also provided in the dockerfiles directory; replace <<USER_ID>> with your user ID.
Evaluation Command Examples
Below are example commands for various algorithms. Set ENVNAME to one of [walker_walk, cheetah_run, humanoid_walk] and TYPE to one of [random, medium_replay, medium, medium_expert, expert].
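The settings above form a 3 × 5 grid of environment/type pairs. A short sketch enumerating every combination, e.g. to generate offline_dir arguments for a sweep (the path layout follows the directory structure above; the exact resolution depends on the algorithm):

```python
from itertools import product

ENVS = ["walker_walk", "cheetah_run", "humanoid_walk"]
TYPES = ["random", "medium_replay", "medium", "medium_expert", "expert"]

# Print one offline_dir per environment/type pair (84px, as used by DrQ+BC).
for env, dtype in product(ENVS, TYPES):
    print(f"offline_dir=vd4rl_data/main/{env}/{dtype}/84px")
```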
Offline DV2
python offlinedv2/train_offline.py \
--configs dmc_vision \
--task dmc_${ENVNAME} \
--offline_dir vd4rl_data/main/${ENVNAME}/${TYPE}/64px \
--offline_penalty_type meandis \
--offline_lmbd_cons 10 \
--seed 0
DrQ+BC
python drqbc/train.py \
task_name=offline_${ENVNAME}_${TYPE} \
offline_dir=vd4rl_data/main/${ENVNAME}/${TYPE}/84px \
nstep=3 seed=0
DrQ+CQL
python drqbc/train.py \
task_name=offline_${ENVNAME}_${TYPE} \
offline_dir=vd4rl_data/main/${ENVNAME}/${TYPE}/84px \
algo=cql cql_importance_sample=false min_q_weight=10 seed=0
BC
python drqbc/train.py \
task_name=offline_${ENVNAME}_${TYPE} \
offline_dir=vd4rl_data/main/${ENVNAME}/${TYPE}/84px algo=bc seed=0
Distracting and Multitask Experiments
Run distracting or multitask experiments by changing the offline_dir argument in the commands above to point at the corresponding subdirectory of vd4rl_data/distracting or vd4rl_data/multitask.
Data Collection & Format
The data‑collection pipeline is described in Appendix B of the paper. Conversion scripts are located in the conversion_scripts directory.
- Offline DV2 stores data as *.npz files with 64 px images.
- DrQ+BC uses *.hdf5 files with 84 px images.
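Both formats can be read with standard tools. A minimal sketch for the .npz side, round-tripping a synthetic episode; the field names ("image", "action", "reward") and shapes are illustrative assumptions, not the exact keys produced by the conversion scripts:

```python
import numpy as np

# Write a toy episode in the same container format Offline DV2 uses:
# an .npz archive of per-step arrays. Keys and shapes are assumptions.
episode = {
    "image": np.zeros((11, 64, 64, 3), dtype=np.uint8),
    "action": np.zeros((11, 6), dtype=np.float32),
    "reward": np.zeros((11,), dtype=np.float32),
}
np.savez_compressed("/tmp/episode_demo.npz", **episode)

# Read it back: np.load returns an archive keyed by array name.
with np.load("/tmp/episode_demo.npz") as data:
    for key in data.files:
        print(key, data[key].shape, data[key].dtype)
```

The *.hdf5 files used by DrQ+BC can be read analogously with h5py, iterating over the file's top-level datasets.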
Acknowledgements
V‑D4RL builds upon several open‑source repositories for offline reinforcement learning and online pixel‑based control. Special thanks to the authors of those projects.
Contact
For questions, contact Cong Lu or Philip Ball. Contributions and suggestions are welcome!
Source
Organization: hugging_face