ProcessTBench is a synthetic dataset for evaluating the planning capabilities of large language models (LLMs) within a process mining framework. Built on TaskBench, it contains 532 base queries, each paraphrased 5–6 times, with an average of 4.08 solution plans per query. Each plan is a sequence of actions drawn from 40 distinct tools, and the corresponding ground-truth plans are provided in Petri-net format. The dataset was created by selecting the most challenging subset of TaskBench, generating plans with LLMs, and processing those plans with an event-log parser and a plan-conformance checker. ProcessTBench aims to support research on LLM plan generation in complex and dynamic environments, especially for multilingual and paraphrased queries.
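To make the idea of checking a plan against a Petri-net ground truth more concrete, here is a minimal sketch of conformance checking by token replay. It is an illustration under stated assumptions, not the dataset's actual checker or file format. The tool names (`download_image`, `caption_image`, `detect_objects`, `summarize`), the place names, and the `PetriNet` and `conforms` helpers are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PetriNet:
    """Hypothetical in-memory Petri net; not the dataset's storage format."""
    # Maps each transition (tool call) to (input places, output places).
    transitions: dict
    initial_marking: list
    final_marking: list


def conforms(net: PetriNet, plan: list) -> bool:
    """Replay a plan on the net; True if every step is enabled and the
    replay ends exactly in the final marking."""
    marking = Counter(net.initial_marking)
    for action in plan:
        if action not in net.transitions:
            return False  # the plan uses a tool the net does not know
        inputs, outputs = net.transitions[action]
        needed = Counter(inputs)
        # A transition is enabled only if every input place holds enough tokens.
        if any(marking[place] < count for place, count in needed.items()):
            return False
        marking.subtract(needed)   # consume input tokens
        marking.update(outputs)    # produce output tokens
    # Unary plus drops places whose token count fell to zero.
    return +marking == Counter(net.final_marking)


# Assumed example: captioning and detection may run in either order,
# so a single net accepts more than one valid plan.
net = PetriNet(
    transitions={
        "download_image": (["start"], ["to_caption", "to_detect"]),
        "caption_image": (["to_caption"], ["captioned"]),
        "detect_objects": (["to_detect"], ["detected"]),
        "summarize": (["captioned", "detected"], ["end"]),
    },
    initial_marking=["start"],
    final_marking=["end"],
)

plan_a = ["download_image", "caption_image", "detect_objects", "summarize"]
plan_b = ["download_image", "detect_objects", "caption_image", "summarize"]
plan_c = ["download_image", "caption_image", "summarize"]  # skips detection

print(conforms(net, plan_a))  # True
print(conforms(net, plan_b))  # True
print(conforms(net, plan_c))  # False
```

Because the net encodes concurrency rather than one fixed ordering, both `plan_a` and `plan_b` replay successfully. This is one way a single query can have several valid solution plans.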