OpenMol/PubChemSFT
This dataset contains single‑turn dialogues with SMILES molecular descriptions, formatted as JSON and including SMILES strings with their corresponding textual descriptions. The dataset is split into training, validation, and test sets containing 264,391, 33,072, and 32,987 samples respectively. Dialogue templates consist of human queries and GPT‑generated molecule descriptions. Additionally, 14 query templates are provided for generating the query portion of the dialogues.