The Web2Code dataset was created by MBZUAI to improve the ability of multimodal large language models (MLLMs) to understand web pages and generate HTML code. It comprises about 1.18 million (1,179.7K) web instruction-response pairs, including webpage images, HTML code, and structured question-answer pairs. GPT-3.5 and GPT-4 were used to clean existing data and to generate new data. Web2Code is used primarily for web content generation and task automation, and it addresses a weakness of existing MLLMs: interpreting web screenshots and producing the corresponding HTML code.
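As a rough illustration of what an instruction-response pair might look like in code, here is a minimal Python sketch. The class name, field names, file path, and the `is_code_generation` helper are all hypothetical and chosen for illustration only; they are not the dataset's actual schema or tooling.

```python
from dataclasses import dataclass


@dataclass
class WebInstructionPair:
    """Hypothetical record layout; field names are illustrative, not the real schema."""
    image_path: str   # path to a webpage screenshot
    instruction: str  # prompt or question about the page
    response: str     # target output: HTML code or a textual answer


def is_code_generation(pair: WebInstructionPair) -> bool:
    """Heuristic: treat the pair as HTML generation if the response looks like an HTML document."""
    text = pair.response.lstrip().lower()
    return text.startswith("<!doctype html") or text.startswith("<html")


# A made-up example record (not taken from the dataset).
example = WebInstructionPair(
    image_path="screenshots/example.png",
    instruction="Generate the HTML code for this webpage.",
    response="<html><body><h1>Hello</h1></body></html>",
)
```

A helper like this could be used to separate code-generation pairs from question-answer pairs when inspecting a sample, assuming the records have been loaded into such objects.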