3W Dataset
This is the first public dataset of real oil wells containing rare adverse events, which can serve as a benchmark dataset for developing machine learning techniques related to the inherent challenges of real-world data.
Description
3W Dataset Overview
Dataset Description
The 3W dataset is the first public dataset of rare adverse real events in oil wells, and it can serve as a benchmark for developing machine learning techniques that address the inherent difficulties of real data. It contains instances of eight types of adverse events characterized by eight process variables, and it includes expert-validated historical instances as well as simulated and hand-drawn instances.
Dataset Structure
The 3W dataset contains 1,984 CSV files stored in a 7z archive under the data directory. Each file represents one instance, and its filename indicates the instance's source. Each file has one observation per row and one series per column; columns are separated by commas, and the decimal separator is a period. The first column is the timestamp, the last column is the observation label, and the remaining columns hold the multivariate time series.
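The layout described above can be read with Python's standard csv module. The following is a minimal sketch, not the dataset authors' loader: it assumes each file starts with a header row and that missing readings appear as empty fields, neither of which is stated here. The column names (var_a, var_b, label), the sample values, and the helper names to_float and load_instance are all hypothetical.

```python
import csv
import io

# Tiny synthetic sample in the described layout. The column names and values
# are made up for illustration and are not taken from the dataset.
SAMPLE = """timestamp,var_a,var_b,label
2019-01-01 00:00:00,1.5,20.0,0
2019-01-01 00:00:01,1.6,,0
2019-01-01 00:00:02,1.7,20.2,1
"""


def to_float(text):
    """Convert a CSV field to float, mapping an empty field to None."""
    return float(text) if text.strip() else None


def load_instance(handle):
    """Split one instance into timestamps, per-variable series, and labels.

    Assumes a header row (an assumption), the timestamp in the first
    column, and the observation label in the last column.
    """
    reader = csv.reader(handle)
    header = next(reader)
    variable_names = header[1:-1]  # everything between timestamp and label
    timestamps, labels = [], []
    series = {name: [] for name in variable_names}
    for row in reader:
        if not row:  # skip blank lines
            continue
        timestamps.append(row[0])  # kept as text; the format is not specified
        labels.append(row[-1])     # kept as text; the encoding is not specified
        for name, field in zip(variable_names, row[1:-1]):
            series[name].append(to_float(field))
    return timestamps, series, labels


timestamps, series, labels = load_instance(io.StringIO(SAMPLE))
```

To read an extracted file instead of the in-memory sample, pass an open file handle, for example one obtained with open(path, newline=""), where path is whichever instance file you extracted from the archive.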
Citation Information
When using the 3W dataset, the following reference should be cited:
@article{VARGAS2019106223,
  title = "A realistic and public dataset with rare undesirable real events in oil wells",
  journal = "Journal of Petroleum Science and Engineering",
  volume = "181",
  pages = "106223",
  year = "2019",
  issn = "0920-4105",
  doi = "https://doi.org/10.1016/j.petrol.2019.106223",
  url = "http://www.sciencedirect.com/science/article/pii/S0920410519306357",
  author = "Ricardo Emanuel Vaz Vargas and Celso José Munaro and Patrick Marques Ciarelli and André Gonçalves Medeiros and Bruno Guberfain do Amaral and Daniel Centurion Barrionuevo and Jean Carlos Dias de Araújo and Jorge Lins Ribeiro and Lucas Pierezan Magalhães",
  keywords = "Fault detection and diagnosis, Oil well monitoring, Abnormal event management, Multivariate time series classification",
  abstract = "Detection of undesirable events in oil and gas wells can help prevent production losses, environmental accidents, and human casualties and reduce maintenance costs. The scarcity of measurements in such processes is a drawback due to the low reliability of instrumentation in such hostile environments. Another issue is the absence of adequately structured data related to events that should be detected. To contribute to providing a priori knowledge about undesirable events for diagnostic algorithms in offshore naturally flowing wells, this work presents an original and valuable dataset with instances of eight types of undesirable events characterized by eight process variables. Many hours of expert work were required to validate historical instances and to produce simulated and hand-drawn instances that can be useful to distinguish normal and abnormal actual events under different operating conditions. The choices made during this dataset's preparation are described and justified, and specific benchmarks that practitioners and researchers can use together with the published dataset are defined. This work has resulted in two relevant contributions. A challenging public dataset that can be used as a benchmark for the development of (i) machine learning techniques related to inherent difficulties of actual data, and (ii) methods for specific tasks associated with detecting and diagnosing undesirable events in offshore naturally flowing oil and gas wells. The other contribution is the proposal of the defined benchmarks."
}
Dataset Usage
The dataset provides results of several benchmark experiments, including:
- Benchmark 1: Impact of using simulated and hand‑drawn instances (code and results link)
- Benchmark 2: Anomaly detection (code and results link)
These results can serve as benchmark references for researchers and practitioners.
Source
Organization: github
Created: 1/19/2019