GeoQuestions1089
GeoQuestions1089 is a crowdsourced geospatial question-answering dataset containing 1,089 triples of natural-language questions, SPARQL/GeoSPARQL queries, and answers, targeting the YAGO2geo knowledge graph. The dataset is split into two parts: GeoQuestions_c (1,017 entries without linguistic errors) and GeoQuestions_w (72 entries with grammar, syntax, or spelling errors). Version 1.1 introduced several improvements, including a unified query format, corrected natural-language case handling, query classification fixes, and the replacement of erroneous triples. Questions are grouped into nine categories covering different aspects of geospatial QA.
Description
GeoQuestions1089 Dataset Overview
Basic Information
- License: CC BY 4.0
- Task Category: Question Answering
- Language: English
- Data Size: 1K < n < 10K
Dataset Description
GeoQuestions1089 is a crowdsourced geospatial QA dataset containing 1,089 triples, each consisting of a natural-language question, a SPARQL/GeoSPARQL query, and its answer, targeting the YAGO2geo knowledge graph.
Dataset Structure
The dataset is divided into two parts:
- GeoQuestions_c: 1,017 entries whose questions are free of grammatical, syntactic, and spelling errors.
- GeoQuestions_w: 72 entries whose questions contain grammatical, syntactic, or spelling errors.
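As a rough illustration of this structure, the sketch below models one entry as a question/query/answer triple and separates entries into the two parts. The class name, field names, the `has_language_errors` flag, and the sample entries are all hypothetical; the released files may use a different layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One dataset triple. Field names are assumptions, not the real schema."""
    question: str
    query: str
    answers: tuple
    has_language_errors: bool  # hypothetical flag: True for GeoQuestions_w


def split_entries(entries):
    """Return (clean, with_errors), mirroring GeoQuestions_c / GeoQuestions_w."""
    clean = [e for e in entries if not e.has_language_errors]
    with_errors = [e for e in entries if e.has_language_errors]
    return clean, with_errors


# Invented placeholder entries, for illustration only.
sample = [
    Entry("Which rivers cross the region?", "SELECT ...", ("placeholder",), False),
    Entry("wich river cross region", "SELECT ...", ("placeholder",), True),
]
clean, with_errors = split_entries(sample)
```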
Dataset Versions
- Current Version: 1.1
- Version 1.1 Updates:
  - Unified query format and variable naming
  - Fixed natural-language case issues
  - Corrected query classification errors
  - Replaced stSPARQL functions with GeoSPARQL functions
  - Improved query correctness
  - Replaced erroneous triples
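To give a sense of what replacing stSPARQL functions with GeoSPARQL functions can look like, here is a minimal sketch that rewrites function names in a query string. The mapping is an assumption made for illustration and is not taken from the dataset's change log; prefix declarations and differences in argument conventions are not handled.

```python
import re

# Assumed name mapping, for illustration only. The dataset authors'
# actual replacements are not documented here.
STRDF_TO_GEOF = {
    "strdf:within": "geof:sfWithin",
    "strdf:contains": "geof:sfContains",
    "strdf:intersects": "geof:sfIntersects",
}

_PATTERN = re.compile("|".join(re.escape(name) for name in STRDF_TO_GEOF))


def to_geosparql(query: str) -> str:
    """Swap known stSPARQL function names for their GeoSPARQL counterparts."""
    return _PATTERN.sub(lambda match: STRDF_TO_GEOF[match.group(0)], query)


# Hypothetical filter expression, not a query from the dataset.
original = "FILTER(strdf:within(?geomA, ?geomB))"
converted = to_geosparql(original)
```

A name-level rewrite like this only covers the simplest cases, so any converted query would still need to be checked against the knowledge graph.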
Dataset Classification
Questions are grouped into nine categories, labeled A to I:
- A: Inquiry about the attribute or spatial property of a feature
- B: Inquiry whether a feature has a geospatial relation with another feature (or with multiple features)
- C: Inquiry about the geospatial relation between a feature of a given class and another feature
- D: Inquiry about the geospatial relation between a feature of a given class and any feature of another class
- E: Inquiry about the geospatial relation between a feature of a given class and an unspecified feature, where one or both are also related to a specifically named feature
- F: Similar to C, D, and E, but with more subject features and/or spatial features
- G: Questions involving quantity and aggregation
- H: Questions containing superlative or comparative forms
- I: Questions involving quantity, aggregation, and superlative/comparative forms
Category Distribution
| Category | GeoQuestions_c | GeoQuestions_w |
|---|---|---|
| A | 173 | 16 |
| B | 139 | 11 |
| C | 176 | 14 |
| D | 22 | 1 |
| E | 138 | 6 |
| F | 24 | 2 |
| G | 174 | 11 |
| H | 145 | 9 |
| I | 26 | 2 |
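As a quick consistency check, the per-category counts in the table above can be summed and compared with the stated sizes of the two parts. The snippet is a small sketch that copies the counts from the table; the variable names are arbitrary.

```python
# (clean, with_errors) question counts per category, copied from the table.
counts = {
    "A": (173, 16),
    "B": (139, 11),
    "C": (176, 14),
    "D": (22, 1),
    "E": (138, 6),
    "F": (24, 2),
    "G": (174, 11),
    "H": (145, 9),
    "I": (26, 2),
}

total_c = sum(clean for clean, _ in counts.values())
total_w = sum(errors for _, errors in counts.values())

print(total_c, total_w, total_c + total_w)
```

The totals match the 1,017 and 72 entries reported for the two parts, for 1,089 questions overall.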
Benchmarks
The dataset is used to evaluate two QA engines:
- GeoQA2
- Hamzei et al. engine
Evaluation Results
GeoQA2
| Category | Executable Queries (C) | Correct Answers (C) | Correct* (C) | Executable Queries (W) | Correct Answers (W) | Correct* (W) |
|---|---|---|---|---|---|---|
| A | 83.81% | 50.86% | 60.68% | 75.00% | 50.00% | 66.67% |
| B | 74.82% | 60.43% | 80.76% | 81.81% | 45.45% | 55.56% |
| C | 81.25% | 45.45% | 55.94% | 85.71% | 50.00% | 58.34% |
| D | 54.54% | 9.09% | 16.67% | 100.00% | 0.00% | 0.00% |
| E | 76.08% | 24.63% | 32.38% | 50.00% | 33.33% | 66.67% |
| F | 58.33% | 25.00% | 42.85% | 50.00% | 0.00% | 0.00% |
| G | 73.56% | 33.33% | 45.31% | 36.36% | 0.00% | 0.00% |
| H | 66.89% | 18.62% | 27.83% | 66.67% | 0.00% | 0.00% |
| I | 80.76% | 19.23% | 23.80% | 50.00% | 0.00% | 0.00% |
| Total | 75.61% | 37.75% | 49.93% | 68.05% | 30.55% | 44.89% |
Hamzei et al.
| Category | Executable Queries (C) | Correct Answers (C) | Correct* (C) | Executable Queries (W) | Correct Answers (W) | Correct* (W) |
|---|---|---|---|---|---|---|
| A | 82.08% | 23.12% | 28.16% | 93.75% | 6.25% | 6.67% |
| B | 94.96% | 53.23% | 56.06% | 100.00% | 54.54% | 54.54% |
| C | 81.81% | 26.13% | 31.94% | 100.00% | 14.28% | 14.28% |
| D | 81.81% | 4.54% | 5.55% | 100.00% | 0.00% | 0.00% |
| E | 92.75% | 6.52% | 7.03% | 83.34% | 0.00% | 0.00% |
| F | 62.50% | 12.50% | 20.00% | 90.90% | 0.00% | 0.00% |
| G | 80.45% | 10.34% | 12.85% | 100.00% | 0.00% | 0.00% |
| H | 77.93% | 26.89% | 34.51% | 77.78% | 0.00% | 0.00% |
| I | 84.61% | 7.69% | 9.09% | 50.00% | 0.00% | 0.00% |
| Total | 83.97% | 22.81% | 27.28% | 93.05% | 12.50% | 13.43% |
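The tables do not define the Correct* column. Judging from the numbers, it appears to be the share of correct answers among executable queries only, that is, Correct Answers divided by Executable Queries. This reading is an inference from the reported values, not a definition given in the source. The sketch below checks it on a few rows copied from the tables.

```python
# (executable %, correct %, reported Correct* %) for selected rows.
rows = {
    "GeoQA2 / A (C)": (83.81, 50.86, 60.68),
    "GeoQA2 / B (C)": (74.82, 60.43, 80.76),
    "Hamzei et al. / A (C)": (82.08, 23.12, 28.16),
}


def correct_among_executable(executable_pct: float, correct_pct: float) -> float:
    """Correct answers as a percentage of executable queries (assumed meaning)."""
    return 100.0 * correct_pct / executable_pct


deviations = {
    name: abs(correct_among_executable(executable, correct) - reported)
    for name, (executable, correct, reported) in rows.items()
}

for name, deviation in deviations.items():
    print(f"{name}: deviation from reported Correct* = {deviation:.3f} points")
```

For these rows the recomputed values agree with the reported ones to within rounding error.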
Materialization and Translators
To improve query execution time, relationships between certain entities in the YAGO2geo KG were pre‑computed and materialized.
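The idea behind materialization can be sketched as follows: an expensive spatial test is evaluated once, offline, and the outcome is stored as explicit triples, so that a query only needs a lookup. This is a simplified illustration using axis-aligned bounding boxes. The feature identifiers, the `ex:within` property, and the choice of relation are hypothetical and do not reflect which relations were actually materialized for YAGO2geo.

```python
from itertools import permutations

# Hypothetical features as (min_x, min_y, max_x, max_y) bounding boxes.
features = {
    "ex:park": (2, 2, 4, 4),
    "ex:city": (0, 0, 10, 10),
    "ex:lake": (8, 8, 12, 12),
}


def box_within(inner, outer) -> bool:
    """True if the inner box lies entirely inside the outer box."""
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and inner[2] <= outer[2]
        and inner[3] <= outer[3]
    )


def materialize_within(feats):
    """Precompute every 'within' relation once and store it as triples."""
    return {
        (a, "ex:within", b)
        for a, b in permutations(feats, 2)
        if box_within(feats[a], feats[b])
    }


triples = materialize_within(features)

# At query time the geometry test is replaced by a set lookup.
park_in_city = ("ex:park", "ex:within", "ex:city") in triples
```

The trade-off is extra storage and a precomputation step in exchange for avoiding geometry computations at query time.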
RDF Store
Experiments were run on GraphDB, which was used to produce both the gold answers and the answers to the queries generated by the QA engines.
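For readers who want to run a query against their own GraphDB instance, the sketch below builds a SPARQL request using the standard SPARQL protocol over HTTP. The host, port, and repository name `yago2geo` are assumptions about a local setup, and the query is a generic placeholder rather than one from the dataset. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL protocol POST request that asks for JSON results."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


# Assumed local endpoint and repository name.
endpoint = "http://localhost:7200/repositories/yago2geo"
query = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"
request = build_sparql_request(endpoint, query)

# To execute against a running server:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       results = response.read()
```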
License
The dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Source
Organization: huggingface
Created: 6/30/2024