GeoQuestions1089

GeoQuestions1089 is a crowdsourced geospatial question‑answering dataset of 1,089 triples, each consisting of a natural‑language question, a SPARQL/GeoSPARQL query, and its answer, targeting the YAGO2geo knowledge graph. The dataset is split into two parts: GeoQuestions_c (1,017 entries whose questions are free of linguistic errors) and GeoQuestions_w (72 entries whose questions contain grammar, syntax, or spelling errors). Version 1.1 unified the query format, fixed natural‑language case issues, corrected query classification errors, and replaced erroneous triples. Questions are grouped into nine categories by question type.

Updated 6/30/2024
huggingface

Description

GeoQuestions1089 Dataset Overview

Basic Information

  • License: CC BY 4.0
  • Task Category: Question Answering
  • Language: English
  • Data Size: 1K < n < 10K

Dataset Description

GeoQuestions1089 is a crowdsourced geospatial QA dataset containing 1,089 triples, each made up of a natural‑language question, a SPARQL/GeoSPARQL query, and the answer, targeting the YAGO2geo knowledge graph.

Dataset Structure

The dataset is divided into two parts:

  • GeoQuestions_c: 1,017 entries whose questions are free of grammatical, syntactic, and spelling errors.
  • GeoQuestions_w: 72 entries whose questions contain grammatical, syntactic, or spelling errors.

Dataset Versions

  • Current Version: 1.1
  • Version 1.1 Updates:
    • Unified query format and variable naming
    • Fixed natural‑language case issues
    • Corrected query classification errors
    • Replaced stSPARQL functions with GeoSPARQL functions
    • Improved query correctness
    • Replaced erroneous triples

Dataset Classification

Questions are grouped into nine categories:

  A. Asking for an attribute or spatial property of a feature
  B. Asking whether a feature has a geospatial relation with one or more other features
  C. Asking for features of a given class that have a geospatial relation with another feature
  D. Asking for features of a given class that have a geospatial relation with any feature of another class
  E. Asking for features of a given class that have a geospatial relation with an unspecified feature, where one or both are also related to a specifically named feature
  F. Similar to C, D, and E, but with more subject features and/or spatial features
  G. Questions involving quantity and aggregation
  H. Questions containing superlative or comparative forms
  I. Questions involving quantity, aggregation, and superlative/comparative forms

Category Distribution

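| Category | GeoQuestions1089_c | GeoQuestions1089_w |
|----------|--------------------|--------------------|
| A        | 173                | 16                 |
| B        | 139                | 11                 |
| C        | 176                | 14                 |
| D        | 22                 | 1                  |
| E        | 138                | 6                  |
| F        | 24                 | 2                  |
| G        | 174                | 11                 |
| H        | 145                | 9                  |
| I        | 26                 | 2                  |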
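
As a quick consistency check, the per-category counts in the table above can be summed and compared against the split sizes stated earlier (1,017 and 72). The dictionary below simply transcribes the table; the variable names are illustrative.

```python
# Per-category question counts transcribed from the table above:
# category -> (count in the _c split, count in the _w split)
counts = {
    "A": (173, 16),
    "B": (139, 11),
    "C": (176, 14),
    "D": (22, 1),
    "E": (138, 6),
    "F": (24, 2),
    "G": (174, 11),
    "H": (145, 9),
    "I": (26, 2),
}

total_c = sum(c for c, _ in counts.values())
total_w = sum(w for _, w in counts.values())

print(total_c, total_w, total_c + total_w)  # → 1017 72 1089
```

The totals match the stated sizes of the two splits and the overall figure of 1,089.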

Benchmarks

The dataset is used to evaluate two QA engines:

  • GeoQA2
  • Hamzei et al. engine

Evaluation Results

GeoQA2

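| Category | Executable Queries (C) | Correct Answers (C) | Correct* (C) | Executable Queries (W) | Correct Answers (W) | Correct* (W) |
|----------|------------------------|---------------------|--------------|------------------------|---------------------|--------------|
| A        | 83.81%                 | 50.86%              | 60.68%       | 75.00%                 | 50.00%              | 66.67%       |
| B        | 74.82%                 | 60.43%              | 80.76%       | 81.81%                 | 45.45%              | 55.56%       |
| C        | 81.25%                 | 45.45%              | 55.94%       | 85.71%                 | 50.00%              | 58.34%       |
| D        | 54.54%                 | 9.09%               | 16.67%       | 100.00%                | 0.00%               | 0.00%        |
| E        | 76.08%                 | 24.63%              | 32.38%       | 50.00%                 | 33.33%              | 66.67%       |
| F        | 58.33%                 | 25.00%              | 42.85%       | 50.00%                 | 0.00%               | 0.00%        |
| G        | 73.56%                 | 33.33%              | 45.31%       | 36.36%                 | 0.00%               | 0.00%        |
| H        | 66.89%                 | 18.62%              | 27.83%       | 66.67%                 | 0.00%               | 0.00%        |
| I        | 80.76%                 | 19.23%              | 23.80%       | 50.00%                 | 0.00%               | 0.00%        |
| Total    | 75.61%                 | 37.75%              | 49.93%       | 68.05%                 | 30.55%              | 44.89%       |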

Hamzei et al.

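| Category | Executable Queries (C) | Correct Answers (C) | Correct* (C) | Executable Queries (W) | Correct Answers (W) | Correct* (W) |
|----------|------------------------|---------------------|--------------|------------------------|---------------------|--------------|
| A        | 82.08%                 | 23.12%              | 28.16%       | 93.75%                 | 6.25%               | 6.67%        |
| B        | 94.96%                 | 53.23%              | 56.06%       | 100.00%                | 54.54%              | 54.54%       |
| C        | 81.81%                 | 26.13%              | 31.94%       | 100.00%                | 14.28%              | 14.28%       |
| D        | 81.81%                 | 4.54%               | 5.55%        | 100.00%                | 0.00%               | 0.00%        |
| E        | 92.75%                 | 6.52%               | 7.03%        | 83.34%                 | 0.00%               | 0.00%        |
| F        | 62.50%                 | 12.50%              | 20.00%       | 90.90%                 | 0.00%               | 0.00%        |
| G        | 80.45%                 | 10.34%              | 12.85%       | 100.00%                | 0.00%               | 0.00%        |
| H        | 77.93%                 | 26.89%              | 34.51%       | 77.78%                 | 0.00%               | 0.00%        |
| I        | 84.61%                 | 7.96%               | 9.09%        | 50.00%                 | 0.00%               | 0.00%        |
| Total    | 83.97%                 | 22.81%              | 27.28%       | 93.05%                 | 12.50%              | 13.43%       |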
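
The tables do not define the Correct* column. The reported figures are consistent with reading it as the share of executable queries that return the correct answer, that is, Correct Answers divided by Executable Queries. This reading is an inference from the numbers, not a definition given in the source. A minimal check on a few rows from the (C) columns, with an illustrative helper name:

```python
def correct_among_executable(executable_pct: float, correct_pct: float) -> float:
    """Correct answers as a percentage of executable queries (assumed meaning of Correct*)."""
    if executable_pct == 0:
        return 0.0
    return 100.0 * correct_pct / executable_pct

# (engine, category, executable %, correct %, reported Correct* %), all from the (C) columns
rows = [
    ("GeoQA2", "A", 83.81, 50.86, 60.68),
    ("GeoQA2", "B", 74.82, 60.43, 80.76),
    ("Hamzei et al.", "A", 82.08, 23.12, 28.16),
    ("Hamzei et al.", "B", 94.96, 53.23, 56.06),
]

for engine, category, executable, correct, reported in rows:
    derived = correct_among_executable(executable, correct)
    # The inputs are already rounded to two decimals, so small differences are expected.
    print(f"{engine} {category}: derived {derived:.2f}%, reported {reported:.2f}%")
```

For these rows the derived and reported values agree to within rounding of the published percentages.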

Materialization and Translators

To improve query execution time, relationships between certain entities in the YAGO2geo KG were pre‑computed and materialized.
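
The idea can be sketched in miniature: spatial relations are computed once, stored as explicit triples, and later answered by lookup instead of by geometry computation at query time. The example below is purely illustrative. It uses axis-aligned bounding boxes in place of real geometries, and the feature names, the `within` predicate, and the helper functions are hypothetical, not taken from the dataset or from YAGO2geo.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box standing in for a real geometry (assumption)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def is_within(inner: Box, outer: Box) -> bool:
    """True if `inner` lies entirely inside `outer` (boundaries may touch)."""
    return (outer.min_x <= inner.min_x and outer.min_y <= inner.min_y
            and inner.max_x <= outer.max_x and inner.max_y <= outer.max_y)


def materialize_within(features: dict[str, Box]) -> set[tuple[str, str, str]]:
    """Precompute every (a, 'within', b) triple once, ahead of query time."""
    return {
        (a, "within", b)
        for a, box_a in features.items()
        for b, box_b in features.items()
        if a != b and is_within(box_a, box_b)
    }


# Hypothetical features with made-up coordinates.
features = {
    "country": Box(0, 0, 10, 10),
    "region": Box(1, 1, 5, 5),
    "town": Box(2, 2, 3, 3),
}

triples = materialize_within(features)

# At query time the relation is a set lookup, with no geometry computation.
print(("town", "within", "country") in triples)
print(("country", "within", "town") in triples)
```

The trade-off is the usual one for materialization: extra storage and an up-front computation in exchange for cheaper queries.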

RDF Store

The experiments used GraphDB as the RDF store, both to produce the gold answers and to execute the queries generated by the QA engines.
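
GraphDB exposes each repository as a SPARQL endpoint, so a query from the dataset could in principle be executed over HTTP. The sketch below only builds the request and does not send it. The base URL (GraphDB's default port 7200 on localhost), the repository name `yago2geo`, the helper name, and the placeholder query are assumptions for illustration, not details given by the dataset.

```python
import urllib.parse
import urllib.request


def build_sparql_request(base_url: str, repository: str, query: str) -> urllib.request.Request:
    """Build a POST request for a SPARQL query, asking for JSON results."""
    endpoint = f"{base_url.rstrip('/')}/repositories/{urllib.parse.quote(repository)}"
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


# Placeholder query; the dataset's real queries use YAGO2geo classes and GeoSPARQL functions.
sample_query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"

request = build_sparql_request("http://localhost:7200", "yago2geo", sample_query)
print(request.full_url)
print(request.get_method())
```

Passing the request to `urllib.request.urlopen` would execute it against a running server; that step is left out here because it depends on a local GraphDB installation.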

License

The dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Topics

Geospatial QA
Natural Language Processing

Source

Organization: huggingface

Created: 6/30/2024
