PoeTree

PoeTree is a standardized collection of poetry corpora containing over 300,000 poems in ten languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Slovenian, Spanish, and Russian). Each corpus has been deduplicated, annotated with Universal Dependencies, enriched with additional metadata, and converted into a unified JSON structure.

Updated 1/17/2024

Description

Dataset Overview

poetRee is an R package that fetches curated poetry data from the PoeTree API. PoeTree is a standardized collection of over 300,000 poems in ten languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Slovenian, Spanish, and Russian). Each sub‑corpus has been deduplicated, annotated with Universal Dependencies relations, enriched with additional metadata, and converted into a uniform JSON format.

Dataset Contents

  • Metadata: Provides a summary for each sub‑corpus, including ISO language codes.
  • Author Information: Detailed information for all authors present in the corpora.
  • Source Information: Bibliographic details for the sources in a corpus, optionally filtered by author ID.
  • Poem Information: All poem records for a given author ID (or vector of author IDs).
  • Text Information: Text and annotations for a specified poem ID, supporting multiple output formats.
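
The record types above form a drill-down chain: a corpus leads to its authors, an author ID leads to poems, and a poem ID leads to text. The following is a minimal base-R sketch of that chain on toy data frames. The column names (`corpus`, `author_id`, `poem_id`, `title`), the placeholder values, and the helper `poems_by_author()` are hypothetical illustrations only; they are not the poetRee package's functions or the API's actual fields.

```r
# Toy stand-ins for the author and poem records. All column names and
# values are hypothetical placeholders, not real PoeTree fields or data.
authors <- data.frame(
  corpus    = c("cs", "cs", "en"),
  author_id = c(1, 2, 3),
  name      = c("Author A", "Author B", "Author C"),
  stringsAsFactors = FALSE
)

poems <- data.frame(
  author_id = c(1, 1, 2, 3),
  poem_id   = c(10, 11, 12, 13),
  title     = c("Poem 1", "Poem 2", "Poem 3", "Poem 4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all poem records for one author ID or a vector of IDs.
poems_by_author <- function(poems, ids) {
  poems[poems$author_id %in% ids, , drop = FALSE]
}

# Step 1: select the authors of one corpus by its ISO language code.
czech_authors <- authors[authors$corpus == "cs", ]

# Step 2: collect the poems of those authors via their IDs.
czech_poems <- poems_by_author(poems, czech_authors$author_id)

print(czech_poems$poem_id)
```

Passing a vector of IDs to the helper mirrors the "author ID (or vector of author IDs)" wording above; with the real package, the same chain would presumably go through its own query functions instead of local data frames.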

Usage

  • Installation: Install via devtools::install_github("perechen/poetRee").
  • Citation: When using the PoeTree dataset, cite the associated dataset and publications.

Examples

  • Metadata Example: Shows statistics such as number of authors, poems, and lines per corpus.
  • Author Example: Lists detailed author information for a specific corpus (e.g., Czech).
  • Source Example: Shows source details for a given corpus and author ID.
  • Poem Example: Displays poem details for a specific corpus and author ID.
  • Text Example: Shows the text of a particular poem ID in various output formats.
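
The per-corpus statistics mentioned in the metadata example can be read as simple aggregations over poem records. Below is a rough base-R sketch on toy data, assuming a hypothetical table with one row per poem and the columns `corpus`, `author_id`, and `n_lines`. The source does not show how the package returns these figures, so this only illustrates what the counts mean.

```r
# Toy poem table; column names and values are hypothetical placeholders.
poems <- data.frame(
  corpus    = c("cs", "cs", "cs", "en"),
  author_id = c(1, 1, 2, 3),
  n_lines   = c(14, 8, 20, 12),
  stringsAsFactors = FALSE
)

# Summarise one corpus: distinct authors, number of poems, total lines.
summarise_corpus <- function(df) {
  data.frame(
    corpus    = df$corpus[1],
    n_authors = length(unique(df$author_id)),
    n_poems   = nrow(df),
    n_lines   = sum(df$n_lines),
    stringsAsFactors = FALSE
  )
}

# Split the table by corpus, summarise each part, and stack the results.
corpus_stats <- do.call(
  rbind,
  lapply(split(poems, poems$corpus), summarise_corpus)
)
rownames(corpus_stats) <- NULL

print(corpus_stats)
```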

Topics

Multilingual Literature
Corpus

Source

Organization: github

Created: 12/22/2023
