Dataset asset | Open Source Community | Corpus | Multilingual Literature

PoeTree

PoeTree is a standardized collection of poetry corpora containing over 300,000 poems in ten languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Slovenian, Spanish, and Russian). Each corpus has been deduplicated, enriched with Universal Dependencies annotation and additional metadata, and converted into a unified JSON structure.

Source: GitHub
Created: Dec 22, 2023
Updated: Jan 17, 2024

Dataset Overview

poetRee is an R package that fetches curated poetry data from the PoeTree API. PoeTree is a standardized collection of over 300,000 poems in ten languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Slovenian, Spanish, and Russian). Each sub‑corpus has been deduplicated, enriched with Universal Dependencies annotation and extra metadata, and converted to a uniform JSON format.

Dataset Contents

  • Metadata: Provides a summary for each sub‑corpus, including ISO language codes.
  • Author Information: Detailed information for all authors present in the corpora.
  • Source Information: Bibliographic details for the sources of all entries; results can be narrowed by author ID.
  • Poem Information: All poem records for a given author ID (or vector of author IDs).
  • Text Information: Text and annotations for a specified poem ID, supporting multiple output formats.
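
The five resource types above suggest one request per resource. The following is a minimal sketch of that idea, not the actual PoeTree API: the base URL, the endpoint names, and the parameter names (`corpus`, `id_author`) are all hypothetical placeholders. It only builds request URLs, without sending them, and expands a list of author IDs into one request per author.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with the real PoeTree API address.
BASE_URL = "https://example.org/poetree/api"

# Hypothetical endpoint names, one per resource type listed above.
ENDPOINTS = {
    "metadata": "corpora",
    "authors": "authors",
    "sources": "sources",
    "poems": "poems",
    "text": "poem",
}


def build_url(resource, **params):
    """Return the request URL for a resource, skipping unset parameters."""
    if resource not in ENDPOINTS:
        raise ValueError(f"unknown resource: {resource!r}")
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/{ENDPOINTS[resource]}"
    return f"{url}?{query}" if query else url


def poem_urls(corpus, author_ids):
    """Expand one author ID or a list of IDs into one URL per author."""
    if isinstance(author_ids, int):
        author_ids = [author_ids]
    return [build_url("poems", corpus=corpus, id_author=a) for a in author_ids]
```

For instance, `poem_urls("cs", [1, 2])` would yield two URLs, one per author ID, under the assumed naming.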

Usage

  • Installation: Install via devtools::install_github("perechen/poetRee").
  • Citation: When using the PoeTree dataset, cite the associated dataset and publications.

Examples

  • Metadata Example: Shows statistics such as number of authors, poems, and lines per corpus.
  • Author Example: Lists detailed author information for a specific corpus (e.g., Czech).
  • Source Example: Shows source details for a given corpus and author ID.
  • Poem Example: Displays poem details for a specific corpus and author ID.
  • Text Example: Shows the text of a particular poem ID in various output formats.
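
The per-corpus statistics mentioned in the metadata example (authors, poems, lines) could be derived from poem records roughly as follows. This is a sketch on made-up sample data: the field names `corpus`, `id_author`, and `n_lines` are assumptions for illustration and may not match the actual PoeTree JSON schema.

```python
from collections import defaultdict

# Made-up sample records; field names are assumed, not the PoeTree schema.
poems = [
    {"corpus": "cs", "id_author": 1, "n_lines": 14},
    {"corpus": "cs", "id_author": 1, "n_lines": 8},
    {"corpus": "cs", "id_author": 2, "n_lines": 20},
    {"corpus": "en", "id_author": 7, "n_lines": 12},
]


def corpus_stats(records):
    """Count distinct authors, poems, and lines for each corpus."""
    authors = defaultdict(set)
    stats = defaultdict(lambda: {"authors": 0, "poems": 0, "lines": 0})
    for rec in records:
        corpus = rec["corpus"]
        authors[corpus].add(rec["id_author"])
        stats[corpus]["poems"] += 1
        stats[corpus]["lines"] += rec["n_lines"]
    # Distinct authors are only known once all records have been seen.
    for corpus, ids in authors.items():
        stats[corpus]["authors"] = len(ids)
    return dict(stats)


print(corpus_stats(poems)["cs"])  # {'authors': 2, 'poems': 3, 'lines': 42}
```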