Dataset asset, Open Source Community, Research Methods, Academic Papers

Elise-hf/PwC

The dataset's features include uid, paper URL, arXiv ID, title, abstract, abstract and PDF URLs, proceeding, authors, tasks, date, and methods. It is split into a training set of 149,495 samples and a test set of 37,108 samples, with a total size of 547,449,614 bytes.

Source: hugging_face
Created: Nov 28, 2025
Updated: Apr 18, 2023

Dataset Overview

Dataset Features

  • uid: Data type is int64
  • paper_url: Data type is string
  • arxiv_id: Data type is string
  • title: Data type is string
  • abstract: Data type is string
  • url_abs: Data type is string
  • url_pdf: Data type is string
  • proceeding: Data type is string
  • authors: Data type is sequence:string
  • tasks: Data type is sequence:string
  • date: Data type is float64
  • methods: Data type is list, containing the following sub‑features:
    • code_snippet_url: Data type is string
    • description: Data type is string
    • full_name: Data type is string
    • introduced_year: Data type is int64
    • main_collection: Data type is struct, containing the following sub‑features:
      • area: Data type is string
      • description: Data type is string
      • name: Data type is string
      • parent: Data type is string
    • name: Data type is string
    • source_title: Data type is string
    • source_url: Data type is string
  • __index_level_0__: Data type is int64
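
The nested methods feature is the least obvious part of this schema, so a small sketch of walking it may help. The record below is hypothetical and its values are invented purely for illustration; only the field names come from the feature list above. The helper name method_areas is an assumption and not part of any library, and the code assumes (the source does not confirm this) that nested fields may be missing or null.

```python
from typing import Any, Dict, List, Tuple


def method_areas(record: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (method name, main_collection area) pairs for one record.

    Missing or null fields are treated as empty strings. This is an
    assumption; the source does not say whether nulls occur.
    """
    pairs = []
    for method in record.get("methods") or []:
        collection = method.get("main_collection") or {}
        pairs.append((method.get("name") or "", collection.get("area") or ""))
    return pairs


# Hypothetical record: values are invented, only the keys follow the schema.
sample_record = {
    "uid": 0,
    "title": "An Example Paper",
    "authors": ["A. Author", "B. Author"],
    "tasks": ["Example Task"],
    "methods": [
        {
            "name": "ExampleMethod",
            "full_name": "Example Method",
            "introduced_year": 2000,
            "main_collection": {
                "area": "General",
                "description": "",
                "name": "Example Collection",
                "parent": None,
            },
        },
        # A method entry without a collection, to exercise the null handling.
        {"name": "NoCollection", "main_collection": None},
    ],
}

print(method_areas(sample_record))
# → [('ExampleMethod', 'General'), ('NoCollection', '')]
```

Flattening to plain tuples like this is one possible design; keeping the full nested dictionaries may be preferable when the collection description or parent is needed downstream.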

Dataset Splits

  • train: Size 437,349,959 bytes, containing 149,495 samples
  • test: Size 110,099,655 bytes, containing 37,108 samples
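
To put the split figures in proportion, the following sketch derives each split's share of the samples and its average size per sample. The numbers are copied from the list above; the dictionary layout and the function name split_summary are illustrative assumptions.

```python
# Sample counts and byte sizes copied from the split list.
SPLITS = {
    "train": {"num_bytes": 437_349_959, "num_examples": 149_495},
    "test": {"num_bytes": 110_099_655, "num_examples": 37_108},
}


def split_summary(splits):
    """Return each split's share of all samples and its average bytes per sample."""
    total_examples = sum(s["num_examples"] for s in splits.values())
    return {
        name: {
            "share": s["num_examples"] / total_examples,
            "avg_bytes": s["num_bytes"] / s["num_examples"],
        }
        for name, s in splits.items()
    }


for name, stats in split_summary(SPLITS).items():
    print(f"{name}: {stats['share']:.1%} of samples, "
          f"{stats['avg_bytes']:.0f} bytes per sample on average")
```

With these figures the split works out to roughly 80% train and 20% test, at a little under 3,000 bytes per sample in both splits.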

Dataset Size

  • Download size: 183,963,479 bytes
  • Total size: 547,449,614 bytes
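
As a quick consistency check on the reported sizes, this sketch confirms that the two split sizes add up to the total and computes how large the download is relative to it. All figures are copied from the lists above; the variable names are illustrative.

```python
# Byte counts copied from the split and size lists.
TRAIN_BYTES = 437_349_959
TEST_BYTES = 110_099_655
TOTAL_BYTES = 547_449_614
DOWNLOAD_BYTES = 183_963_479

# The total should equal the sum of the per-split sizes.
splits_match_total = TRAIN_BYTES + TEST_BYTES == TOTAL_BYTES

# Fraction of the total size that actually has to be downloaded.
download_ratio = DOWNLOAD_BYTES / TOTAL_BYTES

print(splits_match_total, f"{download_ratio:.3f}")
# → True 0.336
```

The download is about a third of the total size. A plausible reason is that the hosted files are stored in a compressed format, though the source does not state this.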