Dataset asset, Open Source Community, Social Media Analysis, Public Opinion Research

WeiboHotListDataSet

We scraped posts that appeared on the Weibo hot list from 2022‑11‑25 to 2023‑03‑08 (only posts from the day they trended) together with their associated comments.

Source: GitHub
Created: Apr 20, 2023
Updated: Apr 21, 2023
Signals: 384 views
Availability: Linked source ready

Dataset Overview

Dataset Name

  • WeiboHotListDataSet

Dataset Content

  • Collected Weibo hot‑list posts from 2022‑11‑25 to 2023‑03‑08 and their corresponding comment data.
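The collection window above can be turned into a quick coverage check. The following is a minimal sketch that enumerates every calendar day in the stated range (both endpoints included) and reports which days are absent from a set of dates you have found. The names `daily_dates` and `missing_dates` are hypothetical, and how each date maps to a file name inside the archives is not stated here, so that mapping is left to the reader.

```python
from datetime import date, timedelta

# Collection window as stated in the dataset description (inclusive).
START = date(2022, 11, 25)
END = date(2023, 3, 8)


def daily_dates(start=START, end=END):
    """Return every calendar day from start to end, both included."""
    days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(days)]


def missing_dates(found_iso_dates, start=START, end=END):
    """Return ISO dates in the window that are not in found_iso_dates.

    found_iso_dates is an iterable of 'YYYY-MM-DD' strings that the caller
    has derived from the extracted files (the derivation is an assumption
    left to the caller).
    """
    found = set(found_iso_dates)
    return [d.isoformat() for d in daily_dates(start, end)
            if d.isoformat() not in found]


if __name__ == "__main__":
    print(len(daily_dates()), "days in the collection window")
```

If the archive really holds one file per day, `missing_dates` should return an empty list once every file has been mapped to its date.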

Data Structure

  • archives.tar.gz
    • Contains daily Markdown files for all Weibo hot‑list entries between 2022‑11‑25 and 2023‑03‑08.
  • comments.tar.gz
    • Includes Excel files with posts and their associated comments for each hot‑list entry.
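Both archives are gzip-compressed tarballs, so they can be unpacked with Python's standard `tarfile` module. The sketch below assumes the two files sit in the current working directory and unpacks them under a hypothetical `data/` folder. The helper name `extract_archive` is illustrative and not part of the dataset.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir.

    Returns the names of the regular files found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


if __name__ == "__main__":
    # Archive names come from the dataset description; the output layout is assumed.
    for name in ("archives.tar.gz", "comments.tar.gz"):
        if Path(name).exists():
            files = extract_archive(name, Path("data") / name.replace(".tar.gz", ""))
            print(name, len(files), "files")
```

Extracting each archive into its own folder keeps the daily Markdown files separate from the Excel files.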

File Naming and Format

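  • Post and comment files
    • Post files: KeywordName.xlsx, where KeywordName is the hot‑list keyword.
    • Comment files: KeywordName/KeywordName_cmtPostID.xlsx, where PostID is the ID of the post the comments belong to.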
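The naming scheme above can be used to tell post files from comment files and to recover the keyword and post ID from a path. This is a minimal sketch under two assumptions: that the literal `_cmt` directly follows the keyword in comment file names, and that the comment file's parent folder carries the same keyword. `ParsedName` and `parse_member_name` are hypothetical names.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    kind: str                # "post" or "comment"
    keyword: str             # hot-list keyword taken from the file or folder name
    post_id: Optional[str]   # set only for comment files


def parse_member_name(rel_path: str) -> Optional[ParsedName]:
    """Classify an .xlsx path according to the documented naming scheme.

    Returns None for anything that is not an .xlsx file.
    """
    path = PurePosixPath(rel_path)
    if path.suffix.lower() != ".xlsx":
        return None
    parts = path.parts
    if len(parts) >= 2:
        # Comment files live in a folder named after the keyword.
        keyword = parts[-2]
        prefix = keyword + "_cmt"
        if path.stem.startswith(prefix):
            return ParsedName("comment", keyword, path.stem[len(prefix):])
    # Everything else is treated as a post file named after its keyword.
    return ParsedName("post", path.stem, None)
```

Taking the keyword from the folder name rather than splitting on `_cmt` avoids misparsing keywords that themselves contain that substring.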

Example

  • Post file example
    • Contains post ID, user ID, username, post time, post text, repost count, comment count, like count, etc.
  • Comment file example
    • Contains comment ID, timestamp, user ID, user nickname, user city, like count, reply count, comment content, etc.
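For downstream code it can help to give the listed fields a typed shape. The sketch below mirrors the fields named above as two dataclasses. All attribute names are hypothetical English labels, the actual column headers in the Excel files are not given here, and the "etc." in both lists means the real files may carry more columns than these.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Attribute names are illustrative; map them to the real column headers.
    post_id: str
    user_id: str
    username: str
    post_time: str
    text: str
    repost_count: int
    comment_count: int
    like_count: int


@dataclass
class Comment:
    # Attribute names are illustrative; map them to the real column headers.
    comment_id: str
    timestamp: str
    user_id: str
    user_nickname: str
    user_city: str
    like_count: int
    reply_count: int
    content: str


def to_int(value) -> int:
    """Coerce a spreadsheet cell to int, treating empty cells as 0."""
    if value is None or str(value).strip() == "":
        return 0
    return int(float(value))
```

IDs are kept as strings so that long numeric identifiers are not rounded or reformatted when read from a spreadsheet.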

Download Options

Data Source

License

  • Released under the MIT License.