
WeiboHotListDataSet

This dataset contains posts scraped from the Weibo hot list between 2022-11-25 and 2023-03-08 (only posts from the day each entry trended), together with their associated comments.

Updated 4/21/2023

Description

Dataset Overview

Dataset Name

  • WeiboHotListDataSet

Dataset Content

  • Collected Weibo hot‑list posts from 2022‑11‑25 to 2023‑03‑08 and their corresponding comment data.

Data Structure

  • archives.tar.gz
    • Contains daily Markdown files for all Weibo hot‑list entries between 2022‑11‑25 and 2023‑03‑08.
  • comments.tar.gz
    • Includes Excel files with posts and their associated comments for each hot‑list entry.
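Both archives are gzip-compressed tarballs, so they can be unpacked with Python's standard library. The following is a minimal sketch, not an official loader: the helper name `extract_archive` and the destination directory are illustrative, and the internal layout of each archive is not documented here, so the function simply returns whatever member names it finds.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a .tar.gz archive into dest_dir and return its member names.

    Hypothetical helper: the dataset does not ship a loader of its own.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        names = tar.getnames()
        # Use the "data" extraction filter where the running Python supports
        # it, so members with absolute paths or ".." are rejected.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(path=dest, **kwargs)
    return names
```

For example, `extract_archive("archives.tar.gz", "data/archives")` would unpack the daily Markdown files, assuming the archive has been downloaded to the working directory.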

File Naming and Format

  • Post and comment files
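    • Post files: KeywordName.xlsx, where KeywordName is the hot-list keyword.
    • Comment files: KeywordName/KeywordName_cmtPostID.xlsx, where PostID is the ID of the post the comments belong to.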
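The naming scheme above can be expressed as a few path helpers. This sketch assumes that `_cmt` is a literal separator between the keyword and the post ID, and that each keyword has its own subdirectory for comment files; the function names and the `root` argument are illustrative and not part of the dataset.

```python
from pathlib import Path


def post_file(root: Path, keyword: str) -> Path:
    """Path of the post workbook for one hot-list keyword."""
    return root / f"{keyword}.xlsx"


def comment_file(root: Path, keyword: str, post_id: str) -> Path:
    """Path of the comment workbook for one post under a keyword."""
    return root / keyword / f"{keyword}_cmt{post_id}.xlsx"


def parse_comment_file(path: Path) -> tuple[str, str]:
    """Split 'Keyword/Keyword_cmtPostID.xlsx' into (keyword, post_id).

    The keyword is taken from the parent directory, so a keyword that
    itself contains '_cmt' is still parsed unambiguously.
    """
    keyword = path.parent.name
    prefix = f"{keyword}_cmt"
    if not path.stem.startswith(prefix):
        raise ValueError(f"unexpected comment file name: {path.name}")
    return keyword, path.stem[len(prefix):]
```

Deriving the keyword from the directory name rather than splitting the file name on `_cmt` keeps the parser robust if a keyword happens to contain that substring.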

Example

  • Post file example
    • Contains post ID, user ID, username, post time, post text, repost count, comment count, like count, etc.
  • Comment file example
    • Contains comment ID, timestamp, user ID, user nickname, user city, like count, reply count, comment content, etc.
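One way to work with these fields after loading the spreadsheets is to map each row onto a typed record. The sketch below is an assumption-laden illustration: the attribute names are English stand-ins for the listed fields, the actual column headers in the Excel files may differ, and the "etc." above means the real files may carry more columns than shown. `top_comments` is a hypothetical helper, not part of the dataset.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Field names are illustrative; real column headers may differ.
    post_id: str
    user_id: str
    username: str
    post_time: str
    text: str
    repost_count: int
    comment_count: int
    like_count: int


@dataclass
class Comment:
    # Field names are illustrative; real column headers may differ.
    comment_id: str
    timestamp: str
    user_id: str
    user_nickname: str
    user_city: str
    like_count: int
    reply_count: int
    content: str


def top_comments(comments: list[Comment], n: int) -> list[Comment]:
    """Return the n comments with the highest like counts."""
    return sorted(comments, key=lambda c: c.like_count, reverse=True)[:n]
```

Keeping IDs as strings avoids precision loss if a spreadsheet reader would otherwise coerce long numeric IDs to floating-point values.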

Download Options

Data Source

License

  • Released under the MIT License.

Topics

Social Media Analysis
Public Opinion Research

Source

Organization: github

Created: 4/20/2023
