Dataset asset | Open Source Community | Social Media Analysis | Public Opinion Research
WeiboHotListDataSet
We scraped the posts that appeared on the Weibo hot list between 2022‑11‑25 and 2023‑03‑08, together with their comments. For each hot‑list entry, only posts published on the day the entry trended were collected.
Source: GitHub
Created: Apr 20, 2023
Updated: Apr 21, 2023
Dataset Overview
Dataset Name
- WeiboHotListDataSet
Dataset Content
- Collected Weibo hot‑list posts from 2022‑11‑25 to 2023‑03‑08 and their corresponding comment data.
Data Structure
- archives.tar.gz
  - Contains one Markdown file per day listing all Weibo hot‑list entries between 2022‑11‑25 and 2023‑03‑08.
- comments.tar.gz
  - Contains Excel files with the posts and their associated comments for each hot‑list entry.
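Both downloads are gzip-compressed tarballs. The following is a minimal sketch for unpacking them with Python's standard tarfile module. The function name extract_archive is hypothetical, and the sketch assumes only the archive names listed above. It rejects any member whose path would land outside the destination directory and skips links and device entries, a precaution worth taking with any downloaded archive.

```python
import os
import tarfile
from typing import List


def extract_archive(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a .tar.gz archive into dest_dir and return the regular-file names.

    Members whose paths would resolve outside dest_dir raise ValueError;
    symlinks, hard links, and device entries are skipped.
    """
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        safe = []
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
            if member.isfile() or member.isdir():
                safe.append(member)
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: "data" adds its own checks.
            tar.extractall(dest_root, members=safe, filter="data")
        else:
            tar.extractall(dest_root, members=safe)
    return [m.name for m in safe if m.isfile()]
```

For example, calling extract_archive("archives.tar.gz", "weibo_hotlist") and then extract_archive("comments.tar.gz", "weibo_hotlist") would unpack both downloads side by side. The output directory name here is only an illustration.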
File Naming and Format
- Post and comment files
  - Post files: KeywordName.xlsx
  - Comment files: KeywordName/KeywordName_cmtPostID.xlsx
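Each comment file sits in a folder named after its keyword, so the path alone identifies both the matching post file and the post ID. The following is a minimal sketch built on that observation. It assumes the file name follows the pattern above literally (keyword, then "_cmt", then the post ID, then ".xlsx") and that post IDs are alphanumeric. COMMENT_FILE_RE, parse_comment_path, and post_file_for are hypothetical helper names.

```python
import re
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Hypothetical pattern for "KeywordName_cmtPostID.xlsx" (assumed from the naming scheme).
COMMENT_FILE_RE = re.compile(r"^(?P<keyword>.+)_cmt(?P<post_id>[0-9A-Za-z]+)\.xlsx$")


def parse_comment_path(path: str) -> Optional[Tuple[str, str]]:
    """Return (keyword, post_id) for a comment file path, or None if it doesn't match."""
    # Accept Windows-style separators as well as forward slashes.
    parts = PurePosixPath(path.replace("\\", "/")).parts
    if len(parts) < 2:
        return None
    folder, filename = parts[-2], parts[-1]
    match = COMMENT_FILE_RE.match(filename)
    # The enclosing folder must carry the same keyword as the file name.
    if match is None or match.group("keyword") != folder:
        return None
    return match.group("keyword"), match.group("post_id")


def post_file_for(keyword: str) -> str:
    """Name of the post file for a keyword, following the KeywordName.xlsx scheme."""
    return f"{keyword}.xlsx"
```

Checking the folder name against the keyword in the file name is a cheap guard against stray files that happen to contain "_cmt".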
Example
- Post file example
  - Contains post ID, user ID, username, post time, post text, repost count, comment count, like count, etc.
- Comment file example
  - Contains comment ID, timestamp, user ID, user nickname, user city, like count, reply count, comment content, etc.
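The exact column headers are not given here and may differ from the English descriptions above (for example, they may be in Chinese). The following is a minimal sketch that maps a comment row onto stable field names through a caller-supplied header map. The field names in COMMENT_FIELDS and the function normalize_comment_row are hypothetical. The sketch assumes each row arrives as a dict from header to cell value.

```python
from typing import Any, Dict, Mapping

# Hypothetical English field names for the comment columns described above;
# the real headers are supplied by the caller through header_map.
COMMENT_FIELDS = (
    "comment_id", "timestamp", "user_id", "nickname",
    "city", "like_count", "reply_count", "content",
)
COUNT_FIELDS = {"like_count", "reply_count"}


def normalize_comment_row(row: Mapping[str, Any], header_map: Mapping[str, str]) -> Dict[str, Any]:
    """Map one spreadsheet row (header -> cell value) onto COMMENT_FIELDS.

    header_map maps each field name to the header actually used in the file.
    Empty cells become "" for text fields and 0 for count fields.
    """
    record: Dict[str, Any] = {}
    for field in COMMENT_FIELDS:
        value = row.get(header_map[field])
        if field in COUNT_FIELDS:
            # Counts may be stored as int, float, or text such as "12".
            record[field] = int(float(value)) if value not in (None, "") else 0
        else:
            record[field] = "" if value is None else str(value)
    return record
```

With openpyxl, such rows could be produced by opening a file with load_workbook(path, read_only=True), reading the first row of iter_rows(values_only=True) as the header, and zipping it with each subsequent row. Long numeric IDs are safest when stored as text in the spreadsheet, since a float cell would be rendered in exponent notation by str().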
Download Options
- Google Drive
  - Shared link: Google Drive link
- Baidu Cloud
  - Link: Baidu Cloud link
  - Extraction code: i0lg
Data Source
- Sourced from justjavac/weibo-trending-hot-search.
License
- Released under the MIT License.