WeiboHotListDataSet
This dataset contains posts scraped from the Weibo hot list between 2022-11-25 and 2023-03-08, together with their associated comments. Only posts from the day each entry trended are included.
Updated 4/21/2023
github
Description
Dataset Overview
Dataset Name
- WeiboHotListDataSet
Dataset Content
- Collected Weibo hot‑list posts from 2022‑11‑25 to 2023‑03‑08 and their corresponding comment data.
Data Structure
- archives.tar.gz
  - Contains daily Markdown files for all Weibo hot-list entries between 2022-11-25 and 2023-03-08.
- comments.tar.gz
  - Contains Excel files with the posts and their associated comments for each hot-list entry.
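As a rough sketch of how the two archives might be unpacked, the snippet below uses only Python's standard `tarfile` module. The helper name `extract_archive` and the destination directory are illustrative assumptions, and the internal layout of the archives is not documented here, so the function simply returns whatever files it finds after extraction.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the extracted files.

    Hypothetical helper: the name and return value are choices made for this
    sketch, not part of the dataset's tooling.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Refuse members that would be written outside dest (path traversal).
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    # Return every regular file now present under dest, in sorted order.
    return sorted(p for p in dest.rglob("*") if p.is_file())
```

A call such as `extract_archive("archives.tar.gz", "data/archives")` would then give a list of paths to inspect before deciding how to load them; the `data/archives` directory is only an example.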
File Naming and Format
- Post and comment files
  - Post files: KeywordName.xlsx
  - Comment files: KeywordName/KeywordName_cmtPostID.xlsx
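The naming convention above can be turned into small path helpers. This is a minimal sketch that assumes `KeywordName` is the hot-list keyword and that `PostID` follows the literal `_cmt` marker directly; the function names are hypothetical, and the paths are relative to wherever comments.tar.gz was extracted.

```python
from pathlib import PurePosixPath


def post_file(keyword: str) -> PurePosixPath:
    """Relative path of the post file for a keyword: KeywordName.xlsx."""
    return PurePosixPath(f"{keyword}.xlsx")


def comment_file(keyword: str, post_id: str) -> PurePosixPath:
    """Relative path of a comment file: KeywordName/KeywordName_cmtPostID.xlsx."""
    return PurePosixPath(keyword) / f"{keyword}_cmt{post_id}.xlsx"


def split_comment_file(path) -> tuple[str, str]:
    """Recover (keyword, post_id) from a comment file path.

    Splits on the last "_cmt" in the file stem, so keywords that themselves
    contain "_cmt" are still handled (assuming post IDs never do).
    """
    stem = PurePosixPath(path).stem
    keyword, sep, post_id = stem.rpartition("_cmt")
    if not sep:
        raise ValueError(f"not a comment file name: {path}")
    return keyword, post_id
```

For example, `comment_file("世界杯", "123")` yields `世界杯/世界杯_cmt123.xlsx`, where both the keyword and the post ID are made-up values.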
Example
- Post file example
  - Contains post ID, user ID, username, post time, post text, repost count, comment count, like count, etc.
- Comment file example
  - Contains comment ID, timestamp, user ID, user nickname, user city, like count, reply count, comment content, etc.
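The fields listed above could be modelled as two record types once a row has been read from the Excel files. The sketch below is illustrative only: the attribute names, their types, and the `engagement` helper are assumptions, since the actual column headers are not given here and the "etc." implies further columns.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Attribute names are hypothetical; they mirror the listed fields only.
    post_id: str
    user_id: str
    username: str
    posted_at: str
    text: str
    reposts: int
    comments: int
    likes: int


@dataclass
class Comment:
    # Attribute names are hypothetical; they mirror the listed fields only.
    comment_id: str
    timestamp: str
    user_id: str
    nickname: str
    city: str
    likes: int
    replies: int
    content: str


def engagement(post: Post) -> int:
    """Sum of repost, comment, and like counts for one post."""
    return post.reposts + post.comments + post.likes
```

Keeping IDs as strings avoids precision loss if a spreadsheet reader would otherwise parse long numeric IDs as floats.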
Download Options
- Google Drive
  - Shared link: Google Drive link
- Baidu Cloud
  - Link: Baidu Cloud link
  - Extraction code: i0lg
Data Source
- Sourced from justjavac/weibo-trending-hot-search.
License
- Released under the MIT License.
Topics
Social Media Analysis
Public Opinion Research
Source
Organization: github
Created: 4/20/2023