Dataset asset · Open Source Community · Social Media Analysis · Online Bullying Detection

BullyDataset

A dataset of Sina Weibo comments collected for cyberbullying detection. A comment is labeled as bullying if it contains gender discrimination, racial or regional insults, profanity or humiliation, factual distortion, expressions of violence, attacks on appearance or family members, repetitive negative comments, calls for others to join the attack, or an unwanted or insulting nickname imposed on someone.

Source
github
Created
Jul 2, 2019
Updated
Jan 16, 2024

BullyDataset Overview

Dataset Description

  • Source: Sina Weibo comments
  • Purpose: Cyberbullying detection

Label Definition

  • Bullying Comment: A Weibo comment that satisfies any of the following conditions:
    1. Uses gender‑discriminatory, racial, or regional slurs.
    2. Uses abusive or insulting language to criticize others without reasonable justification.
    3. Clearly distorts facts or attempts to bias views on minority groups, making unfounded accusations.
    4. Expresses violent tendencies or curses toward minority groups.
    5. Contains attacks on a person’s appearance, body, or family members.
    6. Repeatedly posts negative comments, or calls on others to join the attack.
    7. Imposes an unwanted or insulting nickname on others.
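
The labeling rule above is a logical OR: a comment counts as bullying as soon as any one of the seven conditions applies. A minimal sketch of that rule follows. The `Criterion` names and the `is_bullying` helper are hypothetical and introduced only for illustration; the page does not describe the dataset's file format or how annotators recorded their decisions.

```python
from enum import Enum


class Criterion(Enum):
    """Hypothetical names for the seven conditions in the label definition."""
    DISCRIMINATORY_SLUR = 1    # gender-discriminatory, racial, or regional slurs
    UNJUSTIFIED_ABUSE = 2      # abusive or insulting language without justification
    FACT_DISTORTION = 3        # distorted facts, biased views, unfounded accusations
    VIOLENCE_OR_CURSES = 4     # violent tendencies or curses
    PERSONAL_ATTACK = 5        # attacks on appearance, body, or family members
    REPEATED_OR_INCITING = 6   # repeated negativity or calls to join the attack
    INSULTING_NICKNAME = 7     # unwanted or insulting nickname


def is_bullying(matched):
    """Return True if at least one criterion was matched (assumed OR rule)."""
    return len(set(matched)) > 0


# One matched condition is enough; an empty set means "not bullying".
print(is_bullying({Criterion.INSULTING_NICKNAME}))
print(is_bullying(set()))
```

Deciding which conditions a given comment matches is the annotator's (or a model's) job; the sketch only shows how those per-condition decisions combine into the final label.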

Citation Information

  • Authors: Nijia Lu, Guohua Wu, Zhen Zhang, Yitao Zheng, Yizhi Ren, Kim‑Kwang Raymond Choo
  • Year: 2019
  • Paper Title: Cyberbullying Detection in Social Media Text Based on Character‑level Convolutional Neural Networks with Shortcuts
  • Contact: lunijia@hdu.edu.cn