# phishing_dataset Dataset Overview

## Dataset Composition

- Collected 500 phishing sites sourced from PhishTank.
- Collected 500 legitimate sites sourced from Alexa.
- The dataset is split with a 70%/30% train-test ratio.

## Feature Description

### URL Features

1. **Domain similarity**: Similarity between the accessed website's domain and the domain of URLs obtained from Alexa or PhishTank, computed using the Ratcliff-Obershelp algorithm.
2. **URL length**: Number of characters in the URL.
3. **HTTP protocol type**: Standard (0) or secure (1).
4. **Number of '.' characters**: Count of dot symbols in the URL.
5. **Number of '/' characters**: Count of slash symbols in the URL.
6. **Number of '//' sequences**: Count of double-slash symbols in the URL.
7. **Number of '-' characters**: Count of hyphen symbols.
8. **Number of '_' characters**: Count of underscore symbols.
9. **Number of '=' characters**: Count of equal signs.
10. **Number of '(' and ')' characters**: Count of parentheses.
11. **Number of '{' and '}' characters**: Count of curly braces.
12. **Number of '[' and ']' characters**: Count of square brackets.
13. **Number of '<' and '>' characters**: Count of angle brackets.
14. **Number of '~' characters**: Count of tilde symbols.
15. **Number of '*' characters**: Count of asterisks.
16. **Number of '+' characters**: Count of plus signs.
17. **Presence of '@' symbol**: Whether the URL contains '@' (1 = yes, 0 = no).
18. **Presence of IP address**: Whether the URL contains an IP address (1 = yes, 0 = no).

### HTML Features

19. **Number of tags**: Count of tags used to create hyperlinks or anchor links.
20. **Number of tags**: Count of tags used for various form elements.
21. **Number of
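
The URL features (items 1 to 18) could be computed along the following lines. This is only a minimal sketch, not the dataset authors' extraction code: the function names, the feature keys, and the choice to take the maximum similarity over a list of reference domains are all assumptions made for illustration. Python's `difflib.SequenceMatcher` is used as a stand-in for the similarity measure; its matching is based on the Ratcliff/Obershelp "gestalt pattern matching" idea but may not reproduce the dataset's exact values.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Characters counted one at a time (items 4, 5, 7, 8, 9, 14, 15, 16).
# The key names are hypothetical, not the dataset's column names.
SINGLE_CHARS = {
    "dots": ".", "slashes": "/", "hyphens": "-", "underscores": "_",
    "equals": "=", "tildes": "~", "asterisks": "*", "pluses": "+",
}

# Bracket pairs counted together (items 10 to 13).
BRACKET_PAIRS = {
    "parentheses": "()", "curly_braces": "{}",
    "square_brackets": "[]", "angle_brackets": "<>",
}

# Loose dotted-quad pattern; it does not check that each octet is <= 255.
IPV4_PATTERN = re.compile(r"(?<!\d)(?:\d{1,3}\.){3}\d{1,3}(?!\d)")


def domain_similarity(domain, reference_domains):
    """Highest SequenceMatcher ratio against any reference domain.

    Assumption: the best match is kept; the source does not say how
    scores against multiple reference domains are combined.
    """
    return max(
        (SequenceMatcher(None, domain, ref).ratio() for ref in reference_domains),
        default=0.0,
    )


def extract_url_features(url, reference_domains):
    """Return a dict with the 18 URL features for a single URL."""
    parsed = urlparse(url)
    domain = parsed.hostname or ""
    features = {
        "domain_similarity": domain_similarity(domain, reference_domains),
        "url_length": len(url),
        "is_https": int(parsed.scheme == "https"),   # 0 = standard, 1 = secure
        "double_slashes": url.count("//"),
        "has_at": int("@" in url),
        "has_ip": int(IPV4_PATTERN.search(url) is not None),
    }
    for name, char in SINGLE_CHARS.items():
        features[name] = url.count(char)
    for name, pair in BRACKET_PAIRS.items():
        features[name] = url.count(pair[0]) + url.count(pair[1])
    return features


if __name__ == "__main__":
    # Hypothetical example URL and reference list.
    example = extract_url_features("https://example.com/a/b?x=1", ["example.com"])
    for key, value in example.items():
        print(f"{key}: {value}")
```

Note that in this sketch the slash count includes the slashes that form `//`, and the counts run over the full URL string rather than the path alone; either choice may differ from how the dataset was built.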
