Dataset asset | Open Source Community | Cybersecurity | URL Analysis
Malicious URL v5
This dataset is intended for training and testing malicious URL detectors. Each record pairs a URL with detailed attributes such as domain name, registrar, registrar address, organization, and Alexa traffic rank.
Source
github
Created
Jul 18, 2020
Updated
Nov 4, 2020
Dataset Overview
Dataset Content
- Purpose: Used for training and testing malicious URL detectors.
- Data Structure:
  - Column Information:
    - S.NO
    - URL
    - Property
    - Name
    - Organisation
    - Address
    - City
    - State
    - Zipcode
    - Country
    - E‑mails
    - Domain
    - Alexa Rank
    - Registrar
    - time
  - Example Records:
    - Example 1:
      - URL: https://www.airtelxstream.in/search
      - Property: Legitimate
      - Domain: airtelxstream.in
      - Alexa Rank: 5793
      - Registrar: GoDaddy.com LLC
    - Example 2:
      - URL: https://www.airtelxstream.in/livetv-channels/sony-sab/mwtv_livetvchannel_347
      - Property: Legitimate
      - Domain: airtelxstream.in
      - Alexa Rank: 5793
      - Registrar: GoDaddy.com LLC
    - Example 3:
      - URL: https://myjiocare.com/sony-liv-premium-account-free/
      - Property: Legitimate
      - Domain: MYJIOCARE.COM
      - Alexa Rank: 2272473
      - Registrar: BigRock Solutions Ltd
    - Example 4:
      - URL: https://www.youtube.com/watch?v=dnbkysr3hoo
      - Property: Legitimate
      - Domain: YOUTUBE.COM
      - Alexa Rank: 2
      - Registrar: MarkMonitor Inc.
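The example records show that field formats vary between rows: domains appear in both lower case (airtelxstream.in) and upper case (YOUTUBE.COM). The following is a minimal sketch of how such rows might be read and normalised with Python's standard `csv` module. The CSV layout, the header spellings, the inline `SAMPLE_CSV` string, and the `load_records` helper are all assumptions made for illustration; the actual file may be laid out differently.

```python
import csv
import io

# Hypothetical sample mirroring the example records above; the real file
# layout and exact header spellings are assumptions.
SAMPLE_CSV = """URL,Property,Domain,Alexa Rank,Registrar
https://www.airtelxstream.in/search,Legitimate,airtelxstream.in,5793,GoDaddy.com LLC
https://myjiocare.com/sony-liv-premium-account-free/,Legitimate,MYJIOCARE.COM,2272473,BigRock Solutions Ltd
https://www.youtube.com/watch?v=dnbkysr3hoo,Legitimate,YOUTUBE.COM,2,MarkMonitor Inc.
"""


def load_records(handle):
    """Read rows and normalise the fields whose format varies between records."""
    records = []
    for row in csv.DictReader(handle):
        rank_text = (row.get("Alexa Rank") or "").strip()
        records.append({
            "url": row["URL"].strip(),
            # Only the "Legitimate" label appears in the examples, so any
            # other value is simply treated as not legitimate here.
            "is_legitimate": row["Property"].strip().lower() == "legitimate",
            # Domains appear in mixed case (airtelxstream.in vs YOUTUBE.COM).
            "domain": row["Domain"].strip().lower(),
            # Missing or non-numeric ranks become None instead of raising.
            "alexa_rank": int(rank_text) if rank_text.isdigit() else None,
            "registrar": row["Registrar"].strip(),
        })
    return records


records = load_records(io.StringIO(SAMPLE_CSV))
for record in records:
    print(record["domain"], record["alexa_rank"], record["is_legitimate"])
```

Lower-casing the domain up front keeps later grouping or joins by domain from treating `YOUTUBE.COM` and `youtube.com` as different values.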
Dataset Applications
- Function: Predict the legitimacy of URLs and detect phishing assets.
- Data Acquisition: Collects dynamic and sensitive URL attributes such as domain, registrar, registrar address, organization, Alexa traffic rank, etc.
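A detector trained on these attributes would typically combine them with features derived from the URL string itself. The sketch below shows one way this could look using only `urllib.parse` from the standard library. The `url_features` helper and the particular features it returns are hypothetical choices for illustration and are not described by the dataset.

```python
from urllib.parse import parse_qs, urlsplit


def url_features(url, whois_domain):
    """Derive a few lexical features from a URL plus one consistency check."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    domain = whois_domain.strip().lower()
    return {
        "uses_https": parts.scheme == "https",
        "host_length": len(host),
        # Number of non-empty path segments, e.g. "/watch" has depth 1.
        "path_depth": len([segment for segment in parts.path.split("/") if segment]),
        "query_param_count": len(parse_qs(parts.query)),
        # True when the host is the recorded domain or one of its subdomains.
        "host_matches_domain": bool(domain)
        and (host == domain or host.endswith("." + domain)),
    }


# Example 4 from the records above.
features = url_features("https://www.youtube.com/watch?v=dnbkysr3hoo", "YOUTUBE.COM")
print(features)
```

The `host_matches_domain` check compares the URL's host against the Domain column case-insensitively, since the example records store domains in mixed case.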
Phishing Webpage Examples
- Includes screenshots of phishing webpages mimicking well‑known brands such as WHO, the UK government, Chase Bank, Netflix, Adobe, Facebook, Microsoft, PayPal, Yahoo, etc.