
Samuel-Njoroge/phishing-emails-analysis


Phishing Emails Analysis.

Phishing is the practice of sending fraudulent communications that appear to come from a legitimate and reputable source, usually through email and text messaging. The attacker's goal is to steal money, gain access to sensitive data and login information, or install malware on the victim's device. Phishing is a dangerous, damaging, and increasingly common type of cyberattack.

Source

This project analyzes emails classified as phishing.

Data source: https://www.kaggle.com/datasets/naserabdullahalam/phishing-email-dataset

Data Source Reference

Al-Subaiey, A., Al-Thani, M., Alam, N. A., Antora, K. F., Khandakar, A., & Zaman, S. A. U. (2024, May 19). Novel Interpretable and Robust Web-based AI Platform for Phishing Email Detection. ArXiv.org. https://arxiv.org/abs/2405.11619

Project Architecture.

[Architecture diagram: phishing_emails]

Skills & Tools

  • Python: used throughout the data cleaning process.
  • SQL: used in the data exploration phase.
  • Jupyter Notebooks: the environment for data cleaning.
  • PostgreSQL: the database management system that stores and serves the data.
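The queries below read columns such as `sender_name`, `sender_email`, `receiver_email`, `date`, `time`, and `subject`, so the cleaning step presumably derives them from the raw export. The following is a minimal sketch of such a step, not the project's actual notebook code. It assumes raw records with hypothetical keys `sender`, `receiver`, `date`, and `subject`, where addresses look like `Name <address>` and dates are RFC 2822 strings. It also assumes missing receivers were filled with the `unknown@example.com` placeholder that query 4 filters out.

```python
from email.utils import parseaddr, parsedate_to_datetime

# Placeholder for a missing receiver (an assumption; query 4 filters this value out).
UNKNOWN_RECEIVER = "unknown@example.com"


def clean_row(raw):
    """Split one raw record (hypothetical keys) into the columns the SQL queries use."""
    # parseaddr("Name <addr>") -> ("Name", "addr"); an empty string gives ("", "").
    sender_name, sender_email = parseaddr(raw.get("sender") or "")
    _, receiver_email = parseaddr(raw.get("receiver") or "")
    # Parse the RFC 2822 date string once, then split it into separate date and time columns.
    sent_at = parsedate_to_datetime(raw["date"]) if raw.get("date") else None
    return {
        "sender_name": sender_name or None,
        "sender_email": sender_email.lower() or None,
        "receiver_email": receiver_email.lower() or UNKNOWN_RECEIVER,
        "date": sent_at.date() if sent_at is not None else None,
        "time": sent_at.time() if sent_at is not None else None,
        "subject": (raw.get("subject") or "").strip(),
    }


row = clean_row({
    "sender": "USAA <Alerts@USAA-Example.com>",
    "receiver": "",
    "date": "Tue, 05 Aug 2008 12:30:00 +0000",
    "subject": " Urgent: verify your account ",
})
print(row["sender_email"], row["receiver_email"], row["date"], row["time"])
# alerts@usaa-example.com unknown@example.com 2008-08-05 12:30:00
```

Lower-casing addresses before loading means the later `GROUP BY` queries will not split a single sender into several groups because of letter case.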

Objectives

  • Find the sender email addresses most frequently associated with phishing.
  • Identify the most frequent phishing targets.
  • Identify the trend of phishing emails over time.
  • Find the hours and days of the week when phishing emails are most often sent.
  • Identify sender-receiver patterns in phishing.

1. How many observations does the phishing emails dataset contain?

Query

SELECT 
	COUNT(*) AS total_observations 
FROM 
	public.fraud_data;

Results

[Result screenshot: query 1]
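To try the count outside PostgreSQL, the same statement can run from Python. This sketch uses an in-memory SQLite database as a stand-in (an assumption; SQLite has no `public` schema, so the table is named directly) and a hypothetical two-column toy table. `COUNT(*)` counts rows regardless of NULLs, unlike `COUNT(column)`.

```python
import sqlite3


def total_observations(conn):
    """Run the observation count against a fraud_data table (SQLite stand-in)."""
    (count,) = conn.execute(
        "SELECT COUNT(*) AS total_observations FROM fraud_data"
    ).fetchone()
    return count


# Hypothetical toy table; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fraud_data (sender_email TEXT, receiver_email TEXT)")
conn.executemany(
    "INSERT INTO fraud_data VALUES (?, ?)",
    [
        ("a@example.com", "b@example.com"),
        ("a@example.com", "c@example.com"),
        ("d@example.com", None),  # a NULL receiver is still counted by COUNT(*)
    ],
)
print(total_observations(conn))  # 3
```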

2. Which sender email addresses are most frequently associated with phishing?

Query

SELECT 
	sender_email,
	COUNT(*) AS total_sent
FROM 
	public.fraud_data
GROUP BY 
	sender_email
ORDER BY 
	total_sent DESC;

Results

[Result screenshot: query 2]
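The `GROUP BY ... ORDER BY COUNT(*) DESC` pattern is a frequency count. A minimal Python equivalent over hypothetical row dictionaries uses `collections.Counter`. One difference: `most_common()` keeps tied values in first-seen order, whereas SQL leaves the order of tied rows unspecified.

```python
from collections import Counter


def count_by(rows, column):
    """Count rows per value of `column`, highest count first.

    Mirrors GROUP BY column ORDER BY COUNT(*) DESC.
    """
    return Counter(row[column] for row in rows).most_common()


# Hypothetical rows for illustration only.
rows = [
    {"sender_email": "greatoffers@sendgreatoffers.com"},
    {"sender_email": "alerts@example.com"},
    {"sender_email": "greatoffers@sendgreatoffers.com"},
]
print(count_by(rows, "sender_email"))
# [('greatoffers@sendgreatoffers.com', 2), ('alerts@example.com', 1)]
```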

3. Which sender names are most frequently associated with phishing emails?

Query

SELECT 
	sender_name,
	COUNT(*) AS total 
FROM 
	public.fraud_data
GROUP BY 
	sender_name
ORDER BY 
	total DESC;

Results

[Result screenshot: query 3]
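`GROUP BY sender_name` groups exact strings, so "USAA", "usaa " and " USAA" are counted as three different senders. If that matters, names could be normalized before counting. The sketch below shows this optional variant (an assumption, not what the query above does): it collapses whitespace and case, and keeps missing names as one `None` group, much as SQL puts all NULL names in a single group.

```python
from collections import Counter


def normalize_name(name):
    """Collapse whitespace and letter case so near-duplicate display names group together."""
    if name is None:
        return None  # all missing names form one group, like NULL in GROUP BY
    return " ".join(name.split()).casefold()


def count_sender_names(names):
    """Count normalized sender names, most frequent first."""
    return Counter(normalize_name(n) for n in names).most_common()


print(count_sender_names(["USAA", "usaa ", " USAA", None]))
# [('usaa', 3), (None, 1)]
```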

4. Which receiver email addresses are most frequently targeted by phishing emails?

Query

SELECT 
	receiver_email,
	COUNT(*) AS total_received
FROM 
	public.fraud_data
WHERE
	receiver_email != 'unknown@example.com'
GROUP BY 
	receiver_email
ORDER BY 
	total_received DESC
LIMIT 10;

Results

[Result screenshot: query 4]
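One subtlety of the `WHERE receiver_email != 'unknown@example.com'` filter is that a comparison with NULL yields NULL in SQL, so rows with a NULL receiver are also dropped. The following is a minimal Python sketch of the same top-10 logic, assuming the receivers arrive as a plain list. It skips `None` explicitly to match that SQL behaviour.

```python
from collections import Counter

# Placeholder address excluded by the query above.
PLACEHOLDER = "unknown@example.com"


def top_receivers(receivers, limit=10):
    """Most frequent receivers, excluding the placeholder and missing values."""
    counts = Counter(r for r in receivers if r is not None and r != PLACEHOLDER)
    return counts.most_common(limit)


# Hypothetical receivers for illustration.
sample = ["a@example.com", PLACEHOLDER, None, "a@example.com", "b@example.com"]
print(top_receivers(sample))
# [('a@example.com', 2), ('b@example.com', 1)]
```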

5. On which dates were the most phishing emails sent?

Query

SELECT 
	date AS date_sent,
	COUNT(*) AS total_emails
FROM
	public.fraud_data
GROUP BY
	date
ORDER BY 
	total_emails DESC
LIMIT 10;

Results

[Result screenshot: query 5]
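The query above ranks individual dates by volume, which highlights peak days. A trend view would normally keep the results in chronological order instead. In PostgreSQL, that could mean grouping by `date_trunc('month', date)` and ordering by that value. This minimal sketch does the monthly bucketing in Python, assuming the `date` values are `datetime.date` objects.

```python
from collections import Counter
from datetime import date


def monthly_trend(dates):
    """Count emails per (year, month) and return the buckets in chronological order."""
    counts = Counter((d.year, d.month) for d in dates if d is not None)
    return sorted(counts.items())


# Hypothetical dates for illustration.
sample = [date(2008, 8, 5), date(2008, 8, 6), date(2008, 7, 30)]
print(monthly_trend(sample))
# [((2008, 7), 1), ((2008, 8), 2)]
```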

6. Are there specific days when phishing emails are more likely to be sent?

Query

SELECT
	TO_CHAR(date, 'Day') AS day_of_week,
	COUNT(*) AS total_sent
FROM
	public.fraud_data
GROUP BY
	day_of_week
ORDER BY
	total_sent DESC;

Results

[Result screenshot: query 6]
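`TO_CHAR(date, 'Day')` returns the day name blank-padded to nine characters (for example `'Tuesday  '`). Grouping still works, but the `FMDay` template drops the padding if the labels are reused elsewhere. This minimal Python sketch does the same weekday count. It uses a fixed tuple of English names so that the output does not depend on the system locale.

```python
from collections import Counter
from datetime import date

# Fixed English names, indexed by date.weekday() (Monday == 0).
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def count_by_weekday(dates):
    """Count emails per weekday name, busiest day first."""
    return Counter(DAY_NAMES[d.weekday()] for d in dates).most_common()


# Hypothetical dates: 5 and 12 August 2008 were Tuesdays, 6 August a Wednesday.
print(count_by_weekday([date(2008, 8, 5), date(2008, 8, 12), date(2008, 8, 6)]))
# [('Tuesday', 2), ('Wednesday', 1)]
```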

7. Which phishing email subjects contain common lure words such as "Important", "Money", "Urgent", or "action"?

Query

SELECT 
	subject
FROM
	public.fraud_data
WHERE 	
	subject LIKE '%Important%' OR
	subject LIKE '%Money%' OR
	subject LIKE '%Urgent%' OR
	subject LIKE '%action%';

Results

[Result screenshot: query 7]
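The query lists matching subjects but does not count them. In PostgreSQL, `LIKE` is case-sensitive, so `'%Urgent%'` will not match "URGENT"; `ILIKE` is the case-insensitive variant. The following minimal sketch counts how many subjects contain each keyword while ignoring case. Like `LIKE '%action%'`, it matches substrings, so "Transaction" counts toward "action". The keyword list simply reuses the words from the query.

```python
from collections import Counter

# Keywords taken from the query above, compared case-insensitively.
LURE_WORDS = ("important", "money", "urgent", "action")


def lure_word_counts(subjects, words=LURE_WORDS):
    """Count subjects containing each keyword (substring match, case-insensitive)."""
    counts = Counter()
    for subject in subjects:
        lowered = (subject or "").casefold()  # treat a missing subject as empty
        for word in words:
            if word in lowered:
                counts[word] += 1
    return counts


# Hypothetical subjects for illustration.
subjects = ["URGENT: action required", "Make Money fast", "Meeting notes", None, "Transaction failed"]
print(lure_word_counts(subjects).most_common())
```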

8. What are the sender-receiver pairs that occur most frequently?

Query

SELECT 
	sender_email,
	receiver_email,
	COUNT(*) AS total
FROM
	public.fraud_data
GROUP BY
	sender_email,
	receiver_email
ORDER BY
	total DESC
LIMIT 10;

Results

[Result screenshot: query 8]
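Grouping on two columns amounts to counting tuples. This minimal sketch of the pair query runs over hypothetical row dictionaries. As with `LIMIT 10` in SQL, a cutoff can fall in the middle of a run of tied pairs. Here ties keep first-seen order, whereas PostgreSQL may return any of them.

```python
from collections import Counter


def top_pairs(rows, limit=10):
    """Most frequent (sender_email, receiver_email) pairs, like GROUP BY on both columns."""
    pairs = Counter((r["sender_email"], r["receiver_email"]) for r in rows)
    return pairs.most_common(limit)


# Hypothetical rows for illustration.
rows = [
    {"sender_email": "s1@example.com", "receiver_email": "r1@example.com"},
    {"sender_email": "s1@example.com", "receiver_email": "r1@example.com"},
    {"sender_email": "s2@example.com", "receiver_email": "r1@example.com"},
]
print(top_pairs(rows))
# [(('s1@example.com', 'r1@example.com'), 2), (('s2@example.com', 'r1@example.com'), 1)]
```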

9. At what hour of the day are phishing emails most often sent?

Query

SELECT 
	CASE 
		WHEN EXTRACT(HOUR FROM time) = 0 THEN '12 AM'
		WHEN EXTRACT(HOUR FROM time) < 12 THEN CONCAT(EXTRACT(HOUR FROM time)::text, ' AM')
		WHEN EXTRACT(HOUR FROM time) = 12 THEN '12 PM'
		ELSE CONCAT((EXTRACT(HOUR FROM time) - 12)::text, ' PM')
	END AS hour_of_day,
	COUNT(*) AS total
FROM
	public.fraud_data
GROUP BY
	hour_of_day
ORDER BY
	total DESC;

Results

[Result screenshot: query 9]
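The `CASE` expression maps a 0-23 hour to a 12-hour label. Here is the same mapping as a small Python function, which makes the four branches easy to check. The range check is an addition that the SQL does not have. Because the labels are text, sorting by them would be alphabetical ("1 AM", "1 PM", "10 AM", ...), which is one reason to order by `total` as the query does.

```python
def hour_label(hour):
    """Convert a 0-23 hour to the 12-hour label the CASE expression produces."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    if hour == 0:
        return "12 AM"          # midnight
    if hour < 12:
        return f"{hour} AM"
    if hour == 12:
        return "12 PM"          # noon
    return f"{hour - 12} PM"


print([hour_label(h) for h in (0, 9, 12, 13, 23)])
# ['12 AM', '9 AM', '12 PM', '1 PM', '11 PM']
```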

10. Which combinations of hour and day of the week are most common for phishing emails?

Query

SELECT 
	EXTRACT(HOUR FROM time) AS hour_of_day,
	TO_CHAR(date, 'Day') AS day_of_week,
	COUNT(*) AS total
FROM
	public.fraud_data
GROUP BY
	hour_of_day,
	day_of_week
ORDER BY 
	total DESC 
LIMIT 10;

Results

[Result screenshot: query 10]
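Query 10 groups on two derived values: the hour and the weekday. This minimal Python sketch does the same over timestamps. It assumes the separate `date` and `time` columns have been combined into `datetime` values (for example with `datetime.combine`). Weekday names come from a fixed English tuple rather than the locale.

```python
from collections import Counter
from datetime import datetime

# Fixed English names, indexed by datetime.weekday() (Monday == 0).
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def busiest_slots(timestamps, limit=10):
    """Count emails per (hour, weekday) slot, busiest first."""
    slots = Counter((ts.hour, DAY_NAMES[ts.weekday()]) for ts in timestamps)
    return slots.most_common(limit)


# Hypothetical timestamps for illustration.
sample = [
    datetime(2008, 8, 5, 12, 10),   # Tuesday, 12:10
    datetime(2008, 8, 12, 12, 45),  # Tuesday, 12:45
    datetime(2008, 8, 6, 9, 0),     # Wednesday, 09:00
]
print(busiest_slots(sample))
# [((12, 'Tuesday'), 2), ((9, 'Wednesday'), 1)]
```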

Conclusions

  • In this dataset, more phishing emails were sent on Tuesday than on any other day.
  • USAA appears frequently as a sender of phishing emails.
  • Words such as "Urgent", "Make Money", and "Important" are common in phishing email subjects.
  • Phishing emails are most often sent between 12 PM and 1 PM.
  • Phishing sender addresses often contain lure words such as "offers", for example greatoffers@sendgreatoffers.com.

Recommendations

  • Verify the sender of every email before clicking any link.
  • Implement mandatory and regular phishing awareness training for all employees.
  • Develop and regularly update incident response plans specifically tailored for phishing attacks.
  • Enforce the use of multi-factor authentication (MFA) across all corporate email accounts and critical systems.
  • Enhance email filtering systems by using machine learning algorithms that can detect and flag phishing attempts more accurately.

About

Analysis of common phishing emails.
