Skip to content

khanya88/Python-project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Report on Credit Card Fraud Detection Analysis Introduction This report summarizes the findings from the analysis of a credit card fraud dataset. The primary goal was to clean the dataset, perform exploratory data analysis (EDA), and visualize key insights related to fraudulent transactions.

Data Overview The dataset consists of transaction records, where each transaction is labeled as either fraudulent or non-fraudulent. The key columns include:

Amount: The transaction amount. IsFraud: A binary indicator of whether the transaction is fraudulent (1) or not (0).

Data Cleaning Process Loading the Dataset: The dataset was loaded into a Pandas DataFrame. Missing Values Check: The analysis identified any missing values across columns. print("\nMissing values in each column:") print(df.isnull().sum())

Removing Duplicates: Duplicate entries were removed to ensure data integrity. df = df.drop_duplicates()

Forward Filling Missing Values: Any remaining missing values were filled using the forward fill method. df.ffill(inplace=True)

Data Manipulation Filtering Fraudulent Transactions: The DataFrame was filtered to extract only fraudulent transactions. fraudulent_df = df[df['IsFraud'] == 1]

Statistical Analysis Dataset Shape: The shape of the dataset was examined, indicating the number of transactions and features. python Copy code print("Shape of the dataset:", df.shape) Data Types: The data types for each column were reviewed to ensure they were appropriate for analysis. python Copy code print("\nData types:") print(df.dtypes) Summary Statistics: Descriptive statistics provided insights into the distribution of transaction amounts and other numerical features. print("\nSummary statistics:") print(df.describe())

Data Visualization Distribution of Transaction Amounts: A histogram with a Kernel Density Estimate (KDE) was created to visualize the distribution of transaction amounts. plt.figure(figsize=(10, 6)) sns.histplot(df['Amount'], bins=30, kde=True, color="purple") plt.title('Distribution of Transaction Amounts', color="green") plt.xlabel('Transaction Amount', color="green") plt.ylabel('Frequency', color="green") plt.show() Interpretation: The histogram shows that most transactions fall below a certain amount, with a few high-value transactions indicating potential outliers.

Count of Fraudulent vs Non-Fraudulent Transactions: A count plot was used to compare the number of fraudulent and non-fraudulent transactions. plt.figure(figsize=(8, 5)) ax = sns.countplot(x='IsFraud', data=df, palette=["#FF9999", "#66B3FF"]) plt.title('Count of Fraudulent vs Non-Fraudulent Transactions', color="blue") plt.xlabel('Is Fraud', color="blue") plt.ylabel('Count', color="blue") for p in ax.patches: ax.annotate(f'{int(p.get_height())}', (p.get_x() + p.get_width() / 2., p.get_height()), ha='center', va='baseline', color='black', fontsize=12) plt.xticks(ticks=[0, 1], labels=['Non-Fraudulent', 'Fraudulent']) plt.show() Interpretation: The count plot reveals a significant imbalance between fraudulent and non-fraudulent transactions, with a considerably higher count of non-fraudulent transactions.

Conclusion The analysis provided valuable insights into the dataset, highlighting the characteristics of fraudulent transactions. The data cleaning process ensured data quality, and the exploratory data analysis revealed patterns that can inform further predictive modeling efforts.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages