Classifying Banknotes with Machine Learning Algorithms

Introduction
Banknotes are among the most important assets of a country. Miscreants introduce counterfeit notes that closely resemble genuine ones, creating discrepancies in the financial market.
Counterfeits are produced with high precision, so there is a need for an efficient algorithm that accurately predicts whether a banknote is genuine or forged.
In this project, I apply several machine learning models to distinguish genuine banknotes from counterfeits.
For this project, I used the UCI Machine Learning Repository's Banknote Authentication dataset to build various classification models.
So let’s build the models!
Import the necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
from sklearn.metrics import accuracy_score
from sklearn.metrics import cohen_kappa_score
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import StackingClassifier
Read the Data
df = pd.read_csv('BankNote_Authentication.csv')
df.head()

Shape of the Data and Data Types
df.shape

The dataset contains five variables:
Variance of Wavelet Transformed image (continuous).
Skewness of Wavelet Transformed image (continuous).
Kurtosis of Wavelet Transformed image (continuous).
Entropy of image (continuous).
Class (integer; 0 = genuine, 1 = forged, as labeled in the pie chart below).
df.dtypes

Check for Missing Values
Total = df.isnull().sum().sort_values(ascending=False)
Percent = (df.isnull().sum()*100/df.isnull().count()).sort_values(ascending=False)
missing_data = pd.concat([Total, Percent], axis=1, keys=['Total', 'Percentage of Missing Values'])
missing_data

There are no missing values in the dataset.
Check for Outliers in the Dataset
fig, ax = plt.subplots(1, 4, figsize=(15, 5))
for variable, subplot in zip(df.columns, ax.flatten()):
    z = sns.boxplot(x=df[variable], orient='h', whis=1.5, ax=subplot)
    z.set_xlabel(variable, fontsize=15)

There are a few outliers in the kurtosis (the curtosis column) and entropy attributes.
Pie chart count of Genuine vs Forged Notes
df_target = df['class'].copy()
df_target.value_counts()
plt.figure(figsize=(10, 10))
plt.pie(df_target.value_counts(), startangle=90, labels=["GENUINE", "FORGED"], autopct='%.2f%%', colors=['cornflowerblue', 'lightcoral'])
plt.title("Class Distribution")
plt.show()

Split the Data into Features and Target Variable
X = df.drop('class', axis = 1)
y = df['class']
Standardize the Data
Standardization brings all the attributes onto the same scale before model building.
ss = StandardScaler()
X = ss.fit_transform(X)
Split the data into Train and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, test_size=0.3, random_state=10)
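Note that fitting the scaler on the full dataset before splitting lets the test rows influence the scaling statistics. A leak-free variant, shown here as a minimal sketch, splits the raw features first and fits the scaler on the training split only:
X_raw = df.drop('class', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X_raw, y, train_size=0.7, random_state=10)
ss = StandardScaler()
X_train = ss.fit_transform(X_train)  # learn mean/std from the training split only
X_test = ss.transform(X_test)        # reuse those statistics for the test split
The models below use the original split; the sketch is only to flag the leakage caveat.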
Let’s now build the models
1. Logistic Regression
import statsmodels.api as sm
logreg = sm.Logit(y_train, X_train).fit()
y_pred_prob = logreg.predict(X_test)
y_pred = [0 if x < 0.5 else 1 for x in y_pred_prob]
print(classification_report(y_test, y_pred))


2. Naive Bayes
gnb = GaussianNB()
gnb_model = gnb.fit(X_train, y_train)
y_pred = gnb_model.predict(X_test)
print(classification_report(y_test, y_pred))


3. K-Nearest Neighbors
knn_classification = KNeighborsClassifier(n_neighbors=3)
knn_model = knn_classification.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)
print(classification_report(y_test, y_pred))
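The choice of n_neighbors = 3 here is a guess; the GridSearchCV imported earlier can tune it. A minimal sketch (the candidate range of k values is my assumption, not from the original):
param_grid = {'n_neighbors': list(range(1, 16))}  # candidate k values (assumed range)
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)  # best k and its mean CV accuracy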


4. Decision Trees
from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier(random_state=1)
dt_model.fit(X_train, y_train)
y_pred = dt_model.predict(X_test)
print(classification_report(y_test, y_pred))


5. Random Forest
rf_model = RandomForestClassifier(random_state=1)
rf_model.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)
print(classification_report(y_test, y_pred))


6. AdaBoost
ada_model = AdaBoostClassifier(random_state=1)
ada_model.fit(X_train, y_train)
y_pred = ada_model.predict(X_test)
print(classification_report(y_test, y_pred))
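With the sklearn models fitted, a short loop can compare them on the test set using the metrics imported at the top (a sketch assuming the model variables above; the statsmodels logistic model is left out because its predict already returns probabilities):
models = {'Naive Bayes': gnb_model, 'KNN': knn_model, 'Decision Tree': dt_model,
          'Random Forest': rf_model, 'AdaBoost': ada_model}
for name, model in models.items():
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of the forged class
    print(f'{name}: accuracy = {accuracy_score(y_test, y_pred):.3f}, '
          f'ROC AUC = {roc_auc_score(y_test, y_prob):.3f}')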


Except for the Naive Bayes model, all models reach accuracies above 95% on the test set. Such uniformly high scores raise the question of overfitting.
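One way to check is k-fold cross-validation with the KFold and cross_val_score helpers imported at the top; here is a minimal sketch for the random forest (the fold count is an assumption):
kfold = KFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(RandomForestClassifier(random_state=1), X, y, cv=kfold)
print(scores.mean(), scores.std())  # consistently high, low-variance scores argue against heavy overfitting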
Thanks for reading!
Connect with me on Linkedin: https://www.linkedin.com/in/harshavardhans198/