Genuine and Counterfeit Bank Note Detection using Machine Learning

Harsha S
Jun 23, 2021 · 4 min read


Classifying banknotes with machine learning algorithms

Introduction

Banknotes are one of the most important assets of a country. Miscreants introduce fake notes that closely resemble genuine ones, creating discrepancies in the financial market.

Fake notes are produced with high precision, so there is a need for an efficient algorithm that accurately predicts whether a banknote is genuine or not.

Therefore, I decided to apply some machine learning models to distinguish genuine banknotes from counterfeit ones.

For this project, I used the UCI Banknote Authentication dataset to build various classification models.

So let’s now build the Model!

Import the necessary libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.preprocessing import StandardScaler

from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV


from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier


from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
from sklearn.metrics import accuracy_score
from sklearn.metrics import cohen_kappa_score


from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import StackingClassifier

Read the Data

df = pd.read_csv('BankNote_Authentication.csv')
df.head()

Shape of the Data and Data Types

df.shape

For the standard UCI version of this dataset, the shape should come out to 1,372 rows and 5 columns. Below is a list of the five variables in the dataset:

Variance of Wavelet Transformed image (continuous).

Skewness of Wavelet Transformed image (continuous).

Kurtosis of Wavelet Transformed image (continuous).

Entropy of image (continuous).

Class (integer target; 0 is treated as genuine and 1 as forged throughout this post).

df.dtypes

Check for Missing Values

Total = df.isnull().sum().sort_values(ascending=False)
Percent = (df.isnull().sum()*100/df.isnull().count()).sort_values(ascending=False)
missing_data = pd.concat([Total, Percent], axis=1, keys=['Total', 'Percentage of Missing Values'])

missing_data

There are no missing values in the dataset.

Check for Outliers in the Dataset

fig, ax = plt.subplots(1, 4, figsize=(15, 5))
for variable, subplot in zip(df.columns, ax.flatten()):
    z = sns.boxplot(x=df[variable], orient='h', whis=1.5, ax=subplot)
    z.set_xlabel(variable, fontsize=15)

There are a few outliers in the kurtosis and entropy attributes.

Pie chart count of Genuine vs Forged Notes

df_target = df['class'].copy()
df_target.value_counts()
plt.figure(figsize=(10, 10))
plt.pie(df_target.value_counts(), startangle=90, labels=["GENUINE", "FORGED"], autopct='%.2f%%', colors=['cornflowerblue', 'lightcoral'])
plt.title("Class Distribution")
plt.show()
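In the standard UCI data the counts are roughly 762 genuine versus 610 forged notes, so the classes are fairly balanced and plain accuracy is a reasonable headline metric.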

Split the Data into Features and Target Variable

X = df.drop('class', axis = 1)
y = df['class']

Standardize the Data

Standardization brings all the attributes onto the same scale before model building, which matters for scale-sensitive models such as logistic regression and k-nearest neighbors.

from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X = ss.fit_transform(X)

Split the data into Train and Test set

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, test_size = 0.3,random_state = 10)
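A side note: the scaler above was fitted on the full dataset before splitting, so statistics from the test rows influence the transformation. A common variant, shown only as a sketch here rather than the flow used in the rest of this post, splits first and fits the scaler on the training portion only:

# Sketch: split the raw (unscaled) features first, then fit the scaler on the
# training split only and reuse its statistics for the test split.
X_raw = df.drop('class', axis=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y, train_size=0.7, random_state=10)

scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)   # learn mean/std from the training data only
X_te = scaler.transform(X_te)       # apply the same transformation to the test data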

Let’s now build the models

1. Logistic Regression

import statsmodels.api as sm

# Note: statsmodels' Logit does not add an intercept automatically;
# sm.add_constant(X_train) would include one if desired.
logreg = sm.Logit(y_train, X_train).fit()
y_pred_prob = logreg.predict(X_test)
y_pred = [0 if x < 0.5 else 1 for x in y_pred_prob]
print(classification_report(y_test, y_pred))

2. Naive Bayes

from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()
gnb_model = gnb.fit(X_train, y_train)
y_pred = gnb_model.predict(X_test)
print(classification_report(y_test, y_pred))

3. K-Nearest Neighbors

from sklearn.neighbors import KNeighborsClassifier

knn_classification = KNeighborsClassifier(n_neighbors=3)
knn_model = knn_classification.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)
print(classification_report(y_test, y_pred))
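GridSearchCV was imported at the top but never used; as a small sketch (the search range below is an assumption, not from the original post), it could be used to choose n_neighbors instead of fixing it at 3:

param_grid = {'n_neighbors': list(range(1, 16))}   # assumed search range
knn_grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='accuracy')
knn_grid.fit(X_train, y_train)
print(knn_grid.best_params_, knn_grid.best_score_)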

4. Decision Trees

from sklearn.tree import DecisionTreeClassifier

dt_model = DecisionTreeClassifier(random_state=1)
dt_model.fit(X_train, y_train)
y_pred = dt_model.predict(X_test)
print(classification_report(y_test, y_pred))

5. Random Forest

from sklearn.ensemble import RandomForestClassifier

rf_model = RandomForestClassifier(random_state=1)
rf_model.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)
print(classification_report(y_test, y_pred))

6. AdaBoost

from sklearn.ensemble import AdaBoostClassifier

ada_model = AdaBoostClassifier(random_state=1)
ada_model.fit(X_train, y_train)
y_pred = ada_model.predict(X_test)
print(classification_report(y_test, y_pred))
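Several metrics imported at the top (confusion_matrix, cohen_kappa_score, roc_auc_score, roc_curve) were not used above. As a sketch of a fuller evaluation, taking the random forest model as an example:

# Sketch: evaluate one model (random forest) with the metrics imported earlier.
y_pred = rf_model.predict(X_test)
y_prob = rf_model.predict_proba(X_test)[:, 1]   # predicted probability of class 1

print(confusion_matrix(y_test, y_pred))
print("Cohen's kappa:", cohen_kappa_score(y_test, y_pred))
print("ROC AUC:", roc_auc_score(y_test, y_prob))

fpr, tpr, _ = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve - Random Forest')
plt.show()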

Except for the Naive Bayes model, all the models reach accuracies above 95% on the test set. Scores this high may partly reflect how well the wavelet features separate the two classes, but they could also hint at overfitting, which is worth checking with cross-validation (see the sketch below).
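cross_val_score and KFold were imported at the top but not used; here is a sketch of a 5-fold cross-validation check on the training data:

# Sketch: 5-fold cross-validation accuracy for several of the models above.
kfold = KFold(n_splits=5, shuffle=True, random_state=10)
models = {'Naive Bayes': GaussianNB(),
          'KNN (k=3)': KNeighborsClassifier(n_neighbors=3),
          'Random Forest': RandomForestClassifier(random_state=1),
          'AdaBoost': AdaBoostClassifier(random_state=1)}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=kfold, scoring='accuracy')
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")

If the cross-validated scores stay close to the single-split results, overfitting is less of a concern.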

Thanks for reading!

Connect with me on Linkedin: https://www.linkedin.com/in/harshavardhans198/
