Market Basket Analysis in Python using Apriori algorithm to Recommend Movies

Harsha S
4 min readJul 12, 2021

Market Basket Analysis is a method applied with the aid of transactional data to recognize patron’s shopping behavior from their shops. The result of the powerful evaluation may improve the dealer’s profitability, excellent of service, and consumer pride.

If it is regarded that clients who buy one product are probably to buy another product, it is viable for shops to marketplace those products together, or to make the purchasers of a target possibilities for the second one product. If clients who purchase diapers are in all likelihood to purchase beer, they’ll be much more likely to if beer is displayed simply beside a diaper aisle.

Real world examples of Market Basket Analysis:

Retail:

  • What items are purchased together, purchased sequentially, and purchased by season?
  • Determine product placement and promotion optimization (for instance, combining product incentives).

Telecommunications:

  • Determine what services are being utilized and what packages customers are purchasing.
  • Use this knowledge to direct marketing efforts at customers who are more likely to follow the same path.

Banking/ Insurance:

  • Analyze credit card purchases of customers to build profiles for fraud detection purposes and cross-selling opportunities.
  • Build profiles to detect insurance claim fraud.

Medical:

  • Comorbid conditions and symptom analysis, with which a profile of illness can be better identified.
  • Reveal biologically relevant associations between different genes.

Metrics of Market Basket Analysis:

Support:

Percentage of orders that contain the item set. Suppose, there are 11 orders in total, and {bread, butter} occurs in 3 of them.

Support = Freq(X,Y)/N

Support = 3/11 = 0.27

Confidence:

Given two items, X and Y, confidence measures the percentage of times that item Y is purchased, given that item X was purchased. This is expressed as:

Confidence = Freq(X,Y)/Freq(X)

Confidence values range from 0 to 1, where 0 indicates that Y is never purchased when X is purchased, and 1 indicates that Y is always purchased whenever X is purchased.

Lift:

Lift is used to measure how much more often the rule X->Y occur together than we would expect if they were statistically independent. If A and C are independent, the Lift score will be exactly 1.

In summary, lift can take the following values:

  • Lift = 1; implies no relationship between X and Y (i.e., X and Y occur together only by chance)
  • Lift > 1; implies that there is a positive relationship between X and Y (i.e., X and Y occur together more often than random)
  • Lift < 1; implies that there is a negative relationship between X and Y (i.e., X and Y occur together less often than random)

Implementing the Market Based Analysis in Python:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
movies = pd.read_csv('movies.csv')
movies.head()
ratings = pd.read_csv('ratings.csv')
ratings.head()
movie_names = movies[['movieId', 'title']]
combined_movies_data = pd.merge(ratings, movie_names, on='movieId')
combined_movies_data = combined_movies_data[['userId','title']]
combined_movies_data.head()
onehot = combined_movies_data.pivot_table(index=’userId’, columns=’title’, aggfunc=len, fill_value=0)onehot = onehot.fillna(0)
onehot[onehot>1]=1
onehot.head()
onehot.shape
from mlxtend.frequent_patterns import association_rules, apriorifrequent_itemsets = apriori(onehot, min_support = 0.15, max_len = 2, use_colnames=True)rules = association_rules(frequent_itemsets)
rules
rules.sort_values(by = 'lift' ,ascending = False).head(10)

Inference:

By performing the association analysis using the apriori algorithm on the movies dataset, we were able to find out the movies preferred by certain users.

As you can make out from the last output,

  • Users who watches The Lord of the Rings: The Return of the King is 3.18 times more likely to watch The Lord of the Rings: The Two Towers, since the lift is high for these two movies.
  • Similarly, a user who watches Monsters is 3.14 times more likely to watch Shrek, as the lift for the two movies is equal to 3.14.

Thanks for your time!

Connect with me on Linkedin: https://www.linkedin.com/in/harshavardhans198/

Sign up to discover human stories that deepen your understanding of the world.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Harsha S
Harsha S

Written by Harsha S

NLP Engineer | I love to write about AI in beginner way

No responses yet

Write a response