🛒
Machine Learning · Python · Data Mining

Predicting Impulsive Online Shoppers

Can we predict which online shoppers are likely to make impulsive purchases, and what actually drives them? We analyzed three real-world datasets across behavioral, psychological, and platform dimensions to find out.

Course: MISM3515 Data Mining
Team: Amber, Member 2, Member 3
Models: KNN · Logistic Regression · Decision Tree · Random Forest
Datasets: 3 real-world datasets

Why does any of this matter?

Impulse buying drives a significant portion of e-commerce revenue, but it is notoriously hard to predict and even harder to influence strategically. Companies spend millions on promotions hoping to trigger impulsive purchases, often without understanding who actually buys impulsively or why.

We approached this as a real business problem: if we could build a model that reliably predicts impulsive buyers, companies could personalize promotions, optimize conversion rates, and design better digital experiences, all grounded in data rather than guesswork.

Three datasets, three angles.

Rather than relying on a single dataset, we analyzed impulsive buying from three distinct perspectives (behavioral, psychological, and platform-driven) to build a more complete picture.

🛍️

Consumer Shopping Behavior

~3,900 records of purchase history, demographics, and discount usage. Used to predict behavioral impulsivity from observable actions.

Behavioral
🧠

Mendeley E-Paylater Survey

306 respondents answered Likert-scale questions on self-control, happiness, social influence, and promotion sensitivity.

Psychological
📱

Vietnamese TikTok Shopping

Survey data on platform-driven impulsive buying behavior among TikTok users, covering scarcity, trust, and hedonic motivation.

Platform-Driven

No explicit labels? We built our own.

None of the datasets came with a pre-labeled "impulsive buyer" column. A core part of this project was engineering meaningful target variables from scratch, which required thinking carefully about what impulsive behavior actually looks like in data.

01

Engineer the Target Variable

For the behavioral dataset, a shopper was labeled impulsive if they shopped at least monthly (or had an above-average number of prior purchases) AND used a discount or promo code, capturing both frequency and incentive-triggered buying. For the survey datasets, we computed each respondent's mean Likert score and applied a threshold of 3.5.
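The two labeling rules above can be sketched in pandas. This is a minimal illustration, not the project's actual code: the column names, the sample rows, the choice of `>=` at the 3.5 cutoff, and the set of Likert items being averaged are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical column names and toy rows; the real dataset's schema may differ.
shoppers = pd.DataFrame({
    "purchases_per_month": [4.0, 1.0, 0.25, 0.33, 1.0],
    "previous_purchases": [10, 40, 45, 5, 12],
    "discount_applied": ["Yes", "No", "Yes", "No", "No"],
    "promo_code_used": ["Yes", "No", "No", "No", "Yes"],
})

def label_behavioral(df: pd.DataFrame) -> pd.Series:
    """1 if (monthly-or-more OR above-average prior purchases) AND (discount OR promo)."""
    frequent = df["purchases_per_month"] >= 1.0
    heavy = df["previous_purchases"] > df["previous_purchases"].mean()
    incentive = (df["discount_applied"] == "Yes") | (df["promo_code_used"] == "Yes")
    return ((frequent | heavy) & incentive).astype(int)

def label_survey(df: pd.DataFrame, items: list, threshold: float = 3.5) -> pd.Series:
    """1 if the respondent's mean Likert score meets the threshold (>= is an assumption)."""
    return (df[items].mean(axis=1) >= threshold).astype(int)

shoppers["impulsive"] = label_behavioral(shoppers)

# Hypothetical Likert items q1..q3 for three respondents.
survey = pd.DataFrame({"q1": [5, 3, 4], "q2": [4, 3, 3], "q3": [4, 2, 4]})
survey["impulsive"] = label_survey(survey, ["q1", "q2", "q3"])

print(shoppers["impulsive"].tolist())
print(survey["impulsive"].tolist())
```

Because the behavioral label is derived from frequency, prior purchases, and discount or promo usage, any of those columns left in the feature set would let a model reconstruct the label directly, which is why leakage handling matters in the next step.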

02

Clean & Prepare the Data

Converted categorical frequency strings to numeric monthly equivalents. Aggregated Likert items into psychological constructs to reduce multicollinearity. One-hot encoded categorical variables. Dropped columns that would leak the target into the features.
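A rough sketch of these preparation steps in pandas follows. The frequency categories, their monthly equivalents, the item-to-construct grouping, and every column name are hypothetical stand-ins, since the write-up does not list them.

```python
import pandas as pd

# Assumed mapping from frequency strings to purchases per month.
FREQ_TO_MONTHLY = {
    "Weekly": 4.0,
    "Fortnightly": 2.0,
    "Monthly": 1.0,
    "Quarterly": 1 / 3,
    "Annually": 1 / 12,
}

# Toy behavioral rows (hypothetical columns).
behavioral_raw = pd.DataFrame({
    "frequency": ["Weekly", "Monthly", "Annually"],
    "category": ["Clothing", "Footwear", "Clothing"],
})

behavioral = behavioral_raw.copy()
# 1) Frequency strings -> numeric monthly equivalents.
behavioral["purchases_per_month"] = behavioral["frequency"].map(FREQ_TO_MONTHLY)
behavioral = behavioral.drop(columns=["frequency"])
# 2) One-hot encode the remaining categorical column.
behavioral = pd.get_dummies(behavioral, columns=["category"])

# Toy survey rows: two items per construct (hypothetical grouping).
survey_raw = pd.DataFrame({
    "sc1": [2, 4, 3], "sc2": [1, 5, 3],   # self-control items
    "pr1": [5, 2, 4], "pr2": [4, 2, 5],   # promotion-sensitivity items
})
CONSTRUCTS = {
    "self_control": ["sc1", "sc2"],
    "promo_sensitivity": ["pr1", "pr2"],
}
# 3) Average correlated items into one score per construct.
survey = pd.DataFrame({
    name: survey_raw[items].mean(axis=1) for name, items in CONSTRUCTS.items()
})

print(behavioral)
print(survey)
```

Averaging items that measure the same construct replaces several highly correlated columns with one, which is the multicollinearity reduction the step describes.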

03

Train & Tune Four Models

Built KNN, Logistic Regression (full, forward-selection, and backward-selection variants), Decision Tree, and Random Forest classifiers. Used GridSearchCV with 5-fold cross-validation, optimizing for F1 score, which balances precision against recall; recall matters here because missing an impulsive buyer (a false negative) carries the highest business cost.
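The tuning setup can be illustrated with scikit-learn's `GridSearchCV`. The sketch below uses synthetic data from `make_classification` and an assumed hyperparameter grid for a decision tree; the grids and train/test split actually used in the project are not given in this write-up.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the labeled behavioral data (roughly 30% positives).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed grid; the project's real search space is unknown.
param_grid = {
    "max_depth": [2, 4, 6, None],
    "min_samples_leaf": [1, 5, 20],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="f1",   # select hyperparameters by F1 on the positive class
    cv=5,           # 5-fold cross-validation
)
search.fit(X_train, y_train)

# Score the refit best estimator on the held-out test split.
test_f1 = f1_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_f1, 2))
```

Swapping in `KNeighborsClassifier`, `LogisticRegression`, or `RandomForestClassifier` only requires changing the estimator and the grid; the `scoring` and `cv` arguments stay the same.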

04

Translate Results into Strategy

Evaluated not just model performance but also what the results tell us about impulsive buyer behavior, then connected those insights to concrete marketing and segmentation recommendations.

Behavioral data wins.

The behavioral dataset was by far the strongest performer (test F1 of 0.69 to 0.78, versus 0.46 to 0.55 on the Mendeley psychological dataset): actions leave visible traces that models can learn from. Psychological data proved much harder to model, largely because self-reported survey responses are noisy and impulsivity is often unconscious.

Behavioral Dataset Results

Model                            F1 (Test)   Accuracy   Precision   Recall
Logistic Regression (Forward)    0.75        0.83       0.76        0.74
Decision Tree (Tuned)            0.78        0.87       0.98        0.65
Random Forest (Tuned)            0.78        0.87       0.97        0.65
KNN (Tuned)                      0.69        0.82       0.82        0.59
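As a quick consistency check on the behavioral results, F1 can be recomputed from the reported precision and recall using F1 = 2PR / (P + R). The figures below are copied from the table; the helper name `f1_from` is made up for this example.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (reported F1, precision, recall) from the behavioral results table.
reported = {
    "Logistic Regression (Forward)": (0.75, 0.76, 0.74),
    "Decision Tree (Tuned)": (0.78, 0.98, 0.65),
    "Random Forest (Tuned)": (0.78, 0.97, 0.65),
    "KNN (Tuned)": (0.69, 0.82, 0.59),
}

for name, (f1_reported, p, r) in reported.items():
    print(f"{name}: reported {f1_reported:.2f}, recomputed {f1_from(p, r):.2f}")
```

Each recomputed value agrees with the reported F1 to two decimals. The breakdown also shows that the tree-based models reach 0.78 through very high precision (0.97 to 0.98) with recall of 0.65, while forward-selection logistic regression has the highest recall (0.74).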

Psychological Dataset Results (Mendeley)

Model                            F1 (Test)   Accuracy   Precision   Recall
Logistic Regression (Backward)   0.55        0.79       0.69        0.46
Decision Tree (Tuned)            0.52        0.76       0.57        0.47
KNN (Tuned)                      0.46        0.73       0.53        0.41

What we actually learned.

1

Behavior predicts. Psychology explains.

Behavioral signals (purchase frequency, subscription status, promo code usage) are reliably predictive. Psychological data tells you the motivation behind the behavior but is harder to classify cleanly.

2

Subscription status is the #1 signal.

Across the behavioral dataset, subscription status was the most important feature in Random Forest models. Repeat, subscribed customers are the most impulse-prone group by a significant margin.
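For readers unfamiliar with how such a ranking is read off a Random Forest, here is a minimal sketch using scikit-learn's impurity-based `feature_importances_`. It runs on synthetic data with deliberately generic feature names, so the ordering it prints says nothing about the project's real features; whether the project used this attribute or another method such as permutation importance is an assumption.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Generic placeholder names; not the project's actual columns.
feature_names = ["feature_a", "feature_b", "feature_c", "feature_d"]

X, y = make_classification(
    n_samples=300, n_features=4, n_informative=2,
    n_redundant=0, random_state=0,
)
X = pd.DataFrame(X, columns=feature_names)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across features.
importances = (
    pd.Series(forest.feature_importances_, index=feature_names)
    .sort_values(ascending=False)
)
print(importances)
```

Impurity-based importances can favor high-cardinality or continuous features, so a ranking like this is usually worth cross-checking before drawing business conclusions from it.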

3

Promotion sensitivity drives psychology.

In the Mendeley psychological dataset, promotion sensitivity was the strongest predictor. Low self-control and positive emotions (happiness) also consistently increased impulsive buying tendency.
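One common way to compare predictor strength in a logistic regression is to standardize the inputs and rank the absolute coefficients. The sketch below assumes that approach (the write-up does not say how predictor strength was measured) and uses synthetic construct scores with placeholder names, where `construct_a` is made the dominant driver by construction.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300

# Synthetic construct scores; names are placeholders, not survey constructs.
constructs = pd.DataFrame({
    "construct_a": rng.normal(size=n),
    "construct_b": rng.normal(size=n),
    "construct_c": rng.normal(size=n),
})

# Synthetic target: strong effect of construct_a, weak effect of construct_b,
# no effect of construct_c, plus logistic noise.
latent = 2.0 * constructs["construct_a"] + 0.5 * constructs["construct_b"]
y = (latent + rng.logistic(size=n) > 0).astype(int)

# Standardizing first puts the coefficients on a comparable scale.
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(constructs, y)

coefs = pd.Series(model[-1].coef_[0], index=constructs.columns)
ranked = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(ranked)
```

The sign of each coefficient gives the direction of the effect, which is how a finding like "low self-control increases impulsive buying tendency" would show up as a negative coefficient on a self-control score.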

4

Impulsivity exists on a spectrum.

Low recall across the psychological models (0.41 to 0.47) suggests impulsivity isn't a clean binary: many buyers are only impulsive under specific emotional or contextual conditions, making them harder to classify from static data.

Two distinct impulsive buyer profiles emerged consistently across models:

Predictable Impulsives

  • Frequent buyers with consistent purchase history
  • Highly responsive to promotions and discounts
  • Subscription-heavy: already engaged
  • Easy to identify and target proactively

Situational Impulsives

  • Irregular buyers with no clear behavioral history
  • Triggered by mood, emotion, or context
  • Platform and social influence dependent
  • Timing-dependent: harder to predict in advance

"Behavior tells you who to target.
Psychology tells you how to influence them."

Built with

Python · pandas · scikit-learn · matplotlib · seaborn · KNN · Logistic Regression · Decision Tree · Random Forest · GridSearchCV · Feature Engineering · Data Cleaning · Business Strategy