Can we predict which online shoppers are likely to make impulsive purchases, and what actually drives them? We analyzed three real-world datasets across behavioral, psychological, and platform dimensions to find out.
Impulse buying drives a significant portion of e-commerce revenue, but it's notoriously hard to predict and even harder to influence strategically. Companies spend millions on promotions hoping to trigger impulsive purchases, often without understanding who actually buys impulsively or why.
We approached this as a real business problem: if we could build a model that reliably predicts impulsive buyers, companies could personalize promotions, optimize conversion rates, and design better digital experiences, all grounded in data rather than guesswork.
Rather than relying on a single dataset, we analyzed impulsive buying from three distinct perspectives (behavioral, psychological, and platform-driven) to build a more complete picture.
Behavioral: ~3,900 records of purchase history, demographics, and discount usage. Used to predict behavioral impulsivity from observable actions.
Psychological: 306 respondents answered Likert-scale questions on self-control, happiness, social influence, and promotion sensitivity.
Platform-Driven: Survey data on platform-driven impulsive buying behavior among TikTok users, covering scarcity, trust, and hedonic motivation.
None of the datasets came with a pre-labeled "impulsive buyer" column. A core part of this project was engineering meaningful target variables from scratch, which required thinking carefully about what impulsive behavior actually looks like in data.
For the behavioral dataset, a shopper was labeled impulsive if they shopped at least monthly or had an above-average number of prior purchases, and also used a discount or promo code. This captures both frequency and incentive-triggered buying. For the survey datasets, we computed each respondent's mean Likert score and applied a 3.5 threshold.
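The two labeling rules above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`purchases_per_month`, `previous_purchases`, `discount_applied`, `promo_code_used`, `q1` to `q3`) are hypothetical, and treating a mean score of exactly 3.5 as impulsive is an assumption, since the write-up only says a 3.5 threshold was applied.

```python
import pandas as pd


def label_behavioral(df: pd.DataFrame) -> pd.Series:
    """1 if (shops at least monthly OR above-average prior purchases)
    AND (used a discount OR a promo code), else 0."""
    frequent = (df["purchases_per_month"] >= 1) | (
        df["previous_purchases"] > df["previous_purchases"].mean()
    )
    incentivized = df["discount_applied"] | df["promo_code_used"]
    return (frequent & incentivized).astype(int)


def label_survey(df: pd.DataFrame, likert_cols, threshold: float = 3.5) -> pd.Series:
    """1 if the respondent's mean Likert score reaches the threshold.
    Using >= (rather than >) is an assumption."""
    return (df[likert_cols].mean(axis=1) >= threshold).astype(int)


# Toy data with hypothetical column names, for illustration only.
shoppers = pd.DataFrame({
    "purchases_per_month": [4.0, 0.25, 1.0, 0.08],
    "previous_purchases": [10, 40, 5, 3],
    "discount_applied": [True, True, False, False],
    "promo_code_used": [False, False, False, True],
})
survey = pd.DataFrame({"q1": [4, 2, 3], "q2": [5, 3, 4], "q3": [3, 2, 4]})

behavioral_labels = label_behavioral(shoppers)
survey_labels = label_survey(survey, ["q1", "q2", "q3"])
```

In the toy data, the last shopper used a promo code but is neither frequent nor above average on prior purchases, so the rule does not flag them: both conditions have to hold.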
Converted categorical frequency strings to numeric monthly equivalents. Aggregated Likert items into psychological constructs to reduce multicollinearity. One-hot encoded categorical variables. Dropped columns that would leak the target into the features.
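A rough sketch of these preprocessing steps in pandas is shown below. The frequency categories and their monthly equivalents, the column names, the construct groupings, and the choice of which columns count as leakage are all assumptions made for the example; the real datasets may differ.

```python
import pandas as pd

# Assumed mapping from frequency strings to purchases per month.
FREQ_TO_MONTHLY = {
    "Weekly": 4.0,
    "Fortnightly": 2.0,
    "Monthly": 1.0,
    "Quarterly": 1 / 3,
    "Annually": 1 / 12,
}


def preprocess(df: pd.DataFrame, categorical_cols, leakage_cols) -> pd.DataFrame:
    """Numeric frequency, leakage columns dropped, categoricals one-hot encoded."""
    out = df.copy()
    out["purchases_per_month"] = out["frequency"].map(FREQ_TO_MONTHLY)
    out = out.drop(columns=["frequency", *leakage_cols])
    return pd.get_dummies(out, columns=list(categorical_cols))


def aggregate_constructs(df: pd.DataFrame, constructs: dict) -> pd.DataFrame:
    """Average the Likert items that belong to each construct."""
    return pd.DataFrame(
        {name: df[cols].mean(axis=1) for name, cols in constructs.items()}
    )


# Toy inputs with hypothetical column names.
raw = pd.DataFrame({
    "frequency": ["Weekly", "Monthly", "Annually"],
    "gender": ["F", "M", "F"],
    "age": [31, 45, 27],
    "promo_code_used": [True, False, True],
})
features = preprocess(raw, categorical_cols=["gender"], leakage_cols=["promo_code_used"])

items = pd.DataFrame({"sc1": [1, 4], "sc2": [3, 4], "promo1": [5, 2], "promo2": [4, 2]})
constructs_df = aggregate_constructs(
    items,
    {"self_control": ["sc1", "sc2"], "promotion_sensitivity": ["promo1", "promo2"]},
)
```

Averaging correlated items into one score per construct gives the model a single feature where it would otherwise see several near-duplicates, which is the multicollinearity concern mentioned above.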
Built KNN, Logistic Regression (full, forward, backward selection), Decision Tree, and Random Forest classifiers. Used GridSearchCV with 5-fold cross-validation, optimizing for F1 score, because missing an impulsive buyer (false negative) carries the highest business cost.
Evaluated not just model performance, but what the results tell us about impulsive buyer behavior, then connected those insights to concrete marketing and segmentation recommendations.
The behavioral dataset was by far the strongest performer: actions leave visible traces that models can learn from. Psychological data proved much harder to model, largely because self-reported survey responses are noisy and impulsivity is often unconscious.
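For reference, a tuning setup of this kind might look like the sketch below, using scikit-learn's `GridSearchCV` with `scoring="f1"` and `cv=5`. The data is synthetic (`make_classification`) and the parameter grid is an assumption; neither reflects the project's actual features or search space.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the engineered features and impulsive/non-impulsive labels.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.7, 0.3], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grid; the values actually searched in the project are not known.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 6, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",  # select hyperparameters by F1 on the positive class
    cv=5,          # 5-fold cross-validation on the training split
)
search.fit(X_train, y_train)

# Report F1 on the held-out test split using the refit best estimator.
test_f1 = f1_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_f1, 2))
```

The same pattern applies to the other classifiers by swapping the estimator and its grid. Keeping the test split outside the search means the reported test F1 is not influenced by hyperparameter selection.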
Behavioral Dataset Results
| Model | F1 (Test) | Accuracy | Precision | Recall |
|---|---|---|---|---|
| Logistic Regression (Forward) | 0.75 | 0.83 | 0.76 | 0.74 |
| Decision Tree (Tuned) | 0.78 | 0.87 | 0.98 | 0.65 |
| Random Forest (Tuned) | 0.78 | 0.87 | 0.97 | 0.65 |
| KNN (Tuned) | 0.69 | 0.82 | 0.82 | 0.59 |
Psychological Dataset Results (Mendeley)
| Model | F1 (Test) | Accuracy | Precision | Recall |
|---|---|---|---|---|
| Logistic Regression (Backward) | 0.55 | 0.79 | 0.69 | 0.46 |
| Decision Tree (Tuned) | 0.52 | 0.76 | 0.57 | 0.47 |
| KNN (Tuned) | 0.46 | 0.73 | 0.53 | 0.41 |
Behavioral signals (purchase frequency, subscription status, promo code usage) are reliably predictive. Psychological data tells you the motivation behind the behavior but is harder to classify cleanly.
Across the behavioral dataset, subscription status was the most important feature in Random Forest models. Repeat, subscribed customers are the most impulse-prone group by a significant margin.
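Feature rankings like this are typically read from a fitted forest's `feature_importances_` attribute. The sketch below shows the mechanics on synthetic data that was deliberately constructed so that a hypothetical `is_subscribed` column drives the label. It illustrates how to read the ranking and does not reproduce the project's result.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic features with hypothetical names.
X = pd.DataFrame({
    "is_subscribed": rng.integers(0, 2, n),
    "age": rng.integers(18, 70, n),
    "purchases_per_month": rng.uniform(0, 4, n),
})

# Synthetic label: equals is_subscribed, with about 5% of labels flipped as noise.
flip = rng.random(n) < 0.05
y = np.where(flip, 1 - X["is_subscribed"], X["is_subscribed"])

model = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0)
model.fit(X, y)

# Impurity-based importances; they sum to 1 across features.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can overstate high-cardinality numeric features, so cross-checking a ranking with `sklearn.inspection.permutation_importance` is a reasonable sanity check.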
In the Mendeley psychological dataset, promotion sensitivity was the strongest predictor. Low self-control and positive emotions (happiness) also consistently increased impulsive buying tendency.
Low recall across psychological models suggests impulsivity isn't a clean binary: many buyers are only impulsive under specific emotional or contextual conditions, making them harder to classify from static data.
Two distinct impulsive buyer profiles emerged consistently across models:
"Behavior tells you who to target.
Psychology tells you how to influence them."