PREDICTING TAX EVASION USING MACHINE LEARNING: A STUDY OF E-COMMERCE TRANSACTIONS

Authors

  • Abdul Jaleel Mahesar
  • Ayaz Ali Wighio
  • Najma Imtiaz
  • Aadil Jamali
  • Yasir Nawaz
  • Uswa Urooj

Keywords:

Tax Evasion Detection, E-commerce Transactions Analysis, Anomaly Detection, Artificial Intelligence, Explainable AI

Abstract

This paper presents a novel machine learning framework for detecting tax evasion in e-commerce, targeting the rise of underreported sales and cross-border VAT fraud that has caused multibillion-dollar revenue losses worldwide. Because labeled e-commerce tax-evasion datasets are lacking, direct supervised learning on real data is not possible, so synthetic-data augmentation is adopted to simulate realistic transaction scenarios. Realistic attributes were generated with the Python Faker library, and the Synthetic Data Vault (SDV) was used to preserve statistical fidelity and embed custom evasion patterns. Feature engineering produced the model's key indicators: declared_vs_actual_ratio, transaction_velocity, and tax_haven_flag, aimed at detecting underreporting, excessive micro-transactions, and geographical mismatches, respectively. Supervised classifiers such as XGBoost and LightGBM were trained alongside unsupervised detectors, namely Isolation Forest and deep autoencoders, which flag anomalies without explicit labels. A hybrid stacking ensemble merged the probability estimates and anomaly scores from the individual approaches to improve robustness over any single approach. The study evaluates performance using a stratified 70/15/15 split, 5-fold cross-validation, and precision, recall, F1 score, and ROC AUC metrics; the hybrid model achieves an AUC of 0.885 and an F1 score of 0.830 on the full feature set, surpassing the standalone models. SHAP and LIME were used to provide interpretability through feature-level explanations of flagged transactions. This end-to-end pipeline offers a scalable and interpretable solution for e-commerce tax evasion detection and provides a basis for real-world deployment and for future studies using real transaction data.
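The three engineered indicators named in the abstract could be computed along the following lines. This is a minimal sketch, not the authors' implementation: the abstract does not give the feature definitions, so the `Transaction` fields, the one-hour velocity window, and the `TAX_HAVENS` country list are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical set of jurisdiction codes treated as tax havens (assumption).
TAX_HAVENS = {"KY", "BM", "VG"}


@dataclass
class Transaction:
    # All field names are illustrative assumptions, not taken from the paper.
    seller_id: str
    timestamp: datetime
    declared_amount: float
    actual_amount: float
    country_code: str


def declared_vs_actual_ratio(tx: Transaction) -> float:
    """Declared amount divided by actual amount; values well below 1 suggest underreporting."""
    if tx.actual_amount <= 0:
        return 1.0  # assumed neutral value when no actual amount is available
    return tx.declared_amount / tx.actual_amount


def transaction_velocity(tx, history, window=timedelta(hours=1)) -> int:
    """Count the seller's transactions in the window ending at tx (inclusive of tx itself)."""
    start = tx.timestamp - window
    return sum(
        1
        for other in history
        if other.seller_id == tx.seller_id and start < other.timestamp <= tx.timestamp
    )


def tax_haven_flag(tx: Transaction) -> int:
    """1 if the transaction's country code is in the assumed tax-haven set, else 0."""
    return int(tx.country_code in TAX_HAVENS)


def engineer_features(history):
    """Return one feature dictionary per transaction, in input order."""
    return [
        {
            "declared_vs_actual_ratio": declared_vs_actual_ratio(tx),
            "transaction_velocity": transaction_velocity(tx, history),
            "tax_haven_flag": tax_haven_flag(tx),
        }
        for tx in history
    ]
```

Feature rows of this shape could then be passed to the supervised and unsupervised models the abstract lists; the window length and haven list would need to be tuned or replaced with whatever definitions the full paper uses.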

Published

2025-04-26

How to Cite

Abdul Jaleel Mahesar, Ayaz Ali Wighio, Najma Imtiaz, Aadil Jamali, Yasir Nawaz, & Uswa Urooj. (2025). PREDICTING TAX EVASION USING MACHINE LEARNING: A STUDY OF E-COMMERCE TRANSACTIONS. Spectrum of Engineering Sciences, 3(4), 840–852. Retrieved from https://sesjournal.com/index.php/1/article/view/311