
Working Papers

A Machine Learning Approach for Detecting Hidden Corruption Risks in Public Budget Utilization

Authors

Netnapit Rittisorn, Ekarat Rattagan

Graduate School

Graduate School of Applied Statistics, National Institute of Development Administration (NIDA), Thailand. Email: 6710432001@stu.nida.ac.th

Graduate School of Applied Statistics, National Institute of Development Administration (NIDA), Thailand. Email: ekarat@as.nida.ac.th

Abstract

Corruption in public procurement remains a critical challenge that undermines the integrity, efficiency, and transparency of public budget utilization. Traditional auditing mechanisms often struggle to uncover hidden corruption risks: risks embedded within complex, large-scale financial transactions that are not immediately visible through surface-level inspection. Such risks may manifest as subtle anomalies in disbursement patterns or as deviations from standard financial practices.

In the era of digital government, machine learning (ML) offers powerful tools for anomaly detection, for uncovering irregular spending patterns, and for data-driven oversight. This study applies an ML-based approach to identify hidden corruption risks by analyzing quarterly financial and procurement data from Thai government agencies. It employs an ensemble of four unsupervised anomaly detection models, namely Isolation Forest, AutoEncoder, Long Short-Term Memory (LSTM), and Long Short-Term Memory AutoEncoder (LSTM-AE), to leverage their complementary strengths. Besides deploying the models jointly, the study compares their individual performance to evaluate detection accuracy across the different algorithmic approaches. Together, the models aim to surface patterns that deviate significantly from the norm, revealing potential warning signs that manual inspection often misses.
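The abstract does not state how each model's output is turned into a yes/no flag. A minimal sketch, assuming every model yields a per-project anomaly score where higher means more anomalous (for example, reconstruction error for AutoEncoder and LSTM-AE, forecast error for LSTM, or a negated Isolation Forest score), and assuming a hypothetical contamination rate that flags the top share of projects for each model. The function names and the example scores are illustrative only, not taken from the study.

```python
import math
from typing import Dict, List


def flag_top_fraction(scores: List[float], contamination: float) -> List[bool]:
    """Flag the `contamination` share of projects with the highest scores.

    Assumption: higher score = more anomalous. Ties keep their original order
    because sorted() is stable, so earlier projects win ties at the cutoff.
    """
    if not 0.0 < contamination < 1.0:
        raise ValueError("contamination must be in (0, 1)")
    k = math.ceil(contamination * len(scores))
    # Project indices ordered from most to least anomalous.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:k])
    return [i in flagged for i in range(len(scores))]


def flag_all_models(
    model_scores: Dict[str, List[float]], contamination: float
) -> Dict[str, List[bool]]:
    """Apply the same (hypothetical) contamination rate to each model's scores."""
    return {name: flag_top_fraction(s, contamination) for name, s in model_scores.items()}


if __name__ == "__main__":
    # Hypothetical scores for five projects; not results from the study.
    example = {
        "IsolationForest": [0.10, 0.80, 0.20, 0.15, 0.90],
        "AutoEncoder": [0.02, 0.40, 0.35, 0.05, 0.60],
        "LSTM": [0.30, 0.25, 0.10, 0.20, 0.70],
        "LSTM-AE": [0.05, 0.70, 0.30, 0.10, 0.95],
    }
    for name, flags in flag_all_models(example, contamination=0.4).items():
        print(name, flags)
```

A fixed per-model contamination rate is only one possible choice; per-model score thresholds would work with the same interface.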

Because verified ground-truth labels are unavailable, a consensus-based labeling strategy is adopted: a project is treated as a high-confidence anomaly if at least three of the four models flag it. Evaluated against these consensus labels, LSTM-AE outperforms the other models, achieving the highest AUC (0.998), which indicates that it is particularly effective at capturing long-term temporal dependencies in public financial data.
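The consensus rule can be sketched directly: count how many of the four models flag each project and mark it as a high-confidence anomaly when the count reaches three. The sketch also computes ROC AUC for one model's scores against those consensus labels, using the Mann-Whitney formulation (the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as half). The flags, scores, and function names are hypothetical, and the study's own evaluation code may differ.

```python
from typing import Dict, List


def consensus_labels(flags: Dict[str, List[bool]], min_votes: int = 3) -> List[int]:
    """Return 1 for projects flagged by at least `min_votes` models, else 0."""
    # zip(*...) regroups per-model flag lists into per-project vote tuples.
    return [int(sum(votes) >= min_votes) for votes in zip(*flags.values())]


def auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    Quadratic in the number of projects, which is acceptable for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical flags for five projects from the four models.
    flags = {
        "IsolationForest": [False, True, False, False, True],
        "AutoEncoder": [False, True, True, False, True],
        "LSTM": [False, False, False, False, True],
        "LSTM-AE": [False, True, False, False, True],
    }
    labels = consensus_labels(flags)  # [0, 1, 0, 0, 1]
    lstm_ae_scores = [0.05, 0.70, 0.30, 0.10, 0.95]  # hypothetical scores
    print(labels, auc(lstm_ae_scores, labels))
```

Because the labels are built from the models' own flags, an AUC computed this way measures agreement with the ensemble consensus rather than detection of verified corruption cases.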

This study demonstrates that machine learning can be applied to public financial management to uncover hidden corruption risks. By combining multiple unsupervised models, the approach offers a scalable tool for detecting red flags across key stages of the budget process, such as approval, expenditure tracking, and auditing. It provides a practical and interpretable framework that helps policymakers and oversight bodies proactively identify high-risk projects and enhance transparency in the public sector.