When Statistics Misread Democracy: Southeast 2023 Result Not Manipulated
By Osita Chidoka
A recent Master’s thesis now circulating widely argues that Nigeria’s 2023 presidential election can be forensically decoded using statistical and machine-learning techniques—and that doing so reveals significant electoral manipulation, particularly in the South-East.
Given the weight of that claim and the attention it has attracted, I sought to verify its provenance. A brief check with Amara Nwankpa, who is referenced in the summary, confirmed that the thesis is genuine. He kindly shared the full document, titled “Applying Machine Learning Techniques to Detect Electoral Fraud in Nigeria’s 2023 Elections,” by Joachim Oye MacEbong.
What follows is an assessment of what the thesis demonstrates, what it suggests, and where its conclusions exceed what the evidence can sustain.
The thesis argues that statistical forensics and machine-learning analysis of polling-unit results from the 2023 presidential election reveal widespread irregularities, with a particularly high concentration in the South-East. From this, it infers that significant electoral manipulation occurred in that region, even suggesting that parties dominant there benefited from inflated results.
While the work is ambitious and technically literate, its key conclusions do not hold when examined against Nigeria’s electoral history, the institutional realities of the BVAS era, and established limits of statistical forensics in identity-driven elections.
This is not an argument against data. It is an argument for using data correctly, especially in elections shaped by identity, history, and institutional change.
The single most important reform in Nigeria’s electoral process is the Bimodal Voter Accreditation System (BVAS). For the first time, over-voting is not inferred statistically; it is directly observable. The primary integrity test in the BVAS era is therefore simple and decisive: votes cast must not exceed BVAS-accredited voters at the polling unit.
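As a minimal sketch, the test can be written in a few lines. The record layout, the field names, and the three polling units below are hypothetical and made up for illustration; they are not INEC data or an INEC data format.

```python
from dataclasses import dataclass


@dataclass
class PollingUnit:
    code: str        # hypothetical polling-unit identifier
    accredited: int  # voters accredited by BVAS at the unit
    votes_cast: int  # total votes recorded on the unit's result sheet


def over_voted(units):
    """Return the units where votes cast exceed BVAS-accredited voters."""
    return [u for u in units if u.votes_cast > u.accredited]


# Made-up figures, for illustration only.
units = [
    PollingUnit("PU-001", accredited=312, votes_cast=305),
    PollingUnit("PU-002", accredited=198, votes_cast=240),
    PollingUnit("PU-003", accredited=150, votes_cast=150),
]

flagged = over_voted(units)
print([u.code for u in flagged])  # ['PU-002']
```

The point of the sketch is that the check is a direct comparison of two recorded numbers per polling unit, not an inference from the shape of a distribution.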
Any credible claim of ballot manipulation must begin at the polling unit. Statistical tools can guide inquiry, but they cannot replace ground truth. Analyses that begin with digit tests and anomaly detection, without first anchoring results to BVAS accreditation, cancellations, or documented collation breaches, invite false positives.
The thesis relies heavily on last-digit tests, Benford’s Law, and unsupervised machine learning to flag “anomalies.” These techniques are exploratory, not evidentiary. They are most reliable in environments where voter preferences aggregate independently and identity effects are weak.
Nigeria is not such an environment.
Identity-driven democracies routinely produce non-Gaussian outcomes—highly clustered results that are politically intelligible but statistically “odd.” Treating these shapes as proof of manipulation confuses sociology with fraud.
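To make the false-positive risk concrete, here is a hedged sketch of a last-digit test: a chi-square statistic comparing the last digits of vote counts against a uniform distribution. It is not the thesis's implementation. The counts are invented to mimic a minor party in a lopsided region, where its polling-unit tallies are mostly single digits; that small-count scenario is an assumption chosen for illustration.

```python
from collections import Counter

# Chi-square critical value for 9 degrees of freedom at the 5% level.
CHI2_CRIT_DF9_5PCT = 16.919


def last_digit_chi2(counts):
    """Chi-square statistic of last digits against a uniform distribution."""
    digits = Counter(c % 10 for c in counts)
    expected = len(counts) / 10
    return sum((digits.get(d, 0) - expected) ** 2 / expected for d in range(10))


# Made-up tallies for a minor party across 100 polling units in a region
# that votes overwhelmingly for another party. No manipulation is modelled.
minor_party_counts = [0, 1, 2, 0, 3, 1, 0, 2, 1, 4] * 10

stat = last_digit_chi2(minor_party_counts)
print(stat > CHI2_CRIT_DF9_5PCT)  # True
```

The test "fails" here only because the counts are small and clustered, which is exactly what a highly cohesive electorate produces for the parties it does not favour. A flag of this kind is a prompt for inquiry, not a finding.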
History is a good guide here. Consider the South-East across four election cycles:
• 2011: The South-East voted ~98% for Goodluck Jonathan.
• 2015: The South-East voted ~88% for the PDP, while the North-West voted ~81% for Muhammadu Buhari.
• 2019: With two northern candidates, the South-East voted ~81% against Buhari.
• 2023: With a South-East candidate—Peter Obi—the region voted ~90% for Labour.
2023 is not an anomaly. It is consistent with history. In 2019, the South-East voted against perceived exclusion. In 2023, it voted for perceived representation. The direction changed; the structure did not. High cohesion, low dispersion, and identity-anchored mobilisation define this pattern.
The same logic explains northern voting for Muhammadu Buhari in 2011 and 2015, and South-West voting for AD/APP in 1999. These outcomes were never treated as fraud because they were politically legible.
Any framework that flags South-East results in 2023 but ignores northern results in 2011–2015 is analytically inconsistent.
Machine learning detects outliers, not illegality. Unsupervised models such as Isolation Forests flag observations that look unusual relative to the rest of the data; they cannot say why an observation is unusual. Outliers can arise from:
• Ethnic or regional homogeneity
• Differential mobilisation
• Youth-driven protest voting
• Urban–rural turnout asymmetries
Without BVAS corroboration, incident reports, or cancellations, translating “anomalies” into claims of manipulation is an epistemic leap the data does not permit.
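The gap between "unusual" and "unlawful" can be sketched with a much simpler detector than an Isolation Forest. The example below swaps in a robust z-score based on the median and the median absolute deviation, purely as a stand-in; the vote shares are invented, with two units from a hypothetical cohesive region pooled with eight closely contested ones.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Indices whose robust z-score (median/MAD based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be ranked as unusual
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Made-up leading-party vote shares per polling unit. The last two stand in
# for units in a region voting ~90% for one candidate; no fraud is modelled.
shares = [0.48, 0.52, 0.50, 0.47, 0.53, 0.51, 0.49, 0.50, 0.91, 0.93]

print(mad_outliers(shares))  # [8, 9]
```

The detector flags the two cohesive units because they differ from the pool, and for no other reason. Whether that difference reflects mobilisation or manipulation is a question only accreditation records, incident reports, and cancellations can answer.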
From our off-cycle governorship election work at the Athena Centre, a consistent picture emerges:
• Manipulation exists, but it is limited in scope—well under 20% of polling units.
• Those units can matter, but they must be identified through BVAS breaches, cancellations, or collation distortions.
• Broad regional conclusions drawn from statistical shape alone are unreliable in identity-heavy contexts.
Good analysis respects boundaries. Statistics can illuminate patterns; they cannot establish intent or scale without institutional evidence. In the BVAS era, ground truth must lead. History must contextualise. Methods must fit the polity.
The 2023 election deserves scrutiny. It does not deserve conclusions that outrun the evidence. When statistics misread democracy, the remedy is not less data but better questions, grounded methods, and historical sense.
At best, the analysis is exploratory; at worst, it reflects a misapplication of statistical tools beyond what the data and institutional context can support.
Osita Chidoka,
21 December 2025



