Shape-Constrained Additive Modelling of Total Petroleum Hydrocarbon Contamination: A Simulation-Based Comparison within a Gamma–Log Framework
DOI:
https://doi.org/10.22105/kmisj.v3i1.115
Keywords:
Gamma regression, Generalized additive models, Shape-constrained additive models, Monotonicity constraints, Simulation study, Bootstrap uncertainty analysis
Abstract
Total Petroleum Hydrocarbon (TPH) contamination poses a long-term threat to soil quality and ecosystem health in oil-producing regions. Accurate characterization of TPH drivers is challenging because environmental responses are frequently nonlinear, while available datasets are often sparse and uncertain. In this study, we evaluated whether incorporating process-informed shape constraints can improve statistical modelling of TPH concentrations. A common Gamma–log modelling framework was implemented and used to compare a Generalized Linear Model (GLM), a Generalized Additive Model (GAM), and a Shape-Constrained Additive Model (SCAM). The evaluation used a site-informed simulation scenario that reflects established environmental expectations: TPH increases with Total Organic Carbon (TOC) and decreases with both distance from oil wells and soil depth. A synthetic dataset of 600 observations was generated through constrained random sampling over empirically informed covariate ranges. Model performance was examined using residual diagnostics, calibration analysis, and stratified 10-fold cross-validation. Uncertainty was assessed through Bias-Corrected and Accelerated (BCa) bootstrap intervals (B = 5,000) and local one-at-a-time sensitivity analysis (±10%). Results showed that SCAM achieved the most coherent monotonic response patterns and the strongest calibration performance, highlighting the value of integrating scientifically justified shape constraints into environmental regression models. Although conclusions remain conditional on the adopted simulation framework, the study demonstrates how constrained nonlinear models can improve interpretability and robustness in data-limited contamination assessments.
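The simulation design and the local sensitivity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the coefficients in `BETA`, the covariate ranges in `RANGES`, the Gamma shape `SHAPE`, and the purely linear log-mean are all assumptions made here for illustration, since the abstract does not state the data-generating function. The sketch draws 600 Gamma-distributed TPH values under a log link with the stated monotone directions, then perturbs each covariate by ±10% around a baseline.

```python
import math
import random

# Hypothetical coefficients and covariate ranges (illustration only).
# Signs follow the abstract: TPH rises with TOC, falls with distance and depth.
BETA = {"intercept": 5.0, "toc": 0.30, "distance": -0.004, "depth": -0.50}
RANGES = {"toc": (0.5, 6.0), "distance": (10.0, 500.0), "depth": (0.1, 2.0)}
SHAPE = 4.0  # assumed Gamma shape parameter


def mean_tph(toc, distance, depth):
    """Gamma mean under a log link: mu = exp(linear predictor)."""
    eta = (BETA["intercept"] + BETA["toc"] * toc
           + BETA["distance"] * distance + BETA["depth"] * depth)
    return math.exp(eta)


def simulate(n=600, seed=1):
    """Draw n observations by uniform sampling over the covariate ranges."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = {k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()}
        mu = mean_tph(**x)
        # gammavariate(alpha, beta) has mean alpha * beta, so beta = mu / SHAPE
        rows.append({**x, "tph": rng.gammavariate(SHAPE, mu / SHAPE)})
    return rows


def oat_sensitivity(base, delta=0.10):
    """Relative change in mean TPH when one covariate moves by -/+ delta."""
    mu0 = mean_tph(**base)
    out = {}
    for name in base:
        down = dict(base, **{name: base[name] * (1 - delta)})
        up = dict(base, **{name: base[name] * (1 + delta)})
        out[name] = ((mean_tph(**down) - mu0) / mu0,
                     (mean_tph(**up) - mu0) / mu0)
    return out


data = simulate()
baseline = {k: (lo + hi) / 2 for k, (lo, hi) in RANGES.items()}
sens = oat_sensitivity(baseline)
for name, (down, up) in sens.items():
    print(f"{name}: -10% -> {down:+.3f}, +10% -> {up:+.3f}")
```

In this sketch the sensitivity is computed on the known mean function; in a study like the one described, the same perturbation would presumably be applied to each fitted model's predictions so that GLM, GAM, and SCAM responses can be compared.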