Shape-Constrained Additive Modelling of Total Petroleum Hydrocarbon Contamination: A Simulation-Based Comparison within a Gamma–Log Framework

Authors

  • Mirlinda Kadriaj, Department of Business Informatics, Faculty of Technology Information, Tirana Business University College.
  • Luan Arapi, Department of Applied Geology and Geoinformatics, Faculty of Geology and Mining, Polytechnic University of Tirana, Tirana, Albania.
  • Raimonda Dervishi, Department of Mathematical Engineering, Faculty of Mathematical Engineering and Physical Engineering, Polytechnic University of Tirana, Tirana, Albania. https://orcid.org/0000-0001-8679-5489

DOI:

https://doi.org/10.22105/kmisj.v3i1.115

Keywords:

Gamma regression, Generalized additive models, Shape-constrained additive models, Monotonicity constraints, Simulation study, Bootstrap uncertainty analysis

Abstract

Total Petroleum Hydrocarbon (TPH) contamination poses a long-term threat to soil quality and ecosystem health in oil-producing regions. Accurate characterization of TPH drivers is challenging because environmental responses are frequently nonlinear, while available datasets are often sparse and uncertain. In this study, we evaluated whether incorporating process-informed shape constraints can improve statistical modelling of TPH concentrations. A common Gamma–log modelling framework was implemented and used to compare a Generalized Linear Model (GLM), a Generalized Additive Model (GAM), and a Shape-Constrained Additive Model (SCAM). The evaluation was performed using a site-informed simulation scenario that reflects established environmental expectations, in which TPH increases with Total Organic Carbon (TOC) and decreases with both distance from oil wells and soil depth. A synthetic dataset comprising 600 observations was generated through constrained random sampling over empirically informed covariate ranges. Model performance was examined using residual diagnostics, calibration analysis, and stratified 10-fold cross-validation, while uncertainty was assessed through Bias-Corrected and Accelerated (BCa) bootstrap intervals (B = 5,000) and local one-at-a-time sensitivity analysis (±10%). Results showed that SCAM achieved the most coherent monotonic response patterns and the strongest calibration performance, highlighting the value of integrating scientifically justified shape constraints into environmental regression models. Although conclusions remain conditional on the adopted simulation framework, the study demonstrates how constrained nonlinear models can improve interpretability and robustness in data-limited contamination assessments.
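
To make the simulation design concrete, the following minimal Python sketch generates 600 Gamma-distributed TPH values with a log link and then applies a local one-at-a-time (±10%) perturbation to the mean function. It is an illustration under stated assumptions, not the authors' code: the coefficient values, covariate ranges, Gamma shape parameter, baseline point, and the function names `mean_tph`, `simulate`, and `oat_sensitivity` are all hypothetical, and plain uniform sampling stands in for the constrained sampling scheme described above. Only the signs of the effects (positive for TOC, negative for distance and depth) come from the abstract.

```python
import math
import random

# Hypothetical log-scale coefficients. Only the signs follow the abstract:
# TPH increases with TOC and decreases with distance and depth.
BETA = {"intercept": 5.0, "toc": 0.30, "distance": -0.004, "depth": -0.50}
SHAPE = 4.0  # assumed Gamma shape parameter (not taken from the paper)


def mean_tph(toc, distance, depth):
    """Gamma-log mean: log(mu) is a sum of covariate effects."""
    eta = (BETA["intercept"]
           + BETA["toc"] * toc
           + BETA["distance"] * distance
           + BETA["depth"] * depth)
    return math.exp(eta)


def simulate(n=600, seed=1):
    """Draw n synthetic observations over assumed covariate ranges."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        toc = rng.uniform(0.5, 6.0)         # assumed range, percent
        distance = rng.uniform(5.0, 500.0)  # assumed range, metres
        depth = rng.uniform(0.0, 2.0)       # assumed range, metres
        mu = mean_tph(toc, distance, depth)
        # A Gamma variable with shape k and scale mu / k has mean mu.
        tph = rng.gammavariate(SHAPE, mu / SHAPE)
        rows.append((toc, distance, depth, tph))
    return rows


def oat_sensitivity(base, delta=0.10):
    """Relative change in mean TPH when one covariate moves by -delta, +delta."""
    mu0 = mean_tph(**base)
    result = {}
    for name in base:
        changes = []
        for sign in (-1, 1):
            point = dict(base)
            point[name] = base[name] * (1 + sign * delta)
            changes.append(mean_tph(**point) / mu0 - 1.0)
        result[name] = tuple(changes)
    return result


data = simulate()
baseline = {"toc": 3.0, "distance": 100.0, "depth": 1.0}  # hypothetical point
sens = oat_sensitivity(baseline)
for name, (down, up) in sens.items():
    print(f"{name}: -10% -> {down:+.3f}, +10% -> {up:+.3f}")
```

Because the link is logarithmic, each perturbation acts multiplicatively on the mean, so the sign of every reported change mirrors the sign of the corresponding coefficient. Fitting the GLM, GAM, and SCAM to such data would require a modelling library and is outside the scope of this sketch.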

Published

2026-03-03

How to Cite

Kadriaj, M., Arapi, L., & Dervishi, R. (2026). Shape-Constrained Additive Modelling of Total Petroleum Hydrocarbon Contamination: A Simulation-Based Comparison within a Gamma–Log Framework. Karshi Multidisciplinary International Scientific Journal, 3(1), 13-33. https://doi.org/10.22105/kmisj.v3i1.115