Date of Award

5-14-2024

Author's School

McKelvey School of Engineering

Author's Department

Energy, Environmental & Chemical Engineering

Degree Name

Doctor of Philosophy (PhD)

Degree Type

Dissertation

Abstract

Water and wastewater management systems play an essential role in meeting residential, commercial, and industrial needs, but these systems suffer from operational issues and high costs, which underscores the need for sustainable development and system optimization. Modeling and simulation are commonly used to optimize system performance through process prediction and precise control, and artificial intelligence (AI) technologies, including machine learning (ML) and deep learning (DL), are showing advantages when applied to water and wastewater management systems.
A general framework was presented for optimizing the performance and enhancing the explainability of ML models. Several analytical techniques were explored, including feature engineering, data scaling, hyperparameter tuning, data validation, and sensitivity analysis. The investigation compared eight ML models for effluent pH prediction and chemical dosage control in neutralization processes. For neutralizer pH prediction, the highest coefficients of determination (R²) for the three neutralizers were 0.765 (k-nearest neighbors, KNN), 0.918 (eXtreme Gradient Boosting, XGBoost), and 0.900 (random forest, RF). The impacts of input features were quantified by a sensitivity analysis using SHAP values, which demonstrated the importance of temperature, valve position, and upstream pH. For lime dosage control, represented by valve position, the best-performing models were XGBoost (R² = 0.605), RF (R² = 0.788), and RF (R² = 0.436). The recommended valve position was determined from the target pH and illustrated under various upstream pH values. The consistency between model training and testing results further underscored the effectiveness of this ML framework in mitigating overfitting.
Small or incomplete datasets are a common issue in ML/DL applications. For example, effluent phosphorus needs to be predicted, but the influent phosphorus load and the chemical dosage for phosphorus removal are not monitored in small-scale wastewater treatment plants (WWTPs). To address this small-data issue, two methods based on ML and DL were investigated. In the first, ML models were combined with a feature engineering method, correlation analysis, to select essential model input features from 42 variables by exploring the internal correlations between the variables. Five ML regression models were used, and the highest R² value of 0.637 was achieved with the support vector machine (SVM) model. In the second, the raw datasets were preprocessed to serve as time-series data. A DL model, long short-term memory (LSTM), was applied to learn from these temporal data and predict phosphorus load one day in advance, with an R² of 0.496.
In addition to conventional data inputs, graphs can serve as valuable inputs, offering spatial data for ML/DL modeling. The pipeline network of a water distribution system (WDS) can be treated as graph-structured data, supporting pipe failure prediction to prioritize pipe maintenance. Two methods, ML with a geographical information system (GIS) and ML with a graph neural network (GNN), were explored to improve pipe failure prediction by using spatial features to account for neighboring effects. Five ML models served as benchmarks, with the RF model achieving the highest R² value of 0.809. The first method, “ML-ArcGIS”, used two clustering methods in ArcGIS (Hot spot analysis, and Cluster and Outlier analysis) to derive spatial features as inputs, and improved model performance to an R² value of 0.824. The second method, “ML-GNN”, derived spatial features with the unsupervised Graph SAmple and aggreGatE (GraphSAGE) method, resulting in an R² value of 0.823. The importance of spatial features was further confirmed by sensitivity analyses.
Apart from the spatial data collected from WDS pipeline graphs, temporal data can be obtained from historical break records and serve as time-series data in pipe failure prediction. This study proposed a novel deep learning-based DeeperGCN framework that incorporated graph convolutional network (GCN) models into a much deeper model architecture to process spatial and temporal data simultaneously. Two graph representation methods and three GCN models were compared, with the best predictions obtained from the “Pipe_as_Edge” method and the DeeperGEN model. To indicate pipe maintenance priority directly, the prediction target was framed as a binary classification problem: whether a pipe breaks within the following one, three, or five years. The prediction accuracies were 96.91%, 96.73%, and 97.23%, respectively. Because data imbalance was observed, three forms of evaluation metrics (binary, macro, and weighted) were compared, resulting in weighted F1 scores of 0.9612, 0.9676, and 0.9722, respectively. This model framework demonstrated its potential for proactive maintenance of WDS pipes.
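As a brief illustration of why the binary, macro, and weighted forms of the F1 score can differ on imbalanced break/no-break data, the following is a minimal sketch and not the dissertation's evaluation code. It uses the standard definitions of the three averages; the function name `f1_scores` and the label lists are hypothetical, made-up values chosen only to show the effect.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return binary, macro, and weighted F1 for 0/1 labels (1 = pipe break)."""

    def f1_for(label):
        # Treat `label` as the positive class and count the confusion entries.
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    support = Counter(y_true)          # number of true samples per class
    labels = sorted(support)
    per_class = {c: f1_for(c) for c in labels}

    binary = f1_for(1)                 # F1 of the positive (break) class only
    macro = sum(per_class.values()) / len(labels)   # unweighted class mean
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return {"binary": binary, "macro": macro, "weighted": weighted}


# Hypothetical labels: 18 pipes without a break, 2 pipes with a break.
y_true = [0] * 18 + [1] * 2
# One false alarm, one detected break, one missed break.
y_pred = [0] * 17 + [1] + [1, 0]

scores = f1_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

In this made-up example the weighted F1 is dominated by the majority (no-break) class, while the binary F1 reflects only how well breaks themselves are identified, which is why comparing the three forms is informative when breaks are rare.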

Language

English (en)

Chair

Zhen He

Available for download on Tuesday, May 13, 2025
