Journal of Environmental Treatment Techniques
2018, Volume 6, Issue 2, Pages: 
J. Environ. Treat. Tech.
ISSN:
Journal weblink: http://www.jett.dormaj.com
Predictive Performance Modeling of Habesha
Brewery’s Wastewater Treatment Plant Using
Artificial Neural Networks
Elias Barsenga Hassen1 and Abraham M. Asmare2
1Faculty of Chemical and Food Engineering, Bahir Dar Institute of Technology, Bahir Dar University, Ethiopia
2Institute of Disaster Risk Management and Food Security Studies, Bahir Dar University, Ethiopia
Received: 25/07/2017 
Accepted: 07/02/2018 
Published: 30/07/2018 
Abstract
Currently, process control of wastewater treatment plants (WTPs) is mostly accomplished by examining the quality of the product water and adjusting the processes based on the operator’s experience. This practice is inefficient, costly and slow in control response. Better control of WTPs can be achieved by developing a robust mathematical tool for predicting plant performance. Owing to their high accuracy and promising applications in engineering, Artificial Neural Networks (ANNs) are receiving growing attention in the predictive performance modeling of WTPs. This paper focuses on applying ANN with a
Keywords: Artificial Neural Network, Wastewater Treatment Plant, Performance Modeling
1 Introduction
1.1 Background
The aim of the wastewater treatment process is to achieve a treated effluent and sludge that are environmentally safe for disposal and/or reuse [1]. Given current environmental problems and water scarcity, it is realistic to expect that new WTPs will continue to be developed all over the world. At the same time, loads on existing plants are expected to increase as the production capacity of industries grows. Moreover, the advent of more stringent environmental regulations on wastewater treatment and discharge puts more pressure on operators and decision makers to better manage and improve the reliability of their treatment plants [2]. Currently, process control is generally accomplished by examining the quality of the product water and adjusting the processes based on the operator’s experience. This practice is inefficient and slow in control response [3]. This situation demands a more efficient and economical control and assessment tool for the proper functioning of WTPs.
Corresponding author: Elias Barsenga Hassen, Faculty of Chemical and Food Engineering, Bahir Dar Institute of Technology, Bahir Dar University, Ethiopia,
Modeling and simulation is an elegant and cost-effective tool for assessing the performance and control of WTPs [4]. Models of WTPs can be divided into two main categories: linguistic and mathematical models. Linguistic models (such as expert systems) can relate cause to effect without the construction of a mathematical model. They are most suitable for describing phenomena in environmental systems that are very difficult to represent mathematically. Most of the expert systems developed in the field of wastewater treatment are directed at diagnosis. This category of application helps to identify the causes of malfunctions in pollution control facilities and their remedies [5].
Mathematical models of wastewater treatment processes can be divided into two broad classes: white box (deterministic) models and black box (empirical) models. White box models are most useful for understanding the events occurring in a system [5]. Generally, deterministic models incorporate direct links between inputs and outputs through ordinary and partial differential equations that seek to mimic the mechanistic reactions. Developing such a model that accurately describes a system requires detailed knowledge and evaluation of the system, the factors that act in it and the interactions between those factors. Although these models give good insight into the mechanics of the system, they require a lot of work and time before they can be applied to a specific WTP [6]. Activated Sludge Model No. 1 (ASM1) and Activated Sludge Model No. 2 (ASM2), developed by the International Water Association (IWA), are among the few mechanistic models that have been developed in the field of wastewater treatment [7].
The other type of mathematical model, the black box or empirical model, is designed to represent
Artificial Neural Network (ANN) is an artificial intelligence technique that mimics the human brain’s biological neural network in the problem-solving process. Just as a human solves a new problem based on past experience, a neural network takes previously solved examples, looks for patterns in these examples, learns the patterns and develops the ability to correctly classify new patterns and to predict and forecast process parameters [8].
Laboratory tests are used to determine the quality of the wastewater throughout the treatment process, which is time consuming, expensive and slow in response. This study aims to develop a robust predictive model for determining the quality of the treated effluent by using ANN.
Wastewater treatment plants (WTPs) are dynamic,
To date, almost all the industrial WTPs in Ethiopia, including that of Habesha Brewery, use the conventional experimental approach to determine the quality of the treated effluent before discharge or reuse. However, this method is expensive, time consuming and slow in response. In addition, it requires experienced professionals to obtain the best results. Better operation and control of the WTP can be achieved by developing a robust mathematical tool that enables prediction of the quality of the treated effluent based on past observations of certain key parameters. To this end, in this thesis work, an ANN modeling technique was used to develop a predictive performance model of Habesha Brewery’s WTP. The general objective of this thesis work was therefore to develop a robust model that can predict the performance of Habesha Brewery’s WTP using Artificial Neural Networks.
2 Description of the Wastewater Plant
2.1 Plant description
Habesha Brewery has a capacity of producing 700,000 hectoliters of beer annually. Its WTP is located in Debre Birhan Town, 120 kilometers north of Addis Ababa City. The brewery uses a biologically based wastewater treatment plant, which has a maximum capacity of treating 1000 m3 of wastewater at a time (Figure 1).
Figure 1: Habesha Brewery’s WTP
2.2 Process description of the wastewater plant
The wastewater collected from the production plant first passes through the screen chamber, which removes large solids such as broken bottles and sticker papers. It is then collected in the equalization tank, where it is neutralized by the addition of HCl or NaOH depending on the initial pH of the wastewater. Then the neutralized wastewater will pass to
Figure 2: Process flow diagram of Habesha Brewery's WTP
2.3 Effluent Guidelines and standards
Globally, there is great political and social pressure to reduce the pollution arising from industrial activities. The brewing industry is one such industry that generates large amounts of wastewater; it has been documented that 3 to 10 liters of wastewater are generated per liter of beer produced. Due to increasing environmental concerns and regulations, attempts have been made to utilize treated wastewater in an environmentally friendly manner [9].
The Environmental Protection Agency of Ethiopia (EPA) has developed general national pollutant discharge limits to control water pollution. The EPA’s effluent standard limits for reuse of wastewater for agricultural purposes are shown in Table 1 below.
Table 1: EPA's Standard limit for discharge of treated industrial wastewater for selected physicochemical parameters [10]
Parameters | Limits
Temperature | Do not change ambient temperature by more than 10C
pH |
COD | ≤ 125 mg/l
TN | ≤ 30 mg/l
3 Methodology
3.1 Materials
3.1.1 Historical Operational Data
The historical data used for this study were obtained from Habesha Brewery’s WTP laboratory. About 11 months of records, between May 2, 2016 and March 20, 2017, for COD, pH, and Total Nitrogen (TN) of raw influent wastewater and treated effluent were obtained from the plant laboratory.
The gathered data were carefully examined and, after considering the available options for modeling the treatment plant performance, it was decided to relate the quality of the raw influent wastewater to the quality of the final treated effluent. The descriptive statistics of the raw operational data obtained for the influent and effluent water are presented in Tables 2 and 3 below, respectively.
3.1.2 Software
MATLAB® (R2014b) software with neural network toolbox, which is a high performance interactive software package for scientific and engineering computation, was used for designing, building, training and testing the neural networks. MiniTab® v 17 and Microsoft Excel® 2016 were also used for the data organization and
3.2 Methods
3.2.1 Data
The quantity and quality of the available data sets will ultimately determine the performance and complexity of the ANN. Mostly, raw data collected from plant operations are noisy and incomplete by nature. Therefore, the raw data that was collected from the plant laboratory was examined for completeness, missing values and outliers by using statistical analysis software called MiniTab.
Missing values were estimated by interpolation, and outliers were removed by discarding measurements that fell outside the range of ±3 standard deviations from the mean. The descriptive statistics of the preprocessed data are presented in Tables 4 and 5 below.
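As an illustration of the two preprocessing steps described above, the following minimal Python sketch fills missing values and then discards values outside ±3 standard deviations. It is not the MiniTab procedure used in the study. The function names are hypothetical, and two details are assumptions: the text says only "interpolation", so linear interpolation is used here, and the sample standard deviation is used for the ±3 limit.

```python
from statistics import mean, stdev


def interpolate_missing(values):
    """Fill None entries by linear interpolation between known neighbours.

    Assumption: linear interpolation; gaps at either end copy the nearest
    known value because there is nothing to interpolate towards.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


def remove_outliers(values, k=3.0):
    """Keep only values within k sample standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= k * s]


# Hypothetical influent pH series with one gap and one implausible reading.
series = [7.0] * 10 + [None] + [7.0] * 10 + [100.0]
cleaned = remove_outliers(interpolate_missing(series))
```

With very short series a single extreme value cannot exceed three sample standard deviations, so this rule only becomes effective once enough records are available.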
Table 2: Descriptive Statistics for raw operational data of Influent wastewater
Variable | Units | Min | Max | Mean | St. Dev | Var.
pH | | 6.030 | 9.710 | 7.208 | 0.551 | 0.303
COD | mg/l | 920.000 | 2764.000 | 1605.400 | 437.000 | 190985.100
TN | mg/l | 8.440 | 49.600 | 31.337 | 6.817 | 46.477
Table 3: Descriptive Statistics for raw operational data of treated effluent
Variable | Units | Min | Max | Mean | St. Dev | Var.
pH | | 7.560 | 8.330 | 7.960 | 0.140 | 0.020
COD | mg/l | 14.600 | 422.000 | 89.690 | 67.420 | 4545.250
TN | mg/l | 2.691 | 27.575 | 8.721 | 5.305 | 28.142
To ensure that the statistical distribution of the values for each network input and output is roughly uniform, the data were scaled with the MATLAB function mapminmax, by which the data sets were scaled to a specified range of
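The scaling step can be sketched in Python as a plain min-max mapping. This is an illustration, not the MATLAB code used in the study, and the function names are hypothetical. The target range is cut off in the text, so the default of [-1, 1] below is an assumption; it matches the documented default of mapminmax.

```python
def minmax_scale(values, lo=-1.0, hi=1.0):
    """Map values linearly so that min(values) -> lo and max(values) -> hi.

    Returns the scaled values and the (min, max) bounds needed to undo it.
    """
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("cannot scale a constant series")
    scaled = [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]
    return scaled, (vmin, vmax)


def minmax_inverse(scaled, bounds, lo=-1.0, hi=1.0):
    """Map scaled values back to the original units."""
    vmin, vmax = bounds
    return [vmin + (s - lo) * (vmax - vmin) / (hi - lo) for s in scaled]


# Hypothetical influent COD values in mg/l.
cod = [920.0, 1842.0, 2764.0]
cod_scaled, cod_bounds = minmax_scale(cod)
cod_restored = minmax_inverse(cod_scaled, cod_bounds)
```

The bounds are returned because network predictions are produced in the scaled range and have to be mapped back to mg/l before they can be compared with laboratory values.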
3.2.2 Data Division
When training multilayer networks, the general practice is to first divide the data into three subsets. The first subset is the training set, which is used for computing the gradient and updating the network weights and biases. The second subset is the validation set. The error on the validation set is monitored during the training process. The validation error normally decreases during the initial phase of training, but it tends to rise once the network begins to overfit the training data. Therefore, the validation set is used to decide when to stop training in order to avoid overtraining. The third subset, the test set, is not used during the training phase; it is used to compare the performance of different models.
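The role of the validation set described above can be illustrated with a small early-stopping rule. This is a generic sketch, not the stopping logic of the MATLAB toolbox. The function name and the `patience` parameter (the number of consecutive epochs without improvement that is tolerated) are hypothetical.

```python
def early_stopping_epoch(val_errors, patience=6):
    """Return (stop_epoch, best_epoch) for a sequence of validation errors.

    Training stops once the validation error has failed to improve for
    `patience` consecutive epochs; the weights from `best_epoch` would then
    be the ones to keep.
    """
    best_err = float("inf")
    best_epoch = -1
    fails = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch, fails = err, epoch, 0
        else:
            fails += 1
            if fails >= patience:
                return epoch, best_epoch
    return len(val_errors) - 1, best_epoch


# Hypothetical validation errors: falling at first, then rising.
errors = [0.90, 0.60, 0.40, 0.35, 0.36, 0.38, 0.41, 0.45]
stop_epoch, best_epoch = early_stopping_epoch(errors, patience=3)
```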
Table 4: Descriptive Statistics for
Variable | Units | Min | Max | Mean | St. Dev | Var.
pH | | 6.030 | 8.750 | 7.138 | 0.413 | 0.170
COD | mg/l | 920.000 | 2764.000 | 1602.200 | 408.700 | 167036.500
TN | mg/l | 14.820 | 49.600 | 31.562 | 5.980 | 35.765
Table 5: Descriptive Statistics for
Variable | Units | Min | Max | Mean | St. Dev | Var.
pH | | 7.560 | 8.290 | 7.951 | 0.135 | 0.0183
COD | mg/l | 14.600 | 258.000 | 83.48 | 52.200 | 2724.44
TN | mg/l | 2.690 | 21.150 | 8.356 | 3.968 | 15.747
MATLAB provides four alternative functions for dividing data into training, validation and test sets: dividerand (the default), divideblock, divideint and divideind (Table 6). Normally, the data division is carried out automatically during the training process. In this study, all four data division functions were tried in turn in an attempt to increase the performance of the networks.
Table 6: List of MATLAB data division functions [11]
Function | Algorithm
dividerand | Divide the data randomly (default)
divideblock | Divide the data into contiguous blocks
divideint | Divide the data using an interleaved selection
divideind | Divide the data by index
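The four strategies in Table 6 can be illustrated with the Python sketch below. These are simplified stand-ins with hypothetical names, not the MATLAB implementations, and the interleaving rule in particular is only one plausible way to spread the subsets through the data. The 70/15/15 split is an assumption, since the ratios used in the study are not stated.

```python
import random

RATIOS = (0.70, 0.15, 0.15)  # assumed train/validation/test shares


def _counts(n, ratios):
    """Number of samples per subset; the training set takes the remainder."""
    n_val = round(n * ratios[1])
    n_test = round(n * ratios[2])
    return n - n_val - n_test, n_val, n_test


def _slice(idx, n_train, n_val):
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def divide_block(n, ratios=RATIOS):
    """Contiguous blocks: first training, then validation, then test."""
    n_train, n_val, _ = _counts(n, ratios)
    return _slice(list(range(n)), n_train, n_val)


def divide_rand(n, ratios=RATIOS, seed=0):
    """Random assignment: shuffle the indices, then cut into three parts."""
    n_train, n_val, _ = _counts(n, ratios)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return _slice(idx, n_train, n_val)


def divide_int(n, ratios=RATIOS):
    """Interleaved assignment: each index goes to the subset that lags
    furthest behind its target share, which spreads all three subsets
    across the whole record."""
    sets = ([], [], [])
    for i in range(n):
        deficits = [ratios[k] * (i + 1) - len(sets[k]) for k in range(3)]
        sets[deficits.index(max(deficits))].append(i)
    return sets


def divide_ind(train_idx, val_idx, test_idx):
    """Division by explicitly supplied index lists."""
    return list(train_idx), list(val_idx), list(test_idx)
```

For time-ordered plant records the choice matters: block division tests the model on the most recent period only, whereas random and interleaved division mix all periods into every subset.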
3.2.3 Model architecture selection
Model architecture determines the overall structure and the direction of information flow in the model. Generally, artificial neural network architectures are divided into feedforward and recurrent networks. In feedforward networks, information flows in one direction, from the input layer to the output layer. In recurrent networks, by contrast, information moves in both the forward and backward directions. Multilayer Perceptron (MLP) networks, the most commonly used type of feedforward neural network, were used in this thesis work.
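The one-directional information flow of a feedforward MLP can be sketched in a few lines of Python. This is an illustrative forward pass, not the network built in MATLAB for this study. The function names, the random initial weights and the hidden layer size are hypothetical, and the linear output layer is an assumption because the output transfer function is not stated in the text.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create one fully connected layer: a weight row and a bias per neuron."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, hidden, output):
    """Propagate inputs through a tanh hidden layer and a linear output layer."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(w_o, b_o)]


rng = random.Random(0)
hidden = init_layer(3, 5, rng)   # 3 inputs (scaled pH, COD, TN), 5 hidden neurons
output = init_layer(5, 3, rng)   # 3 outputs (scaled effluent pH, COD, TN)
prediction = forward([0.1, -0.4, 0.7], hidden, output)
```

Each layer only reads the values produced by the layer before it, which is what distinguishes this structure from a recurrent network.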
3.2.4 Network Structure Selection
Network structure, together with model architecture, defines the functional form of the relationship between network inputs and outputs. The optimal network structure generally strikes a balance between generalization ability and network complexity. The optimal network structure was sought by investigating the effect of the following network characteristics on network performance.
i. Network configuration: In this study, three types of configurations, single-input single-output (SISO), multiple-input single-output (MISO) and multiple-input multiple-output (MIMO), were developed based on the preprocessed data. The pH, COD and TN of the raw influent wastewater were used as input variables to predict the quality of the treated effluent. A total of 13 models (9 SISO, 3 MISO and 1 MIMO configurations) were designed and evaluated.
ii. Selection of number of hidden neurons: Determination of the optimal network structure involves the selection of an appropriate number of hidden neurons. Generally, a
iii. Selection of proper transfer function: Generally, the hyperbolic tangent and sigmoid functions are appropriate for most types of networks, especially for prediction problems. The hyperbolic tangent function was preferred over the log sigmoid function in this work for the following reasons:
a. The output varying from
the log sigmoid function always has a positive response.
b. The slope of the hyperbolic tangent is much greater than the slope of the sigmoid function, which means that the hyperbolic tangent function is more sensitive to small changes in input.
iv. Training algorithm selection: It is difficult to determine which training algorithm provides the fastest learning for a given problem. The choice depends on many factors, including the complexity of the problem, the number of data points in the training set, the number of biases in the network, the error goal and the purpose for which the network is being used. MATLAB provides 13 training algorithms for neural networks, among which the Bayesian Regularization algorithm (trainbr) was used in this study. Although this algorithm takes more time to train than other algorithms, it has good generalization capacity for difficult, small or noisy datasets [11].
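The two properties cited in item iii can be checked numerically. The short Python sketch below compares the hyperbolic tangent and log sigmoid functions. The names `tansig` and `logsig` are borrowed from the MATLAB transfer functions but are defined locally here, and `slope` is a hypothetical helper that estimates the derivative by a central difference.

```python
import math


def tansig(x):
    """Hyperbolic tangent transfer function, output in (-1, 1)."""
    return math.tanh(x)


def logsig(x):
    """Log sigmoid transfer function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def slope(f, x, h=1e-6):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-5.0, 0.0, 5.0):
    print(f"x={x:+.1f}  tansig={tansig(x):+.4f}  logsig={logsig(x):.4f}")

print("slope at 0:", slope(tansig, 0.0), slope(logsig, 0.0))
```

At the origin the hyperbolic tangent has a slope of 1 against 0.25 for the log sigmoid, and only the hyperbolic tangent produces negative outputs for negative inputs.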
3.2.5 Model Training
Model training is an iterative process that seeks to modify the network through numerous presentations of data. There are two main types of model training: unsupervised and supervised. In unsupervised training, the model is fed only the input values and uses them to adjust its connection weights. In supervised training, the model is fed both input and target values. The supervised mode of training was used for this study since it requires a shorter time than the unsupervised mode.
3.2.6 Model Evaluation
In order to determine which network structure is optimal, the performance of the calibrated models was evaluated. In this thesis work, the Mean Squared Error (MSE) and the correlation coefficient (R) were used to evaluate the performance of the networks. MSE is the average squared difference between outputs and targets; lower values are better, and zero means no error. R measures the correlation between outputs and targets: an R value of 1 means a close relationship, and 0 a random relationship.
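For concreteness, the two evaluation metrics can be computed as in the Python sketch below. The function names are hypothetical, and R is taken to be the Pearson correlation coefficient between network outputs and targets, which is an assumption consistent with the description above.

```python
import math


def mse(outputs, targets):
    """Mean squared error: average squared difference of outputs and targets."""
    if len(outputs) != len(targets) or not targets:
        raise ValueError("outputs and targets must be non-empty and equal in length")
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


def pearson_r(outputs, targets):
    """Pearson correlation coefficient between outputs and targets."""
    if len(outputs) != len(targets) or len(targets) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    n = len(targets)
    mean_o = sum(outputs) / n
    mean_t = sum(targets) / n
    cov = sum((o - mean_o) * (t - mean_t) for o, t in zip(outputs, targets))
    ss_o = sum((o - mean_o) ** 2 for o in outputs)
    ss_t = sum((t - mean_t) ** 2 for t in targets)
    return cov / math.sqrt(ss_o * ss_t)


# Hypothetical scaled predictions and laboratory targets.
predicted = [0.10, 0.35, 0.62, 0.80]
measured = [0.12, 0.30, 0.65, 0.78]
print(mse(predicted, measured), pearson_r(predicted, measured))
```

The two metrics are complementary: R is insensitive to a constant offset or scale error in the predictions, whereas MSE penalizes both.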
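$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - t_i \right)^2 \qquad (0.1)$$
where N is the number of samples, y_i is the network output and t_i is the target for sample i.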
4 Results and Discussion
After
3.3 pH prediction models
Overall, 5 network models (3 SISO, 1 MISO and 1 MIMO) were developed to determine the optimum network topology for prediction of the pH of the treated effluent. Table 7 below presents the statistical performance results of these models.
Based on the statistical performance of all three configurations for prediction of pH, the MIMO model is the best, with an R value of 0.920. This suggests that, compared to the other models, the MIMO model generalizes the data well and is likely to make accurate predictions when new data (data from neither the training nor the testing set) are provided. The linear regression plots for the best performing models from each configuration are presented in Figure 3 below.
Table 7: Performance statistics for pH prediction models
Configuration | Input | Training MSE | Training R | Testing MSE | Testing R | All R
SISO | pH | 0.120 | 0.276 | 0.161 | 0.421 | 0.455
SISO | COD | 0.054 | 0.805 | 0.057 | 0.747 | 0.804
SISO | TN | 0.135 | 0.254 | 0.178 | 0.050 | 0.172
MISO | pH, COD, TN | 0.060 | 0.771 | 0.070 | 0.773 | 0.771
MIMO | pH, COD, TN | 0.037 | 0.921 | | 0.917 | 0.920
Figure 3: Regression plot for best performing pH prediction (A)
[Figure 4 panels (A), (B) and (C)]
Figure 4: Comparison between predicted and actual data for pH prediction using (A)
MIMO models
As can be seen from the comparison plot between the predicted and actual data, and based on the MSE values of the models, the MIMO model shows a good fit compared to the other models. Therefore, the appropriate architecture for prediction of pH was determined to consist of an input layer with 3 neurons, a hidden layer with 21 neurons and an output layer with 3 neurons.
3.4 COD prediction models
For predicting the COD of the treated effluent, 3 SISO configuration models with different inputs, 1 MISO and 1 MIMO configuration model were developed. The statistical performance of the best performing models from each configuration is presented in Table 8.
Among the models developed for prediction of COD, the SISO model with COD as input showed excellent generalization and predictive efficiency, with an R value of 0.9692. Although the MISO and MIMO models scored lower R values than this SISO model, both configurations showed good accuracy, with R values greater than 0.9 for both the training and testing sets. These results are also supported by the regression and comparison plots presented in Figures 5 and 6 below, respectively. Based on the R and MSE values of the models, the best topology for prediction of COD is a SISO configuration with 1 neuron in the input layer, 76 neurons in the hidden layer and 1 neuron in the output layer.
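To give a sense of the size of the reported topologies, the Python sketch below counts the adjustable parameters of a fully connected MLP. The function name is hypothetical, and the count assumes one weight per connection and one bias per hidden and output neuron, which is the usual MLP layout but is not spelled out in the text.

```python
def count_parameters(layer_sizes):
    """Number of weights and biases in a fully connected feedforward network.

    layer_sizes lists the neurons per layer, input layer first.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


ph_model = count_parameters([3, 21, 3])    # MIMO topology reported for pH
cod_model = count_parameters([1, 76, 1])   # SISO topology reported for COD
print(ph_model, cod_model)
```

Under these assumptions the 3-21-3 network has 150 parameters and the 1-76-1 network has 229, which is useful context when judging model complexity against the number of available records.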
3.5 TN prediction models
For predicting the TN of the treated effluent, 3 SISO configuration models with different inputs, 1 MISO and 1 MIMO configuration model were developed. The statistical performance of the best performing models from each configuration is presented in Table 9.
As presented in Table 9 above, the best performing model for prediction of TN of the treated effluent is the MIMO model with topology of
Table 8: Performance Statistics for COD prediction models
Configuration | Input | Training MSE | Training R | Testing MSE | Testing R | All R
SISO | pH | 0.202 | 0.136 | 0.181 | 0.247 | 0.305
SISO | COD | 0.007 | 0.982 | 0.057 | 0.885 | 0.970
SISO | TN | 0.111 | 0.305 | 0.092 | 0.079 | 0.275
MISO | pH, COD, TN | 0.022 | 0.941 | 0.015 | 0.942 | 0.945
MIMO | pH, COD, TN | 0.037 | 0.921 | | 0.917 | 0.920
Figure 5: Regression plot for best performing COD prediction (A)
[Figure 6 panels (A), (B) and (C)]
Figure 6: Comparison between predicted and actual data for COD prediction using (A)
and (C) MIMO models
Table 9: Performance Statistics for TN prediction models
Configuration | Input | Training MSE | Training R | Testing MSE | Testing R | All R
SISO | pH | 0.133 | 0.340 | 0.184 | 0.729 | 0.330
SISO | COD | 0.024 | 0.937 | 0.131 | 0.779 | 0.913
SISO | TN | 0.145 | 0.229 | 0.172 | 0.024 | 0.208
MISO | pH, COD, TN | 0.035 | 0.911 | 0.033 | 0.817 | 0.907
MIMO | pH, COD, TN | 0.037 | 0.921 | | 0.917 | 0.920
Figure 7: Regression plot for best performing TN prediction (A)
[Figure 8 panels (A), (B) and (C)]
Figure 8: Comparison between predicted and actual data for TN prediction using (A) CODin
(C) MIMO models
5 Conclusion
Predicting the performance of wastewater treatment processes is important for keeping the system stable under a wide range of circumstances. This thesis work presented a step-by-step procedure for developing an ANN-based predictive performance model for Habesha Brewery’s WTP. During model development, the raw influent wastewater quality parameters were used as input variables to develop 13 distinct SISO, MISO and MIMO network models for prediction of treated effluent quality. The raw data obtained from the treatment plant were analyzed and preprocessed before they were used for training and evaluating the networks. A trial and error method was used to identify the best performing network topology. Based on the observed results, it can be concluded that the outputs of the models are in very good agreement with the raw data obtained from the WTP laboratory. Generally, based on the statistical performance results, the MIMO model showed better predictive performance than the SISO and MISO configurations, with an R value of 0.9201. Regarding the input variables, the models with COD as the input variable have better prediction quality and accuracy than the models where pH or TN is used as the input variable. Model architectures
References
1. Fezzi, M., A Pragmatic Approach to Wastewater Treatment Modelling: The Kallby Wastewater Treatment Plant as a Case Study, in Department of Chemical Engineering. 2015, Lund University: Lund, Sweden. p. 121.
2. Dairi, S., et al., Dynamic Simulation for the requirements of oxygen about the Municipal Wastewater Treatment Plant Case of
3. Zhang, Q. and S.J. Stanley,
4. Vandejerckhove, A., W. Moerman, and S.W.H. Hulle,
5.
6. Moral, H., Modeling of Activated Sludge Process by using Artificial Neural Networks, in Department of Environmental Engineering. 2004, Middle East Technical University: Ankara, Turkey. p. 1126.
7. Gernaey, K.V., et al., Activated sludge wastewater treatment plant modelling and simulation: state of the art. Journal of Environmental Modelling and Software, 2004. 19: p.
8. Zhang, Q.J., Artificial Neural
9. Senthilraja, K., P. Jothimani, and G. Rajannan, Effect of brewery wastewater on growth and physiological changes in maize, sunflower and sesame crops. International Journal for Life Science and Educational Research, 2013. 1(1): p.
10. Fikresilasie, T., Impact of Brewery Effluent on River Water Quality: The Case of Meta abo Brewery Factory and Finchewa River in Sebeta, Ethiopia, in School of Graduate Studies. 2011, Addis Ababa University: Addis Ababa. p. 85.
11. MathWorks, Neural Network Toolbox User's Guide. 2017, MathWorks Inc.: Massachusetts, USA.