Journal of Environmental Treatment Techniques  
2020, Volume 8, Issue 2, Pages: 770-778  
J. Environ. Treat. Tech.  
ISSN: 2309-1185  
Journal web link:  
Data Understanding for Flash Flood Prediction  
in Urban Areas  
Nur Shuhada Abdul Malek, Syamil Zayid, Zaifulasraf Ahmad, Suraya Yaacob, Nur Azaliah Abu Bakar
Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, 54100 Kuala Lumpur, Malaysia  
Received: 19/11/2019  
Accepted: 18/02/2020  
Published: 20/05/2020  
Flash floods have become one of the major disasters in Malaysia, especially in urban areas. They have become more prominent to city dwellers, causing massive damage to infrastructure, harm to people, and disruption to business and daily activities. Population growth and the rapid development of urban areas have worsened the situation even more. In the era of Big Data, the possibility of analysing complex data from heterogeneous sources to predict flash floods has given a different perspective and hope for finding innovative ways to reduce the impact of floods, especially in urban areas. The purpose of this study is to understand the data needed to produce predictive visual analytics for flash flood forecasting using the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. Focusing on understanding flash flood data, this paper characterizes data pertaining to disaster management and identifies the right data to support more accurate decision making by stakeholders. A literature review was conducted to determine which data are needed in the Malaysian urban setting. The review found that the critical factors determining flash flood occurrence in Malaysia are unique due to the tropical climate and urbanization. Therefore, it is important to understand and characterize these factors for more effective data collection and more accurate predictive analytics later. Based on the findings, the most significant factors for flash flood prediction are rainfall, urbanization, and fluvial flood, which eventually lead to blocked drainage. The data under these categories will be analysed in detail as part of the data understanding of flash flood occurrence. This study also aims to uncover the potential of predictive visual analytics in flood forecasting and to discuss how prediction can bring value to the Malaysian environment and create a sustainable ecosystem.
Keywords: Flash Flood, Disaster, Predictive Analytics, Data Understanding  
Flash flood is a sudden and rapid flooding of low-lying areas. It is one of the hazards that may cause loss of life, injury, and destroyed or damaged assets, affecting society, the economy, and the environment (1). Due to the monsoon season and heavy rainfall, Malaysia is exposed to flash flood risk every year. The condition becomes worse when rapid urbanization takes place. Kuala Lumpur, for example, is the capital and the largest city in Malaysia. According to the Department of Statistics Malaysia, the population of Kuala Lumpur was estimated at 1.78 million as of 2019. It is among the fastest-growing metropolitan regions in South-East Asia in terms of population, economic, and social development. In terms of climate, however, Kuala Lumpur lies in the equatorial region and is often beset by prolonged rainfall and storms that eventually lead to floods.
Several factors cause flash flood situations. Specifically, in the case of Kuala Lumpur, a big city located in a tropical country, the research found four factors contributing to flash flood occurrence: i) rainfall and climatic changes, ii) urban changes and anthropogenic activities, iii) network and catchment factors, and iv) geomorphological features. According to (1), climatic changes have a major impact on rainfall intensity: the climate is one of the major factors driving rainfall intensity, which in turn has the potential to cause floods. In relation to the flash flood issue in Malaysia, this research intends to apply a big data analytics approach to flood prediction using the occurrence patterns of flash floods in Malaysia. Big data has good potential to generate knowledgeable information that ignites business and government interest in driving towards better decision making and mitigation plans for flood situations. This is particularly relevant given the findings of (2), which indicate that inefficient solid waste management is also one of the contributors to flooding.
Corresponding author: Suraya Ya’acob, Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, 54100 Kuala Lumpur, Malaysia. E-mail:
Big Data is usually characterized by the 5Vs: volume, variety, velocity, veracity, and value. As for the big data
definition, these 5Vs can be interpreted as a large volume of data, collected at high velocity from a variety of instruments and sources, whose veracity and value can support data-driven decision makers in organizations. To discover the required data and extract information, data analysis approaches such as descriptive, predictive, and prescriptive analytics are used. The results obtained from the analysis are then used to generate useful information that supports better decision making. In this research, Big Data Analytics is implemented to predict flash flood occurrence, supporting the sustainability of cities and communities in Malaysia. Therefore, the present study intends to understand and characterize data pertaining to flood situations and to identify the right data that can facilitate more accurate decision making regarding flash floods. Ultimately, this can help to reduce the impact of flash floods on society and the economy in urban areas such as Kuala Lumpur.
1.2 Impact of Flash Flood Prediction
Applying big data to flood prediction can reduce many hazards to the environment, society, and the economy. Early identification of hazards helps to manage residual risks. Authorities can formulate more comprehensive mitigation plans by taking into consideration three different entities: i) citizens affected by flash floods, ii) the agencies responsible for managing flash floods, and iii) risk reduction experts. Ultimately, the initiatives taken can reduce the risk of flash floods to the economy, people, processes, technology, society, ecology, and environment in Malaysia.
In brief, the research objective is to reduce flash flood risk in Kuala Lumpur. To do so, the researchers intend to develop predictive visual analytics based on the environmental challenges shown in Table 1.
Table 1: Research Summary
Elements Description
Developing predictive visual analytics to reduce flash flood risk in Kuala Lumpur, Malaysia.
Flash floods carry the risk of loss of life, injury, and destroyed or damaged assets, affecting society, the economy, and the environment.
Rainy season throughout the year.
Malaysia and, more specifically, its urban areas. In this case, the research focuses on Kuala Lumpur as the capital city and Selangor as the most developed state in Malaysia.
Affected environment and economy, urban regions, people’s health, and business and social
To identify and characterize factors that contribute to flash floods in the Malaysian
.1 More Demand for Flash Flood Prediction
In (3), it was found that, in recent years, more researchers have started to explore and integrate predictive analytics techniques for data exploration into visual analytics systems. The capabilities of predictive analytics in reducing risk, making more intelligent decisions, and creating different customer experiences have attracted many industrial players to implement it in their businesses (4). Another study (5) investigated how the visual analytics community has recently focused on building interactive visualizations and associating them with predictive analytics methods, translating real data into knowledge for business decisions. Descriptive and predictive analytics are two of the four categories of business analytics. They differ in the way they use data: descriptive analytics summarizes what has happened by collecting relevant data, storing them, and presenting the information to find trends (4), whereas predictive analytics goes further, providing insight into what will happen in the future, and why, by analysing current and historical data (4).
Predictive analytics uses machine learning algorithms and statistical analysis techniques to predict future flash flood trends (36). In more detail, the authors in (6) found that predictive analytics can provide organizations or businesses with a better understanding of what people need and, at the same time, help to identify potential mitigation
In (37), an interactive visual analytics application was proposed to be combined with automatic predictive visual analytics supported by domain experts to investigate environmental conditions. The automatic predictive analysis achieved highly accurate results (5). A predictive model can forecast buying patterns, potential risks, and possible prospects by providing a deep understanding of customers (38), and according to (39), it is capable of foreseeing possible future scenarios from similar past cases via a predictive analysis system. The researchers in (40) mentioned that, in the current era of big data, machine learning, and Artificial Intelligence (AI), many visual analytics systems have recently been used to produce different predictive models applicable to defining unseen data or information.
According to the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology, there are six phases to implement predictive analytics. Due to the environment-related content, this paper focuses on phase 2, i.e., data understanding. The research found data understanding important for gaining knowledge about the data, the needs the data will satisfy, and the availability, requirements, and sources of the data. Furthermore, data understanding connects the flash flood environmental context with the analytical preparation that needs to be done later. In the data understanding phase, this study identifies potential environmental data to be used; it then characterizes and describes each data item for predictive modelling in the next phase. The rest of the present paper is organized as follows. Section 1 introduces the research as a whole. Section 2 provides a background of the study