Performance of neural networks in forecasting short range occurrence of rainfall

The performance of artificial neural networks in forecasting short range (3-6 hourly) occurrence of rainfall is presented. Feature sets extracted from both surface level weather parameters and satellite images were used in developing the networks. The study was limited to forecasting the weather over Colombo (79°52' E, 6°54' N), the capital of Sri Lanka. From the available ground level weather parameters, a total of seven parameters, namely, pressure, temperature, dew point, wind direction, wind speed, cloud amount and rainfall were selected for the present study. From satellite images, four types of images, viz., visible image of clouds, infrared image of clouds, infrared colour image of clouds and water vapour image of clouds were used. The best performance was observed for hybrid models that combine ground level and satellite observations, with 75% accuracy for short range forecasting. A strong seasonal dependence in the accuracy of forecasting linked to monsoons was observed.


INTRODUCTION
Sri Lanka is an island in the Indian subcontinent situated within longitudes 79°42' E to 81°52' E and latitudes 5°55' N to 9°50' N. The topography of the country consists of central highlands reaching above 300 m in the south central part of the island surrounded by large lowland plains extending up to the coastal areas. The total land area of the country is 65,610 km². In the central highlands the average temperature may be as low as 15 °C, while in the coastal areas it may be as high as 28 °C. Due to the topography and the tropical location, Sri Lanka receives a fair amount of rain throughout the year. The two main rainfall seasons of the country are the South-West monsoon period from May to September and the North-East monsoon period from December to February. Based on the amount of rainfall it receives, the country is divided into two zones, the wet zone and the dry zone (Domroes & Ranatunge, 1993). In general the wet zone receives rainfall throughout the year while the dry zone experiences long dry periods.
Although Sri Lanka is relatively free from frequent harsh weather conditions and free of arid areas, the occasional variability of weather causes problems due to unexpected heavy rain, floods, intense lightning activity, drought conditions, landslides and high winds. The economy of the country, which is connected to aviation and shipping, depends mainly on tourism and exports of tea, apparel and agricultural products. Therefore, short range rainfall forecasting has applications in a number of areas. Being prepared for natural disasters, which may occur within a very short period of time causing severe damage to property and human lives, will help to boost the country's economic growth and improve the safety of communities.
In the past, several statistical (Punyawardena & Kulasiri, 1997; Perera et al., 2002) and computational (Kumarasiri & Sonnadara, 2008; Weerasinghe et al., 2010) models have been developed to forecast the rainfall over Sri Lanka. However, no research has been reported in the literature on short range rainfall forecasting. The main reason could be the lack of consistent and reliable measurements of upper level weather parameters with which to develop short range weather forecasting models. At present, the Meteorological Department of Sri Lanka uses numerical weather predictions downloaded from the public domain for short range weather forecasts (Premalal, 2005). The accuracy of these predictions is low due to poor resolution in public domain data, coupled with variability of rainfall caused by orographic features of the island.

Related work
Many methods have been developed in the past to forecast weather, ranging from simple numerical models to complex computational models. Artificial neural networks (ANN) became popular in meteorological forecasts especially due to their ability to handle unstructured and noisy data. There are a number of different architectures to build neural network models for forecasting. While some researchers claim that there is no clear advantage in using a particular model (Luk et al., 2001), others have shown that some models are superior (Maqsood et al., 2004). Radial basis function (RBF) networks have shown a greater ability to forecast in diverse areas (Lee, 1991; Finan et al., 1996; Dawson et al., 2002; Chiradeja & Ngaopitakkul, 2009).
Due to the variability of weather, predicting rainfall with high accuracy using a general model is quite difficult. In order to overcome this problem, different solutions have emerged for each situation: models from very short range forecasting (McCann, 1992; Kuligowski & Barros, 1998) to long range (Guhathakurta, 1997; Guhathakurta et al., 1999), as well as from local (Karmakar et al., 2008) and mesoscale (Purdom, 1976; Gel et al., 2004) prediction models to large geographical area prediction models (Grimes et al., 2003; Coppola et al., 2006). The conventional models, which use surface observations, have been improved by adding upper level observations to achieve higher accuracy. As a result, forecasting models were designed combining radar and satellite data (Browning, 1979; Reynolds et al., 1979; Austin et al., 1982), as well as satellite and ground data (McCullagh et al., 1995), while others have combined different sources to build improved forecasting models (Wilheit et al., 1991; Ganguly, 2002).
This study focuses on the development of forecasting models using ANN for the occurrence of short range rainfall. The models were developed for Colombo, the capital of Sri Lanka. Colombo (79°52' E, 6°54' N) is situated on the South-Western coast of the island, which is in the wet zone. The main objective of the present work was to test the reliability of rainfall forecasts made 3-6 hours prior to the rain (commonly known as nowcasting).
An artificial neural network (Bishop, 1995) is a computational model inspired by the architecture of the human brain that learns to perform functions rather than having functions programmed into it. The model consists of a number of interconnected artificial neurons, which act like biological neurons in that each neuron processes a small proportion of the data in parallel. Like the human brain, a neural network undergoes an extensive and complex training and learning process to understand patterns by identifying their features. ANNs learn from experience, generalize from previous examples to new ones and extract essential characteristics from inputs containing irrelevant data. A major advantage of the neural technique is its ability to tolerate unstructured data. Neural networks are used in some of the main areas of computing such as pattern classification, function approximation and time-series forecasting. Inspired by these capabilities, researchers have developed network models with unique characteristics such as back propagation (Chauvin & Rumelhart, 1995), radial basis functions (Bors, 2001), growing cell structures (Fritzke, 1994) and self-organizing feature maps (Kohonen, 1989).
Radial basis function (RBF) networks were developed as a powerful alternative to the multi-layer perceptron (MLP) networks. Broomhead and Lowe (1988) were the first to use radial basis functions in the design of neural networks, after they were introduced by Powell (1985) as a solution for the real multivariate interpolation problem. RBF networks, with their diverse applications, are becoming increasingly popular and are probably the main rival to the MLP. RBF networks are superior to other neural network approaches due to their capability of approximating nonlinear mappings effectively and handling the local minima problem (Sahin, 1997). The training time of RBF networks is quite low when compared with other neural network models and they are quite successful in identifying regions of sample data not classified into classes.
An RBF network is a feed-forward network consisting of three layers: an input layer, a hidden layer and an output layer. The basic architecture of an RBF network is shown in Figure 1. The input layer is made up of source nodes (which represent each set of predictor variables as an input) that connect the network to its environment. The input neurons feed values to each of the neurons in the hidden layer. The second layer or the hidden layer performs a non-linear mapping from the input space into a (usually) higher dimensional space in which the patterns become separable. This layer consists of a variable number of neurons. Each neuron consists of a radial basis function centered on a point with as many dimensions as the predictor variables. The spread (radius) of the RBF function may differ for each dimension. The centers, spreads and the optimal number of neurons for the layer are determined by the training process. The structure of the final layer (output layer) is decided by the expected outcome of the network. It performs a simple weighted sum, which is a linear mapping to create the output.
In the training process, an RBF network should determine three parameters for the radial function: the centre (c), the distance measure (r) and the shape (φ). For each neuron in the hidden layer, the weights (w_ij) represent the coordinates of the centre of the cluster (c), and r is the distance from the cluster centre, which is usually the Euclidean distance and is found using the following equation for the input pattern x,
$$ r_j = \sqrt{\sum_{i} (x_i - w_{ij})^2} \qquad (1) $$
The shape (φ) of the radial basis can be Gaussian, Hardy multiquadric, inverse multiquadric or any relevant radial function. However, the most commonly used radial basis function is the Gaussian function, which generates the outcomes of the hidden units according to equation 2.
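$$ \varphi_j(x) = \exp\left(-\frac{r_j^2}{2\sigma^2}\right) \qquad (2) $$
The variable sigma (σ) defines the width or radius of the bell shape and is determined empirically. The final output of the j-th hidden neuron is the value of the RBF φ_j(x). The output value of the p-th neuron is given as a linear combination of the RBFs according to equation 3.
$$ y_p(x) = \sum_{j} w_{jp}\, \varphi_j(x) \qquad (3) $$
where w_jp is the weight connecting the j-th hidden neuron to the p-th output neuron.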
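
The forward pass described above can be illustrated with a minimal sketch. It assumes the common Gaussian form exp(-r²/(2σ²)), a single spread shared by all hidden neurons and a single output node; the function names and the toy centres and weights are hypothetical and are not taken from the study.

```python
import math

def gaussian_rbf(x, centre, sigma):
    """Gaussian response of one hidden neuron to the input pattern x."""
    # Squared Euclidean distance between the input and the neuron's centre.
    r2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    # Assumed Gaussian form: exp(-r^2 / (2 * sigma^2)).
    return math.exp(-r2 / (2.0 * sigma ** 2))

def rbf_forward(x, centres, sigma, weights, bias=0.0):
    """Single-output RBF network: bias plus weighted sum of hidden responses."""
    hidden = [gaussian_rbf(x, c, sigma) for c in centres]
    return bias + sum(w * h for w, h in zip(weights, hidden))

# Toy values chosen only for illustration.
centres = [[0.0, 0.0], [1.0, 1.0]]
weights = [1.0, 0.5]
print(rbf_forward([0.0, 0.0], centres, sigma=1.0, weights=weights))
```

An input that coincides with a centre gives that hidden neuron a response of exactly 1, and the response decays as the input moves away, which is what makes the hidden layer a local, non-linear mapping.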

Model development
Meteorological activities in Sri Lanka started in the mid 1860s with the establishment of a network of rainfall gauges. The Department of Meteorology, established in 1948, is responsible for collecting and maintaining weather data and providing services to government agencies, the private sector and the general public. Twenty four (24) hour weather observations are maintained at 12 meteorological stations and sixteen (16) hour weather observations are maintained at 8 meteorological stations. All these stations report meteorological parameters such as rainfall, pressure, temperature, humidity and wind speed at 3 h intervals to the National Meteorological Centre in Colombo. Since data obtained at 3 h intervals are not adequate to develop very short range forecasting models, an additional data set was extracted from satellite images available in a public domain to obtain details about recent weather conditions. The satellite images are available every 30 min.
The proposed model uses time series data from weather parameters. The model uses the past states of weather (x_{t-n}) up to the current time (x_t) in order to predict a future state (x_{t+d}), that is, x_{t+d} = f(x_{t-n}, ..., x_t). To implement the required function f for forecasting, the architecture of the nowcasting model uses RBF networks as in the configuration shown in Figure 2.
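
The windowing of the time series can be sketched as follows. This is a minimal illustration, assuming that each record holds a list of features and a rain flag and that the records are equally spaced in time; the function name `make_windows` and the record layout are hypothetical.

```python
def make_windows(series, n, d):
    """Pair each window x[t-n..t] with the target observed d steps later.

    series: list of (features, rain_flag) tuples at equal time intervals.
    n: number of past steps included in addition to the current one.
    d: forecast lead in steps.
    """
    inputs, targets = [], []
    for t in range(n, len(series) - d):
        window = []
        for k in range(t - n, t + 1):
            window.extend(series[k][0])  # flatten the features of each step
        inputs.append(window)
        targets.append(series[t + d][1])  # class observed d steps ahead
    return inputs, targets

# Six toy records with one feature each, used only to show the shapes.
toy = [([float(i)], i % 2) for i in range(6)]
X, y = make_windows(toy, n=2, d=1)
print(len(X), X[0], y[0])
```

Increasing n widens the input window, while d selects how far ahead the target lies.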
The paper focuses on 10 short range forecasting (nowcasting) models developed under two broad categories, namely 3 h prior and 6 h prior to rain occurrence. The parameters for function f vary according to the forecasting models. The value of d is one for the 3 h prior forecasting models and two for the 6 h prior forecasting models, i.e., the models forecast rainfall occurrence one step and two steps ahead, respectively. The models have only one output node, which indicates the presence or the absence of rain. For the developed models, the outcome of the RBF network can be represented as shown in equation 4.
$$ y(x) = \sum_{i=0}^{M} w_i\, \varphi_i(x) \qquad (4) $$
where w_i is the weight (w_0 is the bias for the network), φ_i(x) are the fixed-basis functions (φ_0 = 1) and M is the number of hidden neurons.
To develop neural network models, both ground level weather parameters and satellite images have been used. From the available ground level weather parameters, a total of seven parameters have been selected for the present study. The study was limited to the weather observations recorded at the Colombo, Katunayaka, Ratmalana and Ratnapura meteorological stations (Table 1). The frequency of observations was once every 3 h. The selected weather parameters are pressure, temperature, dew point, wind direction, wind speed, cloud amount and rainfall. To include upper atmosphere information, three categories of satellite images (visible images, infrared images and water vapour images) were used. All images were taken from the geostationary meteorological satellite METEOSAT-7 operated by the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT). The satellite images are available at intervals of 30 min. Four types of images were selected for this work, namely, visible image of clouds, infrared (IR) image of clouds, infrared colour image of clouds and water vapour image of clouds (Figure 3).
The colour values of the satellite images have been used as the features for the upper atmosphere. The satellite images used in this work have the scaling of 1 pixel equivalent to 10 km at ground level. Due to the low resolution in the satellite images, finding the exact location/area to extract features is difficult. In the present work, the location was calculated relative to the marked longitudes and latitudes in the image. However, due to the outline of the country drawn on the selected satellite images, features of some pixels over a location such as Colombo, which is situated along the Western coast of the island, are not available. To avoid this problem, the average value of the neighbouring pixels was used as the feature value of the altered pixels.
It was necessary to pre-process some of the parameters before feeding them into the neural networks. The following steps were taken in order to pre-process the parameters.
1. Colour values of the satellite images (infrared image, infrared colour image and water vapour image) were converted to temperature values.
2. Colour values of the visible satellite images were converted into cloud amount.
3. Rainfall was divided into two classes, dry and wet, using a threshold value. A threshold value of 0.1 mm, which is the least measurable value, was used in this work to define the dry and wet conditions.
4. Missing data values were estimated using a time series algorithm (piecewise cubic Hermite interpolation).
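
The dry/wet labelling step can be sketched as below. The 0.1 mm threshold is the one quoted above; treating a reading exactly equal to the threshold as wet is an assumption, and the name `rain_class` is hypothetical.

```python
WET_THRESHOLD_MM = 0.1  # least measurable rainfall, as quoted in the text

def rain_class(rainfall_mm, threshold=WET_THRESHOLD_MM):
    """Return 1 (wet) or 0 (dry) for one rainfall reading in millimetres."""
    # Assumption: a reading equal to the threshold counts as wet.
    return 1 if rainfall_mm >= threshold else 0

# Toy readings used only for illustration.
readings = [0.0, 0.05, 0.1, 2.4]
labels = [rain_class(r) for r in readings]
print(labels)
```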
The best models for forecasting the occurrence of rainfall at a short range for the selected area were chosen after investigating the accuracy of many models, which were built using different feature sets and neural network architectures. The architecture of the complete forecasting system is shown in Figure 4. The forecasting system is divided into smaller modules and developed in levels. At each level, ANN models were fine-tuned to obtain optimal accuracy by varying the spread-constant and input window size. The best ANN structure and the input window size were chosen to obtain the best accuracy at each level. For the developed models, the expected network outcomes were 0 and 1 for dry and wet, respectively. After many trials, it was found that the best prediction success rate can be achieved by using the threshold value of 0.5 to separate the classes. Therefore, the network outcome can be represented at time t as:

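$$ y_t = \begin{cases} 1 \text{ (wet)}, & x_t \geq 0.5 \\ 0 \text{ (dry)}, & x_t < 0.5 \end{cases} \qquad (5) $$
where y_t is the output class and x_t is the output value generated by the RBF network model.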

Single station model (SSM):
This model uses ground level parameters recorded at a single station for short range forecasting. The model was built to study the effects of using the lower atmospheric parameters in short range forecasting. Weather data obtained from the Colombo meteorological station was used in developing this model.

Satellite image model (SIM):
This model uses only satellite data for short range forecasting. The objective was to study the effect of higher atmospheric parameters in short range forecasting. The model is based on 3 hourly ground data combined with 30 min satellite image data (six satellite images for one record in the ground data) with four feature variables to represent four types of satellite images (water vapour image, infrared colour image, visible image and infrared image). One major issue when using visible images was the availability period of the images. The images were only valid during the daytime, from 6.00 A.M. to 5.30 P.M. local time. Initial work showed higher accuracy when the visible image was included. In order to produce forecasts over the full 24 hours, the empty data slots for night time were filled by interpolating (piecewise cubic Hermite interpolation) the values from available images. Thus, the model continued even at night time with all four feature variables.

Neighbour model (NM):
The Neighbour model was developed using the ground level parameters from multiple meteorological stations. The objective was to study the effect of neighbouring weather conditions in short range forecasting. In addition to the target meteorology station, the model uses weather parameters available at the meteorological stations in the vicinity. The target meteorological station was Colombo. As neighbours, data from the Katunayaka, Ratmalana and Ratnapura meteorology stations were used.

Hybrid single station model (HSSM):
This model forecasts rainfall by combining both ground level parameters from the Colombo meteorological station and the satellite data. The objective of the model was to use both lower and upper atmospheric parameters available at a specific location for short range forecasting. The model uses both ground level data and satellite data according to the outcomes of the initial models.

Hybrid neighbour model (HNM):
In this model, short range forecasting was done by using ground level parameters from multiple meteorological stations as well as satellite data. The objective of the model was to study the effect of regional weather conditions in short range forecasting for Colombo. The model also uses satellite images to extract the upper atmospheric condition. Hence, the model considers both upper and lower atmospheric conditions for predictions.
All the models were independently developed and optimized for short range forecasting for both 3 hours and 6 hours prior to the occurrence of rain. The models have been developed with radial basis function neural networks. In this work, a spiral model development life cycle was adopted. The models were tested, and based on the outcome the input window size and spread-constant were selected. At the end of the development cycle, the optimal input window size and the ANN structure for each model were finalized.
For training and evaluation, two separate datasets were used for both lower atmospheric data (surface level observations) and upper atmospheric data (satellite images). For ground level information, a 3 hour dataset for 18 months starting from January 2008 was used. For satellite images, four types of satellite images taken during a 10 month period starting from August 2008 were used. The dataset contained 562 rain data and 3574 no-rain data (using a 0.1 mm threshold). For training, the same number of rain data and no-rain data were selected to avoid network bias to one class. Approximately 65% of rain data and the same number of no-rain data from the initial dataset were selected randomly for training while the others were used for testing.
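
The class-balanced selection described above can be sketched as follows. This is only an illustration: the rounding rule, the fixed random seed and the name `balanced_split` are assumptions, and integer ranges stand in for the real records.

```python
import random

def balanced_split(rain, no_rain, train_fraction=0.65, seed=0):
    """Split records into a class-balanced training set and a test set."""
    rng = random.Random(seed)
    rain, no_rain = list(rain), list(no_rain)
    rng.shuffle(rain)
    rng.shuffle(no_rain)
    k = round(train_fraction * len(rain))  # rain records used for training
    train = rain[:k] + no_rain[:k]         # same count from each class
    test = rain[k:] + no_rain[k:]          # everything else is held out
    return train, test

# Stand-in records: 562 rain and 3574 no-rain, matching the counts in the text.
train, test = balanced_split(range(562), range(562, 562 + 3574))
print(len(train), len(test))
```

Under these assumptions, 65% of the 562 rain records rounds to 365, so the training set holds 730 records and the remaining 3406 records are left for testing, most of them no-rain.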
The training and testing were carried out in two phases. In the first phase, the developed models were trained and tested using the full dataset. In the second phase, due to the behaviour differences in the rain pattern in each season (South-West monsoon, North-East monsoon and inter-monsoon), the models were tested using seasonal data (following the same data selection mechanism for seasons). The percentage seasonal occurrence of rainfall is shown in Table 2.
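
A calendar-based assignment of records to seasons might look like the sketch below. It assumes the monsoon periods given in the introduction (May to September and December to February) and treats the remaining months as inter-monsoon; the exact boundaries used in the study are not stated here, and `season_of` is a hypothetical name.

```python
def season_of(month):
    """Map a calendar month (1-12) to a rainfall season label."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    if 5 <= month <= 9:
        return "South-West monsoon"   # May to September
    if month in (12, 1, 2):
        return "North-East monsoon"   # December to February
    return "inter-monsoon"            # assumed: March-April, October-November

print([season_of(m) for m in (1, 4, 6, 10, 12)])
```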
For evaluation of the models, two indicators were used: the prediction success rate and the root mean square (RMS) error. The prediction success rate (accuracy) was calculated as the percentage of correct predictions out of the total number in the test sample. In addition, the network performance was also tested by using the RMS error for each nowcasting model. The RMS error varies between 0, which represents 100% success, and 1, which represents 0% success. Equations 6 and 7 show the definitions used.
$$ \text{Prediction success rate} = \frac{n}{N} \times 100\% \qquad (6) $$
where n is the number of correct predictions and N is the total number in the test sample.
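$$ \text{RMS error} = \sqrt{\frac{1}{n} \sum (X_p - X_e)^2} \qquad (7) $$
where X_p is the predicted output, X_e is the expected output, n is the total number of predictions and the sum runs over all n predictions.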
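
The two indicators can be computed as in the following sketch, which assumes the predicted and expected outputs are the class labels 0 and 1; the function names are hypothetical.

```python
import math

def success_rate(predicted, expected):
    """Percentage of predictions that match the expected class."""
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return 100.0 * correct / len(expected)

def rms_error(predicted, expected):
    """Root mean square difference between predicted and expected outputs."""
    n = len(expected)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, expected)) / n)

# Toy labels: three of the four predictions are correct.
pred = [1, 0, 1, 1]
true = [1, 0, 0, 1]
print(success_rate(pred, true), rms_error(pred, true))
```

With 0/1 labels, every wrong prediction contributes a squared error of 1, so the RMS error is 0 when all predictions are correct and 1 when all are wrong, in line with the range quoted above.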
In the training process, initially, the mean squared error target and maximum number of epochs were fixed at 10^-5 and 1000, respectively. The training mechanism consisted of two training stages. In the first stage, the RBF network was trained for given input and output vectors (supervised training). The network was trained until its mean squared error reached the target. The network underwent several training cycles for the same network configuration with different training and testing samples taken from the main dataset. The process was repeated through a set of varying spreads (spread of radial basis functions) until the optimum accuracy was obtained.
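
The spread sweep, together with the window-size sweep described in the next paragraph, amounts to a search over two settings. The sketch below is a simplified exhaustive version of that idea and not the authors' procedure: `evaluate` is a hypothetical callable standing in for training and testing one RBF network, and `fake_evaluate` is a stand-in scoring function with no connection to the reported results.

```python
def tune(evaluate, spreads, window_sizes):
    """Return (accuracy, window, spread) for the best setting found.

    evaluate(window, spread) is a hypothetical callable that trains an RBF
    network with those settings and returns its test accuracy.
    """
    best = None
    for window in window_sizes:          # outer loop: increasing window size
        for spread in spreads:           # inner loop: sweep the spread
            accuracy = evaluate(window, spread)
            if best is None or accuracy > best[0]:
                best = (accuracy, window, spread)
    return best

# Stand-in scoring function used only to show the call pattern.
def fake_evaluate(window, spread):
    return 70.0 - abs(window - 3) - abs(spread - 1.0)

print(tune(fake_evaluate, spreads=[0.5, 1.0, 2.0], window_sizes=[1, 2, 3, 4]))
```

In practice `evaluate` would also average over the repeated training cycles with different training and testing samples mentioned above.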
In the second stage, the first stage was repeated for increasing input window sizes until the optimum window size was found. The network with optimum accuracy for the optimum window size was selected as the final RBF network for the nowcasting model. The same procedure was continued for all nowcasting models. The statistics calculated for the final RBF network were considered as the statistical outcome of the nowcasting model.
A comparison of the prediction success rates of the ANNs is shown in Figure 5. For all short range forecasting models, the three hours prior models have nearly equal or higher net prediction success rates compared to the six hours prior models. For all short range forecasting models, the ANNs have an overall prediction success rate of more than 60% for both three hours and six hours prior forecasting. When only no-rain classification was compared, all ANN models show over 70% prediction success rate in three hours prior predictions. The best was seen for the hybrid neighbour model with an accuracy of 82%. For predicting rain, the lowest accuracy, with a prediction success rate of 53%, is shown by the neighbour model. When comparing the variances of predicting rain and no-rain, rain prediction has a larger variance than no-rain. A similar behaviour can be seen for the six hours prior forecasting. The best for no-rain was the neighbour model with an accuracy of 75%. However, for predicting rain, it is the lowest with a prediction success rate of around 52%. Overall, the neighbour model showed the lowest accuracy for rain prediction for both the three hours and six hours forecasting categories (54% and 52%, respectively), while the hybrid single station model showed the highest prediction success rate for both three hours and six hours forecasting categories (80% and 75%, respectively). In general, rain prediction had higher fluctuations among models.
When satellite data are used, an improved accuracy was seen for all ANN models, with 75% and 74% success rates for three hours and six hours forecasting, respectively. For both three hours and six hours prior forecasting, the best overall results were observed for the hybrid single station model. This could be due to the detection of cloud formation by satellite images (which are captured at 30 minute intervals) when there is a high probability of rainfall occurrence.
A large variation of performance was seen among nowcasting models when compared with the difference in the two forecasting categories. The network performance is better for the short term prediction compared to the long term prediction for each model. This is expected since the reliability of predictions reduces with the increase in time span. When considering the performances, the hybrid single station model shows the best performance with the lowest RMS error of 0.5, while the neighbour model shows the highest RMS error of 0.6. The observed performance differences between the main models and hybrid models suggest that the use of multilevel atmospheric observations increases the reliability of forecasting models.
Since the pattern of wet and dry behaviour changes with the seasons which can easily affect the developed models, the prediction success rate was calculated separately for rain and no-rain conditions for the North-East, South-West and Inter-Monsoon seasons using the forecasting models for three hours prior forecasting. The results are shown in Figure 6.
It can be seen that in general, compared to the South-West monsoon season during which Colombo receives a major share of its rain, during other seasons, the prediction success rate is higher with nowcasting. An explanation can be found in the rainfall occurrence statistics. Although the South-West monsoon season has a high number of rain occurrences, often the amount of rainfall is less compared to other seasons. As shown in Table 2, the average rainfall for other seasons is about twice as much as for the South-West monsoon. Under these conditions forecasting models are suitable for other seasons due to the strong indication of occurrence or non-occurrence of rain. When analyzing the results, it can be seen that the HSSM model has predicted dry conditions better, while the NM did the opposite. The outcomes of the seasonal forecasting also indicate that predicting rain has a larger variance (29%) among forecasting models for seasons when compared with the variance of no-rain (7%). The RMS error indicates that the neighbour model has the worst performance for all seasonal predictions while the satellite image model achieves the best. According to the predictions, noticeable differences in accuracy can be seen using a shared forecasting model for all seasons. Due to the lack of data to build seasonal models, the current research was limited to building a shared model for all seasons. Thus, to obtain better results, the network can be trained to predict the seasons separately.

CONCLUSION
The results clearly show that satellite images can be used to improve the outcome of nowcasting models. This could be due to the shorter time interval between images, which allows the capture of dynamic features of the atmosphere, when compared with the three hour interval of surface level observations, which gives a more generalized representation of the atmospheric conditions. Thus, by extracting more features from satellite images, better forecasting models can be developed. Prediction could be improved using information such as cloud motion, cloud formation and cloud texture.
Usually, seasonal models will minimize the confusion generated by different parameters such as wind direction, cloud formation and cloud movement, which behave uniquely during the seasons. The present work used a timeline to decide the seasons. The timeline boundaries of the seasons may not be the best seasonal boundaries for rainfall. Enhanced methods such as clustering algorithms (which consider the rainfall patterns) can be used to improve the seasonal boundaries in future studies, which in turn can improve the predictions.
Another important factor in building nowcasting models is the time of the day. In the dataset 58% of rain occurrences were during the night time (5.30 P.M. to 5.30 A.M.) and the remaining 42% of rain occurrences were in the daytime (5.30 A.M. to 5.30 P.M.). When considering the input window sizes of the developed models, weather variables that have different behaviours during the night and the day (for example, temperature, wind and cloud patterns) can affect the outcomes of prediction models. Furthermore, during the morning to early afternoon session, very little (about 26%) of the rainfall occurred. Since visible images are not available at night time, the prediction accuracy of the neural network model may be reduced even though interpolation techniques are used.
As the study shows, the neighbour model did not challenge the other models when considering the net accuracy level. The reason could be the spatial disparities. Colombo, Ratmalana and Katunayake are closer to the coast while Ratnapura is more towards the central mountains, which have a different topography. Another factor is their locations, where all the neighbours are on one side of the target station (Colombo). Having closer neighbours that share the same spatial and temporal variability of rainfall can change the situation. Therefore, sub-stations or agromet stations can be chosen as neighbours in order to achieve a higher accuracy for the model.
It should be noted that the developed neural network models capture the persistence in input parameters rather than underlying physical processes. This is clearly evident from the improvement seen in the models for short range forecasts when satellite images are introduced as inputs.