Performance of neural networks in forecasting daily precipitation using multiple sources

The effectiveness of neural network based models in forecasting daily precipitation, based on ground level measurements obtained from a cluster of weather stations in the dry zone of Sri Lanka, is presented. The implemented networks were based on a feed-forward back-propagation technique. A cluster of ten neighbouring weather stations having 30 years of daily precipitation data (1970 – 1999) was used in training and testing the models. Twenty years of daily precipitation data were used to train the networks while ten years of daily precipitation data were used to test the effectiveness of the models. One model was developed to forecast the precipitation occurrences such as ‘rain’ or ‘no rain’, while another model was developed to predict the amount of precipitation at several sub levels using fuzzy techniques. Overall, the models were able to predict the occurrence of daily precipitation with an accuracy of 79±3%. Only the nearest neighbours contributed to improving the accuracy of predictions. In the dry zone, the accuracy of predicting the dry days was superior compared to predicting wet days except during the rainy season. Fuzzy classification produced a higher accuracy in predicting 'trace' precipitation than other categories.


INTRODUCTION
In Sri Lanka, the economy depends to a large extent on agricultural production, which is directly linked to the availability of water in different regions. Precipitation is the most important meteorological parameter that determines the crop-water requirement. Due to its tropical climate, Sri Lanka receives precipitation throughout the year. Based on the amount of precipitation it receives, the country is divided into two zones: the wet zone and the dry zone [1]. Since the majority of crops are cultivated in the dry zone of the country, rainwater availability is a critical factor.
Chena cultivation, which is widespread in the dry zone, depends greatly on the timely arrival of precipitation. Compared to the wet zone, the dry zone receives less precipitation (between 1200 and 1900 mm annually). Much of the rain in the dry zone falls during the months from October to February [2][3]. Weather forecasting, in particular forecasting daily precipitation, is therefore of great importance.
In general, stochastic models or numerical models are used in weather forecasting. Numerical models have not yet been developed sufficiently to forecast precipitation in Sri Lanka. However, in recent years, neural network based models have been gaining popularity over statistical models, most probably because they make it simpler to model complex problems in which many parameters are taken into consideration. Neural network based models have been applied successfully to a variety of water-related problems in different parts of the world, such as rainfall run-off modelling, river flow prediction, and forecasting floods, typhoons and droughts [4][5][6][7][8].
Studies related to forecasting precipitation in Sri Lanka are scarce. In the recent past, a few research studies have attempted to forecast the occurrence of precipitation in Sri Lanka using statistical techniques as well as neural techniques [9][10][11][12]. The statistical approaches have used the Markov technique to predict the occurrence of precipitation in the dry and wet zones of Sri Lanka. These studies reveal higher prediction accuracy for the dry zone than for the wet zone. Two recent studies have utilized feed-forward back-propagation neural networks to predict the occurrence of precipitation at the Colombo meteorological station [11][12]. These two studies discuss how neural networks can be utilized to predict the occurrence of precipitation using multiple input parameters such as precipitation, humidity, wind and temperature.
Since the variability in daily precipitation is high, it is of interest to study how the neighbouring stations assist in strengthening the forecasts. Although there are many published research studies based on neural network models to forecast precipitation in different geographical regions, only one study was found in the literature on forecasting precipitation using multiple point sources, and it was applied to the Hong Kong region [13]. The main focus of this work was to study how the precipitation records available at a cluster of weather stations can be combined to forecast precipitation at a selected station.

METHODS AND MATERIALS
Data classification: The precipitation data used in this study were obtained from the Department of Meteorology, Sri Lanka. The data sample consists of daily precipitation data pertaining to a cluster of 10 weather stations in the dry zone. Spatial interpolation was found to be ineffective for the dry zone when the aerial distance is greater than 110 km [14]. Thus, the stations in the study area were selected such that the aerial distance is less than 115 km from the centre of the cluster. For each station, 30 years of data recorded during the period 1970 to 1999 were utilized. Because the study area lacks a dense precipitation gauge network, and reliable continuous data records were not available at least for the training period, the number of stations was restricted to 10. Figure 1 shows the geographical locations of the selected cluster of weather stations.
The highest number of missing monthly data records was for Nochchiyagama, followed by Mannar and Mihintale (see Table 1). The number of missing daily precipitation measurements in the available data records was very low (less than 2%). To minimize the influence of the missing daily values in the data set, average values computed from previous years' data were used.
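The gap-filling step can be read in more than one way. The minimal sketch below assumes one reading: a missing daily value is replaced by the mean of the values recorded on the same calendar day in earlier years. The data layout (a dictionary keyed by year, month and day) and the function name `fill_missing` are hypothetical and are not taken from the study.

```python
from statistics import mean


def fill_missing(records):
    """Fill gaps with the mean of the same calendar day in earlier years.

    records: dict mapping (year, month, day) -> depth in mm, or None if missing.
    Returns a new dict; a gap with no earlier observation is left as None.
    """
    filled = dict(records)
    for key in sorted(records):
        if records[key] is not None:
            continue
        year, month, day = key
        # Assumption: only observed (non-missing) values from earlier years are averaged.
        earlier = [
            value
            for (y, m, d), value in records.items()
            if m == month and d == day and y < year and value is not None
        ]
        if earlier:
            filled[key] = mean(earlier)
    return filled


sample = {
    (1970, 6, 1): 0.0,
    (1971, 6, 1): 4.0,
    (1972, 6, 1): None,  # missing observation
}
filled_sample = fill_missing(sample)
```

Other readings, such as a monthly climatological mean, would change only the selection of `earlier`.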
First, the daily precipitation measurements were transformed to formats that could be used to train the neural network models. For the precipitation occurrence model, a wet day was defined as a day on which the total precipitation depth over 24 h from 12 am was equal to or exceeded 0.3 mm, which is the amount lost due to evaporation. Any day which had a precipitation depth below 0.3 mm was considered a dry day.
The input values to the network for precipitation occurrence were classified according to the definition given below.
$$X_t = \begin{cases} 1, & R_t \geq 0.3\ \text{mm} \\ 0, & R_t < 0.3\ \text{mm} \end{cases}$$
where $R_t$ is the precipitation depth of the $t$-th day and $X_t$ is the corresponding input value to the network. Thus, the precipitation occurrences during the past 3 days are either 0 (dry) or 1 (wet). The output of the first model is the target day's precipitation occurrence.
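The wet/dry encoding described above can be sketched in a few lines. The 0.3 mm threshold comes from the text; the function name and the sample depths are illustrative only.

```python
WET_THRESHOLD_MM = 0.3  # threshold stated in the text


def encode_occurrence(depths_mm, threshold=WET_THRESHOLD_MM):
    """Map daily precipitation depths (mm) to 1 (wet) or 0 (dry)."""
    return [1 if depth >= threshold else 0 for depth in depths_mm]


daily_depths = [0.0, 0.2, 0.3, 12.5]  # made-up sample values
occurrence = encode_occurrence(daily_depths)
```

A day with exactly 0.3 mm is counted as wet, in line with the "exceeds or was equal to" wording of the definition.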
The precipitation occurrence model has a limitation, since it can only predict whether a given day is a wet day or a dry day (two possibilities). However, in agriculture, it would be useful to classify the level of wetness. Thus, a second model was developed by incorporating the precipitation level of the following day. The days having precipitation depths of 0.3 mm or more were further classified into 4 categories, namely Trace, Light, Moderate and Heavy [13]. Table 2 gives the selected boundary values (range) for each class. The boundary values were selected by studying the precipitation distributions in the study area so that precipitation values would be distributed reasonably well among the selected classes.

Journal of the National Science Foundation of Sri Lanka 38 (3) September 2010
For simplicity, the trapezoidal membership function was applied for the fuzzy classification. The expressions for the 4 membership functions are shown below.
Graphical representations of the fuzzy membership functions are shown in Figure 2.
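As an illustration of how trapezoidal membership functions behave, a minimal sketch follows. The boundary values in `CLASS_BOUNDS` are placeholders chosen only for demonstration; they are not the values of Table 2, and the function names are hypothetical.

```python
INF = float("inf")


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Hypothetical class boundaries in mm (NOT the study's Table 2 values).
CLASS_BOUNDS = {
    "Trace": (0.0, 0.3, 2.0, 3.0),
    "Light": (2.0, 3.0, 10.0, 15.0),
    "Moderate": (10.0, 15.0, 30.0, 40.0),
    "Heavy": (30.0, 40.0, INF, INF),  # open-ended upper class
}


def memberships(depth_mm):
    """Degree of membership of one precipitation depth in every class."""
    return {name: trapezoid(depth_mm, *bounds) for name, bounds in CLASS_BOUNDS.items()}


shared = memberships(2.5)  # falls on the overlap between Trace and Light
```

With these placeholder bounds, a depth on the slope shared by two neighbouring trapezoids belongs partly to both classes, which is the property that distinguishes a fuzzy classification from crisp bins.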
To determine whether the following day is a dry day or a wet day, the model requires the precipitation values of the 3 previous consecutive days. The occurrence of rain on the following day and the level of rain (if wet) are obtained as the output of the fuzzy model.
Model development: A feed-forward back-propagation neural network was used to implement the models [15]. Since there are 10 separate weather stations in the cluster, it was important to investigate the optimum number of nearby stations that influence the weather at any given station. To find this, the precipitation data of all 10 stations were first taken as inputs, and stations were then omitted one by one based on their distance from the station under investigation.
The results showed that distant stations do not contribute to the prediction success rate. In fact, including distant stations tends to reduce the accuracy of the forecast. The best overall performance was obtained when 3 nearby stations were considered. Therefore, the 3-neighbour model was selected as the best configuration to implement the neural network. Table 3 shows the names of the stations and the corresponding nearest neighbours.
Using the Markovian assumption, for each station the precipitation occurrences of the past 3 days at the 3 neighbouring stations and at the station under study were used as inputs to predict the following day's precipitation occurrence (i.e. whether the following day is a dry day or a wet day). The precipitation data were converted to 1s and 0s according to the criterion discussed earlier. Thus, the corresponding network outputs are the precipitation occurrences of the following day, given as 0 for a dry day and 1 for a wet day.
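The input construction described above (3 past days at the target station and its 3 neighbours, 12 values in total) can be sketched as follows. The station labels "A" to "D", the ordering of values within an input row, and the function name are assumptions made for illustration; the source does not specify the ordering.

```python
def build_samples(occurrence_by_station, target, neighbours, lag=3):
    """Build (inputs, outputs) for next-day occurrence forecasting.

    occurrence_by_station: dict station -> list of 0/1 values aligned by day.
    Each input row holds `lag` past days for the target station followed by
    `lag` past days for each neighbour (ordering is an assumption).
    """
    stations = [target] + list(neighbours)
    n_days = len(occurrence_by_station[target])
    inputs, outputs = [], []
    for day in range(lag, n_days):
        row = []
        for station in stations:
            row.extend(occurrence_by_station[station][day - lag:day])
        inputs.append(row)
        outputs.append(occurrence_by_station[target][day])
    return inputs, outputs


# Made-up five-day records for a target station "A" and three neighbours.
data = {
    "A": [0, 1, 1, 0, 1],
    "B": [0, 0, 1, 1, 1],
    "C": [1, 0, 0, 0, 1],
    "D": [0, 0, 0, 1, 0],
}
X, y = build_samples(data, "A", ["B", "C", "D"])
```

Five days of records yield two training samples here, each with 12 inputs and one target value.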
Several neural network architectures were tested, and it was found that the 12-11-1 (input layer - hidden layer - output layer) architecture showed the best performance. Hence, it was chosen to implement the final model. A graphical interpretation of the network structure is shown in Figure 3. The only difference in the fuzzy classification model was the number of neurons in the output layer, which was expanded to 5, representing the states Nil, Trace, Light, Moderate and Heavy. Thus, the 12-11-5 architecture was adopted for the fuzzy classification model.
The network was trained using 20 years of daily precipitation data as inputs. The 'Tansig' transfer function was used for the first layer and the 'Purelin' function for the second layer. The default back-propagation network training function, 'Trainlm', was used for error adjustment. The neural network model implementation and training were carried out using the MATLAB Neural Network Toolbox.
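To make the layer structure concrete, the sketch below runs a forward pass through a 12-11-1 network in plain Python. It assumes that the first layer is the hidden layer, uses `math.tanh` in place of 'Tansig' (the two are mathematically equivalent) and a linear output in place of 'Purelin'. The weights are random and untrained, Levenberg-Marquardt training ('Trainlm') is not implemented, and the 0.5 decision threshold is an assumption that is not stated in the text.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def forward(x, hidden, output):
    """Hidden layer with tanh activation, output layer with linear activation."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w_h, b_h)]
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(w_o, b_o)]


rng = random.Random(0)
hidden = init_layer(12, 11, rng)      # 12 inputs -> 11 hidden neurons
output = init_layer(11, 1, rng)       # 11 hidden neurons -> 1 output
fuzzy_output = init_layer(11, 5, rng) # 12-11-5 variant for the five classes

x = [0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0]  # made-up occurrence pattern
y_raw = forward(x, hidden, output)
predicted_wet = 1 if y_raw[0] >= 0.5 else 0  # assumed decision threshold
class_scores = forward(x, hidden, fuzzy_output)
```

Only the size of the output layer changes between the occurrence model and the fuzzy classification model; the input and hidden layers keep the same shape.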
If the network is over-trained, it will fit the noise in the data set rather than the actual patterns. Therefore, the correct number of epochs has to be used during training. A predetermined number of epochs was considered and the performance of the network was checked with early stopping. This process was carried out several times. During the training period, a limit of 150 epochs and a mean square error goal of $10^{-3}$ were set.
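The stopping logic can be sketched generically. The 150-epoch limit and the $10^{-3}$ error goal come from the text; the use of a separate validation error, the `patience` parameter and the toy error curves are assumptions introduced only to make the example runnable.

```python
def train_with_early_stopping(run_epoch, max_epochs=150, goal=1e-3, patience=6):
    """Stop at the error goal, at the epoch limit, or when validation stalls.

    run_epoch() performs one training epoch and returns (train_mse, validation_mse).
    `patience` (a hypothetical parameter) is the number of consecutive epochs
    without validation improvement that is tolerated.
    """
    best_val = float("inf")
    stale = 0
    for epoch in range(1, max_epochs + 1):
        train_mse, val_mse = run_epoch()
        if train_mse <= goal:
            return epoch, "goal reached"
        if val_mse < best_val:
            best_val, stale = val_mse, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch, "validation stopped improving"
    return max_epochs, "epoch limit reached"


def make_toy_run():
    """Toy error curves: training error keeps falling, validation error turns at epoch 10."""
    state = {"epoch": 0}

    def run_epoch():
        state["epoch"] += 1
        e = state["epoch"]
        return 1.0 / e, 0.5 + abs(e - 10) * 0.01

    return run_epoch


stopped_at, reason = train_with_early_stopping(make_toy_run())
```

In the toy run the validation error starts rising after epoch 10, so training halts six epochs later even though the training error is still decreasing, which is the over-training situation the paragraph describes.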
Testing was done using 10 years of daily precipitation data that were not used in the training phase. The testing data set was also converted into 1s and 0s to define the wet and dry days, respectively. The prediction success rate was defined as
$$\text{Prediction success rate} = \frac{X_c}{X_{tot}} \times 100\%$$
where $X_c$ is the number of correct predictions and $X_{tot}$ is the total number of predictions.
The accuracy of the network was tested using the root mean square (RMS) error defined as
$$\text{RMS error} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(X_{p,i} - X_{e,i}\right)^2}$$
where $X_{p,i}$ and $X_{e,i}$ represent the predicted and the expected (actual) output of the $i$-th prediction, respectively, and $n$ is the total number of predictions.
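Both measures are straightforward to compute. The sketch below takes the success rate as the percentage of correct predictions and the RMS error in its usual form; the function names and the sample sequences are illustrative.

```python
import math


def success_rate(predicted, actual):
    """Percentage of predictions that match the observed outcome."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)


def rms_error(predicted, actual):
    """Root mean square difference between predicted and observed values."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


predicted = [0, 0, 1, 0, 1]  # made-up forecasts
actual = [0, 1, 1, 0, 0]     # made-up observations
rate = success_rate(predicted, actual)  # 3 of 5 correct
error = rms_error(predicted, actual)
```

For strictly binary predictions the two measures are linked: the RMS error equals the square root of the fraction of wrong predictions.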
Since the dry zone receives significantly more precipitation during the second inter-monsoon season and the North-East monsoon season than during the rest of the year, the network was separately trained to forecast the occurrence of monsoon precipitation by selecting only the months from November to February.

Precipitation occurrence
Table 4 gives the summary of the network prediction success rates for precipitation occurrence over the entire 10-year period from January 1, 1990 to December 31, 1999. As discussed in the previous section, the three-neighbour architecture, which requires 12 inputs (status of the past 3 days' precipitation at 4 stations) to forecast the status of a given day, was used. Approximately 3,500 forecasts were made for each of the stations. The proportion of wet days in the data sample varied from 16% in Mannar to 28% in Maha Illuppallama. The data show that most of the wet days are clustered around October, November and December, which fall in the second inter-monsoon season and the early part of the North-East monsoon season. The forecasts for Nochchiyagama, which had the highest number of missing monthly records of all stations in the cluster (in the training period as well as in the testing period), performed quite well. An average forecasting accuracy of 79±3% was seen for all stations. The forecasts for three stations, Nachchaduwa, Nochchiyagama and Maradankadawala, showed accuracies of over 80%. The performance for the stations on the outer boundary of the cluster (Vavuniya, Puttalam, Mannar and Trincomalee) was comparable to that for the stations in the centre of the cluster. For these two classes of stations, the average forecasting accuracy differs by only 2%, which is not very significant.
The overall RMS error of the network was found to be 0.46.
When the prediction success rates of dry days and wet days were analyzed separately, a notable difference was seen: the prediction of dry days showed superior accuracy compared to the prediction of wet days. On average, the accuracy of predicting a dry day was 90±2%, while that for a wet day was 48±5%. The higher accuracy and the lower variation in predicting dry days indicate a tendency towards long dry spells at all stations in the study area. Closer inspection showed that wet spells are short and are scattered throughout the dataset except during the wet season of the dry zone. The average lengths of dry spells and wet spells are 6.5 days and 2.3 days, respectively.
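Average spell lengths of the kind quoted above can be obtained by grouping consecutive identical values in the 0/1 occurrence series. The sketch below is one way to do this; the function name and the sample series are illustrative, and how the study treated spells cut off at the ends of the record is not stated.

```python
from itertools import groupby


def average_spell_lengths(occurrence):
    """Return (mean dry-spell length, mean wet-spell length) in days.

    occurrence: sequence of 0 (dry) and 1 (wet) values in chronological order.
    Spells truncated by the start or end of the series are counted as they are.
    """
    runs = {0: [], 1: []}
    for state, group in groupby(occurrence):
        runs[state].append(sum(1 for _ in group))

    def average(lengths):
        return sum(lengths) / len(lengths) if lengths else 0.0

    return average(runs[0]), average(runs[1])


series = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # made-up occurrence record
dry_avg, wet_avg = average_spell_lengths(series)
```

In this sample the dry spells last 3, 5 and 1 days and the wet spells 2 and 1 days.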
In Figure 4, the correlation between the prediction success rates for wet and dry days is shown. Although the data have a fair amount of scatter, in general the dry and wet prediction success rates are negatively correlated, so that stations with high prediction accuracy for dry days perform less accurately in predicting wet days. Overall, for the dry zone, if the network prediction is a dry day, there is a 90% chance that it will indeed be a day without rain. If the network prediction is a wet day, there is only a 50% chance of rain on that day. However, this accuracy tends to vary, especially during the second inter-monsoon and North-East monsoon seasons, due to the change in the dry and wet spell patterns. Thus, if the network is trained for the general pattern seen throughout the year, the accuracy of the network may suffer.

Monsoon season
In order to study the effectiveness of the network in predicting the occurrence of daily precipitation during the wet season, in which the dry zone receives most of its rain, the network was separately trained for the period from November to February. Table 5 shows the accuracy obtained through this model.
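Restricting the data to the November to February period amounts to a filter on the calendar month. A minimal sketch, assuming the records are stored as (date, value) pairs, which is a hypothetical layout:

```python
from datetime import date

WET_SEASON_MONTHS = {11, 12, 1, 2}  # November to February, as stated in the text


def wet_season_only(records):
    """Keep only the (date, value) pairs that fall in the wet-season months."""
    return [(day, value) for day, value in records if day.month in WET_SEASON_MONTHS]


records = [  # made-up occurrence values
    (date(1990, 1, 15), 1),
    (date(1990, 6, 15), 0),
    (date(1990, 11, 2), 1),
    (date(1990, 12, 31), 0),
]
season_records = wet_season_only(records)
```

Note that each season spans two calendar years, so lagged inputs built from the filtered series should not be formed across the gap between one February and the following November.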
It can be clearly seen that the accuracies achieved for predicting dry and wet days are now similar. The average accuracies of predicting dry days and wet days are 71% and 74%, respectively; i.e., the accuracy in predicting wet days is slightly higher than the accuracy in predicting dry days during this season. The overall prediction accuracy of the model is 72±2%. Since the chance of wet spells occurring is not low, the model is now able to predict the occurrence of wet days with reasonable success. This is supported by the estimated lengths of dry and wet spells. Compared to the year-round observations given earlier, the wet spells are now longer (2.8 days) and the dry spells are shorter (5.1 days). In addition, the percentage of wet days has increased from an average of 23±4% to 33±5%. Thus, the prediction accuracies for wet days are higher. In Figure 5, the correlation between the length of dry spells and the percentage of wet days is shown for all seasons and for the wet season separately. As expected, the data show a clear negative correlation between the two parameters.
Unlike in the previous case, the data show no correlation between the prediction accuracies of dry and wet days. The highest disparity between the wet and dry predictions was observed for Mihintale (over 10%). No clear difference in performance was seen between stations well within the cluster and those on its outskirts. Thus, it is clear from the study that if dry and wet days are randomly scattered without long spells, the model accuracy for predicting the daily occurrence of precipitation is reduced.

Fuzzy classification model
The fuzzy classification model was developed to predict the level of rain that one can expect on any given day.
Since the previously discussed models can predict the occurrence of precipitation, the fuzzy model can be used to estimate the depth of precipitation if the day is a wet day. To test this model, all cases leading to a rainfall depth below 0.3 mm were omitted from the input data set and the rest of the data were classified into four different classes: Trace, Light, Moderate and Heavy (Table 6).
It should be noted that, due to the fuzzy classification, a prediction can in certain instances satisfy two classes at the same time (when the precipitation falls within the area shared between the boundaries of two classes). The overall prediction accuracy of the model varies between 73% and 87%. However, when the accuracies among the sub-classes are compared, it can be seen that the class 'Trace' shows very high accuracy whereas the accuracies for all other classes are quite poor. One reason for this could be the nature of the data set in the dry zone, where most of the days are either dry days or days with very little rain (Trace). Therefore, in the training process the network may be biased towards trace values; hence the high accuracy. In addition, since we have omitted the depth information from the input dataset, so that the model depends only on the pattern of dry and wet days, the model may lack vital information that is required to predict higher levels of precipitation. This is one of the limitations of the present implementation of the fuzzy classification model.

CONCLUSION
The prediction of a highly variable event such as precipitation occurrence based only on ground level information is an extremely difficult task. However, in this work we have shown that neural network models using a cluster of weather stations in the dry zone of Sri Lanka can be applied with reasonable success in predicting daily precipitation.
The dataset had missing monthly records at all stations. However, the number of missing daily precipitation measurements in the available data records was less than 2% for all stations. Average precipitation values estimated from the previous years were used to fill the gaps in the daily records. The results of the models indicate that there is no dependency on the missing values at any of the stations or on the location of the stations within the cluster.
The main objective of this work was to test the effectiveness of neural networks in predicting the precipitation occurrence using multiple sources. The results show that in the dry zone, neural networks using information from the nearest neighbours are highly effective in predicting the occurrence of dry days compared to wet days. When precipitation is seasonal, higher accuracies can be obtained in the prediction of wet days if the model is trained for each season separately.

Table 1: Longitude, latitude and altitude of the selected weather stations together with the available number of months with precipitation records and the percentage of missing daily precipitation values. Source: Department of Meteorology, Sri Lanka.

Table 2: Fuzzy classification levels

Table 3: Stations under study and their nearest neighbours

Table 4: Overall prediction success rates together with RMS error, prediction success rates for dry days, prediction success rates for wet days, average length of dry spells and average length of wet spells

Table 5: Prediction success rates for the North-East monsoon season with the length of the dry and wet spells

Table 6: Prediction success rates for the fuzzy classification model