Performances of data mining techniques in forecasting stock index – evidence from India and US

Forecasting the stock market is a challenging task because of its stochastic and complex nature. Various statistical models and data mining techniques have been developed in recent years and applied to stock market forecasting. A review of the relevant literature shows that very few studies have applied high frequency data to forecast the stock market, and among these, only one or two have applied data mining techniques. There are no studies on forecasting high frequency stock index data using multivariate adaptive regression splines. In this paper we study the applicability of four data mining techniques, namely backpropagation neural network (BPNN), support vector regression (SVR), multivariate adaptive regression splines (MARS) and Markov chain incorporated into fuzzy stochastic (MF), for one-step-ahead forecasting of the S&P CNX Nifty index of India and the Nasdaq composite index of the USA, using data sampled every sixtieth minute. The results show that SVR forecasts the high frequency data of both indices better than the other techniques, with an accuracy of 99.7 %.


INTRODUCTION
A series of data that represents specific activities of an entity occurring at periodic intervals of time is termed a time series. Time series are used in fields such as medicine (Friston et al., 1994), finance (Demyanyk & Hasan, 2010) and engineering (Weerasinghe et al., 2010) for prediction and decision making. Real-life examples of time series analysis include weather forecasting, estimation of power consumption and prediction of earthquakes.
The stock market is an important area where time series analysis is applicable. Analysts, traders and investors are constantly required to predict stock values at a future time (t + 1). Traditionally, analysts have used fundamental and technical analyses to forecast stock values, but there is now an increasing trend to apply data mining techniques (Fu, 2011). Forecasting the stock market is challenging because of its noisy, nonlinear and volatile nature, which is driven by macro and micro factors such as organizational policies, political events, economic conditions and exchange rates (Kimoto et al., 1990; Fu, 2011).
This study compares forecasting techniques that can be used in both mature and emerging markets that experience high levels of fluctuation. These fluctuations are typically a result of high-risk trading by retail investors, who predict the market using interrelated attributes such as the relative strength index (RSI), moving average (MA), exponential moving average (EMA), Williams %R and the stochastic %K and %D.
This study also provides insights into forecasting the next-hour value of an index using data mining techniques with one dependent variable (the close price) and one independent variable (the close price lagged by one period). High frequency data from the S&P CNX Nifty index (Nifty) and the Nasdaq composite index (NCI) are used to test and compare the performance of the back propagation neural network (BPNN), support vector regression (SVR), multivariate adaptive regression splines (MARS) (Friedman, 1991) and Markov chain incorporated into fuzzy stochastic (MF) model (Wang et al., 2010). This study aims to provide traders and analysts with a reference model that helps them avoid blind and irrational predictions.
In economics and financial studies, the random walk hypothesis (Malkiel & Fama, 1970) and the efficient market hypothesis (Fama, 1991) are very popular. These two hypotheses state that the stock market evolves randomly and that no investor can earn excess returns by predicting or timing the market. There are, however, opposing views according to which the financial market is predictable to an extent (Wang & Zhu, 2010). Thus there have been many studies on the development of models based on intelligent soft computing techniques for predicting the market (Sureshkumar & Elango, 2012), and recent years have witnessed the application of data mining techniques to forecasting stock indices. Relevant studies on forecasting the stock market using BPNN, SVR, MARS and MF are described below.

Back propagation neural network
Since the 1990s, the artificial neural network (ANN) has been a popular soft computing technique used extensively in forecasting financial time series. Interest in applying neural network modelling to financial engineering has grown in recent years because of its learning abilities (Thenmozhi, 2006). More recently, Kara et al. (2011) demonstrated the successful application of BPNN to modelling and forecasting financial time series. In particular, neural networks are increasingly used to model the stock market because of their nonlinear nature (Kimoto et al., 1990; Schierholt & Dagli, 1996). A modular neural network has been used to predict the TOPIX index (Kimoto et al., 1990); these studies predicted the market accurately and thus promised good profits in simulated stock trading. In another study (Chiang et al., 1996), BPNN was compared with linear regression and other nonlinear regression models to predict 101 US mutual funds, and BPNN was shown to predict the funds better than the others.
In another study, change point analysis was integrated with BPNN (Oh & Han, 2000) to predict treasury bills and treasury bonds, and the integrated model was found to have a better prediction capability than BPNN alone. Safer (2003) compared the ability of BPNN and MARS to predict abnormal returns of the index using insider stock trading data and found that BPNN performed better than MARS. Son et al. (2012) applied linear regression, logistic regression, BPNN and support vector classification (SVC), each on its own and combined with principal component analysis (PCA), to forecast the trend of KOSPI 200 high frequency data, and showed that BPNN performed better than the other techniques when dimension reduction was applied. Kumar and Thenmozhi (2012) studied the prediction performance of BPNN, ARIMA-EGARCH and ARIMA-EGARCH-BPNN on Nifty and S&P 500 returns and showed that BPNN outperformed the other two models by providing a lower MAPE. Apart from financial time series forecasting, ANN has also been applied to rainfall prediction (Rathnayake et al., 2011), speech recognition (Dahl et al., 2012) and biology (Boorman et al., 2011), with promising results in each case.
Although there are various ANN models that have been studied for various applications, it appears that the back propagation neural network (BPNN) is the most popular and extensively used technique in forecasting.This study used BPNN as one of the techniques to forecast the intraday data of stock markets.

Support vector regression
Although ANN has provided good forecasting results, it has limitations such as over-fitting and dependence on the operator to control the parameters. As a result, researchers have developed many models to improve on ANN. In 1995, Vapnik developed the support vector machine (SVM) for classification, which has become widely accepted and overcomes these limitations of ANN. Many researchers have found SVM to be superior to BPNN, which makes it a particularly attractive data mining technique in forecasting studies. SVM is divided into SV classification (SVC) and SV regression (SVR).
SVR has been found to outperform BPNN in terms of NMSE, MAE, DS and WDS in forecasting futures contracts (Tay & Cao, 2001). Kim (2003) compared SVC with BPNN and case-based reasoning using 12 technical indicators to forecast the KOSPI index and found that SVC outperformed the other models, demonstrating the applicability of SVMs to forecasting financial time series. A hybrid ARIMA-SVR model was developed and compared with ARIMA and SVR in forecasting the daily closing prices of 10 stocks in the NYSE (Pai & Lin, 2005); the authors found that the hybrid model reduced the forecasting errors considerably. In another study, USD/GBP exchange rates were successfully forecast by an SVM model using daily data (Cao et al., 2005), showing its promise in financial time series modelling.

Among the techniques compared by José Luis Aznarte et al. (2012), the LEL-TSK technique was found to have the best prediction accuracy. Wei et al. (2011) used an ANFIS to forecast the TAIEX and found it to be better than the other techniques.
In another study, the hidden Markov model (HMM) was found to be a good predictor of the trend of SCI and Shenzhen-Sinopec shares (Zhang & Zhang, 2009). An HMM was also developed with the EM algorithm and compared with an RNN to predict the S&P 500 index (Zhang, 2004); this model was found to predict the index successfully in both bull and bear markets. Fuzzy stochastic and grey prediction models were developed to predict the next-hour value of the Taiwan stock exchange (Wang, 2003), and the index was predicted with very little deviation when the fuzzy stochastic technique was applied. A hybrid hidden Markov-fuzzy stochastic model was developed by Wang et al. (2010) to forecast the Taiwan stock exchange and was found to perform better than the fuzzy stochastic technique in 298 of 330 trials in predicting the per-hour data.
Most studies have so far focused on forecasting the daily close of stocks and indices with individual and hybrid techniques, often using advanced techniques that are beyond the understanding of traders and investors. Based on the previous literature, the major gaps in stock market prediction studies are: (a) few studies have applied high frequency data to forecast stock indices; (b) even among the studies that have used high frequency data, data mining techniques have rarely been applied; (c) there have been no studies applying MARS to forecast the intraday price of stock indices; (d) many studies have applied support vector classification (SVC) rather than support vector regression (SVR) to forecast stock indices; and (e) very few researchers have used the lagged value of the dependent attribute as an independent attribute to forecast the stock index. This paper addresses these specific issues.

Back propagation neural network
The principle of the neural network is derived from the human nervous system, in which every neuron receives signals from the outside or from adjacent neurons and processes them through an activation function to produce outputs that are transmitted to other neurons. The strength of an input depends on the weight of the connection that carries it: the higher the weight, the stronger the input and the connection between the neurons, and vice versa.
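To make the weighted-input idea concrete, the minimal Python sketch below computes the output of a single neuron as the weighted sum of its inputs plus a bias, passed through a tanh activation (mathematically the same function that MATLAB calls tansig). The function name, weights and bias are illustrative assumptions, not values from this study.

```python
import math

def neuron_output(inputs, weights, bias):
    """Output of one neuron: tanh(sum_i w_i * x_i + bias)."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return math.tanh(weighted_sum)

# The same input signal has more influence through a larger connection weight.
weak = neuron_output([0.5], [0.2], bias=0.0)    # tanh(0.1)
strong = neuron_output([0.5], [2.0], bias=0.0)  # tanh(1.0)
print(round(weak, 4), round(strong, 4))
```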

Epsilon-support vector regression
The support vector machine, introduced by Vapnik in 1995 (Vapnik, 2000), can be used for either classification or regression. It minimizes an upper bound on the generalization error by applying the structural risk minimization (SRM) principle. The following overview outlines the concept of SVR.

SVR is formulated as

$$f(x) = w^{T}\varphi(x) + b \qquad (1)$$

where φ(x) is a feature vector mapped nonlinearly from the input space x, w is the weight vector and b is the bias. Kernel functions play a major role in classification or regression with SVM, and in most cases SVMs give good results when the radial basis function (RBF) kernel is used. For a detailed description of this technique, see Smola and Schölkopf (2004).
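The sketch below shows, in plain Python, the two ingredients of ε-SVR named above (the RBF kernel and the ε-insensitive loss), together with the kernel form of a trained model's prediction, f(x) = Σᵢ (αᵢ − αᵢ*) K(xᵢ, x) + b. Solving the SVR optimization problem is not shown (the study used Weka for that); the support vectors, dual coefficients, bias and kernel width `gamma` used here are hypothetical.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-tube; grows linearly with the error outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel-form SVR prediction: sum_i (alpha_i - alpha_i*) * K(x_i, x) + b.

    dual_coefs holds the differences (alpha_i - alpha_i*) of a trained model.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

# Hypothetical trained model with one input (normalized lagged close).
svs = [[0.2], [0.5], [0.8]]
coefs = [0.3, -0.1, 0.4]
print(svr_predict([0.6], svs, coefs, bias=0.1, gamma=2.0))
```

The ε parameter sets the width of the tube inside which errors are ignored, while C (not shown here) controls how strongly errors outside the tube are penalized during training.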

Multivariate adaptive regression splines (MARS)
MARS is a multivariate, nonlinear and non-parametric regression procedure developed by Friedman (1991). It is an extension of linear models that can model nonlinearity and interactions between variables without the need for human intervention. MARS can also rapidly find the attributes to be used and the end points of the intervals (knots), and it allows any degree of interaction so as to provide the model that best fits the data.
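MARS builds its model from hinge (truncated linear) basis functions of the form max(0, x − t) and max(0, t − x), where the knot t is one of the interval end points it finds. The minimal sketch below evaluates such a model for one input. The knot 0.243 and coefficient 0.993 come from the Nifty example reported in this study; the zero intercept is a placeholder assumption because the full fitted equation is not reproduced in this text, and the forward and backward passes that choose the knots are not shown.

```python
def hinge(x, knot, direction=1):
    """MARS hinge basis: max(0, x - knot) if direction=1, max(0, knot - x) if direction=-1."""
    return max(0.0, direction * (x - knot))

def mars_predict(x, intercept, terms):
    """Additive one-variable MARS model: intercept + sum of coef * hinge(x, knot, direction)."""
    return intercept + sum(coef * hinge(x, knot, d) for coef, knot, d in terms)

# One basis function, BF1 = max(0, x - 0.243), with coefficient 0.993.
# intercept=0.0 is a placeholder, so the result is only BF1's contribution.
terms = [(0.993, 0.243, 1)]
print(round(hinge(0.3, 0.243), 3))               # BF1 value
print(round(mars_predict(0.3, 0.0, terms), 4))   # contribution 0.993 * BF1
```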
Markov chain into fuzzy stochastic
Wang (2003) proposed a fuzzy stochastic model to forecast the stock market by treating the situations in the stock market as a random process:

$$X(t_{n+1}) = X(t_n)\,e^{r} \qquad (2)$$

where

$$r = \frac{1}{J}\sum_{n=1}^{J}\left[\mu(t_{n+1}) - \mu(t_n)\right], \qquad n = 1, 2, \ldots, J \in \mathbb{N},$$

ℕ denotes the natural numbers and μ(t_n) is the membership function μ(t_n) = (x/y)^2, where x is the observed value at a specific hour t_n on a day and y is the highest value at that hour of the day. In this study, the parameter r of the fuzzy stochastic prediction model in equation (2) is adjusted using the Markov chain concept.
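A minimal Python sketch of the fuzzy stochastic model, assuming the membership values for one time slot are supplied as (start, end) pairs, each pair taken from the same day, and that the change rate is the mean of their differences. The prediction step uses the function X(n+1) = X(n)e^r that this study applies later. The function names and the index values in the example are hypothetical.

```python
import math

def membership(x, y):
    """mu(t_n) = (x / y) ** 2: value x at hour t_n relative to the highest value y for that hour."""
    return (x / y) ** 2

def change_rate(mu_pairs):
    """Mean of mu(t_{n+1}) - mu(t_n) over J pairs (mu at t_n, mu at t_{n+1})."""
    return sum(end - start for start, end in mu_pairs) / len(mu_pairs)

def fuzzy_stochastic_forecast(x_now, r):
    """Prediction function X(n+1) = X(n) * e**r."""
    return x_now * math.exp(r)

# Hypothetical membership values for two days at 9.00 and 10.00.
pairs = [(membership(5200.0, 5250.0), membership(5230.0, 5240.0)),
         (membership(5300.0, 5320.0), membership(5310.0, 5315.0))]
r = change_rate(pairs)
print(r, fuzzy_stochastic_forecast(5300.0, r))
```

Because each membership value is at most 1 (x never exceeds the highest value y), r stays small and e^r acts as a modest multiplicative adjustment to the current index value.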
A Markov chain is a process that consists of a finite number of states and known probabilities $p_{ij}$, where $p_{ij}$ is the probability of moving from state i to state j. The probabilities $p_{ij}$ are called transition probabilities. The process can also remain in its current state, which occurs with probability $p_{ii}$ (referred to here as the state probability). A random process $\{X_n, n \ge 0\}$ on a state space S is said to be a Markov chain if, for all states i, j ∈ S,

$$P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i) = p_{ij}.$$
As the Nifty and NCI datasets are grouped by the hour, the random variable $X_n$ represents the stock index value at the n-th hour in this study. $X_n = 1$ represents a rising trend of the stock index and $X_n = 2$ the opposite trend, i.e., a falling index, where n = 1, 2, …, N. In equations (3) and (4), $a_i(n) = P(X_n = i)$ is the probability (i = 1, 2) of being in situation i at the n-th hour, and $p_{ij} = P(X_{n+1} = j \mid X_n = i)$ is the probability (i, j = 1, 2) of moving from situation i at a given hour to situation j at the next hour. Thus $X_{n+1}$ depends only on $X_n$ and $p_{ij}$. The following is obtained from the total probability formula:

$$a_1(n+1) = a_1(n)\,p_{11} + a_2(n)\,p_{21} \qquad (3)$$

$$a_2(n+1) = a_1(n)\,p_{12} + a_2(n)\,p_{22} \qquad (4)$$

Here $r_{ij}$ denotes the change rate (i, j = 1, 2) from a specific hour's state in situation i to the next hour in situation j:

$$r_{ij} = \frac{1}{k}\sum_{n=1}^{k}\left[\mu(t_{n+1}) - \mu(t_n)\right],$$

where μ(t_n) = (x/y)^2, x is the index value at a specific hour t_n on a day and y is the highest value of the index at that hour of the same day. The r parameter of the prediction model is obtained from equations (3) and (4) as

$$r = \begin{cases} r_{11}\,p_{11} + r_{21}\,p_{21} & \text{(rising)} \\ r_{12}\,p_{12} + r_{22}\,p_{22} & \text{(falling)} \end{cases} \qquad (5)$$
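The total-probability update for the two-state chain, $a_j(n+1) = \sum_i a_i(n)\,p_{ij}$, can be sketched in a few lines of Python. This assumes a row-normalized transition matrix (each row sums to 1); the matrix values and the starting state are hypothetical.

```python
def step_state_probs(a, P):
    """One step of a_j(n+1) = sum_i a_i(n) * p_ij for the two-state chain.

    a: [a_1(n), a_2(n)], probabilities of state 1 (rising) and state 2 (falling).
    P: P[i][j] = probability of moving from state i+1 to state j+1 (rows sum to 1).
    """
    return [sum(a[i] * P[i][j] for i in range(2)) for j in range(2)]

# Hypothetical transition matrix, starting from a known rising state at hour n.
P = [[0.7, 0.3],
     [0.4, 0.6]]
a = [1.0, 0.0]
for hour in range(3):
    a = step_state_probs(a, P)
    print(hour + 1, [round(v, 4) for v in a])
```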

METHODS AND MATERIALS
This section describes the experiments performed and compares the performance of the four techniques, namely BPNN, SVR, MARS and MF, in one-step-ahead forecasting of the Nifty and NCI indices. NCI is a leading index of the NASDAQ stock market and is followed in the USA as an indicator of the performance of technology and growth stocks. Nifty is a benchmark index of the Indian stock market and covers 22 sectors of the Indian economy. One-minute high frequency data of the indices were collected between January 2, 2012 and September 28, 2012 for all full trading sessions, and from this dataset every sixtieth minute was taken for analysis. Missing observations were filled with the mean value of the respective hour. Trading hours were 9:15 a.m. to 3:30 p.m. for the Nifty index and 9:30 a.m. to 4:00 p.m. for NCI; to simplify data processing, this study considered the full-day weekday trading sessions between 9 a.m. and 4 p.m. for both indices.
To assess the performance of BPNN, SVR, MARS and MF out of sample, the dataset was divided into a training set (80 %) used to build the models and a testing set (20 %) used to evaluate them. The every-sixtieth-minute data were normalized using the min-max normalization described by Han et al. (2012). For BPNN, SVR and MARS, the experiments used the normalized close price of the index as the dependent attribute and the normalized lagged close price as the independent attribute. For the MF model, the denormalized close price of the index was used. The data were collected from Google Finance.
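The sampling and gap-filling steps above can be sketched in Python as follows. Two assumptions are made because the text does not specify them: sampling starts at a chosen `offset` into the one-minute series, and "the mean value of the respective hour" is taken as the mean of the observed closes for the same hour of day. The function names and data are hypothetical.

```python
from statistics import mean

def sample_every_nth(minute_closes, step=60, offset=0):
    """Keep every step-th one-minute observation, starting at index offset."""
    return minute_closes[offset::step]

def fill_with_hour_mean(rows):
    """Replace missing closes (None) by the mean of the observed closes at the same hour.

    rows: list of (hour, close) tuples, e.g. (10, 5269.25) or (10, None).
    """
    observed = {}
    for hour, close in rows:
        if close is not None:
            observed.setdefault(hour, []).append(close)
    hour_mean = {hour: mean(values) for hour, values in observed.items()}
    return [(hour, close if close is not None else hour_mean[hour])
            for hour, close in rows]

rows = [(9, 100.0), (9, None), (9, 102.0), (10, None), (10, 50.0)]
print(fill_with_hour_mean(rows))
```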
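The preprocessing just described (min-max normalization, one-period lag as the single input, an 80/20 split and denormalization of predictions) might look like this in Python. The split is assumed to be chronological, since the text does not say whether the data were shuffled, and the minimum and maximum are taken over the whole series as the text suggests; fitting them on the training portion only would avoid look-ahead. The closes are hypothetical.

```python
def minmax_normalize(values, lo, hi):
    """Min-max normalization to [0, 1]: (v - min) / (max - min)."""
    return [(v - lo) / (hi - lo) for v in values]

def minmax_denormalize(values, lo, hi):
    """Inverse transform back to index points."""
    return [v * (hi - lo) + lo for v in values]

def lag1_pairs(series):
    """(independent, dependent) = (close at t-1, close at t)."""
    return list(zip(series[:-1], series[1:]))

def chronological_split(pairs, train_ratio=0.8):
    """First train_ratio of the pairs for training, the rest for testing."""
    cut = int(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]

closes = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]   # hypothetical hourly closes
lo, hi = min(closes), max(closes)               # taken over the whole series
scaled = minmax_normalize(closes, lo, hi)
train, test = chronological_split(lag1_pairs(scaled))
print(train, test)
```

Changing `train_ratio` to 0.6, 0.7 or 0.9 gives the other training/testing ratios used in the robustness evaluation.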
The forecasting techniques were evaluated using the following statistical measures: mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE), root mean square error (RMSE), Theil's U statistic and forecasting efficiency (%). The formulae for these statistical performance measures are given below.
where $e_n$ is the difference between the actual value $a_n$ and the predicted value $y_n$, and n is the number of observations. MAE, MAPE, MSE and RMSE are called "measures of fit"; they measure the deviation between the actual and forecast values. Theil's U statistic measures the efficiency of the model in predicting the data. The smaller the values of these measures, the closer the predicted values are to the actual values. MAPE was used to select the best model within each technique. Tables 13 and 14 show the performance of all the techniques based on the denormalized test data values of the actual and predicted results.
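The standard error measures can be computed as in the Python sketch below, with $e_n = a_n - y_n$. Theil's U has several variants and the text does not state which one was used; the U1 form here is an assumption. Forecasting efficiency (%) is omitted because its formula is not given in this text. The input values are hypothetical.

```python
import math

def forecast_errors(actual, predicted):
    """Error measures for actual values a_n and predictions y_n, with e_n = a_n - y_n."""
    n = len(actual)
    e = [a - y for a, y in zip(actual, predicted)]
    mae = sum(abs(x) for x in e) / n
    mape = 100.0 * sum(abs(x / a) for x, a in zip(e, actual)) / n   # in percent
    mse = sum(x * x for x in e) / n
    rmse = math.sqrt(mse)
    # Theil's U1 (assumed variant): RMSE scaled by the RMS of actuals and predictions.
    theil_u = rmse / (math.sqrt(sum(a * a for a in actual) / n)
                      + math.sqrt(sum(y * y for y in predicted) / n))
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse, "TheilU": theil_u}

print(forecast_errors([100.0, 200.0], [110.0, 190.0]))
```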
The BPNN model consisted of a single input node in the input layer, one hidden layer and a single node in the output layer. The input node took the normalized lagged close price of the Nifty or NCI index's every-sixtieth-minute data, as it is the forecasting attribute, and the output node gave the normalized close price, as it is the forecasted attribute. The trained BPNN architecture of this study was therefore 1-X-1: 1 neuron in the input layer (normalized lagged close price), X neurons in the hidden layer and 1 neuron in the output layer (normalized close price), since one-step-ahead prediction is made in this study. As there are no rules for determining the number of nodes in the hidden layer (Han et al., 2012), hidden layer sizes from 1 to 5 nodes were tested for Nifty and NCI. The tansig function was applied to the nodes in the input and hidden layers, while a linear function was applied at the output layer. The model selection results for Nifty and NCI are presented in Tables 1 and 2, respectively. Another important training parameter is the learning rate; learning rates from 0.001 to 0.5 were tested in the training process, and the numbers of epochs tested were 10000, 10250, 10500, 10750, …, 25000. To obtain the best BPNN parameters, the learning rate and the momentum were fine-tuned, and the BPNN topology with the minimum testing MAPE was considered the best.
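A from-scratch Python sketch of the 1-X-1 network described above: tanh (equivalent to tansig) hidden units, a linear output unit, and online backpropagation with a learning rate and a momentum term. This is not Weka's implementation, and as a simplification only the hidden layer applies tanh here (the input node passes x through unchanged). The initial weight range, random seed, default learning rate, momentum and epoch count, and the toy data are assumptions chosen only so the example runs quickly.

```python
import math
import random

def train_bpnn(pairs, hidden=3, lr=0.05, momentum=0.5, epochs=1000, seed=0):
    """Train a 1-hidden-1 network (tanh hidden layer, linear output) by online backpropagation.

    pairs: list of (normalized lagged close, normalized close). Returns predict(x).
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden weights
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden biases
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                              # output bias
    # Previous updates, reused by the momentum term.
    dw1, db1, dw2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden
    db2 = 0.0

    for _ in range(epochs):
        for x, target in pairs:
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            y = sum(w2[j] * h[j] for j in range(hidden)) + b2  # linear output
            err = y - target  # gradient of 0.5 * (y - target)**2 w.r.t. y
            for j in range(hidden):
                # Backpropagate through tanh using the weight before this update.
                grad_hidden = err * w2[j] * (1.0 - h[j] ** 2)
                dw2[j] = -lr * err * h[j] + momentum * dw2[j]
                dw1[j] = -lr * grad_hidden * x + momentum * dw1[j]
                db1[j] = -lr * grad_hidden + momentum * db1[j]
                w2[j] += dw2[j]
                w1[j] += dw1[j]
                b1[j] += db1[j]
            db2 = -lr * err + momentum * db2
            b2 += db2

    def predict(x):
        return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(hidden)) + b2

    return predict

# Hypothetical normalized lag-1 pairs; in the study these come from the hourly closes.
data = [(i / 20, 0.9 * i / 20 + 0.05) for i in range(21)]
model = train_bpnn(data, hidden=3)
mse = sum((model(x) - t) ** 2 for x, t in data) / len(data)
print(round(mse, 6))
```

Selecting the hidden layer size, learning rate and epoch count would then amount to looping over the tested values and keeping the configuration with the lowest testing MAPE.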
Second, SVR was used to address the forecasting limitations of BPNN, as it has been reported to outperform BPNN in stock market forecasting. Since the quality of an SVR model depends heavily on the choice of kernel and parameters, the RBF kernel was chosen to train the SVR model in this study, and the parameters C and ε were fine-tuned to identify the model with the minimum testing error. C was varied from 2^1 to 2^12 and ε from 2^-12 to 2^12. A portion of the best combinations is presented in Table 3 for both indices.
Third, MARS models were built from basis functions (BF) to model the effect of the one-period lagged close of the index on its close price. To build the best MARS model, penalty factors from 1 to 5 were tried for both indices, and the model with the least testing MAPE was considered the best.
Weka 3.7.7, developed by Hall et al. (2009), was used to develop the BPNN and SVR models. Statistica 10 from StatSoft was used to build the MARS models, and Microsoft Excel 2007 was used to develop the MF model. All modelling tasks were performed on an HP Compaq PC with an Intel(R) Core(TM) 2 Duo E8400 CPU at 3.00 GHz. The detailed forecasting results for the indices using these techniques are described in the following section.
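The parameter search above is a grid search over powers of two. In the sketch below, `evaluate_mape` is a hypothetical callable standing in for "train an RBF-kernel SVR with (C, ε) and return its testing MAPE", and `toy_mape` is a stand-in used only so the example runs. The text reports 96 ε values but not their spacing; the quarter-step exponents used here (suggested by the reported values 2^-7.5 and 2^-7.75) are an assumption and give 97 values, so the grid is only approximate.

```python
import math

def grid_search(evaluate_mape, c_exponents, eps_exponents):
    """Return (best MAPE, C, eps) over all combinations C = 2**c, eps = 2**e.

    evaluate_mape(C, eps) is a hypothetical callable that trains an RBF-kernel SVR
    with these parameters and returns its testing MAPE.
    """
    best = None
    for c_exp in c_exponents:
        for e_exp in eps_exponents:
            C, eps = 2.0 ** c_exp, 2.0 ** e_exp
            mape = evaluate_mape(C, eps)
            if best is None or mape < best[0]:
                best = (mape, C, eps)
    return best

c_exponents = range(1, 13)                            # C = 2^1 ... 2^12
eps_exponents = [-12 + 0.25 * k for k in range(97)]   # assumed quarter steps, 2^-12 ... 2^12

# Toy stand-in for training and testing, for illustration only.
def toy_mape(C, eps):
    return abs(math.log2(C) - 10) + abs(math.log2(eps) + 7.5)

print(grid_search(toy_mape, c_exponents, eps_exponents))
```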

Back propagation neural network
Tables 1 and 2 present a portion of the model selection results for different numbers of hidden nodes and learning rates for Nifty and NCI. From Table 1, the minimum MAPE of 1.4566 was obtained with 3 nodes in the hidden layer. This implies that Nifty's every-sixtieth-minute close price was best forecast by the 1-3-1 network topology with a learning rate of 0.3, trained for 15000 epochs to produce the lowest MAPE. Hence it is the best network topology for forecasting the every-sixtieth-minute close price of the Nifty index.
From Table 2, the network with 5 nodes in the hidden layer produced the minimum MAPE of 1.1311 for NCI, with a learning rate of 0.05 and 17500 epochs; hence the 1-5-1 network topology is considered the best for forecasting the NCI index. The forecasting performance of the best BPNN topology for each index was analyzed by denormalizing the normalized predicted data and is presented in Table 13 in terms of MSE, RMSE, MAE, MAPE, Theil U statistic and forecasting efficiency (%).

Support vector regression
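A total of 1152 SVR models were built for each of Nifty and NCI from the combinations of 12 values of C and 96 values of ε. A portion of the model selection results is presented in Table 3 (Nifty and NCI). As observed in Table 3, the combination of C = 2^10 and ε = 2^-7.5 gave the best SVR model for Nifty. The MSE, RMSE, MAE, MAPE, Theil U statistic and forecasting efficiency (%) of the best SVR models, computed on the denormalized predicted output, are presented in Table 13.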

Multivariate adaptive regression splines
Here only one variable was considered as the forecasting parameter, and so this variable was automatically selected.
To explain the MARS prediction model, the first Nifty MARS model built is used as an illustrative example; its basis function and prediction equation are given in Table 4.

Thus, it was found that SVR produced the lowest MAPE compared to the other models. Although the MARS model was able to identify the important independent attribute, its forecasting ability for the normalized variables was not as good as those of BPNN and SVR, as seen from Tables 1 to 3. To compare the forecasting ability of the models built using these techniques, the normalized predicted outputs were denormalized and are presented in Table 13, and the robustness evaluation of these techniques is presented in Table 14.

Markov chain into fuzzy stochastic
The Nifty and NCI datasets were used to provide a detailed analysis of the predictive capability of MF. As shown in Table 5, the time period considered in this study was reformatted to run from 9:00 a.m. to 4:00 p.m.

Rising and falling stock index probabilities
The probabilities $p_{11}$, $p_{12}$, $p_{21}$ and $p_{22}$ in equations (3) and (4) were computed from the datasets for rising and falling stock indices. To obtain the index value in the next period, the dataset (for example, Table 5) was used to produce Table 6 for further analysis.
Table 7 indicates the rising and falling measures of the stock index: "0" indicates that the value of the index is falling or equal compared with the previous state (falling), and "1" indicates that the value of the index is rising or equal compared with the previous state (rising). To compute the probabilities $p_{11}$, $p_{12}$, $p_{21}$ and $p_{22}$ over the training data, we use [times of occurrences of (1,1) / total number of entries] to find $p_{11}$, [times of occurrences of (0,1) / total number of entries] to find $p_{12}$ and [times of occurrences of (1,0) / total number of entries] to find $p_{21}$.
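The counting rule described above can be sketched in Python for one time slot. Because every count is divided by the same total number of entries, the four values sum to 1 across all patterns rather than per starting state, as they would in a row-normalized transition matrix; the sketch reproduces the rule as described. The eight entries are hypothetical, chosen so that two of them are (1, 1), matching the Nifty 9.00 to 10.00 example given in this study.

```python
def transition_probs(pairs):
    """Relative frequencies of the four rising(1)/falling(0) patterns for one time slot.

    Each p_ij is the number of occurrences of its pattern divided by the
    total number of entries, as in the counting rule in the text.
    """
    total = len(pairs)

    def share(pattern):
        return sum(1 for p in pairs if p == pattern) / total

    return {"p11": share((1, 1)), "p12": share((0, 1)),
            "p21": share((1, 0)), "p22": share((0, 0))}

# Hypothetical 8 entries for one slot, two of them (1, 1).
slot = [(1, 1), (1, 1), (0, 1), (1, 0), (0, 0), (0, 0), (0, 1), (1, 0)]
print(transition_probs(slot))
```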

Rising and falling change rate
The stock index change rates $r_{11}$, $r_{12}$, $r_{21}$ and $r_{22}$ in equation (5) were calculated and are presented in Table 10. For example, Table 9 shows a portion of the change rate $r_{11}$ computed for the period 9.00 to 10.00 for the Nifty index.
Here, the average µ index is calculated at 9.00 and at 10.00, and the change rate is the difference between the average µ index at 10.00 (0.991381119) and the average µ index at 9.00 (0.987669), which is 0.003712359.

Parameter computing and obtaining the predicted value
The parameter $r = r_{11}p_{11} + r_{21}p_{21}$ was used as the rising parameter when the stock value increased, and $r = r_{12}p_{12} + r_{22}p_{22}$ was used as the falling parameter for falling stock index values. The computed r parameter values for both indices, obtained by combining Tables 8 and 10, are presented in Table 11.
The next-period index value can be predicted using the prediction function $X(n+1) = X(n)\,e^{r}$, where e ≈ 2.71828182845904. For example, to predict Nifty's next-period value, we take the stock index at 9:00 and 9:01 and the parameter r. The reformatted stock index at 9:00 (5269.75) and 9:01 (5269.25) on August 6, 2012 shows that the index is falling, and the corresponding falling parameter for 9:00 to 10:00 is −0.04662. Substituting this parameter and the index value at 9:00 into the prediction function, $5269.75\,e^{-0.04662}$, gives a predicted value of 5029.71 for the next time period, 10:00. The next hour's values can be predicted likewise for both indices, and a part of the resulting predictions is shown in Table 12.
The performance of the MF technique, along with the other techniques, is presented in Table 13.
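The two steps just described (choosing the rising or falling parameter, then applying the exponential prediction function) can be sketched in Python and checked against the Nifty example. The dictionary keys, function names and the tie rule (an unchanged value counted as rising) are assumptions; the rising parameter for this slot is not given in the text, so 0.0 is only a placeholder.

```python
import math

def r_parameter(r, p, rising):
    """Rising: r11*p11 + r21*p21; falling: r12*p12 + r22*p22 (dicts keyed 'r11', 'p11', ...)."""
    if rising:
        return r["r11"] * p["p11"] + r["r21"] * p["p21"]
    return r["r12"] * p["p12"] + r["r22"] * p["p22"]

def predict_next(x_now, x_next_minute, r_rising, r_falling):
    """Pick the rising or falling parameter from the direction between the two readings,
    then apply X(n+1) = X(n) * e**r."""
    r = r_rising if x_next_minute >= x_now else r_falling  # tie counted as rising (assumption)
    return x_now * math.exp(r)

# Nifty example: 9:00 = 5269.75, 9:01 = 5269.25, falling r = -0.04662.
print(round(predict_next(5269.75, 5269.25, r_rising=0.0, r_falling=-0.04662), 2))  # 5029.71
```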

Comparative result
Table 13 summarizes the forecasting performance of BPNN, SVR, MARS and MF on Nifty and NCI for the denormalized predicted data, evaluated using MSE, RMSE, MAE, MAPE, Theil U statistic and forecasting efficiency (%). BPNN, SVR and MARS all achieve a forecasting efficiency of about 99.9 %. The forecasting efficiency of the SVR model is 99.99845 % for Nifty and 99.99899 % for NCI, better than that of the other techniques. Table 13 also shows that, in forecasting Nifty's every-sixtieth-minute close, MARS outperformed BPNN and MF with lower errors, namely MSE (287.563), RMSE (16.958), MAE (10.844), MAPE (0.2 %) and Theil U statistic (0.00156), and a higher forecasting efficiency (99.99844 %). The deviations between the actual and predicted values were smallest with the SVR model, so SVR provided better forecasting results than BPNN, MARS and MF for both the Nifty and NCI indices.

Robustness evaluation
To evaluate the robustness of the SVR model, it was tested alongside the BPNN, MARS and MF models using different ratios of training to testing sample sizes, defined as the ratio of the training dataset size to the complete dataset size. Four relative ratios were considered: 60 %, 70 %, 80 % and 90 %. Table 14 summarizes the forecasting performance of the four techniques on the analyzed indices in terms of MSE, RMSE, MAE, MAPE and Theil U statistic. The SVR models outperformed the other models at all four ratios on all five performance measures. Thus SVR consistently provided better forecasting results than the other techniques for both the Nifty and NCI indices. Based on these findings, the SVR model is the most suitable of the compared techniques for forecasting the next-period value of the index with high accuracy when a single forecasting variable is used.

CONCLUSION
Earlier studies have examined and compared various data mining techniques for time series forecasting, mostly ANN, SVC and a range of hybrid models. In this study, sixty-minute data from the Nifty and NCI indices were used to evaluate the performance of BPNN, SVR, MARS and MF. The use of minute data was found to increase the frequency and fluctuations in the data. MSE, RMSE, MAE, MAPE, Theil U statistic and forecasting efficiency (%) were used as performance measures to check the forecasting capability of the techniques. The results show that accurate predictions can be made for Nifty and NCI without extensive market-related data or macroeconomic variables. SVR models provided better forecasting results than MARS, BPNN and MF for both indices when the single-period lagged close was used as input, and the robustness evaluation showed that SVR outperformed the other techniques on all the datasets. This study also supports the recent emergence of MARS as a better prediction technique than BPNN for intraday trading of the Nifty index, and it contributes to knowledge by evaluating MARS in time series forecasting.
Forecasting the stock market is important for fund managers, policymakers, investors, borrowers and traders. This study offers investors and analysts a comparative study of popular models and recommends a model for forecasting stock indices.
A detailed description about this technique can be found in Chen et al. (2006), Han et al. (2012) and Kara et al. (2011).

Multivariate adaptive regression splines
Sai et al. (2007) developed a model that integrated rough sets and SVC (RS-SVM) to investigate the trend of the H & S 300 index. This model was compared with RW, ARIMA, BPNN and SVC models, and its error and computational time were much lower than those of the other models.

Fuzzy logic and Markov chain
Silicon Integrated System Corp and UMC Corp of TSEC index. AR, STAR, NCSTAR and LEL-TSK techniques have also been used to predict the DJIA index of 23 companies (José Luis Aznarte et al., 2012).

Table 1 :
A portion of model selection results of BPNN model -Nifty

Table 2 :
A portion of model selection results of BPNN model -NCI

Table 3 :
A portion of model selection results of SVR model - Nifty and NCI

Table 4 :
Basis function and prediction equation of MARS -Nifty and NCI

Table 5 :
A portion of stock index data with µ Index -Nifty and NCI

Table 6 :
A portion of stock index data - Nifty and NCI

The combination of C = 2^10 and ε = 2^-7.5 produced the minimum MAPE of 1.2883 and was considered the best SVR model for forecasting the Nifty index. From Table 3, the minimum MAPE of 1.1209 for forecasting NCI's every-sixtieth-minute close price was obtained with the combination C = 2^10 and ε = 2^-7.75. This analysis shows that SVR performs better than BPNN for both indices by providing the lowest MAPE, and that increasing C and decreasing ε provided the best results for the indices.

Table 7 :
A portion of stock index rising and falling -Nifty and NCI

Table 8 :
Probability for all time periods in the trained dataset - Nifty and NCI

In the illustrative Nifty MARS model, if the normalized lag close in BF1 = max(0, normalized lag close − 0.243) from Table 4 is 0.3, then BF1 = 0.057 and the model predicts that the normalized close of Nifty increases by 0.0566 (i.e., 0.993 × max(0, 0.3 − 0.243)). The obtained basis functions and the attribute selection results for Nifty and NCI are presented in Table 4. The out-of-sample predictions of the MARS model produced a GCV error of 0.000306 for Nifty and 0.000310 for NCI. From the results in Table 4, the lowest MAPEs obtained for Nifty and NCI were 1.33062 and 1.523801, respectively, both with a penalty parameter of 2.

Table 9 :
A portion of change rate calculation -Nifty and NCI

Table 10 :
Stock index change rate for all time periods -Nifty and NCI

Table 11 :
Parameter r for the time period 9:00 to 16:00 - Nifty and NCI

Similarly, $p_{22}$ is found as [times of occurrences of (0,0) / total number of entries]. For example, in Table 7, for the period 9.00 to 10.00, the number of occurrences of (1,1) in the Nifty index is 2 and the total number of entries is 8, so $p_{11}$ = 2/8 = 0.25. The values of $p_{12}$, $p_{21}$ and $p_{22}$ are obtained in the same way and are shown in Table 8 for both indices.

Table 12 :
Comparison of a portion of predicted result for the model with actual - Nifty and NCI

Table 13 :
Denormalized stock index forecasting results for all the models in this study -Nifty and NCI

Table 14 :
Robustness evaluation on denormalized result of the compared techniques -Nifty and NCI