An information rich subspace separation for non-stationary signal

This paper proposes a novel automated approach for the classification of highly non-stationary signals based on a non-principal component analysis (non-PCA) methodology. The method generates an eigen-analysis based pseudo-spectrum to emulate the spectral characteristic variations of the non-stationary signals to be classified. The estimated pseudo-spectrum is then used to implement a comb-like subspace filter structure, which captures the variations of all significant spectral components throughout the whole observation period. It is shown that this filter implementation yields better results than the existing dimensionality reduction methods, which utilise only the principal k components of the eigen space. Finally, a novel probabilistic approach, which creates a signature vector representing each class of signals in the training phase, is proposed for the classification process. It is also shown that the proposed method can be used effectively not only for classification but also for the extraction of hidden stationary signature features from a non-stationary signal. Further, it is also proven that the proposed subspace filtering scheme can be used as a dynamic spectral estimation technique, which can eliminate the time-frequency resolution tradeoff that exists in techniques such as the short-time Fourier transform (STFT).

The signals of concern in a vast number of important practical applications are non-stationary in nature. However, in many practical applications stationarity is currently assumed for such signals by considering small windows or slow fading/variations. Two of the most widely used methods for the analysis [...] methods and short-time Fourier transform (STFT) based methods. However, parametric power spectral estimation methods such as the auto-regressive method are not appropriate for non-stationary signals (Subasi & [...]), and methods such as STFT are ineffective when the time-varying information is useful (Sukittanon et al., 2003) due to the time-frequency tradeoffs and reduced resolution.
In previous studies (Nettasinghe et al., 2013; Pollwaththage et al., 2013; Ratnayake et al., 2013), it has been proven that subspace based signal analysing techniques can be successfully incorporated for signal [...] correlated and noisy. In this paper, we explore how the subspace based signal analysing techniques can be [...] stationary signals.
The major contribution of this paper is the introduction of a novel approach for non-stationary [...] non-PCA approach. In other words, unlike the widely used PCA based methods, this method selects frequency components from a signal such that the selected spectral components represent all distinct spectral components that might contain information about the signal (not only the ones that have the highest power). It utilises an eigen-decomposition based methodology to identify characteristics of the signal of concern. It is shown that this method can be used for hidden signature [...]. Further, it can be used to identify and extract stationary components of a non-stationary signal. In other words, the non-stationarity of the spectral content can [...]

SIGNAL DATABASE AND PRE-PROCESSING OF SIGNALS
A collection of signals is utilised to verify the accuracy of the proposed methods and to compare their performance with the existing methods. Synthetic signals are mainly used to formulate the method, and naturally occurring signals such as speech signals and elephant rumbles are used to test the accuracy of the proposed methods and to prove their applicability.
The higher degree of non-stationarity in speech signals [...] from each speaker were recorded while they were reciting the same word ('Evaporation'). Figure 1 depicts the time domain waveforms and the spectrograms for one recital by each speaker. As the same word was used, [...] remain highly correlated between the speakers. Further, the selected speakers have similar English accents, which increases the correlated content. Therefore, only the hidden features related to the vocal cords of a particular [...] to extract the hidden information about a particular speaker, which will be useful in recognising that speaker. Figure 1(b) shows that the signals are highly non-stationary since their spectral content changes rapidly with time. It is noted [...] by visual observation is not possible due to the high degree of correlation between the signals of different classes. This again highlights the importance of, and the need for, accurate extraction of hidden signature features.
The synthetic signals used in this research are generated by concatenating sinusoids of different frequencies, such as synthetic signal no. 1 shown in Figure 2(a). These signals are mainly used to formulate the proposed algorithm.
Apart from the synthetic signals and the main database of speech signals, elephant infrasound calls are also used to illustrate how the proposed algorithm can be used for stationary feature extraction from non-stationary signals.

PROPOSED ALGORITHM FOR SPECTRAL ESTIMATION AND FILTER IMPLEMENTATION
In this section, a novel non-PCA based approach for the generation of the pseudo-spectrum of a signal is presented. Then, how the generated pseudo-spectrum can be used to capture the [...] described. It is also shown how the proposed method can be used for the extraction of stationary signature features [...] of the proposed method over some of the popular dynamic spectral estimation methods such as PCA based methods and STFT is also discussed in this section.
In this paper, the signal under consideration, x, is treated as a set of samples stored in an array, where x(i) is the i-th sample of the signal obtained by sampling at a frequency of $F_s$ (here $F_s$ = 44100 Hz).
In order to calculate the statistical parameters of the signal x, it is assumed that the signal x is a realisation of an ergodic random process. In other words, it is assumed that every realisation of the random process, which generates the signal x carries the complete statistics of the random process with it.
Next, the autocorrelation matrix (ACM) of the signal x, denoted by $R_{xx}$, is created as [...]. Therefore, all the elements of the ACM can be easily calculated using expression (4) for any signal.
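As an illustration of how an ACM could be assembled in practice, the following Python sketch builds an N x N matrix from time-averaged autocorrelation estimates, relying on the ergodicity assumption stated above. The function name `autocorrelation_matrix`, the biased (1/M) estimator and the symmetric Toeplitz layout are assumptions made for this sketch; they are not taken from the expressions in the paper.

```python
import numpy as np

def autocorrelation_matrix(x, order):
    """Return an order x order autocorrelation matrix estimated from x.

    Assumption: element (i, j) is the biased time-average estimate of the
    autocorrelation at lag |i - j| (symmetric Toeplitz structure).
    """
    x = np.asarray(x, dtype=float)
    m = len(x)
    if order > m:
        raise ValueError("order must not exceed the number of samples")
    # r[k] = (1/m) * sum_i x(i) * x(i + k), for lags k = 0 .. order - 1
    r = np.array([np.dot(x[:m - k], x[k:]) / m for k in range(order)])
    # Place the estimate for lag |i - j| at position (i, j)
    lags = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    return r[lags]

# Tiny illustrative input (not data from the paper)
R = autocorrelation_matrix([1, 2, 3, 4], 2)
```

For the four-sample input above, the lag-0 estimate is 30/4 = 7.5 and the lag-1 estimate is 20/4 = 5.0, so `R` has 7.5 on the diagonal and 5.0 off the diagonal.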
In the training phase of the proposed approach, the minimum ACM order (maximum number of shifts in the time domain) required to model the signals of concern accurately is estimated using a minimisation of variance technique. Here, the ACM is initially constructed using a very small model order (N = 10). Thereafter, the eigenvalues and eigen-vectors [...]. The eigenvalues of the ACM (sorted in descending order) are stored in the diagonal of the matrix D and their respective eigen-vectors are stored in the matrix V. Therefore, using the basic concept of eigen-vectors and eigenvalues, the relation between V, D and the ACM ($R_{xx}$) can be expressed as

$$R_{xx} V = V D \qquad (5)$$
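The decomposition step can be sketched in Python as follows. This is a minimal example that assumes NumPy is available and that the ACM is symmetric; the function name `sorted_eigen_decomposition` and the 2 x 2 example matrix are illustrative only.

```python
import numpy as np

def sorted_eigen_decomposition(acm):
    """Return (V, D) with eigenvalues in descending order on the diagonal of D."""
    # eigh is suitable here because an autocorrelation matrix is symmetric
    eigenvalues, eigenvectors = np.linalg.eigh(acm)
    descending = np.argsort(eigenvalues)[::-1]
    V = eigenvectors[:, descending]          # i-th column pairs with i-th largest eigenvalue
    D = np.diag(eigenvalues[descending])
    return V, D

# Small symmetric example matrix (illustrative values only)
acm = np.array([[2.0, 1.0], [1.0, 2.0]])
V, D = sorted_eigen_decomposition(acm)

# The eigen relation ACM * V = V * D holds up to rounding error
assert np.allclose(acm @ V, V @ D)
```

`numpy.linalg.eigh` returns eigenvalues in ascending order, so the explicit re-ordering is what places the largest eigenvalue first, as described in the text.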
Then the eigen-vectors are realised as a set of FIR [...] eigen-vector [...] (6), which corresponds to the i-th largest eigenvalue of $R_{xx}$. Hence, $q_i$ appears in the i-th column of V. This will produce the transfer function given by [...]. Then, the corresponding eigenvalues are plotted against the main lobe center frequencies of the frequency [...] pseudo-spectrum (eigen spectrum), which gives an idea of the spectral content of the signal of concern. The accuracy [...] the ACM size (N).
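One possible way to compute such an eigen spectrum is sketched below in Python. It assumes that each eigen-vector is used directly as the coefficient vector of an FIR filter and that the main lobe center frequency can be taken as the frequency at which the magnitude response of that filter is largest; both are assumptions of this sketch. The function name `eigen_pseudo_spectrum`, the FFT length and the 8 kHz test tone are hypothetical.

```python
import numpy as np

def eigen_pseudo_spectrum(acm, fs, nfft=4096):
    """Return (center_frequencies_hz, eigenvalues), sorted by descending eigenvalue.

    nfft should be at least the ACM order so that no coefficients are truncated.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(acm)
    descending = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[descending]
    eigenvectors = eigenvectors[:, descending]
    # Each column is treated as FIR coefficients; the zero-padded FFT
    # samples its magnitude response on the range [0, fs / 2]
    responses = np.abs(np.fft.rfft(eigenvectors, n=nfft, axis=0))
    # Assumed definition: main lobe center = frequency of the largest response
    center_bins = np.argmax(responses, axis=0)
    return center_bins * fs / nfft, eigenvalues

# Illustrative input: a 1 kHz tone sampled at 8 kHz (not data from the paper)
fs = 8000
x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)
order = 40
r = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(order)])
acm = r[np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])]

freqs, powers = eigen_pseudo_spectrum(acm, fs)
# Plotting powers against freqs would give the pseudo-spectrum;
# the dominant entry should sit close to the 1 kHz tone.
```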
To determine the optimum ACM order ($N_{opt}$), the order of the ACM is increased from the starting value of 10 in increments of 10. For every ACM size, the number of peaks in the pseudo-spectrum and the frequencies corresponding to those peaks are recorded. When the ACM size is increased in this way, the number of peaks in the estimated pseudo-spectrum also increases and the frequencies corresponding to those peaks vary. Hence, the ACM order is increased until the number of peaks and the frequencies corresponding to those peaks become steady (i.e., they do not vary when the ACM size is increased further).
The condition for selecting $N_{opt}$ is as follows. If the number of peaks and their corresponding frequencies remain steady for 10 steps starting from a certain value, where each step is an increment of 10, the starting model order is selected as $N_{opt}$. This method of determining $N_{opt}$ is based on the reasoning that the ACM order, which gives the minimum [...] accurately capture all the peaks and their corresponding frequencies, is the optimum ACM order. This means that $N_{opt}$ is the model order required for the generated pseudo-spectrum to accurately capture all the dominant spectral components in the actual power spectrum of the signal of concern.
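The selection condition can be sketched in Python as below. This is a minimal illustration under stated assumptions: `peaks_for_order` is a hypothetical callable that returns the peak frequencies of the pseudo-spectrum for a given ACM order, peak sets are compared by exact equality (a tolerance may be needed with real data), and `max_order` is an arbitrary safety limit that is not part of the described method.

```python
def select_optimum_order(peaks_for_order, start=10, step=10,
                         steady_steps=10, max_order=5000):
    """Return the first order after which the peaks stay unchanged
    for `steady_steps` consecutive increments, or None if never steady."""
    candidate = start
    reference = peaks_for_order(start)
    steady = 0
    order = start
    while order + step <= max_order:
        order += step
        current = peaks_for_order(order)
        if current == reference:
            steady += 1
            if steady >= steady_steps:
                return candidate
        else:
            # Peaks changed: restart the steady count from this order
            candidate, reference, steady = order, current, 0
    return None

# Toy stand-in for the pseudo-spectrum peak finder (hypothetical behaviour):
# the peaks keep changing below order 480 and are fixed from 480 onwards.
def toy_peaks(order):
    if order >= 480:
        return (100.0, 500.0, 1000.0)
    return (float(order),)

n_opt = select_optimum_order(toy_peaks)
```

With the toy peak finder, the sketch returns the starting order of the steady run, following the condition as stated; a margin of one further step can be added afterwards if a slightly larger order is preferred.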
Consider the non-stationary signal whose spectrogram (variation of frequency with time) is depicted in Figure 2(a) (labelled hereafter as synthetic signal no. 1). This signal was generated synthetically, using sinusoidal signals of frequencies 100, 500, 1000, 2000, 3000 and 4000 Hz at different time windows. Figure 2(b) illustrates how the number of peaks and the frequencies corresponding to those peaks vary with the ACM order. It is observed that when N is greater than 230, the number of peaks does not vary even though the ACM order increases. However, the frequencies corresponding to those peaks continue to fluctuate until N = 480. After 480, the frequencies of the peaks remain exactly the same for 10 consecutive increments of the ACM order. Therefore, a value slightly greater than 480 is chosen as $N_{opt}$; here, the chosen value is 490 (480 + 10). The selected value does not necessarily have to be 490, and any value greater than the experimentally selected lower bound of 480 would be suitable. However, to avoid wasting computational power unnecessarily, it is better to select a value that is only slightly higher.
Figure 2(c) depicts the estimated pseudo-spectrum with the chosen $N_{opt}$ value. It can be seen that the peak locations of this pseudo-spectrum are approximately equal to the frequencies of the constituent sinusoids of synthetic signal no. 1 shown in Figure 2(a).
The main advantage of the proposed pseudo-spectrum based method is its capability to extract all the distinct spectral components of a signal. This is achieved by selecting the subspaces that correspond to the peaks of the pseudo-spectrum. It should also be noted that not all of the subspaces that correspond to the peaks of the pseudo-spectrum are principal components of the eigen space. Unlike the widely used PCA based methods, the proposed method is capable of extracting these principal components as well as the non-principal subspaces corresponding to all distinct spectral components. The overall subspace formed by the selected [...] subspace, which carries unique information irrespective of whether it is a principal or non-principal component.
In the traditional PCA based dimensionality reduction methods, the principal K subspaces (the K subspaces that have the highest eigenvalues) are selected out of the total of N subspaces. However, this does not guarantee that all the distinct spectral components are captured. This is due to the fact that two (or more) adjacent subspaces [...] eigenvalue spread) may contain center frequencies that are in close proximity to each other, hence containing no new spectral information about the signal. In other [...] of one spectral lobe of the signal. The proposed method is capable of latching only on to the required subspaces located at distinct points of the spectrum, given that the resolution is adjusted to $N_{opt}$ in the training phase.
For example, Figure 3 shows the plot of [...] the corresponding eigenvalues. It can be seen that although [...] only three distinct frequencies. In the proposed non-PCA [...] six distinct frequency components, which essentially describe the spectral content of the signal more completely, thus eliminating the weakness of the traditional PCA based approach.
Another important aspect of the proposed method is its capability to extract constituent stationary components from a non-stationary signal. This is highly useful in many applications such as elephant infrasound call detection (which will be explained later in this paper), [...] stationary signature features.
In order to highlight the said capability of the proposed method to extract stationary components, the synthetic signal no. 1 shown in Figure 2(a) [...] estimated pseudo-spectrum. [...] are shown in Figure 4(b). It can be clearly seen that the [...] how the total power of the signal is distributed among different frequencies at any given time instant. The popularly used method for this purpose is the short-time Fourier transform (STFT). One weakness of the STFT based method is the high time-frequency resolution tradeoff (Quatieri, 2002; Marks II, 2008; Nam et al., 2010). This tradeoff makes the STFT based methods incapable of achieving high resolution simultaneously in both the time and frequency domains. The reason for this weakness is that a series of segments over a sliding window is used to obtain the STFT of a signal. The second weakness of the STFT based method is that it only indicates the variation of the spectral content with time and is not capable of extracting the different spectral components.
The proposed method, as can be seen from [...] of these weaknesses that exist in the STFT method by identifying the time windows where a particular frequency is present. This is due to the fact that the [...] scheme, as opposed to the sliding window method used [...] spectral components at any given time instant. Since [...] resolution effect does not exist in the proposed method, unlike the STFT based method. Further, this method is also capable of extracting the distinct spectral components as opposed to merely representing their [...] reducing two major drawbacks of the commonly used STFT method.
An important advantage of the proposed method is its capability to perform well even with signals that have spectral components located very near each other. In this scenario, $N_{opt}$ will be auto-adjusted in the training phase [...] located distinct spectral components. Hence, this method can be easily adapted to many types of applications involving signals with different types of spectral properties (biological, acoustic, underwater, etc.).
Another important aspect of the proposed algorithm is its behaviour with signals whose spectral changes occur gradually rather than suddenly. For example, consider synthetic signal no. 2, illustrated in Figure 5(a), which consists of spectral components that change linearly and logarithmically. It can be seen from Figure 5(b) that the ACM order that captures all the spectral components and their corresponding frequencies is 500. After 500, the frequencies of the peaks do not vary for 10 consecutive increments. Hence, 500 is the optimum model order according to the proposed method. Figure 5(b) further shows that some peaks appear and then disappear when the ACM order is in the range of 150 to 370. These occur due to the linear and logarithmic frequency transients.
[...] higher model order that latches on only to the stationary regions of the signal.
The database of signals described in Table 1 and illustrated in Figure 1 has been used to show how [...] proposed methods.
Initially, one signal from each class (each speaker) is used to identify the optimum ACM order that can be used to model the signals in concern. The obtained results are shown in Table 2.
Then the frequencies corresponding to the peaks of the spectrum are recorded for each signal. The recorded [...] many signals of a particular class have peaks at a particular frequency. In other words, a plot with the y-axis indicating the normalised (divided by the total number of signals, 50) number of times a particular spectral content corresponding [...] depicts the generated plots for each class. For example, the green circled marker located at (473.7, 0.5) indicates that 25 (50 × 0.5) signals of the EVP2 class have peaks at 473.7 Hz of the pseudo-spectrum.
These plotted values for each class can be used as primary keys, which distinguish each class of signals (different speakers in this case). Those primary keys can be expressed using a generalised expression as in

$$P_{EVP:k} = \left[\, n_{k,f_1},\; n_{k,f_2},\; \ldots,\; n_{k,f_i},\; \ldots \,\right] \qquad (8)$$

where $n_{k,f_i}$ is the spectral presence of the k-th class at the frequency $f_i$.
These primary keys can be used as a means of distinguishing the class to which a particular signal belongs. This is because the elements of any primary key $P_{EVP:k}$ can be considered as a set of sensors placed at various points ($f_i$ for all i) of the power spectrum. Further, $n_{k,f_i}$ can be considered as a measure of the reliability of the sensor located at $f_i$, since it indicates the probability of a signal of class k having a peak at the frequency $f_i$.
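The construction of a primary key can be sketched in Python as follows. The sketch assumes that the peak frequencies of every training signal have already been mapped onto a common frequency grid, so that exact matching is meaningful; the function name `primary_key`, the grid and the example peak lists are hypothetical and do not come from the speech database.

```python
def primary_key(peak_frequency_lists, frequency_grid):
    """Fraction of signals in one class that have a peak at each grid frequency."""
    total = len(peak_frequency_lists)
    if total == 0:
        raise ValueError("at least one training signal is required")
    key = []
    for f in frequency_grid:
        # Count the signals whose peak list contains this grid frequency
        hits = sum(1 for peaks in peak_frequency_lists if f in peaks)
        key.append(hits / total)
    return key

# Four illustrative training signals of one class (made-up peak frequencies in Hz)
class_peaks = [[473.7, 947.4], [473.7], [947.4], [120.0]]
grid = [120.0, 473.7, 947.4]
key = primary_key(class_peaks, grid)
```

Here two of the four signals have a peak at 473.7 Hz, so the corresponding element of `key` is 0.5, which mirrors the normalised count described in the text.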
After the construction of primary keys for the classes, the next phase is the use of the constructed keys for the [...]. When such an unknown signal is considered, the spectrum of that signal is first estimated using the approach proposed previously in this paper. Here, the optimum model order (2000) selected in the training phase is used for the estimation. Then a vector U [...] (9), containing binary values (1s and 0s) that indicate the presence and non-presence of peaks at $f_i$ (for all i) of the spectrum, is created.
As indicated in equation (9), the vector U comprises binary elements b, which indicate the presence or non-presence of peaks at the frequencies $f_i$ of its estimated spectrum. [...] The reason for the difference of this model could be that the vocal system differs greatly from person to person.
Since the highest required model order is 1930, the optimum model order was selected to be 2000, which is slightly greater than the highest required model order. This selection would ensure that the selected model order accurately models [...]. Initially, each signal of each class is modelled using the determined model order (2000) and their pseudo-spectra are estimated using the previously proposed method. The estimated spectra for one signal from each class are shown in Figure 6 (in blue colour).
It can be seen from Figure 6 (in blue colour) that the estimated frequency spectra contain peaks at various frequencies, which reveal all the frequencies that a particular signal contains. Therefore, the frequencies that correspond to the peaks can be used for the [...] of the class to which a particular signal belongs. Then, all the frequencies whose powers (eigenvalues) are smaller than one-tenth of the maximum power value of that spectrum are neglected. After neglecting those peaks, the pseudo-spectra look as indicated by the red coloured plots in Figure 6, and the word pseudo-spectrum will refer to these plots henceforth. The purpose of doing this [...] eigenvalues, which may correspond to noisy elements.
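The one-tenth rule can be illustrated with a short Python sketch. It assumes the pseudo-spectrum peaks are available as (frequency, power) pairs; the function name `prune_weak_peaks` and the example values are hypothetical.

```python
def prune_weak_peaks(peaks, ratio=0.1):
    """Keep only peaks whose power is at least `ratio` times the maximum power.

    peaks: list of (frequency_hz, power) pairs.
    """
    if not peaks:
        return []
    floor = ratio * max(power for _, power in peaks)
    # Peaks below the floor are neglected, as they may correspond to noise
    return [(f, p) for f, p in peaks if p >= floor]

# Made-up pseudo-spectrum peaks: (frequency in Hz, eigenvalue)
peaks = [(100.0, 10.0), (500.0, 0.5), (1000.0, 1.5), (2000.0, 4.0)]
kept = prune_weak_peaks(peaks)
```

With a maximum power of 10.0 the floor is 1.0, so only the 500 Hz peak (power 0.5) is discarded in this example.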

September 2016
Journal of the National Science Foundation of Sri Lanka 44 (3)

Then, the elements of vector U can be thought of as the binary outputs of the sensors that are placed at various [...] $P_{EVP:k}$. Since the primary key $P_{EVP:k}$ consists of the best set of sensors to identify the signals of class k, the correlation between [...] U of the unknown signal. Therefore, [...] which is most correlated with the vector U. The problem with utilising the dot product is that although it takes the presence of a peak at a particular frequency into consideration, the non-presence, which is indicated by a '0' element in the vector U, is not considered (because the dot product is not affected by zero elements). Therefore, to rectify this problem, 0.5 is subtracted from all the elements of the primary keys as well as from the elements of vector U. This makes both the presence and non-presence of a peak at a particular frequency of the spectrum of the unknown signal useful for the determination of its class.
[...] (10) [...] largest such scalar. Here, the scalars will be in the range of 0 to 1 since normalised vectors are used to obtain the scalars. It can be seen that $S_k$ is the largest scalar for most of the signals of the k-th class. This proves the accuracy and [...]. For example, for the signals of the EVP1 class, $S_1$ is the largest scalar, as depicted in Figure 8(a).
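The matching step can be sketched in Python as follows. The subtraction of 0.5 and the selection of the largest scalar follow the description in the text; the unit-length normalisation used here is an assumption of this sketch (it yields scores between -1 and 1, so the normalisation used in the paper may differ), and the function names, class keys and vector U below are hypothetical.

```python
import math

def centred_similarity(key, u):
    """Normalised dot product of a primary key and U after subtracting 0.5."""
    a = [p - 0.5 for p in key]
    b = [v - 0.5 for v in u]
    norm_a = math.sqrt(sum(v * v for v in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a key of all 0.5s carries no information
    return sum(p * q for p, q in zip(a, b)) / (norm_a * norm_b)

def classify(keys, u):
    """Return (best_class, scores), where best_class has the largest scalar."""
    scores = {name: centred_similarity(key, u) for name, key in keys.items()}
    return max(scores, key=scores.get), scores

# Made-up primary keys for two classes and a binary vector U
keys = {"EVP1": [0.9, 0.1, 0.8], "EVP2": [0.2, 0.7, 0.1]}
u = [1, 0, 1]
best, scores = classify(keys, u)
```

Because of the 0.5 offset, the zero in `u` now contributes positively for a class that rarely has a peak at that frequency, which is the effect the subtraction is intended to achieve.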
[...] class for the proposed approach. It can be seen that the proposed approach is capable of performing a robust [...] signals are highly correlated and noisy. This technique is based on the minimisation of cross-correlation between [...] which correspond to the uncorrelated information. This technique was proven to give highly accurate results for stationary signals. The results obtained by this technique [...] signals) are illustrated in Figure 10. [...] presented in Table 4. According to the table it can be [...] stationary signals is very low compared to the results obtained from the method proposed in this paper, which is more appropriate for non-stationary signals.

PERFORMANCE COMPARISON BETWEEN THE PROPOSED METHOD AND SOME EXISTING METHODS
This section highlights the importance and the superior performance of the proposed method compared to the existing techniques.
Parametric techniques, which are widely used for [...] distinct parameters that are capable of distinguishing signals according to their respective classes. The database of signals given in Table 1 is used to identify [...]. The box plots shown in Figure 9 illustrate the [...] formant frequencies and their corresponding bandwidths) [...] seen from these box plots that all the parameters overlap between different classes of signals and hence these [...] signals accurately. Therefore, it can be concluded [...] of highly non-stationary signals.
In our previous study (Nettasinghe et

Elephant infrasound call detection using the proposed method
In this section, the applicability of the method proposed in this paper to extract hidden signature features from noisy, non-stationary signals is described. The elephant infrasound call detection is used as the primary application for this purpose.
Previous research (Wijayakulasooriya, 2011; Dissanayake et al., 2012) has proven that elephants use low frequency calls for long distance communication. These infrasound calls can be utilised for elephant detection due to a comb-like pattern observed in the time-frequency distribution of these signals. This comb-like pattern, which is illustrated in Figure 11, is a signature of elephant calls. Figure 11 shows that the signature feature of the elephant call is masked by noise. Thus, it is crucial to extract the elephant call from surrounding noise such as other elephant sounds, wind and vehicular noise. The comb-like pattern makes it ideal for applying the proposed method. By applying the proposed method for an acoustic signal [...] can extract [...] signal illustrated in Figure 11 [...] whose center frequencies are matched with the signature [...] the existence of the corresponding center frequency in the signal can be checked. Figure 12 illustrates the [...] clearly see that in all seven outputs [...] a high output power (highlighted in red).
From Figure 12 it is apparent that the signature frequencies of the elephant call are extracted from these [...] utilising the outputs of the selected [...]. This is achieved by setting '1' when the output signal power [...] and '0' when it does not. This binary array is considered as the outputs of seven sensors, which are fused together in order to make a decision on the presence of an elephant.
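A minimal Python sketch of this binarise-and-fuse step is given below. The exact condition for setting '1' and the fusion rule are not recoverable from the text, so a fixed power threshold and a simple majority vote are used here purely as assumptions; the function names, the threshold and the seven power values are hypothetical.

```python
def sensor_outputs(output_powers, threshold):
    """Binarise filter output powers: 1 if the power exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in output_powers]

def fuse(bits, min_active=None):
    """Majority-vote fusion (assumed rule).

    Returns (detected, agreement), where agreement is the fraction of
    sensors whose output matches the decision.
    """
    if min_active is None:
        min_active = len(bits) // 2 + 1
    active = sum(bits)
    detected = active >= min_active
    agreeing = active if detected else len(bits) - active
    return detected, agreeing / len(bits)

# Made-up output powers of seven filters centred at the signature frequencies
powers = [0.9, 0.8, 0.7, 0.95, 0.6, 0.1, 0.85]
bits = sensor_outputs(powers, threshold=0.5)
detected, agreement = fuse(bits)
```

In this example six of the seven sensors output '1', so the sketch reports a detection with an agreement of 6/7; the agreement value plays the role of the number of sensors that match the decision.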

DISCUSSION
When the number of sensors that produce '1' as the output increases, the probability of the decision being a true positive increases, i.e., the accuracy of the decision increases. Hence, the number of sensors that produce the same output as the decision made ('0' or '1', or in other words the presence or non-presence of an elephant) can [...]

CONCLUSION
This paper proposes a novel non-PCA based approach for [...]. This approach generates an information rich comb-like subspace structure utilising an eigen-analysis based pseudo-spectral estimation technique. The optimum model order for the spectral estimation is determined to be the lowest order that captures all the distinct and dominant spectral information.
It is shown that the proposed method is capable of latching only on to the required subspaces located at distinct points of the spectrum, while the commonly used PCA based methods do not guarantee the capturing of all the distinct spectral information. It is further shown that the proposed approach has the capability of reducing two major drawbacks that exist in the STFT based spectral estimation methods, namely, the time-frequency resolution tradeoff and the inability to extract stationary signal components separately.
[...] which is based on the proposed spectral estimation method. The proposed method was tested using a highly non-stationary speech signal database and the results were compared with existing methods such as parametric methods and correlation minimisation methods to highlight its better performance.