The world is full of complex dynamical systems, from climate change and tides to the orbit of the solar system around the galaxy. Our lives are full of them too: fluctuations of the economy, local and global trade, social dynamics, and consumer behavior are all dynamical systems, as are our heartbeats, blood pressure, and a myriad of other vital signs and variables related to individual health.

Dynamical systems can be described precisely by theoretical models based on differential equations, but usually all we have is a collection of time series. The complexity of natural processes makes it difficult to build good models that describe and analyze such chains of events, which are often chaotic, noisy, or even seemingly random. We have to draw on sophisticated data tools to transform that heap of data into useful information.

In this article, I will discuss two of those tools, artificial neural networks and Recurrence Quantification Analysis (RQA), and how you can combine them to achieve some interesting results. Then I’ll demonstrate how you can use R, a programming language for statistical analysis, and some related libraries to analyze complex real-world data sets with neural networks and RQA.

## Artificial Neural Networks

Neural networks are popular tools that are widely used in artificial intelligence (AI) and machine learning (ML) tasks like pattern recognition. Each neuron in a neural network can be seen as a kind of regression: it computes a weighted sum of the output values of the neurons from which it takes inputs and, typically, passes the result through a nonlinear activation function.

The neurons are distributed in groups called layers, with all the outputs of one layer connected to the inputs of each neuron in the next layer. The data to analyze are fed to the inputs of the first layer, and the outputs of the last layer are the response from the neural network model to that input data.
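As a minimal sketch of the two paragraphs above, the following base R code computes the response of a tiny fully connected network. The layer sizes, the random weights, and the logistic activation are illustrative assumptions; they are not the configuration used later in this article.

```r
# Logistic activation applied element-wise (an illustrative choice)
logistic <- function(z) 1 / (1 + exp(-z))

# One fully connected layer: every neuron takes a weighted sum of all
# the inputs plus a bias, then applies the activation function
layer_forward <- function(x, W, b, act = logistic) {
  act(as.vector(W %*% x + b))
}

set.seed(1)
x  <- c(0.5, -1.2, 0.3)                # three input values
W1 <- matrix(rnorm(4 * 3), nrow = 4)   # hidden layer: 4 neurons, 3 inputs each
b1 <- rnorm(4)
W2 <- matrix(rnorm(1 * 4), nrow = 1)   # output layer: 1 neuron, 4 inputs
b2 <- rnorm(1)

hidden <- layer_forward(x, W1, b1)      # outputs of the hidden layer
output <- layer_forward(hidden, W2, b2) # response of the network
print(output)
```

Training a network means adjusting the weights and biases so that the output matches known target values, which is what the library used in the sample code does for you.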

You can use a neural network as a tool to predict the evolution of a time series, but the results obtained with complex time series are usually unsatisfactory. The accuracy of the prediction decreases quickly the further you look into the future. This is due to the nature of the series; it is an intrinsic property of chaotic and complex dynamics. Still, you can make an informed guess about the future evolution of a system represented by a time series if you are able to observe certain changes, often subtle, in the dynamics of the series.
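This loss of predictability can be illustrated with a toy chaotic system. The sketch below iterates the logistic map, chosen here only as a familiar textbook example (it is not part of the data analyzed in this article), from two starting values that differ by 1e-8, and prints how far apart the two trajectories are at a few steps.

```r
# Iterate the logistic map x[i] = r * x[i-1] * (1 - x[i-1]); r = 4 is a chaotic regime
logistic_map <- function(x0, n, r = 4) {
  x <- numeric(n)
  x[1] <- x0
  for (i in 2:n) x[i] <- r * x[i - 1] * (1 - x[i - 1])
  x
}

a <- logistic_map(0.3, 60)
b <- logistic_map(0.3 + 1e-8, 60)   # almost identical starting point
gap <- abs(a - b)                   # distance between the two trajectories

print(gap[c(1, 10, 20, 30, 40, 50)])
```

The tiny initial difference grows until both trajectories are unrelated, so any small error in a model or in the measurements eventually ruins a long-term point forecast.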

The problem is that you may need to process hundreds, or even thousands, of points to characterize the changes in the dynamics of the series. It is similar to what happens in image analysis, where a large amount of data sometimes has to be processed in order to identify objects in a picture. Some sort of procedure to summarize the data would be highly useful, and that is precisely what RQA does.

## Recurrence Quantification Analysis

RQA is an alternative, less well-known method you can employ for the nonlinear analysis of data from dynamical systems. Suppose that you have a time series describing the evolution of a population of lynx over a given number of years. You can postulate that, if there is a lynx population, there must be somewhere, in parallel, a hare population with its corresponding time series. The time series of the lynx population is one dimension of the whole system, but there can be more hidden dimensions, like that of the hares, and perhaps a carrot population, too. If you can properly reconstruct the D dimensions of the system, you might be able to discover some hidden properties of the dynamics that are not noticeable in the original one-dimensional time series. This number D is the *embedding dimension* of the system.

Although you only have one series, you can approximately reconstruct the whole system from it. The idea is to use, for each hidden dimension, a delayed copy of the original time series: if the delay is d, the first dimension begins at position 0 of the original series, the second at position d, the third at 2d, and so on. This gives us a sequence of points in a D-dimensional space, each taking one coordinate from each delayed series. The trajectory traced by these points approximates the *attractor* of the system (for example, the Lorenz attractor of the butterfly effect).
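The reconstruction can be sketched in a few lines of base R. The function name delay_embed and the sine wave used as input are hypothetical, introduced only for this illustration; note that R indexes from 1, so position 0 in the text corresponds to index 1 in the code.

```r
# Build the delay-embedding matrix: row k is the point
# (x[k], x[k + d], ..., x[k + (D - 1) * d]) in D-dimensional space
delay_embed <- function(x, D, d) {
  n <- length(x) - (D - 1) * d           # number of points that can be built
  stopifnot(n > 0)
  idx <- outer(seq_len(n), (0:(D - 1)) * d, "+")  # index of every coordinate
  matrix(x[as.vector(idx)], nrow = n)
}

x   <- sin(seq(0, 8 * pi, length.out = 200))   # toy one-dimensional series
pts <- delay_embed(x, D = 3, d = 5)            # embedding dimension 3, delay 5
print(dim(pts))
print(head(pts, 3))
```

Each column of the result is the original series shifted by a multiple of the delay, and each row is one point of the reconstructed trajectory.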

In chaotic dynamics, that trajectory comes and goes, repeatedly visiting the neighborhood of points already visited but following slightly different paths. When the trajectory, at instant j, passes near a point already visited at instant i (at a distance less than a given value), this is called a *recurrence* of both points. You can build an m × m matrix using a chunk of m points of the series (the *window*). Each element (i, j) of the matrix is set to 1 if the points at indexes i and j of the series are recurrent, and to 0 if they aren’t. Along with the embedding dimension and the delay, this distance, or *radius*, is therefore a third important parameter.
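A recurrence matrix of this kind can be sketched with base R alone. The helper recurrence_matrix, the Euclidean distance, and the toy two-dimensional embedding of a sine wave are assumptions made for the illustration; RQA libraries offer more options (other distance measures, rescaling, and so on).

```r
# Returns an m x m matrix holding 1 where two points are closer than `radius`
recurrence_matrix <- function(pts, radius) {
  dists <- as.matrix(dist(pts))   # pairwise Euclidean distances between rows
  R <- (dists < radius) * 1L      # 1 = recurrent, 0 = not recurrent
  dimnames(R) <- NULL
  R
}

x   <- sin(seq(0, 4 * pi, length.out = 50))
pts <- cbind(x[-length(x)], x[-1])          # 2-D embedding with delay 1 (49 points)
R   <- recurrence_matrix(pts, radius = 0.2)

print(dim(R))
print(sum(R) / length(R))   # fraction of recurrent pairs, main diagonal included
```

The matrix is symmetric, and its main diagonal is always 1 because every point is trivially recurrent with itself.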

Once such a matrix is built, some measurements of the arrangement of its elements can be made, such as the number and length of vertical, horizontal, and diagonal lines and their relative abundance. The most common measures are:

- the Recurrence Rate (RR), or the percentage of 1’s in the matrix (recurrent points)
- the Determinism (DET), or the percentage of recurrent points forming diagonal lines
- Lmax, or the length of the longest diagonal line
- L, or the average diagonal line length
- the Entropy (ENTR), or the Shannon entropy of the probability distribution of the diagonal line lengths
- the Laminarity (LAM), like DET but for vertical lines
- the Trapping Time (TT), or the average length of the vertical lines
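To make the diagonal-line measures concrete, here is a simplified sketch that computes RR, DET, Lmax, and L from a small recurrence matrix. The function names and the decision to ignore the main diagonal are assumptions of this example; conventions differ between implementations, so the numbers it produces need not match those of an RQA library.

```r
# Lengths of all runs of consecutive 1s in a 0/1 vector
run_lengths <- function(v) {
  r <- rle(v)
  r$lengths[r$values == 1]
}

rqa_measures <- function(R, minline = 2) {
  m <- nrow(R)
  off <- row(R) != col(R)          # ignore the trivial main diagonal
  n_rec <- sum(R[off])             # recurrent points off the main diagonal
  diag_id <- row(R) - col(R)       # identifies the diagonal of each cell
  # Line lengths along every diagonal except the main one
  lens <- unlist(lapply(setdiff(-(m - 1):(m - 1), 0),
                        function(k) run_lengths(R[diag_id == k])))
  lens <- lens[lens >= minline]    # shorter runs do not count as lines
  c(RR   = 100 * n_rec / sum(off),
    DET  = if (n_rec > 0) 100 * sum(lens) / n_rec else NA,
    maxL = if (length(lens) > 0) max(lens) else 0,
    L    = if (length(lens) > 0) mean(lens) else NA)
}

# Toy 4 x 4 matrix with 1s on the main diagonal and on its two neighbors
R <- (abs(outer(1:4, 1:4, "-")) <= 1) * 1L
res <- rqa_measures(R)
print(res)
```

In this toy matrix, all the off-diagonal recurrent points lie on two diagonal lines of length 3, so every recurrent point belongs to a line.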

The point is that you can summarize hundreds, or even thousands, of points of a time series using only about 10 numbers. You can then work with those few values instead of the original time series.

There is a growing research field using RQA to work with complex signals, especially in the analysis of physiological signals, where scientists are beginning to identify subjects’ moods just by analyzing electroencephalogram (EEG) and electrocardiogram (ECG) signals.

## Sample code

Let’s put it all together with an example. The R environment is a free tool for statistical computing, with its own programming language and hundreds of specialized libraries, including libraries for data analysis with neural networks and RQA. I’ll use these resources to quickly develop a program that can tell two different ECG data sets apart, using a neural network fed with the RQA values obtained from the ECG signals.

First, we need some data. PhysioNet is a website where you can find abundant medical recordings, such as the EEG During Mental Arithmetic Tasks data set. Download the files Subject00_1.edf and Subject01_1.edf, which contain the EEG and electrocardiogram (ECG) recordings of two individuals.

Now, we have to load some R libraries. Install them from the Comprehensive R Archive Network (CRAN) if needed:

    library(edfReader)
    library(crqa)
    library(RSNNS)

The first package allows you to load .edf files. The second one is for the RQA calculations. The last builds the neural network.
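If you are not sure whether these packages are already installed, a check along the following lines can be run beforehand. It is only a sketch: the variable names are mine, and the installation call is left commented out so that nothing is downloaded unless you decide to.

```r
required <- c("edfReader", "crqa", "RSNNS")

# requireNamespace() returns FALSE, without raising an error,
# when a package is not installed
installed  <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
to_install <- required[!installed]
print(to_install)

# Uncomment to install whatever is missing from CRAN:
# install.packages(to_install)
```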

Use the following lines of code to load the data containing the time series:

    h1 <- readEdfHeader("Subject00_1.edf")
    s1 <- readEdfSignals(h1)
    h2 <- readEdfHeader("Subject01_1.edf")
    s2 <- readEdfSignals(h2)

You can explore the content of the data and the first 1000 samples of the ECG as follows:

    summary(s1)
    plot(s1$"ECG ECG"$signal[1:1000], type = "l")

The idea is to cut the ECG signal into chunks of 1000 points each, calculate the RQA values for each data chunk, then store the results in a matrix. To facilitate the task, we can write a function like this one:

    rqadata <- function(data, l, clsval) {
      # Number of complete chunks of length l (a trailing partial chunk is dropped)
      nc <- floor(length(data) / l)
      vrqa <- NULL
      for (j in seq_len(nc)) {
        chunk <- data[(1 + (j - 1) * l):(j * l)]
        # Auto-recurrence: the chunk is compared with itself
        rqa <- crqa(chunk, chunk, delay = 1, embed = 2, radius = 0.1,
                    normalize = 0, rescale = 0, mindiagline = 2,
                    minvertline = 2, side = "lower")
        # One row per chunk: nine RQA measures followed by the class identifier
        vrqa <- rbind(vrqa, c(rqa$RR, rqa$DET, rqa$NRLINE, rqa$maxL, rqa$L,
                              rqa$ENTR, rqa$rENTR, rqa$LAM, rqa$TT, clsval))
      }
      return(vrqa)
    }

The parameter data contains the time series, l is the chunk length, and clsval is the identifier of the individual. crqa is the function used to calculate the RQA measurements for the time series. Be patient; the process is not fast.
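The index arithmetic used to cut the series into chunks can be checked in isolation. The helper chunk_bounds below is hypothetical and only lists where each chunk starts and ends; it uses integer division so that an incomplete trailing chunk is ignored, which is an assumption about the desired behavior.

```r
# Start and end index of every complete chunk of length l in a series of n points
chunk_bounds <- function(n, l) {
  nc <- n %/% l                       # number of complete chunks
  j  <- seq_len(nc)
  data.frame(chunk = j,
             start = 1 + (j - 1) * l,
             end   = j * l)
}

bounds <- chunk_bounds(3500, 1000)    # the last 500 points are left out
print(bounds)
```

With the 90000 points and the chunk length of 1000 used in this article, the same arithmetic yields 90 chunks per subject.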

The next step is building a matrix with the RQA data samples for both ECGs and preparing it to train a neural network for the task of differentiating between them.

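    # 90 chunks of 1000 points per subject; class 0 for the first one, 1 for the second
    valrqa <- rqadata(s1$"ECG ECG"$signal[1:90000], 1000, 0)
    valrqa <- rbind(valrqa, rqadata(s2$"ECG ECG"$signal[1:90000], 1000, 1))
    # Standardize every column (the class column included)
    svalrqa <- scale(valrqa)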
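One detail worth noticing is that scale() standardizes all ten columns, including the one holding the class identifier. The sketch below reproduces only that column, assuming 90 samples per subject as in the code above, to show that the classes 0 and 1 become a negative and a positive value, which is why sign() can later be used to recover them.

```r
labels <- c(rep(0, 90), rep(1, 90))   # class column before scaling
scaled <- as.vector(scale(labels))    # subtract the mean, divide by the standard deviation

print(unique(round(scaled, 3)))       # two values, symmetric around zero
print(table(sign(scaled)))            # 90 samples on each side
```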

The network in this case is a simple multilayer perceptron. Use, for instance, 80% of the samples to train the network; you can then check its accuracy with the remaining data.

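    # Randomly pick 80% of the rows for training; the rest are held out for testing
    train <- sample(1:nrow(svalrqa), 0.8 * nrow(svalrqa), FALSE)
    X <- svalrqa[train, 1:9]    # the nine RQA measures
    Y <- svalrqa[train, 10]     # the class of each sample
    fitMLP <- mlp(x = X, y = Y, size = c(8, 5), maxit = 5000,
                  learnFuncParams = c(0.01, 0), linOut = TRUE)
    predMLP <- sign(predict(fitMLP, svalrqa[-train, 1:9]))
    table(predMLP, sign(svalrqa[-train, 10]), dnn = c("Predicted", "Observed"))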
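Because sample() draws the training rows at random, every run produces a different split. If you want repeatable results, one option is to fix the random seed first. The sketch below shows only the split, on row indexes; the seed value is arbitrary, and the row count of 180 assumes the 90 chunks per subject used above.

```r
set.seed(123)                          # arbitrary seed, chosen only for illustration
n_rows  <- 180                         # 90 chunks per subject, two subjects
n_train <- round(0.8 * n_rows)         # 80% of the samples for training

train_idx <- sample(seq_len(n_rows), n_train)        # rows used for training
test_idx  <- setdiff(seq_len(n_rows), train_idx)     # rows held out for testing

print(c(train = length(train_idx), test = length(test_idx)))
```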

This is the result when the network is asked to tell the two subjects apart using only the samples that were not used to train it:

             Observed
    Predicted -1  1
           -1 19  0
           1   0 17

The value -1 represents one of the individuals, and 1 represents the other. As you can see, the neural network has correctly classified the ECG samples of both subjects, 19 samples from subject -1 and 17 samples from subject 1, for a success rate of 100%. Because the training samples are drawn at random, the exact counts will differ from run to run.
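The success rate can be read directly from such a table: it is the sum of the main diagonal divided by the total number of test samples. A short sketch, with the counts reported above typed in by hand:

```r
# Confusion table from the run reported above (rows: predicted, columns: observed)
conf <- matrix(c(19, 0, 0, 17), nrow = 2,
               dimnames = list(Predicted = c("-1", "1"),
                               Observed  = c("-1", "1")))

accuracy <- sum(diag(conf)) / sum(conf)   # correctly classified / all test samples
print(conf)
print(accuracy)
```

The 36 test samples are the 20% of the 180 rows that were held out from training.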