Synergistic Market Analysis with Neural Networks
The remainder of the chapter will examine the application of neural networks to Synergistic Market Analysis for financial forecasting in today's globalized trading environment. Benefits that make neural networks attractive to financial analysts and traders will be highlighted, and critical issues in the application of this technology to financial forecasting in the 1990s will be discussed. Emphasis will be on price and trend forecasting utilizing market data from a target market, various related intermarkets, and fundamental inputs (see Figure 2). Where appropriate, I will discuss how other artificial intelligence technologies, such as expert systems and genetic algorithms, can be utilized in conjunction with neural networks to create hybrid information and trading systems.

Criteria for selection of appropriate neural network paradigms, architectures, and training regimens for this type of application will be discussed. Throughout the chapter, where appropriate, I will refer to research my firm has conducted in the development of its artificial intelligence software program, VantagePoint, which was commercially introduced in 1991. VantagePoint is a fully developed, pre-trained system of five neural networks in a hierarchical arrangement that implement Synergistic Analysis by combining technical, intermarket, and fundamental data for the purpose of predicting price and trend directions for various financial futures markets, including Treasury bonds, the S&P 500 stock index, and several currencies. For those interested in developing their own neural network applications, suggestions will be offered on how to avoid potential pitfalls to successful neural network design.

ARTIFICIAL NEURAL NETWORKS
Artificial neural networks are information processing models which attempt to mimic how the human brain processes information. Neural networks utilize a distributed processing approach to computation, in which many simple processing elements, or neurons, communicate through a network of interconnected links with associated variable weights. Information is stored in the network as a pattern of these weights, and new information is incorporated by changing them. Neural networks are trained to behave in a desired fashion. As with humans, neural networks are capable of learning behaviors by being exposed to examples of those behaviors. Additionally, neural networks are capable of generalizing to related but previously unseen behaviors.

Neural networks are capable of solving a wide variety of problems by "learning" mathematical models. If data can be represented numerically, it can be used as inputs into a neural network. Therefore, technical and fundamental data related to a specific market or asset class, as well as related intermarket information, can be incorporated as inputs into neural networks.

In the remainder of this chapter, various aspects of neural network development for financial forecasting will be discussed. The topics include:

  • Paradigms

  • Architecture

  • Input data selection

  • Preprocessing input data

  • Fact selection

  • Training and testing

  • Implementation

In each section I will discuss basic decisions that must be made and highlight potential pitfalls that should be avoided.

Paradigms
Neural networks are being applied in a broad range of industries to solve many different categories of problems. These include classification, filtering, auto-association, pattern association, optimization, data compression, and prediction. Thus, before beginning development of a neural network application, the problem category must be identified. The proper choice of neural network paradigm for a specific application is highly dependent on the problem definition.

For example, paradigms like differential competitive learning and counter-propagation can be used for data clustering tasks, while the Hopfield network and brain state in a box paradigms can be used for auto-association, filtering, and pattern association. Each paradigm has numerous variations, depending on how parameters associated with it are chosen. There is no holy grail in picking a paradigm. It is more important to have a clear problem definition and match an appropriate paradigm to it. Since this chapter will focus on financial forecasting, the problem domain of interest falls into the prediction category. Two of the most widely used paradigms in financial analysis and forecasting are recurrent back-propagation networks and feed-forward back-propagation networks.

Recurrent Back-Propagation Networks
Recurrent back-propagation networks are particularly useful when working with time-series data. This type of network consists of a single functional layer of neurons that are fully connected to themselves via a time delay. Figure 3 shows a two-layer representation to make the architecture easy to visualize. Note that each neuron in the first layer is fully-connected to each neuron in the second layer. The neurons in the second layer feed back with a one-to-one mapping into the first layer. The second layer represents a time delay for the passage of data through the network. This type of architecture allows the network to learn temporal relationships.

Since the network can feed back upon itself, temporal information is learned as a result of the sequential order in which the facts are presented. There is no need to encode the temporal relationship into the input data. This alleviates the need for one of the major preprocessing steps associated with designing feed-forward back-propagation networks.

Feed-forward Back-Propagation Networks
A feed-forward network that trains through back-propagation of error throughout a multi-layered network is commonly referred to as a back-propagation, or back-prop, network. This is perhaps the most popular network paradigm for financial market analysis. A typical back-prop network architecture is shown in Figure 4. The primary functional difference between feed-forward and recurrent back-prop networks is that a feed-forward network is generally not designed to directly handle temporal relationships. Thus, time (temporal relationships) must typically be encoded into the "facts" presented to the network. To accomplish this, a technique, sometimes referred to as taking a "snapshot" of the data, is used to convert time-series data into a format necessary for training a feed-forward back-prop network.

For example, to present facts that contain the differences in the weekly close for the past five weeks, a snapshot of the data must be created by constructing a fact with an input vector containing five values (one for each difference) and an output for the next week. This would be done for each fact-week to be presented to the network, effectively encoding the temporal information (data from the last five weeks) into the facts themselves.
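As a minimal sketch of the snapshot technique, the Python below converts a list of weekly closes into facts with five inputs and one output. The function name, the sample prices, and the choice of the following week's difference as the output are illustrative assumptions, not details taken from VantagePoint.

```python
def make_snapshot_facts(closes, window=5):
    """Turn weekly closes into (inputs, output) facts.

    Each fact holds `window` consecutive week-to-week differences as inputs
    and the following week's difference as the output (an assumed target).
    """
    # Week-to-week changes in the close.
    diffs = [later - earlier for earlier, later in zip(closes, closes[1:])]
    facts = []
    for start in range(len(diffs) - window):
        inputs = diffs[start:start + window]   # the past `window` differences
        output = diffs[start + window]         # the next week's difference
        facts.append((inputs, output))
    return facts


# Hypothetical weekly closes, for illustration only.
weekly_closes = [100.0, 101.5, 101.0, 102.25, 103.0, 102.5, 104.0, 103.75]
facts = make_snapshot_facts(weekly_closes)
for inputs, output in facts:
    print(inputs, "->", output)
```

Each fact overlaps the previous one by four weeks, which is how the temporal ordering ends up encoded in the input vectors themselves.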

In the case of a recurrent net, by contrast, each fact would be presented sequentially as a single difference. Since for every recurrent network there is a corresponding feed-forward network that can be designed with identical behavior, and since most commercially available neural network development packages do not include a recurrent model, the remainder of this chapter will use a feed-forward back-propagation network for illustrative purposes.

This type of network learns by being given examples of inputs and expected outputs. To accomplish this, the network computes an error measure between its generated output and the desired output. This is calculated for each output in the output layer. The error is typically averaged over the entire set of facts, then propagated backward through the network, layer-by-layer, to be utilized in altering the weight connections between neurons.

During training, facts are presented to the network repeatedly until the error associated with the set of outputs is reduced to an acceptable level. Error can be judged by a variety of metrics, some of which will be discussed later in the chapter. For simple problems, the error level may be reduced to zero, but in most real world financial applications this is an unrealistic goal.
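The training cycle described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the procedure used in any commercial package: it uses one hidden layer, the logistic transfer function, a squared-error measure averaged over all facts, and a bias weight on each neuron. The bias weights, the learning rate, the hidden layer size, and the toy facts are all assumptions introduced for the example.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_backprop(facts, n_hidden=3, learning_rate=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer net; return the average error per epoch."""
    rng = random.Random(seed)
    n_in = len(facts[0][0])
    # Small random initial weights; the last weight in each row is a bias (assumed).
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(epochs):
        grad_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        grad_out = [0.0] * (n_hidden + 1)
        total_error = 0.0
        for inputs, target in facts:
            # Forward pass: weighted sums passed through the transfer function.
            x = list(inputs) + [1.0]
            hidden = [logistic(sum(w * v for w, v in zip(ws, x))) for ws in w_hidden]
            h = hidden + [1.0]
            output = logistic(sum(w * v for w, v in zip(w_out, h)))
            error = output - target
            total_error += error ** 2
            # Backward pass: propagate the error from the output layer back.
            delta_out = error * output * (1.0 - output)
            for j in range(n_hidden + 1):
                grad_out[j] += delta_out * h[j]
            for j in range(n_hidden):
                delta_h = delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                for i in range(n_in + 1):
                    grad_hidden[j][i] += delta_h * x[i]
        # Average the accumulated gradients over the fact set, then adjust weights.
        n = len(facts)
        for j in range(n_hidden + 1):
            w_out[j] -= learning_rate * grad_out[j] / n
        for j in range(n_hidden):
            for i in range(n_in + 1):
                w_hidden[j][i] -= learning_rate * grad_hidden[j][i] / n
        history.append(total_error / n)
    return history


# Toy facts (two inputs, one output), for illustration only.
toy_facts = [((0.0, 0.0), 0.1), ((0.0, 1.0), 0.5),
             ((1.0, 0.0), 0.5), ((1.0, 1.0), 0.9)]
history = train_backprop(toy_facts)
print("first epoch error:", history[0])
print("last epoch error: ", history[-1])
```

In practice training would stop once the error reaches an acceptable level rather than after a fixed number of epochs.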

Architecture
Some of the decisions which must be made in reference to network architecture are:

  • What transfer function should be used?

  • How many inputs should the network have?

  • How many hidden layers should the network have?

  • How many hidden neurons per hidden layer?

  • How many outputs should the network have?

Back-prop networks are composed of an input layer, one or more "hidden" layers and an output layer. The hidden layers, separating the input and the output layers, are so named because they are not directly accessible to the network's user. This presentation, as represented in Figure 4, assumes that the layers are fully-connected. This means that each neuron in the input layer has a connection to each neuron in the hidden layer, with similar connections between the neurons in the hidden and output layers.

The standard model is relatively simple. For each individual neuron (See Figure 5), input data (I0-In) is multiplied by the weight (W0-Wn) associated with the connection to the neuron. The products are summed, with the result passed through a transfer function that maps the sum to a value in a specified interval, e.g. between zero and one. The output from each neuron is then multiplied by another weight and fed into the next neuron. If the neuron is in the output layer, as is the case with the example in Figure 4, then the output is not multiplied by a weight but is instead the network's output.
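The per-neuron computation just described can be sketched in a few lines of Python. The weights and inputs below are made-up values, and the helper names are hypothetical; the sketch only shows the multiply, sum, and transfer steps for a small fully-connected net with three inputs, two hidden neurons, and one output.

```python
import math


def logistic(x):
    """Transfer function mapping any sum into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, transfer=logistic):
    # Multiply each input by its connection weight, sum, then apply the transfer function.
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return transfer(weighted_sum)


def layer_output(inputs, weight_rows, transfer=logistic):
    # Fully-connected layer: every neuron sees every input.
    return [neuron_output(inputs, row, transfer) for row in weight_rows]


# Illustrative values only.
inputs = [0.2, 0.7, 0.1]
hidden_weights = [[0.5, -0.4, 0.3], [-0.2, 0.1, 0.6]]
output_weights = [[0.8, -0.5]]

hidden = layer_output(inputs, hidden_weights)
network_output = layer_output(hidden, output_weights)[0]
print(network_output)
```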

Transfer Functions
The transfer function, as mentioned previously, maps an individual neuron's inputs to an output. The neuron's input signals are multiplied by their respective weights, summed, and then mapped via the transfer function to an output. In general, the transfer function in a back-prop network should be a nonlinear, continuously differentiable function. This allows the network to perform nonlinear statistical modeling, needed to address the non-linearity associated with the financial markets. Therefore, it would be inappropriate, when modeling nonlinear systems, to use a function which represents a line with a constant slope, or a discontinuous function, like those shown in Figure 6.

The most commonly used nonlinear transfer functions include the logistic function (an example of which can be seen in Figure 5) and the tanh function, also known as the hyperbolic tangent function. Both functions are very similar, being of sigmoidal shape. The logistic function varies in height from zero to one, whereas the tanh function ranges from minus one to one. In fact, as is shown below, the tanh function is just the logistic function rescaled to the appropriate range.

The logistic function is sometimes referred to as the sigmoid function. Strictly speaking, this is not correct. "Sigmoid" refers to the shape of the function's curve when plotted. Many functions have a sigmoidal shape. The logistic function takes its name from the "logistic equation" found in population biology. This equation models the evolution of a population density over time:

dU/dt = rU(1 - U/k)

Where U is the population density of a species, r is the birth rate of the species, k is the "carrying capacity" of the environment, and t is time. The left-hand side is the time derivative of the population density (the "velocity" of the change in magnitude of the population density). When U = 0 or U = k, the right-hand side is zero. These are the "steady states" of the equation. When U is 0 or k the population does not change with time. However, U = 0 is an "unstable steady state" which means that if the population is perturbed from U = 0 (i.e. U is given a small positive value much less than k), then the population will grow until it reaches U = k. Conversely, U = k is a stable steady state; if U is perturbed from k, the population will shrink or expand back to U = k.

In the study of neural nets, one usually uses k = 1. This gives the familiar sigmoidal curve which ranges from zero to one. The value of r determines the "steepness" of the ascent to one. Some popular neural net software packages let the user choose any nonzero value for r; others simply assign r = 1. In addition, one usually denotes the independent variable by x instead of t.

The logistic function, with r = k = 1 as described above, can be written explicitly in closed form:

f(x) = 1/(1 + e^(-x))

and the tanh function can be written as:

g(x) = (e^x - e^(-x))/(e^x + e^(-x))

One can easily see that the tanh function is just the logistic function transformed by stretching the curve vertically by a factor of two, compressing it horizontally by a factor of two, and sliding the resulting curve down so that it ranges from minus one to one. That is:

g(x) = 2f(2x) - 1 or f(x) = 0.5[1 + g(0.5x)]

Put simply, if one is subtracted from two times the logistic function, the result is the same as the tanh function with half the input.
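The two identities can be checked numerically. The short Python sketch below takes f to be the logistic function with r = k = 1 and g to be the hyperbolic tangent, and measures how far each identity is from holding at a handful of sample points; the helper name and the sample points are arbitrary choices for illustration.

```python
import math


def f(x):
    """Logistic function with r = k = 1."""
    return 1.0 / (1.0 + math.exp(-x))


def g(x):
    """Hyperbolic tangent."""
    return math.tanh(x)


def max_identity_error(points):
    """Largest deviation from either identity over the given points."""
    worst = 0.0
    for x in points:
        worst = max(worst, abs(g(x) - (2.0 * f(2.0 * x) - 1.0)))
        worst = max(worst, abs(f(x) - 0.5 * (1.0 + g(0.5 * x))))
    return worst


sample_points = [-3.0, -1.0, 0.0, 0.5, 2.0]
print(max_identity_error(sample_points))  # differs from zero only by rounding error
```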

Either function can be used effectively in a back-prop network. However, there is some debate over which of these transfer functions allows for faster training. Empirically, in limited experimentation, we have found that the tanh function trains slightly faster than the logistic function. Intuitively plausible reasons why this might be so have been reported by other researchers.

Layers and Neurons
In addition to the transfer function, the number of layers and neurons per layer in the network must be selected. In the case of the input and output layers, this is straightforward. For example, in order to predict the change in the close for a particular futures contract based on a 20-day moving average of the close, a 5-day moving average of the high and a 5-day moving average of the low, the net would need three input neurons and one output neuron. An example of a net that would fit this description was depicted in Figure 4.

For a nonlinear problem, like predicting prices of a stock or commodity, the network would require at least one hidden layer. There are no standard rules for determining the best number of hidden layers or hidden neurons in a back-propagation network. While a back-prop net can have more than one hidden layer, only one is theoretically necessary to approximate any nonlinear function's input-to-output mapping. This is not to imply that a network with a single hidden layer is always best. As is the case with other architectural decisions, the best configuration for a specific network is found through trial and error, which often requires extensive experimentation. Typically, more complex problems require a larger number of hidden neurons. However, too many hidden neurons may cause a network to be "overfitted" to the training data, with poor performance on previously unseen facts.

Determining the number of hidden neurons is more art than science. One needs to train numerous nets, varying the number and size of hidden layers to get a feel for their effect on network performance. Automation of this process is necessary, since optimizing network architectures can be time-consuming, due to the large parameter space that is involved.

The more task-specific a neural network is, the more easily it can be trained. For instance, instead of designing one Treasury Bond network with two outputs, the next week's high and the next week's low, it may be preferable to design two independent networks, each with its own output.

One thing to remember when developing neural networks is that no single design decision, such as selecting the number of hidden layers or number of output neurons, determines how well the network will perform. Data selection and quality, data preprocessing techniques, optimization of training parameters, and testing protocols all affect network performance. The remainder of this chapter will examine these issues in more detail, and will illustrate real world examples to highlight common pitfalls that can arise at each stage in the development of a neural network.

Input Data Selection
The decisions that must be made during the input data selection phase of network development include:

  • What is the problem domain?

  • What market theory is subscribed to?

  • What are the candidate input sources?

  • Should the candidate input sources be technical, fundamental, intermarket, or a combination of the three?

Data selection involves the identification of appropriate input data sources for a network. This is a demanding task which must be performed judiciously to avoid the "garbage-in, garbage-out" phenomenon often associated with computers. A neural network's performance is dependent on the quality and relevance of its input data. Failure to include important data inputs can have a deleterious effect on the network's performance. Without a solid understanding of the problem domain, the likelihood of developing a successful neural network application is considerably lessened. When selecting input data for a financial neural network application, the developer must be cognizant of the implications of following a specific market theory. I posit that today's global markets are nonlinear, and possibly chaotic in nature, and that market inefficiencies exist, can be discerned quantitatively, and persist long enough for traders to profit from them.

One's analytic perspective on the markets clearly influences the selection of input data. Technical analysis would suggest the use of only single-market technical data as inputs. Fundamental analysis focuses on data that reflects macroeconomic factors such as economic reports that have an effect on the target market. In today's globalized markets, neither approach by itself is sufficient for financial forecasting. Instead, Synergistic Market Analysis, through the use of neural networks, combines both technical and fundamental analysis with intermarket analysis. The result is a multidimensional quantitative framework which overcomes the narrowness of single-market analysis and the limitation of having to interpret intermarket relationships subjectively through visual analysis of price charts, or through linear correlation analysis. Use of multiple data inputs reflecting a broad range of interdependent and interrelated global markets allows patterns and relationships in the data to be discerned, often before they become so overt that they are discounted through the price discovery process.

Now, let us look at an example of a neural network designed to implement SMA in predicting the following week's high and low for the Treasury bond market. First, technical price data on Treasury bonds must be included as data inputs into the network. This makes general patterns and characteristics of the bond market apparent. Additionally, related fundamental information, such as the federal funds rate, should be included as inputs into the network.

Intermarket inputs can be identified through the application of various statistical analysis tools which determine correlations between data. Research involving sensitivity analysis, in which data inputs are varied, can be conducted to find the best mix of intermarket and fundamental data to use as inputs.
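One simple statistical screen, sketched below in Python, ranks candidate markets by the strength of their correlation with the target market. The Pearson coefficient used here captures only linear relationships, so it is at best a first filter before sensitivity analysis. The series names and values are invented for illustration, and nothing here is taken from the tools actually used in this research.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_candidates(target, candidates):
    """Sort candidate series by absolute correlation with the target, strongest first."""
    scores = {name: pearson_correlation(target, series)
              for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical weekly changes, for illustration only.
target_changes = [0.5, -0.2, 0.3, -0.4, 0.1]
candidate_changes = {
    "market_a": [-1.0, 0.4, -0.6, 0.8, -0.2],   # moves opposite to the target
    "market_b": [0.1, 0.1, -0.3, 0.2, 0.0],
}
ranked = rank_candidates(target_changes, candidate_changes)
for name, score in ranked:
    print(name, round(score, 3))
```

Ranking by absolute value keeps strongly inverse relationships, which can be as informative to a network as positive ones.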

For instance, in the case of VantagePoint, single-market technical input data from the Treasury bond market, including open, high, low, close, volume, and open interest, is utilized. In addition, similar technical data from eight related intermarkets (the CRB Index, deutsche mark, Eurodollar, US dollar index, Japanese yen, S&P 500 stock index, crude oil, and gold) is utilized along with the daily Fed funds rate. Other VantagePoint systems incorporate intermarkets such as the FTSE and Nikkei stock indices as well as the Dow Jones Industrial and Utility Averages. APPENDIX A to this chapter contains a simple case study that exemplifies the benefits of utilizing intermarket data.

Preprocessing Input Data
To help a neural network produce accurate forecasts, the selected raw input data must be preprocessed. Two widely used preprocessing methods are known as transformation and normalization.

Transformation is used to manipulate one or more raw data inputs to generate a single network input. Normalization is a transformation that distributes data evenly and scales it into an acceptable range for network usage. Decisions made during this phase are:

  • What transformations should be applied to the data?

  • Should these transforms include standard technical analysis indicators?

  • How should the data be normalized?

As with the selection of raw data inputs, domain knowledge is critical to the choice of preprocessing methods.

Transformation
Two simple preprocessing methods involve the computation of differences between, or ratios of, inputs. These minimize the required number of input neurons and facilitate learning.

The noise component within raw price data tends to obscure underlying relationships between input data sources and slows down the training process. Smoothing techniques, such as moving averages, that help reduce the noise entering the network, are useful transforms. One obvious disadvantage of smoothing, however, is that some useful information may get lost. Additionally, smoothing a leading indicator can turn it into a lagging indicator. These points illustrate the tradeoffs that exist whenever smoothing methods are utilized.
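The transforms mentioned in this section (differences, ratios, and moving-average smoothing) are simple to compute. The Python sketch below shows one possible form of each; the function names and sample prices are hypothetical.

```python
def differences(values):
    """Change from each value to the next."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def ratios(numerators, denominators):
    """Element-wise ratio of two equal-length series."""
    return [n / d for n, d in zip(numerators, denominators)]


def simple_moving_average(values, period):
    """Average of the most recent `period` values, once enough data exists."""
    return [sum(values[i - period + 1:i + 1]) / period
            for i in range(period - 1, len(values))]


# Hypothetical daily closes, for illustration only.
closes = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
print(differences(closes))
print(simple_moving_average(closes, 3))
print(ratios([10.0, 12.0], [5.0, 4.0]))
```

The moving average returns fewer values than it receives, because the first output is only available once a full period of data has accumulated. That delay is one source of the lag noted above.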

Normalization
The goal during data normalization is to ensure that the statistical distribution of values for each net input and output is roughly uniform. If this is not done, and an input with, say, a normal distribution and a small variance is used, then the net will only see a small number of occurrences of facts away from the central tendency. Such a net will not perform well on such data in the future. The values should also be scaled to match the range of the input neurons. Therefore, in addition to any other transformations performed on network inputs, each should be normalized.

Simple Linear Scaling.
I will now discuss three useful methods of data normalization. The first is a simple linear method of scaling data. At a minimum, data must be scaled into the range used by the network's input neurons. This is typically minus one to one or zero to one. Many commercially available neural network development programs automatically linearly scale each input. This scaling function can also be implemented through a spreadsheet or by a custom-written program. Linear scaling requires that the minimum and maximum values associated with the facts for a single data input be found. Let's call these values Vmin and Vmax, respectively. Additionally, the input range required for the network must be determined. Let's assume that the input range is from Imin to Imax. Then the formula for transforming each data value V to an input value I is:

I = Imin + (Imax-Imin)*(V-Vmin)/(Vmax-Vmin)

Vmin and Vmax must be computed for each input. This method of normalization will scale input data into the appropriate range, but does not increase its uniformity.
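The scaling formula translates directly into code. The Python sketch below applies it to one input's values; the function name, the guard against a zero range, and the sample data are additions for illustration.

```python
def linear_scale(values, i_min=0.0, i_max=1.0):
    """Scale values into [i_min, i_max] using I = Imin + (Imax-Imin)*(V-Vmin)/(Vmax-Vmin)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Guard added for the sketch: the formula is undefined for a constant input.
        raise ValueError("all values are identical; nothing to scale")
    return [i_min + (i_max - i_min) * (v - v_min) / (v_max - v_min) for v in values]


# Hypothetical raw values for a single input.
raw = [20.0, 25.0, 30.0, 40.0]
print(linear_scale(raw))              # zero-to-one range
print(linear_scale(raw, -1.0, 1.0))   # minus-one-to-one range
```

The spacing between scaled values mirrors the spacing between the raw values, which is why this method leaves the shape of the distribution unchanged.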

Simple Statistical Normalization.
The second normalization method uses statistical measures of central tendency and variance to help remove outliers and spread out the distribution of the data. Doing this tends to increase uniformity. This, too, is a relatively simple method of normalization. First, the mean and standard deviation for the input data associated with each input are determined. Vmin is then set to the mean minus some number of standard deviations. For instance, if the mean is 32, the standard deviation is 5, and two standard deviations are chosen, then the Vmin value would be 22 (32-2*5). Vmax is correspondingly set to the mean plus the same number of standard deviations, here 42 (32+2*5). All data values less than Vmin are set to Vmin, while all data values greater than Vmax are set to Vmax. A linear scaling is then performed as described above. By clipping off the ends of the distribution in this manner, outliers are removed, causing the data to be more uniformly distributed. Assuming a normal distribution, two standard deviations would leave roughly 95 percent of the data unclipped, while three standard deviations would leave more than 99 percent unclipped.
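A minimal Python sketch of this clip-then-scale procedure follows. It assumes the population standard deviation (the text does not say whether the population or sample form is intended), and the function names and sample data are hypothetical.

```python
import statistics


def clip_bounds(mean, std, n_std=2.0):
    """Vmin and Vmax as the mean minus and plus n_std standard deviations."""
    return mean - n_std * std, mean + n_std * std


def statistical_normalize(values, n_std=2.0, i_min=0.0, i_max=1.0):
    """Clip values to mean +/- n_std standard deviations, then scale linearly."""
    # Population standard deviation is an assumption made for this sketch.
    v_min, v_max = clip_bounds(statistics.mean(values),
                               statistics.pstdev(values), n_std)
    if v_max == v_min:
        raise ValueError("all values are identical; nothing to scale")
    clipped = [min(max(v, v_min), v_max) for v in values]
    return [i_min + (i_max - i_min) * (v - v_min) / (v_max - v_min)
            for v in clipped]


print(clip_bounds(32.0, 5.0))  # the worked example from the text

# Hypothetical data with one outlier (60.0).
data = [30.0, 31.0, 32.0, 33.0, 34.0, 32.0, 31.0, 33.0, 32.0, 60.0]
scaled = statistical_normalize(data)
print([round(s, 3) for s in scaled])
```

Here the outlier is pulled in to Vmax before scaling, so the remaining values occupy more of the zero-to-one range than they would under simple linear scaling.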

Mendelsohn Histogram Normalization.
The third normalization method, the Mendelsohn Histogram Normalization (MHN) method, was developed by the Predictive Technologies Group, a research division of Market Technologies Corporation. This function performs several transformations on the data to minimize the standard deviation of the heights of the columns in the initial frequency distribution histogram. The formulae it utilizes to perform the normalization have various parameters whose values vary depending on the specific input being transformed. To determine optimal parameter settings for each input, custom-written software is used to automate the search of the parameter space. The results of applying MHN have been quite promising.

Figure 7 depicts an example distribution, in the form of a histogram, in which the data is not uniformly distributed. To illustrate the effects of the various approaches, each of the three methods of normalization has been used to prepare this data as input to a neural net in the range zero to one. Figure 8 shows that a simple linear scaling of the data has no effect on the shape of the frequency distribution itself. Figure 9 shows the same original distribution normalized by the second method, in which two standard deviations are used to set the limits for the outliers so that the distribution becomes more uniform. Figure 10 shows the effects of performing MHN on the data, in which the resulting distribution is the most uniformly distributed. Other methods, in addition to these, exist for data normalization. Depending on the data to be normalized, some methods are more effective than others.

Denormalization.
During the testing phase of development, the output produced needs to be denormalized. Ideally, normalization should be reversible with little or no loss in accuracy. Normalization methods that clip outlier values are sometimes not sufficiently reversible. For instance, assume that during training all output values greater than 75 are clipped by assigning them a value of 75 regardless of their original value. Then, during testing, if the net produces an output of 75, this simply indicates that the net's output is 75 or more. If this level of detail is acceptable for a specific application, then the normalization method used is sufficiently reversible.
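The loss of information from clipping can be seen in a small Python sketch. It assumes linear scaling with clipping as the normalization method, an upper limit of 75 as in the example above, and a lower limit of 25 chosen arbitrarily for illustration.

```python
def normalize(value, v_min, v_max, i_min=0.0, i_max=1.0):
    """Clip to [v_min, v_max], then scale linearly into [i_min, i_max]."""
    clipped = min(max(value, v_min), v_max)
    return i_min + (i_max - i_min) * (clipped - v_min) / (v_max - v_min)


def denormalize(net_value, v_min, v_max, i_min=0.0, i_max=1.0):
    """Invert the linear scaling to recover a value in original units."""
    return v_min + (v_max - v_min) * (net_value - i_min) / (i_max - i_min)


V_MIN, V_MAX = 25.0, 75.0  # 75 is from the text; 25 is a hypothetical lower limit

inside = denormalize(normalize(50.0, V_MIN, V_MAX), V_MIN, V_MAX)
outside = denormalize(normalize(90.0, V_MIN, V_MAX), V_MIN, V_MAX)
print(inside)   # a value inside the limits survives the round trip
print(outside)  # a clipped value comes back as the limit, not the original
```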

Transformation and normalization are routinely used to help improve a network's performance. For additional suggestions on preprocessing financial market data, please reference APPENDIX B.

After the network architecture has been selected, and the raw data inputs have been chosen and preprocessed, fact sets must be created.

Fact Selection
This section examines: 

  • What is a fact?

  • What is a fact set?

  • How many fact sets should be used in training a network?

  • What criteria can be used to select facts for the various sets?

A fact is a single input vector and its associated output vector. A fact is typically represented as a row of numbers where the first n numbers correspond to the n network inputs and the last m numbers correspond to the m network outputs. For example, assume that a network has been designed to predict the change in price of the Dow Jones Industrial Average (DJIA) one week in advance, based on the differences in both the highs and the lows for the past five days and a moving average of the closes for the past ten days. Each fact would be composed of a three-valued input vector and a single-valued output vector. The three input values would correspond to the differences in the highs and differences in the lows for the past five days and a moving average of the closes for the past ten days. The single-valued output vector would represent the change in the DJIA over the next week (See Figure 11).
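To make the row layout concrete, the Python sketch below builds one fact for the example network. It rests on one possible reading of the inputs: the change in the high and the change in the low over the past five trading days, plus a ten-day simple moving average of the close, with the output taken as the change in the close five trading days ahead. The function name and the price series are hypothetical.

```python
def build_fact(highs, lows, closes, t):
    """Return one fact as a row: three inputs followed by one output.

    Requires at least nine days of history before day t and five days after it.
    """
    high_diff = highs[t] - highs[t - 5]          # change in the high over five days
    low_diff = lows[t] - lows[t - 5]             # change in the low over five days
    close_ma = sum(closes[t - 9:t + 1]) / 10     # ten-day moving average of the close
    change_next_week = closes[t + 5] - closes[t] # output: change over the next week
    return [high_diff, low_diff, close_ma, change_next_week]


# Hypothetical price series rising one point per day, for illustration only.
closes = [100.0 + day for day in range(20)]
highs = [c + 1.0 for c in closes]
lows = [c - 1.0 for c in closes]

fact = build_fact(highs, lows, closes, t=10)
print(fact)  # first n = 3 numbers are inputs, last m = 1 number is the output
```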

A fact set is a group of related facts. The decision about what data to include in a fact set is important since facts should be selected which best represent the problem space that the neural network is to model. Although internal market data on a target market is readily available, relevant intermarket data may be unavailable, depending on when each of the related markets first began trading. For instance, while Japanese Yen futures began trading in 1972, the Nikkei 225 index started trading as a futures contract in 1990. To use both markets' data in a neural network application for currency predictions, the fact set could not start earlier than 1990. However, the use of shortened data sets in neural network training has its own risks which must be understood.

When the input data chosen does not span a long enough time frame, the likelihood of overlooking significant market characteristics, or including too few examples of them, increases considerably. For example, over the past decade, with the exception of 1987, there has really not been a bear market in the S&P 500 market. Therefore, a neural network trained only on recent data will not be able to adapt to changing market conditions such as would occur at the onset of a bear market. Yet, whether or not to include S&P 500 Index data from October 1987 in a fact set must be decided judiciously, since data from this period represents an extremely infrequent occurrence. Since this data is not supported by a sufficient number of examples, the network may not be able to learn how to recognize it in the future. Even so, its presence in the fact set might cause a bias that could reduce the overall accuracy of the system during more typical trading periods. Data availability and sufficient representation of various market conditions are important considerations in decisions regarding neural network input data and fact selection.

Training and Testing Fact Sets
Once the fact set has been selected, it is divided into two subsets, one for training and the other for testing. Back-propagation networks usually operate in two modes: a learning or training mode, and a recall or testing mode. In the learning mode, the network modifies the values of its interconnection weights between neurons to adapt its internal representation, in an effort to improve the mapping of inputs to outputs. In the recall mode the network is given new inputs and utilizes the representation it had previously learned to generate associated outputs without changing the weights. Since neural networks operate in these two modes, facts should be separated into at least two subsets: the training set and the testing set. The training set's facts are used during the network's learning mode, while the testing set's facts are used during the network's recall mode. Performance results from various networks on the test set allow for network comparisons and the determination of which net to use in the final application.

There are various criteria that can be used to determine the composition of the training and testing sets. First, they should be mutually exclusive, so that a specific fact does not reside in both subsets. Also, if two facts have precisely the same input and output values, one of these facts should be removed from the fact set before it is separated into subsets. Additionally, caution must be exercised when using commercial tools that automatically split the initial fact set. For example, in an 80/20 split, some tools may place every fifth fact into the test set. If the facts are in chronological order before the split, all data representing one day of the week, such as a Monday or Friday, could be assigned to the test set, while the rest of the data representing the remaining trading days would be assigned to the training set. Since doing this would adversely affect the network's performance, the facts should be randomized before splitting them into subsets, or should be randomly assigned to the subsets.
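The every-fifth-fact pitfall and the randomized alternative can both be sketched in Python. The facts below are synthetic (one per trading day over ten five-day weeks), and the helper names are hypothetical.

```python
import random


def drop_duplicate_facts(facts):
    """Keep only the first occurrence of each identical fact."""
    seen = set()
    unique = []
    for fact in facts:
        if fact not in seen:
            seen.add(fact)
            unique.append(fact)
    return unique


def random_split(facts, test_fraction=0.2, seed=0):
    """Shuffle the facts, then split them into mutually exclusive subsets."""
    shuffled = list(facts)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Synthetic chronological facts: (inputs, outputs) as tuples so they can be hashed.
weekdays = ["Mon", "Tue", "Wed", "Thu", "Fri"]
facts = [((float(day),), (float(day % 3),)) for day in range(50)]
weekday_of = {fact: weekdays[i % 5] for i, fact in enumerate(facts)}

# Pitfall: taking every fifth chronological fact selects a single weekday.
every_fifth = facts[4::5]
print({weekday_of[f] for f in every_fifth})

# Randomizing before the split spreads the weekdays across both subsets.
train_set, test_set = random_split(drop_duplicate_facts(facts))
print(sorted({weekday_of[f] for f in test_set}))
```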

Even if proper precautions are taken to randomize fact order and split the fact set into subsets, all facts with a particular characteristic might still be assigned to one subset or the other. To minimize the potential of this occurring, it is advisable to identify the most important characteristics thought to be associated with the data and determine the initial fact set's underlying distribution relative to these characteristics. Then the fact set can be split so that the training and testing subsets will have similar distributions relative to these characteristics. Statistical analysis or clustering algorithms can be used for this purpose. A careful analysis of the fact set also allows outliers to be identified and eliminated.
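
One simple way to give both subsets similar distributions is to split each group of facts separately. The sketch below assumes a single categorical characteristic extracted by a caller-supplied `key` function; the function name and the 20 percent test fraction are illustrative choices.

```python
import random
from collections import defaultdict

def stratified_split(facts, key, test_fraction=0.2, seed=0):
    """Split facts so each group (as defined by key) is represented
    in the training and testing subsets in the same proportion."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for fact in facts:
        groups[key(fact)].append(fact)

    train, test = [], []
    for label in sorted(groups):          # sorted for reproducibility
        members = groups[label]
        rng.shuffle(members)
        n_test = int(round(len(members) * test_fraction))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Hypothetical facts tagged with the weekday they represent.
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
facts = [(days[i % 5], i) for i in range(100)]
train, test = stratified_split(facts, key=lambda fact: fact[0])
test_counts = {d: sum(1 for day, _ in test if day == d) for d in days}
```

For continuous characteristics, the same idea applies after the values are binned or clustered into groups.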

Experimentation with various data handling methods should be performed before selecting one. The Predictive Technologies Group has developed a training/testing regimen which splits the initial fact set into three mutually exclusive subsets rather than just two. In addition to standard training and testing sets, a second testing set is utilized, which contains examples of those facts thought to be most important in judging network performance. This test set is then used to evaluate various networks on a comparative basis.

Training And Testing
In this section, I will discuss the process of training and testing a back-propagation neural network. When performing these steps of network development, the following issues must be addressed:

  • How should the initial weights be determined?

  • What training algorithm should be used?

  • What is the learning rate? How should it be set?

  • What is momentum? How should it be set?

  • What is simulated annealing?

  • What is over training? How can it be avoided?

  • What metrics are to be used for testing?

The training process can begin once the training fact set has been created. The first task performed during training is to initialize the weights. As mentioned earlier, the weights change during the training phase, as the network adapts its internal representation to model a problem. Typically, relatively small random weights are used to initialize the network. It is advisable to train the same network with different sets of initial weights, since these initial conditions can affect how the network trains and ultimately performs.
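
A minimal sketch of this initialization step follows. The weight range of plus or minus 0.1, the layer sizes, and the seed values are assumptions chosen only for illustration.

```python
import random

def init_weights(n_rows, n_cols, scale=0.1, seed=None):
    """Return a matrix of small random weights drawn from [-scale, scale]."""
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(n_cols)]
            for _ in range(n_rows)]

# Several different starting points for the same (hypothetical) 3 x 4 layer.
starts = {seed: init_weights(3, 4, seed=seed) for seed in (1, 2, 3)}
```

Training the same architecture once from each entry in `starts` and comparing the results on the test set shows how sensitive the network is to its initial conditions.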

Learning Algorithms
When performing back-propagation, a number of different learning algorithms can be used. Each one provides a method of minimizing the overall error associated with the network's output, by traversing the net's error surface, or error landscape. Since the landscape is immense, each algorithm attempts to minimize the overall error while evaluating as few points as possible on the error surface. A tradeoff exists between network performance and the time required to reach a solution. If every point on this multidimensional surface were to be evaluated in an exhaustive search, optimal performance could be assured. Since this would be impossible for all but the simplest problems, algorithms which can produce acceptable solutions in a reasonable time frame are utilized.

Gradient-Descent.
One of the most common algorithms used in back-propagation is the gradient-descent algorithm. This is a robust algorithm which is simple to understand and intuitively appealing. From some initial starting point on the error surface, it determines the gradient. The gradient quantifies the slope, or steepness, of the curvature of the error surface at that point. It is the direction that points "uphill" most sharply. The algorithm uses this information to move in the exact opposite direction by an amount proportional to the learning rate, a constant that is discussed later in this chapter. By utilizing this "downhill" movement along the error surface, the algorithm minimizes error.

Unfortunately, gradient-descent is somewhat slow and prone to getting stuck at local optima. These are points on the error surface which have minimum error in reference to all the points in some surrounding region, but are not the optimal point for the entire error surface. As a simple example, if the error surface contains two valleys, one "shallow" and the other "deep," a local optimum would be at the bottom of the shallow valley, while the global optimum (the point of minimal error) would be at the bottom of the deep valley. Additionally, the error surface must have a certain degree of smoothness for gradient-descent to perform well. Also, one must be able to actually compute the gradient of the objective function which defines the surface.
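
The two-valley example can be sketched with a one-dimensional error surface. The particular function below is an arbitrary illustration with a shallow valley near w = 0.96 and a deep valley near w = -1.04; it is not derived from any real network.

```python
def error(w):
    """Hypothetical error surface with one shallow and one deep valley."""
    return (w * w - 1.0) ** 2 + 0.3 * w

def gradient(w):
    """Slope of the error surface at w (points 'uphill')."""
    return 4.0 * w * (w * w - 1.0) + 0.3

def gradient_descent(start, learning_rate=0.01, steps=500):
    """Repeatedly step 'downhill' by learning_rate times the gradient."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

from_right = gradient_descent(start=2.0)    # settles in the shallow valley
from_left = gradient_descent(start=-2.0)    # settles in the deep valley
```

Both runs follow the same rule, yet the run started on the right stops at the local optimum because every downhill step leads into the shallow valley.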

Other optimization algorithms are available whose performance characteristics differ from those of gradient-descent in one way or another. Methods such as conjugate-gradients, Newton-Raphson, Levenberg-Marquardt, and genetic algorithms have their own strengths and weaknesses. None is the best for all optimization problems. A good text on optimization theory will cover these issues in detail.

Conjugate-Gradients.
The conjugate-gradients approach finds the optimal point along the current gradient by doing a line-search (for example, a binary search). It then computes the gradient at the new point and projects it onto the subspace defined by the complement of the space defined by all previously chosen gradients. This indicates the new direction in which to search. The new direction is always conjugate to all previous search directions (that is, perpendicular to them once the curvature of the error surface is taken into account), and so is a conjugate gradient rather than a true gradient. This algorithm has the most interesting property of converging in N steps to the optimal solution if the search-space has dimension N, and if certain somewhat stringent conditions are met by the objective function.

Newton-Raphson.
The Newton-Raphson method has the advantage of fast convergence to the optimum point on the performance landscape. The gradient-descent algorithm takes steps of constant size toward its goal. In effect, the number of correct digits in its solution is increased by one per time-step. The Newton-Raphson method, by comparison, takes variably sized steps, chosen so that, in effect, the number of correct digits in its solution doubles each time-step. This so-called quadratic convergence is the strength of this method. Unfortunately, the method requires a relatively smooth, well behaved landscape and a starting point that is relatively near the optimal point.
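
The difference in convergence speed can be illustrated on a one-dimensional error function. The function E(w) = exp(w) - 2w, the starting point, and the learning rate are assumptions chosen so that the optimum (w = ln 2) is known exactly.

```python
import math

OPTIMUM = math.log(2.0)          # minimum of E(w) = exp(w) - 2w

def slope(w):
    return math.exp(w) - 2.0     # first derivative of E

def curvature(w):
    return math.exp(w)           # second derivative of E

def newton_errors(start, steps):
    """Distance from the optimum after each Newton-Raphson step."""
    w, errs = start, [abs(start - OPTIMUM)]
    for _ in range(steps):
        w -= slope(w) / curvature(w)      # step size adapts to curvature
        errs.append(abs(w - OPTIMUM))
    return errs

def gradient_descent_errors(start, learning_rate, steps):
    """Distance from the optimum after each gradient-descent step."""
    w, errs = start, [abs(start - OPTIMUM)]
    for _ in range(steps):
        w -= learning_rate * slope(w)
        errs.append(abs(w - OPTIMUM))
    return errs

newton = newton_errors(start=1.0, steps=5)
descent = gradient_descent_errors(start=1.0, learning_rate=0.1, steps=5)
```

Printing the two lists side by side shows the Newton-Raphson error roughly squaring at each step, while the gradient-descent error shrinks by a modest, nearly constant factor.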

Levenberg-Marquardt.
The Levenberg-Marquardt algorithm can be thought of as an approach which combines the best properties of gradient-descent and Newton-Raphson. It will slowly improve upon an initial starting point like gradient-descent, but exhibits quadratic convergence as it gets close to the optimum.

While the above algorithms are superior for some problems, they all require a starting point somewhere in the vicinity of the optimal point, and some degree of smoothness of the performance landscape. This is not true of genetic algorithms.

Genetic Algorithms (GAs).
A simple generalization of the brute-force method involves defining probability distributions for each parameter in the search-space so that parameter values that are unlikely to yield good results have a low probability. When this is done for all parameters, an algorithm that preferentially searches the higher probability hyper-regions in parameter space should be more efficient at finding good solutions. This is the basis for the Monte Carlo search methods.

Genetic algorithms use a similar approach, but do not require the assignment of a priori probabilities. Rather, they determine the probability distributions implicitly by evolving a population of solutions over time. They use simple mechanisms analogous to those used in genetics to breed populations of superior solutions. Those that do well "breed" with other solutions to form new solutions. Solutions that perform poorly are culled.

Genetic algorithms are highly recommended as a general search method. They do not require any special initial conditions, and make no requirements on the smoothness of the performance landscape. They are a very general class of optimization algorithms that are quite robust and widely applicable.

Genetic algorithms can be used to train a neural network by evolving populations of weight matrices. In this case, back-propagation of errors is not needed. Only the forward-propagation of facts through the net and subsequent evaluation of the fact-errors is required. Alternatively, genetic algorithms can be used to control only the free parameters within the traditional gradient-descent based back-prop algorithm. Each member of the population to be evolved might have a different learning rate and momentum.
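
The first approach, evolving weights directly, can be sketched as follows. The population size, the uniform crossover, the Gaussian mutation, and the tiny single-neuron fact set are all illustrative assumptions; practical genetic algorithms vary widely in these details.

```python
import random

# Hypothetical facts for a single linear neuron: ((x0, x1), target).
FACTS = [((0.0, 0.0), 0.5), ((1.0, 0.0), 2.5),
         ((0.0, 1.0), -0.5), ((1.0, 1.0), 1.5)]

def fact_error(weights):
    """Forward-propagate every fact and sum the squared errors."""
    w0, w1, bias = weights
    return sum((w0 * x0 + w1 * x1 + bias - target) ** 2
               for (x0, x1), target in FACTS)

def evolve(error_fn, n_params, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_params)]
                  for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        population.sort(key=error_fn)
        best_history.append(error_fn(population[0]))
        survivors = population[:pop_size // 2]      # cull the worst half
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            # Crossover: each weight comes from one parent or the other.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutation: small random perturbation of every weight.
            child = [gene + rng.gauss(0.0, 0.1) for gene in child]
            children.append(child)
        population = survivors + children
    population.sort(key=error_fn)
    best_history.append(error_fn(population[0]))
    return population[0], best_history

best_weights, history = evolve(fact_error, n_params=3)
```

Because survivors are carried over unchanged, the best error in `history` never increases from one generation to the next. Note that no gradient is computed anywhere; only forward evaluation of the facts is needed.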

The Learning Rate
A neural network "learns" during training by altering its weights, based on error information propagated backward throughout the network from the output layer. Error can be propagated in this manner each time a fact is presented, after a subset of the facts has been presented, or after all facts have been presented. One cycle of presenting all facts to the network is commonly referred to as an epoch. With each change in the weights' values, the network is taking a step on a multidimensional surface, which is a representation of the overall error space. During training the network traverses the surface in an attempt to find the lowest point, or minimum error. Weight changes are proportional to a training parameter called the learning rate.

Oscillation.
The largest learning rate that does not result in oscillation should be selected. As a simple example of oscillation, imagine that a network's current weight values place it halfway down a valley on a two-dimensional error surface, as depicted in Figure 12. If the learning rate is too large, the network's next step may place it on the other side of the valley as opposed to moving it closer toward the bottom. The following step may return it to the original side. In this simple example of oscillation, the network tends to bounce back and forth from one side of the valley to the other without moving toward the bottom where the solution lies. On the other hand, if the learning rate is too small, meaning that the steps the network takes are very small, it could take much too long to get to the bottom of the valley. Since each problem space has its own unique error surface, it is necessary to vary the learning rate to find the best balance between training time and overall error reduction for a specific application.
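
The valley example can be reproduced numerically. The sketch assumes the simplest possible valley, E(w) = w squared, whose gradient is 2w; the three learning rates are chosen only to show the three behaviors.

```python
def descend(start, learning_rate, steps):
    """Gradient descent on E(w) = w**2, recording every position."""
    w = start
    path = [w]
    for _ in range(steps):
        w -= learning_rate * 2.0 * w     # gradient of w**2 is 2w
        path.append(w)
    return path

too_large = descend(start=1.0, learning_rate=1.0, steps=6)
balanced = descend(start=1.0, learning_rate=0.1, steps=50)
too_small = descend(start=1.0, learning_rate=0.001, steps=50)
```

With the largest rate the position flips between +1 and -1 indefinitely, with the smallest rate it has barely moved after 50 steps, and with the intermediate rate it is essentially at the bottom of the valley.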

Momentum.
Another training parameter, known as momentum, acts as a filter to reduce oscillation. Therefore, it allows for the use of higher learning rates to obtain solutions similar to those found with lower learning rates, thus potentially decreasing the training time. Learning rates and momentum should be adjusted through experimentation. Some development tools include additional parameters such as temperature, gain, and noise which can also be modified to affect the training process.
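
One common formulation adds a fraction of the previous weight change to the current one. The sketch below assumes that formulation (individual development tools may define momentum differently) and reuses the simple valley E(w) = w squared with a deliberately high learning rate.

```python
def descend(start, learning_rate, momentum, steps):
    """Gradient descent on E(w) = w**2 with a momentum term."""
    w, delta = start, 0.0
    path = [w]
    for _ in range(steps):
        gradient = 2.0 * w
        # New change = momentum * previous change - learning_rate * gradient
        delta = momentum * delta - learning_rate * gradient
        w += delta
        path.append(w)
    return path

plain = descend(start=1.0, learning_rate=0.95, momentum=0.0, steps=30)
smoothed = descend(start=1.0, learning_rate=0.95, momentum=0.5, steps=30)
```

Without momentum, this learning rate makes the position swing across the valley and shrink by only 10 percent per step. With momentum of 0.5, the swings are damped considerably faster in this example, although the best pairing of the two parameters still has to be found by experimentation.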

Training and Testing Automation Necessary
The parameter space of a back-propagation model is the multi-dimensional space defined by all free parameters in the model. If a model has only two free parameters, such as the learning rate and momentum, then the parameter-space can be represented graphically in two dimensions, with the learning rate on one axis and momentum on the other. Here, the parameter space is composed of the quarter-plane defined by the positive values of the two parameters (see Figure 13).

Continuing with this simple example, the goal of the net developer is to find an optimal set of values for the parameters, whereby an optimally performing net is produced upon training with these values. This amounts to finding the "best" point in the parameter space.

Brute-Force.
A simple strategy for finding the optimal parameters is the brute-force approach. Here, a large set of points in the parameter-space is examined to determine how each performs. Assume that a third axis representing performance is defined. The three dimensional space created by the learning rate, momentum and performance can be viewed as creating a performance landscape. Each experiment finds one point on the performance landscape. After many points have been identified, the shape of the landscape may become apparent (at least in this simple example). Now the landscape may be useful in guiding the selection of trial parameter-space points.

After some time is spent investigating the properties of the back-propagation paradigm for neural net training, the vast size of its parameter-space can be appreciated. It is truly immense. The training parameters may vary from node to node in the net and from epoch to epoch during training. If a net has 100 trainable nodes and is trained for 1000 epochs, then the two-dimensional example is suddenly 200,000-dimensional!

To further complicate matters, all possible initial (random weight) conditions, as well as the number of hidden layers and nodes, must be considered as part of the parameter-space. In this light, the parameter-space is virtually infinite in extent.

Finally, the selection of net inputs can be viewed as part of the parameter-space. Only the desired net output is known; all else is a variable represented in this generalized parameter-space.

Automation.
The myriad decisions that must be made in the development of a neural network, as well as the sheer size of the parameter-space, make automation of the training and testing process mandatory for any serious real world financial application. This is particularly true in setting training parameters, selecting preprocessing, and choosing the number of hidden layers and neurons. Tools such as genetic algorithms can be used to expedite parameter-space searches, and methods such as simulated annealing are useful for automating learning-rate adjustments during training.

Simulated Annealing
Simulated annealing is a training method that simulates the annealing process by including a temperature term that directly affects the learning rate. The temperature begins relatively high. This allows the network to move quickly over the error surface. The temperature then decreases as training progresses. Learning slows down as the network cools and settles upon a near-optimum solution. The use of simulated annealing also reduces the likelihood of oscillation. Figure 14 depicts a two-dimensional example of simulated annealing, in which the step size is reduced to avoid oscillation while finding a minimum point on the error surface.
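
The temperature-controlled learning rate described here can be sketched as a simple cooling schedule. The geometric cooling factor, the base rate, and the valley E(w) = w squared are assumptions for illustration; this is learning-rate annealing as the paragraph describes it, not the classical probabilistic form of simulated annealing.

```python
def annealed_descent(start, base_rate, initial_temp, cooling, steps):
    """Gradient descent on E(w) = w**2 with a cooling temperature
    that scales the learning rate at every step."""
    w, temp = start, initial_temp
    path, rates = [w], []
    for _ in range(steps):
        rate = base_rate * temp          # temperature sets the step size
        rates.append(rate)
        w -= rate * 2.0 * w              # gradient of w**2 is 2w
        path.append(w)
        temp *= cooling                  # the network "cools"
    return path, rates

path, rates = annealed_descent(start=1.0, base_rate=0.5,
                               initial_temp=2.0, cooling=0.9, steps=40)
```

The first step uses an effective learning rate of 1.0 and jumps clear across the valley, but as the temperature falls the steps shorten and the position settles at the bottom instead of continuing to oscillate.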

Avoid Over Training
In neural network training, one of the major pitfalls that must be avoided is over training. This is analogous to the common problem of over optimizing rule-based trading systems. Over training occurs when a network has learned not just the basic mapping associated with the input and output data presented to it, but also the subtle nuances and even the errors specific to the training set. An over trained network performs very well on the training set by simply memorizing it, but performs poorly on out-of-sample test data and subsequently during actual trading since the network is unable to generalize to new data.

The easiest way to avoid over training is to use an automated training/testing routine in which testing is an integral facet of the training process, rather than a procedure that is performed after training is complete. In this manner, network training is halted periodically at predetermined intervals. Then the network operates in recall mode on the test set to evaluate the network's performance on selected error criteria. Thereafter, training is resumed from the point at which it was halted. This alternating process continues iteratively, with interim results that meet the error criteria retained for later analysis. When the performance on the test set starts to degrade, it can be assumed that the network is beginning to over train. The best saved network configurations up to this point are then recalled for further evaluation. A clearly defined training/testing methodology is necessary to conduct an apples-to-apples comparison of various networks as the architectures, selection of raw data inputs, preprocessing, and training parameters are refined.
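
The alternating train/test routine can be sketched as a loop that pauses at fixed intervals, evaluates the test set, and remembers the best configuration seen so far. The callback names, the `patience` parameter, and the synthetic test-error curve are hypothetical stand-ins for a real network and fact set.

```python
def train_with_periodic_testing(train_step, test_error, max_epochs,
                                test_every, patience):
    """Alternate training and testing; keep the best network state and
    stop once test performance has degraded `patience` checks in a row."""
    best_error, best_epoch, best_state = float("inf"), None, None
    checks_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        state = train_step(epoch)            # one epoch in learning mode
        if epoch % test_every != 0:
            continue
        error = test_error(state)            # recall mode on the test set
        if error < best_error:
            best_error, best_epoch, best_state = error, epoch, state
            checks_without_improvement = 0
        else:
            checks_without_improvement += 1
            if checks_without_improvement >= patience:
                break                        # network is over training
    return best_state, best_epoch, best_error

# Stand-ins: the "state" is just the epoch number, and the test error
# falls until epoch 60 and rises afterward, as it would with over training.
def fake_train_step(epoch):
    return epoch

def fake_test_error(state):
    return (state - 60) ** 2 / 100.0 + 1.0

best_state, best_epoch, best_error = train_with_periodic_testing(
    fake_train_step, fake_test_error,
    max_epochs=200, test_every=10, patience=3)
```

In a real application `train_step` would return a copy of the network's weights, so that the saved state can be recalled for further evaluation.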

Error Measures
There are numerous ways to evaluate a network's performance on test data. For example, assume that a network has been designed to predict the close for the next day. One possible error metric might be the difference between the actual close and the network's output. This value would typically be determined for each fact in the test set, summed and divided by the number of facts in the test set. This is a standard error measure called average error. Unfortunately, when judging network performance, this metric is not particularly useful, since the positive errors cancel the negative errors. A much better error metric is average absolute error, in which the absolute value of the error for each fact in the test set is summed and then divided by the number of facts in the test set. Examples of other error metrics based on the distance from the target value include sum-of-squares error and root mean squared (RMS) error. The sum-of-squares error is computed by squaring the error for each fact and then summing those squared errors over the entire test set. The RMS error is the square-root of the average of the squared errors. The RMS and sum-of-squares error metrics weight larger errors more heavily than the average absolute error. Other metrics can be used which calculate how often the network predicts a movement in the right direction, or how well network predictions match the shape of the actual price movement over the same time period. Additionally, if a neural network is developed to generate trading signals rather than make price predictions, criteria such as maximum draw down, net profit, and percent profitable trades can be used as testing error metrics.
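
The distance-based metrics defined above, along with one possible directional measure, can be written out directly. The function names and the sample closes are illustrative; the directional measure assumes the previous day's close is available for each fact.

```python
import math

def errors(actual, predicted):
    return [a - p for a, p in zip(actual, predicted)]

def average_error(actual, predicted):
    e = errors(actual, predicted)
    return sum(e) / len(e)

def average_absolute_error(actual, predicted):
    e = errors(actual, predicted)
    return sum(abs(x) for x in e) / len(e)

def sum_of_squares_error(actual, predicted):
    return sum(x * x for x in errors(actual, predicted))

def rms_error(actual, predicted):
    e = errors(actual, predicted)
    return math.sqrt(sum(x * x for x in e) / len(e))

def sign(x):
    return (x > 0) - (x < 0)

def directional_accuracy(actual, predicted, previous):
    """Fraction of facts where the predicted move from the previous
    close has the same direction as the actual move."""
    hits = sum(1 for a, p, prev in zip(actual, predicted, previous)
               if sign(a - prev) == sign(p - prev))
    return hits / len(actual)

# Hypothetical closes.
previous = [99.0, 100.0, 102.0, 101.0]
actual = [100.0, 102.0, 101.0, 103.0]
predicted = [101.0, 101.0, 102.0, 102.0]

avg = average_error(actual, predicted)
avg_abs = average_absolute_error(actual, predicted)
sse = sum_of_squares_error(actual, predicted)
rms = rms_error(actual, predicted)
direction = directional_accuracy(actual, predicted, previous)
```

In this sample every prediction is off by one point, yet the average error is zero because the positive and negative errors cancel, which is exactly why the absolute and squared measures are preferred.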

Since many commercial neural network development tools are limited with respect to the error metrics available, development and implementation of custom error functions is highly desirable. Error metrics which best measure those characteristics that are most important in the final application should be incorporated into the testing methodology. By tailoring error functions to the specific application and outputs, real world neural network performance can be substantially improved.

Iterative Refinement.
Interestingly enough, neural networks can be used to judge the performance of other neural networks. One simple way to determine how much a net can still be improved is to train another net which predicts the errors of the first net. If the second net learns to predict a significant portion of the first net's errors, then the first net could still be improved.

At this point, one may simply replace the first net with the combination of the first and second nets. The first net predicts X, with error Y. The second net predicts y with error z. If the second net does well at predicting the errors of the first net, error z is less than error Y. Thus, combining the net outputs gives X+y, with error z, which is an improvement on the error Y.
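
The X + y combination can be illustrated with two stand-in functions in place of trained nets. Both functions and the target relationship are hypothetical; they are chosen so that the first net has a systematic error that the second net mostly, but not perfectly, captures.

```python
def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical stand-ins for two trained nets.
def first_net(x):
    return 2.0 * x        # predicts X but misses a constant offset of 1.0

def second_net(x):
    return 0.9            # predicts the first net's error y, imperfectly

inputs = [1.0, 2.0, 3.0, 4.0]
actual = [2.0 * x + 1.0 for x in inputs]

first_only = [first_net(x) for x in inputs]
combined = [first_net(x) + second_net(x) for x in inputs]   # X + y

error_Y = mean_abs_error(actual, first_only)   # error of the first net alone
error_z = mean_abs_error(actual, combined)     # error left after refinement
```

Here the residual error z is about one tenth of the original error Y, so the combined output is the better forecast.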

This process, which my research staff terms iterative refinement, may be repeated indefinitely. However, in practice a single additional net is usually sufficient. The inputs to the additional nets are not limited to the inputs used by the first net. For example, the technical indicators, statistical transforms, etc. used to preprocess the inputs for the first net may be used instead on the error time-series produced by the first net's outputs.

In fact, there are many approaches that can be used to improve a net's performance. Most involve constructive algorithms applied during the training or retraining of an existing net.

Expectations of Performance
Expectations for any type of financial forecasting application depend on one's perspective concerning the underlying dynamics of the target market. For example, if a neural net is designed to forecast a completely random time series, then it should not be surprising to observe large prediction errors since, by definition, such a time series is unpredictable.

The degree of randomness in the markets has been a long-standing subject of debate. Although there is a substantial body of literature on the subject of the Random Walk Hypothesis, no single opinion prevails. Recent studies indicate that stock market prices do not follow a random walk.

I take a more pragmatic and less theoretical view of this debate. I believe that a given market is driven by both stochastic (random) and deterministic forces. Only the deterministic component is predictable. However, even chaos can be deterministically generated. Recent work at Los Alamos National Laboratory has shown that neural networks can predict such chaos quite well.

The equity curve (discussed in APPENDIX C to this chapter) produced by a simple hybrid trading system based on VantagePoint's predicted information indicates that there is a sufficient degree of predictability within the markets to be profitable. Currently, the maximum achievable forecasting accuracy is unknown. Certainly no one expects to achieve zero error since this would require a model that could account for every possible variable affecting the markets. On the other hand, just because something is currently unpredictable does not mean it is random. Indeed, each revision of VantagePoint is able to predict events that had previously appeared to be stochastic noise. So for now, it is unclear where the performance "ceiling" is located.

The development of a successful neural network for any non-trivial problem requires a considerable expenditure of time and effort. Even with extensive in-house research and development tools and access to a multitude of commercial tools, neural net development to implement Synergistic Market Analysis is a time consuming, labor intensive task which demands expertise in financial market analysis, computer science, and applied mathematics. For these reasons, a team effort is necessary for successful neural network development.

Implementation
Now I would like to concentrate on how Synergistic Analysis can be implemented through the use of neural networks. I would also like to discuss how to utilize neural networks as part of an overall trading strategy, in which they can be integrated into either of two types of trading applications: information systems or trading systems. VantagePoint will be used as an example to illustrate how this can be accomplished. Finally, I will take a brief look at future applications of artificial intelligence technologies to implement Synergistic Market Analysis in global asset allocation and financial forecasting.

Information Systems
Neural networks can be used to implement information technology systems that generate forecasts related to a specific target market or asset class, such as price forecasts, predictions about market direction or turning points, or predictions of risk/return for various assets over a specific time period. In this context, the forecasted information can be used alone, or in conjunction with other available information. Information systems can be made up of a single neural network, or a multi-network system such as VantagePoint. Here four networks are specifically designed and trained to make independent market forecasts of the high, low, short-term and medium-term trend direction for use on the following trading day. Since these predictions are independently arrived at, they can be used as confirmations to one another.

Additionally, in VantagePoint the outputs from these four networks are used as inputs to a fifth network which predicts market turning points. This type of network architecture, depicted in Figure 15, is referred to as a hierarchical neural network.

By designing each network to include just one output, large networks are not needed to perform all the work. Instead, predictions derived from networks at the primary level of the hierarchy are incorporated as inputs into a network, or networks, at the secondary level. This kind of hierarchical architecture facilitates faster training, since all networks at the primary level of the hierarchy can be trained simultaneously, as each network focuses solely on a single output.

VantagePoint's predictions can be visualized graphically with various charts or in tabular form on its daily trading prediction report. When viewing the charts, users can select from four different chart types, ranging from bar charts to candlestick charts (see Figure 16). Up to eight different studies can be overlaid on each chart. These studies include both the forecasted information produced by VantagePoint's neural networks as well as information computed from these forecasts to help traders utilize the information most effectively. Additionally, a variety of parameters, shown in Figure 16, allows users to customize the appearance of the charts.

An example of a chart produced by VantagePoint is shown in Figure 17. This chart was produced by the VantagePoint Treasury Bond System with predicted high and low values plotted over the daily bars. This type of predicted information is particularly useful for determining entry and exit points for day trading or position holding. If the forecasted indicators on the daily report suggest that tomorrow will be an up market day, day traders might wait for the market to trade down toward the predicted low, then enter a long position. The reverse would involve entering at or near the predicted high on a day expected to be down. Using forecasts of market trend direction in conjunction with predicted highs and lows greatly increases the potential for profitable day trades. Two examples of this are shown from the March 94 Treasury Bond contract.

In the example on the left in Figure 18, the up arrow (indicating an expected upward trend in market direction) and the predicted low for tomorrow are generated on December 2, 1993. If a long day trade was taken from open to close on December 3, 1993, based solely on the anticipated direction, a profit of 12 ticks ($375.00 before slippage and commission) would have been made. If, instead, a limit order entry to go long had been placed at the predicted low, with an exit at the close, 24 ticks profit ($750.00 before slippage and commission) would have been realized.

The example on the right shows the same concept in reverse. Instead of entering a short position at the open and exiting at the close on an expected down day, one could place a limit order to enter a short position at the predicted high and exit at the close. This would result in a profit of 10 ticks ($312.50 before slippage and commission), as opposed to just three ticks. Additionally, day traders can use the predicted high/low trading range to set exit points, rather than waiting for the close to exit from a day trade. In this scenario, on a day when the market direction is predicted to be up, a long position is taken at or near the predicted low, then closed out intraday at or near the predicted high.
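
The dollar figures in these two examples follow from a tick value of $31.25 per contract, which is implied by 12 ticks equaling $375.00. A small helper makes the arithmetic explicit; the function name is illustrative, and slippage and commission are ignored as in the text.

```python
TICK_VALUE = 31.25   # dollars per tick per contract, implied by the examples

def ticks_to_dollars(ticks, contracts=1):
    """Gross profit before slippage and commission."""
    return ticks * TICK_VALUE * contracts

long_open_to_close = ticks_to_dollars(12)     # 375.0
long_from_predicted_low = ticks_to_dollars(24)    # 750.0
short_from_predicted_high = ticks_to_dollars(10)  # 312.5
short_open_to_close = ticks_to_dollars(3)     # 93.75
```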

There is always the possibility that a limit order to enter the market may not get executed when the market does not reach the entry objective set by the predicted high or low. Still, profitability of those trades that are executed can be increased, due to the more advantageous entry level.

Position holders can apply the same principles in entering the market, using the predicted range on subsequent days to set daily stops. For example, if position holders are long Treasury Bonds and the market is expected to continue to move up tomorrow, they might set their stop for tomorrow a few ticks below tomorrow's predicted low which acts as a support level. This would decrease the likelihood of being stopped out prematurely during the day as the result of intraday volatility in the market, yet protect profits in the event of an abrupt market downturn.

Both position holders and day traders can use forecasted information to their advantage. This information can be used alone, or in conjunction with other information, to generate buy and sell signals. One still popular method of technical analysis involves the use of moving averages in a crossover system. Typically, two moving averages are plotted on a chart. Buy and sell signals are generated when the short moving average crosses over or under the long moving average. The obvious limitation of moving averages is that, by definition, they tend to lag behind the market. Therefore, moving average crossover systems typically get in and out of trades after the turning points in market direction have occurred. Neural network generated trend forecasts can be used to reduce the lag associated with a traditional moving average crossover system. For instance, instead of calculating the value for today's short moving average, the forecasted moving average value for two to four days into the future can be used as the short moving average, in a crossover system. This reduces the lag, since the short moving average is a prediction of its value at a point in time in the future, not a calculated value as of today. An example of a move captured by the crossover of a forecasted ten day moving average four days in the future against a calculated ten day moving average today is shown in Figure 19 as it would appear in VantagePoint.
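
The crossover of a forecasted moving average against a calculated one can be sketched as follows. Since no neural network output is available here, the "forecast" is a stand-in obtained by looking ahead in the calculated average (perfect foresight), purely to show the mechanics; the three-day window, two-day lead, and price series are likewise assumptions.

```python
def simple_moving_average(prices, window):
    """One value per day, starting on the first day with a full window."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

def crossover_signals(short_ma, long_ma):
    """Emit (index, 'buy') when short_ma crosses above long_ma and
    (index, 'sell') when it crosses back to or below it."""
    signals = []
    for i in range(1, len(short_ma)):
        was_above = short_ma[i - 1] > long_ma[i - 1]
        is_above = short_ma[i] > long_ma[i]
        if is_above and not was_above:
            signals.append((i, "buy"))
        elif was_above and not is_above:
            signals.append((i, "sell"))
    return signals

# Hypothetical closes: a decline to a low on day 5, then a rally.
prices = [10, 9, 8, 7, 6, 5, 6, 7, 8, 9, 10, 11]
window, lead = 3, 2

calculated = simple_moving_average(prices, window)
forecasted = calculated[lead:]               # stand-in for a net's forecast
today = calculated[:len(forecasted)]         # calculated average as of today

signals = crossover_signals(forecasted, today)
signal_days = [(i + window - 1, side) for i, side in signals]
```

In this sample the buy signal falls on day 6, one day after the low, because the forecasted average turns up before the calculated average does.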

VantagePoint has adjustable parameters that allow users to customize it to their styles of trading. Figure 20 depicts a sample screen containing the parameters that traders can set to customize VantagePoint's forecasts. One area of flexibility built into VantagePoint allows users to emphasize the importance placed on each of the predictions in affecting the Strength Index which measures the strength of the impending move. This is done by altering the various "Weight" parameters seen in Figure 20. Signals that indicate the general market movement (up, down or sideways) are then generated by filtering the Strength Index by the "Upper Strength Limit" and "Lower Strength Limit," also seen in Figure 20. As is evidenced by the figure, other parameters can also be set to aid in tailoring VantagePoint to a particular trading style.

Trading Systems
Neural networks can also be utilized within formal trading systems in several ways. Neural networks can be trained to forecast trading signals. This approach is appealing, but has limitations which must be understood. Designing such a system requires that the trader who is ultimately going to use the system play an integral role in its development. Since the network in the final application will generate its trading signals based on the buy/sell points and the choice of selected input data and preprocessing performed during the development phase, the signals should be consistent with the trader's style, risk propensity, investment time horizon, and capitalization.

Traders have different trading styles. Even with perfect hindsight, no two traders would identify the same buy/sell points in a given market over the past year. Therefore, traders with limited willingness to tolerate draw down would not design and train neural networks that would generate signals appropriate for others with larger capitalization or a higher risk propensity. Additionally, it is not easy to incorporate risk management considerations into a neural network-based trading system. For this reason neural networks are best used as part of a hybrid approach.

Hybrid Trading Systems
Neural networks can be used to construct hybrid trading systems. The neural network would generate predictive information that could be used in conjunction with a set of rules that generate trading signals (see Figure 21). This approach combines an information system on the front end with a rule-based system on the back end. The rule-based portion of the system could run the gamut from relatively simple mathematical constructs to sophisticated expert systems. Regardless of how the rules are derived, they would need to be tailored to the trading style and requirements of the trader who will use the system.

Now let us examine how an information system such as VantagePoint can be used as part of a hybrid trading system. VantagePoint would represent the box labeled "Information System" in Figure 21. For the box labeled "Rules" we have devised a simple set of rules that utilize VantagePoint's predicted information as a means of generating buy and sell signals. This particular system uses only some of the information generated each day by VantagePoint's Treasury Bond system. This includes:

  1. The predicted high for tomorrow.

  2. The predicted low for tomorrow.

  3. The medium market. This is a user-adjustable indicator based on the various forecasts produced by VantagePoint. It generates up, down and sideways arrows on VantagePoint's charts indicating the market trend direction.

If two up arrows or two down arrows occur in the medium market within a specified window, the system takes a long or short position, respectively, with a market order at the open on the following day. Whether to enter instead with a limit order, using VantagePoint's predicted high and low to obtain a more advantageous entry, is left to the trader's discretion. A full description of the system can be found in Appendix C to this chapter, along with a trade listing and summary of trades made on the December 1992, March 1993, June 1993, September 1993, December 1993, March 1994, and June 1994 Treasury Bond futures contracts. Hypothetical trading of the system over these contract months (over 1.5 years of trading) resulted in the equity curve shown in Figure 22.
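The two-arrow entry rule can be sketched in a few lines. This is one possible reading, not the system documented in the appendix: it assumes one arrow per trading day, counts the current day as part of the window, and records an entry only when the position would change. The function name `two_arrow_entries` and the string labels are hypothetical.

```python
from typing import List, Optional, Tuple


def two_arrow_entries(arrows: List[str], window: int) -> List[Tuple[int, str]]:
    """Return (entry_day_index, side) pairs for the two-arrow rule.

    ASSUMPTIONS: arrows[t] is "up", "down", or "sideways" for day t; the
    window includes day t; the entry is placed at the open of day t + 1.
    """
    entries: List[Tuple[int, str]] = []
    position: Optional[str] = None
    for t, arrow in enumerate(arrows):
        if arrow not in ("up", "down"):
            continue
        # Look back over the last `window` days, including today.
        recent = arrows[max(0, t - window + 1): t + 1]
        if recent.count(arrow) >= 2:
            side = "long" if arrow == "up" else "short"
            if side != position:
                entries.append((t + 1, side))  # enter the following day
                position = side
    return entries


arrows = ["up", "sideways", "up", "down", "sideways", "sideways", "down", "down"]
print(two_arrow_entries(arrows, window=3))  # [(3, 'long'), (8, 'short')]
```

The final entry falls on day 8, one day past the end of the sample data, because the rule always acts at the next day's open.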

This chapter has briefly covered all aspects of neural network development including architectural decisions, input selection, preprocessing, fact selection, training, testing and implementation. Each of these phases of neural network development has been examined in the context of the recent globalization of the world's financial markets and the need to establish a synergistic analytic framework for global trading. While an in-depth discussion of the development of an actual neural network system such as VantagePoint is beyond the scope of this chapter, a simple case study that utilizes some of VantagePoint's features can be seen in Appendix A to this chapter.

What's Next?
Neural networks are an ideal tool for combining otherwise disparate data within a quantitative framework for implementation of Synergistic Analysis. Through the use of neural networks, nonlinear patterns and relationships between markets and asset classes can be ascertained. In the global markets of the 1990s, it is foolish to ignore this valuable information by limiting one's analysis to a single market. Still, it must be realized that neural network technology is just one of the tools that can be used to implement a synergistic global approach to financial forecasting and asset allocation. As market globalization accelerates and more professional money managers and sophisticated traders realize the benefit of a synergistic-quantitative approach to market analysis, they will recognize that technical analysis, as it is currently defined, is too narrow an approach to follow. Synergistic Analysis will emerge as the proper framework for trading in the global marketplace.

Other related technologies such as expert systems and genetic algorithms have a role in implementing Synergistic Analysis for financial forecasting. In fact, neural networks can be used to help extract primitive rules, which capture patterns that would not otherwise be apparent, for incorporation into an expert system.

Genetic algorithms are powerful search mechanisms, well suited to optimizing neural network parameters. As mentioned earlier, during training they may be used as a training algorithm or to search the space of training parameters in an efficient manner. Similarly, genetic algorithms can be used for net architecture selection.

Two of the most powerful applications of genetic algorithms are for input selection and preprocessing, which are perhaps the most challenging tasks faced in neural network development. To a very large extent, they determine the maximum possible performance achievable by the net. By automating the search for an optimal set of net inputs, a much wider range of inputs may be examined efficiently.
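A genetic search over candidate inputs can be sketched as follows. This is a minimal illustration, not the procedure used in developing VantagePoint: each chromosome is a bit mask marking which inputs are included, and the fitness function is a stand-in. In practice, fitness would come from training a net on the selected inputs and measuring its performance on test data. The names `select_inputs`, `toy_fitness`, and `USEFUL`, and all parameter values, are hypothetical.

```python
import random
from typing import Callable, Tuple

Mask = Tuple[int, ...]  # 1 = input included, 0 = input excluded

USEFUL = {0, 2, 5}  # ASSUMPTION: toy stand-in for the truly informative inputs


def toy_fitness(mask: Mask) -> float:
    """Reward useful inputs and penalize the rest (placeholder for net performance)."""
    return sum(1.0 if i in USEFUL else -0.5 for i, bit in enumerate(mask) if bit)


def select_inputs(n_inputs: int, fitness: Callable[[Mask], float],
                  pop_size: int = 20, generations: int = 30,
                  mutation_rate: float = 0.05, seed: int = 0) -> Tuple[Mask, float]:
    """Simple genetic algorithm; requires n_inputs >= 2 and pop_size >= 2."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(n_inputs))
                  for _ in range(pop_size)]

    def tournament(scored):
        # Pick two distinct individuals at random and keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(m), m) for m in population]
        elite = max(scored, key=lambda s: s[0])[1]
        next_pop = [elite]  # elitism: the best mask always survives
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randint(1, n_inputs - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = tuple((bit ^ 1) if rng.random() < mutation_rate else bit
                          for bit in child)
            next_pop.append(child)
        population = next_pop

    best = max(population, key=fitness)
    return best, fitness(best)


best_mask, best_score = select_inputs(6, toy_fitness, seed=1)
print(best_mask, best_score)
```

Because each fitness evaluation in a real application means training and testing a net, the population size and number of generations largely determine the cost of the search.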

The same technology incorporated into genetic algorithms has also been used in classifier systems and genetic programming. Classifier systems perform a type of machine learning that generates rules from examples. Genetic programming goes even further by automatically generating a program from a set of primitive constructs. In addition to genetic models, fuzzy logic, wavelets, and chaos are also being applied in a multitude of domains including financial forecasting. Even virtual reality has applicability to financial market analysis.

Hardware advancements are also having an effect on the rate at which new analytic technologies emerge. Since many artificial intelligence implementations are computationally intensive, they will benefit greatly from more powerful computer systems, particularly hardware configurations known as massively parallel machines. Rather than the step-by-step approach to problem solving taken by serial computer systems, parallel processing machines work on different parts of a single problem simultaneously. This means that the computing time associated with solving a particular type of problem can often be reduced by orders of magnitude, once a suitable method of dividing the problem is devised. Technologies like neural networks and genetic algorithms are especially suited to these parallel processing machines. With connectionist machines, accelerator boards, hypercube architectures and other new hardware developments on the horizon, it will become more cost-effective for researchers to explore the application of various emerging technologies to financial market analysis.

Although this chapter's primary focus has been on the application of SMA utilizing neural networks to perform financial forecasting in the context of futures trading, the applicability of Synergistic Analysis goes far beyond this single arena. One area where my firm performs research is global asset allocation, in which derivatives and mutual funds are used to represent various asset classes in a global portfolio. This allows the portfolio to be easily rebalanced at predetermined intervals with minimal transaction costs, without altering the asset class structure of the portfolio.

Synergistic Analysis can be used to minimize diversifiable risk in a portfolio composed of various global asset classes, by determining the nonlinear relationships and correlations between asset classes, and forecasting risk and return for each asset class over various time frames. By implementing SMA with neural networks in this manner, the portfolio can be rebalanced to provide higher return for equivalent risk, or lower risk for equivalent return. Other technologies, such as expert systems, can be used to discern investor characteristics, thus improving performance even further. For example, most asset allocation fund managers currently use questionnaires to ascertain investor characteristics such as risk propensity. A properly designed expert system could achieve similar or better results more expediently and contain a considerably more extensive knowledge base than a standard questionnaire.
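The effect of correlation on diversifiable risk can be seen in a small worked example. The sketch below uses the standard linear formula for portfolio volatility, which is a simplification: it captures only linear correlation, not the nonlinear relationships discussed above, and the weights, volatilities, and correlations are invented for illustration. The function name `portfolio_volatility` is hypothetical.

```python
import math
from typing import Sequence


def portfolio_volatility(weights: Sequence[float], vols: Sequence[float],
                         corr: Sequence[Sequence[float]]) -> float:
    """Square root of sum_i sum_j w_i * w_j * vol_i * vol_j * corr_ij."""
    n = len(weights)
    variance = sum(weights[i] * weights[j] * vols[i] * vols[j] * corr[i][j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)


# Two hypothetical asset classes, equally weighted, each with 20% volatility.
weights = [0.5, 0.5]
vols = [0.2, 0.2]

perfectly_correlated = [[1.0, 1.0], [1.0, 1.0]]
uncorrelated = [[1.0, 0.0], [0.0, 1.0]]

print(portfolio_volatility(weights, vols, perfectly_correlated))  # about 0.20
print(portfolio_volatility(weights, vols, uncorrelated))          # about 0.14
```

With the same weights and individual volatilities, the uncorrelated pair yields a lower portfolio volatility, which is the sense in which forecasts of the relationships between asset classes allow lower risk for equivalent return.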

As research analysts continue to explore the application of these technologies to financial market analysis, complex hybrid systems will be developed. Traders should understand that neural networks are only a tool and not the long-sought-after holy grail that can guarantee easy profits in today's global financial markets. Instead, Synergistic Market Analysis will come to rely upon several of these emerging technologies which, when used in concert as part of a hybrid approach to market analysis, will offer a competitive advantage over less robust single-market analytics.

Acknowledgments
The author would like to thank James T. Lilkendey, M.S., and Phillip Arcuri, Ph.D., of the Predictive Technologies Group, for their assistance in preparation of this chapter.

Synergistic Market Analysis, Market Synergy, Synergistic Analysis, and Synergistic Trading are trademarks of Lou Mendelsohn.


REFERENCES
1. Eiteman, D. K., Stonehill, A. I. & Moffett, M. H. [1992]. Multinational Business Finance, Addison-Wesley Publishing Company.
2. Mendelsohn, L. B. [1990]. "Building a Global Safety Net," The Journal of Commerce, February 5, 1990.
3. Mendelsohn, L. B. [1990]. "24-hour trading: Let's do it right," Futures, April 1990.
4. Diamond, B. B. & Kollar, M. P. [1989]. 24-Hour Trading: The Global Network of Futures and Options Markets, John Wiley & Sons.
5. Walmsley, J. [1992]. The Foreign Exchange and Money Markets Guide, John Wiley & Sons.
6. Ibbotson, R. G. & Brinson, G. P. [1993]. Global Investing: The Professional's Guide to the World Capital Markets, McGraw-Hill, Inc.
7. Levine, S. N. [1992]. Global Investing: A Handbook for Sophisticated Investors, Harper Business.
8. DeGooijer, J. G. [1989]. "Testing Non-linearities in World Stock Market Prices," Economics Letters, Vol 31.
9. Grandmont, J. & Malgrange, P. [1986]. "Nonlinear Economic Dynamics: Introduction," Journal of Economic Theory, Vol 40.
10. Mendelsohn, L. B. [1989]. "It's Time to Combine Fundamental and Technical Analysis for a Total Game Plan," Barron's, March 13, 1989.
11. Mendelsohn, L. B. [1983]. "Picking software programs: Know their limitations," Commodities (Futures), May 1983.
12. Mendelsohn, L. B. [1983]. "History tester important factor in software selection," Commodities (Futures), July 1983.
13. Murphy, J. J. [1991]. Intermarket Technical Analysis, John Wiley & Sons, Inc.
14. Mendelsohn, L. B. [1991]. "The Basics of Developing A Neural Trading System," Technical Analysis of Stocks & Commodities, June 1991.
15. Chinetti, D., Gardin, F. & Rossignoli, C. [1993]. "A Neural Network model for Stock Market Prediction," The Second International Conference on Artificial Intelligence Applications on Wall Street.
16. Jang, G. & Lai, F. [1993]. "Intelligent Stock Market Prediction System Using Dual Adaptive-Structure Neural Networks," The Second International Conference on Artificial Intelligence Applications on Wall Street.
17. Trippi, R. R. & Turban, E. [1992]. Neural Networks in Finance and Investing: Using Artificial Intelligence to Improve Real-World Performance, Probus Publishing Co.
18. Hecht-Nielsen, R. [1990]. Neurocomputing, Addison-Wesley Publishing Company, Inc.
19. Aleksander, I. & Morton, H. [1990]. An Introduction to Neural Computing, Chapman and Hall.
20. Wasserman, P. D. [1989]. Neural Computing: Theory and Practice, Van Nostrand Reinhold.
21. Rumelhart, D. E. & McClelland, J. L. [1986]. Parallel Distributed Processing, Volumes 1&2, The Massachusetts Institute of Technology.
22. Gallant, S. I. [1993]. Neural Network Learning and Expert Systems, The Massachusetts Institute of Technology.
23. Peters, E. E. [1991]. Chaos and Order in the Capital Markets: A New View of Cycles, Prices and Market Volatility, John Wiley & Sons.
24. Peters, E. E. [1994]. Fractal Market Analysis: Applying Chaos Theory to Investment & Economics, John Wiley & Sons.
25. Wolfe, M. A. [1978]. Numerical Methods for Unconstrained Optimization: An Introduction, Van Nostrand Reinhold.
26. Moré, J. J. [1977]. "The Levenberg-Marquardt Algorithm: Implementation and Theory," Numerical Analysis, ed. G. A. Watson, Lecture Notes in Mathematics 630, Springer-Verlag, pp. 105-116.
27. Holland, J. H. [1975]. Adaptation in Natural and Artificial Systems. Ann Arbor: The University of Michigan Press.
28. Malkiel, B. G. [1973]. A Random Walk Down Wall Street, W. W. Norton & Company, Inc.
29. Lo, A. W. & MacKinlay, A. C. [1988]. "Stock Market Prices Do Not Follow Random Walks: Evidence From a Simple Specification Test," The Review of Financial Studies, Vol. 1, No. 1.
30. Lapedes, A. & Farber, R. [1987]. "Nonlinear Signal Processing Using Neural Network Prediction and System Modeling," Theoretical Division, Los Alamos National Laboratory, Report #: LA-UR-87-2662.
31. Farmer, J. D. & Sidorowich, J. J. [1988]. "Exploiting Chaos to Predict the Future and Reduce Noise," Version 1.2. Theoretical Division, and Center for Nonlinear Studies, Los Alamos National Laboratory, Report #: LA-UR-88-901.
32. Colin, A. M. [1992]. "Neural Networks and Genetic Algorithms for Exchange Rate Forecasting," International Joint Conference on Neural Networks, Beijing, China, Nov. 1-5.
33. Deboeck, G. J. [1993]. "Neural, Genetic and Fuzzy Approaches to the Design of Trading Systems," The Second International Conference on Artificial Intelligence Applications on Wall Street.


* VantagePoint's accuracy statistics were computed on out-of-sample price data utilizing neural networks trained on both single market and intermarket data and relate to the Neural Index which indicates whether the average of tomorrow's typical price and the typical price of the day after tomorrow (both unknowns at this time) are expected to be higher or lower than the average of yesterday's typical price and the typical price of the day before yesterday.  The numerical value of the Neural Index, either a one (1) or a zero (0) thereby indicates whether or not the trend direction is expected to be higher or lower for each target market over the next two days. A Neural Network accuracy statistic of 80% does not mean that eight out of ten trades will be winning trades.  VantagePoint is not a trading system that gives the same specific buy and sell signals to all users. It is a technical forecasting tool that is comprised of proprietary forecasting indicators that apply neural networks to market data for the purpose of finding patterns and relationships between markets and then using this information to make futuristic forecasts. Using these indicators each trader determines his or her own entries, exits and stop placements which may vary from those of other traders due to differences among traders in trading style, objectives, risk propensity, account size and number of contracts involved, thereby producing different trading results from one trader to another. Futures and options trading involves risk, is not for every trader, and only risk capital should be used.  For more detailed information, please read our important disclaimer and software license agreement.

VantagePoint Intermarket Analysis Software, TraderTech, ProfitTaker, World Leader in Market Forecasting, and Market Technologies, LLC are trademarks of Market Technologies, LLC. Synergistic Market Analysis, Synergistic Analysis and Market Synergy are service marks of Market Technologies, LLC. Hurricaneomics is a registered trademark of Market Technologies, LLC

 


Copyright © Market Technologies, LLC. All Rights Reserved.