Synergistic Market Analysis with Neural Networks

Criteria for selection of appropriate neural network paradigms, architectures, and training regimens for this type of application will be discussed. Throughout the chapter, where appropriate, I will refer to research my firm has conducted in the development of its artificial intelligence software program, VantagePoint, which was commercially introduced in 1991. VantagePoint is a fully developed, pre-trained system of five neural networks in a hierarchical arrangement that implement Synergistic Analysis by combining technical, intermarket, and fundamental data for the purpose of predicting price and trend directions for various financial futures markets, including Treasury bonds, the S&P 500 stock index, and several currencies. For those interested in developing their own neural network applications, suggestions will be offered on how to avoid potential pitfalls to successful neural network design.

ARTIFICIAL NEURAL NETWORKS

Neural networks are capable of solving a wide variety of problems by "learning" mathematical models. If data can be represented numerically, it can be used as inputs into a neural network. Therefore, technical and fundamental data related to a specific market or asset class, as well as related intermarket information, can be incorporated as inputs into neural networks.

In the remainder of this chapter, various aspects of neural network development for financial forecasting will be discussed. The topics include:
In each section I will discuss basic decisions that must be made and highlight potential pitfalls that should be avoided.

Paradigms

For example, paradigms like differential competitive learning and counter-propagation can be used for data clustering tasks, while the Hopfield network and brain state in a box paradigms can be used for auto-association, filtering, and pattern association. Each paradigm has numerous variations, depending on how parameters associated with it are chosen. There is no holy grail in picking a paradigm. It is more important to have a clear problem definition and match an appropriate paradigm to it.

Since this chapter will focus on financial forecasting, the problem domain of interest falls into the prediction category. Two of the most widely used paradigms in financial analysis and forecasting are recurrent back-propagation networks and feed-forward back-propagation networks.

Recurrent Back-Propagation Networks

Since the network can feed back upon itself, temporal information is learned as a result of the sequential order in which the facts are presented. There is no need to encode the temporal relationship into the input data. This alleviates the need for one of the major preprocessing steps associated with designing feed-forward back-propagation networks.

Feed-forward Back-Propagation Networks

For example, to present facts that contain the differences in the weekly close for the past five weeks, a snapshot of the data must be created by constructing a fact with an input vector containing five values (one for each difference) and an output for the next week. This would be done for each fact-week to be presented to the network, effectively encoding the temporal information (data from the last five weeks) into the facts themselves. Instead, in the case of a recurrent net, each fact would be presented sequentially as a single difference.
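The snapshot construction for a feed-forward net can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in VantagePoint: the function names, the sample closes, and the choice of the following week's difference as the output are all assumptions made for the example.

```python
def weekly_differences(closes):
    """Week-over-week changes in the closing price."""
    return [later - earlier for earlier, later in zip(closes, closes[1:])]


def make_facts(differences, window=5):
    """Build (inputs, output) facts for a feed-forward net.

    Each fact holds `window` consecutive differences as its input vector
    and the difference that follows them as its desired output (assumed
    here to be the quantity predicted for the next week).
    """
    facts = []
    for start in range(len(differences) - window):
        inputs = differences[start:start + window]
        output = differences[start + window]
        facts.append((inputs, output))
    return facts


# Made-up weekly closes, for illustration only.
closes = [100.0, 101.5, 101.0, 102.25, 103.0, 102.5, 104.0, 105.25]
diffs = weekly_differences(closes)
facts = make_facts(diffs, window=5)
```

Each fact-week repeats four of the previous fact's inputs, which is the redundancy a recurrent net avoids by seeing one difference at a time.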
Since for every recurrent network there is a corresponding feed-forward network that can be designed with identical behavior, and since most commercially available neural network development packages do not include a recurrent model, the remainder of this chapter will use a feed-forward back-propagation network for illustrative purposes.

This type of network learns by being given examples of inputs and expected outputs. To accomplish this, the network computes an error measure between its generated output and the desired output. This is calculated for each output in the output layer. The error is typically averaged over the entire set of facts, then propagated backward through the network, layer by layer, to be utilized in altering the weight connections between neurons. During training, facts are presented to the network repeatedly until the error associated with the set of outputs is reduced to an acceptable level. Error can be judged by a variety of metrics, some of which will be discussed later in the chapter. For simple problems, the error level may be reduced to zero, but in most real-world financial applications this is an unrealistic goal.

Architecture
Back-prop networks are composed of an input layer, one or more "hidden" layers, and an output layer. The hidden layers, separating the input and the output layers, are so named because they are not directly accessible to the network's user. This presentation, as represented in Figure 4, assumes that the layers are fully connected. This means that each neuron in the input layer has a connection to each neuron in the hidden layer, with similar connections between the neurons in the hidden and output layers.

The standard model is relatively simple. For each individual neuron (see Figure 5), input data (I0-In) is multiplied by the weight (W0-Wn) associated with the connection to the neuron. The products are summed, with the result passed through a transfer function that maps the sum to a value in a specified interval, e.g. between zero and one. The output from each neuron is then multiplied by another weight and fed into the next neuron. If the neuron is in the output layer, as is the case with the example in Figure 4, then the output is not multiplied by a weight but is instead the network's output.

Transfer Functions

The most commonly used nonlinear transfer functions include the logistic function (an example of which can be seen in Figure 5) and the tanh function, also known as the hyperbolic tangent function. The two functions are very similar, both being of sigmoidal shape. The logistic function ranges from zero to one, whereas the tanh function ranges from minus one to one. In fact, as is shown below, the tanh function is just the logistic function rescaled to the appropriate range.

The logistic function is sometimes referred to as the sigmoid function. Strictly speaking, this is not correct. "Sigmoid" refers to the shape of the function's curve when plotted. Many functions have a sigmoidal shape. The logistic function takes its name from the "logistic equation" found in population biology. This equation models the evolution of a population density over time:
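In its standard textbook form, which is assumed here because it matches the symbols and steady states described next, the logistic equation reads:

```latex
\frac{dU}{dt} = r\,U\left(1 - \frac{U}{k}\right)
```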
where U is the population density of a species, r is the birth rate of the species, k is the "carrying capacity" of the environment, and t is time. The left-hand side is the time derivative of the population density (the "velocity" of the change in magnitude of the population density). When U = 0 or U = k, the right-hand side is zero. These are the "steady states" of the equation: when U is 0 or k, the population does not change with time. However, U = 0 is an "unstable steady state," which means that if the population is perturbed from U = 0 (i.e., U is given a small positive value much less than k), then the population will grow until it reaches U = k. Conversely, U = k is a stable steady state; if U is perturbed from k, the population will shrink or expand back to U = k.

In the study of neural nets, one usually uses k = 1. This gives the familiar sigmoidal curve which ranges from zero to one. The value of r determines the "steepness" of the ascent to one. Some popular neural net software packages let the user choose any nonzero value for r; others simply assign r = 1. In addition, one usually writes the independent variable as x instead of t. The logistic function, with r = k = 1 as described above, can be written explicitly in closed form:
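Assuming the conventional normalization f(0) = 1/2 (the initial condition is not stated here), the closed form is:

```latex
f(x) = \frac{1}{1 + e^{-x}}
```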
and the tanh function can be written as:

$$g(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Here f denotes the logistic function and g the tanh function. One can easily see that the tanh function is just the logistic function with the x axis compressed by a factor of two, the y axis stretched by a factor of two, and the resulting curve slid down by one so that it ranges from minus one to one. That is:

g(x) = 2f(2x) - 1 or, equivalently, f(x) = 0.5[1 + g(0.5x)]

Put simply, if one is subtracted from two times the logistic function, the result is the same as the tanh function with half the input. Either function can be used effectively in a back-prop network. However, there is some debate over which of these transfer functions allows for faster training. Empirically, in limited experimentation, we have found that the tanh function trains slightly faster than the logistic function. Intuitively plausible reasons why this might be so have been reported by other researchers.

Layers and Neurons

For a nonlinear problem, like predicting prices of a stock or commodity, the network would require at least one hidden layer. There are no standard rules for determining the best number of hidden layers or hidden neurons in a back-propagation network. While a back-prop net can have more than one hidden layer, only one is theoretically necessary to approximate any nonlinear function's input-to-output mapping. This is not to imply that a network with a single hidden layer is always best. As is the case with other architectural decisions, trial and error will help determine the best configuration for a specific network. Therefore, extensive experimentation is often necessary to make this determination.

Typically, more complex problems require a larger number of hidden neurons. However, too many hidden neurons may cause a network to be overfitted to the training data, with poor performance on previously unseen facts. Determining the number of hidden neurons is more art than science. One needs to train numerous nets, varying the number and size of hidden layers to get a feel for their effect on network performance.
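To make the architecture concrete, the following Python sketch propagates one input vector through a fully connected net with a single hidden layer whose size can be varied. It is an illustration under stated assumptions rather than any particular package's implementation: the function names, the weight range, the sample inputs, and the use of tanh in both layers are choices made for the example, and no bias terms are included because the description above does not mention them.

```python
import math
import random


def make_layer(n_inputs, n_neurons, rng):
    """Small random weights: one weight list per neuron."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
            for _ in range(n_neurons)]


def layer_output(weights, inputs):
    """Each neuron sums its weighted inputs and applies tanh."""
    return [math.tanh(sum(w * x for w, x in zip(neuron, inputs)))
            for neuron in weights]


def forward(hidden_weights, output_weights, inputs):
    """Propagate one input vector through the hidden and output layers."""
    hidden = layer_output(hidden_weights, inputs)
    return layer_output(output_weights, hidden)


rng = random.Random(0)
inputs = [0.2, -0.4, 0.1, 0.3, -0.1]  # made-up, already normalized inputs
for n_hidden in (2, 4, 8):
    hidden_w = make_layer(len(inputs), n_hidden, rng)
    output_w = make_layer(n_hidden, 1, rng)
    prediction = forward(hidden_w, output_w, inputs)
```

Looping over `n_hidden` in this way, with training and testing added for each candidate, is the manual form of the experimentation described above.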
Automation of this process is necessary, since optimizing network architectures can be time-consuming, due to the large parameter space that is involved.

The more task-specific a neural network is, the more easily it can be trained. For instance, instead of designing one Treasury Bond network with two outputs, the next week's high and the next week's low, it may be preferable to design two independent networks, each with its own output.

One thing to remember when developing neural networks is that no single design decision, such as selecting the number of hidden layers or number of output neurons, determines how well the network will perform. Data selection and quality, data preprocessing techniques, optimization of training parameters, and testing protocols all affect network performance. The remainder of this chapter will examine these issues in more detail, and will use real-world examples to highlight common pitfalls that can arise at each stage in the development of a neural network.

Input Data Selection
Data selection involves the identification of appropriate input data sources for a network. This is a demanding task which must be performed judiciously to avoid the "garbage-in, garbage-out" phenomenon often associated with computers. A neural network's performance is dependent on the quality and relevance of its input data. Failure to include important data inputs can have a deleterious effect on the network's performance. Without a solid understanding of the problem domain, the likelihood of developing a successful neural network application is considerably lessened. When selecting input data for a financial neural network application, the developer must be cognizant of the implications of following a specific market theory. I posit that today's global markets are nonlinear, and possibly chaotic in nature, and that market inefficiencies exist, can be discerned quantitatively, and persist long enough for traders to profit from them. One's analytic perspective on the markets clearly influences the selection of input data. Technical analysis would suggest the use of only single-market technical data as inputs. Fundamental analysis focuses on data that reflects macroeconomic factors such as economic reports that have an effect on the target market. In today's globalized markets, neither approach by itself is sufficient for financial forecasting. Instead, Synergistic Market Analysis, through the use of neural networks, combines both technical and fundamental analysis with intermarket analysis. The result is a multidimensional quantitative framework which overcomes the narrowness of single-market analysis and the limitation of having to interpret intermarket relationships subjectively through visual analysis of price charts, or through linear correlation analysis. 
Use of multiple data inputs reflecting a broad range of interdependent and interrelated global markets allows patterns and relationships in the data to be discerned, often before they become so overt that they are discounted through the price discovery process.

Now, let us look at an example of a neural network designed to implement Synergistic Market Analysis (SMA) in predicting the following week's high and low for the Treasury bond market. First, technical price data on Treasury bonds must be included as data inputs into the network. This makes general patterns and characteristics of the bond market apparent. Additionally, related fundamental information, such as the federal funds rate, should be included as inputs into the network. Intermarket inputs can be identified through the application of various statistical analysis tools which determine correlations between data. Research involving sensitivity analysis, in which data inputs are varied, can be conducted to find the best mix of intermarket and fundamental data to use as inputs.

For instance, in the case of VantagePoint, single-market technical input data from the Treasury Bond market, including open, high, low, close, volume, and open interest, is utilized. In addition, similar technical data from eight related markets (the CRB Index, deutsche mark, Eurodollar, US dollar index, Japanese yen, S&P 500 stock index, crude oil, and gold) is utilized along with the daily Fed funds rate. Other VantagePoint systems incorporate intermarket data from markets such as the FTSE and Nikkei stock indices as well as the Dow Jones Industrial and Utility Averages. APPENDIX A to this chapter contains a simple case study that exemplifies the benefits of utilizing intermarket data.

Preprocessing Input Data

Transformation is used to manipulate one or more raw data inputs to generate a single network input. Normalization is a transformation that distributes data evenly and scales it into an acceptable range for network usage.
Decisions made during this phase are:
As with the selection of raw data inputs, domain knowledge is critical to the choice of preprocessing methods.

Transformation

The noise component within raw price data tends to obscure underlying relationships between input data sources and slows down the training process. Smoothing techniques, such as moving averages, that help reduce the noise entering the network are useful transforms. One obvious disadvantage of smoothing, however, is that some useful information may get lost. Additionally, smoothing a leading indicator can turn it into a lagging indicator. These points illustrate the tradeoffs that exist whenever smoothing methods are utilized.

Normalization

Simple Linear Scaling.

I = Imin + (Imax-Imin)*(V-Vmin)/(Vmax-Vmin)

Vmin and Vmax must be computed for each input. This method of normalization will scale input data into the appropriate range, but does not increase its uniformity.

Simple Statistical Normalization.

Mendelsohn Histogram Normalization (MHN).

Figure 7 depicts an example distribution, in the form of a histogram, in which the data is not uniformly distributed. To illustrate the effects of the various approaches, each of the three methods of normalization has been used to prepare this data as input to a neural net in the range zero to one. Figure 8 shows that a simple linear scaling of the data has no effect on the shape of the frequency distribution itself. Figure 9 shows the same original distribution normalized by the second method, in which two standard deviations are used to set the limits for the outliers so that the distribution becomes more uniform. Figure 10 shows the effects of performing MHN on the data, in which the resulting distribution is the most uniformly distributed. Other methods, in addition to these, exist for data normalization. Depending on the data to be normalized, some methods are more effective than others.

Denormalization.

Transformation and normalization are routinely used to help improve a network's performance.
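The linear scaling formula above and its inverse can be sketched in Python as follows. The function names and sample data are hypothetical. The `clipped_scale` function is only one plausible reading of a normalization that uses two standard deviations to limit outliers; it is an assumption for illustration, not the chapter's exact procedure.

```python
import statistics


def linear_scale(values, i_min=0.0, i_max=1.0):
    """I = Imin + (Imax - Imin) * (V - Vmin) / (Vmax - Vmin)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant input")
    span = v_max - v_min
    return [i_min + (i_max - i_min) * (v - v_min) / span for v in values]


def linear_unscale(scaled, v_min, v_max, i_min=0.0, i_max=1.0):
    """Invert linear_scale to recover values in the original units."""
    return [v_min + (i - i_min) * (v_max - v_min) / (i_max - i_min)
            for i in scaled]


def clipped_scale(values, n_std=2.0, i_min=0.0, i_max=1.0):
    """Clip to mean +/- n_std standard deviations, then scale linearly.

    Assumption: an illustrative stand-in for a statistical normalization
    that limits outliers at two standard deviations.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    low, high = mean - n_std * std, mean + n_std * std
    clipped = [min(max(v, low), high) for v in values]
    return linear_scale(clipped, i_min, i_max)


raw = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 50.0]  # made-up data, one outlier
scaled = linear_scale(raw)
restored = linear_unscale(scaled, min(raw), max(raw))
robust = clipped_scale(raw)
```

With plain linear scaling the single outlier squeezes the other values into a narrow band near zero. Clipping first spreads them over slightly more of the range.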
For additional suggestions on preprocessing financial market data, please refer to APPENDIX B.

After the network architecture has been selected, and the raw data inputs have been chosen and preprocessed, fact sets must be created.

Fact Selection
A fact is a single input vector and its associated output vector. A fact is typically represented as a row of numbers where the first n numbers correspond to the n network inputs and the last m numbers correspond to the m network outputs. For example, assume that a network has been designed to predict the change in price of the Dow Jones Industrial Average (DJIA) one week in advance, based on the differences in both the highs and the lows for the past five days and a moving average of the closes for the past ten days. Each fact would be composed of a three-valued input vector and a single-valued output vector. The three input values would correspond to the differences in the highs and differences in the lows for the past five days and a moving average of the closes for the past ten days. The single-valued output vector would represent the change in the DJIA over the next week (See Figure 11). A fact set is a group of related facts. The decision about what data to include in a fact set is important since facts should be selected which best represent the problem space that the neural network is to model. Although internal market data on a target market is readily available, relevant intermarket data may be unavailable, depending on when each of the related markets first began trading. For instance, while Japanese Yen futures began trading in 1972, the Nikkei 225 index started trading as a futures contract in 1990. To use both markets' data in a neural network application for currency predictions, the fact set could not start earlier than 1990. However, the use of shortened data sets in neural network training has its own risks which must be understood. When the input data chosen does not span a long enough time frame, the likelihood of overlooking significant market characteristics, or including too few examples of them, increases considerably. 
For example, over the past decade, with the exception of 1987, there has really not been a bear market in the S&P 500 market. Therefore, a neural network trained only on recent data will not be able to adapt to changing market conditions such as would occur at the onset of a bear market. Yet, whether or not to include S&P 500 Index data from October 1987 in a fact set must be decided judiciously, since data from this period represents an extremely infrequent occurrence. Since this data is not supported by a sufficient number of examples, the network may not be able to learn how to recognize it in the future. Even so, its presence in the fact set might cause a bias that could reduce the overall accuracy of the system during more typical trading periods. Data availability and sufficient representation of various market conditions are important considerations in decisions regarding neural network input data and fact selection. Training and Testing Fact Sets There are various criteria that can be used to determine the composition of the training and testing sets. First, they should be mutually exclusive, so that a specific fact does not reside in both subsets. Also, if two facts have precisely the same input and output values, one of these facts should be removed from the fact set before it is separated into subsets. Additionally, caution must be exercised when using commercial tools that automatically split the initial fact set. For example, in an 80/20 split, some tools may place every fifth fact into the test set. If the facts are in chronological order before the split, all data representing one day of the week, such as a Monday or Friday, could be assigned to the test set, while the rest of the data representing the remaining trading days would be assigned to the training set. Since doing this would adversely affect the network's performance, the facts should be randomized before splitting them into subsets, or should be randomly assigned to the subsets. 
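The weekday pitfall described above is easy to reproduce. The Python sketch below uses a hypothetical fact set of 100 consecutive trading days with no holidays (an assumption that keeps the weekday pattern exact) and compares the every-fifth-fact split with a random one. The function names and the seed are illustrative.

```python
import random

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# Hypothetical fact set: 100 chronological facts, one per trading day.
facts = [{"index": i, "weekday": WEEKDAYS[i % 5]} for i in range(100)]


def every_fifth_split(facts):
    """The naive 80/20 split: every fifth fact goes to the test set."""
    test = facts[4::5]
    train = [f for i, f in enumerate(facts) if i % 5 != 4]
    return train, test


def random_split(facts, test_fraction=0.2, seed=0):
    """Shuffle a copy of the facts, then cut off the test portion."""
    shuffled = facts[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


naive_train, naive_test = every_fifth_split(facts)
rand_train, rand_test = random_split(facts)

# Which weekdays ended up in each test set?
naive_days = {f["weekday"] for f in naive_test}
rand_days = {f["weekday"] for f in rand_test}
```

Here `naive_days` contains only Friday, so the naive test set never sees any other day of the week, while the shuffled split draws its test facts from across the week.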
Even if proper precautions are taken to randomize fact order and split the fact set into subsets, all facts with a particular characteristic might still be assigned to one subset or the other. To minimize the potential of this occurring, it is advisable to identify the most important characteristics thought to be associated with the data and determine the initial fact set's underlying distribution relative to these characteristics. Then the fact set can be split so that the training and testing subsets will have similar distributions relative to these characteristics. Statistical analysis or clustering algorithms can be used for this purpose. A careful analysis of the fact set also allows outliers to be identified and eliminated. Experimentation with various data handling methods should be performed before selecting one. The Predictive Technologies Group has developed a training/testing regimen which splits the initial fact set into three mutually exclusive subsets rather than just two. In addition to standard training and testing sets, a second testing set is utilized, which contains examples of those facts thought to be most important in judging network performance. This test set is then used to evaluate various networks on a comparative basis. Training And Testing
The training process can begin once the training fact set has been created. The first task performed during training is to initialize the weights. As mentioned earlier, the weights change during the training phase, as the network adapts its internal representation to model a problem. Typically, relatively small random weights are used to initialize the network. It is advisable to train the same network with different sets of initial weights, since these initial conditions can affect how the network trains and ultimately performs.

Learning Algorithms

Gradient-Descent. Unfortunately, gradient descent is somewhat slow and prone to getting stuck at local optima. These are points on the error surface which have minimum error in reference to all the points in some surrounding region, but are not the optimal point for the entire error surface. As a simple example, if the error surface contains two valleys, one "shallow" and the other "deep," a local optimum would be at the bottom of the shallow valley, while the global optimum (minimal point of error) would be at the bottom of the deep valley. Additionally, the error surface must have a certain degree of smoothness for gradient descent to perform well. Also, one must be able to actually compute the gradient of the objective function which defines the surface.

Other optimization algorithms are available whose performance characteristics differ from those of gradient descent in one way or another. Methods such as conjugate-gradients, Newton-Raphson, Levenberg-Marquardt, and genetic algorithms have their own strengths and weaknesses. None is the best for all optimization problems. A good text on optimization theory will cover these issues in detail.

Conjugate-Gradients.

Newton-Raphson.

Levenberg-Marquardt.

While the above algorithms are superior for some problems, they all require a starting point somewhere in the vicinity of the optimal point, and some degree of smoothness of the performance landscape.
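The two-valley picture can be made concrete with a one-dimensional toy error surface. In the Python sketch below the error function, learning rate, step count, and starting points are all invented for illustration. The surface has a shallow valley near w = 1 and a deeper one near w = -1.

```python
def error(w):
    """Toy error surface with a shallow valley (w > 0) and a deep one (w < 0)."""
    return (w * w - 1.0) ** 2 + 0.3 * w


def gradient(w):
    """Derivative of the toy error surface with respect to w."""
    return 4.0 * w * (w * w - 1.0) + 0.3


def descend(w, learning_rate=0.01, steps=2000):
    """Plain gradient descent: repeatedly step against the gradient."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_from_right = descend(1.5)   # settles in the shallow valley
w_from_left = descend(-1.5)   # settles in the deep valley
```

The run that starts on the right never leaves the shallow valley, so the quality of the result depends on the starting point, as noted above for the gradient-based methods.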
This is not true of genetic algorithms.

Genetic Algorithms (GAs). Genetic algorithms use a similar approach, but do not require the assignment of a priori probabilities. Rather, they determine the probability distributions implicitly by evolving a population of solutions over time. They use simple mechanisms analogous to those used in genetics to breed populations of superior solutions. Those that do well "breed" with other solutions to form new solutions. Solutions that perform poorly are culled.

Genetic algorithms are highly recommended as a general search method. They do not require any special initial conditions, and make no requirements on the smoothness of the performance landscape. They are a very general class of optimization algorithms that are quite robust and widely applicable.

Genetic algorithms can be used to train a neural network by evolving populations of weight matrices. In this case, back-propagation of errors is not needed. Only the forward-propagation of facts through the net and subsequent evaluation of the fact-errors is required. Alternatively, genetic algorithms can be used to control only the free parameters within the traditional gradient-descent based back-prop algorithm. Each member of the population to be evolved might have a different learning rate and momentum.

The Learning Rate

Oscillation.

Momentum.

Training and Testing Automation Necessary

Continuing with this simple example, the goal of the net developer is to find an optimal set of values for the parameters, whereby an optimally performing net is produced upon training with these values. This amounts to finding the "best" point in the parameter space.

Brute-Force.

After some time is spent investigating the properties of the back-propagation paradigm for neural net training, the vast size of its parameter-space can be appreciated. It is truly immense. The training parameters may vary from node to node in the net and from epoch to epoch during training.
If a net has 100 trainable nodes and is trained for 1000 epochs, then the two-dimensional example is suddenly 200,000-dimensional! To further complicate matters, all possible initial (random weight) conditions, as well as the number of hidden layers and nodes, must be considered as part of the parameter-space. In this light, the parameter-space is virtually infinite in extent. Finally, the selection of net inputs can be viewed as part of the parameter-space. Only the desired net output is known; all else is a variable represented in this generalized parameter-space.

Automation.

Simulated Annealing

Avoid Overtraining

The easiest way to avoid overtraining is to use an automated training/testing routine in which testing is an integral facet of the training process, rather than a procedure that is performed after training is complete. In this manner, network training is halted periodically at predetermined intervals. Then the network operates in recall mode on the test set to evaluate the network's performance on selected error criteria. Thereafter, training is resumed from the point at which it was halted. This alternating process continues iteratively, with interim results that meet the error criteria retained for later analysis. When the performance on the test set starts to degrade, it can be assumed that the network is beginning to overtrain. The best saved network configurations up to this point are then recalled for further evaluation.

A clearly defined training/testing methodology is necessary to conduct an apples-to-apples comparison of various networks as the architectures, selection of raw data inputs, preprocessing, and training parameters are refined.

Error Measures

Since many commercial neural network development tools are limited with respect to the error metrics available, development and implementation of custom error functions is highly desirable.
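As a small illustration of a custom error function, the Python sketch below computes a conventional mean absolute error alongside a directional hit rate. The hit rate is a hypothetical choice made for this example; it is not a metric named in this chapter. The sample numbers are made up.

```python
def mean_absolute_error(predicted, actual):
    """Average size of the prediction error, ignoring its sign."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def directional_accuracy(predicted_changes, actual_changes):
    """Fraction of facts where predicted and actual change share a sign."""
    hits = sum(1 for p, a in zip(predicted_changes, actual_changes)
               if p * a > 0)
    return hits / len(actual_changes)


# Made-up predicted and actual price changes, for illustration only.
predicted = [0.5, -0.25, 0.75, -0.5, 0.25]
actual = [0.25, -0.5, -0.25, -0.75, 0.5]

mae = mean_absolute_error(predicted, actual)          # 0.4
hit_rate = directional_accuracy(predicted, actual)    # 0.8
```

Two nets with the same mean absolute error can differ in hit rate, which matters if the final application trades on direction rather than on magnitude.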
Error metrics which best measure those characteristics that are most important in the final application should be incorporated into the testing methodology. By tailoring error functions to the specific application and outputs, real world neural network performance can be substantially improved. Iterative Refinement. At this point, one may simply replace the first net with the combination of the first and second nets. The first net predicts X, with error Y. The second net predicts y with error z. If the second net does well at predicting the errors of the first net, error z is less than error Y. Thus, combining the net outputs gives X+y, with error z, which is an improvement on the error Y. This process, which my research staff terms iterative refinement, may be repeated indefinitely. However, in practice a single additional net is usually sufficient. The inputs to the additional nets are not limited to the inputs used by the first net. For example, the technical indicators, statistical transforms, etc. used to preprocess the inputs for the first net may be used instead on the error time-series produced by the first net's outputs. In fact, there are many approaches that can be used to improve a net's performance. Most involve constructive algorithms applied during the training or retraining of an existing net. Expectations of Performance The degree of randomness in the markets has been a long-standing subject of debate. Although there is a substantial body of literature on the subject of the Random Walk Hypothesis, no single opinion prevails. Recent studies indicate that stock market prices do not follow a random walk. I take a more pragmatic and less theoretical view of this debate. I believe that a given market is driven by both stochastic (random) and deterministic forces. Only the deterministic component is predictable. However, even chaos can be deterministically generated. 
Recent work at Los Alamos National Laboratory has shown that neural networks can predict such chaos quite well. The equity curve (discussed in APPENDIX C to this chapter) produced by a simple hybrid trading system based on VantagePoint's predicted information indicates that there is a sufficient degree of predictability within the markets to permit profitable trading.

Currently, the maximum achievable forecasting accuracy is unknown. Certainly no one expects to achieve zero error, since this would require a model that could account for every possible variable affecting the markets. On the other hand, just because something is currently unpredictable does not mean it is random. Indeed, each revision of VantagePoint is able to predict events that had previously appeared to be stochastic noise. So for now, it is unclear where the performance "ceiling" is located.

The development of a successful neural network for any non-trivial problem requires a considerable expenditure of time and effort. Even with extensive in-house research and development tools and access to a multitude of commercial tools, neural net development to implement Synergistic Market Analysis is a time-consuming, labor-intensive task which demands expertise in financial market analysis, computer science, and applied mathematics. For these reasons, a team effort is necessary for successful neural network development.

Implementation

Information Systems

Additionally, in VantagePoint the outputs from these four networks are used as inputs to a fifth network which predicts market turning points. This type of network architecture, depicted in Figure 15, is referred to as a hierarchical neural network. By designing each network to include just one output, large networks are not needed to perform all the work. Instead, predictions derived from networks at the primary level of the hierarchy are incorporated as inputs into a network, or networks, at the secondary level.
This kind of hierarchical architecture facilitates faster training, since all networks at the primary level of the hierarchy can be trained simultaneously, as each network focuses solely on a single output. VantagePoint's predictions can be visualized graphically with various charts or in tabular form on its daily trading prediction report. When viewing the charts, users can select four different chart types, from bar charts to candlestick (see Figure 16). Up to eight different studies can be overlaid on each chart. These studies include both the forecasted information produced by VantagePoint's neural networks as well as information computed from these forecasts to help traders utilize the information most effectively. Additionally, a variety of parameters, shown in Figure 16, allows users to customize the appearance of the charts. An example of a chart produced by VantagePoint is shown in Figure 17. This chart was produced by the VantagePoint Treasury Bond System with predicted high and low values plotted over the daily bars. This type of predicted information is particularly useful for determining entry and exit points for day trading or position holding. If the forecasted indicators on the daily report suggest that tomorrow will be an up market day, day traders might wait for the market to trade down toward the predicted low, then enter a long position. The reverse would involve entering at or near the predicted high on a day expected to be down. Using forecasts of market trend direction in conjunction with predicted highs and lows greatly increases the potential for profitable day trades. Two examples of this are shown from the March 94 Treasury Bond contract. In the example on the left in Figure 18, the up arrow (indicating an expected upward trend in market direction) and the predicted low for tomorrow are generated on December 2, 1993. 
If a long day trade was taken from open to close on December 3, 1993, based solely on the anticipated direction, a profit of 12 ticks ($375.00 before slippage and commission) would have been made. If, instead, a limit order entry to go long had been placed at the predicted low, with an exit at the close, 24 ticks profit ($750.00 before slippage and commission) would have been realized. The example on the right shows the same concept in reverse. Instead of entering a short position at the open and exiting at the close on an expected down day, one could place a limit order to enter a short position at the predicted high and exit at the close. This would result in a profit of 10 ticks ($312.50 before slippage and commission), as opposed to just three ticks. Additionally, day traders can use the predicted high/low trading range to set exit points, rather than waiting for the close to exit from a day trade. In this scenario, on a day when the market direction is predicted to be up, a long position is taken at or near the predicted low, then closed out intraday at or near the predicted high. There is always the possibility that a limit order to enter the market may not get executed when the market does not reach the entry objective set by the predicted high or low. Still, profitability of those trades that are executed can be increased, due to the more advantageous entry level. Position holders can apply the same principles in entering the market, using the predicted range on subsequent days to set daily stops. For example, if position holders are long Treasury Bonds and the market is expected to continue to move up tomorrow, they might set their stop for tomorrow a few ticks below tomorrow's predicted low which acts as a support level. This would decrease the likelihood of being stopped out prematurely during the day as the result of intraday volatility in the market, yet protect profits in the event of an abrupt market downturn. 
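The tick arithmetic in the day-trade examples above can be checked with a short sketch. The tick value of $31.25 per contract is implied by the figures in the text (12 ticks = $375.00). The function names, the three-tick stop buffer, and the price levels in the usage lines are hypothetical: prices are expressed in ticks relative to the open purely for illustration and are not the actual prices of December 3, 1993.

```python
TICK_VALUE = 31.25  # dollars per tick per contract, implied by 12 ticks = $375.00


def ticks_to_dollars(ticks, contracts=1):
    """Convert a profit or loss in ticks to dollars, before slippage and commission."""
    return ticks * TICK_VALUE * contracts


def long_limit_day_trade(predicted_low, day_low, close):
    """Profit in ticks for a long limit order at the predicted low, exited at the close.

    Returns None when the market never trades down to the predicted low,
    in which case the limit order is not executed.
    """
    if day_low > predicted_low:
        return None
    return close - predicted_low


def long_stop_level(predicted_low, buffer_ticks=3):
    """Protective stop for a long position, a few ticks below the predicted low."""
    return predicted_low - buffer_ticks


# Hypothetical day, in ticks relative to the open (open = 0).
open_price, predicted_low, day_low, close = 0, -12, -13, 12

market_entry_ticks = close - open_price                                # 12 ticks
limit_entry_ticks = long_limit_day_trade(predicted_low, day_low, close)  # 24 ticks

print(ticks_to_dollars(market_entry_ticks))  # 375.0
print(ticks_to_dollars(limit_entry_ticks))   # 750.0
```

The `None` return value makes the trade-off mentioned in the text explicit: a limit order at the predicted low improves the entry price when it is filled, but it is not filled on days when the market stays above that level.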
Both position holders and day traders can use forecasted information to their advantage. This information can be used alone, or in conjunction with other information, to generate buy and sell signals. One still popular method of technical analysis involves the use of moving averages in a crossover system. Typically, two moving averages are plotted on a chart. Buy and sell signals are generated when the short moving average crosses over or under the long moving average. The obvious limitation of moving averages is that, by definition, they tend to lag behind the market. Therefore, moving average crossover systems typically get in and out of trades after the turning points in market direction have occurred. Neural network generated trend forecasts can be used to reduce the lag associated with a traditional moving average crossover system. For instance, instead of calculating the value for today's short moving average, the forecasted moving average value for two to four days into the future can be used as the short moving average, in a crossover system. This reduces the lag, since the short moving average is a prediction of its value at a point in time in the future, not a calculated value as of today. An example of a move captured by the crossover of a forecasted ten day moving average four days in the future against a calculated ten day moving average today is shown in Figure 19 as it would appear in VantagePoint. VantagePoint has adjustable parameters that allow users to customize it to their styles of trading. Figure 20 depicts a sample screen containing the parameters that traders can set to customize VantagePoint's forecasts. One area of flexibility built into VantagePoint allows users to emphasize the importance placed on each of the predictions in affecting the Strength Index which measures the strength of the impending move. This is done by altering the various "Weight" parameters seen in Figure 20. 
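The forecasted moving average crossover described above can be sketched as follows. This is a minimal illustration, not VantagePoint's implementation: the function names are hypothetical, and the forecasted series is simply passed in as a list, standing in for whatever a trained network would predict for a few days ahead.

```python
def moving_average(prices, window):
    """Trailing simple moving average; None until enough data is available."""
    averages = []
    for i in range(len(prices)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(prices[i + 1 - window:i + 1]) / window)
    return averages


def crossover_signals(short_ma, long_ma):
    """Return (day, 'buy' or 'sell') each time short_ma crosses long_ma.

    In a traditional system short_ma is a shorter calculated average.
    In the forecast-based variant, short_ma holds the predicted value of
    the moving average a few days in the future, and long_ma holds the
    moving average calculated as of today.
    """
    signals = []
    last_sign = 0  # +1 when short is above long, -1 when below
    for day, (short, long_) in enumerate(zip(short_ma, long_ma)):
        if short is None or long_ is None:
            continue
        sign = (short > long_) - (short < long_)
        if sign != 0:
            if last_sign != 0 and sign != last_sign:
                signals.append((day, "buy" if sign > 0 else "sell"))
            last_sign = sign
    return signals


# Hypothetical values: today's calculated average versus a forecasted average.
calculated_today = [10.0, 10.0, 10.0, 10.0, 10.0]
forecast_ahead = [9.0, 9.5, 10.5, 11.0, 9.0]
print(crossover_signals(forecast_ahead, calculated_today))
# [(2, 'buy'), (4, 'sell')]
```

Tracking only the last non-zero sign means that a day on which the two averages are exactly equal does not by itself generate a signal; a signal is recorded only once the short series has actually moved to the other side.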
Signals that indicate the general market movement (up, down or sideways) are then generated by filtering the Strength Index by the "Upper Strength Limit" and "Lower Strength Limit," also seen in Figure 20. As is evidenced by the figure, other parameters can also be set to aid in tailoring VantagePoint to a particular trading style.

Trading Systems

Traders have different trading styles. Even with perfect hindsight, no two traders would identify the same buy/sell points in a given market over the past year. Therefore, traders with limited willingness to tolerate drawdown would not design and train neural networks that would generate signals appropriate for others with larger capitalization or a higher risk propensity. Additionally, it is not easy to incorporate risk management considerations into a neural network-based trading system. For this reason, neural networks are best used as part of a hybrid approach.

Hybrid Trading Systems

Now let us examine how an information system such as VantagePoint can be used as part of a hybrid trading system. VantagePoint would represent the box labeled "Information System" in Figure 21. For the box labeled "Rules" we have devised a simple set of rules that utilize VantagePoint's predicted information as a means of generating buy and sell signals. This particular system uses only some of the information generated each day by VantagePoint's Treasury Bond system. This includes:
If two up arrows or two down arrows occur, within a specified window, in the medium market, the system takes a long or short position, respectively, on the following day at the open with a market order. Timing decisions concerning whether to enter with a limit order, in conjunction with VantagePoint's predicted high and low, for a more advantageous entry, are left to the trader's discretion. A full description of the details of the system can be found in APPENDIX C to this chapter, along with a trade listing and summary of trades made on the December 1992, March 1993, June 1993, September 1993, December 1993, March 1994, and June 1994 Treasury Bond futures contracts. Hypothetical trading of the system over these contract months (over 1.5 years of trading) resulted in the equity curve shown in Figure 22.

This chapter has briefly covered all aspects of neural network development including architectural decisions, input selection, preprocessing, fact selection, training, testing and implementation. Each of these phases of neural network development has been examined in the context of the recent globalization of the world's financial markets and the need to establish a synergistic analytic framework for global trading. While an in-depth discussion of the development of an actual neural network system such as VantagePoint is beyond the scope of this chapter, a simple case study that utilizes some of VantagePoint's features can be seen in Appendix A to this chapter.

What's Next?

Other related technologies such as expert systems and genetic algorithms have a role in implementing Synergistic Analysis for financial forecasting. In fact, neural networks can be used to help extract primitive rules, which capture patterns that would not otherwise be apparent, for incorporation into an expert system. Genetic algorithms are powerful search mechanisms, well suited to optimizing neural network parameters.
As mentioned earlier, during training they may be used as a training algorithm or to search the space of training parameters in an efficient manner. Similarly, genetic algorithms can be used for net architecture selection. Two of the most powerful applications of genetic algorithms are for input selection and preprocessing, which are perhaps the most challenging tasks faced in neural network development. To a very large extent, they determine the maximum possible performance achievable by the net. By automating the search for an optimal set of net inputs, a much wider range of inputs may be examined efficiently. The same technology incorporated into genetic algorithms has also been used in classifier systems and genetic programming. Classifier systems perform a type of machine learning that generates rules from examples. Genetic programming goes even further by automatically generating a program from a set of primitive constructs. In addition to genetic models, fuzzy logic, wavelets, and chaos are also being applied in a multitude of domains including financial forecasting. Even virtual reality has applicability to financial market analysis. Hardware advancements are also having an effect on the rate at which new analytic technologies emerge. Since many artificial intelligence implementations are computationally intensive, they will benefit greatly from more powerful computer systems, particularly hardware configurations known as massively parallel machines. Rather than the step-by-step approach to problem solving taken by serial computer systems, parallel processing machines work on different parts of a single problem simultaneously. This means that the computing time associated with solving a particular type of problem can often be reduced by orders of magnitude, once a suitable method of dividing the problem is devised. Technologies like neural networks and genetic algorithms are especially suited to these parallel processing machines. 
With connectionist machines, accelerator boards, hypercube architectures and other new hardware developments on the horizon, it will become more cost-effective for researchers to explore the application of various emerging technologies to financial market analysis. Although this chapter's primary focus has been on the application of SMA utilizing neural networks to perform financial forecasting in the context of futures trading, the applicability of Synergistic Analysis goes far beyond this single arena. One area where my firm performs research is global asset allocation, in which derivatives and mutual funds are used to represent various asset classes in a global portfolio. This allows the portfolio to be easily rebalanced at predetermined intervals with minimal transaction costs, without altering the asset class structure of the portfolio. Synergistic Analysis can be used to minimize diversifiable risk in a portfolio comprised of various global asset classes, by determining the nonlinear relationships and correlations between asset classes, and forecasting risk and return for each asset class over various time frames. By implementing SMA with neural networks in this manner, the portfolio can be rebalanced to provide higher return for equivalent risk, or lower risk for equivalent return. Other technologies, such as expert systems, can be used to discern investor characteristics, thus improving performance even further. For example, most asset allocation fund managers currently use questionnaires to ascertain investor characteristics such as risk propensity. A properly designed expert system could achieve similar or better results more expediently and contain a considerably more extensive knowledge base than a standard questionnaire. As research analysts continue to explore the application of these technologies to financial market analysis, complex hybrid systems will be developed. 
Traders should understand that neural networks are only a tool and not the long-sought-after holy grail that can guarantee easy profits in today's global financial markets. Instead, Synergistic Market Analysis will come to rely upon several of these emerging technologies which, when used in concert as part of a hybrid approach to market analysis, will offer a competitive advantage over less robust single-market analytics.

Acknowledgments

Synergistic Market Analysis, Market Synergy, Synergistic Analysis, and Synergistic Trading are trademarks of Lou Mendelsohn.