
Single Instance Prediction – State of the Art

  • June 3, 2021

By developing and using predictive models, organizations aim to improve their business decision making, forecasts, and general business intelligence. Today, models are commonly developed over the entire data set and applied indiscriminately to any or all out-of-sample instances. However, a business that wants a more accurate single instance prediction can lean on the FASES algorithm, which is underpinned by statistical theory, to improve its results.



In practice, the need for a dynamic “one out of sample forecast” arises commonly in time series analysis and in personalized customer prediction. With economic data, for example, new information is released on a set schedule, and the full training sample consists of all the information available at present. If economists, traders, or other market participants could more accurately predict economic data releases such as employment figures, industrial production indices, credit indices, or a multitude of other indicators, they could front-run these announcements and develop better strategies for trading and money management. Or suppose your business is making a lending decision: within your data set, you can find that person’s matching “twin brother” instance and use this information in concert with the full data set to objectively improve modelling accuracy.



Another property common in prediction problems, and especially in macroeconomic data, is that the data set is overdetermined in the sense used throughout this article: the number of candidate predictors is greater than the number of observations in the series (often written p > n; in linear algebra terms, the regression system itself is then underdetermined). Causal relationships between economic indicators can be observed at lags of over 18 months, and each lag of each indicator is a further candidate predictor, so the candidate predictor space multiplies. Most commonly in statistics, regularization and compression methods are used to select variables and assign coefficients to those predictors. However, all of these models must trade off between over-fitting the data and discarding useful information.
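To make the multiplication of the predictor space concrete, here is a minimal Python sketch. The function name `build_lagged_predictors`, the indicator names, and the toy data are illustrative assumptions and are not part of FASES.

```python
def build_lagged_predictors(series_by_name, max_lag):
    """Expand each indicator into lagged copies (lag 1..max_lag).

    Returns (column_names, rows), where rows[i] holds the lagged values
    that would already be known when predicting time step max_lag + i.
    """
    names = sorted(series_by_name)
    n_obs = len(series_by_name[names[0]])
    columns = [f"{name}_lag{lag}"
               for name in names
               for lag in range(1, max_lag + 1)]
    rows = []
    for t in range(max_lag, n_obs):
        rows.append([series_by_name[name][t - lag]
                     for name in names
                     for lag in range(1, max_lag + 1)])
    return columns, rows


# Toy data (assumed): 3 monthly indicators, 24 months, lags up to 18 months.
indicators = {
    "employment": [float(i) for i in range(24)],
    "industrial_production": [float(2 * i) for i in range(24)],
    "credit": [float(3 * i) for i in range(24)],
}
columns, rows = build_lagged_predictors(indicators, max_lag=18)
print(len(columns), "candidate predictors,", len(rows), "usable observations")
```

With only three indicators and 18 lags, the sketch produces 54 candidate predictors but just 6 usable observations, which is the situation described above.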



Time series training data and out-of-sample split. A live dynamic model should predict only the next unknown quantity.
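The split described in this caption can be sketched as a walk-forward loop. This is a minimal illustration; `walk_forward_splits` and its parameters are hypothetical names chosen for the example.

```python
def walk_forward_splits(n_obs, min_train):
    """Yield (train_indices, test_index) pairs for one-step-ahead forecasts.

    Each split trains on everything observed so far and predicts only
    the single next observation.
    """
    for t in range(min_train, n_obs):
        yield list(range(t)), t


for train_idx, test_idx in walk_forward_splits(n_obs=8, min_train=5):
    print(f"train on 0..{train_idx[-1]}, predict {test_idx}")
```

The model is refit at every step, so each forecast uses all the information available at that moment and nothing from the future.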



Ensemble predictions, where averages of models are used, are well known to deliver top results. The theoretical property that makes these models effective is that errors “average out”. That is to say, if a number of models are averaged to form a prediction, the squared error of the average is no greater than the average squared error of the individual members. How large the improvement is depends on how the members’ errors are distributed: it is greatest when the models’ errors fall evenly on both sides of the real value of the target series.



This property is mathematically provable via the accuracy/ambiguity decomposition (Krogh and Vedelsby, 1995). The unresolved aspect of this theorem is how best to control both of these properties when developing a statistical ensemble predictor. Here is where an innovation can be applied for making a single instance prediction. Because we can measure the Information Similarity of the single new instance with instances from the training data set, the model can select variables and assign coefficients to the ensemble members so that their residuals alternate in sign around the most information-similar training point and, by extension, around the out-of-sample point of interest.
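The decomposition can be checked numerically at a single point. The sketch below uses made-up member predictions and equal weights, both assumptions for illustration: the ensemble's squared error should equal the weighted average member error minus the ambiguity (the weighted spread of the members around the ensemble).

```python
import random

random.seed(0)
target = 2.0
# Hypothetical member predictions scattered around the target.
predictions = [target + random.gauss(0.0, 1.0) for _ in range(5)]
weights = [1.0 / len(predictions)] * len(predictions)

ensemble = sum(w * p for w, p in zip(weights, predictions))
ensemble_error = (ensemble - target) ** 2
mean_member_error = sum(w * (p - target) ** 2
                        for w, p in zip(weights, predictions))
ambiguity = sum(w * (p - ensemble) ** 2
                for w, p in zip(weights, predictions))

# The two printed numbers should agree up to floating-point rounding.
print(ensemble_error, mean_member_error - ambiguity)
```

Because the ambiguity term is never negative, more disagreement among members at the point of interest lowers the ensemble error for a given level of member accuracy, which is why residuals of alternating sign are desirable.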



Mathematical proof of the accuracy/ambiguity decomposition, beginning with a least squared error objective function.
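A standard form of this derivation, following Krogh and Vedelsby, can be written as below. The notation is chosen for this sketch and is an assumption: $y$ is the target, $f_i$ are the member predictions, $w_i \ge 0$ are weights with $\sum_i w_i = 1$, and $\bar{f} = \sum_i w_i f_i$ is the ensemble prediction.

```latex
\begin{align*}
\sum_i w_i (f_i - y)^2
  &= \sum_i w_i \bigl( (f_i - \bar{f}) + (\bar{f} - y) \bigr)^2 \\
  &= \sum_i w_i (f_i - \bar{f})^2
     + 2 (\bar{f} - y) \sum_i w_i (f_i - \bar{f})
     + (\bar{f} - y)^2 \sum_i w_i \\
  &= \sum_i w_i (f_i - \bar{f})^2 + (\bar{f} - y)^2 ,
\end{align*}
% The cross term vanishes because \sum_i w_i f_i = \bar{f} and \sum_i w_i = 1.
% Rearranging gives the decomposition:
\begin{equation*}
\underbrace{(\bar{f} - y)^2}_{\text{ensemble error}}
  = \underbrace{\sum_i w_i (f_i - y)^2}_{\text{average member error}}
  - \underbrace{\sum_i w_i (f_i - \bar{f})^2}_{\text{ambiguity}} .
\end{equation*}
```

The ensemble error is therefore never larger than the weighted average error of its members, and the gap is exactly the ambiguity.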



The Forward Alternating Stagewise Ensemble Selection Algorithm


The FASES algorithm chooses variable coefficients and ensemble member groupings in a forward stagewise process, but imposes a sign restriction at the most information-similar point to encourage disagreement among members on the new instance of interest. In out-of-sample benchmarking, the FASES algorithm outperforms competing algorithms because of this property and the attention paid to the single observation to be predicted. If an accurate single instance prediction matters, using the FASES algorithm, or some other dynamic model that accounts for the accuracy/ambiguity decomposition of ensemble methods, is essential.
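The general idea can be illustrated with a toy sketch. This is not the FASES implementation, whose details are not given here. It rests on several assumptions made only for illustration: Euclidean distance stands in for Information Similarity, each ensemble member is a single-predictor least squares fit, members are weighted equally, and all function names are hypothetical.

```python
import random


def fit_univariate(x, y):
    """Ordinary least squares for y ~ a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    return mean_y - b * mean_x, b


def most_similar_row(rows, new_row):
    """Index of the training row closest to new_row (Euclidean stand-in)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, new_row)) for row in rows]
    return dists.index(min(dists))


def alternating_ensemble(rows, y, new_row, n_members):
    """Greedy sketch: add one single-predictor model per stage, requiring the
    residual sign at the most similar training row to flip at each stage."""
    anchor = most_similar_row(rows, new_row)
    candidates = []
    for j in range(len(new_row)):
        col = [row[j] for row in rows]
        a, b = fit_univariate(col, y)
        sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(col, y))
        anchor_resid = y[anchor] - (a + b * col[anchor])
        candidates.append((sse, j, a, b, anchor_resid))
    candidates.sort()  # best training fit first

    members, wanted_sign = [], None
    while len(members) < n_members:
        pick = next((c for c in candidates
                     if wanted_sign is None or c[4] * wanted_sign > 0), None)
        if pick is None:
            break  # no remaining candidate has the required residual sign
        candidates.remove(pick)
        members.append(pick)
        wanted_sign = -1.0 if pick[4] > 0 else 1.0

    preds = [a + b * new_row[j] for _, j, a, b, _ in members]
    return sum(preds) / len(preds), members


# Toy data (assumed): 20 observations, 10 candidate predictors.
random.seed(1)
rows = [[random.gauss(0, 1) for _ in range(10)] for _ in range(20)]
y = [row[0] - 0.5 * row[3] + random.gauss(0, 0.1) for row in rows]
new_row = [random.gauss(0, 1) for _ in range(10)]
prediction, members = alternating_ensemble(rows, y, new_row, n_members=4)
print(prediction, [m[1] for m in members])
```

Each stage accepts the best-fitting unused predictor whose residual at the anchor row has the opposite sign to the previous member's residual, so the members over- and under-shoot in turn near the instance to be predicted.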



Use Cases and Practical Considerations


The FASES algorithm is most practical on small and medium-sized overdetermined data sets. In cases with as few as 20 observations and 10 or more candidate predictors, the algorithm should consistently deliver leading results. This is because of the stagewise process of entering the data: the variables are tested for best fit one at a time. In this way, the number of samples can be much smaller than other regularization techniques require, because the relationships between the series are examined one at a time before entry in each modelling step. The method is still relatively fast (on the order of seconds on simple CPU hardware) on sets of up to ~1000 predictor variables and a similar number of observations.



A typical guideline in big data analysis is that there should be at least 10 times as many observations as predictors. In cases that don’t meet this sampling requirement, a regularization method should be used. The FASES algorithm is itself a regularization method and makes better use of the full data set than commonly used methods such as LASSO or LAR. This is because it discards less data and exploits the theoretical advantage of ensemble prediction methods.



On big data sets of many hundreds of thousands of observations, FASES is not computationally efficient and is therefore not advisable. In these cases it is still advisable to use a gradient descent method to discover non-linear predictive coefficients. Convolutional Neural Nets, AdaBoost, or other nonlinear interpolative methods are generally thought of as the state of the art for applications of this size. But these models perform poorly on smaller data sets (fewer than 200K observations) because of their tendency to over-fit the data.



FASES is the most accurate regression model on overdetermined data and runs in seconds on sets of fewer than 1k observations.






Does your business rely on single instance predictions? Are there more candidate predictor variables than observations? Is the number of observations in your training data set below ~1000? If the answer to each of these questions is yes, benchmarking results and mathematical theory indicate that you can make more consistently accurate and robust predictions using the FASES algorithm.


For exclusive access to this algorithm, please contact us for your enterprise data solutions.


Happy Learning.




A. Krogh and J. Vedelsby. Neural network ensembles, cross validation, and active learning. NIPS 7, 231-238, 1995.