Dataset size and strategy for predicting a single output column

  • Question

  • Hi there,

    Let me start by admitting my lack of background in mathematics and my relatively low knowledge of current Artificial Intelligence techniques, especially in comparison to some of the concepts introduced in Infer.NET.

    For a few years I have been developing a platform that predicts short-term price movements in the foreign exchange markets, with some success. My current platform uses a mixture of decision trees and back-propagation neural networks to drive the predictions. In both cases I am taking 50+ input nodes to predict a single output node. My results so far are positive, but I constantly strive to improve my accuracy by looking for new or better inputs and, in this case, new or better artificial intelligence platforms.

    My question is: how well will Infer.NET deal with this type of use? From browsing some of the resources, it seems that there are many choices when using the Infer.NET engine, and it seems more complex than some of the libraries I am used to (ALGLIB, AForge.NET). Will Infer.NET handle this many inputs with good performance and memory handling?

    Additionally, can anyone provide any advice on the type of use I should be aiming for? When it comes to Bayesian Point Machines, Gaussian Processes etc. I get a bit lost.

    Thanks in advance.

    Friday, March 30, 2012 10:31 AM

All replies

  • Infer.NET can be thought of as a language for building probabilistic models. The idea is that you should build a model to represent your unique problem rather than adapt your problem to use a particular black-box approach such as a neural net. We cannot give specific advice about modeling on this forum. There are certainly models which perform a similar function to neural nets. Bayes Point Machines (BPMs) are one example: non-linearity is typically handled by discretising continuous input features, and large-scale feature spaces can make use of the sparsity mechanisms in Infer.NET. Decision tree leaves can be used as features in such a BPM. Gaussian Processes directly model non-linearity in a Bayesian manner (so you get an uncertainty measure with your predictions), but they do not typically scale well with feature space dimension. HMMs and other chain-type models can be used to model dynamics.

    John

    Wednesday, April 4, 2012 9:47 AM
    Owner
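
As a rough illustration of the discretisation idea mentioned in the reply, the sketch below turns continuous inputs into sparse indicator features (one active bin per input) and scores them with a linear model. It is plain Python rather than Infer.NET code, and it is not taken from the thread: the function names, the bin edges, and the weights are all hypothetical, chosen only to show how a linear model over binned features can represent a non-linear, piecewise-constant response while touching only a few weights per example.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Return the index of the bin that `value` falls into.

    `edges` must be sorted ascending; k edges define k + 1 bins.
    """
    return bisect_right(edges, value)


def sparse_features(inputs, edges_per_input):
    """Map continuous inputs to the indices of their active indicator features.

    Each input contributes exactly one active index, so the result stays
    short even when the total number of indicator features is large.
    """
    active = []
    offset = 0
    for value, edges in zip(inputs, edges_per_input):
        active.append(offset + bin_index(value, edges))
        offset += len(edges) + 1  # this input owns len(edges) + 1 bins
    return active


def linear_score(active, weights):
    """Score an example by summing the weights of its active features only."""
    return sum(weights[i] for i in active)


# Hypothetical setup: two inputs with hand-picked bin edges.
EDGES = [[-1.0, 0.0, 1.0], [0.5]]               # 4 bins + 2 bins = 6 features
WEIGHTS = [0.8, -0.3, 0.1, -0.6, -0.2, 0.4]     # made-up values, one per bin

features = sparse_features([0.3, 0.7], EDGES)
score = linear_score(features, WEIGHTS)
print(features, score)
```

In a Bayesian treatment such as a BPM, each weight would carry a distribution rather than a single number, and the weights would be learned from data instead of being fixed by hand as they are here.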