# Classifier with Infer.NET (Migrated from community.research.microsoft.com)

### Question

• Flavien posted on 12-07-2009 11:45 AM

Hi,

I would like to do the following: let's say I have a list of "diseases". Let's assume the patient has only one disease at a time. Then I have a list of yes/no questions: "Do you have a fever?", "Do you have a sore throat?"... I would like to ask the user the minimum number of questions needed to reach a certainty threshold on one of the diseases and make a diagnosis. The next question asked should always be the most pertinent one, so as to ask as few questions as possible.

After the program has made a diagnosis, the user tells the program which disease he/she actually had, and I would like the program to learn from that.

I don't really know where to start. What approach should I look at? Which distributions should I use?

Thanks

Friday, June 3, 2011 5:29 PM

### All replies

• John Guiver replied on 12-08-2009 6:18 AM

One way to do this would be to learn a set of scores for each question, disease, and response (true or false). If the question is not answered then the score for true and score for false would be chosen with equal probability. The scores could then be summed for each datum, to create log probabilities for each disease. The DiscreteFromLogProbs factor could then be used to convert these to an int variable. The training model would look like this:

    int numK = 3;  // Number of classes (i.e. diseases)
    int numQ = 4;  // Number of questions
    int numD = 10; // Number of training data
    Range K = new Range(numK);
    Range Q = new Range(numQ);
    Range D = new Range(numD);
    var sum = Variable.Array(Variable.Array<double>(K), D);
    var y = Variable.Array<int>(D);
    // answer and answerIsMissing are not declared in the original post;
    // these two declarations are an assumption based on how they are used below
    var answer = Variable.Array(Variable.Array<bool>(Q), D);
    var answerIsMissing = Variable.Array(Variable.Array<bool>(Q), D);
    var scoresTrue = Variable.Array(Variable.Array<double>(Q), K);
    var scoresFalse = Variable.Array(Variable.Array<double>(Q), K);
    var noise = Variable.GammaFromShapeAndScale(1, 1);
    var scores = Variable.Array(Variable.Array(Variable.Array<double>(Q), K), D);
    scoresTrue[K][Q] = Variable.GaussianFromMeanAndPrecision(0, 0.01).ForEach(K, Q);
    scoresFalse[K][Q] = Variable.GaussianFromMeanAndPrecision(0, 0.01).ForEach(K, Q);
    using (Variable.ForEach(D))
    {
        // If we don't have an answer, it could be true or false with equal probability
        using (Variable.ForEach(Q))
        {
            using (Variable.If(answerIsMissing[D][Q]))
                answer[D][Q] = Variable.Bernoulli(0.5);
        }
        using (Variable.ForEach(K))
        {
            using (Variable.ForEach(Q))
            {
                // Pick the "true" or "false" score depending on the answer
                using (Variable.If(answer[D][Q]))
                    scores[D][K][Q] = Variable.GaussianFromMeanAndPrecision(scoresTrue[K][Q], noise);
                using (Variable.IfNot(answer[D][Q]))
                    scores[D][K][Q] = Variable.GaussianFromMeanAndPrecision(scoresFalse[K][Q], noise);
            }
            // Sum the scores over questions to get one log probability per disease
            sum[D][K] = Variable.Sum(scores[D][K]);
        }
        // Softmax likelihood
        y[D] = Variable.DiscreteFromLogProbs(sum[D]);
    }
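
To make the last step of the model concrete: the per-disease sums act as unnormalised log probabilities, and a softmax turns them into class probabilities. Below is a minimal plain C# sketch of just that arithmetic. It does not use Infer.NET, the score values are made up for illustration, and the `Softmax` helper is hypothetical rather than a library call.

```csharp
using System;
using System.Linq;

public static class SoftmaxSketch
{
    // Converts unnormalised log probabilities into probabilities that sum to 1.
    // Subtracting the maximum first avoids overflow in Math.Exp.
    public static double[] Softmax(double[] logProbs)
    {
        double max = logProbs.Max();
        double[] exps = logProbs.Select(v => Math.Exp(v - max)).ToArray();
        double total = exps.Sum();
        return exps.Select(e => e / total).ToArray();
    }

    public static void Main()
    {
        // Hypothetical per-question scores for one patient: 3 diseases x 4 questions.
        double[][] scores =
        {
            new[] { 0.5, -1.0, 0.2, 0.0 },
            new[] { 1.5, 0.8, -0.3, 0.4 },
            new[] { -0.7, 0.1, 0.0, -0.2 },
        };

        // Sum over questions for each disease, mirroring Variable.Sum in the model.
        double[] sums = scores.Select(row => row.Sum()).ToArray();
        double[] probs = Softmax(sums);

        for (int k = 0; k < probs.Length; k++)
            Console.WriteLine($"disease {k}: sum = {sums[k]:F2}, p = {probs[k]:F3}");
    }
}
```

In the model itself the scores are random variables, so the sums and the resulting class probabilities carry uncertainty; the sketch only shows the deterministic mapping from sums to probabilities.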

There may be better ways to do this, but the following test shows that this model is doing the right thing; for example, look at the sum posteriors. To use this model in diagnostic mode you could build an analogous test model with a single datum and with y not observed (I'll let you do that). Given any partial set of answers, you will then be able to get the posterior of y as a Discrete distribution. Perhaps you would use the scores to determine the order in which to ask the questions.

    bool[][] answerIsMissingObs = new bool[][] {
        new bool[] {false, false, false, false}, new bool[] {false, true, false, false},  new bool[] {false, false, false, false},
        new bool[] {false, false, true, false},  new bool[] {false, false, true, false},  new bool[] {false, false, false, false},
        new bool[] {true, false, false, false},  new bool[] {false, false, false, false}, new bool[] {false, false, false, true},
        new bool[] {true, true, false, false}};
    bool[][] answerObs = new bool[][] {
        new bool[] {true, true, true, false},  new bool[] {true, true, false, false},  new bool[] {true, false, true, false},
        new bool[] {true, false, false, true}, new bool[] {false, true, true, false},  new bool[] {false, true, false, false},
        new bool[] {false, true, true, false}, new bool[] {true, false, false, true},  new bool[] {false, false, false, true},
        new bool[] {true, true, false, true}};
    int[] yObs = new int[] {1, 1, 0, 0, 2, 2, 2, 1, 2, 1};
    // Attach the observations (the original post shows only the assignment for y;
    // the two answer assignments are assumed)
    answerIsMissing.ObservedValue = answerIsMissingObs;
    answer.ObservedValue = answerObs;
    y.ObservedValue = yObs;
    var ie = new InferenceEngine(new VariationalMessagePassing());
    var scoreFalsePost = ie.Infer<Gaussian[][]>(scoresFalse);
    var scoreTruePost = ie.Infer<Gaussian[][]>(scoresTrue);
    var sumPost = ie.Infer<Gaussian[][]>(sum);
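
On the question of which question to ask next, one possible approach (not part of the model above, and only one of several reasonable choices) is to pick the unasked question with the lowest expected posterior entropy over diseases. The sketch below is plain C#, not Infer.NET. It assumes a hypothetical table `pYesByQuestion[q][k]` holding the probability of a "yes" answer to question q given disease k; the model above does not produce such a table directly, so treat it as an illustration of the selection rule only.

```csharp
using System;
using System.Linq;

public static class NextQuestionSketch
{
    // Shannon entropy (in nats) of a discrete distribution.
    public static double Entropy(double[] p) =>
        -p.Where(v => v > 0).Sum(v => v * Math.Log(v));

    // Posterior over diseases after one answer, by Bayes' rule.
    // pYes[k] is the assumed probability of a "yes" answer given disease k.
    public static double[] Update(double[] prior, double[] pYes, bool answer)
    {
        double[] joint = prior
            .Select((p, k) => p * (answer ? pYes[k] : 1 - pYes[k]))
            .ToArray();
        double total = joint.Sum();
        return joint.Select(j => j / total).ToArray();
    }

    // Expected entropy of the posterior if this question were asked next.
    public static double ExpectedEntropy(double[] prior, double[] pYes)
    {
        double pAnswerYes = prior.Select((p, k) => p * pYes[k]).Sum();
        double result = 0;
        if (pAnswerYes > 0) result += pAnswerYes * Entropy(Update(prior, pYes, true));
        if (pAnswerYes < 1) result += (1 - pAnswerYes) * Entropy(Update(prior, pYes, false));
        return result;
    }

    // Index of the unasked question with the lowest expected posterior entropy.
    public static int PickNext(double[] prior, double[][] pYesByQuestion, bool[] asked) =>
        Enumerable.Range(0, pYesByQuestion.Length)
            .Where(q => !asked[q])
            .OrderBy(q => ExpectedEntropy(prior, pYesByQuestion[q]))
            .First();

    public static void Main()
    {
        double[] prior = { 0.5, 0.3, 0.2 };   // hypothetical current belief over 3 diseases
        double[][] pYesByQuestion =
        {
            new[] { 0.9, 0.1, 0.5 },  // question 0
            new[] { 0.5, 0.5, 0.5 },  // question 1 (tells us nothing)
            new[] { 0.2, 0.3, 0.9 },  // question 2
        };
        bool[] asked = { false, false, false };

        int next = PickNext(prior, pYesByQuestion, asked);
        Console.WriteLine($"Ask question {next} next.");
    }
}
```

After each answer, the belief would be replaced by the output of `Update` and the question marked as asked, until one disease passes the certainty threshold.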

John G.

Friday, June 3, 2011 5:29 PM
• Flavien replied on 12-08-2009 9:40 AM

I adapted the model for a single datum for the diagnostics mode:

    var sum = Variable.Array<double>(K);
    var scoresTrue = Variable.Array(Variable.Array<double>(Q), K);
    var scoresFalse = Variable.Array(Variable.Array<double>(Q), K);
    var noise = Variable.GammaFromShapeAndScale(1, 1);
    var scores = Variable.Array(Variable.Array<double>(Q), K);
    scoresTrue[K][Q] = Variable.GaussianFromMeanAndPrecision(0, 0.01).ForEach(K, Q);
    scoresFalse[K][Q] = Variable.GaussianFromMeanAndPrecision(0, 0.01).ForEach(K, Q);
    // If we don't have an answer, it could be true or false with equal probability
    using (Variable.ForEach(Q))
    {
        using (Variable.If(answerIsMissing[Q]))
            answer[Q] = Variable.Bernoulli(0.5);
    }
    using (Variable.ForEach(K))
    {
        using (Variable.ForEach(Q))
        {
            using (Variable.If(answer[Q]))
                scores[K][Q] = Variable.GaussianFromMeanAndPrecision(scoresTrue[K][Q], noise);
            using (Variable.IfNot(answer[Q]))
                scores[K][Q] = Variable.GaussianFromMeanAndPrecision(scoresFalse[K][Q], noise);
        }
        sum[K] = Variable.Sum(scores[K]);
    }
    // Softmax likelihood
    Variable<int> y = Variable.DiscreteFromLogProbs(sum);

But after setting the observed values for answer and answerIsMissing, when I run the inference on y, I get the error message "Cannot automatically determine distribution type for variable type 'int': you must specify a MarginalPrototype attribute for variable 'vint__0'. in int[] vint__0"

    var yPost = ie.Infer<Discrete>(y);

Also I'm not sure how to input the results of the inference during the training mode into the diagnostics mode.

Friday, June 3, 2011 5:29 PM
• John Guiver replied on 12-08-2009 12:02 PM

The DiscreteFromLogProbs factor is only supported for observed y under VariationalMessagePassing. Use the Softmax factor in feedforward mode to get a distribution over probability vectors (i.e. a Dirichlet distribution); you can then take the mean of this. The following code snippet shows this piece of it:

    Variable<Vector> y = Variable.Softmax(sum);
    var yPost = ie.Infer<Dirichlet>(y);
    var yPostMean = yPost.GetMean();
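
For intuition about what the mean of a Dirichlet gives you: each component is that class's pseudo-count divided by the total. The plain C# sketch below (no Infer.NET; the pseudo-counts, the 0.6 threshold, and both helper names are made up for illustration) computes that mean and then applies the kind of certainty threshold mentioned in the question.

```csharp
using System;
using System.Linq;

public static class DirichletMeanSketch
{
    // Mean of a Dirichlet with the given pseudo-counts: alpha[k] / sum(alpha).
    public static double[] Mean(double[] alpha)
    {
        double total = alpha.Sum();
        return alpha.Select(a => a / total).ToArray();
    }

    // Index of the most probable class if it reaches the threshold, otherwise -1.
    public static int Diagnose(double[] probs, double threshold)
    {
        int best = Array.IndexOf(probs, probs.Max());
        return probs[best] >= threshold ? best : -1;
    }

    public static void Main()
    {
        double[] alpha = { 2.0, 14.0, 4.0 };  // hypothetical pseudo-counts for 3 diseases
        double[] mean = Mean(alpha);
        int diagnosis = Diagnose(mean, 0.6);

        Console.WriteLine(diagnosis >= 0
            ? $"Diagnosis: disease {diagnosis} (p = {mean[diagnosis]:F2})"
            : "Not certain yet, ask another question.");
    }
}
```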

As regards hooking up the results of the training, the posteriors for scoresTrue, scoresFalse, and noise now become priors for the test model:

    var trainedScoresTrue = Variable.Observed(scoreTruePost, K, Q);
    var trainedScoresFalse = Variable.Observed(scoreFalsePost, K, Q);
    var scoresTrue = Variable.Array(Variable.Array<double>(Q), K);
    var scoresFalse = Variable.Array(Variable.Array<double>(Q), K);
    scoresTrue[K][Q] = Variable.Random<double, Gaussian>(trainedScoresTrue[K][Q]);
    scoresFalse[K][Q] = Variable.Random<double, Gaussian>(trainedScoresFalse[K][Q]);
    var trainedNoise = Variable.Observed(noisePost);
    var noise = Variable.Random<double, Gamma>(trainedNoise);

John G.

Friday, June 3, 2011 5:29 PM
• Flavien replied on 12-09-2009 10:19 AM

Thanks John, this works.

Now if I want the user to be able to answer things such as "I don't know" or "Probably", the answer should probably be rated between 0 (definitely no) and 1 (definitely yes). 0.5 could be "I don't know" or "unanswered question", and "probably" would be 0.75.

I changed the model to something like this:

    using (Variable.ForEach(D))
    {
        using (Variable.ForEach(K))
        {
            using (Variable.ForEach(Q))
            {
                // Blend the "true" and "false" scores according to the answer in [0, 1]
                scores[D][K][Q] =
                    Variable.GaussianFromMeanAndPrecision(scoresTrue[K][Q], noise) * answer[D][Q]
                    + Variable.GaussianFromMeanAndPrecision(scoresFalse[K][Q], noise) * (1 - answer[D][Q]);
            }
            sum[D][K] = Variable.Sum(scores[D][K]);
        }
        // Softmax likelihood
        y[D] = Variable.DiscreteFromLogProbs(sum[D]);
    }

Is it correct?

Also I was wondering if I have to store all the observed values. Is it possible to run an inference, store the posteriors, and when I get a new set of observed values, reuse the previous posteriors, and run the inference using just the new set of observed values? Or do I have to run the training on the complete set of observed values all the time?

Nice talk at PDC09, by the way.

Flavien

Friday, June 3, 2011 5:30 PM
• John Guiver replied on 12-09-2009 10:51 AM

Yes - you could certainly do it that way. You could also look at just having one set of scores rather than scores for true or for false, though what I like about your model and my original model is that they treat true and false equivalently rather than imposing an assumption that a true answer gives more information than a false answer. If you had a large amount of data, you could also use a switch statement based on the response. You could also consider learning a bias for any of the models we've discussed thus far.

As regards your second question: you can always use posteriors from a previous model as priors for a new training session or test prediction - you don't need to keep the observed values around. However, note that these posteriors will reflect all the data you have seen so far.
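
As a toy illustration of "posterior becomes prior", here is a plain C# sketch (no Infer.NET) for a much simpler problem than the classifier: learning an unknown mean from Gaussian observations with a known noise precision, where the update can be done exactly. The `Belief` type, the data, and the noise precision are all made up for the example.

```csharp
using System;
using System.Linq;

public static class IncrementalGaussianSketch
{
    // A Gaussian belief about an unknown mean, stored as mean and precision.
    public readonly struct Belief
    {
        public Belief(double mean, double precision) { Mean = mean; Precision = precision; }
        public double Mean { get; }
        public double Precision { get; }
    }

    // Exact conjugate update for observations with known noise precision:
    // precisions add, and the new mean is the precision-weighted average.
    public static Belief Update(Belief prior, double[] data, double noisePrecision)
    {
        double precision = prior.Precision + data.Length * noisePrecision;
        double weighted = prior.Precision * prior.Mean;
        foreach (double x in data) weighted += noisePrecision * x;
        return new Belief(weighted / precision, precision);
    }

    public static void Main()
    {
        var prior = new Belief(0.0, 0.01);       // vague prior: mean 0, precision 0.01
        double[] batchA = { 1.2, 0.8, 1.1 };      // hypothetical first batch
        double[] batchB = { 0.9, 1.3 };           // hypothetical second batch
        const double noisePrecision = 4.0;

        // Train on A, then reuse the posterior as the prior for B; A is no longer needed.
        Belief afterA = Update(prior, batchA, noisePrecision);
        Belief incremental = Update(afterA, batchB, noisePrecision);

        // Train once on everything, for comparison.
        Belief allAtOnce = Update(prior, batchA.Concat(batchB).ToArray(), noisePrecision);

        Console.WriteLine($"incremental: mean = {incremental.Mean:F6}, precision = {incremental.Precision:F2}");
        Console.WriteLine($"all at once: mean = {allAtOnce.Mean:F6}, precision = {allAtOnce.Precision:F2}");
    }
}
```

In this exactly conjugate toy case the two routes agree up to rounding. The classifier in this thread is trained with approximate inference, where that agreement is not guaranteed.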

Thanks with regards to PDC - it was an interested audience I think - hopefully we pitched it about right.

John

Friday, June 3, 2011 5:30 PM
• Flavien replied on 12-10-2009 11:14 AM

Hi John,

I am now trying to reuse the posteriors as priors for a new training. I did the following:

    double[][] trueMeans = ...;
    double[][] falseMeans = ...;
    Gamma noisePriors = ...;

    var sum = Variable.Array(Variable.Array<double>(K), D);
    var y = Variable.Array<int>(D);
    var scoresTrue = Variable.Array(Variable.Array<double>(Q), K);
    var scoresFalse = Variable.Array(Variable.Array<double>(Q), K);

    var noiseDistribution = Variable.Observed(noisePriors);
    var noise = Variable.Random<double, Gamma>(noiseDistribution);

    var scores = Variable.Array(Variable.Array(Variable.Array<double>(Q), K), D);
    var scoresTrueMean = Variable.Observed(trueMeans, K, Q);
    var scoresFalseMean = Variable.Observed(falseMeans, K, Q);

    scoresTrue[K][Q] = Variable.GaussianFromMeanAndPrecision(scoresTrueMean[K][Q], noise);
    scoresFalse[K][Q] = Variable.GaussianFromMeanAndPrecision(scoresFalseMean[K][Q], noise);

But it seems that when I do one training with 10 observed values, I get a different result than when I do a training with 9 observed values and then reuse the posteriors as priors for a new training with the 10th observed value.

Did I set something up wrong?

Thanks

Friday, June 3, 2011 5:30 PM
• John Guiver replied on 12-14-2009 4:51 AM

Hi Flavien

Your definitions of scoresTrue and scoresFalse are not quite right. For example, the noise variable should not be involved in their definitions. If you are going to do incremental learning, you will have to define scoresTrue and scoresFalse as follows:

    scoresTrue[K][Q] = Variable.Random<double, Gaussian>(priorScoresTrue[K][Q]);
    scoresFalse[K][Q] = Variable.Random<double, Gaussian>(priorScoresFalse[K][Q]);

where priorScoresTrue and priorScoresFalse are variables over distributions which take on the posterior distributions from the previous bit of training. (Rather than making them variables over distributions, we could, more simply, make them just distributions, but this would incur a compile each time we ran.) The 'Random' factor says that scoresTrue[K][Q] and scoresFalse[K][Q] are distributed with distributions priorScoresTrue[K][Q] and priorScoresFalse[K][Q] respectively.

Define priorScoresTrue and priorScoresFalse as follows:

    // Prior distributions for model. These are set up as jagged variable arrays over Gaussian distributions
    // so that the model is not recompiled each time we run it
    var priorScoresTrue = Variable.Array(Variable.Array<Gaussian>(Q), K);
    var priorScoresFalse = Variable.Array(Variable.Array<Gaussian>(Q), K);

You can set the observed values for priorScoresTrue and priorScoresFalse to the posteriors you extracted from the end of the previous training session:

    // Posteriors from the previous incremental training
    var scoreTruePost = ie.Infer<Gaussian[][]>(scoresTrue);
    var scoreFalsePost = ie.Infer<Gaussian[][]>(scoresFalse);

    // Set the priors for the next incremental training
    priorScoresTrue.ObservedValue = scoreTruePost;
    priorScoresFalse.ObservedValue = scoreFalsePost;

John

Friday, June 3, 2011 5:30 PM
• Flavien replied on 12-15-2009 12:04 PM

Hi,

Thanks for your answer. I noticed one thing, though: when I train with a set of observed values A, reuse the posteriors as priors, and train with a set of observed values B, the posteriors after the second training are different from those I get when I train once with the observed values A+B in a single training:

    DistributionsData initialDistributions = new DistributionsData()
    {
        ScoresTrue =
            Enumerable.Range(0, 3).Select(y => Enumerable.Range(0, 4).Select(x => new Gaussian(0, 0.01)).ToArray()).ToArray(),
        ScoresFalse =
            Enumerable.Range(0, 3).Select(y => Enumerable.Range(0, 4).Select(x => new Gaussian(0, 0.01)).ToArray()).ToArray(),
        Noise =
            new Gamma(1, 1)
    };

    DistributionsData intermediate = Model.Train(initialDistributions,
        new double[][] { new double[] { 0, 0.5, 0, 0.5 } },
        new int[] { 1 });

    DistributionsData result1 = Model.Train(intermediate,
        new double[][] { new double[] { 1, 0.5, 1, 0.5 } },
        new int[] { 2 });

    DistributionsData result2 = Model.Train(initialDistributions,
        new double[][] { new double[] { 0, 0.5, 0, 0.5 }, new double[] { 1, 0.5, 1, 0.5 } },
        new int[] { 1, 2 });

Friday, June 3, 2011 5:30 PM
• John Guiver replied on 12-17-2009 5:53 AM

Hi Flavien

I misled you a bit with regard to incremental training. If you are doing exact inference, then

p(θ|A,B) is proportional to p(B|θ)p(A|θ)p(θ) which is proportional to p(B|θ)p(θ|A),

and the order should not matter. Here we are doing approximate inference, so p(θ|A) is an approximation to the posterior based on A. This means that A before B, B before A, or A and B together will all give different approximations.
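
The proportionality can be written out step by step. This is a sketch that assumes the two data sets A and B are conditionally independent given θ, which the argument relies on; q_A is a symbol introduced here for the approximate posterior after training on A, and Z for a normalising constant.

```latex
\begin{align*}
p(\theta \mid A, B) &\propto p(A, B \mid \theta)\, p(\theta) \\
  &= p(B \mid \theta)\, p(A \mid \theta)\, p(\theta) \\
  &\propto p(B \mid \theta)\, p(\theta \mid A), \\[4pt]
\text{incrementally:}\quad
q_{A,B}(\theta) &\approx \frac{1}{Z}\, p(B \mid \theta)\, q_A(\theta),
  \qquad q_A(\theta) \approx p(\theta \mid A).
\end{align*}
```

Because q_A is only an approximation to p(θ|A), and the product is approximated again after seeing B, the final result depends on how the data were split and in which order the pieces were processed.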

John

Friday, June 3, 2011 5:30 PM