SumWhere and Vector of Bool with VMP vs InnerProd and Vector of double with EP
Question

I think I have gotten my model coded up correctly, but I am not getting good results on synthetic data. While I am debugging possible issues I was wondering whether someone had insight about using SumWhere and Vector of Bool with VMP vs InnerProd and Vector of double with EP.
Currently I have it coded up with SumWhere and a Vector of Bool (which forced me to use VMP).
If it would help, I can supply my generated factor graph once my account is verified.
Friday, September 15, 2017 2:08 PM
Answers

When using Vector variables, the posterior approximation will represent the full covariance matrix of the vector. Therefore inference will be slow with large vectors. Arrays do not have this problem.
 Marked as answer by cyentist Wednesday, September 20, 2017 8:08 PM
Monday, September 18, 2017 8:43 AM Owner
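To make the cost difference concrete, here is a rough parameter count, assuming a Gaussian posterior over a vector of dimension D compared with D independent Gaussians for an array of D doubles. The symbol D and the example value below are illustrative and do not come from the thread.

```latex
\[
\underbrace{D}_{\text{mean}} + \underbrace{\tfrac{D(D+1)}{2}}_{\text{full covariance}}
\quad\text{(Vector)}
\qquad\text{vs.}\qquad
\underbrace{D}_{\text{means}} + \underbrace{D}_{\text{variances}} = 2D
\quad\text{(array)}
\]
% Hypothetical example: D = 45 gives 45 + 1035 = 1080 numbers for the Vector
% posterior, compared with 90 for the array posterior.
```

Operations on a full covariance matrix, such as inversion, also typically scale as O(D^3), which is why large vectors slow inference down.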
All replies

What are you trying to model? If possible, could you show what the main model code looks like?
Did you try providing more data samples and/or running more sampling steps?
Friday, September 15, 2017 4:17 PM 
I am trying to write up a generalization of the StudentSkills tutorial. The tutorial uses a particular model called the DINA model, and I want to program a model called the GDINA model. The code is a bit complicated to paste here, but I feel it would be well summarized by the factor graph.
I have increased the number of observations and iterations, which has improved the fit significantly, but the algorithm has slowed to a crawl.
Friday, September 15, 2017 8:16 PM
Maybe the issue is with initialization? Or would slightly more informed priors help?
Saturday, September 16, 2017 12:30 AM 
I'm not sure I fully understand the differences between VariableArray and Variable<Vector>. Why doesn't InnerProduct accept two VariableArray<Double>? Similarly, why doesn't SumWhere accept VariableArray of bool and VariableArray of double?
In my code I am doing things like this:
lambda2Prod[student][question] = Variable.SumWhere(hasBothSkills[student][question], Variable.Vector(lambda2[question]));
where lambda2[question] is VariableArray<double>. Is this appropriate use of SumWhere? Is there a better way to handle this without using Variable.Vector?
 Edited by cyentist Monday, September 18, 2017 1:38 PM
Monday, September 18, 2017 1:33 PM 
Those overloads don't exist simply because they haven't been written yet. InnerProduct of two arrays will be in the next version, as listed in the Release change history. Your code snippet is perfectly fine; it will just have the overhead that I pointed out. If you want to avoid using vectors, then use Variable.Sum as explained in the other thread.
Monday, September 18, 2017 1:43 PM Owner

If the Boolean input to SumWhere is fixed, then you don't really need SumWhere since you can look up the relevant indices in lambda2 (using Subarray) and sum those. Something like this is already done in the Dina example.
Monday, September 18, 2017 1:51 PM Owner
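A minimal sketch of the Subarray-and-sum idea, assuming the Boolean mask is known in advance so that the positions of its true entries can be supplied as an integer array. The names FixedMaskSum, SumAt, and activeIndices are illustrative and not from the thread. The namespace shown is the one used by the open-source Infer.NET releases; older releases used MicrosoftResearch.Infer.Models instead.

```csharp
using Microsoft.ML.Probabilistic.Models; // assumption: open-source Infer.NET namespace

public static class FixedMaskSum
{
    // Sums the entries of 'values' at the positions listed in 'activeIndices'.
    // 'activeIndices' is assumed to hold the indices where the fixed mask is true.
    public static Variable<double> SumAt(
        VariableArray<double> values,
        VariableArray<int> activeIndices)
    {
        // Pick out only the relevant entries, then add them up.
        VariableArray<double> selected = Variable.Subarray(values, activeIndices);
        return Variable.Sum(selected);
    }
}
```

This only helps when the mask really is fixed. If the Boolean array is itself inferred, its true positions are not known when the model is built, so this lookup does not apply.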

In my model, lambda2 is a vector of parameters, one for each skill-to-skill interaction (K skills => K(K-1)/2 skill interactions). So the relevant model code is:
using (Variable.ForEach(student))
{
    using (Variable.ForEach(question))
    {
        VariableArray<bool> hasSkills = Variable.Subarray(hasSkill[student], skillsRequiredForQuestion[question]).Named("hasSkills");
        using (Variable.ForEach(interaction))
        {
            hasBothSkills[student][question][interaction] = hasSkills[interactionIndex1[question][interaction]] & hasSkills[interactionIndex2[question][interaction]];
        }
        Variable<bool> hasAllSkills = Variable.AllTrue(hasSkills).Named("hasAllSkills");
        lambda1Prod[student][question] = Variable.SumWhere(hasSkills, Variable.Vector(lambda1[question]));
        lambda2Prod[student][question] = Variable.SumWhere(hasBothSkills[student][question], Variable.Vector(lambda2[question]));
        Variable<double> eta = lambda0[question] + lambda1Prod[student][question] + lambda2Prod[student][question];
        Variable<double> prob1 = Variable.Logistic(eta + lambda3[question]).Named("prob1");
        Variable<double> prob2 = Variable.Logistic(eta).Named("prob2");
        using (Variable.If(hasAllSkills))
        {
            responses[student][question] = Variable.Bernoulli(prob1);
        }
        using (Variable.IfNot(hasAllSkills))
        {
            responses[student][question] = Variable.Bernoulli(prob2);
        }
    }
}
where interactionIndex1 and interactionIndex2 are observed VariableArrays that map a skill-to-skill interaction index (j = 1, ..., K(K-1)/2) to the two skills involved in the interaction. Does this seem like the appropriate way to code that up, or is there a more efficient way? As I said, the code runs pretty slowly.
Monday, September 18, 2017 3:26 PM 
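For readers following along, one way the index mapping described above could be built is sketched here in plain C#, with no Infer.NET dependency. It assumes zero-based skill positions and enumerates pairs in lexicographic order. The class and method names are hypothetical, and the thread does not say how the original arrays were actually constructed.

```csharp
using System;

public static class InteractionIndices
{
    // For 'skillCount' skills, returns two arrays of length skillCount*(skillCount-1)/2.
    // Entry j of First and Second gives the two skill positions in interaction j.
    public static (int[] First, int[] Second) BuildPairIndices(int skillCount)
    {
        int pairCount = skillCount * (skillCount - 1) / 2;
        var first = new int[pairCount];
        var second = new int[pairCount];
        int j = 0;
        for (int a = 0; a < skillCount; a++)
        {
            for (int b = a + 1; b < skillCount; b++)
            {
                first[j] = a;   // lower-numbered skill of the pair
                second[j] = b;  // higher-numbered skill of the pair
                j++;
            }
        }
        return (first, second);
    }
}
```

The resulting arrays would play the role of interactionIndex1[question] and interactionIndex2[question] for a question that requires skillCount skills.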
Also, is there an ETA on the next release?
Monday, September 18, 2017 3:27 PM

If there are many skills, then your best option is to convert hasSkills into a double array (as in this thread), and use direct multiplication with lambda1 followed by a sum.
Monday, September 18, 2017 3:43 PM Owner

Here is a simple way to implement SumWhere for the array case:
public static Variable<double> SumWhere(VariableArray<bool> a, VariableArray<double> b)
{
    Range range = a.Range;
    VariableArray<double> products = Variable.Array<double>(range);
    using (Variable.ForEach(range))
    {
        using (Variable.If(a[range]))
        {
            products[range] = b[range];
        }
        using (Variable.IfNot(a[range]))
        {
            products[range] = 0.0;
        }
    }
    return Variable.Sum(products);
}
Tuesday, September 19, 2017 11:42 AM Owner