newbie question: estimate parentage in pedigree

Question
Hi
The Infer.NET engine looks very promising for my field of research (plant genetics), and I would love to get on board, but I have some start-up issues.
In a pilot project I wrote some code (in VB) to generate plants and, by linking them up, a pedigree structure.
I currently use recursive code to obtain the expected average fraction of a given founder plant's genome in each derived progeny. I would like to replace this point estimate with a proper Infer.NET posterior distribution: set prior values on the founders (via .ObservedValue?) and derive posterior distributions, with mean and variance, for the fraction of founder genome in each progeny.
It seems to me this must be possible using Infer.NET. Perhaps the probability of deriving genome from a given parent could be defined as a Bernoulli(0.5)? I am new to Infer.NET, though, and need some help to get started.
Can someone give me some pointers on how to implement this in a way similar to my getIBDwithTarget function (see code below)?
Thanks a lot
Ralph van Berloo
Imports System
Imports MicrosoftResearch.Infer
Imports MicrosoftResearch.Infer.Models
Imports MicrosoftResearch.Infer.Distributions

Public Class Plant
    Property Parent1 As Plant
    Property Parent2 As Plant
    Property Name As String

    Sub New(ByVal parent1 As Plant, ByVal parent2 As Plant, ByVal name As String)
        _Parent1 = parent1
        _Parent2 = parent2
        _Name = name
    End Sub

    ' A founder has no recorded parents.
    Function IsFounder() As Boolean
        Return (_Parent1 Is Nothing AndAlso _Parent2 Is Nothing)
    End Function
End Class

Public Module DeriveIBD
    ' interest is in expectation AND variance for fraction of genome derived from Founder1
    Dim founder1 As New Plant(Nothing, Nothing, "Founder1")
    Dim founder2 As New Plant(Nothing, Nothing, "Founder2")
    Dim F1 As New Plant(founder1, founder2, "F1") ' so expected fraction = 0.5
    Dim BackCross1 As New Plant(F1, founder1, "BackCross1") ' so expected fraction = 0.75
    Dim BackCross2 As New Plant(BackCross1, founder1, "BackCross2") ' so expected fraction = 0.875
    Dim F2 As New Plant(F1, F1, "F2") ' expected fraction = again 0.5 but with larger variance

    Sub Main()
        Dim F1_IBD As Double = getIBDwithTarget(F1, founder1)
        Dim Backcross1IBD As Double = getIBDwithTarget(BackCross1, founder1)
        Dim Backcross2IBD As Double = getIBDwithTarget(BackCross2, founder1)
        Dim F2_IBD As Double = getIBDwithTarget(F2, founder1)
        Console.WriteLine("IBD of different progeny with founder 1:")
        Console.WriteLine("F1: " + F1_IBD.ToString)
        Console.WriteLine("BackCross1: " + Backcross1IBD.ToString)
        Console.WriteLine("BackCross2: " + Backcross2IBD.ToString)
        Console.WriteLine("F2: " + F2_IBD.ToString)
    End Sub

    Function getIBDwithTarget(ByVal tester As Plant, ByVal reference As Plant) As Double ' to be replaced with a Discrete?
        Dim result As Double ' Discrete?
        If tester.IsFounder Then
            ' in case our tester is a founder it can either be the reference or not
            If tester Is reference Then
                result = 1
            Else
                result = 0
            End If
        Else
            ' in case tester is not a founder it will have received half of its genome from either parent
            result = 0.5 * getIBDwithTarget(tester.Parent1, reference) + 0.5 * getIBDwithTarget(tester.Parent2, reference)
        End If
        Return result
    End Function
End Module
Running this gives as output:
IBD of different progeny with founder 1:
F1: 0.5
BackCross1: 0.75
BackCross2: 0.875
F2: 0.5
So my code yields the expected answers, but as point estimates only. I would like to use proper distributions so that I can sample from them (also in more complex pedigrees), estimate variances and extremes, and test whether thresholds are exceeded.
Tuesday, November 6, 2012 2:49 PM
All replies
Hi Ralph
Are you saying that you have uncertainty in the 0.5's in getIBDwithTarget, so that result is a random variable distributed between 0 and 1?
If so, your model will involve products and sums of Beta-distributed random variables, which Infer.NET does not support.
If you are only inferring in the forward direction, it may be straightforward to add these operators, but it is probably best to build a sampler for this directly. You can use the Beta distribution classes to do this.
John
Wednesday, November 7, 2012 6:12 PM (Owner)
Hi John
I am a biologist with an interest in programming and mathematics, but certainly no expert in these fields, so please bear with me. This is just a proof of concept, so for the moment I will settle for an example that gives me a feeling for what can be done this way.
Back to the biology, to try and clarify the problem a bit more:
When moving from parent to offspring, at the level of an individual chromosome there is a probability of 0.5 that it is transmitted to the next generation (disregarding recombination for the moment). For a whole genome with, say, 10 chromosomes, the inherited fraction will have a distribution that peaks at 0.5 but has tails on both sides. Perhaps this fraction of inherited genome can be modeled using a Gaussian? Or should I model the number of inherited chromosomes as a Discrete(0, N+1)?
Does Infer.NET support recursive build-up of probabilities? If so, and I go for the Gaussian approach with mu = 0.5 and sigma = 0.1, what should my recursive function look like? I am indeed mostly interested in forward inference here.
Thanks for your help.
Ralph
Thursday, November 8, 2012 2:15 PM
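To put rough numbers on the spread described in the post above, here is a small worked sketch. It assumes that each of the N chromosomes is inherited from a given parent independently with probability 0.5 and that recombination is ignored, as stated in the post; the symbols X (number of chromosomes inherited from that parent) and f (the inherited fraction) are introduced here for illustration only.

```latex
\begin{aligned}
X &\sim \mathrm{Binomial}\!\left(N, \tfrac{1}{2}\right), \qquad f = \frac{X}{N} \\
\mathbb{E}[f] &= \frac{1}{2}, \qquad \mathrm{Var}[f] = \frac{1}{N^2}\cdot N \cdot \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4N} \\
N = 10:\quad \mathrm{sd}[f] &= \sqrt{\frac{1}{40}} \approx 0.158
\end{aligned}
```

Under these assumptions X takes the N + 1 values 0, ..., N, and a Gaussian with mean 0.5 and standard deviation 0.1 would match the binomial variance only for N = 25, since 1/(4 · 25) = 0.01 = 0.1².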
If the number of chromosomes is small, then you can model this at the chromosome level. If there are K founders, then each chromosome would have a random integer from 0, ..., (K-1) indicating which founder the chromosome came from. Call these 'founder indices'. In Infer.NET, this would be implemented as a VariableArray<int> for each progeny (the array ranges over chromosomes). Every time a progeny is created, you would generate its founder indices by making a random choice between the founder indices of the parents, for each chromosome. This can be done using branching, as in a mixture model. By inferring the distribution over founder indices, you can then work out the fractions. To make this more efficient, you could represent only the counts of indices; however, that is more complicated to write in Infer.NET, so I'd recommend not doing this until you are familiar with the approach.
Friday, November 9, 2012 1:59 PM (Owner)
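The founder-index scheme described in the reply above can also be tried out without Infer.NET, as a plain forward Monte Carlo sampler, which is a different technique from Infer.NET's message-passing inference. The sketch below is only that: it uses nothing beyond System.Random, treats each chromosome as a single copy (no diploidy, no recombination), and all names in it (FounderIndexSampler, DrawFounderIndices, nChrom, nReps), the choice of 10 chromosomes and the fixed seed are assumptions made for illustration.

```vb
Imports System
Imports System.Collections.Generic

Module FounderIndexSampler
    ' Hypothetical minimal pedigree node, mirroring the Plant class in the question.
    Class Plant
        Public Property Parent1 As Plant
        Public Property Parent2 As Plant
        Public Property Name As String

        Sub New(ByVal p1 As Plant, ByVal p2 As Plant, ByVal plantName As String)
            Me.Parent1 = p1
            Me.Parent2 = p2
            Me.Name = plantName
        End Sub

        Function IsFounder() As Boolean
            Return Parent1 Is Nothing AndAlso Parent2 Is Nothing
        End Function
    End Class

    ' Draws one founder index per chromosome for plant p within one simulated pedigree.
    ' The cache makes sure a plant shared by several descendants is drawn only once per replicate.
    Function DrawFounderIndices(ByVal p As Plant, ByVal founders As List(Of Plant),
                                ByVal nChrom As Integer, ByVal rng As Random,
                                ByVal cache As Dictionary(Of Plant, Integer())) As Integer()
        Dim idx As Integer() = Nothing
        If cache.TryGetValue(p, idx) Then Return idx

        idx = New Integer(nChrom - 1) {}
        If p.IsFounder() Then
            ' Every chromosome of founder k carries founder index k.
            Dim k As Integer = founders.IndexOf(p)
            For c As Integer = 0 To nChrom - 1
                idx(c) = k
            Next
        Else
            Dim a As Integer() = DrawFounderIndices(p.Parent1, founders, nChrom, rng, cache)
            Dim b As Integer() = DrawFounderIndices(p.Parent2, founders, nChrom, rng, cache)
            ' Per chromosome, copy the index from either parent with probability 0.5.
            For c As Integer = 0 To nChrom - 1
                idx(c) = If(rng.NextDouble() < 0.5, a(c), b(c))
            Next
        End If
        cache(p) = idx
        Return idx
    End Function

    Sub Main()
        Dim founder1 As New Plant(Nothing, Nothing, "Founder1")
        Dim founder2 As New Plant(Nothing, Nothing, "Founder2")
        Dim F1 As New Plant(founder1, founder2, "F1")
        Dim BackCross1 As New Plant(F1, founder1, "BackCross1")
        Dim BackCross2 As New Plant(BackCross1, founder1, "BackCross2")

        Dim founders As New List(Of Plant) From {founder1, founder2}
        Dim targets As Plant() = {F1, BackCross1, BackCross2}
        Const nChrom As Integer = 10      ' assumed number of chromosomes
        Const nReps As Integer = 100000   ' number of simulated pedigrees
        Dim rng As New Random(1)          ' fixed seed, for repeatability only

        Dim sum(targets.Length - 1) As Double
        Dim sumSq(targets.Length - 1) As Double

        For r As Integer = 1 To nReps
            Dim cache As New Dictionary(Of Plant, Integer())
            For t As Integer = 0 To targets.Length - 1
                Dim idx As Integer() = DrawFounderIndices(targets(t), founders, nChrom, rng, cache)
                ' Count the chromosomes that trace back to Founder1 (founder index 0).
                Dim count As Integer = 0
                For Each k As Integer In idx
                    If k = 0 Then count += 1
                Next
                Dim fraction As Double = count / nChrom
                sum(t) += fraction
                sumSq(t) += fraction * fraction
            Next
        Next

        Console.WriteLine("Fraction of genome from Founder1:")
        For t As Integer = 0 To targets.Length - 1
            Dim mean As Double = sum(t) / nReps
            Dim variance As Double = sumSq(t) / nReps - mean * mean
            Console.WriteLine("{0}: mean = {1:F3}, variance = {2:F4}", targets(t).Name, mean, variance)
        Next
    End Sub
End Module
```

With enough replicates the means should come out close to the point estimates 0.5, 0.75 and 0.875 from getIBDwithTarget, while the variances give the spread that the recursive function cannot report. Keeping the sampled fractions instead of only their running sums would also allow threshold tests and quantiles.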