newbie question: estimate parentage in pedigree

  • Question

  • Hi

    Infer.NET looks very promising for my field of research (plant genetics), and I would love to get on board, but I have some start-up issues.

    In a pilot project I produced some code (in VB) to generate plants and (by linking them up) a pedigree structure.

    I currently use recursive code to obtain the expected average fraction of a given founder plant's genome in each of my derived progeny. I would like to replace this point estimate with a proper Infer.NET posterior distribution: set prior values on the founders (.ObservedValue?) and derive posterior probabilities for the mean and variance of the expected fraction of founder genome in each progeny.

    It seems to me this must be possible with Infer.NET; the probability of deriving genome from a given parent could be defined as a Bernoulli(0.5)? But I am new to Infer.NET and need some help to get started.

    Can someone give me some input on how to implement this in a way similar to my getIBDwithTarget function (see code below)?

    Thanks a lot

    Ralph van Berloo


    Imports MicrosoftResearch.Infer
    Imports MicrosoftResearch.Infer.Models
    Imports MicrosoftResearch.Infer.Distributions
    
    Public Class Plant
        Property Parent1 As Plant
        Property Parent2 As Plant
        Property Name As String
    
        Sub New(ByVal parent1 As Plant, ByVal parent2 As Plant, ByVal name As String)
            _Parent1 = parent1
            _Parent2 = parent2
            _Name = name
        End Sub
    
        Function IsFounder() As Boolean
            Return (_Parent1 Is Nothing AndAlso _Parent2 Is Nothing)
        End Function
    End Class
    
    Public Class DeriveIBD
    
        ' interest is in expectation AND variance for fraction of genome derived from Founder1
        Dim founder1 As New Plant(Nothing, Nothing, "Founder1")
        Dim founder2 As New Plant(Nothing, Nothing, "Founder2")
        Dim F1 As New Plant(founder1, founder2, "F1")                ' so expected fraction = 0.5
        Dim BackCross1 As New Plant(F1, founder1, "BackCross1")      ' so expected fraction = 0.75
        Dim BackCross2 As New Plant(BackCross1, founder1, "BackCross2")      ' so expected fraction = 0.875
        Dim F2 As New Plant(F1, F1, "F2")                            ' expected fraction = again 0.5 but with larger variance
    
        Sub Main()
            Dim F1_IBD As Double = getIBDwithTarget(F1, founder1)
            Dim Backcross1IBD As Double = getIBDwithTarget(BackCross1, founder1)
            Dim Backcross2IBD As Double = getIBDwithTarget(BackCross2, founder1)
            Dim F2_IBD As Double = getIBDwithTarget(F2, founder1)
    
            Console.WriteLine("IBD of different progeny with founder 1:")
            Console.WriteLine("F1: " + F1_IBD.ToString)
            Console.WriteLine("BackCross1: " + Backcross1IBD.ToString)
            Console.WriteLine("BackCross2: " + Backcross2IBD.ToString)
            Console.WriteLine("F2: " + F2_IBD.ToString)
        End Sub
    
    
        Function getIBDwithTarget(ByVal tester As Plant, ByVal reference As Plant) As Double ' to be replaced with a Discrete?
            Dim result As Double ' Discrete?
            If tester.IsFounder Then
                If tester Is reference Then  'in case our tester is a founder it can either be the reference or not
                    result = 1
                Else
                    result = 0
                End If
            Else
                ' in case tester is not a founder it will have received half of its genome from either parent
                result = 0.5 * getIBDwithTarget(tester.Parent1, reference) + 0.5 * getIBDwithTarget(tester.Parent2, reference)
            End If
            Return result
        End Function
    
    End Class


    Running this gives as output:

    IBD of different progeny with founder 1:
    F1: 0.5
    BackCross1: 0.75
    BackCross2: 0.875
    F2: 0.5

    So my code yields the expected answers, but as point estimates only. I would like to use proper distributions that I can sample (also in more complex pedigrees), so that I can estimate variance and extremes and test whether thresholds are exceeded.

    Tuesday, November 6, 2012 2:49 PM

All replies

  • Hi Ralph

    Are you saying that you have uncertainty in the 0.5's in getIBDwithTarget? So that result is a random variable distributed between 0 and 1?

    If so, your model will involve products and sums of Beta-distributed random variables, which are not supported in Infer.NET.

    If you are only inferring in the forward direction it may be straightforward to add these operators, but it is probably best to build a sampler for this directly. You can use the Beta distribution classes to do this.

    John

    Wednesday, November 7, 2012 6:12 PM
    Owner
  • Hi John

    I am a biologist with an interest in programming and mathematics but certainly no expert in these fields, so please bear with me. This is just a proof of concept, so for the moment I will settle for an example that gives me a feeling for what can be done this way.

    Back to the biology to try and clarify the problem a bit more:

    When moving from parent to offspring, at the level of an individual chromosome there is a probability of 0.5 for a single chromosome to be transmitted to the next generation (disregarding recombination for the moment). For a whole genome with, say, 10 chromosomes, the inherited fraction will have a distribution peaking at 0.5, with tails on both sides. Perhaps this (the fraction of inherited genome) can be modeled using a Gaussian? Or should I model the number of inherited chromosomes as a discrete(0,N+1)?
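
    As a worked version of the paragraph above, here is a sketch of the distribution it describes. It assumes that the N chromosomes are transmitted independently, each with probability 0.5, and that there is no recombination, as stated. The symbol X (the number of the N chromosomes that are transmitted) is introduced here for illustration only.

```latex
X \sim \mathrm{Binomial}(N,\ 0.5), \qquad
\mathbb{E}\!\left[\frac{X}{N}\right] = 0.5, \qquad
\operatorname{Var}\!\left[\frac{X}{N}\right] = \frac{0.5 \cdot 0.5}{N} = \frac{0.25}{N}

% For N = 10:
\operatorname{sd}\!\left[\frac{X}{N}\right] = \sqrt{\frac{0.25}{10}} \approx 0.158
```

    Under these assumptions X takes the N+1 values 0, ..., N, and a Gaussian with mean 0.5 and standard deviation of about 0.158 (for N = 10) would only approximate this discrete distribution.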

    Does Infer.NET support recursive build-up of probabilities? If so, and if I go for the Gaussian approach with mu = 0.5 and sigma = 0.1, how should my recursive function look? I am indeed mostly interested in forward inference here.

    Thanks for your help.

    Ralph

    Thursday, November 8, 2012 2:15 PM
  • If the number of chromosomes is small, then you can model this at the chromosome level. If there are K founders, then each chromosome would have a random integer from 0, ..., (K-1) indicating which founder the chromosome came from. Call these 'founder indices'. In Infer.NET, this would be implemented as a VariableArray<int> for each progeny (the array ranges over chromosomes). Every time a progeny is created, you would generate its founder indices by making a random choice between the founder indices of the parents, for each chromosome. This can be done using branching, as in a mixture model. By inferring the distribution over founder indices, you can then work out the fractions. To make this more efficient, you could represent only the counts of indices; however, that is more complicated to write in Infer.NET, so I'd recommend not doing this until you are familiar with the approach.
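
    The founder-index scheme described above can be sketched as a plain forward Monte Carlo sampler in VB. This is not Infer.NET inference and uses no Infer.NET calls; it only illustrates the idea of one founder index per chromosome, chosen at random from the two parents. The names PedigreeNode, FounderIndexSampler, DrawIndices and EstimateFraction are hypothetical, as are the fixed seed, the choice of 10 chromosomes and the number of samples. Like the description, it keeps a single index per chromosome and does not track both homologs of a diploid plant.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical stand-in for the Plant class from the question, extended with a founder index.
Public Class PedigreeNode
    Public ReadOnly Parent1 As PedigreeNode
    Public ReadOnly Parent2 As PedigreeNode
    Public ReadOnly FounderIndex As Integer   ' only meaningful for founders

    Public Sub New(ByVal p1 As PedigreeNode, ByVal p2 As PedigreeNode, ByVal index As Integer)
        Parent1 = p1
        Parent2 = p2
        FounderIndex = index
    End Sub

    Public Function IsFounder() As Boolean
        Return Parent1 Is Nothing AndAlso Parent2 Is Nothing
    End Function
End Class

Public Module FounderIndexSampler
    Private ReadOnly rng As New Random(12345)   ' fixed seed (assumption) for repeatable runs

    ' Draws one founder index per chromosome for the given plant.
    ' The cache ensures a plant that appears several times in the pedigree is drawn once per sample.
    Function DrawIndices(ByVal plant As PedigreeNode, ByVal nChrom As Integer,
                         ByVal cache As Dictionary(Of PedigreeNode, Integer())) As Integer()
        Dim indices As Integer() = Nothing
        If cache.TryGetValue(plant, indices) Then Return indices

        indices = New Integer(nChrom - 1) {}
        If plant.IsFounder() Then
            ' A founder carries its own index on every chromosome.
            For c As Integer = 0 To nChrom - 1
                indices(c) = plant.FounderIndex
            Next
        Else
            ' For each chromosome, pick the index of one parent with probability 0.5.
            Dim fromP1 As Integer() = DrawIndices(plant.Parent1, nChrom, cache)
            Dim fromP2 As Integer() = DrawIndices(plant.Parent2, nChrom, cache)
            For c As Integer = 0 To nChrom - 1
                indices(c) = If(rng.Next(2) = 0, fromP1(c), fromP2(c))
            Next
        End If
        cache(plant) = indices
        Return indices
    End Function

    ' Returns {mean, variance} of the fraction of chromosomes carrying targetIndex.
    Function EstimateFraction(ByVal plant As PedigreeNode, ByVal targetIndex As Integer,
                              ByVal nChrom As Integer, ByVal nSamples As Integer) As Double()
        Dim total As Double = 0
        Dim totalSq As Double = 0
        For s As Integer = 1 To nSamples
            Dim cache As New Dictionary(Of PedigreeNode, Integer())()
            Dim indices As Integer() = DrawIndices(plant, nChrom, cache)
            Dim hits As Integer = 0
            For Each idx As Integer In indices
                If idx = targetIndex Then hits += 1
            Next
            Dim fraction As Double = hits / nChrom
            total += fraction
            totalSq += fraction * fraction
        Next
        Dim mean As Double = total / nSamples
        Return New Double() {mean, totalSq / nSamples - mean * mean}
    End Function

    Sub Main()
        Dim founder1 As New PedigreeNode(Nothing, Nothing, 0)
        Dim founder2 As New PedigreeNode(Nothing, Nothing, 1)
        Dim f1 As New PedigreeNode(founder1, founder2, -1)
        Dim backCross1 As New PedigreeNode(f1, founder1, -1)

        Dim stats As Double() = EstimateFraction(backCross1, 0, 10, 100000)
        Console.WriteLine("BackCross1 mean: " & stats(0).ToString("F3") &
                          ", variance: " & stats(1).ToString("F4"))
    End Sub
End Module
```

    Under this scheme each BackCross1 chromosome carries founder index 0 with probability 0.75, so the sampled mean should land near 0.75 and the variance near 0.75 * 0.25 / 10 = 0.01875. The same sampled fractions can be kept in a list to look at extremes or threshold exceedance instead of just the mean and variance.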
    Friday, November 9, 2012 1:59 PM
    Owner