Beginner help - discrete distributions?

  • Question

  • Hello

    I have studied the Infer.NET documentation and I believe I understand most of it. My background is in software engineering, not in mathematics or probability theory, but I think I can just about manage the concepts.

    As I work through each example, I feel I 'get' it easily, and the documentation makes Infer.NET seem almost intuitive (very inspiring and exciting stuff, too). However, I have something in mind that I want to accomplish, and I cannot quite work out how to express it in Infer.NET's vocabulary. I am sure this comes from a lack of familiarity rather than anything fundamental. A helping hand at the start, translating what I want to achieve into Infer.NET C#, should be enough to get me going. If someone here could explain, I would be grateful.

    What I have is an array of 'observations' (nothing to do with Infer.NET observations; these come from the domain I want to model). The array could be quite short, say 5 to 10 elements. The observations are just strings: usually words, but not necessarily. If it matters, the string length can be capped. All the observations in an array occur simultaneously, so the array is something like the output of a sensor whose 'eye' operates on text rather than pixels. Each array is therefore a single 'observation set' from this sensor.

    What I want to assume is that each value in this input array is a string-valued random variable whose distribution depends on the context (the other observations in the same 'observation set') and, of course, on the previous 'observation sets'. The position of each string in the array is not important.

    Given this, I want to present the 'learner' with various 'observation sets' (arrays of strings) and then, given a subset of 'observations', ask Infer.NET to predict the most likely 'observation set' (or, if possible, the top N most likely). Alternatively, I would ask Infer.NET for the probability of other observations given that limited subset.

    So for example, given the following 'observation sets' 

    "B","A", "C"

    as previously observed values, I would like to model things so that, given
    "A" as a known observation in a new set,
    the model shows that, of all the strings presented so far, "B" is the most likely to appear alongside it, with "C" and "D" equally likely but less likely than "B".
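    Before writing a full probabilistic model, it may help to see the behaviour described above as plain co-occurrence counting. The sketch below is ordinary Python, not Infer.NET, and the function names `cooccurrence_scores` and `top_n` are hypothetical. It assumes each observation set is an unordered bag of strings and uses add-alpha smoothing (an assumption) so that unseen pairs keep a small probability.

```python
from collections import Counter
from itertools import chain


def cooccurrence_scores(observation_sets, evidence, alpha=1.0):
    """Return a normalized distribution over the non-evidence words.

    Each word's weight is (number of sets in which it appears together with
    all the evidence words) + alpha. Order within a set is ignored.
    """
    evidence = set(evidence)
    vocab = set(chain.from_iterable(observation_sets)) - evidence
    counts = Counter()
    for obs in observation_sets:
        words = set(obs)
        # Only sets that contain every evidence word inform the prediction.
        if evidence <= words:
            counts.update(words - evidence)
    total = sum(counts[w] + alpha for w in vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def top_n(scores, n):
    # Highest probability first; ties broken alphabetically for stability.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    # Hypothetical data: only the first set comes from the question.
    sets = [["B", "A", "C"], ["A", "B", "D"]]
    for word, p in top_n(cooccurrence_scores(sets, ["A"]), 3):
        print(word, round(p, 3))
```

    With that hypothetical data and "A" as evidence, "B" gets weight 3/7 while "C" and "D" tie at 2/7, which is the ranking asked for. Counting like this has no notion of hidden structure, though, and that is what a model such as the one suggested in the reply adds.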

    Something is telling me this ought to be trivial with Infer.NET but I just can't seem to find the right way of expressing those ideas.

    Wednesday, December 2, 2015 3:25 PM


All replies

  • If you think of an 'observation set' as a 'document', then you are trying to model the co-occurrence of words in documents. See Latent Dirichlet Allocation (LDA) for an example of this.
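    To make the LDA suggestion concrete, here is a minimal sketch in plain Python. It is not Infer.NET code: Infer.NET would express LDA as a model and run message passing, whereas this sketch swaps in collapsed Gibbs sampling, a different, well-known inference method for the same model. The names `train_lda` and `predict_words` and the hyperparameter defaults are illustrative assumptions. Each observation set is treated as a bag of words, which matches "the position in the array is not important".

```python
import random


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of lists of strings."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]   # times word v is assigned to topic k
    topic_total = [0] * K                      # tokens assigned to topic k
    z = []                                     # current topic of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], index[w]
                # Remove this token's assignment before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(doc_topic[d][t] + alpha)
                           * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Point estimate of each topic's word distribution.
    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + V * beta)
            for v in range(V)] for k in range(K)]
    return vocab, phi


def predict_words(vocab, phi, partial, alpha=0.1, iterations=50, seed=0):
    """Fold in a partial set with phi held fixed; return P(next word | partial)."""
    rng = random.Random(seed)
    index = {w: i for i, w in enumerate(vocab)}
    K = len(phi)
    tokens = [index[w] for w in partial if w in index]  # unseen words are ignored
    z = [rng.randrange(K) for _ in tokens]
    counts = [0] * K
    for k in z:
        counts[k] += 1
    for _ in range(iterations):
        for i, v in enumerate(tokens):
            counts[z[i]] -= 1
            weights = [(counts[t] + alpha) * phi[t][v] for t in range(K)]
            z[i] = rng.choices(range(K), weights=weights)[0]
            counts[z[i]] += 1
    # Topic mixture of the partial set, then mix the topics' word distributions.
    theta = [(counts[t] + alpha) / (len(tokens) + K * alpha) for t in range(K)]
    return {w: sum(theta[t] * phi[t][index[w]] for t in range(K)) for w in vocab}
```

    As a usage example, `vocab, phi = train_lda(sets, num_topics=2)` followed by `predict_words(vocab, phi, ["A"])` returns a probability for every string seen in training. Sorting that dictionary gives a top-N list. The evidence word itself is included in the result, so you may want to drop it before ranking.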
    Thursday, December 3, 2015 11:25 AM
  • Yes! Your accurate and succinct summary of what I am trying to do just totally set me on the right track. I just had one of those eureka moments.
    Thursday, December 3, 2015 10:28 PM