data input for BayesPointMachineClassifier

  • Question

  • Dear Sir, I am trying to figure out how to feed input data to BayesPointMachineClassifier, but I have not been able to get it to work even after reading the documentation. I think it would be much better if it accepted a data table or CSV text string directly, as the Accord framework does (e.g. http://accord-framework.net/docs/html/T_Accord_MachineLearning_Bayes_NaiveBayes.htm). My question:

    Suppose I have a C# DataTable (or CSV text file) as follows. The last column, gender, is the one we want to classify/predict. Could you show me the easiest way to feed this DataTable to BayesPointMachineClassifier? The column names can be used as feature names. Thank you for your kind help.

    height  weight  footlength  gender
    166     120     22          f
    190     166     31          m
    171     132     28          f
    175     159     25          m
    191     170     33          m


    • Edited by Jason_Peng Wednesday, May 25, 2016 1:48 AM
    Wednesday, May 25, 2016 1:25 AM

All replies

  • Hi Jason,

    The simplest way of using the Bayes Point Machine (BPM) classifier is the command-line runner (see the documentation, in particular the Data Format section at the bottom). To work with the command-line version of the BPM, the data in your example would need to be in a text file formatted as follows:

    f height:166 weight:120 footlength:22 bias:1
    m height:190 weight:166 footlength:31 bias:1
    f height:171 weight:132 footlength:28 bias:1
    m height:175 weight:159 footlength:25 bias:1
    m height:191 weight:170 footlength:33 bias:1

    Note that the format expects the label to be the first entry and features to come in a sparse representation. Note also that I have added a bias (as a constant feature).
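
    If the table already exists as whitespace-delimited text, the conversion to this layout is mechanical. The following is a minimal Python sketch, not part of the BPM tooling: the helper name table_to_bpm_lines and the SAMPLE_TABLE constant are made up for illustration, and it assumes every row has a value for every column and that values contain no spaces.

```python
# Sketch: turn a whitespace-delimited table (header row first) into
# "label name:value ... bias:1" lines. All names here are hypothetical.

SAMPLE_TABLE = """\
height weight footlength gender
166 120 22 f
190 166 31 m
171 132 28 f
175 159 25 m
191 170 33 m
"""


def table_to_bpm_lines(table_text, label_column="gender", add_bias=True):
    """Return one 'label name:value ...' string per data row."""
    rows = [line.split() for line in table_text.splitlines() if line.strip()]
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    label_index = header.index(label_column)  # raises ValueError if absent
    lines = []
    for record in records:
        if len(record) != len(header):
            raise ValueError(
                f"expected {len(header)} fields, got {len(record)}: {record}"
            )
        # The label goes first, followed by the remaining columns as features.
        parts = [record[label_index]]
        parts += [
            f"{name}:{value}"
            for i, (name, value) in enumerate(zip(header, record))
            if i != label_index
        ]
        if add_bias:
            parts.append("bias:1")  # constant feature, as in the example
        lines.append(" ".join(parts))
    return lines


if __name__ == "__main__":
    print("\n".join(table_to_bpm_lines(SAMPLE_TABLE)))
```

    Writing the returned lines to a text file, one per line, gives a file in the layout described here. Reading from a real CSV file or a DataTable would need a different parser, but the output step stays the same.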

    You could then run cross-validation, say:

    Learner Classifier BinaryBayesPointMachine CrossValidate --data-set jason.data --results jason.cross-validation.csv --folds 2 --batches 1 --iterations 25
    

    Training, prediction and evaluation would be run as follows:

    Learner Classifier BinaryBayesPointMachine Train --training-set jason.train --model jason.mdl 
    Learner Classifier BinaryBayesPointMachine Predict --test-set jason.test --model jason.mdl --predictions jason.predictions
    Learner Classifier Evaluate --ground-truth jason.test --predictions jason.predictions --report jason.evaluation.txt --roc-curve jason.roc.csv
    
    where jason.train and jason.test both have the format shown above.
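
    One way to obtain the two files is to split a single data file at random. The sketch below is only an illustration under assumptions: the helper names, the 40% test fraction and the fixed seed are arbitrary choices, and the file names are the ones used in the commands. With only five examples the split is too small to be meaningful, so treat it as a template.

```python
# Sketch: randomly split formatted data lines into training and test sets.
# Helper names, the default fraction and the seed are hypothetical choices.
import random


def split_train_test(lines, test_fraction=0.4, seed=0):
    """Return (train_lines, test_lines) from a reproducible shuffle."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    if len(lines) < 2:
        raise ValueError("need at least two lines to split")
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one line on each side of the split.
    n_test = min(len(shuffled) - 1, max(1, round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def write_split(data_path, train_path, test_path, **kwargs):
    """Read data_path and write the two halves to train_path and test_path."""
    with open(data_path, encoding="utf-8") as handle:
        lines = [line.rstrip("\n") for line in handle if line.strip()]
    train, test = split_train_test(lines, **kwargs)
    for path, subset in ((train_path, train), (test_path, test)):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(subset) + "\n")


if __name__ == "__main__":
    write_split("jason.data", "jason.train", "jason.test")
```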

    If you want more control over how data is read, you may want to use the C#/.NET API of the BPM. You may already have seen the tutorial introduction. While implementing what is called a mapping gives you ultimate control, it also means you would have to write your own data reader (probably in C#).

    Hope this helps!

    Alex

    Friday, May 27, 2016 9:35 AM