Microsoft Machine Learning getting sample sizes and other information

  • Question

  • Hi,

    We are using the Microsoft Machine Learning library (Microsoft.ML) and have the following code working:

                var trainingData = mlContext.Data.LoadFromTextFile<CSOData>(
                    path: @"C:\Users\Administrator\source\repos\WindowsFormsApp1\WindowsFormsApp1\level-data - reduced.txt",
                    hasHeader: false,
                    separatorChar: ',');
    
                // set up a learning pipeline
                // step 1: concatenate input features into a single column
                var pipeline = mlContext.Transforms.Concatenate(
                    "Features",
                    "Level")
    
                    // step 2: use the k-means clustering algorithm
                    // assume there are 3 clusters
                    .Append(mlContext.Clustering.Trainers.KMeans(
                        "Features",
                        numberOfClusters: 3));
    
                // train the model on the data file
                Debug.WriteLine("Start training model....");
                TransformerChain<ClusteringPredictionTransformer<KMeansModelParameters>> model = pipeline.Fit(trainingData);
                Debug.WriteLine("Model training complete!");
    
                // Transform data
                IDataView transformedData = model.Transform(trainingData);
    
                // read the centroids back from the trained k-means model
                VBuffer<float>[] centroids = null;
                // the prediction transformer exposes its model parameters directly, so no reflection is needed
                KMeansModelParameters kparams = model.LastTransformer.Model;
                kparams.GetClusterCentroids(ref centroids, out int k);

                // there is a single feature ("Level"), so each centroid holds one value
                float cluster1 = centroids[0].GetValues().ToArray().FirstOrDefault();
                float cluster2 = centroids[1].GetValues().ToArray().FirstOrDefault();
                float cluster3 = centroids[2].GetValues().ToArray().FirstOrDefault();
    
                Debug.WriteLine(cluster1);
                Debug.WriteLine(cluster2);
                Debug.WriteLine(cluster3);

    So we are able to get the centroid of each cluster. What we also need is the number of samples in each cluster and the within-cluster sum of squares (withinss) for each cluster, but we cannot work out how to get them.

    Does anyone know how to access these values?  

    Regards

    Ian Hannah


    • Moved by CoolDadTx Friday, August 14, 2020 1:29 PM Azure related
    Thursday, August 13, 2020 8:22 PM

All replies

  • This forum is for C#-specific questions only. All Azure-related questions, including Machine Learning, have been moved to the new Microsoft Q&A forums. Please post your question there.

    Michael Taylor http://www.michaeltaylorp3.net

    Friday, August 14, 2020 1:29 PM
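
The thread itself contains no answer, so the following is only a minimal sketch of one way the two values asked about could be computed by hand once each row has a cluster assignment. It assumes that the "Level" column is a float, that the transformed data exposes a "PredictedLabel" column holding a 1-based cluster id, and that centroid i in the array belongs to cluster id i + 1. ClusterRow and ClusterStats are hypothetical names introduced for illustration; they are not part of Microsoft.ML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape. With ML.NET the rows could be read from transformedData via
// mlContext.Data.CreateEnumerable<ClusterRow>(transformedData, reuseRowObject: false),
// assuming the "Level" column is a float.
public class ClusterRow
{
    public float Level { get; set; }          // the single input feature
    public uint PredictedLabel { get; set; }  // cluster id (assumed to be 1-based)
}

public static class ClusterStats
{
    // Returns, per cluster id, the sample count and the within-cluster sum of squares.
    public static Dictionary<uint, (int Count, double WithinSS)> Summarize(
        IEnumerable<ClusterRow> rows, IReadOnlyList<float> centroids)
    {
        return rows
            .GroupBy(r => r.PredictedLabel)
            .ToDictionary(
                g => g.Key,
                g => (
                    Count: g.Count(),
                    // cluster ids are assumed 1-based, the centroid list is 0-based
                    WithinSS: g.Sum(r => Math.Pow(r.Level - centroids[(int)g.Key - 1], 2))));
    }
}
```

With the variables from the question, the call might look like `ClusterStats.Summarize(rows, new[] { cluster1, cluster2, cluster3 })`. Computing the squared distance from the raw Level value avoids relying on how the trainer's "Score" column is defined; with more than one feature the squared differences would have to be summed over all components of the feature vector.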