# Selecting a basis for Gaussian Process Classifier

### Question

• From the example:

http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/Gaussian%20Process%20classifier.aspx

To get a sparse Gaussian Process, we pair a GaussianProcess with a set of basis vectors.  The basis vectors are intended to summarize the set of inputs into a smaller set.  By changing the size of the basis, you control the cost of the inference.  (For details, see the references at the end.)  If the basis set is exactly the set of inputs, then the distribution is equivalent to a full (non-sparse) Gaussian Process. A good strategy for computing the basis is to cluster the input vectors.  Another approach is to use a random subset of the input vectors.  Here for simplicity we will set them by hand to roughly partition the range of the inputs:

Could someone please explain what this basis represents? Assume that I am either randomly selecting a subset of my inputs or have identified which cluster each input belongs to. How can I translate this into a basis that is representative of that selection (random or cluster)? What I am not sure of is what each row in the basis below means exactly. In the linked example, why are there only 4 rows?

    // The basis
    Vector[] basis = new Vector[] {
        Vector.FromArray(new double[2] {0.2, 0.2}),
        Vector.FromArray(new double[2] {0.2, 0.8}),
        Vector.FromArray(new double[2] {0.8, 0.2}),
        Vector.FromArray(new double[2] {0.8, 0.8})
    };

Monday, August 12, 2013 5:45 PM

### All replies

• In this simple example only 4 representative points spanning the whole 2-D space were used. In general you will need many more. Please see http://social.microsoft.com/Forums/en-US/d241c713-c3e5-4ad1-9081-c5d6d3a7d4b8/is-there-a-function-or-method-in-infernet-that-will-let-me-calculate-basis-vectors-from-training.

John

Tuesday, August 13, 2013 1:18 PM
• There is talk in that post of using clustering, but nothing demonstrating it.

Assuming I have a cluster label for each one of my inputs, what is the next step in applying the cluster information to form my basis? This is what I am confused about.

Tuesday, August 13, 2013 3:56 PM
• You don't use the labels for clustering - those are just used in the GP classifier. The clustering is done on the input data to create some reasonable basis points (representative of the data) for the Sparse GP.

John

Tuesday, August 13, 2013 4:05 PM
• Once clustering is run on the inputs, what can be used to create some reasonable basis points? That is what I am uncertain of. I have my centroids, and I have the distance to each centroid. What can be used here to generate them?
Tuesday, August 13, 2013 4:25 PM
• The centroids themselves would be the basis points.
Tuesday, August 13, 2013 5:08 PM
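
As a rough illustration of the suggestion above, the sketch below runs a plain k-means (Lloyd's algorithm) on the input vectors only, ignoring the class labels, and returns the k centroids. The class name `BasisFromClusters`, the method `KMeansCentroids`, and its parameters are hypothetical names made up for this sketch and are not part of Infer.NET; any clustering routine that yields centroids would serve equally well.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Infer.NET.
public static class BasisFromClusters
{
    // Lloyd's k-means on the raw inputs (class labels are not used).
    // Returns k centroids; each centroid is one candidate basis point.
    public static double[][] KMeansCentroids(double[][] inputs, int k,
                                             int iterations = 50, int seed = 0)
    {
        if (k < 1 || k > inputs.Length)
            throw new ArgumentException("k must be between 1 and the number of inputs");

        int dim = inputs[0].Length;
        var rng = new Random(seed);

        // Initialise the centroids as copies of k distinct, randomly chosen inputs.
        double[][] centroids = inputs.OrderBy(x => rng.Next()).Take(k)
                                     .Select(x => (double[])x.Clone()).ToArray();

        var assignment = new int[inputs.Length];
        for (int iter = 0; iter < iterations; iter++)
        {
            // Assignment step: attach each input to its nearest centroid.
            for (int i = 0; i < inputs.Length; i++)
            {
                int best = 0;
                double bestDist = double.MaxValue;
                for (int c = 0; c < k; c++)
                {
                    double d = SquaredDistance(inputs[i], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                assignment[i] = best;
            }

            // Update step: move each centroid to the mean of its assigned inputs.
            var sums = new double[k][];
            var counts = new int[k];
            for (int c = 0; c < k; c++) sums[c] = new double[dim];
            for (int i = 0; i < inputs.Length; i++)
            {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) sums[assignment[i]][j] += inputs[i][j];
            }
            for (int c = 0; c < k; c++)
            {
                if (counts[c] == 0) continue; // empty cluster: keep the old centroid
                for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return centroids;
    }

    private static double SquaredDistance(double[] a, double[] b)
    {
        double s = 0;
        for (int j = 0; j < a.Length; j++)
        {
            double diff = a[j] - b[j];
            s += diff * diff;
        }
        return s;
    }
}
```

Each centroid can then be wrapped the same way as the hand-written rows in the example, for instance `Vector[] basis = centroids.Select(c => Vector.FromArray(c)).ToArray();`, so choosing k = 4 would give a basis with 4 rows.
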
• So the basis itself would have a maximum size equal to the number of clusters that I have?
Tuesday, August 13, 2013 5:26 PM
• Yes.
Tuesday, August 13, 2013 5:27 PM
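
The documentation passage quoted in the question also mentions a random subset of the input vectors as an alternative to clustering. A minimal sketch of that option follows, assuming the inputs are held as plain `double[]` arrays; `BasisFromRandomSubset`, `RandomSubset`, and the parameter `m` (the desired basis size) are hypothetical names for illustration and not part of Infer.NET.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Infer.NET.
public static class BasisFromRandomSubset
{
    // Picks m distinct inputs uniformly at random using a partial Fisher-Yates shuffle.
    // Each selected input is one candidate basis point.
    public static double[][] RandomSubset(double[][] inputs, int m, int seed = 0)
    {
        if (m < 0 || m > inputs.Length)
            throw new ArgumentException("m must be between 0 and the number of inputs");

        var rng = new Random(seed);
        int[] indices = Enumerable.Range(0, inputs.Length).ToArray();

        // Shuffle only the first m positions; that is enough for a uniform sample.
        for (int i = 0; i < m; i++)
        {
            int j = rng.Next(i, indices.Length); // random position in [i, n)
            int tmp = indices[i];
            indices[i] = indices[j];
            indices[j] = tmp;
        }

        // Copy the chosen inputs so later changes to the basis do not alter the data.
        return indices.Take(m).Select(i => (double[])inputs[i].Clone()).ToArray();
    }
}
```

Here the basis size is whatever `m` is passed in, whereas with centroids it equals the number of clusters; in both cases each resulting row is a point in the same space as the inputs and can be wrapped with `Vector.FromArray` as in the example.
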