
Module: neurospin.clustering.bgmm

Inheritance diagram for nipy.neurospin.clustering.bgmm:

Bayesian Gaussian Mixture Model classes: contains the basic fields and methods of Bayesian GMMs; the high-level functions are (or should be) bound in C

The base class BGMM relies on an implementation that performs Gibbs sampling

A derived class VBGMM uses Variational Bayes inference instead

A third class is introduced to take advantage of the old C bindings, but it is limited to diagonal covariance models

fixme: the docs should be rewritten

Author : Bertrand Thirion, 2008-2009

Classes

BGMM

class nipy.neurospin.clustering.bgmm.BGMM(k=1, dim=1, means=None, precisions=None, weights=None, shrinkage=None, dof=None)

Bases: nipy.neurospin.clustering.gmm.GMM

This class implements Bayesian GMMs

This class contains the following fields:

  • k (int): the number of components in the mixture
  • dim (int): the dimension of the data
  • means: array of shape (k, dim): the means of the components
  • precisions: array of shape (k, dim, dim): the precisions of the components
  • weights: array of shape (k): the weights of the mixture
  • shrinkage: array of shape (k): scaling factor of the posterior precisions on the mean
  • dof: array of shape (k): the posterior degrees of freedom (dofs)
  • prior_means: array of shape (k, dim): the prior on the component means
  • prior_scale: array of shape (k, dim): the prior on the component precisions
  • prior_dof: array of shape (k): the prior on the dofs (should be at least equal to dim)
  • prior_shrinkage: array of shape (k): scaling factor of the prior precisions on the mean
  • prior_weights: array of shape (k): the prior on the component weights

fixme: the E-step and M-step, inherited from GMM, should be overridden or removed; only ‘full’ precision is supported.
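A minimal usage sketch (not part of the original documentation; the data and parameter values are illustrative assumptions, only the class and method names come from this page):

    import numpy as np
    from nipy.neurospin.clustering.bgmm import BGMM

    # illustrative two-component data in 2 dimensions
    x = np.concatenate((np.random.randn(50, 2), np.random.randn(50, 2) + 4))

    model = BGMM(k=2, dim=2)
    model.guess_priors(x)   # weakly informative priors (Fraley and Raftery, 2007)
    model.initialize(x)     # k-means initialization of z, then parameter update
    model.check()           # verify that the field shapes are consistent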

Methods

average_log_like
bayes_factor
bic
check
check_x
conditional_posterior_proba
estimate
evidence
guess_priors
guess_regularizing
initialize
initialize_and_estimate
likelihood
map_label
mixture_likelihood
plugin
pop
probability_under_prior
sample
sample_and_average
sample_indicator
set_priors
show
show_components
test
train
unweighted_likelihood
update
update_means
update_precisions
update_weights
__init__(k=1, dim=1, means=None, precisions=None, weights=None, shrinkage=None, dof=None)
Initialize the structure, at least with the dimensions of the problem; at most, with what is necessary to compute the likelihood of a point under the model
average_log_like(x, tiny=1e-15)

returns the averaged log-likelihood of the model for the dataset x

Parameters:

x: array of shape (nbitems,self.dim) :

the data used in the estimation process

tiny = 1.e-15: a small constant to avoid numerical singularities :

bayes_factor(x, z, nperm=0, verbose=0)

Evaluate the Bayes Factor of the current model using Chib’s method

Parameters:

x: array of shape (nbitems,dim) :

the data from which bic is computed

z: array of shape (nbitems), type = np.int :

the corresponding classification

nperm=0: int :

the number of permutations to sample to model the label-switching issue in the computation of the Bayes Factor. By default, exhaustive permutations are used

verbose=0: verbosity mode :

Returns:

bf (float) the computed evidence (Bayes factor) :

bic(like, tiny=1e-15)

Computation of the BIC approximation of the evidence

Parameters:

like, array of shape (nbitem,self.k) :

component-wise likelihood

tiny=1.e-15, a small constant to avoid numerical singularities :

Returns:

the bic value :

check()
Check the shapes of the different matrices involved in the model
check_x(x)

essentially check that x.shape[1]==self.dim

x is returned, possibly reshaped

conditional_posterior_proba(x, z)

Compute the probability of the current parameters of self given x and z

Parameters:

x= array of shape (nbitems,dim) :

the data from which bic is computed

z= array of shape (nbitems), type = np.int :

the corresponding classification

estimate(x, niter=100, delta=0.0001, verbose=0)

estimation of self given a dataset x

Parameters:

x array of shape (nbitem,dim) :

the data from which the model is estimated

niter=100: maximal number of iterations in the estimation process :

delta = 1.e-4: increment of data likelihood at which :

convergence is declared

verbose=0: verbosity mode :

Returns:

bic : an asymptotic approximation of model evidence
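Continuing the sketch above, a plausible way to fit the model and read back the approximate evidence (the parameter values are the documented defaults; the exact behaviour is not guaranteed here):

    bic_value = model.estimate(x, niter=100, delta=1e-4)
    labels = model.map_label(x)   # MAP labelling of the rows of x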

evidence(x, z, nperm=0, verbose=0)
See bayes_factor(self,x,z,nperm=0,verbose=0)
guess_priors(x, nocheck=0)

Set the priors so that they are weakly informative; this follows Fraley and Raftery, Journal of Classification 24:155-181 (2007)

Parameters:

x, array of shape (nbitems,self.dim) :

the data used in the estimation process

nocheck=0, Boolean, if nocheck==True, check is skipped :

guess_regularizing(x, bcheck=1)

Set the regularizing priors as weakly informative according to Fraley and Raftery, Journal of Classification 24:155-181 (2007)

Parameters:

x array of shape (nbitems,dim) :

the data used in the estimation process

initialize(x)

initialize z using a k-means algorithm, then update the parameters

Parameters:

x: array of shape (nbitems,self.dim) :

the data used in the estimation process

initialize_and_estimate(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)

estimation of self given x

Parameters:

x array of shape (nbitem,dim) :

the data from which the model is estimated

z = None: array of shape (nbitem) :

a prior labelling of the data to initialize the computation

niter=100: maximal number of iterations in the estimation process :

delta = 1.e-4: increment of data likelihood at which :

convergence is declared

ninit=1: number of initialization performed :

to reach a good solution

verbose=0: verbosity mode :

Returns:

the best model is returned :

likelihood(x)

return the likelihood of the model for the data x; the values are weighted by the component weights

Parameters:

x array of shape (nbitems,self.dim) :

the data used in the estimation process

Returns:

like, array of shape(nbitem,self.k) :

component-wise likelihood

map_label(x, like=None)

return the MAP labelling of x

Parameters:

x array of shape (nbitem,dim) :

the data under study

like=None array of shape(nbitem,self.k) :

component-wise likelihood if like==None, it is recomputed

Returns:

z: array of shape(nbitem): the resulting MAP labelling :

of the rows of x

mixture_likelihood(x)

returns the likelihood of the mixture for x

Parameters:

x: array of shape (nbitems,self.dim) :

the data used in the estimation process

plugin(means, precisions, weights)

Set manually the weights, means and precision of the model

Parameters:

means: array of shape (self.k,self.dim) :

precisions: array of shape (self.k,self.dim,self.dim) :

or (self.k, self.dim)

weights: array of shape (self.k) :

pop(z)

compute the population, i.e. the statistics of allocation

Parameters:

z array of shape (nbitems), type = np.int :

the allocation variable

Returns:

hist : array of shape (self.k), the count variable

probability_under_prior()
Compute the probability of the current parameters of self given the priors
sample(x, niter=1, mem=0, verbose=0)

sample the indicator and parameters

Parameters:

x array of shape (nbitems,self.dim) :

the data used in the estimation process

niter=1 : the number of iterations to perform

mem=0: if mem, the best values of the parameters are computed :

verbose=0: verbosity mode :

Returns:

best_weights: array of shape (self.k) :

best_means: array of shape (self.k,self.dim) :

best_precisions: array of shape (self.k,self.dim,self.dim) :

possibleZ: array of shape (nbitems,niter) :

the z that gives the highest posterior to the data is returned first
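A hedged sketch of running the Gibbs sampler; it assumes the four documented return values come back in the listed order when mem=1:

    # draw parameters for several sweeps and keep the best values
    best_w, best_m, best_prec, possible_z = model.sample(x, niter=100, mem=1, verbose=0)
    z_map = possible_z[:, 0]   # the labelling with the highest posterior is returned first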

sample_and_average(x, niter=1, verbose=0)
sample the indicator and parameters; the average values of the weights, means and precisions are returned
Parameters:

x = array of shape (nbitems,dim) :

the data from which bic is computed

niter=1: number of iterations

Returns:

weights: array of shape (self.k) :

means: array of shape (self.k,self.dim) :

precisions: array of shape (self.k,self.dim,self.dim) or (self.k, self.dim) :

these are the average parameters across samplings

sample_indicator(like)

sample the indicator from the likelihood

Parameters:

like: array of shape (nbitem,self.k) :

component-wise likelihood

Returns:

z: array of shape(nbitem): a draw of the membership variable :

set_priors(prior_means, prior_weights, prior_scale, prior_dof, prior_shrinkage)

Set the prior of the BGMM

Parameters:

prior_means: array of shape (self.k,self.dim) :

prior_weights: array of shape (self.k) :

prior_scale: array of shape (self.k,self.dim,self.dim) :

prior_dof: array of shape (self.k) :

prior_shrinkage: array of shape (self.k) :

show(x, gd, density=None, nbf=-1)
Function to plot a GMM (work in progress). Currently works only in 1D and 2D
show_components(x, gd, density=None, mpaxes=None)

Function to plot a GMM – Currently, works only in 1D

Parameters:

x: array of shape(nbitems,dim) :

the data under study, used to draw a histogram

gd: grid descriptor structure :

density = None: :

density of the model on the discrete grid implied by gd

mpaxes = None: axes handle to make the figure :

if None, a new figure is created

test(x, tiny=1e-15)

returns the log-likelihood of the mixture for x

Parameters:

x array of shape (nbitems,self.dim) :

the data used in the estimation process

Returns:

ll: array of shape(nbitems) :

the log-likelihood of the rows of x

train(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)
Same as initialize_and_estimate
unweighted_likelihood(x)

return the likelihood of each data point for each component; the values are not weighted by the component weights

Parameters:

x: array of shape (nbitems,self.dim) :

the data used in the estimation process

Returns:

like, array of shape(nbitem,self.k) :

unweighted component-wise likelihood

update(x, z)

update function (draw a sample of the GMM parameters)

Parameters:

x array of shape (nbitems,self.dim) :

the data used in the estimation process

z array of shape (nbitems), type = np.int :

the corresponding classification

update_means(x, z)

Given the allocation vector z, and the corresponding data x, resample the mean

Parameters:

x array of shape (nbitems,self.dim) :

the data used in the estimation process

z array of shape (nbitems), type = np.int :

the corresponding classification

update_precisions(x, z)

Given the allocation vector z, and the corresponding data x, resample the precisions

Parameters:

x array of shape (nbitems,self.dim) :

the data used in the estimation process

z array of shape (nbitems), type = np.int :

the corresponding classification

update_weights(z)

Given the allocation vector z, resample the weights parameter

Parameters:

z array of shape (nbitems), type = np.int :

the allocation variable

VBGMM

class nipy.neurospin.clustering.bgmm.VBGMM(k=1, dim=1, means=None, precisions=None, weights=None, shrinkage=None, dof=None)

Bases: nipy.neurospin.clustering.bgmm.BGMM

Particular subclass of Bayesian GMMs (BGMM) that implements Variational Bayes estimation of the parameters
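A minimal Variational Bayes sketch, analogous to the BGMM example above (data and settings are illustrative assumptions; only the class and method names come from this page):

    import numpy as np
    from nipy.neurospin.clustering.bgmm import VBGMM

    x = np.concatenate((np.random.randn(50, 2), np.random.randn(50, 2) + 4))

    vb = VBGMM(k=2, dim=2)
    vb.guess_priors(x)
    vb.initialize(x)
    vb.estimate(x, niter=100, delta=1e-4)
    ev = vb.evidence(x)       # integrated likelihood under the variational approximation
    labels = vb.map_label(x)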

Methods

average_log_like
bayes_factor
bic
check
check_x
conditional_posterior_proba
estimate
evidence
guess_priors
guess_regularizing
initialize
initialize_and_estimate
likelihood
map_label
mixture_likelihood
plugin
pop
probability_under_prior
sample
sample_and_average
sample_indicator
set_priors
show
show_components
test
train
unweighted_likelihood
update
update_means
update_precisions
update_weights
__init__(k=1, dim=1, means=None, precisions=None, weights=None, shrinkage=None, dof=None)
average_log_like(x, tiny=1e-15)

returns the averaged log-likelihood of the model for the dataset x

Parameters:

x: array of shape (nbitems,self.dim) :

the data used in the estimation process

tiny = 1.e-15: a small constant to avoid numerical singularities :

bayes_factor(x, z, nperm=0, verbose=0)

Evaluate the Bayes Factor of the current model using Chib’s method

Parameters:

x: array of shape (nbitems,dim) :

the data from which bic is computed

z: array of shape (nbitems), type = np.int :

the corresponding classification

nperm=0: int :

the number of permutations to sample to model the label-switching issue in the computation of the Bayes Factor. By default, exhaustive permutations are used

verbose=0: verbosity mode :

Returns:

bf (float) the computed evidence (Bayes factor) :

bic(like, tiny=1e-15)

Computation of the BIC approximation of the evidence

Parameters:

like, array of shape (nbitem,self.k) :

component-wise likelihood

tiny=1.e-15, a small constant to avoid numerical singularities :

Returns:

the bic value :

check()
Check the shapes of the different matrices involved in the model
check_x(x)

essentially check that x.shape[1]==self.dim

x is returned, possibly reshaped

conditional_posterior_proba(x, z)

Compute the probability of the current parameters of self given x and z

Parameters:

x= array of shape (nbitems,dim) :

the data from which bic is computed

z= array of shape (nbitems), type = np.int :

the corresponding classification

estimate(x, niter=100, delta=0.0001, verbose=0)

estimation of self given x

Parameters:

x array of shape (nbitem,dim) :

the data from which the model is estimated

z = None: array of shape (nbitem) :

a prior labelling of the data to initialize the computation

niter=100: maximal number of iterations in the estimation process :

delta = 1.e-4: increment of data likelihood at which :

convergence is declared

verbose=0: :

verbosity mode

evidence(x, L=None, verbose=0)

computation of evidence or integrated likelihood

Parameters:

x array of shape (nbitems,dim) :

the data from which bic is computed

L=None: array of shape (nbitem,self.k) :

component-wise likelihood If None, it is recomputed

verbose=0: verbosity mode :

Returns:

ev (float) the computed evidence :

guess_priors(x, nocheck=0)

Set the priors so that they are weakly informative; this follows Fraley and Raftery, Journal of Classification 24:155-181 (2007)

Parameters:

x, array of shape (nbitems,self.dim) :

the data used in the estimation process

nocheck=0, Boolean, if nocheck==True, check is skipped :

guess_regularizing(x, bcheck=1)

Set the regularizing priors as weakly informative according to Fraley and Raftery, Journal of Classification 24:155-181 (2007)

Parameters:

x array of shape (nbitems,dim) :

the data used in the estimation process

initialize(x)

initialize z using a k-means algorithm, then update the parameters

Parameters:

x: array of shape (nbitems,self.dim) :

the data used in the estimation process

initialize_and_estimate(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)

estimation of self given x

Parameters:

x array of shape (nbitem,dim) :

the data from which the model is estimated

z = None: array of shape (nbitem) :

a prior labelling of the data to initialize the computation

niter=100: maximal number of iterations in the estimation process :

delta = 1.e-4: increment of data likelihood at which :

convergence is declared

ninit=1: number of initialization performed :

to reach a good solution

verbose=0: verbosity mode :

Returns:

the best model is returned :

likelihood(x)

return the likelihood of the model for the data x; the values are weighted by the component weights

Parameters:

x: array of shape (nbitems,self.dim) :

the data used in the estimation process

Returns:

L array of shape(nbitem,self.k) :

component-wise likelihood

map_label(x, L=None)

return the MAP labelling of x

Parameters:

x array of shape (nbitem,dim) :

the data under study

L=None array of shape(nbitem,self.k) :

component-wise likelihood if L==None, it is recomputed

Returns:

z: array of shape(nbitem): the resulting MAP labelling :

of the rows of x

mixture_likelihood(x)

returns the likelihood of the mixture for x

Parameters:

x: array of shape (nbitems,self.dim) :

the data used in the estimation process

plugin(means, precisions, weights)

Set manually the weights, means and precision of the model

Parameters:

means: array of shape (self.k,self.dim) :

precisions: array of shape (self.k,self.dim,self.dim) :

or (self.k, self.dim)

weights: array of shape (self.k) :

pop(z)

compute the population, i.e. the statistics of allocation

Parameters:

z array of shape (nbitems), type = np.int :

the allocation variable

Returns:

hist : array of shape (self.k), the count variable

probability_under_prior()
Compute the probability of the current parameters of self given the priors
sample(x, niter=1, mem=0, verbose=0)

sample the indicator and parameters

Parameters:

x array of shape (nbitems,self.dim) :

the data used in the estimation process

niter=1 : the number of iterations to perform

mem=0: if mem, the best values of the parameters are computed :

verbose=0: verbosity mode :

Returns:

best_weights: array of shape (self.k) :

best_means: array of shape (self.k,self.dim) :

best_precisions: array of shape (self.k,self.dim,self.dim) :

possibleZ: array of shape (nbitems,niter) :

the z that gives the highest posterior to the data is returned first

sample_and_average(x, niter=1, verbose=0)
sample the indicator and parameters; the average values of the weights, means and precisions are returned
Parameters:

x = array of shape (nbitems,dim) :

the data from which bic is computed

niter=1: number of iterations

Returns:

weights: array of shape (self.k) :

means: array of shape (self.k,self.dim) :

precisions: array of shape (self.k,self.dim,self.dim) or (self.k, self.dim) :

these are the average parameters across samplings

sample_indicator(like)

sample the indicator from the likelihood

Parameters:

like: array of shape (nbitem,self.k) :

component-wise likelihood

Returns:

z: array of shape(nbitem): a draw of the membership variable :

set_priors(prior_means, prior_weights, prior_scale, prior_dof, prior_shrinkage)

Set the prior of the BGMM

Parameters:

prior_means: array of shape (self.k,self.dim) :

prior_weights: array of shape (self.k) :

prior_scale: array of shape (self.k,self.dim,self.dim) :

prior_dof: array of shape (self.k) :

prior_shrinkage: array of shape (self.k) :

show(x, gd, density=None, nbf=-1)
Function to plot a GMM (work in progress). Currently works only in 1D and 2D
show_components(x, gd, density=None, mpaxes=None)

Function to plot a GMM – Currently, works only in 1D

Parameters:

x: array of shape(nbitems,dim) :

the data under study, used to draw a histogram

gd: grid descriptor structure :

density = None: :

density of the model on the discrete grid implied by gd

mpaxes = None: axes handle to make the figure :

if None, a new figure is created

test(x, tiny=1e-15)

returns the log-likelihood of the mixture for x

Parameters:

x array of shape (nbitems,self.dim) :

the data used in the estimation process

Returns:

ll: array of shape(nbitems) :

the log-likelihood of the rows of x

train(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)
Same as initialize_and_estimate
unweighted_likelihood(x)

return the likelihood of each data point for each component; the values are not weighted by the component weights

Parameters:

x: array of shape (nbitems,self.dim) :

the data used in the estimation process

Returns:

like, array of shape(nbitem,self.k) :

unweighted component-wise likelihood

update(x, z)

update function (draw a sample of the GMM parameters)

Parameters:

x array of shape (nbitems,self.dim) :

the data used in the estimation process

z array of shape (nbitems), type = np.int :

the corresponding classification

update_means(x, z)

Given the allocation vector z, and the corresponding data x, resample the mean

Parameters:

x array of shape (nbitems,self.dim) :

the data used in the estimation process

z array of shape (nbitems), type = np.int :

the corresponding classification

update_precisions(x, z)

Given the allocation vector z, and the corresponding data x, resample the precisions

Parameters:

x array of shape (nbitems,self.dim) :

the data used in the estimation process

z array of shape (nbitems), type = np.int :

the corresponding classification

update_weights(z)

Given the allocation vector z, resample the weights parameter

Parameters:

z array of shape (nbitems), type = np.int :

the allocation variable

Functions

nipy.neurospin.clustering.bgmm.Wishart_eval(n, V, W, dV=None, dW=None, piV=None)

Evaluation of the probability of W under Wishart(n,V)

Parameters:

n: float, :

the number of degrees of freedom (dofs)

V: array of shape (n,n) :

the scale matrix of the Wishart density

W: array of shape (n,n) :

the sample to be evaluated

dV: float, optional, :

determinant of V

dW: float, optional, :

determinant of W

piV: array of shape (n,n), optional :

pseudo-inverse of V

Returns:

(float) the density :

nipy.neurospin.clustering.bgmm.apply_perm(perm, z)
Permutation of the values of z
nipy.neurospin.clustering.bgmm.dirichlet_eval(w, alpha)

Evaluate the probability of a certain discrete draw w from the Dirichlet density with parameters alpha

Parameters:

w: array of shape (n) :

alpha: array of shape (n) :

FIXME : check that the dimensions of x and alpha are compatible
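A small illustrative call (the values are arbitrary; only the signature comes from this page):

    import numpy as np
    from nipy.neurospin.clustering.bgmm import dirichlet_eval

    w = np.array([0.2, 0.3, 0.5])       # a point on the simplex, sums to 1
    alpha = np.array([1.0, 1.0, 1.0])   # flat Dirichlet
    p = dirichlet_eval(w, alpha)        # density of w under Dirichlet(alpha)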

nipy.neurospin.clustering.bgmm.dkl_dirichlet(w1, w2)
returns the KL divergence between two Dirichlet distributions with parameters w1 and w2
nipy.neurospin.clustering.bgmm.dkl_gaussian(m1, P1, m2, P2)
Returns the KL divergence between Gaussians with densities (m1,P1) and (m2,P2), where m is the mean and P the precision
nipy.neurospin.clustering.bgmm.dkl_wishart(a1, B1, a2, B2)
returns the KL divergence between two Wishart distributions with parameters (a1,B1) and (a2,B2), where a1 and a2 are the degrees of freedom and B1 and B2 are the scale matrices
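An illustrative call to dkl_gaussian (the precision parameterization follows the docstring above; treating the result as the divergence of the first Gaussian relative to the second is an assumption):

    import numpy as np
    from nipy.neurospin.clustering.bgmm import dkl_gaussian

    m1, P1 = np.zeros(2), np.eye(2)        # mean and precision of the first Gaussian
    m2, P2 = np.ones(2), 2 * np.eye(2)     # mean and precision of the second Gaussian
    kl = dkl_gaussian(m1, P1, m2, P2)      # non-negative scalar divergence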
nipy.neurospin.clustering.bgmm.generate_Wishart(n, V)

Generate a sample from Wishart

Parameters:

n (scalar) = the number of degrees of freedom (dofs) :

V = array of shape (n,n) the scale matrix :

Returns:

W: array of shape (n,n): the Wishart draw :
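An illustrative draw-and-evaluate round trip using generate_Wishart together with Wishart_eval (values are arbitrary; the dofs should be at least the matrix dimension):

    import numpy as np
    from nipy.neurospin.clustering.bgmm import generate_Wishart, Wishart_eval

    n, V = 5.0, np.eye(3)           # degrees of freedom and scale matrix
    W = generate_Wishart(n, V)      # one draw from Wishart(n, V)
    p = Wishart_eval(n, V, W)       # density of that draw under the same Wishart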

nipy.neurospin.clustering.bgmm.generate_normals(m, P)

Generate a Gaussian sample with mean m and precision P

Parameters:

m array of shape n: the mean vector :

P array of shape (n,n): the precision matrix :

Returns:

ng : array of shape(n): a draw from the gaussian density

nipy.neurospin.clustering.bgmm.generate_perm(k, nperm=100)
returns an array of shape(nbperm, k) representing the permutations of k elements
Parameters:

k, int the number of elements to be permuted :

nperm=100: the maximal number of permutations; if gamma(k+1) > nperm, only nperm random draws are generated

Returns:

p: array of shape(nperm,k): each row is a permutation of k elements :
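An illustrative call, combined with apply_perm documented above (how apply_perm relabels z is an assumption based on its one-line description):

    import numpy as np
    from nipy.neurospin.clustering.bgmm import generate_perm, apply_perm

    perms = generate_perm(3)                 # all 3! = 6 permutations of 3 labels
    z = np.array([0, 1, 2, 2, 1, 0])
    z_relabelled = apply_perm(perms[1], z)   # z rewritten under one permutation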

nipy.neurospin.clustering.bgmm.multinomial(Likelihood)

Generate samples from a multinomial distribution

Parameters:

Likelihood: array of shape (nelements, nclasses): :

likelihood of each element belonging to each class; each row is assumed to sum to 1. One sample is drawn from each row.

Returns:

z array of shape (nelements): the draws, :

that take values in [0..nclasses-1]
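An illustrative call (the row-stochastic matrix is made up for the example):

    import numpy as np
    from nipy.neurospin.clustering.bgmm import multinomial

    # each row gives the class probabilities of one element and sums to 1
    likelihood = np.array([[0.9, 0.1],
                           [0.2, 0.8],
                           [0.5, 0.5]])
    z = multinomial(likelihood)   # one class index in [0..nclasses-1] per row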

nipy.neurospin.clustering.bgmm.normal_eval(mu, P, x, dP=None)

Probability of x under normal(mu,inv(P))

Parameters:

mu: array of shape (n): the mean parameter :

P: array of shape (n,n): the precision matrix :

x: array of shape (n): the data to be evaluated :

Returns:

(float) the density :
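An illustrative draw-and-evaluate round trip with generate_normals and normal_eval (values are arbitrary; only the signatures come from this page):

    import numpy as np
    from nipy.neurospin.clustering.bgmm import generate_normals, normal_eval

    m = np.zeros(3)
    P = np.eye(3)                  # precision matrix, i.e. inverse covariance
    xg = generate_normals(m, P)    # one draw from N(m, inv(P))
    p = normal_eval(m, P, xg)      # its density under the same Gaussian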

nipy.neurospin.clustering.bgmm.pop(self, L, tiny=1e-15)

compute the population, i.e. the statistics of allocation

Parameters:

L array of shape (nbitem,self.k): :

the likelihood of each item being in each class