Inheritance diagram for nipy.neurospin.clustering.hierarchical_clustering:
These routines perform some hierrachical agglomerative clustering of some input data. The following alternatives are proposed: - Distance based average-link - Similarity-based average-link - Distance based maximum-link - Ward’s algorithm under graph constraints - Ward’s algorithm without graph constraints
In this latest version, the results are returned in a ‘WeightedForest’ structure, which gives access to the clustering hierarchy, facilitates the plot of the result etc.
For back-compatibility, *_segment versions of the algorithms have been appended, with the old API (except the qmax parameter, which now represents the number of wanted clusters)
Author : Bertrand Thirion,Pamela Guevara, 2006-2009
Bases: nipy.neurospin.graph.forest.Forest
This is a weighted Forest structure, i.e. a tree - ecah node has one parent and children (hierarchical structure) - some of the nodes can be viewed as leaves, other as roots - the edges within a tree are associated with a weight: +1 from child to parent -1 from parent to child - additionally, the nodes have a value, which is called ‘height’, especially useful from dendrograms
Methods
| Parameters: | V: the number of edges of the graph : parents=None: array of shape (V) :
height=None: array of shape(V) :
|
|---|
Creates the Minimum Spanning Tree self using Kruskal’s algo. efficient is self is sparse
| Returns: | K: WeightedGraph instance :
|
|---|
Creates the Minimum Spanning Tree self using Kruskal’s algo. efficient is self is sparse
| Returns: | K: WeightedGraph instance :
|
|---|
label = self.Voronoi_Labelling(seed) performs a voronoi labelling of the graph
| Parameters: | seed array of shape (nseeds), type (np.int), :
|
|---|---|
| Returns: | - labels : array of shape (self.V) the labelling of the vertices fixme: how is dealt the case of diconnected graph ? : |
Defines the graph as the Voronoi diagram (VD) that links the seeds. The VD is defined using the sample points.
| Parameters: | seeds: array of shape (self.V,dim) : samples: array of shape (nsamples,dim) : |
|---|
returns the sum of weighted degree of graph self
| Parameters: | c (int): side selection :
|
|---|---|
| Returns: | wd : array of shape (self.V),
|
Create the adjacency matrix of self
| Returns: | A : an ((self.V*self.V),np.double) array
|
|---|
returns all the distances of the graph as a tree
| Parameters: | seed=None array of shape(nbseed) with valuesin [0..self.V-1] :
|
|---|---|
| Returns: | dg: array of shape(nseed, self.V): the resulting distance : |
Returns an array of labels corresponding to the different connex components of the graph.
| Returns: | label: array of shape(self.V), labelling of the vertices : |
|---|
Check that the proposed is indeed a graph, i.e. contains no loop
| Returns: | a boolean b=0 iff there are loops, 1 otherwise : |
|---|
Extraction of the graphe cliques these are defined using replicator dynamics equations
| Returns: | - cliques: array of shape (self.V), type (np.int) :
|
|---|
self.compute_children() define the children list
| Returns: | children: a list of self.V lists, :
|
|---|
self.cut_redudancies() Remove possibly redundant edges: if an edge (ab) is present twice in the edge matrix, only the first instance in kept. The weights are processed accordingly
| Returns: | - E(int): the number of edges, self.E : |
|---|
returns the degree of the graph vertices
| Returns: | rdegree: array of shape self.V, the right degree : ldegree: array of shape self.V, the left degree : |
|---|
compute a labelling of the nodes which is 0 for the leaves, 1 for their parents etc and maximal for the roots
| Returns: | depth: array of shape (self.V): the depth values of the vertices : |
|---|
returns all the [graph] geodesic distances starting from seed it is mandatory that the graph weights are non-negative
| Parameters: | seed (int, >-1,<self.V) or array of shape(p) :
|
|---|---|
| Returns: | dg: array of shape (self.V) , :
|
set the graph to be the eps-nearest-neighbours graph of the data
| Parameters: | X array of shape (self.V) or (self.V,p) :
eps=1. (float), the neighborhood width : |
|---|---|
| Returns: | self.E the number of edges of the resulting graph : |
Idem plot, but the valid edges are enahanced
| Returns: | ax, the axes handle of the plot : |
|---|
Compute all the geodesic distances starting from seeds it is mandatory that the graph weights are non-negative
| Parameters: | seed= None: array of shape (nbseed), type np.int :
|
|---|---|
| Returns: | dg array of shape (nbseed,self.V) :
|
set the graph to be the topological neighbours graph of the thre-dimensional coordinate set xyz, in the k-connectivity scheme
| Parameters: | xyz: array of shape (self.V,3) and type np.int, : k = 18: the number of neighbours considered. (6,18 or 26) : |
|---|---|
| Returns: | E(int): the number of edges of self : |
sets the edges of self according to the adjacency matrix M
| Parameters: | M: array of shape(sef.V,self.V) : |
|---|
E = knn(X,k) set the graph to be the k-nearest-neighbours graph of the data
| Parameters: | X array of shape (self.V) or (self.V,p) :
k=1 : is the number of neighbours considered |
|---|---|
| Returns: | - self.E (int): the number of edges of the resulting graph : |
tests whether the given nodes within ids represent all the leaves of a certain subtree of self
| Parameters: | idds: array of shape (n) that takes values in [0..self.V-1] : custom == False, boolean :
|
|---|
| Returns: | the left incidence matrix of self :
|
|---|
Returns the indexes of the vertices within the main cc
| Returns: | idx: array of shape (sizeof main cc) : |
|---|
makes self the MST of the array X
| Parameters: | X: an array of shape (self.V,dim) :
|
|---|---|
| Returns: | tl (float) the total length of the mst : |
Normalize the graph according to the index c Normalization means that the sum of the edges values that go into or out each vertex must sum to 1
| Parameters: | c=0 in {0,1,2}, optional: index that designates the way :
|
|---|
Plot the dendrogram associated with self the rank of the data in the dendogram is returned
| Parameters: | ax: axis handle, optional : |
|---|---|
| Returns: | ax, the axis handle : |
label = self.propagate_upward(label) Assuming that label is a certain positive integer field (i.e. labels) that is defined at the leaves of the tree and can be compared, this propagates these labels to the parents whenever the children nodes have coherent properties otherwise the parent value is unchanged
| Parameters: | label: array of shape(self.V) : |
|---|---|
| Returns: | label: array of shape(self.V) : |
propagates some binary property in the forest that is defined in the leaves so that prop[parents] = logical_and(prop[children])
| Parameters: | prop, array of shape(self.V), the input property : |
|---|---|
| Returns: | prop, array of shape(self.V), the output property field : |
Removes all the edges for which valid==0
| Parameters: | valid, an array of shape (self.E) : |
|---|
Removes trivial edges, i.e. edges that are (vv)-like self.weights and self.E are corrected accordingly
| Returns: | - self.E (int): The number of edges : |
|---|
Reorder the graph according to the index c
| Parameters: | c=0 in {0,1,2}, index that designates the array :
|
|---|
reorder the tree so that the leaves come first then their parents and so on, and the roots are last the permutation necessary to apply to all vertex-based information
| Returns: | order: array of shape(self.V) :
|
|---|
| Returns: | the right incidence matrix of self :
|
|---|
l = self.subtree(k) returns an array of the nodes included in the subtree rooted in k
| Parameters: | k (int): the vertex from which the subtree is searched : |
|---|---|
| Returns: | idx : array of shape>=1 the index of the nodes beneath k |
Compute the weights of the graph as the distances between the corresponding rows of X, which represents an embdedding of self
| Parameters: | X array of shape (self.V, edim), :
|
|---|
Compute the weights of the graph as a gaussian function of the dinstance between the corresponding rows of X, which represents an embdedding of self
| Parameters: | X array of shape (self.V,dim) :
sigma=0, float : the parameter of the gaussian function |
|---|
| Parameters: | weights : an array of shape(self.V), edges weights |
|---|
a = self.show(X=None) plots the current graph in 2D
| Parameters: | X=None, array of shape (self.V,2) :
ax: ax handle, optional : |
|---|---|
| Returns: | ax: axis handle : |
creates a subforest with the vertices for which valid>0 and with the correponding set of edges the children of deleted vertices become their own parent
| Parameters: | valid: array of shape (self.V) : |
|---|---|
| Returns: | a new forest instance : |
Creates a subgraph with the vertices for which valid>0 and with the correponding set of edges
| Parameters: | valid array of shape (self.V): nonzero for vertices to be retained : |
|---|---|
| Returns: | G WeightedGraph instance, the desired subgraph of self : |
converts the graph to a neighboring system The neighboring system is nothing but a (sparser) representation of the edge matrix
| Returns: | ci, ne, we: arrays of shape (self.V+1), (self.E), (self.E) :
|
|---|
Average link clustering based on a pairwise distance matrix.
| Parameters: | D array of shape (nbitem,nbitem) with nonnegative values :
verbose=0, verbosity level : |
|---|---|
| Returns: | t a weightForest structure that represents the dendrogram of the data : |
Average link clustering based on a pairwise distance matrix.
| Parameters: | D: a (n,n) distance matrix between some items : stop=-1: stopping criterion, i.e. distance threshold at which :
qmax = 1; the number of desired clusters :
verbose=0, verbosity level : |
|---|---|
| Returns: | u: a labelling of the graph vertices according to the criterion : cost the cost of each merge step during the clustering procedure : |
Average link clustering based on data matrix.
| Parameters: | X array of shape (nbitem,dim): data matrix :
verbose=0, verbosity level : |
|---|---|
| Returns: | t a weightForest structure that represents the dendrogram of the data : |
Agglomerative function based on a (hopefully sparse) similarity graph
| Parameters: | G the input graph : |
|---|---|
| Returns: | t a weightForest structure that represents the dendrogram of the data : |
Agglomerative function based on a (hopefully sparse) similarity graph
| Parameters: | G the input graph : stop=0: the stopping criterion : qmax=1: the number of desired clusters :
|
|---|---|
| Returns: | u: array of shape (G.V) :
cost: array of shape (G.V (?)) :
|
maximum link clustering based on a pairwise distance matrix.
| Parameters: | D: array of shape (n,n) :
verbose=0, verbosity level : |
|---|---|
| Returns: | t a weightForest structure that represents the dendrogram of the data : |
maximum link clustering based on a pairwise distance matrix.
| Returns: | u: array of shape (nbitems) :
cost: array of shape (nbitems-1) :
|
|---|
Maximum link clustering based on data matrix.
| Parameters: | X: array of shape (nbitem,dim) :
verbose=0, verbosity level : |
|---|---|
| Returns: | t a weightForest structure that represents the dendrogram of the data : |
Agglomerative function based on a topology-defining graph and a feature matrix.
| Returns: | t: a WeightedForest structure that represents the dendrogram : |
|---|
Agglomerative function based on a field structure
| Parameters: | F the input field (graph+feature) : stop = -1: the stopping crterion :
qmax=1: the maximum number of desired clusters :
|
|---|---|
| Returns: | u: array of shape (F.V) :
cost array of shape (F.V-1) :
|
Agglomerative function based on a topology-defining graph and a feature matrix.
| Parameters: | G graph instance, :
feature: array of shape (G.V,dim_feature): :
|
|---|---|
| Returns: | t: weightForest instance, :
|
Agglomerative function based on a topology-defining graph and a feature matrix.
| Parameters: | G: neurospin.graph.WeightedGraph instance :
feature array of shape (G.V,dim_feature) :
stop = -1: the stopping crterion :
qmax=1: the maximum number of desired clusters :
|
|---|---|
| Returns: | u: array of shape (G.V) :
cost: array of shape (G.V-1) :
|
Agglomerative function based on a topology-defining graph and a feature matrix.
| Parameters: | G the input graph (a topological graph essentially) : feature array of shape (G.V,dim_feature) :
stop = -1: the stopping crterion :
qmax=1: the maximum number of desired clusters :
|
|---|---|
| Returns: | u: array of shape (G.V): :
cost: array of shape (G.V-1) :
NOTE: : A euclidean distance is used in the feature space : caveat : when the number of cc in G (nbcc)
|
Ward clustering based on a Feature matrix
| Returns: | t a weightForest structure that represents the dendrogram of the data : |
|---|