Package org.snpeff.geneOntology
Class GoTerms
java.lang.Object
org.snpeff.geneOntology.GoTerms
- All Implemented Interfaces:
Serializable, Iterable<GoTerm>
A collection of GO terms.
- Author:
- Pablo Cingolani
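Field Summary
- static boolean debug
- static boolean verbose
Constructor Summary
- GoTerms(): Default constructor
- GoTerms(String oboFile, String nameSpace, String interestingGenesFile, String geneAssocFile, boolean removeObsolete, boolean useGeneId): Constructor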
Method Summary
- add: Add a GOTerm (if not already in this GOTerms). WARNING: Creates 'fake' symbolNames based on symbolIds.
- void addInterestingSymbol(String symbolId, int rank, HashSet<String> noGoTermFound): Add a symbol as an 'interesting' symbol (to every corresponding GOTerm in this set).
- boolean addSymbolId(GoTerm goTerm, String symbolId): Add a symbolId (as well as all needed mappings).
- void addSymbolsFromChilds(): Use symbols from children in the DAG. For every GOTerm, each child's symbols are added to the term, so that the root term contains every symbol and every interesting symbol.
- allSymbols: Create a set with all the symbols.
- void checkInterestingSymbolIds(Set<String> interestingSymbolIds): Check that every symbolId is in the set (as 'interesting' symbols).
- GoTerm disjointSet(List<GoTerm> goTermList, int activeSets): Produce a GOTerm based on a list of GOTerms and a 'mask'.
- getGoTerm
- getGoTermsByGoTermAcc
- getGoTermsBySymbolId
- getGoTermsBySymbolId(String symbolId)
- getInterestingSymbolIdsSet
- int getInterestingSymbolIdsSize()
- getLabel()
- int getMaxRank()
- getNameSpace
- int getRank: Get a symbol's rank.
- getRankSymbolId
- Iterator<GoTerm> iterator(): Iterate through each GOTerm in this GOTerms.
- keySet()
- int levels(): Calculate each node's level (in the DAG).
- listTopTerms(int numberToSelect): Select a number of GOTerms.
- int numberOfInterestingSymbols(): Calculate how many interesting symbol IDs there are in all these GOTerms.
- int numberOfNodes(): Number of nodes in this DAG.
- int numberOfNodesWithOneInterestingSymbol(): Calculate the number of nodes that have at least one interesting symbol.
- int numberOfNodesWithOneSymbol(): Calculate the number of nodes that have at least one annotated symbol.
- int numberOfSymbols(): Calculate how many symbol IDs there are in all these GOTerms.
- void readGeneAssocFile(String goGenesFile, boolean useGeneId): Read a file containing every gene (names and IDs) and its associated GO terms.
- void readInterestingSymbolIdsFile(String fileName): Read a file with a list of 'interesting' genes (one per line).
- void readOboFile(String oboFile, boolean removeObsolete): Read an OBO file.
- void removeGOTerm(String goTermAcc): Remove a GOTerm.
- void resetInterestingSymbolIds(): Reset every 'interesting' symbolId (on every single GOTerm in this GOTerms).
- rootNodes
- void saveGseaGeneSets(String fileName): Save a gene sets file for GSEA analysis. Format specification: http://www.broad.mit.edu/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
- void setLabel
- String toString()
- values()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
Field Details
debug
public static boolean debug
verbose
public static boolean verbose
Constructor Details
GoTerms
public GoTerms()
Default constructor.
GoTerms
public GoTerms(String oboFile, String nameSpace, String interestingGenesFile, String geneAssocFile, boolean removeObsolete, boolean useGeneId)
Constructor.
- Parameters:
oboFile - Path to the OBO description file
nameSpace - Can be null for "all namespaces"
interestingGenesFile - Path to a file containing a list of 'interesting' genes (one geneName per line)
geneAssocFile - A file containing lines like: "GOterm \t gene_product_id \t gene_name \n"
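The gene association format described for geneAssocFile can be illustrated with a small parser. This is a hypothetical sketch, not snpEff's readGeneAssocFile: the class GeneAssocSketch and its parse method are invented for illustration, it assumes exactly three tab-separated columns, and it assumes that useGeneId selects the gene_product_id column instead of gene_name (inferred from the parameter name only).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not snpEff code: parses lines of the form
// "GOterm \t gene_product_id \t gene_name" into GO term -> symbols.
public class GeneAssocSketch {

    // useGeneId = true keeps column 2 (gene_product_id), false keeps column 3 (gene_name).
    // This reading of 'useGeneId' is an assumption based on the parameter name.
    public static Map<String, Set<String>> parse(List<String> lines, boolean useGeneId) {
        Map<String, Set<String>> symbolsByGoTerm = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 3) continue; // skip blank or malformed lines
            String goTerm = fields[0].trim();
            String symbol = (useGeneId ? fields[1] : fields[2]).trim();
            symbolsByGoTerm.computeIfAbsent(goTerm, k -> new TreeSet<>()).add(symbol);
        }
        return symbolsByGoTerm;
    }
}
```

Keying by GO term mirrors how the rest of the class attaches symbols to terms; a reverse index (symbol to GO terms) can be built from the same lines when lookups by symbol are needed.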
Method Details
add
Add a GOTerm (if not already in this GOTerms). WARNING: Creates 'fake' symbolNames based on symbolIds. This method is used mostly for testing/debugging.
addInterestingSymbol
Add a symbol as an 'interesting' symbol (to every corresponding GOTerm in this set).
- Parameters:
symbolName - Symbol's name
rank - Symbol's rank
noGoTermFound - The symbol is added here if there are no GOTerms associated with it
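The bookkeeping described for addInterestingSymbol (attach the symbol to every GO term it is annotated to, or record it in noGoTermFound when there is none) can be sketched with plain maps. The layout below and the names InterestingSymbolSketch and annotate are hypothetical; the documentation indicates that each GoTerm holds its own symbolIds and interestingSymbolIds sets instead.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of addInterestingSymbol's bookkeeping using plain maps (not snpEff code).
public class InterestingSymbolSketch {
    final Map<String, Set<String>> goTermsBySymbol = new HashMap<>();     // symbol -> annotated GO terms
    final Map<String, Set<String>> interestingByGoTerm = new HashMap<>(); // GO term -> interesting symbols
    final Map<String, Integer> rankBySymbol = new HashMap<>();           // symbol -> rank

    // Hypothetical helper: record that symbolId is annotated to goTermAcc.
    void annotate(String goTermAcc, String symbolId) {
        goTermsBySymbol.computeIfAbsent(symbolId, k -> new HashSet<>()).add(goTermAcc);
    }

    void addInterestingSymbol(String symbolId, int rank, Set<String> noGoTermFound) {
        Set<String> goTerms = goTermsBySymbol.getOrDefault(symbolId, Set.of());
        if (goTerms.isEmpty()) {
            noGoTermFound.add(symbolId); // no GO term for this symbol: report it to the caller
            return;
        }
        rankBySymbol.put(symbolId, rank);
        for (String goTermAcc : goTerms) {
            interestingByGoTerm.computeIfAbsent(goTermAcc, k -> new HashSet<>()).add(symbolId);
        }
    }
}
```

Whether the rank is stored for symbols without any GO term is not documented; this sketch skips them.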
addSymbolId
Add a symbolId (as well as all needed mappings).
- Parameters:
goTermAcc
symbolId
symbolName
goTermType
description
- Returns:
true if OK, false on error (GOTerm 'goTermAcc' not found)
addSymbolsFromChilds
public void addSymbolsFromChilds()
Use symbols from children in the DAG: for every GOTerm, each child's symbols are added to the term, so that the root term contains every symbol and every interestingSymbol.
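The propagation described for addSymbolsFromChilds gives every node the union of its own symbols and those of all its descendants, so the root ends up with every symbol. A minimal sketch, assuming the DAG is supplied as a hypothetical parent-to-children map of GO accessions; memoization ensures that a term shared by several parents is processed only once.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not snpEff code): propagate symbols from children up to ancestors in a DAG.
public class PropagateSymbolsSketch {

    // children: node -> direct children; symbols: node -> directly annotated symbols.
    // Returns node -> symbols of the node plus all of its descendants.
    public static Map<String, Set<String>> propagate(Map<String, List<String>> children,
                                                     Map<String, Set<String>> symbols) {
        Set<String> nodes = new HashSet<>(symbols.keySet());
        nodes.addAll(children.keySet());
        for (List<String> kids : children.values()) nodes.addAll(kids);
        Map<String, Set<String>> memo = new HashMap<>();
        for (String node : nodes) collect(node, children, symbols, memo);
        return memo;
    }

    // Post-order DFS: a node's set is its own symbols plus every child's (already propagated) set.
    private static Set<String> collect(String node, Map<String, List<String>> children,
                                       Map<String, Set<String>> symbols,
                                       Map<String, Set<String>> memo) {
        Set<String> done = memo.get(node);
        if (done != null) return done; // shared descendant already computed
        Set<String> all = new HashSet<>(symbols.getOrDefault(node, Set.of()));
        for (String child : children.getOrDefault(node, List.of())) {
            all.addAll(collect(child, children, symbols, memo));
        }
        memo.put(node, all);
        return all;
    }
}
```

The same pass would be run once for annotated symbols and once for interesting symbols.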
allSymbols
Create a set with all the symbols.
checkInterestingSymbolIds
Check that every symbolId is in the set (as 'interesting' symbols). Throws an exception on error.
- Parameters:
interestingSymbolIds - A set of interesting symbols
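One way to read checkInterestingSymbolIds is as a set-difference test between the caller's IDs and the IDs registered as interesting. The sketch below assumes that reading; the class name, the two-set signature, and the exception type (IllegalStateException) are assumptions, since the documentation only says that an exception is thrown on error.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch (not snpEff code): verify that every expected ID was registered as 'interesting'.
public class CheckInterestingSketch {

    // Throws if any ID in 'expected' is absent from 'registered'; the exception type is an assumption.
    public static void check(Set<String> expected, Set<String> registered) {
        Set<String> missing = new TreeSet<>(expected); // sorted copy for a stable error message
        missing.removeAll(registered);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Interesting symbol IDs not found: " + missing);
        }
    }
}
```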
disjointSet
Produce a GOTerm based on a list of GOTerms and a 'mask'.
- Parameters:
goTermList - A list of GOTerms
activeSets - An integer (binary mask) that specifies whether each set in the list should be taken into account or not. The operation performed is: Intersection{ GOTerms where mask_bit == 1 } - Union{ GOTerms where mask_bit == 0 }, where '-' is a set difference ('set minus'). This operation is done for both sets in a GOTerm (i.e. symbolIds and interestingSymbolIds).
- Returns:
A GOTerm
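The mask operation of disjointSet can be checked on plain sets. In this sketch bit i of activeSets is assumed to select element i of the list (least significant bit first), and an all-zero mask is assumed to yield an empty set; neither choice is documented. Because the mask is a Java int, the sketch only handles lists of at most 32 sets.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not snpEff code) of: Intersection{ sets with bit 1 } - Union{ sets with bit 0 }.
public class DisjointSetSketch {

    public static <T> Set<T> disjointSet(List<Set<T>> sets, int activeSets) {
        Set<T> intersection = null;      // null until the first active set is seen
        Set<T> union = new HashSet<>();  // union of all inactive sets
        for (int i = 0; i < sets.size(); i++) {
            boolean active = ((activeSets >> i) & 1) == 1; // assumption: bit i <-> list index i
            if (active) {
                if (intersection == null) intersection = new HashSet<>(sets.get(i));
                else intersection.retainAll(sets.get(i));
            } else {
                union.addAll(sets.get(i));
            }
        }
        if (intersection == null) return new HashSet<>(); // no active set (assumption: empty result)
        intersection.removeAll(union); // set minus
        return intersection;
    }
}
```

With the three sets {1, 2, 3}, {2, 3, 4}, {3, 5}, the mask 0b011 keeps elements common to the first two sets and absent from the third, giving {2}. In GoTerms the same operation is applied to both symbolIds and interestingSymbolIds.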
getGoTerm
getGoTermsByGoTermAcc
getGoTermsBySymbolId
getGoTermsBySymbolId
getInterestingSymbolIdsSet
getInterestingSymbolIdsSize
public int getInterestingSymbolIdsSize()
getLabel
getMaxRank
public int getMaxRank()
getNameSpace
getRank
Get a symbol's rank.
- Parameters:
symbolId
getRankSymbolId
iterator
Iterate through each GOTerm in this GOTerms.
keySet
levels
public int levels()
Calculate each node's level (in the DAG).
- Returns:
The maximum level
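A level computation like the one levels() describes can be sketched with memoized recursion over parent links. The documentation does not say whether levels start at 0 or 1, or whether a node reached by paths of different lengths takes the longest or shortest one; this sketch assumes roots are level 1 and uses the longest path from a root. The input layout (node to direct parents) and the class name are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not snpEff code): compute each node's level in a DAG and return the maximum.
public class LevelsSketch {

    // parents: node -> direct parents (roots map to an empty list). Fills 'level'; returns the max level.
    public static int levels(Map<String, List<String>> parents, Map<String, Integer> level) {
        int max = 0;
        for (String node : parents.keySet()) max = Math.max(max, levelOf(node, parents, level));
        return max;
    }

    private static int levelOf(String node, Map<String, List<String>> parents, Map<String, Integer> level) {
        Integer known = level.get(node);
        if (known != null) return known; // already computed through another path
        int lv = 1;                      // assumption: root nodes are at level 1
        for (String p : parents.getOrDefault(node, List.of())) {
            lv = Math.max(lv, levelOf(p, parents, level) + 1); // longest path from a root
        }
        level.put(node, lv);
        return lv;
    }
}
```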
listTopTerms
Select a number of GOTerms.
- Parameters:
numberToSelect
numberOfInterestingSymbols
public int numberOfInterestingSymbols()
Calculate how many interesting symbol IDs there are in all these GOTerms.
- Returns:
Number of interesting symbols
numberOfNodes
public int numberOfNodes()
Number of nodes in this DAG.
numberOfNodesWithOneInterestingSymbol
public int numberOfNodesWithOneInterestingSymbol()
Calculate the number of nodes that have at least one interesting symbol.
numberOfNodesWithOneSymbol
public int numberOfNodesWithOneSymbol()
Calculate the number of nodes that have at least one annotated symbol.
numberOfSymbols
public int numberOfSymbols()
Calculate how many symbol IDs there are in all these GOTerms.
- Returns:
Number of symbols
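The counting methods above reduce to two operations over a GO-term-to-symbols map: counting terms with a non-empty set, and counting distinct symbols. A minimal sketch over a hypothetical map; it assumes a symbol annotated to several terms is counted once, which the documentation does not state. Passing a map of interesting symbols instead gives the 'interesting' variants.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not snpEff code) of the node and symbol counts.
public class CountingSketch {

    // Number of GO terms that have at least one symbol.
    public static int nodesWithAtLeastOne(Map<String, Set<String>> symbolsByGoTerm) {
        int count = 0;
        for (Set<String> s : symbolsByGoTerm.values()) {
            if (!s.isEmpty()) count++;
        }
        return count;
    }

    // Number of distinct symbols across all GO terms (assumption: each symbol counted once).
    public static int distinctSymbols(Map<String, Set<String>> symbolsByGoTerm) {
        Set<String> all = new HashSet<>();
        for (Set<String> s : symbolsByGoTerm.values()) all.addAll(s);
        return all.size();
    }
}
```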
readGeneAssocFile
Read a file containing every gene (names and IDs) and its associated GO terms.
- Parameters:
goGenesFile - A file containing gene associations to GO terms
readInterestingSymbolIdsFile
Read a file with a list of 'interesting' genes (one per line).
- Parameters:
fileName - Can be "-" for no file
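Reading a one-ID-per-line file, with "-" meaning "no file", can be sketched as follows. The class and method names are hypothetical, and skipping blank lines and trimming whitespace are assumptions; parsing is split from file access so the logic can be exercised without touching the file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not snpEff code) of reading 'interesting' symbol IDs, one per line.
public class InterestingFileSketch {

    // "-" means no file: return an empty set without touching the file system.
    public static Set<String> read(String fileName) {
        if ("-".equals(fileName)) return new LinkedHashSet<>();
        try {
            return parse(Files.readAllLines(Paths.get(fileName)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One ID per line; whitespace is trimmed and blank lines are skipped (assumption).
    public static Set<String> parse(List<String> lines) {
        Set<String> ids = new LinkedHashSet<>(); // keeps file order, drops duplicates
        for (String line : lines) {
            String id = line.trim();
            if (!id.isEmpty()) ids.add(id);
        }
        return ids;
    }
}
```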
readOboFile
Read an OBO file.
- Parameters:
oboFile
nameSpace
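OBO files describe GO as [Term] stanzas of "tag: value" lines, using tags such as id, namespace, is_a (parent links, optionally followed by "! name") and is_obsolete. The sketch below is a hypothetical, heavily simplified reader, not snpEff's readOboFile: it keeps only those four tags, drops obsolete terms when removeObsolete is set, and, following the constructor's description, treats a null nameSpace as "all namespaces". Other stanza types (such as [Typedef]) and other relationship tags are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not snpEff code): minimal OBO [Term] reader.
public class OboSketch {

    // Minimal holder for one [Term] stanza (not snpEff's GoTerm).
    public static class Term {
        String id, namespace;
        boolean obsolete;
        final List<String> parents = new ArrayList<>(); // from is_a lines
    }

    public static Map<String, Term> parse(List<String> lines, String nameSpace, boolean removeObsolete) {
        List<Term> terms = new ArrayList<>();
        Term current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("[")) { // new stanza: only [Term] stanzas are kept
                current = line.equals("[Term]") ? new Term() : null;
                if (current != null) terms.add(current);
                continue;
            }
            if (current == null || line.isEmpty()) continue; // header lines or blank lines
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String tag = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            switch (tag) {
                case "id": current.id = value; break;
                case "namespace": current.namespace = value; break;
                case "is_obsolete": current.obsolete = value.equals("true"); break;
                case "is_a": current.parents.add(value.split("\\s+")[0]); break; // drop "! comment"
                default: break;
            }
        }
        Map<String, Term> byId = new LinkedHashMap<>();
        for (Term t : terms) {
            if (t.id == null) continue;
            if (removeObsolete && t.obsolete) continue;
            if (nameSpace != null && !nameSpace.equals(t.namespace)) continue;
            byId.put(t.id, t);
        }
        return byId;
    }
}
```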
removeGOTerm
Remove a GOTerm.
resetInterestingSymbolIds
public void resetInterestingSymbolIds()
Reset every 'interesting' symbolId (on every single GOTerm in this GOTerms).
rootNodes
saveGseaGeneSets
Save a gene sets file for GSEA analysis. Format specification: http://www.broad.mit.edu/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
- Parameters:
fileName
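GMT (Gene Matrix Transposed) is a tab-delimited format in which each line holds a gene set name, a description, and then the genes in that set. A minimal sketch of producing such lines from a hypothetical GO-term-to-genes map follows; using the GO accession as the set name, writing the placeholder "na" as the description, and skipping empty sets are all assumptions, and the real method may fill these columns differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch (not snpEff code): render GO term -> gene symbols as GMT lines.
public class GmtSketch {

    public static List<String> toGmtLines(Map<String, Set<String>> genesByGoTerm) {
        List<String> lines = new ArrayList<>();
        // TreeMap/TreeSet give a deterministic line and gene order.
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(genesByGoTerm).entrySet()) {
            if (e.getValue().isEmpty()) continue;       // skip empty gene sets (assumption)
            List<String> fields = new ArrayList<>();
            fields.add(e.getKey());                     // column 1: gene set name (GO accession)
            fields.add("na");                           // column 2: description placeholder
            fields.addAll(new TreeSet<>(e.getValue())); // remaining columns: genes
            lines.add(String.join("\t", fields));
        }
        return lines;
    }
}
```

The resulting lines could then be written with java.nio.file.Files.write(Paths.get(fileName), lines).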
setLabel
toString
values