Package jebl.evolution.sequences
Class Utils
java.lang.Object
    jebl.evolution.sequences.Utils

public class Utils extends java.lang.Object
Version:
    $Id: Utils.java 918 2008-06-04 01:28:08Z twobeers $
Author:
    Andrew Rambaut, Alexei Drummond

Method Summary
Modifier and Type  Method
    Description

static State[]  cleanSequence(java.lang.CharSequence seq, SequenceType type)
    Produce a clean sequence filtered of spaces and digits.
static NucleotideState[]  complement(NucleotideState[] sequence)
static int  getGaplessLocation(Sequence sequence, int gappedLocation)
    Gets the site location index for this sequence excluding any gaps.
static int  getGappedLocation(Sequence sequence, int gaplessLocation)
    Gets the site location index for this sequence that corresponds to a location given excluding all gaps.
static byte[]  getStateIndices(State[] sequence)
static int  getStopCodonCount(Sequence sequence)
    Counts the number of stop codons in an amino acid sequence.
static SequenceType  guessSequenceType(java.lang.CharSequence seq)
    Guess type of sequence from contents.
static boolean  isPredominantlyRNA(java.lang.CharSequence sequenceString, int maximumNonGapsToLookAt)
    Is the given nucleotide sequence predominantly RNA (i.e., does it contain more occurrences of "U" than "T")?
static State[]  reverse(State[] sequence)
static java.lang.String  reverseComplement(java.lang.String nucleotideSequence)
static NucleotideState[]  reverseComplement(NucleotideState[] sequence)
static java.lang.String  reverseComplementWithGaps(java.lang.String nucleotideSequence)
static State[]  stripGaps(State[] sequence)
static java.lang.String  toString(State[] states)
static java.lang.String  translate(java.lang.String nucleotideSequence, GeneticCode geneticCode)
    A wrapper for translateCharSequence(CharSequence,GeneticCode) that takes the nucleotide sequence as a String rather than a CharSequence.
static Sequence  translate(Sequence sequence, GeneticCode geneticCode)
static Sequence  translate(Sequence sequence, GeneticCode geneticCode, int readingFrame)
static AminoAcidState[]  translate(State[] states, GeneticCode geneticCode)
    Translates each of a given sequence of NucleotideStates or CodonStates to the AminoAcidState corresponding to it under the given genetic code.
static AminoAcidState[]  translate(State[] states, GeneticCode geneticCode, int readingFrame)
    Translates each of a given sequence of NucleotideStates or CodonStates to the AminoAcidState corresponding to it under the given genetic code.
static java.lang.String  translateCharSequence(java.lang.CharSequence nucleotideSequence, GeneticCode geneticCode)
    Translates the given nucleotideSequence into an amino acid sequence string, using the given geneticCode.
Method Detail
-
translate
public static Sequence translate(Sequence sequence, GeneticCode geneticCode)
Translates a given Sequence to a corresponding Sequence under the given genetic code. Simply a utility function that calls AminoAcidState[] translate(final State[] states, GeneticCode geneticCode).
Parameters:
    sequence - the Sequence.
    geneticCode - the genetic code to use for the translation
Returns:
    the translated Sequence
-
translate
public static Sequence translate(Sequence sequence, GeneticCode geneticCode, int readingFrame)
Translates a given Sequence to a corresponding Sequence under the given genetic code. Simply a utility function that calls AminoAcidState[] translate(final State[] states, GeneticCode geneticCode).
Parameters:
    sequence - the Sequence.
    geneticCode - the genetic code to use for the translation
    readingFrame - the reading frame to translate in
Returns:
    the translated Sequence
-
translate
public static AminoAcidState[] translate(State[] states, GeneticCode geneticCode)
Translates each of a given sequence of NucleotideStates or CodonStates to the AminoAcidState corresponding to it under the given genetic code. Translation doesn't stop at stop codons; these are translated to AminoAcids.STOP_STATE. If translating from NucleotideStates and the number of states is not a multiple of 3, then the excess states at the end are silently dropped.
Parameters:
    states - States to translate; must all be of the same type, either NucleotideState or CodonState.
    geneticCode - the genetic code to use for the translation
Returns:
    the translated AminoAcidStates
-
translate
public static AminoAcidState[] translate(State[] states, GeneticCode geneticCode, int readingFrame)
Translates each of a given sequence of NucleotideStates or CodonStates to the AminoAcidState corresponding to it under the given genetic code. Translation doesn't stop at stop codons; these are translated to AminoAcids.STOP_STATE. If translating from NucleotideStates and the number of states is not a multiple of 3, then the excess states at the end are silently dropped.
Parameters:
    states - States to translate; must all be of the same type, either NucleotideState or CodonState.
    geneticCode - the genetic code to use for the translation
    readingFrame - the reading frame to translate in
Returns:
    the translated AminoAcidStates
-
isPredominantlyRNA
public static boolean isPredominantlyRNA(java.lang.CharSequence sequenceString, int maximumNonGapsToLookAt)
Is the given nucleotide sequence predominantly RNA (i.e., does it contain more occurrences of "U" than "T")?
Parameters:
    sequenceString - the sequence string to inspect to determine if it's RNA
    maximumNonGapsToLookAt - for performance reasons, only look at a maximum of this many non-gap residues in deciding if the sequence is predominantly RNA. Can be -1 or Integer.MAX_VALUE to look at the entire sequence.
Returns:
    true if the given nucleotide sequence is predominantly RNA
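
The counting rule described above can be illustrated with a small standalone sketch. This is not JEBL's implementation: the class and method names are hypothetical, '-' is assumed to be the only gap character, and any negative limit is treated as "look at the entire sequence".

```java
// Hypothetical standalone sketch; not part of jebl.evolution.sequences.Utils.
final class RnaCheckSketch {

    /**
     * Returns true if 'U' occurs more often than 'T' (case-insensitive)
     * among the first maxNonGaps non-gap characters of seq.
     * Assumption: a negative limit means "inspect everything".
     */
    static boolean looksLikeRna(CharSequence seq, int maxNonGaps) {
        int limit = maxNonGaps < 0 ? Integer.MAX_VALUE : maxNonGaps;
        int seen = 0;
        int uCount = 0;
        int tCount = 0;
        for (int i = 0; i < seq.length() && seen < limit; i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if (c == '-') {
                continue; // assumption: '-' is the only gap character
            }
            seen++;
            if (c == 'U') {
                uCount++;
            } else if (c == 'T') {
                tCount++;
            }
        }
        return uCount > tCount;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeRna("AUGGCUUAA", -1)); // true
        System.out.println(looksLikeRna("ATGGCTTAA", -1)); // false
        // Only the first two non-gap residues (T, T) are inspected here.
        System.out.println(looksLikeRna("--TTUUU", 2));    // false
    }
}
```

Capping the number of inspected residues trades accuracy for speed: a long sequence whose first residues are unrepresentative may be classified differently than it would be with the full scan.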
-
reverseComplement
public static java.lang.String reverseComplement(java.lang.String nucleotideSequence)
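
No description is given for this method. As a general illustration of what a reverse complement of a nucleotide string is, here is a minimal standalone sketch. The class name is hypothetical, and the handling of characters other than A, C, G, T and U (left unchanged) is an assumption that may differ from JEBL's behaviour.

```java
// Hypothetical standalone sketch; not JEBL's implementation.
final class ReverseComplementSketch {

    /** Complements a single base; other characters are returned unchanged (assumption). */
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'U': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'a': return 't';
            case 't': return 'a';
            case 'u': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default:  return base;
        }
    }

    /** Walks the input from its last character to its first, complementing each base. */
    static String reverseComplement(String nucleotides) {
        StringBuilder result = new StringBuilder(nucleotides.length());
        for (int i = nucleotides.length() - 1; i >= 0; i--) {
            result.append(complement(nucleotides.charAt(i)));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseComplement("ATGC")); // GCAT
        System.out.println(reverseComplement("AAAC")); // GTTT
    }
}
```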
-
reverseComplementWithGaps
public static java.lang.String reverseComplementWithGaps(java.lang.String nucleotideSequence)
-
translateCharSequence
public static java.lang.String translateCharSequence(java.lang.CharSequence nucleotideSequence, GeneticCode geneticCode)
Translates the given nucleotideSequence into an amino acid sequence string, using the given geneticCode. The translation is done triplet by triplet, starting with the triplet at index 0..2 in nucleotideSequence, then the one at index 3..5, and so on, until there are fewer than 3 nucleotides left. This method uses translate(State[],GeneticCode) to do the translation, hence it shares some properties with that method: (1) any excess nucleotides at the end are silently discarded; (2) translation doesn't stop at stop codons; instead, they are translated to "*", which is AminoAcids.STOP_STATE's code.
Parameters:
    nucleotideSequence - nucleotide sequence to translate
    geneticCode - genetic code to use for the translation
Returns:
    A string with length nucleotideSequence.length() / 3 (rounded down), the translation of nucleotideSequence with the given genetic code
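
The triplet-by-triplet behaviour described above (trailing nucleotides dropped, stop codons emitted as "*") can be sketched without JEBL as follows. The class name is hypothetical, the standard genetic code is hard-coded in place of a GeneticCode argument, and mapping codons with unrecognised characters to 'X' is an assumption, not JEBL's documented behaviour.

```java
// Hypothetical standalone sketch; not JEBL's implementation.
final class TranslateSketch {

    // Standard genetic code, with codons ordered by base in the order T, C, A, G.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*W"
          + "LLLLPPPPHHQQRRRR"
          + "IIIMTTTTNNKKSSRR"
          + "VVVVAAAADDEEGGGG";

    /** Translates codon by codon; the result has length nucleotides.length() / 3. */
    static String translate(CharSequence nucleotides) {
        int codonCount = nucleotides.length() / 3; // integer division drops the excess
        StringBuilder protein = new StringBuilder(codonCount);
        for (int c = 0; c < codonCount; c++) {
            int index = 0;
            boolean known = true;
            for (int k = 0; k < 3; k++) {
                char base = Character.toUpperCase(nucleotides.charAt(3 * c + k));
                if (base == 'U') {
                    base = 'T'; // treat RNA input like DNA
                }
                int b = BASES.indexOf(base);
                if (b < 0) {
                    known = false;
                    break;
                }
                index = index * 4 + b;
            }
            // Assumption: codons containing unrecognised characters become 'X'.
            protein.append(known ? AMINO_ACIDS.charAt(index) : 'X');
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("ATGTAAGGC"));   // M*G
        System.out.println(translate("ATGTAAGGCAT")); // M*G (trailing "AT" dropped)
    }
}
```

Note that translation continues past the stop codon TAA in both calls, which matches the documented property that stop codons do not terminate the output.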
-
translate
public static java.lang.String translate(java.lang.String nucleotideSequence, GeneticCode geneticCode)
A wrapper for translateCharSequence(CharSequence,GeneticCode) that takes the nucleotide sequence as a String rather than a CharSequence. This is to preserve backwards compatibility with existing compiled code.
Parameters:
    nucleotideSequence - nucleotide sequence string to translate
    geneticCode - genetic code to use for the translation
Returns:
    A string with length nucleotideSequence.length() / 3 (rounded down), the translation of nucleotideSequence with the given genetic code
-
complement
public static NucleotideState[] complement(NucleotideState[] sequence)
-
reverseComplement
public static NucleotideState[] reverseComplement(NucleotideState[] sequence)
-
getStateIndices
public static byte[] getStateIndices(State[] sequence)
-
getGaplessLocation
public static int getGaplessLocation(Sequence sequence, int gappedLocation)
Gets the site location index for this sequence excluding any gaps. The location is indexed from 0.
Parameters:
    sequence - the sequence
    gappedLocation - the location including gaps
Returns:
    the location without gaps.
-
getGappedLocation
public static int getGappedLocation(Sequence sequence, int gaplessLocation)
Gets the site location index for this sequence that corresponds to a location given excluding all gaps. The first non-gapped site in the sequence has a gaplessLocation of 0.
Parameters:
    sequence - the sequence
    gaplessLocation - the location excluding gaps
Returns:
    the site location including gaps
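
The relationship between gapped and gapless locations can be illustrated on a plain aligned string. This is a sketch under assumptions, not JEBL's code: it works on a CharSequence rather than a Sequence, the class and method names are hypothetical, '-' is assumed to be the only gap character, and the behaviour for a gapped location that falls on a gap, or for an out-of-range gapless location, is chosen here for illustration only.

```java
// Hypothetical standalone sketch; not JEBL's implementation.
final class GapLocationSketch {

    private static final char GAP = '-'; // assumption: the only gap character

    /** Counts the non-gap sites strictly before gappedLocation (0-based). */
    static int gaplessLocation(CharSequence aligned, int gappedLocation) {
        int nonGaps = 0;
        for (int i = 0; i < gappedLocation; i++) {
            if (aligned.charAt(i) != GAP) {
                nonGaps++;
            }
        }
        return nonGaps;
    }

    /** Finds the index of the non-gap site whose gapless index is gaplessLocation. */
    static int gappedLocation(CharSequence aligned, int gaplessLocation) {
        int nonGaps = 0;
        for (int i = 0; i < aligned.length(); i++) {
            if (aligned.charAt(i) == GAP) {
                continue;
            }
            if (nonGaps == gaplessLocation) {
                return i;
            }
            nonGaps++;
        }
        // Assumption: out-of-range input is rejected rather than clamped.
        throw new IllegalArgumentException("No non-gap site at gapless index " + gaplessLocation);
    }

    public static void main(String[] args) {
        System.out.println(gaplessLocation("A--CG", 3)); // 1
        System.out.println(gappedLocation("A--CG", 1));  // 3
        System.out.println(gappedLocation("--AC", 0));   // 2
    }
}
```

For non-gap sites the two mappings are inverses of each other: converting a gapless index to a gapped one and back returns the original index.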
-
guessSequenceType
public static SequenceType guessSequenceType(java.lang.CharSequence seq)
Guess type of sequence from contents.
Parameters:
    seq - the sequence
Returns:
    SequenceType.NUCLEOTIDE or SequenceType.AMINO_ACID, if sequence is believed to be of that type. If the sequence contains characters that are valid for neither of these two sequence types, then this method returns null.
-
getStopCodonCount
public static int getStopCodonCount(Sequence sequence)
Counts the number of stop codons in an amino acid sequence.
Parameters:
    sequence - the sequence in which to count stop codons
Returns:
    the number of stop codons
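
On a plain amino acid string, where a stop codon is written as "*" (the code of AminoAcids.STOP_STATE), the count can be sketched as below. The class and method names are hypothetical, and the sketch works on a CharSequence rather than a Sequence, so it only approximates what the library method does.

```java
// Hypothetical standalone sketch; not JEBL's implementation.
final class StopCountSketch {

    /** Counts occurrences of '*' in an amino acid string. */
    static int countStops(CharSequence aminoAcids) {
        int stops = 0;
        for (int i = 0; i < aminoAcids.length(); i++) {
            if (aminoAcids.charAt(i) == '*') {
                stops++;
            }
        }
        return stops;
    }

    public static void main(String[] args) {
        System.out.println(countStops("MKV*LT*")); // 2
    }
}
```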
-
cleanSequence
public static State[] cleanSequence(java.lang.CharSequence seq, SequenceType type)
Produce a clean sequence filtered of spaces and digits.
Parameters:
    seq - the sequence
    type - the sequence type
Returns:
    An array of valid states of the given SequenceType (may be shorter than the original sequence)
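
The filtering step can be sketched on plain strings, for example to tidy sequence text pasted from a numbered, space-separated listing. This is an illustration only: the class and method names are hypothetical, the sketch returns a String rather than State[], treating all whitespace as "spaces" is an assumption, and it does not validate the remaining characters against a SequenceType.

```java
// Hypothetical standalone sketch; not JEBL's implementation.
final class CleanSequenceSketch {

    /** Returns seq with whitespace and digit characters removed. */
    static String stripSpacesAndDigits(CharSequence seq) {
        StringBuilder cleaned = new StringBuilder(seq.length());
        for (int i = 0; i < seq.length(); i++) {
            char c = seq.charAt(i);
            if (Character.isWhitespace(c) || Character.isDigit(c)) {
                continue; // skip position numbers and spacing
            }
            cleaned.append(c);
        }
        return cleaned.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripSpacesAndDigits(" 1 ATG CGA 10")); // ATGCGA
    }
}
```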
-
toString
public static java.lang.String toString(State[] states)