Package jebl.evolution.distances
Class HKYDistanceMatrix
- java.lang.Object
-
- jebl.evolution.distances.BasicDistanceMatrix
-
- jebl.evolution.distances.HKYDistanceMatrix
-
- All Implemented Interfaces:
DistanceMatrix
public class HKYDistanceMatrix extends BasicDistanceMatrix
Compute HKY corrected distance matrix- Version:
- $Id: HKYDistanceMatrix.java 1036 2009-11-17 03:45:48Z matt_kearse $ Adapted from BEAST source code. The code in this file appeared originally in the file F84DistanceMatrix, and the comment said (as above) HKY. Initially I thought it was one model under two different names, but that was my ignorance. While similar, there is an small difference which I was able to understance only by examining the transition probability and rate matrices for both models. Simply put, HKY assumes *the same* ratio of transitions/transversions for all bases, while F84 has a different ratio for Purines and Pyrimidines. The confusion stems from the fact that those two different ratios depend on just one parameter. If Kappa is the ratio for HKY then for F84 Kappa(Purine aka A,G) = 1 + Kappa/(pi(A)+pi(G)) and Kappa(Pyrimidine aka C,T) = 1 + Kappr/(pi(C)+pi(T)). This difference simplifies the estimation of evolutionary distance and Kappa from F84, since some entries in the HKY transition matrix contain expressions involving exp(-t alpha), where alpha depend on both Kappa and wheather the element is a Purine or a Pyrimidine. pi(A,C,G,T) = stationary frequencies F84 HKY Tamura-Nei K(AG) = 1 + Kappa/(pi(A)+pi(G)) Kappa a1 K(CT) = 1 + Kappa/(pi(C)+pi(T)) Kappa a2 b = 1 1 beta Rate Matrix | - b K(AG) b | | pi(A) 0 0 0 | Q = | b - b K(CT) | | 0 pi(C) 0 0 | | K(AG) 1 - b | | 0 0 pi(G) 0 | | b K(CT) b - | | 0 0 0 pi(T) | Transition Probability Matrix PI(A) = PI(G) = pi(A) + pi(G) PI(C) = PI(T) = pi(C) + pi(T) Alpha(j) = 1 + Kappa F84 1 + PI(j)*(k-1) HKY P(transversion) = pi,(i+1)%4 = pi,(i+3)%4 = pi(i) * (1 - exp(-t)) P(transition) = pi,(i+2)%4 = pi(i) + pi(j)(1/PI(j) - 1) * exp(-t) - pi(i)/PI(i) * exp(-t * Alpha(i)) Pi,i = pi(i) + pi(i)(1/PI(i) - 1) * exp(-t) - (pi(i)/PI(i) - 1) * exp(-t * Alpha(i)) From the above it is easy to see that for F84 Sum(all transversions) = 2 Sum(p(i)) (1 - exp(-t)) = 2(1-exp(-t)) which gives the best estimate of t as -log(1 - Sum(all transversions)/2). When there are no transversions (say when sequences are very short or distance is very small) this estimate is 0 even when sequences are not identical. I am not sure if there is an easy way to fix this since in this case Kappa is confused with t and only t*(Kappa+1) can be estimated. The code in this file estimates the "evolutionary distance", which is (2*Kappa*(pi(A)*pi(G) + pi(T)*pi(C)) + 2*(pi(A) + PI(C))*beta) * t. (*) This raises a question since the stationary frequencies are estimated from all the sequences, but transition/transversion rates are done for each pair individually. The result is that distances are not neccesarily scaled correctly for the matrix as a whole. I am still trying to figure this out. (*) The original code had a bug which counted only A<-->G substitutions while C<-->T where ignored.
- Author:
- Andrew Rambaut, Joseph Heled
-
-
Constructor Summary
Constructors Constructor Description HKYDistanceMatrix(Alignment alignment, ProgressListener progress)
HKYDistanceMatrix(Alignment alignment, ProgressListener progress, boolean useTwiceMaximumDistanceWhenPairwiseDistanceNotCalculatable)
-
Method Summary
-
Methods inherited from class jebl.evolution.distances.BasicDistanceMatrix
getDistance, getDistance, getDistances, getSize, getSubmatrix, getTaxa
-
-
-
-
Constructor Detail
-
HKYDistanceMatrix
public HKYDistanceMatrix(Alignment alignment, ProgressListener progress, boolean useTwiceMaximumDistanceWhenPairwiseDistanceNotCalculatable) throws CannotBuildDistanceMatrixException
-
HKYDistanceMatrix
public HKYDistanceMatrix(Alignment alignment, ProgressListener progress) throws CannotBuildDistanceMatrixException
-
-