Protein Structure Geometry, Topology and Structure W. Taylor, et al (2001) WW

[ Pobierz całość w formacie PDF ]
.Many of the most recent developments behave in this way andthese are best described as hybrid methods but can generally be decomposed intotheir components using the ideas described above.3.4 Dynamic Programming3.4.1 The basic evolutionary modelThe ability of proteins to lose or gain sequence elements over evolutionary time(relative insertions or deletions: jointly referred to as indels) has led many meth-ods of structure comparison to follow the simple model of evolutionary changewhich is used in sequence alignment methods.This assumes that the only pro-cesses at work are substitution of amino acids (or rather the underlying nu-cleotides) and their deletion or insertion.More complex operations such as re-27ab� �A B A BFigure 8: Topological and structural equivalance.Two �-strands,AandB, are shown schematically as triangles packing against an �-helix(circle) in twodistinct structural fragments, a (��) and b (��).The packing in the twofragments could be identical but a comparison method that takes account of thetopology (or connectivity) of the units would not detect any great similarity.28versals, translocation and duplication events are forbidden.This model furtherassumes that these processes are uniformly applied along the sequence length andare the same for all proteins.In addition, most alignment methods implicitly as-sume that the substitutions4 in one place do not affect substitutions elsewhere.From our knowledge of protein structure this latter assumption is clearly un-true (one part of a structure can influence any other part) but, despite this, thesequence alignment model provides a good starting point.This model of sequence evolution is implemented in a simple algorithm calleddynamic programming.As this algorithm will be referred to frequently below, itwill be described here initially in the context of sequence comparison followedby an outline application to structural data.3.4.2 Sequence AlignmentThe alignment of one sequence with another can be represented by constructinga grid (or matrix) with a sequence on each axis.Each cell in this matrix links apair of elements (residues or nucleotides) in the two sequences and an alignmentof the two sequences is a path through the matrix that progresses without anybackwards or stationary steps in either sequence.The problem to be solved is tofind the path through the matrix (top or left edge to the bottom or right edge)that passes through the highest scoring cells finding a maximum sum of scores.(See Figure 9 for a worked example).This algorithm is guaranteed to find theoptimal alignment under a given scoring scheme, providing pairwise matches areindependent: that is; the score for each match is unaffected by matches elsewhere.(See Pearson and Miller (1992) for a review).The dynamic programming algorithm forms the basis of many widely usedsequence alignment algorithms which can align one whole sequence against an-other, giving an overall or global alignment (Needleman and Wunsch, 1970), orfind the section that aligns best, giving a local alignment (Smith and Waterman,1981).In general, any information that can be encoded as a sequence (providingthe elements can be matched independently) can be aligned using the basic dy-namic programming algorithm.For proteins, this can be either pure sequences orcan be structural data (encoded as a string) allowing structures to be comparedagainst each other and with an amino acid sequence.3.4.3 Gap-penaltyA penalty against gaps in a sequence alignment can easily be incorporated into thestandard dynamic programming algorithm simply by subtracting a constant valuefrom each score inherited by any transition other than the diagonally adjacentcell.This can impose a fixed penalty for any size of gap but can be refined tobe partly (or wholly) dependent on the gap size.(See Figure 9 for examples.)4Substitutions are realised as mismatches in an alignment of two sequences [ Pobierz całość w formacie PDF ]

Linki