Programming Research Group
Research Report RR-04-24
Information categorisation in biological sequence alignments
Sumedha Gunewardena and Peter Jeavons
November 2004, 57pp.
Abstract
This is a two-part report. In the first part we introduce the reader to biological sequence alignment. We discuss dynamic
programming as it is used in sequence alignment, first for two sequences and then as it is adapted for multiple
sequence alignment. We give several references to strategies reported in the literature that enhance the standard
dynamic programming algorithm for sequence alignment to suit biological sequences. A short
discussion of how alignments are scored follows. Finally, some of the existing sequence alignment tools are described.
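As an illustration of the pairwise case mentioned above, the following is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). It is not taken from the report: the function name and the match, mismatch and linear gap scores are assumptions chosen only for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    The scoring values are illustrative assumptions, not the report's scheme:
    one fixed score for identical residues, one for differing residues,
    and a linear penalty for each gap position.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b opposite a[i-1]
                score[i][j - 1] + gap,       # gap in a opposite b[j-1]
            )
    return score[-1][-1]
```

For example, global_alignment_score("ACGT", "AGT") aligns three identical residue pairs and introduces one gap. Replacing the fixed match and mismatch values with a lookup into a substitution matrix would be one way to supply richer residue information to the same recurrence.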
The second part of this report presents a critical analysis of information as it relates to biological sequence alignment.
Information relating to the sequences being aligned forms the basis on which any alignment is built. In its basic form this
information might quantify how individual residues are scored when aligned with each other or how gaps are scored when introduced
between two residues. Every biological sequence carries some information, implicit if not explicit, relating to its
residues that form distinguishing markers along the sequence. This information can be extracted in many ways, for example from
databases of the relevant sequences, from the literature, or from prior processing. It is reasonable to assume that the more sequence
information we use in an alignment, the more confident we can be of the resulting alignment, and hence the better the hypotheses
we can make about the unknown sequences. The aim of this part of the report is to build a framework for representing this information
in a way that allows it to be incorporated dynamically and flexibly into sequence alignments.
This paper is available as a 1,388,975-byte PostScript (ps) file.