Biocomputing and bioinformatics
28 November 1999, Prof. Peter Hilbers

We consider biocomputing different from bioinformatics,
although the two terms are often used interchangeably. By biocomputing we mean
the construction and use of computers which function like living organisms
or contain biological components. In order to reason about and use such
`machines', special algorithms have to be designed and new complexity
theories have to be developed. Some of these algorithms are also
used in bioinformatics, which may explain why the two
areas are confused. Although biocomputing is an interesting research
topic in its own right, bioinformatics is considered to have more scientific
and economic impact. In this note we discuss several aspects of bioinformatics.
First we give an introduction to the area. Then we survey the
research topics in bioinformatics for which a program could be defined
for the mathematics and computing science department, and we conclude
with a possible bioinformatics profile for the TU Eindhoven.

1 Introduction

Proteins are the fundamental building blocks of
life. Cell structures are either made up of proteins or produced
by enzymes, which are themselves proteins. Proteins are variable-length linear
mixed polymers of 20 different amino acids. These linear polymers
fold upon themselves to generate a shape characteristic of each
protein, and this shape, together with the different chemical properties
of the 20 amino acids, determines the function of the protein. Since the
sequence of a protein can be determined from the DNA sequence which encodes
it, most protein sequences are in fact inferred from DNA sequences.

Consequently, topics in molecular biology research are:
   finding the DNA sequences of various organisms,
   understanding the functions of the proteins which are encoded in the sequence,
   unraveling the structure of the proteins, and
   understanding when and how these proteins become active.

Improvements in these areas lead to a better understanding
of organisms, their metabolism and their evolution. Health care and
drug design, new (bio)materials and their engineering, food (engineering)
and food production, are obvious examples that may directly profit from
this improved knowledge. Several research areas in computing science and
mathematics play an important role in these areas; some of them are
dealt with in the next section.

2 Research topics in computing science

As described in the previous section, much attention
in molecular biology research is devoted to analysing sequences. Large
databases of DNA sequences have been collected: in the USA GenBank,
in Europe EMBL and in Japan DDBJ. These databases are very large: the
latest release of GenBank exceeded one billion base pairs. Not only
is the amount of sequence data increasing rapidly, but the number
of characterized genes from many organisms and of protein structures also doubles
about every two years. The earliest tasks in bioinformatics were therefore
the creation and maintenance of such databases of biological information.
DNA sequences (and the protein sequences derived from them) make up
the majority of such databases. The storage and organization of
millions of nucleotides is far from trivial, and designing a database and
developing an interface through which researchers can both access existing
information and submit new entries is a challenging task as well. New database
and datamining techniques have to be developed to handle this.

In order to efficiently compare a sequence with
a vast number of other sequences, algorithms have been developed and more
are still needed. Most algorithms are based upon a similarity measure for two
or more sequences. This measure is used to determine the alignments
of the sequences, i.e., the arrangements of the sequences showing the
places where they are similar and where they differ. Finding
the optimal alignment is a problem area in which techniques
from dynamic programming, combinatorial optimization, heuristic search
methods, neural network theory, and statistics are applied.

Besides sequence analysis, much bioengineering
research is devoted to developing methods to predict the structure and/or
function of (newly discovered) proteins. As is noted above, the structure
of a protein is produced by the folding of the polymer chain back onto
itself, and by the association of multiple chains. Current research on
protein folding and structure prediction uses two basic approaches:
homology-based and ab initio.

In ab initio approaches one tries to determine the structure of a protein
that minimizes the free energy. Large-scale computing
techniques from molecular modelling, such as molecular
dynamics, Monte Carlo techniques and genetic algorithms, are successfully
applied in this area. Homology-based approaches attempt to determine
the structure of a protein by comparing its sequence to that of related
proteins whose structure is known. Clustering protein sequences into
families of related sequences and developing protein models
are important topics here. Datamining techniques, statistics and genetic
algorithms are applied to generate phylogenetic trees for examining
evolutionary relationships.

Moreover, algorithms are developed to study the
evolutionary process of DNA sequences. From the DNA sequence a genetic
algorithm is derived by which a protein model can be simulated. The
results of these simulations are then compared to experimental results,
and when necessary the protein model is improved. In this scheme the
principle of evolution is used to model structures and to simulate biomolecular
reactions. Heuristic methods and energy minimization techniques are
the computational methods applied here.

3 Biodatamining and protein simulations

Before indicating what the bioinformatics profile
for the TU Eindhoven could be, we emphasize that, although knowledge and
expertise are available within the mathematics and computing science department
on several of the above-mentioned research areas (heuristic
methods, neural networks, computer simulations, statistics, combinatorial
optimization), there is hardly any experience with biosystems. To be
successful in the bioinformatics area, cooperation with experts from the
biosciences is therefore needed.

If we consider the several research areas highlighted above,
then the most prominent gap is in (bio)datamining techniques combined
with computer simulation expertise on biosystems. A research group within
computing science focused on protein (simulation) algorithms and datamining
techniques applicable to biosystems is expected to collaborate intensively
with research groups in the biomedical engineering, chemistry and
mechanical engineering departments on the engineering of new biomaterials
with the appropriate mechanical and physicochemical properties.
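
As an illustration of the alignment problem mentioned in section 2, the following is a minimal sketch of global pairwise alignment by dynamic programming (the classical Needleman-Wunsch scheme). It is not taken from this note: the function name `global_alignment` and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen only for illustration. Practical tools use substitution matrices and more refined gap penalties.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment.

    The scoring parameters are illustrative assumptions, not values
    prescribed by the text.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Similarity of a[i-1] and b[j-1] under the simple match/mismatch model.
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(global_alignment("ACGT", "AGT"))
```

With these assumed scores, aligning "ACGT" with "AGT" gives a score of 1 and the arrangement ACGT over A-GT, which shows where the two sequences agree and where one has a gap. The table has (n+1)(m+1) entries, so time and memory grow with the product of the sequence lengths. This is one reason heuristic search methods are used when a sequence must be compared with a very large database.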