What is bioinformatics?
Grzegorz Papaj, 13 May 2020
Bioinformatics is a discipline of science combining biology and computer science. However, as an interdisciplinary field of science, bioinformatics descends from chemistry and physics. This article discusses the most important fields of bioinformatics research.
Historically, the bioinformatics started to emerge from biological sciences, in particular from biochemistry and molecular biology in the mid-twentieth century, when the two-time Nobel Prize winner Frederic Sanger determined the amino acid sequence of insulin.
Along with the sequencing of subsequent proteins, it became clear that the generated data are too large to be conveniently stored and analyzed without specialized tools. Margaret Oakley Dayhoff, the creator of the first set of all known protein sequences, became a pioneer of biological data computerization. Her Atlas of Protein Sequence and Structure from 1995, containing 65 sequences, was published as a book. After many updates, it became the basis of the published in 1984 first online database of protein sequences: Protein Information Resource (PIR).
Dayhoff’s work involved not only the collection of protein sequences but also the creation of the first algorithms and programs comparing protein sequences, laying the foundations for comparative and phylogenetic analysis.
At the same time, analogous sets of nucleotide sequences were created. The first published (1982) online DNA sequence database was GenBank, shortly before PIR. The launch of GenBank can be considered as a symbolic date for the beginnings of modern bioinformatics.
Bioinformatics flourished at the turn of the 20th and 21st centuries when computer prices dropped significantly and their computing power increased. The popularization of the Internet as a tool enabling access to remote databases and various types of scientific computing websites was also an important factor. The concurrent development of distributed computing technologies and computing clusters with previously unimaginable power and developing in many scientific centers enabled to routinely solve problems which was previously out of researchers' reach. Thanks to access to high computing power, the results of calculations could be obtained in hours or minutes, not within days or weeks as before.
Thanks to the progressive decrease in the price of IT infrastructure, equipping a modern bioinformatics laboratory costs a fraction of the price needed to provide basic equipment for a biological or chemical laboratory. Huge biological data resources are available for free on the Internet, not only in the form of "raw" data but also in the form of tools.
Bioinformatics is no longer an academic subject cultivated somewhere on the margins of mathematics and biology faculties, but an independent and full-fledged scientific discipline. The results of its research not only affect science but have a huge impact on modern pharmacology and medicine. It is not without reason that many experts expect the future of biomedical sciences to be in bioinformatics, and in virtually every scientific or research center associated with molecular biology has been created a laboratory or at least a group specializing in some area of bioinformatics.
The main fields of bioinformatics research
The flourishing of the bioinformatics fields related to the collection and processing of the protein and nucleotide sequences have led to the development of other research directions of this discipline in a short time.
In short, the following fields of bioinformatics research can currently be distinguished:
- sequence analysis,
- analysis of expression,
- structural bioinformatics,
- analysis of network and systems biology,
- analysis of cellular organization.
The sequence analysis is the oldest area of research of the modern bioinformatics. It covers:
- protein amino acid sequences,
- nucleotide sequences (DNA and RNA).
This analysis includes:
- alignment research and sequence comparison studies,
- development of methods related to the assembly of sequences and genomes,
- genes and genome annotation,
- phylogenetic and evolutionary analyses of individual genes and entire genomes,
- analysis of point mutations (especially in the human genome) and genetic diseases related to them.
Analysis of expression
Studies on expression and co-expression at both mRNA and protein levels allow for the analysis of dependence and links between the products and activities of different genes.
In this field, the data from high-throughput research methods like microarrays, SAGE (Serial Analysis of Gene Expression) and HTMS (high throughput mass spectrometry) are used.
The subject of structural bioinformatics research is macromolecule structures (proteins, RNA, macromolecule complexes).
The tested structure can be both determined experimentally (crystallographic methods, NMR, atomic force microscopes) and predicted using computational methods (structure modeling).
Structural bioinformatics research results are widely used in interaction analysis, particularly useful to pharmacology. Techniques such as virtual screening and molecular docking are now the first step in the process of design of new drugs.
Analysis of network and systems biology
The interrelationships between different proteins, mutual regulation networks, and feedbacks, and the broadly understood modeling of the interaction between macromolecules are the research subject of bioinformatics systems.
In its area, various metabolic and signaling pathways, interactions, connections, and interactions are modeled, mapped, and visualized.
Analysis of the cellular organization
Bioinformatics methods are also used in research on the cellular organization. Primarily they focus on the development of algorithms and tools related to the processing of results from various types of microscopic imaging techniques.
The results of these studies lead to such information like the location of individual RNA nad proteins molecules in specific celluar compartments and the identification of interactions within chromosomes.
The fields mentioned above are not a complete list of bioinformatics topics. It is still a very young and dynamic discipline, constantly expanding to new research areas. Bioinformatics not only uses typically computational methods, but also stimulate the development of new high-throughput experimental techniques that generate terabytes of data.
A separate area of bioinformatics work is the development of new algorithms and tools (software, databases, workflow systems) which facilitate and automate work with bioinformatics data.
Bioinformatics is currently an independent and rapidly developing scientific discipline used not only in basic research, but also in diagnostics, medicine (including personalized medicine), pharmacology, and pharmacotherapy.
It is a huge field for both scientists and researchers as well as algorithmics engineers and programmers. Everything indicates that in the future we can expect further biomedical breakthroughs that will be based on achievements in bioinformatics.