Evolutionary Informatics And Its Challenges
Source: Gunnar De Winter
In recent years, huge amounts of genetic and genomic data have been gathered, and a complex computational infrastructure has been developed to handle them. Yet evolutionary biologists are also interested in kinds of data that are not yet as thoroughly integrated with the digital world as genomic data (for example, character evolution, evolution of development, …).
Evolutionary informatics aims to capture, store and integrate all these kinds of data. The interdisciplinary efforts this requires face several interlinked challenges, which are identified and reviewed in an article in Trends in Ecology and Evolution.
As the ‘grand goal’ for evolutionary informatics, the authors of the review article mention:
        Link together evolutionary data across the great Tree of Life by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses.
Next, they go on to identify five main challenges for this field of science.
        Challenge 1: enhance integration and discovery across the Tree of Life and its subtrees by depositing machine-readable phylogenies into repositories, linking together clades, taxonomic names and unique identifiers, and developing new ways to handle the growing number of lineages that lack formal taxonomic names.
The next steps to be taken in facing this challenge, according to the researchers, are the deposition of machine-readable trees, which include details about the underlying data, into repositories, and the development of automated approaches to place trees into larger reference trees.
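To make "machine-readable tree" concrete, here is a minimal sketch in Python that reads a phylogeny written in the widely used Newick text format. The review does not prescribe this format or this code; the example tree, its branch lengths, and the function names `parse_newick` and `leaf_names` are assumptions made purely for illustration, and the parser handles only plain labels and branch lengths.

```python
# Hypothetical example tree: taxa and branch lengths are illustrative only.
NEWICK = "((Homo_sapiens:0.1,Pan_troglodytes:0.1):0.2,Mus_musculus:0.5);"


def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Handles node labels and branch lengths only; quoted labels and
    bracketed comments are not supported in this sketch.
    """
    pos = 0

    def parse_node():
        nonlocal pos
        node = {"name": "", "length": None, "children": []}
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                node["children"].append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                if text[pos] == ")":
                    pos += 1  # end of this clade
                    break
                raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        # Optional label (leaf name or internal node name).
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        node["name"] = text[start:pos]
        # Optional branch length after ":".
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            node["length"] = float(text[start:pos])
        return node

    root = parse_node()
    if pos >= len(text) or text[pos] != ";":
        raise ValueError("Newick string must end with ';'")
    return root


def leaf_names(node):
    """Collect the names of all tips below a node, left to right."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


tree = parse_newick(NEWICK)
print(leaf_names(tree))
```

Once a tree is in a structure like this, its tip names can be matched against taxonomic names or identifiers, which is the kind of linking the first challenge calls for.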
        Challenge 2: leverage our existing knowledge about the biodiversity of the world by digitizing, semantically enhancing and mobilizing legacy biodiversity sources.
Here, the steps to be taken are the development of text mining algorithms that will allow a thorough mass digitization of biodiversity literature, and large-scale approaches to mass-digitize various collections of specimens.
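As a rough idea of what text mining of digitized literature can involve, the following Python sketch pulls candidate Latin binomials out of a sentence with a regular expression. This is a deliberately naive heuristic and not the method proposed in the review; the pattern, the stop list `NON_GENUS`, the function name `candidate_names`, and the sample sentence are all assumptions for illustration. Real name-finding tools typically also rely on dictionaries of known names and more careful language handling.

```python
import re

# Naive pattern: a capitalized word (possible genus) followed by a
# lowercase word (possible species epithet), each at least 3 letters long.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")

# Hypothetical stop list of capitalized words that are not genus names.
NON_GENUS = {"The", "This", "These", "Here", "Specimens"}


def candidate_names(text):
    """Return candidate 'Genus epithet' strings found in the text."""
    found = []
    for genus, epithet in BINOMIAL.findall(text):
        if genus not in NON_GENUS:
            found.append(f"{genus} {epithet}")
    return found


# Made-up sentence standing in for a scanned legacy source.
sample = (
    "These specimens of Drosophila melanogaster and Apis mellifera "
    "were collected in 1902."
)
print(candidate_names(sample))
```

The stop list shows why such heuristics need curation: without it, "These specimens" would be reported as a species name.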
        Challenge 3: gather and deliver digital data relevant to all of evolutionary biology by building sustainable digital community repositories that provide access to rich data and metadata.
To deal with this challenge, the authors propose developing data standards that ensure common formats, and attaching professional rewards to data deposition and curation.
        Challenge 4: building a semantic web for evolutionary biology by providing the means to discover, mine and integrate data sources across domains.
Here also, the development and use of appropriate data formats is an important next step, as is the assignment of unique identifiers that allow handling increasingly large datasets.
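The role of unique identifiers can be sketched with a toy set of subject-predicate-object statements, the basic shape of semantic web data. Everything in this Python example is hypothetical: the identifiers (`taxon:0001`, `tree:42`, and so on), the predicate names, and the helper functions are placeholders invented for illustration, not part of any real vocabulary or of the review itself.

```python
# All identifiers and predicates below are hypothetical placeholders.
triples = [
    ("taxon:0001", "hasName", "Drosophila melanogaster"),
    ("tree:42", "includesTaxon", "taxon:0001"),
    ("specimen:A17", "identifiedAs", "taxon:0001"),
    ("specimen:A17", "heldBy", "collection:X"),
    ("taxon:0002", "hasName", "Apis mellifera"),
]


def match(pattern, store):
    """Return triples matching a (subject, predicate, object) pattern.

    None in any position acts as a wildcard.
    """
    return [
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


def resources_about(taxon_id, store):
    """List every resource that points at the given taxon identifier."""
    return sorted(s for s, _, o in store if o == taxon_id)


print(resources_about("taxon:0001", triples))
print(match(("specimen:A17", None, None), triples))
```

Because the tree and the specimen both refer to the same taxon identifier rather than to a free-text name, records from different sources can be joined without guessing whether two spellings mean the same organism.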
        Challenge 5: building a community of cooperation within evolutionary biology and in society by sharing best practices, using standards, taking new modes of professional credit seriously, and engaging citizens in its science.
To address this issue, it is proposed that biologists, journals and funding agencies should encourage sharing data by using some sort of reward system. Also, collaborative, open workspaces should be developed, and cooperation with a broader community, through, for example, social media and citizen science projects, should not be shunned.
For an illustration of the five challenges and how they intertwine, see figure 1.
   
Figure 1: The five challenges for evolutionary informatics, and their relationships to each other.
(Source: Parr et al., 2011)
   
Overall, it is concluded that:
        Evolutionary informatics brings together evolutionary biologists, computer scientists, library and information scientists, and their hybrids. Statistical approaches, algorithms and software development have been part of the evolutionary biology landscape since the advent of personal computers and have sparked massive growth in evolutionary knowledge products. Most of these products, unfortunately, are difficult to discover and re-use, because although the data are often born and analyzed digitally, they are published in the least amenable format for further integration (e.g. PDF-based journal articles). Informatics approaches can overcome this impediment by developing the social and technological processes to better describe and document data and link resources together.
   
Reference
Parr, C.S.; Guralnick, R.; Cellinese, N. and Page, R.D.M. (2011). Evolutionary informatics: unifying knowledge about the diversity of life. Trends in Ecology and Evolution.