Computational Biology

Computational Biology: Genomics and Proteomics

Towards the end of the first period, Dr. Marco Falda explored possible applications of Temporal Reasoning to Computational Genomics, namely to the reconstruction of gene regulatory networks from time series of expression profiles. The aim was to identify cause-effect relationships among genes whose expression is measured, for instance, using microarrays. Temporal Reasoning could be applied to symbolic representations of the series: the “qualitative” features of the time series were identified and encoded as sequences of symbols to reason about. This research has led to the development of specialized algorithms able to identify precedence relations between the symbols and to recognize the direction of regulation between the genes. In particular, these algorithms are based on the search for shared sub-strings and sub-sequences, and on a variation of Dynamic Time Warping.

The research in Computational Genomics has been carried out in collaboration with the Bioengineering group coordinated by Professor G. M. Toffolo and Professor C. Cobelli. For this research group, Dr. Falda also reengineered in C a gene regulation network simulator that had been developed in R (obtaining a 35-fold speed-up, according to the latest tests). He also designed and implemented an R package that provides a graphical interface to this simulator, which greatly increased its usability by easing the setup of network creation, simulated knock-downs, external stimuli and local or global noise.

A second result in the transition phase was the design of a Fuzzy Mutual Information measure that exploits the definition of conditional probability in terms of fuzzy membership functions. Introduced into a traditional Reverse Engineering algorithm, the REVEAL algorithm, this new measure increased the algorithm's performance and achieved results comparable with the state of the art.
For this work, Marco Falda received an award for the best paper and the best presentation at the International Conference on Fuzzy Calculus held in October 2009.
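
The symbolic treatment of expression time series mentioned above can be illustrated with a minimal sketch. It is not the algorithm developed in this research: the three-symbol alphabet (U, D, S), the tolerance `tol` and all function names are assumptions chosen for illustration, and a textbook longest-common-substring search stands in for the specialized sub-string, sub-sequence and Dynamic Time Warping methods. The idea shown is only that, once two profiles are encoded as strings, a shared pattern that appears later in one profile than in the other hints at a precedence relation.

```python
from typing import List, Tuple


def encode_trend(series: List[float], tol: float = 0.1) -> str:
    """Encode each consecutive difference as U (up), D (down) or S (steady).

    The alphabet and the tolerance are illustrative assumptions.
    """
    symbols = []
    for prev, curr in zip(series, series[1:]):
        delta = curr - prev
        if delta > tol:
            symbols.append("U")
        elif delta < -tol:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def longest_shared_substring(a: str, b: str) -> Tuple[str, int, int]:
    """Return the longest common substring and its start index in a and in b."""
    best_len, best_a, best_b = 0, 0, 0
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_a = i - best_len
                    best_b = j - best_len
        prev = curr
    return a[best_a:best_a + best_len], best_a, best_b


def precedence(series_a: List[float], series_b: List[float],
               tol: float = 0.1) -> Tuple[str, int]:
    """Return the shared qualitative pattern and its lag (in samples).

    A positive lag means the pattern occurs later in series_b than in
    series_a, which is read here as "a precedes b".
    """
    sym_a = encode_trend(series_a, tol)
    sym_b = encode_trend(series_b, tol)
    pattern, start_a, start_b = longest_shared_substring(sym_a, sym_b)
    return pattern, start_b - start_a


if __name__ == "__main__":
    # Hypothetical expression profiles: gene_b repeats gene_a's pulse later.
    gene_a = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
    gene_b = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
    print(precedence(gene_a, gene_b))
```

A lag on its own does not establish regulation; in this sketch it only orders the two profiles in time, and any causal reading would need the additional reasoning described in the text.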

In his present research, Dr. Falda is studying Mass Spectrometry-based Proteomics and functional annotation tools. As far as Proteomics is concerned, he has developed an automated and distributed workflow, built with SCons (Python) and OpenMS, to perform and support computational analyses for feature-based quantification; he is also exploring 3D visualization prototypes (Python and C++) that exploit OpenGL to ease the overall quality control of a Proteomics analysis. The ultimate objective of this research stream is to develop an integrated solution for the analysis of Proteomics data, from the processing of MS and MS/MS spectra, managed in a cloud environment, to the final immersive representation of the features used in quantification and/or a user-friendly interactive web interface.

A more mature topic is the functional annotation of unknown sequences. In this case an already established tool, Argot, written in Java, has been slightly modified and included in a distributed computing environment based on Sun Grid Engine and supported by Perl scripts; a web interface (implemented in PHP and jQuery) has also been provided. The intrinsic enhancements, together with the integration of multiple search engines, led to a second major version of the tool, Argot², which was ranked the second best performing method among more than 50 participants at the “Critical Assessment of Functional Annotation” 2010-2011 (CAFA) international contest. A first optimized version of the Perl script layer, recoded in C++, has already been developed and is now undergoing regression testing.