Statistical Mechanics for Big Data: acquisition, analysis and modeling
Dates: from Jan. 1, 2014 to June 30, 2017
Funder: MINECO (Spain)
Project id: FIS2013-47532-C3
Total Funding: 90000€
Node Funding: 36000€
Technological advances during the last fifteen years have boosted our capacity to generate and store data. Indeed, according to some estimates 90% of the world’s stored data has been generated in the last two years, and the availability of such large quantities of data is changing the way we face crisis response, social mobilization, marketing, and intelligence. In science, the hopes created by big data are also high, and analyses of large datasets are behind recent breakthroughs in areas such as astrophysics, genomics or particle physics.
Big data is a particularly relevant opportunity for the study of complex systems such as cells, societies, ecosystems or economies. Research on such systems has traditionally been constrained by the limited information available on the different components (and layers of components) that comprise them. The availability of unprecedented, highly detailed data on these systems opens the door to significantly advancing our understanding of their behavior, and of how this behavior evolves in time.
Our project is based on the premise that statistical mechanics tools and methods, when applied to large-scale data acquisition, analytics, and modeling, have the potential to successfully address the problem of transforming data into knowledge. The overarching goal of this project is, precisely, to develop and apply a comprehensive set of statistical mechanics methods for large-scale data analysis. To achieve this goal, we propose three main objectives: (M) to develop statistical mechanics tools for the analysis of large-scale data; (D) to develop crowd-sourced data acquisition and processing protocols; and (A) to analyze and model, using the methodologies and/or data from objectives M and D, complex systems in three different areas: biochemical systems, techno-social systems, and economic systems.
The methods we propose to develop are aimed at network and non-network data that are, in general, heterogeneous, multidimensional and multilevel, and time-resolved. The development of such methods will allow the construction of predictive models from large quantities of heterogeneous data, and provide guidance and innovative recommendations to practitioners and stakeholders (from academia, industry, and government). Our project relies not only on the interdisciplinary nature of the methodologies of the participants (statistical mechanics, computer science, mathematics, statistics) but also on direct contact with experts in the fields of economics, finance, biology and chemistry. In this sense, it is important to stress the quality of the team members, their experience in different fields, and the collaborations they maintain with internationally renowned experts, companies and institutions in many different areas.
Finally, our project deals with problems of large impact in technology and society. Given its comprehensive nature and the collaboration with companies, we expect our project to produce a large number of results with direct impact on our society and economy, not only in the big data business but also in areas such as urban planning, marketing, financial markets and biology.
Publications
- CliqueMS: a computational tool for annotating in-source metabolite ions from LC-MS untargeted metabolomics data based on a coelution similarity network - Bioinformatics 35 (20), 4089-4097 (2019).
- Leader evaluation and team cohesiveness in the process of team development: A matter of gender? - PLoS ONE 12 (10), e0186045 (2017).
- Bone fusion in normal and pathological development is constrained by the network architecture of the human skull - Sci. Rep. 7, art. no. 3376 (2017).
- iMet: A network-based computational tool to assist in the annotation of metabolites from tandem mass spectra - Anal. Chem. 89 (6), 3474-3482 (2017).
- Accurate and scalable social recommendation using mixed-membership stochastic block models - Proc. Natl. Acad. Sci. USA 113 (50), 14207-14212 (2016).
- Differences in Collaboration Patterns across Discipline, Career Stage, and Gender - PLoS Biol. 14 (11), e1002573 (2016).
- Inferring propagation paths for sparsely observed perturbations on complex networks - Sci. Adv. 2, e1501638 (2016).
- Multilayer stochastic block models reveal the multilayer structure of complex networks - Phys. Rev. X 6, 011036 (2016).
- Long-term evolution of email networks: statistical regularities, predictability and stability of social behaviors - PLoS ONE 11 (1), e0146113 (2016).
- A comprehensive study on different modelling approaches to predict platelet deposition rates in a perfusion chamber - Sci. Rep. 5, 13606 (2015).
- Mapping high-growth phenotypes in the flux space of microbial metabolism - J. R. Soc. Interface 12, 20150543 (2015).
- Scaling and optimal synergy: Two principles determining microbial growth in complex media - Phys. Rev. E 91, 062703 (2015).
- Control of cell–cell forces and collective cell dynamics by the intercellular adhesome - Nat. Cell Biol. 17, 409-420 (2015).