WP3 – Molecular behaviour of IDR ensembles

Objectives

IDPs are characterised by high flexibility and complex conformational ensembles. Far from being random polymers, IDR ensembles are evolutionarily constrained to fulfil their biological functions. Contacts between residues are a major pressure that modulates both the conformational diversity of the ensemble and the amino acid substitution pattern. The functional role of transient contacts and their distribution in IDR ensembles are still far from being properly modelled. The main objective of WP3 is to develop robust ensemble generation methods and molecular dynamics (MD) protocols that can accurately model multivalent and highly flexible biomolecular complexes. Our focus will be on natural linker IDRs, IDRs with post-translational modifications (PTMs), and specific datasets such as phosphorylated peptides, which can be considered IDRs since they are too short to form structured domains.
WP3 includes an evaluation of publicly available IDR software tools and models, including those developed in this proposal, through the organisation of new critical assessments, which will also provide a fair and unbiased way to evaluate the software developed in this proposal. New challenges will be organised within the Critical Assessment of Protein Intrinsic Disorder Prediction (CAID) as well as under other initiatives. The IDPfun Consortium will leverage its involvement in the CASP Special Interest Group on Ensembles of Alternative Conformations, the Critical Assessment of Functional Annotation (CAFA) and the ML4NGP COST Action, whose objectives align with this WP, to gather interest from the scientific community and increase the visibility of this initiative. The top-performing validated ensemble software tools will be run to generate predictions, which will be integrated into the PED database and then shared with core database resources. To accomplish this, we will leverage the network, technology and standards developed in WP1 to calculate ensemble structural properties and generate metadata.

The results of WP3 will be instrumental for investigating the protein systems studied within the IDPfun Consortium. By fulfilling these objectives, we aim to improve the understanding and modelling of multivalent biomolecular complexes, providing insights into the structural dynamics, aggregation potential and functional properties of proteins.

Task list

Task 3.1 – Manual curation of IDR ensembles

Details

The objective of Task 3.1 is to curate new data on amino acid conformations obtained from large structural repositories and molecular dynamics (MD) simulations. These data will include phosphorylated peptides, natural linker IDRs, and PTMs in proteomes. The curated data will be integrated into the databases maintained by the consortium, such as PED and DisProt, and will serve as training data for the protocols developed in this WP. They will also be shared with core resources as described in WP1.

Task 3.2 – Implementation of new IDR ensemble generation methods

Details

Task 3.2 focuses on the development of novel methods for generating ensembles coupled with high-resolution structures of globular domains. These methods will be used to model large, multivalent, and partly disordered biomolecular complexes, including recognition motifs and PTM residues. By combining these new ensemble generation methods with improved molecular dynamics (MD) protocols, we will sample the conformational space of IDRs. The validity of the resulting ensembles will be assessed by comparing them with existing experimental data from the BMRB, DisProt, and SASBDB databases. The IDPfun Consortium contributes directly to these databases, as described in Task 3.1 and WP1.
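As an illustration of what comparing an ensemble with experimental data can look like, the sketch below computes the radius of gyration (Rg) of each conformer and a root-mean-square ensemble value that could be set against an Rg derived from small-angle scattering. It is a minimal example under stated assumptions, not the protocol of this WP: the function names, the equal-mass approximation, the toy random-walk ensemble and the experimental value of 25.0 Å are all hypothetical.

```python
import numpy as np


def radius_of_gyration(coords: np.ndarray) -> np.ndarray:
    """Per-conformer Rg for coords of shape (n_conformers, n_atoms, 3).

    Assumption: all atoms (or beads) have equal mass.
    """
    centred = coords - coords.mean(axis=1, keepdims=True)
    return np.sqrt((centred ** 2).sum(axis=2).mean(axis=1))


def ensemble_rg(coords: np.ndarray) -> float:
    """Root-mean-square Rg over all conformers, sqrt(<Rg^2>)."""
    rg = radius_of_gyration(coords)
    return float(np.sqrt((rg ** 2).mean()))


def relative_deviation(model: float, experiment: float) -> float:
    """Signed relative difference between modelled and experimental values."""
    return (model - experiment) / experiment


def make_toy_ensemble(n_conformers: int, n_residues: int,
                      bond: float = 3.8, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for a real ensemble: random-walk C-alpha chains."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(size=(n_conformers, n_residues - 1, 3))
    steps *= bond / np.linalg.norm(steps, axis=2, keepdims=True)
    start = np.zeros((n_conformers, 1, 3))
    return np.concatenate([start, np.cumsum(steps, axis=1)], axis=1)


if __name__ == "__main__":
    ensemble = make_toy_ensemble(n_conformers=500, n_residues=60)
    rg_model = ensemble_rg(ensemble)
    rg_experiment = 25.0  # hypothetical experimental Rg in angstrom
    print(f"ensemble Rg: {rg_model:.1f} A")
    print(f"relative deviation: {relative_deviation(rg_model, rg_experiment):+.2%}")
```

In practice the coordinates would come from the generated ensemble, and further observables (for example NMR chemical shifts) would be compared in the same spirit.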

Task 3.3 – Evaluation of IDR prediction software

Details

Task 3.3 aims to evaluate existing software together with the software developed in this WP for ensemble generation and in WP2 for predicting IDR features from protein sequence data. The evaluation will be conducted using the infrastructure of the Critical Assessment of Protein Intrinsic Disorder Prediction (CAID) challenge, available at https://caid.idpcentral.org/challenge, as well as other initiatives such as CASP and ML4NGP. The CAID challenge will be adapted to assess IDR function and to provide continuous evaluation. The objective is to encourage the scientific community to develop software that is both more accurate and more sustainable. The evaluation will therefore consider the quality of predictions and the carbon footprint associated with computational resource usage. Evaluated software will be registered in bio.tools and integrated into the CAID Prediction Portal.
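To make the two evaluation axes concrete, the sketch below scores a per-residue binary disorder prediction with an F1 score and estimates the emissions of a run from its runtime, average power draw and grid carbon intensity. It is an illustrative sketch only: the choice of F1, the function names and all numeric inputs are assumptions, and it does not describe how CAID actually scores or measures submissions.

```python
from typing import Sequence


def f1_score(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """F1 for binary per-residue labels (1 = disordered, 0 = ordered)."""
    if len(reference) != len(predicted):
        raise ValueError("reference and prediction must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def carbon_footprint_g(runtime_s: float, power_w: float,
                       intensity_g_per_kwh: float) -> float:
    """Estimated emissions in grams of CO2-equivalent.

    energy (kWh) = power (W) * time (h) / 1000
    emissions (g) = energy (kWh) * carbon intensity (g/kWh)
    """
    energy_kwh = power_w * (runtime_s / 3600.0) / 1000.0
    return energy_kwh * intensity_g_per_kwh


if __name__ == "__main__":
    # Hypothetical reference annotation and prediction for one short protein.
    reference = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(f"F1: {f1_score(reference, predicted):.2f}")
    # Hypothetical run: 20 minutes at 250 W on a grid emitting 300 g CO2e/kWh.
    print(f"emissions: {carbon_footprint_g(1200, 250, 300):.1f} g CO2e")
```

Reporting both numbers side by side lets a slower but marginally more accurate method be weighed against a cheaper one.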

Deliverables

D3.1

New ensemble data integrated into IDR databases

[Confidential document]

D3.2

New ensemble predictions integrated into IDR databases

[Confidential document]
