Integrative modeling of compositionally and conformationally heterogeneous protein assemblies.
To fully map the structure-function relationships of protein assemblies, it is essential to capture their compositional and conformational heterogeneity. To this end, we are developing a flexible and general scheme, implemented in the Integrative Modeling Platform (IMP), to represent, score, sample, and analyze models of multiple states and complexes. In particular, we are developing methods that use solution or in vivo data to model the underlying structural diversity rather than ensemble averages. Structural studies based on solution or in vivo data offer several advantages, including determining the structures of difficult-to-isolate protein assemblies in their native environments and describing their full range of structural dynamics. Our approach aims to increase the accuracy, precision, and completeness of the model and to provide an estimate of its uncertainty.
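To illustrate why modeling an ensemble average can mislead, the toy sketch below considers a hypothetical assembly that populates two states equally and is probed by a bulk measurement. It is not part of IMP and does not reflect our actual scoring functions. The choice of a FRET-like observable, the Förster radius `R0`, and the two distances are all assumptions made purely for illustration. Interpreting the averaged signal as a single structure yields an apparent distance that neither state adopts, whereas a two-state model reproduces the same signal with the true distances.

```python
import math

R0 = 50.0  # assumed Förster radius in angstroms (illustrative value only)


def fret_efficiency(distance, r0=R0):
    """Förster relation: transfer efficiency for one donor-acceptor distance."""
    return 1.0 / (1.0 + (distance / r0) ** 6)


def apparent_distance(efficiency, r0=R0):
    """Invert the Förster relation, i.e. read a signal as one single distance."""
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Hypothetical two-state system: (distance in angstroms, population weight).
states = [(30.0, 0.5), (70.0, 0.5)]

# A bulk measurement reports the population-weighted average of the signal.
observed = sum(weight * fret_efficiency(dist) for dist, weight in states)

# Single-state interpretation: one distance that explains the averaged signal.
single_state_distance = apparent_distance(observed)

# Multi-state interpretation: the weighted two-state model fits the same
# signal while keeping the actual distances of the individual states.
multi_state_signal = sum(weight * fret_efficiency(dist) for dist, weight in states)

print(f"observed average efficiency: {observed:.3f}")
print(f"single-state apparent distance: {single_state_distance:.1f} A")
print(f"multi-state model residual: {abs(multi_state_signal - observed):.3f}")
```

In this toy case the single-state distance falls between the two true distances, which is why additional data and an explicit multi-state representation are needed to recover the underlying diversity.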
Using genetic data to model protein structures.
Our research has demonstrated that genome-scale genetic mapping can be harnessed to model the structures of individual proteins and protein complexes. We have shown that genetic interactions measured using the point-mutant epistatic miniarray profile (pE-MAP) or chemical genetics miniarray profile (CG-MAP) approaches can be used to determine the structures of protein assemblies. Modeling of protein complexes based on genetic interaction data proved accurate, precise, and generalizable, suggesting that the approach can effectively extract structural information from sparse genetic interaction datasets. However, disentangling direct and indirect relationships between residue positions, and using these relationships to infer the roles of conformational heterogeneity and allostery, remains challenging. We are currently developing methods to model heterogeneous and/or transient molecular assemblies based on genetic interaction data.
Integrative modeling of host-pathogen protein complexes.
We have established pipelines for integrative structure determination of host-pathogen protein complexes. This effort combines proteomic, structural, biochemical, and genetic data using an integrative modeling approach. We have applied this approach to determine the structure and structural dynamics of HIV-1 TAR (RNA) and Tat (protein) binding to the human super elongation complex to promote proviral transcription, and the solution structures of the A3G-Vif(HIV-1)-CRL5-CBFβ, Nef(HIV-1)-CD4[CD]–AP2Δμ2-CTD, and the mycobacterial polyketide synthase Pks13 protein complexes. However, structural characterization of host-pathogen protein assemblies remains challenging, largely due to their compositional and conformational heterogeneity. For instance, a high proportion of pathogen proteins contain intrinsically disordered regions. We continue to develop tools for incorporating orthogonal data types to build multi-state models of host-pathogen assemblies.
Structure and dynamics of the nuclear pore complex (NPC).
The NPC mediates transport between the cytoplasm and nucleoplasm. In yeast, this large macromolecular assembly (~52 MDa) is composed of ~30 distinct nucleoporins (Nups; ~550 copies in total) arranged with C8 symmetry around the pore. Multiple NPC isoforms, with different structures and compositions, are present in cells. We have applied our integrative modeling approach to determine the structures of these different states, and we aim to understand how the NPC transitions between them. Major structural modules of the NPC are held together by flexible connectors that provide strength and resilience in the manner of a suspension bridge. Notably, these connectors extend through the NPC inner ring, interacting with all major nucleoporins. Our work is focused on obtaining high-resolution structures of the different NPC states and describing the ensemble of possible conformations of the connector Nups.