Relationship between Insertion/Deletion (Indel) Frequency of Proteins and Essentiality

Background: In a previous study, we demonstrated that some essential proteins from pathogenic organisms contained sizable insertions/deletions (indels) when aligned to human proteins of high sequence similarity. Such indels may provide sufficient spatial differences between the pathogenic protein and human proteins to allow for selective targeting. In one example, an indel difference was targeted via large-scale in-silico screening. This resulted in selective antibodies and small compounds which were capable of binding to the deletion-bearing essential pathogen protein without any cross-reactivity to the highly similar human protein. The objective of the current study was to investigate whether indels were found more frequently in essential than in non-essential proteins.

Results: We have investigated three species, Bacillus subtilis, Escherichia coli, and Saccharomyces cerevisiae, for which high-quality protein essentiality data is available. Using these data, we demonstrated with t-test calculations that the mean indel frequencies in essential proteins were greater than those in non-essential proteins in the three proteomes. The abundance of indels in both types of proteins was also shown to be accurately modeled by the Weibull distribution. However, Receiver Operator Characteristic (ROC) curves showed that indel frequencies alone could not be used as a marker to accurately discriminate between essential and non-essential proteins in the three proteomes. Finally, we analyzed the protein interaction data available for S. cerevisiae and observed that indel-bearing proteins were involved in more interactions and had greater betweenness values within Protein Interaction Networks (PINs).

Conclusion: Overall, our findings demonstrated that indels were not randomly distributed across the studied proteomes and were likely to occur more often in essential proteins and those that were highly connected, indicating a possible role of sequence insertions and deletions in the regulation and modification of protein-protein interactions. Such observations will provide new insights into indel-based drug design using bioinformatics and cheminformatics tools.
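
The Results contrast two questions: whether mean indel frequencies differ between the two groups (t-test), and whether indel frequency can separate individual proteins (ROC). The following is a minimal sketch of that distinction and not the authors' pipeline. The indel counts are hypothetical values chosen for illustration, the function names `welch_t` and `roc_auc` are our own, and Welch's unequal-variance form of the t statistic is an assumption, since the abstract does not say which variant was used.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def roc_auc(pos, neg):
    """Area under the ROC curve, computed as the probability that a random
    positive scores above a random negative (ties count as 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical indel counts per protein (illustrative only, not study data).
essential = [3, 5, 2, 6, 4, 1, 5, 3]
non_essential = [2, 4, 1, 5, 3, 0, 4, 2]

t_stat = welch_t(essential, non_essential)
auc = roc_auc(essential, non_essential)
print(f"t = {t_stat:.3f}, AUC = {auc:.3f}")
```

In this toy data the essential group has the higher mean, yet the two distributions overlap heavily, so the AUC stays well below 1. A shift in group means therefore does not by itself make a feature a usable classifier for individual proteins.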
