Preprint/Evolution of protein domain repeats

PLOS Topic Pages
PLOS Computational Biology • PLOS Genetics • PLOS ONE

This is a PLOS Topic Page draft

All content on this page is being developed under a CC BY 4.0 license


Authors

Sara Light
AFFILIATION: Dept. Biochemistry and Biophysics, Bioinformatics Infrastructure for Life Sciences, Science for Life Laboratory, Stockholm University, Stockholm, P-Box 1031, 17121 Solna, Sweden

Arne Elofsson
AFFILIATION: Dept. Biochemistry and Biophysics, Swedish E-Science Research Center, Science for Life Laboratory, Stockholm University, Stockholm, P-Box 1031, 17121 Solna, Sweden


Protein domain repeats are evolutionarily related units that occur in tandem within a protein. Many proteins are composed of multiple protein domains, functional units of common origin. One variety of domain combination that is particularly frequent among Metazoa is the tandem domain repeat: a stretch of domains from the same family, situated next to each other in a protein. Certain properties characterize these domains. First, they are often quite short, frequently fewer than 50 residues, and, second, they tend to be highly variable, with only a few residues that are crucial for the functionality of the domain. Structurally, repeat domains are diverse and may form modular structures on their own or form larger filaments where each repeat depends on other repeats for its functionality. Their sequences are malleable, both with regard to the repeating unit and in the number of repeats, and they therefore provide flexible binding to many partners.

File:Merged fig1.png
Repeat protein examples. a) The left-handed superhelix of 12 ankyrin repeats from human ankyrinR[1]. b) The horseshoe-like structure of the porcine ribonuclease inhibitor, featuring leucine-rich repeats[2].

Many repeat proteins expand through duplications of neighbouring domains, but for some repeat domains low similarity between adjacent domains has been observed. Proposed causes for this pattern include aggregation prevention, as adjacent highly similar domains may aggregate, and constraints imposed on the repeat protein by its binding partners. It is possible, however, that it is simply a propagating pattern initially derived through random drift. Further, while multi-domain proteins typically evolve by single domain insertions/deletions at the N- or C-termini, repeat domains tend to expand through internal tandem duplications, where several units at a time are duplicated. Additionally, repeat proteins often expand to more than twenty repeating units within one protein, possibly through homologous recombination.

History


Protein domains are structural, functional and evolutionary building blocks that, within one protein, can form various architectures composed of one or several domains[3],[4]. One subset of such domain architectures is domain repeats, i.e. strings of the same class of domains repeated after one another (tandem repeats), such as ankyrin repeats (see figure 1). These domains are often short, such as the ankyrin repeat of ~33 residues, and their sequences are highly varied, with typically only a motif retained. The latter property confounds the automated assignment of repeat domains, since their signatures are often only a small part of the sequence, as in the case of the common HEAT repeat, which has on average around 13% sequence identity[5]. As more sequences have become available, and since protein domain repeats make up a substantial fraction of them, several methods to identify repeats have been devised. One commonly used approach is to run HMMER[6] with relaxed E-value thresholds for repeating regions[7]. However, de novo methods also exist, such as HHrepID[8], a method based on HMM-HMM comparisons.

File:Rps all hsa.png
A schematic illustration of the five most important nebulin-containing proteins in human: NRAP, Lasp, Nebulette and Nebulin. Each square represents one nebulin domain; squares are red where there is one nebulin domain per exon, green where there are two and blue where there are three. Other domains are shown as grey or yellow. The lines indicate reciprocal best hit alignments.

The study of repeat proteins started with Margaret Dayhoff's observations of internal gene duplications in 1978[9] and the structurally confirmed observation of tandem domains in acid proteases[10]. Many of these duplications take place within repeat proteins, generating mutations that are often associated with disease. Perhaps the best-known case is huntingtin, a protein that contains many HEAT domain repeats[11] preceded by a trinucleotide repeat, the expansion of which causes Huntington's disease[12]. However, while the disease-causing trinucleotide repeats of huntingtin are short, many proteins contain repeats encompassing entire protein domains (protein domain repeats). Such domain repeats were found to be important as structural components of virus proteins, such as the shaft repeat of the adenovirus fiber protein[13], and an accumulation of ankyrin repeats in poxviruses has been observed[14]. Indeed, structural roles are common for repeat proteins[15], and some of the best-studied examples play important roles as cytoskeleton crosslinkers, e.g. spectrin[16].

The tandem repeats of several proteins have been linked to complex diseases, such as cancer[17], one example being the polyglycine repeats of the androgen receptor[18]. Further, leucine-rich repeats (LRRs) may play a role in Parkinson's disease[19][20]. Clearly, many repeat domains are of medical importance, as exemplified by the immunoglobulin domain, the main structural component of antibodies. In recent years, it has become evident that other protein domain repeats may also be used as protein scaffolds capable of specific protein binding[21][22]. Indeed, the LRR is the main component of the adaptive immune system of jawless vertebrates[23] and enables plants to adjust to new pathogens[24]. Using alternative repeat domains for specific protein binding may allow optimization of the biophysical properties of protein scaffolds.

The many cellular roles of repeat proteins


There are several different classes of protein domain repeats[25], ranging from those that fold independently to those that aggregate[26]. Other repeats, forming elongated structures, can only fold in the context of similar repeats, sometimes forming superhelical structures that are stabilized by adjacent repeat domains[27]. A further distinctive characteristic of repeat domains is that, unlike most other domains, they tend to have little direct interaction between sequentially distant residues[28].

File:Summary.svg
Summary of repeat expansions. a) A schematic representation of domain insertion in multi-domain proteins. Domains are typically added in singletons at the termini. b) For repeat proteins, two or more domains are typically added in the center of the protein.
File:Selected domfams fix.png
Domain similarity vs distance. The autocorrelations from BLAST bit scores for all proteins with at least ten repeat domains, as previously described[7]. The upper plot shows Nebulin and the lower shows all collected repeat proteins.

Repeat proteins have diverse functions, but are particularly common in cell cycle regulation, transcriptional regulation, protein transport, protein folding and structural roles[29]. Although fairly rare in prokaryotes, repeat proteins in these organisms tend to be located on the cell surface, often playing an important role in pathogenesis[30][31]. In eukaryotes, many repeat proteins help shape cellular structure, as in the cases of filamin, spectrin and titin. Other domain repeats are important in complex formation through flexible binding surfaces. Ankyrin repeats, for example, mediate many different protein-protein interactions and form a scaffold that evolves continuously toward the binding of new ligands[32]. Indeed, repeat proteins are common among the hubs in protein-protein interaction networks[33][7][34].

Repeat proteins are, on average, longer than other proteins and include some of the proteins that shape the cytoskeleton. One example is titin, one of the largest known proteins, with hundreds of immunoglobulin repeats in tandem[35]. Many of the repeat proteins that are important for the function of the cytoskeleton have cellular roles that were most likely cemented at the dawn of the vertebrate lineage. One such protein is filamin, a mechanotransducer important for signalling[36] that exists as three closely related paralogs in all sequenced vertebrates[37].

| Domain/Clan | NumProt | Function | Length | SCOP fold | MaxRepeat | Clan/Family |
|---|---|---|---|---|---|---|
| TPR | 4,895 | Protein-protein interaction | 36 | α-α | 119 | CL0020 |
| β-propeller | 3,521 | Diverse | 40 | Four stranded β-sheet | 32 | CL0186 |
| Helix-turn-helix | 2,376 | DNA-binding | 57 | DNA/RNA-binding 3 helical bundle | 12 | CL0123 |
| C2H2 zinc finger | 1,652 | DNA-binding | 23 | β-β-α zinc-finger | 32 | CL0361 |
| PAS domain clan | 1,451 | Signalling | 102 | Profilin-like | 16 | CL0183 |
| Ankyrin repeat | 1,298 | Protein-protein interaction | 33 | α-α | 52 | PF00023 |
| Ig-like fold | 1,190 | Diverse | 83 | Immunoglobulin-like β-sandwich | 64 | CL0159 |
| OB fold | 938 | Oligonucleotide/oligosaccharide binding | 71 | OB-fold (Barrel) | 13 | CL0021 |
| DNA gyrase C-terminal | 928 | DNA-binding | 50 | β-propeller | 4 | PF03989 |
| DNA clamp | 894 | DNA-binding | 120 | DNA clamp | 4 | CL0060 |
| S-layer homology domain | 605 | Cell surface | 44 | - | 43 | PF00395 |
| Mitochondrial carrier | 601 | Substrate carriers | 95 | Mitochondrial carrier | 4 | PF00153 |
| POTRA domain | 584 | Hypothetical chaperone-like | 78 | - | 8 | CL0191 |
| Zinc β-ribbon | 534 | Diverse/Unidentified | 38 | Rubredoxin-like | 6 | CL0167 |
| Pentapeptide repeat | 501 | Unidentified | 38 | - | 9 | PF00805 |
| 4Fe-4S ferredoxin | 501 | Enzymatic | 22 | Ferredoxin-like | 15 | CL0344 |
| RHS-repeat | 477 | Ligand-binding | 38 | - | 38 | PF05593 |
| Immunoglobulin superfamily | 456 | Diverse | 90 | Immunoglobulin-like β-sandwich | 69 | CL0011 |
| EF-hand like superfamily | 434 | Calcium-binding | 30 | EF-hand like | 10 | CL0220 |
| PASTA domain | 360 | Beta-lactam binding | 62 | Penicillin-binding protein | 5 | PF03793 |
The twenty most common domains and their properties. The number of repeat proteins (NumProt) is calculated as previously described[7], using a fairly strict e-value of 0.001 and a secondary e-value of 0.1, from a dataset with 1,046 prokaryotes and 16 eukaryotes (manuscript in preparation). MaxRepeat is the maximum number of repeats. Most of the listed domains are in fact clans and, therefore, the typical fold and function listed may reflect only a portion of the entire clan.
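
The two e-value thresholds mentioned in the table caption can be illustrated with a minimal Python sketch. The exact counting rule is an assumption made here for illustration: a domain family is accepted in a protein only if at least one hit passes the strict threshold, after which every hit of that family passing the secondary threshold is counted. The function name `count_repeats`, the tuple layout of `hits` and the example data are hypothetical and are not taken from the cited method[7].

```python
from collections import defaultdict

STRICT_E = 0.001   # strict e-value from the table caption
RELAXED_E = 0.1    # secondary e-value from the table caption

def count_repeats(hits, strict=STRICT_E, relaxed=RELAXED_E):
    """Count repeat domains per (protein, family).

    hits: iterable of (protein_id, family, e_value) tuples (hypothetical layout).
    Assumed rule: a family is accepted in a protein only if its best hit
    passes `strict`; all hits passing `relaxed` are then counted.
    """
    evalues_by_key = defaultdict(list)
    for protein, family, evalue in hits:
        evalues_by_key[(protein, family)].append(evalue)

    counts = {}
    for key, evalues in evalues_by_key.items():
        if min(evalues) <= strict:
            counts[key] = sum(1 for e in evalues if e <= relaxed)
    return counts

# Made-up example hits, for illustration only
hits = [
    ("protA", "Ank", 1e-8),   # anchoring hit, passes the strict threshold
    ("protA", "Ank", 0.02),   # weaker copy, passes the secondary threshold
    ("protA", "Ank", 0.5),    # too weak, ignored
    ("protB", "Ank", 0.05),   # no anchoring hit in protB, family rejected
    ("protB", "TPR", 1e-4),
]
repeat_counts = count_repeats(hits)
print(repeat_counts)
```

The point of the second, looser threshold is that diverged copies of a repeat often score poorly on their own, yet become credible once a confident copy of the same family has been found in the same protein.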

Evolution


Proteins evolve both through mutations involving one or a few residues and by domain rearrangements. The latter are comparatively well tolerated since, in many cases, protein domains perform modular functions. Repeat proteins have high variability with regard to the number of repeats in the protein. They differ from other proteins in that they tend to expand through internal duplications rather than domain shuffling[38]. A likely scenario is that repeat proteins expand rapidly until a physical/structural limit has been reached and subsequently diverge rapidly, since repeat domains tend to have only weak sequence similarity[38]. One possible explanation for their propensity to expand is that their structures allow expansion and, additionally, may provide novel ligand binding[38].


File:Flow rep.png
Repeat protein flowchart. In the top panel a schematic repeat protein is shown, followed by the resulting protein after a tandem duplication. Subsequently, pairwise alignments between all domains are generated and the internal similarity matrix (bottom panel) reflects the tandem duplication by parallel lines at the distance of the size of the duplication unit, which in this case is three. The darker the cell, the higher the sequence similarity between the domains.
File:Merged matrix2.png
The internal similarity matrix for human nebulin. The darker the cell, the higher the sequence similarity between the domains. The N-terminus is in the top left corner while the C-terminus is in the bottom right. The bottom figure shows a similarity matrix for the nebulin super-repeats. Each number signifies the number of the super-repeat. The shade in each cell indicates the similarity of the domains.
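
The relationship between a tandem duplication and the parallel lines in the internal similarity matrix can be reproduced with a toy Python sketch. It rests on loud simplifications: domains are represented as made-up, equal-length strings, and similarity is plain fractional identity rather than a pairwise alignment score. The helper names `tandem_duplicate`, `identity` and `similarity_matrix` are hypothetical.

```python
def tandem_duplicate(domains, start, size):
    """Return a copy of `domains` with the cassette
    domains[start:start + size] duplicated in tandem."""
    cassette = domains[start:start + size]
    return domains[:start + size] + cassette + domains[start + size:]

def identity(a, b):
    """Fraction of identical positions between two equal-length strings
    (a stand-in for a real pairwise alignment score)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def similarity_matrix(domains):
    """All-against-all similarity between the domains of one protein."""
    return [[identity(a, b) for b in domains] for a in domains]

# Toy protein with five unrelated domains (made-up sequences)
protein = ["AAAAAA", "CCCCCC", "DDDDDD", "EEEEEE", "FFFFFF"]

# Duplicate a cassette of three domains from the middle of the protein
expanded = tandem_duplicate(protein, start=1, size=3)
matrix = similarity_matrix(expanded)

# Text rendering: '#' marks identical domain pairs, '.' all others
for row in matrix:
    print("".join("#" if value == 1.0 else "." for value in row))
```

Besides the main diagonal, the rendering shows a short line of `#` cells three positions off the diagonal, which corresponds to the duplication unit of three described in the flowchart caption.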

Higher eukaryotes in particular are prone to rapid repeat expansion[39][40], as is immediately obvious from the abundance of repeats in eukaryotic genomes compared to prokaryotic ones[26]. The three muscle proteins titin, twitchin and nebulin provide a few extreme examples of repeat expansion[41][42][43]. For nebulin, the vertebrate lineage has seen rapid expansion of four proteins (figure 2), the largest of which is composed of more than two hundred domains.

The immunoglobulin and filamin repeats, which share the β-sandwich fold, exhibit a pattern where roughly every other domain shows high sequence similarity[44][37]. This pattern is probably the result of a divergence of adjacent domains and subsequent duplications of the resulting pairs. Although such patterns may evolve by chance as the duplication of the diverged domain pair propagates, functional explanations for these dissimilar adjacent domains have been suggested. For instance, a lack of similarity between adjacent domains may serve to prevent aggregation, as suggested by Wright and coworkers[44]. Alternatively, functional constraints set by other proteins in the interaction where the repeat domain is involved may cause this pattern, as in the case of filamin[37].

Although different domains evolve through internal duplications of different sized cassettes, the most common cassette size is two and the most common location of duplication is in the middle[7] (see figure 3). Intriguingly, the abundant EGF and immunoglobulin domains have enriched exon junctions in domain boundaries[45], suggesting a role for exon shuffling in repeat expansion for these domains. Other domains, such as the eukaryote-specific C2H2 zinc finger domain, show a very different pattern where the numerous repeating domains are contained within one giant exon[7].

Clearly, internal repeat duplications most frequently involve more than one domain (figure 4). In some cases, repeat proteins contain another level of repeats, namely super-repeats: units that, in an already repetitive sequence, show internal similarity in larger blocks. Expansions such as these are easily detected through inspection of domain similarity matrices, where parallel lines of very similar domains, all seven domains apart (see figure 5 and figure 4), are clearly visible. As figure 6 shows, in nebulin the duplications consistently take place in cassettes of seven units. A similar super-repeat is found in the skeletal muscle isoform of titin[40].
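
One way to read a cassette size off a similarity matrix, rather than by eye, is to average the similarity over all domain pairs that are k positions apart and look for the offset with the highest mean. The Python sketch below illustrates the idea on a synthetic matrix with a built-in period of seven, chosen to mimic the nebulin super-repeat described above. It is not real nebulin data, and the function names `offset_profile` and `dominant_period` are hypothetical.

```python
def offset_profile(matrix):
    """Mean similarity between domains k positions apart, for k = 1 .. n-1."""
    n = len(matrix)
    return {
        k: sum(matrix[i][i + k] for i in range(n - k)) / (n - k)
        for k in range(1, n)
    }

def dominant_period(matrix):
    """Smallest offset with the highest mean similarity
    (max() returns the first maximal key, and keys are in ascending order)."""
    profile = offset_profile(matrix)
    return max(profile, key=profile.get)

# Synthetic 28-domain protein: domains seven apart are highly similar (1.0),
# all other pairs are weakly similar (0.25). Values are made up.
n_domains, period = 28, 7
matrix = [
    [1.0 if (i - j) % period == 0 else 0.25 for j in range(n_domains)]
    for i in range(n_domains)
]

profile = offset_profile(matrix)
print(dominant_period(matrix))
```

Applied to a matrix with the alternating pattern described for immunoglobulin and filamin repeats, the same profile would be expected to peak at an offset of two. On real data the peaks are noisy, and long-range offsets are averaged over few domain pairs, so a practical analysis would likely restrict k to a sensible range.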

Although it is not the only mechanism behind protein domain repeat expansions, homologous recombination is likely to be important, since any region of homology between two sequences may cause homologous recombination and subsequent tandem duplication[46]. Indeed, larger repeating regions tend to be duplicated[26]. Further, duplications in more malleable regions, such as those that contain repeat-forming domains, are less likely to have a detrimental effect on protein function. In fact, an increase in the number of repeated domains might not alter the protein structure to any great extent, and can promote protein stability[32][47].

Repeat proteins and complexity


Higher complexity is associated with larger genomes and an abundance of sequence repeats[48]. Indeed, protein domain repeats are more common in complex organisms[15][26]. For coding repeats there are likely to be additional constraints affecting their abundance aside from constraints on genome size. In particular, protein domain repeats are generally uncommon in prokaryotes[7]. This may be explained by the prokaryotic lack of the sophisticated protein synthesis machinery, including the endoplasmic reticulum and Golgi apparatus, that allows eukaryotes to handle the multi-domain, non-globular folds that characterize repeat proteins[26].

While nebulin and titin are extreme examples of protein domain repeats, forming enormous proteins that are essential for cellular structure, most vertebrates contain a large number of repeat proteins that have between three and twenty consecutive domains. An estimated 17% of the human proteome consists of protein domain repeats, while the corresponding number is around 5% for prokaryotes[7], and indeed for unicellular organisms in general[15]. Although the reason for the predominance of protein domain repeats in Metazoa is not clear, it is possible that repeats provide another source of variability in compensation for low generation rates[26]. Further, studies show that protein domain repeats are a comparatively recent development in genome evolution, since ancient proteins tend to have few repeats[26]. It should also be noted that certain Metazoa, such as Drosophila melanogaster, are comparable to unicellular organisms with regard to repeat proteins[7].

Further Reading

  • Marcotte (1999), for an important review of repeat proteins[26].
  • Boersma (2011), for a review of the biotechnological applications of protein repeat engineering[21].
  • Kajava (2012), for a recent review of the structural classes of repeat proteins[25].

Acknowledgements


This work was supported by grants to AE from the Swedish Research Council and SSF (the Foundation for Strategic Research). SL was financed by Bioinformatics Infrastructure for Life Sciences. The authors gratefully acknowledge Dr. Åsa K. Björklund for assistance with figure preparation and the Journal of Molecular Biology for permission to reuse previously published images.

References

  1. Michaely, P.; Tomchick, D. R.; Machius, M.; Anderson, R. G. (2002). "Crystal structure of a 12 ANK repeat stack from human ankyrinR". The EMBO Journal 21 (23): 6387–6396. doi:10.1093/emboj/cdf651. PMID 12456646. PMC 136955. //www.ncbi.nlm.nih.gov/pmc/articles/PMC136955/. 
  2. Kobe, B.; Deisenhofer, J. (1995). "A structural basis of the interactions between leucine-rich repeats and protein ligands". Nature 374 (6518): 183–186. doi:10.1038/374183a0. PMID 7877692. 
  3. Rossmann, M. G.; Moras, D.; Olsen, K. W. (1974). "Chemical and biological evolution of nucleotide-binding protein". Nature 250 (463): 194–199. doi:10.1038/250194a0. PMID 4368490. 
  4. Murzin, A. G.; Brenner, S. E.; Hubbard, T.; Chothia, C. (1995). "SCOP: A structural classification of proteins database for the investigation of sequences and structures". Journal of Molecular Biology 247 (4): 536–540. doi:10.1006/jmbi.1995.0159. PMID 7723011. 
  5. Andrade, M. A.; Petosa, C.; O'Donoghue, S. I.; Müller, C. W.; Bork, P. (2001). "Comparison of ARM and HEAT protein repeats". Journal of Molecular Biology 309 (1): 1–18. doi:10.1006/jmbi.2001.4624. PMID 11491282. 
  6. Eddy, S. R. (1998). "Profile hidden Markov models". Bioinformatics (Oxford, England) 14 (9): 755–763. doi:10.1093/bioinformatics/14.9.755. PMID 9918945. 
  7. Björklund, A. K.; Ekman, D.; Elofsson, A. (2006). "Expansion of protein domain repeats". PLOS Computational Biology 2 (8): e114. doi:10.1371/journal.pcbi.0020114. PMID 16933986. PMC 1553488. //www.ncbi.nlm.nih.gov/pmc/articles/PMC1553488/.
  8. Biegert, A.; Söding, J. (2008). "De novo identification of highly diverged protein repeats by probabilistic consistency". Bioinformatics (Oxford, England) 24 (6): 807–814. doi:10.1093/bioinformatics/btn039. PMID 18245125. 
  9. Barker, W. C.; Ketcham, L. K.; Dayhoff, M. O. (1978). "A comprehensive examination of protein sequences for evidence of internal gene duplication". Journal of Molecular Evolution 10 (4): 265–281. doi:10.1007/BF01734217. PMID 633380. 
  10. Tang, J.; James, M. N.; Hsu, I. N.; Jenkins, J. A.; Blundell, T. L. (1978). "Structural evidence for gene duplication in the evolution of the acid proteases". Nature 271 (5646): 618–621. doi:10.1038/271618a0. PMID 24179. 
  11. Andrade, M. A.; Bork, P. (1995). "HEAT repeats in the Huntington's disease protein". Nature Genetics 11 (2): 115–116. doi:10.1038/ng1095-115. PMID 7550332. 
  12. "A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington's disease chromosomes. The Huntington's Disease Collaborative Research Group". Cell 72 (6): 971–983. 1993. doi:10.1016/0092-8674(93)90585-e. PMID 8458085. 
  13. Green, N. M.; Wrigley, N. G.; Russell, W. C.; Martin, S. R.; McLachlan, A. D. (1983). "Evidence for a repeating cross-beta sheet structure in the adenovirus fibre". The EMBO Journal 2 (8): 1357–1365. doi:10.1002/j.1460-2075.1983.tb01592.x. PMID 10872331. PMC 555283. //www.ncbi.nlm.nih.gov/pmc/articles/PMC555283/. 
  14. Bork, P. (1993). "Hundreds of ankyrin-like repeats in functionally diverse proteins: Mobile modules that cross phyla horizontally?". Proteins 17 (4): 363–374. doi:10.1002/prot.340170405. PMID 8108379. 
  15. Apic, G.; Gough, J.; Teichmann, S. A. (2001). "Domain combinations in archaeal, eubacterial and eukaryotic proteomes". Journal of Molecular Biology 310 (2): 311–325. doi:10.1006/jmbi.2001.4776. PMID 11428892.
  16. Speicher, D. W.; Marchesi, V. T. (1984). "Erythrocyte spectrin is comprised of many homologous triple helical segments". Nature 311 (5982): 177–180. doi:10.1038/311177a0. PMID 6472478. 
  17. Warfel, N. A.; Newton, A. C. (2012). "Pleckstrin homology domain leucine-rich repeat protein phosphatase (PHLPP): A new player in cell signaling". The Journal of Biological Chemistry 287 (6): 3610–3616. doi:10.1074/jbc.R111.318675. PMID 22144674. PMC 3281723. //www.ncbi.nlm.nih.gov/pmc/articles/PMC3281723/. 
  18. McEwan, I. J. (2001). "Structural and functional alterations in the androgen receptor in spinal bulbar muscular atrophy". Biochemical Society Transactions 29 (Pt 2): 222–227. doi:10.1042/0300-5127:0290222. PMID 11356158. 
  19. Paisán-Ruíz, C.; Jain, S.; Evans, E. W.; Gilks, W. P.; Simón, J.; Van Der Brug, M.; López De Munain, A.; Aparicio, S. et al. (2004). "Cloning of the gene containing mutations that cause PARK8-linked Parkinson's disease". Neuron 44 (4): 595–600. doi:10.1016/j.neuron.2004.10.023. PMID 15541308. 
  20. Zimprich, A.; Biskup, S.; Leitner, P.; Lichtner, P.; Farrer, M.; Lincoln, S.; Kachergus, J.; Hulihan, M. et al. (2004). "Mutations in LRRK2 cause autosomal-dominant parkinsonism with pleomorphic pathology". Neuron 44 (4): 601–607. doi:10.1016/j.neuron.2004.11.005. PMID 15541309. 
  21. Boersma, Y. L.; Plückthun, A. (2011). "DARPins and other repeat protein scaffolds: Advances in engineering and applications". Current Opinion in Biotechnology 22 (6): 849–857. doi:10.1016/j.copbio.2011.06.004. PMID 21715155.
  22. Yadid, I.; Tawfik, D. S. (2011). "Functional β-propeller lectins by tandem duplications of repetitive units". Protein Engineering, Design & Selection : Peds 24 (1–2): 185–195. doi:10.1093/protein/gzq053. PMID 20713410. 
  23. Pancer, Z.; Cooper, M. D. (2006). "The evolution of adaptive immunity". Annual Review of Immunology 24: 497–518. doi:10.1146/annurev.immunol.24.021605.090542. PMID 16551257. 
  24. Ellis, J.; Dodds, P.; Pryor, T. (2000). "Structure, function and evolution of plant disease resistance genes". Current Opinion in Plant Biology 3 (4): 278–284. doi:10.1016/s1369-5266(00)00080-7. PMID 10873844. 
  25. Kajava, A. V. (2012). "Tandem repeats in proteins: From sequence to structure". Journal of Structural Biology 179 (3): 279–288. doi:10.1016/j.jsb.2011.08.009. PMID 21884799.
  26. Marcotte, E. M.; Pellegrini, M.; Yeates, T. O.; Eisenberg, D. (1999). "A census of protein repeats". Journal of Molecular Biology 293 (1): 151–160. doi:10.1006/jmbi.1999.3136. PMID 10512723.
  27. Ferreiro, D. U.; Komives, E. A. (2007). "The plastic landscape of repeat proteins". Proceedings of the National Academy of Sciences of the United States of America 104 (19): 7735–7736. doi:10.1073/pnas.0702682104. PMID 17483477. PMC 1876514. //www.ncbi.nlm.nih.gov/pmc/articles/PMC1876514/. 
  28. Main, E. R.; Jackson, S. E.; Regan, L. (2003). "The folding and design of repeat proteins: Reaching a consensus". Current Opinion in Structural Biology 13 (4): 482–489. doi:10.1016/s0959-440x(03)00105-2. PMID 12948778. 
  29. d'Andrea, L. D.; Regan, L. (2003). "TPR proteins: The versatile helix". Trends in Biochemical Sciences 28 (12): 655–662. doi:10.1016/j.tibs.2003.10.007. PMID 14659697. 
  30. Deivanayagam, C. C.; Rich, R. L.; Carson, M.; Owens, R. T.; Danthuluri, S.; Bice, T.; Höök, M.; Narayana, S. V. (2000). "Novel fold and assembly of the repetitive B region of the Staphylococcus aureus collagen-binding surface protein". Structure (London, England : 1993) 8 (1): 67–78. doi:10.1016/s0969-2126(00)00081-2. PMID 10673425. 
  31. Yeats, C.; Finn, R. D.; Bateman, A. (2002). "The PASTA domain: A beta-lactam-binding domain". Trends in Biochemical Sciences 27 (9): 438. doi:10.1016/s0968-0004(02)02164-3. PMID 12217513. 
  32. Kohl, A.; Binz, H. K.; Forrer, P.; Stumpp, M. T.; Plückthun, A.; Grütter, M. G. (2003). "Designed to be stable: Crystal structure of a consensus ankyrin repeat protein". Proceedings of the National Academy of Sciences of the United States of America 100 (4): 1700–1705. doi:10.1073/pnas.0337680100. PMID 12566564. PMC 149896. //www.ncbi.nlm.nih.gov/pmc/articles/PMC149896/.
  33. Ekman, D.; Light, S.; Björklund, A. K.; Elofsson, A. (2006). "What properties characterize the hub proteins of the protein-protein interaction network of Saccharomyces cerevisiae?". Genome Biology 7 (6): R45. doi:10.1186/gb-2006-7-6-r45. PMID 16780599. PMC 1779539. //www.ncbi.nlm.nih.gov/pmc/articles/PMC1779539/. 
  34. Dosztányi, Z.; Chen, J.; Dunker, A. K.; Simon, I.; Tompa, P. (2006). "Disorder and sequence repeats in hub proteins and their implications for network evolution". Journal of Proteome Research 5 (11): 2985–2995. doi:10.1021/pr060171o. PMID 17081050. 
  35. Labeit, S.; Barlow, D. P.; Gautel, M.; Gibson, T.; Holt, J.; Hsieh, C. L.; Francke, U.; Leonard, K. et al. (1990). "A regular pattern of two types of 100-residue motif in the sequence of titin". Nature 345 (6272): 273–276. doi:10.1038/345273a0. PMID 2129545. 
  36. Ehrlicher, A. J.; Nakamura, F.; Hartwig, J. H.; Weitz, D. A.; Stossel, T. P. (2011). "Mechanical strain in actin networks regulates FilGAP and integrin binding to filamin A". Nature 478 (7368): 260–263. doi:10.1038/nature10430. PMID 21926999. PMC 3204864. //www.ncbi.nlm.nih.gov/pmc/articles/PMC3204864/. 
  37. Light, S.; Sagit, R.; Ithychanda, S. S.; Qin, J.; Elofsson, A. (2012). "The evolution of filamin-a protein domain repeat perspective". Journal of Structural Biology 179 (3): 289–298. doi:10.1016/j.jsb.2012.02.010. PMID 22414427. PMC 3728663. //www.ncbi.nlm.nih.gov/pmc/articles/PMC3728663/.
  38. Andrade, M. A.; Petosa, C.; O'Donoghue, S. I.; Müller, C. W.; Bork, P. (2001). "Comparison of ARM and HEAT protein repeats". Journal of Molecular Biology 309 (1): 1–18. doi:10.1006/jmbi.2001.4624. PMID 11491282.
  39. Looman, C.; Abrink, M.; Mark, C.; Hellman, L. (2002). "KRAB zinc finger proteins: An analysis of the molecular mechanisms governing their increase in numbers and complexity during evolution". Molecular Biology and Evolution 19 (12): 2118–2130. doi:10.1093/oxfordjournals.molbev.a004037. PMID 12446804. 
  40. Kenny, P. A.; Liston, E. M.; Higgins, D. G. (1999). "Molecular evolution of immunoglobulin and fibronectin domains in titin and related muscle proteins". Gene 232 (1): 11–23. doi:10.1016/s0378-1119(99)00122-5. PMID 10333517.
  41. Higgins, D. G.; Labeit, S.; Gautel, M.; Gibson, T. J. (1994). "The evolution of titin and related giant muscle proteins". Journal of Molecular Evolution 38 (4): 395–404. doi:10.1007/BF00163156. PMID 8007007. 
  42. McElhinny, A. S.; Kazmierski, S. T.; Labeit, S.; Gregorio, C. C. (2003). "Nebulin: The nebulous, multifunctional giant of striated muscle". Trends in Cardiovascular Medicine 13 (5): 195–201. doi:10.1016/s1050-1738(03)00076-8. PMID 12837582. 
  43. Björklund, A. K.; Light, S.; Sagit, R.; Elofsson, A. (2010). "Nebulin: A study of protein repeat evolution". Journal of Molecular Biology 402 (1): 38–51. doi:10.1016/j.jmb.2010.07.011. PMID 20643138. 
  44. Wright, C. F.; Teichmann, S. A.; Clarke, J.; Dobson, C. M. (2005). "The importance of sequence diversity in the aggregation and evolution of proteins". Nature 438 (7069): 878–881. doi:10.1038/nature04195. PMID 16341018.
  45. Patthy, L. (1999). "Genome evolution and the evolution of exon-shuffling--a review". Gene 238 (1): 103–114. doi:10.1016/s0378-1119(99)00228-0. PMID 10570989. 
  46. Koszul, R.; Fischer, G. (2009). "A prominent role for segmental duplications in modeling eukaryotic genomes". Comptes Rendus Biologies 332 (2–3): 254–266. doi:10.1016/j.crvi.2008.07.005. PMID 19281956. 
  47. Tripp, K. W.; Barrick, D. (2004). "The tolerance of a modular protein to duplication and deletion of internal repeats". Journal of Molecular Biology 344 (1): 169–178. doi:10.1016/j.jmb.2004.09.038. PMID 15504409. 
  48. Lavorgna, G.; Patthy, L.; Boncinelli, E. (2001). "Were protein internal repeats formed by "bricolage"?". Trends in Genetics : Tig 17 (3): 120–123. doi:10.1016/s0168-9525(00)02207-1. PMID 11226587.