Author manuscript; available in PMC: 2025 Jun 13.
Published in final edited form as: Science. 2024 Jul 18;385(6708):538–543. doi: 10.1126/science.adq0553

Structure Guided Discovery of Ancestral CRISPR-Cas13 Ribonucleases

Peter H Yoon 1,2,3, Zeyuan Zhang 2,3,4,5, Kenneth J Loi 1,2, Benjamin A Adler 2,3,5, Arushi Lahiri 1,2, Kamakshi Vohra 2,5, Honglue Shi 2,3, Daniel Bellieny Rabelo 2,5, Marena Trinidad 2,3, Ron S Boger 2,3,4, Muntathar J Al-Shimary 1,2,3, Jennifer A Doudna 1,2,3,5,6,7,8,9,*
PMCID: PMC12165695  NIHMSID: NIHMS2074710  PMID: 39024377

Abstract

The RNA-guided ribonuclease CRISPR-Cas13 enables adaptive immunity in bacteria and programmable RNA manipulation in heterologous systems. Cas13s share limited sequence similarity, hindering discovery of related or ancestral systems. To address this, we developed an automated structural-search pipeline to identify an ancestral clade of Cas13 (Cas13an), and further trace Cas13 origins to defense-associated ribonucleases. Despite being one third the size of other Cas13s, Cas13an mediates robust programmable RNA depletion and defense against diverse bacteriophages. However, unlike its larger counterparts, Cas13an uses a single active site for both CRISPR RNA processing and RNA-guided cleavage, revealing the ancestral nuclease domain has two modes of activity. Discovery of Cas13an deepens our understanding of CRISPR-Cas evolution and expands opportunities for precision RNA editing, showcasing the promise of structure-guided genome mining.


Type VI CRISPR-Cas systems provide adaptive immunity in prokaryotes by targeting RNA transcripts of invading mobile genetic elements (1–3). Interference is mediated by the Cas13 protein and its CRISPR RNA (crRNA) that together form an RNA-guided ribonuclease, whose simple reprogrammability has facilitated widespread repurposing in biotechnology (3–9). The defining feature of Cas13 is a pair of higher eukaryotes and prokaryotes nucleotide-binding (HEPN) domains. In Cas13, the two HEPN domains (HEPN1 and HEPN2) dimerize intramolecularly to form the active site in response to target-transcript recognition, in contrast to other HEPN proteins, which typically homodimerize (10, 11). The HEPN superfamily of ribonucleases exhibits great sequence and structural diversity (10, 11). Even within the Cas13 family, the lack of sequence conservation makes conventional homology searches and evolutionary analyses difficult. As a result, compared to other Class 2 CRISPR-Cas effectors including Cas9 (Type II) and Cas12 (Type V), few distinct Cas13 subtypes have been identified to date, and little is known about their evolutionary origins.

Structural comparisons offer a solution to the challenges posed by low sequence conservation within protein families like Cas13, as protein folds exhibit greater conservation (12). Historically, the limited availability of protein structures has bottlenecked structure-centric evolutionary analyses. However, the advent of atomic-accuracy prediction programs and their associated databases, which now approach a billion structures, has largely overcome this obstacle (13–15). Nevertheless, exploiting these databases poses new challenges, as traditional structural comparison programs were not designed for such scale. To address this, machine learning programs such as Foldseek (16) accelerate structural homology searches compared to gold-standard programs like DALI or TM-align (17, 18). However, machine learning-based programs are less sensitive than traditional programs (16, 19), highlighting the need for scalable approaches with maximal sensitivity to uncover novel relationships.

Automated structural homology search uncovers ancient CRISPR-Cas13 systems

Motivated by this challenge, we developed an automated structural-search pipeline that combines the speed of machine learning-based search methods with the sensitivity of traditional structure alignment programs. Specifically, we leveraged a Foldseek-clustered AlphaFold database (20), whose reduced search space makes slow-but-sensitive DALI searches feasible (Methods). Using representative HEPN dimers within known Cas13 proteins (21–23) as the search query (Fig. 1A), we found twelve previously uncharacterized protein clusters in the AlphaFold database bearing an intramolecular HEPN dimer (fig. S1 and table S1, 2). Further sequence-based homology searches and genomic analyses revealed that two of the newly identified clusters occur next to CRISPR arrays, representing a new Cas13 subtype (Cas13an) (Fig. 1A and fig. S2). Notably, neither Foldseek nor hidden Markov model searches were able to detect significant homology between previously known Cas13s and Cas13an (fig. S3). This highlights the considerable divergence of Cas13an compared to known Cas13 proteins, and underscores the importance of sensitive search strategies.
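The two-stage logic described above (score only cluster representatives with a slow, sensitive aligner, then expand the hit clusters to their members) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function name `two_stage_search`, the `sensitive_score` callable (a stand-in for a DALI-like comparison), the score threshold, and the toy cluster names are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

def two_stage_search(
    query: str,
    clusters: Dict[str, List[str]],
    sensitive_score: Callable[[str, str], float],
    threshold: float,
) -> List[Tuple[str, float, List[str]]]:
    """Run the expensive comparison on one representative per cluster.

    `clusters` maps a representative ID to all member IDs of that cluster
    (as a pre-clustered database would provide). Representatives scoring at
    or above `threshold` are returned together with their members, sorted
    from best to worst score.
    """
    hits = []
    for representative, members in clusters.items():
        # Hypothetical stand-in for a slow-but-sensitive structural alignment.
        score = sensitive_score(query, representative)
        if score >= threshold:
            hits.append((representative, score, members))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits

# Toy data: three clusters and made-up similarity scores.
toy_clusters = {
    "repA": ["repA", "a1", "a2"],
    "repB": ["repB"],
    "repC": ["repC", "c1"],
}
toy_scores = {"repA": 12.5, "repB": 3.1, "repC": 8.0}

hits = two_stage_search(
    "query_hepn_dimer",
    toy_clusters,
    lambda query, rep: toy_scores[rep],
    threshold=8.0,
)
for representative, score, members in hits:
    print(representative, score, members)
```

The cost of the sensitive comparison scales with the number of clusters rather than the number of database entries, which is the property that makes the slow search tractable; the trade-off is that a member is only found if its cluster representative is itself detectable.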

Fig. 1. Structural homology search enabled discovery of ancestral Cas13 systems.


(A) Schematic of automated structure-based discovery pipeline. AFDB, AlphaFold Database. (B) Comparison of Cas13a (PDBID: 5XWY) and Cas13an (AFDBID: A0A7C5SD50) structure and domain architecture. (C) Maximum-likelihood phylogenetic tree of Cas13 subtypes. Sequences are provided in table S6, and alignment and tree files are provided in data S1. (D) Structural comparison of HEPN domains with canonical organization (HEPT; PDBID: 5YEP) and shared rearrangements (Cas13an HEPN1 and HEPN2 domains; AFDBID: A0A7C5SD50, and AbiD/F gene; AFDBID: A0A3S5XYX8). (E) Maximum-likelihood phylogenetic tree of Cas13an HEPN1 and HEPN2 domains and their structural homologs. Sequences and annotations of these proteins are available in table S9, and alignment and tree files are provided in data S1.

In total, we identified thirteen diverse Cas13an sequences, ten of which occur next to CRISPR arrays with conserved repeat sequences (fig. S2, 4 and table S3). Notably, all Cas13an loci lacked other Cas genes, including the acquisition-associated genes cas1 and cas2 (fig. S2). Nevertheless, the CRISPR arrays appeared to be actively acquiring new spacers, which we found target double-stranded DNA (dsDNA) phages (fig. S5 and table S4, 5). Consistent with this, four of the ten Cas13an encoding genomes have cas1 and cas2 genes in trans belonging to Type II CRISPR-Cas systems (fig. S6). This raises the possibility that Cas13an hijacks adaptation modules from Type II systems, as previously suggested for other CRISPR-Cas systems (24, 25).

Ranging in size from 429 to 577 amino acids, Cas13an proteins are remarkably small compared to previously characterized Cas13 orthologs, which are typically 800–1400 amino acids in length (1, 26). Structural comparisons suggest that Cas13an’s compact size is due to a lack of large insertions in the HEPN domains and the absence of a canonical REC lobe that functions to bind the crRNA in Class 2 CRISPR-Cas effectors (Fig. 1B). The underdeveloped REC lobe in Cas13an is reminiscent of diminutive REC lobes in smaller Cas9s and Cas12s and their ancestral proteins, IscB and TnpB (27, 28), suggesting that Cas13an may represent an early evolutionary form of Cas13.

Motivated by this unusual predicted structure, we next explored the evolutionary relationship of Cas13an to other Cas13 subtypes. Combining structural and sequence-based phylogenetic analyses (Methods) revealed that Cas13an is likely ancestral to other proteins in the lineage (Fig. 1C, table S6 and data S1). This suggests that all three Class 2 CRISPR-Cas effectors—Cas9, Cas12, and Cas13—originated from compact ancestral proteins that underwent domain accretion over time. We also noticed that within the compact architecture of Cas13an, the primary sequences of the two HEPN domains were similarly rearranged compared to canonical HEPN proteins, implying a close relationship (Fig. 1D). To test this hypothesis, we isolated Cas13an HEPN1 and HEPN2 domains as separate queries for our structural-search pipeline. Across both searches, the shared hits revealed that Cas13an HEPN domains bear structural similarities to the non-CRISPR HEPN nucleases Swt1, DZIP3 and AbiD/F (table S7, 8). Notably, phylogenetic analysis revealed that both Cas13an HEPN domains form a clade with AbiD/F (Fig. 1E, table S9 and data S1), which are phage defense-associated ribonucleases predicted to co-occur with a non-coding RNA of unknown function (RNA family: RF03085) (29). Summarizing our findings, we propose that Cas13 evolved from compact ancestral enzymes formed by the fusion of two closely related defense-associated HEPN genes, with potential RNA-guided capabilities predating the Cas13 lineage.

Compact Cas13s mediate potent programmable RNA interference in vivo

Building on our bioinformatics insights, we next examined CRISPR-Cas13an systems using E. coli as a heterologous host to determine their active components. To test the hypothesis that the Cas13an-adjacent CRISPR array encodes guide RNAs, we transformed E. coli with plasmids encoding a Cas13an, a CRISPR array, and intergenic regions between the two. Small RNA-sequencing revealed the expression of crRNAs whose spacer sequence is positioned before the repeat region (Fig. 2A and fig. S7). This distinctive arrangement is observed only in crRNAs of the Cas13b family, which includes variants previously reported as Cas13X, Cas13Y, and Cas13e–i (30, 31). In light of Cas13an, this unusual RNA arrangement can be reinterpreted to be an ancestral characteristic instead of a derived one.

Fig. 2. Cas13an systems provide targeted RNA knockdown and defense against phages.


(A) Small RNA-sequencing of CRISPR-Cas13an1 locus heterologously expressed in E. coli. Inset shows reads of length 40–80 nucleotides (nt) corresponding to processed crRNA. Black squares denote CRISPR-repeat, and green diamond denotes spacer sequence. (B) Schematic of green fluorescent protein (GFP) depletion assay in E. coli. (C) Serial dilutions of E. coli in GFP depletion assays. Each spot progression represents a 10-fold dilution. (D) Schematic of phage challenge assays in E. coli. (E) Phage challenge assay results for lytic T4-phage using Cas13an8. Each spot progression represents a 10-fold dilution of phage stock. (F) Efficiency of plaquing (EOP) summary of Cas13an8 targeting phages of unrelated, diverse genera and labeled by genome nucleic acid composition. Labels I, II, III, and IV represent Podovirus, Myovirus, Siphovirus, and Jumbo Myoviruses, respectively.

Next, we tested the possible crRNA-guided ribonuclease activity of Cas13an by targeting transcripts of the green fluorescent protein (GFP). We co-transformed E. coli with plasmids encoding either GFP or the related red fluorescent protein (RFP) along with separate plasmids encoding various Cas13an orthologs with GFP-targeting crRNAs (Fig. 2B and table S10). We observed diminished GFP but not RFP fluorescence, consistent with Cas13an’s specificity as an RNA-guided nuclease (Fig. 2C and fig. S8). GFP expression was unaffected in this experiment when a non-targeting crRNA was used (Fig. 2C). Furthermore, point mutations in the active site of either HEPN domain abolished Cas13an-mediated GFP reduction, confirming the critical role of the HEPN domains in this activity (Fig. 2C). These results show that the Cas13an system is the most compact CRISPR-Cas effector complex known to date, comprising a protein as small as 429 amino acids and a single RNA component of ~60 nucleotides (nt) in length.
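As an illustration of how a transcript-targeting crRNA of this type is laid out, the following Python sketch builds a guide whose spacer is the reverse complement of a chosen transcript window and places the spacer 5' of the repeat, following the spacer-before-repeat arrangement observed by small RNA-sequencing. The 30-nt spacer and 36-nt repeat lengths are taken from the in vitro experiments reported in this study; the repeat sequence (a run of `N`), the toy transcript, and all function names are placeholders, not sequences or tools from the study.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def to_rna(dna: str) -> str:
    """Convert a DNA coding-strand sequence to the corresponding RNA."""
    return dna.upper().replace("T", "U")

def reverse_complement_rna(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]

def design_crrna(transcript: str, start: int, repeat: str, spacer_len: int = 30) -> str:
    """Build a crRNA as spacer + repeat, with the spacer 5' of the repeat.

    The spacer is complementary to transcript[start:start + spacer_len],
    so it pairs with the RNA itself rather than with its antisense strand.
    """
    target = transcript[start:start + spacer_len]
    if len(target) != spacer_len:
        raise ValueError("target window runs past the end of the transcript")
    return reverse_complement_rna(target) + repeat

# Placeholder inputs: an arbitrary 40-nt sequence and an unspecified repeat.
toy_transcript = to_rna("ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCC")
placeholder_repeat = "N" * 36

crrna = design_crrna(toy_transcript, start=5, repeat=placeholder_repeat)
print(len(crrna), crrna)
```

A spacer built from the opposite strand would be identical to, rather than complementary to, the transcript and would therefore not be expected to direct RNA targeting.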

Motivated by these results and the previously established efficacy of larger Cas13 systems in antiphage defense (6), we investigated whether Cas13an could similarly protect against phage infection. To do this, we challenged E. coli harboring plasmids encoding Cas13an8, which was chosen for its robust activity, using the lytic dsDNA phage T4 (Fig. 2D). We found that when expressed together with a T4-phage-targeting crRNA (table S10), Cas13an provided >10³-fold protection, as shown by phage plaquing efficiencies. Notably, this effect was independent of the essentiality of the targeted phage gene, as targeting the non-essential soc gene restricted phage replication (Fig. 2E). This implies Cas13an provides phage defense through both targeted RNA cleavage in cis, and indiscriminate cleavage in trans like other Cas13 subtypes (2, 3, 7, 32, 33). crRNAs matching the antisense (non-coding) phage DNA strand had no effect on phage plaquing, consistent with Cas13an’s targeting of RNA (Fig. 2E). We next tested whether robust defense observed for T4-phage extends to a broader spectrum of phages of different genera and life cycles (6). We found all phages tested to be susceptible to at least one phage-targeting crRNA (Fig. 2F, fig. S9 and table S10). Similar to what was reported for Cas13a, targeting late rather than early genes was reliably more effective (6, 33). We conclude that despite its small size, Cas13an provides potent, broad-spectrum defense against diverse phages, suggesting this trait is ancestral in the Cas13 lineage.
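For readers less familiar with plaque assays, the relationship between efficiency of plaquing (EOP) and fold protection can be made explicit with a short Python sketch. EOP is the phage titer on the test strain divided by the titer on a control strain, and fold protection is its reciprocal. The titers below are invented numbers chosen only to show the arithmetic and are not measurements from this study; the function names are likewise hypothetical.

```python
import math

def efficiency_of_plaquing(titer_test: float, titer_control: float) -> float:
    """EOP = titer on the test strain / titer on the control strain."""
    if titer_control <= 0:
        raise ValueError("control titer must be positive")
    return titer_test / titer_control

def fold_protection(eop: float) -> float:
    """Fold protection is the reciprocal of EOP (infinite if no plaques form)."""
    return math.inf if eop == 0 else 1.0 / eop

# Made-up titers in plaque-forming units per mL.
control_titer = 2e9    # strain with a non-targeting crRNA
targeting_titer = 1e6  # strain with a phage-targeting crRNA

eop = efficiency_of_plaquing(targeting_titer, control_titer)
protection = fold_protection(eop)
print(f"EOP = {eop:.1e}, fold protection = {protection:.0f}")
```

With these example values the EOP is below 10⁻³, which corresponds to more than 10³-fold protection.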

Ancestral HEPN domain active site is a multifunctional ribonucleolytic center

Having established the in vivo activity of Cas13an, we chose Cas13an2 for biochemical analysis due to its in vitro stability. Using purified Cas13an protein produced in E. coli, together with in vitro transcribed full-length crRNAs, we observed that Cas13an cleaved RNAs bearing sequence complementarity to the 30nt crRNA spacer sequence (Fig. 3A, B and table S11). Additionally, Cas13an exhibited trans-cleavage activity, indiscriminately cleaving a fluorophore-quencher-labeled reporter RNA in the presence of a crRNA-complementary RNA target (Fig. 3C). Though direct comparisons are difficult, the trans-cleavage rate for Cas13an appeared to be slower than those of previously reported Cas13 variants (3, 21, 34, 35), and was instead more comparable to ssDNA trans-cleavage rates of Cas12 proteins (36, 37). In control experiments, we observed cleavage inhibition in Cas13an proteins bearing alanine substitutions in the conserved Rx4H motifs of the HEPN domain active sites (Fig. 3A, C). We also observed inhibition of Cas13an activity in the presence of the Mg2+ chelator EDTA (Fig. 3A, C). However, this inhibitory effect was less pronounced than that observed for other Cas13 subtypes (2, 3, 7, 21), as EDTA addition did not completely abolish cis-cleavage (Fig. 3A).

Fig. 3. Complementary RNA triggers both cis- and trans-cleavage in Cas13an.


(A) End point measurement (60 min) of Cas13an2 ribonucleoprotein complex (RNP) cleavage of 5’-fluorescein (FAM)-labeled guide complementary (target RNA) and non-complementary (non-target RNA) substrates. In mutants, alanine substitutions were introduced in HEPN Rx4H motifs: R127A/H132A for dHEPN1, R363A/H368A for dHEPN2, and all four for dHEPN1/2. (B) Kinetics of Cas13an2 RNP mediated target RNA cleavage (cis-cleavage). (C) Schematic of fluorophore-quencher assay to measure collateral RNA cleavage (trans-cleavage) induced by Cas13an2 target RNA recognition (top). Trans-cleavage induced fluorescence traces for Cas13an RNP with target RNA and controls (bottom). AU, arbitrary units. (D) Schematic of mismatch tolerance assay in E. coli. (E) Heatmap representation of Cas13an2 mismatch tolerance for single and double mismatch spacers.

To further investigate target recognition requirements for Cas13an, we designed a comprehensive mismatch library of spacers with disrupted complementarity to the kanamycin resistance (kanR) gene in E. coli (Fig. 3D and table S12). We co-transformed E. coli with helper plasmids encoding Cas13an2 systems alongside a separate target plasmid containing the kanR gene. Cultivation on kanamycin-containing plates resulted in the selective survival of cells with defective spacers. Sequencing of surviving cells showed that tolerance for single-base mismatches was target sequence dependent (Fig. 3E). In contrast, tolerance for double mismatches appeared consistent between targets, with complementarity in the 1–6nt regions and 24–27nt regions of the spacer being crucial for Cas13an activity in vivo (Fig. 3E). We next modified the mismatch assay to test whether sequences flanking the target site influence Cas13an cleavage efficacy (fig. S10, Methods). This revealed that Cas13an2 did not have a strong preference for nearby bases, suggesting a flexible target scope (fig. S10).
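To give a sense of the size of such a library, the Python sketch below enumerates every single- and double-mismatch variant of a spacer. It assumes exhaustive enumeration over all positions and all alternative bases, which may differ from the actual library composition (table S12); the toy spacer and the function name are placeholders. Positions in the code are 0-based, whereas the text numbers spacer positions from 1.

```python
from itertools import combinations, product
from typing import Iterator, Tuple

BASES = "ACGU"

def mismatch_variants(spacer: str, n_mismatches: int) -> Iterator[Tuple[Tuple[int, ...], str]]:
    """Yield (positions, variant) for every spacer with exactly n mismatches.

    Each chosen position is replaced by each of the three bases that differ
    from the original base at that position.
    """
    for positions in combinations(range(len(spacer)), n_mismatches):
        alternatives = [[b for b in BASES if b != spacer[p]] for p in positions]
        for substitutions in product(*alternatives):
            variant = list(spacer)
            for position, base in zip(positions, substitutions):
                variant[position] = base
            yield positions, "".join(variant)

# Arbitrary 30-nt placeholder spacer (not a sequence from the study).
toy_spacer = "ACGU" * 7 + "AC"

singles = list(mismatch_variants(toy_spacer, 1))
doubles = list(mismatch_variants(toy_spacer, 2))
print(len(singles), len(doubles))
```

For a 30-nt spacer this gives 30 × 3 = 90 single-mismatch and C(30, 2) × 9 = 3,915 double-mismatch variants, small enough to assay in a pooled survival screen.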

In addition to cleaving transcripts in response to target RNA recognition, previously characterized Cas13 subtypes mediate processing of CRISPR array derived pre-crRNA into individual crRNAs (3, 7, 32). crRNA maturation enhances interference and is a crucial regulatory step conserved across CRISPR-Cas systems. In previously studied Cas13 subtypes, maturation is catalyzed by a Mg2+-independent active site distinct from the Mg2+-dependent HEPN active site used for targeted RNA cleavage (3, 21, 38). The minimal architecture of Cas13an and absence of a recognizable pre-crRNA processing center made it unclear whether Cas13an could also mediate maturation. To test this, we analyzed Cas13an-mediated cleavage of in vitro transcribed pre-crRNA containing a repeat-spacer-repeat sequence flanked by additional 15nt (Fig. 4A and table S11). Denaturing gel analysis revealed Cas13an-dependent pre-crRNA processing, in which site-specific RNA cleavage occurred only within the spacer sequence (Fig. 4A, B and fig. S11A), further confirmed by RNA-sequencing data for the reaction (fig. S12). To unambiguously determine the orientation of cleavage, we repeated the experiment with a 5’ labeled crRNA containing the full-length 30nt spacer followed by a 36nt complete repeat (Fig. 4A and table S11). This resulted in a 6–7nt labeled product, revealing that Cas13an cleaved 24 or 23nt upstream of the repeat (Fig. 4A, C and fig. S11B, C), consistent with the in vivo small RNA-sequencing results (Fig. 2A and fig. S7). In vitro cleavage using crRNAs with shorter spacers showed that the optimal spacer length is 30nt (fig. S13 and table S11), in agreement with the observed importance of the 1–6nt region for in vivo interference (Fig. 3E). The discrepancy between the spacer length of mature crRNA (23–24nt) and optimal crRNA (30nt) warrants future investigation of the interplay between processing and interference.
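The inference of the cleavage position from the labeled product length is simple arithmetic, shown here as a short Python sketch. The substrate dimensions (a 5' label, a 30-nt spacer, and a 36-nt repeat) come from the experiment described above; the function and key names are illustrative only.

```python
SPACER_NT = 30  # full-length spacer at the 5' end of the substrate
REPEAT_NT = 36  # complete repeat at the 3' end of the substrate

def processing_outcome(labeled_product_nt: int) -> dict:
    """Infer the cut site from the length of the 5'-labeled product.

    Because the label sits at the 5' end of the spacer, a labeled product of
    k nt means the cut lies k nt into the spacer, i.e. (30 - k) nt upstream
    of the repeat, leaving a mature crRNA of (30 + 36 - k) nt.
    """
    if not 0 < labeled_product_nt < SPACER_NT:
        raise ValueError("cut must fall within the spacer")
    return {
        "cut_upstream_of_repeat_nt": SPACER_NT - labeled_product_nt,
        "mature_crrna_nt": SPACER_NT + REPEAT_NT - labeled_product_nt,
    }

for product_nt in (6, 7):
    print(product_nt, processing_outcome(product_nt))
```

A 6- or 7-nt labeled product thus places the cut 24 or 23 nt upstream of the repeat and leaves a mature crRNA of 60 or 59 nt, in line with the ~60-nt RNA component noted earlier.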

Fig. 4. Multifunctional HEPN domains enable RNA-guided cleavage and pre-crRNA processing in Cas13an.


(A) Substrates used for RNA processing assays. Top: pre-crRNA and processing site (black triangle) inferred from RNA-sequencing (fig. S12). Bottom: full-length crRNA and processing sites inferred from denaturing gels (fig. S11C). (B) In vitro pre-crRNA processing by Cas13an2. Cas13an processes pre-crRNA in vitro only in the presence of Mg2+ and catalytic residues of both HEPN domains are necessary for processing. Gel with the ladder is attached in fig. S11A. (C) In vitro RNA processing of 5’-FAM-labeled full-length crRNA. Same gel was first examined for FAM signal, and then for SYBR-GOLD stain signal. Gel with the ladder attached is shown in fig. S11B. (D) Parallels in evolutionary paths between two unrelated Class 2 CRISPR-Cas effectors, Cas13 and Cas12.

The unique pre-crRNA cleavage pattern of Cas13an suggests a processing mechanism distinct from other Cas13 subtypes, whose pre-crRNA cleavage occurs near the repeat sequence (fig. S14). Experiments showed that adding EDTA inhibited Cas13an-mediated pre-crRNA maturation, indicating Mg2+ dependence of the reaction (Fig. 4B). We hypothesized that the Mg2+-dependent HEPN domains are responsible for crRNA maturation, similar to the multifunctional nucleic acid processing domains found in compact Cas12 enzymes and their TnpB ancestors (fig. S15) (39, 40). Notably, neither addition of EDTA nor mutations in the HEPN domains affected binding of Cas13an to crRNA, based on electrophoretic mobility shift assays (EMSAs) (fig. S16). However, HEPN domain mutations ablated Cas13an-catalyzed pre-crRNA processing, uncovering a dual functionality of the HEPN domains that enables both target RNA cleavage and guide RNA processing (Fig. 4B).

These findings reveal an unanticipated convergence among ancestral Class 2 CRISPR-Cas nucleases to license a single active site for cleavage of both target nucleic acids and guide RNA (Fig. 4D and fig. S15). Potentially, other HEPN nucleases structurally similar to Cas13 identified in this study also possess such dual functionality (Fig. 1D, E, fig. S1 and table S1, 2, 7). The parallels observed between the evolution of RNA-targeting Cas13 and DNA-targeting Cas12 highlight recurrent solutions to address shared evolutionary pressures imposed on small RNA-guided nucleases (Fig. 4D and fig. S14, 15). This parallel is further reinforced by secondary pre-crRNA processing active sites having been acquired on multiple independent occasions within both the Cas13 and Cas12 lineages (fig. S14, 15). Collectively, these insights not only bridge a significant gap in our understanding of Class 2 CRISPR-Cas system origins but also establish a foundation for future exploration of the evolution and mechanisms of RNA-guided ribonuclease activity.

In this study, we developed and applied a structure-based search strategy that combines rapid clustering and sensitive comparisons to uncover homology between highly divergent proteins. We further leveraged structural and sequence comparisons to resolve complex phylogenetic relationships, enabling the discovery of recurrent themes underlying CRISPR-Cas enzyme evolution. Although this study focused on the AlphaFold database, our strategy generalizes to other structure prediction databases, including those of metagenomic or viral origin (15, 41). As structure prediction methods and associated databases continue to advance, structure-guided protein mining will become increasingly powerful, enabling greater access to biological insights that have long evaded detection. This study paves the way for future investigations of shared folds and functions across remote homologs, which will further illuminate principles underlying biomolecular evolution.

Supplementary Material

Yoon et al Supplementary Material
Yoon et al Supplementary Data 1-3
Yoon et al Supplementary Tables 1-12

Acknowledgments:

We thank members of the Doudna lab and the Innovative Genomics Institute for helpful discussions. We thank UCSF for giving us access to the high performance compute cluster Wynton to meet our compute needs. We acknowledge Mr. Nicholas T. Perry (Arc Institute, UC Berkeley and UCSF) for assistance in figure generation, and Ms. Netravathi Krishnappa (NGS Core Operations Manager and Sequencing Specialist, Center for Translational Genomics, Innovative Genomics Institute, UC Berkeley) for sequencing. We acknowledge the use of GPT-4 for providing assistance in text editing.

Funding:

National Science Foundation Graduate Research Fellowship (PHY, MJA)

HHMI Fellow of The Jane Coffin Childs Fund for Medical Research (HS)

m-CAFEs Microbial Community Analysis & Functional Evaluation in Soils (m-CAFEs@lbl.gov), a Science Focus Area led by Lawrence Berkeley National Laboratory based upon work supported by the US Department of Energy, Office of Science, Office of Biological & Environmental Research [DE-AC02–05CH11231] (BAA)

Howard Hughes Medical Institute (HHMI) Investigator (JAD)

Data and materials availability:

All sequencing data underlying this article are publicly available on NCBI SRA under PRJNA1128359. Alignments and phylogenetic trees generated in the study are found in supplementary data S1. Sequences of all synthetic oligos and biological constructs generated in this study are provided in supplementary data S2 and S3, and all constructs are available upon request. All scripts used to perform the analyses and relevant databases are publicly available through Zenodo (42).

Footnotes

Competing interests:

JAD is a co-founder of Caribou Biosciences, Editas Medicine, Intellia Therapeutics, Mammoth Biosciences and Scribe Therapeutics, and a director of Altos, Johnson & Johnson and Tempus. JAD is a scientific advisor to Caribou Biosciences, Intellia Therapeutics, Mammoth Biosciences, Inari, Scribe Therapeutics and Algen. JAD also serves as Chief Science Advisor to Sixth Street and a Scientific Advisory Board member at The Column Group. JAD conducts academic research projects sponsored by Roche and Apple Tree Partners. The Regents of the University of California have patents and patents pending on CRISPR technologies on which P.H.Y., Z.Z., K.J.L., B.A.A., and J.A.D. are inventors.

References and Notes

  1. Shmakov S, Abudayyeh OO, Makarova KS, Wolf YI, Gootenberg JS, Semenova E, Minakhin L, Joung J, Konermann S, Severinov K, Zhang F, Koonin EV, Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems. Mol. Cell 60, 385–397 (2015).
  2. Abudayyeh OO, Gootenberg JS, Konermann S, Joung J, Slaymaker IM, Cox DBT, Shmakov S, Makarova KS, Semenova E, Minakhin L, Severinov K, Regev A, Lander ES, Koonin EV, Zhang F, C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector. Science 353, aaf5573 (2016).
  3. East-Seletsky A, O’Connell MR, Knight SC, Burstein D, Cate JHD, Tjian R, Doudna JA, Two distinct RNase activities of CRISPR-C2c2 enable guide-RNA processing and RNA detection. Nature 538, 270–273 (2016).
  4. Abudayyeh OO, Gootenberg JS, Essletzbichler P, Han S, Joung J, Belanto JJ, Verdine V, Cox DBT, Kellner MJ, Regev A, Lander ES, Voytas DF, Ting AY, Zhang F, RNA targeting with CRISPR–Cas13. Nature 550, 280–284 (2017).
  5. Cox DBT, Gootenberg JS, Abudayyeh OO, Franklin B, Kellner MJ, Joung J, Zhang F, RNA editing with CRISPR-Cas13. Science 358, 1019–1027 (2017).
  6. Adler BA, Hessler T, Cress BF, Lahiri A, Mutalik VK, Barrangou R, Banfield J, Doudna JA, Broad-spectrum CRISPR-Cas13a enables efficient phage genome editing. Nat. Microbiol 7, 1967–1979 (2022).
  7. Konermann S, Lotfy P, Brideau NJ, Oki J, Shokhirev MN, Hsu PD, Transcriptome Engineering with RNA-Targeting Type VI-D CRISPR Effectors. Cell 173, 665–676.e14 (2018).
  8. Chandrasekaran SS, Tau C, Nemeth M, Pawluk A, Konermann S, Hsu PD, Rewriting endogenous human transcripts with trans-splicing. bioRxiv [Preprint] (2024). 10.1101/2024.01.29.577779.
  9. Borrajo J, Javanmardi K, Griffin J, Martin SJS, Yao D, Hill K, Blainey PC, Al-Shayeb B, Programmable multi-kilobase RNA editing using CRISPR-mediated trans-splicing. bioRxiv [Preprint] (2023). 10.1101/2023.08.18.553620.
  10. Anantharaman V, Makarova KS, Burroughs AM, Koonin EV, Aravind L, Comprehensive analysis of the HEPN superfamily: identification of novel roles in intra-genomic conflicts, defense, pathogenesis and RNA processing. Biol. Direct 8, 15 (2013).
  11. Pillon MC, Gordon J, Frazier MN, Stanley RE, HEPN RNases – an emerging class of functionally distinct RNA processing and degradation enzymes. Crit. Rev. Biochem. Mol. Biol 56, 88–108 (2021).
  12. Illergård K, Ardell DH, Elofsson A, Structure is three to ten times more conserved than sequence—A study of structural response in protein cores. Proteins Struct. Funct. Bioinforma 77, 499–508 (2009).
  13. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D, Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
  14. Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, Yuan D, Stroe O, Wood G, Laydon A, Žídek A, Green T, Tunyasuvunakool K, Petersen S, Jumper J, Clancy E, Green R, Vora A, Lutfi M, Figurnov M, Cowie A, Hobbs N, Kohli P, Kleywegt G, Birney E, Hassabis D, Velankar S, AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).
  15. Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, Smetanin N, Verkuil R, Kabeli O, Shmueli Y, dos Santos Costa A, Fazel-Zarandi M, Sercu T, Candido S, Rives A, Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
  16. van Kempen M, Kim SS, Tumescheit C, Mirdita M, Lee J, Gilchrist CLM, Söding J, Steinegger M, Fast and accurate protein structure search with Foldseek. Nat. Biotechnol 42, 243–246 (2024).
  17. Holm L, Benchmarking fold detection by DaliLite v.5. Bioinformatics 35, 5326–5327 (2019).
  18. Zhang Y, Skolnick J, TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
  19. Boger R, Lu A, Chithrananda S, Yang K, Skopintsev P, Adler B, Wallace E, Yoon P, Abbeel P, Doudna J, “Toph (true retrieval of proteins homologs): Adapting a contrastive question-answering framework for protein search” in ICML Workshop on Computational Biology (2023; https://icml-compbio.github.io/2023/papers/WCBICML2023_paper138.pdf).
  20. Barrio-Hernandez I, Yeo J, Jänes J, Mirdita M, Gilchrist CLM, Wein T, Varadi M, Velankar S, Beltrao P, Steinegger M, Clustering predicted structures at the scale of the known protein universe. Nature 622, 637–645 (2023).
  21. Slaymaker IM, Mesa P, Kellner MJ, Kannan S, Brignole E, Koob J, Feliciano PR, Stella S, Abudayyeh OO, Gootenberg JS, Strecker J, Montoya G, Zhang F, High-Resolution Structure of Cas13b and Biochemical Characterization of RNA Targeting and Cleavage. Cell Rep. 26, 3741–3751.e5 (2019).
  22. Liu L, Li X, Ma J, Li Z, You L, Wang J, Wang M, Zhang X, Wang Y, The Molecular Architecture for RNA-Guided RNA Cleavage by Cas13a. Cell 170, 714–726.e10 (2017).
  23. Zhang C, Konermann S, Brideau NJ, Lotfy P, Wu X, Novick SJ, Strutzenberg T, Griffin PR, Hsu PD, Lyumkis D, Structural Basis for the RNA-Guided Ribonuclease Activity of CRISPR-Cas13d. Cell 175, 212–223.e17 (2018).
  24. Hoikkala V, Ravantti J, Díez-Villaseñor C, Tiirola M, Conrad RA, McBride MJ, Moineau S, Sundberg L-R, Cooperation between Different CRISPR-Cas Types Enables Adaptation in an RNA-Targeting System. mBio 12, 10.1128/mbio.03338-20 (2021).
  25. Shmakov SA, Utkina I, Wolf YI, Makarova KS, Severinov KV, Koonin EV, CRISPR Arrays Away from cas Genes. CRISPR J. 3, 535–549 (2020).
  26. Kannan S, Altae-Tran H, Jin X, Madigan VJ, Oshiro R, Makarova KS, Koonin EV, Zhang F, Compact RNA editors with small Cas13 proteins. Nat. Biotechnol 40, 194–197 (2022).
  27. Altae-Tran H, Kannan S, Demircioglu FE, Oshiro R, Nety SP, McKay LJ, Dlakić M, Inskeep WP, Makarova KS, Macrae RK, Koonin EV, Zhang F, The widespread IS200/IS605 transposon family encodes diverse programmable RNA-guided endonucleases. Science 374, 57–65 (2021).
  28. Karvelis T, Druteika G, Bigelyte G, Budre K, Zedaveinyte R, Silanskas A, Kazlauskas D, Venclovas Č, Siksnys V, Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease. Nature 599, 692–696 (2021).
  29. Weinberg Z, Lünse CE, Corbino KA, Ames TD, Nelson JW, Roth A, Perkins KR, Sherlock ME, Breaker RR, Detection of 224 candidate structured RNAs by comparative analysis of specific subsets of intergenic regions. Nucleic Acids Res. 45, 10811–10823 (2017).
  30. Xu C, Zhou Y, Xiao Q, He B, Geng G, Wang Z, Cao B, Dong X, Bai W, Wang Y, Wang X, Zhou D, Yuan T, Huo X, Lai J, Yang H, Programmable RNA editing with compact CRISPR–Cas13 systems from uncultivated microbes. Nat. Methods 18, 499–506 (2021).
  31. Hu Y, Chen Y, Xu J, Wang X, Luo S, Mao B, Zhou Q, Li W, Metagenomic discovery of novel CRISPR-Cas13 systems. Cell Discov. 8, 1–4 (2022).
  32. Smargon AA, Cox DBT, Pyzocha NK, Zheng K, Slaymaker IM, Gootenberg JS, Abudayyeh OA, Essletzbichler P, Shmakov S, Makarova KS, Koonin EV, Zhang F, Cas13b Is a Type VI-B CRISPR-Associated RNA-Guided RNase Differentially Regulated by Accessory Proteins Csx27 and Csx28. Mol. Cell 65, 618–630.e7 (2017).
  33. Meeske AJ, Nakandakari-Higa S, Marraffini LA, Cas13-induced cellular dormancy prevents the rise of CRISPR-resistant bacteriophage. Nature 570, 241–245 (2019).
  34. East-Seletsky A, O’Connell MR, Burstein D, Knott GJ, Doudna JA, RNA Targeting by Functionally Orthogonal Type VI-A CRISPR-Cas Enzymes. Mol. Cell 66, 373–383.e3 (2017).
  35. Kuo H-C, Prupes J, Chou C-W, Finkelstein IJ, Massively parallel profiling of RNA-targeting CRISPR-Cas13d. Nat. Commun 15, 498 (2024).
  • 36.Chen JS, Ma E, Harrington LB, Da Costa M, Tian X, Palefsky JM, Doudna JA, CRISPR-Cas12a target binding unleashes indiscriminate single-stranded DNase activity. Science 360, 436–439 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Feng W, Zhang H, Le XC, Signal Amplification by the trans-Cleavage Activity of CRISPR-Cas Systems: Kinetics and Performance. Anal. Chem 95, 206–217 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Zhang B, Ye Y, Ye W, Perčulija V, Jiang H, Chen Y, Li Y, Chen J, Lin J, Wang S, Chen Q, Han Y-S, Ouyang S, Two HEPN domains dictate CRISPR RNA maturation and target cleavage in Cas13d. Nat. Commun 10, 2544 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Pausch P, Al-Shayeb B, Bisom-Rapp E, Tsuchida CA, Li Z, Cress BF, Knott GJ, Jacobsen SE, Banfield JF, Doudna JA, CRISPR-CasΦ from huge phages is a hypercompact genome editor. Science 369, 333–337 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Nety SP, Altae-Tran H, Kannan S, Demircioglu FE, Faure G, Hirano S, Mears K, Zhang Y, Macrae RK, Zhang F, The Transposon-Encoded Protein TnpB Processes Its Own mRNA into ωRNA for Guided Nuclease Activity. CRISPR J. 6, 232–242 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Nomburg J, Price N, Doudna JA, Birth of new protein folds and functions in the virome. bioRxiv [Preprint] (2024). 10.1101/2024.01.22.576744. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Bellieny D, Doudna-lab/snakedali: snakedali database s3 bucket update, version v1.3, Zenodo (2024); 10.5281/zenodo.11495555. [DOI] [Google Scholar]
  • 43.Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M, ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Meng EC, Goddard TD, Pettersen EF, Couch GS, Pearson ZJ, Morris JH, Ferrin TE, UCSF ChimeraX: Tools for structure building and analysis. Protein Sci. 32, e4792 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.HMMER. http://hmmer.org/. [Google Scholar]
  • 47.Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, Connor R, Funk K, Kelly C, Kim S, Madej T, Marchler-Bauer A, Lanczycki C, Lathrop S, Lu Z, Thibaud-Nissen F, Murphy T, Phan L, Skripchenko Y, Tse T, Wang J, Williams R, Trawick BW, Pruitt KD, Sherry ST, Database resources of the national center for biotechnology information. Nucleic Acids Res. 50, D20–D26 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, Hugenholtz P, CRISPR Recognition Tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics 8, 209 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Notredame C, Higgins DG, Heringa J, T-coffee: a novel method for fast and accurate multiple sequence alignment1. J. Mol. Biol 302, 205–217 (2000). [DOI] [PubMed] [Google Scholar]
  • 50.Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ, IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol. Biol. Evol 32, 268–274 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.B. Q. Minh, M. A. T. Nguyen, A. von Haeseler, Ultrafast Approximation for Phylogenetic Bootstrap. Mol. Biol. Evol 30, 1188–1195 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Shedlovskiy D, Shcherbik N, Pestov DG, One-step hot formamide extraction of RNA from Saccharomyces cerevisiae. RNA Biol. 14, 1722–1726 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Bushnell B, Rood J, Singer E, BBMerge – Accurate paired shotgun read merging via overlap. PLOS ONE 12, e0185056 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Chen S, Zhou Y, Chen Y, Gu J, fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Yoon et al Supplementary Material
Yoon et al Supplementary Data 1-3
Yoon et al Supplementary Tables 1-12

Data Availability Statement

All sequencing data underlying this article are publicly available on NCBI SRA under BioProject accession PRJNA1128359. Alignments and phylogenetic trees generated in the study are found in supplementary data S1. Sequences of all synthetic oligos and biological constructs generated in this study are provided in supplementary data S2 and S3, and all constructs are available upon request. All scripts used to perform the analyses and relevant databases are publicly available through Zenodo (42).
