BMC bioinformatics

Not all predicted CRISPR-Cas systems are equal: isolated cas genes and classes of CRISPR like elements.

PMID 28166719


The CRISPR-Cas systems in prokaryotes are RNA-guided immune systems that target and deactivate foreign nucleic acids. A typical CRISPR-Cas system consists of a CRISPR array of repeat and spacer units, and a locus of cas genes. The CRISPR and the cas locus are often located next to each other in the genomes. However, there is no quantitative estimate of the co-location. In addition, ad-hoc studies have shown that some non-CRISPR genomic elements contain repeat-spacer-like structures and are mistaken as CRISPRs. Using available genome sequences, we observed that a significant number of genomes have isolated cas loci and/or CRISPRs. We found that 11%, 22% and 28% of the type I, II and III cas loci are isolated (without CRISPRs in the same genomes at all or with CRISPRs distant in the genomes), respectively. We identified a large number of genomic elements that superficially reassemble CRISPRs but don't contain diverse spacers and have no companion cas genes. We called these elements false-CRISPRs and further classified them into groups, including tandem repeats and Staphylococcus aureus repeat (STAR)-like elements. This is the first systematic study to collect and characterize false-CRISPR elements. We demonstrated that false-CRISPRs could be used to reduce the false annotation of CRISPRs, therefore showing them to be useful for improving the annotation of CRISPR-Cas systems.