How to make a guide RNA for a Cas9 knockout

Guide RNA lore is split across multiple papers, people, and places, and I’m frequently asked about the “best” way to make a guide RNA for Cas9. The following is the state of the art as I understand it, as of today (8/11/14), split into several steps. The steps below assume you want to use Streptococcus pyogenes Cas9 to cut a gene to introduce an insertion/deletion (“indel”) to make a knockout (the simplest use case) using a double-strand cut (wild type Cas9). The process may differ if you want to (for example) use CRISPRi to inhibit transcription. I’ve used * to mark steps that would be at least somewhat altered for other applications or if you’re using less common parts (e.g. Cas9 from another species, different guide RNA promoter, etc).

Before you start

  1. Decide what kind of targeting you want to do. Here we’re considering double-stranded cutting to make a knockout via introduction of an indel.
  2. Decide which Cas9 you’ll use. Here, we’ll assume you’re using Cas9 from Streptococcus pyogenes (aka “Spy”). This choice would affect the protospacer adjacent motif (“PAM”) you’ll look for.*
  3. Get the genomic sequence you want to target from NCBI Gene or elsewhere (e.g. if you’re targeting an intergenic region).*
  4. For knockouts, you generally want to introduce an indel as close to the 5′ end of the coding region as possible. This will have the highest likelihood of creating a protein-destroying frameshift.*

Finding Guides

  1. Use one of many servers to find guide RNAs in the region you’d like to cut. For example, CRISPR-MIT, E-CRISP, or CHOPCHOP. Which tool you choose is mostly personal preference, and each has its own model for scoring guide RNAs.
  2. Each target site will either be ~23 bases ending in “GG” (guide binds Crick strand) or ~23 bases starting in “CC” (guide binds Watson strand). The protospacer adjacent motif (“PAM”) refers to those last or first three bases and is present in the DNA you’re targeting but should not be used in the guide RNA. Hence, the guide RNA itself will be ~20 bases and lack a PAM. SpyCas9 can also use “AG” (Watson “CT”) as a PAM, but not as efficiently.
  3. The exact length of the guide doesn’t seem to matter very much; anything from 17-27 bases (remember, guides don’t include the PAM) seems basically OK (with some qualifiers).
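
To make the geometry in step 2 concrete, here is a minimal Python sketch that scans both strands of a DNA string for 20-base protospacers followed by an NGG PAM. It is only an illustration, not a substitute for the design servers above: it does no scoring or off-target checking, and the function names (`reverse_complement`, `find_targets`) and the fixed 20-base length are assumptions made for this example.

```python
import re

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_targets(dna, guide_len=20):
    """List (strand, start, protospacer, pam) for every NGG site on both strands.

    'start' is the 0-based position, on the input strand, of the leftmost base
    of the full site (protospacer plus PAM).
    """
    dna = dna.upper()
    site_len = guide_len + 3
    # Lookahead so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            start = match.start()
            if strand == "-":
                # Convert back to coordinates on the input strand.
                start = len(dna) - (match.start() + site_len)
            hits.append((strand, start, match.group(1), match.group(2)))
    return hits

if __name__ == "__main__":
    example = "TTTT" + "GATTACAGATTACAGATTAC" + "TGG" + "AAAA"
    for hit in find_targets(example):
        print(hit)
```

A site reported on the "-" strand appears in the input sequence as “CC” followed by the reverse complement of the protospacer, matching the “starting in CC” case described above. In both cases the protospacer is returned without its PAM, since the PAM is not part of the guide.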

Choosing a guide

Now you have a (possibly very long) list of potential guides. Each one has an associated score. How do you choose which one to use? Here’s a semi-ordered list of factors to consider, from most to least important. Treat these as qualitative guidelines rather than a quantitative score.

  1. For a knockout, it doesn’t matter which strand the guide RNA binds, but CRISPRi guides should be complementary to the non-coding strand.*
  2. Guides should be perfectly complementary to the region you want to target in the 8-12 bases closest to the PAM.
  3. Never choose a guide that has any significant off-target sites (perfect match for the 8-12 bases closest to the PAM) in a coding region of the genome.
  4. Never use a guide with >=3 U’s in a row, since these sequences act as Pol III terminators. This is obviously not applicable if you are using a Pol II vector instead of the common U6 promoter vectors.*
  5. Prefer guides with a PAM of NGG instead of NAG.*
  6. Avoid sequences with significant secondary structure (the Vienna web server is a great place to check this). You should also avoid guides that disrupt the secondary structure of the 3′ constant region (the most common constant region sequence is "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU").
  7. GC content should be between ~30-80%, the higher the better (but not too high!).
  8. Avoid additional G’s after the PAM. For example, a genomic sequence of “|AGG|CCAT” is probably OK (where “AGG” is the PAM). But “|AGG|GCAT” is probably not a good idea, and “|AGG|GGGG” is a definite no-no.*
  9. Some groups have shown that U’s are disfavored in the -1, -2, and -4 position (counting back from the first base of the constant region). Other groups haven’t seen this. Your mileage may vary.
  10. Prefer guides in DNase hypersensitive regions (as annotated by ENCODE on the UCSC genome browser). This isn’t a necessity, but probably won’t hurt.
  11. It’s recently been shown that microhomology at the site of cutting can substantially increase the chance of getting an out-of-frame indel. This doesn’t affect cutting itself, but could help you get the knockout.
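
A few of the sequence-only rules above (4, 5, 7, and 8) are easy to check mechanically. The sketch below is one hedged way to do that in Python. The function names, the flag labels, and the exact cutoffs (30% and 80% GC, any G immediately after the PAM) are assumptions chosen to mirror the list, not validated thresholds, and the remaining rules (off-targets, secondary structure, chromatin, microhomology) still need other tools.

```python
def gc_fraction(guide):
    """Fraction of G and C bases in the guide."""
    guide = guide.upper()
    return (guide.count("G") + guide.count("C")) / len(guide)

def leading_g_count(downstream):
    """Number of consecutive G's at the start of the sequence just after the PAM."""
    count = 0
    for base in downstream.upper():
        if base != "G":
            break
        count += 1
    return count

def flag_guide(guide, pam, downstream, min_gc=0.30, max_gc=0.80):
    """Return a list of warning labels for one candidate guide.

    guide      -- protospacer without the PAM (DNA or RNA letters)
    pam        -- the three genomic bases following the protospacer
    downstream -- genomic bases immediately after the PAM
    """
    guide = guide.upper().replace("U", "T")
    flags = []
    if "TTT" in guide:
        # Rule 4: three or more U's in a row (Pol III promoters only).
        flags.append("pol3_terminator")
    if pam.upper()[1:] == "AG":
        # Rule 5: NAG works, but NGG is preferred.
        flags.append("nag_pam")
    if not (min_gc <= gc_fraction(guide) <= max_gc):
        # Rule 7: GC content outside the rough 30-80% window.
        flags.append("gc_out_of_range")
    if leading_g_count(downstream) >= 1:
        # Rule 8: extra G's directly after the PAM.
        flags.append("g_after_pam")
    return flags

if __name__ == "__main__":
    print(flag_guide("GACGTCAGCTAGCATCGATC", "AGG", "CCAT"))
    print(flag_guide("GACGTTTGCTAGCATCGATC", "AGG", "GGGG"))
```

An empty list means none of these four checks fired. It does not mean the guide is good, only that it has not tripped the simplest rules.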

Construction of the final guide

  1. Take the guide sequence you chose above and append the constant sequence “GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU” to the 3′ end.
  2. If your guide does not begin with a 5′ G, just add one. This increases efficiency of transcription from the U6 promoter and does not need to be homologous to the region you’re targeting.*
  3. Add cloning sites appropriate for the expression vector you’ve chosen. For example, if using the Zhang lab’s pX330 vector, append “CACC” to the 5′ end of the Watson strand and “AAAC” to the 5′ end of the Crick strand.*
  4. Order oligos, anneal, and ligate.
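
The construction steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the helper names are made up for the example, the overhangs are the “CACC”/“AAAC” pair given above for pX330, and the oligos contain only the guide (plus any added 5′ G) on the assumption that the vector already carries the constant region. Check the cloning protocol for your own vector before ordering anything.

```python
# Constant region as quoted in the text (RNA letters).
CONSTANT_REGION = (
    "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU"
)

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def with_5prime_g(guide):
    """Prepend a G if the guide does not already start with one (U6 promoter)."""
    guide = guide.upper()
    return guide if guide.startswith("G") else "G" + guide

def cloning_oligos(guide, top_overhang="CACC", bottom_overhang="AAAC"):
    """Return (top, bottom) DNA oligos, both written 5' to 3'."""
    insert = with_5prime_g(guide)
    top = top_overhang + insert
    bottom = bottom_overhang + reverse_complement(insert)
    return top, bottom

def sgrna_transcript(guide):
    """Expected RNA: guide (with 5' G) followed by the constant region."""
    return with_5prime_g(guide).replace("T", "U") + CONSTANT_REGION

if __name__ == "__main__":
    top, bottom = cloning_oligos("ACGTCAGCTAGCATCGATCA")
    print("top:   ", top)
    print("bottom:", bottom)
    print("sgRNA: ", sgrna_transcript("ACGTCAGCTAGCATCGATCA"))
```

The guide passed in is the DNA protospacer without the PAM. When the two oligos anneal, the four overhang bases on each 5′ end stay single-stranded for ligation into the cut vector.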

The above might seem like a lot, but it’s really not all that bad. You’ll quickly get a feel for what makes good vs bad guides. Since it’s so easy to test multiple guides, I always recommend making at least two guides per knockout you’d like to make. That way if one is a dud, you aren’t caught flat-footed. Obviously, there are many *s in the list above, denoting steps that might be a bit different if your application or parts differ from SpyCas9 making a double-strand break for the purposes of a knockout. The toolbox is always expanding, so options abound! But hopefully the above provides a general idea of how to get started. Do you have another neat trick to share? Did I miss something important? Want to expound on the best way to make a CRISPRi guide (a whole other ball of wax)? Feel free to leave a comment!

Jacob Corn

Jacob Corn is the Professor of Genome Biology at ETH Zürich. Follow him on twitter @jcornlab.

COMMENTS

  1. Zhe Zhang

    Dr. Meyer’s paper (doi: 10.1534/genetics.115.175166) from April 2015 shows that guide RNAs with a GG motif at the 3’ end of the target-specific sequence achieve higher genome editing frequency, which is somewhat similar to your guideline “Choosing a guide step 8”.

  2. Jacob Corn Post author

    Zhe: True, but in human cells the main effect seems to be related to the post-PAM composition. For some reason, in human cells, guides without a 3’GG in the protospacer aren’t generally less active than those that do have a 3’GG. Something of a mystery, since Barbara’s data in worms are very convincing. Might be related to subtle differences in repair pathways in each organism.

  3. Zhe Zhang

    Agreed, the CRISPR/Cas9 system is already mature as a technology or an engineering tool, but there are still many mysteries left as a science. Some in vivo data we obtained by modifying either the Cas9 or the sgRNA expression vector do not match the in vitro data of our collaborators. It has also been shown that meiotically developing oocytes in C. elegans exhibit a bias in repair pathways towards HR rather than NHEJ (DOI: 10.1371/journal.pgen.1003276), which may explain the lower efficiency of precise genome knock-in in human cell lines.

  4. Pingback: Giz Explains: Everything You Need To Know About CRISPR, The New Tool That Edits DNA | Gizmodo Australia

  5. Pingback: Tudo o que você precisa saber sobre a CRISPR, nova ferramenta de edição de DNA - Boa Informação

  6. Riley Doyle

    Jake,

    Great summary, especially the advice about considering the vector as a whole, not just the 20nt spacer sequence and always trying multiple guides in parallel.

    I would propose adding that one should 1) trust the “experimental reality” of his or her target cell line’s genome and target gene over public databases and 2) plan out your controls and the downstream selection and troubleshooting strategy in advance. For example, guides that exist in online tools and databases may have (very) different scores or not even occur in the target cell line due to SNPs or copy number variations. Further, the dominant splice variant(s) in the cell line may differ from the “consensus”.

    This has an impact on screening. Ask: “if my target gene had an indel, how would I know? What if I only modify one allele? What’s my detection threshold?” Often this means finding clever ways of resolving the newly truncated protein and/or transcript “signal” from the unmodified wild type “noise”. Some advanced planning in the design stage can prevent downstream analysis from becoming a bottleneck.

  7. Jacob Corn Post author

    Great points, Riley. We typically sequence the target locus from our system of interest. Sometimes this has shown big changes from the reference sequence, including heterozygous or homozygous SNPs that add/remove PAMs and even insertions and deletions. Biology is rarely as clear-cut as database sequences!

  8. Pingback: How to make a guide RNA for a CAS9 Knockout | Genetic Meme

  9. Pingback: CRISPR: Accelerating the pace of molecular biology | DownHouseSoftware

  10. Brian

    Thanks for the great article!

    I am a bit confused about #8, could you provide any references for this rule? I am concerned with the “why” of this rule. Does it decrease efficiency of introducing indels or does it increase off target activity?

    Thanks again!

    1. Jacob Corn Post author

      Seems to reduce efficiency in some contexts, mainly human cells. Look for papers on this coming out soon. Although, paradoxically, work from Barbara Meyer’s lab suggests that 3′-GGs actually help.

  11. Jana

    I am a bit confused about “Choosing a guide” #4.
    You say that a guide should not contain 3 or more Ts (that is Us).
    But as far as I know the shortest efficient terminator sequence for mammalian Pol III described in the literature is T4. (Sequence context effects on oligo(dT) termination signal recognition by Saccharomyces cerevisiae RNA polymerase III, Braglia P. et al., J Biol Chem. 2005, doi: 10.1074/jbc.M412238200)

    So why not three Ts? They shouldn’t act as a terminator, should they?
    Do you have references for T3 as a terminator?

    1. Jacob Corn Post author

      These rules somewhat conflate different uses of guides and could probably be updated – I wrote them when I was thinking a lot about guide RNA libraries. T4 is a strong terminator for 1-by-1 RNAs, with T3 far weaker. But T3 is still strong enough to bias guide distributions independent of the targeted gene. Check out the sequence preferences apparent in the various genome-wide CRISPR libraries (polyU is explicitly shown in the supplemental of the CRISPRi/a genome wide library paper from Jonathan Weissman’s lab).

  12. Nikita

    Hi Jacob. A really helpful article. I am just starting to plan my first CRISPR experiment and I had some basic questions. If I want to introduce a mutation, is it necessary for the guide to be complementary to the target sequence (the one I aim to mutate)? I have seen some papers where the target sequence is later (from 5′-3′) than the guide and PAM. Also, what is the efficiency (in terms of binding and off-target effects) of the SpCas9 EQR variant that uses NGAG as PAM?

    1. Jacob Corn Post author

      The guide can be complementary to either strand, but should be designed to be as close as possible to the mutation you want to make. Check out our recent paper in NBT for some other ideas about designing single stranded DNA donors for HDR. We’ve found that ssODNs are not always active in every organism or cell type, but the ideas in the paper can help a lot if your cells will use ssODNs.

  13. Claudia

    Great guidelines. I have designed some gRNAs that I will order soon.
    I used a sequence of a lentiviral vector (sold by a company) and using SnapGene, I inserted my gRNA sequence in the restriction site (PshAI) in this vector. This site is located upstream of a sequence labeled sgRNA scaffold [in the vector], meaning that the final sequence will be: sgRNA-sgRNA scaffold.
    Do I have to append the constant sequence to the 3′-end of my sgRNA?
    Any help, I appreciate it.
    Many thanks.

