Author / Jacob Corn

Looking at genome-wide guide libraries

Jacob Corn


Genome-wide sgRNA libraries for knockout screening are all the rage, and researchers have very generously made them available on Addgene. But which one should you choose for your project? Data is always better than guesswork, and I've started looking at what's out there in a relatively systematic way. Note that this is a work in progress, so results will probably change a bit, but the general gist should stay similar. Also, the details of guide scoring are a controversial topic, so for now let's stick to big-picture themes everyone can probably agree upon.

Below is a smoothed distribution of scores (lower is better, scale is arbitrary), based on several metrics for three different guide libraries related to two papers (score on the x-axis and frequency of that score on the y-axis). Why three libraries for two papers? Turns out one Addgene-deposited library isn't the same as what's tested in the paper. Thankfully the authors were very transparent and deposited both sets of sequences into Addgene. Kudos to them!

The first thing to notice is that most guides score below 50. That's good! It means those guides have only been penalized for relatively trivial things. Hence, the libraries are pretty good and most guides should be functional (a no-brainer, given the awesome papers on these libraries). The second thing to notice is that some libraries have a peak around a score of 50. That's bad, because 50 is the penalty added to guides containing Pol III terminator sequences. These are likely to be useless guides, since they won't even be fully transcribed, but at least they should be silent in the library. The third thing to notice is the set of jagged peaks above a score of ~100. These guides are relatively rare in the libraries, but are potentially scary. Guides can only get a score this high if they have likely off-target sequences that occur within coding regions. In fact, guides get +100 for each off-target coding region. So the saw-tooth pattern above 200 represents guides in these libraries that could potentially knock out more than one gene other than the intended target. Hence, when using these libraries it's very important that your phenotype of interest occur with more than one guide (as stressed in the papers). Trusting just one guide could lead you very far astray due to off-target effects.
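To make the score bands concrete, here is a minimal sketch of how additive penalties like these would stack. Only the two penalty values (50 for a Pol III terminator, 100 per off-target coding region) come from the description above. The function names, the `minor_penalty` input, and the assumption that minor penalties always sum to less than 50 are hypothetical, and this is not the scoring code behind the plot.

```python
# Penalty values taken from the text; everything else is illustrative.
POL3_TERMINATOR_PENALTY = 50
OFF_TARGET_CODING_PENALTY = 100  # applied once per off-target coding region


def guide_score(minor_penalty, has_pol3_terminator, n_off_target_coding):
    """Hypothetical additive score; lower is better."""
    score = minor_penalty
    if has_pol3_terminator:
        score += POL3_TERMINATOR_PENALTY
    score += OFF_TARGET_CODING_PENALTY * n_off_target_coding
    return score


def describe(score):
    """Read a score back into a band, assuming minor penalties sum to < 50."""
    if score >= 200:
        return "two or more off-target coding regions"
    if score >= 100:
        return "one off-target coding region"
    if score >= 50:
        return "Pol III terminator"
    return "minor penalties only"


if __name__ == "__main__":
    for args in [(10, False, 0), (10, True, 0), (10, False, 1), (10, True, 2)]:
        s = guide_score(*args)
        print(args, s, describe(s))
```

Under these assumptions, each additional off-target coding region shifts a guide up by another 100, which is one way a saw-tooth pattern at high scores could arise.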

The above isn't meant to dissuade anyone from using these libraries. They're an incredible and unprecedented resource and the originators have done the community a huge service by making them available for a nominal fee on Addgene. But I think all seasoned scientists know that it's dangerous to treat things as a black box, and that looking under the hood never hurts. I'm sure library improvements will be a hot topic for some time to come and that can only be a good thing for people who want to use these for new biology.


When it rains it Cascades

Jacob Corn


This week saw not one, but two papers with structures of the E. coli Cascade complex from the labs of Yanli Wang (Nature) and Scott Bailey (Science). Cascade is a bit like Cas9, in that it's a bacterial immunity endonuclease targeted via CRISPR nucleic acid, but far more complex. While Cas9 is a single protein (and hence attractive for genome engineering), Cascade is 405 kDa split over 11 separate polypeptides and 5 open reading frames. In both structures, the crRNA is stretched out across the entire complex. The structure from Bailey's group also has ssDNA bound, and while it generally follows the path of the crRNA, kinking and base flipping allow the pairing to severely underwind into a ribbon. As is often the case in a large, complex structure like this, there are all kinds of exciting bits to poke into and look at to explain existing biochemical data. I'm looking forward to carefully reading both papers and playing with the structures when they're released. Kudos to both groups!

On a side note, these are huge ~3 Å structures that also contain nucleic acid, yet both are refined to levels that would have been unthinkable just a few years ago: R/Rfrees of 22.5/29.9 and 20.7/16.4! Of course there's more to structure quality (and a structure) than the R-stats. But still, it's astounding.

How to make a guide RNA for a Cas9 knockout

Jacob Corn


Guide RNA lore is split across multiple papers, people, and places, and I'm frequently asked about the "best" way to make a guide RNA for Cas9. The following is the state of the art as I understand it, as of today (8/11/14), split into several steps. The steps below assume you want to use Streptococcus pyogenes Cas9 to cut a gene to introduce an insertion/deletion ("indel") to make a knockout (the simplest use case) using a double-strand cut (wild type Cas9). The process may differ if you want to (for example) use CRISPRi to inhibit transcription. I've used * to mark steps that would be at least somewhat altered for other applications or if you're using less common parts (e.g. Cas9 from another species, different guide RNA promoter, etc).

Before you start

  1. Decide what kind of targeting you want to do. Here we're considering double-stranded cutting to make a knockout via introduction of an indel.
  2. Decide which Cas9 you'll use. Here, we'll assume you're using Streptococcus pyogenes (aka "Spy"). This choice would affect the protospacer adjacent motif ("PAM") you'll look for.*
  3. Get the genomic sequence you want to target from NCBI Gene or elsewhere (e.g. if you're targeting an intergenic region).*
  4. For knockouts, you generally want to introduce an indel as close to the 5' end of the coding region as possible. This will have the highest likelihood of creating a protein-destroying frameshift.*

Finding Guides

  1. Use one of many servers to find guide RNAs in the region you'd like to cut. For example, CRISPR-MIT, E-CRISP, or CHOPCHOP. Which tool you choose is mostly personal preference, and each has its own model for scoring guide RNAs.
  2. Each target site will either be ~23 bases ending in "GG" (guide binds Crick strand) or ~23 bases starting in "CC" (guide binds Watson strand). The protospacer adjacent motif ("PAM") refers to those last or first three bases and is present in the DNA you're targeting but should not be used in the guide RNA. Hence, the guide RNA itself will be ~20 bases and lack a PAM. SpyCas9 can also use "AG" (Watson "CT") as a PAM, but not as efficiently.
  3. The exact length of the guide doesn't seem to matter very much; anything from 17-27 bases (remember, guides don't include the PAM) seems basically OK (with some qualifiers).
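If you'd like to see what those servers are doing at the most basic level, here is a rough sketch of the site search described above: slide a 23-base window along the sequence and keep windows that end in "GG" or start with "CC". The function names, the "+"/"-" strand labels, and the default 20-base guide length are my own illustrative choices. It only handles NGG PAMs, does no scoring, and is not the code behind any of the tools named above.

```python
def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_target_sites(dna, guide_len=20):
    """Return (start, strand, protospacer, pam) tuples for NGG PAM sites.

    start is the 0-based index of the window on the sequence as given.
    "+" means the window ends in GG on the given strand; "-" means the
    window starts with CC, so the site is read off the reverse complement.
    """
    dna = dna.upper()
    site_len = guide_len + 3  # guide plus the 3-base PAM
    sites = []
    for i in range(len(dna) - site_len + 1):
        window = dna[i:i + site_len]
        if window.endswith("GG"):
            sites.append((i, "+", window[:guide_len], window[guide_len:]))
        if window.startswith("CC"):
            rc = reverse_complement(window)
            sites.append((i, "-", rc[:guide_len], rc[guide_len:]))
    return sites


if __name__ == "__main__":
    example = "TTCCAGATCGATCGGCTAGCTAGGATCCGATCGATTGGAC"  # made-up sequence
    for site in find_target_sites(example):
        print(site)
```

The protospacer returned here is the DNA sequence without the PAM; the guide RNA is the same sequence with U in place of T.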

Choosing a guide

Now you have a (possibly very long) list of potential guides. Each one has an associated score. How do you choose which one to use? Here's a semi-ordered list of factors to consider, from most to least important. Treat these as qualitative guidelines rather than a quantitative score.

  1. For a knockout, it doesn't matter which strand the guide RNA binds, but CRISPRi guides should be complementary to the non-coding strand.*
  2. Guides should be perfectly complementary to the region you want to target in the 8-12 bases closest to the PAM.
  3. Never choose a guide that has any significant off-target sites (perfect match for the 8-12 bases closest to the PAM) in a coding region of the genome.
  4. Never use a guide with >=3 U's in a row, since these sequences act as Pol III terminators. This is obviously not applicable if you are using a Pol II vector instead of the common U6 promoter vectors.*
  5. Prefer guides with a PAM of NGG instead of NAG.*
  6. Avoid sequences with significant secondary structure (The Vienna web server is a great place to check this). You should also avoid guides that disrupt the secondary structure of the 3' constant region (the most common constant region sequence is "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU").
  7. GC content should be between ~ 30-80%, the higher the better (but not too high!).
  8. Avoid additional G's after the PAM. For example, a genomic sequence of "|AGG|CCAT" is probably OK (where "AGG" is the PAM). But "|AGG|GCAT" is probably not a good idea, and "|AGG|GGGG" is a definite no-no.*
  9. Some groups have shown that U's are disfavored in the -1, -2, and -4 position (counting back from the first base of the constant region). Other groups haven't seen this. Your mileage may vary.
  10. Prefer guides in DNase hypersensitive regions (as annotated by ENCODE on the UCSC genome browser). This isn't a necessity, but probably won't hurt.
  11. It's recently been shown that microhomology at the site of cutting can substantially increase the chance of getting an out-of-frame indel. This doesn't affect cutting itself, but could help you get the knockout.
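A few of the sequence-only factors above (runs of U's, GC content, NGG vs NAG, and a G right after the PAM) are easy to check by script. The sketch below is a quick sanity filter under my own assumptions: the function names and the dictionary keys are hypothetical, the thresholds are the rough ones from the list, and it says nothing about off-targets, secondary structure, or chromatin.

```python
import re


def has_pol3_terminator(guide_rna):
    """True if the guide contains 3 or more U's in a row."""
    return re.search(r"U{3,}", guide_rna) is not None


def gc_fraction(guide):
    """Fraction of G and C bases in the guide."""
    return sum(base in "GC" for base in guide) / len(guide)


def quick_checks(guide_dna, pam, downstream_dna):
    """Run simple sequence checks on one candidate guide.

    guide_dna: protospacer without the PAM.
    pam: the 3 genomic bases next to the protospacer.
    downstream_dna: genomic bases immediately after the PAM.
    """
    guide_rna = guide_dna.upper().replace("T", "U")
    return {
        "no_pol3_terminator": not has_pol3_terminator(guide_rna),
        "gc_in_range": 0.30 <= gc_fraction(guide_rna) <= 0.80,
        "pam_is_ngg": pam.upper().endswith("GG"),
        "no_g_after_pam": not downstream_dna.upper().startswith("G"),
    }


if __name__ == "__main__":
    # Made-up sequences, using the "|AGG|CCAT" style of example from the list.
    print(quick_checks("GACGTCAGCTAGCATCGACG", "AGG", "CCAT"))
    print(quick_checks("GACGTTTGCTAGCATCGACG", "AAG", "GGGG"))
```

A guide that fails one of these isn't necessarily dead, but with a long candidate list it's an easy way to thin the herd before looking at the harder-to-compute factors.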

Construction of the final guide

  1. Take the guide sequence you chose above and append the constant sequence "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU" to the 3' end.
  2. If your guide does not begin with a 5' G, just add one. This increases efficiency of transcription from the U6 promoter and does not need to be homologous to the region you're targeting.*
  3. Add cloning sites appropriate for the expression vector you've chosen. For example, if using the Zhang lab's pX330 vector, append "CACC" to the 5' end of the Watson strand and "AAAC" to the 5' end of the Crick strand.*
  4. Order oligos, anneal, and ligate.
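The construction steps above can be sketched in a few lines. The constant region and the "CACC"/"AAAC" overhangs are taken from the list; the function names are hypothetical. One assumption to flag loudly: the oligo pair below contains only the guide portion, on the assumption that the expression vector already supplies the constant region. Check this against your own vector's map before ordering anything.

```python
# Constant region as given in the text (RNA alphabet).
CONSTANT_REGION = (
    "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU"
)


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def add_5prime_g(guide_dna):
    """Prepend a G if the guide does not already start with one."""
    guide_dna = guide_dna.upper()
    return guide_dna if guide_dna.startswith("G") else "G" + guide_dna


def full_sgrna(guide_dna):
    """Guide (with 5' G) plus the constant region, written as RNA."""
    return add_5prime_g(guide_dna).replace("T", "U") + CONSTANT_REGION


def cloning_oligos(guide_dna):
    """Return (watson, crick) oligos with CACC / AAAC 5' overhangs.

    Assumes the vector supplies the constant region, so only the guide
    itself is encoded in the oligos.
    """
    insert = add_5prime_g(guide_dna)
    watson = "CACC" + insert
    crick = "AAAC" + reverse_complement(insert)
    return watson, crick


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"  # made-up 20-base guide
    print(full_sgrna(guide))
    print(cloning_oligos(guide))
```

Both oligos are written 5' to 3', so they can be pasted straight into an order form once you've confirmed the overhangs match your vector.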

The above might seem like a lot, but it's really not all that bad. You'll quickly get a feel for what makes good vs bad guides. Since it's so easy to test multiple guides, I always recommend making at least two guides per knockout you'd like to make. That way if one is a dud, you aren't caught flat-footed. Obviously, there are many *s in the list above, denoting steps that might be a bit different if your application or parts differ from SpyCas9 making a double-strand break for the purposes of a knockout. The toolbox is always expanding, so options abound! But hopefully the above provides a general idea of how to get started. Do you have another neat trick to share? Did I miss something important? Want to expound on the best way to make a CRISPRi guide (a whole other ball of wax)? Feel free to leave a comment!

Introducing the IGI Blog

Jacob Corn


Welcome to Making the Cut, the IGI blog about next-gen genome editing and regulation. The main page is for official announcements, news, job postings, that kind of thing. This area is for personal thoughts on what's happening in this incredible, crazy field, as well as where it should go in the future, where it's been in the past, and so on. Posts will be from me (Jacob Corn, scientific director of the IGI), as well as from scientists, postdocs, and students in the lab. To encourage openness and discourage doublespeak, all posts are the uncensored views of the author. Comments are open as well, so please be civil. (I know, asking people on the internet to be civil is like carrying coals to Newcastle, but still...)



March 12, 2020 0 Comments

Welcome to Lena

Welcome to Lena Kobel, who joins the lab as a Cell Line Engineer. Lena has a long history in genome engineering, with previous experience in Martin Jinek’s...

October 16, 2018 2 Comments

Bootstrapping a lab

Today I’m going to talk about setting up a lab from a 10,000 foot view. I got thinking about this because my social media feed was recently filled with people announcing...

June 12, 2017 1 Comment

Shapers and Mechanists

There’s a series of cyberpunk short stories and a book written in the 1980s by Bruce Sterling called The Schismatrix. It centers around two major offshoots...

June 1, 2017 1 Comment

Backpacking season

It’s important to spend time outside the lab. And before you ask, that’s not why the blog has been dormant. I was teaching this last semester (a general biochemistry...

November 9, 2016 0 Comments

Sequence replacement to cure sickle cell disease

My lab recently published a paper, together with outstanding co-corresponding authors David Martin (CHORI) and Dana Carroll (University of Utah), in which...

September 12, 2016 1 Comment

Improved knockout with Cas9

Cas9 is usually pretty good at gene knockout. Except when it isn’t. Most people who have gotten their feet wet with gene editing have had an experience like that...

August 29, 2016 0 Comments

Safety for CRISPR

This post is all about establishing safety for CRISPR gene editing cures for human disease. Note that I did not say this post is about gene editing off-targets....

July 5, 2016 0 Comments

CAR-Ts and first-in-human CRISPR

(This post has been sitting in my outbox for a bit thanks to some exciting developments in the lab, so excuse any “dated” references that are off...

May 25, 2016 0 Comments

CRISPR Challenges – Imaging

This post is the first in a new, ongoing series: what are big challenges for CRISPR-based technologies, what progress have we made so far, and what might we look...

May 17, 2016 0 Comments

Ideas for better pre-prints

A few weeks ago, Jacob wrote a blog post about his recent experience with posting pre-prints to bioRxiv. His verdict? “…preprints are still an experiment rather...
