Coding for the wet lab

Large datasets are becoming the norm in modern biology, but I still see wet-lab students and postdocs who are stymied by anything that can’t be accomplished in Excel. Should every grad student learn how to write code? Probably not. But those who know even the basics of programming will find that their lives are oh so much easier. Not only will they be able to automate analyses to turn a daunting (or impossible) task into an easy afternoon’s work, they’ll also be able to “speak the language”, which will give them a leg up in describing problems and brainstorming solutions with dedicated computational biologists.

There are some nice pieces about this, but let me provide a few examples that should be more concrete to those involved in genome editing.

  • Let’s say you’re doing a CRISPRi screen. You’ve done some sequencing, removed constant regions, and now you have a file containing a huge number of protospacers. These should all map to guides and associated genes, which are listed in another file with the format <gene> <guide sequence>. You can write a very simple program to map guide sequences to genes and count the number of times each one appears.
  • Or maybe you’ve just edited a gene and you’re interested in the distribution of indels around the cut site. It’s not difficult to record the frequency of each insertion and deletion from the millions of reads in a next-generation sequencing experiment.
  • Even simpler: You want to target a whole bunch of genes in a pathway, so you plan to make 50 guides targeting 25 distinct genomic regions. You can easily automate the design of all the primers needed for the T7E1 assays.
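
To make the indel example a bit more concrete, here is a minimal sketch of the tallying step. It assumes the reads have already been aligned to the reference amplicon by an aligner that reports SAM-style CIGAR strings (for example, `20M3D27M` for a read with a 3-base deletion). The function name `tally_indels` is hypothetical, and the sketch only counts indel lengths; it does not look at where each indel falls relative to the cut site.

```python
import re
from collections import Counter

# Matches one CIGAR operation, e.g. "48M", "3D", "1I".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def tally_indels(cigar_strings):
    """Count insertions and deletions by length across aligned reads.

    cigar_strings: any iterable of SAM-style CIGAR strings, one per read.
    Returns two Counters: insertion length -> count, deletion length -> count.
    """
    insertions = Counter()
    deletions = Counter()
    for cigar in cigar_strings:
        for length, op in CIGAR_OP.findall(cigar):
            if op == "I":
                insertions[int(length)] += 1
            elif op == "D":
                deletions[int(length)] += 1
    return insertions, deletions
```

Calling `tally_indels` on the CIGAR column of an alignment file gives two tables of lengths and counts that can be plotted directly as a histogram.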

Of course, you must also be aware of the necessity for good statistics. The ability to automate tasks is no replacement for a good computational biologist or biostatistician, who can analyze whether measurements are actually significant. This is a good example of knowing enough to get you into trouble; be aware of when to seek out an expert! But a few hours’ work on a program can save days of trying to process data with inappropriate tools (e.g. Excel).

Personally, I think Python is the way to go for most simple programming tasks. It’s reasonably fast, pretty easy to learn, deals well with text, has some good biology-oriented modules (e.g. Biopython), and has a growing user base (making it easy to find help). Python certainly isn’t the fastest language, but the focus here is on writing quick one-offs to get a job done rather than making an ultra-fast production tool. General programming knowledge is probably sufficient for most tasks, and here are a few resources to get started:

To illustrate how simple a program can be, here’s a script to accomplish the CRISPRi (or any guide library) guide-gene mapping/counting tasks described above. It’s obsessively documented, to try to show how this can be quite straightforward. It is also written for ease of understanding by a beginner rather than speed or conciseness. Writing this took about 20 minutes.
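
A minimal sketch of such a script might look like the following. It is an illustration under stated assumptions rather than the original script: it assumes a library file with one whitespace-separated `<gene> <guide sequence>` pair per line and a reads file with one protospacer per line, and it only counts exact matches. The function names and file names are hypothetical.

```python
from collections import Counter


def load_library(lines):
    """Build a dict mapping guide sequence -> gene from '<gene> <guide>' lines."""
    guide_to_gene = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gene, guide = fields
        # Store guides in upper case so matching ignores case.
        guide_to_gene[guide.upper()] = gene
    return guide_to_gene


def count_guides(protospacers, guide_to_gene):
    """Count exact matches to known guides; tally everything else as unmatched."""
    counts = Counter()
    unmatched = 0
    for line in protospacers:
        seq = line.strip().upper()
        if not seq:
            continue  # skip empty lines
        if seq in guide_to_gene:
            counts[seq] += 1
        else:
            unmatched += 1
    return counts, unmatched


def main(library_path, reads_path):
    """Print gene, guide, and count for every guide seen, most frequent first."""
    with open(library_path) as lib_file:
        guide_to_gene = load_library(lib_file)
    with open(reads_path) as reads_file:
        counts, unmatched = count_guides(reads_file, guide_to_gene)
    for guide, n in counts.most_common():
        print(guide_to_gene[guide], guide, n, sep="\t")
    print("unmatched reads:", unmatched)
```

To run it, call `main("library.txt", "protospacers.txt")` with your own file names. Reads containing sequencing errors will land in the unmatched tally, so a large unmatched count is a hint that exact matching is too strict for your data.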

 

Jacob Corn

Jacob Corn is the Scientific Director of the IGI and faculty at UC Berkeley. Follow him on twitter @jcornlab.
