Monday, April 6, 2009

Getting the most out of n=1

They say studies with n=1 are problematic -- larger numbers smooth out idiosyncrasies in the data, after all. Sometimes, though, it pays not to be average(d).

In a study published today in Nature Methods [doi:10.1038/nmeth.1315] M. Azim Surani of the University of Cambridge, UK, and Kaiqin Lao of Applied Biosystems, and colleagues report the transcriptomic analysis of a single cell -- specifically, one of four cells in a mouse embryo (as well as normal and mutant oocytes).

Using so-called next-generation DNA sequencing (with Applied Biosystems' SOLiD system) the team obtained about 110 million reads, or about 5 gigabases' worth of data, from a single cell (that's n=1, for those not statistically inclined). Each of those reads was a mere snippet of DNA, just 35 or 50 bases long. Each represents one messenger RNA, and from the resulting data, the team was able to map the expression characteristics of that one individual cell.
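As a quick sanity check on those numbers, here is a back-of-the-envelope sketch in Python. The read count and read lengths come from the paragraph above; the function and variable names are made up for illustration.

```python
def total_bases(n_reads, read_length):
    """Total sequenced bases for n_reads reads of one fixed length."""
    return n_reads * read_length

N_READS = 110_000_000  # "about 110 million reads"

low = total_bases(N_READS, 35)   # if every read were 35 bases long
high = total_bases(N_READS, 50)  # if every read were 50 bases long

print(f"{low / 1e9:.2f} to {high / 1e9:.2f} gigabases")
```

The reported figure of about 5 gigabases falls inside that range.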

Among the findings arising from this single blastomere:
  • 61.4% (11,920 / 19,400) of genes were expressed;
  • "about 335 genes (19% of all known genes with at least two known isoforms)" expressed more than one splicing isoform concurrently in the same cell;
  • at least 1,753 novel splice junctions were identified;
  • and, by comparing their analysis to microarray studies of 320 theoretically identical, pooled cells (80 4-cell mouse embryos), 5,270 more genes were expressed than could be detected by the array, including 1,027 that the array doesn't test.
  • Using wild-type and mutant oocytes, the team identified those genes that were up- or down-regulated as a result of those mutations -- again, at the single-cell level.
That's a lot of data for one cell -- and it would be completely overlooked by traditional biochemical strategies. Take that second bullet point, for instance. Without a single-cell approach, how could you tell whether multiple splice isoforms are present in the same cell? You can't -- batch-based analyses average the results. Both transcripts would be seen, but it wouldn't be possible to know whether they coexist in one cell or exist in different cells.
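To make the averaging problem concrete, here is a toy sketch in Python. The data are entirely hypothetical (two made-up isoforms, "A" and "B", across four cells) and have nothing to do with the paper's actual analysis; the point is only that two very different populations look identical once pooled.

```python
# Hypothetical toy data: the set of splice isoforms of one gene seen in each cell.
coexpressing = [{"A", "B"}, {"A", "B"}, {"A", "B"}, {"A", "B"}]
segregated = [{"A"}, {"B"}, {"A"}, {"B"}]

def pooled_profile(cells):
    """What a batch assay reports: every isoform present anywhere in the pool."""
    seen = set()
    for cell in cells:
        seen |= cell
    return seen

def cells_with_both(cells):
    """What single-cell data adds: how many cells carry more than one isoform."""
    return sum(1 for cell in cells if len(cell) > 1)

# The pooled view cannot tell the two populations apart...
print(pooled_profile(coexpressing) == pooled_profile(segregated))
# ...but the per-cell view can.
print(cells_with_both(coexpressing), cells_with_both(segregated))
```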

The fourth bullet puts an exclamation point, as it were, on the power of the study. It demonstrates both the sensitivity of digital RNA analysis (that is, its ability to detect low abundance transcripts) and the shortcoming of microarrays -- namely, that arrays can only detect what they were designed to find -- in one go.

I'm excited about the possibilities this study opens up. Here's one: from a single fertilized mouse egg, an entire body plan emerges. That body plan has anterior, posterior, dorsal, and ventral "sides," and it doesn't take long in development for those differences to become obvious. Wouldn't it be neat to study each cell at the two, four, eight, and sixteen cell stages, to see precisely when those changes, which initially are morphologically invisible, emerge?

Tuesday, March 10, 2009

Busted!, by DNA

For most people, DNA forensics conjures up some sort of CSI caper linking a suspect to the scene of the crime. But as a new paper demonstrates, there's more than one way to get busted by DNA.

In a March 9 paper in the Journal of the American Chemical Society [doi:10.1021/ja806531z], Hyongsok (Tom) Soh of the University of California, Santa Barbara, and colleagues describe using folding DNA molecules called aptamers to develop a continuous, real-time cocaine-monitoring system.

An aptamer is a short nucleic acid that can adopt one of two shapes (conformations) depending on the presence or absence of whatever molecule it is designed to bind. Many such molecules have been created, targeting a variety of mostly small molecules; in this case, the aptamer changes shape in the presence of cocaine. The aptamer is bound at one end to a gold electrode and at the other to an electron donor. The conformational change that accompanies cocaine binding brings the electron donor into close proximity with the electrode, producing a measurable current.

But it takes more than just a recognition event to make a good sensor. As the authors note, "a sensor needs not only be sensitive, stable, and selective enough to deploy directly in complex sample matrices, but it also needs to be reagentless, regenerable, and able to respond rapidly relative to the time scale with which the target concentration fluctuates."

The device described here, which Soh et al. call an E-AB (electrochemical, aptamer-based) sensor, appears to fit the bill. The team used spiked samples to demonstrate that their E-AB could detect as little as 10 micromolar cocaine in fetal calf serum, a relatively complicated (and real-world) sample. Increasing levels of cocaine rapidly (within minutes) produced increasing currents, and washing out the cocaine dropped the current back to baseline. Thus, the system is fast, sensitive, and regenerable. It's also reagentless -- the sensor can operate on its own, without intervention -- and stable, at least for a while. The authors do observe some signal loss after a few hours' exposure to flowing serum, which they attribute to protein deposition on the sensor surface.

To make this aptamer-electrode combo into a working sensor, Soh's team incorporated it into a small microfluidic device. Microfluidics is the science of microscale fluid dynamics, and it is playing an ever larger role in biotech and pharma. One particularly promising application is with so-called "point of care" tests -- miniature diagnostics that can be run at the bedside ... or in a police station.

"Here," the authors note,
"microfluidic technology provides an important advantage," in that it delivers the serum to the sensor rapidly (withn 5 seconds), yielding detection speeds "which [compare] favorably with the minutes-to-hours physiological time scale of the uptake, action and metabolism of the drug."

So, assuming researchers can work out all the kinks, and if the sensor proves robust, cocaine junkies could find themselves ... Busted!, by DNA.

Monday, February 23, 2009

The end of bisulfite sequencing?

A new report in Nature Nanotechnology [doi:10.1038/nnano.2009.12, Feb. 22, 2009] could spell the beginning of the end for bisulfite DNA sequencing.

Hagan Bayley of Oxford University and colleagues provide proof-of-concept data demonstrating the feasibility of DNA sequencing via protein nanopores, based on the differential current each nucleotide -- A, C, G, or T -- induces.

Nanopore-based sequencing is a serious departure from other so-called "next-generation" DNA sequencing strategies, including those from 454 Life Sciences, Illumina, and Life Technologies (Applied Biosystems). For one thing, those systems all use PCR to amplify the template prior to sequencing. They also all involve some variation on the idea of repeatedly extending surface-bound DNA, whether via DNA polymerase or DNA ligase, to determine the order of its bases. In contrast the Oxford method (which is being commercialized by Oxford Nanopore) involves sequential degradation of the DNA template.

Daniel MacArthur at Genetic Future has a nice summary of the technology, with video goodness. As GenomeWeb reported earlier today:
Each of the DNA bases causes a characteristic current disruption as it moves through the nanopore. That allows continuous sequencing without fluorescent labeling. Oxford Nanopore's BASE technology relies on a protein nanopore coupled to a processive exonuclease enzyme that cleaves DNA bases from the overall DNA strand and places them in the nanopore.
The approach has considerable promise, not least because it is both single-molecule (like Helicos and Pacific Biosciences) and label-free (like 454). But perhaps its most interesting aspect is that the authors were able to discriminate not only A from C from G from T, but also all four from 5-methyl-cytosine, "the 'fifth base' " of DNA.

5-MeC is a crucial regulator of gene activity; highly expressed genes are typically undermethylated, while transcriptionally silent regions are relatively overmethylated. Researchers often want to identify which cytosines are methylated and which are not, but most sequencing methods cannot tell the two apart.

Enter bisulfite sequencing. Bisulfite sequencing is a chemical method that converts cytosine to uracil (a component of RNA), while leaving 5-MeC alone; thus, by comparing bisulfite-converted DNA to untreated DNA, researchers can determine which cytosines were modified in the original sample.

Simple in theory, but a pain in practice, bisulfite sequencing both damages the DNA and makes it harder to read, because it reduces a system with four-base complexity down to three. Even barring those issues, the conversion process adds steps to the sequencing procedure, decreasing efficiency. Nevertheless, researchers have developed a technique, called BS-Seq, that uses bisulfite chemistry to identify methylated cytosines across the genome.
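For the programmers in the audience, the logic of the comparison can be sketched in a few lines of Python. This is a toy model, not anyone's actual pipeline: the sequence and methylated positions are hypothetical, and it assumes (as is standard) that converted uracils are read out as T after amplification.

```python
def bisulfite_convert(seq, methylated_positions):
    """Toy conversion: unmethylated C is read as T; methylated C stays C."""
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            out.append("T")  # C -> U chemically, read as T after amplification
        else:
            out.append(base)
    return "".join(out)

def call_methylation(untreated, converted):
    """A C that survives conversion was protected, i.e. methylated."""
    return {
        i
        for i, (u, c) in enumerate(zip(untreated, converted))
        if u == "C" and c == "C"
    }

original = "ACGTCCGA"  # hypothetical sequence
methylated = {1, 5}    # hypothetical methylated cytosines (0-based positions)

converted = bisulfite_convert(original, methylated)
print(converted)                              # mostly three-letter after conversion
print(call_methylation(original, converted))  # recovers the methylated positions
```

Note how the converted read has lost most of its Cs, which is exactly the drop from four-base to three-base complexity that makes these reads harder to work with.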

Now, if nanopore sequencing works in practice as its developers hope it will, the sequencing community may no longer have to worry about such things.

Wednesday, February 11, 2009

UMich team identifies new prostate cancer biomarker

Interesting news out of the University of Michigan today. Arul Chinnaiyan and colleagues report in the journal Nature [new link added 16 Feb: http://www.nature.com/nature/journal/v457/n7231/abs/nature07762.html] that sarcosine, a derivative of the amino acid glycine, appears to be a urine-based biomarker of malignant prostate cancer [12 February 2009, doi:10.1038/nature07762].

Bottom line: It might one day be possible to get tested for prostate cancer by peeing into a cup. That certainly beats the annual "digital-rectal" exam, I'd say, not to mention prostate-specific antigen (PSA) testing.

Indeed, you might be asking, why do we need another prostate cancer indicator if we already have PSA? PSA is a blood test, not a urine test, and PSA isn't that good of a marker anyway -- lots of things can make PSA levels fluctuate, only one of which is cancer. In contrast, sarcosine distinguishes between benign, malignant, and metastatic disease, meaning it can help differentiate between those patients whose enlarged prostates are nothing to worry about and those who need to see an oncologist, stat!

But what most interests me about this study is the way it was done. [Update 16 Feb: The article is a technical tour-de-force, combining mass spectrometry, RNA interference, cell motility assays, and chromatin immunoprecipitation, among other techniques.]

Chinnaiyan is the brains behind Oncomine, a cancer gene-expression database that he once described to me as "like a Google for high-throughput gene-expression data in cancer." Using Oncomine, researchers can probe different forms of cancers to see which genes are turned up or down relative to normal tissues.

In this study, Chinnaiyan's team mostly skipped the gene-expression work in favor of metabolomics -- mass spectrometric profiling of all the small molecule metabolites in both normal and cancerous biopsies, urine, and serum. In all, the team surveyed 1,126 metabolites in 262 samples, asking which are more abundant or less abundant in cancerous vs normal samples, before hitting on sarcosine as the metabolite to beat.

Then, they demonstrated that sarcosine is not an accidental biomarker, but one that seems directly related to disease progression. Exposing non-cancerous cells to sarcosine in culture, or blocking its degradation, causes them to become more invasive; inhibiting sarcosine production reduces that behavior. And expression of the enzymes that direct sarcosine production is directly controlled by two "key mediators of prostate cancer progression," including androgen (testosterone), the authors note.

"Thus," the authors conclude, "components of the sarcosine pathway may have potential as biomarkers of prostate cancer progression and serve as new avenues for therapeutic intervention."

Tuesday, February 10, 2009

Life in the cloud

Forgive the diversion from biotech for a moment, but yesterday I achieved total nerdvana.

I use a Mac and an iPhone, and my love for them is, shall we say, extreme. But it irritates me, ever so slightly, that I have to sync my phone with my computer in order to keep my calendars and address books up to date on both devices. I mean, isn't this what the internet, WiFi, and cell phone signals are for?

Yesterday, however, I read on my favorite productivity blog, lifehacker, about Google Sync, a service that allows you to keep your mobile calendar and address book in real-time sync with Google Calendar and GMail's contact list. It's a form of cloud computing, like Apple's MobileMe service, but it's free, see?

Google Sync was a bit of a pain to set up, I admit -- not all of my calendars were syncing, and my two contact lists got a bit jumbled -- but not too bad. I got it sorted out pretty quickly.

Now, the next time I go to the doctor and put an appointment in my phone, my desktop calendar will automatically know it. And I won't have to worry that my phone contacts aren't up to date, either. In other words, I feel like the phone is finally working like I would expect it to.

There was a time, not so long ago, when I was using Thunderbird for my e-mail, iCal for my calendaring, Address Book for my contact lists, and a kick-ass plain-text to-do list manager to keep my tasks organized. Problem was, using four different applications is annoying, and Thunderbird doesn't use Apple's Address Book. Plus, there was no way to get my to-do list onto my phone -- that's #3 on my list of things that drive me crazy about the iPhone: #1, no copy-and-paste; #2, no LED to indicate you've missed a call; #3, no way to sync notes between the computer and the phone.

Then I discovered (again, thanks lifehacker!) Remember the Milk, a cloud-based to-do list manager with a snazzy companion iPhone app. (Remember the Milk is free, but you have to pay an annual subscription fee of $25 to use it with the mobile app -- it's worth it IMO). Now my to-do lists are up to date and with me wherever I go.

But I didn't want to have to log into the RTM service to view and update my list -- I wanted something like the Microsoft Outlook task manager, right there with my email. There is a way to integrate RTM with Thunderbird, but it is very much in alpha testing at the moment, and kept flaking out. This, combined with occasional random issues I had with Thunderbird, convinced me it was time to jump ship.

So, I switched my email to GMail (using GMail as an email aggregator), imported my calendars into GCal, and downloaded the Remember the Milk for Firefox extension, and voila! Now, in one Firefox tab, I can see my mail, my calendar, and my task list. It's like Outlook, only, you know, better.

Now fold in the Google Sync thing and computationally, I'm thrilled. I have achieved nerdvana in a state of computopia.

Wednesday, February 4, 2009

The power of cryo compels you...

The Lab Tools feature in the Feb. 2009 issue of The Scientist, by yours truly, dives into the subject of cryo-electron microscopy, or cryoEM [23[2]:56]. CryoEM is an up-and-coming player in the world of structural biology; the technique accounts for just 216 of 55,660 structures in the RCSB Protein Data Bank, all but two of them solved since 2000.

Now, two papers have been published in quick succession that reinforce the power of the technique. The first, "The native 3D organization of bacterial polysomes," [Brandt et al., Cell, 136[2]:261-271, Jan. 23, 2009], uses cryo-electron tomography to solve the structure of polysomes -- massive supramolecular complexes comprising several ribosomes translating individual mRNA molecules into protein. The second article, "Visualization of a missing link in retrovirus capsid assembly," [Cardone et al., Nature, 457:694-699, Feb. 5, 2009], employs single molecule reconstruction to understand how retroviral protein coats are formed.

CryoEM occupies an interesting sort of middle ground in structural biology. It provides lower resolution than x-ray crystallography, yet requires neither purification to homogeneity nor crystallization. It lacks the dynamic information of NMR, yet again, requires much less sample. As these two most recent studies show, cryoEM may be the new kid on the structural block, but it's clearly got the chops to stay.

Thursday, January 29, 2009

Proteomics gets a new acronym!

Good news, protein jockeys, you have a new acronym.

The Jan. 28 issue of the Journal of the American Chemical Society reports a study by Michigan State University chemist Gavin Reid detailing femtosecond laser-induced ionization/dissociation, or fs-LID, a new mass spectrometry fragmentation technique for polypeptide analysis. [J. Am. Chem. Soc., 2009, 131 (3), pp 940–942, DOI: 10.1021/ja8089119]

Mass spectrometry is the workhorse of proteomics researchers -- those who seek to identify and characterize all or most of the proteins in a cell, subcellular compartment, biochemical pathway, or large macromolecular machine. To identify a protein, researchers generally fragment it, like shattering a vase, and then identify the component pieces, from which they can infer the original sequence of amino acid building blocks.

The most common approach is collision induced dissociation, or CID, which uses collisions between the protein molecule under study and another molecule (often helium gas) to provide the energy to break apart the protein. The problem is, CID can yield incomplete fragmentation patterns, resulting in gaps in the protein sequence. Just as importantly, it doesn't do so well with proteins that have molecular adornments, called post-translational modifications, which can modulate their activity.
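Why do incomplete fragmentation patterns leave gaps? Here is a toy sketch in Python that works on letters only, ignoring masses entirely. The peptide and the set of "observed" cleavage sites are hypothetical, and the function names are made up for illustration; the idea is simply that if no fragment breaks the backbone between two residues, their order can't be read off the fragment ladder.

```python
def fragment_ladder(peptide):
    """(N-terminal piece, C-terminal piece) for every backbone cleavage site."""
    return [(peptide[:i], peptide[i:]) for i in range(1, len(peptide))]

def unresolved_stretches(peptide, observed_sites):
    """Runs of residues with no observed cleavage between them.

    Site i means a break between residue i and residue i + 1 (1-based).
    Within each returned run, the residue order cannot be inferred.
    """
    stretches, start = [], 0
    for site in sorted(observed_sites) + [len(peptide)]:
        if site - start > 1:
            stretches.append(peptide[start:site])
        start = site
    return stretches

peptide = "PEPTIDE"  # hypothetical sequence

print(fragment_ladder(peptide)[:2])
# Suppose the cleavages after residues 4 and 5 were never observed:
print(unresolved_stretches(peptide, {1, 2, 3, 6}))
```

With every cleavage site observed, the second function returns an empty list, which is the complete-coverage case the alternative fragmentation methods are chasing.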

As researchers often are keenly interested in these adornments, they have devised several alternative fragmentation methods to deal with this problem, including electron capture dissociation (ECD), electron transfer dissociation (ETD), ultraviolet photodissociation (UVPD), and high-energy collisional dissociation (HCD). Each has its strengths and weaknesses -- some only work on multiply charged protein ions, for instance, or require special hardware.

And now we have fs-LID, designed, the authors write, "to overcome these limitations." Here's how it works: instead of fragmenting ions with collisions or intermolecular reactions, fs-LID uses short (less than 35 femtosecond) laser pulses to break apart the protein backbone.

The authors tested the process on a variety of singly and multiply charged protein ions, with and without protein modifications. According to a news brief in Chemical and Engineering News [87(3), Jan. 19, 2009], "fs-LID provided equivalent or better sequence ion coverage, including the less-common c-, x-, and z-type product ions, compared with collision-induced dissociation."

Why should we care? Well, as the study authors conclude:

Although further optimization of the fs-LID technique, and statistical evaluation of the fs-LID fragmentation behavior compared to that observed by CID or ECD/ETD, will require the acquisition of data from a significantly larger number of peptides, the initial results outlined above suggest that fs-LID is a viable alternate ion activation strategy for peptide sequence and modification analysis, with great promise for improving the capabilities of tandem mass spectrometry methods for comprehensive proteome analysis, particularly for the sequence analysis and characterization of singly protonated peptides (i.e., those formed by MALDI), where alternate dissociation methodologies to CID are currently lacking.

In other words, should fs-LID prove its mettle in further studies, it could overcome a key limitation of what is perhaps the single most prevalent ionization source used in biological mass spectrometry today, MALDI (matrix-assisted laser desorption/ionization).