10.21433/GZMW-TD47
French, Courtney
Wei, Gang
Brenner, Steven
Transcriptome Analysis Reveals Extensive Alternative Splicing-Coupled
Nonsense-Mediated mRNA Decay in a Human Cell Line
UC Berkeley
2015
RNA-seq
Nonsense-mediated mRNA decay
Alternative splicing
2015-09-25T10:03:56+00:00
Collection
240593770
Creative Commons Attribution 4.0 International (CC BY 4.0)
To further explore the regulatory potential of nonsense-mediated mRNA
decay (NMD) in human cells, we globally surveyed the transcripts targeted
by this pathway via RNA-Seq analysis of HeLa cells in which NMD had been
inhibited. We first identified those transcripts with both a premature
termination codon more than 50 nucleotides upstream of an exon-exon
junction (50nt rule) and a significant increase in abundance upon NMD
inhibition. Remarkably, at least 2,793 transcripts derived from 2,116
genes are physiological NMD targets (9.2% of expressed transcripts and
>20% of alternatively spliced genes). Our analysis identifies
previously inferred unproductive isoforms and numerous previously
uncharacterized ones. NMD-targeted transcripts were derived from genes
involved in many functional categories, and are particularly enriched for
RNA splicing genes and ultraconserved elements. By investigating the
features of all transcripts impacted by NMD, we find that the 50nt rule is
a strong predictor of NMD degradation while 3’ UTR length generally has
only a small effect in human cells. Additionally, thousands more
transcripts without a premature termination codon upstream of an exon-exon
junction in the main coding sequence contain a uORF and display
significantly increased abundance upon NMD inhibition indicating
potentially widespread regulation through decay coupled with uORF
translation. Our results support the hypothesis that alternative splicing
coupled with NMD is a prevalent post-transcriptional mechanism in human
cells with broad potential for biological regulation.
HeLa cells were inoculated on plates in Dulbecco’s MEM medium with 0.1 mM
non-essential amino acids and 10% fetal bovine serum, incubated at 37°C
and 5% CO2. Plates at 80% cell confluency were transfected with plasmids
pSUPERpuro-hUpf1/II and pSUPERpuro-Scramble, whose functions were to
knockdown UPF1 and act as a mock control, respectively. Transfections were
performed using Lipofectamine™ LTX and PLUS™ Reagents (Invitrogen)
according to the manufacturer’s protocol, and the following culture
according to the published method (Paillusson et al., 2005). The whole
cell lysates were prepared in 1% sodium dodecyl sulfate and incubated at
100 °C for 5 min, then centrifuged at 12,000 rpm for 10 min. Total RNA was
extracted using the QIAGEN RNeasy® Mini kit according to the
manufacturer’s manual. Directional and paired-end RNA-seq libraries were
constructed according to the protocol published on the Illumina website
(www.illumina.com), with a few changes: The adapters were prepared
according to the reported methods (Vigneault et al., 2008). The PCR
process to prepare the library was divided in two steps. In the first
step, 3 cycles of PCR were performed (according to the protocol) to
prepare the library template, and then the library was run on a 2% agarose
gel, fragments of desired size were cut out and isolated by QIAquick® Gel
Extraction Kit. A second round of PCR (12 cycles) was performed to enrich
the library and then it was purified twice with Agencourt AMPure XP kit as
suggested by the Illumina protocol. The libraries were then assayed by
Agilent 2100 BioAnalyzer. These RNA-Seq libraries were prepared from cells
with inhibited NMD and control cells for two biological replicates. One
biological replicate was sequenced on an Illumina GAIIx machine and the
other on HiSeq 2000.
Paired end reads for each library were aligned to the NCBI human RefSeq
transcriptome (Pruitt et al., 2009) with Bowtie (Langmead et al., 2009) to
determine the average insert size and standard deviation, required as a
parameter by TopHat (Trapnell et al., 2009). The reads of each library
were then aligned to the human genome (hg19 assembly, Feb. 2009;
downloaded from UCSC genome browser (Fujita et al., 2011)) using TopHat
v1.2.0 with default parameters plus the following: --coverage search,
--allow indels, --microexon search, and --butterfly search. Cufflinks
1.0.1 (Roberts et al., 2011; Trapnell et al., 2010) was used to assemble
each set of aligned reads into transcripts with the UCSC known transcript
set (Fujita et al., 2011) as the reference guide, along with the following
parameters: --frag-bias-correct, and --multi-read-correct. Cuffcompare (a
sub-tool of Cufflinks) was used to merge the resulting sets of assembled
transcripts. Each junction was assigned a Shannon entropy score based on
offset of spliced reads across all four libraries. Transcripts with a
junction that had an entropy score <1 and was not present in the
reference annotation were filtered out. Cuffdiff (a sub-tool of Cufflinks)
was used to quantify and compare transcript abundance (measured by FPKM,
Fragments Per Kilobase per Million reads) between the UPF1 knockdown and
control samples. For each sample, the reads from two biological replicates
were provided. The following parameters were used: --frag-bias-correct and
--multi-read-correct. Only transcripts with FPKM>1 in either the
control or UPF1 knockdown sample were used for further analysis. A
transcript was called significantly more abundant in the UPF1 knockdown
sample if Cuffdiff called it significantly changing and the fold change
was greater than 1.5x. Significantly decreased transcript abundances were
determined in the same way. For each transcript, the coding sequence (CDS)
was determined as described in the Supplementary Methods. A coding
sequence was defined to terminate in a premature stop codon (PTC50nt) if
it stops at least 50 nucleotides upstream of the last exon-exon junction
(50nt rule in mammals (Nagy and Maquat, 1998)). NMD targets were defined
as those transcripts with both a PTC50nt and significantly increased
expression abundance in NMD inhibited (UPF1 knockdown) cells. The
transcripts must also increase in each biological replicate when analyzed
independently and come from a gene with a non-PTC50nt-containing isoform
with FPKM>0. To obtain a more reliable list of NMD-targeted
transcripts, only those transcripts that adhered to either of the
following criteria were kept: 1) No non-PTC50nt-containing isoform from
the gene was more than 1.2-fold higher in the NMD inhibited sample, or 2)
the PTC50nt-containing isoform increased at least 2x more than the sum of
all non-PTC50nt-containing isoform FPKMs from the gene in NMD inhibited
cells.