Wen's Project

  • Abstract

In Arabidopsis, AGO4 functions as the effector in the RNA-directed DNA methylation (RdDM) pathway. Binding of small RNAs (sRNAs) is important for AGO4 protein stability: AGO4 protein is completely lost in pol iv or rdr2 mutants, yet only partially reduced in the dcl3 mutant. Bioinformatic analysis showed that all 20-26-nt sRNAs except the 23- and 24-nt classes are increased in dcl3, while 22- and 26-nt sRNAs are reduced in dcl2 dcl3 dcl4 relative to dcl3. Meanwhile, sRNAs originating from 5' UTRs are strongly increased in dcl3 and return to the Col-0 level in dcl2 dcl3 dcl4. These results point to candidate sRNAs that might bind and stabilize AGO4 in the dcl3 mutant.

  • Introduction

Eukaryotic organisms use small RNAs to regulate gene expression at either the transcriptional or the post-transcriptional level. In plants, a major group of siRNAs depends on one of the two plant-specific RNA polymerases, Pol IV, and these siRNAs are therefore named Pol IV-dependent siRNAs. They direct de novo DNA methylation, which in turn leads to transcriptional gene silencing. Within this RNA-directed DNA methylation (RdDM) pathway, siRNAs originate from single-stranded RNA transcribed by Pol IV, which RDR2 converts into double-stranded RNA. DCL3 then cleaves the long double-stranded RNAs into 24-nt siRNA duplexes, which are loaded onto the effector AGO4 and guide downstream DNA methylation. siRNA loading appears to be very important for the stability of AGO4 protein, as AGO4 is completely lost in either pol iv or rdr2 mutants (Havecker et al., 2010). However, AGO4 is only partially reduced in the dcl3 mutant, which suggests that AGO4 can be loaded with alternative siRNAs.

Arabidopsis has four DCL proteins, named DCL1-4. While DCL1 is responsible for the production of microRNAs, the other three DCL proteins produce siRNAs of different lengths. A previous genome-wide DNA methylation analysis showed that the dcl3 mutant has only a minor decrease in global DNA methylation, while the dcl2 dcl3 dcl4 triple mutant shows a significant decrease (Stroud et al., 2013), which indicates that siRNAs produced by DCL2 and DCL4 also function in DNA methylation. Therefore, siRNAs of different sizes produced by DCL2 and DCL4 may also be loaded onto AGO4. To test this possibility, I examined siRNA profiles produced by high-throughput sequencing in Col-0, dcl3, and dcl2 dcl3 dcl4. To find the differences in siRNAs between Col-0 and dcl3, and between dcl3 and dcl2 dcl3 dcl4, I grouped the siRNAs by size and by genomic origin.

  • Results

Changes in the abundance of sRNAs of different sizes between dcl3-1 and dcl2 dcl3 dcl4:

Occurrences of small RNAs of each size (20-26 nt) in Col-0, dcl3-1, and dcl2 dcl3 dcl4 were counted, normalized to the total number of small RNAs, and averaged where replicates were available. The percentage contributed by each size class is shown in the pie charts (Fig. 1). In the two Col-0 datasets, 24-nt sRNAs make up about half of the small RNA profile (51% and 41%, respectively), while all other small RNAs (21-23, 25-, and 26-nt) make up around 30% and 35%, respectively. The ratio of other sRNAs to 24-nt sRNAs in Col-0 is therefore about 0.6 and 0.85 (30/51 and 35/41). In contrast, this ratio rises to about 10 in dcl3-1 and to about 2.2 in the dcl2 dcl3 dcl4 triple mutant. The increased ratio might be due to the decrease of 24-nt sRNAs caused by the loss of DCL3 function, or perhaps to an increase in the remaining sRNAs.

 

Figure 1. Pie charts showing the size composition of each small RNA file after mapping to the genomic DNA.


Since the dcl3-1 and dcl2 dcl3 dcl4 samples were generated by two independent groups, the percentages of small RNAs in each mutant were normalized to those in the corresponding Col-0 control in order to compare the relative abundance of each size class across mutants (Fig. 2). In both dcl3-1 and dcl2 dcl3 dcl4, 24-nt sRNAs are strongly reduced compared to Col-0. Unexpectedly, the reduction of 24-nt sRNAs is smaller in dcl2 dcl3 dcl4 than in dcl3-1. Compared to Col-0, 20-, 22-, and 26-nt sRNAs are markedly increased in dcl3-1, while 21- and 25-nt sRNAs are increased to a lesser extent. Compared to dcl3-1, 22- and 26-nt sRNAs are reduced in dcl2 dcl3 dcl4, indicating that DCL2 and DCL4 might produce 22- and 26-nt sRNAs that can bind and stabilize AGO4. Interestingly, when DCL2, DCL3, and DCL4 are all absent, 20-nt and 21-nt sRNAs are markedly increased compared to both Col-0 and dcl3-1, especially the 20-nt sRNAs.

sRNAs    dcl3/col-0    dcl2_3_4/col-0
20nt     1.92          39.54
21nt     1.42          3.53
22nt     2.2           1.54
23nt     0.65          0.57
24nt     0.11          0.36
25nt     1.4           1.2
26nt     2.42          0.91

 

Figure 2. Distribution of sRNAs of different sizes in dcl2 dcl3 dcl4 and dcl3-1 mutants, normalized to Col-0.
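
The normalization behind Figure 2 can be sketched as a short awk helper. This is a minimal sketch and not the commands used in the original analysis: the function name normalize_to_wt and the two-column input tables (length, tab, count) are hypothetical, and each table's total is assumed to be the sum of its own counts rather than the total of all mapped small RNAs.

```shell
# Hypothetical helper: fold change of each size class relative to wild type.
# $1 = wild-type table, $2 = mutant table; each line is "length<TAB>count".
# Assumption: the total of each library is the sum of the counts in its table.
normalize_to_wt() {
    awk -F'\t' '
        NR == FNR { wt[$1] = $2; wt_total += $2; next }   # first file: wild type
        { mut[$1] = $2; mut_total += $2 }                 # second file: mutant
        END {
            for (len = 20; len <= 26; len++)
                if (wt[len] > 0)   # skip sizes absent from the wild-type table
                    printf "%d\t%.2f\n", len, (mut[len] / mut_total) / (wt[len] / wt_total)
        }
    ' "$1" "$2"
}

# Example call (file names are placeholders):
#   normalize_to_wt col-0_length_counts.txt dcl3_length_counts.txt
```

A value above 1 means the size class takes a larger share of the mutant library than of the Col-0 library; a value below 1 means a smaller share.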


Changes in the abundance of sRNAs of different origins between dcl3-1 and dcl2 dcl3 dcl4:

Occurrences of small RNAs originating from different genomic features in Col-0, dcl3-1, and dcl2 dcl3 dcl4 were counted, normalized to the total number of small RNAs, and averaged where replicates were available. The percentages of small RNAs originating from genes, transposable elements, and intergenic regions are compared in the pie charts (Fig. 3). Compared to Col-0, the abundance of small RNAs from genes is increased in both mutants, more strongly in dcl2 dcl3 dcl4 than in dcl3-1. Meanwhile, small RNAs from the other sources are decreased in both mutants.

Figure 3. Pie charts showing the composition of each small RNA file after assigning the small RNAs to the genomic features they originate from.

To look more closely at the abundance changes in the two mutants, small RNAs from each source in each mutant were normalized to those in Col-0 (Fig. 4). Consistent with the pie charts, small RNAs originating from intergenic and transposable element regions are slightly reduced in both mutants, while gene-derived small RNAs are much more abundant in dcl2 dcl3 dcl4 than in dcl3.

To further identify the source of the increased small RNAs in genic regions, I compared small RNA counts from exons, 5' UTRs, 3' UTRs, introns, and the whole coding region among Col-0 and the two mutants. Compared to Col-0, dcl3 shows a higher abundance of sRNAs originating from exons and an even higher abundance of sRNAs from 5' UTRs. In dcl2 dcl3 dcl4, sRNAs from 5' UTRs are strongly reduced relative to dcl3, back to about the Col-0 level, suggesting that 5' UTR-derived sRNAs produced by DCL2 and DCL4 contribute to the loading and stabilization of AGO4.

Figure 4. Distribution of sRNAs with different intragenic origins in dcl2 dcl3 dcl4 and dcl3-1 mutants, normalized to col-0. 

  • Methods

Sequences:

FASTA files of the Col-0 genomic DNA sequences, including the mitochondrial and chloroplast genomes, and the annotation file (TAIR10) were downloaded from the TAIR website. FASTQ files of three replicates of small RNAs from Col-0 or dcl3-1 mutant flowers were downloaded from the NCBI GEO website (GSE62801; Groth et al., 2014). FASTQ files of small RNAs from Col-0 and dcl2 dcl3 dcl4 triple mutant flowers were also downloaded (GSE49866).

Adapter trimming:

3' adapters were trimmed from the small RNA sequences with AlienTrimmer. Adapters used: for Col-0 and dcl3 small RNAs, 5'-TGGAATTCTCGGGTGCCAAGGAACTCCAGT-3'; for Col-0 and dcl2 dcl3 dcl4 small RNAs, 5'-ATCTCGTATGCCGTCTTCTGCTTGAC-3'. Command line used:

$ java -jar AlienTrimmer.jar -i sRNAs.fastq -c adaptor.txt -o sRNAs_trim.fastq -k 10 -q 0

adaptor.txt is written in FASTA format. -q sets the sequencing quality score cut-off; it is set to 0 to keep all nucleotides, which is important for counting sRNA lengths.

Mapping: 

All trimmed small RNA FASTQ files were mapped to the genomic DNA with Bowtie2 in iPlant.

Counting the length of small RNAs:

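Small RNA sequences were extracted from the .sam files produced by mapping. In a SAM record the read sequence is the 10th tab-separated column (the 9th is the template length), and header lines start with @, so the command line is:

$ grep -v '^@' sRNAs.sam | cut -f10 > sRNAs_sequence.txt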

The length of each sequence was then computed with:

$ awk '{ print length ($0);}' sRNAs_sequence.txt > sRNAs_length.txt

The occurrence of each specific length (20-26 nt) was counted as follows, shown here for 21 nt; -x requires the whole line to match, so only lines that are exactly 21 are counted:

$ grep -c -x '21' sRNAs_length.txt
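
The three counting steps above can also be combined into a single pass over the SAM file. The following is a minimal sketch rather than the commands used in the original analysis: the function name count_lengths is hypothetical, and it assumes single-end reads with one record per read and that unmapped reads (FLAG bit 0x4) should be excluded from the counts.

```shell
# Hypothetical helper: tally read lengths (20-26 nt) straight from a SAM file.
# Assumptions: single-end reads, one record per read, unmapped reads excluded.
count_lengths() {
    awk -F'\t' '
        /^@/ { next }                      # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }      # FLAG bit 0x4 set: read is unmapped
        { n[length($10)]++ }               # column 10 holds the read sequence
        END {
            for (len = 20; len <= 26; len++)
                printf "%d\t%d\n", len, n[len] + 0
        }
    ' "$1"
}

# Example call (file name is a placeholder):
#   count_lengths sRNAs.sam > sRNAs_length_counts.txt
```

The output is one line per length, which makes it straightforward to compute the percentage of each size class afterwards.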

Overlapping small RNAs with the genomic features:

Bowtie2 output .sam files of small RNAs were converted to .bam files using the SAM_to_sorted_BAM tool in the iPlant Discovery Environment, and then further converted to .bed files with the Convert_from_BAM_to_BED tool online at the Galaxy project (https://usegalaxy.org/).

Each feature was extracted from the .gff annotation file of the Col-0 genome, using the command line below, where 'feature' stands for the feature type of interest:

$ grep -s 'feature' TAIR10_GFF3_genes_transposons.gff > feature.gff
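
Because grep matches the pattern anywhere in a line, a short feature name can also pick up records of other types; for example, a pattern such as 'gene' would also match types like pseudogene or transposable_element_gene. A stricter alternative, sketched here under the assumption that the annotation is a standard tab-separated GFF3 file with the feature type in column 3, is to compare that column exactly. The function name extract_feature is hypothetical and is not part of the original pipeline.

```shell
# Hypothetical helper: keep GFF3 records whose type (column 3) equals $1 exactly.
# $1 = feature type (e.g. exon), $2 = GFF3 annotation file.
extract_feature() {
    awk -F'\t' -v type="$1" '!/^#/ && $3 == type' "$2"
}

# Example call (output file name is a placeholder):
#   extract_feature exon TAIR10_GFF3_genes_transposons.gff > exon.gff
```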

The resulting feature.gff files were converted into .bed files using the GFF-to-BED converter online at the Galaxy project.

Intron and intergenic features were produced with bedtools subtract, e.g.:

$ bedtools subtract -a mRNA.bed -b exon.bed > intron.bed

Overlaps between small RNAs and each genomic feature were detected with bedtools intersect:

$ bedtools intersect -c -a feature.bed -b sRNA.bed > feature_sRNA_intersect.bed

-c reports, for each feature in feature.bed, the number of overlapping entries in sRNA.bed.

The column containing the number of overlaps was extracted with the command line:

$ cut -f7 CDS_col-0_intersect.bed > CDS_col-0_intersect_clm7.bed

The overlap numbers were then summed with:

$ awk '{sum += $1} END  {print sum}' CDS_col-0_intersect_clm7.bed
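
The extraction and summation can be done in one step, since bedtools intersect -c appends the count as the last column of each output line. The following is a minimal sketch with a hypothetical function name, sum_overlaps; it does not depend on the count being in column 7, which only holds when feature.bed has six columns.

```shell
# Hypothetical helper: sum the overlap counts that `bedtools intersect -c`
# appends as the last column, without writing an intermediate file.
sum_overlaps() {
    awk -F'\t' '{ sum += $NF } END { printf "%.0f\n", sum }' "$1"
}

# Example call (file name follows the commands above):
#   sum_overlaps CDS_col-0_intersect.bed
```

Note that a read overlapping two features of the same type contributes to both counts, so this total can exceed the number of distinct reads.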

  • Discussion

To identify the sRNAs that are actually loaded onto AGO4 and stabilize it, the ideal samples would come from immunoprecipitation (IP) of AGO4 followed by high-throughput sequencing of the AGO4-bound sRNAs. Unfortunately, such datasets are not available in the current literature. To find the sRNAs that can potentially bind and stabilize AGO4, I used sRNA high-throughput sequencing files from Col-0 and dcl3 flowers, and from Col-0 and dcl2 dcl3 dcl4 flowers. When grouped by size, all 20-26-nt sRNAs except the 23- and 24-nt classes are more abundant in dcl3. Among them, 22-nt sRNAs are partially reduced in dcl2 dcl3 dcl4, and 26-nt sRNAs are reduced to a level similar to that in Col-0. Therefore, of the five sRNA size classes that might work with AGO4, DCL2 and DCL4 might produce part of the 22-nt sRNAs and most of the 26-nt sRNAs.

The overlaps between sRNAs and genomic features give some interesting results. sRNAs from intergenic regions and transposable elements are, unexpectedly, only slightly reduced in both dcl3 and dcl2 dcl3 dcl4, although DCL3 is thought to produce sRNAs mainly from transposable elements and other repetitive sequences. Meanwhile, sRNAs from genic regions are slightly (dcl3) or strongly (dcl2 dcl3 dcl4) increased. This might be due to the inclusion of other small RNAs in the original sequencing files, which quite possibly derive from gene transcripts. However, among the finer features, sRNAs from 5' UTRs are increased in dcl3 but decreased in dcl2 dcl3 dcl4; these are therefore quite possibly functional sRNAs rather than derivatives of genic transcripts.

To better understand which sRNAs are candidates for binding AGO4, some further work is necessary. First, restricting the overlap analysis with genomic features to 20-26-nt sRNAs will help exclude the influence of other small RNAs. Second, to better understand the origin of the size classes whose abundance is increased in the dcl3 mutant, one could determine the overlaps between each size class and the genomic features. Finally, since each feature type has a different number of loci, sRNA read levels at each locus might give a better understanding of how DCL proteins control sRNA production in different features.

  • References

Groth, M., Stroud, H., Feng, S., Greenberg, M.V.C., Vashisht, A.A., Wohlschlegel, J.A., Jacobsen, S.E., Ausin, I. (2014). SNF2 chromatin remodeler-family proteins FRG1 and 2 are required for RNA-directed DNA methylation. Proc Natl Acad Sci USA. 111 (49): 17666-17671.

Havecker, E.R., Wallbridge, L.M., Hardcastle, T.J., Bush, M.S., Kelly, K.A., Dunn, R.M., Schwach, F., Doonan, J.H., Baulcombe, D.C. (2010). The Arabidopsis RNA-directed DNA methylation argonautes functionally diverge based on their expression and interaction with target loci. Plant Cell. 22 (2): 321-334.

Stroud, H., Greenberg, M.V.C., Feng, S., Bernatavichute, Y.V., Jacobsen, S.E. (2013). Comprehensive analysis of silencing mutants reveals complex regulation of the Arabidopsis methylome. Cell. 152: 352-364.

Objective of my project:

Find out how the profile of sRNAs potentially loaded onto AGO4 changes.