1 Trim adapters and remove low-quality bases using Cutadapt + Trimmomatic OR trim_galore (example).
cutadapt -a CTGTAGGCACCA --error-rate=0.1 --overlap=3 --times=2 --output=sample.trimed.fastq.gz sample.fastq.gz
-a: 3' adapter used in Ribo-seq. The most commonly used 3' adapters for Ribo-seq are "AGATCGGAAGAG", "TGGAATTCTCGG" and "CTGTAGGCACCA".
--error-rate: Maximum allowed error rate, a value between 0 and 1 (number of errors divided by the length of the matching region; 0.1 = 10%)
--overlap: Minimum overlap (MINLENGTH) required between read and adapter for the adapter to be found.
--times: Remove up to COUNT adapters from each read, i.e. the number of rounds of adapter finding and removal.
java -jar /path/trimmomatic-0.39.jar SE -phred33 sample.trimed.fastq.gz sample.trimed.cleaned.fastq.gz LEADING:30 TRAILING:30 MINLEN:25
-phred33: Quality score encoding (Phred+33), which can be inferred using FastQC
LEADING: Cut bases off the start of a read if their quality is below the threshold
TRAILING: Cut bases off the end of a read if their quality is below the threshold
MINLEN: Drop the read if it is shorter than the specified length
-
trim_galore sample.fastq.gz --phred33 -a CTGTAGGCACCATCA --length 25 --max_length 34 -e 0.1 -q 30 --stringency 3 -o outdir
--phred33: Quality score encoding (Phred+33), which can be inferred using FastQC
--length: Discard reads that became shorter than length INT because of either quality or adapter trimming. A value of '0' effectively disables this behaviour.
--max_length: Discard reads that are longer than INT bp after trimming.
-e: same as --error-rate in cutadapt
-q: Phred score cut-off
--stringency: same as --overlap in cutadapt
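After trimming, it can help to confirm that the remaining reads fall in the expected length range (25 to 34 nt with the trim_galore settings above). The following is a minimal sketch, assuming standard 4-line FASTQ records; the function name read_length_hist is hypothetical and not part of any tool used above.

```shell
# Print "length<TAB>count" for every read length found in a FASTQ file.
# Assumption: 4-line FASTQ records; gzip input is detected by the .gz suffix.
read_length_hist() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac |
    awk 'NR % 4 == 2 { n[length($0)]++ }          # line 2 of each record is the sequence
         END { for (l in n) print l "\t" n[l] }' |
    sort -n
}

# Example call (file name taken from the Trimmomatic command above):
# read_length_hist sample.trimed.cleaned.fastq.gz
```

A peak around 28 to 32 nt is typical for ribosome footprints; a large share of reads outside the chosen range may indicate a wrong adapter sequence.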
2 Generate collapsed FASTA file.
-i: input cleaned FASTQ sequences (sample_trimmed.fq.gz for trim_galore output)
-o: output collapsed FASTA file
-m: discard reads shorter than minimum length (integer, default: 25)
-l: discard reads longer than maximum length (integer, default: 35)
Download fq2collapedFa_v1.2.pl
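To illustrate what collapsing does, the sketch below merges identical reads into one FASTA record each and applies the same length limits as -m and -l. It is not the fq2collapedFa_v1.2.pl script: the function name collapse_fastq and the header format ">seqN_xCOUNT" are assumptions made for this example, so use the downloaded script when the exact output format matters.

```shell
# collapse_fastq FASTQ [MIN_LEN] [MAX_LEN]
# Collapse identical reads into FASTA, keeping reads with MIN_LEN <= length <= MAX_LEN.
# Assumption: the header ">seqN_xCOUNT" is a hypothetical format chosen for this sketch.
collapse_fastq() {
    fq=$1; min_len=${2:-25}; max_len=${3:-35}
    case "$fq" in
        *.gz) gzip -dc "$fq" ;;
        *)    cat "$fq" ;;
    esac |
    awk -v min="$min_len" -v max="$max_len" '
        NR % 4 == 2 && length($0) >= min && length($0) <= max { count[$0]++ }
        END { for (s in count) print count[s] "\t" s }' |
    LC_ALL=C sort -k1,1nr -k2,2 |                 # most abundant sequence first
    awk '{ printf ">seq%d_x%d\n%s\n", NR, $1, $2 }'
}

# Example call (file names follow the steps above):
# collapse_fastq sample.trimed.cleaned.fastq.gz 25 35 > sample.collapsed.fa
```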
Use featureCounts from Subread to generate gene count matrix
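featureCounts -t CDS -s 0 -g gene_id -a mm10.annotation.gtf -o sample1.featurecount.txt sample1.bam
featureCounts -t CDS -s 0 -g gene_id -a mm10.annotation.gtf -o sample2.featurecount.txt sample2.bam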
featureCounts -t CDS -s 0 -g gene_id -a mm10.annotation.gtf -o sample3.featurecount.txt sample3.bam
featureCounts -t CDS -s 0 -g gene_id -a mm10.annotation.gtf -o sample4.featurecount.txt sample4.bam
paste <(awk '{print $1"\t"$7}' sample1.featurecount.txt) <(awk '{print $7}' sample2.featurecount.txt) <(awk '{print $7}' sample3.featurecount.txt) <(awk '{print $7}' sample4.featurecount.txt) > all.count.txt
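Use htseq-count from HTSeq to generate gene count matrix
htseq-count -f bam --stranded no -i gene_id sample1.bam mm10.annotation.gtf > sample1.htseq-count.txt
htseq-count -f bam --stranded no -i gene_id sample2.bam mm10.annotation.gtf > sample2.htseq-count.txt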
htseq-count -f bam --stranded no -i gene_id sample3.bam mm10.annotation.gtf > sample3.htseq-count.txt
htseq-count -f bam --stranded no -i gene_id sample4.bam mm10.annotation.gtf > sample4.htseq-count.txt
paste <(awk '{print $1"\t"$2}' sample1.htseq-count.txt) <(awk '{print $2}' sample2.htseq-count.txt) <(awk '{print $2}' sample3.htseq-count.txt) <(awk '{print $2}' sample4.htseq-count.txt) > all.htcount.txt
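The paste commands above hard-code four samples. A minimal sketch of the same idea for any number of samples follows; the function name merge_counts is hypothetical, and it assumes every file lists the same IDs in the same order (as featureCounts and htseq-count do when run with the same GTF). Unlike the one-liners above, it also skips lines starting with "#", such as the comment line at the top of featureCounts output.

```shell
# merge_counts COLUMN FILE...
# Print field 1 (the ID) of the first file, followed by field COLUMN of every
# file, tab-separated. Assumption: identical row order in all input files.
merge_counts() {
    col=$1; shift
    tmpdir=$(mktemp -d)
    awk '!/^#/ { print $1 }' "$1" > "$tmpdir/merged"
    for f in "$@"; do
        # Append this file's count column to the growing matrix.
        awk -v c="$col" '!/^#/ { print $c }' "$f" |
            paste "$tmpdir/merged" - > "$tmpdir/next"
        mv "$tmpdir/next" "$tmpdir/merged"
    done
    cat "$tmpdir/merged"
    rm -rf "$tmpdir"
}

# Example calls (counts are in column 7 for featureCounts, column 2 for htseq-count):
# merge_counts 7 sample1.featurecount.txt sample2.featurecount.txt > all.count.txt
# merge_counts 2 sample1.htseq-count.txt sample2.htseq-count.txt > all.htcount.txt
```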
Species | Genome version | Source | Example gene IDs | Gene ID list |
---|---|---|---|---|
Homo sapiens | hg19 (gtf file) | GENCODE | ENSG00000185097, ENSG00000230092 | .txt |
Homo sapiens | hg38 (gtf file) | GENCODE | ENSG00000185097, ENSG00000230092 | .txt |
Mus musculus | mm10 (gtf file) | GENCODE | ENSMUSG00000102693, ENSMUSG00000033813 | .txt |
Mus musculus | mm39 (gtf file) | GENCODE | ENSMUSG00000102693, ENSMUSG00000064842 | gtf file |
Rattus norvegicus | rn6 (gtf file) | Ensembl | ENSRNOG00000023659, ENSRNOG00000016381 | .txt |
Danio rerio | GRCz11 (gtf file) | Ensembl | ENSDARG00000102123, ENSDARG00000098311 | .txt |
Bos taurus | ARS-UCD1.2 (gtf file) | Ensembl | ENSBTAG00000006648, ENSBTAG00000049697 | gtf file |
Canis lupus familiaris | ROS Cfam 1.0 (gtf file) | Ensembl | ENSCAFG00845015183, ENSCAFG00845015195 | gtf file |
Gallus gallus | GRCg7b (gtf file) | Ensembl | ENSGALG00010000711, ENSGALG00010000715 | gtf file |
Gorilla gorilla | gorGor4 (gtf file) | Ensembl | ENSGGOG00000010861, ENSGGOG00000040578 | gtf file |
Macaca mulatta | Mmul_10 (gtf file) | Ensembl | ENSMMUG00000023296, ENSMMUG00000036181 | gtf file |
Pan troglodytes | Pan_tro_3.0 (gtf file) | Ensembl | ENSPTRG00000052463, ENSPTRG00000042737 | gtf file |
Sus scrofa | Sscrofa11.1 (gtf file) | Ensembl | ENSSSCG00000028996, ENSSSCG00000005267 | gtf file |
Xenopus tropicalis | UCB Xtro 10.0 (gtf file) | Ensembl | ENSXETG00000045641, ENSXETG00000042394 | gtf file |
Drosophila melanogaster | Dm6 (gtf file) | Ensembl | FBgn0267430, FBgn0086378 | .txt |
Caenorhabditis elegans | WBcel235 (gtf file) | Ensembl | WBGene00017080, WBGene00005298 | .txt |
Saccharomyces cerevisiae | R64 (gtf file) | Ensembl | YDL163W, YDL076C | .txt |
Anopheles gambiae | AgamP4 (gtf file) | Ensembl | AGAP002473, AGAP001777 | gtf file |
Arabidopsis thaliana | TAIR10 (gtf file) | Ensembl | AT1G01010, AT1G01020 | .txt |
Oryza sativa | IRGSP-1.0 (gtf file) | Ensembl | Os01g0100100, Os01g0100300 | .txt |
Zea mays | B73_RefGen_v4 (gtf file) | Ensembl | Zm00001d027230, Zm00001d027259 | .txt |
Zea mays | B73 NAM 5.0 (gtf file) | Ensembl | Zm00001eb015280, Zm00001eb000610 | gtf file |
Glycine max | Wm82.a2 (gtf file) | Ensembl | Zm00001d027230, Zm00001d027259 | .txt |
Solanum lycopersicum | SL3.0 (gtf file) | Ensembl | Solyc01g005030, Solyc01g005620 | .txt |
Vitis vinifera | PN40024.v4 (gtf file) | Ensembl | Vitvi18g04695, Vitvi18g00828 | gtf file |
Triticum aestivum | IWGSC RefSeq 1.0 (gtf file) | Ensembl | TraesCS3B02G271600, TraesCS3B02G001500 | gtf file |
Sorghum bicolor | NCBIv3 (gtf file) | Ensembl | SORBI_3001G113500, SORBI_3001G533100 | gtf file |
Solanum tuberosum | SolTub 3.0 (gtf file) | Ensembl | PGSC0003DMG400042058, PGSC0003DMG400042081 | gtf file |
Populus trichocarpa | Pop_tri_v3 (gtf file) | Ensembl | POPTR_001G146100v3, POPTR_001G078300v3 | gtf file |
Physcomitrium patens | Phypa_V3 (gtf file) | Ensembl | ENSRNA049990314, ENSRNA049990316 | gtf file |
Medicago truncatula | MedtrA17 4.0 (gtf file) | Ensembl | MTR_4g073010, MTR_4g028310 | gtf file |
Gossypium raimondii | Graimondii2_0_v6 (gtf file) | Ensembl | B456_009G035000, B456_009G301200 | gtf file |
Cucumis sativus | ASM407v2 (gtf file) | Ensembl | Csa_3G732670, Csa_3G889960 | gtf file |
Chlamydomonas reinhardtii | v5.5 (gtf file) | Ensembl | CHLRE_12g484834v5, CHLRE_12g558353v5 | gtf file |
Brassica rapa | Brapa 1.0 (gtf file) | Ensembl | Bra027712, Bra031227 | gtf file |
Brachypodium distachyon | bdi.v3.0 (gtf file) | Ensembl | BRADI_1g14170v3, BRADI_1g53295v3 | gtf file |
Malus domestica golden | ASM211411v1 (gtf file) | Ensembl | MD15G0021200, MD15G0206200 | gtf file |
Escherichia coli | K-12 MG1655 (gtf file) | Ensembl | b0001, b0002 | .txt |
Bacillus subtilis | Strain 168 (gtf file) | Ensembl | BSU00010, EBG00000977999 | .txt |
Pyrococcus furiosus | DSM 3638 (gtf file) | Ensembl | PF0001, EBG00001119311 | .txt |
Halobacterium salinarum | NRC1 (gtf file) | Ensembl | VNG_0001H, VNG_0003C | .txt |
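To check which gene ID style an annotation uses before counting, the IDs can be listed directly from the GTF file. This is a minimal sketch, assuming the usual GTF attribute syntax gene_id "ID"; in column 9; the function name gtf_gene_ids is hypothetical.

```shell
# Print the unique gene_id values found in a GTF file, one per line.
# Assumption: tab-separated GTF with attributes such as gene_id "ID"; in column 9.
gtf_gene_ids() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac |
    awk -F '\t' '!/^#/ && match($9, /gene_id "[^"]+"/) {
            # Skip the 9 characters of gene_id " and drop the closing quote.
            print substr($9, RSTART + 9, RLENGTH - 10)
        }' |
    sort -u
}

# Example call (file name taken from the counting commands above):
# gtf_gene_ids mm10.annotation.gtf | head
```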
OS | Version | Chrome | Firefox | Microsoft Edge | Safari |
---|---|---|---|---|---|
Linux | CentOS 7 | 79.0 | 68.3.0 | n/a | n/a |
macOS | Catalina | 79.0 | 71.0 | n/a | 13.0 |
Windows | 10 | 79.0 | 71.0 | 79.0 | n/a |