What is this file? What does it look like inside?
This recipe shows what commonly used file types look like inside, so you can read them when you need to, or check that an application produced what you expected. Here are some snippets of commonly used files.
Fasta - note the >, which starts the name line. Multiple sequences (records) can exist within a single fasta file.
>Pt dna:chromosome chromosome:TAIR10:Pt:1:154478:1
ATGGGCGAACGACGGGAATTGAACCCGCGATGGTGAATTCACAATCCACTGCCTTAATCC
ACTTGGCTACATCCGCCCCTACGCTACTATCTATTCTTTTTTGTATTGTCTAAAAAAAAA
AAAAAATACAAATTTCAATAAAAAATAAAAAAAGGTAGCAAATTCCACCTTATTTTTTTT
CTAATAAAAAATATATAGTAATTTTTTATTATTTATTATTATTATTTATTATTAATATAA
TAAATAAAGTAAAATATGATACTCTATAAAAATTTGCTCATTTTTATAGAAAAAAACGAG
TAATATAAGCCCTCTTTCTTATTTAAAGAAGGCTTATATTGCTCGTTTTTTACTAAACTA
GATCTAGACTAACACTAACGAATTATCCATTTGTAGATGGAGCCTCAACAGCAGCTAGGT
CTAGAGGGAAGTTGTGAGCATTACGTTCATGCATAACTTCCATACCAAGGTTAGCACGGT
TAATAATATCAGCCCAAGTATTAATAACACGTCCTTGACTATCAACTACTGATTGGTTGA
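A quick way to check what a fasta file contains is to list its name lines. The sketch below is only an illustration: the function name fasta_names and the second record in the example are made up for the demonstration, and it assumes the file is plain (uncompressed) text.

```python
import io

def fasta_names(handle):
    """Return the name (text after '>' up to the first whitespace) of each record."""
    names = []
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names

# The first record is taken from the snippet above; the second is hypothetical.
example = io.StringIO(
    ">Pt dna:chromosome chromosome:TAIR10:Pt:1:154478:1\n"
    "ATGGGCGAACGACGGGAATTGAACCCGCGATGGTGAATTCACAATCCACTGCCTTAATCC\n"
    ">second_record hypothetical\n"
    "ACGT\n"
)
print(fasta_names(example))
```

For a file on disk, pass an open file object instead of the io.StringIO example.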
Fastq–paired, forward reads – Each record is four lines: name, sequence, +, and quality. Note the @ for name lines, and the space just before 1:N, indicating a forward or first read.
@HWI-D00635:76:C7GEAANXX:7:1101:1212:1954 1:N:0:AGTCAA
NTTCTATCCTGGAGAAGAAAAATGAAATGGTTTCCAGCACATGAGCAAGGG
+
#<<BBFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@HWI-D00635:76:C7GEAANXX:7:1101:1243:1972 1:N:0:AGTCAA
NAAGCAACTTCACATATTGGGTCTATGCACGAACATCCAAGCTGCTTGCTA
+
#</<BFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@HWI-D00635:76:C7GEAANXX:7:1101:1251:1914 1:N:0:AGTCAA
NGGAAATTACATATGCATAGGGAAAACAGACGCTGCCAGTTCGGCATTTCC
+
#<<BBFBFFFBBFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFF
Fastq–paired, reverse reads – Note the @ for name lines, and the space just before 2:N, indicating a reverse or second read.
@HWI-D00635:76:C7GEAANXX:7:1101:1212:1954 2:N:0:AGTCAA
CAGTTAATAATTGGAGGAAAAAAATTCCTATGCTTTGACCACATCCACATG
+
BBBBBFFFFFFFBFFFFFFFFFFFFBFFFFFFFFFF<FFFFFFFFFFFFFF
@HWI-D00635:76:C7GEAANXX:7:1101:1243:1972 2:N:0:AGTCAA
CTCCTCTAACTCCTTTCCATCTACTGACCGTGAACTAGAGATCTTGGTGCT
+
BBBBBFFFFFFFFFBFFFFFFBFFFFFFF<FF//<<FFFFBFBB/FFFF<B
@HWI-D00635:76:C7GEAANXX:7:1101:1251:1914 2:N:0:AGTCAA
GCTGAACCCGGCGAGCGAGTACCCGCCCGGCCTGGAAGACACGCTCATCCT
+
BBBBBFFFFFBFFFFF<FFFFF/BFFFFFFFFFFFFBFFFFFFFFFFFFFF
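To confirm that a forward and a reverse file really are a pair, you can compare the name lines record by record. The following is a minimal sketch, assuming Illumina-style name lines like the ones above; parse_fastq_header is a hypothetical helper name, not part of any library.

```python
def parse_fastq_header(line):
    """Split an Illumina-style name line into (read name, read number)."""
    if not line.startswith("@"):
        raise ValueError("fastq name lines start with '@'")
    # Everything before the first space is the read name; the part after it
    # starts with the read number (1 = forward/first, 2 = reverse/second).
    name, _, comment = line[1:].strip().partition(" ")
    read_number = comment.split(":")[0] if comment else ""
    return name, read_number

fwd = "@HWI-D00635:76:C7GEAANXX:7:1101:1212:1954 1:N:0:AGTCAA"
rev = "@HWI-D00635:76:C7GEAANXX:7:1101:1212:1954 2:N:0:AGTCAA"

fwd_name, fwd_read = parse_fastq_header(fwd)
rev_name, rev_read = parse_fastq_header(rev)
print(fwd_name == rev_name, fwd_read, rev_read)
```

Quality lines can also begin with @, so when scanning a whole file it is safer to step through it four lines at a time than to search for lines starting with @.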
GFF (this example uses GTF-style attributes: key "value"; pairs) -- seqname source feature start end score strand frame attribute
Mt protein_coding exon 273 734 . - . gene_id "ATMG00010"; transcript_id "ATMG00010.1"; exon_number "1"; gene_name "ORF153A"; transcript_name "ATMG00010.1"; seqedit "false";
Mt protein_coding CDS 276 734 . - 0 gene_id "ATMG00010"; transcript_id "ATMG00010.1"; exon_number "1"; gene_name "ORF153A"; transcript_name "ATMG00010.1"; protein_id "ATMG00010.1";
Mt protein_coding start_codon 732 734 . - 0 gene_id "ATMG00010"; transcript_id "ATMG00010.1"; exon_number "1"; gene_name "ORF153A"; transcript_name "ATMG00010.1";
Mt protein_coding stop_codon 273 275 . - 0 gene_id "ATMG00010"; transcript_id "ATMG00010.1"; exon_number "1"; gene_name "ORF153A"; transcript_name "ATMG00010.1";
Mt rRNA exon 8848 11415 . - . gene_id "ATMG00020"; transcript_id "ATMG00020.1"; exon_number "1"; gene_name "RRN26"; transcript_name "ATMG00020.1"; seqedit "false";
Mt protein_coding exon 11918 12241 . + . gene_id "ATMG00030"; transcript_id "ATMG00030.1"; exon_number "1"; gene_name "ORF107A"; transcript_name "ATMG00030.1"; seqedit "false";
Mt protein_coding CDS 11918 12238 . + 0 gene_id "ATMG00030"; transcript_id "ATMG00030.1"; exon_number "1"; gene_name "ORF107A"; transcript_name "ATMG00030.1"; protein_id "ATMG00030.1";
Mt protein_coding start_codon 11918 11920 . + 0 gene_id "ATMG00030"; transcript_id "ATMG00030.1"; exon_number "1"; gene_name "ORF107A"; transcript_name "ATMG00030.1";
Mt protein_coding stop_codon 12239 12241 . + 0 gene_id "ATMG00030"; transcript_id "ATMG00030.1"; exon_number "1"; gene_name "ORF107A"; transcript_name "ATMG00030.1";
Mt protein_coding exon 16844 17791 . - . gene_id "ATMG00040"; transcript_id "ATMG00040.1"; exon_number "1"; gene_name "ORF315"; transcript_name "ATMG00040.1"; seqedit "false";
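The nine columns can be pulled apart with a few lines of code. This sketch assumes the file is tab-delimited (the columns appear space-separated on this page) and that the attribute column uses the key "value"; style shown in the sample lines; parse_gtf_line is a hypothetical helper name.

```python
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame"]

def parse_gtf_line(line):
    """Parse one tab-delimited line into a dict; attributes become a nested dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    record = dict(zip(GTF_COLUMNS, cols[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for item in cols[8].split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attributes[key] = value.strip('"')
    record["attributes"] = attributes
    return record

# First sample line from above, rebuilt with tabs (attribute list shortened).
line = "\t".join(["Mt", "protein_coding", "exon", "273", "734", ".", "-", ".",
                  'gene_id "ATMG00010"; transcript_id "ATMG00010.1"; exon_number "1";'])
record = parse_gtf_line(line)
print(record["feature"], record["start"], record["end"], record["attributes"]["gene_id"])
```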
SAM -- @SQ reference sequences, then alignment lines: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
@SQ SN:Pt LN:154478
@SQ SN:Mt LN:366924
@SQ SN:4 LN:18585056
@SQ SN:2 LN:19698289
@SQ SN:3 LN:23459830
@SQ SN:5 LN:26975502
@SQ SN:1 LN:30427671
@PG ID:bwa PN:bwa VN:0.7.12-r1039 CL:bwa mem -t 8 genome.fa -T 20 seq3/s_6_arabpy.fastq
unknown:6:1:8:1178#0 4 * 0 0 * * 0 0 CTGGNACAAAACCAGAGGGGATTGGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN abb^D[abbbabaab```bbBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB AS:i:0 XS:i:0
unknown:6:1:8:647#0 4 * 0 0 * * 0 0 GCGANGCGTCTTCAGTGTCTAGATCGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN aY]VD^\^\J^aYY`[`^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB AS:i:0 XS:i:0
unknown:6:1:8:367#0 4 * 0 0 * * 0 0 TGCANATGTGGTGACTCTTGCAGTTGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN ab_[D\ababb_ba_baabb`BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB AS:i:0 XS:i:0
unknown:6:1:8:596#0 0 3 22733892 0 27M45S * 0 0 CGAANATGTGTGCTTCCGTCGGAATCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN aba^D\a`a`_`W`b`^```aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:1 MD:Z:4C22 AS:i:25 XS:i:25 XA:Z:2,+18910685,27M45S,1;
unknown:6:1:8:761#0 16 5 22308455 34 45S27M * 0 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCAATCCCAGTTTGGACCAAGAANTGAG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBbba``aaaabbabba[DY_b` NM:i:1 MD:Z:22C4 AS:i:25 XS:i:0
unknown:6:1:8:571#0 16 2 14309627 34 45S27M * 0 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAACGGTCACTAACAAACACAAANGACA
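The FLAG column (the second field: 4, 0 and 16 in the lines above) is a sum of bit values, so it is easier to read once decoded. The sketch below uses the bit meanings from the SAM specification; the short labels and the name decode_flag are illustrative choices, not an official API.

```python
# (bit value, label) pairs; bit meanings follow the SAM specification.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]

def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAGS if flag & bit]

for flag in (4, 0, 16):
    print(flag, decode_flag(flag))
```

A flag of 0 sets no bits, which for these single-end reads means the read mapped to the forward strand.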
BAM -- the compressed, binary version of SAM. It's binary, so don't expect it to look like text!
<FE><E7><FF><FE>?<FF><FD><DF><FE><D7>_<FF><ED><9F><FF><F9>^O^?<FE><D3>^_<FF><E1><BF><FE><B7>^?<B9>J<8C>><DA><FA><9B><F2><87>^?<FC><D3>^_c<B4>r}<99>g<BB><EF><EB>/<E6>Rz<BD><BF><CD>#F<F9><8B><B9><B5>{%k<9D><F7>yv<94>d<ED>q<AC><B3><B4><F9>m<9E><D7><FD>yt<ED><D7>?y<95><F1>1<D7>^V<F1>m^W<FF><98><FB><A8>-YK|<BE>j<94>
<BB>&k?^?<B9><AC>?^_<BD><AE><88><F9><F9><E6>Y?<AD><AB>]<F1><F9><AC><B5><BE>_{<95><BB>|<FE>rESC<EB><FB>/?<D6><F1><F9><ED><F3><F5><DF>o<FD>|p9#<B2><EE><EF><F1>Z<E5><F9><AA><BF>X<EF><A8><E9><A5><EB>^U<ED>|S^]<DF><D6>V<CF>`<B6>9?[m<9F><DF><C6><F3><E9><D9>zF<BA><F5><99><9E><DC>K9<A3>1<E7><F7><F2>Y<CF><E2><FB>^Le<94>
<FB>J/=<AE><FA>^Y<AC><FB>Y<A9><DF><D6>Y<E7><E7>^O<C7><DD><D3>`<CD><D9><CE>?<91><FF><F2>3 <FD><B3>^B<9E>u<96><AC><CF><E4><9C>o<BA><D3>^\<AE>;<FA>Y<F4>`<<FB>i<95><9A>^V?<B5><B4>j<A4><E9>^?f<F5><B3>#j?c<8D><FE>yr^Y3<8D>?R<CF>o<EF><F1><B5>^S<EB><F5><FC>w^<B9>^?O<F0><B6>?8<9E>=<9B><AD><E3><F3><DB><D2>?<AD><F3><FC><B6>EO<D6>R<EA>g^F<D7>^H<B0><CE>3 <BD>dk??<D5><FC>Eq6Ry> <AC><9F><BF>[?<D7><DA>?<83>K<9A><DE>m<AD><E3
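Because the contents are unreadable, a small check can stand in for eyeballing the file. BAM files are compressed with BGZF, which is gzip-compatible, and the decompressed data starts with the four bytes BAM\x01. The sketch below relies on both facts; looks_like_bam is a hypothetical helper name, and the demo runs on a tiny stand-in file rather than a real alignment file.

```python
import gzip
import os
import tempfile
import zlib

def looks_like_bam(path):
    """Return True if the file is gzip-compressed and decompresses to the BAM magic."""
    with open(path, "rb") as raw:
        if raw.read(2) != b"\x1f\x8b":  # gzip magic bytes
            return False
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError, zlib.error):
        return False

# Demo on a stand-in file: gzip-compressed bytes that begin with the BAM magic.
with tempfile.NamedTemporaryFile(suffix=".bam", delete=False) as tmp:
    tmp.write(gzip.compress(b"BAM\x01" + b"\x00" * 8))
print(looks_like_bam(tmp.name))
os.remove(tmp.name)
```

This only checks the first bytes; it does not validate the rest of the file.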
PSL – matches misMatches repMatches nCount qNumInsert qBaseInsert tNumInsert tBaseInsert strand qName qSize qStart qEnd tName tSize tStart tEnd blockCount blockSizes qStarts tStarts
match mis-match rep-match N's Q-gap-count Q-gap-bases T-gap-count T-gap-bases strand Qname Qsize Qstart Qend Tname Tsize Tstart Tend block-count blockSizes qStarts tStarts
---------------------------------------------------------------------------------------------------------------------------------------------------------------
180 90 0 0 3 483 3 498 ++ scaffold1 1075 6 759 gi|347966521|ref|XM_321319.5| 1062 114 882 4 54,42,57,117, 6,192,252,642, 114,318,435,765,
69 36 0 0 0 0 0 0 ++ scaffold1 1075 627 732 gi|195574275|ref|XM_002105079.1| 1026 720 825 1 105, 627, 720,
69 36 0 0 0 0 0 0 ++ scaffold1 1075 627 732 gi|195390513|ref|XM_002053877.1| 1026 720 825 1 105, 627, 720,
69 36 0 0 0 0 0 0 ++ scaffold1 1075 627 732 gi|195349831|ref|XM_002041410.1| 1026 720 825 1 105, 627, 720,
80 34 0 0 1 18 1 75 ++ scaffold1 1075 207 339 gi|170035907|ref|XM_001845756.1| 1008 276 465 2 27,87, 207,252, 276,378,
85 47 0 0 0 0 0 0 ++ scaffold1 1075 627 759 gi|170035897|ref|XM_001845751.1| 1104 711 843 1 132, 627, 711,
52 20 0 0 0 0 0 0 ++ scaffold1 1075 627 699 gi|170035895|ref|XM_001845750.1| 1078 711 783 1 72, 627, 711,
136 71 0 0 2 594 2 523 ++ scaffold1 1075 206 1007 gi|170029648|ref|XM_001842652.1| 1053 323 1053 3 27,129,51, 206,627,956, 323,741,1002,
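With 21 columns, PSL lines are easier to inspect once each value is attached to its field name. The following is a minimal sketch using the field names listed above; it splits on any whitespace (real PSL files are tab-delimited, which this also handles), and parse_psl_line is a hypothetical helper name.

```python
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]
TEXT_FIELDS = {"strand", "qName", "tName"}
LIST_FIELDS = {"blockSizes", "qStarts", "tStarts"}

def parse_psl_line(line):
    """Parse one PSL alignment line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(PSL_FIELDS):
        raise ValueError("expected 21 columns, got %d" % len(values))
    record = {}
    for field, value in zip(PSL_FIELDS, values):
        if field in LIST_FIELDS:
            # Comma-separated lists end with a trailing comma.
            record[field] = [int(x) for x in value.split(",") if x]
        elif field in TEXT_FIELDS:
            record[field] = value
        else:
            record[field] = int(value)
    return record

line = ("80 34 0 0 1 18 1 75 ++ scaffold1 1075 207 339 "
        "gi|170035907|ref|XM_001845756.1| 1008 276 465 2 27,87, 207,252, 276,378,")
hit = parse_psl_line(line)
print(hit["qName"], hit["tName"], hit["blockCount"], hit["blockSizes"])
```

Header lines and the dashed separator do not have 21 columns, so skip them before calling the function.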