What is this file? What does it look like inside?

This recipe shows what commonly used file types look like inside, so you can read them when you need to, or check that an application has produced the file you expected. Here are some snippets of commonly used files.

Fasta - note the >, which starts each name line. Multiple sequences, each with its own name line, can exist within a single fasta file.

>Pt dna:chromosome chromosome:TAIR10:Pt:1:154478:1
ATGGGCGAACGACGGGAATTGAACCCGCGATGGTGAATTCACAATCCACTGCCTTAATCC
ACTTGGCTACATCCGCCCCTACGCTACTATCTATTCTTTTTTGTATTGTCTAAAAAAAAA
AAAAAATACAAATTTCAATAAAAAATAAAAAAAGGTAGCAAATTCCACCTTATTTTTTTT
CTAATAAAAAATATATAGTAATTTTTTATTATTTATTATTATTATTTATTATTAATATAA
TAAATAAAGTAAAATATGATACTCTATAAAAATTTGCTCATTTTTATAGAAAAAAACGAG
TAATATAAGCCCTCTTTCTTATTTAAAGAAGGCTTATATTGCTCGTTTTTTACTAAACTA
GATCTAGACTAACACTAACGAATTATCCATTTGTAGATGGAGCCTCAACAGCAGCTAGGT
CTAGAGGGAAGTTGTGAGCATTACGTTCATGCATAACTTCCATACCAAGGTTAGCACGGT
TAATAATATCAGCCCAAGTATTAATAACACGTCCTTGACTATCAACTACTGATTGGTTGA
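One quick way to check that a file really is fasta is to split it into records at the > name lines and count them. The following is a minimal Python sketch, not a full parser. The function name read_fasta is illustrative, the sequence lines are shortened from the snippet above, and the second record is made up to show a multi-record file.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of fasta lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new name line closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Shortened example data; the second record is hypothetical.
example = [
    ">Pt dna:chromosome chromosome:TAIR10:Pt:1:154478:1",
    "ATGGGCGAACGACGGGAATT",
    "GAACCCGCGATGGTGAATTC",
    ">seq2 hypothetical second record",
    "ACGT",
]

for name, seq in read_fasta(example):
    # Print the identifier (text before the first space) and the sequence length.
    print(name.split()[0], len(seq))
```

For a real file, pass an open file handle instead of the list, since a file handle is also an iterable of lines.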
 

Fastq, paired, forward reads - Note the @ that starts each name line, and the 1 just after the space (1:N), indicating a forward or first read. Each read takes four lines: name, sequence, a + separator, and quality scores.

 

@HWI-D00635:76:C7GEAANXX:7:1101:1212:1954 1:N:0:AGTCAA
NTTCTATCCTGGAGAAGAAAAATGAAATGGTTTCCAGCACATGAGCAAGGG
+
#<<BBFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@HWI-D00635:76:C7GEAANXX:7:1101:1243:1972 1:N:0:AGTCAA
NAAGCAACTTCACATATTGGGTCTATGCACGAACATCCAAGCTGCTTGCTA
+
#</<BFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@HWI-D00635:76:C7GEAANXX:7:1101:1251:1914 1:N:0:AGTCAA
NGGAAATTACATATGCATAGGGAAAACAGACGCTGCCAGTTCGGCATTTCC
+
#<<BBFBFFFBBFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFF


Fastq, paired, reverse reads - Note the @ that starts each name line, and the 2 just after the space (2:N), indicating a reverse or second read. The part of the name before the space matches the corresponding forward read.

 

@HWI-D00635:76:C7GEAANXX:7:1101:1212:1954 2:N:0:AGTCAA
CAGTTAATAATTGGAGGAAAAAAATTCCTATGCTTTGACCACATCCACATG
+
BBBBBFFFFFFFBFFFFFFFFFFFFBFFFFFFFFFF<FFFFFFFFFFFFFF
@HWI-D00635:76:C7GEAANXX:7:1101:1243:1972 2:N:0:AGTCAA
CTCCTCTAACTCCTTTCCATCTACTGACCGTGAACTAGAGATCTTGGTGCT
+
BBBBBFFFFFFFFFBFFFFFFBFFFFFFF<FF//<<FFFFBFBB/FFFF<B
@HWI-D00635:76:C7GEAANXX:7:1101:1251:1914 2:N:0:AGTCAA
GCTGAACCCGGCGAGCGAGTACCCGCCCGGCCTGGAAGACACGCTCATCCT
+
BBBBBFFFFFBFFFFF<FFFFF/BFFFFFFFFFFFFBFFFFFFFFFFFFFF
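The forward and reverse snippets above can be checked against each other: the names before the space should match record by record, and the part after the space should start with 1: in one file and 2: in the other. The following is a rough Python sketch of that check. The function names are illustrative, and the sequence and quality strings are shortened to ten characters for readability.

```python
from itertools import zip_longest


def read_fastq(lines):
    """Yield (name, comment, sequence, quality) tuples from fastq lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError(f"truncated record: {header!r}")
        seq, plus, qual = (s.rstrip("\n") for s in rest)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"not a fastq record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        # Split "name 1:N:0:INDEX" at the first space.
        name, _, comment = header[1:].partition(" ")
        yield name, comment, seq, qual


def is_properly_paired(forward, reverse):
    """Return True if both inputs hold the same reads, marked 1: and 2:."""
    for fwd, rev in zip_longest(read_fastq(forward), read_fastq(reverse)):
        if fwd is None or rev is None:
            return False  # one file has more reads than the other
        if fwd[0] != rev[0]:
            return False  # read names are out of sync
        if not (fwd[1].startswith("1:") and rev[1].startswith("2:")):
            return False
    return True


# First read of each snippet, with shortened sequence and quality strings.
forward = [
    "@HWI-D00635:76:C7GEAANXX:7:1101:1212:1954 1:N:0:AGTCAA",
    "NTTCTATCCT",
    "+",
    "#<<BBFFFFF",
]
reverse = [
    "@HWI-D00635:76:C7GEAANXX:7:1101:1212:1954 2:N:0:AGTCAA",
    "CAGTTAATAA",
    "+",
    "BBBBBFFFFF",
]

print(is_properly_paired(forward, reverse))
```

Reading four lines at a time, rather than searching for @, matters because a quality line can itself begin with @.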

 

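GFF -- nine tab-separated columns: seqname    source     feature    start    end    score    strand    frame    attribute. The attribute column in this example uses the GTF style of key "value"; pairs.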

Mt     protein_coding    exon   273 734  .  -  .  gene_id "ATMG00010"; transcript_id "ATMG00010.1"; exon_number "1"; gene_name "ORF153A"; transcript_name "ATMG00010.1"; seqedit "false";
Mt     protein_coding    CDS   276 734  .  -  0  gene_id "ATMG00010"; transcript_id "ATMG00010.1"; exon_number "1"; gene_name "ORF153A"; transcript_name "ATMG00010.1"; protein_id "ATMG00010.1";
Mt     protein_coding    start_codon   732 734  .  -  0  gene_id "ATMG00010"; transcript_id "ATMG00010.1"; exon_number "1"; gene_name "ORF153A"; transcript_name "ATMG00010.1";
Mt     protein_coding    stop_codon   273 275  .  -  0  gene_id "ATMG00010"; transcript_id "ATMG00010.1"; exon_number "1"; gene_name "ORF153A"; transcript_name "ATMG00010.1";
Mt     rRNA                   exon   8848 11415  .  -  .  gene_id "ATMG00020"; transcript_id "ATMG00020.1"; exon_number "1"; gene_name "RRN26"; transcript_name "ATMG00020.1"; seqedit "false";
Mt     protein_coding   exon   11918 12241  .  +  .  gene_id "ATMG00030"; transcript_id "ATMG00030.1"; exon_number "1"; gene_name "ORF107A"; transcript_name "ATMG00030.1"; seqedit "false";
Mt     protein_coding   CDS   11918 12238  .  +  0  gene_id "ATMG00030"; transcript_id "ATMG00030.1"; exon_number "1"; gene_name "ORF107A"; transcript_name "ATMG00030.1"; protein_id "ATMG00030.1";
Mt     protein_coding   start_codon   11918 11920  .  +  0  gene_id "ATMG00030"; transcript_id "ATMG00030.1"; exon_number "1"; gene_name "ORF107A"; transcript_name "ATMG00030.1";
Mt     protein_coding   stop_codon   12239 12241  .  +  0  gene_id "ATMG00030"; transcript_id "ATMG00030.1"; exon_number "1"; gene_name "ORF107A"; transcript_name "ATMG00030.1";
Mt     protein_coding  exon   16844 17791  .  -  .  gene_id "ATMG00040"; transcript_id "ATMG00040.1"; exon_number "1"; gene_name "ORF315"; transcript_name "ATMG00040.1"; seqedit "false";
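To pull individual values out of a line like the ones above, split it into the nine columns and then break the attribute column into key/value pairs. The following is a minimal Python sketch under two assumptions: the real file is tab-separated (the snippet above shows the columns padded with spaces), and the attributes follow the key "value"; style shown here. The name parse_gtf_line is illustrative.

```python
import re

COLUMNS = ["seqname", "source", "feature", "start", "end",
           "score", "strand", "frame", "attribute"]


def parse_gtf_line(line):
    """Split one tab-separated annotation line into a dict of its columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    record = dict(zip(COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Turn 'gene_id "ATMG00010"; gene_name "ORF153A";' into a dict.
    record["attribute"] = dict(
        re.findall(r'(\S+) "([^"]*)";', record["attribute"])
    )
    return record


# First line of the snippet above, rebuilt with tab separators.
line = "\t".join([
    "Mt", "protein_coding", "exon", "273", "734", ".", "-", ".",
    'gene_id "ATMG00010"; transcript_id "ATMG00010.1"; exon_number "1"; '
    'gene_name "ORF153A"; transcript_name "ATMG00010.1"; seqedit "false";',
])

rec = parse_gtf_line(line)
# Coordinates are 1-based and inclusive, so the length is end - start + 1.
print(rec["feature"], rec["end"] - rec["start"] + 1, rec["attribute"]["gene_name"])
```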

 

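SAM -- header lines starting with @ (@SQ for reference sequences, @PG for the program that made the file), then alignment lines with eleven mandatory fields, followed by optional tags such as AS:i:0: QNAME    FLAG    RNAME    POS    MAPQ    CIGAR    RNEXT    PNEXT    TLEN    SEQ    QUAL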

@SQ SN:Pt LN:154478
@SQ SN:Mt LN:366924
@SQ SN:4 LN:18585056
@SQ SN:2 LN:19698289
@SQ SN:3 LN:23459830
@SQ SN:5 LN:26975502
@SQ SN:1 LN:30427671
@PG ID:bwa PN:bwa VN:0.7.12-r1039 CL:bwa mem -t 8 genome.fa -T 20 seq3/s_6_arabpy.fastq
unknown:6:1:8:1178#0 4 * 0 0 * * 0 0 CTGGNACAAAACCAGAGGGGATTGGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN abb^D[abbbabaab```bbBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB AS:i:0 XS:i:0
unknown:6:1:8:647#0 4 * 0 0 * * 0 0 GCGANGCGTCTTCAGTGTCTAGATCGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN aY]VD^\^\J^aYY`[`^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB AS:i:0 XS:i:0
unknown:6:1:8:367#0 4 * 0 0 * * 0 0 TGCANATGTGGTGACTCTTGCAGTTGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN ab_[D\ababb_ba_baabb`BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB AS:i:0 XS:i:0
unknown:6:1:8:596#0 0 3 22733892 0 27M45S * 0 0 CGAANATGTGTGCTTCCGTCGGAATCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN aba^D\a`a`_`W`b`^```aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:1 MD:Z:4C22 AS:i:25 XS:i:25 XA:Z:2,+18910685,27M45S,1;
unknown:6:1:8:761#0 16 5 22308455 34 45S27M * 0 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCAATCCCAGTTTGGACCAAGAANTGAG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBbba``aaaabbabba[DY_b` NM:i:1 MD:Z:22C4 AS:i:25 XS:i:0
unknown:6:1:8:571#0 16 2 14309627 34 45S27M * 0 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAACGGTCACTAACAAACACAAANGACA
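The FLAG field in the alignment lines above (4, 0 and 16 in this snippet) is a sum of bit values, which makes it hard to read by eye. The following is a small Python sketch that splits an alignment line into its eleven mandatory fields and decodes the flag. It assumes the real file is tab-separated. The function names and the short labels for each bit are illustrative, and the SEQ and QUAL strings are shortened from the first alignment line.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# Bit values defined for the FLAG field; the labels are informal.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


def parse_sam_line(line):
    """Split one alignment line into mandatory fields plus optional tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, found {len(fields)}")
    record = dict(zip(SAM_FIELDS, fields[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    record["TAGS"] = fields[11:]
    return record


# First alignment line of the snippet, with SEQ and QUAL shortened.
line = "\t".join([
    "unknown:6:1:8:1178#0", "4", "*", "0", "0", "*", "*", "0", "0",
    "CTGGNACAAA", "abb^D[abbb", "AS:i:0", "XS:i:0",
])

rec = parse_sam_line(line)
print(rec["QNAME"], decode_flag(rec["FLAG"]), rec["TAGS"])
print(decode_flag(16))
```

A flag of 0 decodes to an empty list: the read is mapped to the forward strand and none of the other conditions apply.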

BAM - It's the compressed, binary form of SAM, so don't expect it to look like text!

<FE><E7><FF><FE>?<FF><FD><DF><FE><D7>_<FF><ED><9F><FF><F9>^O^?<FE><D3>^_<FF><E1><BF><FE><B7>^?<B9>J<8C>><DA><FA><9B><F2><87>^?<FC><D3>^_c<B4>r}<99>g<BB><EF><EB>/<E6>Rz<BD><BF><CD>#F<F9><8B><B9><B5>{%k<9D><F7>yv<94>d<ED>q<AC><B3><B4><F9>m<9E><D7><FD>yt<ED><D7>?y<95><F1>1<D7>^V<F1>m^W<FF><98><FB><A8>-YK|<BE>j<94>

<BB>&k?^?<B9><AC>?^_<BD><AE><88><F9><F9><E6>Y?<AD><AB>]<F1><F9><AC><B5><BE>_{<95><BB>|<FE>rESC<EB><FB>/?<D6><F1><F9><ED><F3><F5><DF>o<FD>|p9#<B2><EE><EF><F1>Z<E5><F9><AA><BF>X<EF><A8><E9><A5><EB>^U<ED>|S^]<DF><D6>V<CF>`<B6>9?[m<9F><DF><C6><F3><E9><D9>zF<BA><F5><99><9E><DC>K9<A3>1<E7><F7><F2>Y<CF><E2><FB>^Le<94>

<FB>J/=<AE><FA>^Y<AC><FB>Y<A9><DF><D6>Y<E7><E7>^O<C7><DD><D3>`<CD><D9><CE>?<91><FF><F2>3 <FD><B3>^B<9E>u<96><AC><CF><E4><9C>o<BA><D3>^\<AE>;<FA>Y<F4>`<<FB>i<95><9A>^V?<B5><B4>j<A4><E9>^?f<F5><B3>#j?c<8D><FE>yr^Y3<8D>?R<CF>o<EF><F1><B5>^S<EB><F5><FC>w^<B9>^?O<F0><B6>?8<9E>=<9B><AD><E3><F3><DB><D2>?<AD><F3><FC><B6>EO<D6>R<EA>g^F<D7>^H<B0><CE><BD>dk??<D5><FC>Eq6Ry>  <AC><9F><BF>[?<D7><DA>?<83>K<9A><DE>m<AD><E3
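Because a BAM file cannot be read by eye, one way to check that a file is what you expected is to look at its first bytes. A BAM file is gzip-compatible, so it begins with the gzip magic bytes 1f 8b, and its decompressed content begins with the four bytes BAM\1. The following is a minimal Python sketch of that check; the function name looks_like_bam and the path in the usage comment are hypothetical.

```python
import gzip


def looks_like_bam(path):
    """Return True if the file is gzip-compatible and starts with BAM magic."""
    with open(path, "rb") as handle:
        if handle.read(2) != b"\x1f\x8b":
            return False  # not gzip-compressed, so not BAM
    with gzip.open(path, "rb") as handle:
        # The decompressed stream of a BAM file begins with b"BAM\x01".
        return handle.read(4) == b"BAM\x01"


# Usage with a hypothetical path:
# print(looks_like_bam("aligned.bam"))
```

This only inspects the start of the file. It does not confirm that the rest of the file is intact.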

PSL – matches    misMatches    repMatches    nCount    qNumInsert    qBaseInsert    tNumInsert    tBaseInsert    strand    qName    qSize    qStart    qEnd    tName tSize    tStart    tEnd    blockCount    blockSizes    qStarts    tStarts

match   mis-match    rep-match    N's     Q-gap-count   Q-gap-bases   T-gap-count  T-gap-bases   strand  Qname               Qsize       Qstart        Qend       Tname                Tsize       Tstart       Tend       block-count   blockSizes      qStarts  tStarts

---------------------------------------------------------------------------------------------------------------------------------------------------------------

180     90      0       0       3       483     3       498     ++      scaffold1       1075    6       759     gi|347966521|ref|XM_321319.5|   1062    114     882     4       54,42,57,117,   6,192,252,642,  114,318,435,765,

69      36      0       0       0       0       0       0       ++      scaffold1       1075    627     732     gi|195574275|ref|XM_002105079.1|        1026    720     825     1       105,    627,    720,

69      36      0       0       0       0       0       0       ++      scaffold1       1075    627     732     gi|195390513|ref|XM_002053877.1|        1026    720     825     1       105,    627,    720,

69      36      0       0       0       0       0       0       ++      scaffold1       1075    627     732     gi|195349831|ref|XM_002041410.1|        1026    720     825     1       105,    627,    720,

80      34      0       0       1       18      1       75      ++      scaffold1       1075    207     339     gi|170035907|ref|XM_001845756.1|        1008    276     465     2       27,87,  207,252,        276,378,

85      47      0       0       0       0       0       0       ++      scaffold1       1075    627     759     gi|170035897|ref|XM_001845751.1|        1104    711     843     1       132,    627,    711,

52      20      0       0       0       0       0       0       ++      scaffold1       1075    627     699     gi|170035895|ref|XM_001845750.1|        1078    711     783     1       72,     627,    711,

136     71      0       0       2       594     2       523     ++      scaffold1       1075    206     1007    gi|170029648|ref|XM_001842652.1|        1053    323     1053    3       27,129,51,      206,627,956,    323,741,1002,
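Each PSL row above has 21 fields, and the last three are comma-separated lists with a trailing comma. The following is a rough Python sketch that turns one row into named fields and runs two sanity checks that hold for the rows shown here: the number of block sizes equals blockCount, and the block sizes add up to matches + misMatches + repMatches + nCount. The name parse_psl_line is illustrative, and the check is offered as a quick test on this data rather than a rule for every PSL file.

```python
PSL_COLUMNS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]
LIST_COLUMNS = ("blockSizes", "qStarts", "tStarts")
TEXT_COLUMNS = ("strand", "qName", "tName")


def parse_psl_line(line):
    """Split one PSL data row into a dict with numeric fields converted."""
    fields = line.split()  # PSL fields contain no internal whitespace
    if len(fields) != 21:
        raise ValueError(f"expected 21 fields, found {len(fields)}")
    record = dict(zip(PSL_COLUMNS, fields))
    for key in PSL_COLUMNS:
        if key in LIST_COLUMNS:
            # "54,42,57,117," -> [54, 42, 57, 117]
            record[key] = [int(x) for x in record[key].split(",") if x]
        elif key not in TEXT_COLUMNS:
            record[key] = int(record[key])
    return record


# First data row of the snippet above.
row = ("180 90 0 0 3 483 3 498 ++ scaffold1 1075 6 759 "
       "gi|347966521|ref|XM_321319.5| 1062 114 882 4 "
       "54,42,57,117, 6,192,252,642, 114,318,435,765,")

rec = parse_psl_line(row)
aligned = rec["matches"] + rec["misMatches"] + rec["repMatches"] + rec["nCount"]
print(rec["qName"], rec["tName"], rec["blockCount"], sum(rec["blockSizes"]), aligned)
```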

 

 
