Update 2-24-15
Inputs: FASTA files, parameters
Outputs: top motifs, visualization of GC skew across sequence
Programs: Python and R (or just Python).
Very raw pseudocode for motif finding:
    wf = open("top_motifs.txt", "w")
    seq = "store input file into a string"
    window_size = int(input("some message: "))
    slide_size = int(input("some message: "))
    motifs = {}
    for each window of length window_size in seq, stepping by slide_size:
        if window not in motifs:
            motifs[window] = 1      # motif not previously stored
        else:
            motifs[window] += 1     # add 1 to the value of the preexisting motif
        # the loop ends when the window reaches the end of seq
    write the top motifs to wf
    if not wf.closed:
        wf.close()
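The pseudocode above could be turned into runnable Python roughly as follows. This is only a sketch: the function names `count_motifs`, `top_motifs`, and `write_top_motifs` are made up for illustration, and it assumes a "motif" is simply every substring of length `window_size`.

```python
def count_motifs(seq, window_size, slide_size=1):
    """Count each window_size-long substring of seq, stepping by slide_size."""
    motifs = {}
    # Stop at the last start index where a full window still fits.
    for start in range(0, len(seq) - window_size + 1, slide_size):
        window = seq[start:start + window_size]
        # dict.get returns 0 for an unseen motif, so new and
        # repeated keys are handled by the same line.
        motifs[window] = motifs.get(window, 0) + 1
    return motifs


def top_motifs(motifs, n=10):
    """Return the n most frequent (motif, count) pairs, most frequent first."""
    return sorted(motifs.items(), key=lambda item: item[1], reverse=True)[:n]


def write_top_motifs(path, motifs, n=10):
    """Write the top motifs to path, one tab-separated motif and count per line."""
    # The with-block closes the file automatically, even on errors.
    with open(path, "w") as wf:
        for motif, count in top_motifs(motifs, n):
            wf.write(f"{motif}\t{count}\n")
```

For example, `count_motifs("ATGATGCATG", 3)` counts the motif `ATG` three times. Using `with open(...)` removes the need to check `closed` by hand.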
I realize I need to find a method for building a dictionary on the fly while also updating the values of recurring keys. I am also thinking that I should keep track of motif positions as some sort of metadata that can be requested with an option when running the program. Calculating GC skew will use a similar sliding-window method; however, I still do not know how to generate figures in Python.
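A minimal sketch of the two open points above, under some assumptions: positions are stored as 0-based window starts, GC skew is taken per window as (G - C) / (G + C), and a window with no G or C is reported as 0.0. The function names and the output file name `gc_skew.png` are hypothetical. Plotting uses matplotlib, a third-party package that must be installed separately; it is one common choice, not the only one.

```python
from collections import defaultdict


def motif_positions(seq, window_size, slide_size=1):
    """Map each motif to the list of 0-based start positions where it occurs."""
    positions = defaultdict(list)  # missing keys start as an empty list
    for start in range(0, len(seq) - window_size + 1, slide_size):
        positions[seq[start:start + window_size]].append(start)
    return dict(positions)


def gc_skew(seq, window_size, slide_size):
    """Return (window_starts, skews) with skew = (G - C) / (G + C) per window."""
    starts, skews = [], []
    for start in range(0, len(seq) - window_size + 1, slide_size):
        window = seq[start:start + window_size].upper()
        g, c = window.count("G"), window.count("C")
        # Skew is undefined without any G or C; 0.0 is an assumed placeholder.
        skews.append((g - c) / (g + c) if g + c else 0.0)
        starts.append(start)
    return starts, skews


def plot_gc_skew(starts, skews, out_path="gc_skew.png"):
    """Save a line plot of GC skew across the sequence (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(starts, skews)
    ax.axhline(0, linewidth=0.5)  # reference line at zero skew
    ax.set_xlabel("Window start position")
    ax.set_ylabel("GC skew (G - C) / (G + C)")
    fig.savefig(out_path)
    plt.close(fig)
```

The number of occurrences of a motif is then just the length of its position list, so a single dictionary could serve both the counts and the optional position metadata.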