Update 2-24-15
Inputs: FASTA files, parameters
Outputs: top motifs, visualization of GC skew across sequence
Programs: Python and R (or just Python).
Very raw pseudocode for motif finding:
wf = open("top_motifs.txt", "w")
seq = "store input file into a string"  # placeholder for the sequence read from the FASTA file
# user input for window and slide size
window_size = int(input("Window size: "))
slide_size = int(input("Slide size: "))
motifs = {}
# the range ends the loop once the window reaches the end of seq
for start in range(0, len(seq) - window_size + 1, slide_size):
    window = seq[start:start + window_size]
    if window not in motifs:
        # not a previously stored motif: add it with a count of 1
        motifs[window] = 1
    else:
        # preexisting motif: update dict value + 1
        motifs[window] += 1
if not wf.closed:
    wf.close()
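A rough sketch of how the pseudocode above might be fleshed out, not a finished design. It assumes the FASTA file holds a single record (several records would simply be concatenated) and that a "motif" is just the substring in each window. The function names (read_fasta, find_motifs, top_motifs, write_top_motifs) and the default of 10 top motifs are made up for illustration. Storing a list of start positions per motif, instead of a bare count, keeps the position information around; the count is just the length of the list.

```python
from collections import defaultdict


def read_fasta(path):
    """Return the sequence from a FASTA file as one uppercase string.

    Assumption: a single record; header lines (starting with '>') are skipped.
    """
    parts = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith(">"):
                parts.append(line.upper())
    return "".join(parts)


def find_motifs(seq, window_size, slide_size):
    """Map each window's substring to the list of start positions where it occurs."""
    positions = defaultdict(list)  # an unseen key starts out as an empty list
    for start in range(0, len(seq) - window_size + 1, slide_size):
        positions[seq[start:start + window_size]].append(start)
    return positions


def top_motifs(positions, n=10):
    """Return the n most frequent motifs as (motif, count) pairs, most frequent first."""
    counts = {motif: len(starts) for motif, starts in positions.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


def write_top_motifs(positions, path="top_motifs.txt", n=10):
    """Write motif, count, and start positions as tab-separated lines."""
    with open(path, "w") as wf:  # the file is closed automatically on exit
        for motif, count in top_motifs(positions, n):
            wf.write(f"{motif}\t{count}\t{positions[motif]}\n")
```

For example, find_motifs("ATGATGA", 3, 1) records ATG at positions 0 and 3, TGA at 1 and 4, and GAT at 2.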
I realize I need to find a method for building a dictionary on the fly while also updating the values of recurring keys. I am also thinking that I should keep track of motif positions as some sort of metadata that can be requested with an option when running the program. Calculating GC skew will use a similar sliding-window method; however, I still do not know how to generate figures in Python.
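A minimal sketch of the GC skew step, under a few assumptions: skew is taken per window as (G - C) / (G + C), a window with no G or C is reported as 0, and the figure is drawn with matplotlib (a third-party library that would need to be installed). The names gc_skew and plot_gc_skew, the axis labels, and the output file gc_skew.png are placeholders, not settled choices.

```python
def gc_skew(seq, window_size, slide_size):
    """Return (window start, skew) pairs, with skew = (G - C) / (G + C) per window."""
    points = []
    for start in range(0, len(seq) - window_size + 1, slide_size):
        window = seq[start:start + window_size].upper()
        g, c = window.count("G"), window.count("C")
        # Assumption: a window without any G or C gets a skew of 0.
        skew = (g - c) / (g + c) if g + c else 0.0
        points.append((start, skew))
    return points


def plot_gc_skew(points, out_path="gc_skew.png"):
    """Save a line plot of skew against window start position.

    Expects at least one point. matplotlib is imported here so that
    gc_skew() can still be used when it is not installed.
    """
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    starts, skews = zip(*points)
    fig, ax = plt.subplots()
    ax.plot(starts, skews)
    ax.axhline(0, color="gray", linewidth=0.5)  # reference line at zero skew
    ax.set_xlabel("Window start (bp)")
    ax.set_ylabel("GC skew (G - C) / (G + C)")
    fig.savefig(out_path)
    plt.close(fig)
```

Usage would be something like plot_gc_skew(gc_skew(seq, window_size, slide_size)), reusing the same window and slide sizes as the motif search.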