Beginner Bioinformatics in Python — Part 6

Motifs: 
AGA, TCA, TGA
The Profile Matrix:
{'A':[1/3,0,1], 'T':[2/3,0,0], 'C':[0,1/3,0], 'G':[0,2/3,0]}
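To see where these numbers come from, here is a minimal sketch that counts each nucleotide per column and divides by the number of motifs. The name `build_profile` is made up for this illustration and is not the `Profile` function used later in the post; it also assumes equal-length motifs and no pseudocounts.

```python
from collections import Counter

def build_profile(motifs):
    """Column-wise nucleotide frequencies for equal-length motifs (no pseudocounts)."""
    motif_count = len(motifs)
    motif_length = len(motifs[0])
    profile = {nucleotide: [0.0] * motif_length for nucleotide in "ACGT"}
    for column in range(motif_length):
        # Count how often each nucleotide appears in this column.
        counts = Counter(motif[column] for motif in motifs)
        for nucleotide in "ACGT":
            profile[nucleotide][column] = counts[nucleotide] / motif_count
    return profile

profile = build_profile(["AGA", "TCA", "TGA"])
print(profile)
```

Column 0 holds A, T, T, so A gets 1/3 and T gets 2/3, which matches the matrix above.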
Sample Input: 
AGA
{
'A':[0.33,0,1],
'T':[0.67,0,0],
'C':[0,0.33,0],
'G':[0,0.67,0]
}
Sample Output:
0.22
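from functools import reduce
def probability_of_generation(motif, profile_matrix):
    # Multiply the profile probability of each nucleotide at its position in the motif.
    return reduce(lambda x, y: x * y, [profile_matrix[motif[i]][i] for i in range(len(motif))], 1)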
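As a sanity check on the sample, the probability of AGA is just 0.33 × 0.67 × 1 ≈ 0.22. The sketch below does the same multiplication with `math.prod` (Python 3.8+) instead of `reduce`. The name `kmer_probability` is an assumption for this example, not part of the original code.

```python
import math

def kmer_probability(kmer, profile):
    """Product of the profile entries picked out by each nucleotide and its position."""
    return math.prod(profile[nucleotide][position]
                     for position, nucleotide in enumerate(kmer))

sample_profile = {
    'A': [0.33, 0, 1],
    'T': [0.67, 0, 0],
    'C': [0, 0.33, 0],
    'G': [0, 0.67, 0],
}

probability = kmer_probability("AGA", sample_profile)
print(round(probability, 2))  # 0.22
```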
Sample Input: 
AGATGACCA
3
{
'A':[0.33,0,1],
'T':[0.67,0,0],
'C':[0,0.33,0],
'G':[0,0.67,0]
}
Sample Output:
TGA
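def ProfileMostProbableKmer(text, k, profile):
    # Every substring of length k in text, in order of appearance.
    kmers = [text[iterator:iterator + k] for iterator in range(len(text) - k + 1)]
    probabilities = [probability_of_generation(kmer, profile) for kmer in kmers]
    # index() returns the first maximum, so ties go to the earliest k-mer.
    return kmers[probabilities.index(max(probabilities))]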
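To make the sample result less mysterious, the following sketch prints the probability of every 3-mer in AGATGACCA under the sample profile. It is a standalone illustration: `kmer_probability` is a hypothetical helper that mirrors `probability_of_generation`, and the profile uses floats throughout only so the printed values look uniform.

```python
import math

sample_profile = {
    'A': [0.33, 0.0, 1.0],
    'T': [0.67, 0.0, 0.0],
    'C': [0.0, 0.33, 0.0],
    'G': [0.0, 0.67, 0.0],
}
text, k = "AGATGACCA", 3

def kmer_probability(kmer, profile):
    """Product of the profile entries for each nucleotide at its position."""
    return math.prod(profile[nucleotide][position]
                     for position, nucleotide in enumerate(kmer))

# Pair every k-mer with its probability, in order of appearance.
scored_kmers = [(text[i:i + k], kmer_probability(text[i:i + k], sample_profile))
                for i in range(len(text) - k + 1)]
for kmer, probability in scored_kmers:
    print(kmer, round(probability, 4))

# max() keeps the first maximal item, so ties go to the earliest k-mer.
best_kmer, best_probability = max(scored_kmers, key=lambda pair: pair[1])
print(best_kmer)  # TGA
```

Only AGA (about 0.22) and TGA (about 0.45) get a non-zero probability; every other 3-mer hits a zero entry somewhere in the profile.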
  1. Iterate over all k-mers of the first DNA string. For each iteration, take that k-mer as our first motif. Construct a profile matrix from just this substring, so the probability is 1 for the character it has at each position and 0 for every other character.
  2. Move on to the second DNA string and find the k-mer that is most probable under the current profile. Add it to the motifs and build a new profile matrix from the motifs collected so far.
  3. Move to the third string and repeat, continuing until every string has contributed one motif.
  4. Repeat this whole process for all k-mers of the first DNA string. Give a closeness score to each motif combination. The one with the lowest score wins. For scoring the motifs, we can use the score algorithm developed in the previous post.
  5. In the beginning, we can initialise our best motifs to the first k-mer (index 0) of each string, so we have something to compare against.
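The score from the previous post is not reproduced here. One common definition, assumed in this sketch, counts for each column how many motifs disagree with the most common nucleotide and sums those counts, so a lower score means more similar motifs. The name `motif_score` is hypothetical and may differ from the earlier `Score`.

```python
from collections import Counter

def motif_score(motifs):
    """Total mismatches against the most common nucleotide in each column."""
    score = 0
    for column in zip(*motifs):
        # Number of motifs that agree with the column's most frequent nucleotide.
        most_common_count = Counter(column).most_common(1)[0][1]
        score += len(column) - most_common_count
    return score

example_score = motif_score(["AGA", "TCA", "TGA"])
print(example_score)  # 2
```

For AGA, TCA, TGA the first two columns each have one odd nucleotide out and the last column agrees completely, giving a score of 2.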
Sample Input:
3 5
GGCGTTCAGGCA
AAGAATCAGTCA
CAAGGAGTTCGC
CACGTCAATCAC
CAATAATATTCG
Sample Output:
CAG
CAG
CAA
CAA
CAA
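# Score is the scoring function from the previous post; Profile is assumed to come from there as well.
def GreedyMotifSearch(Dna, k, t):
    # t (the number of strings) is kept for the problem's signature; len(Dna) is used instead.
    # Build one motif combination per k-mer of the first string, then keep the lowest-scoring one.
    motif_combinations = [best_motifs_for_given_iteration(Dna, k, i) for i in range(len(Dna[0]) - k + 1)]
    motif_scores = [Score(motifs) for motifs in motif_combinations]
    return motif_combinations[motif_scores.index(min(motif_scores))]

def best_motifs_for_given_iteration(dna, substring_length, index):
    # Seed the search with the k-mer of the first string that starts at index.
    substring = dna[0][index: index + substring_length]
    profile_matrix = Profile([substring])
    return recursive_compute_best_motifs(dna, substring_length, [substring], profile_matrix, 1)

def recursive_compute_best_motifs(dna, substring_length, previous_motifs, profile_matrix, row_index):
    # Base case: every string has contributed a motif.
    if row_index == len(dna):
        return previous_motifs
    # Pick the most probable k-mer of the current string, then rebuild the profile with it included.
    motif_for_row_index = ProfileMostProbableKmer(dna[row_index], substring_length, profile_matrix)
    current_motifs = previous_motifs + [motif_for_row_index]
    return recursive_compute_best_motifs(dna, substring_length, current_motifs, Profile(current_motifs), row_index + 1)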
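Since `Profile` and `Score` live in the previous post, the code above cannot be run on its own. Below is a self-contained iterative sketch of the same greedy idea that can be tried directly on the sample input. The helper names (`profile_of`, `score_of`, `most_probable_kmer`, `greedy_motif_search`) are assumptions made for this example, and the profile uses plain frequencies without pseudocounts.

```python
import math
from collections import Counter

def profile_of(motifs):
    """Column-wise nucleotide frequencies (no pseudocounts)."""
    return {nucleotide: [Counter(column)[nucleotide] / len(motifs)
                         for column in zip(*motifs)]
            for nucleotide in "ACGT"}

def score_of(motifs):
    """Total mismatches against the most common nucleotide in each column."""
    return sum(len(column) - Counter(column).most_common(1)[0][1]
               for column in zip(*motifs))

def most_probable_kmer(text, k, profile):
    """First k-mer of text with the highest probability under the profile."""
    kmers = [text[i:i + k] for i in range(len(text) - k + 1)]
    # max() keeps the first maximal item, matching index(max(...)) above.
    return max(kmers, key=lambda kmer: math.prod(
        profile[nucleotide][position] for position, nucleotide in enumerate(kmer)))

def greedy_motif_search(dna, k):
    best_motifs, best_score = None, math.inf
    for start in range(len(dna[0]) - k + 1):
        # Seed with one k-mer of the first string, then extend greedily row by row.
        motifs = [dna[0][start:start + k]]
        for text in dna[1:]:
            motifs.append(most_probable_kmer(text, k, profile_of(motifs)))
        current_score = score_of(motifs)
        if current_score < best_score:  # strict: the earliest best combination wins
            best_motifs, best_score = motifs, current_score
    return best_motifs

sample_dna = ["GGCGTTCAGGCA", "AAGAATCAGTCA", "CAAGGAGTTCGC",
              "CACGTCAATCAC", "CAATAATATTCG"]
result = greedy_motif_search(sample_dna, 3)
print("\n".join(result))
```

On the sample input this prints CAG, CAG, CAA, CAA, CAA. Note how the third string has no exact CAG: every k-mer gets probability 0 under the profile at that point, so the first k-mer (CAA) is picked by the tie-breaking rule.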
