10.1 Creating many_lists_lenght from many_lists
many_lists is a multi-dimensional list (a list of lists) that was created in one of the earlier code blocks in your Google Colab notebook under week 7’s lab session:
files = ['AEE31160.1.fa', 'NP_194967.1.fa', 'NP_057185.1.fa', 'NP_171654.1.fa'] #these are our file IDs
many_lists = [] #this is a list that will contain the data from each file
for i in files:
    currentFile = [] #This temporary variable will store the result from each file, and is reset at the beginning of each iteration of the for loop
    curFile = open(i, 'r').readlines() #open the file as a list of strings. Each list element is a row in the file
    currentFile += [curFile[0].split(" ")[0][1:]] #Grab the protein ID (accession), add the string to currentFile
    currentFile += [curFile[0].split(".1 ")[1].split(" [")[0]] #Grab annotation...
    currentFile += [curFile[0].split("[")[1][:-2]] #Grab the organism...
    temp = ''
    for row in curFile[1:]: #the protein sequence starts from row 2 and spans multiple rows, hence [1:] (from 2nd line on, do...)
        temp += row.rstrip() #each sequence in a row ends with a newline \n character, which we remove with .rstrip()
    currentFile += [temp] #we add the whole protein sequence to currentFile
    ##currentFile now contains [accession, annotation, organism, sequence]
    #we then save the result from current file to master list
    many_lists.append(currentFile)
print('This is the many_lists list')
print(many_lists)
And many_lists takes on the following form:
[[accession, annotation, organism, sequence], [accession, annotation, organism, sequence], ...]
Here’s the problem that we need to solve: we want to create another variable, many_lists_lenght, that takes on the following form:
[[accession, sequence length, sequence], [accession, sequence length, sequence], ...]
In other words, there is no need to modify the many_lists variable itself (not to mention that you will also need to re-use this variable later on)! Rather, we can apply our knowledge of list indices to get many_lists_lenght!
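As a quick reminder of the list indices involved, here is a minimal sketch using a single made-up sublist. The accession and sequence are placeholders chosen for illustration, not values from the lab files. Index 0 picks the first element and index -1 picks the last, regardless of how many elements sit in between.

```python
# Hypothetical sublist laid out as [accession, annotation, organism, sequence]
# (placeholder values, not real data from the lab files)
sublist = ['ABC12345.1', 'example protein', 'Example organism', 'MKTAYIAK']

first_element = sublist[0]            # first element: the accession
last_element = sublist[-1]            # last element: the sequence
sequence_length = len(last_element)   # number of characters in the sequence

print(first_element, sequence_length, last_element)
# ABC12345.1 8 MKTAYIAK
```

Note that indexing only reads from the sublist, so the original list is left untouched.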
From inspection, it looks like we need to extract the first and the last element of each sublist of many_lists - we can do this using a for loop:
for i in many_lists:
    accession, sequence, sequenceLength = i[0], i[-1], len(i[-1])
    # Rest of my code here...
But before that, let’s not forget to define the many_lists_lenght variable first:
many_lists_lenght = []
for i in many_lists:
    accession, sequence, sequenceLength = i[0], i[-1], len(i[-1])
    # Rest of my code here...
And since many_lists_lenght is a list of lists (i.e., like many_lists), we can also create a temporary list tempList to store the contents of the sublists of many_lists_lenght. We can use the .append() method from the list class to do so, after which we can then re-use the same method to store tempList into many_lists_lenght:
many_lists_lenght = []
for i in many_lists:
    tempList = []
    accession, sequence, sequenceLength = i[0], i[-1], len(i[-1])
    tempList.append(accession)
    tempList.append(sequenceLength)
    tempList.append(sequence)
    many_lists_lenght.append(tempList)
And there is the first part of the problem done! Granted - the code above is a little verbose, but it does get the job done!
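If you would like to try the loop without the FASTA files at hand, the sketch below runs it on a small made-up stand-in for many_lists. The accessions, annotations, organisms and sequences are all placeholders, so the printed result only illustrates the shape of the output.

```python
# Made-up stand-in for many_lists (placeholder values, not the real lab data)
many_lists = [
    ['ACC_A.1', 'annotation A', 'Organism A', 'MKV'],
    ['ACC_B.1', 'annotation B', 'Organism B', 'MSTLQ'],
]

many_lists_lenght = []
for i in many_lists:
    tempList = []  # fresh temporary list for each sublist
    accession, sequence, sequenceLength = i[0], i[-1], len(i[-1])
    tempList.append(accession)
    tempList.append(sequenceLength)
    tempList.append(sequence)
    many_lists_lenght.append(tempList)

print(many_lists_lenght)
# [['ACC_A.1', 3, 'MKV'], ['ACC_B.1', 5, 'MSTLQ']]

# many_lists itself is unchanged: each sublist still has 4 elements
print(len(many_lists[0]))
# 4
```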
10.1.1 A more concise approach
Interestingly, the same problem can also be solved in one line of code:
many_lists_lenght = [[i[0], len(i[-1]), i[-1]] for i in many_lists]
Here, the general idea described above is still the same, but a list comprehension is used to make things more concise!
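To convince yourself that the two approaches agree, the following sketch builds the result both ways and compares them. The input is again a made-up stand-in for many_lists, and loop_result and comprehension_result are names introduced here purely for illustration.

```python
# Made-up stand-in for many_lists (placeholder values, not the real lab data)
many_lists = [
    ['ACC_A.1', 'annotation A', 'Organism A', 'MKV'],
    ['ACC_B.1', 'annotation B', 'Organism B', 'MSTLQ'],
]

# Loop version: build each [accession, sequence length, sequence] sublist and append it
loop_result = []
for i in many_lists:
    loop_result.append([i[0], len(i[-1]), i[-1]])

# List comprehension version: the same sublist expression, written inline
comprehension_result = [[i[0], len(i[-1]), i[-1]] for i in many_lists]

print(loop_result == comprehension_result)
# True
```

The expression at the front of the comprehension is the same sublist that the loop builds step by step, so the two results should match for any input of this shape.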