Tuples
Exercise 1: Revise a previous program as follows: Read and parse the “From” lines and
pull out the addresses from the line. Count the number of messages from each person
using a dictionary.
After all the data has been read, print the person with the most commits by creating a list
of (count, email) tuples from the dictionary. Then sort the list in reverse order and print
out the person who has the most commits.
Sample Line:
From [email protected] Sat Jan 5 09:14:16 2008
Enter a file name: mbox-short.txt
[email protected] 5
Enter a file name: mbox.txt
[email protected] 195
def test(file_name):
    # Open the file; report the problem and stop if it cannot be opened.
    try:
        file = open(file_name)
    except OSError:
        print("file %s not found" % file_name)
        return
    # Count the messages sent from each address.
    c = dict()
    for line in file:
        line = line.rstrip()
        # Use the "From " envelope lines only; skip the "From:" header lines.
        if line.startswith("From "):
            word = line.split()[1]
            c[word] = c.get(word, 0) + 1
    file.close()
    # Build (count, email) tuples so that sorting orders by count first.
    l = []
    for key, value in c.items():
        l.append((value, key))
    l.sort(reverse=True)
    # After the reverse sort, the first tuple holds the largest count.
    if l:
        max_count, max_guy = l[0]
        print(max_guy, max_count)

test("mbox-short.txt")
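The exercise asks for a list of (count, email) tuples sorted in reverse order. As a minimal sketch of that step on its own, the function below works on an in-memory dictionary rather than a file. The name most_frequent_sender and the sample addresses are hypothetical and are not taken from mbox-short.txt.

```python
def most_frequent_sender(counts):
    """Return (email, count) for the largest count, or None if counts is empty."""
    # Put the count first so tuple comparison sorts by count;
    # the address is only used to break ties between equal counts.
    pairs = [(count, email) for email, count in counts.items()]
    pairs.sort(reverse=True)
    if not pairs:
        return None
    count, email = pairs[0]
    return email, count


# Hypothetical sample data for illustration only.
sample_counts = {
    "alice@example.com": 2,
    "bob@example.com": 5,
    "carol@example.com": 3,
}
print(most_frequent_sender(sample_counts))
```

Because tuples compare element by element, placing the count first is what makes the plain sort() order by frequency. When two addresses share the same count, the reverse sort picks the one that comes later alphabetically.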
Exercise 2: This program counts the distribution of the hour of the day for each of the
messages. You can pull the hour from the “From” line by finding the time string and
then splitting that string into parts using the colon character. Once you have accumulated
the counts for each hour, print out the counts, one per line, sorted by hour as shown
below.
Sample Execution:
python timeofday.py
Enter a file name: mbox-short.txt
04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1
def test(file_name):
    # Open the file; report the problem and stop if it cannot be opened.
    try:
        file = open(file_name)
    except OSError:
        print("file %s not found" % file_name)
        return
    # Count the messages received in each hour of the day.
    c = dict()
    for line in file:
        line = line.rstrip()
        # Use the "From " envelope lines only; skip the "From:" header lines.
        if line.startswith("From "):
            # The sixth field is the time (HH:MM:SS); keep the hour part.
            word = line.split()[5].split(":")[0]
            c[word] = c.get(word, 0) + 1
    file.close()
    # Sort the (hour, count) tuples by hour and print one per line.
    l = list(c.items())
    l.sort()
    for x, y in l:
        print(x, y)

test("mbox-short.txt")
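The hour extraction described in the exercise (find the time string, split it on the colon, count, then sort by hour) can also be sketched without a file. This is an illustrative version under the assumption that each envelope line has the form "From address Day Mon D HH:MM:SS YYYY"; the name hour_counts and the sample lines are hypothetical.

```python
def hour_counts(lines):
    """Return a list of (hour, count) tuples sorted by hour."""
    counts = {}
    for line in lines:
        # Only envelope lines start with "From " (with a trailing space).
        if not line.startswith("From "):
            continue
        parts = line.split()
        if len(parts) < 6:
            continue  # malformed line: skip it instead of raising IndexError
        hour = parts[5].split(":")[0]
        counts[hour] = counts.get(hour, 0) + 1
    # Hours are zero-padded strings, so string order matches numeric order.
    return sorted(counts.items())


# Hypothetical sample lines for illustration only.
sample_lines = [
    "From a@example.com Sat Jan 5 09:14:16 2008",
    "From: a@example.com",
    "From b@example.com Fri Jan 4 18:10:48 2008",
    "From c@example.com Fri Jan 4 09:02:01 2008",
]
for hour, count in hour_counts(sample_lines):
    print(hour, count)
```

Sorting the hours as strings works here only because they are always two digits; if they were not zero-padded, converting them with int() before sorting would be the safer choice.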
Exercise 3: Write a program that reads a file and prints the letters in decreasing order
of frequency. Your program should convert all the input to lower case and only count
the letters a‑z. Your program should not count spaces, digits, punctuation, or anything
other than the letters a‑z. Find text samples from several different languages and see
how letter frequency varies between languages. Compare your results with the tables at
wikipedia.org/wiki/Letter_frequencies.
import string

def test(file_name):
    # Open the file; report the problem and stop if it cannot be opened.
    try:
        file = open(file_name)
    except OSError:
        print("file %s not found" % file_name)
        return
    c = dict()
    for line in file:
        for i in line.lower():
            # Count only the letters a-z; spaces, digits, punctuation and
            # any other characters are ignored.
            if i in string.ascii_lowercase:
                c[i] = c.get(i, 0) + 1
    file.close()
    # Total number of counted letters, computed once for the percentages.
    total = sum(c.values())
    # Sort by count, most frequent letter first.
    sorted_letters = sorted(c.items(), key=lambda item: item[1], reverse=True)
    for x, y in sorted_letters:
        percent = y / total * 100
        print("%s: %.2f%%" % (x.upper(), percent))

test("mbox-short.txt")
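As an alternative sketch of the same letter count, the standard library's collections.Counter can do the tallying and the ordering by frequency. This swaps in Counter for the hand-written dictionary; the function name letter_frequencies and the sample string are hypothetical.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return (letter, percent) pairs for a-z, most frequent first."""
    # Lowercase the input and keep only the ASCII letters a-z.
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = sum(counts.values())
    # most_common() returns the items ordered from highest to lowest count.
    return [(letter, count / total * 100)
            for letter, count in counts.most_common()]


# Hypothetical sample text for illustration only.
for letter, percent in letter_frequencies("Hello, World 2008!"):
    print("%s: %.2f%%" % (letter.upper(), percent))
```

To compare languages, the same function could be called on the contents of several text files, one per language, and the resulting lists set side by side.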