Problem:
Scansion involves discerning the stressed and unstressed syllables in English verse, which is challenging for non-native speakers, including myself and my schoolmates, as well as for beginners. While dictionaries and online resources can identify the syllables and stresses of individual words, existing programs merely count the number of syllables in a line; they do not identify the syllables themselves or their stress patterns. Determining the meter is therefore a tedious process that requires looking up word after word for its syllables and stresses. Consequently, I sought to develop a program that automatically outputs the scansion of each line of poetry, streamlining the entire process.
The Python module Prosodic can perform scansion, representing stressed and unstressed syllables with upper- and lower-case letters. However, it removes all spaces and punctuation, except apostrophes, in its output. Additionally, given Prosodic’s specialized nature, users must familiarize themselves with its array of functions to utilize it effectively.
Method:
First, my program splits each line into individual words, capturing any punctuation attached to the end of each word. A web crawler then extracts the root form of each word from Merriam-Webster.com and retrieves its syllables, including its primary stress, from HowManySyllables.com. Because a word may differ from its root form, the program then re-analyzes the original word, accommodating any variations and restoring suffixes and punctuation. Finally, drawing on common metrical patterns such as the prevalence of iambs, the program proposes a scansion for each line.
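To make the crawling step concrete, here is a minimal sketch (condensed from the Store() function in the full code below) of how the syllables and primary stress of a single word can be retrieved from HowManySyllables.com with requests and BeautifulSoup; the CSS class names ("no_b", "Ans_st", "Answer_Red") are the ones the full program relies on, so the sketch will break if the site's markup changes.

from bs4 import BeautifulSoup
import requests

def fetch_syllables(word):
    # download the word's page on HowManySyllables.com
    response = requests.get('https://www.howmanysyllables.com/syllables/' + word)
    soup = BeautifulSoup(response.text, 'html.parser')
    meter = soup.find('span', {'class': 'no_b'})   # the hyphenated "divide into syllables" answer
    if meter is not None:
        html = str(meter)
        html = html.replace('<span class="no_b" data-nosnippet="">', "")
        html = html.replace('</span>', "")
        html = html.replace('<span class="Ans_st">', "*")   # "*" marks the stressed syllable
        return html.split("-")
    # fall back to the plain answer (monosyllables, or no meter given)
    syllables = soup.find_all('span', class_='Answer_Red')[1].text.split("-")
    if len(syllables) == 1:   # monosyllables are usually stressed
        syllables[0] = "*" + syllables[0]
    return syllables

# hypothetical usage: fetch_syllables("flower") should return something like ['*flow', 'er']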
Result:
My program offers three output “modes”: the first divides words into syllables, the second also highlights the primary stress of each word in bold, and the third generates a suggested scansion for each line. Unlike the Prosodic module, all three modes maintain the text's original format, preserving spaces, punctuation, and upper- and lower-case letters, which makes the output easier to read and analyze.
What I did:
Problems with Prosodic:
1. the original format is lost:
• upper- and lower-case letters represent stressed and unstressed syllables
• no spaces and no punctuation (apart from apostrophes)
2. the user has to learn how to use its various functions
My improved program:
Three “modes” are available, each using a different tool:
Tool 1 (determining the syllables in the first line of “The Buck in the Snow” by Edna St. Vincent Millay):
Tool 2 (showing the syllables and primary stresses in this line from “Who in One Lifetime” by Muriel Rukeyser):
Tool 3 (giving the suggested scansion of the same line by considering common metrical patterns such as iambs):
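The heuristic behind Tool 3 can be sketched as follows: syllables whose primary stress is known from the dictionary stay stressed, and the surrounding syllables alternate between unstressed and stressed according to their distance from the nearest known stress, which favours iambic (and other alternating) readings. A simplified version of the Suggestions() function from the full code below, returning 1/0 stress marks instead of printing formatted text (the helper name suggest_stresses is mine), looks like this:

def suggest_stresses(syllables):
    # syllables: strings in which a leading "*" marks a known primary stress
    known = [i for i, syl in enumerate(syllables) if syl.startswith("*")]
    if not known:
        known = [1]   # no known stress at all: assume an iambic opening
    marks = []
    for i in range(len(syllables)):
        # anchor on the nearest known stress at or before this syllable
        anchor = max([k for k in known if k <= i], default=known[0])
        # a syllable is stressed when it shares the anchor's parity,
        # so the syllables between known stresses simply alternate
        marks.append(1 if (i - anchor) % 2 == 0 else 0)
    return marks

# hypothetical example (monosyllables only):
# suggest_stresses(["so ", "*long ", "as ", "men ", "can ", "breathe "])
# -> [0, 1, 0, 1, 0, 1], i.e. an iambic reading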
My code in Python:
from bs4 import BeautifulSoup
import requests   # web crawler
import openpyxl   # Python and Excel

workbook = openpyxl.load_workbook('C:\\Users\\hongz\\Desktop\\CS\\wordata.xlsx')
worksheet = workbook.active   # a database of words and their syllables
wordict = dict()   # a currently-empty dictionary

def Dictionary():   # to set up the dictionary
    print("loading dictionary ……", end = " ")
    for i in range(2, worksheet.max_row + 1):
        row = worksheet[i]   # the word and its syllables
        text = [syl.value for syl in row if syl.value is not None]
        word = text[0]; text.remove(word)
        wordict[word] = text   # the word's syllables
    print("dictionary loaded"); print("")

def Showords():
    print("words in the dictionary")
    for word in wordict.keys():
        print(word, end = ", ")
    print("")   # to check the dictionary

def Showordict():
    print("dictionary of base words and their syllables")
    for i in range(2, worksheet.max_row + 1):
        word = worksheet.cell(row = i, column = 1).value
        print(word, end = ": "); print(wordict[word])
    print("")   # to check the dictionary is updated

def Store(bswd):   # to process the base word
    response = requests.get('https://www.howmanysyllables.com/syllables/' + bswd)
    soup = BeautifulSoup(response.text, 'html.parser')
    # print(soup.prettify())   # to format the page's source code
    meter = soup.find('span', {'class': 'no_b'})
    if meter != None:   # howmanysyllables.com has its meter
        meter = str(meter)   # to convert the meter from class Tag to class string
        meter = meter.replace('<span class="no_b" data-nosnippet="">', "")
        meter = meter.replace('</span>', "")   # delete other HTML tags
        meter = meter.replace('<span class="Ans_st">', "*")   # “*”s mark stressed syllables
        syllables = meter.split("-")   # to split the text into a list of the word's syllables
    else:   # the word is a monosyllable or howmanysyllables.com does not have its meter
        meter = soup.find_all('span', class_='Answer_Red')[1]
        syllables = meter.text.split("-")   # a list of the word's syllables
        # monosyllables, apart from common exceptions such as "the", are usually stressed
        if len(syllables) == 1:
            syllables[0] = "*" + syllables[0]
    wordict[bswd] = syllables   # to store the word's syllables in the dictionary
    # to add this word and its syllables to the spreadsheet
    wdsyl = syllables.copy(); wdsyl.insert(0, bswd); worksheet.append(wdsyl)

''' the NLTK module's Porter stemmer and Snowball stemmer have some inaccuracies,
and lemmatization is slower than directly crawling Merriam-Webster.com '''

def Simplify(word):   # to get the base word of each word (e.g. "flowers" → "flower")
    # in Merriam-Webster.com, singular/plural and upper/lower-case do not matter
    response = requests.get('https://www.merriam-webster.com/dictionary/' + word)
    soup = BeautifulSoup(response.text, 'html.parser')
    bswd = soup.find('meta', property='og:aria-text')['content'].split()[4]
    if wordict.get(bswd) == None:
        Store(bswd)   # to create a new datum
    return bswd   # to compare with the original word (to find its suffix)

def Verse(line):
    # to adjust spaces adjacent to hyphens and dashes
    for bar in ["-", "–", "—"]:
        line = line.replace(" " + bar, bar)
        line = line.replace(bar, bar + " ")
    # print(line)   # to check that this works
    text = line.split()   # to store the still largely original formatting of the line
    syllables = list()   # a currently-empty list of syllables
    for i in range(0, len(text)):
        word = text[i]   # to compare the processed word with the original word
        # to wipe off any suffixes with apostrophes
        apostrophes = ["n't", "'ve", "'re", "'ll", "'t", "'s", "'m", "'d"]
        for apos in apostrophes:
            word = word.replace(apos, "")
        punc = ""   # to store punctuation marks at the end
        while word[-1] in (""",;.:'"(-)[–]{—}?!"""):
            punc = word[-1] + punc; word = word[:-1]   # ; print(punc)
        # to wipe off punctuation marks at the front
        if word[0] in ("""('"[{"""):
            word = word[1:]
        bswd = Simplify(word)   # to find the base word
        # print(bswd, end = ": "); print(wordict[bswd])
        char_index = 0   # a pointer to iterate through the word's characters
        for j in range(0, len(wordict[bswd])):
            bsyl = wordict[bswd][j]   # a syllable
            if bsyl[0] == "*":
                syl = "*"
            else:   # this syllable is not stressed
                syl = word[char_index]
                char_index = char_index + 1
            for k in range(1, len(bsyl)):
                syl = syl + word[char_index]
                char_index = char_index + 1
            syllables.append(syl)
        if text[i][0] in ("""('"[{"""):
            suffix = text[i][len(bswd) + 1:len(text[i]) - len(punc)]
            if wordict[bswd][0][0] == "*":
                syllables[-len(wordict[bswd])] = "*" + text[i][0] + syllables[-len(wordict[bswd])][1:]
            else:
                syllables[-len(wordict[bswd])] = text[i][0] + syllables[-len(wordict[bswd])]
        else:
            suffix = text[i][len(bswd):len(text[i]) - len(punc)]
        if len(suffix) == 0 or suffix[0] == "'":
            syllables[-1] = syllables[-1] + suffix
        elif suffix[-1] in ["d", "s"]:
            syllables[-1] = syllables[-1] + suffix
        else:
            syllables.append(suffix)
        if text[i][-1] in ["-", "–", "—"]:
            syllables[-1] = syllables[-1] + punc
        else:
            syllables[-1] = syllables[-1] + punc + " "
    # print(syllables)
    return syllables

def Sylscan(syllables):   # only showing syllables
    for i in range(0, len(syllables)):
        if syllables[i][0] == "*":
            syl = syllables[i][1:]
        else:
            syl = syllables[i]
        if syl[-1] in [" ", "-", "–", "—"]:
            print(syl, end = "")
        else:
            print(syl + "·", end = "")

def Scansion(syllables):   # showing syllables and stresses
    for i in range(0, len(syllables)):
        if syllables[i][0] == "*":
            syl = syllables[i][1:]
            syl = "\033[1m" + syl + "\033[0m"
        else:
            syl = syllables[i]
        if syllables[i][-1] in [" ", "-", "–", "—"]:
            print(syl, end = "")
        else:
            print(syl + "·", end = "")

def Write(syllable, stress):
    syl = syllable
    # to remove the “*”
    if syl[0] == "*":
        syl = syl[1:]
    if stress == 0:   # this syllable is stressed
        syl = "\033[1m" + syl + "\033[0m"
    if syllable[-1] in [" ", "-", "–", "—"]:
        print(syl, end = "")
    else:
        print(syl + "·", end = "")

def Suggestions(syls):
    # list of indices of stressed syllables
    stressed = list()
    for i in range(0, len(syls)):
        if syls[i][0] == "*":
            stressed.append(i)
    if len(stressed) == 0:
        stressed.append(1)
    for i in range(0, stressed[0]):
        Write(syls[i], (i + stressed[0]) % 2)
    for j in range(1, len(stressed)):
        for i in range(stressed[j-1], stressed[j]):
            Write(syls[i], (i + stressed[j-1]) % 2)
    for i in range(stressed[-1], len(syls)):
        Write(syls[i], (i + stressed[-1]) % 2)

# main function
Dictionary()   # ; Showords()
print("Directions for using this scansion tool: please enter")
print(" — “1” to show only the syllables of each word in the poem")
print(" — “2” to show the syllables and the primary stress of each word")
print(" — “3” to show the suggested scansion (with syllables and stresses)")
tool = str(input("tool to use: "))   # to input which tool to use
if tool != "1" and tool != "2" and tool != "3":
    tool = "2"; print("sorry, using tool 2 as the default")
print("please enter the poem line-by-line (enter “0” to finish and save)")
line = str(input())
while(line != "0"):
    syllables = Verse(line)
    if tool == "1": Sylscan(syllables)
    if tool == "2": Scansion(syllables)
    if tool == "3": Suggestions(syllables)
    print(""); print("")
    # Showordict()
    line = str(input())
print("saving new words ……", end = " ")
workbook.save(filename = 'wordata.xlsx')
print("new words saved in dictionary")

'''
the Prosodic module is pretty good, but it has several disadvantages:
1. it represents stressed syllables using capital letters
2. it deletes all spaces and punctuation apart from apostrophes in its output
3. the user has to be familiar with the various functions used
in contrast, my program has a much simpler user interface and its output is more legible,
preserving most of the original formatting
'''
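For reference, every new word that Store() looks up is also appended as a row to wordata.xlsx: the base word in the first column, followed by one column per syllable, with the stressed syllable prefixed by "*". A hypothetical row for "flower" would look like

flower | *flow | er

so that the next run of the program loads the word from the local dictionary instead of crawling the two websites again.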
Video documentation: