Verse Scansion Program

Tags: word Scansion len syl Verse Program print syllables line

Problem:

Scansion means identifying the stressed and unstressed syllables in a line of English verse, which is hard for non-native speakers such as myself and my schoolmates, and for beginners in general. Dictionaries and online resources give the syllables and stresses of individual words, but most existing programs only count the syllables in a line, without identifying the syllables themselves or their stress pattern. Working out the meter by hand therefore means looking up word after word. I wanted a program that automatically outputs the scansion of each line of poetry and so streamlines the whole process.

The Python module Prosodic can perform scansion, representing stressed and unstressed syllables with upper- and lower-case letters. However, it removes all spaces and punctuation, except apostrophes, in its output. Additionally, given Prosodic’s specialized nature, users must familiarize themselves with its array of functions to utilize it effectively.

Method:

First, my program splits each line into individual words, keeping any punctuation attached to the end of each word. A web crawler then looks up the root form of each word on Merriam-Webster.com and retrieves that root's syllables, including its primary stress, from HowManySyllables.com. Because a word can differ from its root form, the program maps the root's syllables back onto the original word and re-attaches any suffixes and punctuation. Finally, it proposes a scansion for each line based on common metrical patterns such as the prevalence of iambs.
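
To make the first step concrete, here is a minimal sketch of separating a token into its leading punctuation, the word itself, and its trailing punctuation. The helper name `split_token` and the two constants are assumptions for illustration, not names from my program; the punctuation sets mirror the ones used in the code further down, and this sketch strips every leading mark rather than just one.

```python
# Hypothetical helper for illustration; not part of the original program.
LEADING = "('\"[{"                    # marks that may open a token
TRAILING = ",;.:'\"(-)[–]{—}?!"       # marks that may close a token

def split_token(token):
    """Split a token into (leading punctuation, core word, trailing punctuation)."""
    start = 0
    while start < len(token) and token[start] in LEADING:
        start += 1
    end = len(token)
    while end > start and token[end - 1] in TRAILING:
        end -= 1
    return token[:start], token[start:end], token[end:]

# A made-up sample line, split on whitespace first
for token in '"Hello," she said (softly).'.split():
    print(split_token(token))
```

Only the middle element would be sent to the dictionary lookup; the outer two are kept so that the original formatting can be restored in the output.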

Result:

My program offers three output “modes”: the first divides words into syllables, the second highlights the primary stress of each word in bold, and the third generates suggestions for each line's scansion. Unlike the Prosodic module, all three modes maintain the text's original format, preserving spaces, punctuation and upper- and lower-case letters, thus facilitating user readability and analysis.
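
As a rough sketch of how the first two modes differ, the function below builds the output string instead of printing it. It assumes the same conventions as the code further down: a leading "*" marks the primary stress, a trailing space or dash marks the end of a word, and a middle dot separates syllables inside a word. The name `render` and its `show_stress` parameter are hypothetical, and the sample syllables are written by hand rather than fetched from the web.

```python
BOLD, RESET = "\033[1m", "\033[0m"      # ANSI escape codes for bold text
WORD_ENDINGS = (" ", "-", "–", "—")     # a syllable ending in one of these closes a word

def render(syllables, show_stress=False):
    """Join syllables into one string; optionally show stressed ones in bold."""
    parts = []
    for syl in syllables:
        stressed = syl.startswith("*")
        text = syl[1:] if stressed else syl
        # a middle dot separates syllables that belong to the same word
        sep = "" if text.endswith(WORD_ENDINGS) else "·"
        if stressed and show_stress:
            text = BOLD + text + RESET
        parts.append(text + sep)
    return "".join(parts)

sample = ["*flow", "ers ", "*fall "]        # hand-written sample data
print(render(sample))                       # mode 1: syllables only
print(render(sample, show_stress=True))     # mode 2: primary stresses in bold
```

Returning a string rather than printing makes the formatting easy to check without a terminal that supports bold text.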

What I did:

Problems with Prosodic:

1. the original format is lost:

　• upper- and lower-case represent stressed and unstressed syllables

　• no spaces and no punctuation (apart from apostrophes)

2. the user has to learn how to use its various functions

My improved program:

Three “modes” available using different tools:

Tool 1 (determining the syllables in the first line of “The Buck in the Snow” by Edna St. Vincent Millay):

Tool 2 (showing the syllables and primary stresses in this line from “Who in One Lifetime” by Muriel Rukeyser):

Tool 3 (giving the suggested scansion of the same line by considering common metrical patterns such as iambs):

My code in Python:

from bs4 import BeautifulSoup
import requests # web crawler

import openpyxl # Python and Excel

workbook = openpyxl.load_workbook('C:\\Users\\hongz\\Desktop\\CS\\wordata.xlsx')
worksheet = workbook.active # a database of words and their syllables

wordict = dict() # a currently-empty dictionary

def Dictionary(): # to set up the dictionary
    print("loading dictionary ……", end = " ")
    for i in range(2, worksheet.max_row + 1):
        row = worksheet[i] # the word and its syllables
        text = [syl.value for syl in row if syl.value is not None]
        word = text[0]; text.remove(word)
        wordict[word] = text # the word's syllables
    print("dictionary loaded"); print("")

def Showords():
    print("words in the dictionary")
    for word in wordict.keys():
        print(word, end = ", ")
    print("") # to check the dictionary

def Showordict():
    print("dictionary of base words and their syllables")
    for i in range(2, worksheet.max_row + 1):
        word = worksheet.cell(row = i, column = 1).value
        print(word, end = ": "); print(wordict[word])
    print("") # to check the dictionary is updated

def Store(bswd): # to process the base word

    response = requests.get('https://www.howmanysyllables.com/syllables/' + bswd)
    soup = BeautifulSoup(response.text, 'html.parser')
    # print(soup.prettify()) # to format the page's source code
    meter = soup.find('span',{'class':'no_b'})

    if meter is not None: # howmanysyllables.com has its meter

        meter = str(meter) # to convert the meter from class Tag to class string
        meter = meter.replace('<span class="no_b" data-nosnippet="">', "")
        meter = meter.replace('</span>', "") # delete other HTML tags
        meter = meter.replace('<span class="Ans_st">', "*") # “*”s mark stressed syllables
        syllables = meter.split("-") # to split the text into a list of the word's syllables

    else: # the word is a monosyllable or howmanysyllables.com does not have its meter

        meter = soup.find_all('span',class_='Answer_Red')[1]
        syllables = meter.text.split("-") # a list of the word's syllables

        # monosyllables, apart from common exceptions such as "the", are usually stressed
        if len(syllables) == 1: syllables[0] = "*" + syllables[0]

    wordict[bswd] = syllables # to store the word's syllables in the dictionary

    # to add this word and its syllables to the spreadsheet
    wdsyl = syllables.copy(); wdsyl.insert(0, bswd); worksheet.append(wdsyl)

'''
the NLTK module's Porter stemmer and Snowball stemmer have some inaccuracies,
and lemmatization is slower than directly crawling Merriam-Webster.com
'''

def Simplify(word):
    # to get the base word of each word (e.g. "flowers" → "flower")
    # in Merriam-Webster.com, singular/plural and upper/lower-case do not matter
    response = requests.get('https://www.merriam-webster.com/dictionary/' + word)
    soup = BeautifulSoup(response.text, 'html.parser')
    bswd = soup.find('meta',property='og:aria-text')['content'].split()[4]
    if bswd not in wordict: Store(bswd) # to create a new datum
    return bswd # to compare with the original word (to find its suffix)

def Verse(line):

    # to adjust spaces adjacent to hyphens and dashes
    for bar in ["-", "–", "—"]:
        line = line.replace(" " + bar, bar)
        line = line.replace(bar, bar + " ")
    # print(line) # to check that this works

    text = line.split() # to store the still largely original formatting of the line

    syllables = list() # a currently-empty list of syllables

    for i in range(0, len(text)):

        word = text[i] # to compare the processed word with the original word

        # to wipe off any suffixes with apostrophes
        apostrophes = ["n't", "'ve", "'re", "'ll", "'t", "'s", "'m", "'d"]
        for apos in apostrophes: word = word.replace(apos, "")

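        punc = "" # to store punctuation marks at the end
        while word and word[-1] in (""",;.:'"(-)[–]{—}?!"""):
            punc = word[-1] + punc; word = word[:-1] # ; print(punc)

        # to wipe off punctuation marks at the front
        if word and word[0] in ("""('"[{"""): word = word[1:]

        if not word: # the token has no letters, so there is nothing to look up
            if text[i][-1] in ["-", "–", "—"]: syllables.append(text[i])
            else: syllables.append(text[i] + " ")
            continue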

        bswd = Simplify(word) # to find the base word

        # print(bswd, end = ": "); print(wordict[bswd])

        char_index = 0 # a pointer to iterate through the word's characters

        for j in range(0, len(wordict[bswd])):
            bsyl = wordict[bswd][j] # a syllable
            if bsyl[0] == "*": syl = "*"
            else: # this syllable is not stressed
                syl = word[char_index]
                char_index = char_index + 1
            for k in range(1, len(bsyl)):
                syl = syl + word[char_index]
                char_index = char_index + 1
            syllables.append(syl)

        if text[i][0] in ("""('"[{"""):
            suffix = text[i][len(bswd) + 1:len(text[i]) - len(punc)]
            if wordict[bswd][0][0] == "*":
                syllables[-len(wordict[bswd])] = "*" + text[i][0] + syllables[-len(wordict[bswd])][1:]
            else:
                syllables[-len(wordict[bswd])] = text[i][0] + syllables[-len(wordict[bswd])]
        else: suffix = text[i][len(bswd):len(text[i]) - len(punc)]

        if len(suffix) == 0 or suffix[0] == "'":
            syllables[-1] = syllables[-1] + suffix
        elif suffix[-1] in ["d", "s"]:
            syllables[-1] = syllables[-1] + suffix
        else:
            syllables.append(suffix)

        if text[i][-1] in ["-", "–", "—"]:
            syllables[-1] = syllables[-1] + punc
        else:
            syllables[-1] = syllables[-1] + punc + " "

    # print(syllables)
    return syllables

def Sylscan(syllables): # only showing syllables
    for i in range(0, len(syllables)):
        if syllables[i][0] == "*":
            syl = syllables[i][1:]
        else:
            syl = syllables[i]
        if syl[-1] in [" ", "-", "–", "—"]:
            print(syl, end = "")
        else:
            print(syl + "·", end = "")

def Scansion(syllables): # showing syllables and stresses
    for i in range(0, len(syllables)):
        if syllables[i][0] == "*":
            syl = syllables[i][1:]
            syl = "\033[1m" + syl + "\033[0m"
        else:
            syl = syllables[i]
        if syllables[i][-1] in [" ", "-", "–", "—"]:
            print(syl, end = "")
        else:
            print(syl + "·", end = "")

def Write(syllable, stress):

    syl = syllable # to remove the “*”
    if syl[0] == "*": syl = syl[1:]

    if stress == 0: # this syllable is stressed
        syl = "\033[1m" + syl + "\033[0m"

    if syllable[-1] in [" ", "-", "–", "—"]:
        print(syl, end = "")
    else:
        print(syl + "·", end = "")

def Suggestions(syls):

    # list of indices of stressed syllables
    stressed = list()
    for i in range(0, len(syls)):
        if syls[i][0] == "*":
            stressed.append(i)

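    if len(stressed) == 0: # no known stress, so assume the line opens with an iamb
        if len(syls) == 0: return # an empty line has nothing to print
        stressed.append(1)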

    for i in range(0, stressed[0]):
        Write(syls[i], (i + stressed[0]) % 2)

    for j in range(1, len(stressed)):
        for i in range(stressed[j-1], stressed[j]):
            Write(syls[i], (i + stressed[j-1]) % 2)

    for i in range (stressed[-1], len(syls)):
        Write(syls[i], (i + stressed[-1]) % 2)

# main function

Dictionary() # ; Showords()

print("Directions for using this scansion tool: please enter")
print(" — “1” to show only the syllables of each word in the poem")
print(" — “2” to show the syllables and the primary stress of each word")
print(" — “3” to show the suggested scansion (with syllables and stresses)")

tool = str(input("tool to use: ")) # to input which tool to use
if tool != "1" and tool != "2" and tool != "3":
    tool = "2"; print("sorry, using tool 2 as the default")

print("please enter the poem line-by-line (enter “0” to finish and save)")

line = str(input())

while(line != "0"):

    syllables = Verse(line)

    if tool == "1": Sylscan(syllables)
    if tool == "2": Scansion(syllables)
    if tool == "3": Suggestions(syllables)

    print(""); print("")

    # Showordict()

    line = str(input())

print("saving new words ……", end = " ")
workbook.save(filename = 'C:\\Users\\hongz\\Desktop\\CS\\wordata.xlsx') # the same file that was loaded
print("new words saved in dictionary")

'''
the Prosodic module is pretty good, but it has several disadvantages:
1. it represents stressed syllables using capital letters
2. it deletes all spaces and punctuation apart from apostrophes in its output
3. the user has to be familiar with the various functions used

in contrast, my program has a much simpler user interface and
its output is more legible, preserving most of the original formatting
'''
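
The Suggestions function above prints each syllable as it goes, which makes the iamb heuristic hard to check in isolation. Below is a minimal sketch of the same idea as a pure function: starting from each known primary stress, stresses alternate outwards, and a line with no known stress is assumed to open with an iamb. The name `suggest_stresses` is hypothetical, and the sample line is marked by hand with only two known stresses.

```python
def suggest_stresses(syllables):
    """Return one boolean per syllable: True where a stress is suggested."""
    if not syllables:
        return []
    # indices of syllables whose primary stress is already known ("*" prefix)
    known = [i for i, syl in enumerate(syllables) if syl.startswith("*")]
    if not known:
        known = [1]  # assumption: unstressed first syllable, stressed second
    flags = []
    for i in range(len(syllables)):
        # nearest known stress at or before i; before the first one, use the first
        anchor = max((k for k in known if k <= i), default=known[0])
        # syllables at an even distance from the anchor are suggested as stressed
        flags.append((i - anchor) % 2 == 0)
    return flags

sample = ["be", "*fore ", "an", "oth", "er ", "*day "]  # hand-marked sample data
for syl, stressed in zip(sample, suggest_stresses(sample)):
    print(syl.lstrip("*").strip(), "stressed" if stressed else "unstressed")
```

Because the function returns a list, the alternation can be tested directly and then passed to whatever formatting the chosen tool needs.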

Video documentation:

Link to my database:

From: https://www.cnblogs.com/hazel-wu/p/17985725