Verse Scansion Program

Posted: 2024-01-24 19:59:42 · Views: 23
Tags: word, Scansion, len, syl, Verse, Program, print, syllables, line

Problem:

Scansion means identifying the stressed and unstressed syllables in a line of English verse, which is challenging for non-native speakers, including myself, my schoolmates, and beginners. Dictionaries and online resources can give the syllables and stresses of individual words, but most existing programs merely count the syllables in a line without identifying the syllables themselves or their stress patterns. Determining the meter therefore requires looking up word after word for its syllables and stresses. Consequently, I set out to develop a program that automatically outputs the scansion for each line of poetry, streamlining the entire process.

The Python module Prosodic can perform scansion, representing stressed and unstressed syllables with upper- and lower-case letters. However, it removes all spaces and punctuation, except apostrophes, in its output. Additionally, given Prosodic’s specialized nature, users must familiarize themselves with its array of functions to utilize it effectively.

Method:

First, my program splits each line into individual words, keeping any punctuation attached to the end of each word. A web crawler then looks up the root form of each word on Merriam-Webster.com and retrieves its syllable division, including its primary stress, from HowManySyllables.com. Because a word may differ from its root form, the program then maps the root's syllables back onto the original word and re-attaches any suffixes and punctuation. Finally, by assuming common metrical patterns such as the iamb, the program proposes a scansion for each line.
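The first step, separating each word from its surrounding punctuation, can be sketched on its own. This is a minimal illustration, not the program's actual code: the names `split_token` and `split_line` and the two punctuation sets are assumptions made for the example.

```python
# Characters treated as punctuation around a word (an assumed set for this sketch).
LEADING = "('\"[{"
TRAILING = ",;.:'\"()[]{}?!-–—"

def split_token(token):
    """Split one whitespace-delimited token into (leading, core, trailing)."""
    start, end = 0, len(token)
    # Move past opening quotes and brackets.
    while start < end and token[start] in LEADING:
        start += 1
    # Move back past closing punctuation.
    while end > start and token[end - 1] in TRAILING:
        end -= 1
    return token[:start], token[start:end], token[end:]

def split_line(line):
    """Split a line of verse into (leading, core, trailing) triples."""
    return [split_token(token) for token in line.split()]

for parts in split_line('"Hope" is the thing with feathers,'):
    print(parts)
```

Keeping the leading and trailing punctuation as separate strings means they can be re-attached unchanged once the core word has been divided into syllables. A token made only of punctuation, such as a lone dash, yields an empty core and can be skipped by the look-up step.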

Result:

My program offers three output “modes”: the first divides words into syllables, the second highlights the primary stress of each word in bold, and the third generates suggestions for each line's scansion. Unlike the Prosodic module, all three modes maintain the text's original format, preserving spaces, punctuation and upper- and lower-case letters, thus facilitating user readability and analysis.
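As a rough illustration of the first two modes, the sketch below renders a list of syllables in which a leading "*" marks a primary stress. The function name `render` and the sample syllable list are hypothetical; the stresses in the sample were marked by hand for the example, not looked up.

```python
BOLD, RESET = "\033[1m", "\033[0m"  # ANSI escape codes for bold text

def render(syllables, show_stress):
    """Join syllables with "·" inside words; bold the stressed ones if requested."""
    out = []
    for syl in syllables:
        stressed = syl.startswith("*")
        text = syl[1:] if stressed else syl
        # A trailing space or dash already separates this syllable from the next.
        sep = "" if text.endswith((" ", "-", "–", "—")) else "·"
        if stressed and show_stress:
            text = BOLD + text + RESET
        out.append(text + sep)
    return "".join(out)

# Hand-marked sample: each word's last syllable carries its punctuation and space.
sample = ["*po", "em ", "*writ", "ten ", "*down. "]
print(render(sample, show_stress=False))  # mode 1: syllables only
print(render(sample, show_stress=True))   # mode 2: primary stresses in bold
```

Because the spaces and punctuation travel with the syllables, joining the pieces reproduces the original line with only the "·" separators added.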

What I did:

Problems with Prosodic:

1. the original format is lost:

  • upper- and lower-case represent stressed and unstressed syllables

  • no spaces and no punctuation (apart from apostrophes)

2. the user has to learn how to use its various functions

My improved program:

Three “modes” available using different tools:

Tool 1 (determining the syllables in the first line of “The Buck in the Snow” by Edna St. Vincent Millay):

Tool 2 (showing the syllables and primary stresses in this line from “Who in One Lifetime” by Muriel Rukeyser):

Tool 3 (giving the suggested scansion of the same line by considering common metrical patterns such as iambs):
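The idea behind Tool 3 can be sketched as follows, assuming the only heuristic is strict alternation around the primary stresses found by look-up. The names `suggest_stresses` and `show` are hypothetical, and "/" and "x" stand for stressed and unstressed syllables.

```python
def suggest_stresses(known):
    """Fill in a full stress pattern from the known primary stresses.

    `known` is a list of booleans, True where a look-up marked the syllable
    as stressed. Stresses alternate forwards from each known stress, and
    backwards from the first one.
    """
    if not known:
        return []
    anchors = [i for i, flag in enumerate(known) if flag]
    if not anchors:
        anchors = [1]  # assume an iambic opening: unstressed, then stressed
    pattern = []
    for i in range(len(known)):
        # Use the last known stress at or before i, else the first known stress.
        earlier = [a for a in anchors if a <= i]
        anchor = earlier[-1] if earlier else anchors[0]
        pattern.append((i - anchor) % 2 == 0)
    return pattern

def show(pattern):
    """Render a stress pattern as a string of "/" and "x"."""
    return "".join("/" if stressed else "x" for stressed in pattern)

# Eight syllables with known stresses on the 2nd and 6th.
print(show(suggest_stresses([False, True, False, False, False, True, False, False])))
```

Alternating from the nearest known stress keeps every looked-up stress in place, so two stresses can end up adjacent when the known stresses are an odd number of syllables apart. The result is a suggestion to check by ear, not a definitive scansion.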

My code in Python:
from bs4 import BeautifulSoup
import requests # web crawler

import openpyxl # Python and Excel

workbook = openpyxl.load_workbook('C:\\Users\\hongz\\Desktop\\CS\\wordata.xlsx')
worksheet = workbook.active # a database of words and their syllables

wordict = dict() # a currently-empty dictionary

def Dictionary(): # to set up the dictionary
    print("loading dictionary ……", end = " ")
    for i in range(2, worksheet.max_row + 1):
        row = worksheet[i] # the word and its syllables
        text = [syl.value for syl in row if syl.value is not None]
        word = text[0]; text.remove(word)
        wordict[word] = text # the word's syllables
    print("dictionary loaded"); print("")

def Showords():
    print("words in the dictionary")
    for word in wordict.keys():
        print(word, end = ", ")
    print("") # to check the dictionary

def Showordict():
    print("dictionary of base words and their syllables")
    for i in range(2, worksheet.max_row + 1):
        word = worksheet.cell(row = i, column = 1).value
        print(word, end = ": "); print(wordict[word])
    print("") # to check the dictionary is updated

def Store(bswd): # to process the base word

    response = requests.get('https://www.howmanysyllables.com/syllables/' + bswd)
    soup = BeautifulSoup(response.text, 'html.parser')
    # print(soup.prettify()) # to format the page's source code
    meter = soup.find('span',{'class':'no_b'})

    if meter is not None: # howmanysyllables.com has its meter

        meter = str(meter) # to convert the meter from class Tag to class string
        meter = meter.replace('<span class="no_b" data-nosnippet="">', "")
        meter = meter.replace('</span>', "") # delete other HTML tags
        meter = meter.replace('<span class="Ans_st">', "*") # “*”s mark stressed syllables
        syllables = meter.split("-") # to split the text into a list of the word's syllables

    else: # the word is a monosyllable or howmanysyllables.com does not have its meter

        meter = soup.find_all('span',class_='Answer_Red')[1]
        syllables = meter.text.split("-") # a list of the word's syllables

        # monosyllables, apart from common exceptions such as "the", are usually stressed
        if len(syllables) == 1: syllables[0] = "*" + syllables[0]

    wordict[bswd] = syllables # to store the word's syllables in the dictionary

    # to add this word and its syllables to the spreadsheet
    wdsyl = syllables.copy(); wdsyl.insert(0, bswd); worksheet.append(wdsyl)

'''
the NLTK module's Porter stemmer and Snowball stemmer have some inaccuracies,
and lemmatization is slower than directly crawling Merriam-Webster.com
'''

def Simplify(word):
    # to get the base word of each word (e.g. "flowers" → "flower")
    # in Merriam-Webster.com, singular/plural and upper/lower-case do not matter
    response = requests.get('https://www.merriam-webster.com/dictionary/' + word)
    soup = BeautifulSoup(response.text, 'html.parser')
    bswd = soup.find('meta',property='og:aria-text')['content'].split()[4]
    if bswd not in wordict: Store(bswd) # to create a new datum
    return bswd # to compare with the original word (to find its suffix)

def Verse(line):

    # to adjust spaces adjacent to hyphens and dashes
    for bar in ["-", "–", "—"]:
        line = line.replace(" " + bar, bar)
        line = line.replace(bar, bar + " ")
    # print(line) # to check that this works

    text = line.split() # to store the still largely original formatting of the line

    syllables = list() # a currently-empty list of syllables

    for i in range(0, len(text)):

        word = text[i] # to compare the processed word with the original word

        # to wipe off any suffixes with apostrophes
        apostrophes = ["n't", "'ve", "'re", "'ll", "'t", "'s", "'m", "'d"]
        for apos in apostrophes: word = word.replace(apos, "")

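        punc = "" # to store punctuation marks at the end
        while word and word[-1] in (""",;.:'"(-)[–]{—}?!"""):
            punc = word[-1] + punc; word = word[:-1] # ; print(punc)

        if not word: # the token was punctuation only: keep it, but skip the look-up
            if syllables: syllables[-1] = syllables[-1] + punc + " "
            else: syllables.append(punc + " ")
            continue

        # to wipe off punctuation marks at the front
        if word[0] in ("""('"[{"""): word = word[1:]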

        bswd = Simplify(word) # to find the base word

        # print(bswd, end = ": "); print(wordict[bswd])

        char_index = 0 # a pointer to iterate through the word's characters

        for j in range(0, len(wordict[bswd])):
            bsyl = wordict[bswd][j] # a syllable
            if bsyl[0] == "*": syl = "*"
            else: # this syllable is not stressed
                syl = word[char_index]
                char_index = char_index + 1
            for k in range(1, len(bsyl)):
                syl = syl + word[char_index]
                char_index = char_index + 1
            syllables.append(syl)

        if text[i][0] in ("""('"[{"""):
            suffix = text[i][len(bswd) + 1:len(text[i]) - len(punc)]
            if wordict[bswd][0][0] == "*":
                syllables[-len(wordict[bswd])] = "*" + text[i][0] + syllables[-len(wordict[bswd])][1:]
            else:
                syllables[-len(wordict[bswd])] = text[i][0] + syllables[-len(wordict[bswd])]
        else: suffix = text[i][len(bswd):len(text[i]) - len(punc)]

        if len(suffix) == 0 or suffix[0] == "'":
            syllables[-1] = syllables[-1] + suffix
        elif suffix[-1] in ["d", "s"]:
            syllables[-1] = syllables[-1] + suffix
        else:
            syllables.append(suffix)

        if text[i][-1] in ["-", "–", "—"]:
            syllables[-1] = syllables[-1] + punc
        else:
            syllables[-1] = syllables[-1] + punc + " "

    # print(syllables)
    return syllables

def Sylscan(syllables): # only showing syllables
    for i in range(0, len(syllables)):
        if syllables[i][0] == "*":
            syl = syllables[i][1:]
        else:
            syl = syllables[i]
        if syl[-1] in [" ", "-", "–", "—"]:
            print(syl, end = "")
        else:
            print(syl + "·", end = "")

def Scansion(syllables): # showing syllables and stresses
    for i in range(0, len(syllables)):
        if syllables[i][0] == "*":
            syl = syllables[i][1:]
            syl = "\033[1m" + syl + "\033[0m"
        else:
            syl = syllables[i]
        if syllables[i][-1] in [" ", "-", "–", "—"]:
            print(syl, end = "")
        else:
            print(syl + "·", end = "")

def Write(syllable, stress):

    syl = syllable # to remove the “*”
    if syl[0] == "*": syl = syl[1:]

    if stress == 0: # this syllable is stressed
        syl = "\033[1m" + syl + "\033[0m"

    if syllable[-1] in [" ", "-", "–", "—"]:
        print(syl, end = "")
    else:
        print(syl + "·", end = "")

def Suggestions(syls):

    if len(syls) == 0: return # an empty line has nothing to scan

    # list of indices of stressed syllables
    stressed = list()
    for i in range(0, len(syls)):
        if syls[i][0] == "*":
            stressed.append(i)

    if len(stressed) == 0:
        stressed.append(1)

    for i in range(0, stressed[0]):
        Write(syls[i], (i + stressed[0]) % 2)

    for j in range(1, len(stressed)):
        for i in range(stressed[j-1], stressed[j]):
            Write(syls[i], (i + stressed[j-1]) % 2)

    for i in range (stressed[-1], len(syls)):
        Write(syls[i], (i + stressed[-1]) % 2)

# main function

Dictionary() # ; Showords()

print("Directions for using this scansion tool: please enter")
print(" — “1” to show only the syllables of each word in the poem")
print(" — “2” to show the syllables and the primary stress of each word")
print(" — “3” to show the suggested scansion (with syllables and stresses)")

tool = str(input("tool to use: ")) # to input which tool to use
if tool != "1" and tool != "2" and tool != "3":
    tool = "2"; print("sorry, using tool 2 as the default")

print("please enter the poem line-by-line (enter “0” to finish and save)")

line = str(input())

while line != "0":

    syllables = Verse(line)

    if tool == "1": Sylscan(syllables)
    if tool == "2": Scansion(syllables)
    if tool == "3": Suggestions(syllables)

    print(""); print("")

    # Showordict()

    line = str(input())

print("saving new words ……", end = " ")
workbook.save(filename = 'C:\\Users\\hongz\\Desktop\\CS\\wordata.xlsx') # the same file that was loaded at the start
print("new words saved in dictionary")

'''
the Prosodic module is pretty good, but it has several disadvantages:
1. it represents stressed syllables using capital letters
2. it deletes all spaces and punctuation apart from apostrophes in its output
3. the user has to be familiar with the various functions used

in contrast, my program has a much simpler user interface and
its output is more legible, preserving most of the original formatting
'''
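
The tag-stripping step in Store relies on exact string matches, so it would break if the site changed the order of the attributes. A possible alternative, sketched here with the standard library's html.parser, is shown below. The class name `MeterParser` is hypothetical, and the sample markup is an assumed shape inferred from the replace calls in Store, not copied from the site.

```python
from html.parser import HTMLParser

class MeterParser(HTMLParser):
    """Collect the text of a meter snippet, prefixing stressed parts with "*"."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "span" and ("class", "Ans_st") in attrs:
            self.parts.append("*")

    def handle_data(self, data):
        self.parts.append(data)

def parse_meter(markup):
    """Return the word's syllables, with "*" in front of the stressed one."""
    parser = MeterParser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts).split("-")

# Assumed markup shape, for illustration only.
sample = '<span class="no_b" data-nosnippet=""><span class="Ans_st">po</span>-em</span>'
print(parse_meter(sample))
```

The returned list has the same form as the lists stored in wordict, so it could be dropped in where Store builds `syllables`.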
Video documentation:

 

Link to my database:

From: https://www.cnblogs.com/hazel-wu/p/17985725
