
520 666 Signal Extraction

Posted: 2023-04-24 09:01:17
Tags: extraction, 666, each, possible, length, 520, CTC, output, WFST


(520|600).666 Information Extraction Homework #6

Due Thursday, April 27, 2023.

Connectionist Temporal Classification

Consider the task of recognizing an $M$-length sequence of tokens, $y_1^M$, from a $T$-length input $x_1^T$. The CTC objective function is one objective for such sequence transduction tasks. It works by converting $y_1^M$ into an alignment sequence $s_1^T$ of the same length as $x_1^T$, using an additional blank symbol $\varnothing$ to fill the extra space. Note that it assumes $T \geq M$ (more precisely, $T$ must also leave room for a $\varnothing$ between every pair of identical consecutive tokens). $\beta^{-1}(y_1^M)$ represents the set of all possible $T$-length alignments $s_1^T$ for the sequence $y_1^M$.

$$\log p(y_1^M \mid x_1^T) = \log \sum_{s_1^T \in \beta^{-1}(y_1^M)} \prod_{t=1}^{T} p(s_t \mid x_1^T) \quad (1)$$
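The mapping $\beta$ and Eq. (1) can be made concrete with a small Python sketch. It assumes the usual CTC collapse rule (merge runs of repeated symbols, then delete blanks), writes the blank $\varnothing$ as the string `"<b>"`, and takes `probs[t][k]` to be $p(s_t = k \mid x_1^T)$ with one row per frame; all function names are hypothetical. The brute-force version enumerates all $|S|^T$ sequences, which is only practical for toy sizes like the ones in this homework, while the forward (alpha) recursion over the extended label sequence $\varnothing\, y_1\, \varnothing \cdots y_M\, \varnothing$ computes the same sum in $O(TM)$ time.

```python
import math
from itertools import product

BLANK = "<b>"  # hypothetical stand-in for the blank symbol


def collapse(alignment):
    """beta: merge runs of repeated symbols, then drop blanks."""
    out, prev = [], None
    for sym in alignment:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return tuple(out)


def alignments(tokens, T, vocab):
    """beta^{-1}(tokens): every length-T sequence over vocab that collapses to tokens."""
    target = tuple(tokens)
    return [s for s in product(vocab, repeat=T) if collapse(s) == target]


def ctc_log_prob_bruteforce(tokens, probs, vocab):
    """Eq. (1) by explicit enumeration of beta^{-1}(tokens)."""
    index = {sym: k for k, sym in enumerate(vocab)}
    total = sum(
        math.prod(probs[t][index[sym]] for t, sym in enumerate(s))
        for s in alignments(tokens, len(probs), vocab)
    )
    return math.log(total) if total > 0 else -math.inf


def ctc_log_prob_forward(tokens, probs, vocab):
    """Eq. (1) via the forward (alpha) recursion over <b> y1 <b> ... yM <b>."""
    index = {sym: k for k, sym in enumerate(vocab)}
    ext = [BLANK]
    for tok in tokens:
        ext += [tok, BLANK]
    S = len(ext)
    alpha = [0.0] * S
    for s in range(min(2, S)):  # a path starts on the leading blank or on the first token
        alpha[s] = probs[0][index[ext[s]]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)  # stay, or advance one position
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]  # skip the blank between two distinct tokens
            new[s] = total * probs[t][index[ext[s]]]
        alpha = new
    final = alpha[-1] + (alpha[-2] if S > 1 else 0.0)  # end on the trailing blank or the last token
    return math.log(final) if final > 0 else -math.inf


vocab = [BLANK, "a", "b", "l"]
print(len(alignments("all", 5, vocab)))  # 7
```

The two functions should agree on any input. With four outputs ($\varnothing$, a, b, l), the word “all” has 7 CTC alignments of length 5 and only one of length 4 (`a l <b> l`), because the repeated `l` forces a blank between its two copies. A practical implementation would also work in log space to avoid underflow on long inputs.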

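We will assume that $p(s_t \mid x_1^T)$ is obtained by normalizing the outputs of a neural network, $\phi$. The elements $\phi_t^{s_t} \propto p(s_t \mid x_1^T)$ of the matrix $\phi$ can be used to compute the CTC objective

$$\log p(y_1^M \mid x_1^T) = \log \sum_{s_1^T \in \beta^{-1}(y_1^M)} \prod_{t=1}^{T} \frac{\phi_t^{s_t}}{Z_t} \quad (2)$$

where

$$Z_t = \sum_{s=1}^{|S|} \phi_t^{s}$$

and $|S|$ is the number of output units in the neural network.

1. WFST Representation of the CTC Objective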

Consider the task of modeling the word “all” using characters as the tokens. In this case $M = 3$. For this question, we will model words using both the HMM and CTC objective functions and examine the different unweighted (i.e., every arc has weight 1.0) WFST topologies used for each. Remember that final states are indicated by a double circle instead of a single circle around the state label. Mark all input and output labels.

(a) Draw the WFST corresponding to a 1-state HMM for each letter with uniform state transitions. Assume the output labels can be fed into a pronunciation lexicon to recover the word, i.e., a - l - l → all.

(b) Draw the WFST corresponding to the CTC topology. Recall that it should accept strings that start or end with $\varnothing$.

(c) Enumerate all possible length-5 alignments of the word “all” accepted by the HMM topology, where the HMMs for each letter are strung together with null arcs.

(d) Enumerate all possible length-5 alignments of the word “all” accepted by the CTC topology.

(e) Draw the HMM trellis, often called a lattice, corresponding to all possible length-5 sequences as a WFST (it should still look very similar to a normal trellis). Ignore the weights on the arcs. Do not draw any unnecessary (unused) nodes. Include input and output labels.

(f) Repeat (e) for the CTC trellis, corresponding to all possible length-5 sequences as a WFST.
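For part (c), an alignment through the chained 1-state HMMs is a state sequence in which each letter's state occupies at least one frame, so the length-$T$ alignments correspond to the ways of cutting $T$ frames into $M$ nonempty runs, of which there are $\binom{T-1}{M-1}$. A minimal Python sketch, assuming states are named by letter and position (`a1`, `l2`, `l3`) so that the two `l` states stay distinguishable; the function name is hypothetical.

```python
from itertools import combinations
from math import comb


def hmm_alignments(tokens, T):
    """Length-T state sequences through chained 1-state HMMs (one state per token,
    self-loops, null arcs between tokens). Each state covers at least one frame,
    so an alignment is a way of cutting the T frames into len(tokens) nonempty runs."""
    M = len(tokens)
    if M == 0:
        return []
    states = [f"{tok}{m + 1}" for m, tok in enumerate(tokens)]  # a1, l2, l3: keep the two l's apart
    result = []
    for cuts in combinations(range(1, T), M - 1):  # choose M-1 run boundaries among the T-1 gaps
        bounds = (0, *cuts, T)
        seq = []
        for m, state in enumerate(states):
            seq += [state] * (bounds[m + 1] - bounds[m])
        result.append(tuple(seq))
    return result


paths = hmm_alignments("all", 5)
print(len(paths), comb(5 - 1, 3 - 1))  # 6 6
```

The first sequence returned is `a1 l2 l3 l3 l3`. Read as plain letter strings, several of these alignments look identical (for example `a l l l l`), which is why the sketch keeps the state index in each label.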

2. WFST Composition

The outputs of a neural network, $\phi$, can themselves be represented as a WFST. Imagine the neural network has 4 outputs; in other words, it outputs vectors $\phi_t \in \mathbb{R}^{1 \times 4}$. The first output corresponds to the $\varnothing$ symbol, the second to a, the third to b, and the fourth to l.

For this problem, consider the matrix of scores $\phi$, in which each column corresponds to a time index and each row corresponds to one of the network outputs. These scores play the same role as observation probabilities in HMMs. We can draw this matrix as a WFST with 6 states: one for each of the 5 output frames plus a start state (5 + 1). Between each pair of consecutive states there are as many arcs as there are output symbols (i.e., rows in the matrix). The input and output labels of this WFST are the same (i.e., it is a WFSA), and the weight on each arc is the score of that symbol at that position in time. Use the following matrix.

(3)
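One way to hold $\Phi$ in code is as a plain list of arcs. The sketch below assumes the matrix is stored as `phi[k][t]` (row $k$ = output symbol, column $t$ = frame, as described above) and keeps the scores unchanged as arc weights. The dictionary layout and the function name are illustrative only, not the format of any particular WFST toolkit, and the uniform matrix in the usage lines is a placeholder rather than the matrix above.

```python
def score_matrix_to_wfsa(phi, symbols):
    """Linear-chain WFSA for a score matrix phi[k][t] (row k = output symbol, column t = frame).

    States are 0..T: state 0 is the start state and state T is the only final state.
    Each arc is (src, dst, ilabel, olabel, weight) with ilabel == olabel, so it is an acceptor.
    """
    n_rows, T = len(phi), len(phi[0])
    if n_rows != len(symbols):
        raise ValueError("need exactly one matrix row per output symbol")
    arcs = [
        (t, t + 1, symbols[k], symbols[k], phi[k][t])  # one arc per symbol between frames t and t+1
        for t in range(T)
        for k in range(n_rows)
    ]
    return {"start": 0, "finals": {T}, "num_states": T + 1, "arcs": arcs}


placeholder = [[0.25] * 5 for _ in range(4)]  # rows: <b>, a, b, l; columns: 5 frames
wfsa = score_matrix_to_wfsa(placeholder, ["<b>", "a", "b", "l"])
print(wfsa["num_states"], len(wfsa["arcs"]))  # 6 20
```

For a $4 \times 5$ matrix this yields 6 states and $4 \times 5 = 20$ arcs, matching the description above.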

(a) Draw this WFST; we will call it $\Phi$. If this process seems tedious, feel free to use shorthand notation, provided it is logical.

(b) Use the WFST $\Phi$ from part (a) and compose it with the CTC WFST from problem 1, which we will call $T$, using the log semiring, i.e., compute $\Phi \circ T$. Assume that unweighted arcs all have weight $\bar{1}$ (i.e., 0 in the log semiring). Recall that in the log semiring, $a \oplus b = -\log(e^{-a} + e^{-b})$, $a \otimes b = a + b$, $\bar{1} = 0$, and $\bar{0} = \infty$. Show each step of the composition by denoting states with the pairs of state labels used in $\Phi$ and $T$, like $(\Phi_i, T_j)$. Feel free to use any computational aid, e.g., a program or calculator. Remember to remove any branches that do not end in final states. A state of the composed WFST is final when both states in its pair are final.

(c) Compare the resulting WFST with the CTC trellis computed in problem (1.f) and comment.

(d) Use the forward algorithm on the result of part (b) to compute the CTC objective for this utterance. Again feel free to use any computational aid.
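Parts (b) and (d) can be checked numerically. The sketch below makes several assumptions that the problem leaves open: the CTC WFST $T$ is taken to be the word-specific chain over $\varnothing\, y_1\, \varnothing \cdots y_M\, \varnothing$ (with a separate start state, self-loops that output $\epsilon$, and skip arcs between distinct tokens); each column of $\phi$ is normalized and turned into the weight $-\log p(s_t \mid x_1^T)$ so that the log semiring applies; and the names `ctc_topology` and `compose_and_score` are hypothetical. Because no input label of $T$ is $\epsilon$, composition reduces to pairing arcs with matching labels, followed by pruning states that cannot reach a final pair. The returned forward weight is $-\log p(y_1^M \mid x_1^T)$, the negative of the CTC objective.

```python
import math
from collections import defaultdict, deque

BLANK, EPS = "<b>", "<eps>"


def log_plus(a, b):
    """Log-semiring sum on -log weights: -log(e^-a + e^-b); the zero element is +inf."""
    if a == math.inf:
        return b
    if b == math.inf:
        return a
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))


def ctc_topology(tokens):
    """Word-specific CTC WFST over <b> y1 <b> ... yM <b>; every arc has weight 0 (semiring one)."""
    ext = [BLANK]
    for tok in tokens:
        ext += [tok, BLANK]
    arcs = [("start", 0, BLANK, EPS, 0.0)]
    if len(ext) > 1:
        arcs.append(("start", 1, ext[1], ext[1], 0.0))
    for s, lab in enumerate(ext):
        arcs.append((s, s, lab, EPS, 0.0))  # repeated label: emit nothing
        if s + 1 < len(ext):
            nxt = ext[s + 1]
            arcs.append((s, s + 1, nxt, EPS if nxt == BLANK else nxt, 0.0))
        if s + 2 < len(ext) and ext[s + 2] != BLANK and ext[s + 2] != lab:
            arcs.append((s, s + 2, ext[s + 2], ext[s + 2], 0.0))  # skip blank between distinct tokens
    finals = {len(ext) - 1, len(ext) - 2} if len(ext) > 1 else {0}
    return "start", finals, arcs


def compose_and_score(phi, symbols, tokens):
    """Build the useful part of Phi o T and return its forward weight, i.e. -log p(y | x).
    phi[k][t] is the nonnegative score of symbols[k] at frame t; columns are normalized first."""
    n_frames = len(phi[0])
    weight = []
    for t in range(n_frames):
        z = sum(phi[k][t] for k in range(len(symbols)))
        weight.append({sym: (-math.log(phi[k][t] / z) if phi[k][t] > 0 else math.inf)
                       for k, sym in enumerate(symbols)})
    start, finals, arcs = ctc_topology(tokens)
    out_arcs = defaultdict(list)
    for src, dst, ilab, olab, w in arcs:
        out_arcs[src].append((dst, ilab, olab, w))

    # Composition: pair Phi's arc at frame i with every topology arc whose input label matches.
    init = (0, start)
    edges, seen, queue = defaultdict(list), {init}, deque([init])
    while queue:
        i, j = queue.popleft()
        if i == n_frames:
            continue
        for dst, ilab, olab, w in out_arcs[j]:
            nxt = (i + 1, dst)
            edges[(i, j)].append((nxt, ilab, olab, weight[i][ilab] + w))  # otimes is +
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    # Prune: keep only states that can reach a final pair (last frame, final topology state).
    final_pairs = {(n_frames, f) for f in finals} & seen
    reverse = defaultdict(set)
    for src, lst in edges.items():
        for nxt, *_ in lst:
            reverse[nxt].add(src)
    useful, stack = set(final_pairs), list(final_pairs)
    while stack:
        node = stack.pop()
        for prev in reverse[node]:
            if prev not in useful:
                useful.add(prev)
                stack.append(prev)

    # Forward algorithm in the log semiring; every arc moves from frame i to frame i + 1.
    alpha = defaultdict(lambda: math.inf)
    alpha[init] = 0.0
    for state in sorted(useful, key=lambda st: st[0]):
        for nxt, _, _, w in edges.get(state, []):
            if nxt in useful:
                alpha[nxt] = log_plus(alpha[nxt], alpha[state] + w)
    total = math.inf
    for f in final_pairs:
        total = log_plus(total, alpha[f])
    return total


uniform = [[1.0] * 5 for _ in range(4)]  # placeholder scores, not the homework matrix
print(compose_and_score(uniform, [BLANK, "a", "b", "l"], "all"))
```

For a uniform $4 \times 5$ score matrix, the forward weight for “all” is $-\log(7/1024) \approx 4.9856$, and with only 3 frames no final pair is reachable, so the function returns $\bar{0} = \infty$.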

3. Prove that the posterior distribution modeled by CTC is globally normalized, i.e., that the probability of a sequence is computed as the score of the sequence normalized by the sum of scores over all possible sequences, and can be expressed as

$$p(y_1^M \mid x_1^T) = \frac{e^{f(y_1^M, x_1^T)}}{\sum_{y'} e^{f(y', x_1^T)}},$$

where $f(y, x_1^T)$ denotes the score of label sequence $y$. Make sure to state any assumptions used.
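A quick numerical sanity check (not a proof) of the property in problem 3: assuming the per-frame outputs are conditionally independent given $x_1^T$, as the product in Eq. (1) does, every length-$T$ alignment collapses to exactly one label sequence, so summing $p(y \mid x_1^T)$ over all reachable $y$ recovers $\prod_t \sum_k p(s_t = k \mid x_1^T) = 1$. The brute-force sketch below uses hypothetical helper names and `"<b>"` for the blank, groups all $|S|^T$ alignments by their collapsed sequence, and uses arbitrary example distributions.

```python
import math
from collections import defaultdict
from itertools import product

BLANK = "<b>"


def collapse(alignment):
    """beta: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for sym in alignment:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return tuple(out)


def posterior_over_sequences(probs, vocab):
    """p(y | x) for every y reachable in T = len(probs) frames, by summing alignment probabilities.
    Assumes frame-wise conditional independence: p(s_1^T | x) = prod_t probs[t][s_t]."""
    post = defaultdict(float)
    for ks in product(range(len(vocab)), repeat=len(probs)):
        p = math.prod(probs[t][k] for t, k in enumerate(ks))
        post[collapse(vocab[k] for k in ks)] += p  # each alignment lands in exactly one beta^{-1}(y)
    return dict(post)


vocab = [BLANK, "a", "b", "l"]
probs = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1],
         [0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1], [0.2, 0.2, 0.1, 0.5]]
post = posterior_over_sequences(probs, vocab)
print(abs(sum(post.values()) - 1.0) < 1e-9)  # True
```

The check holds for any rows that each sum to 1; it illustrates why no extra normalizer over label sequences is needed, which is the fact the proof has to establish in general.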

4. Using the result of problem 3, show that maximizing the CTC objective maximizes a lower bound on the mutual information $I(X;Y)$ between the input and output sequences.
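One standard route, offered as a sketch rather than the intended solution, is the variational (Barber–Agakov) bound. It holds for any conditional distribution $q(y \mid x)$ that is properly normalized over $y$, which is where a result like problem 3 is needed when $q$ is the CTC model:

$$
\begin{aligned}
I(X;Y) &= H(Y) - H(Y \mid X) \\
       &= H(Y) + \mathbb{E}_{p(x,y)}\big[\log p(y \mid x)\big] \\
       &= H(Y) + \mathbb{E}_{p(x,y)}\big[\log q(y \mid x)\big] + \mathbb{E}_{p(x)}\big[\mathrm{KL}\big(p(\cdot \mid x)\,\|\,q(\cdot \mid x)\big)\big] \\
       &\geq H(Y) + \mathbb{E}_{p(x,y)}\big[\log q(y \mid x)\big].
\end{aligned}
$$

Since $H(Y)$ depends only on the data distribution and not on the model, raising the expected log-likelihood $\mathbb{E}[\log q(y \mid x)]$, i.e., the CTC objective averaged over the data, raises this lower bound.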


From: https://www.cnblogs.com/somtimes/p/17348348.html
