
Converting a PDF File to Text and Its Corresponding Audio with Python


Code repository:

https://github.com/TiffinTech/python-pdf-audo

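The script depends on two third-party packages, pyttsx3 (text-to-speech) and PyPDF2 (PDF reading); assuming a standard pip setup, both can be installed with:

pip install pyttsx3 PyPDF2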
============================================

import pyttsx3
import PyPDF2

# Insert the name of your PDF here
pdfreader = PyPDF2.PdfReader(open('book.pdf', 'rb'))
speaker = pyttsx3.init()

# Collect the cleaned text of every page first; calling save_to_file
# inside the loop would overwrite story.mp3 on every pass and keep
# only the last page
full_text = ''
for page in pdfreader.pages:
    text = page.extract_text()
    clean_text = text.strip().replace('\n', ' ')
    print(clean_text)
    full_text += clean_text + ' '

# Name the mp3 file whatever you would like
speaker.save_to_file(full_text, 'story.mp3')
speaker.runAndWait()
speaker.stop()
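
As a side note, PyPDF2 is no longer actively developed and its code base has moved into the pypdf package. A minimal extraction-only sketch using pypdf, assuming the same book.pdf as above, would look like this:

from pypdf import PdfReader

# Extraction-only pass with pypdf, the maintained successor to PyPDF2
reader = PdfReader('book.pdf')
for page in reader.pages:
    text = page.extract_text() or ''   # extract_text() can come back empty, e.g. for scanned pages
    print(' '.join(text.split()))      # collapse line breaks and runs of spaces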


First, a few words on the PDF text-extraction feature: it works passably well. As a demo, it was run against the first page of a reinforcement-learning paper (Munos et al., NIPS 2016).


The extracted text is:


Safe and efficient off-policy reinforcement learning R´emi Munos [email protected] Google DeepMindThomas Stepleton [email protected] Google DeepMind Anna Harutyunyan [email protected] Vrije Universiteit BrusselMarc G. Bellemare [email protected] Google DeepMind Abstract In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing
these in a common form, we de- rive a novel algorithm, Retrace(λ), with three desired properties: (1) it haslow variance; (2) itsafelyuses samples collected from any behaviour policy, whatever its degree of
“off-policyness”; and (3) it isefficientas it makes the best use of sam- ples collected from near on-policy behaviour policies. We analyze the contractive nature of the related operator under both off-policy
policy evaluation and control settings and derive online sample-based algorithms. We believe this is thefirst return-based off-policy control algorithm converging a.s. toQ∗without the GLIE assumption (Greedy
in the Limit with Infinite Exploration). As a corollary, we prove the convergence of Watkins’ Q(λ), which was an open problem since 1989. We illustrate the benefits of Retrace(λ) on a standard suite of Atari 2600 games. One fundamental trade-off in reinforcement learning lies in the definition of the update target: should one estimate Monte Carlo returns or bootstrap from an existing Q-function? Return-based meth- ods (wherereturnrefers to the sum of discounted rewards� tγtrt) offer some advantages over value bootstrap methods: they are better behaved when combined with function approximation, and quickly propagate the fruits of exploration (Sutton, 1996). On the other hand, value bootstrap meth- ods are more readily applied to off-policy data, a common use case. In this paper we show that learning from returns need not be at cross-purposes with off-policy learning. We start from the recent work of Harutyunyan et al. (2016), who show that naive off-policy policy evaluation, without correcting for the “off-policyness” of a
trajectory, still converges to the desired Qπvalue function provided the behaviorµand targetπpolicies are not too far apart (the maxi- mum allowed distance depends on theλparameter). TheirQπ(λ)algorithm learns from trajectories generated byµsimply by summing discounted off-policy corrected rewards at each time step. Un- fortunately, the assumption thatµandπare close is restrictive, as well as difficult to uphold in the control case, where the target policy is greedy with respect to the current Q-function. In that sense this algorithm is notsafe: it does not handle the case of arbitrary “off-policyness”. Alternatively, the Tree-backup (TB(λ)) algorithm (Precup et al., 2000) tolerates arbitrary tar- get/behavior discrepancies by scaling information (here calledtraces) from future temporal dif- ferences by the product of target policy probabilities. TB(λ) is notefficientin the “near on-policy” case (similarµandπ), though, as traces may be cut prematurely, blocking learning from full returns. In this work, we express several
off-policy, return-based algorithms in a common form. From this we derive an improved algorithm, Retrace(λ), which is bothsafeandefficient, enjoying convergence guarantees for off-policy policy evaluation and – more importantly – for the control setting. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.


That is the text-extraction output. The audio conversion, by contrast, is genuinely poor: the generated audio does not follow the original text at all. The verdict here, then, is that the PDF text-extraction part of the code above is just about serviceable and worth keeping as a technical building block for future projects, while the audio conversion is not worth using.
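
For anyone who still wants to experiment with pyttsx3, the engine exposes a few tunable properties (voice, rate, volume) through setProperty. Whether tuning them improves the output for a given PDF is untested here, so treat the following as a sketch only:

import pyttsx3

engine = pyttsx3.init()

# List the voices installed on this machine and pick the first one
voices = engine.getProperty('voices')
if voices:
    engine.setProperty('voice', voices[0].id)

engine.setProperty('rate', 150)    # speaking rate in words per minute (default is around 200)
engine.setProperty('volume', 0.9)  # volume, from 0.0 to 1.0

engine.save_to_file('This is a short pyttsx3 voice test.', 'test.mp3')
engine.runAndWait()
engine.stop()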


=============================================


The accompanying video:

https://www.youtube.com/watch?v=LXsdt6RMNfY

From: https://blog.51cto.com/u_15642578/6026157
