Notes on Python generators

Posted: 2023-03-28 14:56:13
Tags: generator, python, yield, file, csv, gen, row

This post focuses on how yield is used in Python and on expressions like these:

comp_list = [x * 2 for x in range(10)]   # list comprehension

(x ** 2 for x in range(10))   # generator expression

Excerpted from https://realpython.com/introduction-to-python-generators/ and "List Comprehensions in Python and Generator Expressions | Django Stars". These are notes for my own reference, so the formatting is rough.

1. Python generators

Let's start with two examples:

Example 1: Reading Large Files

A common use case of generators is to work with data streams or large files, like CSV files. These text files separate data into columns by using commas. This format is a common way to share data. Now, what if you want to count the number of rows in a CSV file? The code block below shows one way of counting those rows:

csv_gen = csv_reader("some_csv.txt")
row_count = 0

for row in csv_gen:
    row_count += 1

print(f"Row count is {row_count}")

Looking at this example, you might expect csv_gen to be a list. To populate this list, csv_reader() opens a file and loads its contents into csv_gen. Then, the program iterates over the list and increments row_count for each row.

This is a reasonable explanation, but would this design still work if the file is very large? What if the file is larger than the memory you have available? To answer this question, let’s assume that csv_reader() just opens the file and reads it into a list:

def csv_reader(file_name):
    file = open(file_name)
    result = file.read().split("\n")
    return result

This function opens a given file and uses file.read() along with .split() to add each line as a separate element to a list. If you were to use this version of csv_reader() in the row counting code block you saw further up, then you’d get the following output:

Traceback (most recent call last):
  File "ex1_naive.py", line 22, in <module>
    main()
  File "ex1_naive.py", line 13, in main
    csv_gen = csv_reader("file.txt")
  File "ex1_naive.py", line 6, in csv_reader
    result = file.read().split("\n")
MemoryError

In this case, open() returns a file object, which is an iterator that you can lazily step through line by line. However, file.read().split() loads everything into memory at once, causing the MemoryError.

Before that happens, you’ll probably notice your computer slow to a crawl. You might even need to kill the program with a KeyboardInterrupt. So, how can you handle these huge data files? Take a look at a new definition of csv_reader():

def csv_reader(file_name):
    for row in open(file_name, "r"):
        yield row

In this version, you open the file, iterate through it, and yield a row. This code should produce the following output, with no memory errors:

Row count is 64186394

What’s happening here? Well, you’ve essentially turned csv_reader() into a generator function. This version opens a file, loops through each line, and yields each row, instead of returning it.
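The generator version above can be sketched end to end as follows. This is a minimal illustration, not the article's exact code: it wraps the loop in a with block so the file is closed once the generator is exhausted, and the temporary file with its three rows is made-up sample data.

```python
import os
import tempfile


def csv_reader(file_name):
    """Yield one line at a time; the with block closes the file when done."""
    with open(file_name, "r") as file:
        for row in file:
            yield row


# Build a tiny sample file (hypothetical data) so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("id,name\n1,apple\n2,pear\n")
    sample_path = tmp.name

csv_gen = csv_reader(sample_path)
row_count = 0
for row in csv_gen:
    row_count += 1

print(f"Row count is {row_count}")  # Row count is 3
os.remove(sample_path)
```

Only one line is held in memory at a time, which is why the same loop also copes with files far larger than available RAM.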

You can also define a generator expression (also called a generator comprehension), which has a very similar syntax to list comprehensions. In this way, you can use the generator without calling a function:

csv_gen = (row for row in open(file_name))

This is a more succinct way to create the generator csv_gen. You’ll learn more about the Python yield statement soon. For now, just remember this key difference:

  • Using yield will result in a generator object.
  • Using return in place of yield will hand back only the first line of the file.
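The difference between the two keywords can be checked with a small sketch. The function names first_line_only and every_line are hypothetical, and io.StringIO stands in for an open file so that the example needs nothing on disk.

```python
import io


def first_line_only(lines):
    # return ends the function on the first pass through the loop
    for row in lines:
        return row


def every_line(lines):
    # yield hands back one row, then resumes here on the next request
    for row in lines:
        yield row


# io.StringIO stands in for an open file (an assumption for this sketch)
data = "a,b\n1,2\n3,4\n"

returned = first_line_only(io.StringIO(data))
yielded = every_line(io.StringIO(data))

print(repr(returned))   # 'a,b\n'
print(list(yielded))    # ['a,b\n', '1,2\n', '3,4\n']
```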

Example 2: Generating an Infinite Sequence

Let’s switch gears and look at infinite sequence generation. In Python, to get a finite sequence, you call range() and evaluate it in a list context:

>>> a = range(5)
>>> list(a)
[0, 1, 2, 3, 4]

Generating an infinite sequence, however, will require the use of a generator, since your computer memory is finite:

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

This code block is short and sweet. First, you initialize the variable num and start an infinite loop. Then, you immediately yield num so that you can capture the initial state. This mimics the action of range().

After yield, you increment num by 1. If you try this with a for loop, then you’ll see that it really does seem infinite:

>>> for i in infinite_sequence():
...     print(i, end=" ")
...
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39 40 41 42
[...]
6157818 6157819 6157820 6157821 6157822 6157823 6157824 6157825 6157826 6157827
6157828 6157829 6157830 6157831 6157832 6157833 6157834 6157835 6157836 6157837
6157838 6157839 6157840 6157841 6157842
KeyboardInterrupt
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>

The program will continue to execute until you stop it manually.
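If only the first few values are needed, the loop does not have to be interrupted by hand. One option, not taken from the excerpted article, is itertools.islice from the standard library, which stops pulling values after a fixed count:

```python
from itertools import islice


def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1


# islice stops after five values, so the infinite generator is never exhausted
first_five = list(islice(infinite_sequence(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```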

Instead of using a for loop, you can also call next() on the generator object directly. This is especially useful for testing a generator in the console:

>>> gen = infinite_sequence()
>>> next(gen)
0
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
3

Here, you have a generator called gen, which you manually iterate over by repeatedly calling next(). This works as a great sanity check to make sure your generators are producing the output you expect.

Understanding Generators

So far, you’ve learned about the two primary ways of creating generators: by using generator functions and generator expressions. You might even have an intuitive understanding of how generators work. Let’s take a moment to make that knowledge a little more explicit.

Generator functions look and act just like regular functions, but with one defining characteristic. Generator functions use the Python yield keyword instead of return. Recall the generator function you wrote earlier:

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

This looks like a typical function definition, except for the Python yield statement and the code that follows it. yield indicates where a value is sent back to the caller, but unlike return, you don’t exit the function afterward.

Instead, the state of the function is remembered. That way, when next() is called on a generator object (either explicitly or implicitly within a for loop), the previously yielded variable num is incremented, and then yielded again. Since generator functions look like other functions and act very similarly to them, you can assume that generator expressions are very similar to other comprehensions available in Python.
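To make the "remembered state" visible, here is a small sketch that records what the generator body does. The events list and the name traced_sequence are illustrative additions, not part of the original example.

```python
events = []


def traced_sequence():
    num = 0
    while True:
        events.append(f"about to yield {num}")
        yield num
        # Execution resumes here on the next call to next()
        events.append(f"resumed after yielding {num}")
        num += 1


gen = traced_sequence()
events.append("generator created")  # nothing in the body has run yet
first = next(gen)
second = next(gen)

for line in events:
    print(line)
# generator created
# about to yield 0
# resumed after yielding 0
# about to yield 1
```

Calling traced_sequence() runs none of the body. Each next() runs up to the following yield and pauses there, with num preserved between calls.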

 

2. List comprehensions

[x * 2 for x in range(10)] returns a list.
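As a quick check, the comprehension can be compared with an explicit loop that builds the same list (loop_list is a name introduced here for illustration):

```python
comp_list = [x * 2 for x in range(10)]

# The same list built with an explicit loop
loop_list = []
for x in range(10):
    loop_list.append(x * 2)

print(comp_list)               # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(comp_list == loop_list)  # True
print(type(comp_list))         # <class 'list'>
```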

3. Generator expressions
>>> gen_exp = (x ** 2 for x in range(10) if x % 2 == 0)
An expression of this form does not use yield, yet it still returns a generator.
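A short sketch confirming this, using types.GeneratorType from the standard library for the type check. It also shows a property worth remembering: values are produced on demand, and a generator can be consumed only once.

```python
import types

gen_exp = (x ** 2 for x in range(10) if x % 2 == 0)

print(isinstance(gen_exp, types.GeneratorType))  # True
print(next(gen_exp))   # 0
print(list(gen_exp))   # [4, 16, 36, 64]  (the 0 was already consumed)
print(list(gen_exp))   # []  (nothing is left after the first full pass)
```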
 

 

From: https://www.cnblogs.com/saaspeter/p/17264678.html
