Tags: task, img, self, thread, time, process, multithreading

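【一】What is a thread

In an operating system, every process has its own memory address space, and your program runs inside that address space.

A program cannot realistically rely on a single flow of execution to handle all of its data and logic.

This is where threads come in: a thread is a flow of execution opened inside a process.

Operating system --> a running program is called a process ---> a further flow of execution opened inside the process ---> is called a thread

A process is only the unit that groups resources together; the thread is the unit that actually executes on the CPU.

Multithreading means handling multiple tasks within a single process.

Example

A process is the resource unit, like a workshop ---> it holds the equipment and resources

A thread is the execution unit, like an assembly line --> it processes the data

Note

Processes and threads are both abstract concepts.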
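To make the workshop/assembly-line picture concrete, here is a minimal sketch in which several threads each record the process ID they run in. The helper names `report` and `collect_ids` are made up for this illustration. Every thread should report the same pid, because they all live inside one process.

```python
import os
import threading


def report(results, index):
    # Each thread records the process it runs in and its own thread id.
    results[index] = (os.getpid(), threading.get_ident())


def collect_ids(n_threads=3):
    results = [None] * n_threads
    threads = [
        threading.Thread(target=report, args=(results, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for pid, tid in collect_ids():
        print(f"pid={pid} thread_id={tid}")
```

Note that thread ids may be reused once a thread has finished, so only the shared pid is the reliable observation here.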

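【二】Creation cost of processes vs. threads

Cost of creating a process > cost of creating a thread

Think of a piece of software as a factory.

The factory contains many assembly lines.

An assembly line needs resources to work (it runs on electricity).

The power supply is the CPU.

A workshop is a process, and every workshop contains at least one thread (assembly line).

Creating a process ---> is like building a new workshop ---> pick a site, buy equipment

Creating a thread --> the workshop already exists, so you only add an assembly line --> lower cost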
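A rough way to see the cost difference is to time how long it takes to start and join a batch of do-nothing workers. This is only a sketch: `noop` and `time_creation` are names invented here, and the absolute numbers depend heavily on the operating system and on how Python starts processes on it.

```python
import time
from multiprocessing import Process
from threading import Thread


def noop():
    # Does nothing: we only want to measure start/join overhead.
    pass


def time_creation(worker_cls, count=20):
    """Return the seconds taken to start and join `count` workers."""
    start = time.perf_counter()
    workers = [worker_cls(target=noop) for _ in range(count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"threads:   {time_creation(Thread):.4f}s")
    print(f"processes: {time_creation(Process):.4f}s")
```

On most machines the process line should come out noticeably slower, which matches the "build a new workshop" analogy.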

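【三】The relationship between processes and threads

Processes compete with one another.

They contend for space in the same physical memory ---> the more memory your program gets, the faster it is likely to run

Threads cooperate with one another.

Threads are co-workers inside the same process.

They should not work against each other.

Example: downloading with Baidu Netdisk, where a download gets 5 slots, or 10.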
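One possible reading of the download-slot example is a fixed number of threads cooperating on a single download. The sketch below assumes that reading. `fetch_chunk` is a hypothetical stand-in that sleeps instead of touching the network, and the slot count is just a parameter.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(index):
    # Hypothetical stand-in for downloading one piece of a file.
    time.sleep(0.05)
    return f"chunk-{index};"


def download(n_chunks=10, slots=5):
    # At most `slots` threads run at once; map() returns results in chunk order.
    with ThreadPoolExecutor(max_workers=slots) as pool:
        parts = list(pool.map(fetch_chunk, range(n_chunks)))
    return "".join(parts)


if __name__ == "__main__":
    print(download())
```

The threads never compete for the result: each one fills in its own chunk, and the pieces are joined at the end.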

【四】Differences between processes and threads

● Threads share the address space of the process that created them; processes have their own address space.

● Threads have direct access to the data segment of their process; processes have their own copy of the data segment of the parent process.

● Threads can communicate directly with other threads of their process; processes must use interprocess communication to communicate with sibling processes.

● New threads are easily created; new processes require duplication of the parent process.

● Threads can exercise considerable control over threads of the same process; processes can only exercise control over child processes.

● Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to the parent process do not affect child processes.

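When you start multiple processes, each process's data is isolated from the others.

Each process has its own copy of 1, and each one changes its own copy to 0.

With multithreading, all threads share the data of one process.

There is only one 1, so successive decrements give 0, -1, -2, -3.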

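Exercise

(1) Requirements:

Develop a word processor --- process or thread? Process

● Reading user input --- process or thread? Thread

● Displaying text on screen in real time --- process or thread? Thread

● Automatically saving data to disk --- process or thread? Thread

Should each feature be a process or a thread?

● Start one word-processor process.

● That process has more than one job to do: listen for keyboard input, process the text, and periodically save the text to disk.

● All three tasks operate on the same data, so multiple processes are a poor fit.

● Instead, start three threads that run concurrently inside one process.

● With a single thread, you could not process text or autosave while typing, and you could not type or process text while autosaving.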
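The word-processor design can be sketched as three threads working on one shared buffer. Everything here is illustrative: the `Document` class, `read_input`, `mirror`, and `run_editor` are hypothetical names, the "screen" and "disk" are plain attributes, and key presses are simulated with short sleeps.

```python
import threading
import time


class Document:
    """Hypothetical shared state for the word processor."""

    def __init__(self):
        self.text = ""    # the one buffer all three threads work on
        self.shown = ""   # stand-in for what is on screen
        self.saved = ""   # stand-in for the copy on disk
        self.lock = threading.Lock()
        self.done = threading.Event()


def read_input(doc, keystrokes):
    for ch in keystrokes:
        time.sleep(0.01)  # pretend to wait for a key press
        with doc.lock:
            doc.text += ch
    doc.done.set()  # tell the other threads that typing has finished


def mirror(doc, attr, interval):
    # Periodically copy the shared text into `attr` ("shown" or "saved").
    while not doc.done.is_set():
        with doc.lock:
            setattr(doc, attr, doc.text)
        time.sleep(interval)
    with doc.lock:
        setattr(doc, attr, doc.text)  # final copy once typing is over


def run_editor(keystrokes):
    doc = Document()
    threads = [
        threading.Thread(target=read_input, args=(doc, keystrokes)),
        threading.Thread(target=mirror, args=(doc, "shown", 0.005)),
        threading.Thread(target=mirror, args=(doc, "saved", 0.02)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return doc


if __name__ == "__main__":
    doc = run_editor("hello")
    print(doc.text, doc.shown, doc.saved)
```

Because all three threads see the same `doc.text`, no copying between processes is needed; the lock only keeps a display or save from reading the buffer in the middle of an update.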

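Multiprocessing and multithreading show their speed advantage mainly when tasks block, for example while waiting on I/O. Without blocking there is little to gain from threads, because in standard CPython the global interpreter lock lets only one thread execute Python bytecode at a time; multiple processes can still spread CPU-bound work across cores.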
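The effect of blocking is easy to see with a small timing sketch. Here `time.sleep` stands in for waiting on disk or network, and `blocking_task`, `run_sequential`, and `run_threaded` are names made up for the illustration.

```python
import time
from threading import Thread


def blocking_task(seconds=0.2):
    time.sleep(seconds)  # stand-in for waiting on disk or network


def run_sequential(n=4):
    # Run the tasks one after another and return the elapsed seconds.
    start = time.perf_counter()
    for _ in range(n):
        blocking_task()
    return time.perf_counter() - start


def run_threaded(n=4):
    # Run the same tasks in n threads so their waits overlap.
    start = time.perf_counter()
    threads = [Thread(target=blocking_task) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential():.2f}s")
    print(f"threaded:   {run_threaded():.2f}s")
```

The sequential version waits for each task in turn, while the threaded version lets the waits overlap, so it should finish in roughly the time of a single task.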

【一】Introduction to the threading module

from threading import Thread
import time
import random
import os


class MyThread(Thread):
    def __init__(self, name, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Thread.name is a settable property, so this names the thread
        self.name = name

    def run(self):
        # run() is what executes in the new thread after start()
        print(f'{self.name} is starting  .... ')
        sleep_time = random.randint(1, 5)
        print(f'{self.name} is start sleep time is {sleep_time}s  .... ')
        time.sleep(sleep_time)
        print(f'{self.name} is end sleep time is {sleep_time}s  .... ')
        print(f'{self.name} is ending  .... ')

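Computing 1 + 100

Running it directly in code is fastest, because it only touches memory.

Multiprocessing or multithreading pays off for work that has to wait, such as opening a file.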

【1】Method 1: start threads from Thread objects

def work(name):
    print(f'{name} is starting .... ')
    print(f'thread {name} pid is {os.getpid()} ppid is {os.getppid()} .... ')
    sleep_time = random.randint(1, 5)
    print(f'{name} is start sleep time is {sleep_time}s .... ')
    time.sleep(sleep_time)
    print(f'{name} is end sleep time is {sleep_time}s .... ')
    print(f'{name} is ending .... ')


def main():
    # Method 1: hand the target function to Thread
    task_list = []
    for i in range(5):
        task = Thread(
            target=work,
            args=('thread-{}'.format(i),)
        )
        task.start()
        task_list.append(task)
    # Wait for every thread to finish
    for task in task_list:
        task.join()


def main_one():
    # Method 2: use the Thread subclass defined above
    task_list = []
    for i in range(5):
        task = MyThread(name='thread-{}'.format(i))
        task.start()
        task_list.append(task)
    for task in task_list:
        task.join()


if __name__ == '__main__':
    print(f'main process pid is {os.getpid()} ppid is {os.getppid()} .... ')
    print(f'main process start .... ')
    start_time = time.time()
    main()
    # main_one()
    end_time = time.time()
    print(f'main process end .... ')
    print(f'total time :>>>> {end_time - start_time}s')

    # Sample output: the thread reports the same pid as the main process
    # main process pid is 313892 ppid is 280184 ....
    # thread thread-0 pid is 313892 ppid is 280184  ....

Threads share data
from threading import Thread
from multiprocessing import Process

number = 999


def work(name):
    global number
    print(f'{name} change before {number}')
    number += 1
    print(f'{name} change after {number}')


def main_process():
    task_list = []
    for i in range(5):
        task = Process(
            target=work,
            args=(f'process_{i}',)
        )
        task.start()
        task_list.append(task)
    for task in task_list:
        task.join()
    # Each child process changed its own copy, so the parent still sees 999
    print(number)


def main_thread():
    task_list = []
    for i in range(5):
        task = Thread(
            target=work,
            args=(f'thread_{i}',)
        )
        task.start()
        task_list.append(task)
    for task in task_list:
        task.join()
    # All threads changed the same variable
    print(number)


if __name__ == '__main__':
    # main_process()
    main_thread()

# thread_0 change before 999
# thread_0 change after 1000
# thread_1 change before 1000
# thread_1 change after 1001
# thread_2 change before 1001
# thread_2 change after 1002
# thread_3 change before 1002
# thread_3 change after 1003
# thread_4 change before 1003
# thread_4 change after 1004
# 1004
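Since every thread updates the same `number`, updates that run at the same time can step on each other once the work gets heavier than a single increment. A common way to guard against that is a `threading.Lock`. The sketch below is an addition, not part of the original example: `number_lock` and `run` are names introduced here, and `work` takes a repeat count instead of a thread name.

```python
from threading import Lock, Thread

number = 999
number_lock = Lock()


def work(times):
    global number
    for _ in range(times):
        # Only one thread at a time may read-modify-write the shared value.
        with number_lock:
            number += 1


def run(n_threads=5, times=10_000):
    global number
    number = 999  # reset so repeated calls start from the same value
    threads = [Thread(target=work, args=(times,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return number


if __name__ == "__main__":
    print(run())
```

With the lock in place, the final value is always the starting value plus `n_threads * times`.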

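Multithreading is faster than multiprocessing (for this download task)
import time

【一】Three modules are needed

【1】Send requests to a URL the way a browser would

import requests # pip install requests

【2】Parse the page data

from lxml import etree # pip install lxml

【3】Impersonate a browser

from fake_useragent import UserAgent # pip install fake-useragent

from multiprocessing import Process
from threading import Thread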

【二】Request the pages and parse the data

class SpiderImg(object):
    def __init__(self):
        self.base_area = 'https://pic.netbian.com'
        self.base_url = 'https://pic.netbian.com/4kdongman/'
        self.headers = {
            'User-Agent': UserAgent().random
        }

    def spider_tag_url(self):
        # Collect {image file name: image URL} from the listing page
        img_data_dict = {}
        response = requests.get(self.base_url, headers=self.headers)
        # response.encoding = 'utf-8'
        response.encoding = 'gbk'
        page_text = response.text
        tree = etree.HTML(page_text)
        li_list = tree.xpath('//*[@id="main"]/div[4]/ul/li')
        for li in li_list:
            # //*[@id="main"]/div[4]/ul/li[1]/a
            # ./a
            detail_href = self.base_area + li.xpath('./a/@href')[0]
            response = requests.get(detail_href, headers=self.headers)
            response.encoding = 'gbk'
            page_text = response.text
            tree = etree.HTML(page_text)
            img_url = self.base_area + tree.xpath('//*[@id="img"]/img/@src')[0]
            # https://pic.netbian.com/uploads/allimg/240521/232729-17163052491e1c.jpg
            img_title = img_url.split('/')[-1]
            # 232729-17163052491e1c.jpg
            img_data_dict[img_title] = img_url
        return img_data_dict

    def download_img(self, img_url, img_title):
        # Fetch the image's binary data
        response = requests.get(img_url, headers=self.headers)
        img_data = response.content
        with open(f'{img_title}', 'wb') as fp:
            fp.write(img_data)
        print(f'Downloaded {img_title} successfully!')

    def main_process(self):
        # Download every image in its own process
        start_time = time.time()
        img_data_dict = self.spider_tag_url()
        end_time = time.time()
        print(f'Collected {len(img_data_dict)} image links, total time :>>>> {end_time - start_time}s')
        task_list = []
        for img_title, img_url in img_data_dict.items():
            task = Process(
                target=self.download_img,
                kwargs={'img_url': img_url, 'img_title': img_title}
            )
            task.start()
            task_list.append(task)
        for task in task_list:
            task.join()

    def main_thread(self):
        # Download every image in its own thread
        start_time = time.time()
        img_data_dict = self.spider_tag_url()
        end_time = time.time()
        print(f'Collected {len(img_data_dict)} image links, total time :>>>> {end_time - start_time}s')
        task_list = []
        for img_title, img_url in img_data_dict.items():
            task = Thread(
                target=self.download_img,
                kwargs={'img_url': img_url, 'img_title': img_title}
            )
            task.start()
            task_list.append(task)
        for task in task_list:
            task.join()

if __name__ == '__main__':
    spider = SpiderImg()
    start_time = time.time()
    # spider.main_process()  # total time to download all images :>>>> 7.990673542022705s
    spider.main_thread()  # total time to download all images :>>>> 5.58322811126709s
    end_time = time.time()
    print(f'total time to download all images :>>>> {end_time - start_time}s')

From： https://www.cnblogs.com/zenopan101861/p/18206055
