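Description
- Walk the local directory of a CMS installation and use it as a reference for scanning the directories and files of a remote website
Implementation
- Use os.walk to traverse nested directories
- Send the requests from multiple threads; for thread safety, shared data should be kept in a queue rather than a list
- Python named tuples are used much like a dict (fields are accessed by name) but consume less memory than a dict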
point = namedtuple('Point', ['x', 'y'])
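A minimal sketch of how the named tuple above might be used. The coordinate values are made up for illustration, and the class is bound to the name Point (rather than point) only to follow the usual class-naming convention.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

p = Point(x=3, y=4)
by_name = p.x + p.y        # fields are read by attribute name
by_index = p[0] + p[1]     # or by position, like an ordinary tuple
as_dict = p._asdict()      # convert to a dict when one is needed
moved = p._replace(x=10)   # tuples are immutable; _replace returns a new one
```

Because instances carry no per-object __dict__, they stay as compact as plain tuples while still offering readable field names.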
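- About the function parameters *args and **kwargs:
  - A single asterisk collects all extra positional arguments into a tuple
  - A double asterisk collects all extra keyword arguments into a dict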
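The two forms can be illustrated with a short sketch. The function name describe and the argument values are hypothetical and chosen only for the example.

```python
def describe(*args, **kwargs):
    # args is a tuple of the extra positional arguments,
    # kwargs is a dict of the extra keyword arguments
    return args, kwargs

positional, keyword = describe(1, 2, name='scan', threads=10)

# The same symbols unpack a sequence or a dict at the call site
params = ('a', 'b')
options = {'verbose': True}
unpacked = describe(*params, **options)
```

Here positional is the tuple (1, 2) and keyword is the dict holding name and threads.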
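- About Python context managers:
  - They manage resources and handle exceptions
  - They are used through the with statement
  - There are two ways to implement one:
    - Class-based: put simply, a class that implements the __enter__ and __exit__ methods; an instance of that class is a context manager
    - Generator-based: no class is needed, only a decorated function. The decorated function must be a generator (it contains a yield). The code before the yield plays the role of __enter__, and the code after the yield plays the role of __exit__. Example: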
import contextlib

@contextlib.contextmanager
def open_func(file_name):
    # Code before the yield acts as __enter__
    print('open file:', file_name, 'in __enter__')
    file_handler = open(file_name, 'r')
    try:
        yield file_handler
    except Exception as exc:
        # Handle the exception raised inside the with block
        print('the exception was thrown:', exc)
    finally:
        # Code after the yield acts as __exit__
        print('close file:', file_name, 'in __exit__')
        file_handler.close()

with open_func('/Users/MING/mytest.txt') as file_in:
    for line in file_in:
        1/0  # deliberately raise ZeroDivisionError to trigger the except branch
        print(line)
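The notes describe the class-based form but only show the generator-based one. A rough class-based counterpart might look like the sketch below. The class name OpenFile is hypothetical, and the demo writes to a temporary file so that it does not depend on any real path.

```python
import os
import tempfile

class OpenFile:
    """Hypothetical class-based counterpart of open_func."""

    def __init__(self, file_name):
        self.file_name = file_name
        self.file_handler = None

    def __enter__(self):
        self.file_handler = open(self.file_name, 'r')
        return self.file_handler  # bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.file_handler.close()
        # Returning True suppresses the exception; False lets it propagate
        return exc_type is ZeroDivisionError

# Demo on a temporary file (an assumption made only for this sketch)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\n')

lines_seen = []
manager = OpenFile(tmp.name)
with manager as file_in:
    for line in file_in:
        lines_seen.append(line)
        1 / 0  # raised inside the with block, handled by __exit__

closed_after = manager.file_handler.closed
os.remove(tmp.name)
```

The return value of __exit__ decides whether an exception raised inside the with block is swallowed, which is the job the except branch does in the generator version.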
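Code
- First walk the local directory and save each path into the web_paths queue; then start several request threads, each of which repeatedly takes one path from web_paths and tests it, putting the URL into the results queue when the response status is 200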
import contextlib
import os
import queue
import sys
import threading
import time

import requests

FILTERED = [".jpg", ".gif", ".png", ".css"]  # file types not worth requesting
TARGET = "https://localhost/wordpress"
THREAD = 10

results = queue.Queue()    # URLs that answered with status 200
web_paths = queue.Queue()  # candidate paths gathered from the local copy


def gather_paths():
    # Walk the current directory and queue every file that is not filtered out
    for root, _, files in os.walk('.'):
        for fname in files:
            if os.path.splitext(fname)[1] in FILTERED:
                continue
            path = os.path.join(root, fname)
            if path.startswith("."):
                path = path[1:]
            # Use URL-style separators even when os.walk runs on Windows
            path = path.replace(os.sep, "/")
            print(path)
            web_paths.put(path)


@contextlib.contextmanager
def chdir(path):
    # Temporarily switch the working directory and always switch back
    this_dir = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(this_dir)


def test_remote():
    while True:
        try:
            # get_nowait avoids blocking forever if another thread took the last path
            path = web_paths.get_nowait()
        except queue.Empty:
            break
        url = f'{TARGET}{path}'
        time.sleep(2)  # throttle requests to the target
        r = requests.get(url)
        if r.status_code == 200:
            results.put(url)
            sys.stdout.write('+')
        else:
            sys.stdout.write('x')
        sys.stdout.flush()


def run():
    mythreads = list()
    for i in range(THREAD):
        print(f"Spawning thread {i}")
        t = threading.Thread(target=test_remote)
        mythreads.append(t)
        t.start()  # the parentheses are required; a bare t.start never starts the thread

    for thread in mythreads:
        thread.join()


if __name__ == "__main__":
    with chdir("E:\\Project\\Python\\blackhat\\ScanWeb\\wordpress"):
        gather_paths()
    input("Press return to continue.")
    run()
    with open("results.txt", "w+") as f:
        while not results.empty():
            f.write(f"{results.get()}\n")
    print("done.")
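The queue-and-threads pattern used by the script can be tried without a network. The sketch below is an illustration, not part of the scanner: the function scan and its check parameter are hypothetical, and a simple predicate stands in for the HTTP request.

```python
import queue
import threading

def scan(paths, check, thread_count=4):
    """Run check(path) across several threads and return the paths that passed."""
    todo = queue.Queue()
    found = queue.Queue()
    for path in paths:
        todo.put(path)

    def worker():
        while True:
            try:
                path = todo.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            if check(path):
                found.put(path)

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    hits = []
    while not found.empty():
        hits.append(found.get())
    return sorted(hits)

# Stand-in for the HTTP check: pretend only .php paths return 200
hits = scan(['/index.php', '/readme.html', '/wp-login.php'],
            check=lambda p: p.endswith('.php'))
```

Both queues are thread-safe, so the workers need no explicit lock. The result is sorted only because the order in which threads finish is not deterministic.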
From: https://www.cnblogs.com/z5onk0/p/17112233.html