1. Converting a Selenium page to BeautifulSoup:
   pageSource = driver.page_source
   soup = BeautifulSoup(pageSource, 'html.parser')

2. Searching the page text with bs4 (the Chinese string means "Query failed, please search again!"):

   resultPages = soup.find(text=re.compile(u'查询失败,请重新查询!$'))
   print('resultPages: ' + str(resultPages))
   if resultPages == '查询失败,请重新查询!':
       driver.close()

3. Finding the <li> nodes inside a <ul> located by class:

   resultPages = soup.find("ul", class_="pagination").find_all('li')
   resultNum = len(resultPages) - 2   # find_all() already returns a list, no list() needed
   pageNum = int(resultPages[resultNum].text)   # text of the node at index resultNum

4. Getting a node's text with bs4: li.find('div', class_='time').text

5. Moving to the next node with bs4 ("采购编号:" means "procurement number:"):

   try:
       xmID = xmSoup.find(text=re.compile(u'采购编号:$')).next_element.text
   except AttributeError:   # fall back to the sibling when next_element has no .text
       xmID = xmSoup.find(text=re.compile(u'采购编号:$')).next_sibling.text
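The techniques above can be exercised end-to-end on a small inline HTML sample. The markup and values below are invented for illustration (in the real workflow the HTML comes from driver.page_source), and bs4 must be installed:

```python
import re
from bs4 import BeautifulSoup

# Made-up HTML standing in for driver.page_source.
html = """
<div>
  <span>采购编号:</span><b>ZB-2023-001</b>
  <ul class="pagination">
    <li>Prev</li><li>1</li><li>2</li><li>3</li><li>Next</li>
  </ul>
  <li><div class="time">2023-04-20</div></li>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# Technique 2: locate a text node by regex (string= is the modern name for text=).
label = soup.find(string=re.compile(u'采购编号:$'))
print(label)

# Technique 3: li nodes inside the pagination ul; index len-2 skips the trailing "Next".
pages = soup.find("ul", class_="pagination").find_all('li')
page_num = int(pages[len(pages) - 2].text)
print(page_num)

# Technique 4: text of a child node found by class.
time_text = soup.find('div', class_='time').text
print(time_text)

# Technique 5: jump from the matched text node to the element parsed right after it
# (here the <b> tag that follows the label).
xm_id = label.next_element.text
print(xm_id)
```

Note the difference exploited in technique 5: `next_element` walks the parse tree in document order (it can descend into children), while `next_sibling` stays at the same level, which is why the original snippet tries one and falls back to the other.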
Tags: common, bs4, text, crawler, node, search, method, find, resultPages From: https://www.cnblogs.com/feifeidxl/p/17337138.html