selenium click skip_button("introjs-skipbutton")

Posted: 2023-06-05 16:11:44
Tags: handle, skipbutton, chrome, skip, button, self, soup, options, page

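import random
import re
import time
from datetime import datetime

import undetected_chromedriver as uc
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys


class INTERFACING():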

    def __init__(self):
        self.driver_initialized = False
        self.driver = ''
        self.MAX_TRIALS = 2
        # self.chrome_version = get_google_chrome_version()

    def make_soup(self):
        return BeautifulSoup(self.driver.page_source, 'lxml')  # etree.HTML()

    def current_url(self):
        return self.driver.current_url

    def get_driver(self):

        # uc.TARGET_VERSION = get_google_chrome_version()
        chrome_options = uc.ChromeOptions()

        # chrome_options.add_argument("--headless")
        chrome_options.add_argument("--window-size=1920,1080")
        chrome_options.add_argument("--disable-extensions")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--disable-popup-blocking")
        chrome_options.add_argument("--profile-directory=Default")
        chrome_options.add_argument("--ignore-certificate-errors")
        chrome_options.add_argument("--disable-plugins-discovery")
        chrome_options.add_argument("--incognito")
        chrome_options.add_argument("--no-first-run")
        chrome_options.add_argument("--no-service-autorun")
        chrome_options.add_argument("--no-default-browser-check")
        chrome_options.add_argument("--password-store=basic")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument('--disable-application-cache')
        chrome_options.add_argument('--disable-gpu')
        chrome_options.add_argument("--disable-setuid-sandbox")
        chrome_options.add_argument(
            "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36"
        )
        # version_main is the major Chrome version, passed as an integer.
        self.driver = uc.Chrome(options=chrome_options, version_main=113)
        # self.browser = uc.Chrome(options=chrome_options, version_main=113)
        time.sleep(10)
        self.driver_initialized = True

    def close_driver(self):
        self.driver.quit()

    def get_selenium_response(self, url):

        # Lazily start the browser on first use.
        if not self.driver_initialized:
            self.get_driver()
        self.driver.get(url)
        time.sleep(3)
        soup = self.make_soup()
        return soup

    def get_page_source(self):
        return self.driver.page_source

    def clicking(self, xpath):
        elem = self.driver.find_element(By.XPATH, xpath)
        elem.click()
        time.sleep(random.randint(2, 3))

    def entering_values(self, xpath, value):
        elem = self.driver.find_element(By.XPATH, xpath)
        elem.clear()
        elem.send_keys(value)
        time.sleep(random.randint(2, 4))

    def send_keys(self, xpath):
        # Press Enter in the element located by `xpath`.
        self.driver.find_element(By.XPATH, xpath).send_keys(Keys.RETURN)

    def going_back(self):
        self.driver.execute_script("window.history.go(-1)")
        time.sleep(1)

    def refresh_page(self):
        self.driver.refresh()

    def close_handle(self):
        self.driver.close()

    def get_current_handle(self):
        return self.driver.current_window_handle

    def get_all_handles(self):
        return self.driver.window_handles

    def swtich_to_window(self, handle):
        self.driver.switch_to.window(handle)

    def switch_handle(self, second_handle=''):

        all_handles = self.get_all_handles()
        for handle in all_handles:
            self.main_page_handle = self.get_current_handle()
            if handle == self.main_page_handle:
                continue

            if second_handle and handle == second_handle:
                continue

            self.swtich_to_window(handle)

            return handle

    def close_handles(self, page_handle, second_handle):

        all_handles = self.get_all_handles()

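        for handle in all_handles:
            if handle == page_handle:
                try:
                    # driver.close() closes the *current* window, so switch first.
                    self.swtich_to_window(handle)
                    self.close_handle()
                except Exception:
                    pass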

        self.swtich_to_window(second_handle)

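    def skip_button(self, class_item):
        # Poll (up to 20 rounds) for the intro.js overlay link whose class
        # contains `class_item` and click it. Returns True if, after several
        # failed attempts, the results table or its empty-state placeholder is
        # already visible; returns None in every other case.
        count = 0
        while 1: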

            soup = self.make_soup()

            try:
                self.clicking(f'//a[contains(@class,"{class_item}")]')
                break
            except Exception as error:
                print('skip button not yet visible')

            if count > 3:
                try:
                    all_results = soup.find('table', class_='el-table__body').tbody.find_all('tr', class_=re.compile(
                        'el-table__row'))
                    return True
                except:
                    if soup.find('span', class_='el-table__empty-text') is not None:
                        return True
            try:
                self.clicking('//span[text()="Medical Devices"]')
                break
            except:
                pass

            time.sleep(2)

            count += 1

            if count == 20:
                break




    def search_data(self, current_query,page_num):
        if page_num == 1:
            self.entering_values('//*[@id="home"]/main/div[1]/div[7]/div/div[2]/input',current_query)
            self.clicking('//*[@id="home"]/main/div[1]/div[7]/div/div[2]/div/button')
            second_handle = self.switch_handle()
            self.skip_button('introjs-nextbutton')
            self.skip_button("introjs-skipbutton")
            soup = self.make_soup()

            if soup.find('span', class_='el-table__empty-text') is not None:
                pass
            else:
                print('Selecting 20 per page...')
                count = 0
                while True:
                    soup = self.make_soup()
                    try:
                        page_selector = soup.find('input', class_='el-input__inner')
                        if page_selector.attrs.get("placeholder"):
                            break
                    except Exception as error:
                        print('Record not yet loaded: ', count)
                    time.sleep(3)
                    count += 1
                    if not count % 3:
                        print('page refreshed....')
                        self.refresh_page()
                    if count >= 51:
                        break
                self.clicking('//*[@id="home"]/div[3]/div[3]/div/div/span[2]/div/div[1]/input')
                self.clicking(
                    '//ul[@class="el-scrollbar__view el-select-dropdown__list"]//span[text()="20条/页"]')
        if page_num != 1:
            while 1:
                try:
                    self.entering_values('//input[@type="number"]',page_num)
                    break
                except:
                    print('error in entering page num')

                time.sleep(3)
            self.send_keys('//input[@type="number"]')
            time.sleep(3)
        while 1:
            soup = self.make_soup()

            try:
                all_results = soup.find('table', class_='el-table__body').tbody.find_all('tr', class_=re.compile(
                    'el-table__row'))
            except:
                if soup.find('span', class_='el-table__empty-text') is not None:
                    print('No Results...')
                    all_results = []

            total_results = int(soup.find('span', class_='el-pagination__total').text.strip().split()[1])
            # Ceiling division: an exact multiple of 20 must not add an extra page.
            ending_page = max(1, -(-total_results // 20))

            while 1:

                # The page can be slow to load all records, so wait until all 20 rows
                # are present, or, if there are fewer, confirm this is the last page.
                soup = self.make_soup()
                all_results = soup.find('table', class_='el-table__body').tbody.find_all('tr', class_=re.compile(
                    'el-table__row'))
                if len(all_results) == 20:
                    break
                if len(all_results) < 20 and ending_page == page_num:
                    break
                print(all_results, " : ", total_results, " : ", ending_page)
                time.sleep(3)

            # each click on the site opens a new window, so here we are switching windows and then closing windows once data read.
            for _result, row in enumerate(all_results):
                result = row.find_all('td')
                if not result:
                    continue

                result_title = result[1].text.strip()

                print(page_num, " : ", ending_page, " : ", _result, " / ", len(all_results), " : ",
                      result_title, " : ", total_results, " : ", ending_page)

            print(f"page_num: {page_num} Done!")
            page_num += 1
            if page_num > ending_page:
                break
            next_button = soup.find('button', class_='btn-next').attrs
            if 'disabled' in next_button:
                break
            self.clicking('//button[@class="btn-next"]')
            time.sleep(3)
        # self.close_handles(second_handle, self.main_page_handle)




if __name__ == '__main__':
    REY_NUM = 5
    next_year = datetime.now().year + 1
    url = r'https://www.nmpa.gov.cn/datasearch/search-result.html'
    # with Display(visible=0, size=(1920, 1080)) as display:
    for _ in range(REY_NUM):
        try:
            handle = INTERFACING()
            soup = handle.get_selenium_response(url)
            handle.skip_button("introjs-skipbutton")

            soup = handle.make_soup()

            if soup.find('div', class_='header-main') is None:
                print("Access failed!")

            main_page_handle = handle.get_current_handle()

            count = 0
            while 1:
                soup = handle.make_soup()

                try:
                    handle.clicking('//span[text()="Medical Devices"]')
                    break
                except Exception as error:
                    print('Medical button not yet visible')

                try:
                    all_results = soup.find('table', class_='el-table__body').tbody.find_all('tr',class_=re.compile('el-table__row'))
                    break
                except:
                    pass

                response = handle.skip_button('introjs-skipbutton')

                if response:
                    break

                time.sleep(1)

                count += 1

                if count >= 5:
                    break
            handle.clicking("//*[@class='pc-max el-row']/div/a[@title='一次性使用医疗器械产品']")
            for device_type in ["械备", "注进", "注准"]:
                for year in range(2020, 2022):
                    current_query = f'{device_type}{year}'
                    handle.search_data(current_query,1)
                    # print(f'{device_type}{_year}')



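        except Exception as e:
            # Keep the error: `e` is unbound once the except block ends.
            last_error = e
            print(f'Failed to scrape NMPADisposableProductsRequester data: {e}')
        else:
            # Success: close the browser and stop retrying.
            handle.close_driver()
            break

        handle.close_driver()
        time.sleep(60)
    else:
        # Reached only when every attempt failed (the loop never hit `break`).
        raise Exception(
            f"Retried {REY_NUM} times; failed to scrape NMPADisposableProductsRequester data: {last_error}")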
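
The script above waits for page elements with hand-written `while 1` / `time.sleep` loops, each with its own counter. As a hedged sketch only, the same pattern can be factored into one polling helper. The name `wait_until` and its `timeout` / `interval` parameters are hypothetical and not part of the original script.

```python
import time


def wait_until(predicate, timeout=60.0, interval=2.0):
    """Call `predicate` until it returns a truthy value or `timeout` elapses.

    Returns the truthy value, or None on timeout. Exceptions raised by
    `predicate` are treated as "not ready yet", mirroring the try/except
    blocks used in the script.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = predicate()
            if result:
                return result
        except Exception:
            pass  # e.g. the element is not in the page yet
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

With such a helper, a wait for the results table could be written as `wait_until(lambda: handle.make_soup().find('table', class_='el-table__body'))`, assuming `handle` is an `INTERFACING` instance with a running driver.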

Dismiss the intro popup once the page has loaded: handle.skip_button("introjs-skipbutton")
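
The idea behind `skip_button` is to try several candidate elements and click whichever one is present. A minimal sketch of that single step, under stated assumptions, follows. The function name `click_first_available` is hypothetical, and `driver` stands for any object with a Selenium-style `find_element(by, value)` method. The literal `"xpath"` is the value of Selenium's `By.XPATH`, used here so the sketch needs no imports.

```python
def click_first_available(driver, xpaths):
    """Click the first element matched by any XPath in `xpaths`.

    Returns the XPath that was clicked, or None if none could be clicked.
    """
    for xpath in xpaths:
        try:
            driver.find_element("xpath", xpath).click()  # "xpath" == By.XPATH
            return xpath
        except Exception:
            continue  # not present or not clickable yet; try the next one
    return None
```

Called as `click_first_available(handle.driver, ['//a[contains(@class,"introjs-skipbutton")]', '//span[text()="Medical Devices"]'])`, this would perform one round of the loop in `skip_button`; repeating it until it returns a non-None value gives the polling behaviour.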

 

 


From: https://www.cnblogs.com/avivi/p/17458068.html
