Time: 2023-09-04 22:18:01 | Source: Website Operations
requests-html is billed as an HTML parsing library designed for humans. It is a new library from kennethreitz, the author of the requests library, and it currently has 9,195 stars.
pip install requests-html
Page 1: https://book.douban.com/tag/小说
Page 2: https://book.douban.com/tag/小说?start=20&type=T
Page 3: https://book.douban.com/tag/小说?start=40&type=T
Page 4: https://book.douban.com/tag/小说?start=60&type=T
from bs4 import BeautifulSoup
import requests

# URL template: `start` advances by 20 for each page
base = 'https://book.douban.com/tag/小说?start={page}&type=T'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}

for i in range(100):
    url = base.format(page=i*20)
    resp = requests.get(url, headers=headers)
    # parse each page with BeautifulSoup
    bsObj = BeautifulSoup(resp.text, 'html.parser')
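The paging pattern in the URLs above can also be written as a small helper that needs no network access. This is only a sketch using the standard library: `page_url`, `BASE`, and `PAGE_SIZE` are hypothetical names introduced here, and the step of 20 is taken from the four example URLs.

```python
from urllib.parse import quote, urlencode

BASE = "https://book.douban.com/tag/"
PAGE_SIZE = 20  # `start` grows by 20 per page in the example URLs


def page_url(tag: str, page: int) -> str:
    """Return the listing URL for a zero-based page index (hypothetical helper)."""
    url = BASE + quote(tag)  # percent-encode the non-ASCII tag
    if page == 0:
        return url  # the first page carries no query string
    return url + "?" + urlencode({"start": page * PAGE_SIZE, "type": "T"})


for i in range(4):
    print(page_url("小说", i))
```

Percent-encoding the tag up front gives the same form that requests-html reports in the `<HTML url=...>` output.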
from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://book.douban.com/tag/小说')
for html in r.html:
    print(html)
<HTML url='https://book.douban.com/tag/%E5%B0%8F%E8%AF%B4'>
from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://python.org/')
r
<Response [200]>
r.text[:50]
'<!doctype html>\n<!--[if lt IE 7]> <html class="n'
r.content[:50]
b'<!doctype html>\n<!--[if lt IE 7]> <html class="n'
r.html
<HTML url='https://www.python.org/'>
# a mix of absolute and relative URLs
print(len(r.html.links))
list(r.html.links)[:5]
119
['/success-stories/category/arts/', 'https://kivy.org/', 'https://www.python.org/psf/codeofconduct/', 'http://www.scipy.org', 'https://docs.python.org/3/license.html']
print(len(r.html.absolute_links))
list(r.html.absolute_links)[:5]
119
['https://kivy.org/', 'https://www.python.org/psf/codeofconduct/', 'http://www.scipy.org', 'https://jobs.python.org', 'https://docs.python.org/3/license.html']
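To see what separates the two link sets, here is a rough sketch of resolving relative links against the page URL with the standard library's `urljoin`. It illustrates the idea only and is not necessarily how requests-html implements `absolute_links`; the sample links are copied from the output above.

```python
from urllib.parse import urljoin

page = "https://www.python.org/"
links = [
    "/success-stories/category/arts/",  # relative: needs the page URL
    "https://kivy.org/",                # already absolute: left unchanged
    "http://www.scipy.org",
]

# Resolve every link against the URL of the page it was found on.
absolute = [urljoin(page, link) for link in links]
for link in absolute:
    print(link)
```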
On https://pythonclock.org/ we see a countdown clock. The countdown is built into the page:
from requests_html import HTMLSession

session = HTMLSession()
r2 = session.get('https://pythonclock.org/')
r2.html.search('Python 2.7 will retire in...{}Enable Guido Mode')[0]
'</h1>\n </div>\n <div class="python-27-clock"></div>\n <div class="center">\n <div class="guido-button-block">\n <button class="js-guido-mode guido-button">'
r2.html.render()
r2.html.search('Python 2.7 will retire in...{}Enable Guido Mode')[0]
'</h1>\n </div>\n <div class="python-27-clock is-countdown"><span class="countdown-row countdown-show6"><span class="countdown-section"><span class="countdown-amount">1</span><span class="countdown-period">Year</span></span><span class="countdown-section"><span class="countdown-amount">2</span><span class="countdown-period">Months</span></span><span class="countdown-section"><span class="countdown-amount">28</span><span class="countdown-period">Days</span></span><span class="countdown-section"><span class="countdown-amount">16</span><span class="countdown-period">Hours</span></span><span class="countdown-section"><span class="countdown-amount">52</span><span class="countdown-period">Minutes</span></span><span class="countdown-section"><span class="countdown-amount">46</span><span class="countdown-period">Seconds</span></span></span></div>\n <div class="center">\n <div class="guido-button-block">\n <button class="js-guido-mode guido-button">'
# use r2 (the rendered pythonclock.org page), not r
periods = [element.text for element in r2.html.find('.countdown-period')]
amounts = [element.text for element in r2.html.find('.countdown-amount')]
countdown_data = dict(zip(periods, amounts))
countdown_data
{'Year': '1', 'Months': '2', 'Days': '5', 'Hours': '23', 'Minutes': '34', 'Seconds': '37'}
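The pairing step can be tried without fetching the page. The sketch below hard-codes the period and amount strings seen in the rendered HTML above and also converts the amounts to integers. The integer conversion is an addition for illustration and is not part of the original snippet.

```python
# Sample values copied from the rendered countdown markup shown earlier.
periods = ["Year", "Months", "Days", "Hours", "Minutes", "Seconds"]
amounts = ["1", "2", "28", "16", "52", "46"]

# zip pairs the two lists position by position; int() makes the values numeric.
countdown = {period: int(amount) for period, amount in zip(periods, amounts)}
print(countdown)
```

This pairing is only correct when both selectors return elements in the same order, which holds here because each amount sits next to its period in the markup.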
r.html.find('#about')
[<Element 'li' aria-haspopup='true' class=('tier-1', 'element-1') id='about'>]
about = r.html.find('#about', first=True)
about
<Element 'li' aria-haspopup='true' class=('tier-1', 'element-1') id='about'>
r = session.get('https://github.com/')
htmlObj = r.html
htmlObj.xpath('a', first=True)
<Element 'a' class=('btn', 'ml-2') href='https://help.github.com/articles/supported-browsers'>