Time: 2023-10-04 10:42:01 | Source: Website Operations
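Reading the data: the homepage of ZCOOL (站酷), China's largest design forum

ZCOOL is the largest forum in China for designers to share and discuss work, covering all kinds of design as well as fine art. I scraped pages 1-100 of the ZCOOL homepage (the site does not keep homepage listings beyond page 100, so no data is available past that point).

```python
import html
import re
import time

import requests
from lxml import etree

domain = 'https://www.zcool.com.cn'
baseurl = "/?p={}#tab_anchor"
# Matches any HTML tag so it can be stripped from the serialized author node
patten = re.compile('<.*?>')


def getpagenum(a):
    """Download homepage number `a` and return its HTML text."""
    url = domain + baseurl.format(a)
    res = requests.get(url)
    time.sleep(4)  # pause between requests to go easy on the site
    return res.text


def gerpage(c):
    """Parse one page of HTML into a list of dicts, one per work card."""
    tree = etree.HTML(c)
    table_row = tree.xpath('//div[@class="card-box"]')
    boards = []
    for row in table_row:
        board = {}
        try:
            # Keys: 類別 = category, 點贊 = likes, 評論 = comments, 作者 = author
            board['類別'] = row.xpath('div[@class="card-info"]/p[@class="card-info-type"]')[0].text
            board['點贊'] = row.xpath('div[2]/p[3]/span[3]')[0].text
            # Serialize the author link, decode entities, then strip the tags
            name2 = row.xpath('div[3]/span[1]/a')[0]
            name2 = etree.tostring(name2).decode('utf-8')
            name2 = html.unescape(name2)
            name2 = patten.sub('', name2)
            name2 = name2.strip()
            board['評論'] = row.xpath('div[2]/p[3]/span[2]')[0].text
            board['作者'] = name2
        except Exception as err:
            # A card with unexpected markup is kept as a partial (or empty) record
            # print('error:', err)
            pass
        boards.append(board)
    return boards


def main():
    n = []
    # Pages 1 to 100 inclusive
    for i in range(1, 101):
        c = getpagenum(i)
        page = gerpage(c)
        n.append(page)
    print(n)


if __name__ == '__main__':
    main()
```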
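The scraper collects its results as a list of pages, each page being a list of dictionaries, and some of those dictionaries may be empty or partial because parsing errors are silently skipped. As a rough sketch of how such records could be tallied by category, something like the following might work. The helper names `flatten`, `to_int` and `summarize` are my own, and the sample records are made-up placeholders, not scraped results.

```python
from collections import Counter


def flatten(pages):
    """Merge the per-page lists into one list, dropping empty records."""
    return [board for page in pages for board in page if board]


def to_int(text):
    """Convert scraped text such as ' 12 ' to an int; anything else counts as 0."""
    try:
        return int(str(text).strip())
    except (TypeError, ValueError):
        return 0


def summarize(pages):
    """Return (works per category, total likes per category)."""
    counts = Counter()
    likes = Counter()
    for board in flatten(pages):
        # '類別' (category) and '點贊' (likes) are the keys used by the scraper
        category = (board.get('類別') or 'unknown').strip()
        counts[category] += 1
        likes[category] += to_int(board.get('點贊'))
    return counts, likes


# Hypothetical placeholder records shaped like the scraper's output
sample_pages = [
    [{'類別': 'illustration', '點贊': '12', '評論': '3', '作者': 'author_a'}, {}],
    [{'類別': 'graphic', '點贊': ' 5 ', '評論': '0', '作者': 'author_b'},
     {'類別': 'illustration', '點贊': None, '評論': '1', '作者': 'author_c'}],
]

counts, likes = summarize(sample_pages)
print(counts.most_common())
print(dict(likes))
```

Treating unparseable like counts as 0 is a simplification. If the site abbreviates large numbers, `to_int` would need to handle that format explicitly.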
Keywords: forum, design, data
Copyright © 億企邦 1997-2025. All rights reserved.