Time: 2023-06-21 05:00:01 | Source: Website Operations
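Hands-on crawler: the top 100 companies for "network information security"

```python
url_begin = "https://www.tianyancha.com/search/p"
# Percent-encoded search keyword 网络信息安全 ("network information security")
url_end = "?key=%E7%BD%91%E7%BB%9C%E4%BF%A1%E6%81%AF%E5%AE%89%E5%85%A8"
final_result = []
for i in range(1, 6):  # analyze the first 5 result pages
    url = url_begin + str(i) + url_end  # concatenate to get the page URL
    # ...fetch and parse this page with the steps below, adding its rows to final_result
```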
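The `key` value in `url_end` is simply the UTF-8 percent-encoding of the search term 网络信息安全. As a sketch (the helper `build_page_urls` is hypothetical, not part of the original code), the standard library can produce the same page URLs without hard-coding the encoded string:

```python
from urllib.parse import quote

BASE = "https://www.tianyancha.com/search/p"  # the page number follows "p" directly


def build_page_urls(keyword, pages=5):
    """Hypothetical helper: return the search URLs for result pages 1..pages."""
    encoded = quote(keyword)  # percent-encodes the keyword's UTF-8 bytes
    return [f"{BASE}{i}?key={encoded}" for i in range(1, pages + 1)]


urls = build_page_urls("网络信息安全")
print(urls[0])
```

The printed URL should be identical to what `url_begin + str(1) + url_end` produces.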
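```python
# Send a desktop Chrome User-Agent so the request looks like it comes from a normal browser
headers = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"),
}
```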
Requests removes much of urllib's verbose boilerplate and adds more powerful features on top. It is one of the most widely used third-party Python libraries.
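For comparison, here is a rough sketch of the same GET request written with only the standard library's `urllib.request`, showing the steps Requests hides (opening the connection, reading raw bytes, choosing a charset and decoding). The function names are illustrative, and falling back to UTF-8 when the server declares no charset is an assumption:

```python
import urllib.request


def build_request(url, headers):
    """Attach custom headers (e.g. a browser User-Agent) to a GET request."""
    return urllib.request.Request(url, headers=headers)


def fetch_with_urllib(url, headers, timeout=10):
    """Return (status_code, html_text) using only the standard library."""
    req = build_request(url, headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Assumption: use UTF-8 if the Content-Type header carries no charset
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.status, resp.read().decode(charset)
```

With Requests the equivalent is `requests.get(url, headers=headers)` followed by `res.text`. One behavioral difference: `urlopen` raises `urllib.error.HTTPError` for error codes such as 404, whereas `requests.get` returns the response and leaves checking `status_code` to you.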
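```python
import requests

# timeout (seconds) keeps the crawler from hanging forever on a stalled connection
res = requests.get(url, headers=headers, timeout=10)
print(res.status_code)  # check whether the page was fetched correctly; 200 means success
```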
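Only a 200 response means the HTML about to be parsed is the actual search page. A small hedged sketch (the helper `describe_status` is hypothetical) that uses the standard library's `http.HTTPStatus` to attach a readable reason to the code, so a failed page can be logged and skipped instead of being handed to the parser:

```python
from http import HTTPStatus


def describe_status(code):
    """Return (ok, reason_phrase) for a status code such as res.status_code."""
    ok = code == HTTPStatus.OK  # the article treats 200 as the only success case
    return ok, HTTPStatus(code).phrase


for code in (200, 403, 404):
    print(code, describe_status(code))
```

For example, a 403 would come back as `(False, 'Forbidden')`, a signal to stop before parsing that page.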
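BeautifulSoup parses a complex web page into a navigable tree, so you can browse its structure much like a book's table of contents. This project extracts the following fields for each company: company name, Tianyancha score, registered capital, date of establishment, legal representative, and business scope.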
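A small sketch of that tree navigation on a hand-written HTML fragment. The markup is an assumption that only mirrors two class names used in the extraction code (`search-item sv-search-company`, `score-num`); the real Tianyancha page may be structured differently, and the company names are placeholders. Reading each result card as one unit keeps a company's fields together even when one of them is missing:

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two search-result cards; the second has no score
html = """
<div class="search-item sv-search-company">
  <a href="#">Company A</a>
  <span class="score-num">88</span>
</div>
<div class="search-item sv-search-company">
  <a href="#">Company B</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
# A space-separated class_ string matches the exact value of the class attribute
for card in soup.find_all("div", class_="search-item sv-search-company"):
    name = card.find("a").get_text(strip=True)  # first link inside the card
    score_tag = card.find("span", class_="score-num")  # None when the card has no score
    score = int(score_tag.get_text()) if score_tag is not None else None
    rows.append((name, score))

print(rows)
```

For this mock input it prints `[('Company A', 88), ('Company B', None)]`.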
```python
from bs4 import BeautifulSoup


def parse_page(res):  # illustrative name: parses one search-result page fetched with requests
    soup = BeautifulSoup(res.text, 'html.parser')
    result = []  # information from one page

    # Company name and score
    company = []
    score = []
    targets = soup.find_all('div', class_="search-item sv-search-company")
    for each in targets:
        temp = each.find('a')
        company.append(temp.text)
    for each in targets:
        try:
            temp = each.find('span', class_="score-num")
            score.append(int(temp.text))
        except (AttributeError, ValueError):  # no score tag, or non-numeric text
            score.append("N/A")

    # Legal representative
    boss = []
    targets = soup.find_all('div', class_="title -wider text-ellipsis")
    for each in targets:
        try:
            boss.append(each.a.text)
        except AttributeError:
            boss.append("Not disclosed")

    # Registered capital, in units of 万 (10,000 yuan)
    reg_money = []
    targets = soup.find_all('div', class_="title -narrow text-ellipsis")
    for each in targets:
        try:
            # keep the number before 万; float() also accepts decimal amounts
            reg_money.append(float(each.span.text.split("万")[0]))
        except (AttributeError, ValueError):
            reg_money.append("Not disclosed")

    # Date of establishment
    date = []
    targets = soup.find_all('div', class_="title text-ellipsis")
    for each in targets:
        try:
            date.append(each.span.text)
        except AttributeError:
            date.append("Not disclosed")

    # Business scope & former names
    service = []
    targets = soup.find_all('div', class_="search-item sv-search-company")
    for each in targets:
        try:
            temp = each.find('div', class_="match row text-ellipsis")
            service.append(temp.span.text)
        except AttributeError:
            service.append("Not disclosed")

    # Combine the fields row by row; zip stops at the shortest list instead of raising IndexError
    for row in zip(company, score, reg_money, date, boss, service):
        result.append(list(row))
    return result
```
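The registered-capital step keeps only the number in front of 万 (10,000). A slightly more defensive sketch, assuming the text looks like `1000万人民币` or `5128.2万美元`, pulls the number out with a regular expression and returns `None` when nothing matches. `parse_capital` is a hypothetical helper, and it deliberately ignores the currency that follows 万:

```python
import re

# A leading number (optional thousands separators and decimal part) followed by 万
CAPITAL_RE = re.compile(r"^\s*(\d[\d,]*(?:\.\d+)?)\s*万")


def parse_capital(text):
    """Return registered capital in units of 10,000, or None if it cannot be parsed."""
    match = CAPITAL_RE.match(text or "")
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


print(parse_capital("1000万人民币"))  # 1000.0
print(parse_capital("5128.2万美元"))  # 5128.2 (currency is not converted)
print(parse_capital("-"))            # None
```

Because mixed currencies are left unconverted, the resulting column is only comparable across companies that report in the same currency.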
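```python
import openpyxl


def save_to_excel(data, path="network_info_security_top100.xlsx"):
    """Illustrative name: write the rows collected from all pages (e.g. final_result) to Excel."""
    wb = openpyxl.Workbook()
    ws = wb.active
    ws.append(["Company name", "Tianyancha score", "Registered capital (10,000 CNY)",
               "Date of establishment", "Legal representative", "Other info"])
    for item in data:
        ws.append(item)  # one row per company
    wb.save(path)
```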
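Storing the score and capital as numbers rather than text lets Excel sort and sum those columns. A minimal sketch for checking what was written, assuming openpyxl is installed; the sample row is made up, and the workbook is saved to an in-memory buffer so nothing touches the disk:

```python
import io

import openpyxl

rows = [["Company A", 88, 1000.0, "2010-01-01", "Person A", "N/A"]]  # made-up sample row

wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Company name", "Tianyancha score", "Registered capital (10,000 CNY)",
           "Date of establishment", "Legal representative", "Other info"])
for row in rows:
    ws.append(row)

buffer = io.BytesIO()
wb.save(buffer)  # save() also accepts a file-like object
buffer.seek(0)

# Read the sheet back and inspect the data rows (skipping the header)
loaded = openpyxl.load_workbook(buffer).active
values = list(loaded.iter_rows(min_row=2, values_only=True))
print(values)
```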
Keywords: security, hands-on practice, network, information, crawler
Copyright © 億企邦 1997-2025. All rights reserved.