Date: 2023-06-07 18:54:02 | Source: Website Operations
鏈家網(wǎng)全國(guó)省份城市的url地址:import requestsfrom requests.exceptions import RequestExceptionfrom bs4 import BeautifulSoupimport jsondef fetch(url): try: # proxies = {'http': 'http://172.17.0.3:7890'} headers = { 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0', } result = requests.get(url, headers=headers) return result.text except RequestException as e: return f"Error {e}!"#解析頁(yè)面,bs4真的慢,好久沒(méi)用了體驗(yàn)哈,parsel超好用,這個(gè)迭代了三次都 O 3了。def pase(result): bs = BeautifulSoup(result, 'lxml') ul = bs.find('ul', attrs={'class': 'city_list_ul'}) li = ul.find_all('div', attrs={'class': 'city_list'}) for i in li: title = i.find('div', attrs={'class': 'city_list_tit c_b'}) table = title.text datas = {table: [], 'url_link': []} city_ul = i.find_all('ul') for j in city_ul: a = j.find_all('a') for a_ in a: datas.get(table).append(a_.text) datas.get('url_link').append(a_.attrs['href']) print(datas) yield datas#將數(shù)據(jù)寫入json格式文件,也可以是其他合適的def back_datas(data): def datas(): for d in data: yield d return datas()def end_save_datas(da): datas = {'result': [data for data in back_datas(da)]} with open('city.json', 'a') as fp: json.dump(datas, fp, indent=4)if __name__ == "__main__": res = fetch("https://www.lianjia.com/city/") data = pase(res) end_save_datas(data)
Keywords: city, address, province
Copyright © Yiqibang 1997-2025. All rights reserved.