Date: 2023-02-12 03:48:01 | Source: 建站知識 (Site-Building Knowledge)
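Take site.ip138.com as an example: open the browser developer tools (F12), enter an IP address to query, and watch the requests in the console. You will see the information shown in the figure below.
jsoup is an excellent fit for parsing the HTML. jsoup is a Java HTML parser that can parse a URL or a string of HTML directly. It provides a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods.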
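// HttpUtils.doGet returns null when the request fails, so check before parsing
if (result == null) {
    logger.error("Empty response, nothing to parse!");
    return Collections.emptyList();
}
// Parse the HTML into a Document object
Document document = Jsoup.parse(result);
// Get the Element whose ID is "list" (doesn't it feel a lot like jQuery?)
Element listEle = document.getElementById("list");
if (listEle == null) {
    return Collections.emptyList();
}
// Select the elements whose "target" attribute equals "_blank" and collect their text with eachText()
return listEle.getElementsByAttributeValue("target", "_blank").eachText();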
The content of result is obtained by simulating a browser HTTP request with HttpClient:
HttpGet httpGet = new HttpGet(url);
httpGet.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8");
httpGet.setHeader("Accept-Encoding", "gzip, deflate");
httpGet.setHeader("Accept-Language", "zh-CN,zh;q=0.9");
httpGet.setHeader("Cache-Control", "max-age=0");
httpGet.setHeader("Connection", "keep-alive");
httpGet.setHeader("Cookie", "Hm_lvt_d39191a0b09bb1eb023933edaa468cd5=1553090128; BAIDU_SSP_lcr=https://www.baidu.com/link?url=FS0ccst469D77DpdXpcGyJhf7OSTLTyk6VcMEHxT_9_&wd=&eqid=fa0e26f70002e7dd000000065c924649; pgv_pvi=6200530944; pgv_si=s4712839168; Hm_lpvt_d39191a0b09bb1eb023933edaa468cd5=1553093270");
httpGet.setHeader("DNT", "1");
httpGet.setHeader("Host", host);
httpGet.setHeader("Upgrade-Insecure-Requests", "1");
httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36");
String result = HttpUtils.doGet(httpGet);
HTTP request utility class:
import org.apache.http.HttpEntity;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HttpUtils {
    private static final Logger logger = LoggerFactory.getLogger(HttpUtils.class);

    public static String doGet(HttpGet httpGet) {
        RequestConfig requestConfig = RequestConfig.custom()
                .setConnectTimeout(5000)
                .setConnectionRequestTimeout(10000)
                .setSocketTimeout(5000)
                .build();
        httpGet.setConfig(requestConfig);
        // try-with-resources closes both the response and the client
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse httpResponse = httpClient.execute(httpGet)) {
            int statusCode = httpResponse.getStatusLine().getStatusCode();
            if (statusCode == 200 || statusCode == 302) {
                HttpEntity entity = httpResponse.getEntity();
                return EntityUtils.toString(entity, "utf-8");
            }
            logger.error("Request StatusCode={}", statusCode);
        } catch (Exception e) {
            // Pass the exception as the last argument so SLF4J logs the stack trace
            logger.error("Request exception", e);
        }
        return null;
    }
}
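For readers on Java 11 or later, roughly the same request can be sketched with the JDK's built-in java.net.http.HttpClient instead of Apache HttpClient. This is a swapped-in alternative, not what the article's project uses; the class name JdkHttpSketch and its methods are hypothetical. It leaves out Host and Connection, which the JDK client does not let callers set, and Accept-Encoding, because the JDK client does not decompress gzip responses by itself.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Optional;

/** Hypothetical JDK-only counterpart to HttpUtils; not part of the original project. */
final class JdkHttpSketch {

    // One shared client; the connect timeout mirrors the 5000 ms used in HttpUtils.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /** Builds a GET request carrying a few browser-like headers. */
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))
                .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
                .header("Accept-Language", "zh-CN,zh;q=0.9")
                .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                        + "(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36")
                .GET()
                .build();
    }

    /** Returns the body on HTTP 200, or an empty Optional on any other status or error. */
    static Optional<String> doGet(String url) {
        try {
            HttpResponse<String> response =
                    CLIENT.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200
                    ? Optional.of(response.body())
                    : Optional.empty();
        } catch (IOException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }
}
```

Returning Optional<String> rather than null makes the "no usable response" case explicit at the call site, for example JdkHttpSketch.doGet(url).map(Jsoup::parse).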
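Add a Controller:
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class DomainSpiderController {
    private static final Logger logger = LoggerFactory.getLogger(DomainSpiderController.class);

    @Autowired
    private DomainSpiderService domainSpiderService;

    /**
     * @param ip the IP address to look up, e.g. 119.75.217.109
     * @return the domains found for that IP
     */
    @RequestMapping("/spider/{ip}")
    public List<String> domainSpider(@PathVariable("ip") String ip) {
        long startTime = System.currentTimeMillis();
        // Try ip138 first and fall back to the second source if nothing comes back
        List<String> domains = domainSpiderService.domainSpiderOfIp138(ip);
        if (domains == null || domains.isEmpty()) {
            domains = domainSpiderService.domainSpiderOfAizan(ip);
        }
        long endTime = System.currentTimeMillis();
        logger.info("Crawl finished, total time: {}s", (endTime - startTime) / 1000);
        return domains;
    }
}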
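The controller passes the path variable straight to the crawler and tries a second source when the first returns nothing. A minimal sketch of that flow in plain Java, with an IPv4 check added in front, might look like the following. The class DomainLookupSketch, its method names, and the stand-in sources are hypothetical and are not part of the original project; the validation step is an added assumption, not something the article's code does.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper illustrating "validate, then try sources in order". */
final class DomainLookupSketch {

    /** Accepts dotted-quad IPv4 text such as 119.75.217.109 (leading zeros are tolerated). */
    static boolean isValidIpv4(String ip) {
        if (ip == null) {
            return false;
        }
        // limit -1 keeps trailing empty parts, so "1.2.3.4." is rejected
        String[] parts = ip.split("\\.", -1);
        if (parts.length != 4) {
            return false;
        }
        for (String part : parts) {
            if (part.isEmpty() || part.length() > 3) {
                return false;
            }
            for (int i = 0; i < part.length(); i++) {
                char c = part.charAt(i);
                if (c < '0' || c > '9') {
                    return false;
                }
            }
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }

    /** Returns the first non-empty result, or an empty list if the IP is invalid or every source fails. */
    static List<String> firstNonEmpty(String ip, List<Function<String, List<String>>> sources) {
        if (!isValidIpv4(ip)) {
            return Collections.emptyList();
        }
        for (Function<String, List<String>> source : sources) {
            List<String> domains = source.apply(ip);
            if (domains != null && !domains.isEmpty()) {
                return domains;
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Stand-ins for the two crawler methods: the first finds nothing, the second succeeds
        Function<String, List<String>> primary = ip -> Collections.emptyList();
        Function<String, List<String>> fallback = ip -> List.of("example.com");
        System.out.println(firstNonEmpty("119.75.217.109", List.of(primary, fallback)));
        // prints [example.com]
    }
}
```

Rejecting malformed input before any request is sent keeps arbitrary path text from reaching the URLs the crawler builds, and returning an empty list instead of null means the caller never needs a null check.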
Start the Spring Boot application and open http://localhost:8080/spider/119.75.217.109 in a browser; the result returned is as follows: dns.aizhan.com
The code is on Gitee (碼云) and GitHub; feel free to download it and study it.
Keywords: address, implementation, based on, crawler