
Scraping Weibo with a Python crawler and analyzing the data

Time: 2023-05-20 10:36:02 | Source: Website operations


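I just finished watching iPartment 5 (愛情公寓5), and Da Li (大力) in it is so good-looking...

Then I opened Cheng Guo's (成果) Weibo, and reading his posts, each one written like a little essay, was a real treat...

@犬來八荒

So let's use Python to analyze what Gou Ge (狗哥) has been up to on Weibo over the past few years.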

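Tools needed:

- scrapy + pyecharts + pymysql (the spider also uses BeautifulSoup from bs4 to strip HTML)

I won't cover how to use these libraries here; look them up yourself.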

Step 1: Analyze Gou Ge's Weibo page

I recommend the mobile version of Weibo: its pages are simpler and less cluttered, which makes extracting information easier.

Gou Ge's Weibo: https://m.weibo.cn/u/1927305954?uid=1927305954&t=0&luicode=10000011&lfid=100103type%3D1%26q%3D%E6%88%90%E6%9E%9C

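Click the spot marked in the screenshot (the screenshot is not preserved here).

Then refresh the page so that it loads the JSON data.

On inspection, this is the JSON file that loads the Weibo posts. Let's open it.

In it, data -> cards -> mblog holds the various fields of each post, such as its like count and comment count.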
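To make the nesting concrete, here is a minimal sketch that walks data -> cards -> mblog on a hand-written sample. The sample payload is hypothetical and heavily trimmed: only the key names come from the article, the values are made up, and real responses carry many more fields. The helper name extract_posts is also hypothetical.

```python
import json

# Hypothetical, trimmed response: key names follow the article, values are made up
SAMPLE = json.loads("""
{
  "data": {
    "cardlistInfo": {"since_id": 4464357465265600},
    "cards": [
      {"scheme": "<scheme-1>",
       "mblog": {"created_at": "03-15", "text": "hello <br />world",
                 "source": "iPhone", "reposts_count": 3,
                 "comments_count": 5, "attitudes_count": 42}},
      {"scheme": "<scheme-2>"}
    ]
  }
}
""")


def extract_posts(payload):
    """Return (card, mblog) pairs for the cards that actually contain a post."""
    cards = payload.get("data", {}).get("cards", [])
    # Cards without an "mblog" key are not posts, so they are skipped
    return [(card, card["mblog"]) for card in cards if card.get("mblog")]


for card, mblog in extract_posts(SAMPLE):
    print(mblog["created_at"], mblog["attitudes_count"], mblog["comments_count"])
```

Using .get() with defaults keeps the walk from crashing on a card that has no post in it.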

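Scroll to the bottom of this JSON file.

The last post at the bottom is from January 29, so this one JSON file holds the posts from March 15 back to January 29.

So how do we get the posts from before January 29?

Looking carefully, there is a pattern.

If you scroll down on Gou Ge's profile page, reaching the bottom automatically loads a new JSON file (screenshot not preserved).

Open it and compare its URL with the previous one: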

First URL: https://m.weibo.cn/api/container/getIndex?uid=1927305954&t=0&luicode=10000011&lfid=100103type%3D1%26q%3D%E6%88%90%E6%9E%9C&type=uid&value=1927305954&containerid=1076031927305954

Second URL (loaded next): https://m.weibo.cn/api/container/getIndex?uid=1927305954&t=0&luicode=10000011&lfid=100103type%3D1%26q%3D%E6%88%90%E6%9E%9C&type=uid&value=1927305954&containerid=1076031927305954&since_id=4464357465265600

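Looking closely, the two are identical except at the very end: the second URL has one extra parameter, since_id.

Now open the first JSON file: it contains a since_id field (under data -> cardlistInfo).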

Now we can make an educated guess:

- the first JSON file contains a since_id;
- that since_id identifies the next JSON file to load;
- the since_id inside that next file identifies the file after it, and so on.

Following this chain lets us find every JSON file.
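The chain described above can be sketched as a simple loop. This is a minimal sketch, not the author's code: the fetch function stands in for a real HTTP request, and the fake pages and the "BASE" URL are hypothetical placeholders so the loop can run offline. It assumes, as the article found, that the next page is requested by appending &since_id=... to the first URL and that the last page has no since_id.

```python
from typing import Callable, Iterator, Optional


def iter_pages(base_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Follow the since_id chain: each page's since_id points to the next page."""
    since_id: Optional[int] = None
    while True:
        # The first request has no since_id; later ones append it to the base URL
        url = base_url if since_id is None else f"{base_url}&since_id={since_id}"
        page = fetch(url)
        yield page
        since_id = page.get("data", {}).get("cardlistInfo", {}).get("since_id")
        if not since_id:  # no since_id means this was the last page
            break


# Hypothetical stand-in for HTTP requests: three fake pages chained by since_id
FAKE_PAGES = {
    "BASE": {"data": {"cardlistInfo": {"since_id": 111}, "cards": ["p1", "p2"]}},
    "BASE&since_id=111": {"data": {"cardlistInfo": {"since_id": 222}, "cards": ["p3"]}},
    "BASE&since_id=222": {"data": {"cardlistInfo": {}, "cards": ["p4"]}},
}

pages = list(iter_pages("BASE", FAKE_PAGES.__getitem__))
print([card for p in pages for card in p["data"]["cards"]])  # ['p1', 'p2', 'p3', 'p4']
```

Passing fetch in as a parameter keeps the pagination logic separate from the HTTP client, which is also what the Scrapy spider does by yielding a new Request instead of looping.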

You can check a few of these yourself to confirm.

With this figured out, let's start crawling.

Step 2: Crawl the data

We can set start_urls to the URL of the first JSON file:

```python
# Scrapy expects start_urls to be a list of URLs
start_urls = ["https://m.weibo.cn/api/container/getIndex?uid=1927305954&t=0&luicode=10000011&lfid=100103type%3D1%26q%3D%E6%88%90%E6%9E%9C&type=uid&value=1927305954&containerid=1076031927305954"]
```

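The fields we need:

- since_id: ID of the next page
- created_at: creation date
- text: post content
- source: device the post was sent from
- scheme: link to the original post
- reposts_count: number of reposts
- textLength: length of the post in characters
- comments_count: number of comments
- attitudes_count: number of likes

Most of these come straight from the JSON via dictionary lookups (since_id from data -> cardlistInfo, scheme from each card, the rest from mblog); textLength is computed by the spider from the cleaned text.

Here is the code: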

```python
import json

import scrapy
from bs4 import BeautifulSoup

from weibo.items import WeiboItem

# URL of the first JSON page; later pages append &since_id=<id>
BASE_URL = (
    "https://m.weibo.cn/api/container/getIndex?uid=1927305954&t=0&luicode=10000011"
    "&lfid=100103type%3D1%26q%3D%E6%88%90%E6%9E%9C&type=uid&value=1927305954"
    "&containerid=1076031927305954"
)


class WeiboSpider(scrapy.Spider):
    name = "weibo"
    allowed_domains = ["weibo.com", "weibo.cn"]
    start_urls = [BASE_URL]

    def parse(self, response):
        # response.text replaces the deprecated response.body_as_unicode()
        data = json.loads(response.text).get("data", {})
        since_id = data.get("cardlistInfo", {}).get("since_id")  # ID of the next page

        for card in data.get("cards", []):
            mblog = card.get("mblog")
            if not mblog:
                continue  # skip cards that are not posts
            # The scraped text contains HTML tags; strip them
            text = BeautifulSoup(str(mblog["text"]), "html.parser").get_text()
            created_at = mblog["created_at"]
            if len(created_at) < 6:
                # This year's posts have no year (e.g. "03-15"), so prepend it
                created_at = "2020-" + created_at
            # Write the data into an item
            yield WeiboItem(
                created_at=created_at,
                text=text,
                source=mblog["source"],
                scheme=card["scheme"],  # link to the original post
                reposts_count=mblog["reposts_count"],
                comments_count=mblog["comments_count"],
                attitudes_count=mblog["attitudes_count"],
                textLength=len(text),
            )

        # No since_id means this was the last page
        if since_id:
            yield scrapy.Request(f"{BASE_URL}&since_id={since_id}", callback=self.parse)
```
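The year fix inside parse() can be pulled out into a small function and tested on its own. A minimal sketch with a hypothetical helper name: it assumes created_at arrives either as "MM-DD" (current-year posts) or as "YYYY-MM-DD", and turns the year the original hard-codes (2020) into a parameter. Any other format is returned unchanged, as the original also does.

```python
import re


def normalize_created_at(created_at: str, year: int = 2020) -> str:
    """Prefix a year onto "MM-DD" dates; leave "YYYY-MM-DD" dates untouched."""
    created_at = created_at.strip()
    if re.fullmatch(r"\d{2}-\d{2}", created_at):
        # Posts from the current year come without a year
        return f"{year}-{created_at}"
    return created_at


print(normalize_created_at("03-15"))       # 2020-03-15
print(normalize_created_at("2019-12-28"))  # 2019-12-28
```

Matching the exact "MM-DD" shape is a little stricter than the original len(created_at) < 6 check, so an unexpected short string is not silently given a year.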

The Scrapy items.py file:

```python
# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html
import scrapy


class WeiboItem(scrapy.Item):
    since_id = scrapy.Field()         # ID of the next page
    created_at = scrapy.Field()       # creation date
    text = scrapy.Field()             # post content
    source = scrapy.Field()           # device the post was sent from
    scheme = scrapy.Field()           # link to the original post
    reposts_count = scrapy.Field()    # number of reposts
    textLength = scrapy.Field()       # post length in characters
    comments_count = scrapy.Field()   # number of comments
    attitudes_count = scrapy.Field()  # number of likes
```

Next, load the data into the database.
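The article never shows the schema of the weibo table. To try a parameterized INSERT without a MySQL server, here is a sketch that swaps in Python's built-in sqlite3 with an in-memory database. The column names and types are assumptions chosen to match the value order used by the pipeline's insertMsg(); note that sqlite3 uses ? placeholders where pymysql uses %s.

```python
import sqlite3

# Hypothetical schema: column order matches the VALUES order used by insertMsg()
SCHEMA = """
CREATE TABLE weibo (
    scheme TEXT,
    text TEXT,
    source TEXT,
    reposts_count INTEGER,
    comments_count INTEGER,
    attitudes_count INTEGER,
    textLength INTEGER,
    created_at TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Made-up row containing both kinds of quotes
row = ("<scheme>", "It's a \"quoted\" post", "iPhone", 1, 2, 3, 20, "2020-03-15")
# Placeholders let the driver escape quotes, so apostrophes in posts cannot break the SQL
conn.execute("INSERT INTO weibo VALUES (?, ?, ?, ?, ?, ?, ?, ?)", row)
conn.commit()

print(conn.execute("SELECT text, attitudes_count FROM weibo").fetchone())
```

Building the SQL with % string formatting instead would fail on the first post that contains a quote character, and would also allow SQL injection.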

The Scrapy pipelines.py:

```python
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
import pymysql


class WeiboPipeline(object):
    account = {
        'user': 'root',
        'password': '*******',
        'host': 'localhost',
        'database': 'python',
        'charset': 'utf8mb4',  # posts contain Chinese text and emoji
    }

    def __init__(self):
        self.connect = pymysql.connect(**self.account)  # connect to the database
        self.cursor = self.connect.cursor()
        # To write JSON instead (needs import json):
        # self.fp = open("xiaofuren.json", 'w', encoding='utf-8')

    def insertMsg(self, scheme, text, source, reposts_count, comments_count,
                  attitudes_count, textLength, created_at):
        # Parameterized query: the driver escapes quotes inside the post text
        sql = "INSERT INTO weibo VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
        try:
            self.cursor.execute(sql, (scheme, text, source, reposts_count, comments_count,
                                      attitudes_count, textLength, created_at))
            self.connect.commit()
        except Exception as e:
            self.connect.rollback()
            print("insert_sql error:", e)

    def open_spider(self, spider):
        print("Spider started ******************")

    def process_item(self, item, spider):
        self.insertMsg(item['scheme'], item['text'], item['source'], item['reposts_count'],
                       item['comments_count'], item['attitudes_count'], item['textLength'],
                       item['created_at'])
        # To write JSON instead:
        # self.fp.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item

    def close_spider(self, spider):
        print("Spider finished ***************")
        print("Data written successfully")
        self.cursor.close()
        self.connect.close()
```

It ran for almost five minutes. That is fairly slow; my guess is that stripping the HTML tags takes a while to parse.
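If HTML stripping really is the bottleneck (that is the author's guess and has not been measured here; network round trips may well dominate), a lighter option is the standard library's html.parser.HTMLParser, which collects text nodes without building a document tree. This swaps out BeautifulSoup; the class and function names below are hypothetical.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes and ignores tags; entities are decoded (convert_charrefs=True)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html: str) -> str:
    parser = _TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(strip_tags('Hi <a href="/n/x">@someone</a><br />bye &amp; thanks'))
# Hi @someonebye & thanks
```

Unlike BeautifulSoup it does no tree repair, which should be acceptable for short post snippets.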

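Then check the database.

There are 221 posts in total. Let's verify against the profile page.

It turns out 20-odd posts are missing, possibly some reposts that were not crawled, but the last (earliest) post checks out.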

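With the data in hand, let's analyze it.

Step 3: Data analysis

I used pyecharts.

This visualization library is powerful; it even has maps (though I didn't use them).

Official examples gallery: http://gallery.pyecharts.org/#/Line/temperature_change_line_chart

Export the data from the database: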

```python
import datetime

import pymysql

account = {
    'user': 'root',
    'password': '*******',
    'host': 'localhost',
    'database': 'python',
}


def mysqlConnect(account):
    return pymysql.connect(**account)


def getMessage(cursor, month, day, year, phone, dianzan, zhuanfa, pinlun, textLength, dates):
    cursor.execute('SELECT * FROM weibo ORDER BY created_at')
    rows = cursor.fetchall()
    # Dictionaries for counting posts per day of the month, per month and per year
    Day = {i: 0 for i in range(1, 32)}
    Month = {i: 0 for i in range(1, 13)}
    Year = {i: 0 for i in range(2013, 2021)}
    for it in rows:
        # strip() tolerates surrounding whitespace in the stored date string
        date = datetime.datetime.strptime(it['created_at'].strip(), "%Y-%m-%d")
        Year[date.year] += 1
        Day[date.day] += 1
        Month[date.month] += 1
        phone.append(it['source'])
        dianzan.append(it['attitudes_count'])
        zhuanfa.append(it['reposts_count'])
        pinlun.append(it['comments_count'])
        textLength.append(it['textLength'])
        dates.append(it['created_at'])
    day.extend(Day[i] for i in range(1, 32))
    month.extend(Month[i] for i in range(1, 13))
    year.extend(Year[i] for i in range(2013, 2021))


if __name__ == '__main__':
    month = []       # posts per month
    year = []        # posts per year
    day = []         # posts per day of the month
    phone = []       # device types
    dianzan = []     # likes
    zhuanfa = []     # reposts
    pinlun = []      # comments
    textLength = []  # post lengths
    dates = []       # dates
    connect = mysqlConnect(account)
    cursor = connect.cursor(cursor=pymysql.cursors.DictCursor)
    getMessage(cursor, month, day, year, phone, dianzan, zhuanfa, pinlun, textLength, dates)
    connect.close()
```

The code is commented, so I won't explain it further.
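The three counting dictionaries in getMessage() can also be expressed with collections.Counter. A minimal sketch on made-up dates, assuming the stored dates have the YYYY-MM-DD form produced by the spider; the ranges 1 to 31, 1 to 12 and 2013 to 2020 mirror the original, and count_by is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime


def count_by(dates, field, keys):
    """Count posts per value of `field` ("day", "month" or "year"), zero-filled over `keys`."""
    counts = Counter(getattr(datetime.strptime(d.strip(), "%Y-%m-%d"), field) for d in dates)
    return [counts[k] for k in keys]  # Counter returns 0 for missing keys


sample_dates = ["2019-01-28", "2019-12-28", "2020-01-05"]  # hypothetical data
day = count_by(sample_dates, "day", range(1, 32))
month = count_by(sample_dates, "month", range(1, 13))
year = count_by(sample_dates, "year", range(2013, 2021))
print(day[27], month[0], year[-2:])  # 2 2 [2, 1]
```

One behavioral difference: a year outside 2013 to 2020 raises KeyError in the original, while here it is simply left out of the list.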

Next comes the data visualization.

First, plot Gou Ge's posts by day of the month, by month, and by year:

```python
# Continues the __main__ block above: day, month and year are filled by getMessage()
# (the imports can go at the top of the file)
from pyecharts import options as opts
from pyecharts.charts import Bar


def render_bar(x, y, series_name, path):
    bar = (
        Bar()
        .add_xaxis([str(v) for v in x])
        .add_yaxis(series_name, y)
        .set_global_opts(title_opts=opts.TitleOpts(title="Gou Ge's Weibo post statistics"))
    )
    bar.render(path=path)


render_bar(range(1, 32), day, "Posts per day of the month", "day.html")   # by day
render_bar(range(1, 13), month, "Posts per month", "month.html")          # by month
render_bar(range(2013, 2021), year, "Posts per year", "year.html")        # by year
```

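By day:

Over the years he posted most on the 28th. Presumably Gou Ge likes to publish his essay-style posts at the end of the month to record what happened that month.

By month:

Judging by this data, Gou Ge likes to post in January, perhaps because he has more free time around Chinese New Year.

By year:

2020 will probably end up with the most posts (only about four months of it had passed at the time), presumably promotional posts now that he has just broken through...

2018 and 2019 have more of the essay-style posts; having just started working, he probably used Weibo to vent now and then...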

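Posting devices

The code for this chart was not included in the article; the full project source is linked at the end.

Straight to the chart (not preserved here):

A loyal Apple fan.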
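A minimal sketch of the counting step behind such a chart, assuming the source strings collected in phone by getMessage() are used as-is. The device names in the sample are made up, and the chart type the author used is not known; the resulting (name, count) pairs are the kind of data a pyecharts pie or bar chart can take.

```python
from collections import Counter

# Hypothetical values; in the script above these come from the `source` column
phone = ["iPhone 11", "iPhone X", "iPhone 11", "Weibo web", "iPhone 11"]

device_counts = Counter(phone).most_common()  # sorted by frequency, highest first
print(device_counts)  # [('iPhone 11', 3), ('iPhone X', 1), ('Weibo web', 1)]

# Share of posts sent from any iPhone model
iphone_share = sum(n for name, n in device_counts if name.startswith("iPhone")) / len(phone)
print(f"{iphone_share:.0%}")  # 80%
```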

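How his popularity changed over the years

Likes on his posts over the years:

Not much to analyze here: Gou Ge became famous through iPartment, so this year's likes naturally skyrocketed.

But the very first post has over 30,000 likes. Presumably loyal fans scrolled through all of his posts and liked the last one they reached, which is his earliest.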
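To back the popularity reading with numbers rather than eyeballing the chart, one option is to aggregate likes by year and pick out the single most-liked post, which is how an outlier like the 30,000-like earliest post would show up. A sketch on made-up (date, likes) pairs; in the script above these would be the dates and dianzan lists, and likes_per_year is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical sample; the real values come from the `dates` and `dianzan` lists
dates = ["2013-05-01", "2018-06-28", "2019-03-28", "2020-03-15", "2020-04-01"]
dianzan = [31000, 120, 300, 9000, 15000]


def likes_per_year(dates, likes):
    """Return {year: (total likes, average likes per post)}."""
    totals, counts = defaultdict(int), defaultdict(int)
    for d, n in zip(dates, likes):
        year = int(d.strip()[:4])
        totals[year] += n
        counts[year] += 1
    return {y: (totals[y], totals[y] / counts[y]) for y in sorted(totals)}


print(likes_per_year(dates, dianzan))
# {2013: (31000, 31000.0), 2018: (120, 120.0), 2019: (300, 300.0), 2020: (24000, 12000.0)}
top_date, top_likes = max(zip(dates, dianzan), key=lambda pair: pair[1])
print(top_date, top_likes)  # 2013-05-01 31000
```

The per-post average is shown alongside the total because a year with many posts can have a high total without any single post being popular.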

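Reposts:

The most-reposted ones are probably Gou Ge's essay-style posts; they are quite entertaining after all.

Comments:

As with likes, the very last post has especially many comments: these are all fans "digging up the ancestral graves" (browsing back to his oldest posts).

Length of posts:

It looks like Gou Ge likes to publish a long essay every so often...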
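To check the "an essay every so often" impression, one can list the posts that are much longer than typical. A sketch with made-up lengths; the cutoff of three times the median length is an arbitrary assumption, not something from the article, and long_posts is a hypothetical helper.

```python
from statistics import median

# Hypothetical sample; the real values come from the `dates` and `textLength` lists
dates = ["2018-01-28", "2018-02-03", "2018-02-28", "2018-03-10", "2018-03-28"]
text_length = [900, 40, 1200, 35, 60]


def long_posts(dates, lengths, factor=3):
    """Return (date, length) for posts longer than `factor` times the median length."""
    cutoff = factor * median(lengths)
    return [(d, n) for d, n in zip(dates, lengths) if n > cutoff]


print(long_posts(dates, text_length))  # [('2018-01-28', 900), ('2018-02-28', 1200)]
```

The median is used rather than the mean so that the long essays themselves do not drag the cutoff upward.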

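OK, that's it.

Weibo's anti-crawling measures are lax: fetching posts requires no login, and even logging in needs no captcha. That is unlike this site, where you cannot read articles without logging in and the captchas are especially annoying.

Scraping Weibo comments, however, does require logging in.

In the next post I'll show how to log in to Weibo and scrape comments.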
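Even though posts can be fetched without logging in, it is worth crawling politely. A sketch of settings.py entries, assuming the Scrapy project is named weibo (as the `from weibo.items import WeiboItem` import suggests). The setting names are standard Scrapy settings, but the values are illustrative guesses, not tuned for Weibo. ITEM_PIPELINES is also where the MySQL pipeline must be registered, as the comment at the top of pipelines.py reminds.

```python
# settings.py (excerpt); values are illustrative, not tuned
BOT_NAME = "weibo"

# Register the MySQL pipeline; the number (0-1000) sets its order among pipelines
ITEM_PIPELINES = {
    "weibo.pipelines.WeiboPipeline": 300,
}

# Be gentle with the API: one request at a time with a pause between them
CONCURRENT_REQUESTS_PER_DOMAIN = 1
DOWNLOAD_DELAY = 1.0  # seconds

# Let Scrapy adapt the delay to the server's response times
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 10.0
```

Since the spider follows the since_id chain one page at a time anyway, a small delay costs little and lowers the chance of being rate-limited.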

Project source (GitHub): https://github.com/zhaobo0564/project.git
