Processing and Presenting National Bureau of Statistics Data with pandas

The main data structures of pandas

The Series object: a one-dimensional, array-like object made up of a set of data and an associated set of data labels (the index). It can store data of any type. A printed Series looks like this:

```python
0    Python
1      Java
2       C++
dtype: object
```

In the printed output, the left column is the index and the right column is the data.

Creating a Series object: pandas provides the Series() function to create a Series object; through this object you can call the corresponding methods and attributes to process the data. ...
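A minimal sketch of creating a Series, both with the default integer index and with an explicit one (the labels `math`/`english` are illustrative, not from the course):

```python
import pandas as pd

# A Series built from a list gets a default integer index 0..n-1.
s = pd.Series(["Python", "Java", "C++"])
print(s)        # left column: index, right column: data
print(s[0])     # access by index label

# An explicit index can be supplied instead of the default one.
s2 = pd.Series([90, 85], index=["math", "english"])
print(s2["math"])
```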

Retrieving Data with Complex Structure

Retrieving basic information for a single PubMed article: https://pubmed.ncbi.nlm.nih.gov/33883728/

```python
import requests
from lxml import etree

url = "https://pubmed.ncbi.nlm.nih.gov/33883728/"
r = requests.get(url).text
html = etree.HTML(r)

# Article title
title = html.xpath('//*[@id="full-view-heading"]/h1/text()')[0].strip()
print(title)

# Author list, joined into one comma-separated string
authors = html.xpath('//*[@id="full-view-heading"]/div[2]/div/div/span/a/text()')
authors = ','.join(authors)
print(authors)

# PubMed ID
pmID = html.xpath('//*[@id="full-view-identifiers"]/li[1]/span/strong/text()')[0]
print(pmID)

# Journal name
mag = html.xpath('//*[@id="full-view-journal-trigger"]/text()')[0].strip()
print(mag)

# Publication info: the year is the first four characters before the ';'
info = html.xpath('//*[@id="full-view-heading"]/div[1]/div[2]/span[2]/text()')[0].split(';')
year = info[0][:4]
info = info[1]
print(info)
print(year)

# English abstract
abstract = html.xpath('//*[@id="eng-abstract"]/p/text()')[0].strip()
print(abstract)

# Keywords: the section may be absent, so guard the index lookup
try:
    kw = html.xpath('/html/body/div[5]/main/div[2]/p/text()')[1].strip()
    print(kw)
except IndexError:
    pass
```

Retrieving basic information for multiple PubMed articles. Getting the link for each article: the search page shows ten results by default; start by scraping the link of a single article. ...
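The link-collection step above can be exercised offline; the sketch below runs the same lxml/XPath technique over a simplified, hypothetical results snippet (the `docsum-title` class and markup are illustrative stand-ins, not PubMed's actual HTML):

```python
from lxml import etree

# Simplified, hypothetical search-results HTML (real PubMed markup differs).
html_text = """
<div class="search-results">
  <a class="docsum-title" href="/33883728/">Article one</a>
  <a class="docsum-title" href="/33883729/">Article two</a>
</div>
"""
html = etree.HTML(html_text)

# Pull every article href, then prepend the site root to get full links.
hrefs = html.xpath('//a[@class="docsum-title"]/@href')
links = ["https://pubmed.ncbi.nlm.nih.gov" + h for h in hrefs]
print(links)
```

Each resulting link can then be fed to the single-article routine above in a loop.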

Scraping Page Data

A simple template:

```python
import requests
from bs4 import BeautifulSoup

myHeader = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
url = ""

def getOne(url):
    r = requests.get(url, headers=myHeader).content.decode('utf-8')
    soup = BeautifulSoup(r, 'html.parser')
    t = soup.find_all()
```

Retrieving the [modern translation] text of a single family letter. Target site: http://ewenyan.com/articles/zgfjs/1.html ...
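The parsing half of the template can be tried without any network access; this sketch runs BeautifulSoup over a hard-coded page (the tag names and `content` class here are made up for illustration, not the real structure of ewenyan.com):

```python
from bs4 import BeautifulSoup

# Hypothetical page imitating a letter-with-translation layout.
html = """
<html><body>
  <h1>家书</h1>
  <p class="content">第一段译文。</p>
  <p class="content">第二段译文。</p>
</body></html>
"""
soup = BeautifulSoup(html, 'html.parser')

# find_all with a class filter collects every translation paragraph.
paragraphs = soup.find_all('p', class_='content')
text = '\n'.join(p.text for p in paragraphs)
print(text)
```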

Scraping Dynamic Data

Scraping weather data for a single city. Identify the target site: https://www.weather.com.cn/ Then analyze the page data:

```python
import requests
from bs4 import BeautifulSoup

myHeader = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
url = "https://www.weather.com.cn/weather1d/101010100.shtml"
r = requests.get(url, headers=myHeader)
html = r.content.decode('utf-8')
soup = BeautifulSoup(html, "html.parser")
print(soup.find('div', class_='tem'))
```

...
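Once the `tem` container is located, the temperature text still has to be pulled out of its child tags. A sketch of that step over a simplified snippet (the markup below imitates, but is not, the real weather.com.cn page):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified temperature block.
html = '<div class="tem"><span>8</span><i>°C</i></div>'
soup = BeautifulSoup(html, "html.parser")

tem = soup.find('div', class_='tem')
# Concatenate the number (<span>) and the unit (<i>).
temperature = tem.span.text + tem.i.text
print(temperature)
```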

Drawing Word Clouds

Third-party library: wordcloud. Install with `pip install wordcloud`, or via a mirror: `pip install -i https://pypi.tuna.tsinghua.edu.cn/simple wordcloud`. Documentation: https://amueller.github.io/word_cloud/index.html The main entry point is wordcloud.WordCloud(). Case 1: scraping the Government Work Report and drawing a word cloud:

```python
import urllib.request
from bs4 import BeautifulSoup
from wordcloud import WordCloud

url = "https://www.gov.cn/zhuanti/2021lhzfgzbg/index.htm"
response = urllib.request.urlopen(url)
html = response.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
# The report body sits in the div with class "zhj-bbqw-cont"
content = soup.find("div", class_="zhj-bbqw-cont").text
# A Chinese font (here SimHei) is required to render Chinese characters
w = WordCloud(font_path="/Fonts/simhei.ttf").generate(content)
w.to_file("政府工作报告y1.png")
```

...
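Note that WordCloud.generate() splits text on whitespace, which does not suit unsegmented Chinese; a common alternative is to compute word frequencies yourself and pass them to generate_from_frequencies(). A stdlib-only sketch of the counting step (the token list is a toy example; in practice a segmenter such as jieba would produce it from `content`):

```python
from collections import Counter

# Toy token list; in practice something like jieba.lcut(content) produces it.
words = ["发展", "经济", "发展", "就业", "改革", "发展", "经济"]

# Drop single-character tokens and count the rest.
freq = Counter(w for w in words if len(w) > 1)
print(freq.most_common(3))
```

The resulting `freq` mapping can then be passed to `WordCloud(font_path=...).generate_from_frequencies(freq)`.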

Preface: Python Data Scraping and Visualization

This part contains my notes on the MOOC course 《Python数据爬取与可视化》 (Python Data Scraping and Visualization). Course link: https://www.icourse163.org/course/NHDX-1463126169?tid=1476402447