Pandas DataFrame Cheat Sheet
Pandas 是一个强大的,常用于数据分析的 Python 库。它可以简单快速地从各种数据类型中导入并处理数据。
导入 Pandas
- 此处以 Google Colab 的环境为例,也可以自行搭建 Jupyter Notebook 使用:
import pandas as pd
pandas.__version__
返回结果:
1.1.5
导入数据文件
- 此处以 Excel 文件为例:
import pandas as pd
xl = pd.ExcelFile("/content/TableauSalesData.xlsx") # 导入xlsx表格文件
SalesData = xl.parse("Orders") #导入名为Orders的工作表
print(SalesData.head())
返回结果:
Row ID Order ID Order Date ... Quantity Discount Profit
0 4919 CA-2016-160304 2016-01-02 ... 3 0.0 38.2668
1 4920 CA-2016-160304 2016-01-02 ... 2 0.0 67.2742
2 3074 CA-2016-125206 2016-01-03 ... 2 0.0 28.6150
3 8604 US-2016-116365 2016-01-03 ... 2 0.2 -5.2640
4 8605 US-2016-116365 2016-01-03 ... 3 0.2 -6.2100
[5 rows x 21 columns]
df.head()
会默认返回 dataframe 中前 5 行的数据,但也可以带入参数以预览更多行数
print(SalesData.head(10))
返回结果:
Row ID Order ID Order Date ... Quantity Discount Profit
0 4919 CA-2016-160304 2016-01-02 ... 3 0.0 38.2668
1 4920 CA-2016-160304 2016-01-02 ... 2 0.0 67.2742
2 3074 CA-2016-125206 2016-01-03 ... 2 0.0 28.6150
3 8604 US-2016-116365 2016-01-03 ... 2 0.2 -5.2640
4 8605 US-2016-116365 2016-01-03 ... 3 0.2 -6.2100
5 8606 US-2016-116365 2016-01-03 ... 5 0.2 13.5720
6 9494 CA-2016-105207 2016-01-03 ... 7 0.0 350.4270
7 9495 CA-2016-105207 2016-01-03 ... 2 0.0 5.3460
8 2898 US-2016-164630 2016-01-04 ... 4 0.2 119.9960
9 5868 CA-2016-158211 2016-01-04 ... 1 0.2 0.5840
[10 rows x 21 columns]
列计数
count()
将直接返回计数和种类的结果
JustTables = SalesData.loc[SalesData["Sub-Category"] == "Tables"]
TableProf = JustTables[["Product Name", "Profit"]].groupby(by="Product Name").sum()
print("the total number of table product is")
print(TableProf.count())
NegTables = TableProf[TableProf["Profit"]<0.0]
print("the total number of positive profit table product is")
print(TableProf.count() - NegTables.count())
print(NegTables)
返回结果:
the total number of table product is
Profit 56
dtype: int64
the total number of positive profit table product is
Profit 14
dtype: int64
Profit
Product Name
Anderson Hickey Conga Table Tops & Accessories -17.5145
BPI Conference Tables -795.9725
Balt Solid Wood Rectangular Table -216.2545
Balt Solid Wood Round Tables -1201.0581
Balt Split Level Computer Training Table -357.9750
Barricks 18" x 48" Non-Folding Utility Table wi... -104.8320
Bevis 44 x 96 Conference Tables -164.7200
Bevis Boat-Shaped Conference Table -445.5870
Bevis Oval Conference Table, Walnut -856.0144
Bevis Rectangular Conference Tables -586.8396
Bevis Round Bullnose 29" High Table Top -192.1854
Bevis Round Conference Room Tables and Bases -39.4438
Bevis Round Conference Table Top, X-Base -519.9410
Bevis Traditional Conference Table Top, Plinth ... -456.7032
BoxOffice By Design Rectangular and Half-Moon M... -1148.4375
Bretford CR4500 Series Slim Rectangular Table -532.7613
Bretford Rectangular Conference Table Tops -327.2331
Bretford “Just In Time” Height-Adjustable Multi... -964.1940
Bush Advantage Collection Racetrack Conference ... -1934.3976
Bush Andora Conference Table, Maple/Graphite Gr... -143.6232
Chromcraft 48" x 96" Racetrack Double Pedestal ... -404.0064
Chromcraft Bull-Nose Wood 48" x 96" Rectangular... -611.5878
Chromcraft Bull-Nose Wood Oval Conference Table... -2876.1156
Chromcraft Rectangular Conference Tables -450.2430
Chromcraft Round Conference Tables -189.9761
Hon 2111 Invitation Series Corner Table -401.9904
Hon 2111 Invitation Series Straight Table -115.3074
Hon 30" x 60" Table with Locking Drawer -177.3512
Hon 5100 Series Wood Tables -238.6036
Hon 61000 Series Interactive Training Tables -148.3962
Hon 94000 Series Round Tables -681.2140
Hon Racetrack Conference Tables -598.5228
Hon Rectangular Conference Tables -295.8150
Iceberg OfficeWorks 42" Round Tables -392.5480
KI Adjustable-Height Table -248.4822
KI Conference Tables -479.2164
Laminate Occasional Tables -206.5342
Lesro Round Back Collection Coffee Table, End T... -328.5900
Lesro Sheffield Collection Coffee Table, End Ta... -12.1329
Office Impressions End Table, 20-1/2"H x 24"W x... -66.8272
Riverside Furniture Oval Coffee Table, Oval End... -1147.4000
Riverside Furniture Stanwyck Manor Table Series -415.9325
打印 Dataframe 全部列
print(SalesData.columns)
返回结果:
Index(['Row ID', 'Order ID', 'Order Date', 'Ship Date', 'Ship Mode',
'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State',
'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category',
'Product Name', 'Sales', 'Quantity', 'Discount', 'Profit'],
dtype='object')
打印 Dataframe 某一列中全部结果的分类
- 此处以表中
Sub-Category
中的结果为例
SubCats = SalesData["Sub-Category"].unique()
print(SubCats)
返回结果:
['Bookcases' 'Phones' 'Storage' 'Accessories' 'Tables' 'Binders' 'Copiers'
'Art' 'Furnishings' 'Paper' 'Envelopes' 'Chairs' 'Fasteners' 'Appliances'
'Labels' 'Machines' 'Supplies']
筛选 Dataframe 表的列
- 此处以筛选
Region
,Profit
和Sales
三列为目的
SubCatProfits = SalesData[["Region", "Profit", "Sales"]]
print(SubCatProfits)
返回结果:
Region Profit Sales
0 East 38.2668 173.940
1 East 67.2742 231.980
2 West 28.6150 114.460
3 Central -5.2640 30.080
4 Central -6.2100 165.600
... ... ... ...
9989 South 167.8080 364.800
9990 Central -26.7204 152.688
9991 Central 0.5668 3.488
9992 Central -1.3248 5.888
9993 Central 52.3800 116.400
[9994 rows x 3 columns]
合并求和并排序
- 在
SubCatProfits
数据表中,我们筛选出了原数据表中各行数据中的Region
,Profit
和Sales
三组数据 - 对区域行进行合并,并对
Profit
和Sales
数据进行求和便可获得各区域的盈利和销售额数据
Total = SubCatProfits.groupby(by="Region").sum().sort_values(by="Profit")
print(Total)
返回结果:
Profit Sales
Region
Central 39706.3625 501239.8908
South 46749.4303 391721.9050
East 91522.7800 678781.2400
West 108418.4489 725457.8245
对于
sort_values()
这一方法,将按照对应列对数据进行排序。如果对应列为文字格式,将按照首字母顺序进行排序; 如果对应列为数字格式,将按照从小到大的顺序进行排序如果需要反向排序可以增加
ascending=False
参数,即:
### loc 筛选行数据
.loc
将基于标签进行数据筛选,当不存在数据时将返回KeyError
此处将筛选
Sub-Category
中数据值为Tables
的行数据JustTables = SalesData.loc[SalesData["Sub-Category"]=="Tables"] display(JustTables)
返回结果:
- 使用
display()
代替print()
时将以更优化的表格方式显示数据 基于筛选的数据,我们可以进一步合并
Tables
种类中各个不同产品的销售额和利润TableProdProfSales = JustTables[["Sub-Category", "Product Name", "Profit", "Sales"]].groupby(by="Product Name").sum().sort_values("Profit") display(TableProdProfSales)
返回结果:
数据求整
使用
round()
将进行最简单的四舍五入的求整SubCatSalesProf = SalesData[["Segment", "Sub-Category", "Sales", "Profit"]] # 选择Segment, Sub-Category, Sales, Profit四列数据 CorpTableSalesProf = SubCatSalesProf.loc[(SubCatSalesProf["Segment"]=="Corporate")& (SubCatSalesProf["Sub-Category"]=="Tables")] # 筛选Segemnt中数据值为Corporate 且 Sub-Category中数据值为Tables的行数据 total=CorpTableSalesProf.groupby(by=["Segment", "Sub-Category"]).sum().round() # 求和并求整 print(total)
返回结果:
Sales Profit Segment Sub-Category Corporate Tables 70872.0 -4906.0
在
round()
添加整数参数将返回指定小数点后的位数,比如round(3)
total=CorpTableSalesProf.groupby(by=["Segment", "Sub-Category"]).sum().round(3) print(total)
返回结果:
Sales Profit Segment Sub-Category
转载请标注来源