点击关注我的Telegram群组和微信公众号

MENU

Python Pandas DataFrame 速查表 1

2021 年 03 月 14 日 • 阅读: 1492 • 技术,教程

Pandas DataFrame Cheat Sheet

Pandas 是一个强大的,常用于数据分析的 Python 库。它可以简单快速地从各种数据类型中导入并处理数据。

导入 Pandas

  • 此处以 Google Colab 的环境为例,也可以自行搭建 Jupyter Notebook 使用:
import pandas as pd
pandas.__version__

返回结果:

1.1.5

导入数据文件

  • 此处以 Excel 文件为例:
import pandas as pd
xl = pd.ExcelFile("/content/TableauSalesData.xlsx")    # 导入xlsx表格文件
SalesData = xl.parse("Orders")    #导入名为Orders的工作表
print(SalesData.head())

返回结果:

   Row ID        Order ID Order Date  ... Quantity Discount   Profit
0    4919  CA-2016-160304 2016-01-02  ...        3      0.0  38.2668
1    4920  CA-2016-160304 2016-01-02  ...        2      0.0  67.2742
2    3074  CA-2016-125206 2016-01-03  ...        2      0.0  28.6150
3    8604  US-2016-116365 2016-01-03  ...        2      0.2  -5.2640
4    8605  US-2016-116365 2016-01-03  ...        3      0.2  -6.2100

[5 rows x 21 columns]
  • df.head() 会默认返回 dataframe 中前 5 行的数据,但也可以带入参数以预览更多行数
print(SalesData.head(10))

返回结果:

   Row ID        Order ID Order Date  ... Quantity Discount    Profit
0    4919  CA-2016-160304 2016-01-02  ...        3      0.0   38.2668
1    4920  CA-2016-160304 2016-01-02  ...        2      0.0   67.2742
2    3074  CA-2016-125206 2016-01-03  ...        2      0.0   28.6150
3    8604  US-2016-116365 2016-01-03  ...        2      0.2   -5.2640
4    8605  US-2016-116365 2016-01-03  ...        3      0.2   -6.2100
5    8606  US-2016-116365 2016-01-03  ...        5      0.2   13.5720
6    9494  CA-2016-105207 2016-01-03  ...        7      0.0  350.4270
7    9495  CA-2016-105207 2016-01-03  ...        2      0.0    5.3460
8    2898  US-2016-164630 2016-01-04  ...        4      0.2  119.9960
9    5868  CA-2016-158211 2016-01-04  ...        1      0.2    0.5840

[10 rows x 21 columns]

列计数

  • count() 将直接返回计数和种类的结果
JustTables = SalesData.loc[SalesData["Sub-Category"] == "Tables"]
TableProf = JustTables[["Product Name", "Profit"]].groupby(by="Product Name").sum()
print("the total number of table product is")
print(TableProf.count())
NegTables = TableProf[TableProf["Profit"]<0.0]
print("the total number of positive profit table product is")
print(TableProf.count() - NegTables.count())
print(NegTables)

返回结果:

the total number of table product is
Profit    56
dtype: int64
the total number of positive profit table product is
Profit    14
dtype: int64
                                                       Profit
Product Name                                           
Anderson Hickey Conga Table Tops & Accessories       -17.5145
BPI Conference Tables                               -795.9725
Balt Solid Wood Rectangular Table                   -216.2545
Balt Solid Wood Round Tables                       -1201.0581
Balt Split Level Computer Training Table            -357.9750
Barricks 18" x 48" Non-Folding Utility Table wi...  -104.8320
Bevis 44 x 96 Conference Tables                     -164.7200
Bevis Boat-Shaped Conference Table                  -445.5870
Bevis Oval Conference Table, Walnut                 -856.0144
Bevis Rectangular Conference Tables                 -586.8396
Bevis Round Bullnose 29" High Table Top             -192.1854
Bevis Round Conference Room Tables and Bases         -39.4438
Bevis Round Conference Table Top, X-Base            -519.9410
Bevis Traditional Conference Table Top, Plinth ...  -456.7032
BoxOffice By Design Rectangular and Half-Moon M... -1148.4375
Bretford CR4500 Series Slim Rectangular Table       -532.7613
Bretford Rectangular Conference Table Tops          -327.2331
Bretford “Just In Time” Height-Adjustable Multi...  -964.1940
Bush Advantage Collection Racetrack Conference ... -1934.3976
Bush Andora Conference Table, Maple/Graphite Gr...  -143.6232
Chromcraft 48" x 96" Racetrack Double Pedestal ...  -404.0064
Chromcraft Bull-Nose Wood 48" x 96" Rectangular...  -611.5878
Chromcraft Bull-Nose Wood Oval Conference Table... -2876.1156
Chromcraft Rectangular Conference Tables            -450.2430
Chromcraft Round Conference Tables                  -189.9761
Hon 2111 Invitation Series Corner Table             -401.9904
Hon 2111 Invitation Series Straight Table           -115.3074
Hon 30" x 60" Table with Locking Drawer             -177.3512
Hon 5100 Series Wood Tables                         -238.6036
Hon 61000 Series Interactive Training Tables        -148.3962
Hon 94000 Series Round Tables                       -681.2140
Hon Racetrack Conference Tables                     -598.5228
Hon Rectangular Conference Tables                   -295.8150
Iceberg OfficeWorks 42" Round Tables                -392.5480
KI Adjustable-Height Table                          -248.4822
KI Conference Tables                                -479.2164
Laminate Occasional Tables                          -206.5342
Lesro Round Back Collection Coffee Table, End T...  -328.5900
Lesro Sheffield Collection Coffee Table, End Ta...   -12.1329
Office Impressions End Table, 20-1/2"H x 24"W x...   -66.8272
Riverside Furniture Oval Coffee Table, Oval End... -1147.4000
Riverside Furniture Stanwyck Manor Table Series     -415.9325

打印 Dataframe 全部列

print(SalesData.columns)

返回结果:

Index(['Row ID', 'Order ID', 'Order Date', 'Ship Date', 'Ship Mode',
       'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State',
       'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category',
       'Product Name', 'Sales', 'Quantity', 'Discount', 'Profit'],
      dtype='object')

打印 Dataframe 某一列中全部结果的分类

  • 此处以表中 Sub-Category 中的结果为例
SubCats = SalesData["Sub-Category"].unique()
print(SubCats)

返回结果:

['Bookcases' 'Phones' 'Storage' 'Accessories' 'Tables' 'Binders' 'Copiers'
 'Art' 'Furnishings' 'Paper' 'Envelopes' 'Chairs' 'Fasteners' 'Appliances'
 'Labels' 'Machines' 'Supplies']

筛选 Dataframe 表的列

  • 此处以筛选 Region ,ProfitSales 三列为目的
SubCatProfits = SalesData[["Region", "Profit", "Sales"]]
print(SubCatProfits)

返回结果:

       Region    Profit    Sales
0        East   38.2668  173.940
1        East   67.2742  231.980
2        West   28.6150  114.460
3     Central   -5.2640   30.080
4     Central   -6.2100  165.600
...       ...       ...      ...
9989    South  167.8080  364.800
9990  Central  -26.7204  152.688
9991  Central    0.5668    3.488
9992  Central   -1.3248    5.888
9993  Central   52.3800  116.400

[9994 rows x 3 columns]

合并求和并排序

  • SubCatProfits 数据表中,我们筛选出了原数据表中各行数据中的 Region ,ProfitSales 三组数据
  • 对区域行进行合并,并对 ProfitSales 数据进行求和便可获得各区域的盈利和销售额数据
Total = SubCatProfits.groupby(by="Region").sum().sort_values(by="Profit")
print(Total)

返回结果:

              Profit        Sales
Region                     
Central   39706.3625  501239.8908
South     46749.4303  391721.9050
East      91522.7800  678781.2400
West     108418.4489  725457.8245
  • 对于 sort_values() 这一方法,将按照对应列对数据进行排序。如果对应列为文字格式,将按照首字母顺序进行排序; 如果对应列为数字格式,将按照从小到大的顺序进行排序

    • 如果需要反向排序可以增加 ascending=False 参数,即:

    
    ### loc 筛选行数据
  • .loc 将基于标签进行数据筛选,当不存在数据时将返回 KeyError
  • 此处将筛选 Sub-Category 中数据值为 Tables 的行数据

    JustTables = SalesData.loc[SalesData["Sub-Category"]=="Tables"]
    display(JustTables)

    返回结果:

    1

  • 使用 display() 代替 print() 时将以更优化的表格方式显示数据
  • 基于筛选的数据,我们可以进一步合并 Tables 种类中各个不同产品的销售额和利润

    TableProdProfSales = JustTables[["Sub-Category", "Product Name", "Profit", "Sales"]].groupby(by="Product Name").sum().sort_values("Profit")
    display(TableProdProfSales)

    返回结果:

    2

    数据求整

  • 使用 round() 将进行最简单的四舍五入的求整

    SubCatSalesProf = SalesData[["Segment", "Sub-Category", "Sales", "Profit"]]    # 选择Segment, Sub-Category, Sales, Profit四列数据
    CorpTableSalesProf = SubCatSalesProf.loc[(SubCatSalesProf["Segment"]=="Corporate")& (SubCatSalesProf["Sub-Category"]=="Tables")]    # 筛选Segemnt中数据值为Corporate 且 Sub-Category中数据值为Tables的行数据
    total=CorpTableSalesProf.groupby(by=["Segment", "Sub-Category"]).sum().round()    # 求和并求整
    print(total)

    返回结果:

                            Sales  Profit
    Segment   Sub-Category           
    Corporate Tables        70872.0 -4906.0
  • round() 添加整数参数将返回指定小数点后的位数,比如 round(3)

    total=CorpTableSalesProf.groupby(by=["Segment", "Sub-Category"]).sum().round(3)
    print(total)

    返回结果:

                              Sales    Profit
    Segment   Sub-Category             
最后编辑于: 2021 年 03 月 30 日
返回文章列表 文章二维码 打赏
本页链接的二维码
打赏二维码