求平均数
mean()
可以用于求平均数
JustTables = SalesData.loc[SalesData["Sub-Category"]=="Tables"]
RegJustTablesDiscount = JustTables[["Region", "Discount"]].groupby(by="Region").mean()
print(RegJustTablesDiscount)
输出结果:
Discount
Region
Central 0.262500
East 0.373750
South 0.222549
West 0.200000
复制DataFrame
copy()
可以完整地复制一个DataFrame到一个新的变量中 (true copy)
JustTablesYear = JustTables.copy()
value_counts() 求重复值
value_counts()
可以筛选出列数据中有哪些不同的值,但和unique()
不同的是,它还可以直接打印出每种值出现的次数
FrequentItems = SalesData["Customer Name"].value_counts()
print(FrequentItems.head(10))
输出结果:
William Brown 37
Paul Prost 34
John Lee 34
Matt Abelman 34
Edward Hooks 34
Seth Vernon 32
Jonathan Doherty 32
Chloris Kastensmidt 32
Arthur Prichep 31
Emily Phan 31
Name: Customer Name, dtype: int64
drop_duplicates() 去重复值
drop_duplicates()
可以一键去除列中的重复值
CustomerOrder = SalesData[["Customer Name", "Order Date"]].sort_values(by="Order Date", ascending = False)
CustomerOrderNoDup = CustomerOrder.drop_duplicates()
dt日期
.dt
可以有效地处理DataFrame中Datetime
类别的数据,可用的接口非常多
JustTablesYear["Year"] = JustTablesYear["Order Date"].dt.year
print(JustTablesYear["Year"])
print("="*20)
Years = JustTablesYear["Year"].unique()
输出结果:
6 2016
37 2016
69 2016
82 2016
113 2016
...
9844 2019
9885 2019
9899 2019
9950 2019
9966 2019
Name: Year, Length: 319, dtype: int64
====================
[2016 2017 2018 2019]
- 在这里,我们在DataFrame中新建了一个列(默认在最后一列),名为
Year
。基于Order Date
列的数据使用.dt.year
来提取了订单的年份并写入了对应的Year
列。uniquie()
打印出了DataFrame中包含了2016~2019年的订单数据 - 至此,我们就可以轻松地统计出各年的销售数据并有机会进行更高级的结果筛选
RegJustTablesYear = JustTablesYear[["Region", "Year", "Discount"]]
for year in Years:
OneYear = RegJustTablesYear.loc[RegJustTablesYear["Year"]==year]
YearlyRegDiscount = OneYear.groupby(by="Region").mean()
print("\n Regional table discounts for year"+ str(year))
print(YearlyRegDiscount)
print("*"*40)
输出结果:
Regional table discounts for year2016
Year Discount
Region
Central 2016 0.205882
East 2016 0.368182
South 2016 0.250000
West 2016 0.240625
****************************************
Regional table discounts for year2017
Year Discount
Region
Central 2017 0.292308
East 2017 0.373913
South 2017 0.270588
West 2017 0.178947
****************************************
Regional table discounts for year2018
Year Discount
Region
Central 2018 0.326667
East 2018 0.380000
South 2018 0.113636
West 2018 0.208000
****************************************
Regional table discounts for year2019
Year Discount
Region
Central 2019 0.207143
East 2019 0.373333
South 2019 0.218750
West 2019 0.166667
****************************************
- 类似地,我们可以用两个
for
循环来合并统计三类数据
Years = JustTablesYear["Year"].unique() # 合并独立不同的年份
Regions = JustTables["Region"].unique() # 合并独立不同的销售地区
SegJustTablesYear = JustTablesYear[["Segment", "Year", "Discount", "Region"]] # 筛选数据表
for year in Years:
# 第一个for循环分别对每年的数据进行统计
OneYear = SegJustTablesYear.loc[SegJustTablesYear["Year"]==year]
NoYear = OneYear[["Segment", "Discount", "Region"]]
for region in Regions:
# 第二个for循环分别对每个销售地区的数据进行统计
OneRegion = NoYear.loc[NoYear["Region"]==region]
YearlySegRegDiscount = OneRegion.groupby(by="Segment").mean() # 对各销售部门的折扣数据合并求平均数
# 打印输出
print("\n"+str(year)+" Table discounts by customer segment in the "+region+" region")
print(YearlySegRegDiscount)
print("*"*50)
输出结果(节选):
Discount
Segment
Consumer 0.32
Corporate 0.06
Home Office 0.00
**************************************************
2016 Table discounts by customer segment in the Central region
Discount
Segment
Consumer 0.32
Corporate 0.06
Home Office 0.00
**************************************************
2016 Table discounts by customer segment in the East region
Discount
Segment
Consumer 0.336364
Corporate 0.400000
Home Office 0.400000
**************************************************
2016 Table discounts by customer segment in the West region
Discount
Segment
Consumer 0.208333
Corporate 0.220000
Home Office 0.380000
**************************************************
2016 Table discounts by customer segment in the South region
Discount
Segment
Consumer 0.183333
Corporate 0.350000
**************************************************
2017 Table discounts by customer segment in the Central region
Discount
Segment
Consumer 0.293333
Corporate 0.320000
Home Office 0.266667
**************************************************
2017 Table discounts by customer segment in the East region
Discount
Segment
Consumer 0.371429
Corporate 0.362500
Home Office 0.387500
**************************************************
2016 Table discounts by customer segment in the East region
Discount
Segment
Consumer 0.336364
Corporate 0.400000
Home Office 0.400000
**************************************************
类似地,
.dt.month
将自动输出Datetime
类数据的月份(int type, 1-12).dt.dayofyear
将输出日期是当年的第几天dt.is_leap_year
将输出日期所属年份是否为闰年 (bool type)- 和Python的
Datetime
库类似,pandas.dt
有一个strftime
可以将str
字串符转化为NumPy数组形式的日期
转载请标注来源