Python pandas slicing and indexing (6 pandas methods, with examples, for mastering axis-wise concatenation) - 爱玩科技

威哥 2023-08-01 07:47:44 886

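Learn together, grow together!

What is axis-wise concatenation?

Before looking at how to merge data by axis-wise concatenation, we first need to understand what the term means.

In data processing and analysis, especially data visualization, you run into the notions of an x-axis and a y-axis. For example, if the x-axis represents age and the y-axis represents income, a typical analysis looks at the average income of people at different ages. In this example an axis stands for a variable, that is, a field in a data table. The "axis" in axis-wise concatenation has a similar meaning: it denotes the direction of the data, columns or rows.

Put simply, axis-wise concatenation means joining data either along the columns or along the rows.

Next, let's create an array with NumPy to see what axis-wise concatenation looks like: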

In [12]: import numpy as np

In [13]: arr = np.arange(12).reshape((3, 4))

In [14]: arr
Out[14]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

Then use NumPy's concatenate function to join the data:

In [15]: np.concatenate([arr, arr], axis=1)
Out[15]:
array([[ 0,  1,  2,  3,  0,  1,  2,  3],
       [ 4,  5,  6,  7,  4,  5,  6,  7],
       [ 8,  9, 10, 11,  8,  9, 10, 11]])

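The result shows that the two arrays were joined along the column direction (axis=1): the columns of the second array were appended to the right of the first. Beyond NumPy, the most flexible and widely used function for axis-wise concatenation in Python is the concat() function in pandas. Its parameters cover most practical scenarios, and the rest of this article walks through them.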
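To make the two directions concrete, here is a small sketch that joins the same array along axis=0 and along axis=1 and compares the shapes. The variable names rows_joined and cols_joined are chosen here for illustration only.

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

# axis=0 appends rows: the result is taller, the column count stays at 4.
rows_joined = np.concatenate([arr, arr], axis=0)

# axis=1 appends columns: the result is wider, the row count stays at 3.
cols_joined = np.concatenate([arr, arr], axis=1)

print(rows_joined.shape)  # (6, 4)
print(cols_joined.shape)  # (3, 8)
```
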

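Axis-wise concatenation with the pandas concat() function

For pandas objects such as Series and DataFrame, labelled axes let you generalize array concatenation further. Specifically, you also need to consider the following:

If the objects are indexed differently on the other axes, should those axes be combined as a union or an intersection?
Do the concatenated pieces need to be identifiable in the result?
Does the concatenation axis itself matter?

The pandas concat function provides a consistent way to address each of these questions.

Gluing values and indexes together: concat()

Before looking at what concat does, we need to create three Series: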

In [16]: import pandas as pd; from pandas import Series, DataFrame

In [17]: s1 = Series([0, 1], index=['a', 'b'])

In [18]: s2 = Series([2, 3, 4], index=['c', 'd', 'e'])

In [19]: s3 = Series([5, 6], index=['f', 'g'])

Calling concat on these objects glues their values and indexes together:

In [8]: pd.concat([s1, s2, s3])
Out[8]:
a    0
b    1
c    2
d    3
e    4
f    5
g    6
dtype: int64

By default, concat works along axis=0 and produces a new Series. If you pass axis=1, the result becomes a DataFrame (axis=1 is the columns).

In [9]: pd.concat([s1, s2, s3], axis=1)
Out[9]:
     0    1    2
a  0.0  NaN  NaN
b  1.0  NaN  NaN
c  NaN  2.0  NaN
d  NaN  3.0  NaN
e  NaN  4.0  NaN
f  NaN  NaN  5.0
g  NaN  NaN  6.0

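In this case there is no overlap on the other axis, which is visible in the result: its index is the ordered union of the input indexes (an outer join). Pass join='inner' to get their intersection instead.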
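The difference between the two join modes can be checked on the same three Series. This is a minimal sketch; the names outer and inner are illustrative. Because the three indexes share no labels, the inner join is expected to keep no rows at all.

```python
import pandas as pd

s1 = pd.Series([0, 1], index=["a", "b"])
s2 = pd.Series([2, 3, 4], index=["c", "d", "e"])
s3 = pd.Series([5, 6], index=["f", "g"])

# Default join="outer": the row index is the union of all three indexes.
outer = pd.concat([s1, s2, s3], axis=1)

# join="inner": only labels present in every input survive; here there are none.
inner = pd.concat([s1, s2, s3], axis=1, join="inner")

print(outer.shape)  # (7, 3)
print(inner.shape)  # (0, 3)
```
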

Intersection on the other axis: join='inner'

In [10]: s4 = pd.concat([s1 * 5, s3])

In [11]: s4
Out[11]:
a    0
b    5
f    5
g    6
dtype: int64

In [12]: pd.concat([s1, s4], axis=1)
Out[12]:
     0  1
a  0.0  0
b  1.0  5
f  NaN  5
g  NaN  6

In [13]: pd.concat([s1, s4], axis=1, join='inner')
Out[13]:
   0  1
a  0  0
b  1  5

join_axes: specifying the index to use on the other axis

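In older pandas versions you can specify the index to use on the other axes with join_axes. The parameter was deprecated in pandas 0.25 and removed in pandas 1.0; on current versions, calling reindex on the concatenated result gives the same effect. The session below was recorded on an older version: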

In [14]: pd.concat([s1, s4], axis=1, join_axes=[['a', 'c', 'b', 'e']])
Out[14]:
     0    1
a  0.0  0.0
c  NaN  NaN
b  1.0  5.0
e  NaN  NaN
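Since join_axes is not available in pandas 1.0 and later, here is a sketch of one way to get the same table on a current version: concatenate first, then select the row labels with reindex. This is a substitute technique, not the call used in the session above, and the variable name result is illustrative.

```python
import pandas as pd

s1 = pd.Series([0, 1], index=["a", "b"])
s3 = pd.Series([5, 6], index=["f", "g"])
s4 = pd.concat([s1 * 5, s3])

# Concatenate along the columns, then choose the row labels explicitly.
# Labels missing from the concatenated result ("c", "e") become NaN rows.
result = pd.concat([s1, s4], axis=1).reindex(["a", "c", "b", "e"])
print(result)
```
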

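One problem remains: the pieces that were concatenated cannot be told apart in the result. Suppose you want to create a hierarchical index on the concatenation axis. The keys parameter does exactly that: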

In [15]: result = pd.concat([s1, s2, s3], keys=['one', 'two', 'three'])

In [16]: result
Out[16]:
one    a    0
       b    1
two    c    2
       d    3
       e    4
three  f    5
       g    6
dtype: int64

Pivoting the levels: unstack()

In [17]: result.unstack()
Out[17]:
         a    b    c    d    e    f    g
one    0.0  1.0  NaN  NaN  NaN  NaN  NaN
two    NaN  NaN  2.0  3.0  4.0  NaN  NaN
three  NaN  NaN  NaN  NaN  NaN  5.0  6.0
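The practical benefit of keys is that each original piece can be pulled back out of the combined Series by its key. A minimal sketch, with piece as an illustrative name:

```python
import pandas as pd

s1 = pd.Series([0, 1], index=["a", "b"])
s2 = pd.Series([2, 3, 4], index=["c", "d", "e"])
s3 = pd.Series([5, 6], index=["f", "g"])

result = pd.concat([s1, s2, s3], keys=["one", "two", "three"])

# The outer index level holds the keys, so .loc with a key recovers that piece.
piece = result.loc["two"]

print(result.index.nlevels)  # 2
print(piece.index.tolist())  # ['c', 'd', 'e']
print(piece.tolist())        # [2, 3, 4]
```
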

If you combine Series along axis=1, the keys become the DataFrame column headers:

In [18]: pd.concat([s1, s2, s3], axis=1, keys=['one', 'two', 'three'])
Out[18]:
   one  two  three
a  0.0  NaN    NaN
b  1.0  NaN    NaN
c  NaN  2.0    NaN
d  NaN  3.0    NaN
e  NaN  4.0    NaN
f  NaN  NaN    5.0
g  NaN  NaN    6.0

The same logic applies to DataFrame objects:

In [19]: df1 = DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'], columns=['one', 'two'])

In [20]: df2 = DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'], columns=['three', 'four'])

In [21]: df1
Out[21]:
   one  two
a    0    1
b    2    3
c    4    5

In [22]: df2
Out[22]:
   three  four
a      5     6
c      7     8

In [23]: pd.concat([df1, df2], axis=1, keys=['level1', 'level2'])
Out[23]:
  level1     level2
     one two  three four
a      0   1    5.0  6.0
b      2   3    NaN  NaN
c      4   5    7.0  8.0
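With keys on axis=1 the columns form a two-level index, so the columns contributed by each input can be selected by key. The sketch below rebuilds the same two frames; combined, left and col are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2),
                   index=["a", "b", "c"], columns=["one", "two"])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2),
                   index=["a", "c"], columns=["three", "four"])

combined = pd.concat([df1, df2], axis=1, keys=["level1", "level2"])

# Selecting a top-level key returns the columns that came from that input.
left = combined["level1"]

# A (key, column) tuple addresses a single column of the combined frame.
col = combined[("level2", "three")]

print(left.columns.tolist())  # ['one', 'two']
print(col["a"], col["c"])     # 5.0 7.0
```
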

If you pass a dict instead of a list, the dict's keys are used as the value of the keys option:

In [24]: pd.concat({'level1': df1, 'level2': df2})
Out[24]:
          four  one  three  two
level1 a   NaN  0.0    NaN  1.0
       b   NaN  2.0    NaN  3.0
       c   NaN  4.0    NaN  5.0
level2 a   6.0  NaN    5.0  NaN
       c   8.0  NaN    7.0  NaN

In [25]: pd.concat({'level1': df1, 'level2': df2}, axis=1)
Out[25]:
  level1     level2
     one two  three four
a      0   1    5.0  6.0
b      2   3    NaN  NaN
c      4   5    7.0  8.0

The names parameter

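Two further parameters control how the hierarchical index is created: names, which labels the levels and is used here, and levels.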

In [26]: pd.concat([df1, df2], axis=1, keys=['level1', 'level2'], names=['upper', 'lower'])
Out[26]:
upper level1     level2
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0

DataFrame row index: ignore_index=True

The last thing to consider is DataFrames whose row index is irrelevant to the analysis at hand:

In [27]: df1 = DataFrame(np.random.randn(3, 4), columns=['a', 'b', 'c', 'd'])

In [28]: df2 = DataFrame(np.random.randn(2, 3), columns=['b', 'd', 'a'])

In [29]: df1
Out[29]:
          a         b         c         d
0  0.032443  1.113210  0.502779 -1.227075
1 -0.613984 -0.204040 -0.630603  0.341598
2  0.746166  1.518603  0.533425  0.320373

In [30]: df2
Out[30]:
          b         d         a
0 -0.350900  0.851649  0.959348
1 -0.071497  1.916604 -0.993156

In this case, pass ignore_index=True:

In [31]: pd.concat([df1, df2], ignore_index=True)
Out[31]:
          a         b         c         d
0  0.032443  1.113210  0.502779 -1.227075
1 -0.613984 -0.204040 -0.630603  0.341598
2  0.746166  1.518603  0.533425  0.320373
3  0.959348 -0.350900       NaN  0.851649
4 -0.993156 -0.071497       NaN  1.916604
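Because the frames above hold random numbers, the exact values cannot be reproduced. The sketch below uses a seeded generator (the seed 0 is an arbitrary assumption) and compares the row labels with and without ignore_index; kept and renumbered are illustrative names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
df1 = pd.DataFrame(rng.standard_normal((3, 4)), columns=["a", "b", "c", "d"])
df2 = pd.DataFrame(rng.standard_normal((2, 3)), columns=["b", "d", "a"])

# Without ignore_index the original row labels are kept, so they repeat.
kept = pd.concat([df1, df2])

# With ignore_index=True the rows are renumbered from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(kept.index.tolist())           # [0, 1, 2, 0, 1]
print(renumbered.index.tolist())     # [0, 1, 2, 3, 4]
print(renumbered["c"].isna().sum())  # 2, since df2 has no column "c"
```

Columns are matched by name rather than by position, which is why the values from df2 land under the right headers even though its columns are in a different order.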

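Overall, these six pandas techniques for axis-wise concatenation cover most data-merging scenarios.

Try them out on your own data, and leave a comment if you run into problems. If you found this useful, don't forget to follow. Thanks for your support!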
