Understanding NumPy multidimensional arrays in Python (Scientific computing with Python: tools) - 爱玩科技

威哥 2023-05-13 07:57:06 357


Elementwise operations
Basic reductions
Broadcasting
Array shape manipulation
Sorting data
Summary

1.2.1 Elementwise operations

Basic operations

With scalars:

>>> import numpy as np
>>> a = np.array([1, 2, 3, 4])
>>> a + 1
array([2, 3, 4, 5])
>>> 2**a
array([ 2,  4,  8, 16])

All arithmetic operates elementwise:

>>> b = np.ones(4) + 1
>>> a - b
array([-1.,  0.,  1.,  2.])
>>> a * b
array([2., 4., 6., 8.])
>>> j = np.arange(5)
>>> 2**(j + 1) - j
array([ 2,  3,  6, 13, 28])

These operations are of course much faster than the equivalent pure Python:

>>> a = np.arange(10000)
>>> %timeit a + 1
10000 loops, best of 3: 24.3 us per loop
>>> l = range(10000)
>>> %timeit [i + 1 for i in l]
1000 loops, best of 3: 861 us per loop
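%timeit only works inside IPython. Outside of it, the same comparison can be sketched with the standard-library timeit module. The repetition count of 100 is an arbitrary choice for this sketch, and the absolute timings depend on the machine, so no expected numbers are given.

```python
import timeit

import numpy as np

a = np.arange(10000)
l = range(10000)

# Both approaches must produce the same values.
vectorized = a + 1
looped = [i + 1 for i in l]
same = np.array_equal(vectorized, looped)

# Time 100 repetitions of each approach (100 is an arbitrary choice).
t_numpy = timeit.timeit(lambda: a + 1, number=100)
t_python = timeit.timeit(lambda: [i + 1 for i in l], number=100)
print(f"NumPy: {t_numpy:.6f} s, pure Python: {t_python:.6f} s")
```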

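Array multiplication is not matrix multiplication:

>>> c = np.ones((3, 3))
>>> c * c    # NOT matrix multiplication!
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

Matrix multiplication:

>>> c.dot(c)
array([[3., 3., 3.],
       [3., 3., 3.],
       [3., 3., 3.]])

Other operations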

Comparisons:

>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([4, 2, 2, 4])
>>> a == b
array([False,  True, False,  True])
>>> a > b
array([False, False,  True, False])

Array-wise comparisons:

>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([4, 2, 2, 4])
>>> c = np.array([1, 2, 3, 4])
>>> np.array_equal(a, b)
False
>>> np.array_equal(a, c)
True

Logical operations:

>>> a = np.array([1, 1, 0, 0], dtype=bool)
>>> b = np.array([1, 0, 1, 0], dtype=bool)
>>> np.logical_or(a, b)
array([ True,  True,  True, False])
>>> np.logical_and(a, b)
array([ True, False, False, False])

Transcendental functions:

>>> a = np.arange(5)
>>> np.sin(a)
array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ])
>>> np.log(a)
array([      -inf, 0.        , 0.69314718, 1.09861229, 1.38629436])
>>> np.exp(a)
array([ 1.        ,  2.71828183,  7.3890561 , 20.08553692, 54.59815003])

Shape mismatches:

>>> a = np.arange(5)
>>> a + np.array([1, 2])
ValueError: operands could not be broadcast together with shapes (5,) (2,)

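Transposition:

>>> a = np.triu(np.ones((3, 3)), 1)    # see help(np.triu)
>>> a
array([[0., 1., 1.],
       [0., 0., 1.],
       [0., 0., 0.]])
>>> # triu(m, k) keeps the upper triangle and sets the lower-left part to 0; k=0 refers to the main diagonal.
>>> a.T    # transpose
array([[0., 0., 0.],
       [1., 0., 0.],
       [1., 1., 0.]])

The transpose is a view of the original array, shown here on an array built with reshape:

>>> a = np.arange(9).reshape(3, 3)
>>> a.T[0, 2] = 999
>>> a.T
array([[  0,   3, 999],
       [  1,   4,   7],
       [  2,   5,   8]])
>>> a
array([[  0,   1,   2],
       [  3,   4,   5],
       [999,   7,   8]])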
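Whether two arrays share data can be checked directly rather than inferred from a side effect. A minimal sketch using np.shares_memory; the names view and independent are made up for this illustration.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
view = a.T                  # transpose: no data is copied
independent = a.T.copy()    # explicit copy with its own memory

view_shares = np.shares_memory(a, view)
copy_shares = np.shares_memory(a, independent)

independent[0, 2] = 999     # does not touch a
view[0, 2] = -1             # writes through to a[2, 0]
print(view_shares, copy_shares, a[2, 0])
```

Calling .copy() is the usual way to get a transposed array that can be modified without affecting the original.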

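Linear algebra

The submodule numpy.linalg implements basic linear algebra, such as solving linear systems and singular value decomposition. However, it is not guaranteed to be efficient in every case, so scipy.linalg is generally used instead.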
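As a rough illustration of the two operations just mentioned, here is a sketch with numpy.linalg (so that it runs without SciPy installed). The 2x2 system is made up for the example.

```python
import numpy as np

# Hypothetical system: 3x + y = 9 and x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
rhs = np.array([9.0, 8.0])

# Solve A @ solution = rhs
solution = np.linalg.solve(A, rhs)

# Singular value decomposition: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A)
reconstructed = U @ np.diag(s) @ Vt
print(solution)
```

scipy.linalg offers functions with the same names (solve, svd), so switching later should mostly be a matter of changing the import.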

1.2.2 Basic reductions

Computing sums

>>> x = np.array([1, 2, 3, 4])
>>> np.sum(x)
10
>>> x.sum()
10

Sum by rows and by columns:

>>> x = np.array([[1, 1], [2, 2]])
>>> x
array([[1, 1],
       [2, 2]])
>>> x.sum(axis=0)    # columns (first dimension)
array([3, 3])
>>> x[:, 0].sum(), x[:, 1].sum()
(3, 3)
>>> x.sum(axis=1)    # rows (second dimension)
array([2, 4])
>>> x[0, :].sum(), x[1, :].sum()
(2, 4)
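One way to remember the axis argument: the axis that is named is the one that disappears from the result. A small sketch on a non-square array (made up for this illustration) makes the shapes easier to tell apart.

```python
import numpy as np

x = np.array([[1, 1],
              [2, 2],
              [3, 3]])          # shape (3, 2)

col_sums = x.sum(axis=0)        # axis 0 is collapsed -> shape (2,)
row_sums = x.sum(axis=1)        # axis 1 is collapsed -> shape (3,)

# keepdims=True keeps the collapsed axis with length 1 -> shape (3, 1)
row_sums_2d = x.sum(axis=1, keepdims=True)
print(col_sums.shape, row_sums.shape, row_sums_2d.shape)
```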

Higher dimensions:

>>> x = np.random.rand(2, 2, 2)
>>> x
array([[[0.31174025, 0.11658995],
        [0.27243086, 0.87529974]],

       [[0.7719098 , 0.30237664],
        [0.45840615, 0.05789042]]])
>>> x.sum(axis=2)
array([[0.4283302 , 1.14773061],
       [1.07428645, 0.51629657]])
>>> x.sum(axis=2)[0, 1]
1.147730606111291
>>> x[0, 1, :].sum()
1.147730606111291

Other reductions

Extrema:

>>> x = np.arange(1, 17).reshape(4, 4)
>>> x
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])
>>> x.min()
1
>>> x.max()
16
>>> x.argmin()    # index of minimum
0
>>> x.argmax()    # index of maximum
15
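Note that argmin and argmax without an axis return an index into the flattened array. A short sketch of turning that flat index back into a (row, column) pair with np.unravel_index:

```python
import numpy as np

x = np.arange(1, 17).reshape(4, 4)

flat_index = x.argmax()                            # index into x.ravel()
row, col = np.unravel_index(flat_index, x.shape)   # back to 2D coordinates

# With an axis, argmax returns one index per column instead.
per_column = x.argmax(axis=0)
print(flat_index, (row, col), per_column)
```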

Logical operations:

>>> np.all([True, True, False])
False
>>> np.any([True, True, False])
True

These can be used for array comparisons:

>>> a = np.zeros((100, 100))
>>> np.any(a != 0)
False
>>> np.all(a == a)
True
>>> a = np.array([1, 2, 3, 2])
>>> b = np.array([2, 2, 3, 2])
>>> c = np.array([6, 4, 4, 5])
>>> ((a <= b) & (b <= c)).all()
True

Statistics:

>>> x = np.array([1, 2, 3, 1])
>>> y = np.array([[1, 2, 3], [5, 6, 1]])
>>> x.mean()
1.75
>>> np.median(x)
1.5
>>> np.median(y, axis=-1)    # last axis
array([2., 5.])
>>> x.std()    # full population standard dev.
0.82915619758884995
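The "full population" comment means std divides by N. If a sample standard deviation (dividing by N - 1) is wanted, the ddof parameter can be used. A sketch on the same x:

```python
import numpy as np

x = np.array([1, 2, 3, 1])

population_std = x.std()          # divides by N (the default, ddof=0)
sample_std = x.std(ddof=1)        # divides by N - 1

# The population value written out by hand, for comparison.
manual_population = np.sqrt(np.mean((x - x.mean()) ** 2))
print(population_std, sample_std)
```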

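Worked example: a random walk algorithm

Let us consider a simple 1D random walk process: at each time step, a walker jumps right or left with equal probability.

We are interested in the typical distance of a random walker from the origin after t left or right jumps. We will simulate many "walkers" to find this law, and we will do so using array computing tricks: we create a 2D array with the walkers (the "stories") along one axis and time along the other.

>>> n_stories = 1000    # number of walkers
>>> t_max = 200         # time

We randomly choose all the steps of the walk, each 1 or -1:

>>> t = np.arange(t_max)
>>> steps = 2 * np.random.randint(0, 1 + 1, (n_stories, t_max)) - 1    # +1 because the high value is exclusive
>>> np.unique(steps)    # check: every step is 1 or -1
array([-1,  1])

We get the distances by summing the steps along the time axis:

>>> positions = np.cumsum(steps, axis=1)    # axis = 1: dimension of time
>>> sq_distance = positions**2

We take the mean along the axis of the stories:

>>> mean_sq_distance = np.mean(sq_distance, axis=0)

Result:

[Figure: plot of the result]

Our conclusion: the RMS distance grows as the square root of time!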
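The square-root law can also be checked numerically, without a plot. The sketch below repeats the simulation with a seeded np.random.default_rng generator (the seed and the use of the newer Generator API are assumptions made here for reproducibility) and compares the RMS distance with the square root of the number of steps.

```python
import numpy as np

rng = np.random.default_rng(0)     # seed chosen arbitrarily
n_stories, t_max = 1000, 200

# rng.integers(0, 2) draws 0 or 1 (high is exclusive); map to -1 or +1.
steps = 2 * rng.integers(0, 2, size=(n_stories, t_max)) - 1
positions = np.cumsum(steps, axis=1)              # axis 1 is time
mean_sq_distance = np.mean(positions**2, axis=0)  # average over walkers

# Column k holds the position after k + 1 steps.
n_steps = np.arange(1, t_max + 1)
rms = np.sqrt(mean_sq_distance)
ratio = rms / np.sqrt(n_steps)     # stays close to 1 if the law holds
print(ratio.min(), ratio.max())
```

With a finite number of walkers the ratio fluctuates around 1; increasing n_stories should tighten it.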

1.2.3 Broadcasting

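Basic operations on numpy arrays (addition, etc.) are elementwise.

This works on arrays of the same size. Nevertheless, it is also possible to operate on arrays of different sizes if NumPy can transform these arrays so that they all have the same size: this conversion is called broadcasting.

The image below gives an example of broadcasting:

[Figure: broadcasting example]

Let's verify:

>>> a = np.tile(np.arange(0, 40, 10), (3, 1)).T
>>> a
array([[ 0,  0,  0],
       [10, 10, 10],
       [20, 20, 20],
       [30, 30, 30]])
>>> b = np.array([0, 1, 2])
>>> a + b
array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22],
       [30, 31, 32]])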
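To see what NumPy does behind the scenes, the stretching can be made explicit. A sketch using np.broadcast_to and np.broadcast; the variable names other than a and b are made up for this illustration.

```python
import numpy as np

a = np.tile(np.arange(0, 40, 10), (3, 1)).T   # shape (4, 3)
b = np.array([0, 1, 2])                        # shape (3,)

# Shapes are compared starting from the last axis: b is treated as
# shape (1, 3) and then repeated along the first axis to reach (4, 3).
b_stretched = np.broadcast_to(b, a.shape)

explicit = a + b_stretched     # same-shape addition
implicit = a + b               # broadcasting done by NumPy
result_shape = np.broadcast(a, b).shape
print(result_shape)
```

np.broadcast_to returns a read-only view, so no extra memory is spent on the repeated rows.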

We have already used broadcasting without knowing it:

>>> a = np.ones((4, 5))
>>> a[0] = 2    # we assign an array of dimension 0 to an array of dimension 1
>>> a
array([[2., 2., 2., 2., 2.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

A useful trick:

>>> a = np.arange(0, 40, 10)
>>> a.shape
(4,)
>>> a = a[:, np.newaxis]    # adds a new axis -> 2D array
>>> a.shape
(4, 1)
>>> a
array([[ 0],
       [10],
       [20],
       [30]])
>>> a + b
array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22],
       [30, 31, 32]])
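The same column vector can be built in a few equivalent ways. A short sketch; np.newaxis is simply an alias for None, and reshape(-1, 1) lets NumPy infer the number of rows.

```python
import numpy as np

a = np.arange(0, 40, 10)

col_newaxis = a[:, np.newaxis]
col_none = a[:, None]            # np.newaxis is None
col_reshape = a.reshape(-1, 1)   # -1: length inferred from a.size
print(col_newaxis.shape, col_none.shape, col_reshape.shape)
```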

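Broadcasting seems a bit magical, but it is actually quite natural to use when we want to solve a problem whose output is an array with more dimensions than the input data.

Many grid-based or network-based problems can also use broadcasting. For instance, if we want to compute the distance from the origin of points on a 5x5 grid, we can do:

>>> x, y = np.arange(5), np.arange(5)[:, np.newaxis]
>>> distance = np.sqrt(x ** 2 + y ** 2)
>>> distance
array([[0.        , 1.        , 2.        , 3.        , 4.        ],
       [1.        , 1.41421356, 2.23606798, 3.16227766, 4.12310563],
       [2.        , 2.23606798, 2.82842712, 3.60555128, 4.47213595],
       [3.        , 3.16227766, 3.60555128, 4.24264069, 5.        ],
       [4.        , 4.12310563, 4.47213595, 5.        , 5.65685425]])

Or in color:

>>> import matplotlib.pyplot as plt
>>> plt.pcolor(distance)
>>> plt.colorbar()

[Figure: pcolor plot of distance with a colorbar]

Remark: the numpy.ogrid object allows us to directly create the vectors x and y of the previous example, with two "significant dimensions":

>>> x, y = np.ogrid[0:5, 0:5]
>>> x, y
(array([[0],
       [1],
       [2],
       [3],
       [4]]), array([[0, 1, 2, 3, 4]]))
>>> x.shape, y.shape
((5, 1), (1, 5))
>>> distance = np.sqrt(x ** 2 + y ** 2)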
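As a quick check, the ogrid version should give the same distance grid as the one built by hand with np.newaxis. A sketch; the names gx, gy and the two distance_* variables are made up here.

```python
import numpy as np

# Grid built by hand with np.newaxis
x, y = np.arange(5), np.arange(5)[:, np.newaxis]
distance_newaxis = np.sqrt(x**2 + y**2)

# Same grid from np.ogrid: gx has shape (5, 1), gy has shape (1, 5)
gx, gy = np.ogrid[0:5, 0:5]
distance_ogrid = np.sqrt(gx**2 + gy**2)

corner = distance_ogrid[4, 4]    # distance of the point (4, 4) from the origin
print(distance_ogrid.shape, corner)
```

In both cases only 5 + 5 coordinates are stored; the 5x5 result only exists after broadcasting.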

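So np.ogrid is very useful as soon as we have to handle computations on a grid. On the other hand, np.mgrid directly provides matrices full of indices for cases where we can't (or don't want to) benefit from broadcasting:

>>> x, y = np.mgrid[0:4, 0:4]
>>> x
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3]])
>>> y
array([[0, 1, 2, 3],
       [0, 1, 2, 3],
       [0, 1, 2, 3],
       [0, 1, 2, 3]])

1.2.4 Array shape manipulation

Flattening

>>> a = np.array([[1, 2, 3], [4, 5, 6]])
>>> a.ravel()
array([1, 2, 3, 4, 5, 6])
>>> a.T
array([[1, 4],
       [2, 5],
       [3, 6]])
>>> a.T.ravel()
array([1, 4, 2, 5, 3, 6])

Higher dimensions: the last dimensions ravel out "first".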
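A 3D example may make the "last dimensions first" rule more concrete: in the default (C) order the last index varies fastest. The shape (2, 3, 4) and the indices below are chosen arbitrarily for this sketch.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
flat = a.ravel()

# In C order, element a[i, j, k] lands at position i*(3*4) + j*4 + k.
i, j, k = 1, 2, 3
flat_position = i * (3 * 4) + j * 4 + k
print(flat_position, flat[flat_position], a[i, j, k])
```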

Reshaping

The inverse operation to flattening:

>>> a.shape
(2, 3)
>>> b = a.ravel()
>>> b = b.reshape((2, 3))
>>> b
array([[1, 2, 3],
       [4, 5, 6]])

Or,

>>> a.reshape((2, -1))    # unspecified (-1) value is inferred
array([[1, 2, 3],
       [4, 5, 6]])

reshape may return a view, so modifying b also modifies a:

>>> b[0, 0] = 99
>>> a
array([[99,  2,  3],
       [ 4,  5,  6]])

Beware: reshape may also return a copy:

>>> a = np.zeros((3, 2))
>>> b = a.T.reshape(3*2)
>>> b[0] = 9
>>> a
array([[0., 0.],
       [0., 0.],
       [0., 0.]])

To understand this, you need to learn more about the memory layout of a numpy array.
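Without going into the full details of memory layout, a rough sketch of why the two cases differ: reshape can reuse the existing buffer when the array is contiguous in the order being read, and has to copy otherwise. The names flat_view and flat_copy are made up for this illustration.

```python
import numpy as np

a = np.zeros((3, 2))

flat_view = a.reshape(6)      # a is C-contiguous: the buffer can be reused
flat_copy = a.T.reshape(6)    # a.T is not C-contiguous: data must be copied

a_contiguous = a.flags['C_CONTIGUOUS']
t_contiguous = a.T.flags['C_CONTIGUOUS']
view_shares = np.shares_memory(a, flat_view)
copy_shares = np.shares_memory(a, flat_copy)
print(a_contiguous, t_contiguous, view_shares, copy_shares)
```

When it matters, np.shares_memory is a safer check than assuming reshape returned one or the other.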

Adding a dimension

Indexing with the np.newaxis object allows us to add an axis to an array:

>>> z = np.array([1, 2, 3])
>>> z
array([1, 2, 3])
>>> z[:, np.newaxis]
array([[1],
       [2],
       [3]])
>>> z[np.newaxis, :]
array([[1, 2, 3]])

Dimension shuffling

>>> a = np.arange(4*3*2).reshape(4, 3, 2)
>>> a.shape
(4, 3, 2)
>>> a[0, 2, 1]
5
>>> b = a.transpose(1, 2, 0)
>>> b.shape
(3, 2, 4)
>>> b[2, 1, 0]
5

This also creates a view:

>>> b[2, 1, 0] = -1
>>> a[0, 2, 1]
-1

Resizing

The size of an array can be changed with ndarray.resize:

>>> a = np.arange(4)
>>> a.resize((8,))
>>> a
array([0, 1, 2, 3, 0, 0, 0, 0])

However, the array must not be referenced anywhere else:

>>> b = a
>>> a.resize((4,))
ValueError: cannot resize an array that has been referenced or is referencing another array in this way. Use the resize function

1.2.5 Sorting data

Sorting along an axis:

>>> a = np.array([[4, 3, 5], [1, 2, 1]])
>>> b = np.sort(a, axis=1)
>>> b
array([[3, 4, 5],
       [1, 1, 2]])

Each row is sorted separately!

In-place sort:

>>> a.sort(axis=1)
>>> a
array([[3, 4, 5],
       [1, 1, 2]])

Sorting with fancy indexing:

>>> a = np.array([4, 3, 1, 2])
>>> j = np.argsort(a)
>>> j
array([2, 3, 1, 0])
>>> a[j]
array([1, 2, 3, 4])
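The indices returned by argsort are mostly useful for reordering a second array in the same way. A sketch with two hypothetical paired arrays, scores and labels (both made up for this illustration):

```python
import numpy as np

scores = np.array([4, 3, 1, 2])            # hypothetical sort keys
labels = np.array(["d", "c", "a", "b"])    # hypothetical values paired with the keys

order = np.argsort(scores)        # indices that sort scores in ascending order
sorted_scores = scores[order]
sorted_labels = labels[order]     # labels reordered to follow their scores

descending = scores[order[::-1]]  # reversed index array gives descending order
print(sorted_scores, sorted_labels, descending)
```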

Finding minima and maxima:
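A minimal sketch of locating the extrema, assuming np.argmax and np.argmin are the functions meant here; it reuses the array from the sorting example.

```python
import numpy as np

a = np.array([4, 3, 1, 2])

j_max = np.argmax(a)    # position of the largest element
j_min = np.argmin(a)    # position of the smallest element

largest, smallest = a[j_max], a[j_min]
print(j_max, j_min, largest, smallest)
```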