Chapter 8 Numpy
8.1 Numpy dimension, size and shape
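dimension: the number of entries along a single axis of the array.
axis=0: the row axis; moving along it changes the row index. Its length is the number of rows, i.e., the dimension of each column vector.
axis=1: the column axis; moving along it changes the column index. Its length is the number of columns, i.e., the dimension of each row vector.
shape: the dimensions along the different axes, written as
- (length along axis=0, length along axis=1)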
Example
\[ \begin{eqnarray*} \mathbf{X}_{2\times 3}=\left[ \begin{array}{ccc} -1 & 1 & 5 \\ 0 & 7 & -4 \\ \end{array} \right] \end{eqnarray*} \]
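Viewed as row vectors, each vector comes from a 3-dimensional space (row dimension=3).
Viewed as column vectors, each vector comes from a 2-dimensional space (column dimension=2).
shape=(2,3): 2 rows along axis=0 and 3 columns along axis=1.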
Count how many axes the array has (the .ndim attribute).
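A short sketch of the attributes named in this section, using the 2×3 matrix X above: .ndim counts the axes, .shape gives the length along each axis, and .size counts all entries.

```python
import numpy as np

# The 2x3 matrix X from the example above
X = np.array([[-1, 1, 5],
              [0, 7, -4]])

print(X.ndim)   # 2: number of axes
print(X.shape)  # (2, 3): (length along axis=0, length along axis=1)
print(X.size)   # 6: total number of entries
```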
8.1.1 vectors to matrix
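As a hedged sketch of building a matrix from vectors, two common NumPy routines are np.vstack (each vector becomes a row) and np.column_stack (each vector becomes a column). The vectors v1 and v2 are illustrative data, not from the original notes.

```python
import numpy as np

# Two illustrative vectors (hypothetical data)
v1 = np.array([-1, 1, 5])
v2 = np.array([0, 7, -4])

# Stack as rows: result has shape (2, 3)
X_rows = np.vstack([v1, v2])

# Stack as columns: result has shape (3, 2)
X_cols = np.column_stack([v1, v2])

print(X_rows)
print(X_cols.shape)  # (3, 2)
```

Stacking as columns gives the transpose of stacking as rows.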
8.2 Array and flat array
\[ \begin{eqnarray*} \mathbf{x1}=\left[ \begin{array}{c} 1 \\ 2 \\ 3 \\ \end{array} \right]_{3\times 1}\in\mathbb{R}^3,~~ \mathbf{x2}=\left[ \begin{array}{ccc} 1 & 2 & 3 \\ \end{array} \right]_{1\times 3}\in\mathbb{R}^3 \end{eqnarray*} \]
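A flat array is a 1-D array such as np.array([1, 2, 3]): its shape is (3,), so it has no row or column orientation, unlike x1 (3×1) and x2 (1×3).
Flat arrays are NumPy's simplest 1-D form. Strictly, the data buffer of a flat array and of the corresponding vector have the same size; only the shape metadata differs, so any speed advantage is small.
A vector can be converted into a flat array with the .flatten() method.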
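A minimal sketch comparing the three forms of the vector (1, 2, 3): the column vector x1, the row vector x2, and a flat array. It also checks that the stored data occupy the same number of bytes.

```python
import numpy as np

x1 = np.array([[1], [2], [3]])  # 3x1 column vector
x2 = np.array([[1, 2, 3]])      # 1x3 row vector
x_flat = np.array([1, 2, 3])    # flat array

print(x1.shape, x2.shape, x_flat.shape)  # (3, 1) (1, 3) (3,)

# .flatten() returns a new flat (1-D) copy of the data
print(x1.flatten().shape)  # (3,)
print(x2.flatten().shape)  # (3,)

# The data buffers hold the same number of bytes
print(x1.nbytes == x_flat.nbytes)  # True
```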
8.3 Elementwise Operations
- add: \(+\)
- subtract: \(-\)
- multiplication: \(\otimes\)
- division: \(\oslash\)
Mathematically, elementwise operations are defined only when the two matrices are conformable (have the same dimensions). In Python (NumPy), they work whenever the two arrays are compatible.
\[ \mathbf{a}=\begin{bmatrix} 1.00 & 2.00 & 3.00\\ 1.00 & 2.00 & 3.00 \end{bmatrix}, \mathbf{b}=\begin{bmatrix} -1.00 & 2.00 & -2.00\\ 1.00 & 2.00 & 7.00 \end{bmatrix} \]
\[ \begin{eqnarray} \mathbf{a}+\mathbf{b} &=& \begin{bmatrix} 0.00 & 4.00 & 1.00\\ 2.00 & 4.00 & 10.00 \end{bmatrix}\\ \mathbf{a}-\mathbf{b} &=& \begin{bmatrix} 2.00 & 0.00 & 5.00\\ 0.00 & 0.00 & -4.00 \end{bmatrix}\\ \mathbf{a}\otimes \mathbf{b} &=& \begin{bmatrix} -1.00 & 4.00 & -6.00\\ 1.00 & 4.00 & 21.00 \end{bmatrix}\\ \mathbf{a}\oslash \mathbf{b} &=& \begin{bmatrix} -1.00 & 1.00 & -1.50\\ 1.00 & 1.00 & 0.43 \end{bmatrix} \end{eqnarray} \]
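The four results above can be reproduced in NumPy. Note that * and / on arrays are the elementwise product and division (\(\otimes\), \(\oslash\) in the notation above), not the matrix product.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 3.0]])
b = np.array([[-1.0, 2.0, -2.0],
              [1.0, 2.0, 7.0]])

# Elementwise operations on two arrays of the same shape (2, 3)
add = a + b
sub = a - b
mul = a * b   # elementwise product, not the matrix product
div = a / b   # elementwise division

print(np.round(div, 2))
```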
8.4 Broadcasting
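Broadcasting is the rule by which Python (NumPy) changes dimensions to create conformability when the conformability condition required by a matrix operation is not met.
Python first defines a condition weaker than conformability: dimension compatibility.
Python dimension compatibility definition (dimensions are compared axis by axis, starting from the trailing, i.e. rightmost, axis):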
Two dimensions are compatible when
C1. they are equal, or
C2. one of them is 1.
dimension compatibility
basic example
For matrices \(\mathbf{a}\) and \(\mathbf{b}\), each has two dimensions, written \((d1,d2)\).
For d1 dimension: they are equal. (C1 compatible)
For d2 dimension: they are equal. (C1 compatible)
inconformable example
Now say we define \(\mathbf{a}\) as
\[ \mathbf{a}=\begin{bmatrix} 1.00 & 2.00 & 3.00 \end{bmatrix} \] and \(\mathbf{b}\) the same as before: \[ \mathbf{b}=\begin{bmatrix} -1.00 & 2.00 & -2.00\\ 1.00 & 2.00 & 7.00 \end{bmatrix} \]
import numpy as np
a = np.array([1.0, 2.0, 3.0])
a.shape = (1, 3)  # reshape in place: a is now a 1x3 row vector
b = np.array(
    [[-1.0, 2.0, -2.0],
     [1.0, 2.0, 7.0]]
)  # shape (2, 3)
\(\mathbf{a}\) and \(\mathbf{b}\) are inconformable, but Python compatible:
For d1 dimension: a is 1 and b is 2; one of them is 1. (C2 compatible)
For d2 dimension: they are equal. (C1 compatible)
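The compatibility check can be tried directly. This sketch assumes NumPy 1.20 or later, where np.broadcast_shapes returns the broadcast shape for compatible shapes and raises ValueError otherwise.

```python
import numpy as np

# (1, 3) and (2, 3): d1 is C2-compatible, d2 is C1-compatible
print(np.broadcast_shapes((1, 3), (2, 3)))  # (2, 3)

# (3, 2) and (2, 3): on the trailing axis, 2 vs 3 are neither equal nor 1
try:
    np.broadcast_shapes((3, 2), (2, 3))
    incompatible = False
except ValueError as err:
    incompatible = True
    print("incompatible:", err)
```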
8.4.0.1 broadcasting C2 compatible dimension
Broadcasting (conceptually) copies the matrix whose dimension is 1 along that axis until it matches the other matrix's dimension on that axis.
a_broadcast=np.array(
[
[1.0, 2.0, 3.0],
[1.0, 2.0, 3.0]
]
)
a_broadcast.shape=(2,3)
print(a_broadcast)
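A sketch checking that NumPy's automatic broadcasting gives the same result as adding the explicit a_broadcast above. np.broadcast_to builds the broadcast array as a read-only view, without copying the data.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0]])            # shape (1, 3)
b = np.array([[-1.0, 2.0, -2.0],
              [1.0, 2.0, 7.0]])            # shape (2, 3)
a_broadcast = np.array([[1.0, 2.0, 3.0],
                        [1.0, 2.0, 3.0]])  # shape (2, 3)

# NumPy broadcasts a automatically; no explicit copy is needed
auto = a + b
manual = a_broadcast + b
print(np.array_equal(auto, manual))  # True

# np.broadcast_to returns a read-only view with the broadcast shape
a_view = np.broadcast_to(a, (2, 3))
print(a_view.shape)  # (2, 3)
```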
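If broadcasting can make two matrices conformable, work with the inconformable matrices directly to reduce memory usage and speed up computation: there is no need to build the conformable matrix first; Python compatibility is enough.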
In elementwise operations, a flat array is treated as a row vector (shape (3,) acts like (1,3)). Use this to understand the results of x0+x1 and x0+x2.
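x0 is not defined in this part of the notes; this sketch assumes it is the flat array [1, 2, 3] and reuses x1 (3×1) and x2 (1×3) from 8.2. Because the flat array acts as shape (1, 3), x0 + x1 broadcasts (1, 3) with (3, 1) into a 3×3 matrix, while x0 + x2 stays 1×3.

```python
import numpy as np

x0 = np.array([1, 2, 3])        # assumed flat array, shape (3,)
x1 = np.array([[1], [2], [3]])  # column vector, shape (3, 1)
x2 = np.array([[1, 2, 3]])      # row vector, shape (1, 3)

s1 = x0 + x1  # (3,) acts as (1, 3); with (3, 1) the result is (3, 3)
s2 = x0 + x2  # (3,) acts as (1, 3); with (1, 3) the result is (1, 3)

print(s1)
print(s2)  # [[2 4 6]]
```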
8.5 Vectorized function
8.5.1 Vectorized function
One of the features that NumPy provides is a class vectorize to convert an ordinary Python function which accepts scalars and returns scalars into a “vectorized-function” with the same broadcasting rules as other NumPy functions (i.e. the Universal functions, or ufuncs).
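A minimal sketch with a hypothetical scalar function. Its if/else cannot take arrays directly (comparing arrays gives an array, whose truth value is ambiguous), but np.vectorize applies it element by element with the usual broadcasting rules. NumPy documents vectorize as a convenience that essentially runs a Python loop, so it is not a speed tool.

```python
import numpy as np

def myfunc(a, b):
    """Hypothetical scalar function: a - b if a > b, otherwise a + b."""
    if a > b:
        return a - b
    return a + b

vfunc = np.vectorize(myfunc)

# The scalar 2 is broadcast against the list
r1 = vfunc([1, 2, 3, 4], 2)
print(r1)  # [3 4 1 2]

# (2, 1) and (2,) broadcast to (2, 2)
r2 = vfunc(np.array([[1], [4]]), np.array([2, 0]))
print(r2)
```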
Excluding some arguments from vectorization
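np.vectorize takes an excluded parameter listing arguments that should be passed through whole instead of element by element. In this sketch the hypothetical function mypolyval evaluates a polynomial by Horner's rule; the coefficient list p is excluded, so only x is vectorized. When excluded names an argument, pass that argument by keyword.

```python
import numpy as np

def mypolyval(p, x):
    """Evaluate the polynomial with coefficients p (highest power first) at scalar x."""
    _p = list(p)
    res = _p.pop(0)
    while _p:
        res = res * x + _p.pop(0)  # Horner's rule
    return res

# Exclude p from vectorization so the coefficient list is passed whole
vpolyval = np.vectorize(mypolyval, excluded=['p'])

# x^2 + 2x + 3 evaluated at x = 0 and x = 1
values = vpolyval(p=[1, 2, 3], x=[0, 1])
print(values)  # [3 6]
```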
8.5.2 Universal function (ufunc)
A ufunc is a function that operates elementwise on array inputs; when the input dimensions do not match, it broadcasts them to a common shape.
List of available ufuncs: https://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs
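A short sketch with two built-in ufuncs: np.add broadcasts a (2, 1) array against a (3,) array, and np.exp is applied to every entry.

```python
import numpy as np

print(isinstance(np.add, np.ufunc))  # True
print(isinstance(np.exp, np.ufunc))  # True

col = np.array([[0.0], [1.0]])      # shape (2, 1)
row = np.array([10.0, 20.0, 30.0])  # shape (3,)

# np.add works elementwise and broadcasts (2, 1) with (3,) to (2, 3)
total = np.add(col, row)
print(total.shape)  # (2, 3)

# np.exp applies elementwise to every entry
print(np.exp(col).ravel())
```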
8.6 Operations
8.6.1 Inner product: @
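In NumPy, @ is matrix multiplication (np.matmul). For two flat arrays it reduces to the inner product. For 2-D arrays it is the matrix product, which differs from the elementwise *. A small sketch with illustrative values:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -1.0, 0.5])

# For two flat arrays, @ is the inner product: 1*4 + 2*(-1) + 3*0.5
ip = u @ v
print(ip)  # 3.5

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# For 2-D arrays, @ is the matrix product (same as np.matmul)
print(A @ B)
```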
8.6.2 transpose
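Only a vector or a matrix (a 2-D array) can be meaningfully transposed. Calling .T on a flat array does not raise an error; it simply returns the same 1-D array unchanged.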
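A sketch of the difference: .T turns a row vector into a column vector but leaves a flat array as it is. To get a column vector from a flat array, add an axis with np.newaxis (or reshape).

```python
import numpy as np

x_flat = np.array([1, 2, 3])   # flat array, shape (3,)
x_row = np.array([[1, 2, 3]])  # row vector, shape (1, 3)

print(x_flat.T.shape)  # (3,): .T leaves a flat array unchanged
print(x_row.T.shape)   # (3, 1): a row vector becomes a column vector

# To get a column vector from a flat array, add an axis instead
x_col = x_flat[:, np.newaxis]
print(x_col.shape)     # (3, 1)
```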
8.6.4 zero/one array
np.zeros()/np.ones() called with a single integer produces a flat array. To make it a vector or a matrix, change its shape, for example with .reshape().
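A sketch of both routes: reshape a flat array of zeros or ones, or pass a shape tuple so the matrix is created directly.

```python
import numpy as np

z = np.zeros(3)                   # flat array, shape (3,)
z_col = z.reshape(3, 1)           # column vector, shape (3, 1)
o_row = np.ones(3).reshape(1, 3)  # row vector, shape (1, 3)

# A shape tuple gives a matrix directly
M = np.zeros((2, 3))

print(z.shape, z_col.shape, o_row.shape, M.shape)
```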
8.6.5 Identity matrix
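np.eye and np.identity both build an identity matrix. This sketch also checks the defining property A I = A on an illustrative matrix A.

```python
import numpy as np

I3 = np.eye(3)           # 3x3 identity matrix (float)
I3_alt = np.identity(3)  # same result

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])

print(np.array_equal(I3, I3_alt))  # True
print(np.array_equal(A @ I3, A))   # True: multiplying by I leaves A unchanged
```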
8.7 Missing/Invalid Value
Masked arrays are arrays that may have missing or invalid entries.
We wish to mark the fourth entry as invalid. The easiest way is to create a masked array:
We can now compute the mean of the dataset, without taking the invalid data into account:
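The dataset itself is not shown here, so this sketch uses the hypothetical data [1, 2, 3, -1, 5], where the fourth entry (-1) is the invalid one. The mask marks it invalid, and the masked mean averages only the remaining four values.

```python
import numpy as np
import numpy.ma as ma

x = np.array([1, 2, 3, -1, 5])  # hypothetical data; the fourth entry is invalid

# mask=1 marks an entry as invalid
mx = ma.masked_array(x, mask=[0, 0, 0, 1, 0])

print(mx)
print(mx.mean())  # 2.75, averaging only 1, 2, 3, 5
print(x.mean())   # 2.0, the plain mean still includes -1
```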
8.8 Random Numbers
- Use the numpy.random module
8.8.1 Two routines
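How does a computer generate a random number sample that follows a given distribution? Suppose we want to generate 10 standard normal random numbers:
1. Generate 10 random values, each 32 or 64 bits long. This step is called random bits generation.
2. Each distribution has a rule that maps random bit values to values of its random variable; apply the corresponding mapping to turn the bit values into random numbers from the desired distribution.
NumPy calls the objects that perform step 1 BitGenerators, and the objects that perform step 2 (mapping bits to random-variable values) Generators.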
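In the past, NumPy offered only one BitGenerator, so users only needed a generator-level object to configure. Random numbers produced the old way are now called legacy RandomState random numbers, to distinguish them from the Generator random numbers of NumPy 1.17+, which have fewer drawbacks.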
8.8.1.1 RandomState random number
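A sketch of the legacy interface: a RandomState object (backed by the MT19937 bit generator) draws the 10 standard normal numbers, and the module-level np.random.* functions use a hidden global RandomState. The seed 123 is an arbitrary illustrative choice.

```python
import numpy as np

# Legacy interface: an explicit RandomState object
rs = np.random.RandomState(123)
z = rs.standard_normal(10)  # 10 standard normal draws

# The module-level functions use a hidden global RandomState
z_global = np.random.standard_normal(10)

print(z.shape, z_global.shape)
```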
8.8.1.2 Generator random number
For numpy 1.17 or above only
Documentation: https://docs.scipy.org/doc/numpy/reference/random/generator.html
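A sketch of the NumPy 1.17+ interface: np.random.default_rng() returns a Generator paired with the PCG64 BitGenerator. The same pairing can be built explicitly from the two pieces described in step 1 and step 2.

```python
import numpy as np

# default_rng() creates a Generator backed by the PCG64 BitGenerator
rng = np.random.default_rng()
z = rng.standard_normal(10)  # 10 standard normal draws

# The same two pieces assembled explicitly: BitGenerator -> Generator
rng2 = np.random.Generator(np.random.PCG64())

print(type(rng.bit_generator).__name__)  # PCG64
print(z.shape)                           # (10,)
```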
8.8.2 Seed
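A seed ensures that others running the code can reproduce the same random sample.
When finding the extremum of the expected value of a random function, we approximate the expectation by the sample average of the function. In that case we fix the random sample so that its values do not keep changing each time we evaluate the FOC and SOC.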
RandomState
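With the legacy interface, a seed can be passed to RandomState or set globally with np.random.seed. Both use the same MT19937 seeding, so in this sketch all three draws coincide. The seed 2020 is arbitrary.

```python
import numpy as np

# Same seed -> same sample
draw1 = np.random.RandomState(2020).standard_normal(5)
draw2 = np.random.RandomState(2020).standard_normal(5)
print(np.array_equal(draw1, draw2))  # True

# Seeding the hidden global RandomState used by np.random.* functions
np.random.seed(2020)
draw3 = np.random.standard_normal(5)
print(np.array_equal(draw1, draw3))  # True
```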
Random Generator
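With the Generator interface, the seed goes into np.random.default_rng. Two Generators built from the same seed reproduce the same sample. The seed 2020 is arbitrary.

```python
import numpy as np

rng_a = np.random.default_rng(2020)
rng_b = np.random.default_rng(2020)

# Two Generators built from the same seed produce identical samples
sample_a = rng_a.standard_normal(5)
sample_b = rng_b.standard_normal(5)
print(np.array_equal(sample_a, sample_b))  # True
```

Note that a Generator and a RandomState seeded with the same number use different bit generators, so their samples differ.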
Suppose the values of the random sample \((y,x)\) are related as follows:
\[y=0.1x+0.33\epsilon,\mbox{ where }\ x\mbox{ and } \epsilon\sim\ N(0,1)\]
Using this relationship, draw 100 random values of \((x,\epsilon)\) to form a \((y,x)\) sample of size 100 (i.e., use the equation above as the data generating process of the sample).
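A minimal sketch of this DGP using a seeded Generator. The seed 2020 and storing the sample as an n×2 array with columns (y, x) are illustrative choices, not from the original.

```python
import numpy as np

rng = np.random.default_rng(2020)  # fixed seed so the sample can be reproduced
n = 100

x = rng.standard_normal(n)    # x ~ N(0, 1)
eps = rng.standard_normal(n)  # epsilon ~ N(0, 1), drawn independently of x
y = 0.1 * x + 0.33 * eps      # data generating process

sample = np.column_stack([y, x])  # n x 2 sample, columns (y, x)
print(sample.shape)  # (100, 2)
```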
Suppose a coin lands heads with probability 80% and tails with probability 20%. Randomly generate an observed sample of 100 tosses.
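Two equivalent sketches: Generator.choice draws labels with given probabilities, and Generator.binomial with n=1 draws 0/1 Bernoulli outcomes (1 = heads). The labels "H"/"T" and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2020)

# Each toss: "H" with probability 0.8, "T" with probability 0.2
tosses = rng.choice(["H", "T"], size=100, p=[0.8, 0.2])

# Equivalent 0/1 coding: 1 = heads, drawn from Bernoulli(0.8)
heads = rng.binomial(n=1, p=0.8, size=100)

print(tosses[:10])
print((tosses == "H").mean(), heads.mean())  # sample shares of heads
```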
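Generate 100 random observations from the following Data Generating Process (DGP): \[\mathbf{Y}_i=\boldsymbol{\beta}\mathbf{X}_i+\boldsymbol{\epsilon}_i,\] where \[ \mathbf{Y}_i=\left[\begin{array}{c} y_{1i}\\ y_{2i} \end{array}\right],\ \mathbf{X}_i=\left[\begin{array}{c} x_{1i}\\ x_{2i} \end{array}\right],\ \boldsymbol{\beta}=\left[\begin{array}{cc} 1 & 0.2\\ 0 & 1 \end{array}\right], \] and \[ \begin{array}{c} \mathbf{X}_i\sim N(\mathbf{0},I),\quad \boldsymbol{\epsilon}_i\sim N(\mathbf{0},\Sigma),\quad \mathbf{X}_i \perp \boldsymbol{\epsilon}_i,\\ \Sigma=\left[\begin{array}{cc} 1 & 0.2\\ 0.2 & 0.9 \end{array}\right]. \end{array} \] (With \(\mathbf{X}_i\) a 2×1 column vector, \(\boldsymbol{\beta}\) must premultiply \(\mathbf{X}_i\) for the product to be conformable.)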
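A hedged sketch of this DGP. It assumes the column-vector reading \(\mathbf{Y}_i=\boldsymbol\beta\mathbf{X}_i+\boldsymbol\epsilon_i\), since a 2×1 \(\mathbf{X}_i\) times a 2×2 \(\boldsymbol\beta\) would not be conformable. The seed 2020 and storing observation i as row i are also assumptions. Stacking the rows turns \(\boldsymbol\beta\mathbf{X}_i\) into X @ beta.T.

```python
import numpy as np

rng = np.random.default_rng(2020)
n = 100

beta = np.array([[1.0, 0.2],
                 [0.0, 1.0]])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 0.9]])

# Row i of X is X_i' ~ N(0, I); row i of eps is eps_i' ~ N(0, Sigma), independent of X
X = rng.standard_normal((n, 2))
eps = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma, size=n)

# Stacking Y_i' = (beta X_i)' = X_i' beta' row by row gives Y = X beta' + eps
Y = X @ beta.T + eps

print(Y.shape)  # (100, 2)
```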
8.9 Efficient Linear Algebra
Built on NumPy, scipy.linalg can offer more efficient linear algebra than numpy.linalg: it is always compiled with BLAS/LAPACK support, whereas for NumPy this is optional. scipy.linalg contains all the functions in numpy.linalg, plus some more advanced ones not contained in numpy.linalg.
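A short sketch, assuming SciPy is installed: scipy.linalg.solve does the same job as np.linalg.solve, and scipy.linalg.lu is an example of a routine (LU decomposition with pivoting, \(A=PLU\)) that numpy.linalg does not provide. The matrix A and vector b are illustrative.

```python
import numpy as np
from scipy import linalg

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Same job as np.linalg.solve: solve A x = b
x = linalg.solve(A, b)
print(x)  # [2. 3.]

# LU decomposition is in scipy.linalg but not in numpy.linalg
P, L, U = linalg.lu(A)
print(np.allclose(P @ L @ U, A))  # True
```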