Chapter 8 Numpy

import numpy as np

8.1 Numpy dimension, size and shape

dimension: 以array的單一軸線(axis)來看的向量維度。
- axis=0: row方向，即以row vector角度。
- axis=1: column方向，即以column vector角度。
shape: 不同軸線的維度表示。
- (axis=0的維度, axis=1的維度)

範例

$\begin{eqnarray*} \mathbf{X}_{2\times 3}=\left[ \begin{array}{ccc} -1 & 1 & 5 \\ 0 & 7 & -4 \\ \end{array} \right] \end{eqnarray*}$

以row vector來看每個向量來自3維(row dimension=3)空間。
以column vector來看每個向量來自2維(column dimension=2)空間。
shape=(3,2)

X_23=np.array([
  [-1,1,5], 
  [0,7,4 ]
])       

X_23.shape

計算有幾個軸線axis

X_23.ndim

8.1.1 vectors to matrix

x0=np.array([[2,3,5,7,9,11]])

x0
x0.shape
x0.shape=(2,3) # method直接改x0
x0

x1=x0.reshape((3,2)) # method不會改x0
x0
x1

8.2 Array and flat array

$\begin{eqnarray*} \mathbf{x1}=\left[ \begin{array}{c} 1 \\ 2 \\ 3 \\ \end{array} \right]_{3\times 1}\in\mathbb{R}^3,~~ \mathbf{x2}=\left[ \begin{array}{ccc} 1 & 2 & 3 \\ \end{array} \right]_{1\times 3}\in\mathbb{R}^3 \end{eqnarray*}$

x1=np.array([[ 1., 2., 3.]])
x2=np.array([[1.],[2.],[3.]])

print(x1)
x1.shape 
print(x2)
x2.shape

flat array。

x0=np.array([ 1., 0., 0.]) # 少了最外層[.]
print(x0)
x0.shape
x0.ndim

Flat array uses less memory than vector, which gives it a fast performance.

Naturally there is a way to convert a vector into a flat array via .flatten() method.

x1_flat=x1.flatten()
x1_flat
print(x1_flat)
x1_flat.shape

8.3 Elementwise Operations

add: $+$
subtract: $-$
multiplication: $\otimes$
division: $\oslash$

Mathematically: Only when two matrices are conformable. Python: when two arrays are compatible.

a = np.array(
  [[1.0, 2.0, 3.0],
  [1.0, 2.0, 3.0]]
  )  

b = np.array(
  [[-1, 2.0, -2.0],
  [1.0,2.0,7.0]]
  )

$\mathbf{a}=\begin{bmatrix} 1.00 & 2.00 & 3.00\\ 1.00 & 2.00 & 3.00 \end{bmatrix}, \mathbf{b}=\begin{bmatrix} -1.00 & 2.00 & -2.00\\ 1.00 & 2.00 & 7.00 \end{bmatrix}$

a+b
a-b
a*b # * 是elementwise product，不是矩陣相乘
a/b

$\begin{eqnarray} \mathbf{a}+\mathbf{b} &=& \begin{bmatrix} 0.00 & 4.00 & 1.00\\ 2.00 & 4.00 & 10.00 \end{bmatrix}\\ \mathbf{a}-\mathbf{b} &=& \begin{bmatrix} 2.00 & 0.00 & 5.00\\ 0.00 & 0.00 & -4.00 \end{bmatrix}\\ \mathbf{a}\otimes \mathbf{b} &=& \begin{bmatrix} -1.00 & 4.00 & -6.00\\ 1.00 & 4.00 & 21.00 \end{bmatrix}\\ \mathbf{a}\oslash \mathbf{b} &=& \begin{bmatrix} -1.00 & 1.00 & -1.50\\ 1.00 & 1.00 & 0.43 \end{bmatrix} \end{eqnarray}$

8.4 Broadcasting

當矩陣運算所需的conformability條件不滿足時，Python更動dimensions形成conformability的原則。

Python先定義比conformability條件寬的：dimension compatibility.

Pythond dimension compatibility definition:

Two dimensions are compatible when

C1. they are equal, or

C2. one of them is 1.

dimension compatibility

basic example

a = np.array(
  [[1.0, 2.0, 3.0],
  [1.0, 2.0, 3.0]]
  )  

b = np.array(
  [[-1, 2.0, -2.0],
  [1.0,2.0,7.0]]
  )

For matrix $\mathbf{a}$ and $\mathbf{b}$ , each one of them has two dimensions, as $(d1,d2)$ .

For d1 dimension: they are equal. (C1 compatible)
For d2 dimension: they are equal. (C1 compatible)

inconformable example

New say we define $\mathbf{a}$ as

$\mathbf{a}=\begin{bmatrix} 1.00 & 2.00 & 3.00 \end{bmatrix}$ and $\mathbf{b}$ the same as before: $\mathbf{b}=\begin{bmatrix} -1.00 & 2.00 & -2.00\\ 1.00 & 2.00 & 7.00 \end{bmatrix}$

a = np.array([1.0, 2.0, 3.0]) 
a.shape=(1,3)

b = np.array(
  [[-1, 2.0, -2.0],
  [1.0,2.0,7.0]]
  ) # (2,3)

$\mathbf{a}$ and $\mathbf{b}$ are inconformable, but Python compatible:

For d1 dimension: a is 1 and b is 2; one of them is 1. (C2 compatible)
For d2 dimension: they are equal. (C1 compatible)

8.4.0.1 broadcasting C2 compatible dimension

將dimension=1的矩陣之沿著該dimension方向，複製到與另一矩陣相同dimension。

a_broadcast=np.array(
  [
  [1.0, 2.0, 3.0],
  [1.0, 2.0, 3.0]
  ]
  )

a_broadcast.shape=(2,3)

print(a_broadcast)

a+b == a_broadcast+b
a-b == a_broadcast-b
a*b == a_broadcast*b
a/b == a_broadcast/b

若透過broadcasting可以conformable，則應使用inconformable矩陣來減少記憶體使用，加快運算速度——不需要先形成conformable矩陣，只需有Python compatible。

flat array在進行elementwise運算會被當做row vector. 理解x0+x1及x0+x2的成因。

x0=np.array([ 1., 0., 0.])
x1=np.array([[ 1., 2., 3.]])
x2=x1.reshape((3,1))
x0
x1
x2
x0+x1
x0+x2

8.5 Vectorized function

https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html

8.5.1 Vectorized function

One of the features that NumPy provides is a class vectorize to convert an ordinary Python function which accepts scalars and returns scalars into a “vectorized-function” with the same broadcasting rules as other NumPy functions (i.e. the Universal functions, or ufuncs).

def addsubtract(a,b):
   if a > b:
       return a - b
   else:
       return a + b

addsubtract([0,3,6,9],[1,3,5,7])

vec_addsubtract = np.vectorize(addsubtract)

vec_addsubtract([0,3,6,9],[1,3,5,7])

排除某些arg不vectorized

import math
def a_plus_b(a,b):
    return math.exp(a)*math.exp(b)

vec_a_plus_vec_b=np.vectorize(a_plus_b)
vec_a_plus_b=np.vectorize (a_plus_b, excluded=['b'])

(a,b)=np.random.randint(10,size=(2,5))
vec_a_plus_vec_b(a,b=b)
vec_a_plus_b(a,b=b[0])
vec_a_plus_b(a,b=b) # Error

8.5.2 Universal function (ufunc)

對array input進行elementwise運算的函數，當dimension不足時，會進行broadcasting補齊
ufunc函數：https://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs

8.6 Operations

8.6.1 Inner product: `@`

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
a
b

a@b

A=np.array(
  [
    [2, 4],
    [1, 3]
  ]
)

B=np.array(
  [
    [-1, 2],
    [-3, 4]
  ]
)

A*B # 不是矩陣相乘AB

A@B # 才是矩陣相乘

8.6.2 transpose

x1
x1.T

Only vector or matrix can be transposed, but not a flat array.

x0=np.array([2,3,5])
print(x0)
print(x0.T) # no difference from x0

8.6.3 inverse & determinant

Require sub-package linalg of numpy

from numpy.linalg import inv, det

8.6.4 zero/one array

bold_0 = np.zeros(10)
print(bold_0)
bold_0.shape

bold_1 = np.ones(10)
print(bold_1)
bold_1.shape

np.zeros()/np.ones() produces a flat array. To make it a vector or a matrix, you need to change its shape via:

bold_0.shape=(10,1) # 10 by 1 vector
print(bold_0)
bold_0.shape

bold_0.shape=(2,5) # 2 by 5 matrix
print(bold_0)
bold_0.shape

8.6.5 Identity matrix

unit_coord = np.identity(5)
print(unit_coord)

8.7 Missing/Invalid Value

Masked arrays are arrays that may have missing or invalid entries.

import numpy as np
import numpy.ma as ma
x = np.array([1, 2, 3, -1, 5])

We wish to mark the fourth entry as invalid. The easiest is to create a masked array:

mx = ma.masked_array(x, mask=[0, 0, 0, 1, 0])

We can now compute the mean of the dataset, without taking the invalid data into account:

mx.mean()

8.8 Random Numbers

使用numpy.random模組

8.8.1 Two routines

電腦如何產生符合某個分配的random number sample？假設要產生10個標準常態分配random numbers:

產生10個32或64位元長度的隨機值，此過程稱為random bits generation。
不同分配會有一個隨機位元值對應到該分配隨機變數值的對照表，使用對應的表把位元值轉位所要分配之random numbers。

Numpy稱能產生1的物件為BitGenerators，能進行2的隨機變數值對應轉換的物件為Generators。

在過去Numpy的BitGenerator沒得選只有一種，所以只需要有Generator讓使用者設定使用即可，現在稱過去方式產生的random number 為legacy RandomState random numbers，以便和現在（numpy 1.17+）缺點較少的Generator random numbers有所區分。

8.8.1.1 RandomState random number

使用說明：https://docs.scipy.org/doc/numpy/reference/random/legacy.html#numpy.random.mtrand.RandomState

import numpy.random as rs_rv

rs_rv.standard_normal(size=10) # RandomState所產生的10個standard normal random number

8.8.1.2 Generator random number

For numpy 1.17 or above only
使用說明：https://docs.scipy.org/doc/numpy/reference/random/generator.html

from numpy.random import default_rng
g_rv=default_rng() #  g_rv為Generator object
g_rv.standard_normal(size=10) # Generator所產生的standard normal random number

8.8.2 Seed

用來確保此隨機樣本別人執行時可以複製出來。
在求隨機函數期望值極值時，我們用隨機函數樣本值平均來逼近，此時會固定隨機樣本以免每次求FOC, SOC時其值會不斷變動。

RandomState

rs_rv.seed(2019)
rs_rv.standard_normal(size=10)

Random Generator

g_rv=default_rng(2019) # 需要在建立Generator obj時就設好
g_rv.standard_normal(size=10)

假設隨機樣本 $(y,x)$ 值間有如下的關係：

$y=0.1x+0.33\epsilon,\mbox{ where }\ x\mbox{ and } \epsilon\sim\ N(0,1)$

以上述的關係隨機抽出100個 $(x,\epsilon)$ 隨機值形成一個sample size=100的 $(y,x)$ sample。（即使用上述式子為樣本的data generating process）

假設一錢幣投擲出現正面機率為80%，反面為20%。請隨機產生投擲100次的觀測樣本。

以下面的Data Generating Process (DGP)產生100個隨機樣本： $\bf Y_i=X_i\beta+\epsilon_i,$ 其中 $\bf Y_i=\left[\begin{array}{c} \bf y_{1i}\\ \bf y_{2i}\\ \end{array}\right],\ X_i=\left[\begin{array}{c} \bf x_{1i}\\ \bf x_{2i}\\ \end{array}\right] ,\ \bf \beta=\left[\begin{array}{c} 1 & 0.2\\ 0 & 1\\ \end{array}\right]$ 且 $\bf x_i\sim N(0,I),\ \epsilon_i\sim N(0,\Sigma),\\ \bf x_i \perp \epsilon_i\\ \Sigma=\left[\begin{array}{cc} 1 & 0.2\\ 0.2 & 0.9\\ \end{array}\right]$

8.9 Efficient Linear Algebra

Built on numpy, scipy offers more efficient linear algebra than numpy. scipy.linalg contains all the functions in numpy.linalg. plus some other more advanced ones not contained in numpy.linalg.

from scipy.linalg import inv, solve, det, eig