## Linear Regression

Linear regression fits a linear hypothesis to training data in order to predict a continuous-valued output.

Single Feature Hypothesis:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

### Cost Function & Gradient Descent

Cost Function:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Multiple Feature Hypothesis:

$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

Multiple Feature Cost Function:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Gradient descent: repeat until convergence, updating simultaneously for every j := 0…n:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

- α is the learning rate.
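A minimal Octave/Matlab sketch of this vectorized update (the name `gradientDescent` is illustrative, and `X` is assumed to already carry a leading column of ones):

```matlab
% Batch gradient descent for linear regression, vectorized.
% X: m-by-(n+1) design matrix (first column all ones), y: m-by-1, theta: (n+1)-by-1.
function theta = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  for iter = 1:num_iters
    % Simultaneous update of every theta_j in one matrix expression:
    theta = theta - (alpha / m) * (X' * (X * theta - y));
  end
end
```

For a single feature this could be called as, e.g., `theta = gradientDescent([ones(m,1) x], y, zeros(2,1), 0.01, 1500);`.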

### How to choose learning rate α?

- If α is too small: slow convergence.
- If α is too large: J(θ) may not decrease on every iteration; it may not converge.

To choose α, try:
…, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
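A quick way to compare those candidates is to run a few hundred iterations with each and check the final cost; this sketch reuses the hypothetical `gradientDescent` above and assumes `X` and `y` are already loaded:

```matlab
% Sweep candidate learning rates and report the cost each one reaches.
alphas = [0.001 0.003 0.01 0.03 0.1 0.3 1];
m = length(y);
for k = 1:length(alphas)
  theta = gradientDescent(X, y, zeros(size(X, 2), 1), alphas(k), 400);
  J = (1 / (2 * m)) * sum((X * theta - y) .^ 2);   % final cost J(theta)
  fprintf('alpha = %.3f -> J = %.4f\n', alphas(k), J);
end
```

If J grows or turns NaN for the larger values, that α is too large.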

### Feature Scaling

We can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges, and so will oscillate inefficiently down to the optimum when the variables are very uneven. Mean normalization:

$$x_i := \frac{x_i - \mu_i}{s_i}$$

- $\mu_i$ is the average of all the values for feature (i)
- $s_i$ is the range of values (max - min)
- or $s_i$ is the standard deviation.

* standard deviation: $s = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( x^{(i)} - \mu \right)^2}$
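A sketch of mean normalization in Octave/Matlab, assuming `X` holds the raw features, one column per feature, without the bias column:

```matlab
% Scale every feature column to roughly the same range via mean normalization.
mu = mean(X);             % 1-by-n vector of feature means (mu_i)
s  = std(X);              % per-feature standard deviation (s_i); max(X)-min(X) also works
X_norm = (X - mu) ./ s;   % broadcasts mu and s across the m rows
X_norm = [ones(size(X, 1), 1) X_norm];   % prepend the bias column afterwards
```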

### Normal Equation

$$\theta = (X^T X)^{-1} X^T y$$

- Note 1: the number of training examples must be greater than the number of features; otherwise $X^T X$ is non-invertible and there is no solution.
- Note 2: the normal equation does not need feature scaling.
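In Octave/Matlab the closed form is one line; `pinv` is a safe choice because it still returns a usable θ even when $X^T X$ is non-invertible:

```matlab
% Normal equation: solve for theta directly, no learning rate, no iterations.
theta = pinv(X' * X) * X' * y;
```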

## Logistic Regression

Hypothesis:

$$h_\theta(x) = g(\theta^T x)$$

Sigmoid Function / Logistic Function:

$$g(z) = \frac{1}{1 + e^{-z}}$$

The sigmoid is an S-shaped curve that maps any real number into the interval (0, 1). The hypothesis gives the probability that y = 1 given x, parameterized by θ:

$$h_\theta(x) = P(y = 1 \mid x; \theta)$$

### Cost Function

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

Vectorized implementation:

$$h = g(X\theta), \qquad J(\theta) = \frac{1}{m} \left( -y^T \log(h) - (1 - y)^T \log(1 - h) \right)$$
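This cost translates almost symbol for symbol into Octave/Matlab; `sigmoid` and `logisticCost` are illustrative names:

```matlab
function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));   % elementwise, so z may be a vector or matrix
end

function [J, grad] = logisticCost(theta, X, y)
  m = length(y);
  h = sigmoid(X * theta);                                % h = g(X*theta)
  J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));  % vectorized cost
  grad = (1 / m) * (X' * (h - y));                       % gradient for descent
end
```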

## Overfitting

Overfitting happens when the hypothesis fits the training set very well but fails to generalize to new examples; regularization counters it by penalizing large parameter values.

### Regularized Linear Regression

$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$

The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.
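A sketch of that regularized cost; note that the penalty starts at $\theta_1$, so `theta(1)` is left out:

```matlab
% Regularized linear regression cost; theta(1) (i.e. theta_0) is not penalized.
function J = regLinearCost(theta, X, y, lambda)
  m = length(y);
  err = X * theta - y;
  J = (1 / (2 * m)) * (sum(err .^ 2) + lambda * sum(theta(2:end) .^ 2));
end
```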

### Regularized Logistic Regression

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

The second sum, $\sum_{j=1}^{n} \theta_j^2$, explicitly excludes the bias term $\theta_0$. Note: the regularized term does not include $\theta_0$; in Matlab/Octave that is `theta(2:end)`.
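The `theta(2:end)` indexing looks like this in practice (a sketch; `regLogisticCost` is an illustrative name):

```matlab
% Regularized logistic cost: the penalty and its gradient skip theta(1).
function [J, grad] = regLogisticCost(theta, X, y, lambda)
  m = length(y);
  h = 1 ./ (1 + exp(-(X * theta)));                      % sigmoid(X * theta)
  J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
      + (lambda / (2 * m)) * sum(theta(2:end) .^ 2);
  grad = (1 / m) * (X' * (h - y));
  grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);  % theta_0 unregularized
end
```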