Friday, 22 November 2013

machine learning--04 : Normal equation

  Besides gradient descent, we can also use the normal equation to find the optimal hypothesis function.

   The normal equation in its standard form is

   theta = (X^T X)^(-1) X^T y    (1)

   As equation 1 shows, the normal equation is far easier to implement than gradient descent. If the matrix X^T X is singular, we can either reduce the number of features or use SVD to compute a pseudo-inverse as an approximation of the inverse matrix.
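   As a minimal sketch of the above, assuming NumPy is available, the normal equation can be written in a few lines. The function name normal_equation, the toy data, and the convention that the first column of X is all ones are illustrative assumptions, not part of the original post. np.linalg.pinv computes the pseudo-inverse through SVD, which is one way to handle a singular X^T X.

```python
import numpy as np

def normal_equation(X, y):
    """Least-squares fit via theta = (X^T X)^(-1) X^T y.

    X: (m, n+1) design matrix whose first column is all ones (assumed layout).
    y: (m,) vector of targets.
    """
    A = X.T @ X
    b = X.T @ y
    # pinv uses SVD to build the Moore-Penrose pseudo-inverse, so it can
    # stand in for the inverse when X^T X is singular or nearly singular.
    return np.linalg.pinv(A) @ b

# Toy data lying exactly on the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

theta = normal_equation(X, y)
print(theta)
```

   When X^T X is known to be well conditioned, np.linalg.solve(A, b) is usually preferable to forming any inverse explicitly.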

  The pros and cons of gradient descent versus the normal equation:

Gradient Descent

  • Need to choose alpha
  • Needs many iterations
  • Works well even when n (number of features) is large

Normal Equation

  • No need to choose alpha
  • No need to iterate
  • Need to compute an inverse matrix
  • Slow if n (number of features) is very large
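
  The trade-offs listed above can be seen side by side in a small sketch. This assumes NumPy and a mean-squared-error cost; the learning rate alpha=0.1, the iteration count, and the toy data are arbitrary choices made for illustration. Gradient descent needs both alpha and many iterations, while the normal equation is a single linear solve.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent on the mean-squared-error cost."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        # Gradient of (1 / 2m) * ||X theta - y||^2 with respect to theta
        gradient = X.T @ (X @ theta - y) / m
        theta -= alpha * gradient
    return theta

def normal_equation(X, y):
    """Closed-form solution; solves (X^T X) theta = X^T y directly."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data lying exactly on the line y = 3 - 2x
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = 3.0 - 2.0 * x

theta_gd = gradient_descent(X, y)  # needs alpha and many iterations
theta_ne = normal_equation(X, y)   # one solve, no tuning
print(theta_gd, theta_ne)
```

  On this small problem both approaches should arrive at essentially the same parameters; the difference lies in tuning effort and in how the cost grows with n.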

  The cost of computing the inverse matrix is roughly O(n^3). This complexity becomes unacceptable when n is large (10000 or more, depending on your machine).
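
  To get a feel for cubic growth, here is a rough back-of-the-envelope sketch. The helper relative_inversion_cost is a hypothetical name, and the calculation ignores constant factors and memory effects, so it only indicates the trend and not actual running times.

```python
def relative_inversion_cost(n_small, n_large):
    """Ratio of O(n^3) work between two feature counts (constants ignored)."""
    return (n_large / n_small) ** 3

# Going from 1,000 to 10,000 features multiplies the cubic term by 10^3.
ratio = relative_inversion_cost(1_000, 10_000)
print(ratio)
```

  In other words, ten times as many features means roughly a thousand times as much work for the inversion step.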