## Learning From Data – A Short Course: Exercise 7.10

Page 20. How many weight parameters are there in a neural network with architecture specified by $\mathbf{d} = [d^{(0)}, d^{(1)}, \ldots, d^{(L)}]$, a vector giving the number of nodes in each layer? Evaluate your formula for a 2-hidden-layer network with 10 hidden nodes in each hidden layer.
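In the LFD convention each non-input layer also receives a constant bias node from the layer below, so the count works out to $\sum_{\ell=1}^{L} d^{(\ell)}\,(d^{(\ell-1)}+1)$. A minimal sketch of that count (the input dimension 5 and the single output node are my own assumptions; the exercise leaves them symbolic):

```python
# Count the weight parameters of a fully connected network with
# architecture d = [d0, d1, ..., dL], assuming each non-input layer
# also receives a constant bias node from the layer below.
def num_weights(d):
    # weights into layer l: (d[l-1] + 1) * d[l]  (the +1 is the bias node)
    return sum((d[l - 1] + 1) * d[l] for l in range(1, len(d)))

# Hypothetical instance: 5 inputs, two hidden layers of 10 nodes, 1 output.
print(num_weights([5, 10, 10, 1]))  # (5+1)*10 + (10+1)*10 + (10+1)*1 = 181
```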

## Learning From Data – A Short Course: Exercise 7.9

Page 18. What can go wrong if you just initialize all the weights to exactly zero? If every $W^{(\ell)}$ is zero, then every signal $s^{(\ell)}$ becomes zero, so every hidden output $x^{(\ell)} = \tanh(s^{(\ell)})$ becomes zero, and the backpropagated sensitivities $\delta^{(\ell)}$ of the hidden layers become zero as well. The gradient will then become zero, so the algorithm will stop immediately and blindly return the zero weights as the final […]
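A numeric sketch of the failure (my own toy 2-3-1 tanh network and data, not the book's code): with all weights at exactly zero, every hidden output is $\tanh(0)=0$, which kills the backpropagated sensitivities, so the hidden-layer gradients are identically zero and gradient descent can never move those weights off zero — in this toy version only the output bias ever receives a gradient.

```python
import numpy as np

X = np.array([[0.5, -1.2], [1.0, 0.3], [-0.7, 2.0], [0.1, 0.1]])
y = np.array([1.0, -1.0, 1.0, 1.0])

W1 = np.zeros((3, 3))   # hidden layer: 3 tanh units acting on [1, x1, x2]
W2 = np.zeros(4)        # linear output unit acting on [1, h1, h2, h3]

grad_W1 = np.zeros_like(W1)
grad_W2 = np.zeros_like(W2)
for x, t in zip(X, y):
    x0 = np.concatenate(([1.0], x))            # prepend the bias node
    h = np.tanh(W1 @ x0)                       # all zeros
    h0 = np.concatenate(([1.0], h))
    out = W2 @ h0                              # 0
    delta2 = 2.0 * (out - t)                   # output sensitivity (squared error)
    delta1 = (1.0 - h**2) * (W2[1:] * delta2)  # zero, because W2[1:] == 0
    grad_W2 += delta2 * h0
    grad_W1 += np.outer(delta1, x0)

print(np.abs(grad_W1).max())      # 0.0: hidden weights are stuck at zero
print(np.abs(grad_W2[1:]).max())  # 0.0: only the output bias has a gradient
```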

## [Notes] Learning From Data – A Short Course: e-Chapter 7

Page 18: Need an explanation for . Page 20: $d^{(1)}$ is the number of nodes in the first layer, $d^{(0)}$ is the number of nodes in the input layer. Here is my guess: each input node must connect to at least one node in the first layer (that is ). So the first input node can choose one in […]

## Learning From Data – A Short Course: Exercise 7.7

Page 11. For the sigmoidal perceptron, $h(x) = \tanh(w^{T}x)$, let the in-sample error be $E_{\text{in}}(w) = \frac{1}{N}\sum_{n=1}^{N}\left(\tanh(w^{T}x_{n}) - y_{n}\right)^{2}$. Show that
$$\nabla E_{\text{in}}(w) = \frac{2}{N}\sum_{n=1}^{N}\left(\tanh(w^{T}x_{n}) - y_{n}\right)\left(1 - \tanh^{2}(w^{T}x_{n})\right)x_{n}.$$
If $w \to \infty$, what happens to the gradient; how is this related to why it is hard to optimize the perceptron? We observe that […]
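A numeric sketch of the saturation effect (my own example data): every term of the gradient carries a factor $(1 - \tanh^{2}(w^{T}x_{n}))$, so the gradient vanishes as $\|w\| \to \infty$ and the error surface flattens out — which is what makes the sigmoidal perceptron hard to optimize by gradient descent.

```python
import numpy as np

X = np.array([[1.0, 2.0], [-1.0, 0.5], [0.5, -1.0], [2.0, 1.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])

def grad_Ein(w):
    t = np.tanh(X @ w)
    # gradient of (1/N) * sum (tanh(w.x_n) - y_n)^2
    return (2.0 / len(X)) * ((t - y) * (1.0 - t**2)) @ X

w = np.array([0.6, -0.8])
for scale in [1, 10, 1000]:
    print(scale, np.linalg.norm(grad_Ein(scale * w)))
# the gradient norm collapses toward 0 as w is scaled up
```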

## Learning From Data – A Short Course: Exercise 7.2

Page 3. (a) The Boolean $\mathrm{OR}$ and $\mathrm{AND}$ of two inputs can be extended to more than two inputs: $\mathrm{OR}(x_1, \ldots, x_M) = +1$ if any one of the inputs is $+1$; $\mathrm{AND}(x_1, \ldots, x_M) = +1$ if all the inputs equal $+1$. Give graph representations of $\mathrm{OR}(x_1, \ldots, x_M)$ and $\mathrm{AND}(x_1, \ldots, x_M)$. (c) Give the graph representation of $\mathrm{XOR}(x_1, \ldots, x_M)$.
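Part (a) amounts to choosing a bias for a single perceptron on $\pm 1$ inputs; one workable choice (my own thresholds, not necessarily the book's diagram) is $\mathrm{AND} = \mathrm{sign}(x_1 + \cdots + x_M - (M-1))$ and $\mathrm{OR} = \mathrm{sign}(x_1 + \cdots + x_M + (M-1))$:

```python
# Multi-input AND/OR as single perceptrons on +/-1 inputs.
def AND(xs):
    # all inputs +1  ->  sum = M      ->  sum - (M-1) = +1
    # any input  -1  ->  sum <= M-2   ->  sum - (M-1) <= -1
    return 1 if sum(xs) - (len(xs) - 1) > 0 else -1

def OR(xs):
    # any input  +1  ->  sum >= 2-M   ->  sum + (M-1) >= +1
    # all inputs -1  ->  sum = -M     ->  sum + (M-1) = -1
    return 1 if sum(xs) + (len(xs) - 1) > 0 else -1

print(AND([1, 1, 1]), AND([1, -1, 1]))   # 1 -1
print(OR([-1, -1, -1]), OR([-1, 1, -1])) # -1 1
```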

## [Notes] Reflection through hyperplane

(from Introduction to Linear Algebra [4th Edition] by Gilbert Strang, section 4.4, page 231) I don't know the official reflection definition in mathematics, so it would be lame to prove that $H = I - 2uu^{T}$ is indeed a reflection matrix, but I will note several things that I have observed. Also special thanks to anyone who gave me a […]
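A quick numeric check of the properties that make $H = I - 2uu^{T}$ (for a unit vector $u$, the matrix I assume the post refers to) behave like a reflection through the hyperplane with normal $u$: it flips $u$, fixes everything perpendicular to $u$, and reflecting twice gives the identity.

```python
import numpy as np

u = np.array([2.0, -1.0, 2.0]) / 3.0   # unit normal of the hyperplane
H = np.eye(3) - 2.0 * np.outer(u, u)   # Householder reflection matrix

v = np.array([1.0, 2.0, 0.0])          # v is in the hyperplane: u.v = 0
print(np.allclose(H @ u, -u))          # True: the normal direction flips
print(np.allclose(H @ v, v))          # True: hyperplane vectors are fixed
print(np.allclose(H @ H, np.eye(3)))   # True: reflecting twice = identity
```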

## Proof that if a non-zero vector is orthogonal to a subspace then it is not in the subspace

We consider a non-zero vector $v$ and a matrix $A$ whose set of column vectors is a basis of a subspace $S$, with $v$ orthogonal to $S$. Since the column vectors of $A$ are linearly independent, $A^{T}A$ is invertible (which means $A^{T}Ax = 0$ only for $x = 0$). Suppose $v \in S$, so $v = Ax$ for some $x$. We have: $A^{T}v = 0$. It's easy to see that $x = 0$ ($A^{T}A$ is invertible because $A$ has linearly […]
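Assuming the standard setup (the columns of $A$ form a basis of $S$, so $A^{T}A$ is invertible), the argument can be compressed to:

```latex
% Suppose, for contradiction, that v \in S:
v = Ax \quad \text{for some } x
% v is orthogonal to S, hence to every column of A:
A^{T}v = 0 \;\implies\; A^{T}Ax = 0
% the columns of A are independent, so A^{T}A is invertible:
x = (A^{T}A)^{-1}\,0 = 0 \;\implies\; v = A0 = 0
```

which contradicts $v \neq 0$, so a non-zero $v$ orthogonal to $S$ cannot lie in $S$.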

## Proof that N(CD) = N(D) if C is invertible

So the statement follows.
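The excerpt keeps only the closing line; the two containments behind it are presumably the standard ones:

```latex
% N(D) \subseteq N(CD):
Dx = 0 \;\implies\; CDx = C0 = 0
% N(CD) \subseteq N(D), using that C is invertible:
CDx = 0 \;\implies\; Dx = C^{-1}CDx = C^{-1}0 = 0
```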

## Proof that elimination does not change the row space

To be clear, in this proof I am concerned with the row space of the matrix $A$. The matrix $E$ represents the elimination process. We have: $R = EA$. $E$ is invertible (because it is the product of invertible elementary row-operation matrices), and so is $E^{T}$. Hence we have: $R^{T} = A^{T}E^{T}$. That means the column vectors of $R^{T}$ are linear combinations of the column […]
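The two inclusions behind this argument can be sketched as (with $R = EA$ the eliminated matrix):

```latex
% rows of R are combinations of rows of A:
R = EA \;\implies\; R^{T} = A^{T}E^{T}
% E is invertible, so rows of A are also combinations of rows of R:
A = E^{-1}R \;\implies\; A^{T} = R^{T}(E^{-1})^{T}
% hence C(R^{T}) = C(A^{T}): the row spaces coincide
```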

## Proof that dim(A) + dim(B) = dim(A ∩ B) + dim(A + B)

Here $A$ and $B$ are two subspaces. For simplicity, in the following proof, I will use the notation $A$ for a matrix / block matrix containing a basis of subspace $A$, and so will $B$. The subspace $A + B$ contains all the vectors $v$ that satisfy the condition: there exists a vector $x$ and a vector $y$ such that $v = Ax + By$, which also means: $v = [\,A \;\; B\,]\left[\begin{smallmatrix} x \\ y \end{smallmatrix}\right]$. Every $\left[\begin{smallmatrix} x \\ y \end{smallmatrix}\right]$ determines a […]
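One compact way to finish the count (a sketch, using rank–nullity on the block matrix $M = [\,A \;\; B\,]$, whose columns are the two bases):

```latex
% M has \dim A + \dim B columns, so by rank--nullity:
\dim A + \dim B = \operatorname{rank}(M) + \dim N(M)
% the columns of M span A + B, so rank(M) = \dim(A + B);
% each (x, y) \in N(M) satisfies Ax = -By \in A \cap B, and this
% correspondence N(M) \to A \cap B is a linear bijection, so:
\dim N(M) = \dim(A \cap B)
\;\implies\;
\dim A + \dim B = \dim(A + B) + \dim(A \cap B)
```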