## Backpropagation in Convolutional (Neural) Network

From *Neural Networks and Deep Learning*, Chapter 6: backpropagation in a convolutional network. The core equations of backpropagation in a network with fully-connected layers are (BP1)-(BP4) (link). Suppose we have a network containing a convolutional layer, a max-pooling layer, and a fully-connected output layer, as in the network discussed above. How are the equations of backpropagation […]
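One piece of that answer can be sketched concretely. Below is a minimal illustration (my own helper, assuming a single-channel input and a 2×2, stride-2 max-pooling layer) of how gradients flow backward through max-pooling: the upstream gradient of each window is routed entirely to the unit that won the max.

```python
import numpy as np

def maxpool_backward(x, grad_out, pool=2):
    """Route upstream gradients to the argmax of each pooling window.

    x:        (H, W) input that was fed to the pooling layer
    grad_out: (H//pool, W//pool) gradient w.r.t. the pooled output
    """
    grad_in = np.zeros_like(x)
    for i in range(grad_out.shape[0]):
        for j in range(grad_out.shape[1]):
            window = x[i*pool:(i+1)*pool, j*pool:(j+1)*pool]
            # Position of the maximum inside this window
            r, c = np.unravel_index(np.argmax(window), window.shape)
            grad_in[i*pool + r, j*pool + c] = grad_out[i, j]
    return grad_in
```

All other entries of `grad_in` stay zero, which is exactly why max-pooling contributes a sparse, "winner-take-all" term to the backward pass.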

## Proof for Softmax Regression gradient

The notation in this proof follows the article Softmax Regression from the UFLDL Tutorial. We have the cost function and its gradient, which splits into two cases: one when the class indices coincide and one when they differ. Now we look carefully at the […]
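The two cases in that derivative are the two cases of the softmax Jacobian, $\partial p_k / \partial z_j = p_k(\delta_{kj} - p_j)$. As a sanity check on the algebra, here is a small sketch (my own helper names) that compares the analytic Jacobian against central finite differences:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def softmax_jacobian(z):
    # Analytic two-case formula: d p_k / d z_j = p_k * (delta_kj - p_j)
    p = softmax(z)
    return np.diag(p) - np.outer(p, p)

def numeric_jacobian(z, eps=1e-6):
    # Central finite differences, column by column
    n = len(z)
    J = np.zeros((n, n))
    for j in range(n):
        zp, zm = z.copy(), z.copy()
        zp[j] += eps
        zm[j] -= eps
        J[:, j] = (softmax(zp) - softmax(zm)) / (2 * eps)
    return J
```

If the two matrices agree to numerical precision, the case-by-case derivative was carried out correctly.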

## Learning From Data – A Short Course: Exercise 7.19

Page 41. Previously, for our digit problem, we used symmetry and intensity. How do these features relate to deep networks? Do we still need them? Symmetry and intensity can serve as input features in the input layer, or they may emerge as outputs of some hidden layer. We may intentionally inject them into the network, or they may […]

If we apply logistic regression to a binary classification problem, then we have a parameter called the threshold (its usual value is 0.5). In a skewed-classes problem, the rare class is considered the positive class (Andrew Ng). When the threshold increases: Intuition: if you are more picky about the positive class, then when you label a data point […]
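That intuition can be checked numerically. The sketch below (toy scores and labels of my own invention, with the positive class rare) computes precision and recall at two thresholds; raising the threshold makes the classifier pickier, so precision goes up while recall goes down:

```python
import numpy as np

# Hypothetical predicted probabilities and true labels (positive class is rare)
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2])
labels = np.array([1,   1,   0,   1,   0,    0,   0,   0])

def precision_recall(threshold):
    pred = scores >= threshold
    tp = np.sum(pred & (labels == 1))   # true positives
    fp = np.sum(pred & (labels == 0))   # false positives
    fn = np.sum(~pred & (labels == 1))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

On this toy data, `precision_recall(0.5)` gives precision 0.6 with full recall, while `precision_recall(0.75)` gives perfect precision but misses one positive.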

## Learning From Data – A Short Course: Exercise 7.18

Since the input is an image, it is convenient to represent it as a matrix of its pixels, which are black or white. The basic shape identifies a set of these pixels which are black. (a) Show that the feature can be computed by the neural network node. Set if the […]

## Learning From Data – A Short Course: Exercise 3.9

Page 97. Consider pointwise error measures $e_{\text{class}}(s) = [\![y \neq \operatorname{sign}(s)]\!]$, $e_{\text{sq}}(s) = (y - s)^2$, and $e_{\log}(s)$, where the signal $s = w^{\mathsf T}x$. (b) Show that $e_{\text{class}}(s) \leq e_{\text{sq}}(s)$, and hence that the classification error is upper bounded by the squared error. If $\operatorname{sign}(s) = y$ then $e_{\text{class}} = 0 \leq e_{\text{sq}}$. If $\operatorname{sign}(s) \neq y$ then $ys \leq 0$; since $|y| = 1$, this gives $|s - y| \geq 1$, so $e_{\text{sq}} = (s - y)^2 \geq 1 = e_{\text{class}}$. In general, whichever the case may be, we have $e_{\text{class}}(s) \leq e_{\text{sq}}(s)$. […]
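The pointwise bound is easy to stress-test empirically. This sketch (assuming labels $y \in \{-1, +1\}$ and arbitrary real signals) draws random signal/label pairs and verifies that the 0/1 classification error never exceeds the squared error:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.normal(scale=3.0, size=100_000)      # random signals s = w^T x
y = rng.choice([-1.0, 1.0], size=100_000)    # random binary labels

e_class = (np.sign(s) != y).astype(float)    # 0/1 classification error
e_sq = (s - y) ** 2                          # squared error

# The bound e_class(s) <= e_sq(s) holds pointwise on every sample
assert np.all(e_class <= e_sq)
```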

## Proof that non-singular matrix A and singular matrix B add to non-singular matrix C

We have: […] A note here is that the one statement is not equivalent to the other. Consider the two respective column vectors; for the sake of simplicity, treat them as scalars: the first equality holds, but the second does not. Because the matrix is non-singular, and supposing the matrices are square, the first columns are pivots and the last columns are free. 10/24/2016: […]

## Learning From Data – A Short Course: Exercise 7.13

Page 27. Suppose you run gradient descent for 1000 iterations. You have 500 examples in $\mathcal{D}$, and you use 450 for training and 50 for validation. You output the weight from iteration 50, with its in-sample and validation errors. (a) Is $E_{\text{val}}$ an unbiased estimate of $E_{\text{out}}$? No. (b) Use the Hoeffding bound to get a bound for $E_{\text{out}}$ using […]
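For part (b), the error bar Hoeffding gives for a $[0,1]$-valued error estimated on $n$ points is $\sqrt{\ln(2/\delta)/(2n)}$. A minimal sketch (my own helper name), evaluated at the $n = 50$ validation points of this exercise:

```python
import math

def hoeffding_error_bar(n, delta):
    # Two-sided Hoeffding bound for a [0,1]-valued error on n points:
    # with probability >= 1 - delta, |E_out - E_val| <= sqrt(ln(2/delta) / (2n)).
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

bar = hoeffding_error_bar(50, 0.05)  # roughly 0.19 at 95% confidence
```

With only 50 validation points the error bar is large (about ±0.19), which is why a small validation set gives only a loose guarantee on $E_{\text{out}}$.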

## Learning From Data – A Short Course: Exercise 7.11

Page 24. For weight elimination, show that $\frac{\partial}{\partial w_i}\frac{w_i^2}{1+w_i^2} = \frac{2w_i}{(1+w_i^2)^2}$. I have this differential formula (the quotient rule): $\left(\frac{u}{v}\right)' = \frac{u'v - uv'}{v^2}$. Now I consider the following derivative: $\frac{\partial}{\partial w_i}\frac{w_i^2}{1+w_i^2} = \frac{2w_i(1+w_i^2) - w_i^2 \cdot 2w_i}{(1+w_i^2)^2}$. So we have: $\frac{\partial}{\partial w_i}\frac{w_i^2}{1+w_i^2} = \frac{2w_i}{(1+w_i^2)^2}$. Argue that weight elimination shrinks small weights faster than large ones. There are many ways to do this. One of the fastest […]
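The quotient-rule calculation can be verified numerically. This sketch (my own helper names, assuming the per-weight regularizer term $w^2/(1+w^2)$) checks the analytic derivative against central finite differences, and also shows the relative shrink factor $2/(1+w^2)^2$ is larger for small weights:

```python
def omega_term(w):
    # One term of the weight-elimination regularizer (assumed form): w^2 / (1 + w^2)
    return w * w / (1.0 + w * w)

def omega_grad(w):
    # Derivative via the quotient rule: 2w / (1 + w^2)^2
    return 2.0 * w / (1.0 + w * w) ** 2

def numeric_grad(f, w, eps=1e-6):
    # Central finite difference, for checking the analytic derivative
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)
```

Dividing the gradient by $w$ gives the per-step proportional shrinkage $2/(1+w_i^2)^2$, which is close to 2 for small $w_i$ but nearly 0 for large $w_i$; hence small weights are driven toward zero faster than large ones.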