# Sticky Note of Questions

KNOWLEDGE REPRESENTATION
http://groups.csail.mit.edu/medg/ftp/psz/k-rep.html
MACHINE LEARNING

1. Is there any problem that can be solved with Supervised Learning but cannot be solved with Unsupervised Learning? ANSWER.
2. In the bias-variance decomposition, what if the hypothesis set is more complex than the target function, so that the average final hypothesis ends up far from the target function?
3. It looks like this result suggests that a bad hypothesis set can lead to a large generalization bound.
4. How does regularized logistic regression regularize the perceptron model?
5. Andrew Ng says we should apply error analysis to the cross-validation error (though I believe he means the non-cross-validation error), but his method makes me suspect this would lead to a seriously tainted validation set. In the case of real cross-validation error, with my current knowledge it's hard to tell, but my guess is that the training/validation sets in that case would also get seriously tainted. ANSWER.
6. What is a feature map?
• Michael Nielsen, in Neural Networks and Deep Learning, says that feature map, kernel, and filter, despite sometimes being used in slightly different ways, refer to the same thing:
• For this reason, we sometimes call the map from the input layer to the hidden layer a feature map. We call the weights defining the feature map the shared weights. And we call the bias defining the feature map in this way the shared bias. The shared weights and bias are often said to define a kernel or filter. In the literature, people sometimes use these terms in slightly different ways, and for that reason I’m not going to be more precise; rather, in a moment, we’ll look at some concrete examples.

• The 20 images correspond to 20 different feature maps (or filters, or kernels). Each map is represented as a 5×5 block image, corresponding to the 5×5 weights in the local receptive field.

• Ian Goodfellow, Yoshua Bengio and Aaron Courville in Deep Learning say:
• In convolutional network terminology, the first argument (in this example, the function x) to the convolution is often referred to as the input and the second argument (in this example, the function w) as the kernel. The output is sometimes referred to as the feature map.
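A minimal sketch tying the two quotes together: sliding one 5×5 kernel (shared weights) plus a shared bias over an input produces one feature map. The function name `feature_map` and the toy 28×28 input are my own illustration, not from either book; note that what deep-learning texts call "convolution" is usually cross-correlation (no kernel flip), which is what this computes.

```python
import numpy as np

def feature_map(image, kernel, bias=0.0):
    """Valid cross-correlation of a 2-D input with one kernel, plus a
    shared bias: the output is a single feature map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1   # output height under 'valid' sliding
    ow = image.shape[1] - kw + 1   # output width
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Same shared weights and bias applied at every position.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel) + bias
    return out

image = np.arange(28 * 28, dtype=float).reshape(28, 28)  # toy 28x28 input
kernel = np.ones((5, 5)) / 25.0                          # 5x5 shared weights
fmap = feature_map(image, kernel)
print(fmap.shape)  # (24, 24): one feature map from one kernel
```

Twenty different kernels applied to the same input would give twenty feature maps, matching Nielsen's "20 images correspond to 20 different feature maps".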