Object Detection

I finally have some time to get back to writing notes, which helps the whole learning process leave a deeper impression.

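This lesson is week 3 of the CNN course. Broadly, it covers how to use a pre-trained object detection model to identify the objects in a whole image. As in the previous two weeks, the material is built around algorithms from several papers. The detection algorithms covered are: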

Object Localization
Landmark Detection, which got less coverage in class: how to detect the internal features of an object via key landmarks
Object Detection: how to detect an object with a bounding box
Sliding Window

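The main focus of this review is YOLO (You Only Look Once). It is the centerpiece of this week: there are two videos on it, and the assignment is directly about YOLO, with a few other algorithms touched on along the way.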

Starting point: divide the original image into a 19x19 grid of cells (which simplifies the computation).
Input image (608, 608, 3)
The input image goes through a CNN, resulting in a (19,19,5,85) dimensional output.
After flattening the last two dimensions, the output is a volume of shape (19, 19, 425):
- Each cell in a 19x19 grid over the input image gives 425 numbers.
- 425 = 5 x 85 because each cell contains predictions for 5 boxes, corresponding to 5 anchor boxes, as seen in lecture.
- 85 = 5 + 80, where 5 is because (pc, bx, by, bh, bw) has 5 numbers, and 80 is the number of classes we’d like to detect.
You then select only a few boxes based on:
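The shape arithmetic above can be checked with a small NumPy sketch. The array here is random stand-in data, not a real model output, and the layout of the 85 numbers (pc first, then the four box values, then the 80 class scores) is an assumption taken from the ordering in the text.

```python
import numpy as np

# Stand-in for the CNN output: random values, only the shapes matter here.
rng = np.random.default_rng(0)
cnn_output = rng.random((19, 19, 5, 85))

# Flatten the last two dimensions: 5 anchor boxes x 85 numbers = 425 per cell.
flat = cnn_output.reshape(19, 19, 5 * 85)

# Split the 85 numbers of every box into three parts (assumed layout).
box_confidence = cnn_output[..., 0:1]   # pc, shape (19, 19, 5, 1)
box_geometry = cnn_output[..., 1:5]     # bx, by, bh, bw, shape (19, 19, 5, 4)
box_class_probs = cnn_output[..., 5:]   # class scores, shape (19, 19, 5, 80)

print(flat.shape, box_confidence.shape, box_geometry.shape, box_class_probs.shape)
```

After flattening, the numbers for anchor box k of a cell sit at positions 85*k to 85*k + 84 of that cell's 425 values.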
- Score-thresholding: throw away boxes that have detected a class with a score less than the threshold
- Non-max suppression: Compute the Intersection over Union and avoid selecting overlapping boxes
This gives you YOLO’s final output.
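The score-thresholding step can be sketched in plain NumPy. This is a minimal illustration, not the assignment's TensorFlow code: the function name `filter_by_score` and the default threshold of 0.6 are assumptions chosen for the example.

```python
import numpy as np

def filter_by_score(box_confidence, boxes, box_class_probs, threshold=0.6):
    """Keep boxes whose best class score reaches the threshold.

    box_confidence: (..., 1), boxes: (..., 4), box_class_probs: (..., n_classes)
    """
    # Score of each class for each box = pc * class probability.
    box_scores = box_confidence * box_class_probs
    # Best class and its score for every box.
    box_classes = np.argmax(box_scores, axis=-1)
    box_class_scores = np.max(box_scores, axis=-1)
    # Boolean mask over all boxes; indexing with it flattens the grid.
    keep = box_class_scores >= threshold
    return box_class_scores[keep], boxes[keep], box_classes[keep]

# Tiny example: a 1x1 grid, 2 anchor boxes, 3 classes.
conf = np.array([[[[0.9], [0.2]]]])
probs = np.array([[[[0.1, 0.8, 0.1], [0.5, 0.3, 0.2]]]])
coords = np.zeros((1, 1, 2, 4))
coords[0, 0, 0] = [1.0, 2.0, 3.0, 4.0]

scores, kept_boxes, classes = filter_by_score(conf, coords, probs)
print(scores, kept_boxes, classes)
```

Only the first box survives here: its best score is 0.9 * 0.8 = 0.72, while the second box peaks at 0.2 * 0.5 = 0.1.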

What you should remember:

YOLO is a state-of-the-art object detection model that is fast and accurate
It runs an input image through a CNN which outputs a 19x19x5x85 dimensional volume.
The encoding can be seen as a grid where each of the 19x19 cells contains information about 5 boxes.
You filter through all the boxes using non-max suppression. Specifically:
- Score thresholding on the probability of detecting a class to keep only accurate (high probability) boxes
- Intersection over Union (IoU) thresholding to eliminate overlapping boxes
Because training a YOLO model from randomly initialized weights is non-trivial and requires a large dataset as well as a lot of computation, we used previously trained model parameters in this exercise. If you wish, you can also try fine-tuning the YOLO model with your own dataset, though this would be a fairly non-trivial exercise.
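For reference, the IoU thresholding from the summary above can be sketched as a greedy non-max suppression in NumPy. This is an illustrative sketch under assumptions: boxes are given as (x1, y1, x2, y2) corners, and the function names and default values are made up for the example. The assignment itself relies on tf.image.non_max_suppression rather than a hand-written loop.

```python
import numpy as np

def iou(box1, box2):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) corner format."""
    xi1, yi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    xi2, yi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    # Clamp at zero so disjoint boxes get an empty intersection.
    inter = max(xi2 - xi1, 0.0) * max(yi2 - yi1, 0.0)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, max_boxes=10, iou_threshold=0.5):
    """Greedy NMS: returns the indices of the boxes to keep, best score first."""
    order = [int(i) for i in np.argsort(scores)[::-1]]
    keep = []
    while order and len(keep) < max_boxes:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the chosen one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (0, 0, 2, 2.2)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

In this example the third box overlaps the first almost entirely (IoU about 0.91) and is suppressed, while the second box overlaps it only slightly (IoU 1/7) and is kept.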

Aug 12 2018

Technology

Some notes on basic TensorFlow usage

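I have recently reached the CNN part, the fourth course of the machine learning series. It uses a few basic pieces of TensorFlow, so here is a short summary of some simple methods as a memo; it may come in handy later.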

# Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes, iou_threshold)

I still feel I need a more systematic understanding of how running a session (sess.run) works in TF.

# Run the session with the correct tensors and choose the correct placeholders in the feed_dict.
# You'll need to use feed_dict={yolo_model.input: ... , K.learning_phase(): 0})
out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes],
                                              feed_dict={yolo_model.input: image_data, K.learning_phase(): 0})

Besides TensorFlow, Keras also deserves a mention here: a third-party package built on top of TF that offers a richer set of functions.

keras.backend.argmax(x, axis=-1)
keras.backend.max(x, axis=None, keepdims=False)
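To make the axis and keepdims arguments concrete, here is a small NumPy analogue. It is only a sketch: it uses np.argmax and np.max instead of the Keras backend, on the assumption that the backend functions follow the same axis conventions that their signatures suggest.

```python
import numpy as np

# Two "boxes" with three class scores each.
x = np.array([[0.1, 0.7, 0.2],
              [0.5, 0.3, 0.9]])

# axis=-1 reduces over the last axis, i.e. over the classes of each box.
best_index = np.argmax(x, axis=-1)            # index of the best class per box
best_value = np.max(x, axis=-1)               # score of the best class per box

# keepdims=True keeps the reduced axis with length 1.
best_value_kept = np.max(x, axis=-1, keepdims=True)

# axis=None reduces over every element.
overall_max = np.max(x, axis=None)

print(best_index, best_value, best_value_kept.shape, overall_max)
```

In the YOLO filtering step, this pair is what picks the best class and its score for each box.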

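NumPy also has some special tricks that are less common and not easy to grasp at first. The np.eye call below turns a label array Y into one-hot vectors of length C.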

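import numpy as np

def convert_to_one_hot(Y, C):
    # Row i of np.eye(C) is the one-hot vector for class i; indexing with the flattened labels picks one row per label.
    Y = np.eye(C)[Y.reshape(-1)]
    return Y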
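A short worked example may make the trick clearer. The label values below are made up for illustration; the function is the one from the notes, repeated so the snippet runs on its own.

```python
import numpy as np

def convert_to_one_hot(Y, C):
    # np.eye(C) is the C x C identity matrix; row i is the one-hot vector for class i.
    # Indexing it with the flattened labels selects one row per label.
    return np.eye(C)[Y.reshape(-1)]

Y = np.array([[1, 2, 0, 3]])          # four integer labels, shape (1, 4)
one_hot = convert_to_one_hot(Y, C=4)  # shape (4, 4): one row per label

print(one_hot)
```

Each row contains a single 1 at the position of its label, so taking argmax along axis 1 recovers the original labels.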