Object Detection

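I finally have some time to continue writing these notes, which should help the material sink in more deeply.

This is week 3 of the CNN course. Broadly, it covers how to use a pre-trained object detection model to recognize the objects in a whole photo. As in the previous two weeks, the lectures are built around algorithms from several papers. The detection topics covered: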

  • Object Localization
  • Landmark Detection: how to detect the internal features of an object through key landmarks (covered only briefly in class)
  • Object Detection: how to detect an object with a bounding box
  • Sliding Window

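The main focus of this review is YOLO (You Only Look Once). It is the core of this week's material: there are two videos on it, and the assignment is about YOLO directly, with a few other algorithms touched on along the way.

  • Starting point: divide the original image into a 19x19 grid of cells (to simplify the computation)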
  • Input image (608, 608, 3)
  • The input image goes through a CNN, resulting in a (19,19,5,85) dimensional output.
  • After flattening the last two dimensions, the output is a volume of shape (19, 19, 425):
    • Each cell in a 19x19 grid over the input image gives 425 numbers.
    • 425 = 5 x 85 because each cell contains predictions for 5 boxes, corresponding to 5 anchor boxes, as seen in lecture.
    • 85 = 5 + 80, where 5 is because (pc, bx, by, bh, bw) has 5 numbers, and 80 is the number of classes we’d like to detect
  • You then select only a few boxes based on:
    • Score-thresholding: throw away boxes that have detected a class with a score less than the threshold
    • Non-max suppression: Compute the Intersection over Union and avoid selecting overlapping boxes
  • This gives you YOLO’s final output.
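
The shape bookkeeping and the score-thresholding step above can be sketched in NumPy. This is a minimal sketch under several assumptions: random numbers stand in for the real CNN output, a box's score is taken to be pc multiplied by its class probability, the threshold of 0.6 is arbitrary, and the name filter_boxes is hypothetical.

```python
import numpy as np

GRID, ANCHORS, CLASSES = 19, 5, 80

def filter_boxes(output, threshold=0.6):
    """Keep boxes whose best class score is at least `threshold`.

    output: array of shape (19, 19, 5, 85); the last axis is assumed to be
    laid out as (pc, bx, by, bh, bw, c1, ..., c80).
    """
    box_confidence = output[..., 0:1]   # (19, 19, 5, 1)
    boxes = output[..., 1:5]            # (19, 19, 5, 4)
    class_probs = output[..., 5:]       # (19, 19, 5, 80)

    # Assumed score definition: pc * class probability, per class.
    box_scores = box_confidence * class_probs      # (19, 19, 5, 80)
    best_class = np.argmax(box_scores, axis=-1)    # (19, 19, 5)
    best_score = np.max(box_scores, axis=-1)       # (19, 19, 5)

    mask = best_score >= threshold                 # boolean, (19, 19, 5)
    return best_score[mask], boxes[mask], best_class[mask]

# Random stand-in for the CNN output (not real predictions).
rng = np.random.default_rng(0)
output = rng.random((GRID, GRID, ANCHORS, 5 + CLASSES))

# Flattening the last two dimensions: 5 * 85 = 425.
flat = output.reshape(GRID, GRID, ANCHORS * (5 + CLASSES))
print(flat.shape)  # (19, 19, 425)

kept_scores, kept_boxes, kept_classes = filter_boxes(output, threshold=0.6)
print(kept_scores.shape, kept_boxes.shape, kept_classes.shape)
```

The boolean mask collapses the three grid/anchor axes into one, so each returned array has one entry per surviving box.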

What you should remember:

  • YOLO is a state-of-the-art object detection model that is fast and accurate
  • It runs an input image through a CNN which outputs a 19x19x5x85 dimensional volume.
  • The encoding can be seen as a grid where each of the 19x19 cells contains information about 5 boxes.
  • You filter through all the boxes using non-max suppression. Specifically:
    • Score thresholding on the probability of detecting a class to keep only accurate (high probability) boxes
    • Intersection over Union (IoU) thresholding to eliminate overlapping boxes
  • Because training a YOLO model from randomly initialized weights is non-trivial and requires a large dataset as well as a lot of computation, we used previously trained model parameters in this exercise. If you wish, you can also try fine-tuning the YOLO model with your own dataset, though this would be a fairly non-trivial exercise.
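
To make Intersection over Union concrete, here is a minimal sketch for two axis-aligned boxes. It assumes boxes are given by their corners as (x1, y1, x2, y2), which is a different encoding from the (bx, by, bh, bw) center format above; the function name iou is just illustrative.

```python
def iou(box1, box2):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])

    # Width/height are clamped at 0 so disjoint boxes get zero intersection.
    inter = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)

    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter

    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```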

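Some notes on basic TensorFlow usage

I have now reached the CNN part, the fourth course of the machine learning series. It uses some basic TensorFlow. Here is a short summary of a few simple methods, kept as a memo that may come in handy later.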

# Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes, iou_threshold)
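
As a rough idea of what this call does conceptually, here is a greedy non-max suppression sketch in plain NumPy. It is not TensorFlow's implementation. It assumes boxes are (y1, x1, y2, x2) corners with positive area, and the function name nms and the sample data are made up for illustration.

```python
import numpy as np

def nms(boxes, scores, max_boxes=10, iou_threshold=0.5):
    """Greedy NMS sketch. boxes: (N, 4) as (y1, x1, y2, x2); scores: (N,).

    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)  # candidate indices, best score first
    keep = []
    while order.size > 0 and len(keep) < max_boxes:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IoU between the selected box and every remaining candidate.
        y1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        x1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        y2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        x2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(y2 - y1, 0, None) * np.clip(x2 - x1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop candidates that overlap the selected box too much.
        order = rest[iou <= iou_threshold]
    return keep

sample_boxes = [[0, 0, 2, 2], [0, 0, 2, 2.1], [5, 5, 7, 7]]
sample_scores = [0.9, 0.8, 0.7]
print(nms(sample_boxes, sample_scores))  # [0, 2]
```

The second box almost coincides with the first (IoU of about 0.95), so it is suppressed, while the third box does not overlap anything and survives.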

I keep feeling that I need a more systematic understanding of how run works in TF

# Run the session with the correct tensors and choose the correct placeholders in the feed_dict.
# You'll need to use feed_dict={yolo_model.input: ... , K.learning_phase(): 0})
out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes],
                                              feed_dict={yolo_model.input: image_data, K.learning_phase(): 0})

Besides TensorFlow, Keras also deserves a mention here: a third-party package built on top of TF that offers a richer set of functions

keras.backend.argmax(x, axis=-1)
keras.backend.max(x, axis=None, keepdims=False)
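
To see what the axis and keepdims arguments do, here is a small sketch that uses NumPy as a stand-in, on the assumption that the two Keras backend functions follow the same axis conventions as np.argmax and np.max. The array x is made-up data.

```python
import numpy as np

# Two rows of three "class scores" each (made-up numbers).
x = np.array([[0.1, 0.7, 0.2],
              [0.5, 0.3, 0.9]])

# axis=-1 reduces over the last axis: one result per row.
print(np.argmax(x, axis=-1))   # [1 2]
print(np.max(x, axis=-1))      # [0.7 0.9]

# axis=None reduces over every axis and returns a single value.
print(np.max(x, axis=None))    # 0.9

# keepdims=True keeps the reduced axis with length 1.
print(np.max(x, axis=-1, keepdims=True).shape)  # (2, 1)
```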

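NumPy also has some special tricks that are less common and not easy to understand. The np.eye call below turns the labels Y into one-hot vectors of length C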

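import numpy as np

def convert_to_one_hot(Y, C):
    # Row i of np.eye(C) is the one-hot vector for class i; indexing with the flattened labels picks one row per label.
    Y = np.eye(C)[Y.reshape(-1)]
    return Y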
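
A quick usage sketch of the same trick, with made-up labels. The function is repeated so the snippet runs on its own, and the variable names labels and one_hot are just for illustration.

```python
import numpy as np

def convert_to_one_hot(Y, C):
    # np.eye(C) is the C x C identity matrix; row i is the one-hot vector for class i.
    return np.eye(C)[Y.reshape(-1)]

labels = np.array([[1, 3, 0]])          # three labels, shape (1, 3)
one_hot = convert_to_one_hot(labels, 4) # four classes
print(one_hot.shape)  # (3, 4)
print(one_hot)
# [[0. 1. 0. 0.]
#  [0. 0. 0. 1.]
#  [1. 0. 0. 0.]]
```

Each label selects the matching row of the identity matrix, so the result has one row per label and C columns.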