Abstract: Combining deep learning with computer vision has introduced a new paradigm for object detection. An analysis of deep learning-based object detection networks shows that the detection framework can be modularized into three parts: a feature extraction network, a multi-scale fusion network, and a prediction network. This paper analyzes and summarizes each module from this modular perspective and offers suggestions on how to build a suitable model framework according to practical requirements, providing a reference for research on deep learning-based object detection methods.
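As a rough illustration of the three-part decomposition described above, the following Python sketch composes a detector from three interchangeable callables. It is a toy under stated assumptions, not an implementation from this paper: the names `Detector`, `toy_backbone`, `toy_neck`, and `toy_head` are hypothetical, 1-D lists of floats stand in for real feature tensors, and pair averaging, nearest-neighbour upsampling, and a per-level maximum stand in for learned layers.

```python
from dataclasses import dataclass
from typing import Callable, List

FeatureMap = List[float]  # toy stand-in for a real feature tensor


@dataclass
class Detector:
    """Hypothetical container wiring the three modules together."""
    backbone: Callable[[List[float]], List[FeatureMap]]   # feature extraction network
    neck: Callable[[List[FeatureMap]], List[FeatureMap]]  # multi-scale fusion network
    head: Callable[[List[FeatureMap]], List[float]]       # prediction network

    def __call__(self, image: List[float]) -> List[float]:
        features = self.backbone(image)  # features at several scales
        fused = self.neck(features)      # combine information across scales
        return self.head(fused)          # one prediction per scale


def toy_backbone(image: List[float]) -> List[FeatureMap]:
    """Produce three scales by repeatedly averaging neighbouring pairs."""
    levels = []
    current = list(image)
    for _ in range(3):
        current = [(current[i] + current[i + 1]) / 2
                   for i in range(0, len(current) - 1, 2)]
        levels.append(current)
    return levels  # ordered from finest to coarsest


def toy_neck(levels: List[FeatureMap]) -> List[FeatureMap]:
    """Top-down fusion: upsample the coarser level and add it to the finer one."""
    fused = [levels[-1]]
    for finer in reversed(levels[:-1]):
        upsampled = [v for v in fused[0] for _ in range(2)]  # nearest-neighbour x2
        fused.insert(0, [a + b for a, b in zip(finer, upsampled)])
    return fused


def toy_head(levels: List[FeatureMap]) -> List[float]:
    """Emit one score per scale (here simply the maximum activation)."""
    return [max(level) for level in levels]


detector = Detector(backbone=toy_backbone, neck=toy_neck, head=toy_head)
scores = detector([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
```

Because each module is only an argument to `Detector`, any one of them can be swapped (for example, a different fusion function in place of `toy_neck`) without touching the other two, which is the practical benefit of viewing the framework modularly.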