State of Object Detection Models 2021 and more! | Newsletter by Victor Dibia - Issue #1
This edition is a roundup of the state of object detection models and self-supervised multimodal models!
State of Deep Learning for Object Detection - You Should Consider CenterNets!
I spent some time this month reviewing the state of practical deep learning models for object detection. I have found the TensorFlow Object Detection API to be a great starting point and was happy to see implementations of CenterNet included as part of the TensorFlow 2 model zoo.
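As a quick taste, below is a minimal sketch of loading one of the model zoo's CenterNet detectors through TensorFlow Hub and filtering its detections. The specific Hub handle and the 0.5 score threshold are illustrative choices, so check tfhub.dev for the CenterNet variant that fits your latency budget.

```python
# Minimal sketch: loading a CenterNet detector from the TF2 model zoo via
# TensorFlow Hub and running it on a (placeholder) image. The Hub handle and
# the 0.5 score threshold are illustrative, not prescriptive.
import tensorflow as tf
import tensorflow_hub as hub

detector = hub.load("https://tfhub.dev/tensorflow/centernet/hourglass_512x512/1")

# Model zoo saved models expect a uint8 batch of shape [1, height, width, 3].
image = tf.zeros([1, 512, 512, 3], dtype=tf.uint8)  # stand-in for a real image
outputs = detector(image)

boxes = outputs["detection_boxes"][0]      # [N, 4] normalized ymin, xmin, ymax, xmax
scores = outputs["detection_scores"][0]    # [N] confidence per detection
classes = outputs["detection_classes"][0]  # [N] COCO class ids

# Keep only confident detections.
keep = scores > 0.5
print(tf.boolean_mask(boxes, keep).numpy(), tf.boolean_mask(classes, keep).numpy())
```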
Some highlights are summarized below.
CenterNets and YOLOv5 appear to have very good accuracy/latency tradeoffs! In theory, these models should work well for low-latency applications! CenterNets can be fast and accurate because they take an "anchor-free" approach, predicting each object's center point as a keypoint and regressing the box size from it instead of scoring a dense grid of anchor boxes (see the sketch after these highlights).
MobileNet SSDv2 used to be the state of the art in terms of speed. CenterNets (keypoint version) are roughly 3.15x faster and deliver 2.06x the performance (mAP) of MobileNet SSDv2.
EfficientNet-based models (EfficientDet) provide the best overall performance (mAP of 51.2 for EfficientDet D6).
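To make the "anchor-free" idea concrete, here is a toy sketch of CenterNet-style decoding: peaks in a predicted center-point heatmap become boxes via a per-pixel width/height map. The function and tensor names are my own, and the real model additionally regresses sub-pixel center offsets and suppresses nearby peaks with max-pooling.

```python
# Toy sketch of anchor-free, CenterNet-style decoding (simplified; names are
# hypothetical): heatmap peaks are object centers, wh holds predicted box sizes.
import numpy as np

def decode_centers(heatmap, wh, threshold=0.3):
    """heatmap: [H, W] center scores; wh: [H, W, 2] predicted (width, height)."""
    ys, xs = np.where(heatmap > threshold)  # candidate object centers
    boxes = []
    for y, x in zip(ys, xs):
        w, h = wh[y, x]
        boxes.append((x - w / 2, y - h / 2, x + w / 2, y + h / 2))
    return boxes

# Toy example: one strong center at (row 4, col 5) with a 6x4 box.
hm = np.zeros((16, 16)); hm[4, 5] = 0.9
wh = np.zeros((16, 16, 2)); wh[4, 5] = (6, 4)
print(decode_centers(hm, wh))  # one box: (2.0, 2.0, 8.0, 6.0)
```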
Read the full blog post.
Recent Breakthroughs in AI (Karpathy, Johnson et al, Feb 2021)
This post is a summary of my notes from the Feb 11, 2021 discussion on Clubhouse titled Recent Breakthroughs in AI. The talk was moderated by Russell Kaplan (Scale AI) and the panel included Richard Socher (You.com, formerly Salesforce Research), Justin Johnson (University of Michigan, Facebook), and Andrej Karpathy (Tesla). The discussion mostly looked at the novelty of transformer-based multimodal models such as CLIP and DALL·E, which have both shown interesting results.
Below are some of the ideas I found really interesting.
Data is king! Getting better data might be the single biggest bang for the buck in terms of performance improvement.
Data curation toolkits and MLOps (and the companies in this space) will be increasingly important.
Transformers are unifying the deep learning problem/solution space, i.e., transformer-based model architectures can be effectively applied across multiple domains, e.g., image, text, and speech (see the sketch after this list).
Models that can be parallelized and optimized for today's hardware will have more impact.
New research frontiers? Models that learn continuously; models capable of logic/reasoning; new objective functions and application areas; new approaches to data labeling; a model-first approach to benchmark design; and new approaches to creating massive datasets (e.g., via simulation).
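On the unification point above, here is a minimal sketch (with arbitrary, illustrative layer sizes) of why transformers transfer across domains: text tokens and image patches both reduce to the same sequence-of-embeddings interface that a transformer block consumes.

```python
# Sketch of the "one architecture, many modalities" idea: the same transformer
# layers can consume any input once it is turned into a sequence of embeddings.
# All shapes and layer sizes below are arbitrary illustrative choices.
import tensorflow as tf

d_model = 64
attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=d_model)
norm = tf.keras.layers.LayerNormalization()

def transformer_block(x):
    # Self-attention + residual: modality-agnostic, it only ever sees a
    # [batch, sequence, d_model] tensor.
    return norm(x + attn(x, x))

# Text: token ids -> embedding lookup -> sequence of vectors.
tokens = tf.random.uniform([1, 12], maxval=1000, dtype=tf.int32)
text_seq = tf.keras.layers.Embedding(1000, d_model)(tokens)  # [1, 12, 64]

# Image: 16x16 patches -> flatten -> linear projection (ViT-style).
image = tf.random.normal([1, 64, 64, 3])
patches = tf.image.extract_patches(
    image, sizes=[1, 16, 16, 1], strides=[1, 16, 16, 1],
    rates=[1, 1, 1, 1], padding="VALID")                      # [1, 4, 4, 768]
image_seq = tf.keras.layers.Dense(d_model)(
    tf.reshape(patches, [1, 16, 16 * 16 * 3]))                # [1, 16, 64]

# The identical block processes both modalities.
print(transformer_block(text_seq).shape)   # (1, 12, 64)
print(transformer_block(image_seq).shape)  # (1, 16, 64)
```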
Read the full blog post!