In this paper [1], the authors propose a two-stage deep neural network for real-time background matting. The task is separating a subject from its background with a per-pixel alpha matte rather than a hard segmentation mask. Their approach runs at 60 FPS on HD images (1920×1080) and 30 FPS on 4K images (3840×2160), measured on a GPU.
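The snippet below puts these frame rates in perspective. It converts each reported figure into a per-frame time budget and a pixel throughput. This includes the 8 FPS at 512×512 reported for earlier fine-grained approaches. It is plain arithmetic on the numbers stated in this summary; nothing here is a new measurement.

```python
# Per-frame time budget and pixel throughput implied by the reported frame rates.
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to process one frame at a given frame rate."""
    return 1000.0 / fps


def megapixels_per_second(width: int, height: int, fps: float) -> float:
    """Pixels processed per second, in millions."""
    return width * height * fps / 1e6


settings = {
    "HD @ 60 FPS": (1920, 1080, 60),
    "4K @ 30 FPS": (3840, 2160, 30),
    "512x512 @ 8 FPS (prior work)": (512, 512, 8),
}

for name, (w, h, fps) in settings.items():
    print(f"{name}: {frame_budget_ms(fps):.1f} ms/frame, "
          f"{megapixels_per_second(w, h, fps):.1f} MP/s")

# Output:
# HD @ 60 FPS: 16.7 ms/frame, 124.4 MP/s
# 4K @ 30 FPS: 33.3 ms/frame, 248.8 MP/s
# 512x512 @ 8 FPS (prior work): 125.0 ms/frame, 2.1 MP/s
```

By this arithmetic, 4K at 30 FPS processes about twice as many pixels per second as HD at 60 FPS. That is roughly 120× the pixel rate of the 512×512 baseline. This is consistent with the design choice of doing expensive full-resolution work on only part of the image.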
- The proposed model handles hair and fine subject boundaries much better than current approaches. Think of how Zoom might crop out strands of hair, or break down when your hand is close to your face or the subject is otherwise occluded.
- They improve the speed/latency state of the art for processing large images. Previous approaches that attempt fine-grained matting reach about 8 FPS on 512×512 images, which is pretty much unusable for live video. This approach reaches 60 FPS at HD and 30 FPS at 4K.
- They achieve these speed gains with a two-stage network. The first (base) network operates on a downsampled copy of the image and outputs a low-resolution alpha matte plus an error prediction map. The second (refinement) network takes this low-resolution result and the original image. It produces high-resolution output (fine-grained detail) only for the regions the error map flags as likely to be wrong.
- They compare their approach against several existing methods and build a Zoom plugin that pipes the model's output into Zoom.
- They provide sample code to reproduce their results and experiments via notebooks.
- The system needs a captured image of the background without the subject in it to work well. This is not a huge issue, but it adds a step (capturing the background image) that may hurt usability.
- The reported results (60 FPS at HD and 30 FPS at 4K) were measured on an NVIDIA RTX 2080 Ti GPU. This suggests the model may still be unusable on CPU-only machines, which make up the majority of user environments.
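The two-stage design described above can be sketched as follows. This is a toy illustration of the control flow, not the authors' implementation. It assumes NumPy. `coarse_net` and `refine_patch` are hypothetical stand-ins for the trained base and refinement networks; both simply compare the frame with the background. Like the real system, the sketch takes both the frame and a background image as input. The paper's base network also predicts a foreground residual and hidden features, which this sketch omits.

```python
import numpy as np


def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool an (H, W, C) image by an integer factor (H, W divisible by factor)."""
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))


def upsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of an (h, w) map."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


def coarse_net(src_small, bgr_small, gain=2.0):
    """HYPOTHETICAL stand-in for the base network: alpha from the difference
    to the background, and an 'error' map that peaks where alpha is near 0.5."""
    alpha = np.clip(np.abs(src_small - bgr_small).mean(axis=-1) * gain, 0.0, 1.0)
    error = 1.0 - np.abs(2.0 * alpha - 1.0)
    return alpha, error


def refine_patch(src_patch, bgr_patch, gain=2.0):
    """HYPOTHETICAL stand-in for the refinement network, run at full resolution."""
    return np.clip(np.abs(src_patch - bgr_patch).mean(axis=-1) * gain, 0.0, 1.0)


def two_stage_matte(src, bgr, factor=4, patch=4, k=8):
    """Coarse matte at low resolution, then recompute only the k highest-error patches."""
    h, w, _ = src.shape
    assert h % factor == 0 and w % factor == 0 and h % patch == 0 and w % patch == 0
    # Stage 1: cheap prediction on the downsampled frame and background.
    alpha_small, err_small = coarse_net(downsample(src, factor), downsample(bgr, factor))
    alpha = upsample(alpha_small, factor)  # full-resolution coarse matte
    # Aggregate the predicted error over non-overlapping patch x patch tiles.
    err = upsample(err_small, factor)
    tiles = err.reshape(h // patch, patch, w // patch, patch).sum(axis=(1, 3))
    # Stage 2: refine only the k tiles with the largest predicted error.
    for idx in np.argsort(tiles, axis=None)[::-1][:k]:
        ty, tx = divmod(int(idx), tiles.shape[1])
        ys = slice(ty * patch, (ty + 1) * patch)
        xs = slice(tx * patch, (tx + 1) * patch)
        alpha[ys, xs] = refine_patch(src[ys, xs], bgr[ys, xs])
    return alpha


# Toy example: a square "subject" whose edges do not align with the 4x4 grid.
rng = np.random.default_rng(0)
bgr = rng.random((32, 32, 3))
src = bgr.copy()
src[6:22, 6:22] += 0.5
alpha = two_stage_matte(src, bgr, k=16)
print(alpha.shape)  # (32, 32)
```

With `k=16`, every tile straddling the subject's edge is refined, so the matte matches the toy ground truth. With `k=0`, the coarse matte leaves blocks of intermediate values along the boundary. In this sketch the full-resolution work grows with `k` rather than with the image size, which is the property the two-stage design exploits.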
[1] Lin, S., Ryabtsev, A., Sengupta, S., Curless, B., Seitz, S., & Kemelmacher-Shlizerman, I. (2020). Real-Time High-Resolution Background Matting. arXiv preprint arXiv:2012.07810. CVPR 2021.