About me

My name is Ivan Bogun. I am a 4th-year PhD student working towards my degree in the Computer Science department at FIT under the supervision of Dr. Eraldo Ribeiro. My research interests include applications of sparsity-seeking convex optimization to computer vision and machine learning. Recently, I have been interested in model-free tracking. In my free time I like to practice guitar, play video games, or take courses on Coursera.


  • 09/07/2015 The trackers I submitted to the OpenCV Challenge 2015 took second place ( Results ). They are called experimental_0 and experimental_1.
  • 08/28/2015 Spent the summer at Google Brain in Mountain View working with Anelia Angelova on the problem of object recognition from short videos using motion ( Project webpage)
  • 03/04/2015 Presented a talk titled "From attention to object proposals" as part of my depth exam ( survey paper; presentation)
  • 01/28/2015 Unfortunately, my CVPR paper submission was rejected because the results were deemed insignificant. I have to do a better job next time!
  • 11/15/2014 Presented some of my research on visual tracking at the FIT CS seminar ( presentation)


Object-aware tracking

Building a discriminative appearance model of the object in model-free tracking is a hard problem due to limited information about the object, abrupt motion, and deformation. Here, we propose an objectness prior to help the tracker choose the bounding box most likely to contain an object. The objectness prior is based on a straddling measure defined on superpixels and on edge density. Building on the Structured tracker with the Robust Kalman filter, we show that the prior substantially improves the success metric (intersection over union) compared to the tracker without it on the Wu 2013 dataset, and that the resulting tracker performs comparably to the state-of-the-art.
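The two cues can be combined into a single score per candidate box. Below is a minimal sketch of that idea; the exact definitions and weighting in the paper may differ, and the linear combination weight `alpha` is purely illustrative:

```python
import numpy as np

def straddling_score(box, superpixels):
    """Superpixel straddling: boxes that cut through many superpixels are
    penalized. box = (x0, y0, x1, y1); superpixels = integer label map."""
    x0, y0, x1, y1 = box
    inside = superpixels[y0:y1, x0:x1]
    area = inside.size
    penalty = 0.0
    for label in np.unique(inside):
        in_count = np.count_nonzero(inside == label)
        total = np.count_nonzero(superpixels == label)
        # a superpixel straddles the box if it is partly in, partly out
        penalty += min(in_count, total - in_count)
    return 1.0 - penalty / area

def edge_density(box, edges):
    """Fraction of edge pixels inside the box (edges = binary edge map)."""
    x0, y0, x1, y1 = box
    return edges[y0:y1, x0:x1].mean()

def objectness(box, superpixels, edges, alpha=0.5):
    # illustrative linear combination of the two cues
    return alpha * straddling_score(box, superpixels) \
        + (1 - alpha) * edge_density(box, edges)
```

A box aligned with superpixel boundaries and rich in edges scores near 1, which is the behavior the prior rewards when re-ranking candidate boxes.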

Improved Structured Tracking-by-Detection Using Robust Kalman Filter

In this paper we extend a tracker based on the Structured SVM, known as Struck, to output bounding boxes at multiple scales. Furthermore, to decrease the chance of false-negative detections and to make the tracker resilient to short-term occlusions, we spatially smooth the tracking results with a Robust Kalman filter. A special update strategy is developed for the tracker, designed to reduce overfitting and to allow the tracker to reacquire a lost track. We thoroughly evaluate the method and perform sensitivity analysis on different benchmarks with different evaluation protocols. Our results show that our method establishes a new state-of-the-art on both datasets.
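One way to realize this kind of robust smoothing is a constant-velocity Kalman filter over box centers that inflates the measurement noise whenever the innovation is improbably large, so an outlier detection (e.g., during an occlusion) barely moves the state. This is a sketch under those assumptions, not the paper's exact filter; the noise levels and gate value are illustrative:

```python
import numpy as np

class RobustKalman:
    """Constant-velocity Kalman filter over box centers. Measurements whose
    Mahalanobis distance exceeds a gate are treated as outliers: their
    noise covariance is inflated so they are heavily downweighted."""

    def __init__(self, x0, q=1.0, r=4.0, gate=9.0):
        self.x = np.array([x0[0], x0[1], 0.0, 0.0])  # [cx, cy, vx, vy]
        self.P = np.eye(4) * 10.0
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0            # constant-velocity motion
        self.H = np.eye(2, 4)                        # observe position only
        self.Q = np.eye(4) * q
        self.R = np.eye(2) * r
        self.gate = gate

    def step(self, z):
        # predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # innovation and its Mahalanobis distance
        y = np.asarray(z) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        d2 = y @ np.linalg.solve(S, y)
        if d2 > self.gate:
            # robust step: inflate R in proportion to the outlier's distance
            S = self.H @ self.P @ self.H.T + self.R * (d2 / self.gate)
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

Fed a track with a single wild detection, the smoothed center stays close to the motion model instead of jumping to the outlier.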

Object Recognition from Short Videos for Robotic Perception

Deep neural networks have become the primary learning technique for object recognition. Videos, unlike still images, are temporally coherent, which makes applying deep networks to them non-trivial. Here, we investigate how motion can aid object recognition in short videos. Our approach is based on Long Short-Term Memory (LSTM) deep networks. Unlike previous applications of LSTMs, we implement each gate as a convolution. We show that convolution-based LSTM models are capable of learning motion dependencies and improve recognition accuracy as more frames of a sequence become available. We evaluate our approach on the Washington RGBD Object dataset and on the Washington RGBD Scenes dataset. Our approach outperforms deep nets applied to still images and sets a new state-of-the-art in this domain.
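A single step of such a convolutional LSTM cell can be sketched as follows. The naive convolution loop and the kernel shapes are illustrative only, not the trained architecture; the point is that every gate is a convolution over feature maps rather than a dense layer:

```python
import numpy as np

def conv2d_same(x, w):
    """'Same'-padded 2-D convolution of feature maps x (C,H,W) with
    kernels w (O,C,k,k); a slow reference loop, for clarity only."""
    O, C, k, _ = w.shape
    H, W = x.shape[1:]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((O, H, W))
    for o in range(O):
        for c in range(C):
            for i in range(H):
                for j in range(W):
                    out[o, i, j] += np.sum(xp[c, i:i + k, j:j + k] * w[o, c])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, Wx, Wh, b):
    """One ConvLSTM step: gates are convolutions of the input x and the
    previous hidden state h. Wx and Wh stack the kernels of the input,
    forget, output, and candidate gates along the output axis (4*Ch)."""
    Ch = h.shape[0]
    z = conv2d_same(x, Wx) + conv2d_same(h, Wh) + b[:, None, None]
    i = sigmoid(z[:Ch])            # input gate
    f = sigmoid(z[Ch:2 * Ch])      # forget gate
    o = sigmoid(z[2 * Ch:3 * Ch])  # output gate
    g = np.tanh(z[3 * Ch:])        # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Because the state keeps its spatial layout, the cell can propagate where things move between frames, which is exactly the motion information still-image networks discard.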

Movie Twitter Network

We build a network of words from tweets about the movie “Captain Phillips”. A separate network was built for each week tweets were downloaded (4 weeks in total). We show that all resulting networks are scale-free. We apply sentiment analysis to determine movie ratings and extend it to network communities. Finally, we show how community sentiments can be used to track their “positiveness”.
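One simple way to build such a word network (the project's exact construction may differ) is to link words that co-occur in a tweet; the scale-free claim is then a statement about the heavy tail of the resulting degree distribution:

```python
from collections import Counter
from itertools import combinations

def word_network(tweets):
    """Co-occurrence word network: nodes are words, an edge links two
    words appearing in the same tweet; edge weights count co-occurrences.
    This construction is illustrative, not necessarily the project's."""
    edges = Counter()
    for tweet in tweets:
        words = set(tweet.lower().split())
        for u, v in combinations(sorted(words), 2):
            edges[(u, v)] += 1
    return edges

def degree_distribution(edges):
    """Number of distinct neighbors per word; in a scale-free network
    the tail of this distribution follows a power law P(k) ~ k^-gamma."""
    deg = Counter()
    for (u, v) in edges:
        deg[u] += 1
        deg[v] += 1
    return deg
```

Plotting `degree_distribution` on log-log axes is the usual quick check for scale-free structure before fitting an exponent.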

Interaction recognition using sparse portraits

We study the problem of human-object interaction recognition. Using trajectories as the main source of information, we propose new methods for extracting and processing them. Via Robust Principal Component Analysis we separate the moving and non-moving parts of the video and use the moving parts for trajectory extraction. The novelty is that trajectories are extracted without using detectors or trackers. We show how the object involved in the interaction can be localized and learn its probabilistic appearance model. Finally, classification is performed using a one-vs-all SVM. We conclude the paper with experimental results on the Gupta dataset.
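The moving/non-moving separation rests on Robust PCA: frames stacked as columns decompose into a low-rank part (static background) plus a sparse part (moving pixels). A basic inexact-ALM Principal Component Pursuit iteration sketches the idea; the parameter defaults follow common choices and are not the paper's implementation:

```python
import numpy as np

def rpca(M, lam=None, iters=200):
    """Robust PCA sketch: split M into low-rank L + sparse S by
    alternating singular-value thresholding and soft-thresholding,
    with a Lagrange multiplier Y enforcing M = L + S."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = (m * n) / (4.0 * np.abs(M).sum() + 1e-12)
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(iters):
        # singular-value thresholding gives the low-rank update
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0)) @ Vt
        # entrywise soft-thresholding gives the sparse update
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0)
        Y = Y + mu * (M - L - S)
        mu = min(mu * 1.05, 1e7)  # gradually tighten the constraint
    return L, S
```

Applied to a video matrix, the support of S marks moving pixels, which is what makes detector-free trajectory extraction possible.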

Recognizing Human-Object Interactions Using Sparse Subspace Clustering

In this paper we investigate the problem of human-object interaction recognition. Using trajectory information alone, we propose an unsupervised framework for interaction clustering. Our method is based on the algorithm of [1], first developed for motion segmentation. We show that each interaction can be seen as trajectories lying in a low-dimensional subspace and that subspace clustering is able to recover these subspaces. Experimental results on the Gupta dataset [2] show that our approach is comparable to the state of the art [3].
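The self-expressiveness idea behind such subspace clustering can be sketched as follows: each trajectory (a column of X) is written as a sparse combination of the other trajectories, and the coefficient magnitudes define an affinity on which spectral clustering recovers the subspaces. The plain ISTA solver and `lam` below are illustrative, not the paper's exact optimization:

```python
import numpy as np

def ssc_affinity(X, lam=0.1, iters=500):
    """Sparse self-expression: for every column x_i, solve
    min (1/2)||X c - x_i||^2 + lam*||c||_1 subject to c_i = 0
    (here via ISTA), then symmetrize |C| into an affinity matrix."""
    n = X.shape[1]
    C = np.zeros((n, n))
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1e-12)  # 1 / Lipschitz const.
    for i in range(n):
        c = np.zeros(n)
        for _ in range(iters):
            grad = X.T @ (X @ c - X[:, i])
            c = c - step * grad
            # proximal step for the l1 penalty (soft-thresholding)
            c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0)
            c[i] = 0.0  # a point may not represent itself
        C[:, i] = c
    return np.abs(C) + np.abs(C).T
```

Points in the same subspace can represent each other, so the affinity is (ideally) block-diagonal; running spectral clustering on it then yields one cluster per interaction.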