Christmas Colloquium on Computer Vision 2015

Skoltech, Monday 28 December 2015, 14:00 — 18:00

Six researchers will present their recent work from ECCV’14, CVPR’15, ICCV’15, ICML’15 and ICLR’16 (submitted). The language of each talk will be chosen by the speaker.

The Center for Data-Intensive Science and Engineering (CDISE) of Skoltech invites you to participate (pre-registration required)!
To pre-register, please send an email with the subject ‘CCCV registration’ and your name to  (CC )

How to get to Skolkovo
On arriving at the Skolkovo parking, take the internal shuttle to Technopark. Technopark is a complex of four buildings; Skoltech is in the dark blue one.
“Emergency” phone: +7 916 391 47 73 (Victor Lempitsky)
The colloquium will be in room 407 (4th floor).


14:00 — 14:30
Asya Pentina, IST Austria, Vienna
Title: Curriculum learning of multiple tasks
Abstract: Sharing information between multiple tasks enables algorithms to achieve good generalization performance even from small amounts of training data. However, in a realistic scenario of multi-task learning not all tasks are equally related to each other, hence it could be advantageous to transfer information only between the most related tasks.
In this work we propose an approach that processes multiple tasks in a sequence, sharing information between subsequent tasks instead of solving all tasks jointly. Subsequently, we address the question of curriculum learning of tasks, i.e. finding the best order of tasks to be learned. Our approach is based on a generalisation bound criterion for choosing the task order that optimises the average expected classification performance over all tasks. Our experimental results show that learning multiple related tasks sequentially can be more effective than learning them jointly, that the order in which tasks are solved affects the overall performance, and that our model is able to automatically discover a favourable order of tasks.

14:30 — 15:00
Sergey Zagoruyko, École des Ponts ParisTech (talk based on work at Facebook AI Research)
Title: FAIRCNN MSCOCO object detection/segmentation challenge submission.
Abstract: Our approach is built on DeepMask proposals fed into the Fast R-CNN pipeline. The DeepMask proposals have been substantially improved to encourage proposal diversity and mask quality. We also augmented our CNN classifier with a novel foveal structure, skip-connections, an improved cost function that encourages better localization, and a few additional modifications. Finally, we use ensembling (model and inference) to further improve performance.

15:00 — 15:15

15:15 — 15:45
Anton Osokin, INRIA/École Normale Supérieure, Paris
Title: Context-aware CNNs for person head detection
Abstract: Person detection is a key problem for many computer vision tasks. While face detection has reached maturity, detecting people under a full variation of camera view-points, human poses, lighting conditions and occlusions is still a difficult challenge. In this work we focus on detecting human heads in natural scenes. Starting from the recent local R-CNN object detector, we extend it with two types of contextual cues. First, we leverage person-scene relations and propose a Global CNN model trained to predict positions and scales of heads directly from the full image. Second, we explicitly model pairwise relations among objects and train a Pairwise CNN model using a structured-output surrogate loss. The Local, Global and Pairwise models are combined into a joint CNN framework. To train and test our full model, we introduce a large dataset composed of 369,846 human heads annotated in 224,740 movie frames. We evaluate our method and demonstrate improvements in person head detection over several recent baselines on three datasets. We also show improvements in detection speed provided by our model.

15:45 — 16:15
Danila Potapov, INRIA-LEAR, Grenoble
Title: Category-specific video summarization
Abstract: In large video collections with clusters of typical categories, such as “birthday party” or “flash-mob”, category-specific video summarization can produce higher-quality video summaries than unsupervised approaches that are blind to the video category.
Given a video from a known category, our approach first efficiently performs a temporal segmentation into semantically-consistent segments, delimited not only by shot boundaries but also by general change points. Then, equipped with an SVM classifier, our approach assigns an importance score to each segment. The resulting video assembles the sequence of segments with the highest scores. The obtained video summary is therefore both short and highly informative. Experimental results on videos from the multimedia event detection (MED) dataset of TRECVID’11 show that our approach produces video summaries with higher relevance than the state of the art.
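The final assembly step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the segment tuples, the scores (which in the talk's setting would come from the SVM classifier), the duration budget `budget_sec`, and the greedy selection rule are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical representation: (start_sec, end_sec, importance_score).
Segment = Tuple[float, float, float]


def summarize(segments: List[Segment], budget_sec: float) -> List[Segment]:
    """Greedily keep the highest-scoring segments that fit in the budget,
    then return them in their original temporal order."""
    chosen: List[Segment] = []
    used = 0.0
    # Visit segments from most to least important (assumed selection rule).
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        length = seg[1] - seg[0]
        if used + length <= budget_sec:
            chosen.append(seg)
            used += length
    # Restore temporal order so the summary plays as a sequence.
    return sorted(chosen, key=lambda s: s[0])


# Made-up scores for illustration only.
segments = [
    (0.0, 4.0, 0.2),
    (4.0, 9.0, 0.9),
    (9.0, 12.0, 0.1),
    (12.0, 18.0, 0.7),
    (18.0, 20.0, 0.8),
]
summary = summarize(segments, budget_sec=8.0)
```

With an 8-second budget, the sketch keeps the two top-scoring segments that fit and skips the third-best one because it would exceed the budget.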

16:15 — 16:30

16:30 — 17:00
Michael Figurnov, Skoltech, Moscow
Title: PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions
Abstract: We propose a novel approach to reduce the computational cost of evaluating convolutional neural networks, a factor that has hindered their deployment on low-power devices such as mobile phones. Inspired by the loop perforation technique from source code optimization, we speed up the bottleneck convolutional layers by skipping their evaluation at some of the spatial positions. We propose and analyze several strategies for choosing these positions. Our method reduces the evaluation time of modern convolutional neural networks by 50% with a small decrease in accuracy. More details can be found in our ICLR 2016 submission.
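As a rough illustration of skipping convolution evaluation at some spatial positions, here is a minimal single-channel sketch in plain Python. It is not the authors' code: the function names, the hand-picked `keep` positions, and the choice to fill each skipped position with the value of the nearest evaluated one are assumptions made for this example (the abstract does not state how skipped outputs are filled).

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Pos = Tuple[int, int]


def conv_at(image: Matrix, kernel: Matrix, r: int, c: int) -> float:
    """Evaluate one output value of a 'valid' 2-D cross-correlation."""
    return sum(
        image[r + i][c + j] * kernel[i][j]
        for i in range(len(kernel))
        for j in range(len(kernel[0]))
    )


def perforated_conv(image: Matrix, kernel: Matrix, keep: List[Pos]) -> Matrix:
    """Evaluate the convolution only at the positions in `keep`; every other
    output copies the nearest evaluated value (assumed fill rule)."""
    out_h = len(image) - len(kernel) + 1
    out_w = len(image[0]) - len(kernel[0]) + 1
    # The expensive part: only len(keep) kernel evaluations are performed.
    computed: Dict[Pos, float] = {p: conv_at(image, kernel, *p) for p in keep}
    out: Matrix = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            nearest = min(computed, key=lambda p: (p[0] - r) ** 2 + (p[1] - c) ** 2)
            row.append(computed[nearest])
        out.append(row)
    return out


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, 1]]
keep = [(0, 0), (0, 2), (2, 0), (2, 2)]  # 4 of the 9 output positions
out = perforated_conv(image, kernel, keep)
```

Here only four of the nine outputs are actually computed; the remaining five are approximations, which is where the accuracy loss mentioned in the abstract would come from.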

17:00 — 17:30
Yaroslav Ganin, Skoltech, Moscow
Title: Unsupervised Domain Adaptation by Backpropagation
Abstract: Top-performing deep architectures are trained on massive amounts of labeled data. In the absence of labeled data for a certain task, domain adaptation often provides an attractive option given that labeled data of similar nature but from a different domain (e.g. synthetic images) are available. Here, we propose a new approach to domain adaptation in deep architectures that can be trained on a large amount of labeled data from the source domain and a large amount of unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of “deep” features that are (i) discriminative for the main learning task on the source domain and (ii) invariant with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a simple new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation. Overall, the approach can be implemented with little effort using any of the deep-learning packages. The method performs very well in a series of image classification experiments, achieving an adaptation effect in the presence of big domain shifts and outperforming the previous state of the art on the Office datasets. We also validate the approach for a descriptor learning task in the context of a person re-identification application.
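The gradient reversal layer mentioned in this abstract can be sketched in a few lines of framework-free Python. This is only an illustration under assumptions: the class name, the list-based "tensors", and the scaling factor `lam` are hypothetical choices for the example, not the authors' code or any package's API. The idea shown is that the layer passes its input through unchanged in the forward pass and flips the sign of the gradient in the backward pass.

```python
from typing import List


class GradientReversal:
    """Identity in the forward pass; negates (and scales) the gradient
    in the backward pass. `lam` is an assumed scaling hyperparameter."""

    def __init__(self, lam: float = 1.0) -> None:
        self.lam = lam

    def forward(self, x: List[float]) -> List[float]:
        # Features pass through unchanged to the domain classifier.
        return list(x)

    def backward(self, grad_output: List[float]) -> List[float]:
        # The feature extractor receives the reversed gradient, so a descent
        # step on the features works against the domain classifier.
        return [-self.lam * g for g in grad_output]


layer = GradientReversal(lam=2.0)
features = [1.0, -3.0]
passed = layer.forward(features)           # same values as `features`
reversed_grad = layer.backward([1.0, -3.0])  # sign flipped, scaled by lam
```

Placed between a feature extractor and a domain classifier, such a layer lets a single backpropagation pass train the classifier to separate the domains while pushing the features toward domain invariance.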