Our lab runs a research seminar. Please visit us according to the schedule below.


Sparsification in NAS. Research seminar, December 7

Differentiable neural architecture search learns a weighted sum of candidate operations within each layer, across all layers. The final architecture is then obtained by keeping, in each layer, only the operation with the highest weight. Speaker Egor Shvetsov explained that this discretization step is problematic: operations within the weighted sum may co-adapt, so the evaluation of the discretized architecture does not match the relaxed one. Discretization-aware architecture search can improve this situation. Check out our YouTube channel to watch the seminar.
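The relaxed-versus-discretized gap can be illustrated in a few lines of numpy. This is a toy sketch, not code from the talk: the candidate operations and the architecture parameters `alphas` are made up for illustration.

```python
import numpy as np

# Hypothetical candidate operations competing within one layer.
candidate_ops = {
    "identity": lambda x: x,
    "double":   lambda x: 2.0 * x,
    "relu":     lambda x: np.maximum(x, 0.0),
}

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum of all candidate ops."""
    w = softmax(alphas)
    return sum(wi * op(x) for wi, op in zip(w, candidate_ops.values()))

def discretize(alphas):
    """Final architecture keeps only the op with the highest weight."""
    names = list(candidate_ops)
    return names[int(np.argmax(alphas))]

x = np.array([-1.0, 2.0])
alphas = np.array([0.1, 1.5, 0.3])   # learned architecture parameters
y_relaxed = mixed_op(x, alphas)      # what was actually trained
chosen = discretize(alphas)          # what gets evaluated after search
```

Because the other operations still contribute to `y_relaxed`, the output of the selected operation alone can differ noticeably, which is exactly why evaluating the discretized architecture can be misleading.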
Why is attention not so interpretable? Research seminar, October 1

Attention-based methods have played an important role in model interpretation, where the calculated attention weights are expected to highlight the critical parts of the input (e.g., keywords in sentences). However, recent research points out that attention-as-importance interpretations often do not work as well as we expect. For example, learned attention weights sometimes highlight less meaningful tokens like “[SEP]”, “,”, and “.”, and are frequently uncorrelated with other feature importance indicators like gradient-based measures. As a result, a debate on the effectiveness of attention-based interpretations has arisen. In our talk, we reveal that one root cause of this phenomenon can be ascribed to combinatorial shortcuts: the model may obtain information not only from the highlighted parts of the input but also from the attention weights themselves, so the attention weights are no longer pure importance indicators. We analyze combinatorial shortcuts theoretically, design an intuitive experiment to demonstrate their existence, and propose two methods to mitigate the issue.
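The shortcut idea can be illustrated with a toy numpy example (the token values and attention logits below are invented for illustration, not taken from the talk): even when two inputs have identical token content, different attention weights alone can carry class information to downstream layers.

```python
import numpy as np

def attention_pool(values, logits):
    """Softmax-attention pooling over token value vectors."""
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    return w @ values

# The SAME token values for both inputs: the content carries no class signal.
values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])

# ...but the (learned) attention logits differ between the two classes.
logits_class_a = np.array([2.0, 0.0, 0.0])
logits_class_b = np.array([0.0, 0.0, 2.0])

out_a = attention_pool(values, logits_class_a)
out_b = attention_pool(values, logits_class_b)
# The pooled outputs differ even though the token content is identical,
# so a classifier can separate the classes using the weights alone.
```

In this situation the weights act as an information channel rather than pure importance scores, which is the combinatorial shortcut the talk analyzes.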
Diffusion Probabilistic Models. Research seminar, July 16

In this seminar Alexander Kolesov looks at the specifics of generative models based on estimating the gradients of the data distribution. The speaker also explains Denoising Score Matching along with Annealed Langevin Dynamics.
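As a minimal sketch of the sampling side of this family of models, the loop below runs unadjusted Langevin dynamics using the closed-form score of a 1-D Gaussian in place of a trained score network; the annealed variant from the talk repeats such a loop over decreasing noise levels. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# For N(mu, sigma^2) the score is known exactly, so we can sanity-check
# Langevin dynamics without training anything.
mu, sigma = 3.0, 1.0
score = lambda x: (mu - x) / sigma**2   # grad_x log p(x)

def langevin(x0, score, step=0.05, n_steps=500):
    """x_{k+1} = x_k + (step/2) * score(x_k) + sqrt(step) * noise."""
    x = x0
    for _ in range(n_steps):
        x = x + 0.5 * step * score(x) + np.sqrt(step) * rng.normal(size=x.shape)
    return x

samples = langevin(np.zeros(5000), score)
# samples are approximately distributed as N(3, 1)
```

With a learned score model, `score` would be replaced by the network evaluated at the current noise level.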
Research Seminar Loss Surfaces of Deep Neural Networks. Part 1, July 9

In this talk, Alexey Zaytsev considers methods for examining the loss surfaces of deep neural networks. We start with basic tools and then proceed to more advanced ones. We also pay attention to possible applications of the obtained results in deep learning.
Neural ODE. Seminar 4, July 8

In this talk Alexey Okunev provided an overview of recent results in the Neural ODE world connected to particular neural network models: GANs and time series prediction.
Skoltech students defended their theses, June 24

We are happy to announce that our Bachelor and Masters students have successfully defended their theses. The defense procedures were arranged by Skoltech and the Institute for Information Transmission Problems of the Russian Academy of Sciences. Some graduates are determined to pursue their careers at Sberbank, whereas others have already been selected to pursue a PhD at Skoltech. We wish all of them the best of luck in all their future endeavors!
Neural ODE. Seminar 3, June 24

In the first part, we analyze the article “Training Generative Adversarial Networks by Solving Ordinary Differential Equations”. Its main thesis is as follows: the instability of GAN training is caused by sampling, whereas GAN training in continuous time (solving the ODE for the dependence of the GAN parameters on time) is stable. In the second part, we focus on time series modeling where data arrives at irregular intervals. We discuss models such as ODE-RNN, Latent ODE, and Neural CDE.
Neural ODE. Seminar 2, June 9

We start with a discussion of articles from NIPS and ICML 2020 about Neural ODE modifications. Then we discuss Neural ODE applications: continuous normalizing flows, how they generalize to manifolds, and finally how continuous normalizing flows are used to generate meshes.
Neural ODE. Seminar 1, June 2nd

In the first talk, we introduce the Neural ODE and its applications, as well as a simple modification named Augmented Neural ODE. Then we analyze recent articles from NIPS and ICML 2020 about how to properly train Neural ODEs. We take a look at various regularizations, examine why the standard backpropagation algorithm for Neural ODEs can work poorly due to the accumulation of numerical errors, and discuss how to deal with it.
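For intuition, a Neural ODE forward pass can be sketched with a fixed-step Euler solver. The two-matrix dynamics network below is a made-up stand-in for a learned f; practical implementations use adaptive solvers (e.g., those provided by torchdiffeq) precisely because of the numerical-error issues mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny dynamics network f(h, t) = tanh(h W1^T) W2^T.
W1 = rng.normal(size=(4, 2)) * 0.5
W2 = rng.normal(size=(2, 4)) * 0.5

def f(h, t):
    """dh/dt: a Neural ODE replaces a stack of residual blocks with
    continuous-time dynamics parameterized by a network."""
    return np.tanh(h @ W1.T) @ W2.T

def odeint_euler(f, h0, t0=0.0, t1=1.0, n_steps=100):
    """Simplest possible solver: h <- h + dt * f(h, t)."""
    h, t = h0, t0
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        h = h + dt * f(h, t)
        t += dt
    return h

h0 = np.array([1.0, -1.0])
h1 = odeint_euler(f, h0)   # output features at t = 1
```

Comparing solutions at different step counts gives a feel for the numerical error the seminar discusses: with smooth, bounded dynamics like these, halving the step changes the answer only slightly.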
Code Repair via Deep Learning, April 1st

Deep Learning (DL) techniques for Natural Language Processing have been evolving remarkably fast. Recently, the DL advances in language modeling, machine translation and paragraph understanding have become so prominent that the potential of DL in Software Engineering cannot be overlooked, especially in the field of program learning. During this seminar we looked at the basic problems in code analysis. Speaker Alexander Lukashevich touched on existing code analyzers and pointed to some common concerns about working with code. He placed an emphasis on the issues that users might encounter while doing code analysis. In addition, Alexander reviewed selected articles and projects on the topic.
Transformers for Long Sequences, March 3rd

On March 3rd the laboratory hosted a seminar on transformers for long sequences. Transformer architectures have attracted immense interest lately due to their effectiveness across various domains like language, computer vision, and reinforcement learning. The central idea of a transformer is the self-attention mechanism. It has quadratic complexity in the length of the sequence, so the approach scales poorly to long inputs. In the first part of the talk our speakers touched upon a broad range of faster versions of Transformers that claim to make the attention mechanism work for long sequences. The focus of the second part of the seminar was an experimental comparison of the algorithms: different papers use different datasets and end up with different results.
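The quadratic cost is easy to see in a minimal single-head self-attention sketch (projections and multiple heads are omitted for brevity; the input is random): the n-by-n score matrix is what grows quadratically with sequence length.

```python
import numpy as np

def self_attention(X):
    """Vanilla self-attention over a sequence of n vectors of dim d.
    The (n, n) score matrix dominates time and memory: O(n^2)."""
    scores = X @ X.T / np.sqrt(X.shape[1])        # (n, n) pairwise scores
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # rows sum to 1
    return weights @ X                            # weighted mix of tokens

n, d = 8, 4
X = np.random.default_rng(0).normal(size=(n, d))
out = self_attention(X)
# Doubling n quadruples the size of the score matrix, which is exactly
# what the faster Transformer variants try to avoid.
```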
Matern Gaussian Processes on Riemannian Manifolds, January 22nd

On January 22nd the laboratory held a seminar on the use of Gaussian processes for modeling functions on manifolds. Many machine learning problems require modeling functions defined on manifolds. For instance, in robotics the state of a joint of a robotic arm can be parameterized by a torus. The question arises about the use of Gaussian processes for modeling functions on manifolds, which is reduced to constructing covariance functions (kernels) on manifolds. The “naive” approach of replacing the Euclidean metric with a geodesic distance turns out to be untenable. However, the Matern kernels, which are most often used in practice, are associated with a certain stochastic differential equation, which allows them to be naturally generalized to the case of compact Riemannian manifolds. In the talk our experts discussed a way to represent these kernels in a tractable manner, which is based on the spectral theory of the Laplace-Beltrami operator.
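As a sketch of this spectral construction on the simplest compact manifold, the circle S¹: the Laplace-Beltrami eigenfunctions are the Fourier modes with eigenvalues n², and the kernel is a truncated series whose coefficients follow the Matern spectral density. The parameter names and truncation level below are illustrative, and the normalization is chosen so that k(x, x) = 1.

```python
import numpy as np

def matern_kernel_circle(t1, t2, nu=1.5, kappa=1.0, n_modes=50):
    """Matern kernel on the circle via a truncated spectral series:
    Fourier modes cos(n t), sin(n t) have Laplace-Beltrami eigenvalues n^2,
    and each mode is weighted by (2*nu/kappa^2 + n^2)^(-(nu + 1/2))."""
    d = t1 - t2
    n = np.arange(1, n_modes + 1)
    w0 = (2.0 * nu / kappa**2) ** (-(nu + 0.5))          # constant mode
    w = (2.0 * nu / kappa**2 + n**2.0) ** (-(nu + 0.5))  # higher modes
    k = w0 + 2.0 * np.sum(w * np.cos(n * d))
    k_norm = w0 + 2.0 * np.sum(w)                        # value at d = 0
    return k / k_norm
```

By construction the kernel is positive semi-definite (all spectral weights are positive), which is exactly what the naive geodesic substitution into a Euclidean Matern kernel fails to guarantee.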
Deep Learning Models with Monotonicity Constraints, October 30

In this seminar we looked at knowledge discovery in databases as an essential step to identify valid, novel, and useful patterns for decision making. There are many real-world scenarios, such as bankruptcy prediction, option pricing or medical diagnosis, where the classification models to be learned need to fulfill monotonicity restrictions (i.e., the target class label should not decrease when input attribute values increase). For instance, it is reasonable to assume that a higher debt ratio of a company should never result in a lower level of bankruptcy risk. Consequently, there is a growing interest in the data mining research community concerning monotonic predictive models.
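One standard way to hard-enforce such a constraint is to restrict the weights acting on the monotone feature to be non-negative, so the network is non-decreasing in that feature by construction. The tiny numpy sketch below illustrates the idea; the specific parameterization is an assumption for illustration, not the method from the seminar.

```python
import numpy as np

rng = np.random.default_rng(0)
raw_w = rng.normal(size=(4,))   # unconstrained parameters
b = rng.normal(size=(4,))
v = rng.normal(size=(4,))

def monotone_net(x):
    """Scalar-input network that is non-decreasing in x by construction:
    exp(.) makes the first-layer weights positive, ReLU is non-decreasing,
    and abs(.) makes the output weights non-negative."""
    w = np.exp(raw_w)
    hidden = np.maximum(w * x + b, 0.0)
    return np.abs(v) @ hidden
```

A composition of non-decreasing maps with non-negative mixing weights is itself non-decreasing, so e.g. a higher debt ratio can never lower the predicted risk score.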
Transformer Hawkes Process Models, September 17

During this session we discussed that capturing occurrence dynamics is crucial for predicting which type of event will happen next and when. Such data often exhibit complicated short-term and long-term temporal dependencies. A common way to model them is Hawkes processes. To enhance their capacity, recurrent neural networks (RNNs) have been incorporated due to RNNs’ successes in processing sequential data such as language. However, most of the existing RNN-based point process models fail to capture long-term dependencies and yield unreliable prediction performance, while recent evidence suggests that self-attention is more competent than RNNs in dealing with language, which motivates Transformer-based Hawkes process models.
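For reference, the conditional intensity of a classical univariate Hawkes process with an exponential kernel can be sketched in a few lines (the parameter values and event times are illustrative):

```python
import numpy as np

def hawkes_intensity(t, events, mu=0.2, alpha=0.8, beta=1.0):
    """lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)):
    a base rate mu plus a decaying excitation from every past event."""
    past = events[events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

events = np.array([1.0, 1.2, 3.5])
lam = hawkes_intensity(4.0, events)   # above mu: recent events raise the rate
```

Neural variants replace this fixed parametric excitation with an RNN- or Transformer-computed history representation, which is where the long-term dependency question from the talk comes in.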
Generative Prior Selection for Continual and Transfer Deep Learning, August 24

In this seminar our experts placed an emphasis on specific disadvantages of deep learning methods stemming from the fact that deep neural networks have a large number of configurable parameters. They require large high-quality labeled datasets to avoid the overfitting typical for small datasets. At the same time, training on incrementally arriving data leads to forgetting the previously learned distribution. Thus, deep learning models usually fail on small datasets or on an incrementally updating distribution over samples, the so-called Continual Learning setting. A Bayesian approach can be used to resolve these challenges. In particular, training on small datasets can benefit from a good initialization provided by an accurately chosen implicit prior over the parameters of a Bayesian neural network. At the same time, given a pre-trained prior, one can update the prior with new data arriving in the Continual Learning paradigm.

Sberbank Open Doors Day at Skoltech, 14th November

On 14th of November we held Sberbank Open Doors Day at Skoltech: a presentation of thesis topics for master students provided by Sberbank. About 15 students who had not yet decided on their advisors participated in the event. On the Sberbank side, the Risk department and the Robotics lab talked about their problems. On the Skoltech side, Maxim Panov and Alexey Zaytsev presented their view of how Data Science at Sberbank works.