Novel Optimization Methods for Computer Vision and Shape Analysis

Within the Carl von Linde Senior Fellowship, we were fortunate to attract two outstanding researchers as temporary professors to TU Munich: Prof. Jörg Stückler (now Max Planck Institute for Intelligent Systems, Tübingen) and Prof. Florian Bernard (now University of Bonn). Rather than giving a complete account of their research activities, we highlight below two publications in computer vision and shape analysis that resulted from the collaboration with Florian Bernard.

Focus Group Computer Vision & Machine Learning

Prof. Daniel Cremers (TUM), Alumnus Carl von Linde Senior Fellow

In [1], we developed a method for joint deep multi-graph matching and 3D geometry learning. Graph matching aims to establish correspondences between the vertices of graphs such that both the node and the edge attributes agree. Various learning-based methods have recently been proposed for finding correspondences between image key points based on deep graph matching formulations. While these approaches mainly focus on learning node and edge attributes, they ignore the 3D geometry of the objects depicted in the 2D images. We fill this gap by proposing a trainable framework that uses graph neural networks to learn a deformable 3D geometry model from inhomogeneous image collections, i.e., sets of images that depict different instances of objects from the same category. Experimentally, we demonstrate that our method outperforms recent learning-based approaches for graph matching in terms of both accuracy and cycle-consistency error, while also recovering the underlying 3D geometry of the objects depicted in the 2D images.
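To make the notion of cycle consistency concrete, the following minimal sketch checks whether matching graph A to C directly agrees with matching A to C via B. It is an illustration only: the function names, the index-list encoding of matchings, and the mismatch-fraction score are assumptions made here and are not taken from [1], whose exact error definition is not given in this text.

```python
def compose(p_ab, p_bc):
    """Compose two matchings stored as index lists.

    p_ab[i] is the vertex of graph B matched to vertex i of graph A
    (hypothetical encoding chosen for this sketch).
    """
    return [p_bc[j] for j in p_ab]


def cycle_consistency_error(p_ab, p_bc, p_ac):
    """Fraction of vertices of A for which A->B->C disagrees with A->C."""
    via_b = compose(p_ab, p_bc)
    mismatches = sum(1 for x, y in zip(via_b, p_ac) if x != y)
    return mismatches / len(p_ac)


# Toy matchings between three graphs with three vertices each.
p_ab = [1, 0, 2]
p_bc = [2, 0, 1]
p_ac_consistent = compose(p_ab, p_bc)   # agrees with the path via B by construction
p_ac_inconsistent = [0, 1, 2]           # a direct matching that contradicts the path via B

err_consistent = cycle_consistency_error(p_ab, p_bc, p_ac_consistent)
err_inconsistent = cycle_consistency_error(p_ab, p_bc, p_ac_inconsistent)
```

A multi-graph matching method is cycle-consistent when this kind of error vanishes for every triple of graphs in the collection.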

Figure 1

In [2], we developed a unified framework for implicit Sinkhorn differentiation. The Sinkhorn algorithm is a classical iterative algorithm for solving entropy-regularized optimal transport problems. Optimal transport (also known as the earth mover's distance) is a formalism for computing correspondences, a central component of many computer vision and shape analysis methods. As a result, the Sinkhorn operator has recently experienced a surge of popularity in computer vision and related fields. One major reason is its ease of integration into deep learning frameworks. To allow for efficient training of the respective neural networks, we propose an algorithm that obtains analytical gradients of a Sinkhorn layer via implicit differentiation. In comparison to prior work, our framework is based on the most general formulation of the Sinkhorn operator: it allows for any type of loss function, and the target capacities and cost matrices are differentiated jointly.
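For readers unfamiliar with the Sinkhorn algorithm, the forward iteration can be sketched in a few lines. This is the textbook matrix-scaling scheme written in plain Python, not the implementation from [2], and it covers only the forward pass, not the implicit gradient. The function name, the regularization strength `eps`, the iteration count, and the toy cost matrix are assumptions chosen for illustration.

```python
import math


def sinkhorn(cost, a, b, eps=0.5, n_iters=500):
    """Approximate the entropy-regularized transport plan between marginals a and b."""
    m, n = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(n)] for i in range(m)]
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(n_iters):
        # Rescale rows so that the plan's row sums match a
        u = [a[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(m)]
        # Rescale columns so that the plan's column sums match b
        v = [b[j] / sum(K[i][j] * u[i] for i in range(m)) for j in range(n)]
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(m)]


# Toy problem: three source and three target bins, cost = distance between bin indices.
cost = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 1.0],
        [2.0, 1.0, 0.0]]
a = [0.5, 0.3, 0.2]   # source capacities
b = [0.2, 0.3, 0.5]   # target capacities
plan = sinkhorn(cost, a, b)
```

Because every step is a differentiable elementwise operation or a sum, the loop can be unrolled inside a deep learning framework, which is what makes the operator easy to integrate. The price of unrolling is that all intermediate iterates must be kept for backpropagation.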

Figure 2

We further construct error bounds of the resulting algorithm for approximate inputs. Finally, we demonstrate that for a number of applications, simply replacing automatic differentiation with our algorithm directly improves the stability and accuracy of the obtained gradients. Moreover, we show that it is computationally more efficient, particularly when resources like GPU memory are scarce.
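The general idea behind implicit differentiation can be illustrated on a scalar fixed-point problem. The sketch below is not the Sinkhorn-specific derivation of [2]; the update rule `0.5 * cos(x) + theta` and all function names are hypothetical choices made for this example. It shows that the gradient of a converged solution can be computed from the solution alone, without storing the iterates that produced it.

```python
import math


def solve_fixed_point(theta, n_iters=100):
    """Iterate x <- f(x, theta) = 0.5 * cos(x) + theta until convergence."""
    x = 0.0
    for _ in range(n_iters):
        x = 0.5 * math.cos(x) + theta
    return x


def implicit_grad(theta):
    """Derivative of the fixed point x*(theta) via the implicit function theorem.

    Differentiating x* = f(x*, theta) on both sides gives
    dx*/dtheta = (df/dtheta) / (1 - df/dx), evaluated at x*.
    """
    x_star = solve_fixed_point(theta)
    df_dx = -0.5 * math.sin(x_star)
    df_dtheta = 1.0
    return df_dtheta / (1.0 - df_dx)


theta = 0.3
grad_implicit = implicit_grad(theta)

# Central finite difference through the full solver, used only as a sanity check.
h = 1e-6
grad_fd = (solve_fixed_point(theta + h) - solve_fixed_point(theta - h)) / (2.0 * h)
```

The implicit gradient needs only the converged value `x_star`, whereas automatic differentiation through the loop would have to retain every iterate. This is the kind of memory saving that matters when GPU memory is scarce.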

[1]
Ye, Z., Yenamandra, T., Bernard, F. & Cremers, D. Joint Deep Multi-Graph Matching and 3D Geometry Learning from Inhomogeneous 2D Image Collections. AAAI (2022).

[2]
Eisenberger, M., Toker, A., Leal-Taixé, L., Bernard, F. & Cremers, D. A Unified Framework for Implicit Sinkhorn Differentiation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).
