Efficient repair and privacy in distributed storage systems

The focus of our group is on coding theory and its application in different settings.

Focus Group Coding for Communications and Data Storage (COD)

Prof. Antonia Wachter-Zeh (TUM), Alumna Rudolf Mößbauer Tenure Track Professor | Prof. Camilla Hollanti (Aalto University), Alumna Hans Fischer Fellow | Dr. Ragnar Freij-Hollanti, Dr. Nikita Polianskii, Dr. Sven Puchinger, (TUM), Postdoctoral Researchers | Haider Al Kim, Lukas Holzbaur, Hendongliang Liu, (TUM), Doctoral Candidates | Host: Coding for Communications and Data Storage, TUM

In distributed storage, algebraic codes are used to protect the system against data loss in the event of a server failure. While this significantly reduces the storage overhead compared to the trivial solution of replicating files, one downside is that the process of recovering from a server failure must include a large number of surviving servers. To remedy this problem, tremendous attention has been directed toward the study of codes with locality, which make it possible to compensate for the likely event of a single or very few server failures while only contacting a small number of surviving nodes. The second downside to employing algebraic codes is the large amount of traffic between the servers required for the recovery of a failed server. To this end, researchers have proposed solutions that reduce this traffic. In [1] we are looking to get the best of both worlds and propose a combination of the two approaches. Here, any single server failure can be recovered from a small set of surviving nodes, while only transmitting the minimal amount of data possible.

Another major concern in today's connected world is the privacy of the users. However, it is not only the data of a given user that is subject to privacy concerns. Consider a distributed storage system offering a potentially large set of users access to a number of files such as movies, stock prices, or medical data. Unless it is an open public service, any user must be identifiable by the database. Consequently, if no additional precautions are taken, the database operator has full knowledge about the files requested by any given user. The problem of protecting the identity of the files requested by a user is referred to as private information retrieval (PIR). In [2] we introduce a new scheme based on cryptographic principles with the goal of providing privacy against an attacker (curious database) with access to all servers in the database. Another model that has received considerable attention in literature is a setting where the attacker only has access to a subset of servers in the database, which is not known to the user. Under this assumption it is possible to guarantee perfect privacy, and in [3] we investigate the best achievable rate, i.e., the minimal size of the total download required to privately obtain a file of a given size. Specifically, we consider the setting where the database employs an algebraic code (for protection against data loss) and the user is able to request specific functions of the shares of the data stored on each server. We also investigate PIR robust against adversaries, where some of the servers do not correctly serve the request but actively attempt to prevent the user from obtaining the correct file. Furthermore, we consider the case of symmetric privacy, where the database is interested in allowing the user to obtain the requested file but no information about any other file. This model is of practical interest in applications where, for example, the user is required to pay for each requested file.

In a classical setting, providing a user with privacy means that all potentially sensitive data or requests need to be hidden, as it is not possible to determine whether a curious database has read the information. However, in a quantum setting the rules change. A quantum system, e.g., a quantum bit (qubit), is a superposition of states that can be measured to obtain the contained information. This measurement consumes the qubit; in other words, a qubit can be measured exactly once. In a model considering attackers that are honest (i.e., need to conform to the protocol) but curious (i.e., interested in the private data or request), this can be exploited, as it is possible to guarantee that the information in a quantum system can only be obtained by destroying it.

In [4] we consider the problem of quantum PIR, in a setting similar to that of [3], but where servers respond with quantum systems. This allows for significantly increasing the communication rate and, in some cases, it is possible to enable private retrieval of a file with no communication overhead.

Figures 1 and 2

Figure 1, Illustration of a distributed storage system with locality.

Figure 2, Illustration of PIR from multiple servers.

[1]
L. Holzbaur, S. Puchinger, E. Yaakobi and A. Wachter-Zeh, “Partial MDS codes with local regeneration”, in 2020 IEEE International Symposium on Information Theory (ISIT), 2020, pp. 628-633.

[2]
L. Holzbaur, R. Freij-Hollanti and C. Hollanti, “On the Capacity of Private Information Retrieval from Coded, Colluding, and Adversarial Servers”, 2019 IEEE Information Theory Workshop (ITW), 2019.

[3]
L. Holzbaur, C. Hollanti and A. Wachter-Zeh, “Computational Code-Based Single-Server Private Information Retrieval”, in 2020 IEEE International Symposium on Information Theory (ISIT), 2020, pp. 1065-1070.

[4]
M. Allaix, L. Holzbaur, T. Pllaha and C. Hollanti, “Quantum Private Information Retrieval from Coded and Colluding Servers”, IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 2, pp. 599-610, 2020.