Recognizing numbers (without seeing what they are)
Usually, when we think of encryption, we think of data locked inside a safe. You can carry the safe around, but you can only access its contents if you have the key. Information stays secure that way. During the second summer of my graduate degree, I was introduced to Homomorphic Encryption (HE) – an encryption scheme that not only keeps information secure but also allows mathematical operations to be performed directly on the encrypted data. Information can not only be kept secure but transformed as well, which opens up exciting avenues for secure computing. The usual approach to processing sensitive data is to set up a sandboxed environment with some network security, or even a completely offline one. Even so, we keep seeing frequent data leaks and thefts.
A solution(?)
Secure computing on HE data is inherently secure because the data stays encrypted – the computations are performed without ever decrypting it. Here's a brief explanation of how it works. Imagine a 2D Cartesian plane with its usual basis vectors (1, 0) and (0, 1) and a grid drawn on it. Shearing this plane changes its landscape, but the gridlines stay parallel and evenly spaced because the transformation applied (shearing) is linear. While such a linear transformation changes the basis vectors and the shape of the grid, points of the original grid still land on points of the sheared grid. Multiplying a value on the plane by a whole number, or adding one to it, still gives a point on that same grid – just scaled or translated by the specified amount. Knowing the basis vectors of the sheared plane, we can map any point back to its value in the standard basis. Homomorphic encryption uses the (skewed) basis vectors as its secret key. Without knowing them, one would have to search an enormous space of candidate planes to identify the correct one. This is the basis of HE, which is a lattice-based cryptography scheme. I have simplified most of the workings to explain the concept; there is a lot more that goes into homomorphic encryption (addition of noise, multiplicative depth). You could read Craig Gentry's dissertation (Chapter 6) for more information on how lattice-based cryptography works.
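To make the basis intuition concrete, here is a toy numpy sketch of my own – not a real HE scheme (there is no noise and no modular arithmetic) – showing that a point expressed in a secret, skewed basis is easy to decode with the basis and meaningless without it, and that the encoding is linear, which is the homomorphic property in miniature:

```python
import numpy as np

# Toy illustration only, not a real HE scheme (no noise, no modular arithmetic).
secret_basis = np.array([[2.0, 1.0],
                         [1.0, 3.0]])          # assumed "skewed" basis = secret key

message = np.array([4.0, 7.0])                 # coordinates in the secret basis
ciphertext = secret_basis @ message            # the point in standard coordinates

# Knowing the key, invert the basis to recover the original coordinates.
recovered = np.linalg.solve(secret_basis, ciphertext)
print(recovered)                               # -> [4. 7.]

# Because the map is linear, adding two "ciphertexts" gives the encoding of the
# sum of the two messages.
other = secret_basis @ np.array([1.0, 2.0])
print(np.linalg.solve(secret_basis, ciphertext + other))   # -> [5. 9.]
```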
There’s no such thing as free lunch
There are, however, a few restrictions to look out for. Early HE schemes supported only additions and multiplications on integers; only more recent research has added support for floating-point numbers. This also makes approximate division possible – dividing by a known constant becomes a multiplication by its reciprocal.
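As a tiny illustration of that trick (my own example, written in plain Python rather than with any particular HE library):

```python
# Division by a known plaintext constant is replaced by a single multiplication
# with its precomputed reciprocal. In an HE setting this would be one
# plaintext-ciphertext multiplication instead of a division.
window_size = 4                       # e.g. a 2x2 average-pooling window
reciprocal = 1.0 / window_size        # computed on the plaintext side
encrypted_sum = 10.0                  # stand-in for an encrypted sum of 4 pixels
average = encrypted_sum * reciprocal  # 2.5 -- no division on the encrypted value
print(average)
```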
I’ve been working with convolutional neural networks (CNNs) for a while now. Most operations in a typical CNN are some combination of additions and multiplications – emphasis on most. Could it be possible to process encrypted images through a neural network? Let’s break down the common operations in a CNN to find out (a small sketch of the arithmetic follows the list). For simplicity, let’s only consider inference on encrypted data, i.e. we train on plaintext, so we don’t have to worry about the backpropagation math.
• Convolution – Only additions and multiplications: an element-wise product of the filter and the input window, followed by a sum.
• Pooling layers – Comparison, addition, and division, depending on whether we use max- or average pooling. Average pooling is easier to implement, given that we can perform approximate division; a recent approach even encodes the denominator (the window size of the pooling operation) within the previous layer.
• Fully connected layer – Just like convolution, only additions and multiplications.
• Activation layer – Involves exponentiations, multiplications, divisions, and additions.
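Here is a minimal numpy sketch of the plaintext arithmetic behind the first two bullets. It is an illustration of my own, not production code: on encrypted data each addition and multiplication below would be replaced by the scheme's homomorphic counterpart.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D convolution with 'valid' padding: element-wise products and sums only."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def avg_pool2d(x, size):
    """Average pooling as sums plus one multiplication by the plaintext
    reciprocal of the window area, so no real division is needed."""
    reciprocal = 1.0 / (size * size)
    oh, ow = x.shape[0] // size, x.shape[1] // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            out[i, j] = np.sum(window) * reciprocal
    return out

image = np.random.rand(8, 8)     # stand-in for an input image
kernel = np.random.rand(3, 3)
print(avg_pool2d(conv2d_valid(image, kernel), 2).shape)   # -> (3, 3)
```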
We can see that most of the layers can be computed on HE data, and some have HE-friendly alternatives – except for the activation layer. Activations like ReLU perform a comparison, which cannot be computed on encrypted values. Sigmoid, tanh, and Swish rely on exponentials, which would exceed the multiplicative depth allowed by the scheme. Yet this layer is crucial: it is what lets the model learn complex non-linear patterns in the data.
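To see the trade-off concretely, here is a small sketch of my own (not from the references above): the degree-3 Taylor expansion of the sigmoid around 0, 0.5 + x/4 - x³/48, uses only additions and a couple of multiplications, so it would fit a small multiplicative budget, but it is only accurate in a narrow band around zero.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_poly3(x):
    # Degree-3 Taylor expansion of sigmoid around 0: additions and a few
    # multiplications only, but accurate just in a narrow band around zero.
    return 0.5 + x / 4.0 - x**3 / 48.0

xs = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
print(np.round(sigmoid(xs), 3))        # [0.002 0.119 0.5   0.881 0.998]
print(np.round(sigmoid_poly3(xs), 3))  # [ 3.5   0.167 0.5   0.833 -2.5 ]
```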
One step closer to reality
There is a solution, however. As a replacement, we can use a non-linearity that is HE-friendly, i.e. composed only of additions and multiplications. Polynomials are exactly that: fixed, small exponents (which are just repeated multiplications) combined with additions. Researchers from Microsoft used this idea and successfully trained a convolutional neural network with x^2 as its activation function. You can read more about their research here.
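For illustration, here is a minimal PyTorch-style sketch of such a network. The layer sizes are my own assumptions for a 28x28 single-channel input; this is not the architecture from the Microsoft paper.

```python
import torch.nn as nn

class Square(nn.Module):
    """HE-friendly activation: x**2 costs a single multiplication."""
    def forward(self, x):
        return x * x

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, stride=2),  # additions + multiplications
    Square(),                                  # polynomial activation
    nn.AvgPool2d(2),                           # sums + multiply by 1/window area
    nn.Flatten(),
    nn.Linear(8 * 6 * 6, 10),                  # matrix multiply: adds + muls
)
# Train this in plaintext as usual; at inference time every layer maps onto
# homomorphic additions and multiplications on the encrypted image.
```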
Edit 08/2020: My master’s thesis focused on searching for activation functions suitable for CNNs. You can read more about it here.