A beginner’s guide to Bayesian CNN
Convolutional neural networks (CNNs) are among the most effective tools for computer vision problems such as image classification, object localization and detection, and image segmentation, largely because they can model highly non-linear relationships in image data. In practice, however, the datasets available for training are often small, while a standard CNN needs a large amount of data to avoid overfitting. A Bayesian CNN is a variant of the CNN that reduces the risk of overfitting when training on small datasets. In this article, we are going to discuss the Bayesian CNN. The major points to be discussed in the article are listed below.
Table of contents
- What are Bayesian neural networks?
- Problem with CNN
- What is Bayesian CNN?
- The architecture of Bayesian CNN
- How Does Bayesian CNN Work?
- Applications of Bayesian CNN
Let’s first understand how the Bayesian approach is used in a neural network.
What are Bayesian neural networks?
We can think of a Bayesian neural network as an extension of a standard network with posterior inference, which helps the network deal with overfitting. A standard network learns to perform a given task without any prior knowledge about it: it does so by finding an optimal point estimate for the weight at every node. Applying the Bayesian approach means using statistical methods to place a probability distribution over the parameters of the network, such as its weights and biases.
In a Bayesian network, historical data represents prior knowledge of the overall behaviour, and the statistical properties of each variable can also vary with time. Suppose a random variable X follows a normal distribution: a network operating on X will give different results depending on the probability distribution of X, and we can capture this behaviour by inferring the nature and shape of the distributions over the network’s parameters. The motives behind applying the Bayesian approach to neural networks are as follows:
- Standard networks tend to overfit on small datasets.
- The Bayesian approach can be applied to any network.
- Applying it makes the network capable of giving better results across a wide range of tasks.
- It lets the network estimate the uncertainty of its predictions.
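To make the idea of weights-as-distributions concrete, here is a minimal NumPy sketch of a single Bayesian layer. All class and variable names are illustrative, not taken from any particular library: each weight is parameterized by a mean and a standard deviation, and a fresh weight sample is drawn on every forward pass, so repeated passes on the same input give different outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

class BayesianLinear:
    """A single Bayesian layer: each weight is a Gaussian, not a point value.
    Illustrative sketch only, not a library implementation."""

    def __init__(self, n_in, n_out):
        # Variational parameters: a mean and a (pre-softplus) spread per weight.
        self.w_mu = rng.normal(0.0, 0.1, size=(n_in, n_out))
        self.w_rho = np.full((n_in, n_out), -3.0)  # softplus(rho) = std dev

    def forward(self, x):
        # Reparameterization trick: sample eps ~ N(0, 1), then
        # w = mu + softplus(rho) * eps, keeping sampling differentiable
        # with respect to mu and rho.
        std = np.log1p(np.exp(self.w_rho))
        eps = rng.standard_normal(self.w_mu.shape)
        w = self.w_mu + std * eps
        return x @ w

layer = BayesianLinear(4, 2)
x = np.ones((1, 4))
y1, y2 = layer.forward(x), layer.forward(x)
# Two forward passes on the same input give (slightly) different outputs,
# because a fresh weight sample is drawn each time.
```

This per-pass sampling is exactly what lets the network express uncertainty: the spread of outputs across passes reflects how unsure the model is about its weights.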
One of the motives above is that applying the Bayesian approach reduces the problem of overfitting. CNNs are popular for handling image data, and making them work well normally requires a large amount of training data; in situations where only a small dataset is available, applying the Bayesian approach to a CNN can be useful. Let’s discuss the problem with CNNs before moving on to Bayesian CNNs.
Problem with CNN
Convolutional neural networks are one of the most important variants of deep neural networks and are used mainly for image data. Such data can be considered a set of non-linear data points that require substantial modelling capacity, yet labelled examples are often available only in small quantities, while training a CNN requires a large amount of data to reduce the chances of overfitting.
So in general we can say that training a CNN on small data leads to overfitting: the model can fit the small training set but cannot predict accurately on new data. As discussed above, we can reduce the chances of overfitting by applying Bayesian statistics to a network, and the same applies to a CNN. Applying the Bayesian approach to a CNN helps us approximate the uncertainty of predictions and also regularizes them.
What is Bayesian CNN?
In the above, we have seen that applying the Bayesian approach to neural networks is a method of controlling overfitting. We can also apply it to a CNN, and we call the result a Bayesian CNN. One way to do this is to model a distribution over the kernels of the CNN, which requires inferring the model posterior. In many cases the model posterior is approximated with variational inference, where the posterior is modelled using a variational distribution such as a Gaussian. The main objective of applying the Bayesian approach to a CNN is to fit the parameters of this distribution as closely as possible to the true posterior, which is achieved by minimizing the divergence between the two.
We can form a Bayesian CNN by approximating the true posterior probability distribution with a variational probability distribution drawn from a tractable family such as the Gaussian. We call the result the variational posterior probability distribution, and it expresses an uncertainty estimate over the model’s parameters. Some studies of Bayesian CNNs have shown that they can make richer predictions through cheap model averaging.
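As an illustration of the objective being minimized, the divergence between a Gaussian variational posterior q(w) and a Gaussian prior has a well-known closed form. The sketch below (function name illustrative, not from any library) computes that KL divergence for diagonal Gaussians:

```python
import numpy as np

def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) between two diagonal Gaussians, summed over
    all parameters. Driving this down (together with the data-fit term)
    pushes the variational posterior q toward the true posterior."""
    return np.sum(
        np.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p**2)
        - 0.5
    )

# Variational posterior q(w) vs. a standard-normal prior p(w) over 3 weights.
mu_q = np.array([0.2, -0.1, 0.0])
sigma_q = np.array([0.5, 0.5, 0.5])
kl = kl_gaussian(mu_q, sigma_q, np.zeros(3), np.ones(3))
# kl > 0 here; the KL divergence is zero only when q and p coincide.
```

In training, this KL term is added to the usual likelihood-based loss, so the network is penalized for moving its weight distributions far from the prior.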
Bayesian CNNs can be used for tasks such as image super-resolution and generative adversarial networks. In this section we have looked at how a Bayesian CNN can be modelled; later we will look at works where implementations of Bayesian CNNs can be found.
The architecture of Bayesian CNN
This section covers the basic architecture of a Bayesian CNN. In this architecture we combine variational inference with a CNN, so it has the following three main components:

(In a Bayesian CNN, each filter weight has a probability distribution rather than a single value.)
- Layers: this section of the network can use a module wrapper together with flatten, linear, and convolutional layers.
- Bayesian models: this section should contain standard Bayesian models such as BBBLeNet, BBBAlexNet, and BBB3Conv3FC.
- Convolutional models: this section of the architecture can hold CNNs such as LeNet and AlexNet.
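Names like BBBLeNet come from specific open-source implementations; as a generic sketch of what a “Bayesian” convolutional layer does, the NumPy snippet below draws a fresh kernel sample from a per-weight Gaussian on each forward pass (all names and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_kernel(mu, rho):
    # Each filter weight is a Gaussian; draw one kernel sample per pass.
    std = np.log1p(np.exp(rho))  # softplus keeps the std dev positive
    return mu + std * rng.standard_normal(mu.shape)

def conv2d_valid(image, kernel):
    # Minimal "valid" 2-D convolution (no padding, stride 1).
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Variational parameters for one 3x3 filter.
mu = rng.normal(0.0, 0.1, size=(3, 3))
rho = np.full((3, 3), -3.0)

image = rng.random((8, 8))
out_a = conv2d_valid(image, sample_kernel(mu, rho))
out_b = conv2d_valid(image, sample_kernel(mu, rho))
# out_a and out_b differ because each pass uses a new kernel sample.
```

A module wrapper in the implementations above plays the same role: it replaces each deterministic convolution with one whose kernel is sampled from learned variational parameters.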
How Does Bayesian CNN Work?
Using a Bayesian CNN, we mainly focus on estimating uncertainty, which is of two types: one that measures variation in the data and one that measures variation in the model. To estimate it, the last layer of the architecture given above can compute an uncertainty estimator of the following form, where p̂_t is the softmax output of the t-th of T stochastic forward passes and p̄ is their mean:

Var(y) ≈ (1/T) Σ_t [diag(p̂_t) − p̂_t p̂_tᵀ] + (1/T) Σ_t (p̂_t − p̄)(p̂_t − p̄)ᵀ

The first term measures variation in the data and the second measures variation of the model; together they give the variability of the predictive probability. Computing this estimator involves extra sampling steps (the stochastic forward passes), while the reduction in the number of effective weights also helps reduce overfitting. After calculating these two terms, we can convert them into probabilities to output the results.
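The two terms described above can be sketched in NumPy. Given the softmax outputs of T stochastic forward passes on one input, the first quantity below estimates the data (aleatoric) uncertainty per class and the second the model (epistemic) uncertainty; the function name is illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def predictive_uncertainty(probs):
    """probs: (T, C) array of softmax outputs from T stochastic forward
    passes on one input. Returns per-class (aleatoric, epistemic) terms."""
    p_bar = probs.mean(axis=0)
    # Data term: average per-pass variance of each categorical output,
    # i.e. the diagonal of diag(p_t) - p_t p_t^T.
    aleatoric = np.mean(probs * (1.0 - probs), axis=0)
    # Model term: variance of the softmax outputs across passes,
    # i.e. the diagonal of (p_t - p_bar)(p_t - p_bar)^T.
    epistemic = np.mean((probs - p_bar) ** 2, axis=0)
    return aleatoric, epistemic

# Simulated softmax outputs of T=50 stochastic passes for a 3-class problem.
logits = rng.normal(0.0, 1.0, size=(50, 3)) + np.array([2.0, 0.0, 0.0])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

ale, epi = predictive_uncertainty(probs)
# Both are per-class, non-negative uncertainty estimates.
```

Only the diagonal of each term is computed here, which is what is usually reported as a per-class uncertainty score.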
Applications of Bayesian CNN
Some of the notable works on Bayesian convolutional networks are as follows:
- A Comprehensive Guide to Bayesian Convolutional Neural Network with Variational Inference: in this work, the researchers present a variant of CNN whose posterior distribution is learned with a variational inference method called Bayes by Backprop. It shows that any variational distribution can be used to approximate the intractable posterior of the model.
- Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference: this work places a probability distribution over the kernels of a CNN and approximates the intractable posterior of the model using the Bernoulli distribution.
- Auto-Encoding Variational Bayes: this work presents a stochastic variational inference and learning algorithm that scales to large datasets and can be applied to intractable posteriors. It gives a good idea of how Bayesian statistics can be applied in different scenarios.
- What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?: this work describes the types of uncertainty that can be modelled in computer vision, how Bayesian deep learning tools can be applied to computer vision models, and the benefits of modelling uncertainty.
Final words
In this article, we have discussed Bayesian CNNs and Bayesian neural networks. CNNs are particularly prone to overfitting on small datasets, and applying Bayesian statistics to them can make them more capable and accurate while reducing the chances of overfitting. Along with this, we have seen examples of works that can help in building projects on Bayesian CNNs.