A hands-on guide to Neural Neighbor Style Transfer
Neural Neighbor Style Transfer (NNST) builds on the original Neural Style Transfer (NST). Neural Style Transfer is a type of algorithm that stylizes a digital image or video by adopting the visual style of another image. It is a deep learning-based approach that creates digital artwork from photographs, for example stylizing user-provided images in the manner of famous paintings. Neural Neighbor Style Transfer is a recent model for creating stylized images, and it can be implemented with the PyTorch deep learning framework. In this article, we are going to understand Neural Neighbor Style Transfer, from its introduction to its working. Below are the major points that we are going to discuss in this post.
Table of contents
- Introduction to Neural Neighbor Style Transfer
- Variants of Neural Neighbor Style Transfer
- Overview of NNST working
- Evaluation of the model
- Implementing NNST
Let’s first go through an introduction to Neural Neighbor Style Transfer.
Introduction to Neural Neighbor Style Transfer (NNST)
In 2015, Leon A. Gatys et al. proposed the original Neural Style Transfer. Given input photos, it can create artworks with similar content but distinct visual characteristics. It uses neural networks to create the artwork, and neural methods have the advantage of not requiring any additional information to perform stylization: users only have to specify an input (content) image and a style image (any artistic painting) in order to produce a stylized image.
This model works on two main ideas. The first is to use the features extracted by convolutional neural networks pre-trained for image classification to measure the high-level compatibility between two images. The second is to use gradient descent to optimize the output pixels directly, minimizing both a “style loss” (a statistical match between the VGG features of the output and the style image) and a “content loss” (penalizing deviations from the VGG features of the content image) simultaneously.
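To make the two objectives concrete, here is a minimal PyTorch sketch of these losses. It is a simplification under stated assumptions: the original formulation aggregates several VGG layers, while this sketch uses a single layer (index 21, conv4_2 of VGG-19) chosen for illustration.

import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Pre-trained VGG-19 feature extractor (frozen, evaluation mode).
vgg = vgg19(pretrained=True).features.eval()

def vgg_features(img, layer=21):
    # Run the image through VGG up to one chosen layer (a single layer
    # is an illustrative simplification; the original uses several).
    x = img
    for i, module in enumerate(vgg):
        x = module(x)
        if i == layer:
            return x

def gram_matrix(feat):
    # Style statistics: channel-to-channel correlations of the features.
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def content_loss(output, content):
    # Penalize deviations from the content image's features.
    return F.mse_loss(vgg_features(output), vgg_features(content))

def style_loss(output, style):
    # Match the feature statistics (Gram matrices) of the style image.
    return F.mse_loss(gram_matrix(vgg_features(output)),
                      gram_matrix(vgg_features(style)))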
Generally, it is impossible to match the statistics of the style features while keeping the content features exactly as they are. To tackle this drawback, Neural Neighbor Style Transfer borrows the idea of guided patch-based synthesis (an algorithm for finding similarities between small patches of images) to explicitly match content and style features. The model does not work directly on image patches, however; instead, it uses features extracted from a pre-trained VGG network.
Neural Neighbor Style Transfer is more general and efficient than the original Neural Style Transfer.
The NNST approach is to replace the neural features extracted from the content image with the nearest features from the style image and, at the end, synthesize the final output from the rearranged features.
Variants of Neural Neighbor Style Transfer
There are two variants of the NNST model: NNST-D and NNST-Opt. NNST-D decodes the stylized output directly from the rearranged style features using a convolutional neural network; it performs better and faster than previous methods, taking only a few seconds to stylize a 512 × 512-pixel output. NNST-Opt is the optimization-based variant that offers higher quality at a slower speed, taking about 30 seconds for a 512 × 512 output.
Overview of NNST working
The major steps are given in the figure below.
This is a simplified version of the original NNST pipeline; several details are omitted for clarity.
In the first step, we extract the features of the content image and the style image, then zero-center both sets of features. Zero-centering means processing the data so that its mean (average) equals zero; it is a common image-preprocessing step that helps neural networks perform better. After this, each content feature is replaced with its nearest style feature (a sketch of this step follows below). In the fourth block, either NNST-D or NNST-Opt takes over; we will discuss both one by one.
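The sketch below illustrates the zero-centering and nearest-neighbor replacement steps. The (C, N) feature-matrix layout (one column per spatial location) and the cosine-similarity matching criterion are assumptions made for illustration; the actual implementation works over multiple VGG layers and handles memory more carefully.

import torch
import torch.nn.functional as F

def replace_with_nearest_style(content_feat, style_feat):
    # content_feat: (C, N) and style_feat: (C, M) matrices of VGG feature
    # vectors, one column per spatial location (assumed layout).
    # Zero-center both feature sets (subtract the channel-wise mean).
    c = content_feat - content_feat.mean(dim=1, keepdim=True)
    s = style_feat - style_feat.mean(dim=1, keepdim=True)
    # Cosine similarity between every content and style location.
    sim = F.normalize(c, dim=0).t() @ F.normalize(s, dim=0)  # (N, M)
    nearest = sim.argmax(dim=1)  # index of the best style match per location
    # Return the rearranged style features, one per content location.
    return style_feat[:, nearest]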
Neural Neighbor Style Transfer Decoder (NNST-D): the decoder G takes the rearranged features as input and feeds them into four independent branches. Each branch produces one level of a 4-level Laplacian pyramid parameterizing the output image. (A Laplacian pyramid is a linear, invertible representation of an image; by breaking the image into different isotropic spatial-frequency bands, it provides another level of analysis.)
Each branch has the same architecture but different parameters: five 3 × 3 conv layers with the leaky ReLU activation function, except the last layer, which has a linear activation. All hidden states have 256 channels. To trade off channel depth for resolution, transposed convolutions are applied in the first two branches.
No changes are made to the third branch, and in the fourth branch the output is bilinearly downsampled by a factor of two. The final output image is generated by combining the four branch outputs, each treated as a level of a Laplacian pyramid. During training, the model uses MS-COCO for content images and WikiArt for style images.
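As an illustration, one such branch could look like the following PyTorch module. This is a sketch of the five-conv-layer core only, omitting the per-branch transposed convolutions and bilinear downsampling described above, so it should not be read as the authors' exact decoder.

import torch.nn as nn

def decoder_branch(in_channels):
    # Five 3x3 conv layers: four with 256 hidden channels and leaky ReLU,
    # plus a final linear (no activation) layer producing a 3-channel image.
    layers = []
    ch = in_channels
    for _ in range(4):
        layers += [nn.Conv2d(ch, 256, kernel_size=3, padding=1),
                   nn.LeakyReLU(0.2)]
        ch = 256
    layers.append(nn.Conv2d(ch, 3, kernel_size=3, padding=1))  # linear output
    return nn.Sequential(*layers)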
Neural Neighbor Style Transfer Optimization (NNST-Opt): NNST-Opt uses the same rearranged features as NNST-D but replaces the decoder with direct optimization. While style transfer using G is fast, directly optimizing the output image often leads to sharper and more artifact-free results. The objective is minimized with 200 Adam updates of the output image x, which is parameterized as an 8-level Laplacian pyramid so that large regions can change color quickly within those 200 updates.
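A simplified sketch of that loop is below. Here loss_fn stands in for the feature-matching objective, and reconstructing the image by summing bilinearly upsampled pyramid levels (as well as the learning rate) is an assumption made to keep the example short.

import torch
import torch.nn.functional as F

def synthesize(loss_fn, h=512, w=512, levels=8, steps=200):
    # Parameterize the output as an 8-level pyramid: the coarse levels let
    # large regions change color quickly within few updates.
    pyramid = [torch.zeros(1, 3, h // 2 ** i, w // 2 ** i, requires_grad=True)
               for i in range(levels)]
    opt = torch.optim.Adam(pyramid, lr=0.01)
    for _ in range(steps):
        # Reconstruct the image: upsample every level to full size and sum.
        img = sum(F.interpolate(p, size=(h, w), mode='bilinear',
                                align_corners=False) for p in pyramid)
        loss = loss_fn(img)  # stand-in for the NNST matching objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return img.detach()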
The last part is optional: it adjusts the colors of the stylized output using moment matching.
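As a rough illustration of moment matching, the sketch below shifts each color channel's mean and standard deviation; the repository's actual color-matching step may differ (for example, it could match the full color covariance).

import torch

def match_color_moments(content, style):
    # Both tensors are (3, H, W) images with values in [0, 1] (assumed).
    c_mean = content.mean(dim=(1, 2), keepdim=True)
    c_std = content.std(dim=(1, 2), keepdim=True)
    s_mean = style.mean(dim=(1, 2), keepdim=True)
    s_std = style.std(dim=(1, 2), keepdim=True)
    # Rescale the content colors to the style image's per-channel moments.
    out = (content - c_mean) / (c_std + 1e-8) * s_std + s_mean
    return out.clamp(0, 1)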
Evaluation of the model
Using a dataset of 30 high-resolution photographs from Flickr as content images, the authors benchmark the performance of their method against previous work. For styles, ten ink drawings and ten watercolor paintings were collected, and ten impressionist works of art created between 1800 and 1900 come from the open-source collection of the Rijksmuseum. These were combined with ten pencil drawings taken from the dataset used in Im2Pencil, giving a dataset of 40 high-resolution style images in total.
Both variants were run on all 1,200 possible content/style combinations (30 content × 40 style images). Despite the good results of the feed-forward decoder variant (NNST-D), they are not as good as those of the optimization-based model, so closing that gap will be crucial for developing a high-quality, practical feed-forward stylization algorithm.
Implementing NNST
We can implement NNST with the help of the PyTorch deep learning framework. First clone the Git repository; you will get all the folders and the main styleTransfer.py Python file.
!git clone https://github.com/nkolkin13/NeuralNeighborStyleTransfer.git
Then run this command for style transfer.
python styleTransfer.py --content_path PATH_TO_CONTENT_IMAGE --style_path PATH_TO_STYLE_IMAGE --output_path PATH_TO_OUTPUT
- PATH_TO_CONTENT_IMAGE: path to the content image
- PATH_TO_STYLE_IMAGE: path to the style image
- PATH_TO_OUTPUT: path where you want to save the stylized output image
Also, this command will download a pre-trained VGG model of almost 500 MB. The model took 190 seconds to complete the job in Colab, after which we get the output image at the given path. On the left is the content image, in the middle is the style image, and on the right is the output of the neural style transfer. The model did a pretty good job.

Now let's choose another style image for neural style transfer.

We can observe that the model is not preserving the content image details. To control that, there is an alpha value by which you can preserve content detail. The alpha value ranges from 0 to 1; for maximum content preservation, use alpha = 0. The default alpha value is 0.75.
So I changed the alpha value to 0.5 and ran the model again.
python styleTransfer.py --content_path PATH_TO_CONTENT_IMAGE --style_path PATH_TO_STYLE_IMAGE --output_path PATH_TO_OUTPUT --alpha 0.5

You can see in the results how the model manages to preserve the content details this time.
With an alpha value of 0.8, the model also did a pretty good job.

You can play with the alpha value and choose different style images to get different results.
Final words
In this article, starting from the original Neural Style Transfer, we understood what Neural Neighbor Style Transfer is, got an overview of its working, and examined its variants. In the end, we implemented NNST-Opt to create stylized images, trying different style images and alpha values.