Hands-On Guide to Torch-Points3D: A Modular Deep Learning Framework for 3D Data
Affordable LiDAR sensors, more efficient photogrammetry algorithms, and new neural network architectures have driven a surge of advances in the automated analysis of 3D data, so much so that the number of papers on 3D data presented at vision conferences is now on par with those on images. Although this rapid methodological development benefits the young field of deep learning for 3D, its fast pace comes with several shortcomings:
- Adding new datasets, tasks, or neural architectures to existing approaches is a complicated endeavour, sometimes equivalent to reimplementing from scratch.
- Handling large 3D datasets requires a significant time investment and is prone to many implementation pitfalls.
- There is no standard approach for inference schemes and performance metrics, which makes assessing and reproducing new algorithms’ intrinsic performance difficult.
Torch-Points3D aims to solve these issues. It is an open-source framework designed to make it easier to apply deep neural networks to point cloud-based computer vision. It provides an intuitive interface to most open-access 3D datasets, implementations of many state-of-the-art networks, data augmentation schemes, and validated performance metrics.
Torch-Points3D has a modular design and its components are highly customizable: they can be plugged into one another using a unified system of configuration files. The framework makes it easy to standardize experiments, which helps ensure reproducibility and allows the performance of different approaches to be evaluated fairly. As the developers put it, “the purpose of our framework is to become for 3D point clouds what torchvision or PyTorch-geometric have become for images and graphs respectively”. The framework is built upon PyTorch Geometric and Facebook Hydra. Like PyTorch’s data loaders, Torch-Points3D uses background processes to speed up data processing: it off-loads the radius search and subsampling operations to worker processes running on the CPU.
(Figure: throughput in thousands of points processed per second, kpts/s)
Functionalities/operations supported by Torch-Points3D
You can check out all supported tasks and algorithms here.
Supported datasets
Torch-Points3D supports multiple 3D datasets and handles data download and pre-processing, as well as automatic result submission.
You can find a comprehensive list of all supported datasets here.
Installation and Requirements
Requirements
- CUDA 10 or higher (if you want GPU version)
- Python 3.7 or higher + headers (python-dev)
- PyTorch 1.7 or higher
- A sparse convolution backend such as torchsparse (optional)
Run the following code before installing Torch-Points3D to ensure that you don’t run into a CUDA version mismatch error.
import torch

def format_pytorch_version(version):
    return version.split('+')[0]

TORCH_version = torch.__version__
TORCH = format_pytorch_version(TORCH_version)

def format_cuda_version(version):
    return 'cu' + version.replace('.', '')

CUDA_version = torch.version.cuda
CUDA = format_cuda_version(CUDA_version)

# Install the PyTorch Geometric companion packages built for this exact
# PyTorch/CUDA combination to avoid version-mismatch errors.
!pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
!pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
!pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
!pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
!pip install torch-geometric
Install Torch-Points3D from PyPI
!pip install torch-points3d
For instructions on installing via other methods, see this.
Install PyVista for visualizing point clouds
!pip install pyvista
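Before going further, a quick sanity check (purely illustrative, not part of the official guide) confirms that the core packages import correctly and that PyTorch can see the GPU:

# Optional sanity check: make sure the core packages import and CUDA is visible.
import torch
import torch_geometric
import torch_points3d

print("PyTorch:", torch.__version__)
print("PyTorch Geometric:", torch_geometric.__version__)
print("CUDA available:", torch.cuda.is_available())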
Creating a KPConv segmentation model with Torch-Points3D
- Import necessary libraries
import os
# OmegaConf is used to handle configuration files
from omegaconf import OmegaConf
import pyvista as pv
import torch
import numpy as np

# Root directory under which the dataset will be stored (assumed here to be
# the current working directory; adjust as needed).
DIR = os.getcwd()
- We are going to use the Torch-Points3D version of ShapeNet. Create the config file for the dataset and download it using the torch_points3d.datasets.segmentation.ShapeNetDataset class; a quick inspection of the resulting dataset object follows the code below.
CATEGORY = "All"
USE_NORMALS = True
shapenet_yaml = """
class: shapenet.ShapeNetDataset
task: segmentation
dataroot: %s
normal: %r # Use normal vectors as features
first_subsampling: 0.02 # Grid size of the input data
pre_transforms: # Offline transforms, done only once
- transform: NormalizeScale
- transform: GridSampling3D
params:
size: ${first_subsampling}
train_transforms: # Data augmentation pipeline
- transform: RandomNoise
params:
sigma: 0.01
clip: 0.05
- transform: RandomScaleAnisotropic
params:
scales: [0.9,1.1]
- transform: AddOnes
- transform: AddFeatsByKeys
params:
list_add_to_x: [True]
feat_names: ["ones"]
delete_feats: [True]
test_transforms:
- transform: AddOnes
- transform: AddFeatsByKeys
params:
list_add_to_x: [True]
feat_names: ["ones"]
delete_feats: [True]
""" % (os.path.join(DIR,"data"), USE_NORMALS)
params = OmegaConf.create(shapenet_yaml)
if CATEGORY != "All":
params.category = CATEGORY
from torch_points3d.datasets.segmentation import ShapeNetDataset
dataset = ShapeNetDataset(params)
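With the dataset instantiated, it is worth checking what the wrapper exposes. The snippet below is just an illustration using attributes that appear later in this tutorial (train_dataset, pos, y):

# Illustrative inspection of the ShapeNet wrapper created above.
print(dataset)                        # summary of the train/val/test splits
print(len(dataset.train_dataset))     # number of training samples
first_sample = dataset.train_dataset[0]
print(first_sample)                   # a Data object with point positions, features and part labels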
- Visualize some random point clouds from the dataset using pyvista.
objectid_1 = 9
objectid_2 = 82
objectid_3 = 95
samples = [objectid_1, objectid_2, objectid_3]
p = pv.Plotter(notebook=True, shape=(1, len(samples)), window_size=[1024, 412])
for i in range(len(samples)):
    p.subplot(0, i)
    sample = dataset.train_dataset[samples[i]]
    point_cloud = pv.PolyData(sample.pos.numpy())
    point_cloud['y'] = sample.y.numpy()
    p.add_points(point_cloud, show_scalar_bar=False, point_size=3)
    p.camera_position = [-1, 5, -10]
p.show()
- Create a multi-headed segmentation module to use with the KPConv network; a quick toy shape check follows the class definition below.
from torch_points3d.core.common_modules import MLP, UnaryConv

class MultiHeadClassifier(torch.nn.Module):
    """ Allows segregated segmentation in case the category of an object is known.
    This is the case in ShapeNet for example.

    Parameters
    ----------
    in_features -
        size of the input channel
    cat_to_seg
        category to segment maps for example:
        {
            'Airplane': [0, 1, 2],
            'Table': [3, 4]
        }
    """

    def __init__(self, in_features, cat_to_seg, dropout_proba=0.5, bn_momentum=0.1):
        super().__init__()
        self._cat_to_seg = {}
        self._num_categories = len(cat_to_seg)
        self._max_seg_count = 0
        self._max_seg = 0
        self._shifts = torch.zeros((self._num_categories,), dtype=torch.long)
        for i, seg in enumerate(cat_to_seg.values()):
            self._max_seg_count = max(self._max_seg_count, len(seg))
            self._max_seg = max(self._max_seg, max(seg))
            self._shifts[i] = min(seg)
            self._cat_to_seg[i] = seg

        self.channel_rasing = MLP(
            [in_features, self._num_categories * in_features], bn_momentum=bn_momentum, bias=False
        )
        if dropout_proba:
            self.channel_rasing.add_module("Dropout", torch.nn.Dropout(p=dropout_proba))

        self.classifier = UnaryConv((self._num_categories, in_features, self._max_seg_count))
        self._bias = torch.nn.Parameter(torch.zeros(self._max_seg_count,))

    def forward(self, features, category_labels, **kwargs):
        assert features.dim() == 2
        self._shifts = self._shifts.to(features.device)
        in_dim = features.shape[-1]
        features = self.channel_rasing(features)
        features = features.reshape((-1, self._num_categories, in_dim))
        features = features.transpose(0, 1)  # [num_categories, num_points, in_dim]
        features = self.classifier(features) + self._bias  # [num_categories, num_points, max_seg]
        ind = category_labels.unsqueeze(-1).repeat(1, 1, features.shape[-1]).long()
        logits = features.gather(0, ind).squeeze(0)
        softmax = torch.nn.functional.log_softmax(logits, dim=-1)

        output = torch.zeros(logits.shape[0], self._max_seg + 1).to(features.device)
        cats_in_batch = torch.unique(category_labels)
        for cat in cats_in_batch:
            cat_mask = category_labels == cat
            seg_indices = self._cat_to_seg[cat.item()]
            probs = softmax[cat_mask, :len(seg_indices)]
            output[cat_mask, seg_indices[0]:seg_indices[-1] + 1] = probs
        return output
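To see how the shapes flow through this head, here is a small toy check; the category map, feature size, and point count below are made up for illustration and do not come from the tutorial:

# Toy shape check for MultiHeadClassifier (hypothetical values, illustration only).
toy_cat_to_seg = {"Airplane": [0, 1, 2], "Table": [3, 4]}
head = MultiHeadClassifier(in_features=64, cat_to_seg=toy_cat_to_seg)

point_feats = torch.randn(100, 64)                  # 100 points with 64 features each
categories = torch.zeros(100, dtype=torch.long)     # every point belongs to category 0 ("Airplane")

out = head(point_feats, categories)
print(out.shape)  # torch.Size([100, 5]) -> one column per part label across all categories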
- Create a KPConv backbone model using the KPConv applications API; you can learn more about the available models here. A quick parameter-count check follows the model definition below.
from torch_points3d.applications.kpconv import KPConv

class PartSegKPConv(torch.nn.Module):
    def __init__(self, cat_to_seg):
        super().__init__()
        self.unet = KPConv(
            architecture="unet",
            input_nc=USE_NORMALS * 3,
            num_layers=4,
            in_grid_size=0.02
        )
        self.classifier = MultiHeadClassifier(self.unet.output_nc, cat_to_seg)

    @property
    def conv_type(self):
        """ This is needed by the dataset to infer which batch collate should be used """
        return self.unet.conv_type

    def get_batch(self):
        return self.batch

    def get_output(self):
        """ This is needed by the tracker to get access to the outputs of the network """
        return self.output

    def get_labels(self):
        """ Needed by the tracker in order to access ground truth labels """
        return self.labels

    def get_current_losses(self):
        """ Entry point for the tracker to grab the loss """
        return {"loss_seg": float(self.loss_seg)}

    def forward(self, data):
        self.labels = data.y
        self.batch = data.batch

        # Forward through unet and classifier
        data_features = self.unet(data)
        self.output = self.classifier(data_features.x, data.category)

        # Set loss for the backward pass
        self.loss_seg = torch.nn.functional.nll_loss(self.output, self.labels)
        return self.output

    def get_spatial_ops(self):
        return self.unet.get_spatial_ops()

    def backward(self):
        self.loss_seg.backward()

model = PartSegKPConv(dataset.class_to_segments)
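As a quick, purely illustrative sanity check, you can count the trainable parameters of the assembled backbone and segmentation head:

# Illustrative check: total number of trainable parameters in the model.
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"PartSegKPConv has {num_params:,} trainable parameters")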
- Create the data loaders and toggle the CPU pre-computation of spatial operations by setting the precompute_multi_scale parameter to True.
NUM_WORKERS = 4
BATCH_SIZE = 16
dataset.create_dataloaders(
    model,
    batch_size=BATCH_SIZE,
    num_workers=NUM_WORKERS,
    shuffle=True,
    precompute_multi_scale=True
)
sample = next(iter(dataset.train_dataloader))
sample.keys
- The sample contains the pre-computed spatial information in the multiscale (encoder side) and upsample (decoder side) attributes. sample.multiscale contains 10 different versions of the input batch; each version holds the point locations in pos as well as the indices of each point's neighbors in the previous point cloud.
Let’s take a look at the points coming out of each downsampling layer.
sample_in_batch = 0
ms_data = sample.multiscale
num_downsize = int(len(ms_data) / 2)
p = pv.Plotter(notebook=True, shape=(1, num_downsize), window_size=[1024, 256])
for i in range(num_downsize):
    p.subplot(0, i)
    pos = ms_data[2 * i].pos[ms_data[2 * i].batch == sample_in_batch].numpy()
    point_cloud = pv.PolyData(pos)
    point_cloud['y'] = pos[:, 1]
    p.add_points(point_cloud, show_scalar_bar=False, point_size=3)
    p.add_text("Layer {}".format(i + 1), font_size=10)
    p.camera_position = [-1, 5, -10]
p.show()
- Train the model
from tqdm.auto import tqdm
import time

class Trainer:
    def __init__(self, model, dataset, num_epoch=50, device=torch.device('cuda')):
        self.num_epoch = num_epoch
        self._model = model
        self._dataset = dataset
        self.device = device

    def fit(self):
        self.optimizer = torch.optim.Adam(self._model.parameters(), lr=0.001)
        self.tracker = self._dataset.get_tracker(False, True)
        for i in range(self.num_epoch):
            print("=========== EPOCH %i ===========" % i)
            time.sleep(0.5)
            self.train_epoch()
            self.tracker.publish(i)
            self.test_epoch()
            self.tracker.publish(i)

    def train_epoch(self):
        self._model.to(self.device)
        self._model.train()
        self.tracker.reset("train")
        train_loader = self._dataset.train_dataloader
        iter_data_time = time.time()
        with tqdm(train_loader) as tq_train_loader:
            for i, data in enumerate(tq_train_loader):
                t_data = time.time() - iter_data_time
                iter_start_time = time.time()
                self.optimizer.zero_grad()
                data.to(self.device)
                self._model.forward(data)
                self._model.backward()
                self.optimizer.step()
                if i % 10 == 0:
                    self.tracker.track(self._model)

                tq_train_loader.set_postfix(
                    **self.tracker.get_metrics(),
                    data_loading=float(t_data),
                    iteration=float(time.time() - iter_start_time),
                )
                iter_data_time = time.time()

    def test_epoch(self):
        self._model.to(self.device)
        self._model.eval()
        self.tracker.reset("test")
        test_loader = self._dataset.test_dataloaders[0]
        iter_data_time = time.time()
        with tqdm(test_loader) as tq_test_loader:
            for i, data in enumerate(tq_test_loader):
                t_data = time.time() - iter_data_time
                iter_start_time = time.time()
                data.to(self.device)
                self._model.forward(data)
                self.tracker.track(self._model)
                tq_test_loader.set_postfix(
                    **self.tracker.get_metrics(),
                    data_loading=float(t_data),
                    iteration=float(time.time() - iter_start_time),
                )
                iter_data_time = time.time()

trainer = Trainer(model, dataset)
trainer.fit()
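Once training completes, the learned weights can be persisted and reloaded for inference in the usual PyTorch way; the file name below is just an example:

# Save the trained weights (example file name) and reload them later for inference.
torch.save(model.state_dict(), "part_seg_kpconv.pt")

# Later / elsewhere:
model = PartSegKPConv(dataset.class_to_segments)
model.load_state_dict(torch.load("part_seg_kpconv.pt"))
model.eval()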
Last Epoch (Endnote)
In this article, we discussed Torch-Points3D, a flexible and powerful framework that aims to make deep learning on 3D data both more accessible and reproducible. It is built on PyTorch Geometric and Facebook Hydra, has a modular design that facilitates easy experimentation, and comes with many datasets and models built in. As per the paper, the developers are currently working on a high-level API for pre-trained, self-supervised, self-trained, and unsupervised deep learning approaches operating on 3D point clouds.
For the official code, documentation, papers, and tutorials, see:




