Hands-On Guide to Torch-Points3D: A Modular Deep Learning Framework for 3D Data
Affordable LiDAR sensors, more efficient photogrammetry algorithms, and new neural network architectures have driven a surge of advances in the automated analysis of 3D data, so much so that the number of papers on 3D data presented at vision conferences is now on par with those on images. Although this rapid methodological development benefits the young field of deep learning for 3D, its fast pace comes with several shortcomings:
- Adding new datasets, tasks, or neural architectures to existing approaches is a complicated endeavour, sometimes equivalent to reimplementing from scratch.
- Handling large 3D datasets requires a significant time investment and is prone to many implementation pitfalls.
- There is no standard approach for inference schemes and performance metrics, which makes assessing and reproducing new algorithms’ intrinsic performance difficult.
Torch-Points3D aims to solve these issues. It is an open-source framework designed to make it easier to apply deep neural networks to point cloud-based computer vision. It provides an intuitive interface to most open-access 3D datasets, implementations of many state-of-the-art networks, data augmentation schemes, and validated performance metrics.
Torch-Points3D has a modular design and its components are highly customizable: they can be plugged into one another using a unified system of configuration files. The framework makes it easy to standardize experiments, which helps ensure reproducibility and allows the performance of different approaches to be evaluated fairly. As the developers put it, “the purpose of our framework is to become for 3D point clouds what torchvision or PyTorch-geometric have become for images and graphs respectively”. The framework is built upon PyTorch Geometric and Facebook Hydra. Like PyTorch’s data loaders, Torch-Points3D uses background processes to speed up data processing: it off-loads the radius search and subsampling operations to worker processes running on the CPU.
(Figure: throughput in thousands of points processed per second, kpts/s)
Functionalities/operations supported by Torch-Points3D
You can check out all supported tasks and algorithms here.
Supported datasets
Torch-Points3D supports multiple 3D datasets and handles data download and pre-processing, as well as automatic result submission.
You can find a comprehensive list of all supported datasets here.
Installation and Requirements
Requirements
- CUDA 10 or higher (if you want GPU version)
- Python 3.7 or higher + headers (python-dev)
- PyTorch 1.7 or higher
- A sparse convolution backend such as torchsparse (optional)
Run the following code before installing Torch-Points3D to ensure that you don’t run into a CUDA version mismatch error.
import torch

def format_pytorch_version(version):
    return version.split('+')[0]

TORCH_version = torch.__version__
TORCH = format_pytorch_version(TORCH_version)

def format_cuda_version(version):
    return 'cu' + version.replace('.', '')

CUDA_version = torch.version.cuda
CUDA = format_cuda_version(CUDA_version)

# Install the PyTorch Geometric companion packages built for this exact
# PyTorch/CUDA combination to avoid version-mismatch errors.
!pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
!pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
!pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
!pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
!pip install torch-geometric
Install Torch-Points3D from PyPI
!pip install torch-points3d
For instructions on installing via other methods, see this.
Install PyVista for visualizing point clouds
!pip install pyvista
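Before going further, a quick sanity check (purely illustrative, not part of the official guide) confirms that the core packages import correctly and that PyTorch can see the GPU:

# Optional sanity check: make sure the core packages import and CUDA is visible.
import torch
import torch_geometric
import torch_points3d

print("PyTorch:", torch.__version__)
print("PyTorch Geometric:", torch_geometric.__version__)
print("CUDA available:", torch.cuda.is_available())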
Creating a KPConv segmentation model with Torch-Points3D
- Import necessary libraries
import os
# OmegaConf is used to handle configuration files
from omegaconf import OmegaConf
import pyvista as pv
import torch
import numpy as np

# Root directory under which the dataset will be stored (assumed here to be
# the current working directory; adjust as needed).
DIR = os.getcwd()
- We are going to use the Torch-Points3D version of ShapeNet. Create the config file for the dataset and download it using the torch_points3d.datasets.segmentation.ShapeNetDataset class; a quick inspection of the resulting dataset object follows the code below.
CATEGORY = "All"
USE_NORMALS = True
shapenet_yaml = """
class: shapenet.ShapeNetDataset
task: segmentation
dataroot: %s
normal: %r # Use normal vectors as features
first_subsampling: 0.02 # Grid size of the input data
pre_transforms: # Offline transforms, done only once
- transform: NormalizeScale
- transform: GridSampling3D
params:
size: ${first_subsampling}
train_transforms: # Data augmentation pipeline
- transform: RandomNoise
params:
sigma: 0.01
clip: 0.05
- transform: RandomScaleAnisotropic
params:
scales: [0.9,1.1]
- transform: AddOnes
- transform: AddFeatsByKeys
params:
list_add_to_x: [True]
feat_names: ["ones"]
delete_feats: [True]
test_transforms:
- transform: AddOnes
- transform: AddFeatsByKeys
params:
list_add_to_x: [True]
feat_names: ["ones"]
delete_feats: [True]
""" % (os.path.join(DIR,"data"), USE_NORMALS)
params = OmegaConf.create(shapenet_yaml)
if CATEGORY != "All":
params.category = CATEGORY
from torch_points3d.datasets.segmentation import ShapeNetDataset
dataset = ShapeNetDataset(params)
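With the dataset instantiated, it is worth checking what the wrapper exposes. The snippet below is just an illustration using attributes that appear later in this tutorial (train_dataset, pos, y):

# Illustrative inspection of the ShapeNet wrapper created above.
print(dataset)                        # summary of the train/val/test splits
print(len(dataset.train_dataset))     # number of training samples
first_sample = dataset.train_dataset[0]
print(first_sample)                   # a Data object with point positions, features and part labels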
- Visualize some random point clouds from the dataset using pyvista.
objectid_1 = 9
objectid_2 = 82
objectid_3 = 95
samples = [objectid_1, objectid_2, objectid_3]
p = pv.Plotter(notebook=True, shape=(1, len(samples)), window_size=[1024, 412])
for i in range(len(samples)):
    p.subplot(0, i)
    sample = dataset.train_dataset[samples[i]]
    point_cloud = pv.PolyData(sample.pos.numpy())
    point_cloud['y'] = sample.y.numpy()
    p.add_points(point_cloud, show_scalar_bar=False, point_size=3)
    p.camera_position = [-1, 5, -10]
p.show()
- Create a multi-headed segmentation module to use with the KPConv network; a quick toy shape check follows the class definition below.
from torch_points3d.core.common_modules import MLP, UnaryConv

class MultiHeadClassifier(torch.nn.Module):
    """ Allows segregated segmentation in case the category of an object is known.
    This is the case in ShapeNet for example.

    Parameters
    ----------
    in_features -
        size of the input channel
    cat_to_seg
        category to segment maps for example:
        {
            'Airplane': [0, 1, 2],
            'Table': [3, 4]
        }
    """

    def __init__(self, in_features, cat_to_seg, dropout_proba=0.5, bn_momentum=0.1):
        super().__init__()
        self._cat_to_seg = {}
        self._num_categories = len(cat_to_seg)
        self._max_seg_count = 0
        self._max_seg = 0
        self._shifts = torch.zeros((self._num_categories,), dtype=torch.long)
        for i, seg in enumerate(cat_to_seg.values()):
            self._max_seg_count = max(self._max_seg_count, len(seg))
            self._max_seg = max(self._max_seg, max(seg))
            self._shifts[i] = min(seg)
            self._cat_to_seg[i] = seg

        self.channel_rasing = MLP(
            [in_features, self._num_categories * in_features], bn_momentum=bn_momentum, bias=False
        )
        if dropout_proba:
            self.channel_rasing.add_module("Dropout", torch.nn.Dropout(p=dropout_proba))

        self.classifier = UnaryConv((self._num_categories, in_features, self._max_seg_count))
        self._bias = torch.nn.Parameter(torch.zeros(self._max_seg_count,))

    def forward(self, features, category_labels, **kwargs):
        assert features.dim() == 2
        self._shifts = self._shifts.to(features.device)
        in_dim = features.shape[-1]
        features = self.channel_rasing(features)
        features = features.reshape((-1, self._num_categories, in_dim))
        features = features.transpose(0, 1)  # [num_categories, num_points, in_dim]
        features = self.classifier(features) + self._bias  # [num_categories, num_points, max_seg]
        ind = category_labels.unsqueeze(-1).repeat(1, 1, features.shape[-1]).long()
        logits = features.gather(0, ind).squeeze(0)
        softmax = torch.nn.functional.log_softmax(logits, dim=-1)

        output = torch.zeros(logits.shape[0], self._max_seg + 1).to(features.device)
        cats_in_batch = torch.unique(category_labels)
        for cat in cats_in_batch:
            cat_mask = category_labels == cat
            seg_indices = self._cat_to_seg[cat.item()]
            probs = softmax[cat_mask, :len(seg_indices)]
            output[cat_mask, seg_indices[0]:seg_indices[-1] + 1] = probs
        return output
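To see how the shapes flow through this head, here is a small toy check; the category map, feature size, and point count below are made up for illustration and do not come from the tutorial:

# Toy shape check for MultiHeadClassifier (hypothetical values, illustration only).
toy_cat_to_seg = {"Airplane": [0, 1, 2], "Table": [3, 4]}
head = MultiHeadClassifier(in_features=64, cat_to_seg=toy_cat_to_seg)

point_feats = torch.randn(100, 64)                  # 100 points with 64 features each
categories = torch.zeros(100, dtype=torch.long)     # every point belongs to category 0 ("Airplane")

out = head(point_feats, categories)
print(out.shape)  # torch.Size([100, 5]) -> one column per part label across all categories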
- Create a KPConv backbone model using the KPConv applications API; you can learn more about the available models here. A quick parameter-count check follows the model definition below.
from torch_points3d.applications.kpconv import KPConv

class PartSegKPConv(torch.nn.Module):
    def __init__(self, cat_to_seg):
        super().__init__()
        self.unet = KPConv(
            architecture="unet",
            input_nc=USE_NORMALS * 3,
            num_layers=4,
            in_grid_size=0.02
        )
        self.classifier = MultiHeadClassifier(self.unet.output_nc, cat_to_seg)

    @property
    def conv_type(self):
        """ This is needed by the dataset to infer which batch collate should be used """
        return self.unet.conv_type

    def get_batch(self):
        return self.batch

    def get_output(self):
        """ This is needed by the tracker to get access to the outputs of the network """
        return self.output

    def get_labels(self):
        """ Needed by the tracker in order to access ground truth labels """
        return self.labels

    def get_current_losses(self):
        """ Entry point for the tracker to grab the loss """
        return {"loss_seg": float(self.loss_seg)}

    def forward(self, data):
        self.labels = data.y
        self.batch = data.batch

        # Forward through unet and classifier
        data_features = self.unet(data)
        self.output = self.classifier(data_features.x, data.category)

        # Set loss for the backward pass
        self.loss_seg = torch.nn.functional.nll_loss(self.output, self.labels)
        return self.output

    def get_spatial_ops(self):
        return self.unet.get_spatial_ops()

    def backward(self):
        self.loss_seg.backward()

model = PartSegKPConv(dataset.class_to_segments)
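As a quick, purely illustrative sanity check, you can count the trainable parameters of the assembled backbone and segmentation head:

# Illustrative check: total number of trainable parameters in the model.
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"PartSegKPConv has {num_params:,} trainable parameters")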
- Create the data loaders and toggle the CPU pre-computation of spatial operations by setting the precompute_multi_scale parameter to True.
NUM_WORKERS = 4
BATCH_SIZE = 16
dataset.create_dataloaders(
    model,
    batch_size=BATCH_SIZE,
    num_workers=NUM_WORKERS,
    shuffle=True,
    precompute_multi_scale=True
)
sample = next(iter(dataset.train_dataloader))
sample.keys
- The sample contains the pre-computed spatial information in the multiscale (encoder side) and upsample (decoder side) attributes. sample.multiscale contains 10 different versions of the input batch; each version holds the point locations in pos as well as the indices of each point's neighbors in the previous point cloud.
Let’s take a look at the points coming out of each downsampling layer.
sample_in_batch = 0
ms_data = sample.multiscale
num_downsize = int(len(ms_data) / 2)
p = pv.Plotter(notebook=True, shape=(1, num_downsize), window_size=[1024, 256])
for i in range(num_downsize):
    p.subplot(0, i)
    pos = ms_data[2 * i].pos[ms_data[2 * i].batch == sample_in_batch].numpy()
    point_cloud = pv.PolyData(pos)
    point_cloud['y'] = pos[:, 1]
    p.add_points(point_cloud, show_scalar_bar=False, point_size=3)
    p.add_text("Layer {}".format(i + 1), font_size=10)
    p.camera_position = [-1, 5, -10]
p.show()
- Train the model
from tqdm.auto import tqdm
import time

class Trainer:
    def __init__(self, model, dataset, num_epoch=50, device=torch.device('cuda')):
        self.num_epoch = num_epoch
        self._model = model
        self._dataset = dataset
        self.device = device

    def fit(self):
        self.optimizer = torch.optim.Adam(self._model.parameters(), lr=0.001)
        self.tracker = self._dataset.get_tracker(False, True)
        for i in range(self.num_epoch):
            print("=========== EPOCH %i ===========" % i)
            time.sleep(0.5)
            self.train_epoch()
            self.tracker.publish(i)
            self.test_epoch()
            self.tracker.publish(i)

    def train_epoch(self):
        self._model.to(self.device)
        self._model.train()
        self.tracker.reset("train")
        train_loader = self._dataset.train_dataloader
        iter_data_time = time.time()
        with tqdm(train_loader) as tq_train_loader:
            for i, data in enumerate(tq_train_loader):
                t_data = time.time() - iter_data_time
                iter_start_time = time.time()
                self.optimizer.zero_grad()
                data.to(self.device)
                self._model.forward(data)
                self._model.backward()
                self.optimizer.step()
                if i % 10 == 0:
                    self.tracker.track(self._model)

                tq_train_loader.set_postfix(
                    **self.tracker.get_metrics(),
                    data_loading=float(t_data),
                    iteration=float(time.time() - iter_start_time),
                )
                iter_data_time = time.time()

    def test_epoch(self):
        self._model.to(self.device)
        self._model.eval()
        self.tracker.reset("test")
        test_loader = self._dataset.test_dataloaders[0]
        iter_data_time = time.time()
        with tqdm(test_loader) as tq_test_loader:
            for i, data in enumerate(tq_test_loader):
                t_data = time.time() - iter_data_time
                iter_start_time = time.time()
                data.to(self.device)
                self._model.forward(data)
                self.tracker.track(self._model)
                tq_test_loader.set_postfix(
                    **self.tracker.get_metrics(),
                    data_loading=float(t_data),
                    iteration=float(time.time() - iter_start_time),
                )
                iter_data_time = time.time()

trainer = Trainer(model, dataset)
trainer.fit()
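Once training completes, the learned weights can be persisted and reloaded for inference in the usual PyTorch way; the file name below is just an example:

# Save the trained weights (example file name) and reload them later for inference.
torch.save(model.state_dict(), "part_seg_kpconv.pt")

# Later / elsewhere:
model = PartSegKPConv(dataset.class_to_segments)
model.load_state_dict(torch.load("part_seg_kpconv.pt"))
model.eval()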
Last Epoch (Endnote)
In this article, we discussed Torch-Points3D, a flexible and powerful framework that aims to make deep learning on 3D data both more accessible and reproducible. It is built on PyTorch Geometric and Facebook Hydra, has a modular design that facilitates easy experimentation, and comes with many datasets and models built in. As per the paper, the developers are currently working on a high-level API for pre-trained, self-supervised, self-trained, and unsupervised deep learning approaches operating on 3D point clouds.
For the official code, documentation, papers, and tutorials, see:




