Google, in collaboration with OpenMined, has released PipelineDP, an open-source tool for building and running differentially private data pipelines. The tool was developed as part of OpenMined’s mission to make privacy-preserving machine learning (PPML) accessible.

PipelineDP is a framework for applying differential privacy to large datasets using batch processing systems such as Apache Spark and Apache Beam. Developers work with it through the PipelineDP API.

The API encapsulates the complexities of differential privacy, such as protection of outliers and long-tail categories, generation of safe noise, and privacy budget accounting. It also supports many standard computations, such as count, sum, and average, and is easily extensible to support other aggregation types.
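To make those steps concrete, here is a minimal sketch in plain Python of what a differentially private count and sum involve. It does not use PipelineDP, and every name in it (`bound_contributions`, `dp_count_and_sum`, the parameter names) is hypothetical and chosen only for illustration. It assumes Laplace noise, an even split of the privacy budget between the two metrics, and a single output group, so it leaves out the handling of long-tail categories. The noise comes from Python's ordinary random number generator, which is not the hardened "safe noise" a production library provides.

```python
import random
from collections import defaultdict


def bound_contributions(rows, max_rows_per_user, min_value, max_value):
    """Limit each user's influence on the result (outlier protection).

    rows is an iterable of (user_id, value) pairs. At most max_rows_per_user
    rows are kept per user, and every kept value is clamped into
    [min_value, max_value]. This sketch keeps the first rows it sees;
    real implementations typically sample rows at random.
    """
    kept = defaultdict(list)
    for user_id, value in rows:
        if len(kept[user_id]) < max_rows_per_user:
            kept[user_id].append(min(max(value, min_value), max_value))
    return [v for values in kept.values() for v in values]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count_and_sum(rows, epsilon, max_rows_per_user, min_value, max_value,
                     rng=None):
    """Return (noisy_count, noisy_sum) under a total budget of epsilon."""
    rng = rng or random.Random()
    values = bound_contributions(rows, max_rows_per_user, min_value, max_value)

    # Sensitivity: the most one user can change each statistic.
    count_sensitivity = max_rows_per_user
    sum_sensitivity = max_rows_per_user * max(abs(min_value), abs(max_value))

    # Budget accounting (assumption: an even split between the two metrics).
    eps_each = epsilon / 2.0

    noisy_count = len(values) + laplace_noise(count_sensitivity / eps_each, rng)
    noisy_sum = sum(values) + laplace_noise(sum_sensitivity / eps_each, rng)
    return noisy_count, noisy_sum


# Usage: user "a" has three rows, one of them an outlier (100).
visits = [("a", 5), ("a", 100), ("a", 7), ("b", -3)]
noisy_count, noisy_sum = dp_count_and_sum(
    visits, epsilon=1.0, max_rows_per_user=2, min_value=0, max_value=10)
print(noisy_count, noisy_sum)
```

Even in this reduced form, the caller has to pick contribution limits, derive sensitivities, and divide the budget by hand, which is the kind of bookkeeping a higher-level API is meant to take over.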

Differential privacy allows useful insights and services to be derived from data while limiting what can be learned about any single individual in it.

PipelineDP was built on the foundation of PyDP. PyDP provides many important functions for applying differential privacy via a relatively low-level Python API. While PyDP offers great flexibility, it requires additional expertise and configuration, such as accounting for the privacy budget, calculating the sensitivity of various functions, and implementing correct aggregations. PipelineDP helps developers avoid the complexities involved with PyDP.

In 2019, Google launched an open-source differential privacy library with support for C++, Go, and Java.