As the scale of data continues to expand, the need for efficient data condensation techniques has become increasingly important. Data condensation involves synthesizing a smaller dataset that retains the essential information from the original dataset, thus reducing storage and computational costs without sacrificing model performance. However, privacy concerns have also emerged as a significant challenge in data condensation. While several approaches have been proposed to preserve privacy during data condensation, privacy protection still needs improvement.
Current privacy-preserving dataset condensation methods often add a constant level of noise to gradients using fixed privacy parameters. This approach can introduce excessive noise and reduce model accuracy, especially on color image datasets when small clipping norms are used.
Existing techniques lack dynamic parameter strategies that adaptively adjust noise levels based on gradient clipping and sensitivity measures. There’s also a need for more research on how different hyperparameters affect utility and visual quality.
In this context, a paper recently published in the journal Neurocomputing addresses these limitations by proposing Dyn-PSG (Dynamic Differential Privacy-based Dataset Condensation), an approach that uses dynamic gradient clipping thresholds and sensitivity measures to minimize noise while preserving differential privacy guarantees. The proposed method aims to improve accuracy over existing approaches under the same privacy budget and the specified clipping thresholds.
Concretely, instead of using a fixed clipping norm, Dyn-PSG gradually decreases the clipping threshold over training rounds, reducing the noise added in later stages of training. Additionally, it adapts sensitivity measures based on the maximum 𝑙2 norm observed in per-example gradients, ensuring that excessive noise is not injected when it is unnecessary. By injecting noise based on the maximum gradient size after clipping, Dyn-PSG introduces only minimal increments of noise, mitigating the accuracy loss and parameter instability caused by excessive noise injection. This dynamic parameter-based approach improves utility and visual quality compared to existing methods while adhering to strict privacy guarantees.
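To make the contrast with a fixed-parameter scheme concrete, the update can be written out as follows. This is a sketch in standard DP-SGD-style notation, not the paper's own equations: the symbols g_i (per-example gradient), B (batch size), σ (noise multiplier), C_t (clipping threshold at round t), and S_t (sensitivity at round t) are assumptions introduced here for illustration, and the exact decay rule for C_t is left unspecified because the text above does not state it.

```latex
% Fixed-parameter baseline: constant threshold C, noise scaled by C.
\begin{align*}
\bar{g}_i &= g_i \cdot \min\!\left(1, \frac{C}{\lVert g_i \rVert_2}\right), &
\tilde{g} &= \frac{1}{B}\left(\sum_{i=1}^{B} \bar{g}_i + \mathcal{N}\!\left(0, \sigma^2 C^2 I\right)\right)
\end{align*}

% Dynamic variant as described above (assumed form): the threshold C_t is
% non-increasing in t, and the noise is scaled by the largest clipped norm S_t.
\begin{align*}
\bar{g}_i^{(t)} &= g_i^{(t)} \cdot \min\!\left(1, \frac{C_t}{\lVert g_i^{(t)} \rVert_2}\right), \qquad C_{t+1} \le C_t \\
S_t &= \max_{1 \le i \le B} \lVert \bar{g}_i^{(t)} \rVert_2 \;\le\; C_t \\
\tilde{g}^{(t)} &= \frac{1}{B}\left(\sum_{i=1}^{B} \bar{g}_i^{(t)} + \mathcal{N}\!\left(0, \sigma^2 S_t^2 I\right)\right)
\end{align*}
```

Because S_t never exceeds C_t, and C_t itself shrinks over rounds, the noise standard deviation σ·S_t in this sketch is never larger than the σ·C used by the fixed baseline with the same starting threshold.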
The steps involved in Dyn-PSG are as follows:
1. Dynamic Clipping Threshold: Instead of using a fixed clipping norm, Dyn-PSG dynamically adjusts the clipping threshold during training. In later stages of training, smaller clipping thresholds are used, which tightens the bound on each gradient's norm and, in turn, reduces the noise added to gradients.
2. Dynamic Sensitivity: To further mitigate noise impact, Dyn-PSG adapts sensitivity measures based on the maximum 𝑙2 norm observed in per-example gradients from each batch. This ensures that excessive noise is not injected into gradients when unnecessary.
3. Noise Injection: Dyn-PSG injects noise into gradients based on the maximum gradient size after clipping instead of arbitrary noise addition. Accuracy loss and parameter instability resulting from excessive noise injection are mitigated by only introducing minimal increments of noise.
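The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the exponential decay schedule in `clipping_threshold`, and the noise multiplier `sigma` are all hypothetical, since the article does not specify the schedule or the privacy accounting.

```python
import math
import random


def l2_norm(vec):
    """Euclidean (l2) norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def clipping_threshold(initial_c, decay, step):
    """Step 1 (assumed schedule): threshold shrinks exponentially with the round."""
    return initial_c * (decay ** step)


def clip(vec, c):
    """Rescale vec so that its l2 norm is at most c."""
    norm = l2_norm(vec)
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def dyn_noisy_gradient(per_example_grads, c_t, sigma, rng):
    """Clip per-example gradients, then add noise scaled by the largest clipped norm."""
    clipped = [clip(g, c_t) for g in per_example_grads]
    # Step 2: sensitivity is the maximum l2 norm after clipping (never above c_t).
    sensitivity = max(l2_norm(g) for g in clipped)
    dim = len(clipped[0])
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    # Step 3: Gaussian noise with standard deviation sigma * sensitivity.
    noisy = [s + rng.gauss(0.0, sigma * sensitivity) for s in summed]
    batch_size = len(clipped)
    return [x / batch_size for x in noisy], sensitivity


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
    for step in range(3):
        c_t = clipping_threshold(initial_c=1.0, decay=0.5, step=step)
        g, s = dyn_noisy_gradient(grads, c_t, sigma=1.0, rng=rng)
        print(f"round {step}: C_t={c_t:.3f}, sensitivity={s:.3f}, noisy grad={g}")
```

In this sketch the sensitivity equals the threshold whenever at least one gradient is clipped, and falls below it when every gradient in the batch is already smaller than the threshold, which is the case where a fixed-sensitivity scheme would add more noise than the batch requires.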
To evaluate the proposed method, the research team conducted extensive experiments using several benchmark datasets, including MNIST, FashionMNIST, SVHN, and CIFAR10, which cover a range of image classification tasks with varying complexity and resolution.
The experiments utilized multiple model architectures, with a ConvNet comprising three blocks as the default. Each block includes a Convolutional layer with 128 filters, followed by Instance Normalization, ReLU activation, and Average Pooling, with a fully connected (FC) layer as the final output. The evaluation focused on accuracy metrics and the visual quality of the synthesized datasets across different architectures. The results showed that Dyn-PSG outperformed existing approaches in accuracy while maintaining privacy guarantees.
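As a rough illustration of the default ConvNet described above, the following sketch traces feature-map shapes through the three blocks. The kernel size, padding, and pooling window are not stated in the article, so the code assumes size-preserving convolutions (for example 3x3 with padding 1) and 2x2 average pooling with stride 2 and floor rounding; the function names are hypothetical.

```python
def convnet_feature_shape(channels, height, width, blocks=3, filters=128):
    """Shape (C, H, W) after the conv blocks.

    Assumptions (not from the article): each convolution preserves the
    spatial size, and each average pooling halves it with floor rounding.
    """
    for _ in range(blocks):
        channels = filters  # convolution with 128 filters
        # Instance Normalization and ReLU leave the shape unchanged.
        height //= 2        # assumed 2x2 average pooling, stride 2
        width //= 2
    return channels, height, width


def fc_input_features(channels, height, width, blocks=3, filters=128):
    """Number of features fed to the final fully connected layer."""
    c, h, w = convnet_feature_shape(channels, height, width, blocks, filters)
    return c * h * w


if __name__ == "__main__":
    # CIFAR10 images are 3x32x32; MNIST images are 1x28x28.
    print(convnet_feature_shape(3, 32, 32), fc_input_features(3, 32, 32))
    print(convnet_feature_shape(1, 28, 28), fc_input_features(1, 28, 28))
```

Under these assumptions a 32x32 input ends as a 128x4x4 feature map (2048 features into the FC layer), and a 28x28 input ends as 128x3x3 (1152 features).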
Overall, these comprehensive evaluations demonstrated that Dyn-PSG is an effective method for data condensation with dynamic differential privacy considerations.
To conclude, Dyn-PSG offers a dynamic solution for privacy-preserving dataset condensation by reducing noise during training while maintaining strict privacy guarantees. By adaptively adjusting gradient clipping thresholds and sensitivity measures, it achieves better accuracy than existing methods under the same privacy budget. Experiments across multiple datasets and architectures indicate that Dyn-PSG balances data utility and privacy effectively, making it a strong candidate for efficient data condensation.
All credit for this research goes to the researchers of this project.
Mahmoud is a PhD researcher in machine learning. He also holds a bachelor’s degree in physical science and a master’s degree in telecommunications and networking systems. His current areas of research concern computer vision, stock market prediction and deep learning. He produced several scientific articles about person re-identification and the study of the robustness and stability of deep networks.