Machine Learning Model Generation With Copula-Based Synthetic Dataset for Local Differentially Private Numerical Data
Document Type
Article
Publication Title
IEEE Access
Abstract
With the development of IoT technology, personal data are being collected in many places. These data can be used to create new services, but the privacy of individuals must be protected. Differential privacy allows personal data to be collected safely by adding noise to them. However, because the resulting data are very noisy, the accuracy of machine learning models trained on them decreases greatly. In this study, our objective is to build a highly accurate machine learning model from such data. We focus on the decision tree algorithm and, instead of applying it to the noisy data directly, we use a preprocessing technique in which pseudodata are generated using a copula while the effect of the noise added by differential privacy is removed. In detail, the proposed protocol consists of three steps: generating a covariance matrix from the differentially private numerical data, generating a discrete cumulative distribution function from the differentially private numerical data, and generating copula-based numerical samples. Simulation results using synthetic and real datasets verify the utility of the proposed method not only for the decision tree algorithm but also for other machine learning algorithms such as deep neural networks. This method will help create machine learning models, such as recommendation systems, from differentially private data.
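The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' protocol; it rests on assumptions the abstract does not state: the copula is taken to be Gaussian, the privacy noise is taken to be independent, zero-mean, and of known variance on every attribute, and the per-attribute discrete cumulative distribution functions (step two) are taken as given inputs rather than estimated. All function names (`debias_covariance`, `nearest_psd_correlation`, `sample_gaussian_copula`) are hypothetical.

```python
import math
import numpy as np


def debias_covariance(noisy, noise_var):
    """Step 1 (assumed form): covariance of the clean data from noisy reports.

    Independent zero-mean noise inflates only the diagonal of the covariance,
    so the known noise variance is subtracted from the diagonal.
    """
    cov = np.cov(noisy, rowvar=False)
    cov[np.diag_indices_from(cov)] -= noise_var
    return cov


def nearest_psd_correlation(mat, eps=1e-8):
    """Clip negative eigenvalues, then rescale to a unit diagonal."""
    w, v = np.linalg.eigh(mat)
    fixed = (v * np.clip(w, eps, None)) @ v.T
    d = np.sqrt(np.diag(fixed))
    return fixed / np.outer(d, d)


def sample_gaussian_copula(corr, values, cdfs, n, rng):
    """Step 3 (assumed Gaussian copula): draw n synthetic records.

    values[j] holds the sorted bin values of attribute j and cdfs[j] the
    matching cumulative probabilities (the output of step 2, given here).
    """
    chol = np.linalg.cholesky(nearest_psd_correlation(corr))
    z = rng.standard_normal((n, corr.shape[0])) @ chol.T  # correlated normals
    # Standard normal CDF maps each column to a uniform marginal.
    phi = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
    u = phi(z)
    out = np.empty_like(u)
    for j, (vals, cdf) in enumerate(zip(values, cdfs)):
        # Inverse of the discrete CDF: first bin whose cumulative mass >= u.
        idx = np.searchsorted(cdf, u[:, j], side="left")
        out[:, j] = np.asarray(vals)[np.minimum(idx, len(vals) - 1)]
    return out
```

Under these assumptions, the output of `debias_covariance` would be passed to `sample_gaussian_copula` as `corr`, and the synthetic records would then replace the noisy reports as training data for a decision tree or another learner.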
First Page
101656
Last Page
101671
DOI
10.1109/ACCESS.2022.3208715
Publication Date
1-1-2022
Recommended Citation
Sei, Yuichi; Andrew Onesimu, J.; and Ohsuga, Akihiko, "Machine Learning Model Generation With Copula-Based Synthetic Dataset for Local Differentially Private Numerical Data" (2022). Open Access archive. 4829.
https://impressions.manipal.edu/open-access-archive/4829