Cristin result ID: 1709826
Last modified: 10 February 2020, 14:04
NVI reporting year: 2019
Result
Academic article
2019

Information preserving regression-based tools for statistical disclosure control

Contributors:
  • Øyvind Langsrud

Journal

Statistics and Computing
ISSN 0960-3174
e-ISSN 1573-1375
NVI level 2

About the result

Academic article
Year of publication: 2019
Published online: 2019
Volume: 29
Issue: 5
Pages: 965 - 976
Open Access

Import sources

Scopus-ID: 2-s2.0-85059508705

Description

Title

Information preserving regression-based tools for statistical disclosure control

Abstract

This paper presents a unified framework for regression-based statistical disclosure control for microdata. A basic method, known as information preserving statistical obfuscation (IPSO), produces synthetic data that preserve variances, covariances and fitted values. The data are then generated conditionally according to the multivariate normal distribution. Generalizations of the IPSO method are described in the literature, and these methods aim to generate data more similar to the original data. This paper describes these methods in a concise and interpretable way, which is close to efficient implementation. Decomposing the residual data into orthogonal scores and corresponding loadings is an essential part of the framework. Both QR decomposition (Gram–Schmidt orthogonalization) and singular value decomposition (principal components) may be used. Within this framework, new and generalized methods are presented. In particular, a method is described by means of which the correlations to the original principal component scores can be controlled exactly. It is shown that a suggested method of random orthogonal matrix masking can be implemented without generating an orthogonal matrix. Generalized methodology for hierarchical categories is presented within the context of microaggregation. Some information can then be preserved at the lowest level and more information at higher levels. The presented methodology is also applicable to tabular data. One possibility is to replace the content of primary and secondary suppressed cells with generated values. It is proposed replacing suppressed cell frequencies with decimal numbers, and it is argued that this can be a useful method.
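To make the basic IPSO idea in the abstract concrete, the following is a minimal sketch rather than the paper's implementation. It assumes NumPy, a confidential matrix `y`, a non-confidential design matrix `x` with full column rank (including an intercept column), and more rows than the combined number of columns. The function name `ipso` and the `seed` parameter are hypothetical. Residuals are regenerated from normal draws made orthogonal to `x`, then rescaled with the R factor from a QR decomposition of the original residuals, so fitted values, variances and covariances are unchanged.

```python
import numpy as np


def ipso(y, x, seed=None):
    """Sketch of basic IPSO: synthetic y with the same fitted values and
    residual cross-product matrix as the original (assumptions as stated above)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)

    # Orthonormal basis for the column space of x (reduced QR).
    q_x, _ = np.linalg.qr(x)

    # Fitted values and residuals of the multivariate regression of y on x.
    fitted = q_x @ (q_x.T @ y)
    resid = y - fitted

    # Loadings of the original residuals: resid = Q_e @ r_e.
    _, r_e = np.linalg.qr(resid)

    # Random normal draws with the part explained by x projected out.
    z = rng.standard_normal(y.shape)
    z -= q_x @ (q_x.T @ z)

    # Orthonormal synthetic scores, orthogonal to the columns of x.
    q_z, _ = np.linalg.qr(z)

    # New residuals have the same cross-product matrix as the originals.
    return fitted + q_z @ r_e
```

Because the synthetic scores are orthonormal and orthogonal to `x`, regressing the synthetic data on `x` reproduces the original coefficients and residual covariance matrix, while the individual records differ from the originals.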

Contributors

Active Cristin person

Øyvind Langsrud

  • Affiliation:
    Author
    at Avdeling for metodeutvikling og datainnsamling (Department of Methodology Development and Data Collection), Statistisk sentralbyrå (Statistics Norway)