Bottom line: The BfDI considers training AI models directly on real tax data problematic under data-protection law, because memorization of citizens' data is a known risk.
Financial authorities are planning to train AI models on real tax datasets. The Federal Commissioner for Data Protection and Freedom of Information (BfDI) advises caution, because trained models could memorize personal data.
Financial authorities are testing AI systems to make their processes more efficient. Real tax datasets serve as training data to adapt the models to the specific requirements of tax audits and tax administration.
The BfDI sees a significant data-protection risk in this approach. A central problem: large language models and other neural networks can memorize personal data from the training dataset, that is, store it in their parameters, and later reconstruct or disclose it in their outputs. This affects sensitive information such as income data, asset information, and other tax data of millions of citizens.
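The memorization risk described above can be probed, at least crudely, by checking whether model outputs contain training values verbatim. The following is a minimal sketch under stated assumptions, not a method attributed to the BfDI or the authorities: the function name `find_leaked_values`, the `min_len` threshold, and the sample values are all hypothetical, and a plain substring check only catches exact reproductions, not paraphrased or partially reconstructed data.

```python
def find_leaked_values(output: str, sensitive_values: list[str], min_len: int = 6) -> list[str]:
    """Return the sensitive training values that appear verbatim in a model output.

    Whitespace and letter case are ignored so that "12 345 678 901" and
    "12345678901" count as the same value. Values shorter than `min_len`
    characters (after normalization) are skipped to limit chance matches.
    """
    # Normalize the output once: drop all whitespace, lowercase everything.
    normalized_output = "".join(output.split()).lower()

    leaked = []
    for value in sensitive_values:
        needle = "".join(value.split()).lower()
        if len(needle) >= min_len and needle in normalized_output:
            leaked.append(value)
    return leaked


# Hypothetical example: one made-up identifier from the training set
# shows up in a generated answer, the other does not.
sample_output = "The taxpayer with ID 12 345 678 901 declared additional income."
training_values = ["12345678901", "98765432109"]
leaks = find_leaked_values(sample_output, training_values)
```

A check like this can only demonstrate that leakage occurred in the sampled outputs; finding no match does not show that the model has memorized nothing.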
For data controllers, this creates a conflict between optimization objectives and data-protection requirements: the lawfulness of AI use must be ensured from the design stage (Privacy by Design), not examined retroactively. Training on real, non-anonymized data conflicts with the principle of data minimization and carries the risk that GDPR requirements such as purpose limitation are no longer met.
Instead, financial authorities would need to resort to synthetic data, anonymization techniques, or heavily aggregated datasets, even if this could impair model accuracy. This is a frequent dilemma in public-sector AI use: higher data-protection standards and technical security often come at the cost of performance.
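To illustrate what "heavily aggregated" can mean in practice, here is a minimal sketch, assuming records with hypothetical `region` and `income` fields. It reports only group-level counts and means and suppresses groups below a minimum size. The function name and the threshold are illustrative assumptions, not a legal standard, and aggregation of this kind does not by itself guarantee anonymity in the GDPR sense.

```python
from collections import defaultdict
from statistics import mean


def aggregate_income(records: list[dict], min_group_size: int = 5) -> dict:
    """Aggregate incomes per region and suppress groups that are too small.

    Small groups are dropped entirely, because a mean over very few
    people can reveal close to individual-level information.
    """
    incomes_by_region = defaultdict(list)
    for record in records:
        incomes_by_region[record["region"]].append(record["income"])

    return {
        region: {"count": len(incomes), "mean_income": mean(incomes)}
        for region, incomes in incomes_by_region.items()
        if len(incomes) >= min_group_size
    }


# Hypothetical, made-up records: region "A" has three entries, "B" only one.
sample_records = [
    {"region": "A", "income": 30000},
    {"region": "A", "income": 40000},
    {"region": "A", "income": 50000},
    {"region": "B", "income": 90000},
]
summary = aggregate_income(sample_records, min_group_size=3)
```

The trade-off named in the text shows up directly: the single record in region "B" is protected by suppression, but a model trained on `summary` never sees that region at all.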
Source: www.golem.de · Published 12 June 2026
Lumi AI News — AI-assisted curation according to Article 50 EU AI Act. Paraphrasing and classification through Lumi News Pipeline v1.6.5.