Federated Learning

Federated Learning (FL) is a decentralized approach to machine learning where models are trained across multiple devices or institutions without sharing raw data. This chapter explores FL architectures, security challenges, privacy-preserving techniques, and real-world applications.

What is Federated Learning?

Federated Learning allows multiple participants (e.g., mobile devices, hospitals, banks) to collaboratively train a model without exposing individual data. Instead of sending raw data to a central server, only model updates (gradients or parameters) are shared.

Key Benefits:
  • 🔐 Privacy-Preserving: Raw data remains on local devices.
  • ⚡ Efficiency: Enables distributed training without centralizing data storage.
  • 🌎 Scalability: Works across diverse environments (e.g., healthcare, finance, edge computing).
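
To make this concrete, below is a minimal NumPy sketch of federated averaging (FedAvg), the canonical FL aggregation rule: clients train locally and only model weights travel to the server, which averages them weighted by local dataset size. The linear model, the `client_update` helper, and the toy data are illustrative choices, not a production API.

```python
import numpy as np

def client_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: plain gradient descent on a linear model."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        w -= lr * grad
    return w

def fed_avg(client_weights, client_sizes):
    """Server-side aggregation: average updates weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Toy round: three clients train locally; only weights reach the server.
rng = np.random.default_rng(0)
global_w = np.zeros(3)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
local = [client_update(global_w, X, y) for X, y in clients]
global_w = fed_avg(local, [len(y) for _, y in clients])
print(global_w)
```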

Types of Federated Learning

Federated Learning is categorized into three main architectures based on how data is partitioned across participants.

1. Horizontal Federated Learning (HFL)
  • Data has the same features across multiple entities.

  • Example: Banks in different countries jointly training a model on transaction records that share the same attributes but cover different customers.

  • Challenge: Requires strict privacy protection, since the underlying records are user-sensitive.

2. Vertical Federated Learning (VFL)
  • Entities share different features for the same users.

  • Example: A bank and an e-commerce company both have information on the same customers but with different attributes (banking history vs. purchase behavior).

  • Challenge: Requires secure feature alignment and encryption techniques.

3. Hybrid Federated Learning
  • Combination of HFL and VFL for multi-party collaboration.

  • Example: A global insurance company and local healthcare providers collaborating on risk assessment.

  • Challenge: Requires adaptive privacy strategies and secure computation.

Security & Privacy Challenges

Although FL protects raw data, model updates can still leak sensitive information. Key threats include:

🚨 1. Model Inversion Attacks
  • An adversary reconstructs original training data from shared gradients.

  • Solution: Homomorphic encryption or differential privacy noise injection.

🚨 2. Membership Inference Attacks
  • Attackers determine if a particular record was part of the training set.

  • Solution: PATE (Private Aggregation of Teacher Ensembles).
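
As a sketch of the PATE idea, the aggregation step below adds Laplace noise to the teachers' vote counts before taking the argmax, so no single teacher (and hence no single training partition) visibly determines the outcome. The vote data and the `epsilon` setting are illustrative.

```python
import numpy as np

def pate_noisy_argmax(teacher_votes, num_classes, epsilon=1.0, rng=None):
    """PATE aggregation: add Laplace noise to per-class vote counts, then argmax.

    teacher_votes: array of class labels, one per teacher model trained on a
    disjoint data partition. The noise hides whether any single teacher
    changed the consensus, limiting membership leakage.
    """
    rng = rng or np.random.default_rng()
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += rng.laplace(loc=0.0, scale=1.0 / epsilon, size=num_classes)
    return int(np.argmax(counts))

# 10 teachers vote on a query; the noisy consensus is all the student sees.
votes = np.array([2, 2, 2, 1, 2, 0, 2, 2, 1, 2])
print(pate_noisy_argmax(votes, num_classes=3, epsilon=0.5))
```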

🚨 3. Gradient Leakage
  • Gradients can reveal sensitive data patterns.

  • Solution: Secure Multi-Party Computation (SMPC) or DP-SGD (Differentially Private Stochastic Gradient Descent).

🚨 4. Free-Rider Attacks
  • Malicious participants benefit from the model without contributing useful data.

  • Solution: Client authentication and aggregation integrity checks.
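
One simple integrity check is to require each enrolled client to tag its serialized update with an HMAC under a per-client secret. This is a minimal sketch assuming keys are provisioned out of band during enrollment; a real deployment would layer this with TLS and device attestation.

```python
import hmac
import hashlib
import numpy as np

def sign_update(update: np.ndarray, key: bytes) -> bytes:
    """Client side: tag the serialized update with an HMAC."""
    return hmac.new(key, update.tobytes(), hashlib.sha256).digest()

def verify_update(update: np.ndarray, tag: bytes, key: bytes) -> bool:
    """Server side: reject updates whose tag does not verify."""
    expected = hmac.new(key, update.tobytes(), hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

key = b"per-client shared secret"  # assumed provisioned during enrollment
update = np.array([0.1, -0.2, 0.05])
tag = sign_update(update, key)
assert verify_update(update, tag, key)           # authentic client passes
assert not verify_update(update * 2, tag, key)   # tampered update is rejected
```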

Privacy-Preserving Techniques for FL

To protect data while ensuring accurate model training, FL relies on three primary privacy-enhancing methods.

1. Differential Privacy (DP)
  • Adds calibrated noise to model updates to limit what can be inferred about any individual record.

  • Example: DP-SGD adds noise to gradients before aggregation.
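
A minimal sketch of DP-SGD's core sanitization step: clip each per-example gradient to a fixed L2 norm, average, then add Gaussian noise scaled to the clipping bound. The clipping norm and noise multiplier are illustrative, and privacy accounting across training steps is omitted.

```python
import numpy as np

def dp_sanitize(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """DP-SGD core step: bound each example's influence, then add noise."""
    rng = rng or np.random.default_rng()
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # L2 clip
    mean = np.mean(clipped, axis=0)
    # Noise std is calibrated to the sensitivity of the averaged gradient.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=mean.shape)
    return mean + noise

grads = [np.array([3.0, 4.0]), np.array([0.1, -0.2])]  # per-example gradients
print(dp_sanitize(grads))
```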

2. Homomorphic Encryption (HE)
  • Allows computations on encrypted data without decryption.

  • Example: Model updates are encrypted before transmission, ensuring that even the central server cannot view them.
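
A toy sketch of additively homomorphic aggregation, assuming the open-source python-paillier (`phe`) package. In practice the private key would be held by the clients jointly or by a party other than the aggregation server.

```python
# Requires the python-paillier package: pip install phe
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each client encrypts its scalar update; the server never sees plaintexts.
client_updates = [0.12, -0.07, 0.31]
encrypted = [public_key.encrypt(u) for u in client_updates]

# Paillier is additively homomorphic: the server sums ciphertexts directly.
encrypted_sum = encrypted[0] + encrypted[1] + encrypted[2]

# Only the private-key holder can decrypt the aggregate, not the parts.
average = private_key.decrypt(encrypted_sum) / len(client_updates)
print(average)  # ~0.12
```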

3. Secure Multi-Party Computation (SMPC)
  • Splits each model update into multiple secret shares, so that no single party can reconstruct the complete update on its own.

  • Example: Privacy-Preserving Federated Learning (PPFL) in medical research.
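
The sketch below shows the core trick with plain additive secret sharing over a prime field; the fixed-point encoding of float updates is an illustrative assumption.

```python
import secrets

PRIME = 2**61 - 1   # field modulus
SCALE = 10**6       # fixed-point scale for float updates

def share(value: float, n_parties: int):
    """Split a value into n additive shares that sum to it mod PRIME."""
    x = int(round(value * SCALE)) % PRIME
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    x = sum(shares) % PRIME
    if x > PRIME // 2:  # map back from the field to signed values
        x -= PRIME
    return x / SCALE

# Each aggregator holds one share per client; no single party sees an update.
shares_a = share(0.25, 3)
shares_b = share(-0.10, 3)
# Parties add shares locally; only the *sum* of updates is ever reconstructed.
sum_shares = [(a + b) % PRIME for a, b in zip(shares_a, shares_b)]
print(reconstruct(sum_shares))  # 0.15
```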

Real-World Applications of Federated Learning

Federated Learning is transforming various industries by enabling AI training on sensitive datasets.

🏦 1. Financial Services
  • Fraud detection models trained across banks without exposing customer transactions.

  • VFL + Homomorphic Encryption for secure cross-bank collaborations.

🏥 2. Healthcare & Biomedical Research
  • FL enables medical AI models to be trained across hospitals while maintaining patient privacy.

  • Example: Tumor detection AI trained on MRI scans from multiple hospitals.

📱 3. Edge Computing & IoT
  • Enables on-device AI training for mobile devices, reducing cloud dependence.

  • Example: Federated Learning on smartphones for next-word prediction and speech personalization (Google's Gboard, Apple's Siri).

🚗 4. Autonomous Vehicles
  • Self-driving cars share insights from driving conditions without sending raw sensor data.

  • HFL + SMPC ensures privacy while improving AI accuracy.

Challenges & Future Directions

Despite its advantages, FL still faces technical and operational challenges.

1. Communication Overhead
  • Exchanging model updates with many devices every round creates network bottlenecks.

  • Solution: Compression techniques (e.g., sparsification, quantization).
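
As an illustration, top-k sparsification transmits only the largest-magnitude gradient entries; the fraction kept here is a tunable assumption.

```python
import numpy as np

def top_k_sparsify(grad, k_fraction=0.1):
    """Keep only the k largest-magnitude entries; send (indices, values).

    Cuts upload size by roughly 1/k_fraction; dropped coordinates are
    typically accumulated locally as residual error and sent later.
    """
    k = max(1, int(len(grad) * k_fraction))
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

grad = np.random.default_rng(0).normal(size=1000)
idx, vals = top_k_sparsify(grad, k_fraction=0.05)
print(len(idx), "of", len(grad), "entries transmitted")
```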

2. Model Poisoning Attacks
  • Malicious clients submit poisoned updates that implant backdoors in the global model.

  • Solution: Byzantine-robust aggregation (e.g., Krum, Trimmed Mean).
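
A sketch of coordinate-wise Trimmed Mean, one of the robust rules named above (Krum is omitted for brevity); the poisoned updates are simulated.

```python
import numpy as np

def trimmed_mean(updates, trim_ratio=0.2):
    """Byzantine-robust aggregation: per coordinate, drop the largest and
    smallest trim_ratio fraction of client values, then average the rest."""
    stacked = np.sort(np.stack(updates), axis=0)  # sort per coordinate
    t = int(len(updates) * trim_ratio)
    return stacked[t:len(updates) - t].mean(axis=0)

honest = [np.array([1.0, 1.0]) + np.random.default_rng(i).normal(0, 0.1, 2)
          for i in range(8)]
poisoned = [np.array([100.0, -100.0])] * 2  # malicious outliers
print(trimmed_mean(honest + poisoned, trim_ratio=0.2))  # stays near [1, 1]
```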

3. Lack of Standardization
  • Different implementations lack interoperability.

  • Solution: Unified FL frameworks like Flower, FedML, and TensorFlow Federated (TFF).

4. Scalability & Client Heterogeneity
  • Devices have varying computational power.

  • Solution: Adaptive learning rates and client selection.
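
A toy sketch of capacity-weighted client selection; the client registry and capacity scores are hypothetical.

```python
import random

def select_clients(registry, n, rng=None):
    """Sample n distinct clients per round, biased toward higher-capacity
    devices so slow stragglers are chosen less often."""
    assert n <= len(registry)
    rng = rng or random.Random()
    ids, weights = zip(*registry)
    chosen = set()
    while len(chosen) < n:
        chosen.add(rng.choices(ids, weights=weights, k=1)[0])
    return sorted(chosen)

# Hypothetical registry of (client_id, relative compute capacity).
registry = [("phone-a", 1.0), ("phone-b", 0.3), ("tablet-c", 2.0),
            ("laptop-d", 4.0), ("phone-e", 0.5)]
print(select_clients(registry, n=3, rng=random.Random(0)))
```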

Next Steps

📖 For privacy-preserving methods, see Confidential Computing
📊 For differential privacy details, see Differential Privacy

For secure AI architectures, see Risk Simulation