FL use cases: federated analytics life cycle template

Introduction

Briefly describe the project in plain language.

  • Problem and domain:
  • Why federated analytics suits this problem:
  • Who will use the results:

Scope and objectives

Define the project boundaries and goals.

  • Primary objective:
  • Population and setting (describe subjects, patients, or data sources):
  • Outputs to produce:
    • Predictive model, or
    • Descriptive / inferential statistics, or
    • Privacy-preserving dashboards or studies
  • Out of scope:

Assumptions and constraints

Make assumptions explicit to avoid hidden risks.

  • Key assumptions:
  • Constraints (technical, organizational, regulatory):
  • How assumptions will be validated or monitored:

Governance

Policies and roles guiding responsible data and model use. (Note: governance covers oversight and policy; technical controls belong in the Privacy, security, and risk section.)

  • Stakeholders and roles:
  • Approvals and oversight:
  • Legal basis, consent, and agreements:
  • Ethics status (institutional approval, N/A for public datasets, or secondary analysis notes):
  • Access and sharing policy for data, models, or results:
  • Publication and dissemination rules:

Data landscape

Describe the data available and site-level differences.

  • Federation mode (simulated, live, hybrid):
    • If simulated: describe how clients are defined (e.g., temporal cross-validation, geographic partitions, synthetic splits; see the partitioning sketch after this list)
    • If live: describe institutional boundaries and participation
  • Clients and data sources:
  • Inclusion and exclusion rules:
  • Feature families and link to data dictionary:
  • Label or outcome definitions (if applicable):
  • Dataset size, class balance, and known biases:
  • Data quality issues:
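
For the simulated mode in particular, the client definition is easy to leave ambiguous in prose. A minimal sketch of one way to partition a pooled table into clients, assuming hypothetical `site` and `year` columns:

```python
import pandas as pd

def make_simulated_clients(df: pd.DataFrame, mode: str = "geographic") -> dict:
    """Partition a pooled table into per-client frames.

    `site` and `year` are placeholder column names; substitute whatever
    fields define institutional or temporal boundaries in your data.
    """
    if mode == "geographic":
        return {str(k): g for k, g in df.groupby("site")}
    if mode == "temporal":
        return {str(k): g for k, g in df.groupby("year")}
    raise ValueError(f"unknown federation mode: {mode}")

# clients = make_simulated_clients(pd.read_csv("cohort.csv"), mode="temporal")
```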

Standards and harmonization

List conventions that ensure semantic alignment.

  • Vocabularies, ontologies, or coding systems:
  • Unit conventions and mapping rules (see the mapping sketch after this list):
  • Versioning and updates:
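
Mapping rules tend to be less error-prone as data than as prose. A minimal sketch of a unit-harmonization table; the analytes and conversion factors shown are illustrative examples, not a vetted reference:

```python
# Illustrative only: validate analytes, units, and factors against the
# project's data dictionary before use.
UNIT_MAP = {
    "glucose": {"target": "mmol/L", "factors": {"mg/dL": 1 / 18.016, "mmol/L": 1.0}},
    "creatinine": {"target": "umol/L", "factors": {"mg/dL": 88.42, "umol/L": 1.0}},
}

def harmonize(analyte: str, value: float, unit: str) -> float:
    """Convert a raw measurement to the federation's target unit."""
    return value * UNIT_MAP[analyte]["factors"][unit]
```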

Infrastructure

How the federation is run and secured.

  • Federation topology and orchestration:
  • Frameworks and libraries:
  • Client participation policy:
  • Compute, storage, and networking:
  • Monitoring and failure recovery:
  • Simulation or live clients (if simulated, describe how this approximates real-world federation):
  • Security baseline for transport and authentication:

Wrangling

How data are prepared locally.

  • Preprocessing steps and provenance:
  • Train, validation, and test splits (if modeling):
  • Normalization strategy and source of stats (see the pooled-statistics sketch after this list):
  • Missing data handling:
  • Class imbalance handling (if modeling):
  • Validation checks and data QA:
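
A common pitfall here is normalizing each client with its own statistics, which makes features incomparable across sites. A minimal sketch of the pooled alternative, assuming sufficient statistics can be exchanged (the transport and any privacy protection of those statistics are out of scope here):

```python
import numpy as np

def local_moments(x: np.ndarray) -> tuple[float, float, int]:
    """Per-client sufficient statistics: sum, sum of squares, count."""
    return float(x.sum()), float((x ** 2).sum()), int(x.size)

def pooled_mean_std(moments: list[tuple[float, float, int]]) -> tuple[float, float]:
    """Combine client moments into a federation-wide mean and std."""
    s = sum(m[0] for m in moments)
    ss = sum(m[1] for m in moments)
    n = sum(m[2] for m in moments)
    mean = s / n
    return mean, (ss / n - mean ** 2) ** 0.5

# Each client then normalizes locally with the pooled statistics:
# x_norm = (x - mean) / std
```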

Computation plan

Describe the methods to be run.

  • If predictive modeling
    • Baselines and algorithms (include centralized baselines if available; an aggregation sketch follows this list):
    • Personalization or adaptation strategy:
    • Model architectures:
    • Training schedule and early stopping:
    • Hyperparameters and search plan:
  • If analytics without modeling
    • Statistical methods and estimators:
    • Aggregations and query design:
    • Hypothesis tests and assumptions:
    • Privacy budgets (if using differential privacy):
  • Random seeds and reproducibility notes:
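
For the modeling path, the aggregation rule is the piece most worth writing down exactly. A minimal sketch of one FedAvg-style round, assuming clients return flattened parameter vectors with their local sample counts (local training and transport omitted):

```python
import numpy as np

def fedavg(updates: list[tuple[np.ndarray, int]]) -> np.ndarray:
    """Average client parameters weighted by local sample count n_k."""
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)

# Reproducibility: fix seeds before client sampling and weight init so the
# "Random seeds and reproducibility notes" field above is enforceable.
rng = np.random.default_rng(42)
```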

Evaluation and success criteria

How results will be judged.

  • If modeling
    • Primary and secondary metrics:
    • Client-side evaluation plan:
    • Aggregation across clients (see the sketch after this list):
    • Calibration and threshold selection:
    • Runtime and cost reporting:
    • Statistical tests and uncertainty:
    • Comparison to centralized baseline (if available, report performance difference and privacy trade-offs):
  • If analytics without modeling
    • Estimator accuracy and precision:
    • Coverage or confidence intervals:
    • Agreement with a centralized reference (if feasible):
    • Sensitivity analyses for assumptions:
    • Robustness checks across clients:
    • Runtime and cost reporting:
  • Fairness and subgroup checks (demographic, per-site, per-outcome, or other relevant subgroups):
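
"Aggregation across clients" hides a real choice: a sample-weighted pooled metric and an unweighted per-client average answer different questions. A minimal sketch contrasting the two, with illustrative numbers:

```python
def pooled(metrics: list[float], counts: list[int]) -> float:
    """Sample-weighted aggregate: dominated by large clients."""
    total = sum(counts)
    return sum(m * n / total for m, n in zip(metrics, counts))

def macro(metrics: list[float]) -> float:
    """Unweighted per-client mean: every site counts equally."""
    return sum(metrics) / len(metrics)

aucs, sizes = [0.82, 0.74, 0.91], [5000, 300, 1200]
print(pooled(aucs, sizes), macro(aucs))  # report both, plus the per-client spread
```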

Privacy, security, and risk

Technical and procedural safeguards. (Note: this section describes how governance policies are implemented.)

  • Threat model (distinguish simulation vs. deployment if applicable):
  • Controls in use:
    • Secure aggregation
    • Encryption in transit and at rest
    • Differential privacy or k-anonymity if applicable
    • Access logging and audit trail
  • Privacy budget accounting for repeated queries (see the ledger sketch after this list):
  • Incident response and contacts:
  • Simulation-specific notes (if applicable, describe how simulation differs from deployment privacy risks):
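
For repeated queries, even the naive accounting rule (epsilons add under sequential composition) is worth enforcing in code. A minimal ledger sketch, assuming pure ε-differential privacy; tighter accountants (advanced composition, RDP) are out of scope here:

```python
class PrivacyLedger:
    """Track cumulative epsilon under basic sequential composition.

    Assumes pure epsilon-DP queries; for (epsilon, delta)-DP or for
    many queries, switch to a tighter accountant (e.g., RDP).
    """

    def __init__(self, budget: float):
        self.budget = budget
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        if self.spent + epsilon > self.budget:
            raise RuntimeError(
                f"budget exceeded: {self.spent} + {epsilon} > {self.budget}"
            )
        self.spent += epsilon

ledger = PrivacyLedger(budget=3.0)
ledger.charge(0.5)  # each repeat of a query spends budget again
```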

Reproducibility and sharing

Make it possible for others to rerun or extend the work.

  • Code repository and commit tag:
  • Environment capture and seeds (see the manifest sketch after this list):
  • Artifacts to release (configs, metrics, models if allowed):
  • Artifact registry or index for traceability:
  • Data availability (public dataset with URL, restricted access with application process, synthetic samples):
  • Known limitations and caveats:
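
A minimal sketch of an environment manifest written next to each run's results, assuming a git checkout and a standard Python environment (the filename `run_manifest.json` and the seed value are arbitrary):

```python
import json, platform, subprocess, sys
from importlib import metadata

def capture_environment(path: str = "run_manifest.json", seed: int = 42) -> None:
    """Record commit, interpreter, OS, packages, and seed for one run."""
    manifest = {
        "commit": subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True
        ).stdout.strip(),
        "python": sys.version,
        "platform": platform.platform(),
        "packages": {d.metadata["Name"]: d.version for d in metadata.distributions()},
        "seed": seed,  # mirror the seed actually used by the run
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
```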

Operationalization and maintenance

Plan for use beyond the study.

  • Deployment target and owner (or “intended use case” for proof-of-concept work without concrete deployment):
  • If modeling
    • Monitoring for drift and performance:
    • Update and retraining policy:
  • If analytics without modeling
    • Schedule for recurring queries or dashboards:
    • Change control for query definitions:
  • Site playbooks and operator training:
  • Sunset or rollback plan:

Technology readiness level (TRL)

Describe maturity and supporting evidence.

  • Claimed TRL:
  • Evidence and references:
  • Gaps to reach the next TRL:
  • Target deployment setting:

Wrap up

Summarize the key outcomes and next steps.

  • Key learning:
  • Decisions made and why:
  • Next step to raise TRL:

Appendix: Changes from Template v1.1

Added fields

  1. Governance section: “Ethics status” field to handle public datasets, secondary analysis, and N/A cases
  2. Data landscape section: “Federation mode” field (simulated, live, hybrid) with guidance on documenting simulation approaches
  3. Infrastructure section: Explicit prompt to describe how simulation approximates real-world federation
  4. Computation plan section: Explicit request for centralized baselines in modeling subsection
  5. Evaluation section: “Comparison to centralized baseline” field to quantify privacy-preserving trade-offs
  6. Privacy section: “Simulation-specific notes” to distinguish current vs. future deployment threat models
  7. Reproducibility section: “Data availability” field with three options (public, restricted, synthetic)
  8. Operationalization section: Allow “intended use case” for proof-of-concept work without concrete deployment owner

Clarified language

  1. Scope section: Changed “Population and setting” description to explicitly include non-clinical subjects (e.g., “describe subjects, patients, or data sources”)
  2. Evaluation section: Clarified “Fairness and subgroup checks” to include non-demographic subgroups (per-site, per-outcome, per-finger, etc.)
  3. Throughout: Added simulation considerations where relevant to support early-stage work

Design principles preserved from v1.1

  • Prose-first approach (minimize bullet points in completed examples)
  • Separation of governance (policy) from privacy/security (technical controls)
  • Explicit TRL assessment with gap analysis
  • Reproducibility focus with artifact tracking

Appendix: Potential Future Sections for Template v3.0

The following sections were identified as potentially valuable but require more community feedback before inclusion:

  1. Algorithm Development & Validation: For projects introducing novel federated methods, document mathematical formulation, algorithmic innovations, and validation separate from application performance (distinct from “Computation plan” which focuses on using existing methods)

  2. Communication & Bandwidth Analysis: Quantify bytes transmitted per round, total bandwidth requirements, compression strategies, network latency tolerance, and communication efficiency techniques (gradient sparsification, quantization); a back-of-envelope sketch follows this list

  3. Heterogeneity Analysis: Systematically document data heterogeneity (distribution differences), system heterogeneity (compute/memory/network variations), and statistical heterogeneity (non-IID effects on convergence) in a dedicated section rather than scattered across Infrastructure and Evaluation

  4. Client Selection & Sampling Strategy: Document selection criteria (random, stratified, active learning), minimum participation requirements per round, dropout tolerance policies, and strategies for handling partial client participation

  5. Interpretability & Explainability: For clinical or high-stakes applications, document model interpretability for domain experts, feature importance analysis, failure mode characterization, and explanation generation strategies

  6. Cost-Benefit Analysis: Quantify resource costs (compute, storage, personnel), opportunity costs of federated constraints, and benefit quantification to justify federated approach versus centralized alternatives (include acceptable performance trade-offs)
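
To make item 2 concrete, a back-of-envelope sketch, assuming dense float32 updates and full client participation each round:

```python
def bytes_per_round(n_params: int, n_clients: int, dtype_bytes: int = 4) -> int:
    """Uncompressed traffic for one round: upload plus broadcast per client."""
    return n_clients * n_params * dtype_bytes * 2

# Example: a 1M-parameter model with 10 clients is ~80 MB per round,
# before any sparsification or quantization.
print(bytes_per_round(1_000_000, 10))  # 80000000
```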
