Configuration reference

An experiment is defined by a single test_conf.yaml file. Any top-level FL knob may be a scalar or a list; a list marks that knob for fuzzing. The runs: block lists the frameworks to execute, which is what enables cross-framework differential testing.
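
To make the scalar-versus-list rule concrete, here is a minimal sketch of how list-valued knobs could expand into concrete experiments. It assumes the fuzzer enumerates the Cartesian product of all list values; the actual tool may sample combinations instead. `expand_fuzzed` and its `fuzzable` argument are hypothetical names for illustration, not part of the tool's API.

```python
from itertools import product

def expand_fuzzed(config, fuzzable):
    """Yield one concrete config per combination of list-valued fuzzable knobs.

    Hypothetical helper: assumes full Cartesian-product enumeration.
    """
    # Only knobs named in `fuzzable` are expanded; other lists (e.g. metrics) stay as-is.
    keys = [k for k in config if k in fuzzable and isinstance(config[k], list)]
    for combo in product(*(config[k] for k in keys)):
        concrete = dict(config)
        concrete.update(zip(keys, combo))
        yield concrete

config = {
    "dataset": ["mnist", "cifar10"],
    "data_distribution": ["iid", "dirichlet"],
    "num_clients": 10,
    "metrics": ["accuracy", "loss"],
}
variants = list(
    expand_fuzzed(config, fuzzable={"dataset", "data_distribution", "num_clients"})
)
print(len(variants))  # 4
```

Passing the set of fuzzable knobs explicitly keeps knobs that are lists by nature, such as metrics, from being expanded by mistake.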

Full example

name: my_eval                    # report name
device: cpu                      # cpu | mps | cuda
seed: 786
deterministic: true

# ---- data ----
dataset: [mnist, cifar10]        # list => fuzzed
data_distribution: [iid, dirichlet]   # iid | dirichlet | pathological
dirichlet_alpha: 0.5             # lower = more non-IID (only for dirichlet)
classes_per_partition: 2         # only for pathological
dataset_partitions: 100          # how finely to shard before taking num_clients shards

# ---- model ----
model_name: LeNet                # LeNet | ConvNet | MLP

# ---- FL parameters ----
num_clients: 10
num_rounds: 10
client_epochs: 1
client_lr: 0.01
client_batch_size: 32
server_batch_size: 256
max_test_data_size: 2048
optimizer: SGD                   # SGD | Adam
loss_fn: CrossEntropyLoss

# ---- plugins ----
attacks:
  - {name: backdoor, params: {target_label: 0, infection_rate: 0.3}, target_clients: [0, 1]}
defenses:
  - {name: median}
metrics: [accuracy, loss, per_client]

# ---- which frameworks to run ----
runs:
  - {framework: reference, name: reference}
  - {framework: flwr,      name: flower}
  - {framework: nvflare,   name: nvflare}

# ---- what to assert ----
testing:
  differential: {enabled: true, mode: cross_framework, metric: accuracy, tolerance: 0.05}
  metamorphic:
    - {relation: clients_scale, values: [10, 20], metric: accuracy, tolerance: 0.05}

Knob reference

Reproducibility / hardware

| Key | Default | Notes |
|---|---|---|
| seed | 786 | seeds python/numpy/torch |
| device | cpu | cpu is deterministic; mps/cuda for speed |
| deterministic | true | load cached identical initial weights for all clients/frameworks |
| total_cpus / total_gpus | 4 / 0 | Flower/Ray resource pool |
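
As a rough illustration of what seeding python/numpy/torch from one seed value can look like, the sketch below seeds the standard library and, when they are installed, NumPy and PyTorch. `seed_everything` is a hypothetical name; how the tool seeds internally is not shown on this page.

```python
import random

def seed_everything(seed: int) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not available; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not available; nothing to seed

seed_everything(786)
first = random.random()
seed_everything(786)
second = random.random()
print(first == second)  # True
```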

Data

| Key | Default | Notes |
|---|---|---|
| dataset | mnist | see Datasets |
| data_distribution | iid | iid, dirichlet (label skew), pathological (N classes/client) |
| dirichlet_alpha | 0.5 | only used when distribution is dirichlet; lower ⇒ more heterogeneous |
| classes_per_partition | 2 | only used when distribution is pathological |
| dataset_partitions | 100 | the dataset is split into this many shards; the first num_clients are used. Keep it fixed while sweeping num_clients so per-client data size is comparable. |
| max_test_data_size | 2048 | size of the central test subset (keeps eval fast) |
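
The advice to keep dataset_partitions fixed while sweeping num_clients can be checked with a small sketch. It assumes equal-sized contiguous shards with any remainder dropped, which is a simplification for the iid case only; `shard_indices` is a hypothetical helper, not the tool's partitioner.

```python
def shard_indices(num_samples, dataset_partitions, num_clients):
    """Split sample indices into equal contiguous shards and keep the first num_clients."""
    if num_clients > dataset_partitions:
        raise ValueError("num_clients cannot exceed dataset_partitions")
    shard_size = num_samples // dataset_partitions  # remainder samples are dropped
    shards = [
        list(range(i * shard_size, (i + 1) * shard_size))
        for i in range(dataset_partitions)
    ]
    return shards[:num_clients]

small = shard_indices(60_000, dataset_partitions=100, num_clients=10)
large = shard_indices(60_000, dataset_partitions=100, num_clients=20)
print(len(small), len(large), len(small[0]), len(large[0]))  # 10 20 600 600
```

Because the shard size depends only on dataset_partitions, doubling num_clients adds clients without shrinking each client's data.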

Model

| Key | Default | Options |
|---|---|---|
| model_name | LeNet | LeNet (32×32 conv), ConvNet (smooth activations, for DLG), MLP (fast) |

FL parameters

| Key | Default | Notes |
|---|---|---|
| num_clients | 10 | participants per round (all participate; fraction_fit=1) |
| num_rounds | 10 | global aggregation rounds |
| client_epochs | 1 | local epochs per round |
| client_lr | 0.01 | local learning rate |
| client_batch_size | 32 | local batch size |
| optimizer | SGD | SGD, Adam |

Plugins

  • attacks: / defenses: are lists of {name, params, target_clients?}. See Attacks and Defenses for each plugin's parameters. target_clients (attacks only) restricts the attack to the listed client ids; omit it to apply the attack to all clients.
  • metrics: is a list of metric-listener names. accuracy and loss are always produced; add per_client for personalized evaluation. See Metrics.

runs:

A list of {framework, name?, ...overrides}, with one entry per framework you want to execute. A run entry may also override any top-level knob and may carry its own attacks/defenses/metrics. Frameworks: reference, flwr/flower, nvflare/flare.
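
The override rule can be pictured as overlaying each run entry on the top-level knobs. The sketch below is illustrative only: `resolve_run`, the `NON_KNOB_KEYS` set, and the fallback of a missing run name to the framework identifier are all assumptions, not the tool's documented behaviour.

```python
# Assumption: these top-level keys describe the experiment, not a per-run knob.
NON_KNOB_KEYS = {"name", "runs", "testing"}

def resolve_run(top_level, run_entry):
    """Return the effective config for one run: top-level knobs overlaid with run overrides."""
    effective = {k: v for k, v in top_level.items() if k not in NON_KNOB_KEYS}
    effective.update(run_entry)
    # Assumption: an omitted run name falls back to the framework identifier.
    effective.setdefault("name", run_entry["framework"])
    return effective

top_level = {
    "name": "my_eval",
    "num_clients": 10,
    "client_lr": 0.01,
    "defenses": [{"name": "median"}],
    "runs": [
        {"framework": "reference", "name": "reference"},
        {"framework": "flwr", "client_lr": 0.05, "defenses": []},
    ],
}
for run in top_level["runs"]:
    print(resolve_run(top_level, run))
```

Here the second run keeps num_clients from the top level while replacing client_lr and clearing defenses for that run only.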

testing:
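
The full example on this page enables a cross_framework differential check on accuracy with tolerance 0.05. One plausible reading, shown here as a sketch only, is that the check flags any two runs whose final metric differs by more than the tolerance. The function name, the input shape, and the pairwise absolute-difference rule are assumptions; the accuracy values are made up for illustration.

```python
from itertools import combinations

def differential_violations(final_metric, tolerance):
    """Return (run_a, run_b, gap) for each pair of runs whose metric gap exceeds tolerance."""
    violations = []
    # Sort by run name so the pair order is stable across invocations.
    for (a, va), (b, vb) in combinations(sorted(final_metric.items()), 2):
        gap = abs(va - vb)
        if gap > tolerance:
            violations.append((a, b, gap))
    return violations

# Made-up final accuracies keyed by run name.
accuracy = {"reference": 0.91, "flower": 0.90, "nvflare": 0.82}
for a, b, gap in differential_violations(accuracy, tolerance=0.05):
    print(f"{a} vs {b}: gap {gap:.2f} exceeds tolerance")
```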

Where parameters come from in code

Defaults and validation live in fltest/core/config.py (TestConfig and RunSpec). The fuzzable knob list is FUZZABLE_KNOBS in the same file.