Concepts & internals
The five ideas that make FLTest work.
1. RunSpec: one flat, resolved run
A RunSpec (fltest/core/config.py) is everything one simulation needs: framework, seed,
device, dataset + distribution, model, FL params (clients/rounds/lr/epochs/batch), the
attacks/defenses/metrics to compose, and cache paths. The fuzzer produces a list of these
from your TestConfig. Adapters consume only RunSpec — never the raw YAML.
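To make the shape concrete, here is a minimal sketch of what a flat, resolved spec could look like. It is not the definition in fltest/core/config.py: every field name and default below is an assumption chosen to mirror the list above, and expand_seeds is a hypothetical stand-in for the fuzzer.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunSpecSketch:
    """Hypothetical stand-in for RunSpec; all field names are assumptions."""
    framework: str
    seed: int
    device: str = "cpu"
    dataset: str = "mnist"
    distribution: str = "iid"
    model: str = "cnn"
    num_clients: int = 10
    rounds: int = 5
    lr: float = 0.01
    local_epochs: int = 1
    batch_size: int = 32
    attacks: tuple = ()
    defenses: tuple = ()
    metrics: tuple = ("accuracy", "loss")
    cache_dir: str = ".cache"


def expand_seeds(base: RunSpecSketch, seeds: list) -> list:
    """Hypothetical fuzzer step: one fully resolved spec per seed."""
    return [replace(base, seed=s) for s in seeds]


base = RunSpecSketch(framework="reference", seed=0)
specs = expand_seeds(base, [0, 1, 2])
```

Each element of specs is complete on its own, which is the property that lets an adapter run it without looking back at the configuration it came from.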
2. The FrameworkAdapter seam
Every FL framework implements one method (fltest/frameworks/base.py):
class FrameworkAdapter:
    def run_simulation(self, spec: RunSpec, data: dict, hook_runner: HookRunner) -> RunResult: ...
The core never imports a framework directly; get_adapter(name) returns the registered
adapter. This is the "run_simulation() boundary": adding a backend is a subclass +
@register_framework("name"). See Architecture for the fidelity
table (which hooks each backend supports).
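The registration pattern described above can be sketched as a small decorator-based registry. This is an illustration under assumptions, not FLTest's implementation: the _ADAPTERS dict, AdapterSketch, ToyAdapter, and the error behaviour for unknown names are all hypothetical; only the names register_framework, get_adapter, and run_simulation come from the text.

```python
# Hypothetical registry; the real one in FLTest may be structured differently.
_ADAPTERS: dict = {}


class AdapterSketch:
    """Stand-in for FrameworkAdapter with its single run_simulation method."""

    def run_simulation(self, spec, data, hook_runner):
        raise NotImplementedError


def register_framework(name: str):
    """Class decorator that files an adapter class under `name`."""
    def decorator(cls):
        _ADAPTERS[name] = cls
        return cls
    return decorator


def get_adapter(name: str) -> AdapterSketch:
    """Instantiate the adapter registered under `name`."""
    try:
        return _ADAPTERS[name]()
    except KeyError:
        raise ValueError(f"unknown framework: {name!r}") from None


@register_framework("toy")
class ToyAdapter(AdapterSketch):
    def run_simulation(self, spec, data, hook_runner):
        # A real adapter would drive its backend and fire hooks here.
        return {"framework": "toy", "status": "success"}


result = get_adapter("toy").run_simulation(spec=None, data={}, hook_runner=None)
```

The calling code only ever touches get_adapter, so it has no import-time dependency on any particular backend.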
3. The hook lifecycle + HookContext
A single mutable HookContext (fltest/core/hook_context.py) flows through every phase:
before_simulation → on_data_distribute →
  [ before_round →
      { before_client_train → (train) → after_client_train }* →
      before_aggregate → on_aggregate → after_aggregate → after_round ]* →
after_simulation
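Read as nested loops, the diagram above corresponds to the following sketch. The emit_lifecycle function and its fire callback are hypothetical names used only to show the ordering; a real adapter would also train clients and aggregate between these calls.

```python
def emit_lifecycle(num_rounds: int, client_ids: list, fire) -> None:
    """Call fire(hook_name) in the order given by the lifecycle diagram."""
    fire("before_simulation")
    fire("on_data_distribute")
    for _ in range(num_rounds):          # the [ ... ]* part
        fire("before_round")
        for _cid in client_ids:          # the { ... }* part
            fire("before_client_train")
            # (local training would happen here)
            fire("after_client_train")
        fire("before_aggregate")
        fire("on_aggregate")
        fire("after_aggregate")
        fire("after_round")
    fire("after_simulation")


trace: list = []
emit_lifecycle(num_rounds=1, client_ids=[0, 1], fire=trace.append)
```

With one round and two clients, trace holds twelve hook names: the two setup hooks, one before_round, two client train pairs, the three aggregation hooks, after_round, and after_simulation.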
Relevant HookContext fields by phase:
| Field | Set during | Meaning |
|---|---|---|
| cfg, framework, round, client_id | all | run identity |
| dist_dict | data phase | cid → client loader (attacks may repartition) |
| client_data | before_client_train | this client's loader (attacks swap it) |
| global_state | rounds | current global params (list of ndarrays) |
| client_update | after_client_train | this client's update (attacks/defenses mutate) |
| updates_and_weights | before_aggregate | [(update, n), …] (robust agg replaces this) |
| model, test_data | after_round | live model + central test loader |
| metrics, history | all | ctx.record(**kv) writes here |
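As an illustration of the last row, the sketch below shows one way a record(**kv) call could feed both metrics and history. ContextSketch is a hypothetical, trimmed-down context, and the choice to file values under the current round is an assumption rather than documented HookContext behaviour.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextSketch:
    """Hypothetical, trimmed-down HookContext; only a few fields are modelled."""
    round: int = 0
    client_id: Optional[int] = None
    metrics: dict = field(default_factory=dict)
    history: dict = field(default_factory=dict)

    def record(self, **kv) -> None:
        # Assumption: record() updates the current metrics and also files
        # the same values under the current round in history.
        self.metrics.update(kv)
        self.history.setdefault(self.round, {}).update(kv)


ctx = ContextSketch()
for rnd, acc in enumerate([0.5, 0.75], start=1):
    ctx.round = rnd            # the framework adapter would set this
    ctx.record(accuracy=acc)   # a metric listener would call this
```

Because the context is a single mutable object, whatever one plugin records is visible to every plugin that runs after it.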
4. Hook plugins (HookPlugin)
Attacks, defenses, and metric listeners all subclass HookPlugin
(fltest/core/plugin.py). A plugin declares the hooks it uses in HOOKS and implements
the matching methods; attach() registers exactly those onto a HookRunner.
class MyAttack(ThreatModelBaseClass):
    HOOKS = ("before_client_train",)
    def before_client_train(self, ctx): ...
Because every backend emits the same hooks with the same context, a plugin written once runs across all of them, and multiple plugins compose on one run. See Port your attacks & defenses.
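The declare-then-attach mechanism can be sketched in a few lines. RunnerSketch, PluginSketch, and the two example plugins are hypothetical stand-ins, and the register and fire method names are assumptions; only HOOKS and attach() are taken from the description above.

```python
from collections import defaultdict


class RunnerSketch:
    """Hypothetical HookRunner: maps hook names to callbacks."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, hook: str, fn) -> None:
        self._callbacks[hook].append(fn)

    def fire(self, hook: str, ctx) -> None:
        for fn in self._callbacks[hook]:
            fn(ctx)


class PluginSketch:
    """Hypothetical HookPlugin: attach() registers only the declared hooks."""
    HOOKS: tuple = ()

    def attach(self, runner: RunnerSketch) -> None:
        for name in self.HOOKS:
            runner.register(name, getattr(self, name))


class FlipLabels(PluginSketch):
    HOOKS = ("before_client_train",)

    def before_client_train(self, ctx):
        ctx["client_data"] = "poisoned"


class CountRounds(PluginSketch):
    HOOKS = ("after_round",)

    def after_round(self, ctx):
        ctx["rounds_seen"] = ctx.get("rounds_seen", 0) + 1


runner = RunnerSketch()
for plugin in (FlipLabels(), CountRounds()):
    plugin.attach(runner)

ctx = {"client_data": "clean"}  # a plain dict stands in for HookContext
runner.fire("before_client_train", ctx)
runner.fire("after_round", ctx)
runner.fire("before_aggregate", ctx)  # no plugin declared this hook; nothing runs
```

Two independent plugins share one runner here without knowing about each other, which is the composition property the section describes.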
5. One canonical parameter representation
Model parameters and updates are always an ordered list of numpy arrays
(state_dict_to_ndarrays / load_ndarrays_into / aggregate_ndarrays in
fltest/data/utils.py). Attacks and defenses operate on this list, so they are identical
across frameworks regardless of each framework's native format. The reference backend and
Flower both aggregate with the same FLTest weighted-mean, which is what makes their results
match within tolerance.
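For reference, a sample-weighted mean over lists of arrays can be sketched as below. This is the standard FedAvg-style formula, not the body of aggregate_ndarrays: the function name weighted_mean is hypothetical, and treating n in each (update, n) pair as a sample count is an assumption.

```python
import numpy as np


def weighted_mean(updates_and_weights):
    """Sample-weighted mean over [(list_of_ndarrays, n_samples), ...]."""
    total = sum(n for _, n in updates_and_weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    num_layers = len(updates_and_weights[0][0])
    # Average layer by layer so the output keeps the same ordered-list shape.
    return [
        sum(update[i] * (n / total) for update, n in updates_and_weights)
        for i in range(num_layers)
    ]


client_a = [np.array([1.0, 1.0]), np.array([[2.0]])]
client_b = [np.array([3.0, 5.0]), np.array([[6.0]])]
aggregated = weighted_mean([(client_a, 1), (client_b, 3)])
# aggregated[0] is [2.5, 4.0] and aggregated[1] is [[5.0]]
```

Since the function only sees plain arrays and weights, the same code applies no matter which framework produced the updates.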
RunResult
from dataclasses import dataclass

@dataclass
class RunResult:
    framework: str
    status: str               # "success" | "failed"
    final: dict               # last-round metrics, e.g. {"accuracy":.., "loss":..}
    history: dict[int, dict]  # round → metrics
    per_client: dict          # optional
    extras: dict              # e.g. DLG reconstruction details
    duration_seconds: float
final is what differential and metamorphic tests read.
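A differential check over two final dicts might look like the sketch below. The finals_agree helper, its default keys, and the tolerance are hypothetical, and the metric values are invented placeholders for illustration, not measured results.

```python
import math


def finals_agree(final_a: dict, final_b: dict, keys=("accuracy", "loss"), tol=1e-2) -> bool:
    """True if both runs report every key and the values differ by at most tol."""
    return all(
        k in final_a and k in final_b
        and math.isclose(final_a[k], final_b[k], abs_tol=tol, rel_tol=0.0)
        for k in keys
    )


# Placeholder values standing in for RunResult.final from three runs.
reference_final = {"accuracy": 0.912, "loss": 0.301}
flower_final = {"accuracy": 0.915, "loss": 0.298}
drifted_final = {"accuracy": 0.850, "loss": 0.420}

same_backend_pair = finals_agree(reference_final, flower_final)
drifted_pair = finals_agree(reference_final, drifted_final)
```

Here same_backend_pair is True and drifted_pair is False, since the first pair differs by 0.003 on each metric and the second exceeds the 0.01 tolerance.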