Pitfall checker

The pitfall checker inspects a config statically, either before a run or without running it at all. It flags the common FL-evaluation mistakes identified by the project and recommends a concrete counter-experiment for each. Code: fltest/pitfalls/.

fltest pitfalls <config.yaml>

Detectors

Id	Pitfall	Triggers when…	Severity
`P1_threat_models`	Inadequate threat models	no attacks, or only naive ones (`gaussian`/`label_flip`/`sign_flip`)	high / medium
`P2_dataset`	Dataset sensitivities	MNIST-only, or only class-balanced datasets	medium / low
`P3_iid_only`	IID-only evaluation	every `data_distribution` is `iid`	high
`P3_no_personalized`	No personalized metric	`per_client` not in `metrics`	medium
`P4_misconfig_dp`	Misconfigured DP	`gradient_noise` with `sigma=0` (no privacy) or a very large `sigma`	high / low
`P5_subtle_leakage`	Subtle privacy leakage	no privacy attack (`dlg`) included	medium
`P6_user_expertise`	Mismatched defense	only perturbation defenses against non-naive attacks	low
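
To make the "Triggers when…" column concrete, here is a minimal sketch of how two of these detectors could be written. It is not the code in fltest/pitfalls/checker.py: the `Finding` dataclass, the `as_list` helper, the function names, and the choice to treat a missing `data_distribution` as `iid` are all assumptions made for illustration. Only the ids, triggers, and severities come from the table above.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical stand-in for the checker's finding type.
    pitfall: str
    severity: str
    message: str


def as_list(value):
    """Treat a scalar config value as a one-element sweep."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def detect_iid_only(cfg: dict) -> list[Finding]:
    """P3_iid_only: every data_distribution entry is 'iid'."""
    # Assumption: a config with no data_distribution key defaults to IID.
    dists = as_list(cfg.get("data_distribution", "iid"))
    if all(d == "iid" for d in dists):
        return [Finding("P3_iid_only", "high", "Only IID is evaluated.")]
    return []


def detect_no_personalized(cfg: dict) -> list[Finding]:
    """P3_no_personalized: 'per_client' is missing from metrics."""
    if "per_client" not in as_list(cfg.get("metrics", [])):
        return [Finding("P3_no_personalized", "medium", "No per-client metric.")]
    return []


cfg = {"data_distribution": "iid", "metrics": ["accuracy"]}
findings = detect_iid_only(cfg) + detect_no_personalized(cfg)
for f in findings:
    print(f.severity, f.pitfall, f.message)
```

Each detector in this sketch is a pure function from a config mapping to a list of findings, which is one way to keep detectors independent of each other and easy to test in isolation.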

Recommendations

For each finding the recommender emits a copy-pasteable YAML fragment, ordered by severity. Example output:

[HIGH  ] IID-only data distribution  (P3_iid_only)
    Only IID is evaluated; ~50% of works do this even though IID is easiest to defend.
    → Sweep data_distribution over ['iid','dirichlet','pathological'].

# IID-only data distribution (high)
data_distribution: [iid, dirichlet, pathological]

Merge the fragment into the config and the pitfall clears on the next check. In the example above, the list form satisfies the fuzzer and also exercises non-IID distributions.
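
The merge-then-recheck loop can be sketched with plain dictionaries. This is an illustration under stated assumptions, not fltest code: `merge_fragment` and `is_iid_only` are hypothetical helpers, the `dataset` key is made up for the example, and the merge is a shallow top-level override, which may differ from how you combine YAML files in practice.

```python
def merge_fragment(cfg: dict, fragment: dict) -> dict:
    """Return a copy of cfg in which the fragment's top-level keys win."""
    merged = dict(cfg)
    merged.update(fragment)
    return merged


def is_iid_only(cfg: dict) -> bool:
    """Stand-in for the P3_iid_only trigger: every distribution is 'iid'."""
    value = cfg.get("data_distribution", "iid")
    dists = value if isinstance(value, list) else [value]
    return all(d == "iid" for d in dists)


# Config as it might look before the check (keys other than
# data_distribution are hypothetical).
before = {"dataset": "mnist", "data_distribution": "iid"}

# The recommended fragment, already parsed from YAML into a dict.
fragment = {"data_distribution": ["iid", "dirichlet", "pathological"]}

after = merge_fragment(before, fragment)

print("flagged before merge:", is_iid_only(before))
print("flagged after merge: ", is_iid_only(after))
```

The original mapping is left untouched, so the same base config can be merged with several fragments and each result checked separately.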

Programmatic use

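from fltest.core.config import load_config
from fltest.pitfalls import check_config, recommend

# Load the config and check it; the experiment itself is not run.
cfg = load_config("my_conf.yaml")
findings = check_config(cfg)

# Each finding exposes a severity, the pitfall it belongs to, and a message.
for f in findings:
    print(f.severity, f.pitfall, f.message)

# Each recommendation is a mapping with a title and a counter-experiment.
for r in recommend(findings):
    print(r["title"], r["counter_experiment"])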

Add or tune detectors in fltest/pitfalls/checker.py and their counter-experiments in fltest/pitfalls/recommender.py.
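
One possible use of the programmatic API is a CI gate that fails when a config has findings at or above a chosen severity. The sketch below is self-contained and does not import fltest: the `Finding` dataclass mimics only the three attributes used above, and it assumes severities are the lowercase strings `high`, `medium`, and `low`, which the source does not confirm. `gate` and `SEVERITY_RANK` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed severity labels, most severe first.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Finding:
    # Hypothetical stand-in exposing the attributes printed above.
    severity: str
    pitfall: str
    message: str


def gate(findings, fail_on="high"):
    """Return (exit_code, findings sorted most severe first)."""
    ordered = sorted(findings, key=lambda f: SEVERITY_RANK[f.severity])
    threshold = SEVERITY_RANK[fail_on]
    failed = any(SEVERITY_RANK[f.severity] <= threshold for f in ordered)
    return (1 if failed else 0), ordered


findings = [
    Finding("medium", "P3_no_personalized", "per_client not in metrics"),
    Finding("high", "P3_iid_only", "every data_distribution is iid"),
]
code, ordered = gate(findings)
for f in ordered:
    print(f"[{f.severity.upper():<6}] {f.message}  ({f.pitfall})")
print("exit code:", code)
```

In a real script, `findings` would come from `check_config(cfg)` and the returned code would be passed to `sys.exit`.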