Pitfall checker
The pitfall checker inspects a config statically (before running it, or without running it at all),
flags the common FL-evaluation mistakes identified by the project, and recommends concrete counter-experiments for each.
Code: fltest/pitfalls/.
fltest pitfalls <config.yaml>
Detectors
| Id | Pitfall | Triggers when… | Severity |
|---|---|---|---|
| P1_threat_models | Inadequate threat models | no attacks, or only naive ones (gaussian/label_flip/sign_flip) | high / medium |
| P2_dataset | Dataset sensitivities | MNIST-only, or only class-balanced datasets | medium / low |
| P3_iid_only | IID-only evaluation | every data_distribution is iid | high |
| P3_no_personalized | No personalized metric | per_client not in metrics | medium |
| P4_misconfig_dp | Misconfigured DP | gradient_noise with sigma=0 (no privacy) or very large | high / low |
| P5_subtle_leakage | Subtle privacy leakage | no privacy attack (dlg) included | medium |
| P6_user_expertise | Mismatched defense | only perturbation defenses against non-naive attacks | low |
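The trigger conditions in the table are simple predicates over the config. As a rough illustration only (this is not the code in fltest/pitfalls/checker.py), the sketch below re-implements the P1_threat_models and P4_misconfig_dp triggers over a plain dict. The key names `attacks`, `defenses`, `name` and `sigma`, the `LARGE_SIGMA` threshold, and the mapping of each condition to one of the two listed severities are all assumptions made for the example.

```python
# Illustrative only: config layout, LARGE_SIGMA and the severity mapping are assumptions.
NAIVE_ATTACKS = {"gaussian", "label_flip", "sign_flip"}  # naive attacks named in the table
LARGE_SIGMA = 10.0  # hypothetical cut-off for "very large" noise


def check_threat_models(cfg):
    """Return a severity string if P1_threat_models would trigger, else None."""
    attacks = [a["name"] for a in cfg.get("attacks", [])]
    if not attacks:
        return "high"    # assumed: no attacks at all is the worse case
    if set(attacks) <= NAIVE_ATTACKS:
        return "medium"  # assumed: only naive attacks are configured
    return None


def check_dp(cfg):
    """Return a severity string if P4_misconfig_dp would trigger, else None."""
    for d in cfg.get("defenses", []):
        if d.get("name") != "gradient_noise":
            continue
        sigma = d.get("sigma")
        if sigma is None:
            continue
        if sigma == 0:
            return "high"  # zero noise gives no privacy
        if sigma >= LARGE_SIGMA:
            return "low"   # assumed: excessive noise is the milder case
    return None


cfg = {
    "attacks": [{"name": "label_flip"}],
    "defenses": [{"name": "gradient_noise", "sigma": 0}],
}
print(check_threat_models(cfg), check_dp(cfg))  # medium high
```

Keeping each trigger as a pure function of the config is what allows the check to run before, or without, any training.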
Recommendations
For each finding the recommender emits a copy-pasteable YAML fragment, ordered by severity. Example output:
[HIGH ] IID-only data distribution (P3_iid_only)
Only IID is evaluated; ~50% of works do this even though IID is easiest to defend.
→ Sweep data_distribution over ['iid','dirichlet','pathological'].
# IID-only data distribution (high)
data_distribution: [iid, dirichlet, pathological]
Merge the fragment into your config and the pitfall clears on the next check. In this example, the list form satisfies the fuzzer and exercises non-IID distributions.
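To see why the list form clears the finding, the sketch below merges the fragment into a config held as a plain dict and re-evaluates the trigger condition from the detector table ("every data_distribution is iid"). `is_iid_only` and `merge_fragment` are hypothetical helpers written for this example, and both the shallow merge and the default for a missing key are assumptions; the project's own loader may combine fragments differently.

```python
def is_iid_only(cfg):
    """True when every configured data_distribution is 'iid'."""
    dist = cfg.get("data_distribution", "iid")  # assumed default when the key is absent
    values = dist if isinstance(dist, list) else [dist]
    return all(v == "iid" for v in values)


def merge_fragment(cfg, fragment):
    """Shallow merge: keys in the fragment override keys in the config."""
    return {**cfg, **fragment}


cfg = {"dataset": "mnist", "data_distribution": "iid"}
fragment = {"data_distribution": ["iid", "dirichlet", "pathological"]}

print(is_iid_only(cfg))                            # True
print(is_iid_only(merge_fragment(cfg, fragment)))  # False
```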
Programmatic use
from fltest.core.config import load_config
from fltest.pitfalls import check_config, recommend

cfg = load_config("my_conf.yaml")
findings = check_config(cfg)

# Each finding exposes severity, pitfall and message attributes.
for f in findings:
    print(f.severity, f.pitfall, f.message)

# Each recommendation is indexed by key.
for r in recommend(findings):
    print(r["title"], r["counter_experiment"])
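The findings list also lends itself to gating a CI job. The sketch below is one possible approach, not part of fltest: it sorts findings by severity and returns a non-zero exit code when any finding reaches a chosen threshold. It assumes `severity` is (or stringifies to) one of `low`, `medium` or `high`, as in the detector table; `rank`, `exit_code` and `SEVERITY_RANK` are hypothetical names, and `SimpleNamespace` objects stand in for real findings so the example runs without fltest installed.

```python
from types import SimpleNamespace

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}  # assumed severity labels


def rank(finding):
    """Map a finding's severity to a sortable integer."""
    return SEVERITY_RANK[str(finding.severity).strip().lower()]


def exit_code(findings, fail_on="high"):
    """Return 1 if any finding is at or above `fail_on`, else 0."""
    threshold = SEVERITY_RANK[fail_on]
    return 1 if any(rank(f) >= threshold for f in findings) else 0


# Stand-ins for the objects returned by check_config(cfg).
findings = [
    SimpleNamespace(severity="medium", pitfall="P5_subtle_leakage"),
    SimpleNamespace(severity="high", pitfall="P3_iid_only"),
]

# Most severe findings first.
for f in sorted(findings, key=rank, reverse=True):
    print(f.severity, f.pitfall)

print(exit_code(findings))  # 1, because a high-severity finding is present
```

In a real script, the return value could be passed to `sys.exit` after calling `check_config`.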
Add or tune detectors in fltest/pitfalls/checker.py and their counter-experiments in
fltest/pitfalls/recommender.py.