Demajh, Inc.

Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post‑Training — what it means for business leaders

BBoxER retrofits large language models with a privacy‑preserving, black‑box evolutionary layer that boosts accuracy while guaranteeing formal bounds on generalization, differential privacy, and robustness to poisoning — letting enterprises deploy LLMs in sensitive domains with confidence.

1. What the method is

BBoxER is a comparison‑based evolutionary algorithm that fine‑tunes a trained LLM without gradients. The sequence of queries and weight updates forms an implicit compression trace, acting as an information bottleneck that can be formally analysed.
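
To make the "compression trace" idea concrete, the sketch below (illustrative numbers and placeholder names, not drawn from the paper) counts how few bits are needed to describe an entire training run when each round records only which of k candidate tweaks won:

```python
import math

# Illustrative budget, not from the paper: 100 rounds, 4 candidate tweaks per round.
rounds, candidates = 100, 4
bits_per_round = math.log2(candidates + 1)   # winner index, or "no improvement"
trace_bits = rounds * bits_per_round
print(f"The entire training trace fits in about {trace_bits:.0f} bits.")  # ~232 bits
```

Because everything the optimiser extracts from the data must pass through those few hundred bits, the paper can state generalization and privacy bounds in terms of that description length.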

2. Why the method was developed

Standard gradient fine‑tuning exposes models to data leakage, overfitting, and backdoor risks. BBoxER was created to keep optimisation black‑box and comparison‑only, enabling provable generalization and (ε,δ) differential‑privacy guarantees without the accuracy loss of noisy DP‑SGD.

3. Who should care

Enterprises that post‑train LLMs on proprietary or regulated data (healthcare, finance, legal, and government settings), along with the compliance, security, and ML platform teams responsible for certifying that a fine‑tuned model will not leak its training data.

4. How the method works

BBoxER selects a small set of adapter parameters (e.g. rank‑1 output weights), then iteratively (i) samples candidate tweaks, (ii) scores them on a held‑out prompt set, and (iii) keeps only the best candidate — recording just its index, not raw gradients. The resulting comparison trace compresses training information, which the authors exploit to derive finite‑sample PAC‑Bayes and differential‑privacy bounds.
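
A minimal sketch of this loop is shown below. It assumes NumPy, uses placeholder names (`evolve_adapter`, `score_fn`) and hyperparameters that are not from the paper, and stands in for the authors' implementation rather than reproducing it:

```python
import numpy as np

def evolve_adapter(score_fn, dim, budget=500, pool=8, sigma=0.01, seed=0):
    """Comparison-based search over a small adapter vector (a sketch, not the paper's code).

    score_fn : black-box function mapping an adapter vector to held-out accuracy.
    budget   : total number of model evaluations allowed (the paper caps this at ~500).
    Only the index of each round's winner is recorded, never gradients or raw scores.
    """
    rng = np.random.default_rng(seed)
    best = np.zeros(dim)                  # start from the frozen base model
    best_score = score_fn(best)
    trace = []                            # the implicit compression trace
    evals = 1
    while evals + pool <= budget:
        candidates = [best + sigma * rng.standard_normal(dim) for _ in range(pool)]
        scores = [score_fn(c) for c in candidates]
        evals += pool
        winner = int(np.argmax(scores))
        if scores[winner] > best_score:   # keep only the best candidate
            best, best_score = candidates[winner], scores[winner]
            trace.append(winner)
        else:
            trace.append(-1)              # marker: no improvement this round
    return best, trace
```

In practice `score_fn` would load the candidate adapter into the model and measure accuracy on the held‑out prompt set; the point of the sketch is that the returned `trace` is the only data‑dependent record the procedure keeps.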

5. How it was evaluated

The team retrofitted Llama‑3‑8B and Qwen‑2‑3B models on GSM8K, Geometry3K, MATH and other reasoning suites. Budgets were capped at ≤500 model evaluations — two orders of magnitude below typical gradient fine‑tuning.

6. How it performed

Despite the low‑budget regime, BBoxER delivered accuracy gains of 3‑6 percentage points over base checkpoints and matched gradient‑tuned baselines on GSM8K, while its compression‑based analysis provided formal privacy guarantees and certified robustness to a 5 % poisoning attack. (Source: arXiv 2507.01752, 2025)
