Truthfulness guarantee

01 · No fabrication

The Penned workflow refuses to add anything you didn't write yourself. Skills you don't list, employers you never had, dates that don't line up — all rejected at the verification stage.

The verification pass runs after every artefact generation. Any fabricated claim sends the run back to the previous state for a regeneration; two consecutive failures abort the run and refund any payment.

02 · Two-strike failure

The orchestrator gives the model exactly two attempts at a truthful artefact. If the second attempt also fails verification, the run is marked failed and — if you paid — the payment is automatically refunded.
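
The two-attempt rule above can be pictured as a small loop. This is a minimal sketch only: `run_with_two_strikes`, `generate`, `verify`, and `refund` are hypothetical names chosen for illustration, not the orchestrator's real interfaces.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 2  # "exactly two attempts", per the paragraph above


def run_with_two_strikes(
    generate: Callable[[], str],
    verify: Callable[[str], bool],
    refund: Callable[[], None],
    paid: bool,
) -> Optional[str]:
    """Return the first artefact that passes verification, or None if the run failed."""
    for _attempt in range(MAX_ATTEMPTS):
        artefact = generate()
        if verify(artefact):
            return artefact
    # Both attempts failed verification: the run is failed, and refunded if paid.
    if paid:
        refund()
    return None
```

A `None` result stands in for "run marked failed"; how the real system records that state is not shown here.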

For the resume rewrite specifically, two-strike failure degrades softly rather than aborting the run. When both rewrite attempts fail validation (the cycle-31 + cycle-47c graceful fallback path), Penned ships the “How Penned read your resume” parser-view alongside coverage advice instead of failing the whole run. You see the truthful summary of what we extracted from your source resume plus an explanation of which JD requirements are not yet evidenced — surfaced on the application detail page (open any run from your dashboard). The truthfulness contract is preserved: nothing ungrounded is shown.

03 · Truth in analytics

We count tokens; we don't store prompts. The orchestrator's usage_events stream records cost + duration + the model used — never the resume bytes that fed the request.

br-042 workflow_pii_redacted_in_usage_events

usage_events rows written during an application run MUST NOT contain any substring drawn from the underlying resume or JD files. The orchestrator records {tokens_used, cost_cents, model_used, stage} only: never prompt or completion bytes, never quoted resume lines, never the JD's company/title free-text. Lifts the immutable rule pii_resume_content_never_in_analytics from narrative to testable at the orchestrator's persistence boundary.

Hypothesis property-test threshold (ch-009 tier 2, d-003 follow-up): the PII substring match is enforced at min_substring_length = 8 characters. Shorter substrings (<= 7 chars) are deliberately ignored because the random 4-6 char strings that Hypothesis generates from unicode text will frequently appear by chance in model_used names like "claude-3-5-sonnet" or stage strings like "requirements_extraction" without constituting a real PII leak. The 8-char threshold keeps the property test tight enough to catch any real quoted line while avoiding pathological shrinker noise.
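
To make the 8-character threshold concrete, here is a minimal sketch of a substring-leak check. The function name `leaked_substrings` and the sample row are illustrative assumptions; the real property test is written with Hypothesis and is not reproduced here.

```python
MIN_SUBSTRING_LENGTH = 8  # threshold stated in br-042


def leaked_substrings(row, source_text, min_len=MIN_SUBSTRING_LENGTH):
    """Return the min_len-character windows of source_text found in any row value.

    Checking fixed-size windows is enough: any shared substring of
    length >= min_len contains a shared window of exactly min_len.
    """
    values = [str(v) for v in row.values()]
    windows = {
        source_text[i : i + min_len]
        for i in range(len(source_text) - min_len + 1)
    }
    return sorted(w for w in windows if any(w in v for v in values))


# Hypothetical data, for illustration only.
resume = "Led migration of billing service to Kubernetes"
clean_row = {
    "tokens_used": 1834,
    "cost_cents": 12,
    "model_used": "claude-3-5-sonnet",
    "stage": "requirements_extraction",
}
leaky_row = {**clean_row, "stage": "Led migration of billing"}
```

With this sketch, `leaked_substrings(clean_row, resume)` is empty, while the leaky row reports the overlapping windows. A three-character overlap such as "Led" is ignored, which is the behaviour the threshold is meant to give.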

br-056 waitlist_no_pii_in_analytics

The waitlist signup email MUST NOT appear in any usage_events row, Sentry breadcrumb, or third-party analytics tool. The waitlist_signup_created usage_event carries metadata = {ref_code, position, source} only: no email, no IP.

Sentry's PII redactor allowlist extends to strip 'email' / 'waitlist_email' field values from event payloads originating from any module that handles the waitlist email and, defensively, from any other code path. The redaction is applied GLOBALLY (not module-scoped) to err on the side of more PII safety; this means legitimate non-PII uses of an `email` field name (e.g. an admin tool surfacing a bounce diagnostic for an internal address) MUST use a different key such as `recipient_email_domain` or rename to `email_address` so the redactor does not strip the value.

The same rule is enforced at the Sentry boundary via a `before_send` hook that walks `event.extra`, `event.contexts`, and `event.request` and applies `redact_pii` to any string-valued field whose key matches REDACT_FIELDS, so direct `sentry_sdk.capture_exception` paths cannot bypass the structlog redactor.

Inngest event payloads (which DO carry the email, since Inngest is first-party infrastructure, not analytics) are excluded from this rule, but they are logged to Inngest's dashboard with the email redacted. Carry-forward of the immutable rule pii_resume_content_never_in_analytics: extends the principle from resume content to waitlist email.
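
The `before_send` walk described in this rule might look roughly like the sketch below. `REDACT_FIELDS` and `redact_pii` are named in the rule, but their contents here are assumptions, as are the `_scrub` helper and the `[redacted]` placeholder. The hook follows the `before_send(event, hint)` shape that the Sentry Python SDK accepts through `sentry_sdk.init(before_send=...)`, with the event treated as a plain dict.

```python
REDACT_FIELDS = {"email", "waitlist_email"}  # keys named in br-056
REDACTED = "[redacted]"  # placeholder value is an assumption


def redact_pii(value: str) -> str:
    """Stand-in redactor: replaces the whole value (the real one may differ)."""
    return REDACTED


def _scrub(node):
    """Recursively redact string values whose key is in REDACT_FIELDS."""
    if isinstance(node, dict):
        return {
            key: redact_pii(val)
            if key in REDACT_FIELDS and isinstance(val, str)
            else _scrub(val)
            for key, val in node.items()
        }
    if isinstance(node, list):
        return [_scrub(item) for item in node]
    return node


def before_send(event, hint):
    """Scrub the three event sections listed in br-056, then pass the event on."""
    for section in ("extra", "contexts", "request"):
        if section in event:
            event[section] = _scrub(event[section])
    return event
```

Because the match is on the key name alone, a non-PII value stored under `email` is stripped too, which is why the rule asks such callers to use a different key.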

04 · Reporting a slip

If you spot a fabricated claim, send the application id + the disputed sentence to support@penned. A verified report grants a free re-run within seven days.

05 · How the validators stack

Truthfulness is not a single check; it is a stack of named, spec-pinned business rules that compose to cover every artefact the workflow emits. The rules below are the load-bearing ones. Each is enforced in code at a named state boundary; drift between this page and the spec fails a unit test.

br-097 resume_yaml_load_bearing_tokens_supported_by_source_span

Cycle-32 ch-046: for every ``add`` and ``modify`` Operation emitted by state 6, the load-bearing tokens extracted from ``op["after"]`` MUST be ⊆ the union of tokens extractable from the SOURCE SPANS of every YAML path in ``op["from"]``. Replaces cycle-31's ``op_load_bearing_tokens_in_source`` (br-069 over op.after against the cited bullet text) with a stricter, structural form. The cited path's ``source_span`` (start_line, end_line into the original resume text) is the load-bearing anchor: tokens appearing in the proposed text MUST appear in the source-span body. Numbers and dates may fall back to the full raw resume text (cycle-16 hotfix carryover) so a year mentioned only in a header line still validates.

The pre-condition guard is ``yaml_leaf_source_span_resolves`` (also added by cycle-32): every TextLeaf / Bullet's ``source_span`` MUST be a valid line range in the raw resume text AND raw_text[span.start_line] (lower-cased) MUST contain the leaf value (lower-cased). Hard rejection on miss with ``RESUME_YAML_SCHEMA_VIOLATION``. A defense-in-depth helper, ``yaml_load_bearing_tokens_subset_of_value``, verifies every Bullet's load_bearing_tokens are a subset of tokens extractable from the bullet's own value; this catches LLM enrichment hallucinations where a token is invented that doesn't appear in the bullet text.

First failing op drives ``failure_reason`` (``RESUME_YAML_OP_TOKEN_UNSUPPORTED``); cycle-30 semantic-retry framework gets one retry per ``RESUME_REWRITE_MAX_SEMANTIC_RETRIES`` budget unit.

PII posture: validator failure messages reference YAML PATHS + token CLASSES only, never the resume bullet text or the proposed op.after content. Telemetry-safe (PRD §9 carryover from cycle-31).
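
The subset check at the heart of this rule can be sketched as follows. Everything in the block is an assumption made for illustration: the tokenizer, the stoplist used to decide what counts as "load-bearing", the 0-indexed inclusive `(start_line, end_line)` spans, and the name `unsupported_tokens`. Only the digit fallback to the full resume text is taken from the rule; the date fallback is left out to keep the sketch short.

```python
import re

# Hypothetical stoplist: tokens in it are treated as not load-bearing.
STOPWORDS = frozenset(
    {"a", "an", "and", "for", "in", "of", "on", "since", "the", "to", "with"}
)


def tokenize(text):
    """Lower-case alphanumeric tokens (a deliberately simple stand-in)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unsupported_tokens(after, raw_lines, spans):
    """Return tokens of `after` that the cited source spans do not support."""
    span_text = " ".join(
        line for start, end in spans for line in raw_lines[start : end + 1]
    )
    in_spans = tokenize(span_text)
    in_full_text = tokenize(" ".join(raw_lines))
    missing = set()
    for tok in tokenize(after) - STOPWORDS:
        if tok in in_spans:
            continue
        # Numbers may fall back to the full raw resume text.
        if tok.isdigit() and tok in in_full_text:
            continue
        missing.add(tok)
    return missing


# Hypothetical resume text, for illustration only.
raw_lines = [
    "Acme Corp, 2019-2023",
    "Built Python ETL pipelines for billing",
    "Mentored two junior engineers",
]
```

Against the cited span `(1, 1)`, a proposed bullet "Built Python ETL pipelines on Kubernetes since 2019" leaves only `kubernetes` unsupported: the year validates through the header line, while the invented skill does not.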

br-099 cover_letter_paragraph_grounded

Cycle-35 ch-049: every paragraph emitted by state 7 (cover_letter_draft) MUST carry a ``grounded_in`` field whose value is a list of YAML path strings (possibly empty). When non-empty, every entry MUST resolve to an existing node in the application's pre-extracted ResumeYAML. The paragraph's claims (employer / role / quantified achievement / named project / etc.) MUST be derivable from the cited bullets / skills / roles; paragraphs that can't be grounded in real anchors are rejected. Empty ``grounded_in: []`` IS valid (signals "no resume claim made; pure JD/role context paragraph"). Soft-close paragraphs ("excited to discuss further") have no resume claim to cite; forcing a citation there causes fabrication. Differs from br-098 (interview_prep) only on the empty-list policy: STARs MUST cite at least one path; cover letter paragraphs MAY have an empty grounded_in list.

Closes the last fabrication-risk gap in the LLM pipeline. Cycle-31 + cycle-32 closed it for state 6 (resume_rewrite); cycle-34 extended to state 9 (interview_prep); cycle-35 closes state 7 (cover_letter_draft) using the SAME path resolver (``api/src/workflows/llm/resume_yaml_schema.py::resolve_path``) and the SAME path-shape contract as br-096 / br-098.

Validator surface: ``cover_letter_paragraph_grounded``. For each paragraph, every non-empty entry in ``grounded_in`` MUST resolve. Append-marker paths (``[+]``) are NOT valid here: paragraphs cite EXISTING anchors, never write sites. Malformed paths reject with the same reason. Path syntax (Q-A locked at cycle-32 plan time, reused unchanged at cycle-34 + cycle-35): bracketed list indices ``experience[0].bullets[2]`` / ``skills[3]`` / ``education[1].institution``.

State 8 (cover_letter_critique) PRESERVES grounded_in across the critique transformation but does NOT re-validate against ResumeYAML; state 7 is the structural enforcement boundary (PRD §6 Q4). br-049's substring guard at state 8 still fires on paragraph TEXT independent of grounded_in.

First failing paragraph + path drives ``failure_reason`` (``COVER_LETTER_DRAFT_PARAGRAPH_UNGROUNDED``); cycle-30 / cycle-32 / cycle-33 / cycle-34 semantic-retry framework gets up to ``COVER_LETTER_DRAFT_MAX_SEMANTIC_RETRIES`` (default 2) retries before failing the state. Truthfulness invariant takes precedence over completion (consistent with cycle-30's stance for state 6 and cycle-34's stance for state 9).

PII posture: validator failure messages reference PARAGRAPH INDEX + YAML PATH only, never paragraph text content. Path strings come from the LLM, not from the user's resume. Telemetry-safe (carries forward the cycle-31 / cycle-32 / cycle-34 PII contract).
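
As an illustration of the path-resolution check, the sketch below parses the bracketed path syntax quoted above and reports the first paragraph index and path that fail to resolve. `resolve_path_sketch` is a hypothetical stand-in, not the real ``resolve_path`` in ``resume_yaml_schema.py``, and the plain dict/list shape assumed for ResumeYAML is an assumption as well.

```python
import re

# One dotted segment: a key followed by zero or more [N] list indices.
_SEGMENT = re.compile(r"([A-Za-z_]\w*)((?:\[\d+\])*)")


def resolve_path_sketch(doc, path):
    """Return the node at `path`; raise on a missing node or malformed path."""
    node = doc
    for part in path.split("."):
        match = _SEGMENT.fullmatch(part)
        if match is None:  # also rejects append markers such as bullets[+]
            raise ValueError(f"malformed path segment: {part!r}")
        node = node[match.group(1)]
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            node = node[int(index)]
    return node


def first_ungrounded(paragraphs, resume_yaml):
    """Return (paragraph_index, path) for the first unresolved citation, else None."""
    for p_idx, paragraph in enumerate(paragraphs):
        for path in paragraph["grounded_in"]:  # an empty list is valid
            try:
                resolve_path_sketch(resume_yaml, path)
            except (KeyError, IndexError, TypeError, ValueError):
                return (p_idx, path)
    return None
```

The return value carries only a paragraph index and a path string, matching the PII posture above: no paragraph text is needed to report the failure.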

br-105 verdict_synthesis_deterministic_band

The state-4 verdict_synthesis MUST compute the verdict via the pure function compute_verdict(fit_score, critical_count, confidence). The LLM call MAY return reasoning text only; the verdict, fit_score, and confidence persisted on the applications row are sourced from the precomputed inputs and MUST NOT be overridden by the LLM JSON. Discrepancy between the rule's emitted verdict and any decision the LLM's reasoning seems to argue for MUST NOT influence the persisted verdict: confidence-band rules are code, not LLM judgement.

Calibration band (lifted verbatim from the cycle-37 prompt table; tightened with a strict transition at critical_count >= 2 in the 60-100 fit band):

fit_score < 40 → DONT_APPLY (any conf)
fit_score 40-59:
    critical_count == 0 AND conf in {high, medium} → PROCEED
    else → DONT_APPLY
fit_score 60-100:
    conf == high:
        critical_count >= 2 → DONT_APPLY
        else → PROCEED
    conf == medium:
        critical_count >= 2 → DONT_APPLY
        else → PROCEED
    conf == low:
        critical_count >= 1 → DONT_APPLY
        else → PROCEED

The CONFIDENCE RUBRIC is similarly lifted to a pure function compute_confidence() reading upstream signals (state-2 STRONG/PARTIAL/GAP/EXCEEDS counts, state-3 critical/moderate/minor counts). The LLM no longer emits confidence either.

Rationale: the 2026-05-07 quality survey of 17 prod apps showed 5/16 PROCEED apps (31%) violated the system's own stated calibration band, with the same archetype every time: fit ∈ [68,74] + critical=2 + confidence=medium → band says DONT_APPLY → LLM emitted PROCEED. Receipts: 388166e3, 2b983e3d, f5c03caf, d2d75f55, 13267f78. App 2b983e3d's reasoning literally said "Calibration band 60-100 with 1-2 critical challenges supports PROCEED at medium confidence"; the band actually transitions hard at critical_count >= 2, and the LLM smoothed the discontinuity.

The verdict_band_miss_total{direction=PROCEED|DONT_APPLY} counter is emitted from verdict_synthesis.run() whenever the precomputed verdict diverges from any verdict the LLM's reasoning text seems to argue for (best-effort detection; regression signal, not hard assertion). cycle-85 wires alerting; cycle-81 just emits. Counter value MUST be 0 by construction after this rule lands.

Truthfulness: this is an invariant TIGHTENING. Pre-cycle-81 the system was emitting verdicts inconsistent with its own stated rules. After cycle-81 the rules and the emitted verdict are guaranteed identical.
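
The calibration band stated in this rule translates directly into a pure function. The signature `compute_verdict(fit_score, critical_count, confidence)` comes from the rule; the body below is one possible reading of the band, not the shipped implementation, and the string values used for verdicts and confidence levels are assumptions.

```python
def compute_verdict(fit_score: int, critical_count: int, confidence: str) -> str:
    """Map precomputed inputs to PROCEED / DONT_APPLY with no LLM involvement."""
    if confidence not in {"high", "medium", "low"}:
        raise ValueError(f"unknown confidence: {confidence!r}")

    if fit_score < 40:
        return "DONT_APPLY"  # any confidence

    if fit_score < 60:  # the 40-59 band
        clean = critical_count == 0 and confidence in {"high", "medium"}
        return "PROCEED" if clean else "DONT_APPLY"

    # The 60-100 band: hard transition at 2 criticals (1 when confidence is low).
    limit = 1 if confidence == "low" else 2
    return "DONT_APPLY" if critical_count >= limit else "PROCEED"
```

The archetype from the survey (fit around 70, two criticals, medium confidence) evaluates to `DONT_APPLY` here, which is the outcome the band prescribes and the LLM had been smoothing over.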

br-108 outreach_template_truthfulness_validator

Cycle-86 ch-086: every body string in ``final_package.linkedin_outreach[*].body`` MUST pass two validators before persistence. Failure replaces the offending phrase with a generic non-claim phrase.

TRUTHFULNESS TIGHTENING. Adds a validator gate on a previously-ungated rendering surface. Surfaced by post-cycle-85 field testing on prod app ``0bd68b7f-…`` (ServiceNow AI Foundry, 2026-05-09): the hiring_manager outreach template emitted "I bring 3+ years of hands-on technical consulting and software development experience" for a candidate whose resume shows 20+ years. The LLM mirrored the JD's "3-5 years" requirement and produced a numerically wrong claim.

Validator 1, token-level grounded check (mirrors br-099): For each token in ``body`` that is NOT in the connective stoplist (br-109), the token MUST appear in either ``flatten(resume_yaml)`` OR ``state-1.requirements_text`` OR ``state-1.job_meta.{title,company}``. Failures trigger Remediation A (single-shot LLM rewrite with the failing phrase replaced by a grounded paraphrase).

Validator 2, numerical-tenure check (NEW): For every match of the regex ``\b(\d+)\+?\s+(?:years?|yrs?)\b`` in ``body``:
- ``actual_tenure_years`` MUST be derivable from ``resume_yaml.experience[*].dates`` (sum distinct year ranges, dedup by company; fallback to None if dates not parseable).
- REJECT if claimed N satisfies ``N < 0.5 * actual`` OR ``N > 1.10 * actual``.
- ACCEPT if ``actual_tenure_years`` is None (cannot verify; do not invent a constraint).
- ACCEPT if no numerical tenure claim in body (most common case).

Failures trigger Remediation B (regex-replace the matched ``\b\d+\+?\s+years?\b`` phrase with the static safe phrase "with relevant hands-on experience"; preserve surrounding sentence).

Remediation order: Validator 1 first; if it triggers a single-shot LLM rewrite, Validator 2 runs on the rewritten body. Validator 2 NEVER calls the LLM (deterministic regex replacement); this guarantees no cascade of LLM remediations and bounds latency to one extra LLM call worst case.

Telemetry: ``final_package.outreach_validation`` structured log emits ``{event, application_id, token_validator_fired, numerical_validator_fired, replacements_count, tenure_years_max_actual}``. Cycle-87 ch-087 renamed ``tenure_years_max_actual`` from ``claimed_tenure_years_max`` per cycle-86 audit f-002: the field logs the candidate's ACTUAL tenure (max across roles), not a CLAIMED one. PII-safe per br-103: structural counts + numeric claims only; NO body content, NO resume content.
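
Validator 2 and Remediation B are deterministic, so they are easy to sketch. The function name `check_tenure_claims` is hypothetical, and `actual_tenure_years` is taken as an input rather than derived from resume dates. For simplicity the sketch uses the Validator 2 pattern for both detection and replacement, whereas the rule quotes a slightly narrower pattern for the replacement step.

```python
import re
from typing import Optional, Tuple

TENURE_RE = re.compile(r"\b(\d+)\+?\s+(?:years?|yrs?)\b")
SAFE_PHRASE = "with relevant hands-on experience"  # static phrase from br-108


def check_tenure_claims(
    body: str, actual_tenure_years: Optional[float]
) -> Tuple[str, bool]:
    """Return (possibly rewritten body, whether the validator fired)."""
    if actual_tenure_years is None:
        return body, False  # cannot verify; do not invent a constraint

    fired = False

    def _replace(match: "re.Match[str]") -> str:
        nonlocal fired
        claimed = int(match.group(1))
        too_low = claimed < 0.5 * actual_tenure_years
        too_high = claimed > 1.10 * actual_tenure_years
        if too_low or too_high:
            fired = True
            return SAFE_PHRASE
        return match.group(0)  # claim is within tolerance; keep it

    return TENURE_RE.sub(_replace, body), fired
```

For the field-test case, a "3+ years" claim checked against an actual tenure of 20 is rejected because 3 is below half of 20, and the phrase is swapped for the static safe phrase without any LLM call.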

br-053 proceed_completed_has_full_output_package

A PROCEED application that reaches status='completed' MUST have exactly 7 application_outputs rows, one per kind: ats_scan, cover_letter, final_package, improvement_plan, interview_prep, resume_changes, tailored_resume. Missing any one means state 12 did not run to completion for that artifact; the orchestrator MUST NOT mark the application completed until all 7 are persisted. Enforced by: (a) DB trigger on applications.status transition (migration 0027 widens the cycle-31 0021 trigger from 6 → 7 expected kinds), (b) Python guard in the orchestrator's terminal-transition step (assert_seven_outputs_persisted). DONT_APPLY apps are exempt — br-045 already requires zero outputs there. History: cycle-9 ch-009 introduced the original 6-kind set (with tailored_resume). Cycle-31 ch-045 (migrations 0020+0021) replaced tailored_resume with resume_changes (the cited diff). Cycle-81 ch-081 (migration 0027) re-introduces tailored_resume as the rendered DOCX, keeping resume_changes as the diff narrative — both are part of a complete ready-to-submit package.
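
The Python side of this guard could be as small as the sketch below. The name `assert_seven_outputs_persisted` and the seven kinds come from the rule; the signature (a list of persisted `kind` values) and the helper `output_package_problems` are assumptions, and the DB trigger is not shown.

```python
from collections import Counter

EXPECTED_KINDS = frozenset(
    {
        "ats_scan",
        "cover_letter",
        "final_package",
        "improvement_plan",
        "interview_prep",
        "resume_changes",
        "tailored_resume",
    }
)


def output_package_problems(kinds):
    """Return sorted lists of missing, duplicated, and unexpected output kinds."""
    counts = Counter(kinds)
    present = set(counts)
    return {
        "missing": sorted(EXPECTED_KINDS - present),
        "duplicated": sorted(k for k, n in counts.items() if n > 1),
        "unexpected": sorted(present - EXPECTED_KINDS),
    }


def assert_seven_outputs_persisted(kinds):
    """Raise unless there is exactly one row for each of the seven kinds."""
    problems = output_package_problems(kinds)
    if any(problems.values()):
        raise AssertionError(f"incomplete output package: {problems}")
```

Raising explicitly, rather than using a bare `assert` statement, keeps the guard active even when Python runs with assertions disabled.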

06 · What we tightened

Truthfulness is monotonic at Penned: rules can be added or tightened, but never relaxed. Cycles 81 (the deterministic verdict band) and 86 (the outreach truthfulness validator) each tightened the truthfulness contract; no cycle has ever amended or relaxed it. As of cert v1.0.54: 2 tightenings, 0 amendments cumulative across the cycle 81-89 epic.