Mirror of https://github.com/speed47/spectre-meltdown-checker.git (synced 2026-04-22 16:43:20 +02:00)

Commit: reconsider prior backlog each run + recognize CVEs from context
@@ -28,10 +28,7 @@ subsystems.
 {
   "scan_date": "2026-04-18T14:24:43+00:00",
   "window_cutoff": "2026-04-17T13:24:43+00:00",
-  "per_source": {
-    "phoronix": {"status": 200, "new": 2, "total_in_feed": 75},
-    "oss-sec": {"status": 304, "new": 0}
-  },
+  "per_source": { "phoronix": {"status": 200, "new": 2, "total_in_feed": 75} },
   "items": [
     {
       "source": "phoronix",
@@ -44,13 +41,27 @@ subsystems.
       "vendor_ids": [],
       "snippet": "first 400 chars of description, tags stripped"
     }
+  ],
+  "reconsider": [
+    {
+      "canonical_id": "INTEL-SA-00145",
+      "current_bucket": "toimplement",
+      "title": "Lazy FP State Restore",
+      "sources": ["intel-psirt"],
+      "urls": ["https://www.intel.com/.../intel-sa-00145.html"],
+      "extracted_cves": [],
+      "first_seen": "2026-04-19T09:41:44+00:00"
+    }
   ]
 }
 ```
 
-`items` is already: (a) within the time window, (b) not known to prior
-state under any of its alt-IDs. If `items` is empty, your only job is to
-write the three stub output files with `(no new items in this window)`.
+- `items` are fresh observations from today's fetch: already inside the
+  time window and not yet present in state under any alt-ID.
+- `reconsider` holds existing `toimplement`/`tocheck` entries from state,
+  submitted for re-review each run (see the "Reconsideration" section
+  below). On days where both arrays are empty, write stub output files
+  with `(no new items in this window)`.
 
 - `./checker/` is a checkout of the **`test`** branch of this repo (the
   development branch where coded-but-unreleased CVE checks live). This is
@@ -82,6 +93,30 @@ in `tocheck`.
 follow-ups per run total**. Do not use it for items you already plan to file
 as `unrelated` or `toimplement`.
 
+## Reconsideration rules (for `reconsider` entries)
+
+Each `reconsider` entry is an item *already* in state under `current_bucket`
+= `toimplement` or `tocheck`, from a prior run. Re-examine it against the
+**current** `./checker/` tree and current knowledge. You may:
+
+- **Demote** `toimplement` → `tocheck` or `unrelated` if the checker now
+  covers the CVE/codename (grep confirms), or if reinterpreting the
+  advisory shows it's out of scope.
+- **Demote** `tocheck` → `unrelated` if new context settles the ambiguity
+  as out-of-scope.
+- **Promote** `tocheck` → `toimplement` if you now have firm evidence it's
+  a real, in-scope, not-yet-covered CVE.
+- **Leave it unchanged** (same bucket) — emit a record anyway; it's cheap
+  and documents that the reconsideration happened today.
+- **Reassign the canonical ID** — if a CVE has since been assigned to a
+  vendor advisory (e.g., an INTEL-SA that previously had no CVE), put the
+  CVE in `extracted_cves` and use it as the new `canonical_id`. The merge
+  step will rekey the record under the CVE and keep the old ID as an alias.
+
+For every reconsider record you emit, set `"reconsider": true` in its
+classification entry — this tells the merge step to **overwrite** the
+stored bucket (including demotions), not just promote.
+
 ## Outputs
 
 Compute `TODAY` = the `YYYY-MM-DD` prefix of `scan_date`. Write three files at
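The bucket moves these reconsideration rules permit can be sketched as a small lookup. This is a hypothetical helper for illustration, not code from the commit; the only real constraint it encodes is that `unrelated` entries are settled and never re-reviewed:

```python
# Hypothetical sketch of the reconsideration transitions described above.
# Keeping the same bucket, demoting, and promoting are all allowed; there
# is no entry for "unrelated" because settled items are never re-reviewed.
ALLOWED = {
    "toimplement": {"toimplement", "tocheck", "unrelated"},  # keep or demote
    "tocheck": {"tocheck", "unrelated", "toimplement"},      # keep, demote, or promote
}


def valid_transition(current_bucket: str, new_bucket: str) -> bool:
    """True if a reconsider record may move current_bucket -> new_bucket."""
    return new_bucket in ALLOWED.get(current_bucket, set())
```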
@@ -91,6 +126,11 @@ the repo root, overwriting if present:
 - `watch_${TODAY}_tocheck.md`
 - `watch_${TODAY}_unrelated.md`
 
+These delta files cover the **`items`** array only — they answer "what
+did today's fetch surface". Reconsider decisions update state (and surface
+in the `current_*.md` snapshots the merge step rewrites); don't duplicate
+them here.
+
 Each file uses level-2 headers per source short-name, then one bullet per
 item: the stable ID, the permalink, and 1–2 sentences of context.
 
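The `${TODAY}` naming rule above is just a string slice of the ISO timestamp; a minimal sketch, using the example `scan_date` from this commit's payload:

```python
# Derive TODAY (the YYYY-MM-DD prefix) from the ISO scan_date and build
# the three delta-report filenames described above.
scan_date = "2026-04-18T14:24:43+00:00"  # example value from the payload
today = scan_date[:10]                   # "2026-04-18"
files = [f"watch_{today}_{bucket}.md"
         for bucket in ("toimplement", "tocheck", "unrelated")]
```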
@@ -112,6 +152,9 @@ otherwise empty):
 - per-source counts (from per_source): ...
 - fetch failures (status != 200/304): ...
 - total classified this run: toimplement=<n>, tocheck=<n>, unrelated=<n>
+- reconsidered: <n> entries re-reviewed; <list any bucket transitions, e.g.
+  "CVE-2018-3665: toimplement -> tocheck (now covered at src/vulns/...)">,
+  or "no transitions" if every reconsider kept its existing bucket.
 ```
 
 ## `classifications.json` — required side-channel for the merge step
@@ -134,14 +177,27 @@ record per item in `new_items.json.items`:
 
 Rules:
 
-- One record per input item. Same `stable_id` as in `new_items.json`.
+- One record per input item (`items` + `reconsider`). For items, use the
+  same `stable_id` as in `new_items.json`. For reconsider entries, use the
+  entry's `canonical_id` from state as the record's `stable_id`.
 - `canonical_id`: prefer the first `extracted_cves` entry if any; otherwise
   the item's `stable_id`. **Use the same `canonical_id` for multiple items
   that are really the same CVE from different sources** — the merge step
   will collapse them into one entry and add alias rows automatically.
+- **Populate `extracted_cves` / `canonical_id` from context when the feed
+  didn't.** If the title, body, or a well-known transient-execution codename
+  mapping lets you identify a CVE the feed didn't emit (e.g., "Lazy FP
+  State Restore" → `CVE-2018-3665`, "LazyFP" → same, "FP-DSS" → whatever
+  CVE AMD/Intel assigned), put the CVE in `extracted_cves` and use it as
+  `canonical_id`. This prevents Intel's CVE-less listing entries from
+  creating orphan `INTEL-SA-NNNNN` records in the backlog.
 - `sources` / `urls`: arrays; default to the item's own single source and
   permalink if you didn't enrich further.
-- If `new_items.json.items` is empty, write `[]`.
+- **`reconsider: true`** — set on every record that corresponds to an
+  input from the `reconsider` array. The merge step uses this flag to
+  overwrite the stored bucket instead of merging by "strongest wins" —
+  this is what enables demotions.
+- If both `items` and `reconsider` are empty, write `[]`.
 
 ## Guardrails
 
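The `canonical_id` preference rule above can be written out as a one-line helper. This is an illustrative sketch that mirrors the stated rule, not the repo's actual `_canonical` implementation:

```python
def pick_canonical_id(extracted_cves: list[str], stable_id: str) -> str:
    # Prefer the first extracted CVE if any; otherwise fall back to the
    # item's own stable_id (which is how CVE-less vendor entries end up
    # keyed under e.g. an INTEL-SA identifier).
    return extracted_cves[0] if extracted_cves else stable_id
```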
@@ -362,6 +362,46 @@ def _resolve_window_hours() -> float:
     return float(DEFAULT_WINDOW_HOURS)
 
 
+def backlog_to_reconsider(data: dict[str, Any]) -> list[dict[str, Any]]:
+    """Walk state.seen and emit toimplement/tocheck entries for re-review.
+
+    Each entry carries enough context that Claude can re-grep ./checker/
+    and decide whether the prior classification still holds. Items in
+    `unrelated` are skipped — those are settled.
+
+    A CVE alias pointing at this canonical is included in `extracted_cves`
+    so Claude sees every known CVE for the item without having to consult
+    the full alias map.
+    """
+    seen = data.get("seen", {})
+    aliases = data.get("aliases", {})
+    # Reverse-index aliases: canonical -> [alt, ...]
+    by_canonical: dict[str, list[str]] = {}
+    for alt, canon in aliases.items():
+        by_canonical.setdefault(canon, []).append(alt)
+
+    out: list[dict[str, Any]] = []
+    for canonical, rec in seen.items():
+        if rec.get("bucket") not in ("toimplement", "tocheck"):
+            continue
+        cves: list[str] = []
+        if canonical.startswith("CVE-"):
+            cves.append(canonical)
+        for alt in by_canonical.get(canonical, []):
+            if alt.startswith("CVE-") and alt not in cves:
+                cves.append(alt)
+        out.append({
+            "canonical_id": canonical,
+            "current_bucket": rec.get("bucket"),
+            "title": rec.get("title") or "",
+            "sources": list(rec.get("sources") or []),
+            "urls": list(rec.get("urls") or []),
+            "extracted_cves": cves,
+            "first_seen": rec.get("first_seen"),
+        })
+    return out
+
+
 def candidate_ids(item: dict[str, Any]) -> list[str]:
     """All identifiers under which this item might already be known."""
     seen: set[str] = set()
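The alias reverse-index step inside `backlog_to_reconsider` (inverting alt → canonical into canonical → list of alts) can be exercised standalone; the alias values here are example data, not real state:

```python
# Invert an alias map (alt -> canonical) into canonical -> [alt, ...],
# so every CVE alias of a backlog entry can be collected in one lookup.
aliases = {
    "INTEL-SA-00145": "CVE-2018-3665",  # example data, not real state
    "LazyFP": "CVE-2018-3665",
}
by_canonical: dict[str, list[str]] = {}
for alt, canon in aliases.items():
    by_canonical.setdefault(canon, []).append(alt)
```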
@@ -451,19 +491,25 @@ def main() -> int:
     # Persist updated HTTP cache metadata regardless of whether Claude runs.
     state.save(data)
 
+    reconsider = backlog_to_reconsider(data)
+
     out = {
         "scan_date": scan_date_iso,
         "window_cutoff": cutoff.isoformat(),
         "per_source": per_source,
         "items": all_new,
+        "reconsider": reconsider,
     }
     args.output.write_text(json.dumps(out, indent=2, sort_keys=True) + "\n")
 
-    # GitHub Actions step outputs
+    # GitHub Actions step outputs. Downstream `if:` conditions gate the
+    # classify step on `new_count || reconsider_count`; both must be 0
+    # for Claude to be skipped.
     gh_out = os.environ.get("GITHUB_OUTPUT")
     if gh_out:
         with open(gh_out, "a") as f:
             f.write(f"new_count={len(all_new)}\n")
+            f.write(f"reconsider_count={len(reconsider)}\n")
     failures = [
         s for s, v in per_source.items()
         if not (isinstance(v["status"], int) and v["status"] in (200, 304))
@@ -474,6 +520,7 @@ def main() -> int:
     print(f"Window: {window_hours:g} h")
     print(f"Cutoff: {cutoff.isoformat()}")
     print(f"New items: {len(all_new)}")
+    print(f"Reconsider: {len(reconsider)} existing toimplement/tocheck entries")
     for s, v in per_source.items():
         print(f"  {s:14s} status={str(v['status']):>16} new={v['new']}")
 
@@ -14,11 +14,22 @@ Each classification record has shape:
   "bucket": "toimplement|tocheck|unrelated",
   "extracted_cves": ["...", ...],  # optional
   "sources": ["...", ...],         # optional
-  "urls": ["...", ...]             # optional
+  "urls": ["...", ...],            # optional
+  "reconsider": true               # optional; set by Claude for reconsidered
+                                   # backlog entries — merge overwrites
+                                   # the stored bucket (incl. demotions)
+                                   # instead of promoting
 }
 
 Behavior:
-- Upsert seen[canonical_id], union sources/urls, promote bucket strength.
+- For records WITHOUT `reconsider: true` (fresh items):
+  upsert seen[canonical_id], union sources/urls, promote bucket strength.
+- For records WITH `reconsider: true` (previously-classified entries):
+  overwrite the stored bucket unconditionally (permits demotions), union
+  sources/urls. If Claude's canonical_id differs from the stable_id (the
+  previous canonical), rekey the seen entry under the new ID and leave
+  the old as an alias — used when a CVE has since been assigned to what
+  was previously a bare vendor-ID entry.
 - For every alt_id in (stable_id, vendor_ids, extracted_cves) that differs
   from canonical_id, set aliases[alt_id] = canonical_id.
 - Update last_run to SCAN_DATE.
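The two merge modes described above (promote-only for fresh items, unconditional overwrite for reconsider records) can be condensed into one function. A sketch under assumptions: the strength ordering below is implied by "promote bucket strength" but not spelled out in this diff, and the real promotion logic lives in the repo's `state.promote_bucket`:

```python
# Assumed strength ordering: unrelated < tocheck < toimplement.
STRENGTH = {"unrelated": 0, "tocheck": 1, "toimplement": 2}


def merged_bucket(stored: str, incoming: str, reconsider: bool) -> str:
    if reconsider:
        # Reconsider records overwrite unconditionally, so demotions stick.
        return incoming
    # Fresh items only ever promote: the stronger bucket wins.
    return incoming if STRENGTH[incoming] > STRENGTH[stored] else stored
```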
@@ -92,13 +103,24 @@ def merge(
     scan_date: str,
 ) -> None:
     for rec in classifications:
-        stable_id = rec.get("stable_id")
-        if not stable_id:
+        if not rec.get("stable_id"):
             continue
+        if rec.get("reconsider"):
+            _apply_reconsider(data, rec, scan_date)
+        else:
+            _apply_new_item(data, rec, new_items_by_stable_id, scan_date)
+
+
+def _apply_new_item(
+    data: dict[str, Any],
+    rec: dict[str, Any],
+    new_items_by_stable_id: dict[str, dict[str, Any]],
+    scan_date: str,
+) -> None:
+    stable_id = rec["stable_id"]
     meta = new_items_by_stable_id.get(stable_id, {})
     canonical = _canonical(rec, meta)
     bucket = rec.get("bucket", "unrelated")
 
     title = (meta.get("title") or "").strip()
 
     existing = data["seen"].get(canonical)
@@ -120,12 +142,80 @@ def merge(
     existing["sources"] = _unique(list(existing.get("sources") or []) + list(rec.get("sources") or []) + ([meta.get("source")] if meta.get("source") else []))
     existing["urls"] = _unique(list(existing.get("urls") or []) + list(rec.get("urls") or []) + ([meta.get("permalink")] if meta.get("permalink") else []))
 
-    # Aliases: every alt id that is not the canonical key points at it.
     for alt in _alt_ids(rec, meta):
         if alt != canonical:
             data["aliases"][alt] = canonical
 
 
+def _apply_reconsider(
+    data: dict[str, Any],
+    rec: dict[str, Any],
+    scan_date: str,
+) -> None:
+    """Re-review of a previously-classified entry. The record's stable_id
+    is the entry's current canonical key in state; `canonical_id` may name
+    a new key (e.g. a freshly-assigned CVE) — in which case we rekey."""
+    old_key = rec["stable_id"]
+    new_canonical = _canonical(rec, None)
+    bucket = rec.get("bucket", "unrelated")
+
+    # Resolve the current record — may need to follow an alias if the
+    # backlog snapshot the classifier reviewed is slightly out of sync.
+    current_key = old_key if old_key in data["seen"] else data["aliases"].get(old_key)
+    if not current_key or current_key not in data["seen"]:
+        print(f"warning: reconsider record for {old_key!r} points at no "
+              f"state entry; skipping.", file=sys.stderr)
+        return
+
+    existing = data["seen"][current_key]
+
+    # Overwrite bucket unconditionally (allows demotions) and stamp the
+    # reconsideration date so we can later throttle if this grows.
+    existing["bucket"] = bucket
+    existing["seen_at"] = scan_date
+    existing["reconsidered_at"] = scan_date
+
+    # Union any fresh sources/urls the classifier surfaced.
+    if rec.get("sources"):
+        existing["sources"] = _unique(list(existing.get("sources") or []) + list(rec["sources"]))
+    if rec.get("urls"):
+        existing["urls"] = _unique(list(existing.get("urls") or []) + list(rec["urls"]))
+
+    # Alias every alt ID the classifier provided to the current key
+    # (before a possible rekey below redirects them).
+    for alt in _alt_ids(rec, None):
+        if alt != current_key:
+            data["aliases"][alt] = current_key
+
+    # Rekey if Claude newly identified a canonical ID (e.g., a CVE for a
+    # vendor-ID entry). If the destination already exists, merge; else
+    # move. In both cases, retarget all aliases and leave the old key
+    # itself as an alias.
+    if new_canonical and new_canonical != current_key:
+        if new_canonical in data["seen"]:
+            dest = data["seen"][new_canonical]
+            dest["bucket"] = state.promote_bucket(dest.get("bucket", "unrelated"), existing.get("bucket", "unrelated"))
+            dest["sources"] = _unique(list(dest.get("sources") or []) + list(existing.get("sources") or []))
+            dest["urls"] = _unique(list(dest.get("urls") or []) + list(existing.get("urls") or []))
+            if not dest.get("title") and existing.get("title"):
+                dest["title"] = existing["title"]
+            dest["seen_at"] = scan_date
+            dest["reconsidered_at"] = scan_date
+            dest.setdefault("first_seen", existing.get("first_seen") or scan_date)
+            del data["seen"][current_key]
+        else:
+            data["seen"][new_canonical] = existing
+            del data["seen"][current_key]
+
+        for alias_key, target in list(data["aliases"].items()):
+            if target == current_key:
+                data["aliases"][alias_key] = new_canonical
+        data["aliases"][current_key] = new_canonical
+        # Clean up any self-aliases the retarget may have produced.
+        for k in [k for k, v in data["aliases"].items() if k == v]:
+            del data["aliases"][k]
+
+
 def ensure_stub_reports(scan_date: str) -> None:
     """If the Claude step was skipped, write empty stub watch_*.md files so the
     report artifact is consistent across runs."""
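The rekey-and-retarget dance in `_apply_reconsider` is easiest to see in isolation. A condensed sketch with hypothetical names, covering only the simple-move case (no merge with an existing destination record):

```python
def rekey(seen: dict, aliases: dict, old_key: str, new_key: str) -> None:
    # Move the state record to its newly-assigned canonical ID, point
    # every alias (and the old key itself) at the new ID, then drop any
    # self-aliases the retarget produced.
    seen[new_key] = seen.pop(old_key)
    for alias_key, target in list(aliases.items()):
        if target == old_key:
            aliases[alias_key] = new_key
    aliases[old_key] = new_key
    for k in [k for k, v in list(aliases.items()) if k == v]:
        del aliases[k]
```

For example, rekeying `INTEL-SA-00145` under a freshly assigned `CVE-2018-3665` retargets the `LazyFP` alias and leaves the old vendor ID behind as an alias of the CVE.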