mirror of
https://github.com/speed47/spectre-meltdown-checker.git
synced 2026-04-02 13:17:07 +02:00
678 lines
38 KiB
Markdown
# Project Overview

spectre-meltdown-checker is a single self-contained shell script (`spectre-meltdown-checker.sh`) that detects system vulnerability to several transient execution CPU CVEs (Spectre, Meltdown, and related). It supports Linux and BSD (FreeBSD, NetBSD, DragonFlyBSD) on x86, amd64, ARM, and ARM64.

The script must stay POSIX-compatible and must not use features available only in specific shells such as `bash` or `zsh`. The `local` keyword is, however, accepted.

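In practice this rules out constructs such as `[[ ... ]]`, `==` inside `[ ]`, and arrays. The sketch below shows POSIX replacements (a `case` pattern for prefix matching, `=` for string equality) alongside the accepted `local`. `starts_with` and `same_string` are hypothetical helpers for illustration, not functions of the script.

```sh
# POSIX sh only: no [[ ]], no ==, no arrays; `local` is the accepted extra.

# Hypothetical helper: succeed if $1 starts with $2
starts_with() {
    local str prefix
    str=$1
    prefix=$2
    # A case pattern replaces bash's [[ $str == "$prefix"* ]].
    # Quoting $prefix makes glob characters in it match literally.
    case "$str" in
        "$prefix"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: string equality; test(1) defines `=`, not `==`
same_string() {
    [ "$1" = "$2" ]
}
```

`shellcheck` reports most such bashisms when it analyzes a script as `sh`, so `make shellcheck` is a good first line of defense.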
## Project Mission

This tool exists to give system administrators simple, actionable answers to two questions:

1. **Am I vulnerable?**
2. **What do I have to do to mitigate these vulnerabilities on my system?**

The script does not run exploits and cannot guarantee security. It reports whether a system is **affected**, **vulnerable**, or **mitigated** against known transient execution vulnerabilities, and provides detailed insight into the prerequisites for full mitigation (microcode, kernel, hypervisor, etc.).

### Why this tool still matters

Even though the Linux `sysfs` hierarchy (`/sys/devices/system/cpu/vulnerabilities/`) now reports mitigation status for most vulnerabilities, this script provides value beyond what `sysfs` offers:

- **Independent of kernel knowledge**: A given kernel only understands vulnerabilities known at compile time. This script's detection logic is maintained independently, so it can identify gaps a kernel doesn't yet know about.
- **Detailed prerequisite breakdown**: Mitigating a vulnerability can involve multiple layers (microcode, host kernel, hypervisor, guest kernel, software). The script shows exactly which pieces are in place and which are missing.
- **Offline kernel analysis**: The script can inspect a kernel image before it is booted (`--kernel`, `--config`, `--map`), verifying it carries the expected mitigations.
- **Backport-aware**: It detects actual capabilities rather than checking version strings, so it works correctly with vendor kernels that silently backport or forward-port patches.
- **Covers gaps in sysfs**: Some vulnerabilities (e.g. Zenbleed) are not reported through `sysfs` at all.

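For comparison, everything `sysfs` offers can be dumped with a short loop like the one below. It only shows the running kernel's own verdict, which is exactly the limitation listed above. This is a sketch, not the script's `sys_interface_check`: `dump_sysfs_vulns` is a hypothetical name, and the directory is a parameter (defaulting to the live path) so it can also be pointed at a copy.

```sh
# Hypothetical helper: print "<name>: <kernel verdict>" for each file in the
# kernel's vulnerabilities directory (default: the live sysfs path).
dump_sysfs_vulns() {
    local dir f
    dir=${1:-/sys/devices/system/cpu/vulnerabilities}
    [ -d "$dir" ] || return 1      # old kernels have no such directory
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

Vulnerabilities unknown to the running kernel simply have no file here, so they never appear in this output.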
### Terminology

These terms have precise meanings throughout the codebase and output:

- **Affected**: The CPU hardware, as shipped from the factory, is known to be susceptible to a vulnerability. This says nothing about whether the vulnerability is currently exploitable.
- **Vulnerable**: The system uses an affected CPU *and* has no (or insufficient) mitigations in place, meaning the vulnerability can be exploited.
- **Mitigated**: A previously vulnerable system has all required layers updated so the vulnerability cannot be exploited.

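How the three terms combine into a per-CVE verdict can be summarized in a few lines. This is a minimal sketch assuming two yes/no inputs that the real checks compute in far more detail (and the real script also has an UNK outcome); `verdict_for` and its arguments are hypothetical.

```sh
# Hypothetical: $1 = 1 if the CPU is affected,
#               $2 = 1 if every required mitigation layer is in place.
verdict_for() {
    if [ "$1" != 1 ]; then
        echo "OK (not affected)"      # unaffected hardware: nothing to mitigate
    elif [ "$2" = 1 ]; then
        echo "OK (mitigated)"         # affected, but all layers are in place
    else
        echo "VULN (vulnerable)"      # affected and at least one layer missing
    fi
}
```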
## Branch Model

The project uses 5 branches organized in two pipelines (production and dev/test). Developers work on the source branches; CI builds the monolithic script and pushes it to the corresponding output branch.

| Branch | Contents | Pushed by |
|--------|----------|-----------|
| **`test`** | Dev/test source (split files + Makefile) | Developers |
| **`test-build`** | Monolithic test script (built artifact) | CI from `test` |
| **`source`** | Production source (split files + Makefile) | Developers |
| **`source-build`** | Monolithic production script (built artifact) | CI from `source` |
| **`master`** | Monolithic production script (built artifact) | PR by developers from `source-build` |

- **`source`** and **`test`** contain the split source files and the Makefile. These are the branches developers commit to.
- **`master`**, **`source-build`** and **`test-build`** contain only the monolithic `spectre-meltdown-checker.sh` built by CI. Nobody commits to these directly.
- **`master`** is the preexisting production branch that users pull from. It cannot be renamed.
- **`test-build`** is a testing branch that users can pull from to test pre-release versions.
- **`source-build`** is a preprod branch to prepare the artifact before merging to **`master`**.

Typical workflow:

1. Feature/fix branches are created from `test` and merged back into `test`.
2. CI builds the script and pushes it to `test-build` for testing.
3. When ready for release, `test` is merged into `source`.
4. CI builds the script and pushes it to `source-build` for production.
5. A developer creates a PR from `source-build` to `master`.

## Versioning

The project uses a three-part version number in the format `X.Y.Z` (semver-like in shape, but each field has the fixed meaning below):

- **X** = the current year, in `YY` format.
- **Y** = the number of CVEs supported by the script, which corresponds to the number of files under `src/vulns/`.
- **Z** = `MMDDVAL`, where `MMDD` is the UTC build date and `VAL` is a 3-digit value (000–999) that increases monotonically throughout the day, computed as `seconds_since_midnight_UTC * 1000 / 86400` (integer division).

The version is patched automatically by `build.sh` into the `VERSION=` variable of the assembled script. The source file (`src/libs/001_core_header.sh`) carries a placeholder value that is overwritten at build time.

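The formula for `Z` can be checked by hand: at 12:00:00 UTC, `43200 * 1000 / 86400 = 500`, so a build on April 2nd at noon UTC in 2026 with 19 CVEs would be `26.19.0402500`. The sketch below computes the same thing in POSIX `sh`. It illustrates the rules stated above and is not the actual code of `build.sh`; all function names are hypothetical.

```sh
# Hypothetical helpers reproducing the documented X.Y.Z rules;
# build.sh's real implementation may differ.

# VAL: 3-digit share of the UTC day elapsed, 000..999 (integer division)
version_val() {
    # $1 = seconds since midnight UTC (0..86399)
    printf '%03d\n' $(( $1 * 1000 / 86400 ))
}

# X.Y.Z from its parts: $1 = YY, $2 = CVE count, $3 = MMDD, $4 = seconds
format_version() {
    printf '%s.%s.%s%s\n' "$1" "$2" "$3" "$(version_val "$4")"
}

# Version for "now"; $1 = directory holding one file per supported CVE
current_version() {
    local ncve h m s
    ncve=$(ls "$1" | wc -l | tr -d ' ')    # BSD wc pads its output
    # one date call, so every field comes from the same instant
    set -- $(date -u '+%y %m%d %H %M %S')
    # drop one leading zero: "08" and "09" are invalid octal in $(( ))
    h=${3#0}
    m=${4#0}
    s=${5#0}
    format_version "$1" "$ncve" "$2" $(( h * 3600 + m * 60 + s ))
}
```

Run from the source tree, `current_version src/vulns` prints a version for the current instant.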
## Linting and Testing

```bash
# Assemble the final script
make build

# Lint the generated script
make fmt-check shellcheck

# Run the script (requires root for full results)
sudo ./spectre-meltdown-checker.sh

# Run specific tests that we might have just added (variant name)
sudo ./spectre-meltdown-checker.sh --variant l1tf --variant taa

# Run specific tests that we might have just added (CVE name)
sudo ./spectre-meltdown-checker.sh --cve CVE-2018-3640 --cve CVE-2022-40982

# Batch JSON mode (CI validates exactly 19 CVEs in output)
sudo ./spectre-meltdown-checker.sh --batch json | jq '.[] | .CVE' | wc -l # must be 19

# Update microcode firmware database
sudo ./spectre-meltdown-checker.sh --update-fwdb

# Docker
docker-compose build && docker-compose run --rm spectre-meltdown-checker
```

There is no separate test suite. CI (`.github/workflows/check.yml`) runs shellcheck, tab-indentation checks, a live execution test validating 19 CVEs, Docker builds, and a firmware DB update test that checks for temp file leaks.

## Architecture

The entire tool is a single POSIX shell script with no external script dependencies. Key structural sections (approximate line numbers in the assembled `spectre-meltdown-checker.sh`):

- **Output/logging functions** (~line 253): `pr_warn`, `pr_info`, `pr_verbose`, `pr_debug`, `explain`, `pstatus`, `pvulnstatus` - verbosity-aware output with color support
- **CPU detection** (~line 2171): `parse_cpu_details`, `is_intel`/`is_amd`/`is_hygon`, `read_cpuid`, `read_msr`, `is_cpu_smt_enabled` - hardware identification via CPUID/MSR registers
- **Microcode database** (embedded): Intel/AMD microcode version lookup via `read_mcedb`/`read_inteldb`; updated automatically via `.github/workflows/autoupdate.yml`
- **Kernel analysis** (~line 1568): `extract_kernel`, `try_decompress` - extracts and inspects kernel images (handles gzip, bzip2, xz, lz4, zstd compression)
- **Vulnerability checks**: 19 `check_CVE_<year>_<number>()` functions, each with `_linux()` and `_bsd()` variants. Uses whitelist logic (assumes affected unless proven otherwise)
- **Main flow** (~line 6668): Parse options → detect CPU → loop through requested CVEs → output results (text/json/nrpe/prometheus) → cleanup

## Key Design Principles

These rules are non-negotiable and govern how every part of the script is written:

### 1. Production-safe

It must always be safe to run this script in a production environment.

- **1a. Non-destructive**: Never modify the system. If the script loads a kernel module it needs (e.g. `cpuid`, `msr`), it must unload it on exit.
- **1b. Report only**: Never attempt to "fix" or "mitigate" any vulnerability, or modify any configuration. The script reports status and leaves all decisions to the sysadmin.
- **1c. No exploit execution**: Never run any kind of exploit or proof-of-concept. This would violate rule 1a, could cause unpredictable system behavior, and may produce wrong conclusions (especially for Spectre-class PoCs that require very specific build options and prerequisites).

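Rule 1a in practice: remember which modules this run loaded itself, and unload only those from an `EXIT` trap so cleanup also happens on early exits and interrupts. This is a Linux-only sketch under stated assumptions: it treats an existing `/sys/module/<name>` directory as "already present, leave it alone", and `g_loaded_modules`, `load_module_if_needed` and `unload_loaded_modules` are hypothetical names, not the script's actual helpers.

```sh
# Hypothetical sketch (Linux only): track and undo our own module loads.
g_loaded_modules=''

# Load $1 (e.g. msr or cpuid) unless it is already present.
load_module_if_needed() {
    if [ -d "/sys/module/$1" ]; then
        return 0                      # present before us: never unload it
    fi
    if modprobe "$1" 2>/dev/null; then
        g_loaded_modules="$g_loaded_modules $1"
        return 0
    fi
    return 1
}

# Unload exactly the modules recorded above, then forget them.
unload_loaded_modules() {
    local mod
    for mod in $g_loaded_modules; do
        modprobe -r "$mod" 2>/dev/null
    done
    g_loaded_modules=''
}

# Run cleanup on every exit path; turn INT/TERM into a normal exit
# (255, the "error" return code) so the EXIT trap fires in all shells.
trap unload_loaded_modules EXIT
trap 'exit 255' INT TERM
```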
### 2. Never hardcode kernel versions

Never look at the kernel version string to determine whether it supports a mitigation. This would defeat the script's purpose: it must detect mitigations in unknown, vendor-patched, or backported kernels. Similarly, do not blindly trust what `sysfs` reports when it is possible to verify directly.

### 3. Never hardcode microcode versions

Never look at the microcode version to determine whether it has the proper mitigation mechanisms. Instead, probe for the mechanisms themselves (CPUID bits, MSR values), as the kernel would.

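For example, instead of comparing a microcode revision against a table, decode the capability bits that the microcode exposes. The sketch below tests bits of `CPUID.(EAX=7,ECX=0):EDX` as documented by Intel (bit 10 `MD_CLEAR`, bit 26 `IBRS`/`IBPB`, bit 27 `STIBP`, bit 28 `L1D_FLUSH`, bit 31 `SSBD`). Reading the register itself (the script's `read_cpuid`) is left out, and `cpuid7_edx_caps` is a hypothetical name.

```sh
# Hypothetical: list mitigation mechanisms advertised in CPUID leaf 7 EDX.
# $1 = EDX value, decimal or 0x-prefixed hex (e.g. "0x9c000400").
cpuid7_edx_caps() {
    local edx
    edx=$(( $1 ))
    [ $(( (edx >> 10) & 1 )) = 1 ] && echo md_clear
    [ $(( (edx >> 26) & 1 )) = 1 ] && echo ibrs_ibpb
    [ $(( (edx >> 27) & 1 )) = 1 ] && echo stibp
    [ $(( (edx >> 28) & 1 )) = 1 ] && echo l1d_flush
    [ $(( (edx >> 31) & 1 )) = 1 ] && echo ssbd
    return 0
}
```

The same value gives the same answer whatever microcode revision produced it, which is the point of rule 3.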
### 4. Assume affected unless proven otherwise (whitelist approach)

When a CPU is not explicitly known to be unaffected by a vulnerability, assume that it is affected. This conservative default has been the right call since the early Spectre/Meltdown days and remains sound.

### 5. Offline mode

The script can analyze a non-running kernel via the `--kernel`, `--config` and `--map` flags, allowing verification before deployment.

## CVE Inclusion Criteria

A vulnerability should be supported by this tool when mitigating it requires **kernel modifications**, **microcode modifications**, or **both**.

A vulnerability is **out of scope** when:

- Mitigation is handled entirely by a driver or userspace software update (e.g. CVE-2019-14615, which requires an Intel driver update).
- The vulnerability is a regression from a bad backport and cannot be detected without hardcoding kernel versions (violates rule 2).
- The vendor has determined it is not a new attack and issued no kernel or microcode changes, leaving nothing for the script to check.
- The industry has collectively decided not to address the vulnerability (no mitigations exist), leaving nothing to verify.

When evaluating whether to add a new CVE, check the [information-tagged issues](https://github.com/speed47/spectre-meltdown-checker/issues?q=is%3Aissue+label%3Ainformation) for prior discussion and precedent.

## POSIX Compliance

The script must run on both Linux and BSD systems (FreeBSD, NetBSD, DragonFlyBSD). This means all external tool invocations must use only POSIX-specified options. Many tools have GNU extensions that are not available on BSD, or BSD extensions that are not available on GNU/Linux. When in doubt, test on both.

Common traps to avoid:

| Tool | Non-portable usage | Portable alternative |
|------|--------------------|----------------------|
| `sed` | `-r` (GNU extended regex flag) | `-E` (accepted by both GNU and BSD) |
| `grep` | `-P` (Perl regex, GNU only) | Use `awk` or rework the pattern |
| `sort` | `-V` (version sort, GNU only) | Extract numeric fields and compare with `awk` or shell arithmetic |
| `cut` | `-w` (whitespace delimiter, BSD only) | `awk '{print $N}'` |
| `stat` | `-c %Y` (GNU format) | Try GNU first, fall back to BSD: `stat -c %Y ... 2>/dev/null \|\| stat -f %m ...` |
| `date` | `-d @timestamp` (GNU only) | Try GNU first, fall back to BSD: `date -d @ts ... 2>/dev/null \|\| date -r ts ...` |
| `xargs` | `-r` (no-op if empty, GNU only) | Guard with a prior `[ -n "..." ]` check, or accept the harmless empty invocation |
| `readlink` | `-f` (canonicalize, GNU only) | Use only in Linux-specific code paths, or reimplement with `cd`/`pwd` |
| `dd` | `iflag=`, `oflag=` (GNU only) | Use only in Linux-specific code paths (e.g. `/dev/cpu/*/msr`) |

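Two of the fallback patterns from the table, plus an `awk`-based replacement for `sort -V`, written out as small functions. This is a sketch: the function names are hypothetical, and `version_ge` simply compares dot-separated numeric fields left to right, counting missing fields as 0. It is meant for cases that genuinely need to order dotted version strings, never for kernel versions (rule 2).

```sh
# Hypothetical portable helpers following the table above.

# File modification time as a Unix timestamp: GNU stat first, then BSD stat.
file_mtime() {
    stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"
}

# Format a Unix timestamp: GNU date first, then BSD date.
# (GNU `date -r` means "reference file", which is why GNU is tried first.)
# $1 = timestamp, $2 = date(1) format string such as '+%Y-%m-%d'
format_timestamp() {
    date -d "@$1" "$2" 2>/dev/null || date -r "$1" "$2"
}

# Succeed if dotted version $1 >= $2 (replacement for GNU `sort -V`).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}
```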
When a tool genuinely has no portable equivalent, restrict the non-portable call to a platform-specific code path (i.e. inside a BSD-only or Linux-only branch) and document why.

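A sketch of that pattern using `readlink -f` from the table: the GNU-only call sits in a Linux-only branch, other systems use a `cd`/`pwd -P` fallback, and a comment says why. It assumes `uname -s` prints `Linux` on Linux (the script itself keeps the OS in `$g_os`, whose exact values are not shown here), it handles directories only, and `canonical_dir` is a hypothetical name.

```sh
# Hypothetical: canonical absolute path of a directory.
canonical_dir() {
    case "$(uname -s)" in
        Linux)
            # readlink -f is not POSIX: restricted to the Linux branch
            readlink -f "$1"
            ;;
        *)
            # portable fallback (BSD): resolve symlinks via the physical cwd
            (cd "$1" && pwd -P)
            ;;
    esac
}
```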
## Return Codes

0 = not vulnerable, 2 = vulnerable, 3 = unknown, 255 = error

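A wrapper (cron job, monitoring probe) can branch on these codes directly. A minimal sketch; `exit_code_label` is a hypothetical name, and any code not listed above is reported as unexpected:

```sh
# Hypothetical: translate the checker's exit status into a label
exit_code_label() {
    case "$1" in
        0) echo "not vulnerable" ;;
        2) echo "vulnerable" ;;
        3) echo "unknown" ;;
        255) echo "error" ;;
        *) echo "unexpected exit code: $1" ;;
    esac
}

# Usage (needs root for full results):
#   sudo ./spectre-meltdown-checker.sh --batch json >/dev/null
#   exit_code_label $?
```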
## Variable naming conventions

This script uses the following naming rules for variables:

- `UPPER_SNAKE_CASE`: Constants and enums (e.g. `READ_MSR_RET_OK`, `EAX`), declared with `readonly` on the assignment line (e.g. `readonly FOO="bar"`).
  When they are used as values that a function returns or stores in its out-parameters, they should follow the `<FUNC>_RET_*` pattern.
  Such variables should be declared right above the definition of the function they are dedicated to.
  Other general constants go at the top of the file, below the `VERSION` assignment.
- `opt_*`: Command-line options set during argument parsing (e.g. `opt_verbose`, `opt_batch`).
- `cpu_*`: CPU identification/state filled by `parse_cpu_details()` (e.g. `cpu_family`, `cpu_model`).
- `cap_*`: CPU capability flags read from hardware/firmware (e.g. `cap_verw_clear`, `cap_rdcl_no`).
  All `cap_*` variables are set in `check_cpu()`. They come in two flavors:
  - **Immunity bits** (`cap_*_no`): The CPU vendor declares this hardware is not affected by a vulnerability.
    The `_no` suffix mirrors the vendor's own bit naming (e.g. RDCL_NO, GDS_NO, TSA_SQ_NO).
    These are consumed in `is_cpu_affected()` to mark a CPU as immune.
  - **Mitigation bits** (all other `cap_*`): Microcode or hardware provides a mechanism to work around
    a vulnerability the CPU *does* have (e.g. `cap_verw_clear`, `cap_ibrs`, `cap_ssbd`).
    These are consumed in `check_CVE_*_linux()` functions to assess mitigation status.
- `affected_*`: Per-CVE vulnerability status from `is_cpu_affected()` (e.g. `affected_l1tf`).
- `ret_<func>_*`: "Out-parameters" set by a function for its caller (e.g. `ret_read_cpuid_value`, `ret_read_msr_msg`).
  The `<func>` part matches the function name so ownership is obvious: these variables must not be written
  by any function other than `<func>`, nor by top-level code.
- `g_*`: Other global (i.e. non-`local`) variables that don't match any of the cases above.
- `<name>`: Scratch/temporary variables inside functions (e.g. `core`, `msg`, `col`).
  These must be declared as `local` and must not match any naming pattern above.
  Any variable that is only used in the scope of a given function falls in this category.

Additionally, variable names must never start with an underscore: apart from the `UPPER_SNAKE_CASE` constants, every variable name starts with a lowercase letter (`[a-z]`).

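A toy function following these rules: `<FUNC>_RET_*` constants declared with `readonly` right above the function, an out-parameter named `ret_<func>_*` that only its owner writes, and `local` scratch variables. `parse_hex`, `print_hex` and the constants are hypothetical, for illustration only.

```sh
# Return codes of parse_hex(), declared right above it
readonly PARSE_HEX_RET_OK=0
readonly PARSE_HEX_RET_INVALID=1

# Convert a 0x-prefixed hex string to decimal.
# Out-parameter: ret_parse_hex_value (written only by parse_hex)
parse_hex() {
    local digits
    ret_parse_hex_value=''
    case "$1" in
        0x*|0X*) digits=${1#0[xX]} ;;
        *) return $PARSE_HEX_RET_INVALID ;;
    esac
    # reject an empty digit string or any non-hex character
    case "$digits" in
        ''|*[!0-9a-fA-F]*) return $PARSE_HEX_RET_INVALID ;;
    esac
    ret_parse_hex_value=$(( 0x$digits ))
    return $PARSE_HEX_RET_OK
}

# Caller: scratch variable is local, return code compared to the constants
print_hex() {
    local ret
    parse_hex "$1"
    ret=$?
    if [ "$ret" = "$PARSE_HEX_RET_OK" ]; then
        echo "$ret_parse_hex_value"
    else
        echo "invalid"
    fi
}
```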
## Function naming conventions

Functions follow two naming tiers:

- `public_function`: Top-level functions called directly from the main flow or from other public functions.
  Examples: `parse_cpu_details`, `read_cpuid`, `check_CVE_2017_5754`.
- `_private_function`: Utility/helper functions that exist solely to factorize code shared by other functions.
  These must never be called directly from the top-level main flow.
  Examples: `_echo`, `_emit_json`, `_cve_registry_field`.

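A minimal illustration of the two tiers, with hypothetical names: the private `_kv_line` helper only factorizes formatting shared by two public functions, and only the public functions would be called from the main flow.

```sh
# Private helper: shared formatting only, never called from the main flow
_kv_line() {
    printf '%s: %s\n' "$1" "$2"
}

# Public functions: these are what the main flow calls
show_vendor() {
    _kv_line "vendor" "$1"
}

show_family() {
    _kv_line "family" "$1"
}

# main flow (illustration):
#   show_vendor GenuineIntel
#   show_family 6
```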
## How to Implement a New CVE Check

Adding a new CVE follows a fixed pattern. Every check uses the same three-function structure and the same decision algorithm. This section walks through both.

### Prerequisites

Before writing code, verify the CVE meets the inclusion criteria (see "CVE Inclusion Criteria" above). The vulnerability must require kernel and/or microcode changes to mitigate.

### Step 1: Create the Vulnerability File

Create `src/vulns/CVE-YYYY-NNNNN.sh`. The file must contain exactly three functions:

```sh
# vim: set ts=4 sw=4 sts=4 et:
####################
# SHORT_NAME section

# CVE-YYYY-NNNNN SHORT_NAME (one-line description) - entry point
check_CVE_YYYY_NNNNN() {
    check_cve 'CVE-YYYY-NNNNN'
}

# CVE-YYYY-NNNNN SHORT_NAME (one-line description) - Linux mitigation check
check_CVE_YYYY_NNNNN_linux() {
    # ... (see Step 3)
}

# CVE-YYYY-NNNNN SHORT_NAME (one-line description) - BSD mitigation check
check_CVE_YYYY_NNNNN_bsd() {
    if ! is_cpu_affected "$cve"; then
        pvulnstatus "$cve" OK "your CPU vendor reported your CPU model as not affected"
    else
        pvulnstatus "$cve" UNK "your CPU is affected, but mitigation detection has not yet been implemented for BSD in this script"
    fi
}
```

The entry point calls `check_cve`, which prints the CVE header and dispatches to `_linux()` or `_bsd()` based on `$g_os`. If BSD mitigations are not yet understood, use the stub above - it correctly reports UNK rather than a false OK.

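The dispatch, and the reason the `_bsd()` stub can read `$cve` without receiving it as an argument, can be pictured with the sketch below. It is not the script's `check_cve` (which also prints the header); `dispatch_cve`, the made-up CVE number and the `Linux` value tested for `$g_os` are assumptions. What it shows: `local` in `dash`, `bash` and the BSD `sh` is dynamically scoped, so a variable declared `local` in the caller is visible to the functions it calls.

```sh
# Hypothetical dispatcher; assumes g_os holds "Linux" on Linux systems.
dispatch_cve() {
    local cve func
    cve=$1                               # visible to the callees below
    # CVE-2018-3640 -> check_CVE_2018_3640
    func="check_$(echo "$cve" | sed 's/-/_/g')"
    if [ "$g_os" = Linux ]; then
        "${func}_linux"
    else
        "${func}_bsd"
    fi
}

# Stand-in check functions for a made-up CVE
check_CVE_2099_0001_linux() { echo "linux check for $cve"; }
check_CVE_2099_0001_bsd() { echo "bsd check for $cve"; }
```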
### Step 2: Register the CVE in the CPU Affection Logic

In `src/libs/200_cpu_affected.sh`, add an `affected_yourname` variable and populate it inside `is_cpu_affected()`. The variable follows the whitelist principle: **assume affected (`1`) unless you can prove the CPU is immune (`0`)**. Two kinds of evidence can prove immunity:

- **Static identifiers**: CPU vendor, family, model, stepping - these identify the hardware design.
- **Hardware immunity `cap_*` bits**: CPUID or MSR bits that the CPU vendor defines to explicitly declare "this hardware is not affected" (e.g. `cap_rdcl_no` for Meltdown, `cap_ssb_no` for Variant 4, `cap_gds_no` for Downfall, `cap_tsa_sq_no`/`cap_tsa_l1_no` for TSA). These are read in `check_cpu()` and stored as `cap_*` globals.

Never use microcode version strings.

**Important**: Do not confuse hardware immunity bits with *mitigation* capability bits. A hardware immunity bit (e.g. `GDS_NO`, `TSA_SQ_NO`) declares that the CPU design is architecturally free of the vulnerability - it belongs here in `is_cpu_affected()`. A mitigation capability bit (e.g. `VERW_CLEAR`, `MD_CLEAR`) indicates that updated microcode provides a mechanism to work around a vulnerability the CPU *does* have - it belongs in the `check_CVE_YYYY_NNNNN_linux()` function (Phase 2), where it is used to determine whether mitigations are in place.

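A sketch of the shape such an entry takes, assuming a hypothetical vulnerability `yourname` with a hypothetical immunity bit `cap_yourname_no`; the vendor/family values are placeholders, not real affection data. Note the order: start from `1` and only flip to `0` on positive proof. The second helper shows how an immunity bit can be extracted from an MSR value; per Intel's documentation, `RDCL_NO` is bit 0 of `IA32_ARCH_CAPABILITIES` (MSR `0x10a`). `msr_bit` is also a hypothetical name.

```sh
# Hypothetical sketch of one is_cpu_affected() entry; the vendor/family
# values are placeholders, NOT real affection data.
# Inputs: cpu_vendor, cpu_family (static identifiers) and cap_yourname_no
# (hardware immunity bit: 1 when the vendor declares the CPU not affected).
set_affected_yourname() {
    affected_yourname=1                 # whitelist: affected by default
    if [ "$cap_yourname_no" = 1 ]; then
        affected_yourname=0             # vendor-declared immunity bit
    elif [ "$cpu_vendor" = ExampleVendor ] && [ "$cpu_family" = 99 ]; then
        affected_yourname=0             # placeholder for a known-immune design
    fi
    # anything not matched above, including unknown CPUs, stays affected
}

# Extract one bit from an MSR's low 32 bits (e.g. bit 0 = RDCL_NO in
# IA32_ARCH_CAPABILITIES). $1 = value (decimal or 0x hex), $2 = bit number
msr_bit() {
    echo $(( ($1 >> $2) & 1 ))
}
```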
### Step 3: Implement the Linux Check

The `_linux()` function follows a standard algorithm with four phases:

**Phase 1 - Initialize and check sysfs:**

```sh
check_CVE_YYYY_NNNNN_linux() {
    local status sys_interface_available msg kernel_mitigated microcode_ok ret
    status=UNK
    sys_interface_available=0
    msg=''
    if sys_interface_check "$VULN_SYSFS_BASE/vuln_name"; then
        sys_interface_available=1
        status=$ret_sys_interface_check_status
    fi
```

`sys_interface_check` reads `/sys/devices/system/cpu/vulnerabilities/<name>` and parses the kernel's own assessment into `ret_sys_interface_check_status` (OK/VULN/UNK) and `ret_sys_interface_check_fullmsg`. If the sysfs file doesn't exist (older kernel, or the CVE predates kernel awareness), it returns false and `sys_interface_available` stays 0.

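The classification `sys_interface_check` performs can be approximated from the first word of the message, since every sysfs message starts with `Not affected`, `Mitigation` or `Vulnerable`. The sketch below is only an approximation, not the real function: it ignores mixed messages that the real script may treat more carefully, it takes the file path as argument, and `classify_sysfs_file` and its `ret_classify_sysfs_file_*` out-parameters are hypothetical names following the naming convention.

```sh
# Hypothetical approximation of sys_interface_check's classification.
# $1 = sysfs file. Out-parameters (written only by this function):
#   ret_classify_sysfs_file_status   OK / VULN / UNK
#   ret_classify_sysfs_file_fullmsg  the kernel's raw message
classify_sysfs_file() {
    ret_classify_sysfs_file_status=UNK
    ret_classify_sysfs_file_fullmsg=''
    # no such file (old kernel, or vulnerability unknown to it): fail
    [ -r "$1" ] || return 1
    ret_classify_sysfs_file_fullmsg=$(cat "$1")
    case "$ret_classify_sysfs_file_fullmsg" in
        "Not affected"*|Mitigation*) ret_classify_sysfs_file_status=OK ;;
        Vulnerable*) ret_classify_sysfs_file_status=VULN ;;
        *) ret_classify_sysfs_file_status=UNK ;;
    esac
    return 0
}
```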
**Phase 2 - Custom detection (kernel + runtime):**

Guarded by `if [ "$opt_sysfs_only" != 1 ]; then` so users who trust sysfs can skip it.

This is where the real detection lives. Check for mitigations at each layer:

- **Kernel support**: Determine whether the kernel carries the mitigation code. Three sources of evidence are available, and any one of them is sufficient:

  - **Kernel image** (`$g_kernel`): Search for strings or symbols that prove the mitigation code is compiled in.
    ```sh
    if [ -n "$g_kernel_err" ]; then
        pstatus yellow UNKNOWN "couldn't check ($g_kernel_err)"
    elif grep -q 'mitigation_string' "$g_kernel"; then
        kernel_mitigated="found mitigation evidence in kernel image"
    fi
    ```
    Test `$g_kernel_err` first, as above: when it is non-empty, the kernel image is unavailable and must not be grepped.

  - **Kernel config** (`$g_kernel_config`): Look for the `CONFIG_*` option that enables the mitigation.
    ```sh
    if [ -n "$g_kernel_config" ] && grep -q '^CONFIG_MITIGATION_NAME=y' "$g_kernel_config"; then
        kernel_mitigated="found mitigation config option enabled"
    fi
    ```

  - **System.map** (`$g_kernel_map`): Look for function names directly linked to the mitigation.
    ```sh
    if [ -n "$g_kernel_map" ] && grep -q 'mitigation_function_name' "$g_kernel_map"; then
        kernel_mitigated="found mitigation function in System.map"
    fi
    ```

  Each source may independently be unavailable (offline mode without the file, or stripped kernel), so check all that are present. A match in any one confirms kernel support.

- **Runtime state** (live mode only): Read MSRs, check cpuinfo flags, parse dmesg, inspect debugfs.
  ```sh
  if [ "$opt_live" = 1 ]; then
      read_msr 0xADDRESS
      ret=$?
      if [ "$ret" = "$READ_MSR_RET_OK" ]; then
          # check specific bits in ret_read_msr_value_lo / ret_read_msr_value_hi
          : # placeholder: an empty "then" block is a syntax error in sh
      fi
  else
      pstatus blue N/A "not testable in offline mode"
  fi
  ```

- **Microcode capabilities**: Check CPUID bits or MSR flags that indicate the CPU firmware supports the mitigation. Never compare microcode version numbers directly.

Close the `opt_sysfs_only` block with the forced-sysfs fallback:
```sh
elif [ "$sys_interface_available" = 0 ]; then
    msg="/sys vulnerability interface use forced, but it's not available!"
    status=UNK
fi
```

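Put together, the kernel-support part of Phase 2 reduces to "any available source matches". The standalone sketch below takes the three paths and the three patterns as arguments; that is an assumption made so it can run outside the script, whereas the real checks read `$g_kernel`, `$g_kernel_config` and `$g_kernel_map` directly and use per-CVE patterns. It skips any source that is empty or unreadable and stores the first piece of evidence in a hypothetical out-parameter.

```sh
# Hypothetical standalone version of the Phase 2 kernel-support logic.
# $1 kernel image, $2 kernel config, $3 System.map (each may be empty)
# $4 image string, $5 config option name, $6 System.map symbol
# Out-parameter: ret_kernel_has_mitigation_evidence
kernel_has_mitigation() {
    ret_kernel_has_mitigation_evidence=''
    if [ -n "$1" ] && [ -r "$1" ] && grep -q "$4" "$1"; then
        ret_kernel_has_mitigation_evidence="found mitigation evidence in kernel image"
    elif [ -n "$2" ] && [ -r "$2" ] && grep -q "^$5=y" "$2"; then
        ret_kernel_has_mitigation_evidence="found mitigation config option enabled"
    elif [ -n "$3" ] && [ -r "$3" ] && grep -q "$6" "$3"; then
        ret_kernel_has_mitigation_evidence="found mitigation function in System.map"
    fi
    # success as soon as any one source matched
    [ -n "$ret_kernel_has_mitigation_evidence" ]
}
```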
**Phase 3 - CPU affection gate:**

```sh
if ! is_cpu_affected "$cve"; then
    pvulnstatus "$cve" OK "your CPU vendor reported your CPU model as not affected"
```

If the CPU is not affected, nothing else matters - report OK and return. This overrides any sysfs or custom detection result.

**Phase 4 - Final status determination:**

For affected CPUs, combine the evidence from Phase 2 into a final verdict. The dispatch
works through `msg`: if a sysfs override in Phase 1 or the forced-sysfs fallback at the end of
Phase 2 set `msg` to a non-empty value, use it directly; otherwise run the script's own logic,
or fall back to the raw sysfs result in `--sysfs-only` mode.

```sh
elif [ -z "$msg" ]; then
    # msg is empty: sysfs either wasn't available, or gave a standard
    # response that wasn't overridden. Use our own logic when we have it.
    if [ "$opt_sysfs_only" != 1 ]; then
        # --- own logic using Phase 2 variables ---
        if [ "$microcode_ok" = 1 ] && [ -n "$kernel_mitigated" ]; then
            pvulnstatus "$cve" OK "Both kernel and microcode mitigate the vulnerability"
        else
            pvulnstatus "$cve" VULN "Kernel and/or microcode mitigation is missing"
            explain "Remediation advice here..."
        fi
    else
        # --sysfs-only: Phase 2 variables are unset, fall back to the
        # raw sysfs result (status + fullmsg were set in Phase 1).
        pvulnstatus "$cve" "$status" "$ret_sys_interface_check_fullmsg"
    fi
else
    # msg was explicitly set - either by the "sysfs not available" elif
    # above, or by a sysfs override in Phase 1. Use it as-is.
    pvulnstatus "$cve" "$status" "$msg"
fi
}
```

The `opt_sysfs_only` guard inside the `[ -z "$msg" ]` branch is **critical**: without it,
`--sysfs-only` mode would fall into own-logic with all Phase 2 variables unset, producing
wrong results. The inner `else` branch, which calls `pvulnstatus "$cve" "$status" "$ret_sys_interface_check_fullmsg"`,
is safe because it is only reachable when sysfs was available (if it wasn't, the "sysfs not
available" `elif` at the end of Phase 2 would have set `msg`, sending us to the other branch).

The exact combination logic depends on the CVE. Some require **both** microcode and kernel fixes (report VULN if either is missing). Others are mitigated by **either** layer alone (report OK if one is present). Some also require SMT to be disabled - check with `is_cpu_smt_enabled()`.

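The three combination patterns just described (both layers required, either layer sufficient, SMT additionally disabled) can be summarized in a standalone helper. This is only a sketch: the real checks inline this logic per CVE with richer messages, `combine_layers` with its `both`/`either` mode and 0/1 inputs is a hypothetical interface, and the SMT flag stands in for what `is_cpu_smt_enabled()` reports.

```sh
# Hypothetical summary of the Phase 4 combination rules.
# $1 mode: "both" (kernel AND microcode needed) or "either" (one is enough)
# $2 microcode ok (0/1), $3 kernel mitigated (0/1)
# $4 1 if the CVE also requires SMT disabled, $5 1 if SMT is enabled
combine_layers() {
    local layers_ok
    layers_ok=0
    case "$1" in
        both)
            if [ "$2" = 1 ] && [ "$3" = 1 ]; then layers_ok=1; fi ;;
        either)
            if [ "$2" = 1 ] || [ "$3" = 1 ]; then layers_ok=1; fi ;;
        *)
            echo UNK                  # unknown mode
            return 0 ;;
    esac
    if [ "$layers_ok" = 1 ] && { [ "$4" != 1 ] || [ "$5" != 1 ]; }; then
        echo OK
    else
        echo VULN
    fi
}
```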
**Sysfs overrides:** When the kernel's sysfs reporting is known to be incorrect for certain
messages (e.g. old kernels misclassifying a partial mitigation as fully mitigated), add an
override in Phase 1 after `sys_interface_check` returns. The override sets both `status` and
`msg`, which routes Phase 4 to the `else` branch - bypassing own logic entirely. This is
correct because the override and own logic will always agree on the verdict. Example:

```sh
if sys_interface_check "$VULN_SYSFS_BASE/vuln_name"; then
    sys_interface_available=1
    status=$ret_sys_interface_check_status
    # Override: old kernels (before <commit>) incorrectly reported this as mitigated
    if echo "$ret_sys_interface_check_fullmsg" | grep -qi 'Mitigation:.*partial mitigation.*missing piece'; then
        status=VULN
        msg="Vulnerable: partial mitigation, missing piece (your kernel incorrectly reports this as mitigated, it was fixed in more recent kernels)"
    fi
fi
```

When adding a sysfs override, also add an `explain` call in the `else` branch of Phase 4
(where `msg` is non-empty) to tell the user why the kernel says "Mitigated" while the script
reports vulnerable. Additionally, in Phase 2, add a kernel-image grep to inform the user
whether their kernel has the corrected reporting (the post-fix kernel will contain the new
vulnerability string in its image).

**Sysfs message inventory:** Before writing Phase 1 (and any sysfs overrides), audit **every
version** of the sysfs message that the kernel has ever produced for this vulnerability. The
script may run on any kernel - from early release candidates that first introduced the sysfs
file, through every stable release, up to the latest mainline. The inventory must catalogue
every string variant, including:

- Messages that only existed briefly between two commits in the same release cycle.
- Format changes (e.g. field reordering, renamed labels).
- New states added in later kernels (e.g. new flush modes, new mitigation strategies).
- Reporting corrections where a later kernel changed its assessment of what counts as
  mitigated (e.g. a message that said `"Mitigation: ..."` in kernel A is reclassified as
  `"Vulnerable: ..."` in kernel B under the same conditions).

Document all discovered variants as comments in the CVE file, grouped by the kernel commit
that introduced or changed them, so future readers can understand the evolution at a glance.
See `src/vulns/CVE-2018-3646.sh` (Phase 1 comment block) for a reference example.

This inventory matters because later kernels may have a different - and more accurate - view
of what is vulnerable versus mitigated for a given vulnerability, as understanding progresses
over time. The script must be able to reach the same conclusions as the most recent kernel,
even when running under an old kernel that misreports a vulnerability as mitigated. This is
exactly what sysfs overrides (described above) are for: when the inventory reveals that an
old kernel's message is now known to be wrong, add an override in Phase 1 to correct the
status, and use the Phase 2 kernel-image grep to tell the user whether their kernel has the
corrected reporting.

**How to build the inventory - git blame walkback method:**
|
||
|
||
The goal is to find every commit that changed the sysfs output strings for a given
|
||
vulnerability. The method uses `git blame` iteratively, walking backwards through history
|
||
until the vulnerability's sysfs reporting no longer exists.
|
||
|
||
1. **Locate the output function.** Most vulnerability sysfs files are generated from
|
||
`arch/x86/kernel/cpu/bugs.c`. Find the `*_show_state()` function for the vulnerability
|
||
(e.g. `l1tf_show_state()`, `mds_show_state()`) and the corresponding `case X86_BUG_*`
|
||
in `cpu_show_common()`. Both paths can produce messages: the show_state function handles
|
||
the mitigated cases, while `cpu_show_common()` handles `"Not affected"` (common to all
|
||
bugs) and `"Vulnerable"` (fallthrough). Some vulnerabilities also use string arrays
|
||
(e.g. `l1tf_vmx_states[]`, `spectre_v1_strings[]`) - include those in the audit.
|
||
|
||
2. **Blame the current code.** Run `git blame` on the relevant line range:
|
||
|
||
```
|
||
git blame -L<start>,<end> arch/x86/kernel/cpu/bugs.c
|
||
```
|
||
|
||
For each line that contributes to the sysfs output (format strings, string arrays, enum
|
||
lookups, conditional branches that select different messages), note the commit hash.
|
||
|
||
3. **Walk back one commit at a time.** For each commit found in step 2, check the state of
|
||
the file **before** that commit to see what changed:
|
||
|
||
```
|
||
git show <commit>^:arch/x86/kernel/cpu/bugs.c | grep -n -A10 '<function_name>'
|
||
```
|
||
|
||
Compare the output strings, format patterns, and conditional logic with the version after
|
||
the commit. Record any differences: added/removed/renamed states, reordered fields,
|
||
changed conditions.
|
||
|
||
4. **Repeat until the vulnerability disappears.** Take the oldest commit found and check the
|
||
parent. Eventually you reach a version where the `case X86_BUG_*` for this vulnerability
|
||
does not exist - that is the boundary.
|
||
|
||
5. **Watch for non-obvious string changes.** Some commits change the output without touching
|
||
the format strings themselves:
|
||
- **Condition changes**: A commit may change *when* a branch is taken (e.g. switching from
|
||
`cpu_smt_control == CPU_SMT_ENABLED` to `sched_smt_active()`), which changes which
|
||
message appears for the same hardware state, even though the strings are identical.
|
||
- **Enum additions**: A new entry in a string array (e.g. adding `"flush not necessary"` to
|
||
`l1tf_vmx_states[]`) adds a new possible message without changing the format string.
|
||
- **Early returns**: Adding or removing an early-return path changes which messages are
|
||
reachable (e.g. returning `L1TF_DEFAULT_MSG` for `FLUSH_AUTO` before reaching the VMX
|
||
format string).
|
||
- **Mechanical changes**: `sprintf` → `sysfs_emit`, `const` qualifications, whitespace
|
||
reformats - these do not change strings and can be noted briefly or omitted.
|
||
|
||
6. **Cross-check with `git log`.** After the blame walkback, run a targeted `git log` to
|
||
confirm no commits were missed:
|
||
|
||
```
|
||
git log --all --oneline -- arch/x86/kernel/cpu/bugs.c | xargs -I{} \
|
||
sh -c 'git show {} -- arch/x86/kernel/cpu/bugs.c | grep -q "<vuln_name>" && echo {}'
|
||
```
|
||
|
||
Any commit that touches lines mentioning the vulnerability name should already be in
|
||
your inventory. If one is missing, inspect it.
|
||
|
||
7. **Audit the stable tree.** After completing the mainline inventory, repeat the process on
   the linux-stable repository (`~/linux-stable`). Stable/LTS branches can carry backports
   that differ from mainline in subtle ways:

   - **Partial backports**: A stable branch may backport the mitigation but not the VMX
     reporting, producing a simpler set of messages than mainline (e.g. 4.4.y has l1tf's
     `"PTE Inversion"` but no VMX flush state reporting at all).
   - **Stable-only commits**: Maintainers sometimes make stable-specific changes that never
     existed in mainline (e.g. renaming a string to match upstream without backporting the
     full commit that originally renamed it).
   - **Backport batching**: Multiple mainline commits may land in the same stable release,
     meaning intermediate formats (that existed briefly between mainline commits) may never
     have shipped in any stable release. Note this when it happens - it narrows the set of
     messages that real-world kernels can produce, but the script should still handle the
     intermediate formats since someone could be running a mainline rc kernel.
   - **Missing backports**: Some stable branches reach EOL before a fix is backported (e.g.
     the `sched_smt_active()` change was not backported to 4.17.y or 4.18.y). This doesn't
     change the strings but can change which message appears for the same hardware state.

   Check each LTS/stable branch that was active when the vulnerability's sysfs support was
   introduced. A quick way to identify relevant branches:

   ```sh
   cd ~/linux-stable
   for branch in $(git branch -r | grep 'linux-'); do
       count=$(git show "$branch:arch/x86/kernel/cpu/bugs.c" 2>/dev/null | grep -c '<vuln_name>')
       [ "$count" -gt 0 ] && echo "$branch: $count matches"
   done
   ```

   Then for each branch with matches, show the output function and compare it with mainline.
   Document stable-specific differences in a separate `--- stable backports ---` section of
   the inventory comment.

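Steps 6 and 7 can be partly scripted. The sketch below is hypothetical: `find_vuln_commits` and `extract_c_func` are not part of the project. `find_vuln_commits` relies on git's pickaxe option `-G<regex>`, which selects commits whose diff adds or removes a line matching the regex. Unlike grepping the whole `git show` output, it ignores commit messages. `extract_c_func` prints one function from a C file read on stdin, so the same output function can be pulled from a stable branch and from mainline and compared with `diff`. Its awk heuristic assumes kernel coding style: the signature starts in column 0 with the name and opening parenthesis on the same line, and the body ends with a `}` in column 0. In the usage lines, the branch and function names are placeholders.

```sh
# Hypothetical helper: list commits whose diff adds or removes a line
# matching a regex in one file (default: the x86 bugs.c used in this guide)
# Args: $1=regex $2=path(optional, default arch/x86/kernel/cpu/bugs.c)
find_vuln_commits()
{
	git log --all --format='%h %s' -G"$1" -- "${2:-arch/x86/kernel/cpu/bugs.c}"
}

# Hypothetical helper: print one C function read from stdin, from its
# signature line (column 0, not a prototype ending in ';') to the first
# '}' in column 0. Substring match: "foo" would also match "xfoo(".
# Args: $1=function_name
extract_c_func()
{
	awk -v name="$1" '
		!found && /^[^ \t#]/ && index($0, name "(") && $0 !~ /;[ \t]*$/ { found = 1 }
		found { print }
		found && /^}/ { exit }
	'
}

# Usage (placeholders; adjust refs to your remotes):
#   cd ~/linux-stable
#   find_vuln_commits 'l1tf'
#   git show origin/linux-4.19.y:arch/x86/kernel/cpu/bugs.c | extract_c_func l1tf_show_state > /tmp/stable.c
#   git show origin/master:arch/x86/kernel/cpu/bugs.c | extract_c_func l1tf_show_state > /tmp/mainline.c
#   diff -u /tmp/mainline.c /tmp/stable.c
```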
**Comment format in CVE files:**

The inventory comment goes in Phase 1, right after `sys_interface_check` returns successfully.
Group entries chronologically by commit, newest last. For each commit, show the hash, the
kernel version it appeared in, and the exact message(s) it introduced or changed. Use `+` to
indicate incremental additions to an enum or format. Example:

```sh
# Complete sysfs message inventory for <vuln>, traced via git blame:
#
# all versions:
#   "Not affected" (cpu_show_common, <commit>)
#   "Vulnerable" (cpu_show_common fallthrough, <commit>)
#
# <commit> (<version>, <what changed>):
#   "Mitigation: <original message>"
# <commit> (<version>, <what changed>):
#   "Mitigation: <new message format>"
#   <field>: value1 | value2 | value3
# <commit> (<version>, <what changed>):
#   <field>: + value4
#
# all messages start with either "Not affected", "Mitigation", or "Vulnerable"
```

The final line (`all messages start with ...`) is a summary that helps verify that the grep
patterns used to derive `status` from the message cover every possible message.

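Once the inventory confirms that every message starts with one of those three prefixes, the first-pass classification reduces to a prefix match. The following is a minimal POSIX sketch. The function name is hypothetical. It assumes the `OK`/`VULN` status words used with `pvulnstatus` plus a hypothetical `UNK` fallback for prefixes missing from the inventory. Real checks refine the `Mitigation` case further, since a message can report an active mitigation while SMT remains vulnerable.

```sh
# Hypothetical helper: coarse status from a sysfs vulnerability message
# Args: $1=message (e.g. the content of /sys/devices/system/cpu/vulnerabilities/l1tf)
sysfs_msg_to_status()
{
	case "$1" in
		"Not affected"*) echo OK ;;
		"Mitigation"*) echo OK ;;
		"Vulnerable"*) echo VULN ;;
		# a prefix the inventory did not list: the inventory is incomplete
		*) echo UNK ;;
	esac
}
```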
### Cross-Cutting Features

Several command-line options affect the logic inside `_linux()` checks. New CVE implementations must account for them where relevant.

#### `--explain` (`opt_explain`)

When the user passes `--explain`, the `explain()` function prints actionable "How to fix" remediation advice. Call `explain` whenever reporting a VULN status, so the user knows what concrete steps to take:

```sh
pvulnstatus "$cve" VULN "Neither kernel nor microcode mitigate the vulnerability"
explain "Update your kernel to a version that includes the mitigation, and update your CPU microcode. If you are using a distro, make sure you are up to date."
```

The text should be specific: mention kernel parameters to set (e.g. `nosmt`), sysctl knobs to toggle, or which component needs updating. If SMT must be disabled, say so explicitly. Multiple `explain` calls can be made for different failure paths, each tailored to the specific gap found. `explain` is a no-op when `--explain` was not passed, so it is always safe to call.

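To show tailored advice on several failure paths, here is a self-contained sketch. The `pvulnstatus` and `explain` functions below are simplified stand-ins written only so the snippet runs. They are not the script's real implementations. `check_example` and its arguments are hypothetical.

```sh
# Simplified stand-ins so the sketch runs on its own (not the script's real helpers)
opt_explain=1  # 1 when --explain was passed, 0 otherwise
pvulnstatus()
{
	echo "$1: $2 ($3)"
}
explain()
{
	# no-op unless --explain was passed, so callers never need to check
	[ "$opt_explain" = 1 ] || return 0
	echo "> How to fix: $1"
}

# Hypothetical check with two distinct failure paths, each with its own advice
# Args: $1=cve $2=kernel_ok (0|1) $3=microcode_ok (0|1)
check_example()
{
	if [ "$2" = 1 ] && [ "$3" = 1 ]; then
		pvulnstatus "$1" OK "Microcode and kernel mitigate the vulnerability"
	elif [ "$2" = 1 ]; then
		pvulnstatus "$1" VULN "Kernel supports the mitigation but the microcode does not"
		explain "Update your CPU microcode, through your distro's microcode package or a firmware update from your vendor."
	else
		pvulnstatus "$1" VULN "Kernel does not support the mitigation"
		explain "Update your kernel to a version that includes the mitigation. If you are using a distro, make sure you are up to date."
	fi
}
```

Each `explain` call names only the missing piece, so a user whose kernel is already fine is not told to update it.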
#### `--paranoid` (`opt_paranoid`)

Paranoid mode raises the bar for what counts as "mitigated". In normal mode, conditional mitigations or partial defenses may be accepted as sufficient. In paranoid mode, only the **maximum security configuration** qualifies as OK.

The most common effect is requiring SMT (Hyper-Threading) to be disabled. For example, MDS and TAA mitigations are considered incomplete in paranoid mode if SMT is still enabled, because a sibling thread could still exploit the vulnerability:

```sh
if [ "$opt_paranoid" != 1 ] || [ "$kernel_smt_allowed" = 0 ]; then
	pvulnstatus "$cve" OK "Microcode and kernel mitigate the vulnerability"
else
	pvulnstatus "$cve" VULN "Mitigation is active but SMT must be disabled for full protection"
fi
```

Other paranoid-mode effects include requiring unconditional (rather than conditional) L1D flushing, or requiring TSX to be fully disabled. When implementing a new CVE, consider whether there is a stricter configuration that paranoid mode should enforce and add the appropriate `opt_paranoid` branches.

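When a mitigation has tiers, the paranoid branch usually reduces to "accept the weaker tier only outside paranoid mode". Below is a hypothetical sketch of that pattern for L1D flushing. The function name and the mode names `always`, `cond` and `never` are placeholders, not values the script reads.

```sh
# Hypothetical helper: verdict for an L1D flush mode under normal/paranoid rules
# Args: $1=flush_mode (always|cond|never) $2=opt_paranoid (0|1)
l1d_flush_verdict()
{
	case "$1" in
		always) echo OK ;;
		cond)
			# conditional flushing is accepted in normal mode only
			if [ "$2" = 1 ]; then echo VULN; else echo OK; fi
			;;
		*) echo VULN ;;
	esac
}
```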
#### `--vmm` (`opt_vmm`)

The `--vmm` option tells the script whether the system is a hypervisor host running untrusted virtual machines. It accepts three values: `auto` (default, auto-detect by looking for `qemu`/`kvm`/`xen` processes), `yes` (force hypervisor mode), or `no` (force non-hypervisor mode). The result is stored in `g_has_vmm` by the `check_has_vmm()` function.

Some vulnerabilities (e.g. L1TF/CVE-2018-3646, ITLBMH/CVE-2018-12207) only matter - or require additional mitigations - when the host is running a hypervisor with untrusted guests. If `g_has_vmm` is 0, the system can be reported as not vulnerable to these VMM-specific aspects:

```sh
if [ "$g_has_vmm" = 0 ]; then
	pvulnstatus "$cve" OK "this system is not running a hypervisor"
else
	# check hypervisor-specific mitigations (L1D flushing, EPT, etc.)
	: # placeholder: an empty else branch is a shell syntax error
fi
```

CVEs that need VMM context should call `check_has_vmm` early in their `_linux()` function. Note the interaction with paranoid mode: when `--paranoid` is active and `--vmm` was not explicitly set, the script assumes a hypervisor is present (`g_has_vmm=2`), erring on the side of caution.

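The precedence described above can be sketched as a pure function: an explicit `yes`/`no` wins, then paranoid mode, then auto-detection. This is not the script's `check_has_vmm()`. The name `resolve_vmm`, its arguments and the process-name regex are assumptions. It reads process names on stdin, so it can be tested without a running hypervisor.

```sh
# Hypothetical: resolve the --vmm setting to a g_has_vmm-style value
# Args: $1=opt_vmm (auto|yes|no) $2=opt_paranoid (0|1)
# Reads process names, one per line, on stdin
# Prints: 1 (hypervisor detected or forced), 0 (none), 2 (assumed, paranoid mode)
resolve_vmm()
{
	case "$1" in
		yes) echo 1; return 0 ;;
		no) echo 0; return 0 ;;
	esac
	# --vmm not set explicitly: paranoid mode assumes a hypervisor is present
	if [ "$2" = 1 ]; then
		echo 2
		return 0
	fi
	# auto-detection: any process whose name starts with qemu, kvm or xen
	if grep -Eq '^(qemu|kvm|xen)'; then
		echo 1
	else
		echo 0
	fi
}

# Usage (POSIX ps; "comm=" suppresses the header line):
#   g_has_vmm=$(ps -e -o comm= | resolve_vmm "$opt_vmm" "$opt_paranoid")
```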
### Step 4: Wire Up and Test

1. **Add the CVE name mapping** in the `cve2name()` function so the header prints a human-readable name.
2. **Build** the monolithic script with `make`.
3. **Test live**: Run the built script and confirm your CVE appears in the output and reports a sensible status.
4. **Test batch JSON**: Run with `--batch json` and verify the CVE count incremented by one (currently 19 → 20).
5. **Test offline**: Run with `--kernel`/`--config`/`--map` pointing to a kernel image and verify the offline code path reports correctly.
6. **Lint**: Run `shellcheck` on the monolithic script and fix any warnings.
7. **Update `dist/README.md`**: Add details about the new CVE check (name, description, what it detects) so that the user-facing documentation stays in sync with the implementation.

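For the batch JSON test, counting the distinct CVE identifiers avoids miscounting by eye. The following is a POSIX awk sketch. The function name is hypothetical. It only assumes that each check's CVE identifier appears in the output as a quoted JSON string such as `"CVE-2018-3646"`. Duplicates are counted once. The path to the built script in the usage line is also an assumption.

```sh
# Hypothetical helper: count distinct quoted CVE identifiers in JSON read from stdin
count_batch_cves()
{
	awk '
		{
			s = $0
			# collect every "CVE-YYYY-NNNN" string on the line
			while (match(s, /"CVE-[0-9]+-[0-9]+"/)) {
				seen[substr(s, RSTART, RLENGTH)] = 1
				s = substr(s, RSTART + RLENGTH)
			}
		}
		END {
			n = 0
			for (id in seen) n++
			print n
		}
	'
}

# Usage: ./spectre-meltdown-checker.sh --batch json | count_batch_cves
```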
### Key Rules to Remember

- **Never hardcode kernel or microcode versions** - detect capabilities directly (design principles 2 and 3).
- **Assume affected by default** - only mark a CPU as unaffected when there is positive evidence (design principle 4).
- **Always handle both live and offline modes** - use `$opt_live` to branch, and print `N/A "not testable in offline mode"` for runtime-only checks when offline.
- **Use `explain()`** when reporting VULN to give actionable remediation advice (see "Cross-Cutting Features" above).
- **Handle `--paranoid` and `--vmm`** when the CVE has stricter mitigation tiers or VMM-specific aspects (see "Cross-Cutting Features" above).
- **All indentation must use tabs** (CI enforces this).
- **Stay POSIX-compatible** - no bashisms, no GNU-only flags in portable code paths.

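The live/offline rule and the "detect, don't compare versions" rule combine naturally. In the hypothetical sketch below, `opt_live` is the script's variable, but `check_example_runtime`, `kernel_config_has` and the sysfs file name are placeholders. The first function is runtime-only and reports N/A when offline. The second detects a capability by grepping a kernel config file, as passed with `--config`, instead of comparing version numbers.

```sh
# Hypothetical runtime-only check: only meaningful against the running kernel
check_example_runtime()
{
	if [ "$opt_live" != 1 ]; then
		echo 'N/A: not testable in offline mode'
		return 0
	fi
	# placeholder sysfs entry; a real check would read its own runtime source
	if [ -r /sys/devices/system/cpu/vulnerabilities/example ]; then
		cat /sys/devices/system/cpu/vulnerabilities/example
	else
		echo 'UNK: runtime information not available'
	fi
}

# Hypothetical capability check: is a config option built in or modular?
# Args: $1=option name (e.g. CONFIG_FOO) $2=kernel config file
# Returns: 0 if the option is set to y or m, 1 otherwise
kernel_config_has()
{
	grep -q "^$1=[ym]\$" "$2"
}
```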
## Function documentation headers

Every function must have a documentation header immediately above its definition. The format is:

```sh
# <short description of what the function does>
# Sets: <comma-separated list of global variables written by this function>
# Returns: <return value constants or description>
<function_name>()
{
```

**Header lines** (all optional except the description):

| Line | When to include | Example |
|--------------|-----------------|---------|
| Description | Always | `# Read CPUID register value across one or all cores` |
| `# Args:` | When the function takes positional parameters | `# Args: $1=msr_address $2=cpu_index(optional, default 0)` |
| `# Sets:` | When the function writes any `ret_*` or other global variable | `# Sets: ret_read_cpuid_value, ret_read_cpuid_msg` |
| `# Returns:` | When the function uses explicit return codes (constants) | `# Returns: READ_CPUID_RET_OK \| READ_CPUID_RET_ERR \| READ_CPUID_RET_KO` |
| `# Callers:` | **Required** for `_private` (underscore-prefixed) functions | `# Callers: pvulnstatus, pstatus` |

**Rules:**

- The `# Sets:` line is critical - it makes global side effects explicit so any reviewer can immediately see what a function mutates.
- The `# Callers:` line is required for all `_`-prefixed functions. It documents which functions depend on this helper, making it safe to refactor.
- Keep descriptions to one line when possible. If more context is needed, add continuation comment lines before the structured lines.
- Parameter documentation uses `$1=name` format. Append `(optional, default X)` for optional parameters.

**Full example:**

```sh
# Read a single MSR register on one CPU core
# Args: $1=msr_address $2=cpu_index(optional, default 0)
# Sets: ret_read_msr_value, ret_read_msr_msg
# Returns: READ_MSR_RET_OK | READ_MSR_RET_ERR | READ_MSR_RET_KO
read_msr()
{
```

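With a body filled in, a function following these conventions could look like the hypothetical `read_first_line` below. It resets the `ret_*` globals listed in `# Sets:` on entry, so callers never see stale values. It also returns one of the constants listed in `# Returns:`. The function and its constants are illustrative only and are not part of the script.

```sh
READ_FIRST_LINE_RET_OK=0
READ_FIRST_LINE_RET_ERR=1

# Read the first line of a file
# Args: $1=path
# Sets: ret_read_first_line_value, ret_read_first_line_msg
# Returns: READ_FIRST_LINE_RET_OK | READ_FIRST_LINE_RET_ERR
read_first_line()
{
	ret_read_first_line_value=''
	ret_read_first_line_msg=''
	if [ ! -r "$1" ]; then
		ret_read_first_line_msg="cannot read $1"
		return "$READ_FIRST_LINE_RET_ERR"
	fi
	ret_read_first_line_value=$(head -n 1 "$1")
	return "$READ_FIRST_LINE_RET_OK"
}
```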
**Private function example:**

```sh
# Emit a single CVE result as a JSON object to the batch output buffer
# Args: $1=cve_id $2=status $3=message
# Callers: _record_result
_emit_json()
{
```

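These rules are mechanical enough to check automatically. The awk sketch below is hypothetical; nothing here says CI runs such a check. It only recognizes the documented definition style, a bare `name()` line followed by `{` on the next line. It reports functions with no comment block immediately above them, and underscore-prefixed functions whose header lacks a `# Callers:` line.

```sh
# Hypothetical linter: read a shell script on stdin and report functions whose
# documentation header is missing, or _private functions without "# Callers:".
# Only recognizes the documented style: "name()" alone on its line.
check_func_headers()
{
	awk '
		# function definition line, e.g. "read_msr()"
		/^[A-Za-z_][A-Za-z0-9_]*\(\)[ \t]*$/ {
			name = $0
			sub(/\(\).*/, "", name)
			if (!has_comment)
				printf "%d: %s: missing documentation header\n", NR, name
			else if (name ~ /^_/ && !has_callers)
				printf "%d: %s: missing # Callers: line\n", NR, name
		}
		# comment line: start or extend the current comment block
		/^#/ {
			if (!has_comment)
				has_callers = 0
			has_comment = 1
			if ($0 ~ /^# Callers:/)
				has_callers = 1
			next
		}
		# any other line (definitions, blank lines, code) ends the block
		{
			has_comment = 0
			has_callers = 0
		}
	'
}

# Usage: check_func_headers < spectre-meltdown-checker.sh
```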