Guide
@nkwib/pr-engine is the deterministic engine for PR Compass. It turns a list of git commits into per-file risk metrics. No network, no LLM, no database. Same input always produces the same output, bit-for-bit.
This package backs the OSS release of the PR Compass hosted product. The hosted product is the closed-source layer above; this engine is what computes "which file looks risky" given the repository's commit history.
ESM-only. Requires Node 20+.
Install
npm install @nkwib/pr-engine
# or pnpm / yarn / bun

The public API freezes at 1.0.0, per VERSIONING.md. Until then, breaking changes happen only between minor versions and are listed in CHANGELOG.md.
Quick start
import {
  mineCommits,
  computeChurn,
  computeCochange,
  computeHotspots,
  computeRisk
} from '@nkwib/pr-engine';
// 1. You produce CommitRecord[] yourself (e.g. via git log).
const commits = await yourGitDriver.listCommits();
// 2. Mine: classify each commit as bug-fix or not.
const mined = mineCommits({ commits });
// 3. Per-file metrics.
const churn = computeChurn({ mined });
const cochange = computeCochange({ mined });
const hotspots = computeHotspots({ mined });
// 4. Combine into a risk report.
const risk = computeRisk({ mined, hotspots, churn, cochange });
// 5. risk.byFile[path] is your per-file FileRiskReport.
// Every numeric value is grounded by real commit SHAs or is null.
console.log(JSON.stringify(risk, null, 2));

Each compute* function is pure: no I/O, deterministic, idempotent. You can re-run any step in isolation, snapshot-test it, or skip it (e.g. compute churn without ever calling computeCochange).
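Purity is what makes snapshot testing cheap. The following is a minimal sketch of a determinism check, not part of the package: `ToyCommit`, `countTouches`, `stableStringify`, and `isDeterministic` are hypothetical names, and `countTouches` is a toy stand-in rather than the engine's `computeChurn`.

```typescript
// Hypothetical stand-in for CommitRecord: only the fields this sketch needs.
interface ToyCommit {
  sha: string;
  filesTouched: string[];
}

// A toy pure metric (NOT the engine's computeChurn): commits per file.
function countTouches(commits: ToyCommit[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const c of commits) {
    for (const f of c.filesTouched) {
      counts[f] = (counts[f] ?? 0) + 1;
    }
  }
  return counts;
}

// Serialize with sorted object keys so equal values give equal strings.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${body.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Run a step twice on the same input and compare the serialized results.
function isDeterministic<I, O>(step: (input: I) => O, input: I): boolean {
  return stableStringify(step(input)) === stableStringify(step(input));
}

const sample: ToyCommit[] = [
  { sha: 'a1', filesTouched: ['src/a.ts', 'src/b.ts'] },
  { sha: 'b2', filesTouched: ['src/a.ts'] }
];

// A string like this can be stored as a snapshot and compared on later runs.
const snapshot = stableStringify(countTouches(sample));
const stable = isDeterministic(countTouches, sample);
```

The same pattern should apply to any single step of the pipeline: feed it a fixed input, serialize the result with a stable key order, and compare against a stored string.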
Producing CommitRecord[]
The engine consumes CommitRecord[]. To produce that from a git directory, run two git log passes and feed their outputs to the bundled parsers:
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import {
  parseCommitMetadata,
  parseCommitFiles,
  type CommitRecord
} from '@nkwib/pr-engine';
const run = promisify(execFile);
async function listCommits(repoDir: string): Promise<CommitRecord[]> {
  // execFile buffers stdout and defaults to a 1 MiB maxBuffer; raise it so
  // larger histories do not fail with ERR_CHILD_PROCESS_STDIO_MAXBUFFER.
  const opts = { cwd: repoDir, maxBuffer: 256 * 1024 * 1024 };
  // Pass 1: metadata. %x1e starts each record; %x1f separates the fields.
  const META = '%x1e%H%x1f%P%x1f%aN%x1f%aI%x1f%B';
  const { stdout: metaOut } = await run('git', ['log', `--format=${META}`], opts);
  // Pass 2: the file names touched by each commit.
  const { stdout: fileOut } = await run(
    'git',
    ['log', '--name-only', '--format=\x1eCOMMIT %H'],
    opts
  );
  const meta = parseCommitMetadata(metaOut);
  const files = parseCommitFiles(fileOut);
  return meta.map((m) => ({ ...m, filesTouched: files.get(m.sha) ?? [] }));
}

The @nkwib/pr-analyze package ships a LocalAdapter that does exactly this. Use it directly if you don't want to write subprocess code.
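For intuition about what the metadata pass emits, here is a rough sketch of splitting output in the `%x1e%H%x1f%P%x1f%aN%x1f%aI%x1f%B` format. It is not the bundled `parseCommitMetadata`: the `ParsedMeta` shape, its field names, and `parseMetaSketch` are assumptions made for illustration, and the real `CommitRecord` type may differ.

```typescript
// Hypothetical record shape; the package's CommitRecord may differ.
interface ParsedMeta {
  sha: string;
  parents: string[];
  author: string;
  date: string;
  message: string;
}

const RS = '\x1e'; // record separator, emitted by %x1e before each commit
const US = '\x1f'; // unit separator, emitted by %x1f between fields

function parseMetaSketch(stdout: string): ParsedMeta[] {
  const records: ParsedMeta[] = [];
  for (const chunk of stdout.split(RS)) {
    if (chunk.trim() === '') continue; // the empty piece before the first RS
    const parts = chunk.split(US);
    if (parts.length < 5) continue; // skip malformed records
    const parents = (parts[1] ?? '').trim();
    records.push({
      sha: (parts[0] ?? '').trim(),
      // %P is a space-separated list; root commits have no parents.
      parents: parents === '' ? [] : parents.split(' '),
      author: parts[2] ?? '',
      date: parts[3] ?? '',
      // %B is the raw message; rejoin in case it contained a unit separator.
      message: parts.slice(4).join(US).trimEnd()
    });
  }
  return records;
}

// Two fabricated commits in the same wire format, for demonstration only.
const demoOut =
  `${RS}abc${US}p1 p2${US}Ada${US}2024-01-02T03:04:05+00:00${US}Fix crash\n\nbody\n\n` +
  `${RS}def${US}${US}Bob${US}2024-01-01T00:00:00+00:00${US}Initial commit\n`;
const parsedDemo = parseMetaSketch(demoOut);
```

Control characters are used as separators because commit messages routinely contain newlines, tabs, and pipes, so ordinary delimiters would be ambiguous.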
One-shot — analyze(ctx)
If you already have an AnalyzeContext (commits + diff + optional PR metadata), the high-level entry point runs the whole pipeline in one call:
import { analyze, ANALYSIS_SCHEMA_VERSION, type AnalysisOutput } from '@nkwib/pr-engine';
const output: AnalysisOutput = analyze(ctx);
// output.version === ANALYSIS_SCHEMA_VERSION
// output.mining: bug-fix vs total commit stats
// output.hotspots: Bayesian-smoothed bug-fix density per file
// output.churn: per-file commit count, bug-fix count, defect density, first/last touched
// output.cochange: file × file co-modification graph (Jaccard + counts)
// output.risk: per-file combined risk score with `groundedIn` SHA pointers

analyze() is pure: no I/O, no Date.now(), no Math.random(). The adapter that produced ctx is responsible for any side effects.
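To make the Jaccard figure in the co-change graph concrete, the sketch below computes it for one pair of files: the number of commits touching both, divided by the number touching either. This is the textbook definition, not the engine's implementation; `CochangeCommit`, `PairCochange`, and `pairCochange` are hypothetical names, and returning `null` with no evidence is modeled on the "grounded or null" convention described above.

```typescript
// Hypothetical input shape: only the fields this sketch needs.
interface CochangeCommit {
  sha: string;
  filesTouched: string[];
}

// Hypothetical result shape, loosely modeled on counts plus SHA pointers.
interface PairCochange {
  both: number; // commits touching both files
  either: number; // commits touching at least one of them
  jaccard: number | null; // both / either, or null with no evidence
  groundedIn: string[]; // SHAs of the commits that touched both files
}

function pairCochange(
  commits: CochangeCommit[],
  a: string,
  b: string
): PairCochange {
  let either = 0;
  const groundedIn: string[] = [];
  for (const c of commits) {
    const hasA = c.filesTouched.includes(a);
    const hasB = c.filesTouched.includes(b);
    if (hasA || hasB) either += 1;
    if (hasA && hasB) groundedIn.push(c.sha);
  }
  const both = groundedIn.length;
  return {
    both,
    either,
    // No commit touched either file: report null rather than a made-up 0.
    jaccard: either === 0 ? null : both / either,
    groundedIn
  };
}

const history: CochangeCommit[] = [
  { sha: 'c1', filesTouched: ['a.ts', 'b.ts'] },
  { sha: 'c2', filesTouched: ['a.ts'] },
  { sha: 'c3', filesTouched: ['b.ts', 'c.ts'] },
  { sha: 'c4', filesTouched: ['a.ts', 'b.ts'] }
];

const ab = pairCochange(history, 'a.ts', 'b.ts'); // jaccard 2 / 4 = 0.5
const none = pairCochange(history, 'x.ts', 'y.ts'); // jaccard null
```

Keeping the contributing SHAs next to the number means a reader can always trace a score back to the commits that produced it.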
Non-goals
This package is the engine. It deliberately does not:
- Read or write the filesystem.
- Spawn subprocesses (no git, no child_process).
- Talk to GitHub or any other network service.
- Call an LLM, post comments, or suggest fixes.
- Write to a database.
Those concerns belong to adapters (LocalAdapter, GitHubAdapter) and gates (the closed-source PR Compass hosted product). The engine itself is the data-only core that those layers wrap.
ESLint rules at the workspace root, the package-invariants.test.ts per package, and code review enforce these constraints. See /decisions for the full invariant list.
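As an illustration of how such a constraint can be checked mechanically, the sketch below scans a source string for imports of I/O modules. It is an assumption about what an invariants test might look like, not the contents of package-invariants.test.ts; `FORBIDDEN_MODULES` and `findForbiddenImports` are hypothetical, and the regular expression is a heuristic rather than a real parser.

```typescript
// Hypothetical deny-list covering filesystem, subprocess, and network modules.
const FORBIDDEN_MODULES = ['fs', 'child_process', 'http', 'https', 'net'];

// Return every forbidden module specifier imported by the given source text.
function findForbiddenImports(source: string): string[] {
  const hits: string[] = [];
  // Matches `from '...'`, `import '...'`, `import('...')`, and `require('...')`.
  const pattern =
    /(?:from\s+|import\s*\(\s*|require\s*\(\s*|import\s+)['"]([^'"]+)['"]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const specifier = match[1] ?? '';
    // Normalize `node:fs/promises` to `fs` before comparing.
    const name = specifier.replace(/^node:/, '').split('/')[0] ?? '';
    if (FORBIDDEN_MODULES.includes(name)) hits.push(specifier);
  }
  return hits;
}

const offending = [
  "import { readFile } from 'node:fs/promises';",
  "import { helper } from './local.js';",
  "const cp = require('child_process');"
].join('\n');

const violations = findForbiddenImports(offending);
const clean = findForbiddenImports("import { x } from './pure.js';");
```

A test built on a helper like this would read each source file in the package and assert that the list of violations is empty.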
Where next
- API reference — every public function, every type.
- Pipeline — what each engine computes, in order, with the maths.
- Benchmarks — the headline ~7.5 ms / 10 k commits number plus the per-engine breakdown.
- Invariants — the seven rules that keep the engine deterministic and grounded.