Guide

@nkwib/pr-engine is the deterministic engine for PR Compass. It turns a list of git commits into per-file risk metrics. No network, no LLM, no database. Same input always produces the same output, bit-for-bit.

This package is the open-source core of PR Compass. The hosted product is the closed-source layer built on top of it; this engine is the part that computes "which files look risky" from a repository's commit history.

ESM-only. Requires Node 20+.

Install

npm install @nkwib/pr-engine
# or pnpm / yarn / bun

The public API will be frozen at 1.0.0, per VERSIONING.md. Until then, breaking changes happen only between minor versions and are listed in CHANGELOG.md.

Quick start

import {
  mineCommits,
  computeChurn,
  computeCochange,
  computeHotspots,
  computeRisk
} from '@nkwib/pr-engine';

// 1. You produce CommitRecord[] yourself (e.g. via git log).
const commits = await yourGitDriver.listCommits();

// 2. Mine: classify each commit as bug-fix or not.
const mined = mineCommits({ commits });

// 3. Per-file metrics.
const churn = computeChurn({ mined });
const cochange = computeCochange({ mined });
const hotspots = computeHotspots({ mined });

// 4. Combine into a risk report.
const risk = computeRisk({ mined, hotspots, churn, cochange });

// 5. risk.byFile[path] is your per-file FileRiskReport.
//    Every numeric value is grounded by real commit SHAs or is null.
console.log(JSON.stringify(risk, null, 2));

Each compute* function is pure: no I/O, deterministic, and idempotent. You can re-run any step in isolation, snapshot-test it, or skip it entirely (e.g. compute churn without ever calling computeCochange).
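Because each step is pure, it can be understood and tested against a small hand-built fixture. As an illustration of the kind of per-file aggregation churn represents (commit count, bug-fix count, defect density, first/last touched, per the analyze() output description further down), here is a minimal sketch. The MinedCommitSketch shape, its field names, and the definition of defect density as bugFixes / commits are assumptions for illustration, not the package's own types or formula.

```typescript
// Hypothetical input shape for illustration; NOT the package's own types.
interface MinedCommitSketch {
  sha: string;
  authoredAt: string; // ISO-8601, as produced by git's %aI
  isBugFix: boolean;
  filesTouched: string[];
}

interface FileChurnSketch {
  commits: number;
  bugFixes: number;
  defectDensity: number; // assumed definition: bugFixes / commits
  firstTouched: string;
  lastTouched: string;
}

// Pure: depends only on its argument; no I/O, clocks, or randomness.
function churnSketch(mined: readonly MinedCommitSketch[]): Map<string, FileChurnSketch> {
  const byFile = new Map<string, FileChurnSketch>();
  for (const commit of mined) {
    // Parse timestamps so dates with different UTC offsets compare correctly.
    const t = Date.parse(commit.authoredAt);
    // A Set guards against a path listed twice in one commit.
    for (const path of new Set(commit.filesTouched)) {
      const s = byFile.get(path);
      if (s === undefined) {
        byFile.set(path, {
          commits: 1,
          bugFixes: commit.isBugFix ? 1 : 0,
          defectDensity: 0, // filled in after the loop
          firstTouched: commit.authoredAt,
          lastTouched: commit.authoredAt
        });
        continue;
      }
      s.commits += 1;
      if (commit.isBugFix) s.bugFixes += 1;
      if (t < Date.parse(s.firstTouched)) s.firstTouched = commit.authoredAt;
      if (t > Date.parse(s.lastTouched)) s.lastTouched = commit.authoredAt;
    }
  }
  for (const s of byFile.values()) s.defectDensity = s.bugFixes / s.commits;
  return byFile;
}

const mined: MinedCommitSketch[] = [
  { sha: "a1", authoredAt: "2024-01-01T10:00:00+00:00", isBugFix: false, filesTouched: ["src/a.ts", "src/b.ts"] },
  { sha: "b2", authoredAt: "2024-02-01T10:00:00+00:00", isBugFix: true, filesTouched: ["src/a.ts"] }
];
const churn = churnSketch(mined);
// churn.get("src/a.ts") → commits 2, bugFixes 1, defectDensity 0.5
```

Since the function reads nothing but its argument, a snapshot test reduces to comparing the serialised result against a stored copy.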

Producing `CommitRecord[]`

The engine consumes CommitRecord[]. To build that from a local git repository, run two git log passes and feed their outputs to the bundled parsers:

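import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import {
  parseCommitMetadata,
  parseCommitFiles,
  type CommitRecord
} from '@nkwib/pr-engine';

const run = promisify(execFile);

// execFile buffers stdout and fails once it exceeds maxBuffer (1 MiB by
// default), which the log of a long history easily does.
const GIT_OPTS = { maxBuffer: 256 * 1024 * 1024 };

async function listCommits(repoDir: string): Promise<CommitRecord[]> {
  // %x1e (record separator) starts each commit; %x1f (unit separator) splits
  // SHA, parent SHAs, author name, ISO-8601 author date and raw message.
  const META = '%x1e%H%x1f%P%x1f%aN%x1f%aI%x1f%B';
  const { stdout: metaOut } = await run('git', ['log', `--format=${META}`], {
    cwd: repoDir,
    ...GIT_OPTS
  });
  // Second pass: one "COMMIT <sha>" header per commit, then the paths it touched.
  const { stdout: fileOut } = await run(
    'git',
    ['log', '--name-only', '--format=\x1eCOMMIT %H'],
    { cwd: repoDir, ...GIT_OPTS }
  );
  const meta = parseCommitMetadata(metaOut);
  const files = parseCommitFiles(fileOut);
  // Join the two passes by SHA; commits with no listed files (e.g. merges) get [].
  return meta.map((m) => ({ ...m, filesTouched: files.get(m.sha) ?? [] }));
}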

The @nkwib/pr-analyze package ships a LocalAdapter that does exactly this. Use it directly if you don't want to write subprocess code.
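The bundled parseCommitMetadata and parseCommitFiles handle the two git log formats for you. To show what those formats actually produce, and why ASCII control characters serve as separators (they are very unlikely to appear in author names, paths, or messages, unlike commas or tabs), here is a hypothetical re-implementation of both parsers. parseMetaSketch, parseFilesSketch, and the CommitMetaSketch field names are illustrative assumptions and may not match the real parsers' output shape.

```typescript
const RS = "\x1e"; // record separator: starts each commit (git's %x1e)
const US = "\x1f"; // unit separator: between fields (git's %x1f)

// Hypothetical record shape; NOT the package's CommitRecord type.
interface CommitMetaSketch {
  sha: string;
  parents: string[];
  author: string;
  authoredAt: string;
  message: string;
}

// Parses output of: git log --format=%x1e%H%x1f%P%x1f%aN%x1f%aI%x1f%B
function parseMetaSketch(stdout: string): CommitMetaSketch[] {
  return stdout
    .split(RS)
    .filter((chunk) => chunk.length > 0) // the text before the first RS is empty
    .map((chunk) => {
      const [sha, parents, author, authoredAt, ...body] = chunk.split(US);
      return {
        sha,
        parents: parents.split(" ").filter((p) => p.length > 0), // root commits have none
        author,
        authoredAt,
        // Rejoin in case the message itself contained US; drop git's trailing newlines.
        message: body.join(US).replace(/\n+$/, "")
      };
    });
}

// Parses output of: git log --name-only --format=%x1eCOMMIT %H
function parseFilesSketch(stdout: string): Map<string, string[]> {
  const bySha = new Map<string, string[]>();
  for (const chunk of stdout.split(RS)) {
    if (!chunk.startsWith("COMMIT ")) continue; // skip the empty leading chunk
    const [header, ...lines] = chunk.split("\n");
    const sha = header.slice("COMMIT ".length).trim();
    // A blank line separates the header from the file list; merges may list no files.
    bySha.set(sha, lines.filter((line) => line.length > 0));
  }
  return bySha;
}

// Hand-written sample output for two commits.
const metaOut =
  `${RS}aaa111${US}${US}Ada${US}2024-01-01T10:00:00+00:00${US}Initial commit\n\n` +
  `${RS}bbb222${US}aaa111${US}Ada${US}2024-02-01T10:00:00+00:00${US}fix: off-by-one\n\nDetails.\n\n`;
const fileOut = `${RS}COMMIT aaa111\n\nREADME.md\nsrc/a.ts\n${RS}COMMIT bbb222\n\nsrc/a.ts\n`;

const filesBySha = parseFilesSketch(fileOut);
const commits = parseMetaSketch(metaOut).map((m) => ({
  ...m,
  filesTouched: filesBySha.get(m.sha) ?? []
}));
```

The join at the end mirrors the last line of listCommits: the two passes are matched by SHA, so neither output needs to carry the other's fields.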

One-shot — `analyze(ctx)`

If you already have an AnalyzeContext (commits + diff + optional PR metadata), the high-level entry point runs the whole pipeline in one call:

import { analyze, ANALYSIS_SCHEMA_VERSION, type AnalysisOutput } from '@nkwib/pr-engine';

const output: AnalysisOutput = analyze(ctx);
// output.version === ANALYSIS_SCHEMA_VERSION

// output.mining   — bug-fix vs total commit stats
// output.hotspots — Bayesian-smoothed bug-fix density per file
// output.churn    — per-file commit count, bug-fix count, defect density, first/last touched
// output.cochange — file × file co-modification graph (Jaccard + counts)
// output.risk     — per-file combined risk score with `groundedIn` SHA pointers
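The output description above names two formulas without spelling them out (the Pipeline page documents the engine's actual maths). For orientation, here is a minimal sketch of textbook versions: Jaccard similarity over the sets of commits touching each file, and one common form of Bayesian smoothing, a Beta prior whose mean is a repository-wide bug-fix rate. The smoothing form and its prior/strength parameters are assumptions, not necessarily what computeHotspots does, and every name here is hypothetical.

```typescript
// Co-change: how often two files are modified in the same commit.
// Hypothetical input: each commit reduced to the list of paths it touched.
interface CochangeEdgeSketch {
  a: string;
  b: string;
  together: number; // commits touching both a and b
  jaccard: number; // |commits(a) ∩ commits(b)| / |commits(a) ∪ commits(b)|
}

function cochangeSketch(commits: readonly string[][]): CochangeEdgeSketch[] {
  const fileCount = new Map<string, number>();
  const pairCount = new Map<string, number>();
  for (const touched of commits) {
    // Dedupe and sort so every unordered pair maps to one canonical key.
    const files = [...new Set(touched)].sort();
    for (const f of files) fileCount.set(f, (fileCount.get(f) ?? 0) + 1);
    for (let i = 0; i < files.length; i++) {
      for (let j = i + 1; j < files.length; j++) {
        const key = `${files[i]}\u0000${files[j]}`; // NUL cannot appear in a git path
        pairCount.set(key, (pairCount.get(key) ?? 0) + 1);
      }
    }
  }
  const edges: CochangeEdgeSketch[] = [];
  for (const [key, together] of pairCount) {
    const [a, b] = key.split("\u0000");
    // Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|.
    const union = (fileCount.get(a) ?? 0) + (fileCount.get(b) ?? 0) - together;
    edges.push({ a, b, together, jaccard: together / union });
  }
  return edges;
}

// Bayesian smoothing (assumed form): a Beta prior with mean `prior` and weight
// `strength` pulls files with few commits toward the repository-wide rate.
function smoothedDensity(bugFixes: number, commits: number, prior: number, strength: number): number {
  return (bugFixes + strength * prior) / (commits + strength);
}

const edges = cochangeSketch([["a.ts", "b.ts"], ["a.ts", "b.ts", "c.ts"], ["a.ts"]]);
const ab = edges.find((e) => e.a === "a.ts" && e.b === "b.ts");
// ab → together 2, jaccard 2 / (3 + 2 - 2) = 2/3

// A file whose only commit was a bug fix: raw density 1.0, smoothed toward prior 0.25.
const oneShot = smoothedDensity(1, 1, 0.25, 4); // (1 + 1) / (1 + 4) = 0.4
const untouched = smoothedDensity(0, 0, 0.25, 4); // no data: returns the prior, 0.25
```

The smoothing term is what keeps a file with a single bug-fix commit from outranking a file with a long, consistently buggy history.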

analyze() is pure: no I/O, no Date.now(), no Math.random(). The adapter that produced ctx is responsible for any side effects.
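Because analyze() never reads clocks or random sources, its output can be pinned in a test by hashing it. A minimal sketch using node:crypto follows; fingerprint and assertDeterministic are hypothetical helpers, not package exports, and the stand-in functions take the place of analyze so the example runs on its own. JSON.stringify serialises a Map as {}, so convert any Map-valued fields to arrays before hashing.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the JSON serialisation: equal hashes mean byte-identical JSON.
function fingerprint(value: unknown): string {
  const json = JSON.stringify(value) ?? "undefined"; // stringify(undefined) returns undefined
  return createHash("sha256").update(json).digest("hex");
}

// Runs `fn` twice on freshly built, equal inputs and throws if the outputs differ.
function assertDeterministic<I, O>(fn: (input: I) => O, makeInput: () => I): string {
  const first = fingerprint(fn(makeInput()));
  const second = fingerprint(fn(makeInput()));
  if (first !== second) {
    throw new Error(`non-deterministic output: ${first} !== ${second}`);
  }
  return first;
}

// Stand-in pure function (hypothetical): count commits per author.
const countByAuthor = (commits: { author: string }[]) => {
  const counts: Record<string, number> = {};
  for (const c of commits) counts[c.author] = (counts[c.author] ?? 0) + 1;
  return counts;
};
const sample = () => [{ author: "Ada" }, { author: "Lin" }, { author: "Ada" }];
const hash = assertDeterministic(countByAuthor, sample);

// Hidden mutable state breaks determinism, and the check catches it.
let calls = 0;
const leaky = (commits: { author: string }[]) => ({ n: commits.length, call: ++calls });
let caught = false;
try {
  assertDeterministic(leaky, sample);
} catch {
  caught = true;
}
```

In a real test you would pass analyze and your own fixture builder, e.g. assertDeterministic(analyze, buildFixtureContext), where buildFixtureContext is a hypothetical function returning an AnalyzeContext.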

Non-goals

This package is the engine. It deliberately does not:

Read or write the filesystem.
Spawn subprocesses (no git, no child_process).
Talk to GitHub or any other network service.
Call an LLM, post comments, or suggest fixes.
Write to a database.

Those concerns belong to adapters (LocalAdapter, GitHubAdapter) and gates (the closed-source PR Compass hosted product). The engine itself is the data-only core that those layers wrap.

ESLint rules at the workspace root, a package-invariants.test.ts file in each package, and code review together enforce these constraints. See /decisions for the full invariant list.
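The contents of package-invariants.test.ts are not shown in this guide. As a hypothetical illustration of what a source-level invariant check can look like, the sketch below flags imports of filesystem, subprocess, or network modules, plus calls to Date.now() or Math.random(), in a set of source texts. The module list and helper names are assumptions covering only a subset of the non-goals. A regex scan is easy to fool (comments, computed specifiers), which is one reason lint rules that work on the AST are the stronger line of enforcement.

```typescript
// Modules the non-goals rule out: filesystem, subprocesses, network.
const FORBIDDEN_MODULES = new Set(["fs", "fs/promises", "child_process", "http", "https", "net"]);

// Nondeterministic globals ruled out by the purity guarantee.
const FORBIDDEN_CALLS: [string, RegExp][] = [
  ["Date.now()", /\bDate\.now\s*\(/],
  ["Math.random()", /\bMath\.random\s*\(/]
];

// Captures the module name in `from 'x'`, `import 'x'`, `import('x')` and
// `require('x')`, stripping an optional `node:` prefix.
const IMPORT_RE = /(?:\bfrom\s+|\bimport\s*\(?\s*|\brequire\s*\(\s*)['"](?:node:)?([^'"]+)['"]/g;

interface ViolationSketch {
  file: string;
  rule: string;
}

function findViolations(sources: Record<string, string>): ViolationSketch[] {
  const violations: ViolationSketch[] = [];
  // Sorted file order keeps the report itself deterministic.
  for (const file of Object.keys(sources).sort()) {
    const text = sources[file];
    for (const match of text.matchAll(IMPORT_RE)) {
      if (FORBIDDEN_MODULES.has(match[1])) violations.push({ file, rule: `imports ${match[1]}` });
    }
    for (const [name, re] of FORBIDDEN_CALLS) {
      if (re.test(text)) violations.push({ file, rule: `calls ${name}` });
    }
  }
  return violations;
}

const report = findViolations({
  "src/churn.ts": "export const count = (xs: string[]) => xs.length;",
  "src/clock.ts": "import { readFileSync } from 'node:fs';\nexport const stamp = () => Date.now();"
});
// report → [{ file: "src/clock.ts", rule: "imports fs" },
//           { file: "src/clock.ts", rule: "calls Date.now()" }]
```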

Where next

API reference — every public function, every type.
Pipeline — what each engine computes, in order, with the maths.
Benchmarks — the headline ~7.5 ms / 10 k commits number plus the per-engine breakdown.
Invariants — the seven rules that keep the engine deterministic and grounded.