Proof-carrying deduplication · built for the AI era
It's just a little duplicate code.
Until it's a full rewrite.
You'll fix it later. You won't. It compounds — silent, harmless-looking — until the refactor you keep dodging is a ground-up rewrite: a senior engineer, a lost year, a bill nobody budgeted for. And AI writes it faster than any team can catch.
dupelint is the only engine that proves duplication away instead of guessing — and the first to discover Type-5: a hidden class of duplication that nothing else even looks for.
read-only · never executes your code · operated by ROI PIPE LLC
the actual output — not a mockup
Proven on CPython
Over-detect. Then prove down to certainty.
One pass across the CPython standard library and tooling. The net flags everything that could be a clone; the sound sieve discards every candidate it cannot prove. Nothing is marked sound without an unconditional proof.
What it solves
Speed up, slim down, stop the rewrite.
Duplicate code is the one debt that compounds on its own — and AI now writes it faster than any team can catch. Remove it, with proof, and the payoff lands in five places at once.
Reuse what exists. AI stops re-deriving the primitives it already built — less to write, review, and test.
One source of truth: fix a bug once, not in eight places, including the one you'd have missed.
No regenerating what already exists, no reloading bloat into context. The savings compound on every run.
AI has no memory of what it wrote. dupelint is the memory it lacks: ask before it writes, sweep after it merges.
Zero false positives, by proof — consolidate with confidence, on a branch behind your tests.
The clone taxonomy
Five types, ordered by how hard they are to prove.
The standard taxonomy stops at four. The fifth — subsumption — is our own discovery.
Type-1 · Identical modulo whitespace and comments.
Discovered by J. Johnson · 1993
Type-2 · Identical up to consistent renaming of identifiers.
Discovered by B. Baker · 1993–95
Type-3 · Statements inserted or removed; the shared run still proven equivalent.
Discovered by Baxter et al. · 1998
Type-4 · Same behavior, different code. No text overlap required.
Discovered by Komondoor & Horwitz · 2001
Type-5 · Subsumption. A fifth clone type beyond the standard four, one our engine catches and nothing else even looks for. The certainty is yours; the method is ours.
Discovered by Ronald C. Smart · 2026-06-03
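To make the first rungs of the taxonomy concrete, here is a minimal sketch in plain Python. It is not how dupelint decides (that method is not public); it uses the standard library's `ast` module to compare syntax trees, a textbook way to catch Type-1 and Type-2 clones. The helpers `is_type1`, `is_type2`, and `_Canonicalize` and the sample sources are hypothetical, for illustration only. Type-3 and Type-5 are not shown.

```python
import ast

# Type-1 pair: the same function, differing only in a comment and layout.
SRC_A = """
def total(items):
    # sum the prices
    acc = 0
    for item in items:
        acc += item.price
    return acc
"""

SRC_B = """
def total(items):
    acc = 0
    for item in items:   acc += item.price
    return acc
"""

# Type-2 partner of SRC_A: every identifier consistently renamed.
SRC_C = """
def sum_cost(rows):
    s = 0
    for r in rows:
        s += r.price
    return s
"""

# Type-4 partner of SRC_A: same result for numeric prices, different code.
SRC_D = """
def total(items):
    return sum(item.price for item in items)
"""


class _Canonicalize(ast.NodeTransformer):
    """Hypothetical helper: rename identifiers to v0, v1, ... in order of
    first appearance, so consistently renamed code maps to one form."""

    def __init__(self):
        self.names = {}

    def _canon(self, name):
        if name not in self.names:
            self.names[name] = f"v{len(self.names)}"
        return self.names[name]

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)  # visits arguments before the body
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node


def is_type1(a, b):
    # Parsing drops comments and layout, so equal trees mean the sources
    # differ only in whitespace, comments, or redundant parentheses
    # (slightly looser than a purely textual Type-1 check).
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


def is_type2(a, b):
    # Equal after canonical renaming means equal up to consistent renaming.
    def canon(src):
        return ast.dump(_Canonicalize().visit(ast.parse(src)))
    return canon(a) == canon(b)


print(is_type1(SRC_A, SRC_B))  # True: only a comment and layout differ
print(is_type2(SRC_A, SRC_C))  # True: consistent renaming
print(is_type2(SRC_A, SRC_D))  # False: same behavior, different tree
```

The last line is the point: a Type-4 pair shares behavior but no tree shape, so syntactic normalization alone cannot find it.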
How it decides
Math first. The model only speaks where math can't.
Deterministic by design
The verdict step is compute, not a model. The same input gives the same answer, every time. A sound verdict is reserved for an unconditional proof — so it carries no false positives.
Two bases, never blurred
sound: an unconditional proof, safe to act on. Apply it on a branch, behind your tests.
bounded: proven only within a stated domain, and flagged for you to verify.
The two are never folded together.
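The sound/bounded split is easiest to see on a concrete pair. The sketch below is a hypothetical illustration in plain Python, not dupelint output: `double_mul` and `double_shift` agree on every Python `int` (arbitrary precision, negatives included), but the shift raises `TypeError` on a `float`. The honest answer for this pair is bounded: equivalent within the domain `int`, flagged for you to confirm callers stay inside it. The `agree_on` spot-check only samples inputs; it shows where the domain edge sits and proves nothing.

```python
def double_mul(x):
    # Defined for anything with __mul__: int, float, str, list, ...
    return x * 2


def double_shift(x):
    # Defined for int (and bool) only; a float operand raises TypeError.
    return x << 1


def agree_on(samples):
    """Hypothetical helper: True if both functions return equal results on
    every sample. Sampling illustrates a domain; it is not a proof."""
    for x in samples:
        try:
            if double_mul(x) != double_shift(x):
                return False
        except TypeError:
            return False
    return True


print(agree_on(range(-1000, 1001)))  # True: inside the int domain
print(agree_on([1.5]))               # False: float is outside it
```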
Read-only & secret-safe
It inspects code statically — never runs it, never edits your files. Files that look like secrets are rejected on ingest. The output is data: a verdict and a suggested rewrite. Applying it is your call.
The answer, not the recipe
A response says "these are equivalent" or "inner ⊑ outer" (subsumption). It never carries how that was decided. You get the verdict; the derivation stays ours.
Safe by design
It reads. You decide. Your tests confirm.
dupelint never touches your code — it returns a verdict and a suggested rewrite. Scanning an entire codebase changes nothing. Applying is yours, done the safe way.
Never executes or edits your files. A scan inspects statically and changes nothing — zero risk.
The output is data — a verdict and a suggested rewrite. You apply it, on your terms.
Branch → apply → run your tests → review → merge. Never straight to production.
Anything the proof can't close is flagged for review — never marked sound, never guessed.
Full terms of use → dupelint.com/terms
Privacy
We keep no memory of your code.
Your code is read to compute the verdict, then it's gone. Nothing is written to disk, nothing is kept — here's the whole lifecycle of a scan.
Encrypted over TLS on every request — your code never travels in the clear.
Analyzed in RAM for the moment of the scan — read only to compute the verdict, never written to disk.
When the scan finishes, it's gone — no persistent copy of your code, anywhere.
Processed in EU data centers, never in US data centers, under GDPR.
Coverage
Python is live — proven, not promised.
Python support covers all five clone types, sound, with zero false positives. Not a demo: see it run across the entire CPython standard library, with the real output.
See the CPython scan ↗
Roadmap: a language ships only when all five types are sound for it. No half-shipped languages.
For coding agents
Built for the agent loop.
Generation is fast and forgetful: it reinvents what it already wrote. dupelint is the primitive that catches it, with an answer solid enough to act on (on a branch, behind your tests).
Query before it writes
A worker checks whether the function already exists before emitting it. Prevent the duplicate at the source.
Supervisor sweep
A coordinator scans merged output in parallel — catching cross-worker duplication no single worker can see.
Post-task refactor
When the task closes, batch-scan and consolidate. A typical repo clears in under a minute.
Stop shipping duplicate code.
Point dupelint at your codebase — or wire it into your agents. The verdict is sound, the proof is ours, and your code stays slim.
Built on Python
dupelint runs on CPython — and we proved it on CPython's own source. As a proud PSF Supporting Member, we back the Python Software Foundation — the nonprofit that stewards the language everything here is built on.