dupelint

Proof-carrying deduplication · built for the AI era

It's just a little duplicate code.
Until it's a full rewrite.

You'll fix it later. You won't. It compounds — silent, harmless-looking — until the refactor you keep dodging is a ground-up rewrite: a senior engineer, a lost year, a bill nobody budgeted for. And AI writes it faster than any team can catch.

dupelint is the only engine that proves duplication away instead of guessing — and the first to discover Type-5: a hidden class of duplication that nothing else even looks for.

read-only · never executes your code · operated by ROI PIPE LLC

result.json · CPython 3.12.3
{
"types": {
"1_exact": 28,
"2_renamed": 30,
"3_gapped": 17,
"4_semantic": 0,
"5_subsumption": 4,
"total": 79
},
"duplicates": [
{ "action": "remove",
"remove": "_is_leap (Lib/_pydatetime.py:49)",
"use": "isleap (Lib/calendar.py:141)" },
{ "action": "remove",
"remove": "seekable (Lib/_pyio.py:1020)",
"use": "readable (Lib/_pyio.py:1010)" },
{ "action": "reuse", Type-5
"inside": "iconcat (Lib/operator.py:349)",
"reuse": "iadd (Lib/operator.py:339)" }
… +76 more
]
}
SOUND · 0 false positives · by construction

the actual output — not a mockup
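A result shaped like the one above can be read with a few lines of Python. This is a minimal sketch, assuming the file is plain JSON with the `types` and `duplicates` fields shown. The Type-5 badge and the "+76 more" line are treated as display annotations, so the sample holds only the three entries that appear on this page.

```python
import json
from collections import Counter

# Sample mirroring the fields shown above; only the three listed entries are included.
SAMPLE = """
{
  "types": {"1_exact": 28, "2_renamed": 30, "3_gapped": 17,
            "4_semantic": 0, "5_subsumption": 4, "total": 79},
  "duplicates": [
    {"action": "remove",
     "remove": "_is_leap (Lib/_pydatetime.py:49)",
     "use": "isleap (Lib/calendar.py:141)"},
    {"action": "remove",
     "remove": "seekable (Lib/_pyio.py:1020)",
     "use": "readable (Lib/_pyio.py:1010)"},
    {"action": "reuse",
     "inside": "iconcat (Lib/operator.py:349)",
     "reuse": "iadd (Lib/operator.py:339)"}
  ]
}
"""

result = json.loads(SAMPLE)

# The per-type counts should add up to the reported total.
per_type = {k: v for k, v in result["types"].items() if k != "total"}
counts_match = sum(per_type.values()) == result["types"]["total"]

# Tally the suggested actions among the listed entries.
actions = Counter(d["action"] for d in result["duplicates"])

print(counts_match, dict(actions))
```

The five per-type counts (28 + 30 + 17 + 0 + 4) sum to the reported total of 79.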

Proven on CPython

Over-detect. Then prove down to certainty.

One pass across the CPython standard library and tooling. The net flags everything that could be a clone; the sound sieve discards everything it cannot prove. Nothing reaches a verdict without an unconditional proof.

candidates flagged: 2,346,742 (raw suspects across 5 clone types)
sound duplicates: 79 (proven, with a ready consolidation)
false-positive verdicts: 0 (by construction, not by tuning)
63,186 functions · 1,996 files · 8 skipped (unparseable) · ~14.6 min single full-spectrum pass
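The figures above imply how hard the sieve filters. A short worked calculation, using only the numbers on this page (the run time is approximate, so the throughput is too):

```python
candidates = 2_346_742
sound = 79
functions = 63_186
minutes = 14.6  # approximate, per the figure above

survival_rate = sound / candidates                  # fraction of candidates that end up proven
one_in = candidates / sound                         # candidates flagged per sound duplicate
functions_per_second = functions / (minutes * 60)   # rough scan throughput

print(f"{survival_rate:.4%} of candidates survive, about 1 in {one_in:,.0f}")
print(f"about {functions_per_second:.0f} functions per second")
```

Roughly one candidate in 29,706 reaches a sound verdict, at about 72 functions per second for this run.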

What it solves

Speed up, slim down, stop the rewrite.

Duplicate code is the one debt that compounds on its own — and AI now writes it faster than any team can catch. Remove it, with proof, and the payoff lands in five places at once.

Speed
Time to market

Reuse what exists. AI stops re-deriving the primitives it already built — less to write, review, and test.

Maintainability
Change once

One source of truth. No more making the same fix in eight places and missing the ninth.

Cost
Fewer tokens

Don't regenerate what exists or reload bloat into context. The savings compound on every run.

AI
No blind spots

AI has no memory of what it wrote. dupelint is the memory it lacks: ask before it writes, sweep after it merges.

Trust
Safe to act on

Zero false positives, by proof — consolidate with confidence, on a branch behind your tests.

The clone taxonomy

Five types, ordered by how hard they are to prove.

The standard taxonomy stops at four. The fifth — subsumption — is our own discovery.

01
Exact

Identical modulo whitespace and comments.

Discovered by J. Johnson · 1993
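To make the definition concrete, here is a minimal sketch of "identical modulo whitespace and comments" for Python. It compares parsed syntax trees, because Python's `ast` module discards comments and layout. It illustrates the definition only and does not describe how dupelint works.

```python
import ast

A = """
def area(w, h):
    # width times height
    return w * h
"""

B = """
def area(w,   h):
    return w * h   # same thing, different layout
"""

def same_modulo_layout(src_a: str, src_b: str) -> bool:
    """True when two sources parse to the same syntax tree."""
    # ast.dump omits line and column attributes by default,
    # so only the structure of the code is compared.
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))

print(same_modulo_layout(A, B))
```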

02
Renamed

Identical up to consistent renaming of identifiers.

Discovered by B. Baker · 1993–95
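"Consistent renaming" can be sketched by rewriting every identifier to a placeholder in order of first appearance, then comparing the results. The helper names below are hypothetical, and the sketch is deliberately simple: it renames function names, parameters, and plain names only, and treats builtins like any other name. It illustrates the definition, not dupelint's method.

```python
import ast

class _Canonicalize(ast.NodeTransformer):
    """Rename identifiers to v0, v1, ... in order of first appearance."""

    def __init__(self):
        self.names = {}

    def _canon(self, name):
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)  # then parameters and body, in source order
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node

def same_up_to_renaming(src_a: str, src_b: str) -> bool:
    """True when the two sources match after canonical renaming."""
    def canonical(src):
        return ast.dump(_Canonicalize().visit(ast.parse(src)))
    return canonical(src_a) == canonical(src_b)

A = "def total(items):\n    acc = 0\n    for x in items:\n        acc += x\n    return acc\n"
B = "def sum_all(values):\n    result = 0\n    for v in values:\n        result += v\n    return result\n"

print(same_up_to_renaming(A, B))
```

Because placeholders are assigned by first appearance, an inconsistent renaming (for example `x + y` against `a + a`) does not match.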

03
Gapped

Statements inserted or removed; the shared run still proven equivalent.

Discovered by Baxter et al. · 1998
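A gapped pair can be sketched by aligning the statements of two functions and counting what they share. The example uses `difflib.SequenceMatcher` from the standard library as a stand-in alignment technique. It only finds the shared run; proving that run equivalent is a separate step that this sketch does not attempt.

```python
import ast
from difflib import SequenceMatcher

def statements(src: str) -> list:
    """Structural dump of each statement in the first function of src."""
    func = ast.parse(src).body[0]
    return [ast.dump(stmt) for stmt in func.body]

A = """
def load(path):
    data = open(path).read()
    rows = data.splitlines()
    return rows
"""

B = """
def load(path):
    data = open(path).read()
    print("loaded", path)
    rows = data.splitlines()
    return rows
"""

sa, sb = statements(A), statements(B)
matcher = SequenceMatcher(a=sa, b=sb, autojunk=False)

# Sum the sizes of all aligned blocks (the final block is a zero-length sentinel).
shared = sum(block.size for block in matcher.get_matching_blocks())

print(f"{shared} shared statements, {len(sb) - shared} inserted")
```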

04
Semantic

Same behavior, different code. No text overlap required.

Discovered by Komondoor & Horwitz · 2001
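A small illustration of a semantic pair: two functions with no shared text that give the same answer for every Python integer. The check below only samples a range of inputs, which is evidence rather than proof; it is here to show what "same behavior, different code" looks like, not how an equivalence would be proven.

```python
def is_even_mod(n: int) -> bool:
    """Evenness via the remainder."""
    return n % 2 == 0

def is_even_bit(n: int) -> bool:
    """Evenness via the lowest bit."""
    return (n & 1) == 0

# Sampling a range of integers, including negatives.
agree = all(is_even_mod(n) == is_even_bit(n) for n in range(-1000, 1001))
print(agree)
```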

05
Subsumption OURS

A fifth clone type beyond the standard four — one our engine catches and nothing else even looks for. The certainty is yours; the method is ours.

Discovered by Ronald C. Smart · 2026-06-03
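The sample output on this page lists one Type-5 finding: reuse `iadd` inside `iconcat`. One plausible reading, and it is only an assumption since the definition and method are not published here, is that the body of one function appears whole inside another. The functions below are paraphrased from memory of CPython's `Lib/operator.py` and may differ in detail from the 3.12.3 source; the `_before` and `_after` names are hypothetical.

```python
def iadd(a, b):
    """Same as a += b."""
    a += b
    return a

def iconcat_before(a, b):
    """Same as a += b, for sequences (paraphrased original shape)."""
    if not hasattr(a, "__getitem__"):
        raise TypeError(f"'{type(a).__name__}' object can't be concatenated")
    a += b      # these two lines repeat the body of iadd
    return a

def iconcat_after(a, b):
    """Same check, with the repeated body replaced by a call to iadd."""
    if not hasattr(a, "__getitem__"):
        raise TypeError(f"'{type(a).__name__}' object can't be concatenated")
    return iadd(a, b)

print(iconcat_before([1], [2]), iconcat_after([1], [2]))
```

Both versions return the same list for sequences and raise `TypeError` for non-sequences, so the consolidation keeps the guard and drops only the repeated body.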

How it decides

Math first. The model only speaks where math can't.

Deterministic by design

The verdict step is compute, not a model. The same input gives the same answer, every time. A sound verdict is reserved for an unconditional proof — so it carries no false positives.

Two bases, never blurred

sound: an unconditional proof, safe to act on. Apply on a branch, behind your tests.
bounded: proven only within a stated domain; flagged for you to verify.
The two are never folded together.
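The difference between the two bases shows up in a pair like the one below, a hypothetical example chosen for illustration. The two functions agree for every non-negative `n` (the classic closed form for 1 + 2 + ... + n) but disagree for negative `n`, so at best they are equivalent within a stated domain.

```python
def sum_loop(n: int) -> int:
    """Sum of 1..n by iteration; returns 0 when n < 1."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_formula(n: int) -> int:
    """Closed form n(n+1)/2."""
    return n * (n + 1) // 2

inside_domain = all(sum_loop(n) == sum_formula(n) for n in range(0, 500))
outside_domain = sum_loop(-3) == sum_formula(-3)   # 0 versus 3

print(inside_domain, outside_domain)
```

A pair like this could only ever earn a verdict qualified by its domain (here, `n >= 0`), which is why it belongs with the findings you verify yourself.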

Read-only & secret-safe

It inspects code statically — never runs it, never edits your files. Files that look like secrets are rejected on ingest. The output is data: a verdict and a suggested rewrite. Applying it is your call.

The answer, not the recipe

A response says "these are equivalent" or "inner ⊆ outer". It never carries how that was decided. You get the verdict; the derivation stays ours.

Safe by design

It reads. You decide. Your tests confirm.

dupelint never touches your code — it returns a verdict and a suggested rewrite. Scanning an entire codebase changes nothing. Applying is yours, done the safe way.

Read-only

Never executes or edits your files. A scan inspects statically and changes nothing — zero risk.

Decides, never does

The output is data — a verdict and a suggested rewrite. You apply it, on your terms.

Apply behind tests

Branch → apply → run your tests → review → merge. Never straight to production.
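The first three steps of that flow can be scripted; review and merge stay with a human. This is a minimal sketch, assuming a git repository and a pytest test suite. The function name, the `apply_change` callback, and the injectable `run` parameter are all hypothetical, and nothing here is part of dupelint.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]

def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode

def apply_behind_tests(branch: str,
                       apply_change: Callable[[], None],
                       run: Runner = _run) -> bool:
    """Branch, apply, test. Returns True only if the tests pass.

    Review and merge are intentionally left out.
    """
    if run(["git", "switch", "-c", branch]) != 0:
        return False                      # could not create the branch
    apply_change()                        # your edit, based on the suggested rewrite
    return run(["python", "-m", "pytest"]) == 0
```

Passing a fake `run` makes the flow easy to dry-run before pointing it at a real repository.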

Unprovable → candidate

Anything the proof can't close is flagged for review — never marked sound, never guessed.

Full terms of use → dupelint.com/terms

Privacy

We keep no memory of your code.

Your code is read to compute the verdict, then it's gone. Nothing is written to disk, nothing is kept — here's the whole lifecycle of a scan.

In transit

Encrypted over TLS on every request — your code never travels in the clear.

In memory

Analyzed in RAM for the moment of the scan — read only to compute the verdict, never written to disk.

Dumped after

When the scan finishes, it's gone — no persistent copy of your code, anywhere.

In the EU

Processed in EU data centers, never in US data centers, under GDPR.

Coverage

Python is live — proven, not promised.

Python ships with all five clone types, sound, with zero false positives. Not a demo: see the real output from a run across the entire CPython standard library.

See the CPython scan ↗

Roadmap — a language ships only when all five types are sound for it. No half-shipped languages.

01 Python
LIVE
02 TypeScript / JavaScript
in development
03 Go
next
04 Rust
planned
05 Java
planned
06 C#
planned
07 Kotlin
planned
08 Swift
planned
09 Dart
planned
10 PHP
planned
11 Lua
planned
12 Shell
planned
13 C
planned
14 C++
planned
HTML · CSS · XML · YAML · Markdown
planned
+ more languages — on request

For coding agents

Built for the agent loop.

Generation is fast and forgetful — it reinvents what it already wrote. dupelint is the primitive that catches it, with an answer solid enough to act on — on a branch, behind your tests.

before

Query before it writes

A worker checks whether the function already exists before emitting it. Prevent the duplicate at the source.
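The "ask before writing" step could look like the sketch below. `FunctionIndex` is a hypothetical local stand-in, not dupelint's API: it fingerprints a function's parameters and body and reports whether an identical one is already registered. It catches only exact copies under a different function name; anything beyond that is out of scope for this illustration.

```python
import ast
import hashlib
from typing import Optional

class FunctionIndex:
    """Hypothetical local lookup: has an identical function already been written?"""

    def __init__(self) -> None:
        self._by_fingerprint = {}

    @staticmethod
    def _fingerprint(source: str) -> str:
        func = ast.parse(source).body[0]
        # Hash parameters and body, but not the function's own name.
        shape = ast.dump(func.args) + "".join(ast.dump(s) for s in func.body)
        return hashlib.sha256(shape.encode("utf-8")).hexdigest()

    def register(self, qualified_name: str, source: str) -> None:
        self._by_fingerprint[self._fingerprint(source)] = qualified_name

    def existing(self, source: str) -> Optional[str]:
        return self._by_fingerprint.get(self._fingerprint(source))

index = FunctionIndex()
index.register("utils.clamp", "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n")

# A worker is about to emit this function under a new name.
draft = "def limit(x, lo, hi):\n    return max(lo, min(x, hi))\n"
hit = index.existing(draft)

print(hit or "no match, safe to write")
```

Here the lookup returns `utils.clamp`, so the worker can import the existing function instead of emitting a copy.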

during

Supervisor sweep

A coordinator scans merged output in parallel — catching cross-worker duplication no single worker can see.

after

Post-task refactor

When the task closes, batch-scan and consolidate. A typical repo clears in under a minute.

Stop shipping duplicate code.

Point dupelint at your codebase — or wire it into your agents. The verdict is sound, the proof is ours, and your code stays slim.

PSF Supporting Member

Built on Python

dupelint runs on CPython — and we proved it on CPython's own source. As a proud PSF Supporting Member, we back the Python Software Foundation — the nonprofit that stewards the language everything here is built on.