Proof-carrying deduplication · built for the AI era

It's just a little
duplicate code.
Until it's a full rewrite.

You'll fix it later. You won't. It compounds. Silent, harmless-looking. Until the refactor you keep dodging is a ground-up rewrite: a senior engineer, a lost year, a bill nobody budgeted for. And AI writes it faster than any team can catch it.

dupelint is the only engine that proves duplication away instead of guessing. And the first to discover Type-5: a hidden class of duplication that nothing else even looks for.

Notify me when API keys are available · See it on real code ↗

read-only · never modifies your files · operated by ROI PIPE LLC

result.json · CPython 3.12.3

{
  "types": {
    "1_exact": 28,
    "2_renamed": 30,
    "3_gapped": 17,
    "4_semantic": 0,
    "5_subsumption": 4,
    "total": 79
  },
  "duplicates": [
    { "action": "remove",
      "remove": "_is_leap (Lib/_pydatetime.py:49)",
      "use": "isleap (Lib/calendar.py:141)" },
    { "action": "remove",
      "remove": "seekable (Lib/_pyio.py:1020)",
      "use": "readable (Lib/_pyio.py:1010)" },
    { "action": "reuse",   ⊑ Type-5
      "inside": "iconcat (Lib/operator.py:349)",
      "reuse": "iadd (Lib/operator.py:339)" }
    … +76 more
  ]
}

SOUND 0 false positives · by construction

the actual output, not a mockup
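
To act on a result like the excerpt above, a consumer might group the entries by their action and sanity-check the counts. This is a minimal sketch, assuming the file is valid JSON with the `types` and `duplicates` keys visible in the excerpt; the full schema is not documented on this page, and `summarize` is a hypothetical helper name, not part of dupelint.

```python
import json
from collections import defaultdict

# Sample mirrors the excerpt; the real file holds more entries.
SAMPLE = """
{
  "types": {"1_exact": 28, "2_renamed": 30, "3_gapped": 17,
            "4_semantic": 0, "5_subsumption": 4, "total": 79},
  "duplicates": [
    {"action": "remove",
     "remove": "_is_leap (Lib/_pydatetime.py:49)",
     "use": "isleap (Lib/calendar.py:141)"},
    {"action": "reuse",
     "inside": "iconcat (Lib/operator.py:349)",
     "reuse": "iadd (Lib/operator.py:339)"}
  ]
}
"""


def summarize(result: dict) -> dict:
    """Group duplicate entries by their "action" field (hypothetical helper)."""
    by_action = defaultdict(list)
    for entry in result.get("duplicates", []):
        by_action[entry["action"]].append(entry)
    return dict(by_action)


result = json.loads(SAMPLE)
groups = summarize(result)

# The per-type counts should add up to the reported total.
counts = result["types"]
per_type_sum = sum(v for k, v in counts.items() if k != "total")

for action, entries in groups.items():
    print(action, len(entries))
print("counts consistent:", per_type_sum == counts["total"])
```

Grouping by action keeps the two kinds of consolidation apart: a "remove" entry names a function to delete in favor of another, while a "reuse" entry names a function that could call another one internally.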

Proven on CPython

Over-detect. Then prove down to certainty.

One pass across the CPython standard library and tooling. The net flags everything that could be a clone; the sound sieve discards everything it cannot prove. Nothing reaches a verdict without an unconditional proof.

candidates flagged

2,346,742

raw suspects across 5 clone types

sound duplicates

proven, with a ready consolidation

false-positive verdicts

by construction, not by tuning

63,186 functions · 1,996 files · 8 skipped (unparseable) · ~14.6 min single full-spectrum pass

See the full CPython results ↗

What it solves

Speed up, slim down, stop the rewrite.

Duplicate code is the one debt that compounds on its own. And AI now writes it faster than any team can catch it. Remove it, with proof, and the payoff lands in five places at once.

Speed

Time to market

Reuse what exists. AI stops re-deriving the primitives it already built. Less to write, review, and test.

Maintainability

Change once

One source of truth. No more making the same fix in eight places and missing one of them.

Cost

Fewer tokens

Don't regenerate what exists or reload bloat into context. The savings compound on every run.

No blind spots

AI has no memory of what it wrote. dupelint is the memory it lacks: ask before it writes, sweep after it merges.

Trust

Safe to act on

Zero false positives, by proof. Consolidate with confidence, on a branch behind your tests.

The clone taxonomy

Five types, ordered by how hard they are to prove.

The standard taxonomy stops at four. The fifth, subsumption, is our own discovery.

Exact

Identical modulo whitespace and comments.

Discovered by J. Johnson · 1993

Renamed

Identical up to consistent renaming of identifiers.

Discovered by B. Baker · 1993-95

Gapped

Statements inserted or removed; the shared run still proven equivalent.

Discovered by Baxter et al. · 1998

Semantic

Same behavior, different code. No text overlap required.

Discovered by Komondoor & Horwitz · 2001

⊑

Subsumption OURS

A fifth clone type beyond the standard four. One our engine catches and nothing else even looks for. The certainty is yours; the method is ours.

Discovered by Ronald C. Smart · 2026-06-03
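
To make the second type concrete, here is one textbook way to test for "identical up to consistent renaming": rewrite each function's parameters and local variables to placeholders numbered by first appearance, then compare the resulting syntax trees. This is an illustrative sketch only, not dupelint's method, which this page does not disclose. The helper names (`fingerprint`, `is_renamed_clone`) are hypothetical, and the sketch ignores docstrings and nested scopes.

```python
import ast


def _bound_names(func: ast.FunctionDef) -> set:
    """Parameters plus every name the function assigns to."""
    names = {n.arg for n in ast.walk(func) if isinstance(n, ast.arg)}
    names |= {n.id for n in ast.walk(func)
              if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
    return names


class _Canon(ast.NodeTransformer):
    """Rename bound identifiers to v0, v1, ... in order of first appearance."""

    def __init__(self, bound):
        self.bound = bound
        self.mapping = {}

    def _rename(self, name):
        if name not in self.bound:
            return name  # globals and builtins keep their identity
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_arg(self, node):
        node.arg = self._rename(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._rename(node.id)
        return node


def fingerprint(source: str) -> str:
    """Canonical form of a single top-level function definition."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    func.name = "f"  # the function's own name is not part of the comparison
    _Canon(_bound_names(func)).visit(func)
    return ast.dump(func)  # position attributes are excluded by default


def is_renamed_clone(a: str, b: str) -> bool:
    return fingerprint(a) == fingerprint(b)


A = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"
B = "def add_all(items):\n    s = 0\n    for i in items:\n        s += i\n    return s\n"
C = "def product(xs):\n    acc = 1\n    for x in xs:\n        acc *= x\n    return acc\n"

print(is_renamed_clone(A, B))  # same shape, different names
print(is_renamed_clone(A, C))  # different constant and operator
```

Only bound names are renamed, so two functions that call different builtins (for example `len` versus `sum`) are not mistaken for clones.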

How it decides

Math first. The model only speaks where math can't.

Deterministic by design

The verdict step is compute, not a model. The same input gives the same answer, every time. A sound verdict is reserved for an unconditional proof, so it carries no false positives.

Two bases, never blurred

sound: An unconditional proof, safe to act on. Apply on a branch, behind your tests.

bounded: Proven only within a stated domain; flagged for you to verify.

The two are never folded together.

Read-only & secret-safe

It reads code as structure and never modifies your files. Verdicts are proven statically, not by running your code. The only execution is a sandboxed check of pure-arithmetic functions, with no file, network, or import access, used only to rule out false matches. Secret-looking files are rejected on ingest, and the output is data: a verdict and a suggested rewrite you choose to apply.
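
The logical role of an execution check that is "used only to rule out false matches" can be sketched as a counterexample search: a single input on which two functions disagree refutes a match, while agreement on every sample proves nothing. The sketch below assumes plain Python callables and a fixed sample set; it does not show sandboxing, and `find_counterexample` is a hypothetical name, not a dupelint function.

```python
from itertools import product
from typing import Callable, Optional, Sequence


def _outcome(fn: Callable, args: tuple):
    """Result of a call, or the kind of arithmetic error it raised."""
    try:
        return ("ok", fn(*args))
    except ArithmeticError as exc:  # e.g. ZeroDivisionError, OverflowError
        return ("raises", type(exc).__name__)


def find_counterexample(f: Callable, g: Callable,
                        samples: Sequence, arity: int) -> Optional[tuple]:
    """Return the first argument tuple on which f and g differ, else None.

    None means "not refuted on these samples", never "proven equivalent".
    """
    for args in product(samples, repeat=arity):
        if _outcome(f, args) != _outcome(g, args):
            return args
    return None


def double_add(x):
    return x + x


def double_mul(x):
    return 2 * x


def square(x):
    return x * x


SAMPLES = (-3, -1, 0, 1, 2, 7)

print(find_counterexample(double_add, double_mul, SAMPLES, 1))  # no disagreement found
print(find_counterexample(double_add, square, SAMPLES, 1))      # a refuting input
```

A check like this can only discard candidates. Promoting a candidate to a verdict still requires a proof, which is consistent with the page's statement that verdicts are proven statically.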

The answer, not the recipe

A response says "these are equivalent" or "inner ⊑ outer". It never carries how that was decided. You get the verdict; the derivation stays ours.

Safe by design

It reads. You decide. Your tests confirm.

dupelint never modifies your code. It returns a verdict and a suggested rewrite. Scanning an entire codebase changes nothing. Applying is yours, done the safe way.

Read-only

Never modifies your files. A scan reads your code and changes nothing on your side; verdicts come from static analysis, not from running it.

Decides, never does

The output is data: a verdict and a suggested rewrite. You apply it, on your terms.

Apply behind tests

Branch → apply → run your tests → review → merge. Never straight to production.

Unprovable → candidate

Anything the proof can't close is flagged for review. Never marked sound, never guessed.

Full terms of use → dupelint.com/terms

Privacy

We never store your code.

Your code is read in memory to compute the verdict, then dropped. It is never written to disk, and no copy is kept. Here's the full lifecycle of a scan, limits included.

In transit

Encrypted over TLS on every request. Your code never travels in the clear.

In memory

Held in RAM only, for the moment of the scan, read to compute the verdict. Never written to disk, and swap is off so it cannot leak there.

Dropped after

When the scan finishes, your code is released from memory. No stored copy, on disk or off.

In the EU

Processed in EU data centers, under GDPR. Never in US data centers.

Full transparency: we store nothing and never write your code to disk. We also won't overclaim. RAM holds the bytes only until they are overwritten, and we don't own the physical host, so we don't call it "unrecoverable." Hardware-encrypted memory (confidential computing) is on our roadmap to seal the in-scan window itself.

Coverage

Python is live. Proven, not promised.

Python ships with all five clone types, sound, with zero false positives. Not a demo: see the real output from a run across the entire CPython standard library.

See the CPython scan ↗

Roadmap — a language ships only when all five types are sound for it. No half-shipped languages.

01 Python

LIVE

02 TypeScript / JavaScript

in development

03 Go

04 Rust

planned

05 Java

planned

06 C#

planned

07 Kotlin

planned

08 Swift

planned

09 Dart

planned

10 PHP

planned

11 Lua

planned

12 Shell

planned

13 C

planned

14 C++

planned

HTML · CSS · XML · YAML · Markdown

planned

+ more languages — on request

For coding agents

Built for the agent loop.

Generation is fast and forgetful. It reinvents what it already wrote. dupelint is the primitive that catches it, with an answer solid enough to act on. On a branch, behind your tests.

before

Query before it writes

A worker checks whether the function already exists before emitting it. Prevent the duplicate at the source.

during

Supervisor sweep

A coordinator scans merged output in parallel, catching cross-worker duplication no single worker can see.

after

Post-task refactor

When the task closes, batch-scan and consolidate. A typical repo clears in under a minute.
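
The "query before it writes" step can be pictured with a local stand-in: a registry that a worker consults before emitting a new function. This is not the dupelint API, which is not documented on this page. `FunctionRegistry`, `lookup`, and `register` are hypothetical names, and the stand-in only recognizes exact clones (same parameters and body, with whitespace, comments, and the function's own name ignored).

```python
import ast
import hashlib
from typing import Optional


def fingerprint(source: str) -> str:
    """Hash of a function's syntax tree, ignoring its name, comments, and layout."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    func.name = "_"
    return hashlib.sha256(ast.dump(func).encode("utf-8")).hexdigest()


class FunctionRegistry:
    """Hypothetical in-memory index of functions already written."""

    def __init__(self) -> None:
        self._seen = {}

    def lookup(self, source: str) -> Optional[str]:
        """Name of an existing identical function, or None if there is none."""
        return self._seen.get(fingerprint(source))

    def register(self, qualified_name: str, source: str) -> None:
        # Keep the first registration so later duplicates point back to it.
        self._seen.setdefault(fingerprint(source), qualified_name)


registry = FunctionRegistry()
registry.register("utils.is_even", "def is_even(n):\n    return n % 2 == 0\n")

# A worker is about to emit this function; it asks first.
draft = "def even(n):  # same body, new name\n    return n % 2 == 0\n"
existing = registry.lookup(draft)
print("reuse " + existing if existing else "emit new function")
```

The same registry could back a supervisor sweep: register every merged function and report any fingerprint that more than one worker produced.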

⊑

Stop shipping duplicate code.

Point dupelint at your codebase, or wire it into your agents. The verdict is sound, the proof is ours, and your code stays slim.

Notify me when API keys are available · Read the API ↗

Built on Python

dupelint runs on CPython, and we proved it on CPython's own source. As a proud PSF Supporting Member, we back the Python Software Foundation, the nonprofit that stewards the language everything here is built on.
