<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Deslop</title>
  <subtitle>The live LSP + MCP duplicate-code server for AI coding agents. Deslop streams real-time clone signals to Claude Code, Cursor, Copilot, Continue, and Codex as code is written — find-similar prevents the copy-paste before it lands. Install via the VS Code VSIX (bundles LSP, MCP server, and CLI); JetBrains plugin in active development.</subtitle>
  <link href="https://deslop.live/feed.xml" rel="self"/>
  <link href="https://deslop.live/"/>
  <updated>2026-06-03T12:33:53Z</updated>
  <id>https://deslop.live/</id>
  <author><name>Christian Findlay</name></author>
  <entry>
    <title>Regex on source code is illegal</title>
    <link href="https://deslop.live/blog/tree-sitter-over-regex/"/>
    <id>https://deslop.live/blog/tree-sitter-over-regex/</id>
    <updated>2026-04-10T00:00:00Z</updated>
    <content type="html">&lt;p&gt;Most clone detectors you have used — CPD, Simian, jscpd — are fundamentally line-matchers. They take your source, tokenize or hash it by line, and find runs of matching lines. That approach has two features: it is fast, and it predates anyone writing a parser that is fast enough to not be the bottleneck. Tree-sitter changed the second fact. Deslop refuses to pretend otherwise.&lt;/p&gt;&lt;h2 id="what-line-matching-misses" tabindex="-1"&gt;&lt;a class="header-anchor" href="#what-line-matching-misses"&gt;What line-matching misses&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;A line-matcher cannot see past:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Formatting.&lt;/strong&gt; Two identical functions formatted differently look like different code.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Rename.&lt;/strong&gt; Changing &lt;code&gt;user&lt;/code&gt; to &lt;code&gt;customer&lt;/code&gt; across a method breaks every match.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Reorder.&lt;/strong&gt; Swapping two independent statements produces zero overlap in the tokenizer&#39;s world.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Sugar.&lt;/strong&gt; LINQ versus &lt;code&gt;foreach&lt;/code&gt;, &lt;code&gt;async/await&lt;/code&gt; versus callbacks, list comprehension versus loop — all the same code, all invisible to a tokenizer.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;You can patch around each individually. CPD normalizes whitespace. jscpd has mode toggles. Simian lets you configure what counts as a match. Every patch is a pile of heuristics that fail at the next edge case. The architecture does not support doing better.&lt;/p&gt;&lt;h2 id="what-tree-sitter-lets-us-do" tabindex="-1"&gt;&lt;a class="header-anchor" href="#what-tree-sitter-lets-us-do"&gt;What tree-sitter lets us do&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;A tree-sitter parser produces an AST for every file in the repo. From that tree we can:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;normalize identifiers and literals to canonical placeholders, so renames collapse to the same fingerprint;&lt;/li&gt;&lt;li&gt;hash subtrees independently, so the fingerprint of a method is stable regardless of where it lives in the file;&lt;/li&gt;&lt;li&gt;operate on subtrees rather than lines, so formatting and whitespace are irrelevant;&lt;/li&gt;&lt;li&gt;emit byte ranges that survive every kind of source transformation except semantic rewrite.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The pipeline this enables is linear, deterministic, and cheap. No heuristics. No per-language special cases beyond the grammar. Adding a language is: implement the &lt;code&gt;LanguageParser&lt;/code&gt; trait, pin the grammar, done.&lt;/p&gt;&lt;h2 id="why-no-regex-is-written-into-the-rulebook" tabindex="-1"&gt;&lt;a class="header-anchor" href="#why-no-regex-is-written-into-the-rulebook"&gt;Why &amp;quot;no regex&amp;quot; is written into the rulebook&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;CLAUDE.md&lt;/code&gt; for this repo says it plainly: &lt;strong&gt;regex on source code is prohibited.&lt;/strong&gt; Not &amp;quot;avoid,&amp;quot; not &amp;quot;prefer parsers&amp;quot; — illegal. That rule exists because regex-on-source is a slippery slope. The first one handles a niche case a parser cannot easily express. The second one fixes a bug in the first. By the fifth, the codebase has a regex layer shadowing a parser layer and nobody can reason about which one fires first.&lt;/p&gt;&lt;p&gt;Tree-sitter is not a convenience in Deslop — it is the entire foundation. Every clone type the tool detects, every signal it fuses, every byte range it emits comes from the AST. Removing tree-sitter would not cost a feature; it would leave no tool behind.&lt;/p&gt;&lt;h2 id="what-this-means-for-you" tabindex="-1"&gt;&lt;a class="header-anchor" href="#what-this-means-for-you"&gt;What this means for you&lt;/a&gt;&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Rename refactors do not hide duplication.&lt;/strong&gt; A cluster survives an identifier rename because the fingerprint runs on the normalized AST.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Formatting changes do not create false positives.&lt;/strong&gt; Reformatting a file with &lt;code&gt;rustfmt&lt;/code&gt; does not change what Deslop sees.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Language parity is real.&lt;/strong&gt; The same fingerprinting logic runs on C#, Rust, Python, Dart, and every language added later. Cross-language comparisons (when they make sense) use the same math.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Line-matching is a 1990s compromise with hardware that no longer exists. Tree-sitter is the upgrade. Deslop ships the upgrade as the baseline, not a premium tier.&lt;/p&gt;</content>
    <summary>Deslop parses every language with tree-sitter — no regex, no line-matching. Why that constraint matters, and how it survives reformatting and identifier renaming.</summary>
  </entry>
  <entry>
    <title>Why the ranking formula is the entire product</title>
    <link href="https://deslop.live/blog/ranking-formula/"/>
    <id>https://deslop.live/blog/ranking-formula/</id>
    <updated>2026-04-15T00:00:00Z</updated>
    <content type="html">&lt;p&gt;A duplicate-detection tool that reports clusters without ranking them is a search engine that returns results in insertion order. You can tell the user &amp;quot;there are 142 clusters,&amp;quot; and you have just transferred the problem from the tool to the human. Line one of the report is the only line that matters on the first look. Everything else in Deslop exists to make line one correct.&lt;/p&gt;&lt;h2 id="the-formula" tabindex="-1"&gt;&lt;a class="header-anchor" href="#the-formula"&gt;The formula&lt;/a&gt;&lt;/h2&gt;&lt;pre&gt;&lt;code&gt;weight = clone_node_count × (cluster_size − 1) × log2(1 + spanned_bytes)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Implemented in &lt;a href="https://github.com/Nimblesite/Deslop/blob/main/crates/deslop-core/src/cluster.rs"&gt;&lt;code&gt;crates/deslop-core/src/cluster.rs::rank_weight&lt;/code&gt;&lt;/a&gt;. Three factors, all multiplicative, with one logarithmic damper.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;clone_node_count&lt;/code&gt;&lt;/strong&gt; — the AST node count of the duplicated fragment. A five-node getter is not interesting. A fifty-node method with nested control flow is. Node count is the closest proxy we have to &amp;quot;how much effort was duplicated.&amp;quot;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;cluster_size − 1&lt;/code&gt;&lt;/strong&gt; — the number of &lt;em&gt;additional&lt;/em&gt; members beyond the first. Two copies count as one duplicate pair. Five copies count as four. A singleton cluster scores zero by construction, which is the mathematically honest version of &amp;quot;one occurrence isn&#39;t a duplicate.&amp;quot;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;log2(1 + spanned_bytes)&lt;/code&gt;&lt;/strong&gt; — payoff scale, in bytes, dampened by &lt;code&gt;log2&lt;/code&gt;. The byte total tracks how much code an extraction would actually move; the logarithm prevents a single 5000-line vendored file from dominating four genuine 50-line method copies. Bytes (not lines) are the source of truth because Deslop addresses occurrences by &lt;code&gt;[byte_start, byte_end)&lt;/code&gt; everywhere — line numbers are render-time only.&lt;/p&gt;&lt;p&gt;Multiplying the three gives a number that is dimensionally sensible (effort × repetition × blast radius) and monotonic in every argument. Doubling the node count doubles the weight; doubling the cluster size more than doubles it — the boost is biggest for small clusters (going from two copies to four triples the &lt;code&gt;size − 1&lt;/code&gt; term) and settles toward an exact doubling as clusters grow; doubling the bytes adds roughly one to the log term.&lt;/p&gt;&lt;h2 id="what-the-formula-deliberately-excludes" tabindex="-1"&gt;&lt;a class="header-anchor" href="#what-the-formula-deliberately-excludes"&gt;What the formula deliberately excludes&lt;/a&gt;&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Language weight.&lt;/strong&gt; An identical-code C# duplicate and an identical-code Rust duplicate score identically if their nodes × (size − 1) × log spans match. Language preferences belong in configuration, not the ranking.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Signal weight.&lt;/strong&gt; The ranking does not multiply by &lt;code&gt;embedding_cos&lt;/code&gt; or &lt;code&gt;structural&lt;/code&gt;. Those signals gate whether a cluster exists at all (the fused threshold sits at 0.85 in &lt;a href="https://github.com/Nimblesite/Deslop/blob/main/crates/deslop-core/src/pair.rs"&gt;&lt;code&gt;pair.rs&lt;/code&gt;&lt;/a&gt;). Once accepted, every cluster is ranked on the same scale.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;File age / churn.&lt;/strong&gt; Tempting, and wrong. Old stable duplication is still duplication. Adding a churn factor would hide long-standing problems that the team has learned to live with — which is precisely the kind of problem Deslop should surface.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;User-configurable weights.&lt;/strong&gt; Non-negotiable. If every team tuned their own weights, cross-repo comparison would be meaningless, and &amp;quot;weight = 2184&amp;quot; in a blog post would communicate nothing.&lt;/li&gt;&lt;/ul&gt;&lt;h2 id="the-consequence-of-that-choice" tabindex="-1"&gt;&lt;a class="header-anchor" href="#the-consequence-of-that-choice"&gt;The consequence of that choice&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Because the ranking is a single fixed formula, two things become true:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;strong&gt;Every report is comparable.&lt;/strong&gt; The worst cluster in your repo can be directly compared to the worst cluster in someone else&#39;s repo. Numbers mean the same thing everywhere.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Every bug in the ranking is a user-visible bug.&lt;/strong&gt; If I change the formula in a minor version, every CI pipeline that gates on a score threshold breaks silently. So the formula is load-bearing, and changes go through the same review bar as the JSON schema.&lt;/li&gt;&lt;/ol&gt;&lt;h2 id="what-changes-what-doesnt" tabindex="-1"&gt;&lt;a class="header-anchor" href="#what-changes-what-doesnt"&gt;What changes, what doesn&#39;t&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Signals evolve. The embedding model will change. The LSH bands will be retuned. Clone-type definitions may pick up a fifth category for ML-generated near-misses. All of that is downstream of ranking.&lt;/p&gt;&lt;p&gt;The ranking formula is the one surface we commit to keeping stable. It is what makes Deslop a tool you can trust — rather than a search engine that returns 142 clusters in insertion order.&lt;/p&gt;</content>
    <summary>Deslop ranks duplicate-code clusters by clone_node_count × (cluster_size − 1) × log2(1 + spanned_bytes). The worst offender is always line one. Here&#39;s why the formula is not configurable.</summary>
  </entry>
  <entry>
    <title>Duplication is the tax LLMs charge for speed</title>
    <link href="https://deslop.live/blog/ai-era-duplication/"/>
    <id>https://deslop.live/blog/ai-era-duplication/</id>
    <updated>2026-04-20T00:00:00Z</updated>
    <content type="html">&lt;p&gt;Every coding agent I have worked with has the same pathology. Given a request that resembles a request it has fulfilled before, it will cheerfully reach for the same shape of code — even when that shape already exists in the repo under a different name. Multiply that instinct across a team of three engineers and four agents, and you arrive at the condition that defines the current era of software: &lt;strong&gt;repositories where the duplication rate grows faster than the feature rate&lt;/strong&gt;.&lt;/p&gt;&lt;p&gt;This is not a story about sloppy agents. The pattern-matching that makes an LLM useful is the same pattern-matching that makes it duplicate. A transformer does not know your repo has a &lt;code&gt;UserRepository&lt;/code&gt; already. It knows that the training distribution contains a shape called &amp;quot;repository,&amp;quot; and it reproduces that shape. Three times, in three files, under three slightly different names.&lt;/p&gt;&lt;h2 id="why-existing-tools-are-not-the-answer" tabindex="-1"&gt;&lt;a class="header-anchor" href="#why-existing-tools-are-not-the-answer"&gt;Why existing tools are not the answer&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;The clone-detection field has thirty years of literature and a handful of production tools — CPD, Simian, jscpd, Sonar CPD. They all share two assumptions that no longer hold:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;strong&gt;Duplication is an occasional bug.&lt;/strong&gt; These tools surface duplication as a list, sorted by discovery order or file name. That worked when the occasional duplication was hand-crafted. It does not work when duplication is the default state of the repo.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Humans are the primary reader.&lt;/strong&gt; Output formats assume a developer will squint at a table, pick a cluster to investigate, and clean it up at their leisure. An agent cannot squint. An agent needs byte ranges, stable IDs, and a schema.&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Deslop drops both assumptions and rebuilds from there. Output is ranked by the weighted impact of each cluster so the top row is always where the largest payoff lives. Output is JSON first, with text and HTML as views over the same schema. The audience is dual: the human who installs it and the agent who queries it.&lt;/p&gt;&lt;h2 id="fast-feedback-is-the-entire-product" tabindex="-1"&gt;&lt;a class="header-anchor" href="#fast-feedback-is-the-entire-product"&gt;Fast feedback is the entire product&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;The feature that matters most is not the breadth of languages, not the precision of same-behavior detection (Type-4), not the cleverness of the fusion. It is &lt;strong&gt;time to first useful signal&lt;/strong&gt;. A duplicate that surfaces three commits after it lands is a duplicate you will not refactor. A duplicate that surfaces while the agent is still holding the file open is a duplicate you fix before the next message.&lt;/p&gt;&lt;p&gt;So the entire pipeline is tuned for that. The cache is keyed so that unchanged files are free: a warm pass re-parses only the files you just touched. The ranking is cheap — two multiplications and a logarithm per cluster. The LSP shell ships today and lights duplication up in the editor at the speed of a spellchecker; the MCP shell exposes the same live analysis to Claude, Cursor, and Copilot before the agent even types the duplicate.&lt;/p&gt;&lt;p&gt;Speed is not a feature of Deslop. Speed is the whole point.&lt;/p&gt;&lt;h2 id="what-to-do-with-a-finding" tabindex="-1"&gt;&lt;a class="header-anchor" href="#what-to-do-with-a-finding"&gt;What to do with a finding&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;A cluster in a Deslop report is a decision, not a verdict. The tool reports; you decide. Broadly there are three paths:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Extract.&lt;/strong&gt; The fragments are identical enough, and share enough of a call graph, that a shared function is the clear answer. The &lt;code&gt;action_hints&lt;/code&gt; in the JSON flag these.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Reuse.&lt;/strong&gt; One of the fragments is the &amp;quot;real&amp;quot; implementation and the others should call into it. Pick the one with the best tests and delete the others.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Accept.&lt;/strong&gt; Some duplication is intentional — test fixtures, bootstrapping, two things that look alike today but will diverge. Annotate and move on. Deslop does not judge; it just keeps score.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The only wrong move is to ignore the top of the report. That is where the money is.&lt;/p&gt;&lt;h2 id="where-this-goes" tabindex="-1"&gt;&lt;a class="header-anchor" href="#where-this-goes"&gt;Where this goes&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Deslop today is the live server. Two cooperating processes — a file watcher and LSP shell in one, an MCP shell in the other, talking over a local IPC socket — plus twelve MCP tools the agent can call mid-generation. Same pipeline, same schema, same cache as the CLI — the CLI is now the cold-cache fallback for CI gates. The VS Code extension bundles all of it (LSP, MCP, CLI) in a single VSIX. JetBrains is next.&lt;/p&gt;&lt;p&gt;The primary user of the server is not you. It is the agent you are pair-programming with. Which is as it should be: agents generate duplication; agents should fix it — and, with &lt;code&gt;find-similar&lt;/code&gt; in their inner loop, agents should prevent it.&lt;/p&gt;&lt;p&gt;Install today. Open your messiest repo. Read line one.&lt;/p&gt;</content>
    <summary>Coding agents duplicate code faster than humans can review it. Why Deslop&#39;s live LSP + MCP server treats duplication as the defining problem of the AI era.</summary>
  </entry>
  <entry>
    <title>AI-Generated Code and Duplicate Code: What to Check</title>
    <link href="https://deslop.live/blog/ai-generated-code-duplicate-code/"/>
    <id>https://deslop.live/blog/ai-generated-code-duplicate-code/</id>
    <updated>2026-04-23T00:00:00Z</updated>
    <content type="html">&lt;p&gt;If you searched for &amp;quot;AI-generated code technical debt&amp;quot;, &amp;quot;duplicate code detection&amp;quot;, or &amp;quot;code clone detection&amp;quot;, the practical answer is this: AI does not have to generate broken code to make a codebase harder to maintain. It only has to generate the same idea twice, in two slightly different shapes, before anyone notices.&lt;/p&gt;&lt;p&gt;That is the duplicate-code problem in the AI era. The code may compile. The tests may pass. The pull request may look reasonable. But the repository now has two implementations that need the same future fix.&lt;/p&gt;&lt;p&gt;For the full implementation map, see the Deslop docs page: &lt;a href="/docs/research-background/"&gt;Research Background&lt;/a&gt;. This post is the shorter version for teams trying to decide what to check first.&lt;/p&gt;&lt;h2 id="the-search-terms-are-plain-english" tabindex="-1"&gt;&lt;a class="header-anchor" href="#the-search-terms-are-plain-english"&gt;The search terms are plain English&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Google Trends topic suggestions for this area are not academic labels like &amp;quot;Type-2 clone&amp;quot; or &amp;quot;Type-3 near-miss&amp;quot;. They are phrases engineers and managers actually search for:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;AI-generated code&lt;/li&gt;&lt;li&gt;technical debt&lt;/li&gt;&lt;li&gt;duplicate code&lt;/li&gt;&lt;li&gt;code duplication&lt;/li&gt;&lt;li&gt;code clone detection&lt;/li&gt;&lt;li&gt;vibe coding&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Those phrases matter because they describe the operational problem. A team is not usually asking, &amp;quot;Do I have a Type-3 clone?&amp;quot; It is asking, &amp;quot;Did AI just add the same business rule in three places?&amp;quot;&lt;/p&gt;&lt;h2 id="does-ai-generated-code-create-technical-debt" tabindex="-1"&gt;&lt;a class="header-anchor" href="#does-ai-generated-code-create-technical-debt"&gt;Does AI-generated code create technical debt?&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;It can. The risk is not magic, and it is not unique to AI. Humans have copied code for decades. The change is throughput.&lt;/p&gt;&lt;p&gt;AI coding assistants can produce a plausible repository-shaped answer quickly. When the prompt is similar to a previous task, the answer often has a familiar shape too: another repository class, another validation function, another mapper, another retry wrapper, another test fixture. That is useful in the moment and expensive later.&lt;/p&gt;&lt;p&gt;The research direction is moving the same way:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="https://arxiv.org/abs/2504.12608"&gt;Code Copycat Conundrum&lt;/a&gt; studies repetition in LLM-generated code across character, statement, and block levels.&lt;/li&gt;&lt;li&gt;&lt;a href="https://conf.researchr.org/details/fse-2025/fse-2025-research-papers/111/An-Empirical-Study-of-Code-Clones-from-Commercial-AI-Code-Generators"&gt;An Empirical Study of Code Clones from Commercial AI Code Generators&lt;/a&gt; reports measurable Type-1 and Type-2 clone rates from studied commercial code generators.&lt;/li&gt;&lt;li&gt;&lt;a href="https://arxiv.org/abs/2603.28592"&gt;Debt Behind the AI Boom&lt;/a&gt; studies technical debt introduced by AI-authored commits in production repositories.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;None of that means every AI-written line is bad. It means AI-generated code deserves the same repository-level appraisal as human code, only faster.&lt;/p&gt;&lt;h2 id="what-should-a-duplicate-code-check-look-for" tabindex="-1"&gt;&lt;a class="header-anchor" href="#what-should-a-duplicate-code-check-look-for"&gt;What should a duplicate-code check look for?&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;A useful AI-era duplicate-code check should not stop at exact line matches. It should look for four levels of similarity:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;strong&gt;Exact duplicate code&lt;/strong&gt;: the same code copied with formatting or comment changes.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Renamed duplicate code&lt;/strong&gt;: the same structure with different variable names or constants.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Near-duplicate code&lt;/strong&gt;: mostly the same logic with inserted, deleted, or reordered statements.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Same behavior, different code&lt;/strong&gt;: two implementations that solve the same problem with different syntax.&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Classic code clone detection research calls those Type-1, Type-2, Type-3, and Type-4 clones. Deslop uses those ideas, but the detailed algorithm write-up lives in &lt;a href="/docs/research-background/"&gt;Research Background&lt;/a&gt; so this post does not repeat the whole docs page.&lt;/p&gt;&lt;h2 id="why-line-matching-is-not-enough" tabindex="-1"&gt;&lt;a class="header-anchor" href="#why-line-matching-is-not-enough"&gt;Why line matching is not enough&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Line-based duplicate-code tools are good at finding obvious copy-paste. They are weaker when AI changes the surface shape:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;code&gt;customerId&lt;/code&gt; becomes &lt;code&gt;accountId&lt;/code&gt;.&lt;/li&gt;&lt;li&gt;&lt;code&gt;foreach&lt;/code&gt; becomes a comprehension.&lt;/li&gt;&lt;li&gt;a helper is copied but moved into a different class.&lt;/li&gt;&lt;li&gt;the same validation rule is rewritten with a different branch order.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;That is why Deslop starts from parsed syntax trees rather than raw text. It parses each file with tree-sitter, then normalizes identifier names and literal values so renamed copies still match. It fingerprints the tree structure, widens the net to near-duplicates with sibling windows and MinHash, and can optionally add embeddings for same-behavior matches. The short version: it compares code structure first, not lines first.&lt;/p&gt;&lt;p&gt;The full audit trail is in &lt;a href="/docs/how-it-works/"&gt;How It Works&lt;/a&gt; and &lt;a href="/docs/research-background/"&gt;Research Background&lt;/a&gt;.&lt;/p&gt;&lt;h2 id="what-to-do-when-you-find-duplicate-code" tabindex="-1"&gt;&lt;a class="header-anchor" href="#what-to-do-when-you-find-duplicate-code"&gt;What to do when you find duplicate code&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Do not treat every clone as a bug. Treat it as a decision.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Extract&lt;/strong&gt; when the copies are clearly the same abstraction and will change together.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Reuse&lt;/strong&gt; when one implementation is already the better source of truth and the others should call it.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Accept&lt;/strong&gt; when duplication is deliberate: fixtures, generated code, compatibility shims, or two paths that look alike now but are expected to diverge.&lt;/p&gt;&lt;p&gt;Accepting duplication is not the mistake. The mistake is accepting it by accident because no one measured it.&lt;/p&gt;&lt;h2 id="why-deslop-ranks-findings" tabindex="-1"&gt;&lt;a class="header-anchor" href="#why-deslop-ranks-findings"&gt;Why Deslop ranks findings&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;A duplicate-code report with 200 unordered findings is just another backlog. Deslop ranks clusters by impact so the first item is meant to be the highest-payoff review target.&lt;/p&gt;&lt;p&gt;That matters for AI coding agents. An agent does not need a wall of clone data. It needs a small, structured answer:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;which duplicate cluster matters most,&lt;/li&gt;&lt;li&gt;where the byte ranges are,&lt;/li&gt;&lt;li&gt;why the cluster was flagged,&lt;/li&gt;&lt;li&gt;whether the signal came from structure, token similarity, or embeddings.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;That is the reason Deslop is JSON-first and why the LSP/MCP path exists. AI can create duplicate code quickly; the feedback loop has to be just as quick, and as close to the edit as possible.&lt;/p&gt;&lt;h2 id="faq" tabindex="-1"&gt;&lt;a class="header-anchor" href="#faq"&gt;FAQ&lt;/a&gt;&lt;/h2&gt;&lt;h3 id="is-duplicate-code-always-technical-debt" tabindex="-1"&gt;&lt;a class="header-anchor" href="#is-duplicate-code-always-technical-debt"&gt;Is duplicate code always technical debt?&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;No. Some duplicate code is intentional and cheaper than the abstraction it would replace. Deslop&#39;s job is to surface the evidence, not force a refactor.&lt;/p&gt;&lt;h3 id="is-this-only-an-ai-problem" tabindex="-1"&gt;&lt;a class="header-anchor" href="#is-this-only-an-ai-problem"&gt;Is this only an AI problem?&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;No. The code-clone literature predates modern LLMs by decades. AI matters because it can increase how quickly duplicate logic appears.&lt;/p&gt;&lt;h3 id="can-an-llm-just-review-its-own-code-for-duplication" tabindex="-1"&gt;&lt;a class="header-anchor" href="#can-an-llm-just-review-its-own-code-for-duplication"&gt;Can an LLM just review its own code for duplication?&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Sometimes, but a deterministic report is easier to audit. Deslop points to files, byte ranges, signal scores, and report schema fields. An agent can read that report and then make a refactor plan.&lt;/p&gt;&lt;h3 id="where-is-the-academic-background" tabindex="-1"&gt;&lt;a class="header-anchor" href="#where-is-the-academic-background"&gt;Where is the academic background?&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Start with &lt;a href="/docs/research-background/"&gt;Deslop&#39;s Research Background&lt;/a&gt;, then follow the linked papers. The most relevant entry points are:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="https://leodemoura.github.io/files/ICSM98.pdf"&gt;Clone Detection Using Abstract Syntax Trees&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="https://igm.univ-mlv.fr/~chilowi/research/syntax_tree_fingerprinting/syntax_tree_fingerprinting_ICPC09.pdf"&gt;Syntax Tree Fingerprinting for Source Code Similarity Detection&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="https://arxiv.org/abs/1512.06448"&gt;SourcererCC: Scaling Code Clone Detection to Big Code&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="https://arxiv.org/abs/2309.06424"&gt;Unveiling the potential of large language models in generating semantic and cross-language clones&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="https://arxiv.org/abs/2509.25754"&gt;Are Classical Clone Detectors Good Enough For the AI Era?&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;AI-generated code is not automatically bad code. But if it creates duplicate code faster than your team can review it, the maintenance bill is real. Measure it while the code is still fresh.&lt;/p&gt;</content>
    <summary>AI-generated code can create duplicate code and technical debt. Learn what to check with code clone detection and how Deslop audits AI-era codebases.</summary>
  </entry>
</feed>
