For AI
This page is the operating manual for AI coding agents — Claude Code, Cursor, GitHub Copilot, Continue, Codex — and the humans configuring them. It is deliberately directive: how to configure Deslop, what to exclude, how to gate CI, and how to parse the report. For wiring deslop-mcp into your client, read AI Integration first.
The one law: call find-similar before you write code
Deslop earns its keep through prevention, not cleanup. Before you author any new function, method, class, helper, fixture, or test setup, call the find-similar MCP tool with the proposed snippet (or its byte range) and read the response:
- signals.fused ≥ 0.85, or bucket identical / nearly_identical → do not write the copy. Reuse the canonical occurrence the tool returns; extract a shared helper if needed.
- signals.fused < 0.6, or an empty response → proceed with authoring.
- 0.6 ≤ signals.fused < 0.85 → read the canonical occurrence and bias toward reuse.
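The three thresholds above can be sketched as a small decision function. This is a minimal illustration, not Deslop's implementation: it assumes a find-similar result exposes `bucket` and `signals.fused` the same way a cluster in the JSON report does, and `authoring_decision` is a hypothetical helper name.

```python
from typing import Any, Mapping, Optional

# Buckets that mean "a copy already exists" regardless of the fused score.
REUSE_BUCKETS = {"identical", "nearly_identical"}


def authoring_decision(match: Optional[Mapping[str, Any]]) -> str:
    """Return 'reuse', 'bias_toward_reuse', or 'proceed' for one find-similar result.

    Assumption: `match` is None/empty when the tool found nothing, otherwise a
    mapping with `bucket` and `signals.fused` keys.
    """
    if not match:  # empty response: nothing similar exists
        return "proceed"
    fused = float(match.get("signals", {}).get("fused", 0.0))
    if fused >= 0.85 or match.get("bucket") in REUSE_BUCKETS:
        return "reuse"
    if fused < 0.6:
        return "proceed"
    return "bias_toward_reuse"  # 0.6 <= fused < 0.85
```

An agent would call this once per candidate snippet and only start authoring on `proceed`.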
find-similar is for authoring. When you are cleaning up existing duplicates, start with top-offenders, then cluster-by-id for the cluster you will merge. The paste-ready rule block for your project's AGENTS.md / CLAUDE.md is in the agent recipe.
| Tool | When to call it |
|---|---|
| find-similar | Before writing new code: does an equivalent already exist? |
| top-offenders | Worst clusters in the workspace, worst first. Start cleanup here. |
| cluster-by-id | Full member list + signals for one cluster you are about to merge. |
| report-for-file / report-for-range | Clusters touching a specific file or selection. |
| schema-doc | Authoritative JSON schema. Call once per session, not per response. |
Configure with .deslop.toml
Deslop reads .deslop.toml from the scan root (or the path passed to --config). No file is required — Deslop ships conservative built-in defaults. Every section and key is optional; omit what you do not need.
# Shared rules, applied to every language.
[defaults]
exclude = ["vendor/**", "third_party/**"] # dropped before parsing — never analysed
report_hide = ["**/*.generated.cs", "**/*.g.cs"] # analysed, but hidden from the ranked report
# Per-language overlays, keyed by the parser language id:
# csharp, rust, python, dart. Overlays EXTEND [defaults]; they never replace it.
[language.csharp]
report_hide = ["**/Migrations/**/*.cs"]
[language.rust]
exclude = ["**/target/**"]
# Opt-in CI gate. Exceeding this exits 3. See "Run in CI" below.
[threshold]
max_duplication_percent = 5.0
# Analysis behaviour.
[analysis]
allow_cross_language_comparison = false # true → compare clones across languages (ports, generated clients)
# Report rendering.
[report]
split_by_language = false # true → one HTML section per language
Keys must live under a section. A bare top-level exclude = [...] with no [defaults] header is silently ignored. Per-language sections are additive: a .rs file is matched against defaults.exclude ∪ language.rust.exclude.
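The additive rule can be sketched in a few lines. This is an illustration of the union semantics only: it assumes the config has already been parsed into a dict (for example with Python's `tomllib`), and both `effective_patterns` and the suffix-to-language mapping are hypothetical, not part of Deslop.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file suffix to the parser language ids named in the config.
LANGUAGE_BY_SUFFIX = {".cs": "csharp", ".rs": "rust", ".py": "python", ".dart": "dart"}


def effective_patterns(config: dict, path: str, key: str = "exclude") -> list:
    """Return defaults.<key> plus language.<id>.<key> for the file at `path`."""
    patterns = list(config.get("defaults", {}).get(key, []))
    language = LANGUAGE_BY_SUFFIX.get(PurePosixPath(path).suffix)
    if language:
        overlay = config.get("language", {}).get(language, {}).get(key, [])
        # Overlays extend the defaults; they never replace them.
        patterns += [p for p in overlay if p not in patterns]
    return patterns


config = {
    "exclude": ["ignored/**"],  # bare top-level key: not under a section, so never read
    "defaults": {"exclude": ["vendor/**", "third_party/**"]},
    "language": {"rust": {"exclude": ["**/target/**"]}},
}

print(effective_patterns(config, "src/main.rs"))
# ['vendor/**', 'third_party/**', '**/target/**']
print(effective_patterns(config, "src/app.py"))
# ['vendor/**', 'third_party/**']
```

The bare top-level `exclude` never reaches the result, which mirrors the silent-ignore behaviour described above.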
What to exclude
Two tiers, with different semantics. Choose by intent:
| Key | Effect | Use for |
|---|---|---|
| exclude | File is dropped during discovery: never parsed, never counted in analysed_loc, never in any cluster. | Vendored / third-party code you do not own and do not want analysed at all. |
| report_hide | File is analysed and can anchor a cluster, but each occurrence is flagged hidden: true. A cluster whose members are all hidden drops out of the ranking; a cluster with one visible member stays, so you still see "hand-written code duplicates generated code." | Generated output you still want to detect hand-written copies of. |
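The report_hide ranking rule can be expressed as a short filter. This is a sketch of the described behaviour, assuming cluster dicts shaped like the JSON report (`weight`, `occurrences[].hidden`); `ranked_clusters` and the sample ids and paths are hypothetical.

```python
def ranked_clusters(clusters):
    """Drop clusters whose occurrences are all hidden; order the rest by weight, worst first."""
    visible = [
        cluster
        for cluster in clusters
        if any(not occ.get("hidden", False) for occ in cluster["occurrences"])
    ]
    return sorted(visible, key=lambda cluster: cluster["weight"], reverse=True)


sample = [
    {   # generated-only cluster: every member is hidden, so it leaves the ranking
        "id": "all-generated",
        "weight": 900.0,
        "occurrences": [
            {"path": "Api/Client.g.cs", "hidden": True},
            {"path": "Api/Admin.g.cs", "hidden": True},
        ],
    },
    {   # one visible member: hand-written code duplicates generated code, so it stays
        "id": "hand-written-copy",
        "weight": 300.0,
        "occurrences": [
            {"path": "Api/Client.g.cs", "hidden": True},
            {"path": "Services/ClientShim.cs", "hidden": False},
        ],
    },
]

print([cluster["id"] for cluster in ranked_clusters(sample)])
# ['hand-written-copy']
```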
Built-in defaults already cover the common cases — do not re-add them. Excluded by default: node_modules, target, dist, build, .venv, __pycache__, .cargo. Report-hidden by default: any path with a generated component, Alembic migrations under alembic/versions, and the suffixes *.g.cs, *.generated.cs, *.designer.cs, *.pb.cs, *.openapi.cs, *.generated.py, _generated.py, *_pb2.py, *_pb2_grpc.py. Add only project-specific patterns on top.
Globs are gitignore-style and matched against paths relative to the config file.
Run in CI
A successful deslop run exits 0 regardless of how much duplication it finds, unless you opt into a gate. With a gate, it exits 3 when the repo-wide duplication_percent exceeds your ceiling. There are two ways to set the ceiling; the flag wins over the config key:
deslop . --fail-over 5.0 # exit 3 when duplication_percent > 5.0
# .deslop.toml — shared by local runs, CI, and agents
[threshold]
max_duplication_percent = 5.0
--fail-over 0 fails on any duplication. --no-fail-over clears the gate for a single local run. The value must be a finite number in [0.0, 100.0]; anything else exits 2.
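The precedence and validation rules can be sketched as follows. This models the documented behaviour rather than Deslop's source: the function names are hypothetical, and applying the same range check to the config value as to the flag is an assumption.

```python
import math
from typing import Optional


class UsageError(ValueError):
    """Stands in for the CLI's exit code 2."""


def resolve_ceiling(
    cli_fail_over: Optional[float],
    config_max: Optional[float],
    no_fail_over: bool = False,
) -> Optional[float]:
    """Pick the active ceiling: --no-fail-over clears it, the flag beats the config key."""
    if no_fail_over:
        return None
    ceiling = cli_fail_over if cli_fail_over is not None else config_max
    if ceiling is None:
        return None  # no gate configured
    if not math.isfinite(ceiling) or not 0.0 <= ceiling <= 100.0:
        raise UsageError(f"threshold must be a finite number in [0.0, 100.0], got {ceiling}")
    return ceiling


def gate_exit_code(duplication_percent: float, ceiling: Optional[float]) -> int:
    """Return 3 only when a ceiling is set and strictly exceeded, otherwise 0."""
    if ceiling is not None and duplication_percent > ceiling:
        return 3
    return 0
```

Because the comparison is strict, a ceiling of 0 still passes a repository with no duplication at all and fails on anything above it.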
Exit codes
| Code | Meaning |
|---|---|
| 0 | Succeeded; within threshold, or no threshold set. |
| 1 | Runtime error: bad scan path, parse/I-O failure, or an unreachable required embedding provider. Never a panic. |
| 2 | Usage error: unknown flag, or an out-of-range / non-finite threshold. |
| 3 | Threshold breached. The full report is still written to disk so the offenders can be surfaced. |
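An agent or wrapper script might branch on these codes roughly as below. This is a sketch under assumptions: `run_deslop`, `explain_exit_code`, and `should_read_report` are hypothetical helpers, the only CLI invocation used is the documented `deslop <path>`, and treating exit 0 as "report available" is inferred rather than stated.

```python
import subprocess

EXIT_MEANINGS = {
    0: "succeeded: within threshold, or no threshold set",
    1: "runtime error: bad scan path, parse/I-O failure, or unreachable embedding provider",
    2: "usage error: unknown flag, or out-of-range / non-finite threshold",
    3: "threshold breached: report written, surface the offenders",
}


def explain_exit_code(code: int) -> str:
    """Translate a deslop exit code into the meaning listed in the table above."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def should_read_report(code: int) -> bool:
    """A report is expected after a clean run (0) or a breach (3), not after errors."""
    return code in (0, 3)


def run_deslop(scan_root: str = ".") -> int:
    """Run the CLI on `scan_root` and return its exit code without raising on failure."""
    completed = subprocess.run(["deslop", scan_root], check=False)
    return completed.returncode
```

A caller would typically fix its own invocation on 2, report the environment problem on 1, and go on to parse the report only when `should_read_report` is true.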
GitHub Actions
name: deslop
on: [push, pull_request]
jobs:
  duplication-gate:
    runs-on: ubuntu-latest
    env:
      DESLOP_VERSION: "0.1.0" # pin the tool version; see the Releases page
    steps:
      - uses: actions/checkout@v4
      - name: Install the Deslop CLI
        run: |
          curl -sSfL "https://github.com/Nimblesite/Deslop/releases/download/v${DESLOP_VERSION}/deslop-${DESLOP_VERSION}-linux-x64.tar.gz" | tar -xz
          echo "$PWD/deslop-${DESLOP_VERSION}-linux-x64" >> "$GITHUB_PATH"
      - name: Gate on duplication
        run: deslop . --fail-over 5.0 # or omit --fail-over to use [threshold] in .deslop.toml
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: deslop-report
          path: deslop-report.html
A non-zero exit fails the step. The if: always() upload keeps deslop-report.html even on a breach so a human can browse the offenders.
Read the reports
deslop-report.json is canonical and the only file you should parse — .txt and .html are renderers over it. Call schema-doc once for the authoritative schema, and see Output Formats for the full shape. The decision-relevant slice:
{
"metrics": {
"duplication_percent": 2.63,
"threshold": { "percent": 5.0, "breached": false, "source": "config" }
},
"clusters": [
{
"id": "0362505641efe3c7",
"weight": 1252.8,
"size": 3,
"bucket": "identical",
"signals": { "structural": 1.0, "token_jaccard": 0.98, "embedding_cos": 0.0, "fused": 0.99 },
"occurrences": [ { "path": "src/UserRepository.cs", "start_line": 12, "end_line": 41, "hidden": false } ],
"summary": "3 near-identical copies — safe to extract."
}
]
}
| Field | How to act on it |
|---|---|
| metrics.duplication_percent | The repo-wide headline number the CI gate compares against. |
| metrics.threshold.breached | true → the run exited 3 and the gate failed. source is cli, config, or none. |
| clusters | Sorted by weight descending, so clusters[0] is always the worst offender. Work top-down. |
| bucket | identical / nearly_identical → extract a shared definition. loosely_similar → parametrise the difference. same_behavior → reconcile two implementations of one behaviour (needs --embeddings). |
| signals.fused | Confidence in the range 0 to 1. ≥ 0.85 is the act-now line, the same threshold as the find-similar law above. |
| occurrences[].hidden | true marks an occurrence matched by report_hide, typically generated code. A visible occurrence in the same cluster is a hand-written clone of it. |
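Putting the table to work, a triage pass over deslop-report.json might look like the sketch below. It relies only on the fields shown in the slice above; `triage` and the action strings are hypothetical, and the inline report is a trimmed stand-in for a real file you would load with `json.load`.

```python
ACTION_BY_BUCKET = {
    "identical": "extract a shared definition",
    "nearly_identical": "extract a shared definition",
    "loosely_similar": "parametrise the difference",
    "same_behavior": "reconcile the two implementations",
}


def triage(report: dict, act_now: float = 0.85) -> dict:
    """Return the gate status plus act-now clusters, keeping the report's worst-first order."""
    threshold = report["metrics"].get("threshold", {})
    todo = []
    for cluster in report["clusters"]:  # already sorted by weight descending
        if cluster["signals"]["fused"] < act_now:
            continue
        visible = [o["path"] for o in cluster["occurrences"] if not o.get("hidden", False)]
        if not visible:
            continue  # nothing hand-written to change
        todo.append({
            "id": cluster["id"],
            "action": ACTION_BY_BUCKET.get(cluster["bucket"], "inspect manually"),
            "paths": visible,
        })
    return {"breached": bool(threshold.get("breached", False)), "todo": todo}


report = {
    "metrics": {"duplication_percent": 2.63,
                "threshold": {"percent": 5.0, "breached": False, "source": "config"}},
    "clusters": [{
        "id": "0362505641efe3c7", "weight": 1252.8, "size": 3, "bucket": "identical",
        "signals": {"structural": 1.0, "token_jaccard": 0.98, "embedding_cos": 0.0, "fused": 0.99},
        "occurrences": [{"path": "src/UserRepository.cs", "start_line": 12,
                         "end_line": 41, "hidden": False}],
    }],
}

result = triage(report)
print(result["breached"], result["todo"][0]["action"])
# False extract a shared definition
```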
Do not silence findings by widening the threshold, marking code hidden, or splitting it into trivially different shapes. If Deslop flags it, treat it as a real signal until you have shown otherwise.