diff options
author | Wilfred Hughes <me@wilfred.me.uk> | 2021-09-24 23:32:34 -0700 |
---|---|---|
committer | Wilfred Hughes <me@wilfred.me.uk> | 2021-09-24 23:33:25 -0700 |
commit | c9f85d806e26613b3dd73a2e9054d80a6ac1d149 (patch) | |
tree | 73676275712304ce48a12b7a2e26c2d94a8e94af | |
parent | 3d12e56e0f66332638691aed08d7c8ddc03bad46 (diff) |
Document the main tricky cases with tree diffs0.10
-rw-r--r-- | CHANGELOG.md | 4 | ||||
-rw-r--r-- | manual/src/SUMMARY.md | 1 | ||||
-rw-r--r-- | manual/src/tricky_cases.md | 220 |
3 files changed, 225 insertions, 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md index 08bc821cd..aa5e5a482 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -17,6 +17,10 @@ Removed legacy regex-based parsing backend. Some additional runtime optimisations. +### Manual + +Added a chapter on difficult cases for tree diff algorithms. + ## 0.9 ### Parsing diff --git a/manual/src/SUMMARY.md b/manual/src/SUMMARY.md index a40fce0b4..8af4cde8a 100644 --- a/manual/src/SUMMARY.md +++ b/manual/src/SUMMARY.md @@ -5,6 +5,7 @@ - [Parsing](./parsing.md) - [Upstream Parsers](./upstream_parsers.md) - [Diffing](./diffing.md) + - [Tricky Cases](./tricky_cases.md) - [Contributing](./contributing.md) - [Alternative Projects](./alternative_projects.md) - [Tree Diffing](./tree_diffing.md) diff --git a/manual/src/tricky_cases.md b/manual/src/tricky_cases.md new file mode 100644 index 000000000..697ac4375 --- /dev/null +++ b/manual/src/tricky_cases.md @@ -0,0 +1,220 @@ +# Tricky Cases + +Tree diffing is challenging in some situations. This page demonstrates +difficult cases observed during development. + +Not all of these cases work well in difftastic yet. + +## Adding delimiters + +``` +;; Before +x + +;; After +(x) +``` + +This is tricky because `x` has changed its depth in the tree, but `x` +itself is unchanged. + +Not all tree diff algorithms handle this case. It is also challenging +to display this case clearly: we want to highlight the changed +delimiters, but not their content. This is challenging in larger +expressions. + +## Changing Delimiters + +``` +;; Before +(x) + +;; After +[x] +``` + +As with the wrapping case, we want to highlight the delimiters rather +than the `x`. + +## Expanding Delimiters + +``` +;; Before +(x) y + +;; After +(x y) +``` + +In this case, we want to highlight `y`. Highlighting the delimiters +could make `x` look changed. + +## Contracting Delimiters + +``` +;; Before +(x y) + +;; After +(x) y +``` + +This should be highlighted similar to the expanding delimiter case. + +## Reordering Within A List + +``` +;; Before +(x y) + +;; After +(y x) +``` + +We want to highlight the list contents and not the delimiters. + +## Middle Insertions + +``` +// Before +foo(bar(123)); + +// After +foo(extra(bar(123))); +``` + +We want to consider both `foo` and `bar` to be unchanged. This case is +challenging for diffing algorithms that do a bottom-up then top-down +matching of trees. + +## Sliders + +Sliders are a common problem in text based diffs, where lines are +matched in a confusing way. + +They typically look like this. The diff has to arbitrarily choose a +delimiter, and it chooses the wrong one. + +``` ++ } ++ ++ function foo () { + } +``` + +git-diff has some heuristics to reduce the risk of this (e.g. the +"patience diff"), but it can still occur. + +There's a similar problem in tree diffs. + +``` +;; Before +A B +C D + +;; After +A B +A B +C D +``` + +Ideally we'd prefer marking contiguous nodes as novel, so we highlight +`A B` rather than `B\nA`. From the perspective of a +longest-common-subsequence algorithm, these two choices are +equivalent. + +## Minimising Depth Changes + +``` +// Before +if true { + foo(123); +} +foo(456); + +// After +foo(789); +``` + +Do we consider `foo(123)` or `foo(456)` to match with `foo(789)`? +Difftastic prefers `foo(456)` by preferring nodes at the same nesting depth. + +## Replacements + +``` +// Before +function foo(x) { return x + 1; } + +// After +function bar(y) { baz(y); } +``` + +In this example, we've deleted a function and written a completely +different one. A tree-based diff could match up the `function` and the +outer delimiters, resulting in a confusing display showing lots of +small changes. + +As with sliders, the replacement problem can also occur in textual +line-based diffs. Line-diffs struggle if there are a small number of +common lines. The more precise, granular behaviour of tree diffs makes +this problem much more common though. + +## Similar Comments + +``` +// Before +/* The quick brown fox. */ +foobar(); + +// After +/* The slow brown fox. */ +foobaz(); +``` + +`foobar` and `foobaz` are completely different, and should never be +matched up. For comments, we'd rather match up comments that are +textually similar. + +## Multiline Comments + +``` +// Before +/* Hello + * World. */ + +// After +if (x) { + /* Hello + * World. */ +} +``` + +The inner content of these two comments is technically different. We +want to treat them as identical however. + +## Reflowing Doc Comments + +Block comments have prefixes that aren't meaningful. + +``` +// Before +/* The quick brown fox jumps + * over the lazy dog. */ + +// After +/* The quick brown fox immediately + * jumps over the lazy dog. */ +``` + +The inner content has changed from `jumps * over` to `immediately * +jumps over`. However, the `*` is decorative and we don't care that +it's moved. + +## Invalid Syntax + +There's no guarantee that the input we're given is valid syntax. Even +if the code is valid, it might use syntax that isn't supported by the +parser. + +Tree-sitter provided explicit error nodes, and difftastic treats them +as atoms so it can run the same tree diff algorithm regardless. |