summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorWilfred Hughes <me@wilfred.me.uk>2021-09-24 23:32:34 -0700
committerWilfred Hughes <me@wilfred.me.uk>2021-09-24 23:33:25 -0700
commitc9f85d806e26613b3dd73a2e9054d80a6ac1d149 (patch)
tree73676275712304ce48a12b7a2e26c2d94a8e94af
parent3d12e56e0f66332638691aed08d7c8ddc03bad46 (diff)
Document the main tricky cases with tree diffs0.10
-rw-r--r--CHANGELOG.md4
-rw-r--r--manual/src/SUMMARY.md1
-rw-r--r--manual/src/tricky_cases.md220
3 files changed, 225 insertions, 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 08bc821cd..aa5e5a482 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -17,6 +17,10 @@ Removed legacy regex-based parsing backend.
Some additional runtime optimisations.
+### Manual
+
+Added a chapter on difficult cases for tree diff algorithms.
+
## 0.9
### Parsing
diff --git a/manual/src/SUMMARY.md b/manual/src/SUMMARY.md
index a40fce0b4..8af4cde8a 100644
--- a/manual/src/SUMMARY.md
+++ b/manual/src/SUMMARY.md
@@ -5,6 +5,7 @@
- [Parsing](./parsing.md)
- [Upstream Parsers](./upstream_parsers.md)
- [Diffing](./diffing.md)
+ - [Tricky Cases](./tricky_cases.md)
- [Contributing](./contributing.md)
- [Alternative Projects](./alternative_projects.md)
- [Tree Diffing](./tree_diffing.md)
diff --git a/manual/src/tricky_cases.md b/manual/src/tricky_cases.md
new file mode 100644
index 000000000..697ac4375
--- /dev/null
+++ b/manual/src/tricky_cases.md
@@ -0,0 +1,220 @@
+# Tricky Cases
+
+Tree diffing is challenging in some situations. This page demonstrates
+difficult cases observed during development.
+
+Not all of these cases work well in difftastic yet.
+
+## Adding delimiters
+
+```
+;; Before
+x
+
+;; After
+(x)
+```
+
+This is tricky because `x` has changed its depth in the tree, but `x`
+itself is unchanged.
+
+Not all tree diff algorithms handle this case. It is also challenging
+to display this case clearly: we want to highlight the changed
+delimiters, but not their content. This is challenging in larger
+expressions.
+
+## Changing Delimiters
+
+```
+;; Before
+(x)
+
+;; After
+[x]
+```
+
+As with the wrapping case, we want to highlight the delimiters rather
+than the `x`.
+
+## Expanding Delimiters
+
+```
+;; Before
+(x) y
+
+;; After
+(x y)
+```
+
+In this case, we want to highlight `y`. Highlighting the delimiters
+could make `x` look changed.
+
+## Contracting Delimiters
+
+```
+;; Before
+(x y)
+
+;; After
+(x) y
+```
+
+This should be highlighted similar to the expanding delimiter case.
+
+## Reordering Within A List
+
+```
+;; Before
+(x y)
+
+;; After
+(y x)
+```
+
+We want to highlight the list contents and not the delimiters.
+
+## Middle Insertions
+
+```
+// Before
+foo(bar(123));
+
+// After
+foo(extra(bar(123)));
+```
+
+We want to consider both `foo` and `bar` to be unchanged. This case is
+challenging for diffing algorithms that do a bottom-up then top-down
+matching of trees.
+
+## Sliders
+
+Sliders are a common problem in text based diffs, where lines are
+matched in a confusing way.
+
+They typically look like this. The diff has to arbitrarily choose a
+delimiter, and it chooses the wrong one.
+
+```
++ }
++
++ function foo () {
+ }
+```
+
+git-diff has some heuristics to reduce the risk of this (e.g. the
+"patience diff"), but it can still occur.
+
+There's a similar problem in tree diffs.
+
+```
+;; Before
+A B
+C D
+
+;; After
+A B
+A B
+C D
+```
+
+Ideally we'd prefer marking contiguous nodes as novel, so we highlight
+`A B` rather than `B\nA`. From the perspective of a
+longest-common-subsequence algorithm, these two choices are
+equivalent.
+
+## Minimising Depth Changes
+
+```
+// Before
+if true {
+ foo(123);
+}
+foo(456);
+
+// After
+foo(789);
+```
+
+Do we consider `foo(123)` or `foo(456)` to match with `foo(789)`?
+Difftastic prefers `foo(456)` by preferring nodes at the same nesting depth.
+
+## Replacements
+
+```
+// Before
+function foo(x) { return x + 1; }
+
+// After
+function bar(y) { baz(y); }
+```
+
+In this example, we've deleted a function and written a completely
+different one. A tree-based diff could match up the `function` and the
+outer delimiters, resulting in a confusing display showing lots of
+small changes.
+
+As with sliders, the replacement problem can also occur in textual
+line-based diffs. Line-diffs struggle if there are a small number of
+common lines. The more precise, granular behaviour of tree diffs makes
+this problem much more common though.
+
+## Similar Comments
+
+```
+// Before
+/* The quick brown fox. */
+foobar();
+
+// After
+/* The slow brown fox. */
+foobaz();
+```
+
+`foobar` and `foobaz` are completely different, and should never be
+matched up. For comments, we'd rather match up comments that are
+textually similar.
+
+## Multiline Comments
+
+```
+// Before
+/* Hello
+ * World. */
+
+// After
+if (x) {
+ /* Hello
+ * World. */
+}
+```
+
+The inner content of these two comments is technically different. We
+want to treat them as identical however.
+
+## Reflowing Doc Comments
+
+Block comments have prefixes that aren't meaningful.
+
+```
+// Before
+/* The quick brown fox jumps
+ * over the lazy dog. */
+
+// After
+/* The quick brown fox immediately
+ * jumps over the lazy dog. */
+```
+
+The inner content has changed from `jumps * over` to `immediately *
+jumps over`. However, the `*` is decorative and we don't care that
+it's moved.
+
+## Invalid Syntax
+
+There's no guarantee that the input we're given is valid syntax. Even
+if the code is valid, it might use syntax that isn't supported by the
+parser.
+
+Tree-sitter provided explicit error nodes, and difftastic treats them
+as atoms so it can run the same tree diff algorithm regardless.