author Drew DeVault <sir@cmpwn.com> 2019-07-08 09:53:39 -0400
committer Drew DeVault <sir@cmpwn.com> 2019-07-08 09:53:39 -0400
commit 514e693389e25b0a806d59306085434f7e20b0de
tree 22b8c6c73e62f3c4be809e7016069f0a5d14cfb6
parent a42adc3d5138d9841476c84dbea144fe80dbddec
Add annotations announcement
-rw-r--r-- _posts/2019-07-08-Announcing-annotations-for-sourcehut.md 229
1 files changed, 229 insertions, 0 deletions
diff --git a/_posts/2019-07-08-Announcing-annotations-for-sourcehut.md b/_posts/2019-07-08-Announcing-annotations-for-sourcehut.md
new file mode 100644
index 0000000..2e8c6dd
--- /dev/null
+++ b/_posts/2019-07-08-Announcing-annotations-for-sourcehut.md
@@ -0,0 +1,229 @@
+---
+layout: post
+title: Announcing code annotations for SourceHut
+---
+
+Today I'm happy to announce that code annotations are now available for
+[SourceHut](https://sourcehut.org)! <img style="display: inline; height: 1.2rem"
+src="/img/party.png" /> These allow you to decorate your code with arbitrary
+links and markdown. The end result looks something like this:
+
+![](https://sr.ht/w767.png)
+
+<small class="text-muted">
+ <a href="https://sourcehut.org">SourceHut</a> is the "hacker's forge", a
+ 100% open-source platform for hosting Git &amp; Mercurial repos, bug trackers,
+ mailing lists, continuous integration, and more. No JavaScript required!
+</small>
+
+The annotations shown here are sourced from a JSON file which you can generate
+and upload during your CI process. It looks something like this:
+
+```json
+{
+ "98bc0394a2f15171fb113acb5a9286a7454f22e7": [
+ {
+ "type": "markdown",
+ "lineno": 33,
+ "title": "1 reference",
+ "content": "- [../main.c:123](https://example.org)"
+ },
+ {
+ "type": "link",
+ "lineno": 38,
+ "colno": 7,
+ "len": 15,
+ "to": "#L6"
+ },
+ ...
+```
+
+You can probably infer from this that annotations are very powerful. Not only
+can you annotate your code's semantic elements to your heart's content, but you
+can also do exotic things we haven't thought of yet, for every programming
+language you can find a parser for.
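A minimal sketch of producing a file in this shape, using only the fields visible in the sample above. The helper names (`markdown_annotation`, `link_annotation`, `build_annotations`) are hypothetical, and the post does not say what the 40-character hex key identifies or which fields are mandatory, so treat both as assumptions. Uploading the result is left out; the linked docs cover that step.

```python
import json


def markdown_annotation(lineno, title, content):
    """Build a "markdown" annotation attached to a whole line."""
    return {"type": "markdown", "lineno": lineno, "title": title, "content": content}


def link_annotation(lineno, colno, length, to):
    """Build a "link" annotation covering `length` characters starting at `colno`."""
    return {"type": "link", "lineno": lineno, "colno": colno, "len": length, "to": to}


def build_annotations(key, annotations):
    """Serialize the annotations for one file under its hex key (meaning assumed)."""
    return json.dumps({key: annotations}, indent=2)


if __name__ == "__main__":
    print(build_annotations(
        "98bc0394a2f15171fb113acb5a9286a7454f22e7",
        [
            markdown_annotation(33, "1 reference", "- [../main.c:123](https://example.org)"),
            link_annotation(38, 7, 15, "#L6"),
        ],
    ))
```

Keeping the builders separate from serialization means an annotator can collect entries for many files and write them out once at the end of a CI job.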
+
+I'm going to go into some detail on the thought process that went into this
+feature's design and implementation in a moment, but if you're just excited and
+want to try it out, here are a few interesting annotated repos to browse:
+
+- [~sircmpwn/scdoc][scdoc]: man page generator (C)
+- [~sircmpwn/aerc][aerc]: TUI email client (Go)
+- [~mcf/cproc][cproc]: C compiler (C)
+
+[scdoc]: https://git.sr.ht/~sircmpwn/scdoc/tree/master/src/main.c
+[aerc]: https://git.sr.ht/~sircmpwn/aerc/tree/master/widgets/msgviewer.go
+[cproc]: https://git.sr.ht/~mcf/cproc/tree/master/scan.c
+
+And here are the docs for generating your own: [annotations on
+git.sr.ht](https://man.sr.ht/git.sr.ht/annotations.md). Currently annotators are
+available for C and Go, and I intend to write another for Python. For the rest,
+I'll be relying on the community to put together annotators for their favorite
+programming languages, and to help me expand on the ones I've built.
+
+## Design
+
+A lot of design thought went into this feature, but I knew one thing from the
+outset: I wanted to make a generic system that users could use to annotate their
+source code in any manner they chose. My friend Andrew Kelley (of
+[Zig](https://ziglang.org/) fame) once expressed to me his frustration with
+GitHub's refusal to implement syntax highlighting for "small" languages, citing
+a shortage of manpower. It's for this reason that it's important to me that
+SourceHut's open-source platform allows users large and small to volunteer to
+build the perfect integration for their needs - I don't scale alone[^1].
+
+[^1]: For the syntax highlighting problem, by the way, this is accomplished by using Pygments. Improvements to Pygments reach not only SourceHut, but a large community of projects, making the software ecosystem better for everyone.
+
+For the most common use-case - scanning source files and linking references and
+definitions together - the best way to get a head start was unclear. I
+spent a lot of time studying [ctags](http://ctags.sourceforge.net/), for
+example, which supports a huge set of programming languages, but unfortunately
+only finds definitions. I thought about combining this with another approach for
+finding references, but the only generic library with lots of parsers I'm aware
+of is [Pygments](http://pygments.org/), and I didn't necessarily want to bring
+Python into every user's CI process if they weren't already using it. That
+approach would also make it more difficult to customize the annotations for each
+language. Other options I considered were
+[cscope](http://cscope.sourceforge.net/) and
+[gtags](https://www.gnu.org/software/global/), but the former doesn't have many
+programming languages supported (making the tradeoff questionable), and the
+latter just uses Pygments anyway.
+
+So I decided: I'm going to write my own annotators for each language. Or at
+least the languages I use the most:
+
+- C, because I like it but also because
+ [scdoc](https://git.sr.ht/~sircmpwn/scdoc) is the demo repo shown on the
+ [SourceHut marketing page](https://sourcehut.org).
+- Python, because SourceHut is largely written in Python and using it to browse
+ itself would be cool.
+- Go, because parts of SourceHut are written in it but also because I use it a
+ lot for [my own projects](https://git.sr.ht/~sircmpwn/aerc). I also knew that
+ Go had at least *some* first-class support for working with its AST - and boy
+ was I in for a surprise.
+
+With these initial languages decided, let's turn to the implementations.
+
+## Annotating C code
+
+I began with the C annotator, because I knew it would be the most difficult.
+There does not exist any widely available standalone C parsing library to
+provide C programs with access to an AST. There's LLVM, but I have a deeply held
+belief that programming language compiler and introspection tooling should be
+implemented in the language itself. So, I set about writing a C parser from
+scratch.
+
+Or, almost from scratch. There exist two standard POSIX tools for writing
+compilers with: [lex][lex] and [yacc][yacc], which are respectively a lexer
+generator and a compiler compiler. Additionally, there are [pre-fab lex and
+yacc files](http://www.quut.com/c/ANSI-C-grammar-y.html) which *mostly*
+implement the C11 standard grammar. However, C is [not a context-free
+language][context], so additional work was necessary to track typedefs and use
+them to change future tokens emitted by the scanner. A little more work was also
+necessary for keeping track of line and column numbers in the lexer. Overall,
+however, this was relatively easy, and in less than a day's work I had a fully
+functional C11 parser.
+
+[lex]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/lex.html
+[yacc]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html
+[context]: https://eli.thegreenplace.net/2007/11/24/the-context-sensitivity-of-cs-grammar/
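The typedef tracking described above can be illustrated with a toy scanner. This is a sketch in Python, not the lex/yacc code used by the actual annotator: the `scan` function, its token kinds, and the tiny keyword set are all hypothetical, and it ignores scoping and declarators such as function pointers with named parameters.

```python
import re

# A deliberately tiny keyword set; a real C scanner knows many more.
KEYWORDS = {"typedef", "int", "char", "unsigned", "struct", "void"}


def scan(source):
    """Return (kind, text) tokens, reclassifying names once they are typedef'd."""
    typedefs = set()      # names introduced by earlier typedef declarations
    in_typedef = False    # True between `typedef` and the closing `;`
    last_ident = None     # most recent identifier seen in the declaration
    tokens = []
    for text in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
        if text in KEYWORDS:
            kind = "KEYWORD"
            if text == "typedef":
                in_typedef = True
                last_ident = None
        elif text[0].isalpha() or text[0] == "_":
            # The same spelling yields a different token depending on history.
            kind = "TYPE_NAME" if text in typedefs else "IDENTIFIER"
            last_ident = text
        elif text.isdigit():
            kind = "NUMBER"
        else:
            kind = "PUNCT"
            if text == ";" and in_typedef:
                # Simplification: the last identifier before `;` is the new type.
                if last_ident is not None:
                    typedefs.add(last_ident)
                in_typedef = False
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    for token in scan("typedef unsigned int size; size * x;"):
        print(token)
```

With the typedef in place, `size * x;` scans as a type name followed by a declarator, so a parser would read it as a pointer declaration. Without it, the same characters scan as two plain identifiers and read as a multiplication. That dependence on earlier declarations is why the scanner needs state beyond the grammar.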
+
+However, my celebration was short-lived as I started to feed my parser C
+programs from the wild. The GNU C Compiler, GCC, implements many C extensions,
+and their use, while inadvisable, is extremely common. Not least of the
+offenders is glibc, and thus running my parser on any system with glibc headers
+installed would likely immediately run into syntax errors. GCC's extensions are
+not documented in the form of an addendum to the C specification, but rather as
+end-user documentation and a 15 million lines-of-code compiler for you to
+reverse engineer. It took me almost a week of frustration to get a parser which
+worked passably on a large subset of the C programs found in the wild, and I
+imagine I'll be dealing with GNU problems for years to come. Please don't use C
+extensions, folks.
+
+In any case, the result now works fairly well for a lot of programs, and I
+plan to expand it to integrate more nicely with build systems like meson.
+Check out the code here: [annotatec](https://git.sr.ht/~sircmpwn/annotatec). The
+features of the C annotator include:
+
+- Annotating function definitions with a list of files/linenos which call them
+- Linking function calls to the definition of that function
+
+In the future I intend to add support for linking to external symbols as well -
+for example, linking to the POSIX spec for functions specified by POSIX, or to
+the Linux man pages for Linux calls. It would also be pretty cool to support
+linking between related projects, so that wlroots calls in sway can be linked to
+their declarations in the wlroots repo.
+
+## Annotating Go code
+
+The Go annotator was far easier. I started over my morning cup of coffee today
+and I was finished with the basics by lunch. Go has a bunch of support in the
+standard library for parsing and analyzing Go programs - I was very impressed:
+
+- [go/ast](https://golang.org/pkg/go/ast/)
+- [go/scanner](https://golang.org/pkg/go/scanner/)
+- [go/token](https://golang.org/pkg/go/token/)
+- [go/types](https://golang.org/pkg/go/types/)
+
+To support Go 1.12's Go modules, the experimental (but good enough)
+[packages](https://godoc.org/golang.org/x/tools/go/packages) package is available
+as well. All of this is nicely summarized by a lovely document in the [golang
+examples repository](https://github.com/golang/example/tree/master/gotypes). The
+type checker is also available as a library, something which is less common even
+among languages with parsers-as-libraries, and allows for many features which
+would be very difficult without it. Nice work, Go!
+
+The [resulting annotator](https://git.sr.ht/~sircmpwn/annotatego) clocks in at
+just over 250 lines of code - compare that to the C annotator's ~1,300 lines of
+C, lex, and yacc source code. The Go annotator is more featureful, too. It can:
+
+- Link function calls to their definitions, and in reverse
+- Link method calls to their definitions, and in reverse
+- Link variables to their definitions, even in other files
+- Link to godoc for symbols defined in external packages
+
+I expect a lot more to be possible in the future. It might get noisy if you turn
+everything on, so each annotation type is gated behind a command line flag.
+
+## Displaying annotations
+
+Displaying these annotations required a bit more effort than I would have liked,
+but the end result is fairly clean and reusable. Since SourceHut uses Pygments
+for syntax highlighting, I ended up writing a [custom
+Formatter](http://pygments.org/docs/formatterdevelopment/) based on the existing
+Pygments HtmlFormatter. The result is the [AnnotationFormatter][git.sr.ht
+formatter], which splices annotations into the highlighted code. One downside of
+this approach is that it works at the token level - a more sophisticated
+implementation will be necessary for annotations that span more than a single
+token. Annotations are fairly expensive to render, so the rendered HTML is
+stowed in Redis.
+
+[git.sr.ht formatter]: https://git.sr.ht/~sircmpwn/git.sr.ht/tree/master/gitsrht/annotations.py
+
+## The future?
+
+I intend to write a Python annotator soon, and I'll be relying on the community
+to build more. If you're looking for a fun weekend hack and a chance to learn
+more about your favorite programming language, this'd be a great project. The
+format for annotations on SourceHut is also pretty generalizable, so I encourage
+other code forges to reuse it so that our annotators are useful on every code
+hosting platform.
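As a rough idea of how small a first annotator can be, here is a sketch for Python source built on the standard library's `ast` module. It is not the planned SourceHut Python annotator: the `annotate` function is hypothetical, it only links direct calls to functions defined in the same file, and it passes `ast`'s 0-based column offset through unchanged because the post does not say whether `colno` counts from 0 or 1.

```python
import ast
import json


def annotate(source):
    """Return "link" annotations pointing direct calls at same-file definitions."""
    tree = ast.parse(source)
    # Map each function name to the line of its `def` (later defs win on clashes).
    definitions = {
        node.name: node.lineno
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    annotations = []
    for node in ast.walk(tree):
        # Only plain `name(...)` calls; methods and attributes are skipped.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            target = definitions.get(node.func.id)
            if target is not None:
                annotations.append({
                    "type": "link",
                    "lineno": node.func.lineno,
                    "colno": node.func.col_offset,  # 0-based; base is an assumption
                    "len": len(node.func.id),
                    "to": f"#L{target}",
                })
    return sorted(annotations, key=lambda a: (a["lineno"], a["colno"]))


if __name__ == "__main__":
    example = "def helper():\n    return 1\n\nvalue = helper()\n"
    print(json.dumps(annotate(example), indent=2))
```

Resolving names purely by spelling is the weakest part of this sketch. Handling imports, methods, and shadowed names would need real scope analysis.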
+
+builds.sr.ht will also soon grow first-class support for making these annotators
+available to your build process, as well as for making an OAuth token available
+(ideally with a limited set of permissions) to your build environment. Rigging
+up an annotator is a bit involved today ([though the docs
+help](https://man.sr.ht/git.sr.ht/annotations.md)), and streamlining that
+process will be pretty helpful. Additionally, this feature is only available for
+git.sr.ht, though it should generalize to hg.sr.ht fairly easily and I hope
+we'll see it available there soon.
+
+I'm also looking forward to seeing more novel use-cases for annotation. Can we
+indicate code coverage by coloring a gutter alongside each line of code? Can we
+link references to ticket numbers in the comments to your bug tracker? If you
+have any cool ideas, I'm all ears. Here's that list of cool annotated repos to
+browse again, if you made it this far and want to check them out:
+
+- [~sircmpwn/scdoc][scdoc]: man page generator (C)
+- [~sircmpwn/aerc][aerc]: TUI email client (Go)
+- [~mcf/cproc][cproc]: C compiler (C)