| field | value | date |
|---|---|---|
| author | Drew DeVault <sir@cmpwn.com> | 2019-07-08 09:53:39 -0400 |
| committer | Drew DeVault <sir@cmpwn.com> | 2019-07-08 09:53:39 -0400 |
| commit | 514e693389e25b0a806d59306085434f7e20b0de | |
| tree | 22b8c6c73e62f3c4be809e7016069f0a5d14cfb6 | |
| parent | a42adc3d5138d9841476c84dbea144fe80dbddec | |
Add annotations announcement
-rw-r--r-- | _posts/2019-07-08-Announcing-annotations-for-sourcehut.md | 229 |
1 files changed, 229 insertions, 0 deletions
diff --git a/_posts/2019-07-08-Announcing-annotations-for-sourcehut.md b/_posts/2019-07-08-Announcing-annotations-for-sourcehut.md
new file mode 100644
index 0000000..2e8c6dd
--- /dev/null
+++ b/_posts/2019-07-08-Announcing-annotations-for-sourcehut.md
@@ -0,0 +1,229 @@

---
layout: post
title: Announcing code annotations for SourceHut
---

Today I'm happy to announce that code annotations are now available for
[SourceHut](https://sourcehut.org)! <img style="display: inline; height: 1.2rem"
src="/img/party.png" /> These allow you to decorate your code with arbitrary
links and markdown. The end result looks something like this:

![Annotated source code on SourceHut](https://sr.ht/w767.png)

<small class="text-muted">
  <a href="https://sourcehut.org">SourceHut</a> is the "hacker's forge", a
  100% open-source platform for hosting Git & Mercurial repos, bug trackers,
  mailing lists, continuous integration, and more. No JavaScript required!
</small>

The annotations shown here are sourced from a JSON file which you can generate
and upload during your CI process. It looks something like this:

```json
{
  "98bc0394a2f15171fb113acb5a9286a7454f22e7": [
    {
      "type": "markdown",
      "lineno": 33,
      "title": "1 reference",
      "content": "- [../main.c:123](https://example.org)"
    },
    {
      "type": "link",
      "lineno": 38,
      "colno": 7,
      "len": 15,
      "to": "#L6"
    },
    ...
```

You can probably infer from this that annotations are very powerful. Not only
can you annotate your code's semantic elements to your heart's content, but you
can also do exotic things we haven't thought of yet, in any programming
language you can find a parser for.
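
For a sense of how an annotator might emit this format, here is a minimal Go sketch using `encoding/json`. The field names (`type`, `lineno`, `colno`, `len`, `title`, `content`, `to`) come from the example above; the `Annotation` type and `encodeAnnotations` helper are hypothetical, and treating the top-level hash as the key identifying the annotated file, as well as omitting zero-valued optional fields, are assumptions rather than the documented schema (the git.sr.ht annotation docs are the authority there).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation mirrors the fields used in the example JSON. Which fields
// apply depends on Type ("markdown" uses Title/Content, "link" uses
// Colno/Len/To); zero-valued optional fields are left out of the output.
// Hypothetical type for illustration, not an official schema.
type Annotation struct {
	Type    string `json:"type"`
	Lineno  int    `json:"lineno"`
	Colno   int    `json:"colno,omitempty"`
	Len     int    `json:"len,omitempty"`
	Title   string `json:"title,omitempty"`
	Content string `json:"content,omitempty"`
	To      string `json:"to,omitempty"`
}

// encodeAnnotations serializes a map from the top-level hash key
// (assumed here to identify the annotated file) to its annotations.
// encoding/json writes map keys in sorted order.
func encodeAnnotations(byKey map[string][]Annotation) ([]byte, error) {
	return json.MarshalIndent(byKey, "", "  ")
}

func main() {
	anns := map[string][]Annotation{
		"98bc0394a2f15171fb113acb5a9286a7454f22e7": {
			{Type: "markdown", Lineno: 33, Title: "1 reference",
				Content: "- [../main.c:123](https://example.org)"},
			{Type: "link", Lineno: 38, Colno: 7, Len: 15, To: "#L6"},
		},
	}
	out, err := encodeAnnotations(anns)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because map keys come out sorted and empty optional fields are dropped, the output is stable from run to run, which is handy if you diff the generated file in CI.
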

I'm going to go into some detail on the thought process that went into this
feature's design and implementation in a moment, but if you're just excited and
want to try it out, here are a few interesting annotated repos to browse:

- [~sircmpwn/scdoc][scdoc]: man page generator (C)
- [~sircmpwn/aerc][aerc]: TUI email client (Go)
- [~mcf/cproc][cproc]: C compiler (C)

[scdoc]: https://git.sr.ht/~sircmpwn/scdoc/tree/master/src/main.c
[aerc]: https://git.sr.ht/~sircmpwn/aerc/tree/master/widgets/msgviewer.go
[cproc]: https://git.sr.ht/~mcf/cproc/tree/master/scan.c

And here are the docs for generating your own: [annotations on
git.sr.ht](https://man.sr.ht/git.sr.ht/annotations.md). Currently annotators are
available for C and Go, and I intend to write another for Python. For the rest,
I'll be relying on the community to put together annotators for their favorite
programming languages, and to help me expand on the ones I've built.

## Design

A lot of design thought went into this feature, but I knew one thing from the
outset: I wanted to make a generic system that users could use to annotate their
source code in any manner they chose. My friend Andrew Kelley (of
[Zig](https://ziglang.org/) fame) once expressed to me his frustration with
GitHub's refusal to implement syntax highlighting for "small" languages, citing
a shortage of manpower. This is why it's important to me that SourceHut's
open-source platform lets users, large and small, volunteer to build the
perfect integration for their needs - I don't scale alone[^1].

[^1]: As for the syntax highlighting problem, SourceHut solves it by using Pygments. Improvements to Pygments reach not only SourceHut, but a large community of projects, making the software ecosystem better for everyone.

For a head start on the most common use-case - scanning source files and
linking references and definitions together - the best approach was unclear.
I spent a lot of time studying [ctags](http://ctags.sourceforge.net/), for
example, which supports a huge set of programming languages, but unfortunately
only finds definitions. I thought about combining this with another approach for
finding references, but the only generic library with lots of parsers I'm aware
of is [Pygments](http://pygments.org/), and I didn't want to bring Python into
every user's CI process if they weren't already using it. That approach would
also make it more difficult to customize the annotations for each language.
Other options I considered were [cscope](http://cscope.sourceforge.net/) and
[gtags](https://www.gnu.org/software/global/), but the former supports few
programming languages (making the tradeoff questionable), and the latter relies
on Pygments for most languages anyway.

So I decided: I'm going to write my own annotators for each language. Or at
least the languages I use the most:

- C, because I like it but also because
  [scdoc](https://git.sr.ht/~sircmpwn/scdoc) is the demo repo shown on the
  [SourceHut marketing page](https://sourcehut.org).
- Python, because SourceHut is largely written in Python and using it to browse
  itself would be cool.
- Go, because parts of SourceHut are written in it but also because I use it a
  lot for [my own projects](https://git.sr.ht/~sircmpwn/aerc). I also knew that
  Go had at least *some* first-class support for working with its AST - and boy
  was I in for a surprise.

With these initial languages decided, let's turn to the implementations.

## Annotating C code

I began with the C annotator, because I knew it would be the most difficult.
There is no widely available standalone library that gives C programs access to
a C AST. There's LLVM, but I have a deeply held belief that a programming
language's compiler and introspection tooling should be implemented in the
language itself.
So I set about writing a C parser from scratch.

Or, almost from scratch. There are two standard POSIX tools for writing
compilers: [lex][lex] and [yacc][yacc], which are respectively a lexer
generator and a parser generator (a "compiler compiler"). Additionally, there
are [pre-fab lex and yacc files](http://www.quut.com/c/ANSI-C-grammar-y.html)
which *mostly* implement the C11 standard grammar. However, C is [not a
context-free language][context], so additional work was necessary to track
typedefs and use them to change future tokens emitted by the scanner. A little
more work was also necessary for keeping track of line and column numbers in
the lexer. Overall, however, this was relatively easy, and in less than a day's
work I had a fully functional C11 parser.

[lex]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/lex.html
[yacc]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html
[context]: https://eli.thegreenplace.net/2007/11/24/the-context-sensitivity-of-cs-grammar/

However, my celebration was short-lived once I started to feed my parser C
programs from the wild. The GNU Compiler Collection, GCC, implements many C
extensions, and their use, while inadvisable, is extremely common. Not least of
the offenders is glibc, so running my parser on any system with glibc headers
installed would likely hit syntax errors immediately. GCC's extensions are not
documented as an addendum to the C specification, but rather as end-user
documentation and a 15-million-line compiler for you to reverse engineer. It
took me almost a week of frustration to get a parser which worked passably on a
large subset of the C programs found in the wild, and I imagine I'll be dealing
with GNU problems for years to come. Please don't use C extensions, folks.

In any case, the result now works fairly well for a lot of programs, and I plan
to expand it to integrate more nicely with build systems like meson.
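
The typedef tracking mentioned above is widely known as the "lexer hack": `T * x;` is a declaration if `T` names a type and a multiplication otherwise, so the scanner has to consult a table of typedef names that the parser fills in as it goes. annotatec does this in C with lex and yacc; purely to illustrate the idea in one language across these examples, here is a toy Go model of that shared table. The names (`typedefTable`, `declare`, `classify`) and the simplified scoping are assumptions for illustration, not annotatec's code.

```go
package main

import "fmt"

// TokenKind tells the parser whether an identifier names a type.
type TokenKind int

const (
	Identifier  TokenKind = iota // ordinary name: variable, function, ...
	TypedefName                  // name declared earlier by a typedef
)

// typedefTable is a stack of scopes shared by the parser and scanner.
// The parser would push/pop scopes at block boundaries and call declare
// whenever it reduces a declaration.
type typedefTable struct {
	scopes []map[string]bool
}

func newTypedefTable() *typedefTable {
	return &typedefTable{scopes: []map[string]bool{{}}} // file scope
}

func (t *typedefTable) pushScope() {
	t.scopes = append(t.scopes, map[string]bool{})
}

func (t *typedefTable) popScope() {
	if len(t.scopes) > 1 { // never pop the file scope
		t.scopes = t.scopes[:len(t.scopes)-1]
	}
}

// declare records name in the innermost scope. isType is true for
// typedefs and false for ordinary declarations, which may shadow a type.
func (t *typedefTable) declare(name string, isType bool) {
	t.scopes[len(t.scopes)-1][name] = isType
}

// classify is what the scanner would call for each identifier it emits:
// the innermost declaration of the name decides the token kind.
func (t *typedefTable) classify(name string) TokenKind {
	for i := len(t.scopes) - 1; i >= 0; i-- {
		if isType, ok := t.scopes[i][name]; ok {
			if isType {
				return TypedefName
			}
			return Identifier
		}
	}
	return Identifier // undeclared names are plain identifiers
}

func main() {
	tab := newTypedefTable()
	tab.declare("size_t", true)                        // typedef unsigned long size_t;
	fmt.Println(tab.classify("size_t") == TypedefName) // true
	tab.pushScope()                                    // {
	tab.declare("size_t", false)                       // int size_t; shadows the type
	fmt.Println(tab.classify("size_t") == TypedefName) // false
	tab.popScope()                                     // }
	fmt.Println(tab.classify("size_t") == TypedefName) // true
}
```

In a real lex/yacc setup, the grammar actions would update such a table and the lex rule for identifiers would query it before choosing which token to return.
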

Check out the code here: [annotatec](https://git.sr.ht/~sircmpwn/annotatec). The
features of the C annotator include:

- Annotating function definitions with a list of files/linenos which call them
- Linking function calls to the definition of that function

In the future I intend to add support for linking to external symbols as well -
for example, linking to the POSIX spec for functions specified by POSIX, or to
the Linux man pages for Linux calls. It would also be pretty cool to support
linking between related projects, so that wlroots calls in sway can be linked to
their declarations in the wlroots repo.

## Annotating Go code

The Go annotator was far easier. I started it over my morning cup of coffee
today and had the basics finished by lunch. Go's standard library has a bunch
of support for parsing and analyzing Go programs - I was very impressed:

- [go/ast](https://golang.org/pkg/go/ast/)
- [go/scanner](https://golang.org/pkg/go/scanner/)
- [go/token](https://golang.org/pkg/go/token/)
- [go/types](https://golang.org/pkg/go/types/)

To support Go modules (as of Go 1.12), the experimental (but good enough)
[packages](https://godoc.org/golang.org/x/tools/go/packages) package is
available as well. All of this is nicely summarized by a lovely document in the
[golang examples repository](https://github.com/golang/example/tree/master/gotypes).
The type checker is also available as a library, something which is less common
even among languages with parsers-as-libraries, and it enables many features
which would be very difficult without it. Nice work, Go!

The [resulting annotator](https://git.sr.ht/~sircmpwn/annotatego) clocks in at
just over 250 lines of code - compare that to the C annotator's ~1,300 lines of
C, lex, and yacc source code.
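
The standard library does the heavy lifting here. As a rough illustration (a sketch, not annotatego's actual code), this uses `go/parser` and `go/ast` to link calls of top-level functions to their definitions within a single file. Matching callees by bare name, skipping methods, and staying inside one file are simplifying assumptions; a real annotator would resolve identifiers with `go/types` (loaded via `golang.org/x/tools/go/packages`) to handle shadowing, methods, and other files. The `ref` type and `linkCalls` function are hypothetical names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// ref is a hypothetical record of one call site pointing at a definition.
type ref struct {
	Name    string
	CallAt  token.Position // position of the called identifier
	DefLine int            // line of the matching func declaration
}

// linkCalls parses src and links calls of the form f(...) to a top-level
// func f declared in the same file. Methods and shadowing are ignored.
func linkCalls(filename, src string) ([]ref, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	// Pass 1: collect definitions first, so calls that appear before
	// their definition in the file are still linked.
	defs := map[string]int{}
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok && fn.Recv == nil {
			defs[fn.Name.Name] = fset.Position(fn.Name.Pos()).Line
		}
	}
	// Pass 2: visit every call expression whose callee is a bare name.
	var refs []ref
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		if id, ok := call.Fun.(*ast.Ident); ok {
			if line, found := defs[id.Name]; found {
				refs = append(refs, ref{Name: id.Name, CallAt: fset.Position(id.Pos()), DefLine: line})
			}
		}
		return true
	})
	return refs, nil
}

func main() {
	src := `package demo

func helper() int { return 1 }

func run() int {
	return helper() + helper()
}
`
	refs, err := linkCalls("demo.go", src)
	if err != nil {
		panic(err)
	}
	for _, r := range refs {
		fmt.Printf("%s:%d:%d -> %s defined on line %d\n",
			r.CallAt.Filename, r.CallAt.Line, r.CallAt.Column, r.Name, r.DefLine)
	}
}
```

On the embedded demo source this reports two links, one for each call to `helper` on line 6, both pointing at its definition on line 3. Reversing the map (definition to call sites) gives the "referenced by" direction.
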

The Go annotator is more featureful, too. It can:

- Link function calls to their definitions, and in reverse
- Link method calls to their definitions, and in reverse
- Link variables to their definitions, even in other files
- Link to godoc for symbols defined in external packages

I expect a lot more to be possible in the future. It might get noisy if you turn
everything on, so each annotation type is gated behind a command line flag.

## Displaying annotations

Displaying these annotations required a bit more effort than I would have liked,
but the end result is fairly clean and reusable. Since SourceHut uses Pygments
for syntax highlighting, I ended up writing a [custom
Formatter](http://pygments.org/docs/formatterdevelopment/) based on the existing
Pygments HtmlFormatter. The result is the [AnnotationFormatter][git.sr.ht formatter],
which splices annotations into the highlighted code. One downside of this
approach is that it works at the token level - a more sophisticated
implementation will be necessary for annotations that span more than a single
token. Annotations are fairly expensive to render, so the rendered HTML is
cached in Redis.

[git.sr.ht formatter]: https://git.sr.ht/~sircmpwn/git.sr.ht/tree/master/gitsrht/annotations.py

## The future?

I intend to write a Python annotator soon, and I'll be relying on the community
to build more. If you're looking for a fun weekend hack and a chance to learn
more about your favorite programming language, this'd be a great project. The
format for annotations on SourceHut is also pretty generalizable, and I
encourage other code forges to adopt it, so that our annotators are useful on
every code hosting platform.

builds.sr.ht will also soon grow first-class support for making these annotators
available to your build process, as well as for making an OAuth token available
(ideally with a limited set of permissions) to your build environment.
Rigging up an annotator is a bit involved today ([though the docs
help](https://man.sr.ht/git.sr.ht/annotations.md)), and streamlining that
process will be pretty helpful. Additionally, this feature is only available for
git.sr.ht, though it should generalize to hg.sr.ht fairly easily and I hope
we'll see it available there soon.

I'm also looking forward to seeing more novel use-cases for annotation. Can we
indicate code coverage by coloring a gutter alongside each line of code? Can we
link ticket numbers mentioned in comments to your bug tracker? If you have any
cool ideas, I'm all ears. Here's that list of cool annotated repos to browse
again, if you made it this far and want to check them out:

- [~sircmpwn/scdoc][scdoc]: man page generator (C)
- [~sircmpwn/aerc][aerc]: TUI email client (Go)
- [~mcf/cproc][cproc]: C compiler (C)