Excmd.js
Work in progress: Writing a replacement parser for Tridactyl's command-line and rcfiles, in the spirit of Vi's command-line / Vimscript / ex.
Documentation: https://excmd.js.org/
Notable entry points for spelunking:
- Complete JavaScript module index: https://excmd.js.org/globals.html
- Complete OCaml module index: https://excmd.js.org/excmd/Excmd/index.html
- A note on the resolution of ambiguous shellwords: https://excmd.js.org/excmd/Excmd/Expression/index.html#reso
- Parsing entry-point functions: https://excmd.js.org/excmd/Excmd/Parser/index.html#parsing-entry-points
Building & contributing
Under development. Probably.
For now (read: until I, or somebody else, publishes a packaged copy of Menhir to npm!), a local OCaml development-environment, matching the version of BuckleScript's fork of OCaml, is required.
Here's a quick, up-to-date bootstrapping process for ~spring 2021:
git clone https://github.com/ELLIOTTCABLE/excmd.js.git
cd excmd.js
sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)
# (... or install opam using your platform's package-manager)
opam switch create ./packages/bs-excmd --deps-only --locked --ignore-constraints-on=ocaml
eval $(opam env --switch=./packages/bs-excmd --set-switch)
# Finally, install JavaScript dependencies, BuckleScript, and kick off the initial build
npm run bootstrap
Thereafter, when returning to the project, and before running `lerna run build` or any other OCaml-dependent commands, you have to remember to run `eval $(opam env)` to add the OCaml binaries to your shell's `$PATH`:
# After e.g. `cd ~/Code/excmd.js`
eval $(opam env --switch=./packages/bs-excmd --set-switch)
Finally, after all of the above, you can let Lerna kick off the rest of the build, orchestrated by `bsb`, `tsc`, and Ninja, variously:
lerna run build
Directory structure
There are two packages comprising this project, to be published separately to npm:
- `packages/bs-excmd/`: the lexer and parser themselves; written in OCaml using Sedlex and Menhir, and compiled to JavaScript using the ReScript compiler (née BuckleScript), published to npm as `bs-excmd` ...
- `packages/excmd/`: ... and a thin TypeScript wrapper providing idiomatic JavaScript interfaces to the parser modules, published to npm as the primary package, `excmd`.
Lerna, a JavaScript-ecosystem monorepo/multi-package management tool, orchestrates the building of these two interdependent subpackages.
Usage from OCaml
If you're hacking on this (or writing something other than JavaScript), it's useful to know that the project has a hybrid build-system, and can be built from the OCaml side (via Dune) or the JavaScript side (via ReScript.)
Handily, Dune supports dynamically building an OCaml interactive toplevel with any/all OCaml modules included:
cd packages/bs-excmd/
dune utop src
For my own expediency when iterating (sry not sry), the actual parser tests (as opposed to tests for the JavaScript interface, the lexing, or the string-handling minutiae) are also written in native OCaml, and evaluated by Dune:
cd packages/bs-excmd/
dune runtest
# After making changes, and verifying that the output is as-expected,
dune promote
Finally, the test-executable can interrogate arbitrary input, dumping the result in the same JSON-format as the tests use:
cd packages/bs-excmd/
dune exec test/parser_test.exe expression "hello"
dune exec test/parser_test.exe script "hello; there; friend"
Debugging tips
- To debug the parser, these Menhir flags are particularly useful: `--log-automaton 1 --log-code 1 --log-grammar 1 --trace`. I've added those to an alternative `"generator"` in `packages/bs-excmd/bsconfig.json`; simply swap the `"name": "menhir"` generator belonging to the `"parserAutomaton.ml"` edge with the `"menhir-with-logging"` one:

      --- i/packages/bs-excmd/bsconfig.json
      +++ w/packages/bs-excmd/bsconfig.json
      @@ -9,7 +9,7 @@
                  { "name": "prepend-uax31", "edge": ["lexer.ml", ":", "uAX31.ml", "lexer.body.ml"] },
                  { "name": "menhir-tokens", "edge": ["tokens.ml", "tokens.mli", ":", "parserAutomaton.mly", "tokens.tail.ml", "tokens.tail.mli"] },
                  { "name": "menhir-lib", "edge": ["menhirLib.ml", "menhirLib.mli", ":", "parserAutomaton.mly"] },
      -           { "name": "menhir", "edge": ["parserAutomaton.ml", ":", "parserAutomaton.mly", "parserUtils.mly", "tokens.ml"] }
      +           { "name": "menhir-with-logging", "edge": ["parserAutomaton.ml", ":", "parserAutomaton.mly", "parserUtils.mly", "tokens.ml"] }
                ]
              }
            ],

  ... then re-build all libraries with `lerna run prepare`.
- To debug OCaml implementation-code, it's useful to know that ReScript has a debugging mode that vastly improves the inspector output for data-structures. One thing those docs do not mention, however, is that you only need to add `[%%debugger.chrome]` to a single ML file in the current code-path; this is useful information when debugging a JavaScript interface like ours. (i.e. add the `[%%debugger.chrome]` expression to `src/parser.ml`, even if you're debugging something like `src/interface.ts` that imports `parser.bs.js`.)
- To debug OCaml implementation-code, it's useful to know that ReScript has a debugging mode that vastly improves the inspector output for data-structures. This can be enabled by passing `-bs-g` to `bsc`, most easily by adding it to the `"bsc-flags"` in `bsconfig.json`:

      --- i/packages/bs-excmd/bsconfig.json
      +++ w/packages/bs-excmd/bsconfig.json
      @@ -95,4 +95,5 @@
          "suffix": ".bs.js",
          "bsc-flags": [
      +      "-bs-g",
             "-bs-super-errors",
             "-bs-no-version-header",
Internationalization concerns w.r.t. lexing
I'm going to be broadly following Unicode 11's UAX #31 “Unicode Identifier And Pattern Syntax”; speaking formally, this implementation is planned to conform to requirements ...
- R1, Default Identifiers: Identifiers begin with `XID_Start`, continue with `XID_Continue += [U+200C-U+200D]` (subject to the restrictions below), allowing for medial (non-repeated, non-terminating) instances of the following characters:
  - `U+002D`: `-` HYPHEN-MINUS,
  - `U+002E`: `.` FULL STOP,
  - `U+00B7`: `·` MIDDLE DOT,

  ... and excluding characters belonging to a script listed in “Candidate Characters for Exclusion from Identifiers” (UAX 31, Table 4).
- R1a, Restricted Format Characters: `U+200C` and `U+200D`, that is, the zero-width non-joiner and zero-width joiner, shall only be parsed in the contexts necessary for handling the appropriate Farsi, Malayalam, etc. phrases: when breaking a cursive connection (context A1), and in a conjunct (context B.) (NYI!)
- R3, `Pattern_White_Space` and `Pattern_Syntax` Characters: Arguments and flags (unique to Tridactyl, and not occurring in the original Vimscript) are separated with whitespace, which is exactly the set of characters with the Unicode `Pattern_White_Space` property.
- R4, Equivalent Normalized Identifiers: The parser yields both display-form (what you typed) and normalized-form (what you meant) output. Where possible, your input should be displayed as typed, but normalized before it is used in comparisons and references.
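For a rough feel of what R1, R3 and R4 mean in practice, here's a minimal TypeScript sketch. It is not excmd's lexer (that lives in OCaml, via Sedlex), and every name in it (`isIdentifier`, `splitWords`, `bothForms`) is hypothetical, made up for illustration. It also assumes NFC as the normalization form, which this README doesn't actually specify, and it skips the Table 4 script exclusions and the R1a context checks entirely.

```typescript
// Sketch only: approximates the rules above with Unicode property escapes.
// Hypothetical helpers, not part of excmd's API.

// XID_Continue plus U+200C / U+200D. U+00B7 is itself XID_Continue, so it is
// excluded here and only re-admitted as a medial character below.
const CONTINUE = String.raw`(?:(?!\u00B7)[\p{XID_Continue}\u200C\u200D])`

// Medial characters: HYPHEN-MINUS, FULL STOP, MIDDLE DOT.
const MEDIAL = String.raw`[-.\u00B7]`

// A medial character must be followed by at least one continue character,
// which is what makes it non-repeated and non-terminating.
const IDENTIFIER = new RegExp(
   String.raw`^\p{XID_Start}${CONTINUE}*(?:${MEDIAL}${CONTINUE}+)*$`,
   "u",
)

// R1 (approximate): does this whole word look like an identifier?
function isIdentifier(word: string): boolean {
   return IDENTIFIER.test(word)
}

// R3: split on runs of Pattern_White_Space, dropping empty leading/trailing words.
const WHITESPACE = new RegExp(String.raw`\p{Pattern_White_Space}+`, "u")

function splitWords(input: string): string[] {
   return input.split(WHITESPACE).filter((word) => word.length > 0)
}

// R4: keep what was typed for display, and a normalized copy for comparisons.
// ASSUMPTION: NFC; the real implementation may use a different form.
function bothForms(input: string): { display: string; normalized: string } {
   return { display: input, normalized: input.normalize("NFC") }
}
```

With these, `isIdentifier("foo-bar.baz")` holds while `isIdentifier("foo--bar")` and `isIdentifier("foo-")` do not; and `"a\u00A0b"` stays a single word under `splitWords`, since NO-BREAK SPACE isn't `Pattern_White_Space`.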