Epinowcast command line interface

@samabbott and I had a productive discussion this morning about the prospects for an epinowcast command line interface (CLI).

In brief, the idea is that for routine or pipeline-style work, epinowcast could be invoked directly from the command line, specifying at minimum the data and output locations, and optionally various run configuration options. It should be installable via the standard R package installation process (possibly plus one step to tell the shell where to find the scripts).

So rather than users writing their own invocation script (for use in, e.g., an HPC environment), they could do something like

$ enw mydata.csv -c myconfig.json # ... or flags for each arg, etc

… which would write a results file on completion. This would obviously be a particular, restricted way of running epinowcast, but the main limitation seems to be the lack of access to the preprocessing functions.
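To make the proposed `enw` invocation concrete, here is a minimal sketch of how a POSIX shell wrapper might parse its arguments. Everything in it is hypothetical: the `enw_parse_args` function, the `-o` output flag and the default output name are assumptions for illustration, not an existing epinowcast interface.

```sh
#!/bin/sh
# Hypothetical sketch of argument handling for the proposed `enw` command.
# The -o flag and the default output name are illustrative assumptions.

enw_parse_args() {
  ENW_DATA=""
  ENW_CONFIG=""
  ENW_OUTPUT="enw-results.rds"
  # Allow the data file to come first (`enw mydata.csv -c ...`); getopts
  # stops at the first non-option, so consume it before parsing flags.
  case "${1:-}" in
    "" | -*) ;;
    *) ENW_DATA=$1; shift ;;
  esac
  OPTIND=1
  while getopts "c:o:" opt; do
    case "$opt" in
      c) ENW_CONFIG=$OPTARG ;;
      o) ENW_OUTPUT=$OPTARG ;;
      *) echo "usage: enw data.csv [-c config.json] [-o output]" >&2; return 2 ;;
    esac
  done
  if [ -z "$ENW_DATA" ]; then
    echo "enw: no data file given" >&2
    return 1
  fi
}
```

A real wrapper would then hand these values to an R entry point (e.g. via `Rscript`). Taking the data file as the first positional argument keeps the one-liner readable, while `getopts` handles the flags.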

There could plausibly also be a separate, complementary CLI for preprocessing, with its own mini-language:

$ enwpre raw.csv -c preconfig.json # yields the processed file name

I can imagine that might not be enough, so users might still have to do some gnarly preprocessing of their own. If we provided a generator for template scripts that 1) take arguments, 2) save a file, and 3) yield that file path, users would only have to fill in the internals, after which they could run:

$ enwcust custom.R superraw.csv -c ugly.json

(which might also check for things like a library(enwcli) call and the use of enwcli::emit() and enwcli::store(), functions that ensure standardized outputs)

Then a workflow might look like:

$ enwcust custom.R superraw.csv -c ugly.json | enwpre -c preconfig.json | enw -c myconfig.json
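One way the pipeline above could work is a simple convention: each stage prints the path of the file it wrote on stdout, and a stage given no input argument reads that path from stdin. The sketch below assumes that convention; `fake_enwpre` and `fake_enw` are hypothetical stand-ins for `enwpre` and `enw` (flag handling is omitted), and the real work would happen in R.

```sh
#!/bin/sh
# Sketch of a path-passing convention for chaining the proposed tools:
# each stage prints the path of the file it wrote on stdout, and a stage
# called without an input argument reads that path from stdin.
# fake_enwpre / fake_enw are hypothetical stand-ins for enwpre / enw.

# Print the input path: the first argument if given, else one line of stdin.
resolve_input() {
  local path
  if [ -n "${1:-}" ]; then
    path=$1
  else
    IFS= read -r path || true
  fi
  if [ -z "$path" ]; then
    echo "no input path (argument or stdin)" >&2
    return 1
  fi
  printf '%s\n' "$path"
}

# Stand-in for enwpre: copies the input to <name>.pre.csv, prints that path.
fake_enwpre() {
  local in out
  in=$(resolve_input "${1:-}") || return 1
  out="${in%.csv}.pre.csv"
  cp "$in" "$out" && printf '%s\n' "$out"
}

# Stand-in for enw: writes the input's line count to <name>.results.txt.
fake_enw() {
  local in out
  in=$(resolve_input "${1:-}") || return 1
  out="${in%.csv}.results.txt"
  wc -l < "$in" > "$out" && printf '%s\n' "$out"
}
```

Passing paths rather than file contents keeps every intermediate result on disk, so it can be inspected or reused; `enwcust` would follow the same convention.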

Thoughts?

Regarding installing command line tools via the standard R package distribution system, the most obvious option to me seems to be:

  • provide the CLI tools via inst/...
  • have the package export a small set of R functions, one of which adds the inst location to the user's PATH on request (to comply with CRAN requirements)
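As a sketch of what the PATH step might do once the user has agreed, assume the scripts live in a directory such as `inst/cli` (installed as `cli/`, so an R helper could locate it with `system.file("cli", package = "enwcli")`). The `register_cli_dir` name, the `cli` directory and the use of `~/.profile` are all hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch: idempotently register a CLI directory on the user's
# PATH by appending an export line to a shell profile. Intended to run only
# after explicit user consent (e.g. triggered from an exported R helper).

register_cli_dir() {
  local dir profile line
  dir=$1
  profile=${2:-"$HOME/.profile"}
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  line="export PATH=\"$dir:\$PATH\""
  # Skip if this exact line is already present, so repeated calls are harmless.
  if [ -f "$profile" ] && grep -qxF "$line" "$profile"; then
    return 0
  fi
  printf '%s\n' "$line" >> "$profile"
}
```

Making the step idempotent means users can safely rerun it (e.g. after reinstalling the package) without accumulating duplicate PATH entries.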

The alternatives might be:

  • skip CRAN, so no user interaction is needed for the install (very mild pro, lots of cons)
  • some other installation / build system, reflecting the usual way people get command line tools (e.g. download a tar.gz, unpack it, run its install.sh)
  • some synthesis à la "Create a CLI for R with npm" by Colin Fay: configuration also in inst/, then distribution via npm (from the R side this looks like: install the package, then run the PATH modification script)

Also, whatever implementation approach is taken, the test of the tool would be implementing the vignettes as a (small) collection of artefacts, built with a minimal amount of non-enwcli code, plus a readable one-liner to run each vignette from the command line.

While debugging the R version issues, I have found rig to be an excellent CLI tool. I think we could build on it to also handle downloading CmdStan and to provide consistent paths for command line calls?

I think this is a great idea and likely of great use to people running at scale. We would also want to think about how this interacts with being run in a Docker container, as that might end up being a very common route for users concerned about reproducibility.

I like the idea of having three related but distinct tools, and of making reproducing the vignettes the goal, as that should help put guardrails on development.

Of the install options, this one seems the most natural to me? Yes, it adds another kind of tool, but I think the positives outweigh the negatives, especially as most users of this kind of thing are going to be on the more advanced end.

I worry slightly about putting the whole configuration in a JSON file, as it might be hard for people to specify and maintain, but I definitely agree that we want to minimise duplicating the main package's arguments in the CLI tool, so this does seem like a good way to go despite that.

We definitely want to be able to pass a config file (otherwise there are too many arguments), but I also think JSON-only is a bad approach, and agree we’re going to have to think carefully about how the arg-flag and JSON approaches are kept in sync.

If we use a standard parse-into-variables library (getopt, optparse, etc.), then validating, loading, and attaching a JSON file should create exactly the same set of variables. The solution would likely take the form of translating the option-parsing library's spec into a JSON validator spec?
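As a sketch of the spec-translation idea, assuming a POSIX shell and `awk`: a single option spec drives both flag parsing and a JSON Schema that config files could be validated against, so the two routes cannot drift apart. The option names, defaults, and the `spec_to_json_schema` / `resolve_flags` helpers are hypothetical; in R, the same idea would apply to an optparse spec.

```sh
#!/bin/sh
# Sketch: one option spec ("name type default" per line) drives both the
# flag parser and a JSON Schema for config files.
# Option names and defaults below are hypothetical examples.

ENW_SPEC='output string results.rds
chains integer 4
iter_sampling integer 1000'

# Emit a JSON Schema object describing a config file built from the spec.
spec_to_json_schema() {
  printf '%s\n' "$1" | awk '
    BEGIN { printf "{\"type\":\"object\",\"additionalProperties\":false,\"properties\":{" }
    NF == 3 {
      if (n++) printf ","
      def = ($2 == "string") ? "\"" $3 "\"" : $3
      printf "\"%s\":{\"type\":\"%s\",\"default\":%s}", $1, $2, def
    }
    END { print "}}" }'
}

# Apply "--name value" flags on top of the spec defaults; print name=value lines.
resolve_flags() {
  local spec name value resolved
  spec=$1; shift
  resolved=$(printf '%s\n' "$spec" | awk 'NF == 3 { print $1 "=" $3 }')
  while [ $# -gt 0 ]; do
    name=${1#--}
    if [ "$name" = "$1" ] || [ $# -lt 2 ]; then
      echo "expected --name value, got: $1" >&2; return 2
    fi
    # Reject flags that are not declared in the spec.
    if ! printf '%s\n' "$spec" | awk -v n="$name" '$1 == n { f = 1 } END { exit (f ? 0 : 1) }'; then
      echo "unknown option: --$name" >&2; return 2
    fi
    value=$2; shift 2
    resolved=$(printf '%s\n' "$resolved" | awk -F= -v n="$name" -v v="$value" '$1 == n { $0 = n "=" v } { print }')
  done
  printf '%s\n' "$resolved"
}
```

The generated schema could then be checked with any JSON Schema validator (in R, e.g. the jsonvalidate package), and a config loader would only need to fill the same `name=value` set that `resolve_flags` produces.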