Improving Nashorn shebang Usage

v0.2, Michael Haupt, 2015-12-04

Problem Statement

In the currently available Nashorn releases, both in JDK 8u60 and in JDK 9, Nashorn's handling of script arguments breaks down when JavaScript files use shebang style. This stems from the way the different OS platforms pass command line arguments to Java, and consequently to Nashorn.

Shebang

To recapitulate, shebang style allows a script file to specify its interpreter and to pass one optional argument to it. The remainder of the command line is passed to the script. The optional argument should, according to the specification, not contain any white space. Essentially, this means that the rest of the shebang line following the interpreter path should be treated as one single argument.

Nashorn's Problems with shebang Handling

Consider the following script file, script.js:

#!/usr/bin/jjs --
print(arguments)

The shebang line uses --, the command line argument that separates script files from script arguments, with the intent that all arguments following the script on the command line are passed to the script as its own arguments. The expected behaviour of the script is as follows:

$ ./script.js hello world
hello,world

However, this expectation will not be met. Instead, executing the script in the above fashion will drop the user into the jjs REPL. The script itself is never run, and the entire command line, including the script path, is passed to the REPL as arguments:

$ ./script.js hello world
jjs> print(arguments)
./script.js,hello,world

The reason lies in what arguments are passed to Java from the shell, and thereby to Nashorn. The args array in Nashorn's Shell.main method will contain the following:

[--,./script.js,hello,world]

Because it is added by way of shebang handling, the argument separator comes first. Consequently, all subsequent arguments, including the path of the script that Nashorn is supposed to run, are treated as script arguments; the script is never recognised as a file to execute because it is placed after the separator. There is no way of reliably determining whether a script is run in shebang mode or from a jjs command line.
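The effect of the leading separator can be illustrated with a deliberately simplified model of the files/arguments split. The class and method names below are hypothetical and this is not Nashorn's actual option parser; it is only a sketch of the rule that everything after -- counts as a script argument.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the "--" rule; not Nashorn's real parser.
class SeparatorModel {
    final List<String> files = new ArrayList<>();
    final List<String> scriptArgs = new ArrayList<>();

    static SeparatorModel parse(String... args) {
        SeparatorModel model = new SeparatorModel();
        boolean afterSeparator = false;
        for (String arg : args) {
            if (afterSeparator) {
                model.scriptArgs.add(arg);   // everything after "--" is a script argument
            } else if (arg.equals("--")) {
                afterSeparator = true;
            } else if (!arg.startsWith("-")) {
                model.files.add(arg);        // non-option before "--": a script file
            }
            // options before "--" are ignored in this sketch
        }
        return model;
    }
}
```

Under this model, parse("--", "./script.js", "hello", "world") yields no script files at all and the script arguments ./script.js, hello, world, which matches the REPL output shown above.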

To make Nashorn usable for scripting at shell level, it is necessary to provide a consistent usage for shebang arguments.

OS Platform Inconsistencies

Of the three platforms frequently used in Nashorn development, Linux and Cygwin appear to honour the specification request to pass the shebang optional argument as a single string. Mac OS X is different; it parses the optional argument into a collection of arguments that are passed individually. Thereby, Mac OS X essentially violates the specification, and its behaviour should not be relied upon.

Other Scripting Languages, and Nashorn's Position

A brief investigation of Python, Ruby, Perl, and GNU Smalltalk shows how each of them deals with shell interaction.

Python

Python has the following command line convention:

python [option] ... [-c cmd | -m mod | file | -] [arg] ...

Only one file may be specified, and all subsequent arguments given on the command line are arguments to that file.

Ruby and Perl

Ruby and Perl share their convention even to the details of the syntax of the command line:

ruby [switches] [--] [programfile] [arguments]
perl [switches] [--] [programfile] [arguments]

Only one file is allowed, and all subsequent arguments are script arguments.

GNU Smalltalk

GNU Smalltalk allows for a little more variety:

gst [ flag ... ] [ file ... ]
gst [ flag ... ] { -f | --file } file [ args ... ]

On the one hand, it is possible to give multiple files for execution, but they cannot be passed dedicated arguments. On the other, if a single file is given, which is indicated by the -f flag, all subsequent arguments are arguments to that file. The latter option thus is the same as the default in Python, Ruby, and Perl.

Nashorn's Take

Nashorn is a hybrid:

jjs [<options>] <files> [-- <arguments>]

It allows passing multiple files as well as arguments, which are visible to all of the script files alike. The complications observed above stem from this flexibility.

Proposals

This section discusses several proposals for providing a seamless scripting experience with Nashorn in shebang settings. The overall goal is to be able to pass arguments to shebang-run scripts in a natural way, without having to give extra separators.

The intended behaviour for the aforementioned script.js would be as follows:

$ ./script.js hello world
hello,world

Currently, the desired output can be achieved using the following script source and command line:

#!/usr/local/bin/jjs
print(arguments)

$ ./script.js -- hello world
hello,world

Having to give the extra argument separator is an unnatural deviation from the way scripts are generally used.

Proposal 1: Full Flexibility

Enable shebang argument handling that goes beyond the specification by allowing multiple arguments on the shebang line.

With this proposal, the shebang line can contain as many arguments as needed:

#!/usr/local/bin/jjs --language=es6 runmefirst.js --
...

This script would put jjs in ES6 mode, run an initial startup script, and accept all invocation arguments as script arguments.

Nashorn would be unique in allowing multiple shebang arguments, and it would gain considerable flexibility from that.

The implementation would involve the following items:

Heuristic analysis of the args array passed to Shell.main to identify the actual script being run.
Preprocessing of the same to place the actual script being run before the script/argument separator.
Potential insertion of a script/argument separator if none is given.
On Linux and Cygwin, parsing of the shebang argument (which is passed as a single string there) to separate it into particular arguments before applying the above steps.
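The last item, splitting the single shebang string, might look roughly like the following sketch. The class name, the method name, and the rule "split only if the first argument starts with a dash" are assumptions made for illustration; an argument that legitimately contains white space would need more careful treatment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: normalises the args array so that Linux/Cygwin
// (one combined shebang string) and Mac OS X (already split) look alike.
class ShebangSplit {
    static List<String> splitFirst(String... args) {
        List<String> result = new ArrayList<>();
        if (args.length == 0) {
            return result;
        }
        String[] parts = args[0].trim().split("\\s+");
        if (args[0].startsWith("-") && parts.length > 1) {
            // Assumption: an option-like first argument containing white space
            // is the combined shebang argument.
            result.addAll(Arrays.asList(parts));
        } else {
            result.add(args[0]);
        }
        // All remaining arguments are copied unchanged.
        result.addAll(Arrays.asList(args).subList(1, args.length));
        return result;
    }
}
```

With the shebang line above, a Linux-style array whose first element is the single string "--language=es6 runmefirst.js --" would be expanded to three separate elements, while an array that is already split is returned unchanged.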

The heuristic analysis is brittle, as a prototype implementation has revealed.

Proposal 2: Reduced Flexibility

Enable shebang argument handling as specified.

A shebang line may only specify one argument, which is an argument to the interpreter. All other arguments are treated in the usual fashion by Nashorn.

#!/usr/local/bin/jjs --
print(arguments)

This mirrors the status quo but requires some fixing to make shebang usage work properly. In particular, this proposal is based on the assumption that at most the first argument in the args array passed to Shell.main originates from a shebang line. The implementation would involve the following items:

Heuristic analysis of the args array passed to Shell.main to identify the actual script being run.
Preprocessing of the same to place the actual script being run before the script/argument separator.
Potential insertion of a script/argument separator if none is given.

The heuristic analysis remains brittle, as additional script files may be given on the command line, and it will not be easy to tell these apart from script arguments under all circumstances.

Proposal 3: Argument Files

Introduce an idiom to only use an argument file in a shebang line.

This way, a shebang line could be coerced into accepting multiple arguments:

#!/usr/local/bin/jjs @scriptargs
...

The @scriptargs file would contain all of the arguments.

This would allow for adopting the "full flexibility" model outlined above. It also entails the same consequences in terms of implementation, as the arguments are inserted at the position where the argument file is mentioned.
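The in-place insertion could be sketched as follows. The class and method names are hypothetical, and the file format (arguments separated by white space, no quoting) is an assumption; the proposal does not define one. The file reader is passed in as a function so that the expansion logic stays independent of the file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of "@file" expansion; not an existing jjs feature.
class ArgFileExpansion {
    static List<String> expand(List<String> args, Function<String, String> loadFile) {
        List<String> out = new ArrayList<>();
        for (String arg : args) {
            if (arg.length() > 1 && arg.startsWith("@")) {
                // Replace the "@file" entry by the file's white-space separated tokens.
                String content = loadFile.apply(arg.substring(1)).trim();
                if (!content.isEmpty()) {
                    out.addAll(Arrays.asList(content.split("\\s+")));
                }
            } else {
                out.add(arg);
            }
        }
        return out;
    }

    // Reads the whole argument file as UTF-8 text (encoding is an assumption).
    static String readFile(String path) {
        try {
            return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as expand(args, ArgFileExpansion::readFile) would turn the array [@scriptargs,./script.js,hello,world] into the contents of scriptargs followed by ./script.js,hello,world, after which the same preprocessing as in the "full flexibility" model would still be needed.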

Requiring users to provide an additional file with arguments is somewhat at odds with usual scripting practice.

Proposal 4: Restriction to a Single Script Plus Arguments

Introduce a dedicated mode that allows multiple arguments but only one script file.

The shebang line can mention an arbitrary number of interpreter arguments, but only the first argument following them will be regarded as the script file to run, and all subsequent arguments will be treated as arguments to the script. The argument designating this dedicated mode must come last on the shebang line.

#!/usr/local/bin/jjs --language=es6 -s
print(arguments)

With the above script file, this invocation

$ ./script.js hello world

will lead to the args array

[--language=es6,-s,./script.js,hello,world]

Implementation of this proposal will entail the following:

Consideration of the argument following -s as the script file to execute.
Replacement of the -s,scriptfile sequence with scriptfile,-- in the args array during preprocessing.
On Linux and Cygwin, parsing of the shebang argument (which is passed as a single string there) to separate it into particular arguments before applying the above steps.
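The replacement step can be sketched as below, assuming the args array has already been split into individual arguments. The class and method names are hypothetical; -s is the flag suggested by this proposal, not an existing jjs option with this meaning.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "-s,scriptfile" -> "scriptfile,--" rewrite.
class SingleScriptMode {
    static List<String> rewrite(List<String> args) {
        int flag = args.indexOf("-s");
        int separator = args.indexOf("--");
        boolean flagIsScriptArg = separator >= 0 && separator < flag;
        if (flag < 0 || flag + 1 >= args.size() || flagIsScriptArg) {
            return new ArrayList<>(args);              // nothing to rewrite
        }
        List<String> out = new ArrayList<>(args.subList(0, flag));
        out.add(args.get(flag + 1));                   // the script file to execute
        out.add("--");                                 // everything after it is a script argument
        out.addAll(args.subList(flag + 2, args.size()));
        return out;
    }
}
```

For the args array shown above, this produces [--language=es6,./script.js,--,hello,world], which the existing files/arguments handling can process as usual.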

It will not be possible to pass arguments to Nashorn itself when executing a shebang script, unless these arguments are given on the shebang line. It will also not be possible to run extra JavaScript files unless they are given on the shebang line (before -s) or are explicitly loaded in the script itself.

Proposal 5: Implicit Restriction to a Single Script

Introduce a dedicated "silent shebang" mode that allows multiple interpreter arguments but only one script file.

The shebang line can mention an arbitrary number of interpreter arguments. The first script file encountered in the args array is checked for a shebang line. If it has such a line, Nashorn enters "shebang mode", in which all arguments following the first script file will be considered arguments to the script.

#!/usr/local/bin/jjs --language=es6 --log=compiler
print(arguments)

With the above script file, the invocation

$ ./script.js hello world

will lead to the args array

[--language=es6,--log=compiler,./script.js,hello,world]

Implementation of this proposal will entail the following:

Traversing the args array and checking the first non-option argument for the presence of a shebang line.
In case a shebang line is found in the file described by the first non-option argument, insertion of the script/arguments separator right after that argument.
On Linux and Cygwin, parsing of the shebang argument (which is passed as a single string there) to separate it into particular arguments before applying the above steps.
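A minimal sketch of the first two steps follows, assuming the args array has already been split into individual arguments and that every interpreter option starts with a dash. All names are hypothetical. The shebang check is passed in as a predicate so that the rewriting logic can be exercised without touching the file system.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of "silent shebang" mode preprocessing.
class SilentShebangMode {
    static List<String> rewrite(List<String> args, Predicate<String> hasShebang) {
        List<String> out = new ArrayList<>(args);
        for (int i = 0; i < args.size(); i++) {
            String arg = args.get(i);
            if (arg.equals("--")) {
                break;                      // explicit separator given: leave args alone
            }
            if (arg.startsWith("-")) {
                continue;                   // interpreter option, keep looking
            }
            if (hasShebang.test(arg)) {
                out.add(i + 1, "--");       // arguments after the script go to the script
            }
            break;                          // only the first non-option argument is examined
        }
        return out;
    }

    // True if the file exists, is readable, and starts with "#!".
    static boolean startsWithShebang(String path) {
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            return in.read() == '#' && in.read() == '!';
        } catch (IOException | InvalidPathException e) {
            return false;
        }
    }
}
```

Calling rewrite(args, SilentShebangMode::startsWithShebang) on the array shown above would insert the separator right after ./script.js, provided that file starts with a shebang line.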

If the user gives a JavaScript file as one of the shebang arguments, it will be considered the script to run instead of the script that was invoked at the shell prompt, i.e., it is the user's responsibility to provide "sensible" shebang lines.