Some mksh(1) prompt strings

Choosing a shell prompt is a serious matter for some -- properly configured, it can provide a lot of information about one's working environment at a glance. For other people, it's important to have a prompt which is aesthetically pleasing, possibly with some colours; after all, it's going to get printed every time you press enter in a terminal, so you're going to see a lot of it.

GNU bash(1) supports a number of backslash escapes in its PS1 string, such as \u for the current username and \w for the current working directory. Many people are used to these thanks to the widespread use of bash(1) on Linux systems (indeed, OpenBSD's ksh(1) has acquired a similar capability). However, mksh(1)'s maintainer has chosen not to support special PS1 backslash escapes, as they aren't portable across the numerous platforms which mksh(1) supports, and supporting them would require extending the lexer to special-case the processing of PS1. (And no, despite my history of patching mksh, I haven't got round to porting the OpenBSD ksh(1) PS1 processing code to mksh(1) yet, as I've largely managed without the escapes, as I'll get to in a moment.)

As a result, a number of semi-sophisticated prompt strings which rely on these backslash escapes in bash(1) (and ksh(1)) aren't immediately possible with mksh(1). Unless, of course, you get creative.

Preliminaries: substitutions

According to mksh(1)'s manual, there are a number of substitutions which the shell can perform. There are the usual POSIX parameter substitutions (e.g. $foo or ${bar}) and command substitutions (e.g. $(baz) or old-style backtick-delimited substitutions, which are really hard to enter into a markdown formatted document). However, mksh(1) has a couple of its own substitution styles, namely function substitutions (${ command; }, abbreviated as "funsubs"), and value substitutions (${| command; }, abbreviated as "valsubs").

Function substitutions are quite similar to POSIX command substitutions in that the output generated by commands inside the substitution is captured. The difference is that a funsub is executed in the same environment in which the substitution is being evaluated, rather than in a subshell. It's as if the commands inside the substitution were wrapped inside a function, and the function had been called at the point where the substitution occurred. This means that funsubs can have strange side-effects, like the following:

$ value=one
$ echo $value
one
$ thing="${ echo i\'m inside a substitution; value=two; }"
$ echo $thing
i'm inside a substitution
$ echo $value
two

Value substitutions are exactly the same as function substitutions except that their output is not captured, and instead they evaluate to the value of the special expression-local variable REPLY. Again, fun side-effects can occur:

$ value=one
$ echo $value
one
$ thing="${| REPLY=beep; echo boo!; value=two; }"
boo!
$ echo $thing
beep
$ echo $value
two

Substitutions are useful for constructing dynamic prompt strings, as various kinds of command substitution can be embedded within PS1 so that commands are run when PS1 is evaluated. Using POSIX command substitutions, however, would result in a subshell being forked every time PS1 is evaluated (which occurs each time it's printed), but as mksh(1)'s valsubs are not executed in a subshell, this means it's possible to write a dynamic prompt which doesn't need any forks (provided we don't use any external commands).
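
For contrast, here's a minimal sketch of the same side-effect experiment using a POSIX command substitution. It should behave the same way in any POSIX-style shell, not just mksh(1). Because the substitution runs in a subshell, the assignment inside it never reaches the calling shell:

```shell
value=one

# the commands inside $(...) run in a subshell, so this assignment
# to value is lost as soon as the substitution finishes
thing=$(echo "inside a command substitution"; value=two)

echo "$thing"    # inside a command substitution
echo "$value"    # one
```

Compare this with the funsub and valsub transcripts above, where value ends up as two.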

Example 1: Arch-style

Arch Linux's default bash(1) prompt string is [\u@\h \W]\$. The important parts to note here are the \u and \h escapes, which are the current user's username and the machine's hostname, and the \W escape. This last one means the rightmost component of the current working directory, with the user's home directory replaced with a tilde. Simply printing the current working directory wouldn't be much of a problem, as one could embed "${PWD}" in the prompt string; reproducing \W, however, requires a little more processing.

We'll assume that the user's username is set in the USER environment variable, and that the host machine's fully-qualified domain name is in the HOSTNAME shell parameter. One additional gotcha we have to be careful of is that performing a comparison operation in the shell will set the $? parameter to the evaluated value of the comparison -- if we do this inside PS1, then this would clobber the return code of the last executed command, which is stored in $?, so we need to make sure to preserve this.
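
To make the $? gotcha concrete, here's a small sketch which should run in both mksh(1) and bash(1). The names fail3 and keep_status are hypothetical helpers invented for this demonstration; the save-and-restore pattern inside keep_status is the one the prompts below rely on:

```shell
# hypothetical helper: a command which always fails with status 3
fail3() { return 3; }

# hypothetical helper: does a comparison, but hands back the old status
keep_status() {
    typeset rc=$?               # capture the status before anything else runs
    [[ "$PWD" == "$HOME" ]]     # any test like this overwrites $?
    return $rc                  # restore what the previous command left behind
}

fail3; [[ a == a ]]; clobbered=$?
fail3; keep_status; preserved=$?

echo "clobbered=$clobbered preserved=$preserved"    # clobbered=0 preserved=3
```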

The logic for calculating the directory to display is as follows: if the user is in their home directory, print a tilde; if they are in the root directory, print a single slash; otherwise, strip all characters from the working directory apart from those following the rightmost slash. If we put this together, we should get:

PS1="[${USER}@${HOSTNAME%%.*} "'${|
    typeset rc=$?
    if [[ "$PWD" == "$HOME" ]]; then
        REPLY+="~"
    elif [[ "$PWD" == "/" ]]; then
        REPLY+="/"
    else
        REPLY+=${PWD##*/}
    fi

    REPLY+="]"
    (( USER_ID )) && REPLY+="$ " || REPLY+="# "
    return $rc
}'

Note that the valsub is inside single quotes. This means that it is embedded verbatim in the PS1 variable and evaluated by the shell each time PS1 is printed, whereas the double-quoted part in front of it is expanded only once, when PS1 is assigned.
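
If you'd like to poke at the logic without changing your prompt, here's a sketch of the same steps as an ordinary function. The name arch_prompt and its explicit arguments (user, host, directory, home, uid) are my own assumptions for the sake of testing; the real prompt reads these from the environment instead. It avoids valsubs, so it should run in bash(1) as well as mksh(1):

```shell
# hypothetical helper: build the Arch-style prompt from explicit arguments
# and leave the result in REPLY
arch_prompt() {
    typeset user="$1" host="$2" dir="$3" home="$4" uid="$5"

    # ${host%%.*} strips everything from the first dot onwards
    REPLY="[${user}@${host%%.*} "

    if [[ "$dir" == "$home" ]]; then
        REPLY+="~"
    elif [[ "$dir" == "/" ]]; then
        REPLY+="/"
    else
        # keep only what follows the rightmost slash
        REPLY+=${dir##*/}
    fi

    REPLY+="]"
    if (( uid )); then REPLY+="$ "; else REPLY+="# "; fi
}

arch_prompt multi laptop.example.org /usr/local/bin /home/multi 1000
echo "$REPLY"    # [multi@laptop bin]$
```

Passing 0 as the last argument gives the root variant, which ends in a hash instead of a dollar.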

Example 2: Path abbreviation

This is a prompt string which I've seen a friend use in fish(1), though I don't know where it originates from. The idea is to first replace the user's home directory at the beginning of the current working directory with a tilde, and then abbreviate each directory component apart from the rightmost to its first character. This has the advantage of reducing the amount of terminal width which long paths consume. One tweak I've made to this is to check whether the first character of a component is a dot, and in that case instead use two characters from that component.

For demonstrative purposes, it looks a bit like this:

multi@laptop ~> cd src/git/doas/bsd-compat/
multi@laptop ~/s/g/d/bsd-compat> cd
multi@laptop ~> cd .config/i3
multi@laptop ~/.c/i3> cd
multi@laptop ~> cd /usr/local/bin/
multi@laptop /u/l/bin> cd
multi@laptop ~>

The embedded command substitution needs to do quite a bit more work this time. We special case our home directory and the root directory as with the first example, then we first strip off the user's home directory if present. Then, we need to split the working directory into its components. There's a string splitting trick which uses parameter substitutions and herestrings in the pure bash bible, and conveniently mksh(1) supports exactly the same features (though with a slightly different syntax). We then select one or two characters from the beginning of each component as appropriate, and then append the remaining characters from the last component. It looks like this:

PS1="${USER}@${HOSTNAME%%.*} "'${|
    typeset rc=$?
    typeset pwd="${PWD}"
    typeset oldifs max last len comps part

    if [[ "${pwd}" == "${HOME}" ]]; then
        REPLY+="~"
    elif [[ "${pwd}" == "/" ]]; then
        REPLY+="/"
    else
        if [[ "${pwd}" != "${pwd#$HOME}" ]]; then
            REPLY+="~"
            pwd="${pwd#$HOME}"
        fi
        pwd="${pwd#/}"

        # from the "pure bash bible", modified for mksh
        oldifs="${IFS}"
        IFS="'$'\n''"
        read -rA -d "" comps <<< "${pwd//\//'$'\n''}"
        IFS="${oldifs}"

        for part in "${comps[@]}"; do
            [[ "${part:0:1}" == "." ]] && len=2 || len=1
            REPLY+="/${part:0:$len}"
        done

        max=$(( ${#comps[@]} - 1 ))
        REPLY+="${comps[$max]:$len}"
    fi
    REPLY+="> "
    return $rc
}'

This example is interesting in that it makes use of features of mksh(1) like arrays, or string slicing in the REPLY+="/${part:0:$len}" line. Note also in the lines which deal with splitting the path that we exit the single-quoted string in order to insert a literal newline character, which is used as a delimiter by the read builtin.
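
For experimenting outside the prompt, here's a sketch of the abbreviation as a standalone function. Be aware that it is a variation rather than a copy: instead of splitting into an array with read -A, it peels components off the front with parameter substitutions, so that it also runs in bash(1). It also only treats the home directory as a prefix when it is followed by a slash. The name abbrev_path and its two arguments (the directory and the home directory) are hypothetical:

```shell
# hypothetical helper: abbreviate dir relative to home, result in REPLY
abbrev_path() {
    typeset dir="$1" home="$2" part len
    REPLY=""

    if [[ "$dir" == "$home" ]]; then
        REPLY="~"
    elif [[ "$dir" == "/" ]]; then
        REPLY="/"
    else
        # replace a leading home directory with a tilde
        if [[ "$dir" == "$home"/* ]]; then
            REPLY="~"
            dir="${dir#"$home"}"
        fi
        dir="${dir#/}"

        # abbreviate every component which still has a slash after it
        while [[ "$dir" == */* ]]; do
            part="${dir%%/*}"
            dir="${dir#*/}"
            [[ "$part" == .* ]] && len=2 || len=1
            REPLY+="/${part:0:$len}"
        done

        # the rightmost component is kept whole
        REPLY+="/${dir}"
    fi
}

abbrev_path /home/multi/src/git/doas/bsd-compat /home/multi
echo "$REPLY"    # ~/s/g/d/bsd-compat
```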

Example 3: Adelie-style, with colours

Some people like colours in their prompt strings. This is usually achieved by embedding ANSI escape codes in PS1, which are then interpreted by the terminal. In particular, the Adelie Linux default prompt string emboldens the hostname, and shows either a green dollar for regular users or a red hash for the root user.

There's a slight problem in that the shell needs to know the length of PS1 after it's evaluated in order to work out whether it needs to perform horizontal scrolling, but these escape sequences don't consume any display width. mksh(1) therefore has a feature (which was inherited from ksh88(1)) where if the second character of PS1 is a carriage return, then the first character is used as a delimiter to turn length counting on and off. This means that escape sequences can safely be embedded in PS1 by surrounding them with a delimiter.
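
To illustrate the idea, here's a rough sketch of that kind of length counting. This is not mksh(1)'s actual code, just my own model of the behaviour described above; visible_len is a hypothetical helper, and the sample string is only a fragment of a prompt (it leaves out the delimiter and carriage return which would start a real PS1):

```shell
# hypothetical helper: count the characters of a string which fall
# outside the regions bracketed by the delimiter; result in REPLY
visible_len() {
    typeset s="$1" delim="$2" c
    typeset count=0 counting=1

    while [[ -n "$s" ]]; do
        c="${s:0:1}"
        s="${s:1}"
        if [[ "$c" == "$delim" ]]; then
            counting=$(( 1 - counting ))    # toggle counting on or off
        elif (( counting )); then
            count=$(( count + 1 ))
        fi
    done

    REPLY=$count
}

# a bold hostname, with each escape sequence wrapped in 0x01 bytes
bold_host=$'\1\e[01m\1laptop\1\e[22m\1'

visible_len "$bold_host" $'\1'
echo "$REPLY"    # 6, although the string holds 20 characters
```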

In this case, we're going to use ASCII 0x01 as the delimiter, and we're going to build up PS1 instead of defining it all at once, and we'll pre-declare a couple of useful strings with escape sequences to make the construction a little clearer.

# first, some escaped escape codes for toggling special rendering modes
boldon=$'\1\e[01m\1'
boldoff=$'\1\e[22m\1'
hilighton=$'\1\e[01;36m\1'
hilightoff=$'\1\e[00m\1'

# the green dollar or red hash
greendollar=$'\1\e[1;32m\1$\1\e[00m\1'
redhash=$'\1\e[01;31m\1#\1\e[00m\1'

# this is the important bit, which tells mksh that characters surrounded
# by 0x01 bytes should not be counted towards the prompt's length
PS1=$'\1\r'

# only regular users have their username printed first
(( USER_ID )) && PS1="$PS1${USER} on "
PS1="$PS1$boldon${HOSTNAME%%.*}$boldoff "

# highlight the cwd for root
(( USER_ID )) || PS1="$PS1$hilighton"

# replace $HOME with ~
PS1="$PS1"'${|
    typeset rc=$?
    if [[ "${PWD}" != "${PWD#$HOME}" ]]; then
        REPLY+="~${PWD#$HOME}"
    else
        REPLY="${PWD}"
    fi

    return $rc
}'

(( USER_ID )) || PS1="$PS1$hilightoff"

# append dollar or hash as necessary
(( USER_ID )) && PS1="$PS1 $greendollar " || PS1="$PS1 $redhash "

This is a little simpler than the second example, though it's a little longer in places due to the differences in the prompt displayed for root and other users.

Concluding remarks

mksh(1)'s valsubs are a really neat feature, and it's quite nice to use them to construct dynamic prompt strings, as it reduces the overhead which would otherwise be expended by needing to fork a subshell every time PS1 is rendered.
