

Textual Conversion Packages

Precedence Parsing

(require 'precedence-parse) or (require 'parse)

This package implements:

Precedence Parsing Overview

This package offers improvements over previous parsers.

The notion of binding power may be unfamiliar to those accustomed to BNF grammars.

When two consecutive objects are parsed, the first might be the prefix to the second, or the second might be a suffix of the first. Comparing the left and right binding powers of the two objects decides which way to interpret them.
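To make the comparison of binding powers concrete, here is a minimal sketch of a precedence parser over a list of tokens. It is not SLIB code and does not use this package; the names infix-table, lbp, rbp, and parse-tokens, and the binding-power values, are all hypothetical and chosen only for illustration.

```scheme
;; Minimal precedence parser over a token list (illustration only;
;; every name here is hypothetical, none comes from SLIB).

;; Infix operators as (token left-binding-power right-binding-power).
;; A right power lower than the left power gives right associativity.
(define infix-table
  '((+ 100 100) (- 100 100) (* 120 120) (/ 120 120) (^ 140 139)))

;; Left binding power of TOK; tokens that are not operators bind weakest.
(define (lbp tok)
  (let ((entry (assq tok infix-table)))
    (if entry (cadr entry) 0)))

;; Right binding power of the operator TOK.
(define (rbp tok) (caddr (assq tok infix-table)))

;; Parse TOKENS (a list of numbers and operator symbols) into a
;; prefix expression.
(define (parse-tokens tokens)
  (define (next!)
    (let ((tok (car tokens)))
      (set! tokens (cdr tokens))
      tok))
  (define (peek) (if (null? tokens) #f (car tokens)))
  ;; Parse an expression, claiming only operators whose left binding
  ;; power exceeds MIN-BP.
  (define (parse-expr min-bp)
    (let loop ((left (next!)))          ; an operand denotes itself
      (let ((op (peek)))
        (if (and op (> (lbp op) min-bp))
            (begin
              (next!)                   ; consume the operator
              (loop (list op left (parse-expr (rbp op)))))
            left))))
  (parse-expr 0))

(define example-1 (parse-tokens '(1 + 2 * 3)))   ; => (+ 1 (* 2 3))
(define example-2 (parse-tokens '(1 - 2 - 3)))   ; => (- (- 1 2) 3)
(define example-3 (parse-tokens '(2 ^ 3 ^ 2)))   ; => (^ 2 (^ 3 2))
```

In the first example the 2 sits between + and *; because the left binding power of * (120) exceeds the right binding power of + (100), the 2 becomes the left operand of * rather than the right operand of +.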

Objects at each level of syntactic grouping have binding powers.

A syntax tree is not built unless the rules explicitly do so. The call graph of grammar rules effectively instantiates the syntax tree.

The JACAL symbolic math system (http://swiss.csail.mit.edu/~jaffer/JACAL) uses precedence-parse. Its grammar definitions in the file `jacal/English.scm' can serve as examples of use.

Rule Types

Here are the higher-level syntax types and an example of each. Precedence considerations are omitted for clarity. See section Grammar Rule Definition for full details.

Grammar: nofix bye exit
bye

Calls the function exit with no arguments.

Grammar: prefix - negate
- 42

Calls the function negate with the argument 42.

Grammar: infix - difference
x - y

Calls the function difference with arguments x and y.

Grammar: nary + sum
x + y + z

Calls the function sum with arguments x, y, and z.

Grammar: postfix ! factorial
5 !

Calls the function factorial with the argument 5.

Grammar: prestfix set set!
set foo bar

Calls the function set! with the arguments foo and bar.

Grammar: commentfix /* */
/* almost any text here */

Ignores the comment delimited by /* and */.

Grammar: matchfix { list }
{0, 1, 2}

Calls the function list with the arguments 0, 1, and 2.

Grammar: inmatchfix ( funcall )
f(x, y)

Calls the function funcall with the arguments f, x, and y.

Grammar: delim ;
set foo bar;

Delimits the extent of the prestfix operator set.

Ruleset Definition and Use

Variable: *syn-defs*
A grammar is built by one or more calls to prec:define-grammar. The rules are appended to *syn-defs*. The value of *syn-defs* is the grammar suitable for passing as an argument to prec:parse.

Constant: *syn-ignore-whitespace*
Is a nearly empty grammar with whitespace characters set to group 0, which means they will not be made into tokens. Most rulesets will want to start with *syn-ignore-whitespace*.

In order to start defining a grammar, either

(set! *syn-defs* '())

or

(set! *syn-defs* *syn-ignore-whitespace*)

Function: prec:define-grammar rule1 ...
Appends rule1 ... to *syn-defs*. prec:define-grammar is used to define both the character classes and rules for tokens.

Once your grammar is defined, save the value of *syn-defs* in a variable (for use when calling prec:parse).

(define my-ruleset *syn-defs*)

Function: prec:parse ruleset delim
Function: prec:parse ruleset delim port
The ruleset argument must be a list of rules as constructed by prec:define-grammar and extracted from *syn-defs*.

The token delim may be a character, symbol, or string. A character delim argument will match only a character token; i.e. a character for which no token-group is assigned. A symbol or string will match only a token string; i.e. a token resulting from a token group.

prec:parse reads a ruleset grammar expression delimited by delim from the given input port. prec:parse returns the next object parsable from the given input port, updating port to point to the first character past the end of the external representation of the object.

If an end of file is encountered in the input before any characters are found that can begin an object, then an end of file object is returned. If a delimiter (such as delim) is found before any characters are found that can begin an object, then #f is returned.

The port argument may be omitted, in which case it defaults to the value returned by current-input-port. It is an error to parse from a closed port.
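The steps above (reset *syn-defs*, define token classes and rules, save the ruleset, then parse) might be put together as in the following sketch. It is untested and makes several assumptions: the group numbers 40 and 41, the binding powers 100 and 120, the operator names sum, difference, and product, and the helper read-expression are all arbitrary choices for illustration, and it uses only the procedures documented in this chapter.

```scheme
(require 'precedence-parse)

;; Start from the grammar that skips whitespace.
(set! *syn-defs* *syn-ignore-whitespace*)

;; Token classes (group numbers are arbitrary choices):
;; runs of digits become numbers; runs of letters become strings.
;; Digits (group 40) may continue a token begun by a letter (group 41).
(prec:define-grammar
 (tok:char-group 40 tok:decimal-digits
                 (lambda (chars) (string->number (list->string chars))))
 (tok:char-group 41 (string-append tok:upper-case tok:lower-case)
                 list->string))

;; Rules.  The operators and the delimiter belong to no token group,
;; so they are matched as character tokens.
(prec:define-grammar
 (prec:delim #\;)
 (prec:nary #\+ 'sum 100)
 (prec:infix #\- 'difference 100 100)
 (prec:nary #\* 'product 120))

;; Save the finished grammar for use with prec:parse.
(define my-ruleset *syn-defs*)

;; Hypothetical helper: read one expression terminated by `;'
;; from the current input port.
(define (read-expression)
  (prec:parse my-ruleset #\;))
```

Calling (read-expression) with input such as `x + y * 2;' should then return a parsed expression built from sum and product, or #f if the delimiter is seen before any expression begins.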

Token definition

Function: tok:char-group group chars chars-proc
The argument chars may be a single character, a list of characters, or a string. Each character in chars is treated as though tok:char-group was called with that character alone.

The argument chars-proc must be a procedure of one argument, a list of characters. After tokenize has finished accumulating the characters for a token, it calls chars-proc with the list of characters. The value returned is the token which tokenize returns.

The argument group may be an exact integer or a procedure of one character argument. The following discussion concerns the treatment which the tokenizing routine, tokenize, will accord to characters on the basis of their groups.

When group is a non-zero integer, characters whose group number is equal to or exactly one less than group will continue to accumulate. Any other character causes the accumulation to stop (until a new token is to be read).

The group of zero is special. These characters are ignored when parsed pending a token, and stop the accumulation of token characters when the accumulation has already begun. Whitespace characters are usually put in group 0.

If group is a procedure, then, when triggered by the occurrence of an initial (no accumulation) chars character, this procedure will be repeatedly called with each successive character from the input stream until the group procedure returns a non-false value.

The following convenient constants are provided for use with tok:char-group.

Constant: tok:decimal-digits
Is the string "0123456789".
Constant: tok:upper-case
Is the string consisting of all upper-case letters ("ABCDEFGHIJKLMNOPQRSTUVWXYZ").
Constant: tok:lower-case
Is the string consisting of all lower-case letters ("abcdefghijklmnopqrstuvwxyz").
Constant: tok:whitespaces
Is the string consisting of all characters between 0 and 255 for which char-whitespace? returns true.

For the purpose of reporting problems in error messages, this package keeps track of the current column. When the column does not simply track input characters, tok:bump-column can be used to adjust the current-column.

Function: tok:bump-column pos port
Adds pos to the current-column for input-port port.

Nud and Led Definition

This section describes advanced features. You can skip this section on first reading.

The Null Denotation (or nud) of a token is the procedure and arguments applying for that token when Left, an unclaimed parsed expression, is not extant.

The Left Denotation (or led) of a token is the procedure, arguments, and lbp applying for that token when there is a Left, an unclaimed parsed expression.

In his paper,

Pratt, V. R. Top Down Operator Precedence. SIGACT/SIGPLAN Symposium on Principles of Programming Languages, Boston, 1973, pages 41-51

the left binding power (or lbp) was an independent property of tokens. I think this was done in order to allow tokens with NUDs but not LEDs to also be used as delimiters, which was a problem for statically defined syntaxes. It turns out that dynamically binding NUDs and LEDs allows them independence.

For the rule-defining procedures that follow, the variable tk may be a character, string, or symbol, or a list composed of characters, strings, and symbols. Each element of tk is treated as though the procedure were called for each element.

Character tk arguments will match only character tokens; i.e. characters for which no token-group is assigned. Symbols and strings will both match token strings; i.e. tokens resulting from token groups.

Function: prec:make-nud tk sop arg1 ...
Returns a rule specifying that sop be called when tk is parsed. If sop is a procedure, it is called with tk and arg1 ... as its arguments; the resulting value is incorporated into the expression being built. Otherwise, (list sop arg1 ...) is incorporated.

If no NUD has been defined for a token, then if that token is a string, it is converted to a symbol and returned; if not a string, the token is returned.

Function: prec:make-led tk sop arg1 ...
Returns a rule specifying that sop be called when tk is parsed and left has an unclaimed parsed expression. If sop is a procedure, it is called with left, tk, and arg1 ... as its arguments; the resulting value is incorporated into the expression being built. Otherwise, left is incorporated.

If no LED has been defined for a token, and left is set, the parser issues a warning.

Grammar Rule Definition

Here are procedures for defining rules for the syntax types introduced in section Precedence Parsing Overview.

For the rule-defining procedures that follow, the variable tk may be a character, string, or symbol, or a list composed of characters, strings, and symbols. Each element of tk is treated as though the procedure were called for each element.

For procedures prec:delim, ..., prec:prestfix, if the sop argument is #f, then the token which triggered this rule is converted to a symbol and returned. A false sop argument to the procedures prec:commentfix, prec:matchfix, or prec:inmatchfix has a different meaning.

Character tk arguments will match only character tokens; i.e. characters for which no token-group is assigned. Symbols and strings will both match token strings; i.e. tokens resulting from token groups.

Function: prec:delim tk
Returns a rule specifying that tk should not be returned from parsing; i.e. tk's function is purely syntactic. The end-of-file is always treated as a delimiter.

Function: prec:nofix tk sop
Returns a rule specifying the following actions take place when tk is parsed:

Function: prec:prefix tk sop bp rule1 ...
Returns a rule specifying the following actions take place when tk is parsed:

Function: prec:infix tk sop lbp bp rule1 ...
Returns a rule declaring the left-binding-precedence of the token tk is lbp and specifying the following actions take place when tk is parsed:

Function: prec:nary tk sop bp
Returns a rule declaring the left-binding-precedence of the token tk is bp and specifying the following actions take place when tk is parsed:

Function: prec:postfix tk sop lbp
Returns a rule declaring the left-binding-precedence of the token tk is lbp and specifying the following actions take place when tk is parsed:

Function: prec:prestfix tk sop bp rule1 ...
Returns a rule specifying the following actions take place when tk is parsed:

Function: prec:commentfix tk stp match rule1 ...
Returns rules specifying the following actions take place when tk is parsed:

Parsing of commentfix syntax differs from the others in several ways. It reads directly from input without tokenizing; it calls stp but does not return its value, or indeed any value. I added the stp argument so that comment text could be echoed.

Function: prec:matchfix tk sop sep match rule1 ...
Returns a rule specifying the following actions take place when tk is parsed:

Function: prec:inmatchfix tk sop sep match lbp rule1 ...
Returns a rule declaring the left-binding-precedence of the token tk is lbp and specifying the following actions take place when tk is parsed:

Format (version 3.1)

(require 'format) or (require 'srfi-28)

Format Interface

Function: format destination format-string . arguments
An almost complete implementation of the Common LISP format description, following the CL reference book Common LISP by Guy L. Steele (Digital Press). Backward compatible with most of the available Scheme format implementations.

Returns #t, #f or a string; has side effect of printing according to format-string. If destination is #t, the output is to the current output port and #t is returned. If destination is #f, a formatted string is returned as the result of the call. NEW: If destination is a string, destination is regarded as the format string; format-string is then the first argument and the output is returned as a string. If destination is a number, the output is to the current error port if available by the implementation. Otherwise destination must be an output port and #t is returned.

format-string must be a string. In case of a formatting error format returns #f and prints a message on the current output or error port. Characters are output as if the string were output by the display function with the exception of those prefixed by a tilde (~). For a detailed description of the format-string syntax please consult a Common LISP format reference manual. For a test suite to verify this format implementation load `formatst.scm'. Please send bug reports to lutzeb@cs.tu-berlin.de.

Note: format is not reentrant, i.e. only one format-call may be executed at a time.
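A short sketch of the destination argument described above. The variable names greeting and shown are hypothetical; the directives ~a, ~s, and ~% are those listed in this chapter.

```scheme
(require 'format)

;; Destination #f: the formatted text is returned as a string.
(define greeting (format #f "Hello, ~a!" "world"))   ; "Hello, world!"

;; ~a prints as display does, ~s prints as write does.
(define shown (format #f "~a vs ~s" "text" "text"))  ; text vs "text"

;; Destination #t: output goes to the current output port.
(format #t "~a~%" greeting)
```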

Format Specification (Format version 3.1)

Please consult a Common LISP format reference manual for a detailed description of the format string syntax. For a demonstration of the implemented directives see `formatst.scm'.

This implementation supports directive parameters and modifiers (: and @ characters). Multiple parameters must be separated by a comma (,). Parameters can be numerical parameters (positive or negative), character parameters (prefixed by a quote character, '), variable parameters (v), number of rest arguments parameter (#), empty and default parameters. Directive characters are case independent. The general form of a directive is:

directive ::= ~{directive-parameter,}[:][@]directive-character

directive-parameter ::= [ [-|+]{0-9}+ | 'character | v | # ]

Implemented CL Format Control Directives

Documentation syntax: Uppercase characters represent the corresponding control directive characters. Lowercase characters represent control directive parameter descriptions.

~A
Any (print as display does).
~@A
left pad.
~mincol,colinc,minpad,padcharA
full padding.
~S
S-expression (print as write does).
~@S
left pad.
~mincol,colinc,minpad,padcharS
full padding.
~D
Decimal.
~@D
print number sign always.
~:D
print comma separated.
~mincol,padchar,commacharD
padding.
~X
Hexadecimal.
~@X
print number sign always.
~:X
print comma separated.
~mincol,padchar,commacharX
padding.
~O
Octal.
~@O
print number sign always.
~:O
print comma separated.
~mincol,padchar,commacharO
padding.
~B
Binary.
~@B
print number sign always.
~:B
print comma separated.
~mincol,padchar,commacharB
padding.
~nR
Radix n.
~n,mincol,padchar,commacharR
padding.
~@R
print a number as a Roman numeral.
~:@R
print a number as an "old fashioned" Roman numeral.
~:R
print a number as an ordinal English number.
~R
print a number as a cardinal English number.
~P
Plural.
~@P
prints y and ies.
~:P
as ~P but jumps 1 argument backward.
~:@P
as ~@P but jumps 1 argument backward.
~C
Character.
~@C
prints a character as the reader can understand it (i.e. #\ prefixing).
~:C
prints a character as emacs does (e.g. ^C for ASCII 03).
~F
Fixed-format floating-point (prints a flonum like mmm.nnn).
~width,digits,scale,overflowchar,padcharF
~@F
If the number is positive a plus sign is printed.
~E
Exponential floating-point (prints a flonum like mmm.nnnEee).
~width,digits,exponentdigits,scale,overflowchar,padchar,exponentcharE
~@E
If the number is positive a plus sign is printed.
~G
General floating-point (prints a flonum either fixed or exponential).
~width,digits,exponentdigits,scale,overflowchar,padchar,exponentcharG
~@G
If the number is positive a plus sign is printed.
~$
Dollars floating-point (prints a flonum in fixed with signs separated).
~digits,scale,width,padchar$
~@$
If the number is positive a plus sign is printed.
~:@$
A sign is always printed and appears before the padding.
~:$
The sign appears before the padding.
~%
Newline.
~n%
print n newlines.
~&
print newline if not at the beginning of the output line.
~n&
prints ~& and then n-1 newlines.
~|
Page Separator.
~n|
print n page separators.
~~
Tilde.
~n~
print n tildes.
~<newline>
Continuation Line.
~:<newline>
newline is ignored, white space left.
~@<newline>
newline is left, white space ignored.
~T
Tabulation.
~@T
relative tabulation.
~colnum,colincT
full tabulation.
~?
Indirection (expects indirect arguments as a list).
~@?
extracts indirect arguments from format arguments.
~(str~)
Case conversion (converts by string-downcase).
~:(str~)
converts by string-capitalize.
~@(str~)
converts by string-capitalize-first.
~:@(str~)
converts by string-upcase.
~*
Argument Jumping (jumps 1 argument forward).
~n*
jumps n arguments forward.
~:*
jumps 1 argument backward.
~n:*
jumps n arguments backward.
~@*
jumps to the 0th argument.
~n@*
jumps to the nth argument (beginning from 0)
~[str0~;str1~;...~;strn~]
Conditional Expression (numerical clause conditional).
~n[
take argument from n.
~@[
true test conditional.
~:[
if-else-then conditional.
~;
clause separator.
~:;
default clause follows.
~{str~}
Iteration (args come from the next argument (a list)). Iteration bounding is controlled by configuration variables format:iteration-bounded and format:max-iterations. With both variables default, a maximum of 100 iterations will be performed.
~n{
at most n iterations.
~:{
args from next arg (a list of lists).
~@{
args from the rest of arguments.
~:@{
args from the rest args (lists).
~^
Up and out.
~n^
aborts if n = 0
~n,m^
aborts if n = m
~n,m,k^
aborts if n <= m <= k
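A few of the directives listed above, with parameters and modifiers, are sketched below. The variable names are hypothetical, and the results in the comments are what the Common LISP semantics of these directives prescribe; `formatst.scm' remains the reference for what this implementation actually produces.

```scheme
(require 'format)

;; Each call returns a string because the destination is #f.
(define padded  (format #f "~5d|" 42))              ; mincol 5:   "   42|"
(define zeroed  (format #f "~5,'0d" 42))            ; padchar 0:  "00042"
(define grouped (format #f "~:d" 1234567))          ; "1,234,567"
(define signed  (format #f "~@d" 42))               ; "+42"
(define joined  (format #f "~{~a~^, ~}" '(1 2 3)))  ; "1, 2, 3"
(define chosen  (format #f "~:[no~;yes~]" #t))      ; "yes"
```

In the ~{...~} example, ~^ leaves the iteration once the list is exhausted, so no separator follows the last element.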

Not Implemented CL Format Control Directives

~:A
print #f as an empty list (see below).
~:S
print #f as an empty list (see below).
~<~>
Justification.
~:^
(sorry I don't understand its semantics completely)

Extended, Replaced and Additional Control Directives

~mincol,padchar,commachar,commawidthD
~mincol,padchar,commachar,commawidthX
~mincol,padchar,commachar,commawidthO
~mincol,padchar,commachar,commawidthB
~n,mincol,padchar,commachar,commawidthR
commawidth is the number of characters between two comma characters.
~I
print a R4RS complex number as ~F~@Fi with passed parameters for ~F.
~Y
Pretty print formatting of an argument for scheme code lists.
~K
Same as ~?.
~!
Flushes the output if format destination is a port.
~_
Print a #\space character
~n_
print n #\space characters.
~/
Print a #\tab character
~n/
print n #\tab characters.
~nC
Takes n as an integer representation for a character. No arguments are consumed. n is converted to a character by integer->char. n must be a positive decimal number.
~:S
Print out readproof. Prints out internal objects represented as #<...> as strings "#<...>" so that the format output can always be processed by read.
~:A
Print out readproof. Prints out internal objects represented as #<...> as strings "#<...>" so that the format output can always be processed by read.
~Q
Prints information and a copyright notice on the format implementation.
~:Q
prints format version.
~F, ~E, ~G, ~$
may also print number strings, i.e. passing a number as a string and format it accordingly.

Configuration Variables

Format has some configuration variables at the beginning of `format.scm' to suit the system's and user's needs. No modification should be necessary for the configuration that comes with SLIB. If modification is desired, the variable should be set after the format code is loaded. Format automatically detects whether the running Scheme system implements floating point numbers and complex numbers.

format:symbol-case-conv
Symbols are converted by symbol->string so the case type of the printed symbols is implementation dependent. format:symbol-case-conv is a one arg closure which is either #f (no conversion), string-upcase, string-downcase or string-capitalize. (default #f)
format:iobj-case-conv
As format:symbol-case-conv but applies for the representation of implementation internal objects. (default #f)
format:expch
The character prefixing the exponent value in ~E printing. (default #\E)
format:iteration-bounded
When #t, a ~{...~} control will iterate no more than the number of times specified by format:max-iterations regardless of the number of iterations implied by modifiers and arguments. When #f, a ~{...~} control will iterate the number of times implied by modifiers and arguments, unless termination is forced by language or system limitations. (default #t)
format:max-iterations
The maximum number of iterations performed by a ~{...~} control. Has effect only when format:iteration-bounded is #t. (default 100)
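A minimal sketch of adjusting these variables, assuming they are ordinary top-level variables that can be assigned with set! once the format code is loaded. The values chosen are arbitrary.

```scheme
(require 'format)

;; Set configuration variables only after format has been loaded.
(set! format:max-iterations 1000)   ; raise the default bound of 100
(set! format:expch #\e)             ; lower-case exponent marker for ~E
```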

Compatibility With Other Format Implementations

SLIB format 2.x:
See `format.doc'.
SLIB format 1.4:
Downward compatible except for padding support and ~A, ~S, ~P, ~X uppercase printing. SLIB format 1.4 uses C-style printf padding support which is completely replaced by the CL format padding style.
MIT C-Scheme 7.1:
Downward compatible except for ~, which is not documented (ignores all characters inside the format string up to a newline character). (7.1 implements ~a, ~s, ~newline, ~~, ~%, numerical and variable parameters and :/@ modifiers in the CL sense).
Elk 1.5/2.0:
Downward compatible except for ~A and ~S which print in uppercase. (Elk implements ~a, ~s, ~~, and ~% (no directive parameters or modifiers)).
Scheme->C 01nov91:
Downward compatible except for an optional destination parameter: S2C accepts a format call without a destination which returns a formatted string. This is equivalent to a #f destination in S2C. (S2C implements ~a, ~s, ~c, ~%, and ~~ (no directive parameters or modifiers)).

This implementation of format is solely useful in the SLIB context because it requires other components provided by SLIB.

Standard Formatted I/O

stdio

(require 'stdio)

requires printf and scanf and additionally defines the symbols:

Variable: stdin
Defined to be (current-input-port).
Variable: stdout
Defined to be (current-output-port).
Variable: stderr
Defined to be (current-error-port).

Standard Formatted Output

(require 'printf)

Procedure: printf format arg1 ...
Procedure: fprintf port format arg1 ...
Procedure: sprintf str format arg1 ...
Procedure: sprintf #f format arg1 ...
Procedure: sprintf k format arg1 ...

Each function converts, formats, and outputs its arg1 ... arguments according to the control string format argument and returns the number of characters output.

printf sends its output to the port (current-output-port). fprintf sends its output to the port port. sprintf string-set!s locations of the non-constant string argument str to the output characters.

Two extensions of sprintf return new strings. If the first argument is #f, then the returned string's length is as many characters as specified by the format and data; if the first argument is a non-negative integer k, then the length of the returned string is also bounded by k.

The string format contains plain characters which are copied to the output stream, and conversion specifications, each of which results in fetching zero or more of the arguments arg1 .... The results are undefined if there are an insufficient number of arguments for the format. If format is exhausted while some of the arg1 ... arguments remain unused, the excess arg1 ... arguments are ignored.
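The output destinations described above can be sketched as follows. The variable names label, short, and written are hypothetical.

```scheme
(require 'printf)

;; #f as first argument: a new string of exactly the needed length.
(define label (sprintf #f "%d apples and %s" 3 "pears"))  ; "3 apples and pears"

;; A non-negative integer bounds the length of the returned string.
(define short (sprintf 5 "%s" "truncated"))

;; printf writes to (current-output-port) and returns the number of
;; characters output.
(define written (printf "%5d|%-5d|" 42 42))
(newline)
```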

The conversion specifications in a format string have the form:

% [ flags ] [ width ] [ . precision ] [ type ] conversion

An output conversion specification consists of an initial `%' character followed in sequence by:

Exact Conversions

`b', `B'
Print an integer as an unsigned binary number. Note: `%b' and `%B' are SLIB extensions.
`d', `i'
Print an integer as a signed decimal number. `%d' and `%i' are synonymous for output, but are different when used with scanf for input (see section Standard Formatted Input).
`o'
Print an integer as an unsigned octal number.
`u'
Print an integer as an unsigned decimal number.
`x', `X'
Print an integer as an unsigned hexadecimal number. `%x' prints using the digits `0123456789abcdef'. `%X' prints using the digits `0123456789ABCDEF'.

Inexact Conversions

`f'
Print a floating-point number in fixed-point notation.
`e', `E'
Print a floating-point number in exponential notation. `%e' prints `e' between mantissa and exponent. `%E' prints `E' between mantissa and exponent.
`g', `G'
Print a floating-point number in either fixed or exponential notation, whichever is more appropriate for its magnitude. Unless an `#' flag has been supplied, trailing zeros after a decimal point will be stripped off. `%g' prints `e' between mantissa and exponent. `%G' prints `E' between mantissa and exponent.
`k', `K'
Print a number like `%g', except that an SI prefix is output after the number, which is scaled accordingly. `%K' outputs a dot between number and prefix, `%k' does not.

Other Conversions

`c'
Print a single character. The `-' flag is the only one which can be specified. It is an error to specify a precision.
`s'
Print a string. The `-' flag is the only one which can be specified. A precision specifies the maximum number of characters to output; otherwise all characters in the string are output.
`a', `A'
Print a Scheme expression. The `-' flag left-justifies the output. The `#' flag specifies that strings and characters should be quoted as by write (which can be read using read); otherwise, output is as display prints. A precision specifies the maximum number of characters to output; otherwise as many characters as needed are output. Note: `%a' and `%A' are SLIB extensions.
`%'
Print a literal `%' character. No argument is consumed. It is an error to specify flags, field width, precision, or type modifiers with `%%'.
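Some of the conversions above, sketched with sprintf so that each result is returned as a string. The variable names are hypothetical, and the comments give the results the descriptions above imply.

```scheme
(require 'printf)

(define hex-lower (sprintf #f "%x" 255))          ; "ff"
(define hex-upper (sprintf #f "%X" 255))          ; "FF"
(define binary    (sprintf #f "%b" 5))            ; "101" (SLIB extension)
(define clipped   (sprintf #f "%.3s" "abcdef"))   ; precision caps length: "abc"
(define displayed (sprintf #f "%a" '(1 "two")))   ; as display prints
(define quoted    (sprintf #f "%#a" '(1 "two")))  ; as write prints
(define percent   (sprintf #f "100%%"))           ; literal percent sign
```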

Standard Formatted Input

(require 'scanf)

Function: scanf-read-list format
Function: scanf-read-list format port
Function: scanf-read-list format string

Macro: scanf format arg1 ...
Macro: fscanf port format arg1 ...
Macro: sscanf str format arg1 ...

Each function reads characters, interpreting them according to the control string format argument.

scanf-read-list returns a list of the items specified as far as the input matches format. scanf, fscanf, and sscanf return the number of items successfully matched and stored. scanf, fscanf, and sscanf also set the location corresponding to arg1 ... using the methods:

symbol
set!
car expression
set-car!
cdr expression
set-cdr!
vector-ref expression
vector-set!
substring expression
substring-move-left!

The argument to a substring expression in arg1 ... must be a non-constant string. Characters will be stored starting at the position specified by the second argument to substring. The number of characters stored will be limited by either the position specified by the third argument to substring or the length of the matched string, whichever is less.

The control string, format, contains conversion specifications and other characters used to direct interpretation of input sequences. The control string contains:

Unless the specification contains the `n' conversion character (described below), a conversion specification directs the conversion of the next input field. The result of a conversion specification is stored in the location designated by the corresponding argument, unless `*' indicates assignment suppression. Assignment suppression provides a way to describe an input field to be skipped. An input field is defined as a string of characters; it extends to the next inappropriate character or until the field width, if specified, is exhausted.

Note: This specification of format strings differs from the ANSI C and POSIX specifications. In SLIB, white space before an input field is not skipped unless white space appears before the conversion specification in the format string. In order to write format strings which work identically with ANSI C and SLIB, prepend whitespace to all conversion specifications except `[' and `c'.

The conversion code indicates the interpretation of the input field. For a suppressed field, no value is returned. The following conversion codes are legal:

`%'
A single % is expected in the input at this point; no value is returned.
`d', `D'
A decimal integer is expected.
`u', `U'
An unsigned decimal integer is expected.
`o', `O'
An octal integer is expected.
`x', `X'
A hexadecimal integer is expected.
`i'
An integer is expected. Returns the value of the next input item, interpreted according to C conventions; a leading `0' implies octal, a leading `0x' implies hexadecimal; otherwise, decimal is assumed.
`n'
Returns the total number of bytes (including white space) read by scanf. No input is consumed by %n.
`f', `F', `e', `E', `g', `G'
A floating-point number is expected. The input format for floating-point numbers is an optionally signed string of digits, possibly containing a radix character `.', followed by an optional exponent field consisting of an `E' or an `e', followed by an optional `+', `-', or space, followed by an integer.
`c', `C'
Width characters are expected. The normal skip-over-white-space is suppressed in this case; to read the next non-space character, use `%1s'. If a field width is given, a string is returned; up to the indicated number of characters is read.
`s', `S'
A character string is expected. The input field is terminated by a white-space character. scanf cannot read a null string.
`['
Indicates string data and the normal skip-over-leading-white-space is suppressed. The left bracket is followed by a set of characters, called the scanset, and a right bracket; the input field is the maximal sequence of input characters consisting entirely of characters in the scanset. `^', when it appears as the first character in the scanset, serves as a complement operator and redefines the scanset as the set of all characters not contained in the remainder of the scanset string. Construction of the scanset follows certain conventions. A range of characters may be represented by the construct first-last, enabling `[0123456789]' to be expressed `[0-9]'. Using this convention, first must be lexically less than or equal to last; otherwise, the dash stands for itself. The dash also stands for itself when it is the first or the last character in the scanset. To include the right square bracket as an element of the scanset, it must appear as the first character (possibly preceded by a `^') of the scanset, in which case it will not be interpreted syntactically as the closing bracket. At least one character must match for this conversion to succeed.

The scanf functions terminate their conversions at end-of-file, at the end of the control string, or when an input character conflicts with the control string. In the latter case, the offending character is left unread in the input stream.
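A brief sketch of the two styles of input conversion described in this section, reading from strings. The variable names are hypothetical. Note that the second conversion in each format is preceded by a space, since SLIB skips white space before an input field only when the format asks for it.

```scheme
(require 'scanf)

;; scanf-read-list returns the converted items as a list.
(define pair (scanf-read-list "%d %d" "12 34"))          ; (12 34)

;; sscanf is a macro: symbol arguments are assigned with set!,
;; and the number of items matched and stored is returned.
(define width 0)
(define height 0)
(define matched (sscanf "640 480" "%d %d" width height)) ; 2

;; A scanset reads the longest run of characters in the set.
(define word (scanf-read-list "%[a-z]" "abc123"))        ; ("abc")
```

After the sscanf call, width and height hold 640 and 480.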

Program and Arguments

Getopt

(require 'getopt)

This routine implements Posix command line argument parsing. Notice that returning values through global variables means that getopt is not reentrant.

Obedience to Posix format for the getopt calls sows confusion. Passing argc and argv as arguments while referencing optind as a global variable leads to strange behavior, especially when the calls to getopt are buried in other procedures.

Even in C, argc can be derived from argv; what purpose does it serve beyond providing an opportunity for argv/argc mismatch? Just such a mismatch existed for years in a SLIB getopt-- example.

I have removed the argc and argv arguments to getopt procedures; and replaced them with a global variable:

Variable: *argv*
Define *argv* with a list of arguments before calling getopt procedures. If you don't want the first (0th) element to be ignored, set *optind* to 0 (after requiring getopt).

Variable: *optind*
Is the index of the current element of the command line. It is initially one. In order to parse a new command line or reparse an old one, *optind* must be reset.

Variable: *optarg*
Is set by getopt to the (string) option-argument of the current option.

Function: getopt optstring
Returns the next option letter in *argv* (starting from (vector-ref argv *optind*)) that matches a letter in optstring. *argv* is a vector or list of strings, the 0th of which getopt usually ignores. optstring is a string of recognized option characters; if a character is followed by a colon, the option takes an argument which may be immediately following it in the string or in the next element of *argv*.

*optind* is the index of the next element of the *argv* vector to be processed. It is initialized to 1 by `getopt.scm', and getopt updates it when it finishes with each element of *argv*.

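getopt returns the next option character from *argv* that matches a character in optstring, if there is one that matches. If the option takes an argument, getopt sets the variable *optarg* to the option-argument, which is either the rest of the current element of *argv* or, if nothing follows the option character there, the next element of *argv*.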

If, when getopt is called, the string (vector-ref argv *optind*) either does not begin with the character #\- or is just "-", getopt returns #f without changing *optind*. If (vector-ref argv *optind*) is the string "--", getopt returns #f after incrementing *optind*.

If getopt encounters an option character that is not contained in optstring, it returns the question-mark #\? character. If it detects a missing option argument, it returns the colon character #\: if the first character of optstring was a colon, or a question-mark character otherwise. In either case, getopt sets the variable getopt:opt to the option character that caused the error.

The special option "--" can be used to delimit the end of the options; #f is returned, and "--" is skipped.
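The role of the colon in optstring can be illustrated with a small sketch in standard Scheme. It is not part of getopt; the helper name option-kind and its three result symbols are made up for this illustration.

```scheme
;; Sketch only: classify option character `opt' according to `optstring'.
;; Returns 'takes-argument if a colon follows it, 'flag if not, and
;; 'unknown if `opt' is not listed.  `option-kind' is a hypothetical
;; helper, not part of getopt.
(define (option-kind opt optstring)
  (let ((n (string-length optstring)))
    (let loop ((i 0))
      (cond ((>= i n) 'unknown)
            ((and (char=? opt (string-ref optstring i))
                  (not (char=? opt #\:)))
             (if (and (< (+ i 1) n)
                      (char=? #\: (string-ref optstring (+ i 1))))
                 'takes-argument
                 'flag))
            (else (loop (+ i 1)))))))

(option-kind #\a ":a:b:cd")   ; => takes-argument
(option-kind #\c ":a:b:cd")   ; => flag
(option-kind #\x ":a:b:cd")   ; => unknown
```

A leading colon in optstring is never itself an option; it only selects #\: as the return value for a missing argument.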

RETURN VALUE

getopt returns the next option character specified on the command line. A colon #\: is returned if getopt detects a missing argument and the first character of optstring was a colon #\:.

A question-mark #\? is returned if getopt encounters an option character not in optstring or detects a missing argument and the first character of optstring was not a colon #\:.

Otherwise, getopt returns #f when all command line options have been parsed.

Example:

#! /usr/local/bin/scm
(require 'program-arguments)
(require 'getopt)
;; getopt reads the command line from the global variable *argv*.
(define *argv* (program-arguments))

(define opts ":a:b:cd")
(let loop ((opt (getopt opts)))
  (case opt
    ((#\a) (print "option a: " *optarg*))
    ((#\b) (print "option b: " *optarg*))
    ((#\c) (print "option c"))
    ((#\d) (print "option d"))
    ((#\?) (print "error" getopt:opt))
    ((#\:) (print "missing arg" getopt:opt))
    ;; #f: not an option; print the argument and step past it.
    ((#f) (if (< *optind* (length *argv*))
              (print "argv[" *optind* "]="
                     (list-ref *argv* *optind*)))
          (set! *optind* (+ *optind* 1))))
  (if (< *optind* (length *argv*))
      (loop (getopt opts))))

(slib:exit)

Getopt--

Function: getopt-- optstring
The procedure getopt-- is an extended version of getopt which parses long option names of the form `--hold-the-onions' and `--verbosity-level=extreme'. Getopt-- behaves as getopt except for non-empty options beginning with `--'.

Options beginning with `--' are returned as strings rather than characters. If a value is assigned (using `=') to a long option, *optarg* is set to the value. The `=' and value are not returned as part of the option string.

No information is passed to getopt-- concerning which long options should be accepted or whether such options can take arguments. If a long option did not have an argument, *optarg* will be set to #f. The caller is responsible for detecting and reporting errors.

(define opts ":-:b:")
(define *argv* '("foo" "-b9" "--f1" "--2=" "--g3=35234.342" "--"))
(define *optind* 1)
(define *optarg* #f)
(require 'qp)
(do ((i 5 (+ -1 i)))
    ((zero? i))
  (let ((opt (getopt-- opts)))
    (print *optind* opt *optarg*)))
-|
2 #\b "9"
3 "f1" #f
4 "2" ""
5 "g3" "35234.342"
5 #f "35234.342"

Command Line

(require 'read-command)

Function: read-command port

Function: read-command
read-command converts a command line into a list of strings suitable for parsing by getopt. The syntax of command lines supported resembles that of popular shells. read-command updates port to point to the first character past the command delimiter.

If an end of file is encountered in the input before any characters are found that can begin an object or comment, then an end of file object is returned.

The port argument may be omitted, in which case it defaults to the value returned by current-input-port.

The fields into which the command line is split are delimited by whitespace as defined by char-whitespace?. The end of a command is delimited by end-of-file or unescaped semicolon (;) or newline. Any character can be literally included in a field by escaping it with a backslash (\).

The initial character and types of fields recognized are:

`\'
The next character is taken literally and not interpreted as a field delimiter. If \ is the last character before a newline, that newline is just ignored. Processing continues from the characters after the newline as though the backslash and newline were not there.
`"'
The characters up to the next unescaped " are taken literally, according to [R4RS] rules for literal strings (see section `Strings' in Revised(4) Scheme).
`(', `%'
One scheme expression is read starting with this character. The read expression is evaluated, converted to a string (using display), and replaces the expression in the returned field.
`;'
Semicolon delimits a command. Using semicolons more than one command can appear on a line. Escaped semicolons and semicolons inside strings do not delimit commands.

The comment field differs from the previous fields in that it must be the first character of a command or appear after whitespace in order to be recognized. # can be part of fields if these conditions are not met. For instance, ab#c is just the field ab#c.

`#'
Introduces a comment. The comment continues to the end of the line on which the # appears. Comments are treated as whitespace by read-command, and backslashes before newlines in comments are also ignored.

Function: read-options-file filename

read-options-file converts an options file into a list of strings suitable for parsing by getopt. The syntax of options files is the same as the syntax for command lines, except that newlines do not terminate reading (only ; or end of file).

If an end of file is encountered before any characters are found that can begin an object or comment, then an end of file object is returned.

Parameter lists

(require 'parameters)

Arguments to procedures in scheme are distinguished from each other by their position in the procedure call. This can be confusing when a procedure takes many arguments, many of which are not often used.

A parameter-list is a way of passing named information to a procedure. Procedures are also defined to set unused parameters to default values, check parameters, and combine parameter lists.

A parameter has the form (parameter-name value1 ...). This format allows for more than one value per parameter-name.

A parameter-list is a list of parameters, each with a different parameter-name.
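A minimal sketch of this layout, using only standard list operations rather than the package's own procedures. The parameter names width, files, and verbose, and the helper values-of, are made up for illustration.

```scheme
;; A parameter-list is an association list; each element has the form
;; (parameter-name value1 ...).  All names below are hypothetical.
(define example-parameters
  (list (list 'width 80)                ; one value
        (list 'files "a.scm" "b.scm")   ; several values
        (list 'verbose)))               ; empty slot: no value yet

;; The values of a parameter are the cdr of its entry.
(define (values-of name parameter-list)
  (cdr (assq name parameter-list)))

(values-of 'files example-parameters)   ; => ("a.scm" "b.scm")
(values-of 'verbose example-parameters) ; => ()
```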

Function: make-parameter-list parameter-names
Returns an empty parameter-list with slots for parameter-names.

Function: parameter-list-ref parameter-list parameter-name
parameter-name must name a valid slot of parameter-list. parameter-list-ref returns the value of parameter parameter-name of parameter-list.

Function: remove-parameter parameter-name parameter-list
Removes the parameter parameter-name from parameter-list. remove-parameter does not alter the argument parameter-list.

If there is more than one parameter-name parameter, an error is signaled.

Procedure: adjoin-parameters! parameter-list parameter1 ...
Returns parameter-list with parameter1 ... merged in.

Procedure: parameter-list-expand expanders parameter-list
expanders is a list of procedures whose order matches the order of the parameter-names in the call to make-parameter-list which created parameter-list. For each non-false element of expanders that procedure is mapped over the corresponding parameter value and the returned parameter lists are merged into parameter-list.

This process is repeated until parameter-list stops growing. The value returned from parameter-list-expand is unspecified.

Function: fill-empty-parameters defaulters parameter-list
defaulters is a list of procedures whose order matches the order of the parameter-names in the call to make-parameter-list which created parameter-list. fill-empty-parameters returns a new parameter-list with each empty parameter replaced with the list returned by calling the corresponding defaulter with parameter-list as its argument.

Function: check-parameters checks parameter-list
checks is a list of procedures whose order matches the order of the parameter-names in the call to make-parameter-list which created parameter-list.

check-parameters returns parameter-list if each check of the corresponding parameter-list returns non-false. If some check returns #f a warning is signaled.

In the following procedures arities is a list of symbols. The elements of arities can be:

single
Requires a single parameter.
optional
A single parameter or no parameter is acceptable.
boolean
A single boolean parameter or zero parameters is acceptable.
nary
Any number of parameters are acceptable.
nary1
One or more parameters are acceptable.
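The arity symbols can be read as constraints on how many values a parameter holds. The following sketch checks a list of values against one arity symbol; arity-ok? is a hypothetical helper written for this illustration, not a procedure of the parameters package.

```scheme
;; Sketch only: does the list `vals' satisfy the arity symbol `arity'?
;; `arity-ok?' is a hypothetical helper.
(define (arity-ok? arity vals)
  (let ((n (length vals)))
    (case arity
      ((single)   (= n 1))
      ((optional) (<= n 1))
      ((boolean)  (or (= n 0)
                      (and (= n 1) (boolean? (car vals)))))
      ((nary)     #t)
      ((nary1)    (>= n 1))
      (else       #f))))

(arity-ok? 'single '(42))      ; => #t
(arity-ok? 'optional '())      ; => #t
(arity-ok? 'boolean '(#t))     ; => #t
(arity-ok? 'nary1 '())         ; => #f
```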

Function: parameter-list->arglist positions arities parameter-list
Returns parameter-list converted to an argument list. Parameters of arity type single and boolean are converted to the single value associated with them. The other arity types are converted to lists of the value(s).

positions is a list of positive integers whose order matches the order of the parameter-names in the call to make-parameter-list which created parameter-list. The integers specify in which argument position the corresponding parameter should appear.

Getopt Parameter lists

(require 'getopt-parameters)

Function: getopt->parameter-list optnames arities types aliases desc ...

Returns *argv* converted to a parameter-list. optnames are the parameter-names. arities and types are lists of symbols corresponding to optnames.

aliases is a list of lists of strings or integers paired with elements of optnames. Each one-character string will be treated as a single `-' option by getopt. Longer strings will be treated as long-named options (see section Getopt).

If the aliases association list has only strings as its cars, then all the option-arguments after an option (and before the next option) are adjoined to that option.

If the aliases association list has integers, then each (string) option will take at most one option-argument. Unoptioned arguments are collected in a list. A `-1' alias will take the last argument in this list; `+1' will take the first argument in the list. The aliases -2 then +2; -3 then +3; ... are tried so long as a positive or negative consecutive alias is found and arguments remain in the list. Finally a `0' alias, if found, absorbs any remaining arguments.

In all cases, if unclaimed arguments remain after processing, a warning is signaled and #f is returned.

Function: getopt->arglist optnames positions arities types defaulters checks aliases desc ...

Like getopt->parameter-list, but converts *argv* to an argument-list as specified by optnames, positions, arities, types, defaulters, checks, and aliases. If the options supplied violate the arities or checks constraints, then a warning is signaled and #f is returned.

These getopt functions can be used with SLIB relational databases. For an example, see section Using Databases.

If errors are encountered while processing options, directions for using the options (and argument strings desc ...) are printed to current-error-port.

(begin
  (set! *optind* 1)
  (set! *argv* '("cmd" "-?"))
  (getopt->parameter-list
   '(flag number symbols symbols string flag2 flag3 num2 num3)
   '(boolean optional nary1 nary single boolean boolean nary nary)
   '(boolean integer symbol symbol string boolean boolean integer integer)
   '(("flag" flag)
     ("f" flag)
     ("Flag" flag2)
     ("B" flag3)
     ("optional" number)
     ("o" number)
     ("nary1" symbols)
     ("N" symbols)
     ("nary" symbols)
     ("n" symbols)
     ("single" string)
     ("s" string)
     ("a" num2)
     ("Abs" num3))))
-|
Usage: cmd [OPTION ARGUMENT ...] ...

  -f, --flag
  -o, --optional=<number>
  -n, --nary=<symbols> ...
  -N, --nary1=<symbols> ...
  -s, --single=<string>
      --Flag
  -B
  -a        <num2> ...
      --Abs=<num3> ...

ERROR: getopt->parameter-list "unrecognized option" "-?"

Filenames

(require 'filename)

Function: filename:match?? pattern
Function: filename:match-ci?? pattern

Returns a predicate which returns a non-false value if its string argument matches (the string) pattern, false otherwise. Filename matching is like glob expansion described in the bash manpage, except that names beginning with `.' are matched and `/' characters are not treated specially.

These functions interpret the following characters specially in pattern strings:

`*'
Matches any string, including the null string.
`?'
Matches any single character.
`[...]'
Matches any one of the enclosed characters. A pair of characters separated by a minus sign (-) denotes a range; any character lexically between those two characters, inclusive, is matched. If the first character following the `[' is a `!' or a `^' then any character not enclosed is matched. A `-' or `]' may be matched by including it as the first or last character in the set.
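The matching rules for `*' and `?' can be sketched in a few lines of standard Scheme. This is a simplified illustration, not the package's implementation: bracket sets are omitted, the result is a plain boolean, and glob-match? is a hypothetical name.

```scheme
;; Sketch only: match a pattern containing `*' and `?' against a string.
;; Bracket sets `[...]' are not handled.  `glob-match?' is hypothetical.
(define (glob-match? pattern str)
  (let loop ((p (string->list pattern))
             (s (string->list str)))
    (cond ((null? p) (null? s))
          ((char=? (car p) #\*)
           ;; `*' matches the null string, or absorbs one more character.
           (or (loop (cdr p) s)
               (and (pair? s) (loop p (cdr s)))))
          ((null? s) #f)
          ((char=? (car p) #\?) (loop (cdr p) (cdr s)))
          (else (and (char=? (car p) (car s))
                     (loop (cdr p) (cdr s)))))))

(glob-match? "*.scm" "batch.scm")   ; => #t
(glob-match? "*.scm" ".scm")        ; => #t  (`*' matches the null string)
(glob-match? "a?c" "abc")           ; => #t
(glob-match? "a?c" "ac")            ; => #f
```

Because `.' and `/' get no special treatment, the sketch also accepts names such as ".scm", in line with the description above.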

Function: filename:substitute?? pattern template
Function: filename:substitute-ci?? pattern template

Returns a function transforming a single string argument according to glob patterns pattern and template. pattern and template must have the same number of wildcard specifications, which need not be identical. pattern and template may have a different number of literal sections. If an argument to the function matches pattern in the sense of filename:match?? then it returns a copy of template in which each wildcard specification is replaced by the part of the argument matched by the corresponding wildcard specification in pattern. A * wildcard matches the longest leftmost string possible. If the argument does not match pattern then false is returned.

template may be a function accepting the same number of string arguments as there are wildcard specifications in pattern. In the case of a match the result of applying template to a list of the substrings matched by wildcard specifications will be returned, otherwise template will not be called and #f will be returned.

((filename:substitute?? "scm_[0-9]*.html" "scm5c4_??.htm")
 "scm_10.html")
=> "scm5c4_10.htm"
((filename:substitute?? "??" "beg?mid?end") "AZ")
=> "begAmidZend"
((filename:substitute?? "*na*" "?NA?") "banana")
=> "banaNA"
((filename:substitute?? "?*?" (lambda (s1 s2 s3) (string-append s3 s1)))
 "ABZ")
=> "ZA"

Function: replace-suffix str old new

str can be a string or a list of strings. Returns a new string (or strings) similar to str but with the suffix string old removed and the suffix string new appended. If the end of str does not match old, an error is signaled.

(replace-suffix "/usr/local/lib/slib/batch.scm" ".scm" ".c")
=> "/usr/local/lib/slib/batch.c"

Function: call-with-tmpnam proc k

Function: call-with-tmpnam proc
Calls proc with k arguments, strings returned by successive calls to tmpnam. If proc returns, then any files named by the arguments to proc are deleted automatically and the value(s) yielded by the proc is(are) returned. k may be omitted, in which case it defaults to 1.

Function: call-with-tmpnam proc suffix1 ...
Calls proc with strings returned by successive calls to tmpnam, each with the corresponding suffix string appended. If proc returns, then any files named by the arguments to proc are deleted automatically and the value(s) yielded by the proc is(are) returned.

Batch

(require 'batch)

The batch procedures provide a way to write and execute portable scripts for a variety of operating systems. Each batch: procedure takes as its first argument a parameter-list (see section Parameter lists). This parameter-list argument parms contains named associations. Batch currently uses 2 of these:

batch-port
The port on which to write lines of the batch file.
batch-dialect
The syntax of batch file to generate. Currently supported are:

The `batch' module uses 2 enhanced relational tables (see section Using Databases) to store information linking the names of operating-systems to batch-dialects.

Function: batch:initialize! database
Defines operating-system and batch-dialect tables and adds the domain operating-system to the enhanced relational database database.

Variable: *operating-system*
Is batch's best guess as to which operating-system it is running under. *operating-system* is set to (software-type) (see section Configuration) unless (software-type) is unix, in which case finer distinctions are made.

Function: batch:call-with-output-script parms file proc
proc should be a procedure of one argument. If file is an output-port, batch:call-with-output-script writes an appropriate header to file and then calls proc with file as the only argument. If file is a string, batch:call-with-output-script opens an output-file of name file, writes an appropriate header to file, and then calls proc with the newly opened port as the only argument. Otherwise, batch:call-with-output-script acts as if it was called with the result of (current-output-port) as its second argument.

The rest of the batch: procedures write (or execute if batch-dialect is system) commands to the batch port which has been added to parms or (copy-tree parms) by the code:

(adjoin-parameters! parms (list 'batch-port port))

Function: batch:command parms string1 string2 ...
Calls batch:try-command (below) with arguments, but signals an error if batch:try-command returns #f.

These functions return a non-false value if the command was successfully translated into the batch dialect and #f if not. In the case of the system dialect, the value is non-false if the operation succeeded.

Function: batch:try-command parms string1 string2 ...
Writes a command to the batch-port in parms which executes the program named string1 with arguments string2 ....

Function: batch:try-chopped-command parms arg1 arg2 ... list
Breaks the last argument list into chunks small enough so that the command:
arg1 arg2 ... chunk

fits within the platform's maximum command-line length.

batch:try-chopped-command calls batch:try-command with the command and returns non-false only if the commands all fit and batch:try-command of each command line returned non-false.

Function: batch:run-script parms string1 string2 ...
Writes a command to the batch-port in parms which executes the batch script named string1 with arguments string2 ....

Note: batch:run-script and batch:try-command are not the same for some operating systems (VMS).

Function: batch:comment parms line1 ...
Writes comment lines line1 ... to the batch-port in parms.

Function: batch:lines->file parms file line1 ...
Writes commands to the batch-port in parms which create a file named file with contents line1 ....

Function: batch:delete-file parms file
Writes a command to the batch-port in parms which deletes the file named file.

Function: batch:rename-file parms old-name new-name
Writes a command to the batch-port in parms which renames the file old-name to new-name.

In addition, batch provides some small utilities very useful for writing scripts:

Function: truncate-up-to path char
Function: truncate-up-to path string
Function: truncate-up-to path charlist
path can be a string or a list of strings. Returns path sans any prefixes ending with a character of the second argument. This can be used to derive a filename moved locally from elsewhere.
(truncate-up-to "/usr/local/lib/slib/batch.scm" "/")
=> "batch.scm"

Function: string-join joiner string1 ...
Returns a new string consisting of all the strings string1 ... in order appended together with the string joiner between each adjacent pair.
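The behavior described for string-join can be sketched in standard Scheme as follows. The procedure is named join-strings here so that it does not clash with the real one, and returning the empty string when no strings are given is an assumption of this sketch.

```scheme
;; Sketch only: append strings with `joiner' between each adjacent pair.
;; `join-strings' is a hypothetical name; the empty-argument result ""
;; is an assumption.
(define (join-strings joiner . strings)
  (if (null? strings)
      ""
      (let loop ((result (car strings))
                 (rest (cdr strings)))
        (if (null? rest)
            result
            (loop (string-append result joiner (car rest))
                  (cdr rest))))))

(join-strings " " "cc" "-o" "hello" "hello.o")  ; => "cc -o hello hello.o"
(join-strings ", " "one")                       ; => "one"
```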

Function: must-be-first list1 list2
Returns a new list consisting of the elements of list2 ordered so that if some elements of list1 are equal? to elements of list2, then those elements will appear first and in the order of list1.

Function: must-be-last list1 list2
Returns a new list consisting of the elements of list1 ordered so that if some elements of list2 are equal? to elements of list1, then those elements will appear last and in the order of list2.
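The two orderings can be sketched with a small filtering helper in standard Scheme. The names keep, first-sketch, and last-sketch are hypothetical, and the treatment of duplicate elements is whatever falls out of this sketch, not a statement about the batch procedures.

```scheme
;; Sketch only.  `keep' returns the elements of `lst' satisfying `pred'.
(define (keep pred lst)
  (cond ((null? lst) '())
        ((pred (car lst)) (cons (car lst) (keep pred (cdr lst))))
        (else (keep pred (cdr lst)))))

;; Elements of list2; those also in list1 come first, in list1's order.
(define (first-sketch list1 list2)
  (append (keep (lambda (x) (member x list2)) list1)
          (keep (lambda (x) (not (member x list1))) list2)))

;; Elements of list1; those also in list2 come last, in list2's order.
(define (last-sketch list1 list2)
  (append (keep (lambda (x) (not (member x list2))) list1)
          (keep (lambda (x) (member x list1)) list2)))

(first-sketch '("-c" "-o") '("hello.c" "-o" "x" "-c"))
;; => ("-c" "-o" "hello.c" "x")
(last-sketch '("a.o" "-lm" "b.o") '("-lm"))
;; => ("a.o" "b.o" "-lm")
```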

Function: os->batch-dialect osname
Returns its best guess for the batch-dialect to be used for the operating-system named osname. os->batch-dialect uses the tables added to database by batch:initialize!.

Here is an example of the use of most of batch's procedures:

(require 'databases)
(require 'parameters)
(require 'batch)
(require 'filename)

(define batch (create-database #f 'alist-table))
(batch:initialize! batch)

(define my-parameters
  (list (list 'batch-dialect (os->batch-dialect *operating-system*))
        (list 'operating-system *operating-system*)
        (list 'batch-port (current-output-port)))) ;gets filled in later

(batch:call-with-output-script
 my-parameters
 "my-batch"
 (lambda (batch-port)
   (adjoin-parameters! my-parameters (list 'batch-port batch-port))
   (and
    (batch:comment my-parameters
                   "================ Write file with C program.")
    (batch:rename-file my-parameters "hello.c" "hello.c~")
    (batch:lines->file my-parameters "hello.c"
                       "#include <stdio.h>"
                       "int main(int argc, char **argv)"
                       "{"
                       "  printf(\"hello world\\n\");"
                       "  return 0;"
                       "}" )
    (batch:command my-parameters "cc" "-c" "hello.c")
    (batch:command my-parameters "cc" "-o" "hello"
                  (replace-suffix "hello.c" ".c" ".o"))
    (batch:command my-parameters "hello")
    (batch:delete-file my-parameters "hello")
    (batch:delete-file my-parameters "hello.c")
    (batch:delete-file my-parameters "hello.o")
    (batch:delete-file my-parameters "my-batch")
    )))

Produces the file `my-batch':

#! /bin/sh
# "my-batch" script created by SLIB/batch Sun Oct 31 18:24:10 1999
# ================ Write file with C program.
mv -f hello.c hello.c~
rm -f hello.c
echo '#include <stdio.h>'>>hello.c
echo 'int main(int argc, char **argv)'>>hello.c
echo '{'>>hello.c
echo '  printf("hello world\n");'>>hello.c
echo '  return 0;'>>hello.c
echo '}'>>hello.c
cc -c hello.c
cc -o hello hello.o
hello
rm -f hello
rm -f hello.c
rm -f hello.o
rm -f my-batch

When run, `my-batch' prints:

bash$ my-batch
mv: hello.c: No such file or directory
hello world

HTML

(require 'html-form)

Function: html:atval txt
Returns a string with character substitutions appropriate to send txt as an attribute-value.

Function: html:plain txt
Returns a string with character substitutions appropriate to send txt as plain text.

Function: html:meta name content
Returns a tag of meta-information suitable for passing as the third argument to html:head. The tag produced is `<META NAME="name" CONTENT="content">'. The string or symbol name can be `author', `copyright', `keywords', `description', `date', `robots', ....

Function: html:http-equiv name content
Returns a tag of HTTP information suitable for passing as the third argument to html:head. The tag produced is `<META HTTP-EQUIV="name" CONTENT="content">'. The string or symbol name can be `Expires', `PICS-Label', `Content-Type', `Refresh', ....

Function: html:meta-refresh delay uri

Function: html:meta-refresh delay

Returns a tag suitable for passing as the third argument to html:head. If uri argument is supplied, then delay seconds after displaying the page with this tag, Netscape or IE browsers will fetch and display uri. Otherwise, delay seconds after displaying the page with this tag, Netscape or IE browsers will fetch and redisplay this page.

Function: html:head title backlink tags ...

Function: html:head title backlink

Function: html:head title

Returns header string for an HTML page named title. If backlink is a string, it is used verbatim between the `H1' tags; otherwise title is used. If string arguments tags ... are supplied, then they are included verbatim within the <HEAD> section.

Function: html:body body ...
Returns HTML string to end a page.

Function: html:pre line1 line ...
Returns the strings line1, lines as PREformatted plain text (rendered in fixed-width font). Newlines are inserted between line1, lines. HTML tags (`<tag>') within lines will be visible verbatim.

Function: html:comment line1 line ...
Returns the strings line1, lines as HTML comments.

HTML Forms

Function: html:form method action body ...
The symbol method is either get, head, post, put, or delete. The strings body form the body of the form. html:form returns the HTML form.

Function: html:hidden name value
Returns HTML string which will cause name=value in form.

Function: html:checkbox pname default
Returns HTML string for check box.

Function: html:text pname default size ...
Returns HTML string for one-line text box.

Function: html:text-area pname default-list
Returns HTML string for multi-line text box.

Function: html:select pname arity default-list foreign-values
Returns HTML string for pull-down menu selector.

Function: html:buttons pname arity default-list foreign-values
Returns HTML string for any-of selector.

Function: form:submit submit-label command

Function: form:submit submit-label

The string or symbol submit-label appears on the button which submits the form. If the optional second argument command is given, then *command*=command and *button*=submit-label are set in the query. Otherwise, *command*=submit-label is set in the query.

Function: form:image submit-label image-src
The image-src appears on the button which submits the form.

Function: form:reset
Returns a string which generates a reset button.

Function: form:element pname arity default-list foreign-values
Returns a string which generates an INPUT element for the field named pname. The element appears in the created form with its representation determined by its arity and domain. For domains which are foreign-keys:
single
select menu
optional
select menu
nary
check boxes
nary1
check boxes

If the foreign-key table has a field named `visible-name', then the contents of that field are the names visible to the user for those choices. Otherwise, the foreign-key itself is visible.

For other types of domains:

single
text area
optional
text area
boolean
check box
nary
text area
nary1
text area

Function: form:delimited pname doc aliat arity default-list foreign-values

Returns an HTML string for a form element embedded in a line of a delimited list. Apply map form:delimited to the list returned by command->p-specs.

Function: html:delimited-list row ...
Wraps its arguments with a delimited-list (`DL') command.

Function: get-foreign-choices tab
Returns a list of the `visible-name' or first fields of table tab.

Function: command->p-specs rdb command-table command

The symbol command-table names a command table in the rdb relational database. The symbol command names a key in command-table.

command->p-specs returns a list of lists of pname, doc, aliat, arity, default-list, and foreign-values. The returned list has one element for each parameter of command command.

This example demonstrates how to create an HTML form for the `build' command.

(require (in-vicinity (implementation-vicinity) "build.scm"))
(call-with-output-file "buildscm.html"
  (lambda (port)
    (display
     (string-append
      (html:head 'commands)
      (html:body
       (sprintf #f "<H2>%s:</H2><BLOCKQUOTE>%s</BLOCKQUOTE>\\n"
                (html:plain 'build)
                (html:plain ((comtab 'get 'documentation) 'build)))
       (html:form
        'post
        (or "http://localhost:8081/buildscm" "/cgi-bin/build.cgi")
        (apply html:delimited-list
               (apply map form:delimited
                      (command->p-specs build '*commands* 'build)))
        (form:submit 'build)
        (form:reset))))
     port)))

HTML Tables

(require 'db->html)

Function: html:table options row ...

Function: html:caption caption align

Function: html:caption caption
align can be `top' or `bottom'.

Function: html:heading columns
Outputs a heading row for the currently-started table.

Function: html:href-heading columns uris
Outputs a heading row with column-names columns linked to URIs uris.

Function: html:linked-row-converter k foreigns

The positive integer k is the primary-key-limit (number of primary-keys) of the table. foreigns is a list of the filenames of foreign-key field pages and #f for non foreign-key fields.

html:linked-row-converter returns a procedure taking a row for its single argument. This returned procedure returns the html string for that table row.

Function: table-name->filename table-name

Returns the symbol table-name converted to a filename.

Function: table->linked-html caption db table-name match-key1 ...

Returns HTML string for db table table-name chopped into 50-row HTML tables. Every foreign-key value is linked to the page (of the table) defining that key.

The optional match-key1 ... arguments restrict actions to a subset of the table. See section Table Operations.

Function: table->linked-page db table-name index-filename arg ...

Returns a complete HTML page. The string index-filename names the page which refers to this one.

The optional arg ... arguments restrict actions to a subset of the table. See section Table Operations.

Function: catalog->html db caption arg ...

Returns HTML string for the catalog table of db.

HTML editing tables

A client can modify one row of an editable table at a time. For any change submitted, these routines check if that row has been modified during the time the user has been editing the form. If so, an error page results.

The behavior of edited rows is:

After any change to the table, a sync-database of the database is performed.

Function: command:modify-table table-name null-keys update delete retrieve

Function: command:modify-table table-name null-keys update delete

Function: command:modify-table table-name null-keys update

Function: command:modify-table table-name null-keys

Returns a procedure (of db) which returns a procedure to modify a row of table-name. null-keys is the list of null keys indicating that the row is to be deleted when any matches its corresponding primary key. Optional arguments update, delete, and retrieve default to the row:update, row:delete, and row:retrieve of table-name in db.

Function: command:make-editable-table rdb table-name arg ...
Given table-name in rdb, creates parameter and *command* tables for editing one row of table-name at a time. command:make-editable-table returns a procedure taking a row argument which returns the HTML string for editing that row.

Optional args are expressions (lists) added to the call to command:modify-table.

The domain name of a column determines the expected arity of the data stored in that column. Domain names ending in:

`*'
have arity `nary';
`+'
have arity `nary1'.

Function: html:editable-row-converter k names edit-point edit-converter

The positive integer k is the primary-key-limit (number of primary-keys) of the table. names is a list of the field-names. edit-point is the list of primary-keys denoting the row to edit (or #f). edit-converter is the procedure called with k, names, and the row to edit.

html:editable-row-converter returns a procedure taking a row for its single argument. This returned procedure returns the html string for that table row.

Each HTML table constructed using html:editable-row-converter has the first k fields (typically the primary key fields) of each row linked to a text encoding of these fields (the result of calling row->anchor). The page so referenced typically allows the user to edit fields of that row.

HTML databases

Function: db->html-files db dir index-filename caption
db must be a relational database. dir must be #f or a non-empty string naming an existing sub-directory of the current directory.

db->html-files creates an html page for each table in the database db in the sub-directory named dir, or the current directory if dir is #f. The top level page with the catalog of tables (captioned caption) is written to a file named index-filename.

Function: db->html-directory db dir index-filename

Function: db->html-directory db dir
db must be a relational database. dir must be a non-empty string naming an existing sub-directory of the current directory or one to be created. The optional string index-filename names the filename of the top page, which defaults to `index.html'.

db->html-directory creates sub-directory dir if necessary, and calls (db->html-files db dir index-filename dir). The `file:' URI of index-filename is returned.

Function: db->netscape db dir index-filename

Function: db->netscape db dir
db->netscape is just like db->html-directory, but calls browse-url with the uri for the top page after the pages are created.

HTTP and CGI

(require 'http) or (require 'cgi)

Function: http:header alist
Returns a string containing lines for each element of alist; the car of which is followed by `: ', then the cdr.
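
The format described above can be illustrated with a minimal sketch in portable Scheme. This is not SLIB's implementation: the names sketch:crlf and sketch:header are hypothetical, the keys and values are assumed to be strings, and ending each line with CR LF is an assumption taken from HTTP conventions rather than from this manual.

```scheme
;; Hypothetical helper, not part of SLIB.
;; Assumption: each header line ends with CR LF, as is usual in HTTP.
(define sketch:crlf (string (integer->char 13) (integer->char 10)))

;; Builds one "Name: value" line per pair of ALIST.
;; Assumption: every car and cdr in ALIST is a string.
(define (sketch:header alist)
  (apply string-append
         (map (lambda (pair)
                (string-append (car pair) ": " (cdr pair) sketch:crlf))
              alist)))

;; Two header lines, in the order given by the alist.
(sketch:header '(("Content-Type" . "text/html")
                 ("Cache-Control" . "no-cache")))
```

With http:header itself, the same alist would be passed as the single argument.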

Function: http:content alist body ...
Returns the concatenation of strings body with the (http:header alist) and the `Content-Length' prepended.

Variable: *http:byline*
String appearing at the bottom of error pages.

Function: http:error-page status-code reason-phrase html-string ...
status-code and reason-phrase should be an integer and string as specified in RFC 2068. The returned page (string) will show the status-code and reason-phrase and any additional html-strings ...; with *http:byline* or SLIB's default at the bottom.

Function: http:forwarding-page title dly uri html-string ...
The string or symbol title is the page title. dly is a non-negative integer. The html-strings ... are typically used to explain to the user why this page is being forwarded.

http:forwarding-page returns an HTML string for a page which automatically forwards to uri after dly seconds. The returned page (string) contains any html-strings ... followed by a manual link to uri, in case the browser does not forward automatically.

Function: http:serve-query serve-proc input-port output-port
reads the URI and query-string from input-port. If the query is a valid `"POST"' or `"GET"' query, then http:serve-query calls serve-proc with three arguments, the request-line, query-string, and header-alist. Otherwise, http:serve-query calls serve-proc with the request-line, #f, and header-alist.

If serve-proc returns a string, it is sent to output-port. If serve-proc returns a list, then an error page with number 525 and strings from the list is sent to output-port. If serve-proc returns #f, then a `Bad Request' (400) page is sent to output-port.

Otherwise, http:serve-query replies (to output-port) with appropriate HTML describing the problem.

This example services HTTP queries from port-number:


(define socket (make-stream-socket AF_INET 0))
(and (socket:bind socket port-number) ; AF_INET INADDR_ANY
     (socket:listen socket 10)        ; Queue up to 10 requests.
     (dynamic-wind
         (lambda () #f)
         (lambda ()
           (do ((port (socket:accept socket) (socket:accept socket)))
               (#f)
             (let ((iport (duplicate-port port "r"))
                   (oport (duplicate-port port "w")))
               (http:serve-query build:serve iport oport)
               (close-port iport)
               (close-port oport))
             (close-port port)))
         (lambda () (close-port socket))))

Function: cgi:serve-query serve-proc
reads the URI and query-string from (current-input-port). If the query is a valid `"POST"' or `"GET"' query, then cgi:serve-query calls serve-proc with three arguments, the request-line, query-string, and header-alist. Otherwise, cgi:serve-query calls serve-proc with the request-line, #f, and header-alist.

If serve-proc returns a string, it is sent to (current-output-port). If serve-proc returns a list, then an error page with number 525 and strings from the list is sent to (current-output-port). If serve-proc returns #f, then a `Bad Request' (400) page is sent to (current-output-port).

Otherwise, cgi:serve-query replies (to (current-output-port)) with appropriate HTML describing the problem.

Function: make-query-alist-command-server rdb command-table

Function: make-query-alist-command-server rdb command-table #t

Returns a procedure of one argument. When that procedure is called with a query-alist (as returned by uri:decode-query), the value of the `*command*' association will be the command invoked in command-table. If `*command*' is not in the query-alist then the value of `*suggest*' is tried. If neither name is in the query-alist, then the literal value `*default*' is tried in command-table.

If the optional third argument is non-false, then the command is called with just the parameter-list; otherwise, the command is called with the arguments described in its table.

Parsing HTML

(require 'html-for-each)

Function: html-for-each file word-proc markup-proc white-proc newline-proc

file is an input port or a string naming an existing file containing HTML text. word-proc is a procedure of one argument or #f. markup-proc is a procedure of one argument or #f. white-proc is a procedure of one argument or #f. newline-proc is a procedure of no arguments or #f.

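html-for-each opens and reads characters from port file or the file named by string file. Sequential groups of characters are assembled into strings which are either end-of-line, non-newline whitespace, hypertext markup (enclosed by `<' and `>'), or words (anything else).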

Procedures are called according to these distinctions in order of the string's occurrence in file.

newline-proc is called with no arguments for end-of-line not within a markup or comment.

white-proc is called with strings of non-newline whitespace.

markup-proc is called with hypertext markup strings (including `<' and `>').

word-proc is called with the remaining strings.

html-for-each returns an unspecified value.

Function: html:read-title file limit

Function: html:read-title file
file is an input port or a string naming an existing file containing HTML text. If supplied, limit must be an integer. limit defaults to 1000.

html:read-title opens and reads HTML from port file or the file named by string file, until reaching the (mandatory) `TITLE' field. html:read-title returns the title string with adjacent whitespace collapsed to one space. html:read-title returns #f if the title field is empty or absent, if the first character read from file is not `#\<', or if the end of the title is not found within the first (approximately) limit words.

Function: htm-fields htm

htm is a hypertext markup string.

If htm is a (hypertext) comment, then htm-fields returns #f. Otherwise htm-fields returns the hypertext element symbol (created by string-ci->symbol) consed onto an association list of the attribute name-symbols and values. Each value is a number or string; or #t if the name had no value assigned within the markup.

URI

(require 'uri)

Implements Uniform Resource Identifiers (URI) as described in RFC 2396.

Function: make-uri

Function: make-uri fragment

Function: make-uri query fragment

Function: make-uri path query fragment

Function: make-uri authority path query fragment

Function: make-uri scheme authority path query fragment

Returns a Uniform Resource Identifier string from component arguments.

Function: uri:make-path path

Returns a URI string combining the components of list path.

Function: html:anchor name
Returns a string which defines this location in the (HTML) file as name. The hypertext `<A HREF="#name">' will link to this point.
(html:anchor "(section 7)")
=>
"<A NAME=\"(section%207)\"></A>"

Function: html:link uri highlighted
Returns a string which links the highlighted text to uri.
(html:link (make-uri "(section 7)") "section 7")
=>
"<A HREF=\"#(section%207)\">section 7</A>"

Function: html:base uri
Returns a string specifying the base uri of a document, for inclusion in the HEAD of the document (see section HTML).

Function: html:isindex prompt
Returns a string specifying the search prompt of a document, for inclusion in the HEAD of the document (see section HTML).

Function: uri->tree uri-reference base-tree

Function: uri->tree uri-reference

Returns a list of 5 elements corresponding to the parts (scheme authority path query fragment) of string uri-reference. Elements corresponding to absent parts are #f.

The path is a list of strings. If the first string is empty, then the path is absolute; otherwise relative. The optional base-tree is a tree as returned by uri->tree; and is used as the base address for relative URIs.

If the authority component is a Server-based Naming Authority, then it is a list of the userinfo, host, and port strings (or #f). For other types of authority components the authority will be a string.

(uri->tree "http://www.ics.uci.edu/pub/ietf/uri/#Related")
=>
(http "www.ics.uci.edu" ("" "pub" "ietf" "uri" "") #f "Related")

Function: uri:split-fields txt chr

Returns a list of txt split at each occurrence of chr. chr does not appear in the returned list of strings.
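
The splitting rule can be sketched in portable Scheme as follows. This is an illustration under stated assumptions, not SLIB's code: the name sketch:split-fields is hypothetical, and keeping empty fields (for example, the one before a leading separator) is a choice made by this sketch.

```scheme
;; Hypothetical stand-in for the behaviour described above; not SLIB code.
;; Splits TXT at every occurrence of CHR; CHR itself is dropped.
;; Assumption: empty fields are kept as empty strings.
(define (sketch:split-fields txt chr)
  (let loop ((chars (string->list txt))   ; characters still to scan
             (field '())                  ; current field, reversed
             (fields '()))                ; finished fields, reversed
    (cond ((null? chars)
           (reverse (cons (list->string (reverse field)) fields)))
          ((char=? (car chars) chr)
           (loop (cdr chars)
                 '()
                 (cons (list->string (reverse field)) fields)))
          (else
           (loop (cdr chars) (cons (car chars) field) fields)))))

(sketch:split-fields "pub/ietf/uri" #\/)   ; => ("pub" "ietf" "uri")
```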

Function: uri:decode-query query-string
Converts a URI encoded query-string to a query-alist.

uric: prefixes indicate procedures dealing with URI-components.

Function: uric:encode uri-component allows
Returns a copy of the string uri-component in which all unsafe octets (as defined in RFC 2396) have been `%' escaped. uric:decode decodes strings encoded by uric:encode.

Function: uric:decode uri-component
Returns a copy of the string uri-component in which each `%' escaped character is replaced with the character it encodes. This routine is useful for showing URI contents on error pages.
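
The decoding step can be illustrated with a minimal sketch in portable Scheme. It is not SLIB's uric:decode: the names sketch:hex-value and sketch:percent-decode are hypothetical, each escape is assumed to be `%' followed by exactly two hexadecimal digits, and integer->char is assumed to map octet values below 128 to the corresponding ASCII characters.

```scheme
;; Hypothetical sketch of `%' decoding; not SLIB's uric:decode.

;; Returns the value (0-15) of hexadecimal digit CH, or #f otherwise.
(define (sketch:hex-value ch)
  (string->number (string ch) 16))

;; Replaces each well-formed %XX escape in STR with the character whose
;; code is XX; anything else, including a malformed escape, is copied.
(define (sketch:percent-decode str)
  (let loop ((chars (string->list str)) (out '()))
    (cond ((null? chars) (list->string (reverse out)))
          ((and (char=? (car chars) #\%)
                (pair? (cdr chars))
                (pair? (cddr chars))
                (sketch:hex-value (cadr chars))
                (sketch:hex-value (caddr chars)))
           (loop (cdddr chars)
                 (cons (integer->char
                        (+ (* 16 (sketch:hex-value (cadr chars)))
                           (sketch:hex-value (caddr chars))))
                       out)))
          (else (loop (cdr chars) (cons (car chars) out))))))

(sketch:percent-decode "(section%207)")   ; => "(section 7)"
```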

Function: uri:path->keys path-list ptypes
path-list is a path-list as returned by uri:split-fields. uri:path->keys returns a list of items returned by uri:decode-path, coerced to types ptypes.

File-system Locators and Predicates

Function: path->uri path
Returns a URI-string for path on the local host.

Function: absolute-uri? str
Returns #t if str is an absolute-URI as indicated by a syntactically valid (per RFC 2396) scheme; otherwise returns #f.

Function: absolute-path? file-name
Returns #t if file-name is a fully specified pathname (does not depend on the current working directory); otherwise returns #f.

Function: null-directory? str
Returns #t if changing directory to str would leave the current directory unchanged; otherwise returns #f.

Function: glob-pattern? str
Returns #t if the string str contains characters used for specifying glob patterns, namely `*', `?', or `['.
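
The test described above amounts to a scan for three characters. A minimal sketch in portable Scheme follows; the name sketch:glob-pattern? is hypothetical and the code is not taken from SLIB.

```scheme
;; Hypothetical sketch; not SLIB's glob-pattern?.
;; Returns #t if STR contains `*', `?', or `['; otherwise #f.
(define (sketch:glob-pattern? str)
  (let loop ((chars (string->list str)))
    (cond ((null? chars) #f)
          ((memv (car chars) '(#\* #\? #\[)) #t)
          (else (loop (cdr chars))))))

(sketch:glob-pattern? "*.scm")    ; => #t
(sketch:glob-pattern? "README")   ; => #f
```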

Before RFC 2396, the File Transfer Protocol (FTP) served a similar purpose.

Function: parse-ftp-address uri

Returns a list of the decoded FTP uri; or #f if indecipherable. FTP Uniform Resource Locator, ange-ftp, and getit formats are handled. The returned list has four elements which are strings or #f:

  1. username
  2. password
  3. remote-site
  4. remote-directory

Parsing XML

(require 'xml-parse) or (require 'ssax)

The XML standard document referred to in this module is
http://www.w3.org/TR/1998/REC-xml-19980210.html.

The present framework fully supports the XML Namespaces Recommendation
http://www.w3.org/TR/REC-xml-names.

String Glue

Function: ssax:reverse-collect-str list-of-frags

Given the list of fragments (some of which are text strings), reverse the list and concatenate adjacent text strings. If LIST-OF-FRAGS has zero or one element, the result of the procedure is equal? to its argument.
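
The reverse-and-merge behaviour can be sketched in portable Scheme as follows. This is an illustration, not the SSAX source: the name sketch:reverse-collect-str is hypothetical, and non-string fragments are represented here by small lists chosen only for the example.

```scheme
;; Hypothetical sketch; not the SSAX source.
;; Reverses FRAGMENTS and joins each run of adjacent strings into one.
(define (sketch:reverse-collect-str fragments)
  (if (or (null? fragments) (null? (cdr fragments)))
      fragments                         ; zero or one element: unchanged
      (let loop ((frags fragments)      ; fragments still to process
                 (result '())           ; output built so far
                 (strs '()))            ; pending run of strings
        ;; Pushes the pending strings, if any, onto RESULT as one string.
        (define (flush)
          (if (null? strs)
              result
              (cons (apply string-append strs) result)))
        (cond ((null? frags) (flush))
              ((string? (car frags))
               (loop (cdr frags) result (cons (car frags) strs)))
              (else
               (loop (cdr frags) (cons (car frags) (flush)) '()))))))

;; Fragments arrive most-recent-first, as an accumulating parser would
;; produce them.
(sketch:reverse-collect-str '("def" "abc" (br) "start"))
;; => ("start" (br) "abcdef")
```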

Function: ssax:reverse-collect-str-drop-ws list-of-frags

Given the list of fragments (some of which are text strings), reverse the list and concatenate adjacent text strings while dropping "insignificant" whitespace, that is, whitespace in front of, behind, and between elements. The whitespace that is included in character data is not affected.

Use this procedure to "intelligently" drop "insignificant" whitespace in the parsed SXML. If strict compliance with the XML Recommendation regarding whitespace is desired, use the ssax:reverse-collect-str procedure instead.

Character and Token Functions

The following functions either skip, or build and return tokens, according to inclusion or delimiting semantics. The list of characters to expect, include, or to break at may vary from one invocation of a function to another. This allows the functions to easily parse even context-sensitive languages.

Exceptions are mentioned specifically. The list of expected characters (characters to skip until, or break-characters) may include an EOF "character", which is coded as the symbol *eof*.

The input stream to parse is specified as a PORT, which is the last argument.

Function: ssax:assert-current-char char-list string port

Reads a character from the port and looks it up in the char-list of expected characters. If the read character was found among expected, it is returned. Otherwise, the procedure writes a message using string as a comment and quits.

Function: ssax:skip-while char-list port

Reads characters from the port and disregards them, as long as they are mentioned in the char-list. The first character (which may be EOF) peeked from the stream that is not a member of the char-list is returned.

Function: ssax:init-buffer

Returns an initial buffer for ssax:next-token* procedures. ssax:init-buffer may allocate a new buffer at each invocation.

Function: ssax:next-token prefix-char-list break-char-list comment-string port

Skips any number of the prefix characters (members of the prefix-char-list), if any, and reads the sequence of characters up to (but not including) a break character, one of the break-char-list.

The string of characters thus read is returned. The break character is left on the input stream. break-char-list may include the symbol *eof*; otherwise, EOF is fatal, generating an error message including a specified comment-string.

ssax:next-token-of is similar to ssax:next-token except that it implements an inclusion rather than delimiting semantics.

Function: ssax:next-token-of inc-charset port

Reads characters from the port that belong to the list of characters inc-charset. The reading stops at the first character which is not a member of the set. This character is left on the stream. All the read characters are returned in a string.

Function: ssax:next-token-of pred port

Reads characters from the port for which pred (a procedure of one argument) returns non-#f. The reading stops at the first character for which pred returns #f. That character is left on the stream. All the results of evaluating pred up to #f are returned in a string.

pred is a procedure that takes one argument (a character or the EOF object) and returns a character or #f. The returned character does not have to be the same as the input argument to the pred. For example,

(ssax:next-token-of (lambda (c)
                      (cond ((eof-object? c) #f)
                            ((char-alphabetic? c) (char-downcase c))
                            (else #f)))
                    (current-input-port))

will try to read an alphabetic token from the current input port, and return it in lower case.

Function: ssax:read-string len port

Reads len characters from the port, and returns them in a string. If EOF is encountered before len characters are read, a shorter string will be returned.

Data Types

TAG-KIND
A symbol `START', `END', `PI', `DECL', `COMMENT', `CDSECT', or `ENTITY-REF' that identifies a markup token
UNRES-NAME
a name (called GI in the XML Recommendation) as given in an XML document for a markup token: start-tag, PI target, attribute name. If a GI is an NCName, UNRES-NAME is this NCName converted into a Scheme symbol. If a GI is a QName, `UNRES-NAME' is a pair of symbols: (PREFIX . LOCALPART).
RES-NAME
An expanded name, a resolved version of an `UNRES-NAME'. For an element or an attribute name with a non-empty namespace URI, `RES-NAME' is a pair of symbols, (URI-SYMB . LOCALPART). Otherwise, it's a single symbol.
ELEM-CONTENT-MODEL
A symbol:
`ANY'
anything goes, expect an END tag.
`EMPTY-TAG'
no content, and no END-tag is coming
`EMPTY'
no content, expect the END-tag as the next token
`PCDATA'
expect character data only, and no children elements
`MIXED'
`ELEM-CONTENT'
URI-SYMB
A symbol representing a namespace URI -- or other symbol chosen by the user to represent URI. In the former case, URI-SYMB is created by %-quoting of bad URI characters and converting the resulting string into a symbol.
NAMESPACES
A list representing namespaces in effect. An element of the list has one of the following forms:
(prefix