Roxygen

Proposal

  1. Deliverables
  2. Timeline
  3. Grammar
  4. Rd-roxygen equivalence
  5. Intermediate representation

Deliverables

Timeline

A conceivable timeline is as follows:

May 26–June 1
  Parse R objects and extract relevant data
June 1–July 7
  Parse Roxygen tags
July 7–July 20
  Generate the intermediate representation
July 20–August 11
  Translate the intermediate representation into Rd, namespaces and collations

Grammar

R code itself should be parsed with built-in facilities such as parse() and formals(); Roxygen blocks, however, require a formal grammar. The following EBNF representation still needs to be refined to include composite elements:

body = [ [ brief description, ] detailed description, ] { element };
brief description = { escaped text }, newline;
detailed description = { escaped text }, newline, newline;
element = simple element | demarcated element | list element |
          composite element;
simple element = tag symbol, keyword, { escaped text };
demarcated element = tag symbol, demarcated keyword, { text },
                     tag symbol, "end", demarcated keyword;
list element = tag symbol, list keyword, { item },
               tag symbol, "end", list keyword;
item = tag symbol, "item", { escaped text };
composite element = table | function | displayed function;
keyword = "name" | "alias" | "title" | "brief" | "usage" | "param" |
          "details" | "return" | "reference" | "note" | "attention" |
          "author" | "sa" | "see" | "example" | "keyword" | "source" |
          "n" | "section" | "e" | "em" | "b" | "squote" | "dquote" |
          "kbd" | "samp" | "package" | "file" | "email" | "url" |
          "var" | "env" | "option" | "command" | "dfn" | "cite" |
          "acronym" | "ref" | "R" | "dots" | "ldots" | "export" |
          "import" | "include" | "enc" | "concept" | "encoding" |
          "tab";
demarcated keyword = "code" | "verbatim" | "f" | "df";
list keyword = "enumeration" | "itemize" | "describe";
table = tag symbol, "table", { row }, tag symbol, "endtable";
row = { text, [ field delimiter ] }, row delimiter;
field delimiter = tag symbol, "tab";
row delimiter = tag symbol, "n";
function = tag symbol, "f", { text }, tag symbol, "endf";
displayed function = tag symbol, "df", { text }, tag symbol, "enddf";
tag symbol = "@";
escaped tag symbol = "\@";
text = ? UTF-8 visible characters ?;
escaped text = text - tag symbol | escaped tag symbol;
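
The simple element rule and the escaped tag symbol can be illustrated with a small recognizer. The following Python sketch is not part of the proposal: the function name parse_simple_elements, the KEYWORDS subset and the regular expression are assumptions made for illustration, and the sketch ignores descriptions, demarcated, list and composite elements.

```python
import re

# Hypothetical subset of the "keyword" rule, for illustration only.
KEYWORDS = {"name", "title", "param", "return", "author", "email"}

# An unescaped "@" followed by a word starts a simple element;
# "\@" is the escaped tag symbol and remains part of the text.
TAG = re.compile(r"(?<!\\)@(\w+)")


def parse_simple_elements(body):
    """Return a list of (keyword, text) pairs found in body.

    Text before the first tag (the descriptions) is ignored here.
    """
    elements = []
    matches = list(TAG.finditer(body))
    for i, match in enumerate(matches):
        keyword = match.group(1)
        if keyword not in KEYWORDS:
            raise ValueError("unknown keyword: " + keyword)
        # The element's text runs up to the next unescaped tag symbol.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        text = body[match.end():end].strip().replace("\\@", "@")
        elements.append((keyword, text))
    return elements


block = "@name person\n@author Jane Doe\n@email jane\\@example.org"
elements = parse_simple_elements(block)
```

A regular expression suffices for this flat case only; the nested demarcated and list elements would call for a proper parser driven by the grammar.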

Rd-roxygen equivalence

Rd keyword      Roxygen equivalent
name            name
alias           alias*
title           title*
description     brief
usage           usage*
arguments       param
details         details
value           return
reference       reference*
note            note, attention
author          author
seealso         sa, see
examples        example
keyword         keyword*
docType         n/a
format          n/a
source          source*
S4method        n/a
cr              n
section         section
emph            e, em
strong          b
bold            b
sQuote          squote*
dQuote          dquote*
code            code
preformatted    verbatim
kbd             kbd*
samp            samp*
pkg             pkg*
file            file*
email           email*
url             url*
var             var
env             env
options         options*
command         command*
dfn             dfn*
cite            cite*
acronym         acronym*
itemize         itemize*
enumerate       enumerate*
item            item*
describe        describe*
tabular         tabular*
link            ref
linkS4class     n/a
eqn             f*
deqn            df*
R               R*
enc             enc*
concept         concept*
encoding        encoding*
export          export*
import          import*
include         include*
slot            slot*
prototype       prototype*

* New keyword not found in Doxygen.
Keyword exists in Doxygen, but with different semantics.
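
One way to use the table during translation is as a lookup from Roxygen keyword to Rd keyword. The Python sketch below encodes only a few rows; the names ROXYGEN_TO_RD and to_rd are hypothetical. It also assumes every element maps to a single Rd macro with one argument, which does not hold in general: Rd's \arguments, for instance, groups \item entries, so @param needs more structure than shown here.

```python
# A few rows of the equivalence table, keyed by Roxygen keyword.
ROXYGEN_TO_RD = {
    "name": "name",
    "brief": "description",
    "details": "details",
    "return": "value",
    "sa": "seealso",
    "see": "seealso",
    "example": "examples",
    "e": "emph",
    "em": "emph",
    "verbatim": "preformatted",
    "ref": "link",
    "f": "eqn",
    "df": "deqn",
}


def to_rd(keyword, text):
    """Render one element as a single-argument Rd macro (simplified)."""
    rd_keyword = ROXYGEN_TO_RD.get(keyword)
    if rd_keyword is None:
        raise KeyError("no Rd equivalent recorded for @" + keyword)
    return "\\" + rd_keyword + "{" + text + "}"


rendered = to_rd("brief", "A person")
```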

Intermediate representation

S-expressions are readily parsable and less verbose than their XML counterparts, without sacrificing readability. We therefore propose something like the following as an intermediate parse-tree representation:

(class (name "person")
       (slot (name "fullname")
             (description "The full name of the person"))
       (slot (name "birthyear")
             (description "The year of birth"))
       (prototype "Prototype person is named John Doe, 1971"))

The above is merely an example; the intermediate representation should be extensible and tied intimately to the grammar.
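
To give a sense of how little machinery such a representation needs, here is a minimal S-expression reader in Python. It is a sketch under stated assumptions, not the proposed implementation: the name read_sexp is hypothetical, symbols and quoted strings both become Python strings, and escape sequences inside strings are left as written.

```python
import re

# Tokens: parentheses, double-quoted strings, or bare symbols.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def read_sexp(source):
    """Parse one S-expression into nested Python lists."""
    tokens = TOKEN.findall(source)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume ")"
            return items
        if token == ")":
            raise ValueError("unexpected closing parenthesis")
        if token.startswith('"'):
            return token[1:-1]  # drop the surrounding quotes
        return token  # a bare symbol such as class or slot

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


source = """
(class (name "person")
       (slot (name "fullname")
             (description "The full name of the person")))
"""
tree = read_sexp(source)
```

Here tree[0] is the symbol class and tree[2] is the nested list for the first slot, so later stages can walk the tree with ordinary list operations.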

Peter Danenberg <pcd at roxygen dot org>