BGF transformation operator suite v.1.0

2010-03-07

Foreword

This chapter describes the transformational suite for BGF. Most of the information present here is located in the XML Schema definition of the language, part of the SLPS project. The rest was introduced by the language documentation transformation commands in the process of automated generation of the manual in its present form.

Normative references

bgf.xsd

Foreword
Normative references
Design goals
Notation
Folding and unfolding transformations
Other refactoring transformations
Language increasing transformations
Language decreasing transformations
Refactorings in term-oriented semantics
Semantics revising transformations
Decorative transformations
dump
rename
reroot
Compatibility

Design goals

XBGF operator suite was developed mainly for grammar convergence activities.

Grammar recovery. In order to recover a working grammar from a real grammar artifact one needs to perform various activities such as grammar extraction, grammar beautification, deyaccification, error fixing, etc,—all of them are present in XBGF.
Grammar convergence. An infrastructure for grammar convergence (see Chapter 4) contains three essential parts: grammar extractors, a grammar comparator and a grammar transformer. The latter component is needed for changing the source grammars so that they become structurally identical at the convergence point. XBGF suite can be used as a framework for programmable grammar transformations, it provides all the required expressive power.
Language documentation. It is possible and quite expected that the grammar that is shown in the language standard, programmer's manual or any similar document is not exactly the same as the one used for creating a compiler. For instance, Java Language Specification includes two grammars, each one created and maintained manually and separately. A transformation suite can be useful in such a case since it helps to generate one of the versions (perhaps the more reader-friendly one) from the other automatically, both making their relationship more explicit and reducing maintenance effort and increasing reliability.
Language evolution. A language rarely stays as it was developed on its first day. If any language is being used in practice for some time, new constructs are added to it to respond to the needs of the end users (programmers). XBGF suite has multiple operators for expressing language extensions and revisions, they can be used to document the changes between versions and dialects of the language.
Language design. DSL design is not a rare activity nowadays, and it is quite common to develop the language gradually, regularly synchronising with the needs and knowledge of domain experts. XBGF suite allows to document these steps as transformation steps departing from the base-line grammar or even an empty grammar and finally reaching the language ready to be deployed.
Disciplined grammar adaptation. In order to adapt the baseline grammar to the working circumstances one needs it to work in, it is useful to have a good support for grammar transformations. XBGF can be used to converge differently adapted grammars or as a standalone tool for applying grammar adaptation steps.

Notation

BGF is a BNF-like Grammar Format, an XML dialect of Extended Backus Naur Form that was used in the language convergence infrastructure. Its abstract syntax grammar is automatically extracted from the corresponding XML Schema and presented below:

grammar:
        root::nonterminal* production*

production:
        label::label? nonterminal::nonterminal expression

expression:
        epsilon::ε
        empty::ε
        value::value
        any::ε
        terminal::terminal
        nonterminal::nonterminal
        selectable::(selector::selector expression)
        sequence::(expression+)
        marked::expression
        choice::(expression+)
        optional::expression
        plus::expression
        star::expression

value:
        int::ε
        string::ε

label:
        string

nonterminal:
        string

selector:
        string

terminal:
        string

All BGF and XBGF listings are presented in a unified pretty-printed way. The concrete syntax for it is presented below:

grammar:
        production+

label:
        "[" string "]"

production:
        label::label? nonterminal::string ":" right-hand-side

right-hand-side:
        NEWLINE (INDENT symbol+ NEWLINE)+ NEWLINE

symbol:
        "EPSILON"
        "EMPTY"
        "ANY"
        "STRING"
        "INT"
        terminal::(""" string """)
        nonterminal::string
        selectable::(selector::string "::" symbol)
        sequence::("(" symbol+ ")")
        choice::("(" (symbol ("|" symbol)*) ")")
        optional::(symbol "?")
        plus::(symbol "+")
        star::(symbol "*")
        marked::("<" symbol ">")

Any XBGF command is pretty-printed as the name of the transformation and all the parameters (productions, expressions, etc) in brackets, followed by a semicolon.

List of definitions

Grammar

A set of interdependent productions.

grammar:
        production*

Sequence

Sequential composition of multiple transformations.

sequence:
        (transformation | atomic)*

Atomic

Multiple transformations that must be for some reason perceived as one step.

atomic:
        transformation+

Transformation

A list of all the XBGF transformations is grouped in seven categories: folding-unfolding-transformation collects those commands that perform the well-known folding/unfolding operations in slightly varied circumstances; refactoring-transformation contains transformations that perform factoring and reorganising procedures that do not alter the language generated by the grammar; increasing-transformation increase the semantics of the language by adding new options and alternatives to it; decreasing-transformation similarly decrease the semantics; concrete-revising-transformation are refactorings if we use term-oriented semantics (abstract syntax) but they are neither semantic preserving, increasing nor decreasing transformations if we use string-oriented semantics (concrete syntax); transformations from abstract-revising-transformation change the language generated by the grammar in a way that they cannot be a priori classified as any of the above; decorative-transformation are special refactorings that are used to make or destroy labels and selectors in BGF.

transformation:
        folding-unfolding-transformation
        refactoring-transformation
        increasing-transformation
        decreasing-transformation
        concrete-revising-transformation
        abstract-revising-transformation
        decorative-transformation
        rename
        reroot
        dump

Scope

Several transformation operators are possibly restricted to a specific scope as opposed to their application to the full input grammar. Two major forms of scope can be identified. First, a production can be appointed by its label. Second, a definition (nonterminal) can be appointed by its defined nonterminal. Arguably, one may want to be able to appoint a production even when it is not labeled, but a prior designate transformation can then be used in order to attach a label to the production in question.

[label] scope:
        label

[nonterminal] scope:
        nonterminal

Marked

Some of the grammar transformations, namely addH, removeH, appear, disappear, upgrade, downgrade, abstractize, concretize, inject, project, anonymize, deanonymize, accept only marked productions as their arguments.

marked-production:
        production

When transformations need to happen very locally, the level of nonterminal or production is insufficient and introduction of selectable sub-expressions is too much extra effort. For this cases, XBGF uses markers that show at which point exactly should the transformation take place. Markers are pretty-printed as angle brackets.

Folding and unfolding transformations

Folding and unfolding activities are the most basic ones in grammar transformation and the most used ones in grammar convergence. Since grammar comparison is done in such a way that only applies very basic algebraic laws in its endeavours to match the two grammars, many more sophisticated manipulations need to be executed semi-automatically in a programmable fashion. These manual steps help to establish a stronger link between the convergence point and the original grammar artifact since they aid to reveal some unapparent properties of those grammars. All these transformations are provably correct in the sense that it is possible to prove that the languages generated by the grammars before the transformation and after it are indeed the same. All refactorings are easily reversible and introduced below in pairs.

folding-unfolding-transformation:
        unfold
        fold
        inline
        extract
        abridge
        detour
        unchain
        chain

unfold

The most basic unfolding transformation searches the scope for all the instances of the given nonterminal usage and replaces such occurrences with the defining expression of that nonterminal. By default the scope of the transformation is the full grammar, but it can be limited to all the definitions of one nonterminal or to one labelled production. Regardless of the specified scope, unfolding is not applied to the definition of the argument nonterminal. Almost all the unfold transformations used in Java Language Specification case study are restricted in scope by a nonterminal. The reason for such statistics is that when the language engineer wants to give up the nonterminal, he uses the inline transformations. However, unfold usually happens as a part of sequences with fold, downgrade, disappear, deyaccify, distribute, etc.—in which case it is only natural to limit the impact of every step. The definition that is being unfolded is assumed to be horizontal, i.e. to consist of one single production. See the section on refactorings for more information about horizontal and vertical definitions.

Syntax

unfold:
        nonterminal in::scope?

Example

Given the input:

[l1] foo:
        bar

[l2] qux:
        bar

bar:
        wez*

After using this transformation:


unfold(bar);

Will look like this:

[l1] foo:
        wez*

[l2] qux:
        wez*

bar:
        wez*

fold

Folding replaces every expression that matches with the right hand side of the given nonterminal's definition with the nonterminal itself. As with unfold, fold works on the scope of the grammar, and its impact can be limited to one labelled production or to all the productions belonging to one nonterminal. Regardless of the specified scope, folding is not applied to the definition of the argument nonterminal. Since this transformation strives to preserve the language, it needs a horizontal definition to work. When only one of several existing definitions is used for folding, it would actually increase the semantics of the language after transformation—the corresponding XBGF command is called upgrade.

Syntax

fold:
        nonterminal in::scope?

Example

Very much like unfolding, folding can take place locally. For instance,

[l1] foo:
        wez*

[l2] qux:
        wez*

bar:
        wez*

After using this transformation:


fold(bar in foo);

Will look like this:

[l1] foo:
        bar

[l2] qux:
        wez*

bar:
        wez*

inline

When this transformation is performed, an existing definition is eliminated by inlining. This means that the argument nonterminal identifies the (horizontal) definition that is to be unfolded and stripped away from the grammar. The semantics of inline is that of a sequential composition of unfold and eliminate.

Syntax

inline:
        nonterminal

Semantics

The inline transformation is by far the most used in Java Language Specification case study. One of the reasons is what we call layering: in particular expressions and statements are introduced in the G_j^R with a set of related nonterminals: LabeledStatement, IfThenElseStatement, WhileStatement, ForStatement, etc, and CastExpression, PreIncrementExpression, PreDecrementExpression, PostfixExpression, etc. G_j^I takes another approach, just listing all the alternatives in the productions for Statement and Expression. In order to converge these two variants, a lot of inlining transformations are needed. This can also be apparent from the statistics table, that demonstrates that targets that converge only “readable” or only “implementable” grammars, require less than ten inline transformations each, while each target that takes both readable and implementable grammars in, contains 67–100 inline transformations in convergence path.

Example

An example follows. When we have:

foo:
        wez

bar:
        wez ".." wez

wez:
        qux*

After using this transformation:


inline(wez);

It will look like this:

foo:
        qux*

bar:
        qux* ".." qux*

extract

A new definition is introduced by extraction. The argument provided for this transformation is a production that identifies both the fresh nonterminal to be introduced and the expression that is used both as a pattern for folding and as a right hand side of the added definition. An optional scope can limit the application of the folding part of the extraction transformation to a specific production or a specific nonterminal. If the nonterminal defined by the argument production is already mentioned (i.e., defined or referenced) in the current grammar, the transformation refuses to work and reports an error. This usually signals an error in the language engineer's logic because the existing traces of a possibly similar nonterminal conflict with the definition that is being introduced.

Syntax

extract:
        production in::scope?

Semantics

As seen from the experience gained from Java Language Specification case study, it is highly unusual for extract to have limited scope. However, sometimes a limited impact is desired in order to avoid excessive subsequent unfoldings when the convergence requests for having several nonterminals with similar definitions.

Example

Extraction also works vertically. Given the input:

TypeDeclaration:
        ClassDeclaration
        InterfaceDeclaration
        ";"

After performing this transformation step:


extract(
 ClassOrInterfaceDeclaration:
        ClassDeclaration
        InterfaceDeclaration
);

The result will be:

TypeDeclaration:
        ClassOrInterfaceDeclaration
        ";"

ClassOrInterfaceDeclaration:
        ClassDeclaration
        InterfaceDeclaration

abridge

Given a reflexive chain production, i.e., a production whose defined nonterminal equals its body, this production is simply removed from the grammar, even if it contains some potentially valuable information (like labels and selectors).

Syntax

abridge:
        production

Semantics

Reflexive chain productions are rarely encountered explicitly in the base-line grammars, but sometimes series of transformations result in them, and usually they are not needed. An example of a transformation sequence that yields a reflexive chain production can be a step from concrete syntax definition to abstract syntax definition. Concrete syntax usually needs explicit bracketing constructions for recursive composition, and after stripping away terminals and merging layers, these bracketing constructions become reflexive chain productions. The Factorial Language case study has shown the need for it.

Example

Consider this abstract syntax:

[constant] expr:
        int

[neg] expr:
        expr

[bracket] expr:
        expr

After performing this transformation step:


abridge(
 [bracket] expr:
        expr
);

The grammar will be the same, but without the reflexive chain production labelled as “bracket” previously:

[constant] expr:
        int

[neg] expr:
        expr

detour

A reverse of abridge that can introduce a reflexive chain production.

Syntax

detour:
        production

Example

In the same way it was removed in the previous example, the bracketing production can be added to the grammar. The transformation that reverts the impact of the previous abridge, looks like this:


detour(
 [bracket] expr:
        expr
);

unchain

Unchaining is a disciplined form of inlining. The argument production must occur in the input grammar, and it must be a chain production, i.e., a production with a nonterminal as its defining expression. The latter nonterminal is the one whose definition is to be inlined; it must not have any occurrences except in the chain production at hand. The unchain transformation does not increase the expressivity of the transformational language: technically, it is nothing more than an inline with a precondition. However, this particular precondition seems useful and not uncommon when dealing with layered grammars.

Syntax

unchain:
        production

Semantics

Chain productions are not commonly encountered in grammars of mainstream programming languages. However, when converging grammars that hail from different kinds of sources (i.e., different extraction processes) it can be frequently needed to align grammars that use chain productions with grammars that use labelled ones.

Example

Consider this grammar:

[constant] expr:
        int

[binary] expr:
        binexpr

binexpr:
        expr op expr

After performing this transformation step:


unchain(
 [binary] expr:
        binexpr
);

The auxiliary nonterminal symbol is gone, as is the chain production:

[constant] expr:
        int

[binary] expr:
        expr op expr

chain

Just like unchain is a specific form of inline, chaining is a disciplined form of extraction. The argument production will be part of the resulting grammar; it is a chain production, i.e., a production with a nonterminal as its defining expression. That nonterminal is the one whose definition is to be extracted. That definition is the defining expression of the production (from the input grammar) whose defined nonterminal and label (if any) matches with the argument production.

Syntax

chain:
        production

Example

In the same way it was removed in the previous example, the chain production can be added to the grammar. The transformation that reverts the impact of the previous unchain, looks like this:


chain(
 [binary] expr:
        binexpr
);

Other refactoring transformations

Here is a list of the XBGF transformations that perform other provably semantic-preserving refactorings

refactoring-transformation:
        massage
        distribute
        factor
        deyaccify
        yaccify
        eliminate
        introduce
        import
        vertical
        horizontal
        equate
        rassoc
        lassoc

massage

The grammar is rewritten by local transformations such that the language generated by the grammar (or the denotation according to any other semantics for that matter) is preserved. The known rewriting rules affect the use of selectors and regular expression operators: e.g., any symbol will always generate the same set of strings that the same symbol wrapped in a selector. There are two expression arguments: one to be matched, and another one that replaces the matched expression. One of them must be in a “massage relation” to the other. The scope of the transformation can be limited.

Syntax

massage:
        expression expression in::scope?

The massage relation is defined as follows. First of all, a choice of any two symbols with EBNF modifiers can be refactored to a single modifier according to the table below (parenthesized expressions are not implemented directly, but covered by other variants):

\|	ε	x	x ?	x⁺	x^*
ε	( ε )	x ?	x ?	x^*	x^*
x	x ?	x	x ?	x⁺	x^*
x ?	x ?	x ?	( x ? )	x^*	x^*
x⁺	x^*	x⁺	x^*	( x⁺ )	x^*
x^*	x^*	x^*	x^*	x^*	( x^* )

Second, a composition of two EBNF modifers can be massaged to one modifier according to the table below:

y	y ?	y⁺	y^*
x ?	x ?	x^*	x^*
x⁺	x^*	x⁺	x^*
x^*	x^*	x^*	x^*

Sequential composition of symbols is more complicated since it does not necessarily yield an EBNF modifier (those modifiers are not expressive enough to denote “one or two” repetitions, “two or more”, etc). The following table shows the possible massage manipulations:

	x	x ?	x⁺	x^*
x				x⁺
x ?			x⁺	x^*
x⁺		x⁺		x⁺
x^*	x⁺	x^*	x⁺	x^*

Associativity rules for massage: x ( y x ) ? = ( x y ) ? x x ( y x )⁺ = ( x y )⁺ x x ( y x )^* = ( x y )^* x

Example

Distributivity rules for optionality modifier such as these: ( x ? y ? ) ? = x ? y ? ( x^* y ? ) ? = x^* y ? ( x ? y^* ) ? = x ? y^* ( x^* y^* ) ? = x^* y^* ( x | y ) ? = x ? | y ? are not explicitly covered by massage since it is possible to emulate them with a sequence of abovementioned patterns of massage, as well as with factor and similar transformations. Let us take the last formula as an example of a massaging that takes several steps to complete. The input BGF is:

foo:
        (bar | qux)?

After performing these transformation steps:



massage(
 (bar | qux)?,
 ((bar | qux) | ε));


massage(
 ε,
 (bar::ε | qux::ε));


factor(
 ((bar | qux) | (bar::ε | qux::ε)),
 ((bar | bar::ε) | (qux | qux::ε)));


anonymize(
 foo:
        (bar | <bar::ε>)
        (qux | qux::ε)
);


massage(
 (bar | ε),
 bar?);


anonymize(
 foo:
        bar?
        (qux | <qux::ε>)
);


massage(
 (qux | ε),
 qux?);

The result will be:

foo:
        bar?
        qux?

The selectors and anonymize commands are necessary because otherwise the choice of two epsilons would be removed automatically during the normalisation phase. The rest of distributivity laws are expressed quite similarly to this example.

distribute

Distribute sequential composition over choices so that choices are pulled out of sequences. The transformation is either attempted for all productions of a nonterminal or for a specific one appointed by its label.

Syntax

distribute:
        scope

Semantics

In fact, distribute is nothing more than an automated version of factor that agressively pushes all the choices that can be found in a production outwards. This transformation is apparently non-injective, hence, it is impossible to have a complete inverse of it. A more general factor transformation, however, is as capable of emulating distribute's effect as it is capable of doing the reverse thing.

Example

For instance,

foo:
        bar (qux | wez)

After using this transformation:


distribute( in foo );

Will look like this:

foo:
        bar qux
        bar wez

factor

Massage modulo factorisation over choices. The transformation is either attempted for all productions of a nonterminal or for a specific one appointed by its label.

Syntax

factor:
        expression expression in::scope?

Semantics

Factor transformations tend to be quite frequently used in grammar convergence. They also have a tendency to be very long—since it is impossible to implement factor symmetrically to distribute (i.e., fully automated), the language engineer needs to supply two complete expressions. The transformer then can easily assert that they are related by distribution: basically, it internally performs distribute on both of them and expects them to become identical. Hence, it is possible to do “incomplete” factoring by pushing choices inwards but not to the innermost position. Two most commonly seen patterns of factor use are the following. First, it is applied when we have a choice of two long expressions that are almost identical except for some mismatching part. That part can be either extracted or massaged later with more transformations. Second, it is needed when we have a wide choice with the same leading (or trailing) symbol, and the goal is to let the common part stay and encapsulate the rest inside a different nonterminal (by following extract).

Example

For instance,

After using this transformation (note the order of expressions):


factor(
 ((c d e) | (c f g)),
 (c ((d e) | (f g))));

Will look like this:

a:
        a
        b
        c ((d e) | (f g))
        h
        i

deyaccify

Deyaccification is a widely used term that means converting recursive definitions to iterative ones where possible. The name comes from YACC, or Yet Another Compiler Compiler, a tool which underlying parsing technology limits were enforcing the usage of recursive definitions back in the 1970s. However, it somehow became common practice to remain within them even when grammar engineers do not use yacc as such at all. The name of a nonterminal must be provided as an argument, then the transformation engine checks if the grammar productions for this nonterminal fit into one of the yaccified patterns. If not, the error is reported, otherwise the definition is replaced by one that uses regular expression operators instead of epsilon, choice, and recursion. Both left- and right-recursive forms can be factored with this transformation.

Syntax

deyaccify:
        nonterminal::nonterminal

Deyaccification uses several general patterns. Left recursion like this:

foo:
        bar

foo:
        foo wez

Becomes:

foo:
        bar wez*

Right recursion like this:

foo:
        bar

foo:
        wez foo

Becomes:

foo:
        wez* bar

In either case, it is checked if bar and wez are the same nonterminal. If they are, the result is more concise:

foo:
        bar+

yaccify

This transformation is the reverse of deyaccify. The productions provided as arguments must be yaccified with respect to the actual content of the grammar. If the deyaccification process on them is successful and yields the production that can be found in the grammar, it is removed and replaced by these simpler definitions of an optional or repeating nonterminal, given in BNF-only expressiveness. Some complex yaccify cases require prior use of extract for introduction of an nonterminal for the optional or repeating phrase.

Syntax

yaccify:
        production+

Semantics

Yaccification is a typical example of grammar adaptation activity. However, it can be utilised in grammar convergence process as well: think of a situation when one of the sources is yaccified using left recursion while the other one—using right recursion. In such a case it would be better to deyaccify both of them. If this is due to some considerations impossible or undesirable, one can deyaccify, say, left recursion and then yaccify if back to right recursion. Since it is not possible for the transformation engine to guess which kind of BNF recursion the suite user would need, yaccify takes two productions as parameters, unlike deyaccify which works perfectly just given the nonterminal name.

Example

For instance, this piece of grammar:

foo:
        bar+

can either be yaccified with left recursion:


yaccify(
 foo:
        bar
 foo:
        foo bar
);

to look like this:

foo:
        bar

foo:
        foo bar

or yaccified with right recursion:


yaccify(
 foo:
        bar
 foo:
        bar foo
);

to look like this:

foo:
        bar

foo:
        bar foo

eliminate

An unused definition (at most used within the definition itself) is removed. The undefine operator should be utilised instead when the definition must be removed despite remaining uses. The remove operator should be utilised instead when only part of the definition (i.e., a production of a vertical definition) is to be removed.

Syntax

eliminate:
        nonterminal::nonterminal

Example

For instance,

expr:
        int

intexpr:
        int

After using this transformation:


eliminate(intexpr);

Will look like this:

expr:
        int

introduce

A definition of a fresh nonterminal is added. The add operator should be used instead, if the nonterminal is already defined, is to be merely extended. The define operator should be used instead, if the nonterminal is readily in use, but merely lacks a definition.

Syntax

introduce:
        production+

Example

For instance,

a:
        b

b:
        ε

After using this transformation:


introduce(
 c:
        a
 c:
        b
);

Will look like this:

a:
        b

b:
        ε

c:
        a

c:
        b

import

Allows to import another grammar: the nonterminals within it can refer to one another, but none of the existing productions are allowed to refer to them before this transformation takes place.

Syntax

import:
        production+

Semantics

Consider a scenario where we want to introduce two productions, each defining a fresh nonterminal symbol, and each using the other. Without import the only way to do so would be to run one introduce and one define, which is semantically wrong since we are sure that before the first nonterminal is introduced, the second one was fresh. So, instead we take the interdependent productions together and introduce them in one step. Technically, import can be used any time to substitute any number of introduce transformations. Whether this is a desired use pattern or not, is left at the discretion of the language engineer.

Example

For instance,

X:
        "a" "b"

After using this transformation:


import(
 A:
        B X
 B:
        A
        ε
);

It will look like this:

X:
        "a" "b"

A:
        B X

B:
        A
        ε

vertical

Turn top-level choices into multiple productions. The transformation is either attempted for all productions of a nonterminal or for a specific one appointed by its label. The action is a reverse of horizontal. Occasionally we use terms “vertical” productions or nonterminals and “horizontal” ones. By vertical nonterminals we mean those that are defined by a list of productions, with every production lacking a top-level choice. A horizontal nonterminal, on the other hand, is defined by one production that is a top-level choice. Nonterminals that employ both top-level choices and splitting into multiple productions are neither horizontal nor vertical.

Syntax

vertical:
        scope

Example

If the original production contained selectors:

decs:
        onedec::dec
        moredecs::(dec decs)

then, after using this transformation:


vertical( in decs );

they are converted to labels:

[onedec] decs:
        dec

[moredecs] decs:
        dec decs

horizontal

Turn a definition based on multiple productions into a top choice-based one. The action is a reverse of vertical.

Syntax

horizontal:
        nonterminal::nonterminal

Example

If some or all of the original productions are labelled:

[onedec] decs:
        dec

[moredecs] decs:
        dec decs

the, after using this transformation:


horizontal(decs);

each label is converted to a selector in a corresponding place:

decs:
        onedec::dec
        moredecs::(dec decs)

rassoc

This transformation operator replaces an iterative production found in the grammar by the argument production, which is a right associative repeating equivalent of the former. Its defining expression involves a pattern of binary recursion with regard to the defined nonterminal. The “r” in “rassoc” refers to the intended effect at the level of derivation trees: the list of subtrees is to be converted into a nested binary tree in a right-associative manner.

Syntax

rassoc:
        production

Example

For instance,

[constant] expr:
        int

[binary] expr:
        expr (op expr)*

After using this transformation:


rassoc(
 [binary] expr:
        expr op expr
);

Will look like this:

[constant] expr:
        int

[binary] expr:
        expr op expr

lassoc

The same as rassoc, but replaces an iterative production found in the grammar by a left associative repeating equivalent. The “l” in “lassoc” refers to the intended effect at the level of derivation trees: the list of subtrees is to be converted into a nested binary tree in a left-associative manner.

Syntax

lassoc:
        production

Example

For instance,

[terminal] expr:
        string

[sequence] expr:
        expr+

After using this transformation:


lassoc(
 [sequence] expr:
        expr expr
);

Will look like this:

[terminal] expr:
        string

[sequence] expr:
        expr expr

equate

Two nonterminals, say x and y, are merged, if their definitions are identical.

Syntax

equate:
        align::nonterminal with::nonterminal

Language increasing transformations

Here is a list of the XBGF transformations that lenghten the grammar (increase semantics).

increasing-transformation:
        add
        appear
        widen
        upgrade
        unite

add

Nonterminal definitions can be extended ("added to") vertically and horizontally. In the former case, a given production is added to an existing definition. In the latter case, a given branch is added to a given expression. The horizontal mode is there for convenience only because it could be simulated by a sequence of extraction, vertical addition, and inlining. There are two operators that are very similar to the (vertical) add operator: define and introduce. The define operator should be used when an the definition of an undefined nonterminal is added. The introduce operator should be used when a fresh nonterminal is to be defined.

addV

Syntax

[vertical] add:
        production

Vertical addition operates on the level of productions: it adds one more production for some nonterminal to any number of productions that are already present in the grammar.

Example

Given the input:

expr:
        int

After using this transformation:


addV(
 expr:
        id
);

The result will look like this:

expr:
        int

expr:
        id

addH

Syntax

[horizontal] add:
        marked-production

Horizontal addition looks inside productions: it adds any marked part of an internal choice by either introducing one or enhancing the existing one. This allows to skip pre-transformational vertical and post-transformational horizontal steps for productions with a top-level choice, which is the most common use of this transformation. However, it is useful to have a command at hand that is capable of adding alternatives to any particular place of any grammar production. Markers must denote the new part: i.e., the production without the marked part must be present in the grammar, and if it is, the result will contain a production with the marked part instead. Obviously, the markers itself do not end up in the grammar.

Example

Given the input:

N:
        a
        b

After using this transformation:


addH(
 N:
        <"x">
        a
        b
);

The result will look like this:

N:
        "x"
        a
        b

Example

Given the input:

expr:
        "-"? int

After using this transformation:


addH(
 expr:
        (<"+"> | "-")? int
);

The result will look like this:

expr:
        ("+" | "-")? int

appear

The purpose of this transformation operator is to insert an nillable symbol (i.e., reducible to an empty sequence) at any place in any existing grammar production. It takes a production as an input. Inside that production, one nillable subexpression should be marked.

Syntax

appear:
        marked-production

Example

Given the input:

foo:
        bar

After using this transformation:


appear(
 foo:
        bar <qux?>
);

The result will look like this:

foo:
        bar qux?

widen

The grammar is rewritten by local transformations such that the language generated by the grammar (or the denotation according to any other semantics for that matter) is increased. The known rewriting rules affect the use of epsilon and regular expression operators. There are two expression arguments: one to be matched, and another one that replaces the matched expression. The scope of the transformation can be limited.

Syntax

widen:
        expression expression in::scope?

The widening relation is defined as follows: x can be widened to x^? or x⁺ or x^* x^? can be widened to x^* x⁺ can be widened to x^* It is trivial to prove that for each case the expression on the left is included in the expression on the right, but not otherwise. For going the other way narrow transformation is used. For shaping an expression into a completely equivalent one, use massage.

Example

Given the input:

[main] program:
        fun::function

After using this transformation:


widen(
 function,
 function+
 in [main]);

The result will look like this:

[main] program:
        fun::(function+)

upgrade

Upgrading is a special variation of replacement and a slightly more powerful and liberal form of folding. This operator replaces an expression by a nonterminal that can be evaluated to it. The first parameter is the scope production with an expression marked. The second parameter is one of that nonterminal's definitions, which right hand side equals that expression.

Syntax

upgrade:
        marked-production production

Example

Given the input:

a:
        d e c

b:
        d e

b:
        f g

After using this transformation:


upgrade(
 a:
        <b> c
 b:
        d e
);

The result will look like this:

a:
        b c

b:
        d e

b:
        f g

unite

Two nonterminals, say x and y, are merged (possibly recursively). That is, the definitions of x and y (i.e., their productions) are merged in one definition while preserving the nonterminal y and replacing all occurrences of x (in the definition of x and anywhere else) by y.

Syntax

unite:
        add::nonterminal to::nonterminal

Example

Given the input:

foo:
        "a"

foo:
        "b"

bar:
        "d"

After using this transformation:


unite(bar, foo);

The result will look like this:

foo:
        "a"

foo:
        "b"

foo:
        "d"

Language decreasing transformations

Here is a list of the XBGF transformations that shorten the grammar (decrease semantics).

decreasing-transformation:
        remove
        disappear
        narrow
        downgrade

remove

Productions can be removed from existing, vertical definitions. The remaining definition must not become empty, i.e., undefined. There is the undefine operator that can be applied in that case. There is also a horizontal mode of removing branches from choices.

removeV

Syntax

[vertical] remove:
        production

Vertical removal operates on the level of productions: it takes away one production for some nonterminal leaving at least one more in the grammar.

Example

Given the input:

expr:
        int

expr:
        id

After using this transformation:


removeV(
 expr:
        id
);

The result will look like this:

expr:
        int

removeH

Syntax

[horizontal] remove:
        marked-production

Horizontal removal looks inside productions: it removes any marked part of an internal choice, getting rid of the choice altogether if necessary (say, if the removed part was one of two alternatives). This allows to skip pre-transformational vertical and post-transformational horizontal steps for productions with a top-level choice, which is the most common use of this transformation. However, it is useful to have a command at hand that is capable of removing alternatives from any particular place of any grammar production. Markers must denote the part to be removed: i.e., the production with the marked part must be present in the grammar, and if it is, the result will contain a production without the marked part instead. Obviously, the markers itself do not end up in the grammar.

Example

Given the input:

foo:
        "x"
        bar
        wez

After using this transformation:


removeH(
 foo:
        <"x">
        bar
        wez
);

The result will look like this:

foo:
        bar
        wez

Example

Given the input:

expr:
        ("+" | "-")? int

After using this transformation:


removeH(
 expr:
        (<"+"> | "-")? int
);

The result will look like this:

expr:
        "-"? int

disappear

This operator behaves like project, but only allows for nillable elements (optional, star) to disappear.

Syntax

disappear:
        marked-production

Example

Given the input:

foo:
        bar wez* qux

After using this transformation:


disappear(
 foo:
        bar <wez*> qux
);

The result will look like this:

foo:
        bar qux

narrow

The grammar is rewritten by local transformations such that the language generated by the grammar (or the denotation according to any other semantics for that matter) is decreased. The known rewriting rules affect the use of epsilon and regular expression operators. There are two expression arguments: one to be matched, and another one that replaces the matched expression. The scope of the transformation can be limited.

Syntax

narrow:
        expression expression in::scope?

The narrowing relation is defined as follows: x^? can be narrowed to x x⁺ can be narrowed to x x^* can be narrowed to x or x^? or x⁺ It is possible to prove that for each case the expression on the right is included in the expression on the right, but not otherwise. For going the other way widen transformation is used. For shaping an expression into a completely equivalent one, use massage.

Example

Given the input:

program:
        fun::(function*)

After using this transformation:


narrow(
 function*,
 function+);

The result will look like this:

program:
        fun::(function+)

downgrade

Replaces a nonterminal with one of its definitions. The first parameter is the scope production with one of the nonterminals marked. The second parameter is one of that nonterminal's definitions, which right hand side will be used for replacement. The XBGF processor looks for the first production with the marked part (but without the markers). If it is found, the marked part is replaced with the right hand side of the second argument production.

Syntax

downgrade:
        marked-production production

Example

Given the input:

a:
        b c

b:
        d e

b:
        f g

After using this transformation:


downgrade(
 a:
        <b> c
 b:
        d e
);

The result will look like this:

a:
        d e c

b:
        d e

b:
        f g

Refactorings in term-oriented semantics

We may refer to the semantics of a grammar as the language (set of strings) generated by the grammar, as it is common for formal languages — for context-free grammars, in particular. With the string-oriented semantics in mind, all transformations mentioned above in folding and refactoring sections are semantics-preserving. To give an example where different semantics are needed consider the scenario of aligning a concrete and an abstract syntax. When necessary, we may apply the algebraic interpretation of a grammar, where grammar productions constitute an algebraic signature subject to a term-algebraic model. In this case, the terminal occurrences in any given production do no longer carry semantic meaning; they are part of the function symbol. (Hence, abstract and concrete syntaxes can be aligned now.) Some transformations that were effortlessly semantics-preserving w.r.t. the string-oriented semantics, require designated bijective mappings w.r.t. the term-oriented semantics, e.g., fold/unfold manipulations, but generally, the term-oriented semantics admits a larger class of semantics-preserving transformations than the string-oriented one. The following section gathers those transformations that would have been considered refactorings if we only used term-oriented semantics. From the string-oriented point of view they revise semantics and can be deemed as neither grammar lengthening nor grammar shortening transformations.

concrete-revising-transformation:
        abstractize
        concretize
        permute

abstractize

As project, but with an additional precondition that the part to be removed should consist of terminals. This is checked automatically by the XBGF engine: if the precondition fails, the transformation is inapplicable.

Syntax

abstractize:
        marked-production

Example

Given the input:

A:
        b "x" c "y"

After using this transformation:


abstractize(
 A:
        b <"x"> c "y"
);

Will look like this:

A:
        b c "y"

concretize

Just as abstractize is a preconditioned version of project, this operator is a variation of inject. The XBGF engine checks if the marked part only consists of terminal symbols: if yes, injection happens; if not, the transformation is unapplicable.

Syntax

concretize:
        marked-production

Example

Given the input:

A:
        b c

After using this transformation:


concretize(
 A:
        b <"x"> c
);

Will look like this:

A:
        b "x" c

permute

The argument production defines the intended result of the transformation — a production that has the same components in the sequential composition, but in a different order, when compared to the corresponding production in the input grammar with the same defined nonterminal and the same label, if any.

Syntax

permute:
        production

Example

Given the input:

a:
        b d* c

After using this transformation:


permute(
 a:
        b c d*
);

Will look like this:

a:
        b c d*

Semantics revising transformations

Some grammar differences may require more arbitrary replacements, that cannot be expressed as semantics-preserving even in abstract syntax. These include projections or injections of non-optional nonterminals, adding definitions for bottom nonterminals, performing volatile replacements. Whenever a transformation from this group is used in a conergence path, it signals either about a construction point where the grammar engineer is having a temporary shortcut to be substituted later with a longer sequence of more accurate manipulations, or an inevitable error in the BGF that needs fixing (preferably in the original artifact—specification, compiler sources, etc).

abstract-revising-transformation:
        define
        undefine
        redefine
        inject
        project
        replace

define

An undefined nonterminal is resolved by this operator. The nonterminal must be in use. The introduce operator should be used when a fresh nonterminal is to be defined. The add operator should be used when an existing definition is to be extended.

Syntax

define:
        production+

undefine

This operator allows to undefine a nonterminal, i.e., remove all its productions. The nonterminal must have using occurrences elsewhere than just in its own definition. If there are no such using occurrences, then the less disruptive eliminate operator is to be used.

Syntax

undefine:
        nonterminal+

redefine

Redefine is a replace variant that works on production level. When this transformation is executed, all current productions for the nonterminal are dropped, and all the given ones are added to the grammar. This transformation is nothing more than syntactic sugar for an undefine followed directly with define. Having it as a separate type of transformation allows to more clearly distinguish completing the grammar with absent definitions (usually as initial correction step) and brutal nonterminal-level replacements (semantic revising).

Syntax

redefine:
        production+

inject

The argument production defines the intended result of the transformation — a production that has additional components in the sequential composition, when compared to the corresponding production in the input grammar with the same defined nonterminal and the same label, if any.

Syntax

inject:
        marked-production

Example

Given the input:

a:
        b d

After using this transformation:


inject(
 a:
        b <c> d
);

Will look like this:

a:
        b c d

project

The argument production defines the intended result of the transformation — a production that has fewer components in the sequential composition, when compared to the corresponding production in the input grammar with the same defined nonterminal and the same label, if any.

Syntax

project:
        marked-production

Example

Given the input:

a:
        b c d

After using this transformation:


project(
 a:
        b <c> d
);

Will look like this:

a:
        b d

replace

This operator provides a last resort to grammar editing. It basically provides access to free editing without any semantically meaningful preconditions. There are two expression arguments: one to be matched, and another one that replaces the matched expression. The scope of the transformation can be limited.

Syntax

replace:
        expression expression in::scope?

Example

It is possible to use replace in sophisticated context, cutting out any pieces of the grammar to be replace with something different. For instance, given the input:

a:
        b c
        b d
        b e

After using this transformation (suppose we do not have factor):


replace(
 ((b c) | (b e)),
 (b (c | e)));

Will look like this:

a:
        b (c | e)
        b d

Decorative transformations

Last but not least, there are four refactorings that are very specific to the BGF itself. Not all grammar definition formalisms have labelled productions, but since we do, we also need two transformation steps made possible: to designate an already available production with a label, and to unlabel an existing labelled production. By anonymising we refer to stripping selectors from subexpressions, and by deanonymising, naturally, adding selectors.

decorative-transformation:
        designate
        unlabel
        deanonymize
        anonymize

designate

An unlabeled production is labeled. The argument production is the intended result, i.e., the labeled production—the transformation refuses to work if the argument production contains no label. Labelling transformations serve two roles usually: they can be used directly to make the labels in both grammars agree so that they can converge; or they are used to mark the target for the transformations that follow them and perform local manipulations.

Syntax

designate:
        production

Example

Given the input:

expr:
        int

After using this transformation:


designate(
 [intexpr] expr:
        int
);

Will look like this:

[intexpr] expr:
        int

unlabel

This is a reverse of designate that strips an existing production from a label.

Syntax

unlabel:
        label

Example

Unlike the previous one, this transformation relies on the fact that all labels are unique within a grammar. This assumption allowed us to simplify the calling syntax. So, given the input:

[intexpr] expr:
        int

After using this transformation:


unlabel([intexpr]);

Will look like this:

expr:
        int

deanonymize

While labels are only used to name individual productions, selectors can name arbitrary parts of any BGF expression. This command allows to add a selector to the existing production. To avoid disambiguations, the whole production is required as an argument, with the newly introduced part being marked.

Syntax

deanonymize:
        marked-production

Example

Selectors can be introduced one at a time or in batch, but each one must be marked separately. For instance, given the input:

A:
        first::"a" "a" "a"

After using this transformation:


deanonymize(
 A:
        first::"a" <second::"a"> <third::"a">
);

Will look like this:

A:
        first::"a" second::"a" third::"a"

anonymize

This operator is the reverse of deanonymize that strips one (marked) selector from an existing definition.

Syntax

Given the input BGF and a clear goal to strip all selectors, it becomes trivial to generate a list of anonymize commands that, if executed on the same grammar, would produce a selector-free yet structurally equivalent grammar. We used such a generator called strips in the FL case study as the final stage to converge the abstract syntax (with selectors) with the concrete syntax (with terminals), see the section about generators.

anonymize:
        marked-production

Example

Given the input:

[binary] expr:
        "(" expr op::binary_op expr ")"

[unary] expr:
        op::unary_op expr

After using this transformation:


anonymize(
 [unary] expr:
        <op::unary_op> expr
);

Will look like this:

[binary] expr:
        "(" expr op::binary_op expr ")"

[unary] expr:
        unary_op expr

dump

This is a debugging tool for XBGF. When the dump command is encountered, the transformation sequence stops and the grammar in its current state is dumped to the file xbgf.log. The contents of xbgf.log can be used as an input for grammar comparator or for copy-pasting productions and expressions from the grammar to the construction point of the XBGF sequence.

Syntax

dump:
        ε

rename

Labels, nonterminals, selectors and terminals can be renamed. Being in line with the fundamental notion of renaming, such renaming must be done consistently throughout the entire grammar, without introducing any clashes. There is one justifiable exception. That is, arguably, the scope of selectors is the level of production as opposed to necessarily the entire grammar. Hence, selectors can be renamed in such a scope, optionally.

renameL

Syntax

[label] rename:
        from::label to::label

Renaming labels is a semantic preserving grammar transformation pretty-printed as renameL. Given two label names, it simply searches the grammar for productions with the original label and re-designates them with the new one. renameL is a simple syntactic sugar for the specific combination of unlabel and designate.

Example

Given the input:

[constant] expr:
        int

[binary] expr:
        expr op::binary_op expr

[unary] expr:
        op::unary_op expr

After using this transformation:


renameL([binary], [binary_expr] );

Will look like this:

[constant] expr:
        int

[binary_expr] expr:
        expr op::binary_op expr

[unary] expr:
        op::unary_op expr

renameN

Syntax

[nonterminal] rename:
        from::nonterminal to::nonterminal

Similarly, this transformation can be used to rename nonterminals. This variant is a syntactic sugar for the specific combination of inline and extract, it is a semantic preserving grammar transformation that is pretty-printed as renameN.

Example

Given the input:

[constant] expr:
        int

[binary] expr:
        expr op::binary_op expr

[unary] expr:
        op::unary_op expr

After using this transformation:


renameN(expr, exp);

Will look like this:

[constant] exp:
        int

[binary] exp:
        exp op::binary_op exp

[unary] exp:
        op::unary_op exp

renameS

Syntax

[selector] rename:
        in::label? from::selector to::selector

Selectors can also be renamed by a semantic preserving grammar transformation that is pretty-printed as renameS. This variant is a syntactic sugar for the specific combination of anonymize and deanonymize.

Example

Given the input:

[constant] expr:
        int

[binary] expr:
        expr op::binary_op expr

[unary] expr:
        op::unary_op expr

After using this transformation:


renameS(op, operator);

Will look like this:

[constant] expr:
        int

[binary] expr:
        expr operator::binary_op expr

[unary] expr:
        operator::unary_op expr

renameT

Syntax

[terminal] rename:
        from::terminal to::terminal

Renaming terminals breaks string-oriented (concrete) semantics, but is still possible. This variant is pretty-printed as renameT, its behaviour is essentially that of a sequential composition of abstractize and concretize, but its meaning is different: it changes an entity that is already present in the grammar, not removes or adds anything.

Example

Given the input:

x:
        "x"

After using this transformation:


renameT("x", "y");

Will look like this:

x:
        "y"

reroot

Redefine the roots (start symbols) of the grammar. The empty set of roots is interpreted to abbreviate the complete set of nonterminals used or defined by a grammar.

Syntax

reroot:
        root::nonterminal*

Compatibility

In order to establish the relation between BGF that is being used in the actual working system and BNF that is being reported here, we apply the whole process of grammar convergence to these two grammars. We presume to conclude that this BNF dialect plus empty grammars plus root elements is the same as BGF language plus indentation rules plus layout rules plus terminal symbols. BNF was defined as a base-line grammar for a pretty-printer and therefor defines concrete syntax. BGF is extracted from the corresponding XML Schema and contains abstract syntax annotated with selectors. We choose to converge them closer to abstract syntax (BGF). First, we resolve the name mismatch and compensate for the lack of notion of “root elements” in BNF:



renameN(expression, symbol);


reroot();


disappear(
 grammar:
        <root::nonterminal*> production*
);

The only remaining transformations applied to the BGF grammar are these:



narrow(
 production*,
 production+);


inline(terminal);


inline(nonterminal);


inline(selector);


vertical( in symbol );


vertical( in value );

As we can see, the first of these transformations, narrow, shortens the grammar, but its semantics only means that we do not want to have empty samples while an empty grammar is still acceptable in general. Since this is data refinement, a semantic decreasing transformation is used without hesitation. The rest of the transformational sequence are trivial refactorings (verticalisations are already performed in the example from the previous section). The BNF source undergoes the following transformation for stripping it from lexical details:


project(
 right-hand-side:
        <NEWLINE> (<INDENT> symbol+ <NEWLINE>)+ <NEWLINE>
);

Before the rest of the concrete syntax (i.e., the terminals) is stripped away, we need to add some labels that correspond to the selectors of its BGF counterpart:



extract(
 value:
        "STRING"
        "INT"
);


vertical( in symbol );


vertical( in value );


designate(
 [epsilon] symbol:
        "EPSILON"
);


designate(
 [empty] symbol:
        "EMPTY"
);


designate(
 [any] symbol:
        "ANY"
);


designate(
 [string] value:
        "STRING"
);


designate(
 [int] value:
        "INT"
);


designate(
 [value] symbol:
        value
);

Now we can remove all terminals from the grammar without disrupting its structure (i.e., various alternatives in symbol will not collide and vanish during normalisation phase). Since we do not want to make a distinction and plan for all terminals to be removed, the following transformation script is generated by a special tool executed automatically from LCI (see the section about generators).


abstractize(
 label:
        <"["> string <"]">
);


abstractize(
 production:
        label::label? nonterminal::string <":"> right-hand-side
);


abstractize(
 [epsilon] symbol:
        <"EPSILON">
);


abstractize(
 [empty] symbol:
        <"EMPTY">
);


abstractize(
 [any] symbol:
        <"ANY">
);


abstractize(
 [terminal] symbol:
        <"""> string <""">
);


abstractize(
 [selectable] symbol:
        selector::string <"::"> symbol
);


abstractize(
 [sequence] symbol:
        <"("> symbol+ <")">
);


abstractize(
 [choice] symbol:
        <"("> symbol (<"|"> symbol)* <")">
);


abstractize(
 [optional] symbol:
        symbol <"?">
);


abstractize(
 [plus] symbol:
        symbol <"+">
);


abstractize(
 [star] symbol:
        symbol <"*">
);


abstractize(
 [marked] symbol:
        <"<"> symbol <">">
);


abstractize(
 [string] value:
        <"STRING">
);


abstractize(
 [int] value:
        <"INT">
);

Figure. Full convergence diagram for BNF and BGF. The top nodes are sources, the bottom node is the target, the arc labels are separate XBGF scripts, the nodes contain numbers of name mismatches and structural mismatches between each step and the synch point (marked as a double circle).

Finally we run a grammar comparator to see what is left and notice one mismatch that is easily fixed with massage, as well as the right hand side still having (symbol⁺)⁺ instead of just symbol. This corresponds to the design decision that treats top-level choices and top-level sequences differently in BNF to make them more appealing to the eye by avoiding unnecessary parenthesizing. The very specific upgrade command is run twice here to fold first the sequence and then the choice.



massage(
 (symbol symbol*),
 symbol+);


upgrade(
 right-hand-side:
        <symbol>+
 [sequence] symbol:
        symbol+
);


upgrade(
 right-hand-side:
        <symbol>
 [choice] symbol:
        symbol+
);


inline(right-hand-side);

After that, the grammars fully converge. The conclusion is that BNF language plus empty grammars plus root elements is the same as BGF language plus indentation rules plus layout rules plus terminal symbols.

BGF transformation operator suite v.1.02010-03-07

Foreword

Normative references

Table of contents

Design goals

Notation

List of definitions

Folding and unfolding transformations

unfold

Syntax

Example

fold

Syntax

Example

inline

Syntax

Semantics

Example

extract

Syntax

Semantics

Example

abridge

Syntax

Semantics

Example

detour

Syntax

Example

unchain

Syntax

Semantics

Example

chain

Syntax

Example

Other refactoring transformations

massage

Syntax

Example

distribute

Syntax

Semantics

Example

factor

Syntax

Semantics

Example

deyaccify

Syntax

yaccify

Syntax

Semantics

Example

eliminate

Syntax

Example

introduce

Syntax

Example

import

Syntax

Semantics

Example

vertical

Syntax

Example

horizontal

Syntax

Example

rassoc

Syntax

Example

lassoc

Syntax

Example

equate

Syntax

Language increasing transformations

add

BGF transformation operator suite v.1.0

2010-03-07