fix: Conditional assembly expressions (#65)
Co-authored-by: Zeibura Kathau <zeibura.kathau@broadcom.com>
asmelko and zeibura authored Aug 24, 2020
1 parent 225b74f commit 99c45ee
Showing 146 changed files with 7,052 additions and 17,452 deletions.
1 change: 1 addition & 0 deletions benchmark/benchmark.cpp
@@ -18,6 +18,7 @@
#include <fstream>
#include <iostream>
#include <limits>
#include <sstream>

#include "json.hpp"

77 changes: 34 additions & 43 deletions docs/Analyzer-pages/Expressions.md
@@ -4,61 +4,52 @@ HLASM differentiates two kinds of expressions: *Conditional Assembly* (CA) and *

HLASM evaluates CA expressions during assembly generation. For further details, refer to the [[HLASM overview]].

We employ the ANTLR 4 Parse-Tree Visitors during the expression evaluation. For further detail on ANTLR, refer to [[Third party libraries]]

The HLASM CA expression is conceptually similar to expressions in other languages: they support unary and binary operators, functions, variables and literals. In HLASM, each expression has a type. *Arithmetic*, *Logic*, *Character* expressions are supported. We implement the logic in the following classes:

`expression`
A pure virtual class that defines a shared interface, operators, and functions. The class also implements evaluation logic for terms and factors.

`diagnostic_op`
The concept of *diagnostics* is fundamental. During the evaluation of an expression, an error can occur (syntactic or semantic). Hence, we try to improve the user experience by reporting diagnostics. Each instance of `expression` has a pointer to `diagnostic_op` associated to it. If the pointer is `null`, it is considered error-free. During the evaluation of a child expression, the parent checks for errors and propagates the error upwards. The checks and propagation are implemented by the `copy_return_on_error` macro, which must be called immediately before the creation of a new expression during evaluation.

The `expression` class implements the evaluation as follows: A `std::deque` of `expression` pointers is passed. The evaluation iterates the list from left to right. Functions, binary, and unary operators consume the rest of the deque.
- `ca_expression`
A pure virtual class that defines a shared interface, operators, and functions.

Some expression symbols can be either HLASM keywords or variable identifiers (see the example below). Therefore, the resolution of symbols is complicated and cannot be done straight, but instead during the evaluation time. The order of the expression’s terms and the previous evaluation context is crucial for disambiguation.

<!-- -->
name operation operands
AND EQU 1
NOT EQU 0
AIF (NOT AND AND AND).LAB <- EVALUATES TO (!1 & 1)
- `ca_unary_operator` and `ca_binary_operator`
These virtual classes provide a point of inheritance for specialized classes that represent binary and unary operators that are found in HLASM. There are basic arithmetic operators (e.g. plus, minus) or function operators (e.g. `NOT`, `SLL`).

- `keyword_expression`
Helper class that represents HLASM keywords in expressions. It determines a keyword type from a string, containing its arity (unary, binary) and priority.
- Term classes `ca_function`, `ca_constant`, `ca_string`, `ca_symbol`, `ca_symbol_attribute`, `ca_var_sym`
These classes all inherit from the `ca_expression` class. Each of them represents a term that can be used in HLASM conditional assembly expressions.
The following examples show the usage of each class:

- `logic_expression`
Represents a boolean expression.
| Class | Examples of terms they represent |
|:------------------------|:----------------------------------|
| `ca_function` | `FIND('abc','d')`, `DCLEN('abcd')`|
| `ca_constant` | `42`, `C'A'` |
| `ca_string` | `'abc'`, `'**findme**'(3,*)` |
| `ca_symbol` | `R1` |
| `ca_symbol_attribute` | `L'DC_HALF`, `T'&VAR` |
| `ca_var_sym` | `&VAR`, `&VAR(1,2,3)` |

- `arithmetic_expression`
Represents an arithmetic expression.
- `ca_expr_list`
This is the class that holds a list of instantiated objects of the above stated classes.
In logical expressions, some symbols can be either expression operators or ordinary symbol identifiers (see the example below). Therefore, the resolution of symbols can be complicated and cannot be done directly during parsing. This class holds the list of the terms that contribute to the logical expression and contains an algorithm that disambiguates the expression. For the example logical expression `(NOT AND AND AND)`, the object of this class would hold four `ca_symbol` objects: one `NOT` and three `AND`s.

- `arithmetic_logic_expr_wrapper`
HLASM language supports expressions with operands of mixed types. For more straightforward and readable use of arithmetic and logical expressions, this class wraps them under one class.
<!-- -->
AND EQU 1
AIF (NOT AND AND AND).LAB <- EVALUATES TO (!1 & 1)

- `character_expression`
Represents a character expression.
- `ca_expr_policies`
Static classes that provide useful information about each built-in function (e.g. return type, number of parameters) and operators. The classes are divided into arithmetic, logical and character because some operators have different meanings in different types of expressions (like logical and arithmetic `AND`).

- `ebcdic_encoding`
This class defines a custom EBCDIC literal and provides helper functions for conversion between EBCDIC and ASCII. EBCDIC is a character encoding used on IBM mainframes. It has a different layout to ASCII.
This class defines a custom EBCDIC literal and provides helper functions for conversion between EBCDIC and ASCII. EBCDIC is a character encoding used on IBM mainframes. It has a different layout to ASCII.

<img src="img/ebcdic.png" alt="EBCDIC layout. Taken from https://i.stack.imgur.com/h3u5A.png."/>

- `error_messages`
A static class with a list of all `diagnostic_op`s that can be generated from expressions.

## CA expression evaluation

In the previous section, we described the representation of the CA expressions themselves. In this section, we explain the coupling of CA expressions with grammar.

The `expression_evaluator` encapsulates the coupling logic between the grammar and the expression logic. That is, the evaluator has a notion about grammar, which translates into C++ expression logic.
## Resolution and evaluation of CA expressions

The top-level expression first gathers a list of space-separated expressions. The evaluation must be done using a list from left to right (not using a tree) as any token may be a keyword (such as the operator `AND`) or variable identifier, depending on the position in an expression (using language keywords as identifiers is allowed in HLASM). `expression::evaluate` provides the disambiguation.
To evaluate a CA expression, the expression object must first be resolved once. Each class overrides the `resolve_expression` method, which typically checks that it has the correct number of fields and that the fields are of the correct type. The `ca_expr_list` class does most of the resolving work.
It contains an algorithm that creates an expression tree from its list of expression terms. This tree is then used for further evaluation.

During its work, the evaluator substitutes variable and ordinary symbols for their values. To know which values to substitute, the evaluator is given *evaluation context*. This consists of objects that are required for correct evaluation: *HLASM context* for symbol values, *attribute provider* for values of symbol attributes that are not yet defined and *library provider* for evaluation of some types of symbol attributes.
Once an expression has been resolved, it can be properly evaluated.
During the evaluation, variable and ordinary symbols are substituted for their values. To determine which values to substitute, the `evaluate` method is given *evaluation context*. This consists of objects that are required for correct evaluation: *HLASM context* for symbol values, *attribute provider* for values of symbol attributes that are not yet defined and *library provider* for evaluation of some types of symbol attributes.

Lookahead is triggered in conditional assembly expressions when evaluation visits a yet undefined ordinary symbol. As this might be a rather demanding operation, the expression evaluator uses *expression analyzer*. It looks for all the undefined symbol references in the expression and collects them in a common collection. Then, the lookahead is triggered to look for all references in the collection. Hence, it is triggered once per expression rather than any time an undefined symbol reference is found.
Lookahead is triggered in conditional assembly expressions when an evaluation visits a yet undefined ordinary symbol. As this might be a rather demanding operation, the `ca_expression` class contains the method `get_undefined_attributed_symbols`. It looks for all the undefined symbol references in the expression and collects them in a common collection. Then, the lookahead can be triggered to look for all references in the collection. Hence, it is triggered once per expression rather than any time an undefined symbol reference is found.
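
The position-dependent disambiguation described for `(NOT AND AND AND)` can be illustrated with a minimal C++ sketch. It is not the project's implementation: `logical_term_list` and `evaluate_logical` are hypothetical names, only `NOT`, `AND` and `OR` are handled, and the rule that `NOT` is an operator only when another term follows it is an assumption made for this example. The point it shows is that a term in operand position is read as a symbol, while the same name in operator position is read as an operator.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical illustration only; none of these names exist in the project.
// Evaluates a flat list of terms in which a name is treated as a symbol or as
// an operator depending on the position where it appears.
class logical_term_list
{
public:
    logical_term_list(const std::vector<std::string>& terms, const std::map<std::string, int>& symbols)
        : terms_(terms), symbols_(symbols)
    {}

    bool evaluate()
    {
        pos_ = 0;
        bool result = disjunction();
        if (pos_ != terms_.size())
            throw std::runtime_error("unexpected term: " + terms_[pos_]);
        return result;
    }

private:
    bool at(const char* name) const { return pos_ < terms_.size() && terms_[pos_] == name; }

    // Operand position. Assumption: NOT is an operator only if a term follows it;
    // every other name (including AND and OR) is looked up as a symbol.
    bool operand()
    {
        if (at("NOT") && pos_ + 1 < terms_.size())
        {
            ++pos_;
            return !operand();
        }
        if (pos_ >= terms_.size())
            throw std::runtime_error("operand expected");
        const auto found = symbols_.find(terms_[pos_]);
        if (found == symbols_.end())
            throw std::runtime_error("undefined symbol: " + terms_[pos_]);
        ++pos_;
        return found->second != 0;
    }

    // Operator position: here the name AND is the binary operator.
    bool conjunction()
    {
        bool result = operand();
        while (at("AND"))
        {
            ++pos_;
            const bool rhs = operand();
            result = result && rhs;
        }
        return result;
    }

    // Operator position: OR binds weaker than AND.
    bool disjunction()
    {
        bool result = conjunction();
        while (at("OR"))
        {
            ++pos_;
            const bool rhs = conjunction();
            result = result || rhs;
        }
        return result;
    }

    const std::vector<std::string>& terms_;
    const std::map<std::string, int>& symbols_;
    std::size_t pos_ = 0;
};

// Convenience wrapper for one-off evaluation.
inline bool evaluate_logical(const std::vector<std::string>& terms, const std::map<std::string, int>& symbols)
{
    return logical_term_list(terms, symbols).evaluate();
}
```

With `AND` defined as 1, `evaluate_logical({"NOT", "AND", "AND", "AND"}, {{"AND", 1}})` reads the second and fourth `AND` as symbols and the third as the operator, so it computes `!1 & 1`. The sketch evaluates while it scans; the documentation above describes building an expression tree first, which would follow the same positional rule.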

# Machine expressions

@@ -70,16 +61,16 @@ We use a standard infix tree representation of expressions. There is an interfac

- `mach_expr_symbol` represents an ordinary symbol.

- `mach_expr_data_attr` represents an attribute of a symbol (e.g. `LSYM` is length of symbol `SYM`)
- `mach_expr_data_attr` represents an attribute of a symbol (e.g. `L'SYM` is length of symbol `SYM`)

- `mach_expr_location_counter` represents a location counter represented by an asterisk in expressions.

- `mach_expr_self_def` represents a self defining term (e.g. `X’1F’`)
- `mach_expr_self_def` represents a self defining term (e.g. `X'1F'`)

The following example shows a representation for one specific expression.

<img src="img/mach_expr_example.svg" alt="Example representation of the machine expression (A-4)+LB." />
<img src="img/mach_expr_example.svg" alt="Example representation of the machine expression (A-4)+L'B." />

Machine expressions can also evaluate the expressions they represent. The evaluation is done in a recursive manner. It is fairly simple when there are no symbols used in the expression — each node in the tree computes the result with basic arithmetic operations.

However, the process can get tricky since expressions might contain e.g. `mach_expr_symbol`, whose value is dependant on symbols defined in other parts of source code. Moreover, the result of a machine expression can be an absolute value (a number) or relocatable value (an address). The process of symbol resolution is explained in the *symbol dependency tables* section of [[HLASM context tables]].
However, the process can get tricky since expressions might contain e.g. `mach_expr_symbol`, whose value is dependant on symbols defined in other parts of source code. Moreover, the result of a machine expression can be an absolute value (a number) or relocatable value (an address). The process of symbol resolution is explained in the *symbol dependency tables* section of [[HLASM context tables]].
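
The recursive evaluation of a machine expression tree can be sketched as follows for the example `(A-4)+L'B`. This is a simplified illustration under stated assumptions, not the project's classes: everything in the `sketch` namespace is hypothetical, all symbols are assumed to be already defined with absolute values, and relocatable results and dependency resolution are left out.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical symbol record: an absolute value and a length attribute.
struct symbol_info
{
    int value;
    int length;
};
using symbol_table = std::map<std::string, symbol_info>;

// Common interface of all tree nodes.
struct node
{
    virtual ~node() = default;
    virtual int evaluate(const symbol_table& symbols) const = 0;
};
using node_ptr = std::unique_ptr<node>;

// Self-defining term such as 4 or X'1F'.
struct self_def : node
{
    int value;
    explicit self_def(int v) : value(v) {}
    int evaluate(const symbol_table&) const override { return value; }
};

// Ordinary symbol; std::map::at throws std::out_of_range if it is undefined.
struct symbol : node
{
    std::string name;
    explicit symbol(std::string n) : name(std::move(n)) {}
    int evaluate(const symbol_table& symbols) const override { return symbols.at(name).value; }
};

// Length attribute reference such as L'B.
struct length_attr : node
{
    std::string name;
    explicit length_attr(std::string n) : name(std::move(n)) {}
    int evaluate(const symbol_table& symbols) const override { return symbols.at(name).length; }
};

// Binary operator node; this sketch supports only '+' and '-'.
struct binary : node
{
    char op;
    node_ptr left;
    node_ptr right;
    binary(char o, node_ptr l, node_ptr r) : op(o), left(std::move(l)), right(std::move(r)) {}
    int evaluate(const symbol_table& symbols) const override
    {
        const int l = left->evaluate(symbols);
        const int r = right->evaluate(symbols);
        return op == '+' ? l + r : l - r;
    }
};

// Builds the tree for (A-4)+L'B.
inline node_ptr make_example()
{
    auto a_minus_4 = std::make_unique<binary>('-', std::make_unique<symbol>("A"), std::make_unique<self_def>(4));
    return std::make_unique<binary>('+', std::move(a_minus_4), std::make_unique<length_attr>("B"));
}

} // namespace sketch
```

Each node computes its own result from the results of its children, so evaluating the root walks the whole tree. With `A` set to 10 and `B` having length 2, the example tree evaluates to `(10-4)+2`.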
2 changes: 1 addition & 1 deletion docs/Home.md
@@ -26,4 +26,4 @@ Firstly, in [[HLASM overview]], we briefly explain the basics of HLASM needed t

The page [[language server]] describes the responsibilities of the language server as the communication provider between the extension client and the parsing library. The [[workspace manager]] is the entry point to the parsing library used by the language server. The purpose of its sub-components is to handle file management, dependency resolution and parsing.

The core of the processing of a HLASM file is implemented inside the [[analyzer]]. The project also provides macro tracing through the standard debugging procedure and it is fully explained in [[macro tracer]].The last mentioned component is the [[extension|VSCode extension]], which communicates with the language server and provides IDE features to the user. In [[build instructions]], we provide a guide to build this project.
The core of the processing of a HLASM file is implemented inside the [[analyzer]]. The project also provides macro tracing through the standard debugging procedure and it is fully explained in [[macro tracer]]. The last mentioned component is the [[extension|VSCode extension]], which communicates with the language server and provides IDE features to the user. In [[build instructions]], we provide a guide to build this project.