diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 59366392e..dd849de23 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -19,7 +19,7 @@ Make sure that the user email you specified on your local git is the same as on For more information, please see the Eclipse Committer Handbook: https://www.eclipse.org/projects/handbook/#resources-commit -Such signoff is easily achieved using the `--signoff` option of the `git commit` command (provided that the git credentials are properly configured): +The signoff is easily achieved using the `--signoff` option of the `git commit` command (provided that the git credentials are properly configured): ``` git commit --signoff ``` diff --git a/README.md b/README.md index bea181c0d..6d60e5cb2 100644 --- a/README.md +++ b/README.md @@ -6,9 +6,7 @@ -HLASM Language Support is an extension for [Visual Studio Code](https://code.visualstudio.com/) (and [Theia](https://theia-ide.org/)) that adds support for -the High Level Assembler. It provides code completion, highlighting and navigation features, detects common mistakes in the source, and lets you trace the evaluation of the -conditional assembly source code, using a modern debugging interface. +HLASM Language Support is an extension for [Visual Studio Code](https://code.visualstudio.com/) (and [Theia](https://theia-ide.org/)) that adds support for the High Level Assembler language. It provides code completion, highlighting and navigation features, detects common mistakes in the source, and lets you trace the evaluation of the conditional assembly source code, using a modern debugging interface. The extension is available on the [Visual Studio Marketplace](https://marketplace.visualstudio.com/items?itemName=broadcomMFD.hlasm-language-support). You can install it in a standard way from within the Visual Studio Code. 
@@ -16,10 +14,10 @@ HLASM Language Support is also part of [Code4z](https://marketplace.visualstudio ## Useful information -- If you have a question about the functionalities of the extension, or come across a problem [file an issue](https://github.com/eclipse/che-che4z-lsp-for-hlasm/issues). -- Contributions are always welcome! Please see the [CONTRIBUTING.md](CONTRIBUTING.md) for more information -- See the project [wiki](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/) for project documentation -- For instructions how to build and install the project from source, see the project [wiki](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/Build-instructions) +- If you have a question about the functionalities of the extension, or come across a problem, [file an issue](https://github.com/eclipse/che-che4z-lsp-for-hlasm/issues). +- Contributions are always welcome! Please see the [CONTRIBUTING.md](CONTRIBUTING.md) for more information. +- See the project [wiki](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/) for project documentation. +- For instructions on how to build and install the project from source, see the project [wiki](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/Build-instructions). - All [releases](https://github.com/eclipse/che-che4z-lsp-for-hlasm/releases) are available here on GitHub. - Any and all feedback is appreciated and welcome! diff --git a/clients/vscode-hlasmplugin/README.md b/clients/vscode-hlasmplugin/README.md index 01e1c66d2..29b05567d 100644 --- a/clients/vscode-hlasmplugin/README.md +++ b/clients/vscode-hlasmplugin/README.md @@ -17,25 +17,27 @@ HLASM Language Support is also part of [Code4z](https://marketplace.visualstudio ## Getting Started -### Usage +### Enabling the Extension Follow these steps to open a HLASM project: -1. In menu _File_ -> _Open Folder..._, select the folder with the HLASM sources. +1. In _File_ -> _Open Folder..._, select the folder with the HLASM sources. 2. 
Open any HLASM source file (note that HLASM does not have a standard filename extension) or create a new file. 3. If the auto-detection of HLASM language does not recognize the file, set it manually in the bottom-right corner of the VS Code window. -4. The extension is now enabled on the open file. If you have macro definitions in separate files or use the COPY instruction, you need to setup the workspace. +4. The extension is now enabled on the open file. If you have macro definitions in separate files or use the COPY instruction, you need to set up a workspace. -### Setting up a multi-file project environment +### Setting Up a Multi-File Project Environment -HLASM COPY instruction copies the source code from various external files, as driven by HLASM evaluation. The source code interpreter in the HLASM Extension needs to be set up correctly to be able to find the same files as the HLASM assembler program. +The HLASM COPY instruction copies the source code from various external files, as driven by HLASM evaluation. The source code interpreter in the HLASM Extension needs to be set up correctly to be able to find the same files as the HLASM assembler program. -This is done by setting up two configuration files — `proc_grps.json` and `pgm_conf.json`. The extension guides the user in their creation: +To do this, set up two configuration files — `proc_grps.json` and `pgm_conf.json`. Follow these steps: -1. After opening a HLASM file for the first time, two pop-ups are displayed. Select _Create pgm_conf.json with current program_ and _Create empty proc_grps.json_. The two configuration files are then created with default values. They are written into the `.hlasmplugin` subfolder. -2. Navigate to the `proc_grps.json` file. This is the entry point where you can specify paths to macro definitions and COPY files. To do this, simply fill the `libs` array with the corresponding paths. 
For example, if you have your macro files in the `ASMMAC/` folder, add the string `"ASMMAC"` into the libs array. +1. After you open a HLASM file for the first time, two pop-ups are displayed. Select _Create pgm_conf.json with current program_ and _Create empty proc_grps.json_. + The two configuration files are then created with default values. They are stored in the `.hlasmplugin` subfolder. +2. Navigate to the `proc_grps.json` file. This is the entry point where you can specify paths to macro definitions and COPY files. +3. Fill the `libs` array with the corresponding paths. For example, if you have your macro files in the `ASMMAC/` folder, add the string `"ASMMAC"` to the `libs` array. -Follow [Configuration](#External-Macro-Libraries-and-COPY-Members) for more detailed instructions for configuring the environment. +See [Configuration](#Configuration) for more detailed instructions on configuring the environment. ## Language Features @@ -83,16 +85,20 @@ Breakpoints can be set before or during the debugging session. ![](https://github.com/eclipse/che-che4z-lsp-for-hlasm/raw/master/clients/vscode-hlasmplugin/readme_res/tracer.gif) -## External Macro Libraries and COPY Members +## Configuration + +### External Macro Libraries and COPY Members The HLASM Language Support extension looks for locally stored members when a macro or COPY instruction is evaluated. The paths of these members are specified in two configuration files in the `.hlasmplugin` folder of the currently open workspace: - `proc_grps.json` defines _processor groups_ by assigning a group name to a list of directories. Hence, the group name serves as a unique identifier of a set of HLASM libraries defined by a list of directories. - `pgm_conf.json` provides a mapping between _programs_ (open-code files) and processor groups. It specifies which list of directories is used with which source file. If a relative source file path is specified, it is relative to the current workspace.
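The lookup that these two files drive can be sketched in C++. This is a minimal sketch under stated assumptions, not the extension's implementation: `LibraryConfig` and `resolve_member` are hypothetical names, both files are assumed to be already parsed into maps, and the file-system check is passed in as a callback so the logic can be exercised without touching the disk. It follows the search order this section describes: the program is mapped to its processor group, and the group's directories are tried in the order they are listed until a file with the exact member name is found.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory view of the two configuration files.
struct LibraryConfig {
    // proc_grps.json: processor group name -> library directories, in search order
    std::map<std::string, std::vector<std::string>> proc_grps;
    // pgm_conf.json: program (open-code file) -> processor group name
    std::map<std::string, std::string> pgm_conf;
};

// Resolves a macro or COPY member for `program`. The directories of the
// program's processor group are tried in listed order, and the first one
// containing a file named exactly `member` wins.
// `exists` abstracts the file-system check (an assumption of this sketch).
std::optional<std::string> resolve_member(
    const LibraryConfig& cfg, const std::string& program,
    const std::string& member,
    const std::function<bool(const std::string&)>& exists) {
    auto pgm = cfg.pgm_conf.find(program);
    if (pgm == cfg.pgm_conf.end()) return std::nullopt;  // no processor group assigned
    auto grp = cfg.proc_grps.find(pgm->second);
    if (grp == cfg.proc_grps.end()) return std::nullopt;  // unknown group name
    for (const std::string& dir : grp->second) {
        std::string candidate = dir + "/" + member;
        if (exists(candidate)) return candidate;  // first match in listed order
    }
    return std::nullopt;  // not found in any directory of the group
}
```

With `GROUP1` listing `ASMMAC` before `C:/SYS.ASMMAC`, a member present in both directories resolves to the `ASMMAC` copy, and a member present only in `C:/SYS.ASMMAC` is found on the second attempt.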
-Therefore, to use a predefined set of macro and copy members, do the following steps: -1. Enumerate the library directories in `proc_grps.json` and name them with an identifier; thus, create a new processor group. -2. Use the identifier of the new processor group with the name of your source code file in `pgm_conf.json` to assign the library members to the program. +To use a predefined set of macro and copy members, follow these steps: +1. In `proc_grps.json`, specify any number of library directories to search for macros and COPY files. These directories are searched in the order in which they are listed. +2. Name the group of directories with an identifier. + This creates a new processor group. +3. Use the identifier of the new processor group with the name of your source code file in `pgm_conf.json` to assign the library members to the program. The structure of the configuration is based on CA Endevor® SCM. Ensure that you configure these files before using macros from separate files or the COPY instruction. When you open a HLASM file or manually set the HLASM language for a file, you can choose to automatically create these files for the current program. @@ -142,42 +148,41 @@ The following example specifies that GROUP1 is used when working with `source_co ``` If you have the two configuration files configured as above and invoke the MAC1 macro from `source_code`, the folder `ASMMAC/` in the current workspace is searched for a file with the exact name "MAC1". If that search is unsuccessful the folder `C:/SYS.ASMMAC` is searched. If that search is unsuccessful an error displays that the macro does not exist. -Note that the macro `MAC1` is searched in directories in order as they are listed in the configuration. - -There is also the option `alwaysRecognize` which takes an array of wildcards. It allows you to configure two things: -- All files matching these wildcards will always be recognized as HLASM files.
-- If an extension wildcard is defined, all macro and copy files with such extension may be used in the source code. For example, with the extension wildcard `*.hlasm`, a user may add macro `MAC` to his source code even if it is in a file called `Mac.hlasm`. - -Example of `alwaysRecognize`: - -With the following configuration file, processor group `GROUP1` will be assigned to `source_code` and `source_code.hlasm` file as well. Also, macro and copy files in the `lib` directory can be referenced and correctly recognized in the program without the `.asm` extension. +The program name in `pgm_conf.json` can contain wildcards, as in the following example: ``` { "pgms": [ { - "program": "source_code", + "program": "*", "pgroup": "GROUP1" } - ], - "alwaysRecognize" : ["*.hlasm", "libs/*.asm"] + ] } ``` +In this example, GROUP1 is used for all open-code programs. -Example of wildcards: +### File Extensions + +`pgm_conf.json` accepts the optional parameter `alwaysRecognize`, which takes an array of wildcards: +- All files matching these wildcards are automatically recognized as HLASM files. +- If an extension wildcard is defined, all macro and copy files with this extension can be used in the source code. + +For example, with the extension wildcard `*.hlasm`, a user can add the macro `MAC` to their source code even if it is in a file called `MAC.hlasm`. Additionally, all files with the extension `.hlasm` are automatically recognized as HLASM files. + +The following example of `pgm_conf.json` specifies that the processor group `GROUP1` is assigned to both `source_code` and `source_code.hlasm`. Also, macro and copy files in the `libs` directory can be referenced and are correctly recognized in the program without the `.asm` extension.
-The program field in `pgm_conf.json` supports wildcards, for example: ``` { "pgms": [ { - "program": "*", + "program": "source_code", "pgroup": "GROUP1" } - ] + ], + "alwaysRecognize" : ["*.hlasm", "libs/*.asm"] } ``` -In this example, GROUP1 is used for all open code programs. ## Questions, issues, feature requests, and contributions - If you have a question about how to accomplish something with the extension, or come across a problem file an issue on [GitHub](https://github.com/eclipse/che-che4z-lsp-for-hlasm) diff --git a/docs/Analyzer-pages/Analyzer.md b/docs/Analyzer-pages/Analyzer.md index 0aa79adf0..c78814cf0 100644 --- a/docs/Analyzer-pages/Analyzer.md +++ b/docs/Analyzer-pages/Analyzer.md @@ -1,35 +1,32 @@ -The role of the analyzer is to provide a facade over objects and methods to create a simple interface for lexical and semantic processing (analyzing) of a single HLASM source file. The output of the analysis is the basic input of the LSP server. +The role of the analyzer is to provide a facade over objects and methods to create a simple interface for lexical and semantic processing (analysis) of a single HLASM source file. The output of the analysis is the basic input of the LSP server. -After the analyzer is constructed, it analyzes the provided source file. As a result, it updates HLASM context tables and provides a list of diagnostics linked to the file, highlighting, list of symbol definitions, etc. +After the analyzer is constructed, it analyzes the provided source file. As a result, it updates HLASM context tables and provides a list of diagnostics linked to the file, highlighting, a list of symbol definitions, etc. Overview -------- The analyzer is composed of several sub-components, all required to properly process the file. -**LSP data collector** collects and retrieves all LSP information created while processing the file. +- The **LSP data collector** collects and retrieves all LSP information created while processing the file. 
+- **HLASM context tables** hold information about the context of the processed HLASM source code. +- **Lexer–Parser sub-components** simplify the processing interface and ease the use of this component. They are needed to create a source file parser. +- The **processing manager** executes the main loop where the file is processed. -**HLASM context tables** hold information about the context of the processed HLASM source code. +The LSP data collector is required by the Lexer–Parser sub-components, which are composed into the parser object required by the processing manager. HLASM context tables are used by both the manager and the sub-components. -**Lexer–Parser sub-components** simplify the processing interface and ease the use of this component. They are needed to create a source file parser. - -**Processing manager** executes the main loop where the file is processed. - -LSP data collector is required by Lexer-Parser sub-components. They are composed into the parser object required by processing manager. HLASM context tables are used by the manager and the sub-components as well. - -The components together contribute to the proper functionality of the method `analyze`. It processes a provided file and fills LSP data collector from which LSP information can be further retrieved. +Together, the components implement the `analyze` method. It processes the provided file and fills the LSP data collector, from which LSP information can later be retrieved. ### Construction In order to parse a HLASM file, the analyzer class is constructed with the following parameters: -- *Name and content of a file.* +- *Name and content of the file.* - *Parse library provider* – an object responsible for resolving source file dependencies. The dependencies are only discovered during the analysis, so it is not possible to provide the files beforehand. -- *Processing tracer* (see [[Macro tracer]]). +- *Processing tracer* (see [[Macro tracer]]).
-When this constructor is used, the analyzer creates HLASM context tables and processes the provided source as an open-code. We say that the analyzer has *owner semantics*; it is the owner of the context tables. +When this constructor is used, the analyzer creates HLASM context tables and processes the provided source as an open-code. The analyzer has *owner semantics*; it is the owner of the context tables. The analyzer provides *reference semantics* as well (holding just a reference of the context tables). The provided source is not treated as an open-code, rather as an external file dependency. The constructor of an analyzer with reference semantics adds the following two parameters to the previous one: @@ -37,9 +34,9 @@ The analyzer provides *reference semantics* as well (holding just a reference of - *Library data* — states how the dependency file should be treated (see [[Processing manager]]). -This constructor is called within open-code analyzer by its sub-components when they use the *Parse library provider*. +This constructor is called within the open-code analyzer by its sub-components when they use the *Parse library provider*. -The components of analyzer are further described in the following pages: +The components of the analyzer are further described in the following pages: 1. [[LSP data collector]] 2. [[Processing manager]] 3. [[Instruction format validation]] diff --git a/docs/Analyzer-pages/Expressions.md b/docs/Analyzer-pages/Expressions.md index 8289a4a46..2cca81cdd 100644 --- a/docs/Analyzer-pages/Expressions.md +++ b/docs/Analyzer-pages/Expressions.md @@ -6,17 +6,17 @@ HLASM evaluates CA expressions during assembly generation. For further details, We employ the ANTLR 4 Parse-Tree Visitors during the expression evaluation. For further detail on ANTLR, refer to [[Third party libraries]] -HLASM CA expression is conceptually similar to the expressions in other languages: they support unary and binary operators, functions, variables and literals. 
In HLASM, each expression has a type. *Arithmetic*, *Logic*, *Character* expressions are supported. We implement the logic in the following classes: +HLASM CA expressions are conceptually similar to expressions in other languages: they support unary and binary operators, functions, variables, and literals. In HLASM, each expression has a type: *arithmetic*, *logic*, or *character*. We implement the logic in the following classes: `expression` A pure virtual class that defines a shared interface, operators, and functions. The class also implements evaluation logic for terms and factors. `diagnostic_op` -The concept of *diagnostics* is fundamental. During the evaluation of an expression, an error can occur (syntactic or semantic). Hence, we try to improve the user experience by reporting diagnostics. Each instance of `expression` has a pointer to `diagnostic_op` associated to it. If the pointer is `null`, it is considered error-free. During the evaluation of a child expression, the parent checks for errors and propagates the error upwards. Checking and propagating of an error is implemented by `copy_return_on_error` macro, which one should call immediately before the creation of a new expression during evaluation. +The concept of *diagnostics* is fundamental. During the evaluation of an expression, an error (syntactic or semantic) can occur. Hence, we improve the user experience by reporting diagnostics. Each instance of `expression` has an associated pointer to a `diagnostic_op`. If the pointer is `null`, the expression is considered error-free. During the evaluation of a child expression, the parent checks for errors and propagates them upwards. The checks and propagation are implemented by the `copy_return_on_error` macro, which must be called immediately before the creation of a new expression during evaluation. The `expression` class implements the evaluation as follows: A `std::deque` of `expression` pointers is passed.
The evaluation iterates the list from left to right. Functions, binary, and unary operators consume the rest of the deque. -Some expression symbols can be either HLASM keywords or variable identifiers (see example below). Therefore, the resolution of symbols is complicated and cannot be done straight, but instead during the evaluation-time. The order of the expression’s terms and the previous evaluation context is crucial for the disambiguation. +Some expression symbols can be either HLASM keywords or variable identifiers (see the example below). Therefore, symbols cannot be resolved up front; their resolution must be deferred to evaluation time. The order of the expression’s terms and the previous evaluation context are crucial for disambiguation. name operation operands @@ -25,61 +25,61 @@ Some expression symbols can be either HLASM keywords or variable identifiers (se NOT EQU 0 AIF (NOT AND AND AND).LAB <- EVALUATES TO (!1 & 1) -`keyword_expression` +- `keyword_expression` Helper class that represents HLASM keywords in expressions. It determines a keyword type from a string, containing its arity (unary, binary) and priority. -`logic_expression` +- `logic_expression` Represents a boolean expression. -`arithmetic_expression` +- `arithmetic_expression` Represents an arithmetic expression. -`arithmetic_logic_expr_wrapper` +- `arithmetic_logic_expr_wrapper` HLASM language supports expressions with operands of mixed types. For more straightforward and readable use of arithmetic and logical expressions, this class wraps them under one class. -`character_expression` +- `character_expression` Represents a character expression. -`ebcdic_encoding` -This class defines a custom EBCDIC literal and provides helper functions for conversion between EBCDIC and ASCII. EBCDIC is a character encoding used in the IBM mainframe. It has a different layout than ASCII.
+- `ebcdic_encoding` +This class defines a custom EBCDIC literal and provides helper functions for conversion between EBCDIC and ASCII. EBCDIC is a character encoding used on IBM mainframes. Its character layout differs from that of ASCII. EBCDIC layout. Taken from https://i.stack.imgur.com/h3u5A.png. -`error_messages` -It is a static class with list of all `diagnostic_op` that can be generated from expressions. +- `error_messages` +A static class with a list of all `diagnostic_op`s that can be generated from expressions. ## CA expression evaluation -In the previous section, we described the representation of the CA expressions themselves. In this section, we explain the coupling of CA expressions with grammar via visitor. +In the previous section, we described the representation of the CA expressions themselves. In this section, we explain how CA expressions are coupled with the grammar via a parse-tree visitor. The `expression_evaluator` encapsulates the coupling logic between the grammar and the expression logic. That is, the evaluator has a notion about grammar, which translates into C++ expression logic. -The top-level expression first gathers a list of space-separated expressions. The evaluation must be done using a list from left to right (not using a tree) as any token may be a keyword (such as `AND` operator) or variable identifier, depending on a position in an expression (using language keywords as identifiers is allowed in HLASM). `expression::evaluate` provides the disambiguation. +The top-level expression first gathers a list of space-separated expressions. The list must be evaluated from left to right (not as a tree), because any token may be a keyword (such as the operator `AND`) or a variable identifier, depending on its position in the expression (HLASM allows language keywords to be used as identifiers). `expression::evaluate` provides the disambiguation. -During its work, evaluator substitutes variable and ordinary symbols for their values.
To know which values to substitute, evaluator is given *evaluation context*. It consists of objects that are required for correct evaluation: *HLASM context* for symbol values, *attribute provider* for values of symbol attributes that are not yet defined and *library provider* for evaluation of some types of symbol attributes as well. +During its work, the evaluator replaces variable and ordinary symbols with their values. To know which values to substitute, the evaluator is given an *evaluation context*. This consists of the objects required for correct evaluation: the *HLASM context* for symbol values, the *attribute provider* for values of symbol attributes that are not yet defined, and the *library provider* for evaluating some types of symbol attributes. -Lookahead is triggered in conditional assembly expressions when evaluation visits yet undefined ordinary symbol. As this can be rather demanding operation, expression evaluator uses *expression analyzer*. It looks for all the undefined symbol references in expression and collects them to a common collection. Then, the lookahead is triggered to look for all references in the collection. Hence, it is triggered once per expression rather than any time an undefined symbol reference is found. +Lookahead is triggered in conditional assembly expressions when evaluation visits an ordinary symbol that is not yet defined. As this can be a demanding operation, the expression evaluator uses an *expression analyzer*. It looks for all undefined symbol references in the expression and collects them in a common collection. Then, the lookahead is triggered to look for all references in the collection. Hence, it is triggered once per expression rather than each time an undefined symbol reference is found. # Machine expressions -In HLASM, machine expressions are used as operands of machine and assembler instructions. Their result may be a simple absolute number or an address. +In HLASM, machine expressions are used as operands of machine and assembler instructions. Their result is either a simple absolute number or an address. -We use a standard infix tree representation of expressions. There is an interface `machine_expression` which is implemented by several classes that represent operators and terms. Each binary operator holds two expressions — left and right operands. Terms are leaf classes that do not hold any other expressions and directly represent a value. There are several classes representing different terms valid in machine expressions: +We use a standard infix tree representation of expressions. There is an interface, `machine_expression`, which is implemented by several classes that represent operators and terms. Each binary operator holds two expressions: the left and right operands. Terms are leaf classes that do not hold any other expressions and directly represent a value. There are several classes representing different terms valid in machine expressions: -- `mach_expr_constant` represents simply a number. +- `mach_expr_constant` represents a number. - `mach_expr_symbol` represents an ordinary symbol. -- `mach_expr_data_attr` represents attribute of a symbol (e.g. `L’SYM` is length of symbol `SYM`) +- `mach_expr_data_attr` represents an attribute of a symbol (e.g. `L’SYM` is the length of the symbol `SYM`) -- `mach_expr_location_counter` represents location counter represented by asterisk in expressions. +- `mach_expr_location_counter` represents the location counter, which is written as an asterisk in expressions. -- `mach_expr_self_def` represents self defining term (e.g. `X’1F’`) +- `mach_expr_self_def` represents a self-defining term (e.g. `X’1F’`) -shows an example representation for one concrete expression. +The following example shows the representation of one specific expression. Example representation of the machine expression (A-4)+L’B. -Machine expressions are also able to evaluate the expressions they represent. The evaluation is done in a recursive manner. It is fairly simple when there are no symbols used in the expression — each node in the tree simply computes the result with basic arithmetic operations. +Machine expressions can also evaluate the expressions they represent. The evaluation is done recursively. It is fairly simple when no symbols are used in the expression: each node in the tree computes its result with basic arithmetic operations. -However, the process can get tricky since expressions may contain e.g. `mach_expr_symbol` whose value is dependant on symbols defined in other parts of source code. Moreover, result of a machine expression may be an absolute value (a number) or relocatable value (an address). The process of symbols resolution is explained in the *symbol dependency tables* section of [[HLASM context tables]]. +However, the process can get tricky, since expressions might contain, e.g., a `mach_expr_symbol` whose value depends on symbols defined in other parts of the source code. Moreover, the result of a machine expression can be an absolute value (a number) or a relocatable value (an address). The process of symbol resolution is explained in the *symbol dependency tables* section of [[HLASM context tables]]. \ No newline at end of file diff --git a/docs/Analyzer-pages/HLASM-context-tables.md b/docs/Analyzer-pages/HLASM-context-tables.md index 55d303c62..c919fa5dd 100644 --- a/docs/Analyzer-pages/HLASM-context-tables.md +++ b/docs/Analyzer-pages/HLASM-context-tables.md @@ -1,46 +1,59 @@ -HLASM context tables (in code referred simply as `hlasm_context`) are composition of tables and stacks that describe the state of the currently processed open-code. This structure is persistent between source files within an open-code. It is created in an analyzer and has the same lifespan. +HLASM context tables (in code referred to as `hlasm_context`) are composed of tables and stacks that describe the state of the currently processed open-code.
This structure is persistent between source files within an open-code. It is created in an analyzer and has the same lifespan. -It is composed of: +The structure is composed of: -- *Macro & Copy storage* – stores macro and copy definition definitions. +- *Macro & Copy storage* +Stores macro and copy definitions. -- *ID storage* – stores symbol identifiers. +- *ID storage* +Stores symbol identifiers. -- *Scope stack* – stores nested macro invocations and local variable symbols. +- *Scope stack* +Stores nested macro invocations and local variable symbols. -- *Global variable symbol storage* – stores global variable symbols. +- *Global variable symbol storage* +Stores global variable symbols. -- *Source stack* – stores nested source files. +- *Source stack* +Stores nested source files. -- *Processing stack* – stores stack of processings in a source file. +- *Processing stack* +Stores the stack of processing states in a source file. -- *LSP context* – stores structures for LSP requests. +- *LSP context* +Stores structures for LSP requests. -- *Ordinary assembly context* – encapsulates structures describing Ordinary assembly. +- *Ordinary assembly context* +Encapsulates structures describing ordinary assembly. -### Macro definition +### Macro Definition -HLASM context stores visited macro definitions in the *macro strorage*. +HLASM context stores visited macro definitions in the *macro storage*. -Macro definition is represented by: +A macro definition is represented by: -- *Macro identifier*. It identifies the macro. +- *Macro identifier* +Identifies the macro. -- *Calling parameters*. They are assigned real value when the macro is called. +- *Calling parameters* +These parameters are assigned actual values when the macro is called. -- *Block of statement*. It represents the body of the macro. +- *Block of statements* +Represents the body of the macro. -- *Block of copy nestings*. It is an array with one-to-one relation with block of statements. Each entry is a list of in-file locations that represents how much is the statement nested in COPY calls.
+- *Block of copy nestings* +An array in one-to-one correspondence with the block of statements. Each entry is a list of in-file locations that represents how deeply the statement is nested in COPY calls. -- *Label storage*. The storage of sequence symbols that occur in the macro definition. +- *Label storage* +The storage of sequence symbols that occur in the macro definition. -When macro is called, *macro invocation* object is created. It shares the content of a respective macro definition with an exception of calling parameters as they are assigned real value passed with the call. Also, it contains index to the top statement of the invocation. +When a macro is called, a *macro invocation* object is created. It shares the content of the respective macro definition, with the exception of the calling parameters, which are assigned the actual values passed with the call. It also contains an index to the top statement of the invocation. The macro invocation is stored in the context’s *scope stack*. -### Scope stack +### Scope Stack -This stack (stack of `code_scope` objects) holds information about the scope of variable symbols. The scope changes when macro is visited. The initial scope is the open-code. +This stack (the stack of `code_scope` objects) holds information about the scope of variable symbols. The scope changes when a macro is visited. The initial scope is the open-code. The stack elements contain: @@ -54,63 +67,63 @@ The stack elements contain: ### COPY -HLASM context stores visited COPY members in the *copy strorage*. +HLASM context stores visited COPY members in the *copy storage*. -COPY member definition is much more simple than the macro definition as it does not hold any more semantic information than the sequence of statements (the definition itself).
-When copy is visited, copy member invocation is created and pushed in the copy stack of last entry of the *source stack*.
+When a COPY member is visited, a copy member invocation is created and pushed onto the copy stack of the last entry of the *source stack*.
-### Source stack and Processing stack
+### Source Stack and Processing Stack
-This stacks are responsible for the nests of opened files (source stack) and what they are opened for (processing stack). As the relation of source entry and processing entry is one-to-many, the information is stored in two arrays rather than one.
+These stacks track the nesting of opened files (source stack) and what they are opened for (processing stack). As one source entry can relate to many processing entries, the information is stored in two arrays rather than one.
-When [[statement processor|statement processors]] is changed (e.g. macro or copy definition is processed, lookahead is needed, ...), this information is stored in the processing stack. If a new file is opened during this change then source stack is updated as well.
+When [[statement processors]] are changed (e.g. when a macro or copy definition is processed, or a lookahead is needed), this information is stored in the processing stack. If a new file is opened during this change, the source stack is updated as well.
-Source stack contains:
+The source stack contains the following:
 - *Source file identifier*
 - *Copy stack* – the nest of copy calls active for the source file.
-- *Processed statement location* – data that locates last processed statement in the source file.
+- *Processed statement location* – data that locates the last processed statement in the source file.
-Processing stack contains *processing kind*.
+The processing stack contains the *processing kind*.
-The reasoning of organizing this two stacks in such a way is:
+The two stacks are organized this way for the following reasons:
-1.
Context has enough information to fully reconstruct the statement.
+1. The context has enough information to fully reconstruct the statement.
-2. Easy retrieval of the correct copy stack for copy statement provider.
+2. The correct copy stack can easily be retrieved for a copy statement provider.
-### ID storage
+### ID Storage
 ID storage holds the string identifiers that are used by the open-code. It stores the string and retrieves a pointer. It is guaranteed that if two different strings with the same value are passed to the storage, the resulting pointers are equal. It simplifies work with IDs and saves space.
-### Variable symbols
+### Variable Symbols
-In HLASM language, variable symbol is general term for symbols beginning with ampersand. However, they can be separated into several structures that capture a common behavior:
+In the HLASM language, *variable symbol* is a general term for symbols beginning with an ampersand (&). They can be divided into several structures that capture common behavior:
-- *SET symbols* – represent HLASM SET symbols.
+- *SET symbols*
-- *System variables* – represent HLASM system variables.
+- *System variables*
-- *Macro parameters* – represent HLASM macro parameters.
+- *Macro parameters*
-They inherit common abstract ancestor *variable symbol*. SET symbols are further divided into *SETA*, *SETB* and *SETC* symbols. Macro parameters are divided into *keyword* and *positional* parameters (see the picture below). They are stored in respective storage (global storage, scope stack, macro definition) that determines their scope.
+They share the common abstract ancestor *variable symbol*. SET symbols are further divided into *SETA*, *SETB* and *SETC* symbols. Macro parameters are divided into *keyword* and *positional* parameters (see the picture below). Each kind is stored in the storage that determines its scope (global storage, scope stack or macro definition).
 The inheritance of variable symbols.
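The ID storage guarantee described above (equal strings always yield equal pointers) is a form of string interning. The C++ sketch below illustrates it under stated assumptions. `id_storage`, `id_index`, `add` and `find` are hypothetical names, not the project's API. The sketch relies on the standard rule that rehashing a `std::unordered_set` invalidates iterators but not pointers or references to its elements.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical sketch of an ID storage (string interning).
// Names are illustrative, not taken from the project sources.
class id_storage
{
public:
    // An ID is a pointer to the single stored copy of a string.
    using id_index = const std::string*;

    // Stores the string if it is not present yet and returns a pointer
    // to the stored copy. Equal strings always yield equal pointers.
    id_index add(std::string_view value)
    {
        return &*ids_.emplace(value).first;
    }

    // Returns the ID of an already stored string, or nullptr.
    id_index find(std::string_view value) const
    {
        auto it = ids_.find(std::string(value));
        return it == ids_.end() ? nullptr : &*it;
    }

    std::size_t size() const { return ids_.size(); }

private:
    // Rehashing invalidates iterators of std::unordered_set, but not
    // pointers or references to its elements, so returned IDs stay valid.
    std::unordered_set<std::string> ids_;
};
```

With such a storage, comparing two identifiers takes a single pointer comparison instead of a string comparison, and each distinct string is kept only once.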
-### LSP context
+### LSP Context
-The LSP context serves as the collection point for the data needed to answer the LSP requests. It is a part of the HLASM context to be able to pass on the LSP data between different parsed files.
+The LSP context serves as the collection point for the data needed to answer LSP requests. It is part of the HLASM context so that LSP data can be passed between different parsed files.
 The [[LSP data collector]] stores its values inside the LSP context tables.
-### Ordinary assembly context
+### Ordinary Assembly Context
-The above described structures aimed to describe the high-level part of the language (code generation). As we move closer to the resulting object code of the source file, the describing structures get complicated. Therefore, HLASM context contains object storing just this part of the processing.
+The structures described above cover the high-level part of the language (code generation). As we move closer to the resulting object code of the source file, the structures that describe it become more complicated. Therefore, the HLASM context contains a separate object that stores just this part of the processing.
 The composition of ordinary assembly context
@@ -118,97 +131,92 @@ Ordinary assembly context consists of three main components (see the picture abo
 1. *Symbol storage*. Stores ordinary symbols.
-2. *Section storage*. Has notion of all generated sections, each section containing its location counters.
+2. *Section storage*. Keeps track of all generated sections, each section containing its location counters.
 3. *Symbol dependency tables*. Contains yet unresolved dependencies between symbols prior to the currently processed instruction.
 #### Symbol
-This class represents HLASM ordinary symbol. Besides its identifier and location, symbol contains *value* and *attributes* components.
+This class represents HLASM ordinary symbols. Besides its identifier and location, each symbol contains the *value* and *attributes* components.
Value
-can be assigned *absolute* or *relocatable* values. With addition to that, it can also be assigned an empty value stating that symbol is not yet defined.
+An *absolute* or *relocatable* value can be assigned. An empty value can also be assigned, stating that the symbol is not yet defined.
 Attributes
-structure holds symbol attributes like type, length, scale and integer.
+Holds symbol attributes such as type, length, scale and integer.
 #### Section
-Section is a structure representing HLASM section (created by CSECT, DSECT, ...). It contains enumeration *section kind* describing type of the section prior to the used instruction. The structure also holds *location counter storage* with defined location counters.
+This structure represents a HLASM section (created by CSECT, DSECT, ...). It contains the *section kind*, which describes the type of the section according to the instruction used. The structure also holds the *location counter storage* with the defined location counters.
-#### Location counter
+#### Location Counter
-This structure contains data and operations for one location counter. The data is stored in helper sub-structure *location counter data*.
+This structure contains data and operations for one location counter. The data is stored in the helper sub-structure *location counter data*.
-Location counter data
-is a structure defining current value of the location counter. It consists of:
+The location counter data is a structure defining the current value of the location counter. It consists of:
-- *Storage* stating total number of bytes occupied by the location counter.
+- *Storage*, stating the total number of bytes occupied by the location counter.
-- Vector of *spaces*, blocks of bytes with yet not known length.
+- A vector of *spaces*, blocks of bytes whose length is not yet known.
-- Vector of *storage* between each space.
+- A vector of *storage* values between consecutive spaces.
-- Currently valid *alignment* (used when data contain spaces).
+- The currently valid *alignment* (used when data contains spaces).
-The location counter value is transformable into a relocatable value. It is represented by structure *address*.
+The location counter value can be transformed into a relocatable value, which is represented by the structure *address*.
-Address
-consists of:
+The address consists of:
-- Array of *bases*. A base is a beginning of a corresponding section. They serve as points of reference for the address.
+- An array of *bases*. A base is the beginning of a section and serves as a point of reference for the address.
-- Array of *spaces* that are present in the address.
+- An array of *spaces* that are present in the address.
-- *Offset* from the bases.
+- The *offset* from the bases.
-The common composition of an address is one base section (as the start of the address) and value of storage (as the offset from it).
+Typically, an address consists of one base section (the start of the address) and a storage value (the offset from it).
-The need for the whole array of bases to be present is because addresses from different sections can be arbitrarily added or subtracted. This information is needed as the correct sequence of arithmetic operations can reduce number of bases (even spaces) to zero and create absolute value. This value can be later used in places where a relocatable value would be forbidden.
+The whole array of bases is needed because addresses from different sections can be arbitrarily added or subtracted. The right sequence of arithmetic operations can reduce the number of bases (and even spaces) to zero and produce an absolute value, which can then be used where a relocatable value is forbidden.
-Space
-is block of bytes with yet not known length. It is created in the active location counter when execution of counter’s operation can not be performed due to non previously defined ordinary symbols.
See the different kind of spaces and the reason of creation in the table below.
+A space is a block of bytes whose length is not yet known. It is created in the active location counter when the counter’s operation cannot be performed because some ordinary symbols are not yet defined. The table below lists the different kinds of spaces and the reasons for their creation.
-| **Space Kind** | **Creation**|
-|:---------------|-----------------------------------------------------------:|
-| Ordinary | when instruction outputs data of unknown length|
-| LOCTR begin | when defining more than one location counter in a section|
-| Alignment | when current alignment is unknown due to previous spaces|
-| LOCTR set | when moving counter’s value to the address with spaces|
-| LOCTR max | when moving counter’s value to the next available location|
-| LOCTR unknown | when moving counter’s value to the yet unknown address|
+| **Space Kind** | **Creation**|
+|:---------------|-------------------------------------------------------------:|
+| Ordinary | when an instruction outputs data of unknown length|
+| LOCTR begin | when defining more than one location counter in a section|
+| Alignment | when the current alignment is unknown due to previous spaces|
+| LOCTR set | when moving the counter’s value to an address with spaces|
+| LOCTR max | when moving the counter’s value to the next available location|
+| LOCTR unknown | when moving the counter’s value to an unknown address|
+When the length of a space becomes known, all addresses containing the space need to be updated (the space is removed and the offset is adjusted accordingly).
Therefore, the space structure contains an array of address listeners. When an address is assigned a relocatable value that contains the space, the address is added to this array. This makes resolving spaces straightforward.
+The ORG instruction can arbitrarily move the location counter’s value forward and backward. In addition, ORG can instruct the location counter to set its value to the next available value (the lowest untouched address, see [[HLASM overview#Location_Counter|HLASM overview]]). Because spaces can also be created, the location counter holds an array of location counter data so that it can set the next available value correctly.
+#### Symbol Dependency Tables
-ORG instruction can arbitrarily move location counter’s value forward and backward. With addition to that, ORG can also order location counter to set it’s value to the next available value (the lowest untouched address, see location counter in [[HLASM overview]]). Combining this with the possible spaces creation, location counter holds an array of the location counter data to properly set the next available value.
-
-#### Symbol dependency tables
-
-HLASM forbids cyclic symbol definition. This component maintains dependencies between symbols and detects possible cycles. Let us describe the main components of dependency resolving.
+HLASM forbids cyclic symbol definitions. This component maintains dependencies between symbols and detects possible cycles. This section describes the main components of dependency resolution.
 - **Dependant**
-is a structure used in the symbol dependency tables. It encapsulates objects that can be dependent on another. Dependant object can be a *symbol*, *symbol attribute* and *space*.
+A structure used in the symbol dependency tables. It encapsulates objects that can depend on other objects. A dependant can be a *symbol*, a *symbol attribute* or a *space*.
 - **Dependable**
-interface is implemented by a class if its instance can contain dependencies.
The interface has a method to retrieve a structure holding the respective *dependants*.
+An interface implemented by classes whose instances can contain dependencies. It has a method to retrieve a structure holding the respective *dependants*.
 - **Resolvable**
-interface adds up to the dependable interface. It is implemented by objects that serve as values assignable to *depednants*. It provides methods to return *symbol value* with help of the dependency solver.
+An interface that extends the dependable interface. It is implemented by objects that serve as values assignable to *dependants*. It provides methods to return a *symbol value* with the help of the dependency solver.
-- **Dependency solver**
-is an interface that can return value of the symbol providing its identifier. It is implemented by Ordinary assembly context.
+- **Dependency Solver**
+An interface that returns the value of a symbol given its identifier. It is implemented by the ordinary assembly context.
-Having described building blocks, we can move to the symbol dependency tables composition.
+With the building blocks described, we can move on to the composition of the symbol dependency tables.
 - **Dependency map**
-is the primary storage of dependencies. It has *dependants* as keys and *resolvables* as values. The semantics for pair *(D,R)* is that D is dependent on the dependencies from R. Each time new dependency is added, this map is searched for cycle.
+The primary storage of dependencies. It has *dependants* as keys and *resolvables* as values. A pair *(D,R)* means that D depends on the dependencies of R. Each time a new dependency is added, this map is searched for cycles.
-- **Dependency sources map**
- serves as a source objects storage of a resolvable in the dependency map. Hence for the pair *(D,R)* from dependency map, source object of *R* is in the dependency source map under the key *D*.
+- **Dependency Sources Map**
+ Stores the source object of each resolvable in the dependency map. For a pair *(D,R)* from the dependency map, the source object of *R* is stored in the dependency sources map under the key *D*.
- The source objects are statements. To be more specific, as one statement can be a source for more distinct resolvables, this source map only stores pointers to the *postponed statements storage*.
+ The source objects are statements. As one statement can be the source of several distinct resolvables, this map only stores pointers into the *postponed statements storage*.
-- **Postponed statements storage**
-holds statements that are sources of resolvables in dependency map. The reason they are stored is that they can not be checked yet as they contain dependencies. Therefore, they are postponed in the storage until all of the dependencies are resolved. Then they are passed to the respective checker.
+- **Postponed Statements Storage**
+Holds statements that are sources of resolvables in the dependency map. These statements cannot be checked yet because they contain unresolved dependencies, so they are postponed in this storage until all of their dependencies are resolved. Then they are passed to the respective checker.
diff --git a/docs/Analyzer-pages/Instruction-format-validation.md b/docs/Analyzer-pages/Instruction-format-validation.md
index 809a08640..d0225b340 100644
--- a/docs/Analyzer-pages/Instruction-format-validation.md
+++ b/docs/Analyzer-pages/Instruction-format-validation.md
@@ -1,50 +1,48 @@
-Instruction format validation
------------------------------
-One of the essential ways to provide results of the parsing to the user is through error messages. Many of these messages are created in *Instruction checker* which validates the usage of different kinds of instructions.
+One of the essential ways to present parsing results to the user is through error messages.
Many of these messages are created in the *Instruction checker*, which validates the usage of different kinds of instructions.
-Instruction checker is an abstract class for various types of instructions. Its `check` method is being called from the [[instruction processors]] to check whether the specific instruction is used with correct parameters. As assembler and machine instructions have different formats, we derive separate *assembler* and *machine* checkers from the instruction checker. CA instructions do not have a derived checker class as they are all being checked during their interpretation.
+The instruction checker is an abstract base class for the checkers of various instruction types. Its `check` method is called from the [[instruction processors]] to check whether a specific instruction is used with correct parameters. As assembler and machine instructions have different formats, we derive separate *assembler* and *machine* checkers from the instruction checker. CA instructions do not have a derived checker class, as they are all checked during their interpretation.
-The checkers need an access to the definitions of all possible instructions. These instructions are stored statically inside an object called *instruction*. It consists of 4 different containers:
+The checkers need access to the definitions of all possible instructions. These definitions are stored statically inside an object called *instruction*, which consists of four containers:
-- *machine_instructions* is a map of instruction names to machine instruction object, which contains various data such as format, size or vector of instruction’s operands.
+- *machine_instructions* is a map of instruction names to machine instruction objects, which contain various data such as the format, the size and the vector of the instruction’s operands.
-- *mnemonic_codes* maps instruction names to their mnemonic code.
The mnemonic codes are simplified versions of specific machine instructions, substituting one of the operands by a default value. The mnemonic code objects provides a list of operands to be substituted along with the original instruction name.
+- *mnemonic_codes* maps instruction names to their mnemonic codes. Mnemonic codes are simplified versions of specific machine instructions in which one of the operands is substituted with a default value. Each mnemonic code object provides the list of operands to be substituted, along with the original instruction name.
-- *assembler_instructions* is similar to the machine instructions. However, as the assembler instructions do not have formats, these classes only state minimum/maximum number of operands for specific instruction. In the **Assembler instruction checker** section below, we explain how the assembler instructions are validated.
+- *assembler_instructions* is similar to *machine_instructions*. However, as assembler instructions do not have formats, these classes only state the minimum and maximum number of operands for a specific instruction. The **Assembler Instruction Checker** section below explains how assembler instructions are validated.
 - *ca_instructions* only contains a list of possible CA instructions.
-Both assembler and machine checker works in a similar manner:
+Both the assembler and the machine checkers work in a similar manner:
-1. Either assembler or machine processor calls the `check` method of its respective checker. This method accepts the instruction name, the vector of used operands, the range of statement and the diagnostic collector.
+1. The assembler or machine processor calls the `check` method of its respective checker. This method accepts the instruction name, the vector of the operands used, the range of the statement and the diagnostic collector.
-2.
Checker finds the correct instruction based on the provided name and calls the `check` method of its instruction class, along with the same parameters as mentioned above.
+2. The checker finds the correct instruction based on the provided name and calls the `check` method of that instruction with the same parameters.
 3. The instruction itself compares its possible operands with the used operands.
-4. More validations may be necessary, based on the instruction.
+4. Depending on the instruction, further validation might be necessary.
-5. In case of mismatch, a diagnostic is added to the passed diagnostic container.
+5. If there is a mismatch, a diagnostic is added to the passed diagnostic collector.
-### Machine instruction checker
+### Machine Instruction Checker
-All machine instructions have a precisely defined format which makes the validation based on these formats straightforward. Machine instructions checker operates with machine instructions and their mnemonic codes.
+All machine instructions have a precisely defined format, which makes validation based on these formats straightforward. The machine instruction checker works with machine instructions and their mnemonic codes.
-The formats are defined by several basic operands such as register or address and state which combination of these operands are acceptable. For example, instruction LR has format RR, which means it accepts only 2 arbitrary (but correct) registers.
+The formats are built from several basic operands, such as register and address, and state which combinations of these operands are acceptable. For example, the instruction LR has the format RR, which means it accepts exactly two arbitrary (but valid) registers.
 Operand diagram for the ORG instruction.
-### Assembler instruction checker
+### Assembler Instruction Checker
-Validation of assembler instructions is more complicated as there are no pre-defined formats for them.
Each of them is described by custom operand diagrams, which demonstrate the dependencies and relations between operands of a specific instruction. An example of such diagram for the ORG instruction is shown in the picture above. As an addition to the basic operands used for machine instructions, each assembler instruction might have its own operands, called keywords.
+Validation of assembler instructions is more complicated, as there are no predefined formats for them. Each assembler instruction is described by a custom operand diagram, which shows the dependencies and relations between the operands of that instruction. An example of such a diagram for the ORG instruction is shown in the picture above. In addition to the basic operands used for machine instructions, each assembler instruction might have its own operands, called keywords.
-Due to these irregularities, we derive instruction-specific classes from assembler instruction class. Each of them implements the `check` method, to provide the customized checking.
+Due to these irregularities, we derive instruction-specific classes from the assembler instruction class. Each of them implements the `check` method to provide customized checking.
-#### Data Definition checking
+#### Data Definition Checking
-Data definition is a type of operand in HLASM. It represents data that is assembled directly into object code (see [[HLASM overview]]).
+A data definition is a type of operand in HLASM. It represents data that is assembled directly into object code (see [[HLASM overview]]).
-Since there are many types of data definition, there is a data definition subcomponent of instruction validation. Whenever any component of the project needs information about a data definition operand, it can use this subcomponent. It analyzes each type of data definition and is able to return its length, attributes and check its validity.
+Since there are many types of data definitions, instruction validation contains a dedicated data definition subcomponent. Whenever any component of the project needs information about a data definition operand, it can use this subcomponent. It analyzes each type of data definition, can return its length and attributes, and can check its validity.
-Each type is different and many have special conditions that must be met to be valid. That is why there is an abstract class `data_def_type_base`, which has 38 implementations — one for each type (including type extensions). The types are then available in a static associative map that maps names of types to their representations.
+Each type is different, and many types have special conditions that must be met to be valid. Hence, there is an abstract class `data_def_type_base`, which has 38 implementations, one for each type (including type extensions). The types are then available in a static associative map that maps type names to their representations.
diff --git a/docs/Analyzer-pages/Instruction-processors.md b/docs/Analyzer-pages/Instruction-processors.md
index 1147b8ed8..56adf908f 100644
--- a/docs/Analyzer-pages/Instruction-processors.md
+++ b/docs/Analyzer-pages/Instruction-processors.md
@@ -1,25 +1,23 @@
-Opencode processor divides processing of HLASM instruction types into several *instruction processors*. Each processor is responsible for processing instructions that belong to one instruction type.
+The opencode processor divides the processing of HLASM instructions among several *instruction processors*. Each processor is responsible for processing instructions that belong to one instruction type.
-As a format of some instruction kinds can be rather complicated, instruction processors contain *[[Instruction format validators|Instruction format validation]]*. They check the statement to validate the correctness of used operand format as well as the correctness of the actual operand values.
+As the format of some instruction types can be rather complicated, instruction processors contain *[[Instruction format validation|instruction format validators]]*. They validate both the format of the operands used and the correctness of the actual operand values.
-During the instruction processing, processors work with HLASM *[[expressions]]*. They need to be evaluated to correctly perform the processing.
+During instruction processing, processors work with HLASM *[[expressions]]*, which need to be evaluated for the processing to be performed correctly.
 There are four specialized instruction processors:
-- **Macro IP**
-looks up for macro definition in HLASM context tables and calls it.
-
-- **Assembler and Machine IP**
-processes assembler and machine instructions to retain consistency in HLASM context tables.
-
-- **Conditional assembly IP**
-executes conditional assembly instructions.
-
-See the current list of processed instruction in the following table.
-
 | **IP** | **Processed instructions**|
 |:-------------------------|--------------------------------------------------:|
 | **Assembler** | \*SECT, COM, LOCTR, EQU, DC, DS, COPY, EXTRN, ORG|
 | **Machine** | *Instruction format validation only*|
 | **Macro** | *ANY*|
 | **Conditional Assembly** | SET\*, GBL\*, ANOP, ACTR, AGO, AIF, MACRO, MEND|
+
+- **Macro IP**
+Looks up the macro definition in the HLASM context tables and calls it.
+
+- **Assembler and Machine IP**
+Processes assembler and machine instructions to keep the HLASM context tables consistent.
+
+- **Conditional Assembly IP**
+Executes conditional assembly instructions.
\ No newline at end of file
diff --git a/docs/Analyzer-pages/LSP-data-collector.md b/docs/Analyzer-pages/LSP-data-collector.md
index fb907fb4d..78005667a 100644
--- a/docs/Analyzer-pages/LSP-data-collector.md
+++ b/docs/Analyzer-pages/LSP-data-collector.md
@@ -1,26 +1,26 @@
-The data collection is necessary to be able to reply to the LSP requests without the need to re-parse. During the parsing process, a component called *LSP info processor* processes and stores this information. The main goal of this component is to collect as much information as possible to provide meaningful and complex replies to the LSP requests while maintaining the memory and parse-time overhead negligible.
+Data collection makes it possible to reply to LSP requests without re-parsing. During the parsing process, a component called the *LSP info processor* processes and stores this information. The main goal of this component is to collect as much information as possible to provide meaningful and complex replies to LSP requests while keeping the memory and parse-time overhead low.
-*LSP info processor* is invoked after each parsed and processed statement to collect and store the information it needs inside the *LSP context* (part of HLASM context).
+The *LSP info processor* is invoked after each parsed and processed statement to collect and store the information it needs inside the *LSP context* (part of the HLASM context).
-### Supported LSP language features
+### Supported LSP Language Features
 The plugin implements four LSP language features:
 - **hover**
-The *hover* feature is invoked whenever a user moves his mouse cursor over a symbol for a short period of time. Typically a box with the information about the selected symbol appears right next to it.
+The *hover* feature is invoked whenever the user hovers the mouse cursor over a symbol for a short period of time. Typically, a box with information about the selected symbol appears right next to it.
- **complete**
-The *complete* feature may be triggered by a custom set of events such as typing a specific character. The server responds with a list of possible correct options that can be inserted into the particular position.
+The *complete* feature can be triggered by a custom set of events, such as typing a specific character. The server responds with a list of valid options that can be inserted at the given position.
 - **go_to_definition**
 The *go_to_definition* feature is invoked manually from the editor by selecting a symbol and consequently invoking the `go_to_definition` command. The editor “jumps” to the location of the currently selected symbol’s definition by moving the cursor to that location.
 - **references**
-The *references* feature is invoked in a similar manner as the *go_to_definition* feature. But the results of the *references* feature are displayed as a list of all references to the selected symbol in the project, not just the definition of it.
+The *references* feature is invoked in a similar manner to the *go_to_definition* feature. However, its results are displayed as a list of all references to the selected symbol in the project, not just its definition.
-### Supported HLASM symbols
+### Supported HLASM Symbols
-The symbols, on which the user might call mentioned LSP features, are *instruction symbols*, *variable symbols*, *sequence symbols* and *ordinary symbols*.
+The LSP features mentioned above can be invoked on *instruction symbols*, *variable symbols*, *sequence symbols* and *ordinary symbols*.
 The *references* and the *go_to_definition* features are very similar for each symbol type and in most cases work as described above.
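The data collection described above can be thought of as occurrence tables. They are filled while statements are processed and queried later, so *go_to_definition* and *references* can be answered without re-parsing. The following C++ sketch is a hypothetical illustration only. `position`, `occurrence_table` and its member functions are invented names, and the sketch models only the standard behavior described above, not the exceptions to it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch: symbol occurrences recorded while parsing, so that
// go_to_definition and references can later be answered from memory.
// All names are illustrative, not the project's actual LSP context API.
struct position
{
    std::size_t line = 0;
    std::size_t column = 0;

    bool operator==(const position& other) const
    {
        return line == other.line && column == other.column;
    }
};

class occurrence_table
{
public:
    // Records the definition of a symbol (it also counts as an occurrence).
    void add_definition(const std::string& name, position pos)
    {
        entry& e = symbols_[name];
        e.definition = pos;
        e.occurrences.push_back(pos);
    }

    // Records any other occurrence of a symbol.
    void add_reference(const std::string& name, position pos)
    {
        symbols_[name].occurrences.push_back(pos);
    }

    // go_to_definition: the recorded definition, if there is one.
    std::optional<position> definition(const std::string& name) const
    {
        auto it = symbols_.find(name);
        if (it == symbols_.end())
            return std::nullopt;
        return it->second.definition;
    }

    // references: every recorded occurrence, in recording order.
    std::vector<position> references(const std::string& name) const
    {
        auto it = symbols_.find(name);
        if (it == symbols_.end())
            return {};
        return it->second.occurrences;
    }

private:
    struct entry
    {
        std::optional<position> definition;
        std::vector<position> occurrences;
    };
    std::map<std::string, entry> symbols_;
};
```

In this sketch the reference list also includes the definition, similar to an LSP references request with `includeDeclaration` set to true. The text does not say whether the plugin behaves the same way.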
@@ -28,7 +28,7 @@ However, there are two exceptions to the standard behavior of the *go_to_definit On the other hand, the responses to the *hover* and the *complete* features vary for each symbol type and are described in the following tables: -| **Symbol Type** | **Hover contents** | +| **Symbol Type** | **Hover Contents** | |:----------------|:----------------------------------------------------------| |instruction| the type of the instruction, the syntax of its parameters the version (macros only), the documentation| | variable | the type of the variable — bool/string/number | @@ -36,7 +36,7 @@ On the other hand, the responses to the *hover* and the *complete* features vary | ordinary | absolute/relocatable, the value, the values of attributes | | (COPY) | the name of the copy file | -
+
| **Symbol Type** | **Trigger Characters, Events** | **Response** | |:----------------|:--------------------------------------|:----------------------------| diff --git a/docs/Analyzer-pages/Lexer.md b/docs/Analyzer-pages/Lexer.md index 319c28e81..7354d151d 100644 --- a/docs/Analyzer-pages/Lexer.md +++ b/docs/Analyzer-pages/Lexer.md @@ -1,18 +1,18 @@ -Lexer’s responsibility is to read source string and break it into tokens — small pieces of text with special meaning. The most important properties of the lexer: +The lexer's responsibility is to read a source string and break it into tokens — small pieces of text with a special meaning. The most important properties of the lexer are: - each token has a location in the source text, - has the ability to check whether all characters are valid in the HLASM source, -- can jump in the source file backward and forward if necessary (for implementation of instructions like AGO and AIF). Because of this, it is not possible to use any standard lexing tool, and the lexer has to be implemented from scratch. +- can jump backward and forward in the source file if necessary (to implement instructions like AGO and AIF). Because of this, it is not possible to use a standard lexing tool. -As previously mentioned, we designed a custom lexer for HLASM. We had a number of reasons to do so. HLASM language is complex. It was first introduced several decades ago and, during this long time, the language was subjected to development. Such a long time period has made the HLASM language complex. Also, it contains some aggressive features, for example, `AREAD` or `COPY`, that can alter the source code at parse time. +The latter point, among others, necessitated designing a custom lexer for HLASM. The HLASM language is complex: it was first introduced several decades ago and has since undergone prolonged development, which has added to its complexity.
Also, it contains features such as `AREAD` or `COPY` that can alter the source code at parse time. -Conventional lexing tools are most often based on regular expressions. As discussed above, there are several difficulties that one must consider while designing lexer for this particular language. A regular expression-based lexer would be too difficult or even impossible to design. +Conventional lexing tools are most often based on regular expressions. As discussed above, there are several difficulties that one must consider while designing a lexer for this particular language. A lexer based on regular expressions would be too difficult or even impossible to design. -One could match separate characters from the input and let the parser or semantic analysis deal with some of the described problems. This drastic solution would cost performance, as parsers are usually more performance demanding. +One could match separate characters from the input and let the parser or semantic analysis deal with some of the described problems. This drastic solution would come at a performance cost, as parsers are usually more performance-demanding. -### Source file encodings +### Source File Encodings Source code encodings differ for the used libraries. All strings are encoded in `UTF` as follows: @@ -22,62 +22,62 @@ Source code encodings differ for the used libraries. All strings are encoded in - `UTF-32` ANTLR 4 source code representation. -### Lexer components +### Lexer Components Lexer architecture overview. - Note, there are two `input_source`s and there are many `token`s generated. The *AINSERT buffer* has higher priority. Specifically, if the buffer is non-empty, lexer consumes the input from this buffer. + Note that there are two `input_source`s and many generated `token`s. The *AINSERT buffer* has higher priority. Specifically, if the buffer is non-empty, the lexer consumes the input from this buffer.
- Beside of the custom lexer, we altered ANTLR’s classes `Token`, `TokenFactory` and `ANTLRInputStream` (see Using antlr in [[Third party libraries]]). The reason was to add custom attributes to token that are vital for later stages of the HLASM code analysis (parsing, semantic analyses, etc.). Lexer functionality is implemented in following classes (see the picture above): + Besides the custom lexer, we altered ANTLR’s classes `Token`, `TokenFactory` and `ANTLRInputStream` (see [[Third party libraries#ANTLR 4 Pipeline|ANTLR 4 pipeline]]). The reason was to add custom attributes to tokens that are vital for later stages of the HLASM code analysis (parsing, semantic analysis etc.). Lexer functionality is implemented in the following classes (see the picture above): - `token` - implements ANTLR’s class `Token` and extends it by adding properties important for location of the token within the input stream. As the LSP protocol works with offsets encoded in `UTF-16` and ANTLR 4 works with `UTF-32` encoding, we add attributes for `UTF-16` positions too. - - Token does not carry the actual text from the source but instead references the position in code (unlike `CommonToken`). Note that the position of a token is vital for further analysis. - - | | | - |:---------------|----------------------------------------------:| - | **IGNORED** | sequence of characters ignored in processing| - | **COMMENT** | commentary statements| - | **EOLLN** | token signalling the end of statement| - | **SPACE** | a sequence of spaces| - | **IDENTIFIER** | symbol identifier| - | **ORDSYMBOL** | Ordinary symbol identifier| - | **PROCESS** | process statement token| - | **NUM** | number| - | **ATTR** | apostrophe that serves as attribute reference| + Implements ANTLR’s class `Token` and extends it by adding properties important for location of the token within the input stream. 
As the LSP protocol works with offsets encoded in `UTF-16` and ANTLR 4 works with `UTF-32` encoding, we add attributes for `UTF-16` positions too. + + A token does not carry the actual text from the source but instead references the position in code (unlike `CommonToken`). Note that the position of a token is vital for further analysis. + + | **Token Type** | **Description** | + |:---------------|-----------------------------------------------:| + | **IGNORED** | sequence of characters ignored in processing| + | **COMMENT** | commentary statements| + | **EOLLN** | token signalling the end of a statement| + | **SPACE** | a sequence of spaces| + | **IDENTIFIER** | symbol identifier| + | **ORDSYMBOL** | ordinary symbol identifier| + | **PROCESS** | process statement token| + | **NUM** | number| + | **ATTR** | apostrophe that serves as an attribute reference| | **ASTERISK, SLASH, MINUS, PLUS, LT, GT, EQUALS, LPAR, RPAR** | expression tokens| | **DOT, COMMA, APOSTROPHE, AMPERSAND, VERTICAL** | special meaning tokens| - Interesting remark of HLASM language complexity is absence of *string* token (see the table above). Lexer does not generate this token due to the existence of model statements. There, variable symbol can be written anywhere in the statement (even in the middle of the string), what significantly restricts lexer. + A notable illustration of the HLASM language's complexity is the absence of a *string* token (see the table above). The lexer does not generate this token due to the existence of model statements. There, a variable symbol can be written anywhere in the statement (even in the middle of a string), which significantly restricts the lexer. - `token_factory` -produces tokens of previously described custom type `token`. + Produces tokens of the previously described custom type `token`. - `input_source` - implements [`ANTLRInputStream`](https://www.antlr.org/api/Java/org/antlr/v4/runtime/ANTLRInputStream.html) which encapsulates source code.
This implementation adds API for resetting, rewinding and rewriting input. + Implements [`ANTLRInputStream`](https://www.antlr.org/api/Java/org/antlr/v4/runtime/ANTLRInputStream.html) which encapsulates source code. This implementation adds an API for resetting, rewinding and rewriting input. - Note the usage of `UTF` encodings: `_data` (source code string) and positions/indices in API are in `UTF-32`; `getText` returns `UTF-8` string. + Note the usage of `UTF` encodings: `_data` (source code string) and positions/indices in API are in `UTF-32`; `getText` returns a `UTF-8` string. - `lexer` -is based on ANTLR’s [`TokenSource`](https://www.antlr.org/api/Java/org/antlr/v4/runtime/TokenSource.html) class. As most lexers, it is also, in principle, a finite state machine. The most important difference compared to conventional FSMs and other lexers is added communication interface that connects the parser and the instruction interpreter with the lexer. Unusual is also input rewinding (to support `AREAD`, for example), lexing from parallel sources (`AINSERT` buffer) and some helper API for subsequent processing stages. + Based on ANTLR’s [`TokenSource`](https://www.antlr.org/api/Java/org/antlr/v4/runtime/TokenSource.html) class. As with most lexers, it is also, in principle, a finite state machine. The most important difference compared to conventional FSMs and other lexers is the added communication interface that connects the parser and the instruction interpreter with the lexer. Other unconventional features include input rewinding (to support instructions such as `AREAD`), lexing from parallel sources (`AINSERT` buffer) and a helper API for subsequent processing stages. Important functions: - `nextToken()` -implements main functionality: lexes and emits tokens. Before lexing, the function uses the right input stream (either the source code or `AINSERT` buffer if not empty). After choosing the right input source, the lexer emits token. 
HLASM introduces *continuation* symbol (an arbitrary non-blank symbol in the continuation column) that breaks one logical line into two or more source-code lines. The end of one logical line indicates `EOLLN` token. Such token is important for further (syntactic and semantic) analysis. +Implements the main functionality: lexing and emitting tokens. Before lexing, the function selects the right input stream (either the source code or the `AINSERT` buffer, if it is not empty). After choosing the right input source, the lexer emits a token. HLASM introduces a *continuation* symbol (an arbitrary non-blank symbol in the continuation column) that breaks one logical line into two or more source-code lines. The end of a logical line is indicated by an `EOLLN` token. This token is important for further (syntactic and semantic) analysis. - `create_token()` -creates token of given type. The lexer’s internal state gives the position of the token. +Creates a token of a given type. The position of the token is taken from the lexer’s internal state. - `consume()` -consumes character from the input stream and updates lexer’s internal state (used in `create_token()`). +Consumes a character from the input stream and updates the lexer’s internal state (used in `create_token()`). - `lex_tokens()` -lexing of most of the token types. +Lexes most of the token types. - `lex_begin()` -up to certain column, the input can be ignored (can be set in HLASM). +Handles the input up to a certain column, which can be ignored (the column can be set in HLASM). - `lex_end()` -lexes everything after continuation symbol. +Lexes everything after a continuation symbol. diff --git a/docs/Analyzer-pages/Parser.md b/docs/Analyzer-pages/Parser.md index d7a7499c9..8daceef09 100644 --- a/docs/Analyzer-pages/Parser.md +++ b/docs/Analyzer-pages/Parser.md @@ -1,26 +1,26 @@ -Parser component takes tokens produced by lexer from token stream and recognizes HLASM statements.
The parser inherits from the HLASM recognizer generated by ANTLR (see Antlr in [[Third party libraries]]) to provide further operations. +The parser component takes tokens produced by the lexer from the token stream and recognizes HLASM statements. The parser inherits from the HLASM recognizer generated by ANTLR (see [[Third party libraries]]) to provide further operations. -### Parser workflow +### Parser Workflow -Parser (in code referenced as `parser_impl`) implements opencode statement provider interface. This means that, according to the statement passing in [[Statement providers]], parser needs to parse each statement in *two steps*: +The parser (referred to in code as `parser_impl`) implements the opencode statement provider interface. This means that, following the statement passing described in [[statement providers]], the parser needs to parse each statement in *two steps*: -1. Parser calls rule `label_instr`. It parses label and instruction fields into respective structures. The operand and remark field is stored as a string. +1. The parser calls the rule `label_instr`, which parses the label and instruction fields into their respective structures. The operand and remark fields are stored as a string. -2. After retrieving the processing format, the parser selects corresponding rule to parse operands. With the rule, it parses remaining string from the previous step. +2. After retrieving the processing format, the parser selects the corresponding rule for the operands and uses it to parse the remaining string from the previous step. -For the means of parsing remaining strings, parser subcomponent contains actually *two parsers*. The first one parses statement after statement from a source file. The second parses the operands from the string passed by the first parser. +To parse the remaining strings, the parser subcomponent contains two parsers. The first one parses statement after statement from a source file.
The second parses the operands from the string passed by the first parser. -To achieve operands having correctly set ranges prior to the source file rather than to the passed string, the parser uses *Range provider*. It helps the second parser to have ranges of reparsed operands consistent with the ranges of other fields. It is initialized with the begin location of operand field in the statement and all ranges furtherly created in parsing are adjusted to have correct boundaries. +To ensure that the operands have ranges relative to the source file rather than to the passed string, the parser uses a *range provider*. It keeps the ranges of operands reparsed by the second parser consistent with the ranges of other fields. It is initialized with the start location of the operand field in the statement, and all ranges subsequently created during parsing are adjusted to have correct boundaries. -### Statement structure +### Statement Structure -During parsing of a statement, several structures are created and collected. They are `label_si, instruction_si, operand_si, remark_si` (*si* as semantic information). They are collected with `collector` and built into `statement_si` structure. +During the parsing of a statement, several structures are created and collected: `label_si`, `instruction_si`, `operand_si` and `remark_si` (*si* = semantic information). They are collected with `collector` and built into the structure `statement_si`. -Label and instruction structures can contain either identifier of a symbol or — when in model statement — concatenation of strings and variable symbols. Remark field is simply just a string as it serves as a commentary statement field. Operand field contains list of operands used in the statement. They can be of several formats. +Label and instruction structures can contain either an identifier of a symbol or — when in a model statement — a concatenation of strings and variable symbols.
The remark field is a string, as it serves as a commentary statement field. The operand field contains a list of the operands used in the statement, which can be of several formats. -#### Operand formats +#### Operand Formats -The statement processor can request parser to retrieve statements with this operand formats: +The statement processor can request the parser to retrieve statements with these operand formats: - *machine/assembler/conditional assembly/macro* – instruction operands. Each type of instruction has its specific format. @@ -28,11 +28,11 @@ The statement processor can request parser to retrieve statements with this oper - *deferred* – operands with not yet known format. Stored as a string. -Each operand format has corresponding *operand structure*. They all inherit abstract `operand` and each have various children for different kinds of the operand format (see the picture below). Assembler and Machine operand structures inherit from *Evaluable operand*. It is a common structure for operand objects that are composed of resolvable objects (see *Symbol dependency tables* in [[HLASM context tables]). +Each operand format has a corresponding *operand structure*. They all inherit from the abstract `operand` class, and each has various subclasses for different kinds of operand format (see the picture below). Assembler and machine operand structures inherit from the *evaluable operand*, a common structure for operand objects that are composed of resolvable objects (see [[HLASM context tables#Symbol Dependency Tables|symbol dependency tables]]). Operand structure inheritance. -#### Concatenation structures +#### Concatenation Structures A model statement is a statement that contains a variable symbol in any of the statement fields.
The concatenation can be further evaluated to produce the final string. @@ -42,23 +42,23 @@ The helper structures are: - `var_sym` – a substitutable variable symbol. -- `dot`, `equals` – characters with special meaning. +- `dot`, `equals` – characters with a special meaning. - `sublist` – a recursive concatenation enclosed in parentheses. -### Grammar implementation +### Grammar Implementation -Grammar rules describing parser are separated into several files (see the [[Grammar visualization]]): +Grammar rules describing the parser are separated into several files (see the [[grammar visualization]]): - `hlasm_parser.g4` – Top level rules are stored here. - `lookahead_rules.g4` – Rules for lookahead mode. -- `label_field_rules.g4` – Rules taking care of label field of statement. +- `label_field_rules.g4` – Rules taking care of the label field of statements. -- `instruction_field_rules.g4` – Rules taking care of instruction field of statement. +- `instruction_field_rules.g4` – Rules taking care of the instruction field of statements. -- `operand_field_rules.g4` – Rules taking care of operand field of statement. +- `operand_field_rules.g4` – Rules taking care of the operand field of statements. - `macro/machine/assembler/ca/model/deferred_operand_rules.g4` – Various operand field rules. diff --git a/docs/Analyzer-pages/Processing-manager.md b/docs/Analyzer-pages/Processing-manager.md index 3bb9d9a9f..0718aaad1 100644 --- a/docs/Analyzer-pages/Processing-manager.md +++ b/docs/Analyzer-pages/Processing-manager.md @@ -1,18 +1,18 @@ -Processing manager is a major component in the processing of a HLASM source file. It decides which stream of statements is to be processed and how statements are going to be processed. It contains components responsible for instruction interpretation as well as instruction format validation. +The processing manager is a major component in the processing of a HLASM source file. 
It decides which stream of statements is to be processed and how statements are to be processed. It contains components responsible for instruction interpretation as well as instruction format validation. -Nature of the HLASM source interpretation requires that various parsers and code generators interlace to implement semantics of all instructions. Processing manager performs this by maintaining 2 sets of active generators (“providers”) and consumers (parsers, “processors”) and executing them on demand, in an interleaved manner. +The nature of HLASM source interpretation requires that various parsers and code generators interlace to implement the semantics of all instructions. The processing manager achieves this by maintaining two sets of active generators (“providers”) and consumers (parsers, “processors”) and executing them on demand, in an interleaved manner. -The architecture of Processing manager +Architecture of the processing manager ### Overview -Following objects passed by analyzer serve as an input for the processing manager: +The following objects passed by the analyzer serve as input for the processing manager: -- *Parser* that provides statements from the processed file. Further on in this chapter, we will refer to the parser as to the *Opencode statement provider*. +- *Parser*, which provides statements from the processed file. Further on in this chapter, we refer to the parser as the *opencode statement provider*. -- *HLASM context tables* that hold current state of the parsed code. +- *HLASM context tables* that hold the current state of the parsed code. -- *Library data* defining the initial state of the manager (whether the file is copy member, macro definition, etc.; see *Initial state of manager*). +- *Library data* defining the initial state of the manager (whether the file is a copy member, a macro definition, etc.; see [[#Initial State of Manager]] below). - *Name* of the processed file.
@@ -20,44 +20,44 @@ Following objects passed by analyzer serve as an input for the processing manage - *Statement fields parser* for parsing yet unresolved statement fields. -- *Processing tracer* for tracing processed statements (see [[Macro tracer]]). +- *Processing tracer* for tracing processed statements (see [[macro tracer]]). ### Composition As the processing of the HLASM source file is rather complicated, we define a complex set of abstraction objects over the complicated assembling of HLASM language: -- **[[Statement provider|Statement providers]]** -Statement provider is able to produce `statement` structures. Its functionality is to provide statements from its various statement sources (e.g., a source file for Opencode provider, a macro/copy definition for Macro/Copy provider). +- **[[Statement providers]]** +A statement provider is able to produce `statement` structures. Its functionality is to provide statements from its various statement sources (e.g. a source file for the opencode provider, a macro/copy definition for the macro/copy provider). -- **[[Statement processor|Statement processors]]** -Statement processor is an object that takes statement structures from a provider. Then, it performs a specific action with the acquired statement; namely, stores it into macro/copy definition (*Macro/Copy processor*) or looks for sequence symbol (*Lookahed processor*) or performs contained instruction (*Opencode processor*). +- **[[Statement processors]]** +A statement processor is an object that takes statement structures from a provider. Then, it performs a specific action with the acquired statement; namely, stores it into a macro/copy definition (*macro/copy processor*), looks for a sequence symbol (*lookahead processor*), or performs a contained instruction (*opencode processor*). - **[[Instruction processors]]** Instruction processors help opencode statement processor in performing actions with the instructions contained in a statement. 
Each one of four instruction processors (Macro, Assembler, Machine and Conditional Assembly IP) processes separate sub-set of a broad set of HLASM instructions. -- **[[Instruction format validators|Instruction format validation]]** -Instruction format validators are used by instruction processors. As an input, they take operands of an instruction and serve to validate their correctness. +- **[[Instruction format validation|Instruction format validators]]** +Instruction format validators are used by instruction processors. They take the operands of an instruction as input and validate their correctness. -Processing manager encapsulates above mentioned objects and determines which processor/provider will be used next. +The processing manager encapsulates the above-mentioned objects and determines which processor/provider is used next. -### The main loop of manager +### Main Loop of Manager -Processing manager contains an array of active statement processors and an array of active statement providers. It is in the control of which processor–provider pair currently operates. +The processing manager contains an array of active statement processors and an array of active statement providers. It controls which processor–provider pair currently operates. -The main processing loop works with the currently operating processor and provider. In the loop body, statement provider provides next statement for statement processor that processes it accordingly. The loop breaks when all processors finish work and none of them is active. +The main processing loop works with the currently operating processor and provider. In the loop body, the statement provider passes the next statement to the statement processor, which processes it accordingly. The loop ends when all processors have finished their work and none of them is active. -When provider ends its statement stream or processor finishes its work, it is replaced with another.
The following rules apply: +When a provider ends its statement stream or a processor finishes its work, it is replaced with another one. The following rules apply: 1. When a processor finishes its work, the next processor is selected from the array. -2. When a provider finishes — before the next provider is selected from the array — manager checks whether it triggers the termination of the current processor as well (see *terminal condition* in the table at the end of [[Statement processors]]). If true, perform rule 1, otherwise the current processor stays active. +2. When a provider finishes — before the next provider is selected from the array — the manager checks whether this also triggers the termination of the current processor (see *terminal condition* in the table at the end of [[statement processors]]). If it does, rule 1 applies; otherwise, the current processor stays active. -### Initial state of manager +### Initial State of Manager -During initialization, the manager sets various statement providers and processors as a default. It is very important as it determines the way how the source is processed. The manager determines this from *library data* passed by analyzer. +During initialization, the manager sets various statement providers and processors as defaults. This is very important, as it determines how the source is processed. The manager determines this from the *library data* passed by the analyzer. -Library data contain a file name and an enumeration indicating a kind of the file that is being parsed — *processing kind*. +Library data contains a file name and an enumeration indicating the kind of file that is being parsed — the *processing kind*. -*Ordinary* processing kind states that the file being processed is the main source file (in HLASM called open-code). It is the first file to be processed. With this information, manager initializes all statement providers and *only* opencode processor.
This initial state is applied when analyzer has owner semantics. +The *ordinary* processing kind states that the file being processed is the main source file (called opencode in HLASM). It is the first file to be processed. With this information, the manager initializes all statement providers and *only* the opencode processor. This initial state is applied when the analyzer has owner semantics. -*Copy* and *Macro* processing kinds state that manager will process source code that contains copy or macro definition respectively. Hence, *only* copy definition processor or macro definition processor is initialized. Also, all statement providers but the macro statement provider are initialized as no macros will be visited nor needed as a statement source when processing new source code. The library data is passed when analyzer has reference semantics. +The *copy* and *macro* processing kinds state that the manager processes source code that contains a copy or macro definition, respectively. Hence, *only* a copy definition processor or a macro definition processor is initialized. Also, all statement providers except the macro statement provider are initialized, as no macros are visited or needed as a statement source when processing new source code. This library data is passed when the analyzer has reference semantics. diff --git a/docs/Analyzer-pages/Statement-processors.md b/docs/Analyzer-pages/Statement-processors.md index d54ccb052..690598994 100644 --- a/docs/Analyzer-pages/Statement-processors.md +++ b/docs/Analyzer-pages/Statement-processors.md @@ -1,12 +1,12 @@ -The motivation for distinguishing different statement processors was the complexity of HLASM language. There are many cases when the same statements require different processing under different circumstances (e.g. COPY instruction in macro is handled differently than in opencode, or lookahead mode can accept statements that would fail when processed by ordinary processing).
+The motivation for distinguishing different statement processors is the complexity of the HLASM language. There are many cases when the same statements require different processing under different circumstances (e.g. a COPY instruction is handled differently in a macro than in opencode, or lookahead mode can accept statements that would fail when processed by ordinary processing). -During processing, statement processing kinds can be nested. Hence, statement processors are dynamically assigned to the manager when needed and removed from it when they finish. This happens when the processor encounters specific statement (e.g. statement with a special instruction or non previously defined sequence symbol, see the table at the end of this page). For this purpose, they use *processing state listener* interface (implemented by processing manager) that tells the manager to change the current processor. +During processing, statement processing kinds can be nested. Hence, statement processors are dynamically assigned to the manager when needed and removed from it when they finish. This happens when the processor encounters a specific statement (e.g. a statement with a special instruction or a previously undefined sequence symbol; see the table at the end of this page for further information). For this purpose, they use the *processing state listener* interface (implemented by the processing manager), through which they tell the manager to change the current processor. -#### Statement structure +#### Statement Structure -Statement consists of *statement fields* — *label field*, *instruction field*, *operands field*, *remark field*. It is used by statement processors and produced by statement providers. +A statement consists of *statement fields*: the *label field*, *instruction field*, *operands field* and *remark field*. It is produced by statement providers and used by statement processors. -The abstract class *HLASM statement* is the ancestor for all statement related classes.
Then, there are abstract classes *deferred statement* and *resolved statement*. Deferred statement has its operand field stored in uresolved — deferred — format (in code stored as string). This statement is created when actual instruction is not yet known prior to the statement creation (see the example below). Resolved statements are complementary to the deferred statements as their instruction — as well as operand format — is known prior to the statement creation. +The abstract class *HLASM statement* is the ancestor of all statement-related classes. Then, there are the abstract classes *deferred statement* and *resolved statement*. A deferred statement has its operand field stored in an unresolved (deferred) format (in code, stored as a string). This statement is created when the actual instruction is not yet known at the time of creation (see the example below). Resolved statements are complementary to deferred statements, as their instruction and operand format are known before the statement is created. *VALUE OF INSTRUCTION IN DEFERRED STATEMENT IS PARAMETER OF MACRO MAC MACRO @@ -15,33 +15,33 @@ The abstract class *HLASM statement* is the ancestor for all statement related c MEND -#### Copy and Macro definition Processors +#### Copy and Macro Definition Processors -Both of these statement processors handle statement collecting, forming definition structure and storing it into HLASM context tables. They come into effect when COPY instruction or macro definition is encountered in the source code. +Both of these statement processors handle collecting statements, forming the definition structure and storing it in the HLASM context tables. They come into effect when a COPY instruction or macro definition is encountered in the source code. -The statements collected inside copy or macro definitions are mainly deferred statements.
That is because variable symbols can not be resolved inside the definition and because HLASM allows instruction aliasing (renaming instructions). Therefore, during the processing of a definition, as the instruction field is parsed, the format of its operands is unknown. It is fully deduced when the definition is handed over to the provider and processed by the opencode processor. +The statements collected inside copy or macro definitions are mainly deferred statements. That is because variable symbols cannot be resolved inside the definition, and because HLASM allows instruction aliasing (renaming instructions). Therefore, during the processing of a definition, the format of a statement's operands is unknown when its instruction field is parsed. It is fully deduced when the definition is handed over to the provider and processed by the opencode processor. -However, some statements in the macro and copy definitions forbid aliasing and the operand format can be deduced immediately (e.g. conditional assembly instructions in macro definition). This leads to the processors necessity to ask the provider to retrieve the statement with correct format – accordingly to the deduced one based on the instruction being provided. +However, some statements in macro and copy definitions forbid aliasing, so their operand format can be deduced immediately (e.g. conditional assembly instructions in a macro definition). Therefore, the processor must ask the provider to retrieve the statement in the correct format, which is determined by the instruction being provided. #### Lookahead Processor -Lookahead processor is activated when currently processed conditional assembly statement requires a value of undefined ordinary or sequence symbol. It looks through the succeeding statements and finishes when the target symbol is found or when all statement providers finish. Then the processing continues from where the lookahead started.
+The lookahead processor is activated when the conditional assembly statement currently being processed requires the value of an undefined ordinary or sequence symbol. It looks through the succeeding statements and finishes when the target symbol is found or when all statement providers finish. Then, processing continues from where the lookahead started. #### Opencode Processor -The functionality of Opencode processor (`ordinary_processor` class) can be described as follows: +The functionality of the opencode processor (`ordinary_processor` class) can be described as follows: 1. If a model statement is encountered, it substitutes the variable symbols and resolves the statement. -2. It checks statement for validity. +2. It checks the statement for validity. 3. It performs instruction by updating HLASM context tables with the help of *instruction processors*. -4. It is passed *processing tracer* by the manager. Each time a statement is processed by opencode processor, it triggers processing tracer. The tracer serves as a listener pattern used by the *[[Macro tracer]]*. +4. The manager passes it a *processing tracer*. Each time a statement is processed by the opencode processor, it triggers the processing tracer. The tracer acts as a listener used by the *[[macro tracer]]*. -In the table below, we can see that it does not have a field that starts Opencode processor. That is because this processor is set as a default by the manager. Further, Copy processor does not finish itself during its work as it can only be finished by its *terminal condition*. +In the table below, we can see that there is no entry that starts the opencode processor. That is because the manager sets this processor as the default. Additionally, the copy processor never finishes itself during its work, as it can only be finished by its *terminal condition*. -Terminal condition can be triggered by a finishing provider.
It indicates that the processor needs to finish its work when a specific provider exhausted its statement stream. +A terminal condition can be triggered by a finishing provider. It indicates that the processor needs to finish its work when a specific provider has exhausted its statement stream. | **Processor** | END instruction | COPY instruction | MACRO instruction | MEND instruction | undefined symbol | |:--------------|:---:|:---:|:---:|:---:|:---:| diff --git a/docs/Analyzer-pages/Statement-providers.md b/docs/Analyzer-pages/Statement-providers.md index 32cab32b3..0e06b8566 100644 --- a/docs/Analyzer-pages/Statement-providers.md +++ b/docs/Analyzer-pages/Statement-providers.md @@ -1,12 +1,12 @@ -Due to the macro definitions, copy file includes and statement generation, it is difficult to state which statement should be processed next. For this reason, we define abstraction over various sources of statements called *statement providers*. +Due to macro definitions, copy file includes and statement generation, it is difficult to determine which statement to process next. For this reason, we define an abstraction over various sources of statements called *statement providers*. -In contrary to statement processors, statement providers are ordered based on the priority (lower index, greater priority): +Unlike statement processors, statement providers are ordered by priority (a lower index means a higher priority): 1. Macro definition statement provider 2. Copy definition statement provider 3. Opencode statement provider -In each iteration of [[processing manager]], providers are asked whether they have statements to provide based on the ordering. That is because after each iteration, a provider with greater priority than the previously used one can be activated. +In each iteration of the [[processing manager]], providers are asked, in priority order, whether they have statements to provide.
That is because, after each iteration, a provider with a higher priority than the previously used one can be activated. | **Processors** | Macro provider ends | Copy provider ends | Opencode provider ends | |:---------------|:---:|:---:|:---:| @@ -15,38 +15,38 @@ In each iteration of [[processing manager]], providers are asked whether they ha | **Macro** | finish | continue | finish | | **Lookahead** | finish | continue | finish | -For the main loop to be correctly defined, the end of opencode provider triggers terminal condition for all statement processors. Hence, when opencode provider finishes then all the processors finish as well and the processing ends (see the table above). +For the main loop to be correctly defined, the end of the opencode provider triggers the terminal condition for all statement processors. Hence, when the opencode provider finishes, all the processors finish as well and the processing ends (see the table above). -#### Statement passing +#### Statement Passing -In HLASM language, it is difficult to parse statements into one common structure due to its *representational ambiguity*; the major difference between operand fields of different instruction formats. Moreover, when parsing statements, the instruction format can be yet unknown. Therefore, operand fields are stored as strings. This means that during statement passing when instruction format is deduced, the provider has responsibility to produce correct statement format. The following steps are applied in the statement passing (see also the picture below): +In the HLASM language, it is difficult to parse statements into one common structure due to its *representational ambiguity*: operand fields differ significantly between instruction formats. Moreover, when a statement is parsed, its instruction format may not be known yet. Therefore, operand fields are stored as strings.
This means that during statement passing, once the instruction format is deduced, the provider is responsible for producing the correct statement format. The following steps are applied during statement passing (see also the picture below): -1. Provider retrieves the instruction field part of the statement. +1. The provider retrieves the instruction field part of the statement. -2. Provider calls processor method `get_processing_status` with instruction field as a parameter. +2. The provider calls the processor method `get_processing_status` with the instruction field as a parameter. -3. Return value of the call determines the required format of the operand field for the processor; the whole statemement can be retrieved correctly. +3. The return value of the call determines the required format of the operand field for the processor, so the whole statement can be retrieved correctly. -4. Provider returns statement with correct format to the processor. +4. The provider returns the correctly formatted statement to the processor. -The process of statement passing. +Process of statement passing. -#### Copy and Macro definition Provider +#### Copy and Macro Definition Provider -These providers are activated when COPY instruction copies a file into the source code or when a macro is visited, respectively. They provide a sequence of statements to an arbitrary processor until all statements from the copy or macro definition are provided. After that, if there is no nested invocation, a provider with lower priority is selected. +These providers are activated when a COPY instruction copies a file into the source code or when a macro is visited, respectively. They provide a sequence of statements to an arbitrary processor until all statements from the copy or macro definition are provided. After that, if there is no nested invocation, a provider with a lower priority is selected.
-To avoid infinite macro recursion, HLASM language itself has a restriction for the level of nested macro invocation depending on the complexity of nested macros. We set the limit to 100 as it suffices in all tested code. +To avoid infinite macro recursion, the HLASM language itself restricts the depth of nested macro invocations, depending on the complexity of the nested macros. We set the limit to 100, as this suffices for all tested code. For COPY members, recursion is forbidden. #### Opencode Provider -Opencode provider is active as long as there are statements in the source file. It retrieves statements from the source code with help of [[lexer]] and [[parser]]. +The opencode provider is active as long as there are statements in the source file. It retrieves statements from the source code with the help of the [[lexer]] and [[parser]]. -### Statement field parser +### Statement Field Parser -Statement field parser is an interface passed to the statement providers by processing manager. It is implemented by the [[parser]]. +The statement field parser is an interface passed to the statement providers by the processing manager. It is implemented by the [[parser]]. -At first, it is used during statement passing. In some cases provider is requested a specific format of a string-stored statement. The string is re-parsed with the according format. Then the field is returned back to the statement provider. +First, it is used during statement passing. In some cases, the provider is requested to provide a string-stored statement in a specific format. The string is re-parsed in that format, and the resulting field is returned to the statement provider. -Another use of field parser is in opencode processor as model statements are resolved there. +Another use of the field parser is in the opencode processor, where model statements are resolved.
After variable symbol substitution, the resulting string field is re-parsed with the field parser. diff --git a/docs/Architecture-overview.md b/docs/Architecture-overview.md index cdb6047fe..8936b69fb 100644 --- a/docs/Architecture-overview.md +++ b/docs/Architecture-overview.md @@ -2,15 +2,15 @@ The architecture is based on the way modern code editors and IDEs are extended to support additional languages. We chose to implement [Language Server Protocol](https://microsoft.github.io/language-server-protocol/) (LSP), which is supported by a majority of contemporary editors. -In LSP, the two parties that communicate are called a *client* and a *language server*. A simple example is displayed in the picture above. The client runs as a part of an editor. The language server may be a standalone application that is connected to the client by a pipe or TCP. All language-specific user actions (for example the Go to definition command) are transformed into standard LSP messages and are sent to the language server. The language server then analyzes the source code and sends back a response, which is then interpreted and presented to the user in editor-specific way. This architecture makes possible to only have one LSP client implementation for each code editor, which may be reused by all programming languages. And vice versa, every language server may be easily used by any editor that has an implementation of the LSP client. +In LSP, the two parties that communicate are called a *client* and a *language server*. A simple example is displayed in the picture above. The client runs as a part of an editor. The language server may be a standalone application that is connected to the client by a pipe or TCP. All language-specific user actions (for example, the Go to definition command) are transformed into standard LSP messages and are sent to the language server.
The language server then analyzes the source code and sends back a response, which is then interpreted and presented to the user in an editor-specific way. This architecture makes it possible to have only one LSP client implementation for each code editor, which can be reused for all programming languages. Conversely, every language server can easily be used by any editor that has an implementation of the LSP client. -To add support for HLASM, we have implemented the LSP language server and written a lightweight extension to an editor, which uses an already existing implementation of the LSP client. To implement source code highlighting, we had to extend the protocol with a new notification. This notification is used for transferring information from the language server to the VS Code client, which is extended to highlight code in editor based on the incoming custom notifications. +To add support for HLASM, we have implemented the LSP language server and written a lightweight extension to an editor, which uses an already existing implementation of the LSP client. To implement source code highlighting, we had to extend the protocol with a new notification. This notification is used for transferring information from the language server to the VS Code client, which is extended to highlight code in the editor based on the incoming custom notifications. -In this chapter, we further decompose the project into smaller components and describe their relations. The two main components are the parser library and the language server — an executable application that uses the parser library. An overview of the architecture is in the picture below. The architecture of whole project is shown in [[Architecture visualization]]. +In this chapter, we further decompose the project into smaller components and describe their relationships. The two main components are the parser library and the language server — an executable application that uses the parser library.
An overview of the architecture is in the picture below. The architecture of the whole project is shown in [[Architecture visualization]]. -The architecture of HLASM Plugin +Architecture of the HLASM Plugin -Language server component +Language Server Component ------------------------- The responsibility of the language server component is to maintain the LSP session, convert incoming JSON messages and use the parser library to execute them. The functionality includes: @@ -23,30 +23,30 @@ The responsibility of the language server component is to maintain the LSP sessi - asynchronous request handling: when a user makes several consecutive changes to a source code, parsing on every change is not needed -Parser library component +Parser Library Component ------------------------ Parser library is the core of the project — it encapsulates the analyzer, which provides all parsing capabilities, and workspace manager, which keeps track of open files in the editor and manages their dependencies. It has to keep the representation of workspaces and files in the parser library exactly the same as the user sees in the editor. It also starts the analyzer when needed, manages workspace configuration and provides external macro and copy libraries to analyzer. -### Parser library API +### Parser Library API The parser library API is based on LSP — every relevant request and notification has a corresponding method in the parsing library. At first, the API implements the LSP notifications that ensure the editor state synchronization. Apart from working with individual files, the LSP also supports workspaces. A workspace is basically just a folder that contains related source codes. The LSP also supports working with multiple workspaces at the same time. We use it when searching for dependencies of HLASM source codes (macros, and copy files). -The parser library has the exact contents of all files in open workspaces. 
To achieve that, there is a file watcher running in the LSP client that notifies the server when any of the HLASM source files is changed outside of editor. For example, when a user deletes an external macro file, the parser library should react by reporting that it cannot find the macro. +The parser library has the exact contents of all files in open workspaces. To achieve that, there is a file watcher running in the LSP client that notifies the server when any of the HLASM source files is changed outside of the editor. For example, when a user deletes an external macro file, the parser library reacts by reporting that it cannot find the macro. -The list of necessary editor state synchronization notifications follows: +The list of necessary editor state synchronization notifications is as follows: -- Text synchronization notifications (`didOpen`, `didChange`, `didClose`) that inform the library about files that are currently open in the editor and their exact contents. +- Text synchronization notifications (`didOpen`, `didChange`, `didClose`), which inform the library about files that are currently open in the editor and their exact contents. -- `DidChangeWorkspaceFolders` notification that informs the library when a workspace has been opened or closed. +- `DidChangeWorkspaceFolders`, which informs the library when a workspace has been opened or closed. -- `DidChangeWatchedFiles` notification +- `DidChangeWatchedFiles`, which informs the library when watched files are changed outside of the editor. Next, the API implements the requests and notifications that provide the parsing results, specifically: -- `publishDiagnostics` notification. A diagnostic is used to indicate a problem with source files, such as a compiler error or a warning. The parser library provides a callback to let the language server know that diagnostics have changed. +- `publishDiagnostics`. A diagnostic is used to indicate a problem with source files, such as a compiler error or a warning.
The parser library provides a callback to let the language server know that diagnostics have changed. - Callback for highlighting information provision. @@ -54,37 +54,37 @@ Next, the API implements the requests and notifications that provide the parsing ### Analyzer -The analyzer processes a single HLASM file. It takes the contents of a source file by common string and a callback that can parse external files with specified name. It provides a list of diagnostics linked to the file, highlighting, list of symbol definitions, etc. +The analyzer processes a single HLASM file. It takes the contents of a source file as a string and a callback that can parse external files with a specified name. It provides a list of diagnostics linked to the file, highlighting, a list of symbol definitions, and more. The analysis of HLASM code includes: - recognition of statements and their parts (lexing and parsing) -- interpretation of instructions that should be executed in compile time +- interpretation of instructions that are executed at compile time - reporting of problems with the source by producing LSP diagnostics - providing highlighting and LSP information -A HLASM source files have dependencies — other files that define macros or files brought in by the COPY instruction. Dependencies are only discovered during the processing of files, so it is not possible to provide the files with macro definitions beforehand. The analyzer gets a callback that finds a file with specified name, parse its contents and return it as a list of parsed statements. +HLASM source files have dependencies: other files that define macros, or files brought in by the COPY instruction. Dependencies are only discovered when files are processed, so it is not possible to provide the files with macro definitions beforehand. The analyzer gets a callback that finds a file with a specified name, parses its contents and returns it as a list of parsed statements.
-Client-side VS Code extension +Client-Side VS Code Extension ----------------------------- The VS Code extension component ensures seamless integration with the editor. Its functions are: - to start the HLASM language server and the LSP client that comes with VS Code, and to create a connection between them. -- to implement extension of the LSP protocol for enabling server-side highlighting. The extended client parses the information from the server and uses VS Code API to actually color the text in the editor. +- to implement an extension of the LSP protocol for enabling server-side highlighting. The extended client parses the information from the server and uses the VS Code API to color the text in the editor. -- to implement support for editing lines with continuations — when the user types something in front of the continuation character, it should stay in place. +- to implement support for editing lines with continuations — when the user types something in front of the continuation character, it remains in place. -Macro tracer +Macro Tracer ------------ -The macro tracer gives a possibility to trace the compilation of HLASM source code in a way similar to common debugging. This is the reason why we chose to implement support for the [Debug Adapter Protocol](https://microsoft.github.io/debug-adapter-protocol/) (DAP). It is very similar to LSP, so most of the code implementing LSP in the language server component may be reused for both protocols. +The macro tracer allows you to trace the compilation of HLASM source code in a way similar to common debugging. This is why we chose to implement support for the [Debug Adapter Protocol](https://microsoft.github.io/debug-adapter-protocol/) (DAP). It is very similar to LSP, so most of the code implementing LSP in the language server component can be reused for both protocols. -The language server component communicates with the macro tracer component in the parser library. 
Its API mirrors the requests and events of DAP. +The language server component communicates with the macro tracer component in the parser library. Its API mirrors the requests and events of DAP. The main responsibilities of the macro tracer include: @@ -92,8 +92,8 @@ The main responsibilities of the macro tracer include: - `SetBreakpoints`, which transfers the information about breakpoints that the user has placed in the code -- `Threads`, `StackTrace`, `Scopes` and `Variables` requests to allow the DAP client to retrieve information about the current processing stack (stack of nested macros and copy instructions), available variable symbols and their values +- `Threads`, `StackTrace`, `Scopes` and `Variables` requests, which allow the DAP client to retrieve information about the current processing stack (stack of nested macros and copy instructions), available variable symbols and their values -- `stopped`, `exited` and `terminated` events to let the DAP client know about state of traced source code +- `stopped`, `exited` and `terminated` events, which let the DAP client know about the state of the traced source code -The macro tracer communicates with the workspace manager to retrieve the content of the traced files. It analyzes the source file in a separate thread and gets callbacks from the analyzer before each statement is processed. In the callback, the tracer puts the thread to sleep and waits for user interaction. During this time, it is possible to retrieve all variable and stack information from the processing to display it to the user. +The macro tracer communicates with the workspace manager to retrieve the content of the traced files. It analyzes the source file in a separate thread and receives callbacks from the analyzer before each statement is processed. In the callback, the tracer puts the thread to sleep and waits for user interaction.
During this time, it is possible to retrieve all variable and stack information from the processing and display it to the user. diff --git a/docs/Build-instructions.md b/docs/Build-instructions.md index 16ad81548..3397bf10b 100644 --- a/docs/Build-instructions.md +++ b/docs/Build-instructions.md @@ -5,7 +5,7 @@ The result of a build is the Visual Studio Code extension packed into a VSIX fil Prerequisites ------------- -In order to build the project on any platform, following software needs to be installed: +In order to build the project on any platform, the following software needs to be installed: - CMake 3.10 or higher @@ -15,16 +15,16 @@ In order to build the project on any platform, following software needs to be in - Maven (the build system of ANTLR) -- Git (needed to download sources of the third party software) +- Git (to download the sources of third-party software) -- npm (for compiling the typescript parts of the VS Code extension) +- npm (to compile the TypeScript parts of the VS Code extension) Windows ------- -On windows, we use Visual Studio Community 2019. We also have VS configurations for building and testing the project in WSL. +On Windows, we use Visual Studio Community 2019. We also have VS configurations for building and testing the project in WSL. -It is also possible to build the project from command line: +It is also possible to build the project from the command line: mkdir build && cd build cmake ../ @@ -32,18 +32,17 @@ It is also possible to build the project from command line: Linux ----- - -In addition to the prerequisites listed in \[prereq\], linux build has two more prerequisites: +In addition to the prerequisites listed above, the Linux build has two more prerequisites: - pkg-config - UUID library -We build the project for Ubuntu 18.04 and for the Alpine linux. +We build the project for Ubuntu 18.04 and Alpine Linux.
### Ubuntu -On Ubuntu 18.04 the following commands install all prerequisites and then build the project into `build` folder: +On Ubuntu 18.04, the following commands install all prerequisites and then build the project into the `build` folder: apt update && sudo apt install cmake g++-8 uuid-dev npm default-jdk pkg-config maven @@ -51,9 +50,9 @@ On Ubuntu 18.04 the following commands install all prerequisites and then build cmake -DCMAKE_C_COMPILER=gcc-8 -DCMAKE_CXX_COMPILER=g++-8 ../ cmake --build . -### Alpine linux +### Alpine Linux -The build works on Alpine linux version 3.10. The following commands install all prerequisites and then build the project into `build` folder: +The build works on Alpine Linux version 3.10. The following commands install all prerequisites and then build the project into the `build` folder: apk update && apk add linux-headers git g++ cmake util-linux-dev npm ninja pkgconfig openjdk8 maven @@ -64,7 +63,7 @@ The build works on Alpine linux version 3.10. The following commands install all Mac OS ------ -We have only built the project on MacOS 10.14. In order to successfully build, we require LLVM 8 (it can be installed by using homebrew). +We have only built the project on macOS 10.14. To build successfully, first install LLVM 8 using Homebrew. The project can be built with a snippet like this: @@ -75,10 +74,10 @@ The project can be built with a snippet like this: For instance, a possible path to LLVM is `/usr/local/opt/llvm8` -Running tests +Running Tests ------------- -Once the project is built, there are two test executables in the `bin/` subdirectory of the build folder: `library_test` and `server_test`. Just run both of them to verify the build. +Once the project is built, there are two test executables in the `bin/` subdirectory of the build folder: `library_test` and `server_test`. Run both of them to verify the build.
Installation ------------ @@ -97,4 +96,4 @@ The built VSIX can be manually installed into VS Code by following these steps: Alternatively, the plugin can be installed with following command: - code --install-extension \ No newline at end of file + code --install-extension diff --git a/docs/Configuration-of-libraries.md b/docs/Configuration-of-libraries.md new file mode 100644 index 000000000..fab16bf8c --- /dev/null +++ b/docs/Configuration-of-libraries.md @@ -0,0 +1,49 @@ +The parser library approaches dependency resolution in a way similar to the mainframe. On a mainframe, you define the locations of your dependencies in a JCL file (more on JCL [here](https://www.ibm.com/support/knowledgecenter/zosbasics/com.ibm.zos.zjcl/zjclc_basicjclconcepts.htm)). As the user might want to include a large number of dependencies for multiple open codes, the source code management tool [CA Endevor](https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-mainframe-software/devops/ca-endevor-software-change-manager/18-0.html) groups these dependencies into so-called *processor groups*. Then, the user assigns a processor group to the open code and Endevor resolves the dependencies. + +To provide a similar experience with local files, the parser library simulates this behavior. To include dependencies in a project, the user has to define two configuration files inside the workspace: *pgm\_conf.json* and *proc\_grps.json*. The workspace component of the parser library then processes the configurations, retrieving their values upon initialization. Moreover, each time a save command is issued on any configuration file, the configuration values are reloaded via the `load_config` method. + +### Processor Groups + +The proc\_grps configuration file contains a JSON array of processor groups. Each group consists of a name and an array of folder paths, which can be absolute or relative to the root of the workspace. An example of this file is shown at the end of this page.
+ +When `load_config` is called, the workspace retrieves these processor groups from the configuration file and creates libraries. The libraries provide information about the paths to their dependency files. During parsing, the workspace retrieves the library corresponding to the provided processor group name and uses it to search for a macro or copy file. + +### Program Configuration + +The pgm\_conf configuration file contains a JSON array (`pgms`) of program names (or wildcards \[section:wildcard\]) matched to their processor groups. It serves as a list of HLASM open-code files and states the libraries (in the form of processor groups) that contain the dependencies of each open-code file. An example can be found in \[lst:pgm\_conf\]. + +From this configuration, the workspace remembers which processor group is assigned to each open-code file. + + { + "pgroups": [ + { + "name":"GROUP1", + "libs": [ + "ASMMAC/", + "C:/SYS.ASMMAC" + ] + }, + { + "name":"GROUP2", + "libs": [ + "G2MAC/", + "C:/SYS.ASMMAC" + ] + } + ] + } + + + { + "pgms": [ + { + "program": "source_code", + "pgroup": "GROUP1" + }, + { + "program": "second_file", + "pgroup": "GROUP2" + } + ] + } + \ No newline at end of file diff --git a/docs/Extension.md b/docs/Extension.md index dc6a81cae..b45719530 100644 --- a/docs/Extension.md +++ b/docs/Extension.md @@ -9,45 +9,45 @@ Standard LSP Extension The core of the extension is an activation event which starts the plugin for VSCode. -Upon activation, *Language Client* and *[[Language server]]* are started as child processes of VSCode and a pipe is open for their communication. The LSP communication and its features are handled by the *vscodelc* package. +Upon activation, the *Language Client* and *[[Language server]]* are started as child processes of VSCode and a pipe is opened for their communication. The LSP communication and its features are handled by the *vscodelc* package.
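Matching an open-code file against the `pgms` entries of the *pgm\_conf.json* example in the configuration-of-libraries document could look like the following Python sketch. `load_program_config` and `processor_group_for` are hypothetical names, the wildcard semantics are assumed to be shell-style as in Python's `fnmatch.fnmatchcase`, and the first matching entry is assumed to win; the extension's actual wildcard rules may differ.

```python
import json
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


def load_program_config(config_text: str) -> List[Tuple[str, str]]:
    """Return (program name or wildcard, processor group) pairs in file order."""
    return [(entry["program"], entry["pgroup"]) for entry in json.loads(config_text)["pgms"]]


def processor_group_for(program: str, mapping: List[Tuple[str, str]]) -> Optional[str]:
    """Return the processor group of the first entry matching `program`, or None."""
    for pattern, pgroup in mapping:
        # fnmatchcase treats '*' and '?' as shell-style wildcards (assumed semantics).
        if fnmatchcase(program, pattern):
            return pgroup
    return None
```

A file for which `processor_group_for` returns `None` would have no configured dependencies under this sketch.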
To be independent of pipes, we have added an option to use TCP, which assigns a random free port for TCP communication. DAP Extension ------------- -[*Macro Tracer*](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/Macro-tracer)  is implemented using DAP, which is also supported out-of-the-box by VSCode. Similarly to LSP TCP support, we dynamically assign a random free port for DAP communication during the activation. +The [*Macro Tracer*](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/Macro-tracer) is implemented using DAP, which is also supported out-of-the-box by VSCode. Similarly to the LSP TCP support, we dynamically assign a random free port for DAP communication during activation. -Additional implemented features +Additional Implemented Features ------------------------------- -To simplify the work with HLASM in modern editors, several features are added to the extension . These additions are specific for Visual Studio Code (and Theia) and are not a part of the LSP specification. +To simplify work with HLASM in modern editors, several features have been added to the extension. These additions are specific to Visual Studio Code (and Theia) and are not a part of the LSP specification. ### Language Detection -The usual workflow with the extension begins with downloading HLASM source codes from mainframe. Typically, these files will not have any file extension and even if they do, they might differ across various products. +The usual workflow with the extension begins with downloading HLASM source code from a mainframe. Typically, these files do not have a file extension, and even if they do, the extensions might differ across products. To cope with this problem, there are several mechanisms that help the user to recognize the file as HLASM automatically: - **Macro Detection** -Each file starting with line *MACRO* (arbitrary number of whitespace before and after) is recognized as HLASM.
+Each file starting with the line *MACRO* (with an arbitrary amount of whitespace before and after) is recognized as HLASM. - **Configuration Files Detection** Every file either defined as a program or as a part of a processor group is recognized as HLASM. - **Wildcards** -Configuration file *pgm\_conf.json* contains a field *alwaysRecognize*, which consists of user-defined wildcards. Every file that satisfies at least one of these wildcards is recognized as HLASM. +The configuration file *pgm\_conf.json* contains the field *alwaysRecognize*, which consists of user-defined wildcards. Every file that matches at least one of these wildcards is recognized as HLASM. - **Automatic Language Detection** Whenever a user opens a file, its contents are scanned line by line. If the file has a sufficient ratio of HLASM lines to all lines, it is considered to be HLASM. - The HLASM line recognition is mostly based on a pre-defined set of most used instructions. If a line correctly uses one of these instructions, it is counted as a HLASM line. Continued line of a HLASM line is also a HLASM line. Moreover, a HLASM line must not exceed 80 characters. + The HLASM line recognition is mostly based on a predefined set of commonly used instructions. If a line correctly uses one of these instructions, it is counted as a HLASM line. A continuation of a HLASM line is also counted as a HLASM line. A line that exceeds 80 characters is never counted as a HLASM line. - Comment lines or empty lines are skipped and not counted. + Comment lines and empty lines are skipped and not counted. - We tested the detection on 11.000 HLASM files and 9.000 non HLASM files. The best results were observed using 4/10 ratio, with 88% true positive recognition and 95% true negative recognition. + We tested the detection on 11,000 HLASM files and 9,000 non-HLASM files. The best results were observed with a threshold ratio of 4:10 (40%), which gave a true positive rate of 88% and a true negative rate of 95%.
- Because of the indeterminate outcomes, this method is meant to be used as a fall-back in case all previous methods do not suffice. + Because its outcome is not always correct, this method is only meant as a fallback when none of the other methods recognizes the file. All detection layers are visualized in the following picture: @@ -55,15 +55,15 @@ All detection layers are visualized in the following picture: ### Continuation Handling -Due to historical reasons, HLASM has a 80 character-per-line limitation. Modern languages do not enforce such restriction and therefore IDEs such as VSCode allow the user to extend their lines freely. This causes 2 major inconveniences. +For historical reasons, HLASM limits lines to 80 characters. Modern languages do not enforce such restrictions, and therefore IDEs such as VSCode allow the user to extend their lines freely. This causes two major inconveniences. -First of all, the user must add the continuation character on a very specific column manually. Secondly, each time the user types in between continuation character and the instruction/parameters, the continuation character is pushed from its requisite position and needs to be moved back, again manually. +Firstly, the user must manually add the continuation character in a specific column. Secondly, each time the user types between the instruction/parameters and the continuation character, the continuation character is pushed from its required position and needs to be moved back, again manually. To improve this behavior, the extension offers an option to activate *Continuation Handling*. -The first problem is solved by adding two editor commands *insertContinuation* and *deleteContinuation*, which, when invoked, insert/delete the continuation character on its correct position. +The first problem is solved by adding two editor commands, *insertContinuation* and *deleteContinuation*, which insert or delete the continuation character at its correct position when invoked.
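The ratio-based *Automatic Language Detection* heuristic described above can be sketched in Python. Only the 80-character limit, the skipping of comment and empty lines, the handling of continued lines and the 4:10 threshold come from the text; the tiny `KNOWN_INSTRUCTIONS` set, the way the operation field is extracted and the use of column 72 to spot continuations are simplifying assumptions, so this is not the extension's implementation.

```python
from typing import Iterable

# Hypothetical stand-in for the extension's predefined set of common instructions.
KNOWN_INSTRUCTIONS = {"MACRO", "MEND", "CSECT", "DSECT", "DC", "DS", "EQU", "USING",
                      "LR", "L", "ST", "LA", "BR", "END", "AIF", "AGO", "ANOP"}
CONTINUATION_COLUMN = 72  # 1-based column that marks a continued statement
THRESHOLD = 0.4           # the 4:10 ratio mentioned above


def is_comment_or_empty(line: str) -> bool:
    return not line.strip() or line.startswith("*") or line.startswith(".*")


def operation_of(line: str) -> str:
    """Return the operation field: the first token, or the second one if a name field is present."""
    tokens = line[:CONTINUATION_COLUMN - 1].split()
    if not tokens:
        return ""
    if not line.startswith(" ") and len(tokens) > 1:
        return tokens[1].upper()
    return tokens[0].upper()


def looks_like_hlasm(lines: Iterable[str]) -> bool:
    counted = hlasm = 0
    previous_continued = False
    for line in lines:
        line = line.rstrip("\n")
        if is_comment_or_empty(line):
            continue  # comment and empty lines are not counted at all
        counted += 1
        if len(line) > 80:
            is_hlasm = False
        elif previous_continued:
            is_hlasm = True  # continuation of an HLASM line
        else:
            is_hlasm = operation_of(line) in KNOWN_INSTRUCTIONS
        continued = len(line) >= CONTINUATION_COLUMN and line[CONTINUATION_COLUMN - 1] != " "
        previous_continued = is_hlasm and continued
        if is_hlasm:
            hlasm += 1
    return counted > 0 and hlasm / counted >= THRESHOLD
```

The function returns `False` for a file with no countable lines, since no ratio can be computed.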
-To improve the second problem, the option overrides standard VSCode commands, commonly used when working in editor such as *type*, *deleteLeft*, *deleteRight*, *cut* and *paste*. They offset the continuation character by removing/adding whitespaces in front of it. +To solve the second problem, the option overrides standard VSCode commands commonly used in the editor, such as *type*, *deleteLeft*, *deleteRight*, *cut* and *paste*. The overridden commands keep the continuation character in its column by removing or adding spaces in front of it. ### Configuration Prompt @@ -71,10 +71,10 @@ If a workspace contains a HLASM file, but does not have the configuration files ### HLASM Semantic Highlighting -In case of HLASM, a semantic (server-side) highlighting is desired. The multi-layered nature of the language causes that in quite common scenarios, specific parts of the code can be properly highlighted if and only if some previous part was completely processed (parameters for instructions, skipped code thanks to code generation, defined macros, continuations, etc...). +To highlight HLASM code, a semantic (server-side) approach is desired. Due to the multi-layered nature of the language, specific parts of the code commonly cannot be properly highlighted until a previous part has been completely processed (parameters for instructions, code skipped during code generation, defined macros, continuations, etc.). -Based on the open [pull request to the VSCode Language Server](https://github.com/microsoft/vscode-languageserver-node/pull/367/files), we added *semanticHighlighting* as an extra feature of LSP. This feature works in a very similar manner, implementing the LSP interfaces that VSCode provides. It works as a notification from the server to the client, containing ranges inside the document and their respective tokens (e.g. instruction, label, parameter, comment,..).
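The behavior of the overridden *type* and *deleteLeft* commands under *Continuation Handling* can be approximated on a single line of text, as in this Python sketch. It assumes the default continuation column 72 and treats the line as a plain string; the function names are hypothetical, and the real commands operate on VSCode editor state and may handle more cases (selections, multiple cursors, columns changed by ICTL).

```python
CONTINUATION_COLUMN = 72  # 1-based default; index 71 in a Python string
CONT = CONTINUATION_COLUMN - 1


def has_continuation(line: str) -> bool:
    return len(line) > CONT and line[CONT] != " "


def insert_continuation(line: str, mark: str = "X") -> str:
    """Pad the statement part with spaces and place `mark` exactly in the continuation column."""
    return line[:CONT].ljust(CONT) + mark + line[CONT + 1:]


def type_keeping_continuation(line: str, pos: int, text: str) -> str:
    """Insert `text` at index `pos`; on a continued line, absorb the growth by
    dropping spaces in front of the continuation column."""
    if pos >= CONT or not has_continuation(line):
        return line[:pos] + text + line[pos:]
    body, tail = line[:CONT], line[CONT:]
    free = len(body) - len(body.rstrip(" "))  # spaces directly before the mark
    new_body = body[:pos] + text + body[pos:]
    if free >= len(text) and pos + len(text) <= CONT:
        new_body = new_body[:CONT]  # the removed characters are all spaces
    return new_body + tail


def delete_left_keeping_continuation(line: str, pos: int) -> str:
    """Delete the character before index `pos`, padding with a space so the mark stays put."""
    if pos == 0:
        return line
    if pos > CONT or not has_continuation(line):
        return line[:pos - 1] + line[pos:]
    body, tail = line[:CONT], line[CONT:]
    return body[:pos - 1] + body[pos:] + " " + tail
```

When there are not enough spaces in front of the continuation character, the sketch falls back to an ordinary insertion, which pushes the character out of its column just as an unmodified editor would.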
+Based on the open [pull request to the VSCode Language Server](https://github.com/microsoft/vscode-languageserver-node/pull/367/files), we added *semanticHighlighting* as an extra feature of LSP. This feature works in a very similar manner, implementing the LSP interfaces that VSCode provides. It works as a notification from the server to the client, containing ranges inside the document and their respective tokens (e.g. instruction, label, parameter, comment). -On top of that, we extended *semanticHighlighting* to *ASMsemanticHighlighting*, which adds the ability to notify the client about a new code layout, specifically begin, continuation and continue columns. These fields can be set in the HLASM code (via ICTL instruction) and are required for the *Continuation Handling* feature to work properly. Our client-server communication is shown in the figure below. +We also extended *semanticHighlighting* to *ASMsemanticHighlighting*, which adds the ability to notify the client about a new code layout, specifically the *begin*, *continuation* and *continue* columns. These fields can be set in the HLASM code (via an ICTL instruction) and are required for the *Continuation Handling* feature to work properly. Our client-server communication is shown in the figure below. -The addition of semantic highlighting to the LSP communication. \ No newline at end of file +The addition of semantic highlighting to the LSP communication. diff --git a/docs/HLASM-overview.md b/docs/HLASM-overview.md index 253229869..689a57d26 100644 --- a/docs/HLASM-overview.md +++ b/docs/HLASM-overview.md @@ -1,52 +1,52 @@ Ordinary assembly languages consist solely of ordinary machine instructions. High-level assemblers generally extend them with features commonly found in high-level programming languages, such as control statements similar to *if, while, for* as well as custom callable macros. 
-IBM High Level Assembler (HLASM) satisfies this definition and adds other features, which will be described in this section. +IBM High Level Assembler (HLASM) satisfies this definition and adds other features, which are described in this section. Syntax ------ -HLASM syntax is similar to a common assembler, but due to historical reasons it has limitations, like line length limited to 80 characters (as that was the length of a punched card line). +HLASM syntax is similar to that of a common assembler, but for historical reasons it has limitations, such as a line length limit of 80 characters (the length of a punched card line). ### Statement -HLASM program consists of a sequence of *statements*, which are used to produce both compile-time code and run-time code (see [Assembling]). A statement consists of four fields separated by spaces that can be split into more lines using continuations (see \[Continuation\]). Following are the existing fields: +A HLASM program consists of a sequence of *statements*, which are used to produce both compile-time code and run-time code (see [Assembling]). A statement consists of four fields separated by spaces and can be split across multiple lines using continuations (see \[Continuation\]). The fields are as follows: - **Name field** — Serves as a place for named constants that are to be used in the code. This field is optional, but, when present, it must start at the begin column of a line. -- **Instruction field** — The only mandatory field, represents the instruction that is executed. It must not begin in the first column, as it would be interpreted as a name field. +- **Instruction field** — The only mandatory field, representing the instruction that is executed. It must not start at the begin column, because text starting there is interpreted as the name field. -- **Operands field** — Field for instruction operands, located immediately after instruction field.
Individual operands must be separated by a comma, and, depending on the specific instruction, can be either blank, in a form of an apostrophe separated string, or represented by a sequence of characters. +- **Operands field** — A field for instruction operands, located immediately after the instruction field. Individual operands must be separated by commas and, depending on the specific instruction, can be blank, in the form of an apostrophe-delimited string, or represented by a sequence of characters. - **Remark field** — Optional, serves as inline commentary. Located either after the operands field, or, in case the operands are omitted, after the instruction field. +The following is an example of a basic statement containing all fields. ``` name instruction operands remark .NOMOV AGO (&WH).L1,.L2,.L3 SEQUENTIAL BRANCH ``` -shows an example of a basic statement containing all fields. ### Symbols -In HLASM, symbols are used to represent source location or arbitrary value. They are defined in name field; then, they can be used in operand field. There are three types of symbols recognized by the assembler each having different syntax rules: +In HLASM, symbols are used to represent a source location or an arbitrary value. They are defined in the name field and can then be used in the operand field. There are three types of symbols recognized by the assembler, each with different syntax rules: -- Ordinary symbol — consists of at most 63 alphanumeric characters. The first character must be an alphabetic character (being one from `a-z`, `A-Z`, `@`, `#`, `$`, `_`). +- Ordinary symbol — consists of up to 63 alphanumeric characters. The first character must be an alphabetic character (one of `a-z`, `A-Z`, `@`, `#`, `$`, `_`). `REG11` -- Variable symbol — must start with ampersand (`&`). +- Variable symbol — must start with an ampersand (`&`).
The second character must be alphabetic, followed by up to 61 alphanumeric characters. `&DB_VER` -- Sequence symbol — follows the same rules as the variable symbol but the leading character is dot (`.`). +- Sequence symbol — follows the same rules as the variable symbol but the leading character is a dot (`.`). `.NOMOV` -Semantic meaning of each symbol is further described in the following sections. +The meaning of each symbol is further described in the following sections. ### Continuations -Individual statements sometimes contain more than 80 characters, which does not agree with the historical line length limitations. Therefore, a special feature called *continuation* exists. +Individual statements sometimes contain more than 80 characters, which exceeds the line length limit. Therefore, a special feature called *continuation* exists. For this purpose the language specification defines four special columns: @@ -62,7 +62,7 @@ The begin column defines where the statements can be started. The end column determines the position of the end of the line. Anything written to its right does not count as content of the statement, and is rather used as a line sequence number (see \[fig01:line\]). -The continuation column is used to indicate that the statement continues on the next line. For proper indication, an arbitrary character other than space must be written in this column. The remainder of the statement must then start on the continue column. +The continuation column is used to indicate that the statement continues on the next line. For proper indication, any character other than a space must be written in this column. The remainder of the statement must then start in the continue column. An example of an instruction where its last operand exceeded column 72 of the line can be seen in \[lst:overflow\]. @@ -82,47 +82,47 @@ Some instructions also support the *extended format* of the operands.
This allow Assembling ---------- -Having briefly outlined the syntax, we now describe the assembly process of HLASM. +This section describes the assembly process of HLASM. -We distinguish two types of processing: +There are two types of processing: -- *conditional assembly (CA) processing* — the main purpose of which is to generate statements for ordinary assembly (see \[CA\_proc\]) +- *conditional assembly (CA) processing*, the main purpose of which is to generate statements for ordinary assembly (see \[CA\_proc\]) -- *ordinary assembly processing* — which handles *machine instructions* and *assembler instructions* (see \[mach\_instr\], \[asm\_instrs\]) +- *ordinary assembly processing*, which handles *machine instructions* and *assembler instructions* (see \[mach\_instr\], \[asm\_instrs\]) -### Ordinary assembly +### Ordinary Assembly Ordinary assembly, along with machine and assembler instructions, is responsible for the runtime behavior of the program. It allows the generation of code from both traditional machine instructions and special-purpose assembler instructions. Moreover, it assigns values to *ordinary symbols*. -#### Ordinary symbols +#### Ordinary Symbols -In HLASM, an *ordinary symbol* is a named run-time constant. It is defined by inputting its name into the name field of a statement along with a special assembler instruction. Each ordinary symbol can only be defined once, and its value is constant. There are two types of ordinary symbols: +In HLASM, an *ordinary symbol* is a named run-time constant. It is defined when its name is specified in the name field of a statement along with a special assembler instruction. Each ordinary symbol can only be defined once, and its value is constant. There are two types of ordinary symbols: -- An *absolute symbol* that simply has an integral value. +- An *absolute symbol*, which has an integral value. -- A *relocatable symbol* that represents an address in the resulting object code. 
A relocatable symbol can also be defined by writing the ordinary symbol name into the name field of a statement along with a machine instruction name. The symbol then denotes the address of the given instruction. +- A *relocatable symbol*, which represents an address in the resulting object code. A relocatable symbol can also be defined by writing the ordinary symbol name into the name field of a statement along with a machine instruction name. The symbol then denotes the address of the given instruction. -In addition to symbol value, ordinary symbols also contain a set of *attributes*, the most common ones being *type* and *length*. +In addition to their value, ordinary symbols also have a set of *attributes*. The most common attributes are *type* and *length*. -#### Machine instructions +#### Machine Instructions *Machine instructions* represent the actual processor instructions executed during run-time. Similarly to traditional assemblers, they are translated into corresponding opcodes and their operands are processed. However, HLASM also allows expressions to be passed as their operands, which may use ordinary symbols and support integer and address arithmetic. -#### Assembler instructions +#### Assembler Instructions In addition to machine instructions, HLASM assembler also provides *assembler instructions* (in other systems commonly termed *directives*). They instruct the assembler to make specific actions rather than to assemble opcodes. For example, they generate run-time data constants, create ordinary symbols, organize the resulting object code and generally affect how the assembler operates. Following are examples of assembler instructions: -- **ICTL** — changes values of the previously described line columns (i.e. begin column may begin at column 2 etc.). +- **ICTL** — changes the values of the previously described line columns (e.g. the begin column can be moved to column 2).
-- **DC**, **DS** — reserves space in object code for data described in operands field and assembles them in place (i.e. assembles float, double, character array, address etc.). These instructions take *data definition* as operands. shows examples of data definition. +- **DC**, **DS** — DC assembles the data described in the operands field in place in the object code (e.g. floating-point numbers, character arrays, addresses), while DS only reserves space for such data. These instructions take *data definitions* as operands. - **EQU** — defines ordinary symbols. -- **COPY** — copies text from a specified file (called *copy member*) and pastes it in place of the instruction. Path to the folder of the file is passed to assembler before the start of assembly. It is very similar to the C preprocessor `#include` directive. +- **COPY** — copies text from a specified file (called a *copy member*) and pastes it in place of the instruction. The path to the folder of the file is passed to the assembler before the assembly starts. It is very similar to the C preprocessor `#include` directive. -- **CSECT** — creates an executable control section, which serves as the start of relative addressing. It is followed by sequence of machine instructions. +- **CSECT** — creates an executable control section, which serves as the start of relative addressing. It is followed by a sequence of machine instructions. @@ -143,9 +143,9 @@ Following are examples of assembler instructions: of expression A+4, then B and then 4 more copies of the same -#### Ordinary symbols resolution +#### Resolution of Ordinary Symbols -All the assembler instructions and ordinary symbols must be resolved before the assembler creates the final object file. However, as the HLASM language supports forward declaration of ordinary symbols, the assembly may be quite complicated. Consider an example in \[lst:ordinary\_assembly\].
When the instruction on line 1 is seen for the first time, it is impossible to determine its length, because the symbol `LEN` is not defined yet (character L with an expression in parentheses in DS operand of type C specifies how many bytes should be reserved in the program). The same applies to the length of the instruction on the second line. Furthermore, it is also impossible to determine the exact value of relocatable symbols `ADDR` and `HERE` because of the unknown length of the preceding instructions. +All the assembler instructions and ordinary symbols must be resolved before the assembler creates the final object file. However, as the HLASM language supports forward references to ordinary symbols, the assembly might be quite complicated. Consider the example below. When the instruction on line 1 is seen for the first time, it is impossible to determine its length, because the symbol `LEN` is not defined yet (in the DS operand `CL(LEN)`, the length modifier L followed by an expression in parentheses specifies how many bytes of character data are reserved). The same applies to the length of the instruction on the second line. It is also impossible to determine the exact values of the relocatable symbols `ADDR` and `HERE` because of the unknown lengths of the preceding instructions. ``` DS CL(LEN) @@ -157,19 +157,19 @@ SIZE EQU 1 ``` -In the next step, `LEN` is defined.
However, it cannot be evaluated, because the subtraction of addresses `ADDR` and `HERE` is dependent on the unknown length of instruction on second line and therefore on the symbol `SIZE`. The whole program is resolved only when the assembly reaches the last line, which defines the length of instruction `02`. Afterwards, it is possible to resolve `LEN` and finally the length of instruction `01`. +In the next step, `LEN` is defined. However, it cannot be evaluated yet, because the difference of the addresses `HERE` and `ADDR` depends on the unknown length of the instruction on the second line and therefore on the symbol `SIZE`. The whole program is resolved only when the assembly reaches the last line, which defines the length of instruction `02`. Afterwards, it is possible to resolve `LEN` and finally the length of instruction `01`. -The dependency graph created from these principles can be arbitrarily deep and complicated, however it must not contain cycles (a symbol must not be transitively dependent on itself). +The dependency graph created from these principles can be arbitrarily deep and complicated; however, it must not contain cycles (a symbol must not be transitively dependent on itself). -### Object file layout +### Object File Layout -The product of ordinary assembly is an object file. Let us briefly describe its layout. +The product of ordinary assembly is an object file. This section describes its layout. #### Sections -An object file consists of so-called *sections*. They are user-defined (by instructions CSECT, DSECT, …) and can be of different kinds, each with various properties. Absolute positions of sections within the object file are undefined — they are determined automatically after the compilation. This also implies that all relocatable symbols are only defined relatively to the section that contains them. +An object file consists of so-called *sections*. They are user-defined (by instructions such as CSECT and DSECT) and can be of different kinds, each with various properties. Absolute positions of sections within the object file are undefined — they are determined automatically after compilation. This also implies that all relocatable symbols are only defined relative to the section that contains them. -#### Location counter +#### Location Counter Any time a machine instruction is encountered, its opcode is outputted to the *next available address*. Each section has a structure pointing to this address — a so-called *location counter*.
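The deferred resolution of ordinary symbols described above (`LEN` waiting for the length of instruction `02`, which in turn waits for `SIZE`) amounts to evaluating a dependency graph in dependency order and rejecting cycles. The Python sketch below models each symbol as a list of dependencies plus a function of already-resolved values. `LEN1` and `LEN2` are hypothetical names for the lengths of instructions `01` and `02`, and instruction `02` is assumed to be `SIZE` bytes long purely for illustration; the parser library's real data structures are different.

```python
from typing import Callable, Dict, List, Set, Tuple

# name -> (names it depends on, function computing its value from resolved values)
Definitions = Dict[str, Tuple[List[str], Callable[[Dict[str, int]], int]]]


class CycleError(Exception):
    """Raised when a symbol transitively depends on itself."""


def resolve(defs: Definitions) -> Dict[str, int]:
    values: Dict[str, int] = {}
    in_progress: Set[str] = set()

    def visit(name: str) -> None:
        if name in values:
            return
        if name in in_progress:
            raise CycleError(f"symbol {name} depends on itself")
        in_progress.add(name)
        deps, compute = defs[name]
        for dep in deps:
            visit(dep)  # resolve dependencies first (depth-first order)
        in_progress.discard(name)
        values[name] = compute(values)

    for name in defs:
        visit(name)
    return values


# Toy model of the example: the ADDR/HERE offsets cancel out, so LEN only needs LEN2.
example = {
    "LEN1": (["LEN"], lambda v: v["LEN"]),
    "LEN": (["LEN2"], lambda v: v["LEN2"]),
    "LEN2": (["SIZE"], lambda v: v["SIZE"]),
    "SIZE": ([], lambda v: 1),
}
print(resolve(example))  # {'SIZE': 1, 'LEN2': 1, 'LEN': 1, 'LEN1': 1}
```

A definition such as `A` depending on `B` and `B` depending on `A` raises `CycleError`, matching the rule that the graph must not contain cycles.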
Each section has a structure pointing to this address — a so-called *location counter*. @@ -179,53 +179,53 @@ At the end of assembly, all code denoted by location counters is assembled in a The value of the location counter can be arbitrarily changed by the ORG instruction. It can be moved backwards or forwards (with restriction of counter underflow) to set the next address. This means that user can generate some code, move counter backwards and overwrite it. Then the ORG instruction can be used to set location counter to the next available untouched address to continue in object creation. -### Conditional assembly +### Conditional Assembly Conditional assembly is another feature provided by HLASM. It is essentially a macro-language built on top of a traditional assembler. -User may use conditional assembly instructions to either define *variable symbols*, which can be used in any statement to alter its meaning, or to define *macros* — reusable pieces of code with parameters. Based on these instructions, conditional assembly then alters the textual representation of the source code and selects which lines will be processed next. +A user can use conditional assembly instructions to either define *variable symbols*, which can be used in any statement to alter its meaning, or to define *macros* — reusable pieces of code with parameters. Based on these instructions, conditional assembly then alters the textual representation of the source code and selects which lines are processed next. -#### Variable symbols +#### Variable Symbols Variable symbols serve as compile-time variables. Statements that contain them are called *model statements*. During conditional assembly, variable symbols are substituted for their value to create a statement processable by ordinary assembly. For example, a user can write a variable symbol in the operation field and generate any instruction that can be a result of a substitution. 
-Variable symbols also have notion of their type — they can be defined either as integer, boolean or string. CA instructions gather this information for different sorts of conditional branching. +Variable symbols also have a type — they can be defined as an integer, a boolean or a string. CA instructions use this information for different sorts of conditional branching. -#### Sequence symbols +#### Sequence Symbols -A sequence symbol is important in compile-time branching. When written in a name field of a specific statement, branching instructions can use the symbol as a label to jump to the specified statement; hence, alter the further code generation. +A sequence symbol is important in compile-time branching. When it is written in the name field of a statement, branching instructions can use the symbol as a label to jump to that statement and thereby alter further code generation. -#### CA instructions +#### CA Instructions -CA instructions are not assembled into object code. They are used to select which instructions will be processed by the assembler next. +CA instructions are not assembled into object code. They are used to select which instructions are processed by the assembler next. -One example of their capabilities is conditional and unconditional branching. As HLASM provides a variety of built-in binary or unary operations on variable symbols, complex conditional expressions can be created. This is important in HLASM, as the user can alter the flow of instructions that will be assembled into an executable program. +One example of their capabilities is conditional and unconditional branching. As HLASM provides a variety of built-in binary and unary operations on variable symbols, complex conditional expressions can be created. This is important in HLASM, as the user can alter the flow of instructions that are assembled into an executable program. -Another subset of CA instructions operates on variable symbols.
These can be used to define variable symbols locally or globally, assign or update their values. +Another subset of CA instructions operates on variable symbols. These can be used to define variable symbols locally or globally, and to assign or update their values. #### Macros -A *macro* is a structure consisting of a *name*, *input parameters* and a *body*, which is a sequence of statements. When a macro is called in a HLASM program, each statement in its body is executed. Both nested and recursive calls of macros are allowed. Macro body can also contain CA instructions, or even a sequence of instructions generating another macro definition. With the help of variable symbols, HLASM has the power to create custom, task specific macros. +A *macro* is a structure consisting of a *name*, *input parameters* and a *body*, which is a sequence of statements. When a macro is called in a HLASM program, each statement in its body is processed. Both nested and recursive calls of macros are allowed. A macro body can also contain CA instructions, or even a sequence of instructions generating another macro definition. With the help of variable symbols, HLASM makes it possible to create custom, task-specific macros. #### Description of a HLASM code example The current section contains description of the example shown in \[lst:example\]. -On lines `01-04`, we see a *macro definition*. It is defined with name `GEN_LABEL`, variable `NAME` and contains one instruction in its body, which assigns the current address to the label in `NAME`. +Lines `01-04` contain a *macro definition*. The macro is named `GEN_LABEL`, has the parameter `NAME`, and contains one instruction in its body, which assigns the current address to the label passed in `NAME`. On line `06`, the *copy instruction* is used, which includes the contents of the `REGS` file. -Line `08` establishes a start of an executable section `TEST`. +Line `08` establishes the start of an executable section called `TEST`.
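Macro expansion with positional parameters, including nested calls, can be sketched as a recursive function. The `Macro` record, the whitespace-based field splitting and the depth limit are assumptions made for this Python illustration; keyword parameters, macro prototypes and CA instructions inside the body are not modeled. The `GEN_LABEL` macro from lines `01-04` serves as the example, assuming its single body instruction is `&NAME EQU *`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Macro:
    params: List[str]  # positional parameter names, without the leading '&'
    body: List[str]    # model statements


def split_statement(line: str) -> Tuple[str, str, str]:
    """Return (name, operation, operands) from whitespace-separated fields."""
    fields = line.split()
    if not fields:
        return "", "", ""
    name = "" if line.startswith(" ") else fields.pop(0)
    operation = fields[0] if fields else ""
    operands = fields[1] if len(fields) > 1 else ""
    return name, operation, operands


def expand(lines: List[str], macros: Dict[str, Macro], depth: int = 0) -> List[str]:
    """Expand macro calls recursively; other statements are passed through unchanged."""
    if depth > 50:
        raise RecursionError("macro nesting too deep")
    out: List[str] = []
    for line in lines:
        _, operation, operands = split_statement(line)
        macro = macros.get(operation.upper())
        if macro is None:
            out.append(line)
            continue
        values = dict(zip(macro.params, operands.split(",") if operands else []))
        body = []
        for statement in macro.body:
            # Longest names first so that e.g. &NAME is not clobbered by a shorter &NAM.
            for param in sorted(macro.params, key=len, reverse=True):
                statement = statement.replace("&" + param, values.get(param, ""))
            body.append(statement)
        out.extend(expand(body, macros, depth + 1))
    return out


macros = {"GEN_LABEL": Macro(["NAME"], ["&NAME    EQU   *"])}
print(expand(["         GEN_LABEL START"], macros))  # ['START    EQU   *']
```

The depth limit stands in for the nesting limits a real assembler enforces, so that an unconditionally recursive macro fails instead of looping forever.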
-On line `09`, an integer value is assigned to a variable symbol `VAR`. The value is the length attribute of previously non-defined constant `DOUBLE`. The assembler looks for the definition of the constant to properly evaluate the conditional assembly expression. In the next line, there is a CA branching instruction `AIF`. If value of `VAR` equals 4, all the text between `AIF` and `.END` is completely skipped and assembling continues on line `18`, where the branching symbol `.END` is located. +On line `09`, an integer value is assigned to the variable symbol `VAR`. The value is the length attribute of the constant `DOUBLE`, which is only defined later in the source. The assembler looks for the definition of the constant to properly evaluate the conditional assembly expression. In the next line, there is a CA branching instruction `AIF`. If the value of `VAR` equals 4, all the text between `AIF` and `.END` is completely skipped and assembling continues on line `18`, where the branching symbol `.END` is located. Lines `12-13` show examples of machine instructions that are directly assembled into object code. Lines `11` and `14` contain examples of a macro call. -On line `15`, the constant `LEN` is assigned the difference of two addresses, which results in absolute ordinary symbol. This value is next used to generate character data. +On line `15`, the constant `LEN` is assigned the difference of two addresses, which results in an absolute ordinary symbol. This value is used to generate character data. -Instruction `DC` on line `17` creates value of type double and assigns its address to the ordinary symbol `DOUBLE`. This constant also holds information about length, type and other attributes of the data. +The instruction `DC` on line `17` creates a value of the double type and assigns its address to the ordinary symbol `DOUBLE`. This constant also holds information about length, type and other attributes of the data.
`ANOP` is an empty assembler action which defines the `.END` symbol and line `19` ends the assembling of the program. @@ -255,17 +255,17 @@ DOUBLE DC H'-3.729' ``` -Although CA processing may act like text preprocessing, it is still interlinked with ordinary processing. CA has mechanics that allow the assembler to gather information about statements that are printed during the processing. It can also access values created in ordinary assembly and use them in conditional branching, and is able to lookup constants that are not yet defined prior to the currently processed statement. During ordinary assembly, names of these instructions can also be aliased. +Although CA processing might act like text preprocessing, it is still interlinked with ordinary processing. CA has mechanics that allow the assembler to gather information about statements that are printed during the processing. It can also access values created in ordinary assembly and use them in conditional branching, and is able to look up constants that are not yet defined prior to the currently processed statement. During ordinary assembly, the names of these instructions can also be aliased. To sum up, CA processing has variables for storing values during the compilation and CA instructions for conditional branching. Hence, it is Turing-complete while still evaluated during compile-time. -HLASM source structure +HLASM Source Structure ---------------------- The file that generates the object code is called an *open-code* file. It is the entry file of the HLASM compiler. Each open-code file can have in-file dependencies, specifically: -- External Macro definitions +- External macro definitions - Copy members -These are not treated as open-code files because they do not directly generate object code. Rather, they serve as statement sequences that are included in specific places of open-code and provide specific meaning.
\ No newline at end of file +These are not treated as open-code files because they do not directly generate object code. Rather, they serve as statement sequences that are included in specific places of open-code and provide a specific meaning. diff --git a/docs/Home.md b/docs/Home.md index b7939554f..64218cb91 100644 --- a/docs/Home.md +++ b/docs/Home.md @@ -1,16 +1,17 @@ The IBM High Level Assembler Language (HLASM) is still actively used commercially, even though it is a relatively old language. Its roots go back to the 1970s, when IBM made their first mainframes. Since then, the IBM assembler has been revised several times — the last version (which is the concern of this project) was released in 1992. Although it is hard to believe, a lot of the software that has been written in the language over the years is still actively used and maintained, mainly because of the conservative mainframe users and IBM’s vendor lock-in. -Today, HLASM developers are forced to code in archaic terminals directly on the mainframe. Therefore, they spend a lot of time navigating around the code and the environment. For example, solely due to the fact that the user needs to navigate through plenty of terminal screens it takes around a minute just to get to a screen where it is possible to make a change in a file and recompile. For developers, it would be extremely useful to have an IDE plugin that would minimize contact with the mainframe terminal, could analyze the HLASM program, check its validity and make the code clearer by syntax highlighting. +Today, HLASM developers are forced to code in archaic terminals directly on the mainframe. Therefore, they spend a lot of time navigating around the code and the environment. For example, as the user needs to navigate through plenty of terminal screens, it takes around a minute just to get to a screen where it is possible to make a change in a file and recompile.
For developers, it would be extremely useful to have an IDE plugin that minimizes contact with the mainframe terminal, analyzes the HLASM program, checks its validity and makes the code clearer by highlighting the syntax. -We introduce such plugin for Visual Studio Code, which is one of the most popular code editors nowadays. It improves HLASM programming experience, so that it can be compared to coding in modern programming languages, by providing instant code validity checks, advanced highlighting, code analysis, and all the functionality that a programmer currently takes for granted when writing code. +To this end, we have introduced a plugin for Visual Studio Code, which is one of the most popular code editors nowadays. It enhances the HLASM programming experience, so that it can be compared to coding in modern programming languages, by providing instant code validity checks, advanced highlighting, code analysis, and all the functionality that the modern programmer takes for granted when writing code. The most significant properties and features of the plugin are: - It is capable of interpreting and tracing a large subset of HLASM code-generating instructions. -- It contains a list of all built-in instructions that is used to validate the generated code. -- *[[Macro tracer]]* gives a possibility to trace the compilation of a HLASM source code step-by-step in a way similar to common debugging. -- It implements [[LSP and DAP]] protocols, providing interface that can be easily connected to numerous modern code editors. -- It was successfully used on a production HLASM codebase with over 15 million lines of code. +- It contains a list of all built-in instructions, which is used to validate the generated code. +- The *[[Macro tracer]]* enables step-by-step tracing of the compilation of HLASM source code, in a similar way to common debugging.
+- It implements the [[LSP and DAP]] protocols, providing an interface that can easily be connected to numerous modern code editors. + +The plugin was successfully used on a production HLASM codebase with over 15 million lines of code. The plugin is available on the [Visual Studio Code Marketplace](https://marketplace.visualstudio.com/items?itemName=broadcomMFD.hlasm-language-support). @@ -18,7 +19,11 @@ This wiki serves as an in-depth documentation for anyone who would like to under User documentation is available on the [Visual Studio Code Marketplace](https://marketplace.visualstudio.com/items?itemName=broadcomMFD.hlasm-language-support). -Organisation of this wiki +Contents ----------------------------- -First of all, in [[HLASM overview]], we briefly explain the basics of HLASM needed to comprehend the workflow of this language. In [[Architecture overview]], we provide an overview of the project’s architecture, naming the most important components and indicating their relations. Then, we describe these components in separate chapters in further detail. In [[Language server]], we state the responsibilities of the language server as the communication provider between the extension client and the parsing library. The [[workspace manager]] is the entry point to the parsing library used by the [[language server]]. The purpose of its sub-components is to handle file management, dependency resolution and parsing. The core of the processing of a HLASM file is implemented inside the [[analyzer]]. The project also provides macro tracing through the standard debugging procedure and it is fully explained in \[[macro tracer]].The last mentioned component is the [[VSCode extension|Extension]], which communicates with the [[language server]] and provides IDE features to the user. In [[Build instructions]], we provide a guide how to build this project.
\ No newline at end of file +Firstly, in [[HLASM overview]], we briefly explain the basics of HLASM needed to comprehend the workflow of this language. In [[architecture overview]], we provide an overview of the project’s architecture, naming the most important components and indicating their relations. Then, we describe these components in separate chapters in further detail. + +The page [[language server]] describes the responsibilities of the language server as the communication provider between the extension client and the parsing library. The [[workspace manager]] is the entry point to the parsing library used by the language server. The purpose of its sub-components is to handle file management, dependency resolution and parsing. + +The core of the processing of a HLASM file is implemented inside the [[analyzer]]. The project also provides macro tracing through the standard debugging procedure, which is fully explained in [[macro tracer]]. The last mentioned component is the [[extension|VSCode extension]], which communicates with the language server and provides IDE features to the user. In [[build instructions]], we provide a guide to building this project. diff --git a/docs/Language-server-pages/IO-handling.md b/docs/Language-server-pages/IO-handling.md index 68fbbf3ed..4cfc7efc8 100644 --- a/docs/Language-server-pages/IO-handling.md +++ b/docs/Language-server-pages/IO-handling.md @@ -1,5 +1,5 @@ The purpose of the `dispatcher` is to abstract from the complexity of working with raw strings and streams. It executes an infinite loop in which it reads messages from `std::iostream` and adds them to the [`request_manager`](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/Request-manager) as parsed JSON objects. At the same time, it is able to write responses in the correct format.
-The language server communicates with the LSP client on a standard input and output, so we simply use the `dispatcher` with the standard `std::cin` and `std::cout` objects to communicate with the LSP client. +The language server communicates with the LSP client on the standard input and output, so we use the `dispatcher` with the standard `std::cin` and `std::cout` objects to communicate with the LSP client. -The DAP communicates using TCP/IP, which is less straightforward. Before the VS Code extension starts the language server, it finds a free TCP port and passes it as an argument to the language server executable. The `TCP handler` then starts listening on that port. Once the user wants to start the macro tracer, the DAP client connects to the port on localhost. The `tcp_handler` accepts the TCP client and creates a `dispatcher` and a `dap_server`. Once the DAP communication ends, both the `dispatcher` and the `dap_server` are destroyed and the `tcp_handler` starts listening again for the next DAP session. Thanks to the ASIO library (see [[Third party libraries]]) implementation of the `std::iostream` interface, the `dispatcher` is able to completely abstract from the fact that it is communicating through TCP and not through the standard IO. +The DAP communicates using TCP/IP, which is less straightforward. Before the VS Code extension starts the language server, it finds a free TCP port and passes it as an argument to the language server executable. The `tcp_handler` then starts listening on that port. When the user starts the macro tracer, the DAP client connects to the port on localhost. The `tcp_handler` accepts the TCP client and creates a `dispatcher` and a `dap_server`. Once the DAP communication ends, both the `dispatcher` and the `dap_server` are destroyed and the `tcp_handler` starts listening again for the next DAP session.
Thanks to the ASIO library's implementation of the `std::iostream` interface (see [[third party libraries]]), the `dispatcher` is able to abstract completely from the fact that it is communicating through TCP and not through the standard IO. diff --git a/docs/Language-server-pages/LSP-and-DAP-server.md b/docs/Language-server-pages/LSP-and-DAP-server.md index 1a093ac2c..02a6fc69f 100644 --- a/docs/Language-server-pages/LSP-and-DAP-server.md +++ b/docs/Language-server-pages/LSP-and-DAP-server.md @@ -1,11 +1,9 @@ -LSP and DAP Server ------------------- -The servers are able to process incoming LSP and DAP requests. They get the messages in a form of already parsed JSONs. Then they extract the name of the requested method with its parameters from the message and call the corresponding method with the parameters encoded as JSON. +The servers are able to process incoming LSP and DAP requests. They get the messages in the form of already parsed JSON objects. Then they extract the name of the requested method with its parameters from the message and call the corresponding method with the parameters encoded in JSON format. -There are two server implementations: `lsp_server` and `dap_server`. Both inherit from an abstract class called `server`. They implement protocol-specific processing of messages — although the protocols are quite similar (both are based on RPC), each protocol has different initialization and finalization, different message format, etc. +There are two server implementations: `lsp_server` and `dap_server`. Both inherit from an abstract class called `server`. They implement protocol-specific processing of messages — although the protocols are quite similar (both are based on RPC), each protocol has its own initialization and finalization, a different message format, etc. -The functionality of servers is divided into `features`.
Each feature implements several LSP or DAP methods by unpacking the arguments of the respective method and calling corresponding parser library function. During initialization, each feature adds its methods to the server’s list of implemented methods. The `lsp_server` uses three features: +The functionality of servers is divided into `features`. Each feature implements several LSP or DAP methods by unpacking the arguments of the respective method and calling the corresponding parser library function. During initialization, each feature adds its methods to the server’s list of implemented methods. The `lsp_server` uses three features: - *Text synchronization feature*, which handles the notifications about the state of open files in the editor. @@ -15,7 +13,7 @@ The functionality of servers is divided into `features`. Each feature implements The following table shows the list of all implemented LSP methods and the classes where the implementations lie. -| **Component** | **LSP Method name** | +| **Component** | **LSP Method Name** | |:--------------|:------------------------------------| | `lsp_server` | initialize
shutdown
exit
textDocument/publishDiagnostics| | Text synchronization feature| textDocument/didOpen
textDocument/didChange
textDocument/didClose
textDocument/semanticHighlighting| @@ -23,19 +21,19 @@ The following table shows the list of all implemented LSP methods and the classe |Language feature| textDocument/definition
textDocument/references
textDocument/hover
textDocument/completion| | -The DAP server uses only one feature — the Launch feature, which handles stepping through the code and retrieving information about both variables and stack trace. The following table shows the list of all implemented DAP methods: +The DAP server uses only one feature — the launch feature, which handles stepping through the code and retrieving information about both variables and the stack trace. The following table shows the list of all implemented DAP methods: -| **Class** | **DAP Method name** | +| **Class** | **DAP Method Name** | |:----------|:---------------------------------| |`dap_server`| `initialize`
`disconnect`
`launch`| |`feature_launch`| `setBreakpoints`
`configurationDone`
`threads`
`stackTrace`
`scopes`
`next`
`stepIn`
`variables`
`continue`
`stopped`
`exited`
`terminated`| | -Response with result +Response With Result -------------------- According to the LSP and the DAP, the server is required to send messages back to the LSP/DAP client either as responses to requests (e.g. `hover`), notifications (e.g. textDocument/publishDiagnostics notification) or events (e.g. stopped event). Features require reference to an instance of the `response_provider` interface that provides methods `respond` and `notify` for sending messages back to the LSP client. Both LSP and DAP server classes implement the `response_provider` to form protocol-specific JSON with the arguments. -The servers then send the JSON to the LSP/DAP client using the `send_message_provider` interface. At this point, the final complete JSON response is formed. The `send_message_provider` then adds the message header and serializes the JSON using the JSON for Modern C++ library (see [[Third party libraries]]). +The servers then send the JSON to the LSP/DAP client using the `send_message_provider` interface. At this point, the final complete JSON response is formed. The `send_message_provider` then adds the message header and serializes the JSON using the JSON for Modern C++ library (see [[third party libraries]]). The only implementation of the `send_message_provider` interface is the [`dispatcher`](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/IO-handling). diff --git a/docs/Language-server-pages/LSP-and-DAP.md b/docs/Language-server-pages/LSP-and-DAP.md index 70c69c1ad..b413de8e9 100644 --- a/docs/Language-server-pages/LSP-and-DAP.md +++ b/docs/Language-server-pages/LSP-and-DAP.md @@ -1,11 +1,12 @@ Language Server Protocol ------------------------ -[Language Server Protocol](https://microsoft.github.io/language-server-protocol/) is used to extend code editors with support for additional programming languages. LSP defines 2 communicating entities: a client and a server. The LSP client is editor-specific and wraps interaction with the user. +The [Language Server Protocol](https://microsoft.github.io/language-server-protocol/) is used to extend code editors with support for additional programming languages. The LSP defines two communicating entities: a client and a server. The LSP client is editor-specific and wraps interaction with the user.
The LSP server is language-specific and provides information about the source code. +The [Language Server Protocol](https://microsoft.github.io/language-server-protocol/) is used to extend code editors with support for additional programming languages. The LSP defines 2 communicating entities: a client and a server. The LSP client is editor-specific and wraps interaction with the user. The LSP server is language-specific and provides information about the source code. -The main purpose of the LSP is to allow the language server to provide language-specific response to various user interactions with the code editor. Messages that flow through LSP can be divided into three categories: +The main purpose of the LSP is to allow the language server to provide a language-specific response to various user interactions with the code editor. Messages that flow through LSP are divided into three categories: -- **Parsing results presentation** Messages from the first category allow the language server to send results of source code analysis to the LSP client. The editor is then able to show them to the user. For example, when the user clicks on a symbol in HLASM code and then uses the ‘Go to definition’ function, the LSP client sends a request to the language server with the name of currently open file and current location in the file. The server is then expected to send back the location of the definition, so the editor can present it to the user (e.g. the editor moves the caret to the definition location). All such messages are listed in the following table: +- **Parsing results presentation** +Messages from the first category allow the language server to send results of source code analysis to the LSP client. The editor is then able to show them to the user. 
For example, when the user clicks on a symbol in HLASM code and then uses the ‘Go to definition’ function, the LSP client sends a request to the language server with the name of the currently open file and the current location in the file. The server is then expected to send back the location of the definition, so the editor can present it to the user (e.g. the editor moves the caret to the definition location). All such messages are listed in the following table: | Message | Description | |:------------------------|:------------| @@ -15,21 +16,18 @@ The main purpose of the LSP is to allow the language server to provide language- | textDocument/completion |The client sends a position in an open file and how a completion box was triggered (i.e. with what key, automatically/manually). The server responds with a list of strings suggested for completion at the position.| |textDocument/publishDiagnostics|The server sends diagnostics to the client. A diagnostic represents a problem with the source code, e.g. compilation errors and warnings.| - -- **Editor state and file content synchronization** Messages from the second category flow mainly from the client to the server and ensure that the server has enough information to correctly analyze source code. All such messages can be found in the following table: +- **Editor state and file content synchronization** +Messages from the second category flow mainly from the client to the server and ensure that the server has enough information to correctly analyze source code. All such messages are listed in the following table: | Message | Description | |:--------------------------|:------------| | textDocument/didOpen
textDocument/didChange
textDocument/didClose|The server is notified whenever the user opens a file, changes contents of an already open file or closes a file in the editor.| - | workspace/didChangeWatchedFiles|The client notifies the server when a watched file is changed outside of the editor. Watched files selector is defined when the client is started (in the extension component).| + | workspace/didChangeWatchedFiles|The client notifies the server when a watched file is changed outside of the editor. The watched files selector is defined when the client is started (in the extension component).| | workspace/didChangeWorkspaceFolders|The client notifies the server that the user has opened or closed a workspace.| +- **LSP initialization and finalization** +Lastly, there are several messages that handle protocol initialization and finalization. -- **LSP initialization and finalization** Lastly, there are several messages that handle protocol initialization and finalization. - - - - -LSP is based on [JSON RPC](https://www.jsonrpc.org/specification). There are two types of interaction in JSON RPC: requests and notifications. Both of them carry the information to invoke a method on the recipient side — name of the method and its arguments. The difference between the two is that each request requires a response containing the result of the method, whereas the notifications do not. +LSP is based on [JSON RPC](https://www.jsonrpc.org/specification). There are two types of interaction in JSON RPC: requests and notifications. Both of them carry the information to invoke a method on the recipient side — the name of the method and its arguments. The difference between the two is that each request requires a response containing the result of the method, whereas the notifications do not. The LSP uses the JSON RPC specification and further specifies how messages are transferred and defines methods, their arguments, responses and semantics.
A raw message sent from the client to the server is shown in the following snippet: @@ -41,15 +39,15 @@ The LSP uses the JSON RPC specification and further specifies how messages are t The raw messages have HTTP-like headers. The only mandatory header is `Content-Length`, which tells the recipient the length of the following message. The JSON itself is sent after the header. -Inside the JSON, there is a name of the method to be invoked and parameters to pass to the method. In this case, the client is sending a notification that file `C:/Users/admin/Documents/source.hlasm` was closed in the editor by the user. As it is a notification, there must not be any response. +Inside the JSON, there is the name of the method to be invoked and the parameters to pass to it. In this case, the client sends a notification that the file `C:/Users/admin/Documents/source.hlasm` was closed in the editor by the user. As it is a notification, there must not be any response. -On top of this basic protocol, LSP defines methods and their semantics to cover common functionality that users expect when programming in an editor. List of all methods implemented in the language server can be found in [[LSP and DAP server]]. +On top of this basic protocol, LSP defines methods and their semantics to cover common functionality that users expect when programming in an editor. A list of all methods implemented in the language server can be found in [[LSP and DAP server]]. -DAP ---- +Debug Adapter Protocol +---------------------- -[Debug Adapter Protocol](https://microsoft.github.io/debug-adapter-protocol/) is used to extend code editors with debugging support for additional programming languages. We use it to provide the user with the ability to trace how the HLASM compiler processes source code step by step. The user can see the values of compile-time variables and follow the expansion of macros in debug-like experience.
+The [Debug Adapter Protocol](https://microsoft.github.io/debug-adapter-protocol/) (DAP) is used to extend code editors with debugging support for additional programming languages. We use it to provide the user with the ability to trace how the HLASM compiler processes source code step by step. The user can see the values of compile-time variables and follow the expansion of macros with a debug-like experience. -The communication in DAP is between an editor or an IDE and a debugger. The editor notifies the debugger about the user actions, e.g. when a breakpoint is set or when the user uses step in/step over buttons. The debugger informs the editor about the state of the debugged application, for example when the debugger stopped because it hit a breakpoint. While it is stopped, the debugger sends information about program stack, variables valid in current debugger scope and its values. +The communication in the DAP takes place between an editor or an IDE and a debugger. The editor notifies the debugger about the user actions, e.g. when a breakpoint is set or when the user uses the step in/step over buttons. The debugger informs the editor about the state of the debugged application, for example, when the debugger has stopped because it hit a breakpoint. While it is stopped, the debugger sends information about the program stack, the variables valid in the current debugger scope and their values. -DAP is very similar to LSP. Although the ideas behind DAP are nearly the same, DAP is not based on the JSON RPC. Instead, DAP specifies its own implementation of remote procedure call, still using JSON as the basic carrier of the messages. DAP has requests and events — requests always go from the client to the server and require response. Events are the same as the notifications from JSON RPC that are sent from the server to the client. The similarity allows our language server component to share a lot of code between the implementations of the protocols.
+The DAP is very similar to the LSP. Although the ideas behind the two protocols are nearly the same, the DAP is not based on JSON RPC. Instead, the DAP specifies its own implementation of remote procedure call, still using JSON as the basic carrier of the messages. The DAP has requests and events — requests always go from the client to the server and require a response. Events correspond to JSON RPC notifications and are sent from the server to the client. The similarity allows our language server component to share a lot of code between the implementations of the protocols. diff --git a/docs/Language-server-pages/Language-server-overview.md b/docs/Language-server-pages/Language-server-overview.md index 0c9ce766a..8d45f13ec 100644 --- a/docs/Language-server-pages/Language-server-overview.md +++ b/docs/Language-server-pages/Language-server-overview.md @@ -1,28 +1,28 @@ Architecture of language server. -The architecture of the Language server component is illustrated in the picture above. It communicates on the standard input/output via LSP with the LSP client and listens on a TCP port to provide DAP support for the macro tracer. The TCP communication is wrapped by class `tcp_handler`, which abstracts from the complexity of communicating through TCP/IP. +The architecture of the language server component is illustrated in the picture above. It communicates on the standard input/output via LSP with the LSP client and listens on a TCP port to provide DAP support for the macro tracer. The TCP communication is wrapped by the class `tcp_handler`, which abstracts from the complexity of communicating through TCP/IP. -The main purpose of the class [`dispatcher`](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/IO-handling) is to provide abstraction for the lowest level communication, which is shared by LSP and DAP.
It reads iostream to parse messages using the JSON for Modern C++ library (see [[Third party libraries]]) and stores them in the [`request_manager`](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/Request-manager) as `requests`. +The main purpose of the class [`dispatcher`](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/IO-handling) is to provide an abstraction for the lowest-level communication, which is shared by LSP and DAP. It reads messages from an iostream, parses them using the JSON for Modern C++ library (see [[third party libraries]]) and stores them in the [`request_manager`](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/Request-manager) as `requests`. -A `request` encapsulates one message that came from the client and is basically represented only by raw (but parsed) JSON. +A `request` encapsulates one message that came from the client and is represented only by raw (but parsed) JSON. `request_manager` stores `requests` in a queue and runs a worker thread that serves the requests one by one. As there is only one instance of `request_manager` running in the language server, it serializes requests from DAP and LSP (which come asynchronously from separate sources) into one queue. -`server` is an abstract class that implements protocol behavior that is common for both DAP and LSP — it basically implements Remote Procedure Call. Actual handling of LSP and DAP requests is implemented in `features`. Each `feature` contains implementation of several protocol requests or notifications. The features unwrap the arguments from JSON and call corresponding parser library methods. +`server` is an abstract class that implements protocol behavior that is common for both DAP and LSP — it implements the remote procedure call mechanism. The actual handling of LSP and DAP requests is implemented in `features`. Each `feature` contains the implementation of several protocol requests or notifications.
The features unwrap the arguments from the JSON and call the corresponding parser library methods. There are two implementations of the abstract `server` class: [`lsp_server`](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/LSP-and-DAP-server) and [`dap_server`](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/LSP-and-DAP-server). They both implement the initialization and finalization of protocol communication, which is a bit different for both protocols and both use features to serve protocol requests. -Example: hover request handling +Example: Hover Request Handling ------------------------------- A sequence diagram showing processing of the hover request. -The image above shows handling of the hover request in the language server. The hover request is sent from the LSP client to the `lsp_server` when the user hovers over the text of a file. The hover request contains location of the mouse cursor in text, i.e. the name of the file, the number of line and column where the cursor is. The LSP client then expects a response containing a string (possibly written in markdown language) to be shown in a tooltip box. +The image above shows how the hover request is handled in the language server. The hover request is sent from the LSP client to the `lsp_server` when the user hovers over the text of a file. The hover request contains the location of the mouse cursor in the text, i.e. the name of the file and the line and column numbers of the cursor position. The LSP client then expects a response containing a string (possibly formatted as Markdown) to be shown in a tooltip box. -The whole process begins with reading from the standard input by the LSP instance of the `dispatcher`. It first reads the header of the message, which contains the information about the length of the following JSON. Then it reads the JSON itself and deserializes it using the JSON for Modern C++ library (see [[Third party libraries]]).
All other components of the language server work only with the parsed representation of the message. The `dispatcher` adds the message to the `request_manager` and returns to reading the next message from the standard input. +The whole process begins with reading from the standard input by the LSP instance of the `dispatcher`. It first reads the header of the message, which contains information about the length of the following JSON. Then it reads the JSON itself and deserializes it using the JSON for Modern C++ library (see [[third party libraries]]). All other components of the language server work only with the parsed representation of the message. The `dispatcher` adds the message to the `request_manager` and returns to reading the next message from the standard input. The request in the `request_manager` either waits in a queue to be processed, or, if the queue was empty, the worker thread is woken up from sleep using conditional variable. The worker then passes the JSON to the `lsp_server`, which looks at the name of the method written in the message and calls the method “hover” from the language feature. -The hover method unpacks the actual arguments from JSON and converts any URIs to paths using the cpp-netlib URI library. Then, it calls the hover method from the parser library, which returns a string to be shown in the tooltip next to the hovering mouse. The language feature then wraps the return value back in JSON and calls the `respond` method of its `response_provider` implemented by the `lsp_server`. +The hover method unpacks the actual arguments from the JSON and converts any URIs to paths using the cpp-netlib URI library. Then, it calls the hover method from the parser library, which returns a string to be shown in the tooltip next to the hovering mouse. The language feature then wraps the return value back in JSON and calls the `respond` method of its `response_provider` implemented by the `lsp_server`. 
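The framing and queuing steps of this walkthrough can be sketched in standalone C++. This is a minimal sketch under stated assumptions. It is not the project's `dispatcher` or `request_manager`. It assumes the `Content-Length` header used by the LSP and DAP base protocols, keeps each message as a raw `std::string` instead of a parsed JSON for Modern C++ object, and leaves out request invalidation. The names `read_message`, `write_message`, `request_queue` and `serve_all` are hypothetical.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <istream>
#include <mutex>
#include <optional>
#include <ostream>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <utility>

// Reads one message framed as "Content-Length: N\r\n\r\n" followed by N bytes of JSON.
// Returns std::nullopt at end of input or when no Content-Length header was seen.
std::optional<std::string> read_message(std::istream& in) {
    const std::string prefix = "Content-Length: ";
    std::optional<std::size_t> length;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) break;  // an empty line ends the header part
        if (line.compare(0, prefix.size(), prefix) == 0)
            length = std::stoul(line.substr(prefix.size()));
    }
    if (!length) return std::nullopt;
    std::string body(*length, '\0');
    if (!in.read(body.data(), static_cast<std::streamsize>(body.size()))) return std::nullopt;
    return body;
}

// Writes one message with the same framing, then flushes it to the client.
void write_message(std::ostream& out, const std::string& json) {
    out << "Content-Length: " << json.size() << "\r\n\r\n" << json;
    out.flush();
}

// One worker thread serves queued requests strictly one at a time, in arrival order.
class request_queue {
public:
    explicit request_queue(std::function<void(const std::string&)> handler)
        : handler_(std::move(handler)), worker_([this] { run(); }) {}

    ~request_queue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains the remaining requests first
    }

    // Called by reader threads; the mutex serializes concurrent producers.
    void add(std::string request) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            requests_.push(std::move(request));
        }
        cv_.notify_one();  // wake the worker if it is sleeping
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return stopping_ || !requests_.empty(); });
            if (requests_.empty()) return;  // stopping, nothing left to serve
            std::string request = std::move(requests_.front());
            requests_.pop();
            lock.unlock();  // producers may enqueue while this request is served
            handler_(request);  // stands in for handing the request to a server object
        }
    }

    std::function<void(const std::string&)> handler_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::string> requests_;
    bool stopping_ = false;
    std::thread worker_;  // declared last: starts only after the members above exist
};

// Hypothetical demo: decode every framed message in `input`, queue it, and let
// the worker "serve" it by echoing it back with a fresh header.
std::string serve_all(const std::string& input) {
    std::istringstream in(input);
    std::ostringstream out;
    {
        request_queue queue([&out](const std::string& json) { write_message(out, json); });
        while (auto message = read_message(in)) queue.add(std::move(*message));
    }  // leaving the scope stops and joins the worker
    return out.str();
}
```

The worker releases the mutex before calling the handler. A reading thread is therefore never blocked while a request is being served. The single worker still guarantees that requests are handled one at a time and in arrival order.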
The `lsp_server` wraps JSON arguments into a LSP response and uses the `send message provider` implemented by `dispatcher` to send it to the LSP client. The `dispatcher` serializes the JSON, adds the header with the length of the JSON and writes the message to a standard output. Finally, all methods return and the worker thread in `request_manager` looks for another request. If there is none, it goes to sleep. diff --git a/docs/Language-server-pages/Language-server.md b/docs/Language-server-pages/Language-server.md index 9d7e63727..82552f2ce 100644 --- a/docs/Language-server-pages/Language-server.md +++ b/docs/Language-server-pages/Language-server.md @@ -1,4 +1,4 @@ -\[chap:lang\_server\] The purpose of the Language server is to implement the Language Server Protocol (LSP) and the Debug Adapter Protocol (DAP) and to provide access to the parser library by using them. It has to deserialize and serialize LSP and DAP messages, extract parameters of particular methods and then serve the requests by invoking functionality of parser library. +\[chap:lang\_server\] The purpose of the language server is to implement the Language Server Protocol (LSP) and the Debug Adapter Protocol (DAP) and to provide access to the parser library by using them. It deserializes and serializes LSP and DAP messages, extracts parameters of particular methods and then serves the requests by invoking the functionality of the parser library. The language server component is described in the following pages: 1. [[LSP and DAP]] diff --git a/docs/Language-server-pages/Request-manager.md b/docs/Language-server-pages/Request-manager.md index afb6a89e2..62ef9612a 100644 --- a/docs/Language-server-pages/Request-manager.md +++ b/docs/Language-server-pages/Request-manager.md @@ -1,7 +1,5 @@ -Request Manager ---------------- -`request_manager` encapsulates a queue of requests with a worker thread that processes them. 
There may be up to two [`dispatcher`](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/IO-handling) instances in the language server: one for LSP and one for DAP. Both of them add the requests they parse into one `request_manager`. It is necessary to process the requests one by one, because the parser library cannot process more requests at the same time. +`request_manager` encapsulates a queue of requests with a worker thread that processes them. There can be up to two [`dispatcher`](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/IO-handling) instances in the language server: one for the LSP and one for the DAP. Both of them add the requests they parse into one `request_manager`. The requests must be processed one by one, because the parser library cannot process multiple requests at the same time. Asynchronous communication is handled by separating the communication into threads: @@ -11,18 +9,18 @@ Asynchronous communication is handled by separating the communication into threa - Worker thread in `request_manager` that processes each request using the [`lsp_server`](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/LSP-and-DAP-server) or the [`dap_server`](https://github.com/eclipse/che-che4z-lsp-for-hlasm/wiki/LSP-and-DAP-server) and ultimately the parser library. -The threads are synchronized in two ways: First, there is a mutex that prevents the LSP and the DAP threads from adding to the request queue simultaneously. Second, there is a conditional variable to control the worker thread. +The threads are synchronized in two ways. First, there is a mutex that prevents the LSP and DAP threads from adding to the request queue simultaneously. Second, there is a condition variable that controls the worker thread. -`request_manager` additionally incorporates a mechanism for invalidating requests that have been obsoleted by new requests. The obsoleting of requests is done by a cancellation token.
It is shared between the parser library and the `request_manager`. When set to true, the results of current request or notification are no longer needed, the parser library stops all parsing and return as soon as possible. +`request_manager` additionally incorporates a mechanism for invalidating requests that are made obsolete by new requests. The invalidation uses a cancellation token that is shared between the parser library and the `request_manager`. When the token is set to true, the results of the current request or notification are no longer needed, so the parser library stops all parsing and returns as soon as possible. -When a new request arrives, all previous requests (including the currently processed one) that concern the same file are invalidated. However, they cannot be simply removed from the queue. They still have to be processed as they may carry information that must not be discarded (e.g. changes to contents of a file). The parser library processes the request but does not reparse any source files. +When a new request arrives, all previous requests (including the currently processed one) that concern the same file are invalidated. However, they cannot simply be removed from the queue. They must still be processed because they might carry information that must not be discarded (e.g. changes to the contents of a file). The parser library processes such a request but does not reparse any source files. -### Example of request invalidating +### Example of Request Invalidation For example, when a user starts changing a file, every character he writes is passed to the language server as a textDocument/didChange notification. Each such notification is processed in two stages: 1. The parser library changes the internal representation of the text document. -2. The parser library starts the parsing of the file to update diagnostics and highlighting. This may take some time. +2. The parser library starts the parsing of the file to update diagnostics and highlighting.
This might take some time. -When more didChange notifications come in succession, their first parts must be executed with all the notifications to keep the internal representation of the file updated. However, the user is interested only in diagnostics and semantic highlighting for the current state of the text, so we need to parse the file only once — after the last notification. +When several `didChange` notifications arrive in succession, the first stage must be executed for every notification to keep the internal representation of the file up to date. However, the user is interested only in diagnostics and semantic highlighting for the current state of the text, so the file needs to be parsed only once, after the last notification. diff --git a/docs/Libraries-configuration.md b/docs/Libraries-configuration.md index 6f19baf48..79a47873b 100644 --- a/docs/Libraries-configuration.md +++ b/docs/Libraries-configuration.md @@ -1,49 +1 @@ -The parser library approaches the dependency resolution in a way similar to the mainframe. On mainframe, you would have to define the locations of your dependencies in a JCL file (more on JCL [here](https://www.ibm.com/support/knowledgecenter/zosbasics/com.ibm.zos.zjcl/zjclc_basicjclconcepts.htm)). As the user may want to include tens of dependencies for multiple open codes, a source code management tool called [Endevor](https://en.wikipedia.org/wiki/Endevor) groups these dependencies into so-called *processor groups*. Then, the user only has to assign a processor group to the open code and the Endevor does the dependency resolution for him. - -To provide similar experience with local files, the parser library simulates this behavior. If the user wants to include dependencies in his project, he has to define 2 configuration files inside his workspace: *pgm\_conf.json* and *proc\_grps.json*. The workspace component of the parser library then processes the configurations, retrieving their values upon initialization.
Moreover, each time a save command is issued on any configuration file, the configuration values are reloaded via `load_config` method. - -### Processor groups - -The proc\_grps configuration file contains a JSON array of possible processor groups, which consist of a name and an array of folder paths (may be relative to the root of the workspace). An example can be found in \[lst:proc\_grps\]. - -Whenever `load_config` is called, the workspace retrieves these processor groups from the configuration file and creates libraries. The libraries provide information about paths to their dependency files. During the parsing, the workspace retrieves the library corresponding to the provided processor group name and uses it to search for a wanted macro or copy file. - -### Program configuration - -The pgm\_conf configuration file contains a JSON array of program names (or wildcards \[section:wildcard\]), matched to their processor groups. It serves as a list of the HLASM open code files and states the libraries (in form of processor groups) that contain the dependencies of each open code. An example can be found in \[lst:pgm\_conf\]. - -From this configuration, the workspace simply remembers the processor group - open code mapping. - - { - "pgroups": [ - { - "name":"GROUP1", - "libs": [ - "ASMMAC/", - "C:/SYS.ASMMAC" - ] - }, - { - "name":"GROUP2", - "libs": [ - "G2MAC/", - "C:/SYS.ASMMAC" - ] - } - ] - } - - - { - "pgms": [ - { - "program": "source_code", - "pgroup": "GROUP1" - }, - { - "program": "second_file", - "pgroup": "GROUP2" - }, - ] - } - \ No newline at end of file +#REDIRECT[[Configuration of libraries]] \ No newline at end of file diff --git a/docs/Macro-tracer.md b/docs/Macro-tracer.md index c687c44de..a0211a3e9 100644 --- a/docs/Macro-tracer.md +++ b/docs/Macro-tracer.md @@ -1,33 +1,33 @@ -The macro tracer allows the user to track how the HLASM source code is assembled in experience similar to common debugging tools. 
The user is able to see step by step how CA instructions are interpreted and how macros are expanded. +The macro tracer allows the user to track how the HLASM source code is assembled with an experience similar to common debugging tools. The user is able to see step by step how CA instructions are interpreted and how macros are expanded. -This is achieved by implementing the Debug Adapter Protocol. The protocol itself is implemented in the language server component, which uses the macro tracer component. +This is achieved by implementing the Debug Adapter Protocol (DAP). The protocol itself is implemented in the language server component, which uses the macro tracer component. -DAP functionality mapping +DAP Functionality Mapping ------------------------- -The DAP was originally designed to communicate between an IDE or an editor and a debugger or a debug adapter. For example, when debugging a C++ application in Visual Studio Code, the editor communicates through DAP with a debugger that is attached to a compiled C++ application. Contrary to this, the macro tracer does not run with compiled binary, it only uses the analyzer to simulate the compilation process of high level assembler. +The DAP was originally designed for communication between an IDE or an editor and a debugger or a debug adapter. For example, when debugging a C++ application in Visual Studio Code, the editor communicates through DAP with a debugger that is attached to a compiled C++ application. In contrast, the macro tracer does not run a compiled binary; it only uses the analyzer to simulate the compilation process of the High Level Assembler. -However, even though we are not implementing a real debugger, it makes very good sense to use a debugging interface for tracing the simulation. Parts of the debugging interface that we use in a macro tracer are as follows.
+However, even though we are not implementing a real debugger, it makes sense to use a debugging interface to trace the simulation. The parts of the debugging interface that are used in the macro tracer are as follows: -- **Instruction pointer** -The instruction pointer is commonly shown in debuggers by highlighting a line of code that is going to be executed next. This is applicable to HLASM without change, since all the instructions are processed one by one in a well-defined order. +- **Instruction Pointer** +The instruction pointer is commonly shown in debuggers by highlighting the line of code that is to be executed next. This is applicable to HLASM without change, since all the instructions are processed one by one in a well-defined order. - **Breakpoints** -The user can set a breakpoint when he is interested in tracing only particular section of the code. The compilation simulation will stop when it reaches a line with a breakpoint. +The user can set a breakpoint when he is interested in tracing a particular section of the code. The compilation simulation stops when it reaches a line with a breakpoint. - **Continue** The user can restart a paused simulation by using the continue function just as in any debugger. -- **Step in and step over** -In debuggers, it is possible to use step in / step over functions to debug an implementation of subroutine or to skip it and continue after the application returns from the subroutine. +- **Step In and Step Over** +In debuggers, the step in and step over functions can be used to debug the implementation of a subroutine or to skip it and continue after the application returns from the subroutine.
In HLASM, this can be applied to macros and COPY instructions: if the user is interested in what happens inside a macro or a COPY file, he can use step in. Step over skips to the next instruction in the same file. - **Variables** The same way common debuggers show values of runtime variables, the macro tracer uses the same functionality to show values of set symbols, macro parameter values and ordinary symbols. It is also possible to visualize attributes of symbols. -- **Call stack** -The call stack makes sense with the macro tracer too. It can show the stack of currently processed macros and COPY files. Moreover, macros have local set symbols and parameters, so each stack frame may show a different set of valid variables. +- **Call Stack** +The call stack can be used with the macro tracer too. It shows the stack of currently processed macros and COPY files. Moreover, macros have local set symbols and parameters, so each stack frame might show a different set of valid variables. -All described functionality (and more) is supported by the DAP. +All the functionality described above (and more) is supported by the DAP. Macro tracer architecture ------------------------- @@ -40,14 +40,14 @@ The macro tracer architecture is shown above. It is also the `debugger’s` responsibility to extract data from the `context` used by the `analyzer` and to transform them into a form compatible with the DAP. -`Debugger` uses an interface `variable` which represents the variable as it is shown to the user — most importantly, it is a name-value pair. The `variable` interface has four implementations: +The `debugger` uses the `variable` interface, which represents a variable as it is shown to the user: most importantly, a name-value pair. The `variable` interface has four implementations: - `set_symbol_variable` - `ordinary_symbol_variable` - `macro_parameter_variable` - `attribute_variable` -First three represent a HLASM symbol of respectable type.
They adapt the `context` representation of the symbols to DAP variables. +The first three represent a HLASM symbol of the respective type. They adapt the `context` representation of the symbols to DAP variables. The `attribute_variable` represents attributes of all types of symbols. It does not access context, and it is only used by the rest of `variables` to show their attributes. @@ -56,15 +56,20 @@ Debugger The `debugger` component is the core of the macro tracer implementation. When the user starts debugging, the method `launch` is called from the language server component. The `debugger` creates `analyzer` and starts the analysis in a separate thread. The `debugger` implements `processor_tracer` interface, which only has one method — `statement`. The `analyzer` calls the `statement` method every time a next statement is about to be processed. -This implementation makes it possible for the `debugger` to stop the analysis using a conditional variable. When it sees fit (e.g. when a breakpoint was hit), the `debugger` can put the thread to sleep and wait for further user interaction. At the same time, it notifies the language server through `debug_event_consumer` interface that the analysis has stopped. +This implementation makes it possible for the `debugger` to stop the analysis using a condition variable. When it sees fit (e.g. when a breakpoint is hit), the `debugger` can put the thread to sleep and wait for further user interaction. At the same time, it notifies the language server through the `debug_event_consumer` interface that the analysis has stopped. There are three important structures in the DAP: -- **Stack frame** Stack frame represents one item in the call stack. Each frame has a name that is shown to the user and points to a line in the source code. In the macro tracer, each frame points either to the next instruction, to a macro call or to a COPY instruction. +- **Stack Frame** +A stack frame represents one item in the call stack.
Each frame has a name that is shown to the user and points to a line in the source code. In the macro tracer, each frame points either to the next instruction, to a macro call or to a COPY instruction. -- **Scope** Each stack frame may have scopes. A scope is simply a group of variables used to make them organized for the user. The macro tracer uses three scopes: local variables, global variables and ordinary symbols. -- **Variable** Each scope has arbitrary number of variables. Each variable has a name and a value. They may be further structured and may have additional child variables. Therefore, the DAP can be used to present arbitrary tree of variables to the user. shows an example regarding nested macro parameters. +- **Scope** +Each stack frame can have scopes. A scope groups variables to keep them organized for the user. The macro tracer uses three scopes: local variables, global variables and ordinary symbols. +- **Variable** +Each scope contains an arbitrary number of variables. Each variable has a name and a value. Variables can be further structured and can have additional child variables. Therefore, the DAP can be used to present an arbitrary tree of variables to the user. + +The following example demonstrates nested macro parameters. @@ -83,4 +88,4 @@ There are three important structures in the DAP: -While the thread is stopped, the editor sends requests to display information about the current context.
It is the `debugger’s` responsibility to extract a list of stack frames from the context, return a list of scopes for each stack frame and a list of variables for each scope. It does not have to deal with the complexity of different types of set symbols and macro parameters, which is done by the implementations of the `variable` interface. diff --git a/docs/Parser-library-API.md b/docs/Parser-library-API.md index 47da9208a..7af4108f8 100644 --- a/docs/Parser-library-API.md +++ b/docs/Parser-library-API.md @@ -1,4 +1,4 @@ -First of all, the workspace manager component is the only public interface of the parser library. The API design is based on LSP and DAP, most of the API is just LSP/DAP rewritten in C++. The API uses the observer pattern for DAP events and notifications originating in parser library (e.g. textDocument/publishDiagnostics). +The workspace manager component is the only public interface of the parser library. The API design is based on LSP and DAP; most of the API is just LSP/DAP rewritten in C++. The API uses the observer pattern for DAP events and notifications originating in the parser library (e.g. textDocument/publishDiagnostics). The API methods can be divided into three categories: @@ -6,38 +6,37 @@ The API methods can be divided into three categories: - Parsing results presentation - Macro tracer -### Editor state and file content synchronization +### Editor State and File Content Synchronization | Method | Description | |:--------------------------------------------------|:------------| -|`did_open (file name, file content)`
`did_change (file name, changes)`
`did_close (file name)`|Three methods that are called whenever the user opens a file, changes contents of an already opened file or closes a file in the editor.| -|`did_change_watched_files (file paths)`|Method, that is called when a file from a workspace has been changed outsize of the editor| +|`did_open (file name, file content)`
`did_change (file name, changes)`
`did_close (file name)`|Three methods that are called whenever the user opens a file, changes the contents of an already opened file or closes a file in the editor.| +|`did_change_watched_files (file paths)`|A method that is called when a file from a workspace is changed outside of the editor.| | `add_workspace (ws name, ws path)`
`remove_workspace (ws path` |Methods that are called when the user opens or closes a workspace in the editor| | -All the methods from the first category are listed in the table above. There are two types of files that need to be synchronised: +All the methods from the first category are listed in the table above. There are two types of files that need to be synchronized: -- Files, that the user has opened in the editor. Those files are being edited by the user and their content may be different from the files actually saved in the filesystem. +- Files that the user has opened in the editor. Those files are being edited by the user and their content might differ from the versions saved in the filesystem. -- Files, that the parser library opens from the hard disk, because they are needed to parse opened files (e.g. a macro that is used by an opened file) +- Files that the parser library opens from the hard disk because they are needed to parse opened files (e.g. a macro that is used by an opened file). -So the parser library is allowed to load arbitrary files from the disk, and use its contents until such file is opened in the editor. From that point on, the only source of truth for the contents of the file are the did\_change notifications. Once the file is closed in the editor, the parser library is again allowed to rely on its contents in the filesystem. +The parser library can load arbitrary files from the disk and use their contents until that file is opened in the editor. From that point on, the did\_change notifications are the only source of truth for the contents of the file. Once the file is closed in the editor, the parser library is again allowed to rely on its contents in the filesystem. -### Parsing results presentation +### Parsing Results Presentation | Method | Description | |:-----------------------------------------------------|:------------| -| `definition(file name, caret position)` |The method gets a position in an opened file.
If there is a symbol, the method returns position of definition of that symbol| -| `references(file name, caret position)` |The method gets a position in an opened file. If there is a symbol, the method returns list of positions where the symbol is used| -| `hover(file name, mouse position)` |The method gets a position in an opened file where the user points with cursor. Returns list of strings to be shown in a tooltip window| -| `completion(file name,mouse position, trigger info)` |The method gets a position in an opened file and how the completion box was triggered (i. e. with what key, automatically/manually). Returns list of strings suggested for completion at the position| +| `definition(file name, caret position)` |This method gets a position in an opened file. If there is a symbol, the method returns the position of that symbol's definition.| +| `references(file name, caret position)` |This method gets a position in an opened file. If there is a symbol, the method returns a list of positions where the symbol is used.| +| `hover(file name, mouse position)` |This method gets a position in an opened file where the user points with the cursor. It returns a list of strings to be shown in a tooltip window.| +| `completion(file name, mouse position, trigger info)` |This method gets a position in an opened file and information on how the completion box was triggered (i.e. with what key, automatically or manually). It returns a list of strings suggested for completion at the position.| | | | -All the methods from the second category are listed in the table above. They get position of caret or mouse cursor in a file and are expected to return information about the place in the code. For example, method `hover` is called when the user points at some word in the code and waits for a short time. The method returns a string that the editor shows in the tooltip window at the position. Typically, the tooltip would show type of the variable and its value, if known.
+All the methods from the second category are listed in the table above. They get the position of the caret or mouse cursor in a file and return information about the place in the code. For example, the method `hover` is called when the user points at some word in the code and waits for a short time. The method returns a string that the editor shows in the tooltip window at the position. Typically, the tooltip shows the type of the variable and its value, if known. -Additionally, the parser library presents its results using the observer pattern. There are two interfaces: highligting and diagnostics consumer. Each of them has method `consume` that gets updated information as parameter whenever there is an update. Any potential user of the library (e.g. the language server component) just has to implement the interfaces to process the results. +Additionally, the parser library presents its results using the observer pattern. There are two interfaces: highlighting and diagnostics consumer. Each of them uses the `consume` method, which gets updated information as a parameter whenever there is an update. Any potential user of the library (e.g. the language server component) has to implement the interfaces to process the results. -### [[Macro tracer]] - -The [[macro tracer]] part of the API is again just DAP rewritten in C++. There are methods that are called when the user clicks on buttons to control the macro tracer: launch the tracer, step in, step over, continue and stop. Moreover, there are methods that retrieve information about current state of traced code: stack of macro calls and information about compile time variables. See [[Macro tracer]] for full description. +### [[Macro tracer|Macro Tracer]] +The [[macro tracer]] part of the API is also DAP rewritten in C++. There are methods that are called when the user clicks on buttons to control the macro tracer: launch the tracer, step in, step over, continue and stop. 
Moreover, there are methods that retrieve information about the current state of the traced code: a stack of macro calls and information about compile time variables. See [[Macro tracer]] for a full description. diff --git a/docs/Third-party-libraries.md b/docs/Third-party-libraries.md index ed6aa5a44..f603b0a35 100644 --- a/docs/Third-party-libraries.md +++ b/docs/Third-party-libraries.md @@ -1,60 +1,60 @@ -The project uses several third party libraries. Some of them are needed by language server to parse LSP messages and communicate through DAP. Also, we use a third party library to recognize syntax of HLASM. +The project uses several third party libraries. Some of them are needed by the language server to parse LSP messages and communicate through DAP. Also, we use a third party library to recognize the syntax of HLASM. - [**ASIO C++ library**](https://think-async.com/Asio/) -Asio is a cross-platform C++ library for network and low-level I/O programming that provides developers with a consistent asynchronous model using a modern C++ approach. We use it to handle TCP communication in a cross-platform way. Asio implements std::iostream wrappers around the TCP stream, which allows us to abstract from the actual source of the communication. +ASIO is a cross-platform C++ library for network and low-level I/O programming that provides developers with a consistent asynchronous model using a modern C++ approach. We use it to handle TCP communication in a cross-platform way. ASIO implements std::iostream wrappers around the TCP stream, which allows us to abstract from the source of the communication. - [**JSON for Modern C++**](https://github.com/nlohmann/json) -We use JSON for Modern C++ library to parse and serialize JSON. It is used in both LSP and DAP. It allows us to seamlessly traverse input JSON and extract the interesting values, as well as easily respond with valid JSON messages. +We use the JSON for Modern C++ library to parse and serialize JSON. 
It is used in both LSP and DAP. It allows us to seamlessly traverse JSON input, extract relevant values, and respond with valid JSON messages. - [**cpp-netlib URI**](https://github.com/cpp-netlib/uri) -Cpp-netlib URI library is used for parsing URI specified by the [RFC3986](https://tools.ietf.org/html/rfc3986), which is used by the LSP and DAP protocols to transfer paths to files. It is the responsibility of the language server to parse the URIs and convert them to file paths, so it is easier to work with them in the parser library. +The cpp-netlib URI library is used for parsing URIs (as specified by [RFC 3986](https://tools.ietf.org/html/rfc3986)), which the LSP and DAP protocols use to transfer file paths. It is the responsibility of the language server to parse the URIs and convert them to file paths, so that the parser library can work with plain file paths. -Usage of ANTLR4 within the project +Usage of ANTLR 4 ---------------------------------- -We have based part of our analyzer on ANTLR 4 parser generator. ANLTR 4 implements Adaptive LL(*) parsing strategy. +Part of our analyzer is based on the ANTLR 4 parser generator. ANTLR 4 implements the Adaptive LL(\*) parsing strategy. -### Adaptive LL(\*) parsing strategy +### Adaptive LL(\*) Parsing Strategy -Adaptive *LL(\*)* (or short *ALL(\*)*) parsing strategy is a combination of simple, efficient and predictable top-down *LL(k)* parsing strategy with power of *GLR* which can handle non-deterministic and ambiguous grammars. Authors move the grammar analysis to parse-time. This lets *ALL(\*)* handle any non-left-recursive context-free grammar rules and for efficiency it caches analysis results in lookahead DFA. +The Adaptive *LL(\*)* (or *ALL(\*)*) parsing strategy combines the simple, efficient and predictable top-down *LL(k)* parsing strategy with the power of *GLR*, which can handle non-deterministic and ambiguous grammars. The authors move the grammar analysis to parse time.
This lets *ALL(\*)* handle any non-left-recursive context-free grammar; for efficiency, it caches the analysis results in a lookahead DFA. -Theoretical time complexity can be viewed as a possible downside of *ALL(\*)*. Parsing of *n* symbols takes *O(n4)* in theory. In practice, however, *ALL(\*)* seems to outperform other parsers by order of magnitude. +Its theoretical time complexity can be viewed as a possible downside of *ALL(\*)*: parsing *n* symbols takes *O*(*n*<sup>4</sup>) time in theory. In practice, however, *ALL(\*)* seems to outperform other parsers by an order of magnitude. -Despite the theoretical *O*(*n*4) time complexity, it appears that the *ALL(\*)* behaves linear on most of the code, with no unpredictable performance or large footprint in practice. In order to support this, authors investigate the parse time vs file size for languages `C`, `Verilog`, `Erlang` and `Lua` files. They found very strong evidence of linearity on all tested languages (see the original paper for details). +Despite the theoretical *O*(*n*<sup>4</sup>) time complexity, *ALL(\*)* appears to behave linearly on most code, with no unpredictable performance or large footprint in practice. To support this claim, the authors investigated parse time against file size for files written in `C`, `Verilog`, `Erlang` and `Lua`. They found very strong evidence of linearity for all tested languages (see the original paper for details). -### ANTLR 4 pipeline +### ANTLR 4 Pipeline -ANTLR 4, similarly to any other conventional parser generator, processes the inputted code as follows: (1) breaks down the source string into tokens using *lexer* (2) builds parse trees using *parser* . +ANTLR 4, like other conventional parser generators, processes the input code in two steps: it breaks the source string down into tokens using a *lexer*, and then builds parse trees using a *parser*.
-This pipeline in ANTLR 4 is broken into following classes: +This pipeline in ANTLR 4 is split into the following classes: - `CharStream` -represents input code. +Represents the input code. - `Lexer` -breaks the inputted code into tokens. +Breaks the input code into tokens. - `Token` -token representation that includes important information like token type, position in code or the actual text. +Represents a single token, including important information such as the token type, its position in the code and the actual text. - `Parser` -builds parse trees. +Builds parse trees. - `TokenStream` -connects the lexer and parser. +Connects the lexer and the parser. The following picture sketches the described pipeline. -ANTLR 4 pipeline overview. Taken from . +ANTLR 4 pipeline overview. ### ANTLR Parser -The input to ANTLR is a grammar written in antlr-specific language that specifies the syntax of HLASM language (see the 193 grammar rules in the [[Grammar visualization]]). The framework takes grammar and generates source code (in C++) for a recognizer, which is able to tell whether input source code is valid or not. Moreover, it is possible to assign a piece of code that executes every time a grammar rule is matched by the recognizer to further process the matched piece of code and produce helper structures (statements). +The input to ANTLR is a grammar, written in an ANTLR-specific language, that specifies the syntax of the HLASM language (see the 193 grammar rules in the [[grammar visualization]]). The framework takes the grammar and generates source code (in C++) for a recognizer that can tell whether the input source code is valid. Moreover, a piece of code can be attached to a grammar rule; it executes every time the recognizer matches the rule, further processing the matched code and producing helper structures (statements). -### Parse-Tree walking +### Parse-Tree Walking ANTLR 4 offers two mechanisms for tree-walking: the parse-tree listeners and parse-tree visitors.
The listener can only be used to get a notification for each matched grammar rule. The visitor lets the programmer control the walk by explicitly calling methods to visit children. -We employ the *visitor* approach during evaluation of CA expressions, because we need to have ampler control over the evaluation (such as operator priority). +We employ the *visitor* approach when evaluating conditional assembly (CA) expressions, because we need finer control over the evaluation (for example, to respect operator priority). -The ANTLR 4 first generates `hlasmparserVisitor` and `hlasmparserBaseVisitor`. The former is an abstract class, the latter is a simple implementation of the former. Both classes define `visit` functions for every grammar rule. A visit function has exactly one argument — the context of the rule. The simple implementation executes `visitChildren()`. Our parse-tree visitor — the `expression_evaluator` — overrides `hlasmparserBaseVisitor`. In order to evaluate a sub-rule, we call `visit(ctx->sub_rule())`, where `ctx->sub_rule()` returns the context of the sub-rule. The `visit()` function matches appropriate function of the visitor based on the context type (for example, `visit(ctx->sub_rule())` would call `visiSub_rule(..)`). +ANTLR 4 first generates `hlasmparserVisitor` and `hlasmparserBaseVisitor`. The former is an abstract class; the latter is a simple implementation of it. Both classes define `visit` functions for every grammar rule. A visit function has exactly one argument: the context of the rule. The simple implementation just calls `visitChildren()`. Our parse-tree visitor, the `expression_evaluator`, derives from `hlasmparserBaseVisitor`. To evaluate a sub-rule, we call `visit(ctx->sub_rule())`, where `ctx->sub_rule()` returns the context of the sub-rule. The `visit()` function dispatches to the appropriate function of the visitor based on the context type (for example, `visit(ctx->sub_rule())` would call `visitSub_rule(..)`).
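To illustrate why a visitor gives more control than a listener, here is a minimal, self-contained C++ sketch that does not use the ANTLR runtime. Everything in it is hypothetical: `NumberContext`, `ExprContext`, `ExprVisitor`, `ca_expr_evaluator` and `make_expr` are illustrative stand-ins, not the generated `hlasmparserVisitor` or the project's `expression_evaluator`, and the visit functions return `long` instead of the type-erased `Any` values that ANTLR's C++ visitors return. The evaluator visits each child explicitly and, for a flat `term (op term)*` rule, applies `*` before `+` and `-`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for ANTLR-generated rule contexts; these are NOT
// the real hlasmparser classes.
struct NumberContext;
struct ExprContext;

// Plays the role of a generated abstract visitor: one visit function per rule.
struct ExprVisitor {
    virtual ~ExprVisitor() = default;
    virtual long visitNumber(NumberContext& ctx) = 0;
    virtual long visitExpr(ExprContext& ctx) = 0;
};

struct Context {
    virtual ~Context() = default;
    // Double dispatch: calls the visit function matching the concrete type.
    virtual long accept(ExprVisitor& v) = 0;
};

struct NumberContext : Context {
    long value;
    explicit NumberContext(long v) : value(v) {}
    long accept(ExprVisitor& v) override { return v.visitNumber(*this); }
};

// Flat rule of the form  expr : term (op term)* ;
// Operands and operators are kept in source order, so operator priority is
// not encoded in the shape of the tree.
struct ExprContext : Context {
    std::vector<std::unique_ptr<Context>> terms;
    std::vector<char> ops;  // ops[i] sits between terms[i] and terms[i + 1]
    long accept(ExprVisitor& v) override { return v.visitExpr(*this); }
};

// Hypothetical evaluator: it decides explicitly when each child is visited
// and applies '*' before '+' and '-'.
struct ca_expr_evaluator : ExprVisitor {
    long visit(Context& ctx) { return ctx.accept(*this); }

    long visitNumber(NumberContext& ctx) override { return ctx.value; }

    long visitExpr(ExprContext& ctx) override {
        assert(!ctx.terms.empty() && ctx.ops.size() + 1 == ctx.terms.size());
        long sum = 0;                         // total of completed +/- groups
        char sign = '+';                      // sign of the current product
        long product = visit(*ctx.terms[0]);  // visit the first child explicitly
        for (std::size_t i = 0; i < ctx.ops.size(); ++i) {
            long next = visit(*ctx.terms[i + 1]);
            char op = ctx.ops[i];
            if (op == '*') {
                product *= next;              // higher priority: fold at once
            } else if (op == '+' || op == '-') {
                sum += (sign == '+') ? product : -product;
                sign = op;
                product = next;
            } else {
                throw std::invalid_argument("unsupported operator");
            }
        }
        return sum + ((sign == '+') ? product : -product);
    }
};

// Builds a flat expression such as 2 + 3 * 4 from numbers and operators.
std::unique_ptr<ExprContext> make_expr(const std::vector<long>& values,
                                       const std::vector<char>& ops) {
    auto e = std::make_unique<ExprContext>();
    for (long v : values) e->terms.push_back(std::make_unique<NumberContext>(v));
    e->ops = ops;
    return e;
}
```

For example, `ca_expr_evaluator{}.visit(*make_expr({2, 3, 4, 5}, {'+', '*', '-'}))` evaluates `2 + 3 * 4 - 5` to 9, because the product is folded before the additive operators are applied. A listener, which is only notified of matched rules and cannot return values from children, would make this kind of priority handling considerably more awkward.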
diff --git a/docs/Workspace-manager-overview.md b/docs/Workspace-manager-overview.md index 58bba43fc..1ecaa6cb8 100644 --- a/docs/Workspace-manager-overview.md +++ b/docs/Workspace-manager-overview.md @@ -1,38 +1,38 @@ Architecture of workspace manager. -The architecture (visualized in picture above) of the parser library is organized into the following components: +The architecture of the parser library (shown in the picture above) is organized into the following components: -- **Workspace manager API** -The workspace manager provides API for handling various workspace management (e.g. add new workspace), LSP and DAP requests. It may hold multiple workspaces and calls file manager to handle changes in the workspace files. +- **Workspace Manager API** +The workspace manager provides an API for handling workspace management requests (e.g. adding a new workspace) as well as LSP and DAP requests. It can hold multiple workspaces and calls the file manager to handle changes in the workspace files. -- **Workspace representation** -The representation of workspace deals with the relations between its files (dependencies) upon parse request and propagates the parsing further into analyzer. It also retrieves data from the configuration files and it is used for resolving dependency searches by implementing parse library provider. +- **Workspace Representation** +The representation of a workspace deals with the relationships between its files (dependencies) upon a parse request and propagates the parsing further into the analyzer. It also retrieves data from the configuration files and resolves dependency searches by implementing the parse library provider interface. -- **Processor group representation** -The representation of a processor group uses the API of libraries to search for their dependencies. Currently, we only support local libraries, which utilize the file manager for their file information retrieval.
+- **Processor Group Representation** +The representation of a processor group uses the API of libraries to search for dependencies. Currently, we only support local libraries, which use the file manager to retrieve file information. -- **File manager** -The file manager is used by multiple components to handle file management and file searches. It also distinguishes and does conversions between regular files and processor files, which may be used for parsing. +- **File Manager** +The file manager is used by multiple components to handle file management and file searches. It also distinguishes between regular files and processor files (files that can be parsed) and converts between them. - **[[Analyzer]]** The analyzer accepts a file along with the information needed for dependency resolution, syntactically and semantically processes it and fills the context tables. The component is further explained in \[chap:analyzer\]. The technical details of each component are further explained in the following sections. -Workspace representation +Workspace Representation ------------------------ -In VSCode, as in many other editors, a grouping of files for a single project is called the *workspace*. This notion simplifies the workflow with the project as all the needed files are concentrated in a single folder. For example, the relative paths to the workspace may be used instead of the absolute ones or custom settings may be applied to the particular project/workspace. +In VS Code, as in many other editors, a grouping of files for a single project is called the *workspace*. This notion simplifies working with the project, as all the needed files are concentrated in a single folder. For example, paths relative to the workspace can be used instead of absolute ones, or custom settings can be applied to a particular project (workspace). As the parser library follows the LSP, it also incorporates the notion of files organized into workspaces.
Therefore, it has its own representation of a workspace. -The representation of workspace is used by the workspace manager to handle various changes in the workspace. The workspace manager propagates LSP requests and notifications coming from the language server to the corresponding workspace and retrieves the results from it via the registered observers. +The representation of a workspace is used by the workspace manager to handle various changes in the workspace. The workspace manager propagates LSP requests and notifications coming from the language server to the corresponding workspace and retrieves the results from it via the registered observers. -The workspace component uses the file manager for the file searches, retrieves the values from the configuration files and creates processor groups and is capable of resolving dependencies. +The workspace component uses the file manager for file searches, retrieves the values from the configuration files, creates processor groups, and resolves dependencies. -Due to the possibility to include files, the workspace maintains a list of dependants, which are active dependencies of another workspace files. The list of dependants is needed, for example, in case the user changes contents of a macro that is used by multiple open code files, as all of them would have to be reparsed. +Because files can include other files, the workspace maintains a list of dependants, which are active dependencies of other workspace files. The list of dependants is needed, for example, when the user changes the contents of a macro that is used by multiple open source files, as all of them then have to be reparsed. -The core of the workspace is its `parse_file` method. As addition to the parsing part, it also ensures that the file to be parsed, its dependencies and dependants provide consistent results. The method works as follows: +The core of the workspace is its `parse_file` method.
In addition to parsing, it also ensures that the parsed file, its dependencies and its dependants provide consistent results. The method works as follows: 1. It checks whether the parsed file is a configuration file. If so, the workspace reloads the configuration values and reparses all dependants in the workspace. @@ -42,7 +42,7 @@ The core of the workspace is its `parse_file` method. As addition to the parsing 4. It checks for the files that are no longer in use (former dependencies) and closes them. -The workspace also ensures the correct closure of the file via `didClose` method. It works as follows: +The workspace also ensures that files are closed correctly via the `didClose` method, which works as follows: - If the closed file is a dependency of some other file, it cannot be removed completely from the file manager, as it is still in use. The file manager is rather notified that the file was closed in the editor. @@ -53,32 +53,32 @@ File Representation The file manager handles all file-related requests across different workspaces. It distinguishes between regular, non-HLASM files and processable, HLASM files by using different representations. -The representation of a regular file (called *file*) is capable of providing its file names, its contents and changing its state upon file-oriented LSP requests, i.e didChange, didClose and didOpen. +The representation of a regular file (called *file*) can provide its file name and contents, and can change its state upon file-oriented LSP requests, i.e. `didChange`, `didClose` and `didOpen`. -The representation of processor files is defined by *processor_file* class, which derives from both *file* and *processor* abstract classes. The *processor* is an interface which is capable of actual processing (parsing). Its only implementation is processor file. +The representation of processor files is defined by the *processor_file* class, which derives from both the *file* and *processor* abstract classes.
The *processor* is an interface for the actual processing (parsing). Its only implementation is *processor_file*. When the `parse` method is invoked, the processor file initializes new analyzer, uses it for the parsing and rebuilds its dependencies list, closing the unwanted ones. When the parsing is finished, it keeps the instance of the analyzer and provides its parsing results when requested. -Dependency resolution +Dependency Resolution --------------------- -Whenever a code from a different file is to be included, either via `COPY` instruction or a macro call, it is necessary to find the desired file first. During the parsing, the representation of libraries are already created accordingly to the configurations. However, there is also a need for components that would resolve the dependency by finding the the corresponding library and parse it. +Whenever code from a different file is to be included, either via a `COPY` instruction or a macro call, the desired file has to be found first. During parsing, the library representations have already been created according to the configuration. However, components are also needed that resolve a dependency by finding and parsing the corresponding library. -The *parse\_lib\_provider* interface exists for this purpose. Whenever a component is to be used for the dependency resolution, it implements this interface. +The *parse\_lib\_provider* interface exists for this purpose. Every component used for dependency resolution implements this interface. The name of the needed file, the current context tables and the library data (the currently used type of processing) are passed to the `parse_library` method of the *parse_lib_provider* interface. The method finds the library file (i.e. a macro or COPY file) with the specified name and parses it with the given context. -The workspace is the most important implementation of the `parse_lib_provider` interface. It provides libraries based on the processor groups configuration described in [[Libraries configuration]]. +The workspace is the most important implementation of the `parse_lib_provider` interface.
It provides libraries based on the processor group configuration described in [[configuration of libraries]]. Diagnostics ----------- -A diagnostic is used to indicate a problem with source files, such as a compiler error or a warning. Some diagnostics are created in almost every component of the parser when it finds a problem with a source code. Diagnostics are also used in workspace to indicate problems with configuration files. After each parsing, we need to collect all the diagnostics from all the instances of all the components and pass them to the language server. +Diagnostics indicate problems with source files, such as compiler errors or warnings. Almost every component of the parser creates diagnostics when it finds a problem in the source code. The workspace also uses diagnostics to indicate problems with configuration files. After each parse, we need to collect all the diagnostics from all instances of all the components and pass them to the language server. The components capable of collecting the diagnostics are organized in a tree where the root is the workspace manager. Starting from the root, each component collects the diagnostics of those children that are again capable of collecting or generating diagnostics. -To enforce this behavior, all of these components implement the *diagnosable* interface. Its functionality is simple, it is used to add diagnostics, show his own and collect them from other diagnosable members. Each component that implements the interface is required to collect diagnostics from diagnosable objects it owns. In the result, one call of `collect_diags` from the root of the tree collects all diagnostics that were created since last such call.
+To enforce this behavior, all of these components implement the *diagnosable* interface. Its functionality is simple: it is used to add diagnostics, to report the component's own diagnostics, and to collect diagnostics from other diagnosable members. Each component that implements the interface is required to collect diagnostics from the diagnosable objects it owns. As a result, a single call of `collect_diags` on the root of the tree collects all diagnostics that were created since the previous call. -The diagnosable hierarchy of workspace manager component is shown in the following picture: +The diagnosable hierarchy of the workspace manager component is shown in the following picture: Hierarchy of diagnostics collection in the workspace manager component diff --git a/docs/Workspace-manager.md b/docs/Workspace-manager.md index 4f3cd4794..b0a6cab8c 100644 --- a/docs/Workspace-manager.md +++ b/docs/Workspace-manager.md @@ -1,6 +1,6 @@ -Workspace manager encapsulates all functionality of the parser library. It is the access point to all parsing capabilities, keeps the current state of all open files and resolves libraries needed by the analyzer. It also manages when files should be reparsed. +The workspace manager encapsulates all functionality of the parser library. It is the access point to all parsing capabilities, keeps track of the current state of all open files and resolves the libraries needed by the analyzer. It also decides when files should be reparsed. -Workspace manager is further described in the following sections: +The workspace manager is further described in the following sections: 1. [[Parser library API]] -2. [[Libraries configuration]] +2. [[Configuration of libraries]] 3. [[Workspace manager overview]]
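The tree-shaped diagnostics collection described for the *diagnosable* interface in the Workspace-manager-overview.md changes can be sketched as follows. This is a minimal C++ illustration under stated assumptions. Apart from `collect_diags`, every name here is hypothetical: `diagnostic`, `diagnosable_node`, `add_diagnostic`, `children`, and the `*_node` components. The real interface may differ. Each node returns its own diagnostics plus those collected from the objects it owns. It also clears its own list, so the next call from the root reports only new diagnostics.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical diagnostic record; an LSP diagnostic would also carry a
// range, a severity and similar fields.
struct diagnostic {
    std::string file;
    std::string message;
};

// Sketch of a diagnosable node: it stores its own diagnostics and, in
// collect_diags(), also pulls in those of the diagnosable objects it owns.
class diagnosable_node {
public:
    virtual ~diagnosable_node() = default;

    void add_diagnostic(diagnostic d) { diags_.push_back(std::move(d)); }

    // Returns the diagnostics of this node and of its whole subtree, and
    // clears them, so the next call reports only newly created ones.
    std::vector<diagnostic> collect_diags() {
        std::vector<diagnostic> result = std::move(diags_);
        diags_.clear();  // moved-from vector: make the empty state explicit
        for (diagnosable_node* child : children()) {
            assert(child != nullptr);
            std::vector<diagnostic> sub = child->collect_diags();
            result.insert(result.end(), std::make_move_iterator(sub.begin()),
                          std::make_move_iterator(sub.end()));
        }
        return result;
    }

protected:
    // Each implementation lists the diagnosable objects it owns.
    virtual std::vector<diagnosable_node*> children() { return {}; }

private:
    std::vector<diagnostic> diags_;
};

// Hypothetical leaf (stands in for an analyzer).
class analyzer_node : public diagnosable_node {};

// Hypothetical inner node (stands in for a workspace) that owns analyzers.
class workspace_node : public diagnosable_node {
public:
    std::vector<analyzer_node> analyzers;

protected:
    std::vector<diagnosable_node*> children() override {
        std::vector<diagnosable_node*> out;
        for (auto& a : analyzers) out.push_back(&a);
        return out;
    }
};

// Hypothetical root (stands in for the workspace manager).
class workspace_manager_node : public diagnosable_node {
public:
    std::vector<workspace_node> workspaces;

protected:
    std::vector<diagnosable_node*> children() override {
        std::vector<diagnosable_node*> out;
        for (auto& w : workspaces) out.push_back(&w);
        return out;
    }
};
```

A single `collect_diags()` call on a `workspace_manager_node` returns the manager's own diagnostics, followed by those of every workspace and analyzer below it. An immediate second call returns an empty list. As a design choice of this sketch, the traversal lives in the base class and each component only lists the objects it owns. The text above instead makes each component responsible for collecting from its children itself.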