Skip to content

95616ARG/sifter

Repository files navigation

Sifter

This repository contains code implementing the prototype analogy-making program Sifter, as described in our 'Onward!' 2020 paper "Analogy-Making as a Core Primitive in the Software Engineering Toolbox."

Sifter can make analogies about programs, e.g., identifying data structures and methods which play corresponding roles in two different programs. Sifter can also complete analogies, which allows it to automatically learn and generalize source code transformations from a small number of examples.

Dependencies

You must install Bazel as well as set up bazel_python with Python 3.7.4.

Running Tests, Examples

You should then be able to run

bazel test //... && bazel run coverage_report

to get a coverage report in htmlcov/index.html.

To run the examples:

bazel run examples/letter_analogies:letter_analogy
bazel run examples/turing_machine:turing_machine
bazel run examples/program_analysis:program_understanding
bazel run examples/program_analysis:api_migration
bazel run examples/program_analysis:transform_learning

For the program_analysis ones, you can view the result by visiting http://localhost:8001 in your browser once prompted.

Goals, Status, and Future Work

This repository accompanies our paper in Onward! 2020, with the goal of encouraging interest and future work in automated analogy-making for software engineering. Ultimately, we see analogy-making as involving three main processes:

  1. Raw Perception, such as reading files (code, documentation) from the filesystem and into the triplet structure workspace.

    Status: The LazyStructure and LazyTextDocument classes in examples/program_analysis/lazy_structure.py read a file from the filesystem and add a representation of the file's contents to the triplet structure workspace. These interfaces are 'lazy' in that they allow selecting only certain parts of a file to be included in the workspace. Currently we either specify ahead-of-time which portions of the file to include in the workspace or just add the entire file contents.

    Future Work: We envision the laziness being used to initially focus the analogy-making on a small subset of the relevant code, then expanding to larger portions as the analogy solidifies.

  2. Semantic Rewriting, such as grouping individual characters into a single token, inlining a function, and annotating invariants found via program analysis.

    Status: Currently, the default LazyTextDocument interface performs a light-weight tokenization pass that groups characters separated by spaces and special characters (such as +) before encoding the document into the workspace. Extra semantic information, such as program invariants, are specified ahead-of-time by either directly editing the workspace or using the AnnotateFact method of LazyTextDocument. There is an example of annotating program invariants in examples/program_analysis/transform_learning.py.

    Future Work: We would like to directly connect Sifter to program analysis tools so analysis results can be automatically imported into the triplet structure workspace instead of needing to be manually annotated. We would like to incorporate rewrite rules that express semantic equivalence, such as inlining, syntax de-sugaring, and grouping. Finally, we would like to develop heuristics that can determine when to apply a program analyzer or semantics-preserving rewrite rule. Such a heuristic would need to operate in tandem with and be guided by the abstraction/anti-unification process.

  3. Abstraction/Anti-Unification, where we pair up corresponding objects in the workspace to form an analogy.

    Status: We have fairly-complete rules and heuristics for this that are described in more detail in AnalogyUtils.md. These rules roughly implement a syntactic anti-unification on unrooted, labelled graphs.

    Future Work: Our existing abstraction rules work well, but depend on the semantic rewriting to have already exposed most of the correspondences in the syntactic representation of the workspace itself. So the primary future work for this process is to provide feedback to the semantic rewriting engine. For example, given programs if x: y += 1 and z += a ? 1 : 0; we may not be able to initially find a good syntactic abstraction, but we want the abstraction process to be able to give feedback to the semantic rewriting process to, e.g., desugar the ternary ?: operator into the corresponding if statement, which we can then find a syntactic correspondance with. Beyond such better heuristics, we would also like to support higher-order 'slips,' i.e., analogies where the types themselves are abstracted.

In addition to these three processes, there are two other main directions for future work:

  • The runtime does not currently support modifying rules with other rules; in fact, rule-related nodes are removed from the structure completely after an initial rule-parsing pass. In the future we may decide to change this to assist with meta-learning.
  • We are still working on a good visualization module for triplet structures. There is a rudimentary CLI interface (runtime/interactive.py) as well as an interface specifically for source code analogies (examples/program_analysis/ui), however we plan to introduce a more general-purpose interface in the near future.

High-level File Overview

  • ts_lib.py: the core library, defines the triplet structure data structure and some embedded-DSL-style helpers for describing triplet structures.
  • ts_utils.py: a set of macros which make working with the library (especially expressing rules) easier.
  • tactic_utils.py: a set of macros which make writing tactics ("controllers for applying the rules") easier.
  • mapper.py: rules which can be added to any triplet structure to enable making abstract analogies (i.e., identifying isometries in sub-structures).
  • analogy_utils.py: a Python interface and tactics to building analogies in triplet structures that have the rules from mapper.py added to them.
  • ts_cpp/: C++ implementation of the core triplet structure data structure, as well as a backtracking triplet constraint solver.
  • runtime/: a runtime to parse and execute rules on TripletStructures.
  • examples/: examples of using triplet structures to solve analogies.

Quickstart with the Code

Please see TSLang.md, which documents how to write code using the Triplet Structure language.

Programming Style

We see much of the code in this repository as defining a domain-specific language TSLang embedded in Python. We then write, within TSLang, the core analogy-making code for Sifter.

To highlight the difference between "plumbing" code implementing the TSLang language and runtime vs. the actual analogy-making code written in TSLang, we use a distinct coding convention for each type of code:

  1. Code implementing TSLang, specifically ts_lib.py and runtime/*, is written using Google-style Python naming (e.g., snake_case for methods and PascalCase for classes).
  2. Code written in TSLang (everything else) is written using PascalCase for method names.

This naming convention is enforced intra-file by .pylintrc.

Citing

@inproceedings{sifter:onward20,
  author    = {Matthew Sotoudeh and
               Aditya V. Thakur},
  title     = {Analogy-Making as a Core Primitive in the Software Engineering Toolbox},
  booktitle = {Proceedings of the 2020 {ACM} {SIGPLAN} International Symposium on
               New Ideas, New Paradigms, and Reflections on Programming and Software,
               Onward! 2020, November 18-20, Virtual, USA},
  publisher = {{ACM}},
  year      = {2020},
  url       = {https://doi.org/10.1145/3426428.3426918},
  doi       = {10.1145/3426428.3426918},
}

People

License

Licensed under the AGPLv3.

About

No description or website provided.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published