Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameterized Stateless Visitors #641

Closed
jfallows opened this issue Jun 23, 2014 · 23 comments
Closed

Parameterized Stateless Visitors #641

jfallows opened this issue Jun 23, 2014 · 23 comments

Comments

@jfallows
Copy link

The generated visitor classes are already parametrically typed to produce a strongly-typed result, but they require a new instance of the visitor to be created for each traversal.

By applying a parameterized visitor pattern, stateless visitors can be shared across threads and nested visitors with a different return type do not need to be constructed to propagate state that can now be passed as a parameter.

For example:

public interface ParseTreeVisitor<R, P> {
  R visit(@NotNull ParseTree tree, P parameter);
  R visitChildren(@NotNull RuleNode node, P parameter);
  R visitTerminal(@NotNull TerminalNode node, P parameter);
  R visitErrorNode(@NotNull ErrorNode node, P parameter);
}

public interface ParseTree extends SyntaxTree {
  <R,P> R accept(ParseTreeVisitor<? extends R, ? extends P> visitor, P parameter);
}

This approach has been applied to the ElementVisitor API in the Java7 traversal of Java source code.

Recommend providing the ability to use the parameterized stateless visitor approach for ANTLR4 grammars.

@parrt
Copy link
Member

parrt commented Jun 24, 2014

Hi! Separate instances containing state are inherently thread safe so those also could be shared across threads. Hmm... That sounds sort of interesting but makes it more complicated than necessary. I haven't run into a case yet where I couldn't share code with the normal Java mechanisms. Do you have a specific use case in mind?

@jfallows
Copy link
Author

Hi Terence. Yes, we've been using the existing ANTLRv4 visitor pattern to generate domain specific AST for our internal purposes. Subtrees within that AST are of a different type than the overall tree, such as representing a complex value in the grammar rather than a statement or expression AST node.

There is certainly no debate that this is functionally doable with the current model, just that it seemed somewhat cumbersome to track parameterized traversal state in the construction of each visitor, leading to a bit more complexity and extra unnecessary garbage to collect.

The suggestion above would allow variation in the return type R for different parts of the overall traversal through delegation to differently typed visitors, while retaining consistent parameterization of traversal state P that allows visitors to be shared and reused, even concurrently.

@parrt
Copy link
Member

parrt commented Jun 24, 2014

Ok, but if those ASTs are of your own design, you can just build your own visitor as you like though right?

@jfallows
Copy link
Author

We faced the issues described above when using the built-in ANTLRv4 visitor pattern over the various ParserRuleContext types generated from the grammar. Our specific ASTs mentioned above come into the picture after that, so they wouldn't be able to address the issue, right?

@sharwell
Copy link
Member

Can you post a more specific example? I'm not understanding how a parameterized visitor offers marked benefits over a visitor where the parameter is a field assigned during the construction of the visitor.

@jfallows
Copy link
Author

Sure. Before giving the example, it's important to point out that we have succeeded to meet the functionality needs using the existing API, so this is definitely not a discussion about whether or not the current API can work, it clearly does work well.

Starting with a grammar:

Grammar.g4 => GrammarParser + GrammarVisitor<T>

We want to generate our specific AST by visiting the generated AST, which consists of 2 different AST node super types, one is AstElementNode and one is AstValueNode. There are subclassess of these super types with concrete characteristics, but let's say they fall into these 2 broad categories.

public class AstElementNodeGenerator implements GrammarVisitor<AstElementNode> {

    private final State state;

    public AstElementNodeGenerator(State state) {
        this.state = state;
    }
...
    public AstElementNode visitElementSubType1(GrammarParser.ElementSubType1Context context) {
        AstElementSubType1 element = new AstElementSubType1();
        AstValueNodeGenerator visitor = new AstValueNodeGenerator(state);
        AstValueNode value = visitor.visitElementSubType(context);
        element.setValue(value);
    }
...
}

public class AstValueNodeGenerator implements GrammarVisitor<AstValueNode> {
    private final State state;

    public AstValueNodeGenerator(State state) {
        this.state = state;
    }
...
}

GrammarParser parser = new GrammarParser(...);
State state = new State();
AstElementNodeGenerator visitor = new AstElementNodeGenerator(state)
AstElementNode elementNode = visitor.visitElementNode(parser.elementNode());

The feedback provided here is that the overhead of creating nested instances of each visitor to handle multi-typed traversal is a bit cumbersome especially as the number of distinctly typed nested visitors increases. When the traversal requires no nested visitors, then that simpler scenario does not highlight the issue.

One possible solution would be to eliminate the need for such nested visitor creation by allowing the State to be propagated as a parameter in each of the visitXXX method signatures.

Using the example above, this would become something along the lines of:

private final GrammarVisitor<AstElementNode, State> elementGenerator = new GrammarVisitor<AstElementNode, State> {
...
    public AstElementNode visitElementSubType1(GrammarParser.ElementSubType1Context context, State state) {
        AstElementSubType1 element = new AstElementSubType1();
        AstValueNode value = valueGenerator.visitElementSubType(context, state);
        element.setValue(value);
    }
...
}

private final GrammarVisitor<AstValueNode, State> valueGenerator = new GrammarVisitor<AstValueNode, State> {
...
}

GrammarParser parser = new GrammarParser(...);
State state = new State();
AstElementNode elementNode = elementGenerator.visitElementNode(parser.elementNode(), state);

Clearly both approaches are functionally capable of producing the desired AstElementNode result, but we found the first strategy required some additional coding overhead and object creation to propagate state through the type transitions for multi-typed parse tree traversal, and that the effect was magnified with the number of distinct types in the traversal.

Hope this was helpful, let me know if there's anything still unclear.

@sharwell
Copy link
Member

This would be particularly helpful for cases where you're writing output to strings, and you want to use a single StringBuilder instance where possible but sometimes you need an intermediate buffer. The primary StringBuilder instance could be passed as the state variable for visitor methods and used as the destination for results.

@petrbel
Copy link
Contributor

petrbel commented Sep 27, 2014

I believe this could significantly help to encourage developers to use functional programming approach more frequently, which is (from my point of view) really great as it prevents side effects and states at all.

As as Scala developer using ANTLR4, I try to implement my code as functional as possible and the visitor pattern is exactly what I need except the fact of requirement of the inner state.

Anyway, as @jfallows mentioned above, the current API works absolutely fine - my point is more like philosophical.

@sharwell
Copy link
Member

I looked into multiple ways to implement this feature, but was unable to find any that would provide a reasonable level of backwards compatibility. I'll add an example of a way to perform a similar feature (at the expense of slightly more memory allocations) before closing this issue.

@sharwell
Copy link
Member

public abstract class AbstractParameterizedParseTreeVisitor<T, S> extends AbstractParseTreeVisitor<T> {
    private final S state;

    public AbstractParameterizedParseTreeVisitor(S state) {
        this.state = state;
    }

    public T visit(ParseTree tree, S state) {
        ParseTreeVisitor<T> targetVisitor = this;
        if (this.state != state) {
            targetVisitor = createVisitor(state);
        }

        return tree.accept(targetVisitor);
    }

    public T visitChildren(@NotNull RuleNode node, S state) {
        ParseTreeVisitor<T> targetVisitor = this;
        if (this.state != state) {
            targetVisitor = createVisitor(state);
        }

        return targetVisitor.visitChildren(node);
    }

    public final S getState() {
        return state;
    }

    @NotNull
    protected abstract ParseTreeVisitor<T> createVisitor(S state);
}

@fmoyses
Copy link

fmoyses commented Oct 30, 2014

@sharwell, unfortunetly I came into this after this issue was closed.
We use ANTLRv4 to parse and execute an internal specific language. This language may generate different output depending on it's context (state). It sometimes needs this state to perform lookups and other elaborated logic extracted to lots of components.

We use inversion of control a lot, so most of our components are stateless and @Autowired as needed. This approach resulted in a test friendly environment for us.

Passing the state through all the tree operations would help us a lot, making the same instance of a visitor injected when needed, with the possibility to break the visitor in several smaller (and easily testable) visitors.

@Service
public class TranslatorVisitor extends FormulaBaseVisitor<String, StateObject> {

    @Autowired
    private SomeRepository someRepository;

    @Autowired
    private ComplexLogicExecutor complexLogicExecutor;

    @Override
    public String visitFormula(final FormulaContext ctx, StateObject state) {
        ComplexObject complexObject = complexLogicExecutor.run(state.getSomeInternalState());
        state.setComplexObject(complexObject);
        return super.visitFormula(ctx, state);
    }

    @Override
    public String visitSubTree(final SubTreeContext ctx, StateObject state) {
        final String result = super.visitSubTree(ctx, state);
        if (state.getSomeInternalState().isValid()) {
            someRepository.add(state.getSomeInternalState(), result);
        }
        return result;
    }

Isn't there a way to provide this "stateless visitors" without changing the normal visitors (and not breaking compatibility)?

Sorry to bother.

Regards,

Fernando Moyses

@sharwell
Copy link
Member

@fmoyses You aren't the only one that was late. If we were back at the start of ANTLR 4 we probably would have implemented this using the parameterized visitors the first time around. While I wasn't able to find a clean solution, that does not mean one does not exist. If you decide to look into it and clean way to provide it, we'd be glad to review the result.

@fmoyses
Copy link

fmoyses commented Oct 30, 2014

Thanks for the prompt answer.
This feature would be really nice. I'll take a look at this but probably won't find anything that you didn't figure out yourself. You're way more experienced within Antlr code than I. Anyway, if I find something worth mentioning I'll surely try/implement or check with you guys.

Again, sorry to bother but I wanted to check and provide a little bit more example/information on this using injection/IoC.

@otykier
Copy link

otykier commented Feb 5, 2018

Very late to the party, but I faced the same issue in the ANTLR C# target. What I did, although not pretty, was to tack on a new property to my base parser context class, so that I can pass state around inside the visitor. This was possible since the C# target generates a partial class for all derivatives of ParserRuleContext. So in my example:

public partial class MyBaseExprContext   // In the generated file, this is defined as deriving from ParserRuleContext
{
    public object State { get; set; }
}

In my visitor, I can then assign values to this property before calling VisitXXX methods while building my AST:

public override ExpressionNode VisitMyBaseExpr(MyBaseExprContext context)
{
    var child = (context.GetChild(0) as MyBaseExprContext);
    child.State = "..." // Set whatever state I need...
    Visit(child);
}

In my grammar, the AST I'm building aligns quite nicely with the CST generated from the parser, so this approach works, but it can probably not be applied in general. Still, an option when generating the ANTLR target code, to include a stateless parameter in the method signature of every VisitXXX method would be nice...

@nielsbasjes
Copy link
Contributor

I've been playing around with the Antlr4 code a bit and I took this approach (Java only):

  1. Create a modified Java.stg and use that during code generation
  2. Modify a small number of the Java runtime to match the generic parameter changes).
  3. Package this into a single library so that using it is relatively easy.

That way most of the Antlr4 code is used as-is and my simply overriding some classes in the classpath you get the union of the base code and my modifications.

It is not backwards compatible but it seems to work for my use case.

https://github.com/nielsbasjes/antlr4-pvisitor

I would love feedback on this from you guys.

@deus777
Copy link

deus777 commented Feb 4, 2022

Reopening this issue with my solution in python, might be helpful to someone.

I had a similar requirement with a custom query tree at the top of the syntax and generated data further down the parse tree that was potentially too large for memory . The idea of having to build a new tree of a complex nested query passing it down to the data, aside from duplicating what antlr already does and being tedious and inefficient didn't appeal to me at all.

So rather than building a query tree and visiting the data generators to execute, I ended up converting the data to a list of python generators and passing them back up to the query. The query can then be executed in place with just a normal antlr visitor against the generators.

No supplemental data structures, no need for additional parameters in the visitor call

I thought it was a simple elegant pattern.

@ericvergnaud
Copy link
Contributor

Hi,
Can you provide an example?

@deus777
Copy link

deus777 commented Feb 5, 2022

as an example I rewrote the grammar something along these lines

selectexpr : selectexpr SELECT? LPAREN selector RPAREN
| selectexpr SELECT? opnot=NOT selector
| selectexpr SELECT? selector AND selector
| selectexpr SELECT? selector OR selector
| selectexpr SELECT? selector
| genexpr
;

the data was being generated several layers down in genexpr, which returned potentially huge lists. These function now return a list of python generators through the standard visiting protocol and which I then have access to in the visitor method for selectexpr and can apply the boolean logic to the generators. So instead of trying to pass the logic down to the data I bring the data back up through python generators without making potentially huge lists or having to augment antlr visitor protocol or keep track of state etc.

@ericvergnaud
Copy link
Contributor

And how does the visitor code look like ?

@deus777
Copy link

deus777 commented Feb 5, 2022

In this case selectexpr receives a list of generators by calling visit(genexp) which are drained by for x in gen loops and the local logic of selectexp is applied in the loop. you effectively have a window between code levels of the parse tree to pass data back and forth since python generators can be bi-directional. In this case the lower levels just yield data back up the tree, on demand. genexp generator function will have yield expression instead of a return expression. python takes care of the rest.

@ericvergnaud
Copy link
Contributor

please provide a visitor sample rather than a description

@deus777
Copy link

deus777 commented Feb 6, 2022

roughly something like this

def visitorSelectexpr(...)

    ans = self.visit(ctx.genexpr)
    for items in ans:
           do something
...
def makedata(...)
      for  item in generate_dome_data:
                 yield item

def visitorGenexpr(...)
      .....
      ans = makedata( ...)
      ...
      return ans

python see the yield inside makedata and makes the function a generator. the "ans= makedata()" doesnt actually do anything until it is triggered in the "for items in ans: " loop in visitorSelector, at which point makedata() start yielding items one by one.

normally you might something have

def makedata(...)
       for items in generated...
          list.append(item)
       return list

Instead visitorGenexpr actually returns a generator back up to the selectexpr visitor, rather than a list. It's possible to set this up as a co-routines and pass stuff back and forth, I didn't need to. a good reference here (http://www.dabeaz.com/coroutines/)

It's a bit like creating a wom hole between your nodes of the parse tree. hope that helps

@vrdhn
Copy link

vrdhn commented Mar 21, 2024

Will this work

public class StatelessVisitor<T, S> extends {{ParserName}}BaseVisitor<T> {

    private final List<S> statesStack = new ArrayList<>(10);
    /**
     * In a normal visit*() function, call {@link ::state()} to get the current state.
     *
     * @return
     */
    protected final S state() {
        final int size = statesStack.size();
        return statesStack.get(size - 1);
    }

    /** Visit the subtree with given state. Note that it's prefectly
     * fine to call visit(ctx), if new state is not to be passed.
     */
    public final T visit(ParseTree tree, S state) {
        final int size = statesStack.size();
        // Note: using pointer equality here, as it's faster and works for the use case.
        final boolean newState = size == 0 || statesStack.get(size - 1) != state;
        if (newState) {
            try {
                statesStack.add(state);
                return visit(tree);
            } finally {
                statesStack.remove(size);
            }
        } else {
            return visit(tree);
        }
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants