Implement token list API #1829

Merged
merged 5 commits into from
May 29, 2024

ksss (Collaborator) commented May 24, 2024

This is a PR intended for discussion.

I want a token list.

I am implementing a RuboCop extension for RBS. In RuboCop, indentation and spacing are adjusted based on the positions of various tokens such as comments and (.

Problem

Building these features from the output of RBS::Parser requires complex processing: searching for token positions inside RBS::Location objects, and collecting the positions of end-of-line comments from every location.

Example 1: Check the spacing before a block.

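# Find the index of the block's opening brace `{`.
# This assumes no literal "{" appears earlier in the source.
lbrace_index = method_type.location.source.index('{')

# Find the last non-whitespace character before `{`.
# Start at lbrace_index - 1; otherwise rindex matches the `{` itself.
char_before_lbrace_index = method_type.location.source.rindex(/\S/, lbrace_index - 1)

# Exactly one space is expected between that character and `{`.
if char_before_lbrace_index + 2 != lbrace_index
  add_offense(...)
end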

Example 2: Check the spacing between arbitrary tokens.

require 'strscan'

# Hand-written scanner: one branch is needed for every token type.
scanner = StringScanner.new(source)
tokens = []
pos = 0
until scanner.eos?
  case
  when scanner.scan('[')
    pos += 1
    tokens << [:pLBRACKET, pos]
  when scanner.scan(']')
    pos += 1
    tokens << [:pRBRACKET, pos]
  when ...

I understand that my use case is unique, so I believe there is little need to modify the existing parsing process.

It would be helpful to have a method to obtain a sequence of tokens as a new API.

Use case of token list in RuboCop

https://github.com/rubocop/rubocop/blob/12fd014e255617a08b7b42aa5df0745e7382af88/lib/rubocop/cop/layout/extra_spacing.rb
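
To illustrate why a token list helps, here is a minimal sketch of an extra-spacing check written against tokens instead of raw source searches. It is not taken from RuboCop or from this PR: `Tok`, `extra_spacing_pairs`, and the offsets are hypothetical, and the token type names are only illustrative.

```ruby
# Hypothetical token shape: a type plus start/end character offsets.
Tok = Struct.new(:type, :start_pos, :end_pos)

# Returns the pairs of adjacent tokens that are separated by two or more
# spaces on the same line. `source` is used only to inspect the gap text.
def extra_spacing_pairs(source, tokens)
  tokens.each_cons(2).select do |left, right|
    gap = source[left.end_pos...right.start_pos]
    # An empty gap, a single space, or a gap containing a line break
    # does not match and is not reported.
    gap.match?(/\A {2,}\z/)
  end
end

source = "def foo:  () -> void"
tokens = [
  Tok.new(:kDEF, 0, 3),
  Tok.new(:tLIDENT, 4, 7),
  Tok.new(:pCOLON, 7, 8),
  Tok.new(:pLPAREN, 10, 11),
  Tok.new(:pRPAREN, 11, 12),
  Tok.new(:pARROW, 13, 15),
  Tok.new(:kVOID, 16, 20),
]

extra_spacing_pairs(source, tokens).each do |left, right|
  puts "extra spacing between #{left.type} and #{right.type}"
end
```

With positions available for every token, the check is a single pass over adjacent pairs and no longer depends on guessing where a character such as `{` or `(` sits in the source.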

Proposal for token list API

Low level

I propose a low-level API called RBS::Parser._lex, following the example of _parse_signature and similar methods. This low-level API aims to obtain the necessary information for a sequence of tokens using minimal C code.
It is desirable to be able to obtain all tokens, including comments.

High level

I propose a high-level API called RBS::Parser.lex. The name lex is inspired by Prism.lex. This high-level API will wrap the sequence of tokens obtained from _lex, making it more convenient to handle.
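
The layering described above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the implementation in this PR: it assumes the low-level call yields `[type, start_pos, end_pos]` triples, and `LexToken`, `wrap_tokens`, and the token type names are hypothetical.

```ruby
# Hypothetical wrapper object for one token. It keeps a reference to the
# source so the token text can be sliced out on demand.
LexToken = Struct.new(:type, :start_pos, :end_pos, :source) do
  # The text covered by this token.
  def value
    source[start_pos...end_pos]
  end

  # True for comment tokens (the type name is an assumption).
  def comment?
    type == :tLINECOMMENT
  end
end

# Wraps raw [type, start_pos, end_pos] triples, the assumed shape of the
# low-level result, into LexToken objects.
def wrap_tokens(source, raw_tokens)
  raw_tokens.map { |type, start_pos, end_pos| LexToken.new(type, start_pos, end_pos, source) }
end

source = "type a = [Integer] # note"
raw_tokens = [
  [:kTYPE, 0, 4],
  [:tLIDENT, 5, 6],
  [:pEQ, 7, 8],
  [:pLBRACKET, 9, 10],
  [:tUIDENT, 10, 17],
  [:pRBRACKET, 17, 18],
  [:tLINECOMMENT, 19, 25],
]

lexed = wrap_tokens(source, raw_tokens)
lexed.each { |token| puts "#{token.type}: #{token.value.inspect}" }
```

Keeping the low-level result as plain triples keeps the C side small, while the Ruby wrapper adds conveniences such as `value` and `comment?`.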

ksss (Collaborator, Author) commented May 27, 2024

If possible, it would be nice to have a line break token as in Prism.

soutaro (Member) commented May 28, 2024

@ksss Can you fix the steep type check failure? I plan to implement support for line breaks and comment tokens on top of this PR.

soutaro added this to the RBS 3.5 milestone May 28, 2024

ksss (Collaborator, Author) commented May 29, 2024

Thank you for reviewing. I fixed type checking.

> I plan to implement support for line breaks and comment tokens on top of this PR.

GREAT! THANKS!

soutaro (Member) left a comment

🎉

soutaro added this pull request to the merge queue May 29, 2024
Merged via the queue into ruby:master with commit 0831489 May 29, 2024
17 checks passed
ksss deleted the lex branch May 29, 2024 02:42
soutaro added the "Released" label (PRs already included in the released version) Jun 6, 2024
2 participants