Importing a library changes read syntax #433

Open
lassik opened this issue Jan 1, 2023 · 14 comments

Comments

@lassik
Contributor

lassik commented Jan 1, 2023

What causes the following?

#u16(...) syntax does not work:

stklos> #u16(1 2 3)
**** Error:
%read: bad uniform vector specification `u16'
	(type ",help" for more information)
stklos> **** Error:
%execute: bad function `1'. Cannot be applied
	(type ",help" for more information)

Import the right SRFI, and it does:

stklos> (import (srfi 4))
stklos> #u16(1 2 3)
#u16(1 2 3)

As far as I know, read syntax is only supposed to be changed by #! directives (e.g. #!r6rs or #!fold-case). Imports do not normally change it.

@jpellegrini
Contributor

Well, #u16(...) only makes sense when SRFI-4 is loaded... That's why it works this way.
As far as I know the standard doesn't forbid it (at least R7RS doesn't - or did I miss this?)

@egallesio
Owner

The idea (not sure that it is a good one) is to have a better message when using bad sharp syntaxes.

We have:

stklos> #s16
**** Error:
%read: bad sharp syntax in `"#s16"'
stklos> (import (srfi 4)) 
stklos> #s16
**** Error:
%read: bad uniform vector specification `s16'

Furthermore, defining a constant without the primitives to access its content is probably not very helpful. Anyway, changing that point is easy. Do you see any drawback to the current implementation?

@egallesio
Owner

BTW @lassik, what are the bugs you have seen with the current implementation?

@lassik
Contributor Author

lassik commented Jan 1, 2023

> BTW @lassik, what are the bugs you have seen with the current implementation?

The first bug is the above:

stklos> #s8(1 2 3)
**** Error:
%read: bad sharp syntax in `"#s8"'
	(type ",help" for more information)
stklos> **** Error:
%execute: bad function `1'. Cannot be applied
	(type ",help" for more information)

Since the reader does not recognize the #s8, it skips it, and then reads (1 2 3) and tries to evaluate it. The evaluation fails. An unrecognized #foo token should stop it from reading more stuff.

The second bug is:

stklos> #s16"abc"
#u8(97 98 99)

Any numeric vector prefix can be used to read the #u8"..." bytevector syntax.
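A minimal sketch of the check that would rule out the second bug, assuming the reader has already split off the tag after `#` and can peek at the character that follows it. The helper names `is_uvector_tag` and `sharp_prefix_ok` are hypothetical and are not taken from STklos's src/read.c; the tag list is the one defined by SRFI 4, and the string form is restricted to `u8` as in SRFI 207.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Element-type tags defined by SRFI 4. */
static const char *const uvector_tags[] = {
    "s8", "u8", "s16", "u16", "s32", "u32", "s64", "u64", "f32", "f64"
};

/* Hypothetical helper: true if `tag` (the text after '#') is a SRFI 4 tag. */
bool is_uvector_tag(const char *tag)
{
    assert(tag != NULL);
    for (size_t i = 0; i < sizeof uvector_tags / sizeof uvector_tags[0]; i++)
        if (strcmp(tag, uvector_tags[i]) == 0)
            return true;
    return false;
}

/* Hypothetical helper: decide whether `#<tag>` followed by `opener`
   starts a valid literal. '(' opens a numeric vector for any SRFI 4 tag;
   '"' opens a string-notated bytevector, which is only valid for u8. */
bool sharp_prefix_ok(const char *tag, char opener)
{
    if (!is_uvector_tag(tag))
        return false;
    if (opener == '(')
        return true;
    if (opener == '"')
        return strcmp(tag, "u8") == 0;
    return false;
}
```

With a check like this, `#s16"abc"` would be rejected at the prefix instead of being read as a bytevector.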

@lassik
Contributor Author

lassik commented Jan 1, 2023

> An unrecognized #foo token should stop it from reading more stuff.

This may require special handling when reading from a terminal (as opposed to a file)?

@jpellegrini
Contributor

> The second bug is:

I think the second bug doesn't happen anymore - am I wrong?

stklos> #s16"abc"
**** Error:
%read: bad sharp syntax in `"#s16"'
	(type ",help" for more information)
"abc"

@jpellegrini
Contributor

jpellegrini commented Mar 6, 2023

And I'm not sure it's possible to fix the first one... The reader sees #s8 and (1 2 3) as two separate tokens, and complains about the first. It keeps reading...

"a"x"b"       ;;  with x unbound

will, in most Schemes (*), print "a", then raise an exception and print the error message, and then print "b". So errors like that won't make the reader stop; the behavior of STklos for sharp syntax seems OK, I guess.

(*) Kawa stops at the error. The Chez REPL expects the user to hit enter for each expression entered, so it's totally different. The others I tried behave as I mentioned.

Also, I see that only STklos and Chicken implement SRFI 207, and Chicken does not implement the sharp syntax part.

@lassik
Contributor Author

lassik commented Mar 6, 2023

> The reader sees #s8 and (1 2 3) as two separate tokens, and complains about the first. It keeps reading...

You could first read all available input from the terminal into a buffer. Then (read) from that buffer, instead of (read) directly from the terminal. This should be doable by using a read with a timeout, either using poll() or a non-blocking terminal fd.

@lassik
Contributor Author

lassik commented Mar 6, 2023

I.e. something like this pseudo-code:

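while (terminal_fd_is_not_closed) {
    buffer_clear();
    /* Collect everything that arrives within the timeout (e.g. a whole paste). */
    while (poll(terminal_fd, timeout_ms)) {
        char input_char;
        if (read(terminal_fd, &input_char, 1) != 1)
            break;  /* EOF or read error */
        buffer_putc(input_char);
    }
    input_port = make_string_input_port(buffer);
    try {
        /* The first error abandons whatever is left in the buffer. */
        while (datum = read_scheme_datum(input_port)) {
            eval(datum);
        }
    } except {
        display_error();
    } finally {
        close(input_port);
    }
}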
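The buffering step of the pseudo-code can be sketched in plain POSIX C roughly as follows. This is only an illustration under assumptions: `read_burst` is a hypothetical name, it reads in chunks rather than one character at a time, and it blocks for the first byte so the loop does not spin while the terminal is idle. A real REPL would also want to retry when poll() is interrupted by a signal (EINTR).

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: block until input arrives on fd, then keep reading
   until the fd stays quiet for quiet_ms milliseconds, the buffer is full,
   or EOF/error occurs. Returns the number of bytes stored in buf. */
size_t read_burst(int fd, char *buf, size_t cap, int quiet_ms)
{
    assert(buf != NULL);
    size_t len = 0;
    int timeout = -1;               /* wait indefinitely for the first byte */

    while (len < cap) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout);
        if (ready <= 0)
            break;                  /* 0: quiet period elapsed; <0: poll error */
        ssize_t n = read(fd, buf + len, cap - len);
        if (n <= 0)
            break;                  /* EOF or read error */
        len += (size_t)n;
        timeout = quiet_ms;         /* later reads only wait for the quiet period */
    }
    return len;
}
```

The returned bytes would then be wrapped in a string input port and handed to the reader, as in the pseudo-code.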

@jpellegrini
Contributor

But the same code that reads from a terminal also reads from a string. Maybe it's possible to make STklos ignore whatever follows an error on the same line, I guess. A read error would unwind all the way out to the point where the whole line has been consumed. Not sure.

@jpellegrini
Contributor

Note that a line (a string) may contain only part of an expression:

stklos> (define x  ;; the reader knows this newline did not end the expression!
1)                 ;; but this one did!

I don't remember how STklos deals with this, but you can take a look at src/read.c. Maybe @egallesio can help?
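For illustration, one common way for a line-oriented front end to tell whether a newline ended the expression is to track parenthesis depth. The sketch below is an assumption about how this could be done, not a description of src/read.c; `input_is_complete` is a hypothetical name, and it only understands parentheses, double-quoted strings and `;` comments (not block comments or character literals such as `#\(`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if every '(' in text[0..len) has been closed
   and no string literal is left open. */
bool input_is_complete(const char *text, size_t len)
{
    assert(text != NULL);
    long depth = 0;
    bool in_string = false, in_comment = false;

    for (size_t i = 0; i < len; i++) {
        char c = text[i];
        if (in_comment) {
            if (c == '\n')
                in_comment = false;         /* ';' comments end at newline */
        } else if (in_string) {
            if (c == '\\' && i + 1 < len)
                i++;                        /* skip the escaped character */
            else if (c == '"')
                in_string = false;
        } else if (c == ';') {
            in_comment = true;
        } else if (c == '"') {
            in_string = true;
        } else if (c == '(') {
            depth++;
        } else if (c == ')') {
            if (depth > 0)
                depth--;                    /* stray ')' is left for the reader */
        }
    }
    return depth == 0 && !in_string;
}
```

On the example above, the text up to the first newline is incomplete (depth 1), and it becomes complete once the second line has been appended.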

@lassik
Contributor Author

lassik commented Mar 6, 2023

Reading until end-of-line is not the best way to do it. If you paste 50 lines of code into a terminal, and the 5th line raises an error, then the interpreter will continue to process the next 45 lines. It should not do that.

It's better to read with a timeout. IIRC the conventional timeout for reading from a TTY is something like 50 milliseconds. If you paste a lot of text into a terminal window, the terminal emulator will send it all instantly to the program running in the terminal. So the program will receive all the pasted text, and no more, if you use a reasonable timeout.

@lassik
Contributor Author

lassik commented Mar 6, 2023

In other words:

  • Buffer all the text that comes within the given timeout - whether it's one line or 500 lines.
  • Read and eval the text in the buffer.
  • If any of it raises an error, throw away the rest of the buffer without reading or evaluating it.
  • Clear the buffer and repeat until the end of file.

@lassik
Contributor Author

lassik commented Mar 6, 2023

It's probably good to add some extra logic to that, so that a partial line is never read and evaluated, even if it comes within the timeout.
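One simple way to get that guarantee, sketched here as an assumption rather than a tested design, is to hand only the newline-terminated prefix of the buffer to the reader and carry the remainder over to the next round. `complete_lines_len` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the longest prefix of buf[0..len) that
   ends in '\n', or 0 if the buffer contains no newline at all. Bytes past
   the returned length form a partial line and should stay buffered. */
size_t complete_lines_len(const char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0 && buf[len - 1] != '\n')
        len--;
    return len;
}
```

The caller would evaluate `buf[0..n)` and move the remaining `len - n` bytes to the front of the buffer before the next timed read.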
