Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring the sharp # reader #434

Open
lassik opened this issue Jan 1, 2023 · 9 comments
Open

Refactoring the sharp # reader #434

lassik opened this issue Jan 1, 2023 · 9 comments

Comments

@lassik
Copy link
Contributor

lassik commented Jan 1, 2023

The # and #! reader in read.c has at least two bugs, which seem to be oversights due to the messy code.

Looking at the code, it could be clarified if we used:

  • a dispatch table instead of a big switch
  • lookup based on the whole word, instead of one letter at a time (e.g. #u32 instead of u)

Here's a quick sketch of how it could look:

static SCM read_fold_case(SCM port, struct read_context *ctx,
                          const char *word)
{
    if (word[0] == 'n') {
        PORT_FLAGS(port) &= ~PORT_CASE_SENSITIVE;
        ctx->case_significant = FALSE;
    } else {
        PORT_FLAGS(port) |= PORT_CASE_SENSITIVE;
        ctx->case_significant = TRUE;
    }
}

static SCM read_keyword_position(SCM port, struct read_context *ctx,
                                 const char *word)
{
    PORT_KW_COL_POS(port) =
        colon_position_value(&s[strlen("keyword-colon-position-")]);
}

static SCM read_keyword(SCM port, struct read_context *ctx,
                        const char *word)
{
    (void)port;
    (void)ctx;
    return STk_makekey(word);
}

static SCM read_numeric_vector(SCM port, struct read_context *ctx,
                               const char *word)
{
    int tag = STk_uniform_vector_tag(tok);
    if (tag >= 0) {
        int konst = ctx->constant;
        
        /* Ok that's seems correct read the list of values (this IS a constant) */
        ctx->constant = TRUE;
        v =  STk_list2uvector(tag, read_list(port, ')', ctx));
        ctx->constant = konst;
        BOXED_INFO(v) |= VECTOR_CONST;
        return v;
    } else {
        signal_error(port,
                     "bad uniform vector specification ~A",
                     STk_Cstring2string(tok));
        return STk_void;
    }
}

struct sharp {
    const char *word;
    SCM read_func(SCM, struct read_context *, const char *);
};

static const struct sharp sharp_readers[] = {
    {"#!fold-case", read_fold_case},
    {"#!key", read_keyword},
    {"#!keyword-colon-position-after", read_keyword_position},
    {"#!keyword-colon-position-before", read_keyword_position},
    {"#!keyword-colon-position-both", read_keyword_position},
    {"#!keyword-colon-position-none", read_keyword_position},
    {"#!no-fold-case", read_fold_case},
    {"#!optional", read_keyword},
    {"#!rest", read_keyword},
    {"#F", read_false},
    {"#f", read_false},
    {"#false", read_false},
    {"#s8", read_numeric_vector},
    {"#T", read_true},
    {"#t", read_true},
    {"#true", read_true},
    {"#u8", read_numeric_vector},
};

static int compare_sharp(const void *a_void, const void *b_void)
{
    struct sharp *a = a_void;
    struct sharp *b = b_void;
    
    return strcmp(a->word, b->word);
}

static SCM read_sharp(SCM port, struct read_context *ctx, const char *word)
{
    struct sharp key = {.word = word};
    const size_t width = sizeof(sharp_readers[0]);
    const size_t count = sizeof(sharp_readers) / width;
    struct sharp *sharp;
    
    sharp = bsearch(key, sharp_readers, count, width, compare_sharp);
    if (sharp) {
        return sharp->read_func(port, ctx, word);
    } else {
        return default_sharp(port, ctx, word);
    }
}

Would you accept a PR to do a refactoring like this?

Note that the #! keywords can be in the same lookup table as the plain # words. Due to binary search, there is no performance penalty (arguably not even with linear search).

@egallesio
Copy link
Owner

I'm not particularly proud of the reader code, and your proposition would probably make it clearer.

Would you accept a PR to do a refactoring like this?

Of course!

@jpellegrini
Copy link
Contributor

I know this would be too complex maybe, but... I the C reader could be configured to optionally call a Scheme procedure, we could implement Scheme reader macros easily. :)

@lassik
Copy link
Contributor Author

lassik commented Jan 2, 2023

In theory, the C reader only needs to handle the syntax needed to bootstrap a reader written in Scheme.

@jpellegrini
Copy link
Contributor

In theory, the C reader only needs to handle the syntax needed to bootstrap a reader written in Scheme.

Yes - but I suppose it would be nice to have the core of it still in C, for performance... Then the user (or libraries) could extend it in Scheme, maybe.

@lassik
Copy link
Contributor Author

lassik commented Jan 2, 2023

Sounds reasonable. # tokens occur somewhat rarely in source code, and are the most likely place for new extensions. So moving the # handling into Scheme would be one idea. #t and #f ought to be the only really important ones.

@jpellegrini
Copy link
Contributor

jpellegrini commented Jan 2, 2023

#t and #f ought to be the only really important ones

STklos also has #void, and I think it's used quite often in code...
Maybe also #eof, which can be used in loop conditions -- STklos supports things like (if (eq? x #eof) ...) instead of (if (eof-object? x)) -- and even using #eof in a case srtatement (where it's usefulness becomes clearer):

stklos> (case #eof ((1) 2) ((#eof 2) 3))
3

And keywords: #:a is not a symbol, it's a keyword.

@lassik
Copy link
Contributor Author

lassik commented Jan 2, 2023

If we implement all of those in C, there's almost nothing left to be implemented in Scheme. Maybe the numeric vectors, fancy comments, etc.

@jpellegrini
Copy link
Contributor

If we implement all of those in C, there's almost nothing left to be implemented in Scheme. Maybe the numeric vectors, fancy comments, etc.

And user-defined macros. I was thinking of a fast sharp-syntax for those already mentioned, and a configurable reader macro system similar to Common Lisp reader macros, for the user :) and for libraries maybe.

But this is just me thinking aloud. @egallesio what do you think?

egallesio added a commit that referenced this issue Jan 17, 2024
Use the new C hash tables for the implementation.
egallesio added a commit that referenced this issue Jan 17, 2024
This should fix issue #434.
@egallesio
Copy link
Owner

Hi @lasik,

One year later 😏 , I have done something similar to what you proposed. The sharp reader can even be extended in Scheme ( I have still to write some documentation about this point). This is done with the %add-sharp-reader function. Note that, this function takes associates a reader to the first character which follows the #.
For instance,

(%add-sharp-reader #\o                                                      
                       (lambda (port)                                           
                         (let ((v (read port)))                                 
                           (if (eq? v 'one)                                     
                               1                                                
                               (error "bad constant #o~a" v))))) 

permits to return 1 when we encounter #one. A function should be added to query the procedure already associated to a character to permit chaining.

The following code has been used to implement the bitvector #* syntax.

(%add-sharp-reader #\*                                                          
                   (lambda (port)                                               
                     (let* ((v   (read port))                                   
                            (str (symbol->string v)))                           
                       (or (string->bitvector (string-append "#" str))          
                           (error "bad bitvector #~a" v)))))  

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants