Refactoring the sharp # reader #434

lassik · 2023-01-01T19:38:31Z

The # and #! reader in read.c has at least two bugs, which seem to be oversights due to the messy code.

Looking at the code, it could be clarified if we used:

a dispatch table instead of a big switch
lookup based on the whole word, instead of one letter at a time (e.g. #u32 instead of u)

Here's a quick sketch of how it could look:

static SCM read_fold_case(SCM port, struct read_context *ctx,
                          const char *word)
{
    if (word[0] == 'n') {
        PORT_FLAGS(port) &= ~PORT_CASE_SENSITIVE;
        ctx->case_significant = FALSE;
    } else {
        PORT_FLAGS(port) |= PORT_CASE_SENSITIVE;
        ctx->case_significant = TRUE;
    }
}

static SCM read_keyword_position(SCM port, struct read_context *ctx,
                                 const char *word)
{
    PORT_KW_COL_POS(port) =
        colon_position_value(&s[strlen("keyword-colon-position-")]);
}

static SCM read_keyword(SCM port, struct read_context *ctx,
                        const char *word)
{
    (void)port;
    (void)ctx;
    return STk_makekey(word);
}

static SCM read_numeric_vector(SCM port, struct read_context *ctx,
                               const char *word)
{
    int tag = STk_uniform_vector_tag(tok);
    if (tag >= 0) {
        int konst = ctx->constant;
        
        /* Ok that's seems correct read the list of values (this IS a constant) */
        ctx->constant = TRUE;
        v =  STk_list2uvector(tag, read_list(port, ')', ctx));
        ctx->constant = konst;
        BOXED_INFO(v) |= VECTOR_CONST;
        return v;
    } else {
        signal_error(port,
                     "bad uniform vector specification ~A",
                     STk_Cstring2string(tok));
        return STk_void;
    }
}

struct sharp {
    const char *word;
    SCM read_func(SCM, struct read_context *, const char *);
};

static const struct sharp sharp_readers[] = {
    {"#!fold-case", read_fold_case},
    {"#!key", read_keyword},
    {"#!keyword-colon-position-after", read_keyword_position},
    {"#!keyword-colon-position-before", read_keyword_position},
    {"#!keyword-colon-position-both", read_keyword_position},
    {"#!keyword-colon-position-none", read_keyword_position},
    {"#!no-fold-case", read_fold_case},
    {"#!optional", read_keyword},
    {"#!rest", read_keyword},
    {"#F", read_false},
    {"#f", read_false},
    {"#false", read_false},
    {"#s8", read_numeric_vector},
    {"#T", read_true},
    {"#t", read_true},
    {"#true", read_true},
    {"#u8", read_numeric_vector},
};

static int compare_sharp(const void *a_void, const void *b_void)
{
    struct sharp *a = a_void;
    struct sharp *b = b_void;
    
    return strcmp(a->word, b->word);
}

static SCM read_sharp(SCM port, struct read_context *ctx, const char *word)
{
    struct sharp key = {.word = word};
    const size_t width = sizeof(sharp_readers[0]);
    const size_t count = sizeof(sharp_readers) / width;
    struct sharp *sharp;
    
    sharp = bsearch(key, sharp_readers, count, width, compare_sharp);
    if (sharp) {
        return sharp->read_func(port, ctx, word);
    } else {
        return default_sharp(port, ctx, word);
    }
}

Would you accept a PR to do a refactoring like this?

Note that the #! keywords can be in the same lookup table as the plain # words. Due to binary search, there is no performance penalty (arguably not even with linear search).

The text was updated successfully, but these errors were encountered:

egallesio · 2023-01-01T20:23:23Z

I'm not particularly proud of the reader code, and your proposition would probably make it clearer.

Would you accept a PR to do a refactoring like this?

Of course!

jpellegrini · 2023-01-02T11:58:53Z

I know this would be too complex maybe, but... I the C reader could be configured to optionally call a Scheme procedure, we could implement Scheme reader macros easily. :)

lassik · 2023-01-02T14:07:22Z

In theory, the C reader only needs to handle the syntax needed to bootstrap a reader written in Scheme.

jpellegrini · 2023-01-02T14:14:20Z

In theory, the C reader only needs to handle the syntax needed to bootstrap a reader written in Scheme.

Yes - but I suppose it would be nice to have the core of it still in C, for performance... Then the user (or libraries) could extend it in Scheme, maybe.

lassik · 2023-01-02T15:22:03Z

Sounds reasonable. # tokens occur somewhat rarely in source code, and are the most likely place for new extensions. So moving the # handling into Scheme would be one idea. #t and #f ought to be the only really important ones.

jpellegrini · 2023-01-02T15:26:32Z

#t and #f ought to be the only really important ones

STklos also has #void, and I think it's used quite often in code...
Maybe also #eof, which can be used in loop conditions -- STklos supports things like (if (eq? x #eof) ...) instead of (if (eof-object? x)) -- and even using #eof in a case srtatement (where it's usefulness becomes clearer):

stklos> (case #eof ((1) 2) ((#eof 2) 3))
3

And keywords: #:a is not a symbol, it's a keyword.

lassik · 2023-01-02T15:42:45Z

If we implement all of those in C, there's almost nothing left to be implemented in Scheme. Maybe the numeric vectors, fancy comments, etc.

jpellegrini · 2023-01-02T16:19:25Z

If we implement all of those in C, there's almost nothing left to be implemented in Scheme. Maybe the numeric vectors, fancy comments, etc.

And user-defined macros. I was thinking of a fast sharp-syntax for those already mentioned, and a configurable reader macro system similar to Common Lisp reader macros, for the user :) and for libraries maybe.

But this is just me thinking aloud. @egallesio what do you think?

Use the new C hash tables for the implementation.

This should fix issue #434.

egallesio · 2024-01-17T14:28:38Z

Hi @lasik,

One year later 😏 , I have done something similar to what you proposed. The sharp reader can even be extended in Scheme ( I have still to write some documentation about this point). This is done with the %add-sharp-reader function. Note that, this function takes associates a reader to the first character which follows the #.
For instance,

(%add-sharp-reader #\o                                                      
                       (lambda (port)                                           
                         (let ((v (read port)))                                 
                           (if (eq? v 'one)                                     
                               1                                                
                               (error "bad constant #o~a" v)))))

permits to return 1 when we encounter #one. A function should be added to query the procedure already associated to a character to permit chaining.

The following code has been used to implement the bitvector #* syntax.

(%add-sharp-reader #\*                                                          
                   (lambda (port)                                               
                     (let* ((v   (read port))                                   
                            (str (symbol->string v)))                           
                       (or (string->bitvector (string-append "#" str))          
                           (error "bad bitvector #~a" v)))))

egallesio added a commit that referenced this issue Jan 17, 2024

Start the refactoring of the sharp # reader (Issue #434)

cf4266d

Use the new C hash tables for the implementation.

egallesio added a commit that referenced this issue Jan 17, 2024

End of sharp reader Refactoring

50a5512

This should fix issue #434.

egallesio mentioned this issue Jan 17, 2024

Add SRFI 178 - Bitvector library #596

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactoring the sharp # reader #434

Refactoring the sharp # reader #434

lassik commented Jan 1, 2023

egallesio commented Jan 1, 2023

jpellegrini commented Jan 2, 2023

lassik commented Jan 2, 2023

jpellegrini commented Jan 2, 2023

lassik commented Jan 2, 2023

jpellegrini commented Jan 2, 2023 •

edited

Loading

lassik commented Jan 2, 2023

jpellegrini commented Jan 2, 2023

egallesio commented Jan 17, 2024

Refactoring the sharp # reader #434

Refactoring the sharp # reader #434

Comments

lassik commented Jan 1, 2023

egallesio commented Jan 1, 2023

jpellegrini commented Jan 2, 2023

lassik commented Jan 2, 2023

jpellegrini commented Jan 2, 2023

lassik commented Jan 2, 2023

jpellegrini commented Jan 2, 2023 • edited Loading

lassik commented Jan 2, 2023

jpellegrini commented Jan 2, 2023

egallesio commented Jan 17, 2024

jpellegrini commented Jan 2, 2023 •

edited

Loading