lexer

This is a generic tokenizer written in Go. It is meant to be used by higher-level parsers for writing DSLs. It is extensible: custom keyword tokens can be added on top of the default identifiers and built-in tokens.
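
For orientation, here is a minimal sketch of the scanning loop. It relies only on what the example program further down demonstrates: lexer.NewScanner takes an io.Reader, and Scan returns a token, a position, and a literal.

package main

import (
	"fmt"
	"strings"

	"github.com/eliquious/lexer"
)

func main() {
	// Any io.Reader works; here a fixed string stands in for real input.
	s := lexer.NewScanner(strings.NewReader("SELECT FROM users"))
	for {
		tok, pos, lit := s.Scan()

		// Stop at end of input.
		if tok == lexer.EOF {
			break
		}

		// Skip whitespace tokens.
		if tok == lexer.WS {
			continue
		}
		fmt.Println(pos.Line, pos.Char, tok, lit)
	}
}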

One example project is PrefixDB, a toy database with its own query language. It includes a built-in parser, an AST, and tests, and is a good showcase of how to use this lexer and how to write parsers on top of it.

Extensibility

Custom tokens can be added to the lexer. Here's how you could add several tokens.

package lexer

import (
	"github.com/eliquious/lexer"
)

func init() {
	// Loads keyword tokens into lexer
	lexer.LoadTokenMap(tokenMap)
}

const (
	// Start the keywords at an offset from the built-in tokens
	startKeywords lexer.Token = iota + 1000

	// CREATE starts a CREATE KEYSPACE query.
	CREATE

	// SELECT starts a SELECT FROM query.
	SELECT

	// UPSERT inserts or replaces a key-value pair.
	UPSERT

	// DELETE deletes keys from a keyspace.
	DELETE
	endKeywords
)

var tokenMap = map[lexer.Token]string{

	CREATE:   "CREATE",
	SELECT:   "SELECT",
	DELETE:   "DELETE",
	UPSERT:   "UPSERT",
}

// IsKeyword returns true if the token is a custom keyword.
func IsKeyword(tok lexer.Token) bool {
	return tok > startKeywords && tok < endKeywords
}
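
As a hedged sketch of what registering the tokens buys you (the import path below is hypothetical): once the package above is linked into a program, its init runs LoadTokenMap, so scanning the word CREATE should yield the custom CREATE token rather than a plain identifier, and a parser can branch on IsKeyword.

package main

import (
	"fmt"
	"strings"

	"github.com/eliquious/lexer"
	kw "github.com/yourname/yourdsl/lexer" // hypothetical path to the package defined above
)

func main() {
	sc := lexer.NewScanner(strings.NewReader("CREATE users"))
	tok, _, _ := sc.Scan()

	// With the token map loaded by kw's init, CREATE scans as the custom
	// keyword token instead of a plain IDENT.
	fmt.Println(tok, kw.IsKeyword(tok), tok == kw.CREATE)
}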

Example output

A simple way to test the lexer is to write a command-line tool that outputs the tokens read from STDIN. Such a tool is in the PrefixDB repository. By importing the PrefixDB lexer package, all the custom tokens are registered with the lexer and will be output during scanning.

package main

import (
        "fmt"
        "os"
        "strconv"

        "github.com/eliquious/lexer"
        _ "github.com/eliquious/prefixdb/lexer"
)

func main() {
        l := lexer.NewScanner(os.Stdin)
        for {
                tok, pos, lit := l.Scan()

                // exit if EOF
                if tok == lexer.EOF {
                        break
                }

                // skip whitespace tokens
                if tok == lexer.WS {
                        continue
                }

                // Print token
                if len(lit) > 0 {
                        fmt.Printf("[%4d:%-3d] %10s - %s\n", pos.Line, pos.Char, tok, strconv.QuoteToASCII(lit))
                } else {
                        fmt.Printf("[%4d:%-3d] %10s\n", pos.Line, pos.Char, tok)
                }
        }
}

An example output would look like this:

$ echo 'SELECT FROM users WHERE username = "bugs.bunny" OR "daffy.duck" AND timestamp BETWEEN "2015-01-01" AND "2016-01-01" AND topic = "hunting"' | go run main.go
[   0:0  ]     SELECT
[   0:7  ]       FROM
[   0:12 ]      IDENT - "users"
[   0:18 ]      WHERE
[   0:24 ]      IDENT - "username"
[   0:33 ]          =
[   0:34 ]    TEXTUAL - "bugs.bunny"
[   0:48 ]         OR
[   0:50 ]    TEXTUAL - "daffy.duck"
[   0:64 ]        AND
[   0:68 ]      IDENT - "timestamp"
[   0:78 ]    BETWEEN
[   0:85 ]    TEXTUAL - "2015-01-01"
[   0:99 ]        AND
[   0:102]    TEXTUAL - "2016-01-01"
[   0:116]        AND
[   0:120]      IDENT - "topic"
[   0:126]          =
[   0:127]    TEXTUAL - "hunting"

As you can see, the custom tokens such as SELECT, FROM, and WHERE appear in the output.
