Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

automata #1

Open
missinglink opened this issue Jan 29, 2022 · 2 comments
Open

automata #1

missinglink opened this issue Jan 29, 2022 · 2 comments

Comments

@missinglink
Copy link

missinglink commented Jan 29, 2022

Hi 👋, I saw your recent post over at luceneutil, I've been reading a lot of the same stuff recently.

@LifeIsStrange
Copy link
Owner

LifeIsStrange commented Feb 12, 2022

Hi @missinglink ! I am very pleasantly surprised you discovered this repository haha, which is IMHO the missing github's social link.
I am an enthusiast regarding the idea of progress (about performance or other metrics such as the realm of possibilities) in search, machine learning or any other nerd-worthy topic :)
Despite my interest I haven't spend enough time and focus to actually understand how the Levenshtein automata actually works (nor of where does the complexity comes from) so if you find the motivation to study it further, I'd appreciate if you were to share to me a high level explanation.
Moreover, feel free to use this issue to share any other interesting topic you might want to.
I don't know anything about GIS/geocoding systems BTW but this looks interesting and has many uses.
I stumbled upon this library made by Uber years ago -> https://github.com/uber/h3
I don't understand exactly what make it interesting/superior to existing approaches (e.g what problems does hexagonality solve concretely?) but I thought it might interest u.

@missinglink
Copy link
Author

missinglink commented Feb 14, 2022

Hi @LifeIsStrange

It's an interesting idea, I might do something similar, like an AskMeAnything repo on my Github ;)

It's a fairly wide topic, but the tl;dr with automata is that they are both beautiful kinetic sculptures and state machines.

Some similarities from both can be found in Cellular Automata, that video is pretty neat, a similar type of ruleset can be made which defines the transitional rules between two strings, each 'transition' in the machine is a character of the alphabet.

This is cool because instead of trying to enumerate all possible strings within an Levenshtein edit distance of each other (which is slow and uses a lot of memory) you can create an automata more quickly and simply which 'accepts' these 'similar strings' and rejects dissimilar strings.

That's a lot of what was being discussed in the articles you linked on the luceneutil repo, but IMO it's more interesting when you take this automata and 'intersect' it with another automata, producing a third automata containing only the common transitions between the two. This allows very efficient weighted Levenshtein calculations against a large corpus of text.

The H3 library is also very neat, it's an example of a Spatial Prefix Tree, of which there are actually quite a few, Geohash is one of the most well known, but S2 is probably the 'coolest', despite not having hexagons ;)

Construction of the Earth Cube is really fascinating, because they take 3 dimensional space and reduce it to one-dimensional space through a space-filling algorithm called Hilbert Curves (also see Z-Order Curves).

I remember reading some physics text when I was younger about 'folding' a single dimension into two dimensions, just like folding a piece of paper at 45 degrees, and how theoretical physicists believe the world we live is no more than 11 dimensions, a mind-fuck to say the least.

And this is kinda the opposite of that, you take two dimensions and encode them as one 🤔 how it works is easiest to understand imagining you are walking in a square labyrinth, your location can be defined as a pair of co-ordinates X/Y, but if you unravelled a piece of string behind you as you went, you could then use the distance along the string as an alternate way of encoding your location, one which only has a scalar value, distance along the spool of string 📏

Space filling curves can also be extended into n dimensional space 🤯 such as this example of a Hilbert Cube where all dimensions are reduced to a scalar value 🤯 🤯 .

Sorry for the long text

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants