GitHub - tonib/multihead-rnn-classifier: Tensorflow model to predict next token on a source code file

This is an experiment to create a language model framework for source code autocompletion. It tries to predict the next word that will be typed in the IDE, based on the words typed before it. It can train, export and (somewhat) serve the model.

This is NOT a model for any specific IDE and/or language, just a framework to train a model for a generic programming language. It is up to you to tokenize the source code, define and supply the features for each token, and process the predictions.

The model is trained from a set of CSV files, one per source code module. What a module is depends on the programming language: in Java it could be a .java file, in C it could be a .h/.c file.
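As a rough illustration of the one-CSV-per-module layout, the sketch below writes the tokens of a single module to CSV, one row per token. The column names (`token_type`, `is_keyword`), the header row and the helper `write_module_csv` are assumptions made for this example only; the real columns are whatever you define for your own language, so check the documentation for the expected format.

```python
import csv
import io

# Hypothetical feature columns; in this project you define your own.
COLUMNS = ["token_type", "is_keyword"]


def write_module_csv(tokens, out):
    """Write one row per token of a single source module to `out`."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for token in tokens:
        writer.writerow({name: token[name] for name in COLUMNS})


# Tokens of an imaginary module, already produced by your own tokenizer.
example_tokens = [
    {"token_type": 3, "is_keyword": 1},
    {"token_type": 7, "is_keyword": 0},
]

buffer = io.StringIO()
write_module_csv(example_tokens, buffer)
csv_text = buffer.getvalue()
```

In practice you would pass an open file (one per module) instead of the in-memory buffer used here.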

The model is trained with Tensorflow 2.4.1 or 2.5, using either an RNN or a GPT-like model (minGPT-TF).

The model is exported as a Tensorflow model and a Tensorflow Lite model. You can use the standard ways to serve it. In addition, there is a Python script to serve the model from the command line (stdin/stdout), with JSON inputs/outputs, which could be piped to your IDE.
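The stdin/stdout serving style can be sketched as a loop that reads one JSON request per line and writes one JSON response per line. This is not the project's serving script: the function names `serve` and `dummy_predict` and the JSON fields (`tokens`, `next_token`, `n_context_tokens`) are hypothetical, and the real input/output schema is described in the documentation.

```python
import io
import json


def serve(requests, responses, predict):
    """Read one JSON request per line, write one JSON response per line."""
    for line in requests:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between requests
        result = predict(json.loads(line))
        responses.write(json.dumps(result) + "\n")
        responses.flush()  # the client on the other end of the pipe is waiting


def dummy_predict(request):
    # Placeholder for a real model call: always suggests token 0.
    return {"next_token": 0, "n_context_tokens": len(request["tokens"])}


# In-memory streams stand in for the process's stdin and stdout.
fake_stdin = io.StringIO('{"tokens": [3, 7, 7]}\n')
fake_stdout = io.StringIO()
serve(fake_stdin, fake_stdout, dummy_predict)
reply = json.loads(fake_stdout.getvalue())
```

With a real process you would pass `sys.stdin` and `sys.stdout`, and the IDE plugin would write requests to the pipe and read responses back line by line.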

This project has been used to implement an autocomplete tool for the Genexus code editor. See the project here.

Install

Requires Tensorflow 2.4.1 or 2.5. See https://www.tensorflow.org/install

Use

See documentation

Licensing

This repo includes a modified version of minGPT-TF (see model/mingpt for the original and modified versions, and the minGPT-TF license).

