id2vec

A little under 1M identifier embeddings, generated for identifiers extracted from half of PGA in June 2018. New pipeline was used, with splitting and stemming of identifiers, the full description can be found in the "Algorithms" section of the sourced.ml repository.

Example:

from sourced.ml.models import Id2Vec
id2vec = Id2Vec().load("3467e9ca-ec11-444a-ba27-9fa55f5ee6c1")
print("Number of tokens:", len(id2vec))

References

Source code identifier embeddings


ID	3467e9ca-ec11-444a-ba27-9fa55f5ee6c1
Uploaded	2018-07-19 13:14:53.000621
Version	1.0.0
File	https://storage.googleapis.com/models.cdn.sourced.tech/models%2Fid2vec%2F3467e9ca-ec11-444a-ba27-9fa55f5ee6c1.asdf
Size	1.2 GB
Data collection date	June 2018
Number of tokens	999,424
Size of each embedding	300
License

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

3467e9ca-ec11-444a-ba27-9fa55f5ee6c1.md

3467e9ca-ec11-444a-ba27-9fa55f5ee6c1.md

id2vec

References

Files

3467e9ca-ec11-444a-ba27-9fa55f5ee6c1.md

Latest commit

History

3467e9ca-ec11-444a-ba27-9fa55f5ee6c1.md

File metadata and controls

id2vec

References