Skip to content

Latest commit

 

History

History
28 lines (21 loc) · 1.05 KB

3467e9ca-ec11-444a-ba27-9fa55f5ee6c1.md

File metadata and controls

28 lines (21 loc) · 1.05 KB

id2vec

A little under 1M identifier embeddings, generated for identifiers extracted from half of PGA in June 2018. New pipeline was used, with splitting and stemming of identifiers, the full description can be found in the "Algorithms" section of the sourced.ml repository.

Example:

from sourced.ml.models import Id2Vec
id2vec = Id2Vec().load("3467e9ca-ec11-444a-ba27-9fa55f5ee6c1")
print("Number of tokens:", len(id2vec))

References

ID 3467e9ca-ec11-444a-ba27-9fa55f5ee6c1
Uploaded 2018-07-19 13:14:53.000621
Version 1.0.0
File https://storage.googleapis.com/models.cdn.sourced.tech/models%2Fid2vec%2F3467e9ca-ec11-444a-ba27-9fa55f5ee6c1.asdf
Size 1.2 GB
Data collection date June 2018
Number of tokens 999,424
Size of each embedding 300
License