I saw that llama.cpp (GitHub project) had a PR (at the time I saw it) that made use of mmap, the POSIX call that lets the model files be mapped directly into the process's address space (like a dedicated swap file). I was wondering if TorchSharp has something like that, or could work with something written in C# to do that?
Mapping the model file(s) into memory directly from the drive allows them to be loaded near-instantaneously. It also cuts the required memory by gigabytes, since the OS pages data in on demand instead of copying the whole file into RAM.
I'm not 100% sure about all of the finer details, but I do know that the difference between running the standard llama.cpp and the mmap llama.cpp with the same model files was like night and day. It made getting the program up and running take less than a minute, and didn't slow down the model in any noticeable way.
So, I was wondering if something like this could be implemented, as that would be awesome (and it would work cross-platform too).
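To make the idea concrete, here is a language-neutral sketch of what llama.cpp's mmap loader does, using Python's stdlib `mmap` module (a C# version would use `System.IO.MemoryMappedFiles.MemoryMappedFile` in the same way). The tiny stand-in "model file" here is made up for illustration; real weights files are gigabytes, which is exactly where the on-demand paging pays off.

```python
import mmap
import os
import struct
import tempfile

# Create a stand-in "model file" (in reality this would be a weights
# file on disk, often many gigabytes in size).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))

# Map the file read-only. No bytes are copied up front; the OS pages
# them in on demand and can evict them again under memory pressure,
# which is why startup is near-instant and RAM usage drops.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # Read one float out of the mapping without touching the rest.
        (first,) = struct.unpack_from("<f", m, 0)
        print(first)  # 1.0

os.remove(path)
```

The key property is that "loading" becomes just establishing the mapping; actual disk reads happen lazily as tensors are touched during inference.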
I believe that torch.from_file already does this for tensors, but not for modules. In other words, the building blocks already exist, and we would use that instead of loading a state dict. It will require some mulling over in order to get it right, I think.
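For reference, a minimal Python sketch of that building block on the PyTorch side (the TorchSharp binding would surface the same call): write raw tensor data to disk, then map it back with `torch.from_file` using `shared=True`, which memory-maps the file instead of copying it. The file name is made up for illustration.

```python
import os
import tempfile

import torch

# Write 4 float32 values as raw bytes -- a stand-in for one tensor's
# worth of weights inside a model file.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
torch.tensor([1.0, 2.0, 3.0, 4.0]).numpy().tofile(path)

# shared=True memory-maps the file: the tensor's storage is backed by
# the file on disk, so creating it is near-instant and pages are only
# read in as the tensor is actually used.
t = torch.from_file(path, shared=True, size=4, dtype=torch.float32)
print(t)
```

A module-level equivalent would presumably need every parameter laid out at a known offset in one file, with each parameter created via `from_file` at its offset instead of deserializing a state dict, which is likely where the "mulling over" comes in.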
I didn't see a template for this...