2 unstable releases
Version | Date
---|---
0.2.0 | Nov 24, 2024
0.1.0 | Oct 16, 2024
wgml: cross-platform GPU LLM inference
/!\ This library is under heavy development and is still missing many features.
The goal of wgml is to provide composable WGSL shaders and kernels for cross-platform GPU LLM inference.
Running the models
Currently, the gpt2 and llama2 models are implemented. They can be loaded from gguf files. Quantization support is very limited and somewhat untested: tensors are systematically dequantized upon loading. A very basic execution of these LLMs can be run from the examples.
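Before passing a file to the examples, it can help to check that it really is a gguf file. The sketch below is not part of wgml and does not use its API; `GgufHeader` and `parse_gguf_header` are hypothetical names introduced here for illustration. It assumes the little-endian GGUF layout used by version 2 and later: the 4-byte magic `GGUF`, a `u32` version, then a `u64` tensor count and a `u64` metadata key/value count.

```rust
use std::convert::TryInto;

/// Fixed-size fields at the start of a GGUF file (layout of version 2 and later).
/// Hypothetical helper type for illustration; not part of wgml.
#[derive(Debug, PartialEq)]
pub struct GgufHeader {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// Parses the first 24 bytes of a GGUF file: magic, version, tensor count,
/// and metadata key/value count. Assumes little-endian encoding.
pub fn parse_gguf_header(bytes: &[u8]) -> Result<GgufHeader, String> {
    if bytes.len() < 24 {
        return Err(format!("need at least 24 bytes, got {}", bytes.len()));
    }
    // The file must start with the ASCII magic "GGUF".
    if &bytes[0..4] != b"GGUF" {
        return Err("missing GGUF magic".to_string());
    }
    // The slices below have exactly the right length, so `unwrap` cannot fail.
    let version = u32::from_le_bytes(bytes[4..8].try_into().unwrap());
    let tensor_count = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
    let metadata_kv_count = u64::from_le_bytes(bytes[16..24].try_into().unwrap());
    Ok(GgufHeader {
        version,
        tensor_count,
        metadata_kv_count,
    })
}
```

To inspect a real file, read its first 24 bytes (for instance with `std::fs::File` and `Read::read_exact`) and pass them to `parse_gguf_header`. Metadata and tensor descriptors follow this preamble and are not handled by the sketch.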
Running GPT-2
cargo run -p wgml --example gpt2 -- your_model_file_path.gguf --prompt "How do I bake a cake?"
Note that this will run both the GPU and CPU versions of the transformer.
Running llama-2
cargo run -p wgml --example llama2 -- your_model_file_path.gguf
Note that this will run both the CPU and GPU versions of the transformer.