llama.cpp example

Run a local llama.cpp server for Proval.

llama.cpp can serve a local model with an OpenAI-compatible HTTP API. This guide covers starting the server. To add the model provider in Proval, follow Set LLM.

Prerequisites

  • Proval is running on your server (Quick Start)
  • A GGUF model file on that server

Start the server

Run llama.cpp with the OpenAI-compatible HTTP server. Set an API key so only your clients can call the API:

llama-server -m /path/to/your-model.gguf --port 8080 --api-key <your-llm-api-key>

Confirm the server responds:

http://<llm-host>:8080/v1/models

Replace <llm-host> with the hostname or IP address of the machine running llama.cpp.

llama.cpp server running in a terminal
llama.cpp server listening for API requests

If you run without --api-key, enter any placeholder in that field. We recommend run llama.cpp with --api-key

In Proval, set Base URL to http://<llm-host>:8080/v1. See Set LLM for the rest of the form and connection test.


What's next