This is a simple example showing how smaller LLMs (or small language models, SLMs) can be a viable alternative to their larger and far more resource-hungry siblings when compute resources are constrained.
Here I present a simple but functional search engine for the (almost) complete legal texts of the Swiss Confederation in German, French, Italian, and English that can be hosted on CPU-only cloud compute, for instance an Oracle Cloud Infrastructure (OCI) VM.Standard.E6.Flex shape with 8 OCPUs and 32 GB of memory. No GPUs required!
The backend uses ChromaDB as a vector database. At server startup, the entire legal corpus is read from XML files and parsed, and vector embeddings are created with the Qwen3-Embedding-4B model and stored in the database.
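As a rough illustration of what this indexing step looks like, here is a minimal sketch (with hypothetical collection names, chunk IDs, and texts; the repo's actual parsing and chunking logic lives in src/local_server.py):

```python
# Hypothetical sketch of the indexing step, not the repo's actual code.
# Requires a recent chromadb and sentence-transformers.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-4B")
client = chromadb.PersistentClient(path="chroma_db")  # assumed storage path
collection = client.get_or_create_collection(name="swiss_law")  # assumed name

def index_documents(texts: list[str], ids: list[str]) -> None:
    """Embed legal text chunks and store them in ChromaDB."""
    embeddings = model.encode(texts)  # one vector per chunk
    collection.add(ids=ids, documents=texts, embeddings=embeddings.tolist())

index_documents(
    ["Art. 1 ...", "Art. 2 ..."],        # chunks parsed from the XML corpus
    ["sr-210-art-1", "sr-210-art-2"],    # made-up IDs for illustration
)
```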
The Streamlit frontend presents a simple web interface with a search bar. Query text is passed through the same model for similarity search, and the top 10 hits are returned.
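The query path, again as a sketch under the same assumptions (reusing the `model` and `collection` from above), could look roughly like this:

```python
# Hypothetical sketch of the query path, not the repo's exact code:
# embed the query with the same model and ask ChromaDB for the
# 10 nearest chunks.
import streamlit as st

query = st.text_input("Search the Swiss legal corpus")
if query:
    query_emb = model.encode([query]).tolist()
    results = collection.query(query_embeddings=query_emb, n_results=10)
    for doc_id, doc in zip(results["ids"][0], results["documents"][0]):
        st.write(f"**{doc_id}**")
        st.write(doc)
```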
- Install some system dependencies; on Ubuntu this should suffice:

  ```bash
  sudo apt install git tmux htop \
      build-essential gcc-12 g++-12 cmake libcurl4-openssl-dev
  ```

- Use `uv` for easy deployment:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Get the code:

  ```bash
  git clone https://github.com/ohm314/slm_embedding_demo.git
  cd slm_embedding_demo
  ```

- Run the Streamlit app:
  ```bash
  uv run streamlit run src/local_server.py -- data/xml -p -q
  ```

To access the app, either deploy your VM in a public subnet with firewall rules adjusted to open the Streamlit port, or use SSH port forwarding to expose the Streamlit port locally.
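For the port-forwarding route, something like `ssh -L 8501:localhost:8501 <user>@<vm-address>` (assuming Streamlit's default port 8501; substitute your own user and host) makes the app reachable at http://localhost:8501 on your machine.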
Copyright (c) 2025 Omar Awile

Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/

