OpenClaw + Gemma 4 E2B + Agentic RAG — Full Integration¶
Build a personal AI assistant that answers on Telegram/WhatsApp/CLI using Gemma 4 E2B, and delegates research-heavy questions to your local Agentic RAG API.
What you end up with¶
- OpenClaw Gateway: always-on control plane (daemon)
- gemma4:e2b: conversational model with tools + optional vision
- agentic-rag skill: shells out to rag_query.sh → POST /predict on LitServe
- qwen-agentic-rag: CrewAI Researcher + Writer + Qdrant (and optional Firecrawl)
This integration uses one Ollama model everywhere: gemma4:e2b for OpenClaw chat and for the CrewAI RAG agents.
Architecture¶
flowchart LR
A[Channels] -->|user message| B[OpenClaw Gateway]
B -->|gemma4:e2b plans| C[agentic-rag skill]
C -->|rag_query.sh POST /predict| D[LitServe API]
D --> E[Researcher Agent]
E --> F[Vector DB Tool]
E --> G[Firecrawl Search]
E --> H[Writer Agent]
H --> I[JSON Response]
F --> J[(Qdrant)]
G --> K[Firecrawl API]
B --> L[(Ollama gemma4:e2b)]
E --> L
H --> L
I --> C
C --> B
B -->|reply| A
Flow:
- User messages OpenClaw on Telegram, WhatsApp, or CLI
- gemma4:e2b decides whether to call the agentic-rag skill
- Skill runs rag_query.sh → LitServe POST /predict
- CrewAI Researcher picks Qdrant or Firecrawl; Writer drafts the answer
- JSON returns through the skill → OpenClaw → the same channel
Prerequisites¶
| Requirement | Check |
|---|---|
| Node 22.12+ or 24 (OpenClaw will not run on Node 20) | node -v |
| Ollama | ollama -v |
| Python 3.10+ | python3 --version |
| curl + jq | curl --version && jq --version |
| Completed qwen-agentic-rag once | vector DB built |
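The checks in the table can be bundled into one preflight script. This is a minimal sketch in POSIX shell; the helper names `node_version_ok`, `require_cmd`, and `preflight` are illustrative and not part of OpenClaw or this guide's repository.

```shell
#!/bin/sh
# Preflight sketch for the prerequisites table (helper names are hypothetical).

# Succeeds when a Node version string such as "v22.12.0" is >= 22.12.
node_version_ok() {
  v=${1#v}            # strip the leading "v"
  major=${v%%.*}      # text before the first dot
  rest=${v#*.}        # text after the first dot
  minor=${rest%%.*}   # text before the next dot
  [ "$major" -gt 22 ] || { [ "$major" -eq 22 ] && [ "$minor" -ge 12 ]; }
}

# Succeeds when a command is on PATH; prints a note otherwise.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing: $1"; return 1; }
}

# Report on every prerequisite without stopping at the first failure.
preflight() {
  status=0
  for cmd in node ollama python3 curl jq; do
    require_cmd "$cmd" || status=1
  done
  if command -v node >/dev/null 2>&1 && ! node_version_ok "$(node -v)"; then
    echo "node $(node -v) is too old; need >= 22.12"
    status=1
  fi
  return "$status"
}
```

Source the file and run `preflight`; a non-zero exit status means at least one requirement is missing.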
Part 1 — Agentic RAG API¶
If you already finished the Qwen Agentic RAG tutorial, start the server only:
ollama pull gemma4:e2b
cd guides/qwen-agentic-rag
source .venv/bin/activate
cp ../openclaw-gemma-rag/env.rag.example .env # sets OLLAMA_MODEL=ollama/gemma4:e2b
# First time only:
# pip install -r requirements.txt && python setup_vectordb.py
python server.py
Default URL: http://127.0.0.1:8001 (the port comes from PORT in .env).
Verify:
python client.py --query "What is cross-validation?"
# or
curl -sS -X POST http://127.0.0.1:8001/predict \
-H 'Content-Type: application/json' \
-d '{"query":"What is cross-validation?"}' | jq -r .output
Keep this terminal open. First crew run may take several minutes.
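Because startup and the first crew run are slow, it can help to wait for the port before sending a query. The sketch below is an assumption-laden helper, not part of the guide's scripts: `wait_until` and `rag_port_open` are hypothetical names, and the probe treats any HTTP response from the server as "up" rather than relying on a specific health route.

```shell
# Poll a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Probe: curl exits 0 on any HTTP response and non-zero if the connection is refused.
rag_port_open() {
  curl -sS -o /dev/null --max-time 5 "${RAG_API_URL:-http://127.0.0.1:8001}/"
}
```

For example, `wait_until 30 2 rag_port_open && python client.py --query "What is cross-validation?"` retries for about a minute before giving up.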
Part 2 — Pull Gemma 4 E2B¶
If you did not pull the model in Part 1, run ollama pull gemma4:e2b now. Recommended sampling (Ollama may already apply these as defaults): temperature=1, top_p=0.95, top_k=64.
Part 3 — Install OpenClaw¶
Node version (required)¶
OpenClaw needs Node >= 22.12. If node -v shows v20, switch with nvm (you may already have 22 installed):
cd guides/openclaw-gemma-rag
source ./use-node22.sh # uses .nvmrc → 22.22.3
node -v # must be v22.12.0 or higher
Optional — make Node 22 the default in new terminals:
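A minimal sketch of one way to do this with nvm, assuming nvm is already loaded in the current shell. `set_default_node` is a hypothetical helper name; `nvm alias default` is the standard nvm command for pinning the version that new shells pick up.

```shell
# Make a Node major version the nvm default for new shells.
# Assumes nvm is loaded (it is a shell function, not a binary on PATH).
set_default_node() {
  version=${1:-22}
  if ! command -v nvm >/dev/null 2>&1; then
    echo "nvm is not loaded in this shell"
    return 1
  fi
  nvm alias default "$version"
}
```

Call `set_default_node 22`, then open a new terminal and confirm with `node -v`.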
Run openclaw onboard and follow the prompts for workspace, auth, and optional channels. See Getting started.
Set the primary model:
export OLLAMA_API_KEY="ollama-local"
openclaw models list --provider ollama
openclaw models set ollama/gemma4:e2b
Config snippet¶
Copy fields from config/openclaw.snippet.json5 in this guide into ~/.openclaw/openclaw.json.
Critical points:
- baseUrl: http://127.0.0.1:11434 (no /v1 suffix)
- api: "ollama" (native tool calling)
- agents.defaults.model.primary: "ollama/gemma4:e2b"
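The /v1 mistake is easy to make when copying a URL from an OpenAI-compatible setup, so a small guard can catch it before you edit the config. This is a sketch only; `ollama_base_url_ok` is a hypothetical helper and does not read ~/.openclaw/openclaw.json itself.

```shell
# Rejects an Ollama base URL that ends in /v1 (with or without a trailing slash).
ollama_base_url_ok() {
  url=${1%/}
  case "$url" in
    */v1) echo "drop the /v1 suffix: $1"; return 1 ;;
    http://*|https://*) return 0 ;;
    *) echo "not an http(s) URL: $1"; return 1 ;;
  esac
}
```

To confirm the native API is reachable at that address, `curl -sS http://127.0.0.1:11434/api/tags` should return a JSON list of local models that includes gemma4:e2b.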
Restart the gateway so the new configuration is loaded.
Part 4 — Install the agentic-rag skill¶
From this guide directory:
cd guides/openclaw-gemma-rag
chmod +x install-skill.sh skills/agentic-rag/scripts/*.sh
./install-skill.sh
This copies the skill to ~/.openclaw/workspace/skills/agentic-rag/.
Alternative (if your CLI supports it):
Enable in config:
{
skills: {
entries: {
"agentic-rag": {
enabled: true,
env: { RAG_API_URL: "http://127.0.0.1:8001" },
},
},
},
}
Optional allowlist so only this skill is injected:
Restart the gateway after skill or config changes.
Skill behavior¶
The skill teaches OpenClaw to run rag_query.sh with the user's question as its argument. The script POSTs the question to LitServe and prints the crew answer. The Gemma model decides when to use the skill; the RAG crew uses the same OLLAMA_MODEL=ollama/gemma4:e2b from guides/qwen-agentic-rag/.env (see env.rag.example).
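For orientation, a wrapper of this kind could look roughly like the sketch below. It is a hypothetical stand-in, not the rag_query.sh shipped with the guide: the `predict_url` and `rag_query` names and the 600-second timeout are assumptions, while the /predict route, the `query` field, and the `output` field come from the curl example in Part 1.

```shell
#!/bin/sh
# Hypothetical stand-in for rag_query.sh; the real script may differ.

# Join the base URL and the /predict route, tolerating a trailing slash.
predict_url() {
  base=${1:-http://127.0.0.1:8001}
  echo "${base%/}/predict"
}

# POST the question and print the "output" field of the JSON reply.
rag_query() {
  # jq builds the body so quotes and newlines in the question are escaped safely.
  jq -n --arg query "$1" '{query: $query}' |
    curl -sS --fail --max-time 600 -X POST "$(predict_url "${RAG_API_URL:-}")" \
      -H 'Content-Type: application/json' -d @- |
    jq -r '.output'
}

# Run only when a question is passed on the command line.
if [ "$#" -gt 0 ]; then
  rag_query "$1"
fi
```

Building the request body with jq rather than string interpolation keeps the call safe when the model passes a question containing quotes.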
Part 5 — End-to-end test¶
CLI (no channel)¶
openclaw agent --message "Using the agentic RAG knowledge base: explain cross-validation in 3 bullets." --thinking low
Watch the gateway logs — you should see an exec invoking rag_query.sh.
Manual script test¶
export RAG_API_URL=http://127.0.0.1:8001
./skills/agentic-rag/scripts/rag_query.sh "What is regularization?"
Health check¶
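A quick health check can be sketched as a few probes that each print one status line. The `check` and `health_report` helpers are hypothetical; the Ollama probe uses the native /api/tags route, and the RAG probe only confirms that something answers HTTP on the configured port.

```shell
# Print one status line per check and remember whether any failed.
# Usage: check LABEL COMMAND [ARGS...]
HEALTH_FAILED=0
check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok    $label"
  else
    echo "FAIL  $label"
    HEALTH_FAILED=1
  fi
}

# Run all probes; the exit status is non-zero if any probe failed.
health_report() {
  check "Ollama API" curl -sS --fail --max-time 5 http://127.0.0.1:11434/api/tags
  check "RAG API port" curl -sS --max-time 5 -o /dev/null "${RAG_API_URL:-http://127.0.0.1:8001}/"
  check "jq installed" command -v jq
  return "$HEALTH_FAILED"
}
```

Run `health_report` after starting Ollama and server.py. The gateway side is covered separately by openclaw doctor.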
Part 6 — Connect a channel (optional)¶
Example: Telegram
- Create a bot via @BotFather
- During openclaw onboard or openclaw configure, add the Telegram channel token
- Keep DM pairing enabled (dmPolicy: "pairing") until you trust exposure
- Approve yourself: openclaw pairing approve telegram <code>
Send: "Search the ML FAQ: what is gradient descent?"
Flow: Telegram → Gateway → Gemma → agentic-rag skill → RAG API → reply on Telegram.
Channel docs: OpenClaw Channels.
Security checklist¶
- Treat inbound DMs as untrusted; keep pairing on for production-adjacent setups
- exec (used by the RAG skill) is powerful; do not expose the gateway to the public internet without first following the Security and Exposure runbook
- Run openclaw doctor after config changes
- RAG API binds to localhost by default; keep it that way
Troubleshooting¶
| Symptom | Fix |
|---|---|
| connection refused on :8001 | Start python server.py in qwen-agentic-rag |
| RAG very slow | Normal on laptop; reduce parallel Ollama loads |
| OpenClaw ignores RAG | Confirm skill installed, enabled: true, gateway restarted; ask explicitly to "use agentic RAG" |
| ollama/gemma4:e2b not found | ollama pull gemma4:e2b; check openclaw models list |
| Tool calling errors | Ensure api: "ollama" and no /v1 on baseUrl |
| openclaw requires Node >=22.12.0 | Run source guides/openclaw-gemma-rag/use-node22.sh or nvm use 22 before any openclaw command |
| OOM on 16GB Mac | Only run gemma4:e2b; quit other Ollama models (ollama ps) |
| Skill curl fails | brew install jq or apt install jq |
What's next¶
- Add your own documents in guides/qwen-agentic-rag/rag_code.py and re-run setup_vectordb.py
- Publish a second OpenClaw skill for Gradio (ui.py) health checks
- Route work vs personal agents with multi-agent routing
- Share this guide on the ecosystem docs site
Summary¶
| Component | You run |
|---|---|
| Ollama | gemma4:e2b (chat + RAG) |
| RAG | guides/qwen-agentic-rag/server.py |
| OpenClaw | openclaw gateway (daemon) |
| Skill | agentic-rag → rag_query.sh → /predict |
You now have a local-first assistant: Gemma for conversation, CrewAI RAG for grounded ML research — no cloud LLM required for either layer.