Overview
- Vision as MCP tools: why describe_image / ocr_document / compare_images beat one-off scripts
- One protocol, many hosts — same server in Cursor, Claude Desktop, and Hermes
- MiniCPM-V 4.6 on Ollama — pull, run, and wire the 1.6 GB multimodal model locally
- Private document OCR — extract receipt and whiteboard text without sending pixels to the cloud
- Before/after UI diffs — compare two screenshots for regression review
- Runnable Python MCP server + an end-to-end agent demo (works offline with OLLAMA_MOCK=1)
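
As a rough illustration of how the three tools might sit on top of a local Ollama server, here is a minimal sketch using only the Python standard library. It is not the tutorial's actual server: the MCP registration layer is left out, the default model tag `minicpm-v`, the `VISION_MODEL` and `OLLAMA_URL` variables, the prompts, and the helper `ask_vision_model` are all assumptions for illustration, and the behaviour under OLLAMA_MOCK=1 (returning a canned string) is a guess at what an offline mode could look like.

```python
import base64
import json
import os
import urllib.request

# Ollama's default local endpoint for single-shot generation.
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434/api/generate")
# Assumed model tag; check `ollama list` for the name you actually pulled.
MODEL = os.environ.get("VISION_MODEL", "minicpm-v")


def ask_vision_model(prompt: str, images: list[bytes]) -> str:
    """Send a prompt plus raw image bytes to a local Ollama server (hypothetical helper)."""
    if os.environ.get("OLLAMA_MOCK") == "1":
        # Assumed mock behaviour: return a canned string, no network call.
        return f"[mock] {prompt} ({len(images)} image(s))"
    payload = {
        "model": MODEL,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(img).decode("ascii") for img in images],
        "stream": False,
    }
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


def describe_image(image: bytes) -> str:
    return ask_vision_model("Describe this image.", [image])


def ocr_document(image: bytes) -> str:
    return ask_vision_model("Transcribe all text in this image.", [image])


def compare_images(before: bytes, after: bytes) -> str:
    return ask_vision_model(
        "List the visual differences between these two screenshots.",
        [before, after],
    )
```

An MCP server would expose these three functions as tools; keeping the model call in one helper means the mock switch and the model tag live in a single place.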



