Overview

Vision as MCP tools — why describe_image / ocr_document / compare_images beat one-off scripts
One protocol, many hosts — same server in Cursor, Claude Desktop, and Hermes
MiniCPM-V 4.6 on Ollama — pull, run, and wire the 1.6 GB multimodal model locally
Private document OCR — extract receipt and whiteboard text without sending pixels to the cloud
Before/after UI diffs — compare two screenshots for regression review
Runnable Python MCP server + an end-to-end agent demo (works offline with OLLAMA_MOCK=1)
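
As a rough orientation for the items above, the three tools can be sketched as thin wrappers around one local Ollama call. This is a minimal sketch under stated assumptions, not the server itself: it uses only the standard library, leaves out MCP registration entirely, and the names `build_payload`, `ask_model`, `OLLAMA_URL`, and `VISION_MODEL` are hypothetical. The model tag `minicpm-v` is a placeholder for whichever tag you actually pulled, and the mock behaviour shown for `OLLAMA_MOCK=1` is assumed, not taken from the real demo.

```python
import base64
import json
import os
import urllib.request

# Assumption: Ollama's default local endpoint; override via the (hypothetical) OLLAMA_URL variable.
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434/api/generate")
# Assumption: placeholder model tag; substitute the tag you pulled for MiniCPM-V.
MODEL = os.environ.get("VISION_MODEL", "minicpm-v")


def build_payload(prompt, images):
    """Build a non-streaming /api/generate body with base64-encoded image bytes."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "images": [base64.b64encode(img).decode("ascii") for img in images],
        "stream": False,
    }


def ask_model(prompt, images):
    """Send prompt and images to Ollama, or return a canned reply when OLLAMA_MOCK=1."""
    payload = build_payload(prompt, images)
    if os.environ.get("OLLAMA_MOCK") == "1":
        # Assumed mock behaviour: no network access, deterministic text.
        return f"[mock] {prompt} ({len(images)} image(s))"
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


def describe_image(image):
    """Ask the model for a general description of one image (raw bytes)."""
    return ask_model("Describe this image.", [image])


def ocr_document(image):
    """Ask the model to transcribe the text visible in one image."""
    return ask_model("Transcribe all text visible in this image.", [image])


def compare_images(before, after):
    """Ask the model to list differences between two images."""
    return ask_model(
        "List the visual differences between the first and second image.",
        [before, after],
    )


if __name__ == "__main__":
    os.environ.setdefault("OLLAMA_MOCK", "1")  # keep the demo offline by default
    placeholder = b"placeholder image bytes"
    print(describe_image(placeholder))
    print(compare_images(placeholder, placeholder))
```

Keeping all three tools on a single `ask_model` helper means the mock switch and the HTTP details live in one place, so wrapping the functions as MCP tools later only needs to touch their signatures.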

Capability exchange — vision tools over MCP

Agent demo terminal — describe_image, ocr_document, compare_images
