Overview

  • Vision as MCP tools — why describe_image / ocr_document / compare_images beat one-off scripts
  • One protocol, many hosts — same server in Cursor, Claude Desktop, and Hermes
  • MiniCPM-V 4.6 on Ollama — pull, run, and wire the 1.6 GB multimodal model locally
  • Private document OCR — extract receipt and whiteboard text without sending pixels to the cloud
  • Before/after UI diffs — compare two screenshots for regression review
  • Runnable Python MCP server + an end-to-end agent demo (works offline with OLLAMA_MOCK=1)
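Here is a minimal sketch of the local model call these tools could share. The helper posts a base64-encoded image to Ollama's `/api/generate` endpoint on its default port 11434. Three details are assumptions for illustration and may not match the tutorial's server: the model tag `minicpm-v`, the helper names, and the canned reply returned when `OLLAMA_MOCK=1` is set.

```python
import base64
import json
import os
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "minicpm-v"  # assumed tag; use whichever MiniCPM-V tag you pulled


def build_payload(image_bytes: bytes, prompt: str, model: str = MODEL) -> dict:
    """Build a non-streaming /api/generate request with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as plain base64 strings (no data: URI prefix).
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of streamed chunks
    }


def ask_vision_model(image_bytes: bytes, prompt: str) -> str:
    """Send the image and prompt to the local model and return its text reply."""
    payload = build_payload(image_bytes, prompt)
    if os.environ.get("OLLAMA_MOCK") == "1":
        # Hypothetical offline mode: skip the network and return a canned answer.
        return f"[mock] {prompt} ({len(image_bytes)} bytes of image data)"
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        # A non-streaming reply carries the generated text in "response".
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    os.environ.setdefault("OLLAMA_MOCK", "1")  # keep the demo offline by default
    print(ask_vision_model(b"\x89PNG fake", "Transcribe all text in this receipt."))
```

Because the image is read and encoded locally and only sent to `localhost`, the pixels never leave the machine. A real server would read the bytes from the user's file instead of a literal.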

[Figure: Capability exchange, vision tools over MCP]
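The capability exchange can be sketched with plain JSON-RPC 2.0 messages. In MCP, a client discovers tools with a `tools/list` request, and each tool is described by a `name`, a `description`, and a JSON Schema `inputSchema`. The stdlib-only handler below illustrates that exchange for the three vision tools. The argument names (`image_path`, `before_path`, `after_path`) and the descriptions are hypothetical. A real server would normally use an MCP SDK and would also handle `initialize` and `tools/call`.

```python
import json


def _schema(*fields: str) -> dict:
    """Build a minimal JSON Schema object whose fields are all required strings."""
    return {
        "type": "object",
        "properties": {name: {"type": "string"} for name in fields},
        "required": list(fields),
    }


# Hypothetical tool list; argument names are illustrative, not the tutorial's.
TOOLS = [
    {
        "name": "describe_image",
        "description": "Describe the contents of a local image.",
        "inputSchema": _schema("image_path"),
    },
    {
        "name": "ocr_document",
        "description": "Extract text from a receipt, whiteboard, or document photo.",
        "inputSchema": _schema("image_path"),
    },
    {
        "name": "compare_images",
        "description": "Summarize visual differences between two screenshots.",
        "inputSchema": _schema("before_path", "after_path"),
    },
]


def handle(message: dict) -> dict:
    """Answer one JSON-RPC request; only tools/list is implemented in this sketch."""
    if message.get("method") == "tools/list":
        return {"jsonrpc": "2.0", "id": message["id"], "result": {"tools": TOOLS}}
    # -32601 is the JSON-RPC 2.0 "Method not found" error code.
    return {
        "jsonrpc": "2.0",
        "id": message.get("id"),
        "error": {"code": -32601, "message": "Method not found"},
    }


if __name__ == "__main__":
    request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
    print(json.dumps(handle(request), indent=2))
```

Since the tool list is just data returned over a standard method, any MCP host (Cursor, Claude Desktop, Hermes) can discover the same three tools without host-specific glue.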

[Figure: Agent demo terminal running describe_image, ocr_document, and compare_images]