Open-Source AI Fundamentals

Posted by Amol Dighe on February 24, 2026

What is Open-Source AI?

A collection of technologies and frameworks needed to build systems and applications with open-source AI. For example: an AI agent that books a flight, or one that shops for the right shoes at the right price point by comparing across multiple shopping websites.


* AI Models - Proprietary vs. Open-Source Models

Choosing the right model is the foundation of any AI system. Models fall into two broad categories:

Proprietary Models — Closed-source, API-access only, typically more capable out of the box:

  • OpenAI GPT-4.5, o3, o3-mini, o1
  • Anthropic Claude 3.7 Sonnet (with extended thinking), Claude 3.5 Haiku
  • Google Gemini 2.0 Flash, Gemini 2.0 Pro (Experimental)
  • xAI Grok-3, Grok-3 mini
  • Microsoft Phi-4 (via Azure AI Foundry)

Open-Source Models — Weights are publicly available; can be run locally or self-hosted:

  • Llama 3.3 (70B) from Meta — Latest flagship open model, best-in-class instruction following
  • DeepSeek-R1 / DeepSeek-V3 — Top-tier reasoning & coding; rivals GPT-4o at roughly 1/10th the cost
  • Qwen 2.5 / Qwen2.5-Coder (7B, 32B, 72B) from Alibaba — Excellent coding & multilingual
  • Qwen3-VL (2B, 7B) — Multimodal (vision + language), great for image understanding tasks
  • Mistral Small 3.1 (24B) — Fast, efficient, Apache 2.0 licensed, strong instruction following
  • Gemma 3 (1B, 4B, 12B, 27B) from Google — Lightweight, optimized for local inference
  • Phi-4 (14B) from Microsoft — Punches above its weight on reasoning benchmarks
  • GLM-4 from Zhipu AI — Strong multilingual support, especially Chinese + English
  • Kimi k1.5 from Moonshot AI — Long-context reasoning model (up to 128k tokens)

* Model Ranking Leaderboard

Before picking a model, consult benchmarks and community leaderboards (e.g., the LMSYS Chatbot Arena) to find the best fit for your use case (coding, reasoning, instruction following, multilingual, etc.):

Key benchmarks to look at:

  • MMLU — General knowledge across 57 subjects
  • HumanEval / MBPP — Coding ability
  • MT-Bench — Multi-turn conversation quality
  • MATH / GSM8K — Mathematical reasoning

* Model Manager — Ollama & Docker Desktop Models

To run and manage open-source models locally, you need a model manager:

Ollama

  • Easiest way to download, run, and switch between local LLMs
  • Single command to pull and run: ollama run llama3
  • REST API at http://localhost:11434 — compatible with the OpenAI API format
  • Supports: Llama 3, Mistral, Gemma, Phi, Qwen, DeepSeek, and more
  • Cross-platform: macOS, Linux, Windows
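
Because the API is OpenAI-compatible, any OpenAI-style client can talk to a local Ollama server. A minimal stdlib-only sketch; the model name and default port are assumptions, so substitute whatever `ollama list` shows on your machine:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port

def build_chat_request(model, messages, stream=False):
    """Build an OpenAI-style chat-completion request body."""
    return {"model": model, "messages": messages, "stream": stream}

def chat(model, messages):
    """Send a chat completion to Ollama's OpenAI-compatible endpoint
    and return the assistant's reply (requires a running Ollama)."""
    body = json.dumps(build_chat_request(model, messages)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage: `chat("llama3", [{"role": "user", "content": "Hello"}])`.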

Docker Desktop Models

  • Docker Desktop (4.40+) has a built-in AI model runner
  • Pull and run models via the Docker CLI: docker model run ai/llama3.2
    ~ » docker model list                                                                                  
    MODEL NAME  PARAMETERS  QUANTIZATION   ARCHITECTURE  MODEL ID      CREATED       CONTEXT  SIZE
    gemma3      3.88 B      MOSTLY_Q4_K_M  gemma3        a353a8898c9d  5 months ago           2.31 GiB
    
  • Exposes OpenAI-compatible API endpoint locally
  • Useful if your stack is already containerized
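
The OpenAI-compatible endpoint mentioned above can be exercised from code too. Everything URL-related below is an assumption to verify against your Docker Desktop version: host access typically has to be enabled first (for example with docker desktop enable model-runner --tcp 12434), and the path prefix may differ.

```python
import json
import urllib.request

# Assumed host-side endpoint for the Docker model runner; check your
# Docker Desktop docs. The wire format itself is OpenAI-compatible.
DMR_URL = "http://localhost:12434/engines/v1"

def completion_request(model, prompt):
    """Build an OpenAI-style chat-completion body for the model runner."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask(model, prompt):
    """Send the request to the local model runner (requires Docker Desktop
    4.40+ with the runner enabled and a pulled model, e.g. ai/gemma3)."""
    body = json.dumps(completion_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{DMR_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```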

* Running a Local Model

Steps to get a model running locally:

  1. Install Ollama: Download from ollama.com and install
  2. Pull a model: ollama pull qwen3-vl:2b or ollama pull deepseek-coder-v2:latest
    ~ » ollama list
    NAME                        ID              SIZE      MODIFIED
    deepseek-coder-v2:latest    63fb193b3a9b    8.9 GB    45 hours ago
    qwen3-vl:2b                 0635d9d857d4    1.9 GB    3 days ago
    qwen2.5-coder:7b            dae161e27b0e    4.7 GB    12 days ago
    
  3. Run the model interactively: ollama run qwen3-vl:2b
  4. Use via API:
    curl http://localhost:11434/api/generate \
      -d '{"model": "qwen3-vl:2b", "prompt": "Explain RAG in simple terms"}'
    
  5. Persist context using chat API for multi-turn conversations
  6. Monitor performance: Check RAM/VRAM usage — most quantized 7B models need ~8GB RAM; 13B models need ~16GB
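
Step 5 above (persisting context) boils down to keeping the message history in a list and resending the whole conversation each turn via the /api/chat endpoint. A sketch, assuming a running Ollama instance; qwen3-vl:2b is just an example model:

```python
import json
import urllib.request

def new_conversation(system_prompt=None):
    """Start a fresh message history, optionally seeded with a system prompt."""
    return [{"role": "system", "content": system_prompt}] if system_prompt else []

def chat_turn(history, user_message, model="qwen3-vl:2b",
              url="http://localhost:11434/api/chat"):
    """Append a user message to `history`, send the full conversation to
    Ollama's /api/chat endpoint, append the reply, and return its text."""
    history.append({"role": "user", "content": user_message})
    body = json.dumps({"model": model, "messages": history,
                       "stream": False}).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["message"]
    history.append(reply)  # keep context for the next turn
    return reply["content"]
```

Each call to `chat_turn` grows the shared `history` list, which is what gives the model memory across turns.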

Tips:

  • Use quantized models (e.g., Q4_K_M) for lower memory footprint with minimal quality loss
  • GPU acceleration is automatic on Apple Silicon (Metal) and CUDA (NVIDIA)
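
The quantization tip follows directly from arithmetic: raw weight memory is parameter count times bits per weight. A back-of-the-envelope helper (real usage is higher once the KV cache and runtime buffers are counted):

```python
def weight_gb(params_billions, bits_per_weight):
    """Raw weight size in gigabytes: parameters * bits per weight / 8.
    Actual RAM use is higher (KV cache, runtime buffers, OS)."""
    return params_billions * bits_per_weight / 8

# weight_gb(7, 16) -> 14.0  (FP16: why an unquantized 7B wants ~16 GB)
# weight_gb(7, 4)  -> 3.5   (Q4: fits comfortably in ~8 GB of RAM)
```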

* Building AI Agents — No-Code: Ollama + n8n

For no-code / low-code AI agent building:

n8n is an open-source workflow automation tool similar to Zapier/Make, but self-hostable and AI-native.

Architecture:

User Input → n8n Workflow → Ollama (local LLM) → Tool Calls → Response

Steps:

  1. Self-host n8n via Docker: docker run -it --rm -p 5678:5678 n8nio/n8n
  2. Add an AI Agent node in n8n
  3. Connect it to an Ollama Chat Model node (point it to http://localhost:11434)
  4. Add Tool nodes (e.g., HTTP Request, Google Search, Database query)
  5. Define a system prompt and let the agent autonomously call tools
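
Under the hood, the AI Agent node runs a loop like the following sketch: send the conversation to the model, and if the model requests a tool, run it and feed the result back. The fake price-lookup tool and dispatch table here are illustrative stand-ins for n8n's Tool nodes, not real APIs:

```python
import json

# Illustrative stand-in for an n8n Tool node.
def lookup_price(item):
    """Fake price-comparison tool; a real one would call shop APIs."""
    return {"item": item, "best_price": 49.99, "store": "example-store"}

TOOLS = {"lookup_price": lookup_price}

def dispatch(tool_name, arguments):
    """Route a model-requested tool call to the matching Python function."""
    return TOOLS[tool_name](**arguments)

def agent_step(model_reply):
    """One loop iteration: if the model requested a tool, run it and
    return the observation to append; otherwise return the final answer."""
    if "tool_call" in model_reply:
        call = model_reply["tool_call"]
        result = dispatch(call["name"], call["arguments"])
        return {"role": "tool", "content": json.dumps(result)}
    return {"role": "assistant", "content": model_reply["content"]}
```

The loop repeats (model, tool, model, ...) until the reply contains no tool call; that final assistant message is the agent's answer.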

Use cases:

  • Auto-research and summarize news
  • Book flights by scraping airline sites
  • Price comparison across shopping websites
  • Email triage and auto-reply

* Building AI Agents — Code: Python + Ollama + OpenAI Agent SDK (ToDo)