What is Opensource AI?
The collection of technologies and frameworks needed to build systems and applications with open-source AI. For example: an AI agent that books a flight, or one that shops for the right shoes at the right price point by comparing multiple shopping websites.
* AI Models - Proprietary vs Opensource Models
Choosing the right model is the foundation of any AI system. There are two categories:
Proprietary Models — Closed-source, API-access only, typically more capable out-of-the-box:
- OpenAI GPT-4.5, o3, o3-mini, o1
- Anthropic Claude 3.7 Sonnet (with extended thinking), Claude 3.5 Haiku
- Google Gemini 2.0 Flash, Gemini 2.0 Pro (Experimental)
- xAI Grok-3, Grok-3 mini
- Microsoft Phi-4 (via Azure AI Foundry)
Opensource Models — Weights available publicly, can be run locally or self-hosted:
- Llama 3.3 (70B) from Meta — Latest flagship open model, best-in-class instruction following
- DeepSeek-R1 / DeepSeek-V3 — Top-tier reasoning & coding; rivals GPT-4o at roughly a tenth of the cost
- Qwen 2.5 / Qwen2.5-Coder (7B, 32B, 72B) from Alibaba — Excellent coding & multilingual
- Qwen3-VL (2B, 7B) — Multimodal (vision + language), great for image understanding tasks
- Mistral Small 3.1 (24B) — Fast, efficient, Apache 2.0 licensed, strong instruction following
- Gemma 3 (1B, 4B, 12B, 27B) from Google — Lightweight, optimized for local inference
- Phi-4 (14B) from Microsoft — Punches above its weight on reasoning benchmarks
- GLM-4 from Zhipu AI — Strong multilingual support, especially Chinese + English
- Kimi k1.5 from Moonshot AI — Long-context reasoning model (up to 128k tokens)
* Model Ranking Leaderboard
Before picking a model, consult benchmarks and community rankings to find the best fit for your use case (coding, reasoning, instruction-following, multilingual, etc.):
- llm-stats.com — Aggregated benchmarks and cost comparison
- OpenRouter Rankings — Real-world usage and popularity rankings across providers
- HuggingFace Open LLM Leaderboard — Standardized evals (MMLU, HellaSwag, ARC, etc.)
Key benchmarks to look at:
- MMLU — General knowledge across 57 subjects
- HumanEval / MBPP — Coding ability
- MT-Bench — Multi-turn conversation quality
- MATH / GSM8K — Mathematical reasoning
* Model Manager — Ollama & Docker Desktop Models
To run and manage open-source models locally, you need a model manager:
Ollama
- Easiest way to download, run, and switch between local LLMs
- Single command to pull and run:
```shell
ollama run llama3
```
- REST API at http://localhost:11434 — compatible with OpenAI API format
- Supports: Llama 3, Mistral, Gemma, Phi, Qwen, DeepSeek, and more
- Cross-platform: macOS, Linux, Windows
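Because the local REST API speaks the OpenAI format, any OpenAI-style client can target Ollama just by swapping the base URL. A minimal stdlib-only sketch of building such a request (the model name and prompt are illustrative; actually sending it requires a running Ollama instance):

```python
# Sketch: build an OpenAI-format chat request against a local Ollama server.
# Model name and prompt are illustrative examples, not recommendations.
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434"  # Ollama's default address

def ollama_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build a POST request for Ollama's OpenAI-compatible chat route."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_BASE}/v1/chat/completions",  # OpenAI-compatible endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = ollama_chat_request("llama3", [{"role": "user", "content": "Say hi"}])
# Sending it (needs Ollama running):
#   with urllib.request.urlopen(req) as resp:
#       reply = json.loads(resp.read())["choices"][0]["message"]["content"]
print(req.full_url)  # http://localhost:11434/v1/chat/completions
```

The same function works against any other OpenAI-compatible server by changing `OLLAMA_BASE`.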
Docker Desktop Models
- Docker Desktop (4.40+) has a built-in AI model runner
- Pull and run models as containers:
```shell
docker model run ai/llama3.2
```
Listing downloaded models:
```
~ » docker model list
MODEL NAME  PARAMETERS  QUANTIZATION   ARCHITECTURE  MODEL ID      CREATED       CONTEXT SIZE
gemma3      3.88 B      MOSTLY_Q4_K_M  gemma3        a353a8898c9d  5 months ago  2.31 GiB
```
- Exposes an OpenAI-compatible API endpoint locally
- Useful if your stack is already containerized
* Running a Local Model
Steps to get a model running locally:
- Install Ollama: Download from ollama.com and install
- Pull a model:
```shell
ollama pull qwen3-vl:2b
# or
ollama pull deepseek-coder-v2:latest
```
Listing pulled models:
```
~ » ollama list
NAME                        ID            SIZE    MODIFIED
deepseek-coder-v2:latest    63fb193b3a9b  8.9 GB  45 hours ago
qwen3-vl:2b                 0635d9d857d4  1.9 GB  3 days ago
qwen2.5-coder:7b            dae161e27b0e  4.7 GB  12 days ago
```
- Run the model interactively:
```shell
ollama run qwen3-vl:2b
```
- Use via API:
```shell
curl http://localhost:11434/api/generate \
  -d '{"model": "qwen3-vl:2b", "prompt": "Explain RAG in simple terms"}'
```
- Persist context using the chat API for multi-turn conversations
- Monitor performance: Check RAM/VRAM usage — most 7B models need ~8GB RAM; 13B needs ~16GB
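The "persist context" step above boils down to keeping a message list and resending the whole history on every call to the chat API. A minimal sketch of that pattern (the model name matches the earlier examples; the actual POST is only indicated in comments because it needs a live server):

```python
# Sketch of multi-turn context with a chat-style API: append each message
# and resend the full history every turn. Illustrative only.
import json

def add_turn(history: list, role: str, content: str) -> list:
    """Append one message; the chat API expects the full history each call."""
    history.append({"role": role, "content": content})
    return history

history = []
add_turn(history, "user", "Explain RAG in simple terms")
# ...POST {"model": "qwen3-vl:2b", "messages": history} to
#    http://localhost:11434/api/chat, then append the model's reply:
add_turn(history, "assistant", "RAG retrieves relevant documents first...")
add_turn(history, "user", "Give me an example")

payload = json.dumps({"model": "qwen3-vl:2b", "messages": history})
print(len(history))  # 3
```

Dropping old messages (or summarizing them) keeps the payload inside the model's context window on long conversations.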
Tips:
- Use quantized models (e.g., Q4_K_M) for lower memory footprint with minimal quality loss
- GPU acceleration is automatic on Apple Silicon (Metal) and CUDA (NVIDIA)
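The memory rules of thumb above follow from parameter count times bytes per weight, plus headroom for activations and KV cache. A rough back-of-the-envelope helper (the 20% overhead factor is an assumption, not a measured constant):

```python
def estimate_ram_gb(params_billion: float, bits_per_weight: int,
                    overhead: float = 1.2) -> float:
    """Rough memory estimate: weights = params * bits/8 bytes, plus ~20%
    headroom for activations and KV cache (the 1.2 factor is a guess)."""
    weight_gb = params_billion * bits_per_weight / 8  # 1B params @ 8 bits ≈ 1 GB
    return round(weight_gb * overhead, 1)

print(estimate_ram_gb(7, 16))  # fp16 7B: ~16.8 GB
print(estimate_ram_gb(7, 4))   # Q4 7B:   ~4.2 GB
```

This is why a Q4-quantized 7B model fits comfortably in ~8GB of RAM while the same model at fp16 does not.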
* Building AI Agents — No-Code: Ollama + n8n
For no-code / low-code AI agent building:
n8n is an open-source workflow automation tool similar to Zapier/Make, but self-hostable and AI-native.
Architecture:
User Input → n8n Workflow → Ollama (local LLM) → Tool Calls → Response
Steps:
- Self-host n8n via Docker:
```shell
docker run -it --rm -p 5678:5678 n8nio/n8n
```
- Add an AI Agent node in n8n
- Connect it to an Ollama Chat Model node (point it to http://localhost:11434)
- Add Tool nodes (e.g., HTTP Request, Google Search, Database query)
- Define a system prompt and let the agent autonomously call tools
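The loop behind the steps above (model decides which tool to call, the tool runs, the observation feeds back in) can be sketched with a stubbed model. Everything here is an illustrative stand-in, not an n8n or Ollama API; the pieces map loosely onto the AI Agent node and its Tool nodes:

```python
# Toy sketch of an agent loop: the "model" picks a tool, the tool executes,
# and the result is appended to the transcript until the agent finishes.
# All names and data are made up for illustration.

TOOLS = {
    "search_prices": lambda query: {"shoe-site-a": 79, "shoe-site-b": 65},
    "finish": lambda answer: answer,
}

def fake_llm(transcript):
    """Stand-in for a local LLM call: returns (tool_name, argument)."""
    if not any(step[0] == "search_prices" for step in transcript):
        return "search_prices", "running shoes under $80"
    prices = transcript[-1][2]                 # last tool observation
    best = min(prices, key=prices.get)
    return "finish", f"Cheapest: {best} at ${prices[best]}"

def run_agent(max_steps: int = 5):
    transcript = []
    for _ in range(max_steps):
        tool, arg = fake_llm(transcript)       # model decides the next action
        result = TOOLS[tool](arg)              # tool node executes it
        transcript.append((tool, arg, result)) # observation feeds back in
        if tool == "finish":
            return result
    return "gave up"

print(run_agent())  # Cheapest: shoe-site-b at $65
```

In n8n the same loop is wired visually: the system prompt constrains which tools the agent may call, and `max_steps` plays the role of the agent's iteration limit.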
Use cases:
- Auto-research and summarize news
- Book flights by scraping airline sites
- Price comparison across shopping websites
- Email triage and auto-reply