Local LLM

Run Local LLM on Windows with CFAI — Ollama, LM Studio

Run the full CFAI feature set — meeting notes, Document RAG, text rewriting — powered by a local LLM running on your Windows PC. No OpenAI bill, no data sent to the cloud.

Local LLMs Are Good Enough Now — And CFAI Puts Them to Work in Every Meeting

In 2023, local LLMs were experimental. In 2026, models like Llama 3.1 70B (quantized), Mistral Nemo, Phi-4, and Qwen 2.5 achieve quality comparable to GPT-3.5 and approaching GPT-4 levels on many tasks — running on consumer hardware. Ollama, LM Studio, and Jan.ai have made deploying these models as easy as a GUI download.

CFAI integrates natively with Ollama, LM Studio, and Jan.ai as AI backends. This means you can run the full CFAI feature set — meeting transcription with AI summaries, Document RAG during live calls, In-Place Text Replace, Cognitive Map enhancement, and Agentic AI — powered entirely by a local model with zero API costs and zero data leaving your device.

The practical implications: an enterprise that processes 10,000 meeting-hours per month pays nothing in API costs. A security-conscious professional handles sensitive meetings with zero cloud exposure. A developer in an air-gapped environment still has full AI meeting assistance. This is the architecture that makes CFAI the most flexible AI meeting tool available.

Local LLM Meeting Assistant Tools — Feature Comparison 2026

ToolLocal ProcessingNo BotSpeaker IDReal-time TranslationCognitive MapPrice
CFAI + Ollama/LM Studio ★ RecommendedFree trial / From €7.99/mo (+ free local LLM)
Jan.ai (standalone) Free (open source)
LM Studio (standalone) Free (for personal use)
Ollama + Open WebUI Free (open source)
Oobabooga (text-generation-webui) Free (open source, technical)

Why CFAI Is the Best Interface for Local LLMs in Meetings

🎙️

Local Whisper Transcription

CFAI uses Faster Whisper running entirely on your Windows PC. Your audio is never uploaded — processing happens locally, even offline.

🧠

Real-Time Cognitive Map

As you speak, CFAI builds a live semantic map of your conversation. See topic clusters, predict where discussion is heading, and stay focused — in real time.

🔇

No Bot, No Notifications

CFAI captures audio from your microphone and system audio at the OS level. No bot joins your call. Other participants see no recording notification.

🌍

101 Languages, Real-Time

CFAI translates your meeting transcript into 101 languages as you speak. Perfect for multilingual teams and international calls.

📄

Document RAG During Calls

Upload PDFs, Word docs, or CSV files. During your meeting, CFAI retrieves relevant information from your documents in real time.

🤖

Agentic AI and Web Search

CFAI's Agentic AI can perform multi-step tasks, search the web, and surface information proactively during your meeting.

🔒

100% Private by Default

All audio processing, transcription, and AI analysis runs on your Windows device. Nothing is sent to CFAI's servers unless you explicitly enable optional cloud features.

💳

Flexible Plans, No Surprises

Free trial to test everything. Then choose from €7.99/mo (500 CF), €12.99/mo (1500 CF), or €24.99/mo (3000 CF) — cancel anytime, no lock-in.

How to Run CFAI with a Local LLM on Windows

1

Install Ollama or LM Studio

Download Ollama (ollama.com) or LM Studio (lmstudio.ai) on your Windows PC. Pull a model: 'ollama pull llama3.1' or use LM Studio's model browser. Start the local server.

2

Configure CFAI to Use Your Local LLM

In CFAI settings, select 'Local LLM' as your AI backend. Point CFAI to your Ollama endpoint (http://localhost:11434) or LM Studio server. Test the connection.

3

Run Full CFAI Features with Local AI

Start a meeting. CFAI transcribes locally (Whisper) and applies AI analysis using your local LLM — summaries, action items, Document RAG responses, and text rewrites all run locally.

4

Adjust Model for Your Hardware

Choose a model that fits your GPU VRAM: Llama 3.1 8B (4–6 GB VRAM), Mistral 7B (4–5 GB), Phi-4 (8 GB), Llama 3.1 70B quantized (24+ GB). CPU-only works with smaller models.

Frequently Asked Questions

Can I use Ollama with CFAI?

Yes. CFAI integrates with Ollama as a local LLM backend. In CFAI settings, set your AI provider to 'Ollama' and configure the endpoint (default: http://localhost:11434). Any Ollama-compatible model (Llama 3, Mistral, Qwen, Phi, etc.) can then power all CFAI AI features.

Does CFAI work with LM Studio?

Yes. LM Studio runs a local OpenAI-compatible API endpoint. Configure CFAI to use your LM Studio endpoint and any loaded model. Supports all CFAI AI features: summaries, Document RAG, IPR, and Agentic AI.

What is the best local LLM to use with CFAI for meetings?

For most meeting use cases: Llama 3.1 8B (fast, 4–6 GB VRAM, good quality) or Mistral Nemo (strong at instruction following). For maximum quality: Llama 3.1 70B quantized (requires 24+ GB VRAM or CPU with 32+ GB RAM). Phi-4 is an excellent choice for lower-end hardware.

Does using a local LLM mean everything is offline?

Yes — with local LLM + local Whisper transcription configured, CFAI's entire pipeline runs offline: audio capture (Whisper local) → transcription → AI analysis (local LLM) → Cognitive Map → Document RAG. No internet required at any step.

Does running a local LLM save money on API costs?

Yes. Cloud LLM APIs (OpenAI GPT-4o, Claude Sonnet) charge per token. For heavy meeting use (10+ hours/week), API costs can reach $50–200/month. A local LLM has zero per-token cost after the initial hardware investment.

Run the Full CFAI Feature Set on Your Own Hardware

Try CFAI free. Connect Ollama, LM Studio, or Jan.ai for fully local AI meeting notes — zero API costs, zero cloud data. Plans from €7.99/month.