local AI - Sesame Disk Blog

Local AI Inference in 2026: Strategies

Discover practical strategies and hardware choices for local AI inference in 2026, including benchmarking, deployment patterns, and system-building tips.

July 13, 2026 26 min read

2026 Local Inference Engines: Key Decision

Discover the key factors influencing local AI inference engine choices in 2026, including performance, security, and architectural considerations for…

June 19, 2026 16 min read

2026 Local AI Inference Engines Guide

Compare the top local inference engines for LLMs in 2026: Ollama, llama.cpp, vLLM, TGI, and SGLang. Find the best local inference engine for your hardware and workload.

May 20, 2026 14 min read

Why Local AI Deployment Is Critical in 2026

Explore the importance of local AI deployment in 2026, driven by hardware innovations, open models, and security needs, shaping the future of AI infrastructure.

May 11, 2026 8 min read

OpenYak April 2026: What’s Actually New Since the Last Update?

Discover OpenYak April 2026 updates showcasing production-ready features, enhanced privacy, compliance tools, and plugin ecosystem growth for enterprise use.

March 30, 2026 7 min read

Running Llama 3.1 70B on RTX 3090 via NVMe-to-GPU

Learn how to run Llama 3.1 70B on an RTX 3090 using NVMe-to-GPU technology, bypassing the CPU for efficient local AI inference.

February 22, 2026 6 min read

ggml.ai Joins Hugging Face: A New Era for Local AI

ggml.ai’s partnership with Hugging Face marks a pivotal moment for local AI development, enhancing sustainability and community support.

February 20, 2026 7 min read