AI inference - Sesame Disk Blog

AI Inference Costs in 2026: The Inference

Analyze how AI token serving costs are decreasing in 2026, impacting product design, infrastructure, and budgeting strategies across the industry.

July 13, 2026 13 min read

Local AI Inference in 2026: Strategies

Discover practical strategies and hardware choices for local AI inference in 2026, including benchmarking, deployment patterns, and system building tips.

July 13, 2026 26 min read

Apple Silicon vs Nvidia RTX 5090

Explore the capabilities and limitations of Apple Silicon versus Nvidia RTX 5090 for local AI inference in 2026, focusing on model capacity, performance,…

July 10, 2026 16 min read

2026 Comparison of Local AI Inference Engines

Explore the latest in local AI inference engines for 2026, including architecture, benchmarks, security updates, and deployment strategies for optimal…

July 9, 2026 15 min read

Local Inference Practice with GGUF

Explore practical local inference strategies in 2026, including GGUF, quantization levels, AWQ, GPTQ, and FP8, plus best practices for choosing hardware and engines.

July 3, 2026 25 min read

The $5,000 AI Workstation: Running 70B Models Locally in 2026

Discover how to build a $5,000 AI inference workstation in 2026 capable of running 70B models locally, amidst record-high GPU prices and memory shortages.

June 25, 2026 11 min read

Apple Silicon for Large Language Model Inference in 2026: Strengths and Limitations

Discover the strengths and limitations of Apple Silicon for large language model inference in 2026, focusing on capacity, latency, framework ecosystem, and…

June 25, 2026 15 min read

AI Inference Silicon 2026: Chip Race Shift

Discover how inference silicon is reshaping AI deployment economics in 2026, emphasizing memory capacity, software ecosystem, and hardware choices for…

June 24, 2026 13 min read

2026 Local Inference Engines: Key Decision

Discover the key factors influencing local AI inference engine choices in 2026, including performance, security, and architectural considerations for…

June 19, 2026 16 min read

A Fully Digital Transformer Chip at 80 MHz

Explore the groundbreaking digital silicon Transformer chip claiming 56,000 tokens/sec at 80 MHz, analyzing feasibility, design principles, and industry…

June 17, 2026 13 min read

2026 Local AI Inference Engines Guide

Compare the top local inference engines for LLMs in 2026: Ollama, llama.cpp, vLLM, TGI, and SGLang. Find the best engine for your hardware and workload.

May 20, 2026 14 min read

AI Inference Cost Trends in 2026: Tokens, Model Size, and Economics That Actually Matter

Learn how AI inference costs are declining in 2026, impacting deployment strategies, infrastructure choices, and economic models for scalable AI solutions.

May 19, 2026 15 min read