Open source. Private. Ready in minutes on any PC.
Chat
What can I do with 128 GB of unified RAM?
Load up models like gpt-oss-120b or Qwen-Coder-Next for advanced tool use.
What should I tune first?
Try --no-mmap to speed up load times, and increase the context size to 64K tokens or more.
Image Generation
A pitcher of lemonade in the style of a renaissance painting
Speech
Hello, I am your AI assistant. What can I do for you today?
Lemonade exists because local AI should be free, open, fast, and private.
Lemonade is integrated into many apps and works out of the box with hundreds more, thanks to the OpenAI API standard.
Everything from install to runtime is optimized for fast setup, broad compatibility, and local-first execution.
Lightweight service that is only 2 MB.
Simple installer that sets up the stack automatically.

Works with hundreds of apps out of the box and integrates in minutes.
Configures dependencies for your GPU and NPU.
Works with llama.cpp, Ryzen AI SW, FastFlowLM, and more.
Run more than one model at the same time.
A consistent experience across Windows, Linux, and macOS (beta).
A GUI that lets you download, try, and switch models quickly.
Point your app at Lemonade and get chat, vision, image gen, transcription, speech gen, and more with standard APIs.
POST /api/v1/chat/completions
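Because the API follows the OpenAI standard, any OpenAI-compatible client can target this endpoint. A minimal sketch using only Python's standard library (the port 8000, base path, and model name are assumptions; check your Lemonade configuration for the actual values):

```python
import json
from urllib import request

# Assumed local Lemonade endpoint; adjust host/port to your setup.
BASE_URL = "http://localhost:8000/api/v1"

# Standard OpenAI-style chat completions payload.
payload = {
    "model": "gpt-oss-120b",  # any model you have loaded in Lemonade
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With Lemonade running locally, a client would send the request like this:
# with request.urlopen(req) as resp:
#     reply = json.loads(resp.read())["choices"][0]["message"]["content"]

print(req.full_url)
```

The same request works from any language or tool that speaks HTTP, which is why apps built against the OpenAI API can point at Lemonade without code changes.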
Track the newest improvements and highlights from the Lemonade release stream.