How to Use Ollama to Host Your Own DeepSeek LLM Locally
With the increasing demand for privacy, security, and control over AI models, hosting your own large language models (LLMs) locally has become a viable option. Ollama is a powerful tool that simplifies this process, allowing users to run and interact with open-source LLMs on their local machines efficiently. This guide will walk you through the installation, configuration, and usage of Ollama to host your own DeepSeek LLM.
What is Ollama?
Ollama is a lightweight framework that enables users to run large language models locally with minimal setup. It supports various open-source models such as Llama, Mistral, and more. Ollama is designed to be user-friendly, making it an excellent choice for developers, researchers, and AI enthusiasts who want to leverage LLMs without relying on cloud-based services.
Prerequisites
Before installing Ollama, ensure that your system meets the following requirements:
- Operating System: macOS or Linux (Windows support via WSL2)
- Hardware: A modern CPU with AVX2 support (GPU acceleration recommended but not mandatory)
- Memory: At least 16GB RAM for optimal performance
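If you want a rough check of these requirements before installing, the short Python sketch below may help; it is an illustrative helper, not part of Ollama. It reads total memory via os.sysconf and, on Linux, looks for the avx2 flag in /proc/cpuinfo (that file does not exist on macOS, and the sysconf names may vary by platform).
import os
import platform

# Total physical memory in GiB (SC_PAGE_SIZE / SC_PHYS_PAGES are standard on Linux
# and present in most macOS Python builds; adjust if your platform lacks them).
total_gib = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / (1024 ** 3)
print(f"Total RAM: {total_gib:.1f} GiB (16 GiB or more is recommended)")

# AVX2 flag check, Linux only (/proc/cpuinfo is Linux-specific).
if platform.system() == "Linux":
    with open("/proc/cpuinfo") as f:
        print("AVX2 support:", "yes" if "avx2" in f.read() else "not detected")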
Installation
macOS
- Open a terminal window.
- Run the following command to install Ollama:
brew install ollama
- Verify the installation:
ollama --version
Linux
- Download and install Ollama using the official install script:
curl -fsSL https://ollama.ai/install.sh | sh
- Verify the installation:
ollama --version
Windows (via WSL2)
- Install Windows Subsystem for Linux 2 (WSL2).
- Follow the Linux installation steps inside WSL2.
- Use ollama commands within your WSL2 terminal.
Downloading and Running Models
Once installed, you can start using Ollama to download and run models.
Listing Installed Models
To see the models already downloaded to your machine, run:
ollama list
Downloading a Model
To download a model, use:
ollama pull deepseek-r1:1.5b
You can replace deepseek-r1:1.5b with any supported model, such as llama2.
Running a Model Interactively
To start an interactive chat session with the model:
ollama run deepseek-r1:1.5b
This will allow you to enter prompts and receive responses from the model in real-time.
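You can also pass a prompt directly on the command line, in which case ollama run prints a single answer and exits, which is handy for scripting. Below is a minimal Python sketch using the standard subprocess module; the model name and prompt are just examples, and it assumes ollama is on your PATH and the model has already been pulled.
import subprocess

# One-shot prompt: passing the prompt as an argument makes `ollama run`
# print the reply and exit instead of opening an interactive session.
result = subprocess.run(
    ["ollama", "run", "deepseek-r1:1.5b", "Summarize what Ollama does in one sentence."],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())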
Hosting the Model as an API
Ollama provides a simple way to expose the model as an API for integration into applications.
Starting the API Server
Run the following command to start a local API server:
ollama serve
This will expose an endpoint (typically on port 11434) that applications can use to send requests to the model.
Making API Requests
You can interact with the API using curl or any HTTP client:
curl -X POST http://localhost:11434/api/generate -d '{"model": "deepseek-r1:1.5b", "prompt": "Hello, world!"}'
The response will contain the model's generated output.
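The same endpoint works from application code in any language. Here is a minimal Python sketch using the third-party requests library; it sets "stream": false so the server returns a single JSON object instead of the default newline-delimited stream, and the prompt text is only a placeholder.
import requests  # third-party: pip install requests

# Ask the local Ollama server for a single, non-streamed completion.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:1.5b",
        "prompt": "Explain what a Modelfile is in one paragraph.",
        "stream": False,  # return one JSON object instead of streamed chunks
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])  # the generated text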
Customizing Models
You can also customize a model's behavior, such as its sampling parameters, by creating a Modelfile:
FROM deepseek-r1:1.5b
PARAMETER temperature 0.7
Save this as Modelfile and build it using:
ollama create my-custom-model -f Modelfile
Now, you can run your customized model with:
ollama run my-custom-model
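The custom model is also reachable through the API, just like the base models. As a sketch, the example below talks to Ollama's /api/chat endpoint using Python and requests; the model name must match whatever you passed to ollama create, and the message content is only illustrative.
import requests  # third-party: pip install requests

# Chat with the custom model through Ollama's /api/chat endpoint.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "my-custom-model",
        "messages": [
            {"role": "user", "content": "Give me one tip for running LLMs locally."},
        ],
        "stream": False,  # single JSON reply instead of streamed chunks
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])  # the assistant's reply text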
Conclusion
Ollama makes it easy to host and run LLMs locally, providing privacy, control, and reduced latency compared to cloud-based solutions. Whether you're a developer building AI-powered applications or a researcher exploring LLM capabilities, Ollama is a powerful tool that streamlines the process.
By following this guide, you can quickly set up and deploy your own DeepSeek LLMs, ensuring you have full control over your AI experience.
Want to see an example? Techie on TalkingTech.io is now self-hosted and running the deepseek-r1:1.5b model.