How to Use Ollama to Host Your Own DeepSeek LLM Locally

With the increasing demand for privacy, security, and control over AI models, hosting your own large language models (LLMs) locally has become a viable option. Ollama is a powerful tool that simplifies this process, allowing users to run and interact with open-source LLMs on their local machines efficiently. This guide will walk you through the installation, configuration, and usage of Ollama to host your own DeepSeek LLM.

What is Ollama?

Ollama is a lightweight framework that enables users to run large language models locally with minimal setup. It supports a wide range of open-source models, including DeepSeek, Llama, and Mistral. Ollama is designed to be user-friendly, making it an excellent choice for developers, researchers, and AI enthusiasts who want to leverage LLMs without relying on cloud-based services.

Prerequisites

Before installing Ollama, ensure that your system meets the following requirements:

  • Operating System: macOS or Linux (Windows support via WSL2)
  • Hardware: A modern CPU with AVX2 support (GPU acceleration recommended but not mandatory); a quick way to check this is shown after this list
  • Memory: At least 16GB RAM for optimal performance
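
If you are unsure whether your CPU exposes AVX2, a quick check on Linux (or inside WSL2) is sketched below using standard system utilities rather than Ollama commands; the check does not apply to Apple Silicon Macs.

# Prints "avx2" once if the CPU advertises the AVX2 instruction set
grep -o -m1 avx2 /proc/cpuinfo

# Show total and available memory
free -h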

Installation

macOS

  1. Open a terminal window.
  2. Run the following command to install Ollama:

brew install ollama

  3. Verify the installation:

ollama --version
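
If you installed through Homebrew, you can optionally keep Ollama running as a background service instead of starting it manually each time; this assumes the Homebrew formula ships a service definition, which it does at the time of writing.

brew services start ollama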

Linux

Download and install Ollama using the official install script:

curl -fsSL https://ollama.ai/install.sh | sh

Verify the installation:

ollama --version
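
On systemd-based distributions, the install script typically registers Ollama as a background service as well. If that is the case on your system, you can confirm it is running with:

systemctl status ollama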

Windows (via WSL2)

  1. Install Windows Subsystem for Linux 2 (WSL2); a minimal PowerShell command for this is shown after this list.
  2. Follow the Linux installation steps inside WSL2.
  3. Use ollama commands within your WSL2 terminal.
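
As a rough sketch, WSL2 with its default Ubuntu distribution can be installed from an elevated PowerShell prompt with the single command below; a reboot may be required before the Linux steps will work.

wsl --install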

Downloading and Running Models

Once installed, you can start using Ollama to download and run models.

Listing Downloaded Models

To see the models already downloaded to your machine, run:

ollama list

Downloading a Model

To download a model, use:

ollama pull deepseek-r1:1.5b

You can replace deepseek-r1:1.5b with any supported model, such as llama2.
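
If your machine has enough RAM, larger DeepSeek-R1 variants can be pulled the same way. The tag below is taken from the Ollama model library and may change over time, so treat it as an illustration rather than a guarantee:

ollama pull deepseek-r1:7b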

Running a Model Interactively

To start an interactive chat session with the model:

ollama run deepseek-r1:1.5b

This will allow you to enter prompts and receive responses from the model in real time.
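
Type /bye inside the session to exit. You can also pass a prompt directly on the command line for a one-off answer instead of an interactive chat, as in this sketch:

ollama run deepseek-r1:1.5b "Summarize what Ollama does in one sentence."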

Hosting the Model as an API

Ollama provides a simple way to expose the model as an API for integration into applications.

Starting the API Server

Run the following command to start a local API server:

ollama serve

This will expose an endpoint (typically on port 11434) that applications can use to send requests to the model.
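
Two optional extras, assuming the default port of 11434: the /api/tags endpoint lists the models the server can see (a handy health check), and the OLLAMA_HOST environment variable makes the server listen on other interfaces if you want to reach it from another machine on your network.

# Quick health check: list the models the server knows about
curl http://localhost:11434/api/tags

# Listen on all interfaces instead of only localhost
OLLAMA_HOST=0.0.0.0:11434 ollama serve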

Making API Requests

You can interact with the API using curl or any HTTP client:

curl -X POST http://localhost:11434/api/generate -d '{"model": "deepseek-r1:1.5b", "prompt": "Hello, world!"}'

The response will contain the model's generated output.
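
By default, /api/generate streams the answer back as a series of JSON lines. If you prefer a single JSON object, set "stream": false; the /api/chat endpoint works the same way but takes a list of messages. Both sketches below use the DeepSeek model pulled earlier in this guide.

# Single, non-streamed completion
curl -X POST http://localhost:11434/api/generate -d '{"model": "deepseek-r1:1.5b", "prompt": "Hello, world!", "stream": false}'

# Chat-style request with a message history
curl -X POST http://localhost:11434/api/chat -d '{"model": "deepseek-r1:1.5b", "messages": [{"role": "user", "content": "Hello, world!"}], "stream": false}'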

Customizing Models

You can also customize a model's behavior, such as its sampling parameters, by creating a Modelfile:

FROM deepseek-r1:1.5b
PARAMETER temperature 0.7

Save this as Modelfile and build it using:

ollama create my-custom-model -f ./Modelfile

Now, you can run your customized model with:

ollama run my-custom-model
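
The custom model is also available through the API server, so applications can call it exactly like the base model; for example:

curl -X POST http://localhost:11434/api/generate -d '{"model": "my-custom-model", "prompt": "Hello, world!", "stream": false}'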

Conclusion

Ollama makes it easy to host and run LLMs locally, providing privacy, control, and reduced latency compared to cloud-based solutions. Whether you're a developer building AI-powered applications or a researcher exploring LLM capabilities, Ollama is a powerful tool that streamlines the process.

By following this guide, you can quickly set up and deploy your own DeepSeek LLMs, ensuring you have full control over your AI experience.

Want to see an example? Techie on TalkingTech.io is now self-hosted and running the DeepSeek-R1:1.5b model.
