How to Self-Host Gemma on Dokploy (The Right Way)

Fredy Acuna / December 8, 2025 / 8 min read

Self-Host Gemma on Dokploy: A Production-Ready Guide

This guide shows you how to properly self-host Google's Gemma AI model on Dokploy using Ollama. I've corrected several issues from an existing tutorial to make this production-ready with proper networking, persistent storage, and concurrency handling.


What You'll Learn

  • Setting up Gemma with proper Traefik networking (no exposed ports)
  • Configuring persistent storage for models
  • Hardware requirements for production
  • Concurrency settings for multiple users
  • Adding Open WebUI for a ChatGPT-like interface (optional)
  • Testing your deployment with curl

Prerequisites

Before starting, ensure you have:

  • A Dokploy instance running (check out How to Install Coolify for a similar self-hosting setup)
  • A VPS with at least 1 vCPU, 1 GB RAM, and 5 GB storage (minimum for development/testing)

Understanding Gemma and Ollama

Gemma is Google's family of open-weight AI models. Ollama is a tool that makes running AI models locally simple: it handles downloading, serving, and API endpoints automatically.

When you run ollama serve, it starts an HTTP server on port 11434 that accepts requests and returns AI-generated responses. This is what we'll deploy.
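
If you have Ollama installed on your own machine, you can see this for yourself before touching the server (a quick local sanity check, not part of the deployment):

ollama serve &   # skip this if Ollama already runs as a background service
curl http://localhost:11434/
# Expected response: "Ollama is running"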


Hardware Requirements

The gemma3:270m model is lightweight (270 million parameters, roughly a 300 MB download), so it runs on minimal hardware. Choose your setup based on your use case:

Development/Testing (Survival Mode)

Use this for personal projects or cheap VPS instances:

Resource    Specification
CPU         1 vCPU
RAM         1 GB
Storage     5 GB
GPU         Not required

Note: This handles 1 user quickly. If 2 people query at the same time, the second waits a few seconds.

Production (Frequent Use)

Use this if you expect 5-10 concurrent users or automated bots querying frequently:

Resource    Specification
CPU         2 vCPUs
RAM         2-4 GB
Storage     5-10 GB
GPU         Not required

Why more RAM? Long conversations grow the context window (memory of previous messages), which can spike memory usage. 2GB is the safety zone.

Why 2 vCPUs? The HTTP server handling JSON requests and the inference engine compete for CPU. 2 cores keep the API responsive while the model thinks.

If you want better quality responses, consider larger models like gemma:2b (1.7GB) or gemma:7b (requires more RAM/GPU).


Step 1: Create the Service in Dokploy

  1. Log in to your Dokploy dashboard
  2. Click Create Service → Select Compose
  3. Give it a name like gemma-service

Step 2: Configure Docker Compose

Go to the General tab, then click Raw. Paste the following configuration:

version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_ORIGINS=*
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=1
    volumes:
      - ollama_storage:/root/.ollama
    # Uncomment if you have GPU available:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

  # Optional: ChatGPT-like web interface
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=your-secret-key-here
    restart: unless-stopped

volumes:
  ollama_storage:
  open-webui:

Important: We don't set OLLAMA_MODELS as an environment variable. Setting it changes the storage path and breaks persistence. Instead, we download models manually after deployment (Step 4).

Click Save.


Key Configuration Explained

Let's break down what makes this configuration production-ready:

Persistent Storage

volumes:
  - ollama_storage:/root/.ollama

Without this, you'd lose downloaded models every time the container restarts. The original tutorial missed this—meaning you'd have to re-download the model after every deployment.
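
After the first deploy, you can confirm the named volume actually exists on the host (the full name usually carries a project prefix that Compose generates, so grep for the suffix):

# List Docker volumes and look for the Ollama storage volume
docker volume ls | grep ollama_storage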

Concurrency Settings

- OLLAMA_NUM_PARALLEL=4
- OLLAMA_MAX_LOADED_MODELS=1

Variable                       Purpose
OLLAMA_NUM_PARALLEL=4          Allows 4 concurrent requests (4 users at the same time)
OLLAMA_MAX_LOADED_MODELS=1     Keeps only 1 model in memory (saves RAM)
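
Once your domain is set up (Step 3), you can see the parallel handling for yourself by firing two requests at once from your shell (a rough sketch; replace the domain with your own):

# Send two requests concurrently; with OLLAMA_NUM_PARALLEL=4 they are served
# in parallel instead of the second one queueing behind the first
curl -s https://ollama.yourdomain.com/api/generate \
  -d '{"model": "gemma3:270m", "prompt": "Say hi", "stream": false}' &
curl -s https://ollama.yourdomain.com/api/generate \
  -d '{"model": "gemma3:270m", "prompt": "Say hello", "stream": false}' &
wait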

CORS Configuration

- OLLAMA_ORIGINS=*

Allows requests from any origin. Useful if you're calling the API from a frontend application.


Step 3: Configure the Domains

You need to add domains for the services you want to expose. Go to the Domains tab in your service.

Domain for Ollama API (Required)

  1. Click Add Domain
  2. Select Service Name: ollama
  3. For the Host field, choose one of these options:

Option A: Generate a traefik.me URL (Recommended for Testing)

Click the Generate button in Dokploy. It will automatically create a URL like:

main-ollama-wv9tts-9dc2f9-209-112-91-61.traefik.me

This gives you instant HTTPS without any DNS configuration.

Option B: Use Your Own Domain

Enter your subdomain: ollama.yourdomain.com

Make sure you have a DNS A record pointing to your Dokploy server's IP.
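
You can confirm the record has propagated before continuing (assuming dig is available; replace the hostname with your own):

# The answer should be your Dokploy server's public IP
dig +short ollama.yourdomain.com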

  4. Set the Container Port to 11434 (this is the port Ollama exposes internally)
  5. Leave Path as /
  6. Enable HTTPS for SSL (automatic with traefik.me or your own domain with Let's Encrypt)
  7. Click Save

Domain for Open WebUI (Optional)

If you included the Open WebUI service, add another domain for it:

  1. Click Add Domain
  2. Select Service Name: open-webui
  3. For the Host, generate a traefik.me URL or use your own domain (e.g., chat.yourdomain.com)
  4. Set the Container Port to 8080
  5. Enable HTTPS
  6. Click Save

Step 4: Deploy and Download the Model

  1. Click Deploy to start the containers
  2. Wait for the deployment to complete (check the logs)

Now, here's the critical step—you must download the model manually:

  1. Go to Docker in the Dokploy sidebar
  2. Find the ollama container
  3. Click the three dots → Terminal
  4. Run the following command:
ollama pull gemma3:270m

Wait for the download to complete. You can verify it worked with:

ollama list

You should see:

NAME           ID          SIZE     MODIFIED
gemma3:270m    abc123...   270MB    2 minutes ago
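
If you prefer not to open a container terminal, Ollama also exposes a pull endpoint over HTTP, so you can trigger the download through your domain instead (the request streams progress JSON until the download finishes):

# Download the model through the Ollama API rather than the container terminal
curl https://ollama.yourdomain.com/api/pull -d '{"model": "gemma3:270m"}'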

Step 5: Test Your Deployment

Visit your domain in a browser. You should see:

Ollama is running

Now test the API with curl:

curl https://ollama.yourdomain.com/api/generate -d '{
  "model": "gemma3:270m",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

You should receive a JSON response with the AI-generated answer.
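
The generated text lives in the response field of that JSON, so if you have jq installed you can extract it directly:

# Print only the generated text from the API response
curl -s https://ollama.yourdomain.com/api/generate \
  -d '{"model": "gemma3:270m", "prompt": "Why is the sky blue?", "stream": false}' \
  | jq -r '.response'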


API Usage Examples

Basic Generation

curl -X POST https://ollama.yourdomain.com/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:270m",
    "prompt": "Explain Docker in one sentence.",
    "stream": false
  }'

With Temperature Control

curl -X POST https://ollama.yourdomain.com/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:270m",
    "prompt": "Write a haiku about programming.",
    "stream": false,
    "options": {
      "temperature": 0.7,
      "num_predict": 50
    }
  }'

Chat Format

curl -X POST https://ollama.yourdomain.com/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:270m",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "stream": false
  }'
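
Note that the chat endpoint is stateless: to continue a conversation, you resend the full message history, including the assistant's previous reply. For example:

curl -X POST https://ollama.yourdomain.com/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:270m",
    "messages": [
      {"role": "user", "content": "What is machine learning?"},
      {"role": "assistant", "content": "Machine learning lets computers learn patterns from data instead of being explicitly programmed."},
      {"role": "user", "content": "Give me a concrete example."}
    ],
    "stream": false
  }'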

Using Open WebUI (Optional)

If you included Open WebUI in your Docker Compose, you now have a ChatGPT-like interface for interacting with your models.

  1. Visit your Open WebUI domain (e.g., https://chat.yourdomain.com)
  2. Create an account on first visit
  3. Select gemma3:270m from the model dropdown
  4. Start chatting

Open WebUI provides:

  • Chat history: Your conversations are saved locally
  • Multiple models: Switch between any models you've downloaded
  • System prompts: Customize the AI's behavior
  • File uploads: Attach documents for the AI to analyze
  • User management: Create accounts for team members

Tip: You can download additional models directly from Open WebUI's settings, or via the Ollama container terminal.


Troubleshooting

Model Not Found Error

If you get model not found, you forgot to download it. Open a terminal in the ollama container (Docker → ollama → Terminal, as in Step 4) and run:

ollama pull gemma3:270m

Container Keeps Restarting

Check the logs in Dokploy. Common causes:

  • Out of memory: Increase RAM or use a smaller model (see the check after this list)
  • Volume permissions: The ollama_storage volume may have permission issues
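
To check whether memory is the culprit, you can watch live container usage from the host (the container is named ollama in the compose file above):

# Show current CPU and memory usage for the Ollama container
docker stats ollama --no-stream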

Slow Responses

  • For single-user use, 1 vCPU and 1 GB RAM is sufficient
  • For concurrent users, upgrade to 2 vCPUs and 2-4 GB RAM
  • Reduce OLLAMA_NUM_PARALLEL if RAM is limited (try OLLAMA_NUM_PARALLEL=1 or 2)
  • Consider using a GPU for larger models

Cannot Access Domain

  • Check that the correct Container Port is set in Domains (11434 for Ollama, 8080 for Open WebUI)
  • Verify the Service Name matches the service in your Docker Compose
  • Ensure Traefik is running properly in Dokploy
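
If those settings look correct, inspecting the raw HTTP response can narrow things down (replace the hostname with your own): a 404 from Traefik usually means no router matched the host, while a 502 Bad Gateway points at a wrong container port.

# Show only the response status and headers for the Ollama domain
curl -I https://ollama.yourdomain.com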

Upgrading to Larger Models

Once your setup is working, you can easily switch models:

# Inside the container terminal
ollama pull gemma:2b      # 1.7 GB, better quality
ollama pull gemma:7b      # 4.2 GB, requires more RAM
ollama pull llama3.2:3b   # Alternative model

Update your API calls to use the new model name.
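
For example, after pulling gemma:2b, point the same request at the new name:

curl -X POST https://ollama.yourdomain.com/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma:2b", "prompt": "Explain Docker in one sentence.", "stream": false}'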


Security Considerations

For production deployments:

  1. Change the secret key: Replace your-secret-key-here with a strong, random string for Open WebUI (see the command after this list)
  2. Add authentication to Ollama API: Consider placing an authentication proxy in front of Ollama if exposing the raw API
  3. Rate limiting: Use Traefik middleware to prevent abuse
  4. Restrict CORS: Change OLLAMA_ORIGINS=* to specific domains if not using Open WebUI
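
For the first point, one simple way to generate a suitably random key (assuming openssl is available) is:

# Generate a 64-character hex string to use as WEBUI_SECRET_KEY
openssl rand -hex 32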

Conclusion

You now have a production-ready Gemma AI service running on Dokploy with:

  • Persistent model storage that survives restarts
  • Concurrent request handling for multiple users
  • A clean API accessible via your domain
  • Optional ChatGPT-like interface with Open WebUI

This setup is significantly more robust than exposing ports directly and handles real-world usage patterns.


Related Resources

  • Ollama Documentation
  • Open WebUI Documentation
  • Dokploy Documentation
  • Gemma Model Card
