Last updated: 2026-03-30

How to Add an AI Chatbot to Your Website (Trained on Your Content)

Build a website chatbot using RAG, the OpenAI API, and embeddings, from scraping your content to deploying the chat widget.

TL;DR

You can add an AI chatbot trained on your own website content by scraping your pages, creating vector embeddings, storing them in a vector database like ChromaDB, and using Retrieval-Augmented Generation (RAG) with the OpenAI API to answer user questions. Off-the-shelf solutions like Crisp, Tidio, or Chatbase exist for non-technical users. A custom RAG setup costs roughly $5–50/month depending on traffic. Always consider GDPR compliance before deploying.

Prerequisites

Overview of Chatbot Options

Off-the-Shelf Solutions

Solution | Pros | Cons | Starting Price
Crisp | Live chat + AI bot combo, clean UI | AI features on higher tiers only | Free / $25/mo
Tidio | Drag-and-drop flow builder, Shopify integration | AI answers limited on free plan | Free / $29/mo
Chatbase | Upload docs/URLs, instant RAG chatbot | Less control over retrieval logic | Free / $19/mo
Custom RAG | Full control, own data, no vendor lock-in | Requires development and maintenance | ~$5/mo (API costs)

If you need full control over how your chatbot retrieves and generates answers, building a custom RAG pipeline is the way to go. The rest of this guide walks through exactly that.

Step 1: Scraping Your Website Content

First, we need to collect the text content from your website pages. This script fetches a fixed list of URL paths under a base URL and extracts clean text from each page. It does not parse a sitemap, so you list the paths yourself.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import json
import time

def scrape_website(base_url, paths):
    """Scrape text content from a list of URL paths."""
    documents = []
    
    for path in paths:
        url = urljoin(base_url, path)
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            soup = BeautifulSoup(resp.text, "html.parser")
            
            # Remove non-content elements (scripts, styles, navigation, header, footer)
            for tag in soup(["script", "style", "nav", "footer", "header"]):
                tag.decompose()
            
            # soup.title.string can be None (empty or nested <title>), so use get_text()
            title = (soup.title.get_text(strip=True) if soup.title else "") or path
            # Extract main content area if possible
            main = soup.find("main") or soup.find("article") or soup.body
            text = main.get_text(separator="\n", strip=True) if main else ""
            
            if len(text) > 50:  # Skip near-empty pages
                documents.append({
                    "url": url,
                    "title": title,
                    "content": text[:8000]  # Limit per page
                })
                print(f"Scraped: {title} ({len(text)} chars)")
            
            time.sleep(0.5)  # Be polite to the server
        except Exception as e:
            print(f"Failed to scrape {url}: {e}")
    
    return documents

# Usage
pages = ["/", "/about", "/services", "/faq", "/pricing", "/contact"]
docs = scrape_website("https://example.com", pages)

with open("scraped_content.json", "w") as f:
    json.dump(docs, f, indent=2)

print(f"Scraped {len(docs)} pages successfully.")
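The script above takes a hand-written list of paths. If your site publishes a standard sitemap.xml, that list could be derived from it instead. The following is a minimal sketch, assuming the sitemap follows the sitemaps.org schema; `paths_from_sitemap` is a hypothetical helper and is not part of the original script.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def paths_from_sitemap(xml_text):
    """Return the unique URL paths listed in a sitemap.xml document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    paths = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text:
            path = urlparse(loc.text.strip()).path or "/"
            if path not in paths:
                paths.append(path)
    return paths

# Hand-made sample sitemap, for illustration only.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(paths_from_sitemap(sample))  # ['/', '/about']
```

In practice you would download the sitemap first (for example with `requests.get(...).text`) and pass the resulting list of paths to `scrape_website`. Sitemap index files that point to further sitemaps are not handled by this sketch.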

Step 2: Creating Embeddings with ChromaDB

Next, we chunk the scraped text, generate embeddings via OpenAI, and store them in ChromaDB for fast similarity search.

import chromadb
from openai import OpenAI
import json

client = OpenAI()  # Uses OPENAI_API_KEY env variable
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection(
    name="website_content",
    metadata={"hnsw:space": "cosine"}
)

def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks."""
    words = text.split()
    chunks = []
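    step = chunk_size - overlap
    for i in range(0, len(words), step):
        chunk = " ".join(words[i:i + chunk_size])
        if len(chunk) > 20:
            chunks.append(chunk)
        if i + chunk_size >= len(words):
            break  # This window reached the end; skip a tail already covered by it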
    return chunks

def build_index(scraped_file="scraped_content.json"):
    with open(scraped_file) as f:
        documents = json.load(f)
    
    all_chunks = []
    all_ids = []
    all_metadata = []
    
    for doc in documents:
        chunks = chunk_text(doc["content"])
        for i, chunk in enumerate(chunks):
            chunk_id = f"{doc['url']}#chunk{i}"
            all_chunks.append(chunk)
            all_ids.append(chunk_id)
            all_metadata.append({
                "url": doc["url"],
                "title": doc["title"]
            })
    
    # Batch embed and insert (max 2048 per batch for OpenAI)
    batch_size = 100
    for i in range(0, len(all_chunks), batch_size):
        batch = all_chunks[i:i + batch_size]
        ids = all_ids[i:i + batch_size]
        meta = all_metadata[i:i + batch_size]
        
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=batch
        )
        embeddings = [e.embedding for e in response.data]
        
        collection.upsert(
            ids=ids,
            documents=batch,
            embeddings=embeddings,
            metadatas=meta
        )
        print(f"Indexed {i + len(batch)}/{len(all_chunks)} chunks")
    
    print(f"Index complete: {len(all_chunks)} chunks from {len(documents)} pages.")

build_index()
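To make "similarity search" concrete, here is a pure-Python sketch of what a cosine-based lookup does conceptually. It is a brute-force illustration only: ChromaDB uses an approximate HNSW index rather than a loop like this, and the three-dimensional toy vectors and the `top_k` helper are made up for the example (real embeddings have far more dimensions).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Toy "index": chunk id -> made-up embedding.
toy_index = {
    "pricing": [0.9, 0.1, 0.0],
    "faq": [0.2, 0.8, 0.1],
    "contact": [0.0, 0.1, 0.9],
}

print(top_k([1.0, 0.2, 0.0], toy_index))  # ['pricing', 'faq']
```

With `"hnsw:space": "cosine"`, the distance ChromaDB reports is one minus this similarity, so smaller values mean closer matches.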

Step 3: Building the RAG Chatbot Backend

This Flask API receives a user question, retrieves relevant chunks from ChromaDB, and sends them as context to the OpenAI chat completion endpoint.

from flask import Flask, request, jsonify
from flask_cors import CORS
from openai import OpenAI
import chromadb

app = Flask(__name__)
CORS(app, origins=["https://yourdomain.com"])

oai = OpenAI()
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_collection("website_content")

SYSTEM_PROMPT = """You are a helpful assistant for Example Company.
Answer questions based ONLY on the provided context from our website.
If the context does not contain the answer, say: "I don't have that
information. Please contact us at support@example.com."
Be concise, friendly, and accurate. Always cite which page the
information comes from when possible."""

def retrieve_context(query, n_results=5):
    """Find the most relevant chunks for a query."""
    query_embedding = oai.embeddings.create(
        model="text-embedding-3-small",
        input=[query]
    ).data[0].embedding
    
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results
    )
    
    context_parts = []
    sources = set()
    for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
        context_parts.append(f"[Source: {meta['title']}]\n{doc}")
        sources.add(meta["url"])
    
    return "\n\n---\n\n".join(context_parts), list(sources)

@app.route("/api/chat", methods=["POST"])
def chat():
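    data = request.get_json(silent=True)
    if not isinstance(data, dict):
        data = {}  # Missing or malformed JSON falls through to the 400 below
    user_message = str(data.get("message") or "").strip()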
    
    if not user_message or len(user_message) > 1000:
        return jsonify({"error": "Invalid message"}), 400
    
    context, sources = retrieve_context(user_message)
    
    response = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_message}"}
        ],
        max_tokens=500,
        temperature=0.3
    )
    
    answer = response.choices[0].message.content
    return jsonify({"answer": answer, "sources": sources})

if __name__ == "__main__":
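    # Development server only; it binds to 127.0.0.1 by default. In production
    # (or inside Docker), run the app behind a WSGI server bound to 0.0.0.0.
    app.run(port=5000)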
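`retrieve_context` always passes the five nearest chunks to the model, even when none of them is a close match. One possible refinement is to drop weak matches before building the context. The sketch below assumes the query result carries a `distances` list next to `documents` and `metadatas`, which ChromaDB includes by default. The helper name `filter_by_distance`, the 0.6 cutoff, and the sample data are illustrative assumptions, and the cutoff would need tuning on your own content.

```python
def filter_by_distance(results, max_distance=0.6):
    """Keep only (document, metadata) pairs whose distance is within the cutoff."""
    kept = []
    rows = zip(
        results["documents"][0],
        results["metadatas"][0],
        results["distances"][0],
    )
    for doc, meta, dist in rows:
        if dist <= max_distance:
            kept.append((doc, meta))
    return kept

# Hand-made stand-in for a collection.query() result, for illustration only.
fake_results = {
    "documents": [["Plans start at $19.", "Our office is in Berlin."]],
    "metadatas": [[
        {"title": "Pricing", "url": "https://example.com/pricing"},
        {"title": "Contact", "url": "https://example.com/contact"},
    ]],
    "distances": [[0.21, 0.83]],
}

for doc, meta in filter_by_distance(fake_results):
    print(meta["title"], "->", doc)  # Pricing -> Plans start at $19.
```

If nothing survives the filter, the backend could skip the completion call and return the fallback message from the system prompt directly.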

Step 4: Prompt Engineering for Website Chatbots

The system prompt largely determines whether the bot stays within your content or starts guessing. Here are two patterns to start from:

Strict Context-Only Prompt

STRICT_PROMPT = """You are the virtual assistant for [Company Name].
Rules:
1. ONLY answer using the provided context. Never make up information.
2. If unsure, direct the user to contact support.
3. Keep answers under 3 sentences unless detail is requested.
4. Always mention the source page title.
5. Respond in the same language the user writes in.
6. Never discuss competitors or off-topic subjects."""

Conversational Sales Prompt

SALES_PROMPT = """You are a friendly assistant for [Company Name].
Your goal is to help visitors find the right product/service.
Rules:
1. Answer from the provided context. If the answer isn't there,
   offer to connect them with a human agent.
2. Gently guide toward relevant products when appropriate.
3. If asked about pricing, provide the information and suggest
   booking a demo.
4. Be warm, concise, and professional."""
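Either prompt can be plugged into the backend from Step 3. The following is a rough sketch of keeping several prompts side by side and selecting one per deployment. `PROMPTS`, `build_messages`, and the shortened prompt strings are placeholders introduced for this example; in practice you would store the full prompt texts shown above.

```python
# Shortened placeholder prompts; substitute the full texts in real use.
PROMPTS = {
    "strict": "You are the virtual assistant for [Company Name]. "
              "ONLY answer using the provided context.",
    "sales": "You are a friendly assistant for [Company Name]. "
             "Help visitors find the right product/service.",
}

def build_messages(mode, context, question):
    """Assemble the chat messages list, falling back to the strict prompt."""
    system_prompt = PROMPTS.get(mode, PROMPTS["strict"])
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

messages = build_messages("sales", "Plans start at $19.", "How much does it cost?")
print(messages[0]["content"])
```

The returned list has the same shape as the `messages` argument passed to `oai.chat.completions.create` in Step 3, so switching prompts would not require other changes to the endpoint.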

Step 5: Embedding the Chat Widget

Add this snippet before the closing </body> tag on your website. It creates a floating chat button and dialog.

<!-- AI Chatbot Widget -->
<style>
  #chatbot-toggle {
    position: fixed; bottom: 24px; right: 24px; z-index: 9999;
    width: 56px; height: 56px; border-radius: 50%; border: none;
    background: #2563eb; color: #fff; font-size: 24px; cursor: pointer;
    box-shadow: 0 4px 12px rgba(0,0,0,0.15);
  }
  #chatbot-window {
    display: none; position: fixed; bottom: 90px; right: 24px;
    width: 380px; max-height: 520px; z-index: 9999;
    border-radius: 12px; overflow: hidden;
    box-shadow: 0 8px 30px rgba(0,0,0,0.2);
    font-family: -apple-system, BlinkMacSystemFont, sans-serif;
    background: #fff; flex-direction: column;
  }
  #chatbot-window.open { display: flex; }
  #chatbot-header {
    background: #2563eb; color: #fff; padding: 16px;
    font-weight: 600; font-size: 15px;
  }
  #chatbot-messages {
    flex: 1; overflow-y: auto; padding: 16px;
    max-height: 360px; min-height: 200px;
  }
  .cb-msg { margin-bottom: 12px; line-height: 1.5; font-size: 14px; }
  .cb-msg.bot { color: #1e293b; }
  .cb-msg.user { color: #2563eb; text-align: right; }
  #chatbot-input-wrap {
    display: flex; border-top: 1px solid #e2e8f0; padding: 8px;
  }
  #chatbot-input {
    flex: 1; border: none; outline: none; padding: 8px 12px;
    font-size: 14px;
  }
  #chatbot-send {
    background: #2563eb; color: #fff; border: none; padding: 8px 16px;
    border-radius: 6px; cursor: pointer; font-size: 14px;
  }
</style>

<button id="chatbot-toggle" onclick="toggleChat()">💬</button>
<div id="chatbot-window">
  <div id="chatbot-header">Ask us anything</div>
  <div id="chatbot-messages">
    <div class="cb-msg bot">Hi! How can I help you today?</div>
  </div>
  <div id="chatbot-input-wrap">
    <input id="chatbot-input" placeholder="Type your question..."
           onkeydown="if(event.key==='Enter')sendMessage()" />
    <button id="chatbot-send" onclick="sendMessage()">Send</button>
  </div>
</div>

<script>
const CHATBOT_API = "https://your-api.example.com/api/chat";

function toggleChat() {
  document.getElementById("chatbot-window").classList.toggle("open");
}

async function sendMessage() {
  const input = document.getElementById("chatbot-input");
  const msg = input.value.trim();
  if (!msg) return;
  
  const messages = document.getElementById("chatbot-messages");
  messages.innerHTML += `<div class="cb-msg user">${escapeHtml(msg)}</div>`;
  input.value = "";
  messages.innerHTML += `<div class="cb-msg bot" id="typing">Thinking...</div>`;
  messages.scrollTop = messages.scrollHeight;
  
  try {
    const res = await fetch(CHATBOT_API, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message: msg })
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);  // Route API errors to the catch block
    const data = await res.json();
    document.getElementById("typing").remove();
    messages.innerHTML += `<div class="cb-msg bot">${escapeHtml(data.answer)}</div>`;
  } catch {
    document.getElementById("typing").remove();
    messages.innerHTML += `<div class="cb-msg bot">Sorry, something went wrong.</div>`;
  }
  messages.scrollTop = messages.scrollHeight;
}

function escapeHtml(str) {
  const d = document.createElement("div");
  d.textContent = str;
  return d.innerHTML;
}
</script>

Step 6: Hosting Options

Option A: Serverless (Recommended for Low Traffic)

Platform | Cold Start | Free Tier | Best For
Vercel (Python Functions) | ~300ms | 100k requests/mo | Quick deployment, Next.js sites
Cloudflare Workers | ~5ms | 100k requests/day | Low latency, global edge
AWS Lambda | ~500ms | 1M requests/mo | Complex pipelines, enterprise

Option B: VPS (Recommended for High Traffic or Self-Hosting)

A small VPS (2 vCPU, 4GB RAM) at $5–12/month from Hetzner, DigitalOcean, or Contabo can handle ChromaDB + Flask comfortably for up to ~10k daily conversations. Use Docker Compose for deployment:

# docker-compose.yml (simplified)
services:
  chatbot:
    build: .
    ports: ["5000:5000"]
    volumes: ["./chroma_db:/app/chroma_db"]
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}

Cost Estimation

The main cost driver is the OpenAI API. Here is a formula and example:

Component | Cost | Calculation
Embedding (indexing, one-time) | <$0.01 (about $0.002) | 50 pages x 2k tokens x $0.02/1M tokens
Embedding (per query) | ~$0.000002 | ~100 tokens x $0.02/1M tokens
GPT-4o-mini (per query) | ~$0.0004 | ~2k input + 300 output tokens
1,000 queries/month | ~$0.40 | Embedding + completion per query
10,000 queries/month | ~$4.00 | Same per-query rate
100,000 queries/month | ~$40.00 | Same per-query rate
ChromaDB / hosting | $0–12/mo | Free locally, VPS otherwise

Total realistic estimate for a small business website: $5–15/month including hosting.
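The arithmetic behind these figures can be written as a small function. This sketch uses the per-query costs from the table above as defaults; `estimate_monthly_cost` and its parameter names are made up for illustration, and actual API pricing may differ from these values.

```python
def estimate_monthly_cost(queries_per_month, hosting=0.0,
                          completion_per_query=0.0004,
                          embedding_per_query=0.000002):
    """Estimate monthly cost in USD: per-query API cost times volume, plus hosting."""
    api_cost = queries_per_month * (completion_per_query + embedding_per_query)
    return api_cost + hosting

for queries in (1_000, 10_000, 100_000):
    print(f"{queries:>7} queries/month: ${estimate_monthly_cost(queries):.2f} API only")

# Example: 10,000 queries on a $5/month VPS.
print(f"With hosting: ${estimate_monthly_cost(10_000, hosting=5.0):.2f}")
```

At low volume the hosting term dominates, which is why the total for a small site lands near the VPS price rather than the API cost.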

GDPR Considerations

If your users are in the EU, you must address the following:

Limitations & When NOT to Use a Chatbot

Troubleshooting

Chatbot returns "I don't have that information" for everything

Widget does not appear on the page

CORS errors when calling the API

High latency (>5 seconds per response)

Prevention & Best Practices

Harald Roessler

Infrastructure Engineer with 20+ years experience. Founder of DSNCON GmbH.