{"id":61,"date":"2026-05-18T12:37:29","date_gmt":"2026-05-18T12:37:29","guid":{"rendered":"https:\/\/convly.ai\/build-ai-chatbot-claude-api\/"},"modified":"2026-05-21T20:12:56","modified_gmt":"2026-05-21T20:12:56","slug":"build-ai-chatbot-claude-api","status":"publish","type":"post","link":"https:\/\/convly.ai\/ar\/build-ai-chatbot-claude-api\/","title":{"rendered":"How to Build an AI Chatbot with the Claude API in 2026"},"content":{"rendered":"<p>Building a chatbot used to mean wrangling intent classifiers, dialogue trees, and a mountain of edge cases. With a modern language model API, the model handles the hard part \u2014 understanding and responding \u2014 and your job is the wiring around it. With the Claude API you can have a genuinely capable chatbot working in well under an hour.<\/p>\n<p>This guide walks through the concepts and the code: setup, holding a conversation, steering behavior, streaming responses, and keeping costs down.<\/p>\n<div class=\"convly-tldr\">\n<h3>Key takeaways<\/h3>\n<ul>\n<li><strong>The core call<\/strong> is the Messages API \u2014 you send a list of messages, Claude returns a reply.<\/li>\n<li><strong>Conversation memory<\/strong> is your job: keep the message history and resend it each turn.<\/li>\n<li><strong>The system prompt<\/strong> sets the bot&#8217;s role, personality, and rules.<\/li>\n<li><strong>Streaming<\/strong> makes the reply appear word by word, like a real chat.<\/li>\n<li><strong>Prompt caching<\/strong> reuses stable parts of the prompt to cut cost and latency significantly.<\/li>\n<\/ul>\n<\/div>\n<h2>Step 1: Get set up<\/h2>\n<p>You need two things: an API key and the SDK.<\/p>\n<ol>\n<li><strong>Get an API key<\/strong> \u2014 create an account in the Anthropic Console and generate an API key. Keep it secret: store it in an environment variable, never hard-code it or commit it to version control.<\/li>\n<li><strong>Install the SDK<\/strong> \u2014 Anthropic provides official SDKs. 
For Python:<\/li>\n<\/ol>\n<pre><code>pip install anthropic\n<\/code><\/pre>\n<p>(A Node.js SDK is available too; the concepts below are identical.)<\/p>\n<h2>Step 2: Your first message<\/h2>\n<p>The heart of the Claude API is the <strong>Messages API<\/strong>. You send a list of messages; Claude returns the next one. Here&#8217;s the simplest possible call:<\/p>\n<pre><code class=\"language-python\">from anthropic import Anthropic\n\nclient = Anthropic()  # reads the ANTHROPIC_API_KEY environment variable\n\nresponse = client.messages.create(\n    model=&quot;claude-sonnet-4-6&quot;,\n    max_tokens=1024,\n    messages=[\n        {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;Hello! What can you help me with?&quot;}\n    ],\n)\n\nprint(response.content[0].text)\n<\/code><\/pre>\n<p>That&#8217;s a working \u2014 if very forgetful \u2014 chatbot. <code>model<\/code> selects which Claude to use, <code>max_tokens<\/code> caps the reply length, and <code>messages<\/code> is the conversation so far.<\/p>\n<h2>Step 3: Give it a memory<\/h2>\n<p>The example above has no memory: each call is independent. To hold a real conversation, <strong>you<\/strong> keep the history and resend it every turn. 
The API itself is stateless \u2014 it knows only what you send it.<\/p>\n<p>The pattern: maintain a <code>messages<\/code> list, append each user message and each Claude reply, and pass the whole list on every call.<\/p>\n<pre><code class=\"language-python\">from anthropic import Anthropic\n\nclient = Anthropic()\nmessages = []\n\nwhile True:\n    user_input = input(&quot;You: &quot;)\n    if user_input.lower() == &quot;quit&quot;:\n        break\n\n    messages.append({&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: user_input})\n\n    response = client.messages.create(\n        model=&quot;claude-sonnet-4-6&quot;,\n        max_tokens=1024,\n        messages=messages,\n    )\n\n    reply = response.content[0].text\n    print(f&quot;Claude: {reply}&quot;)\n\n    messages.append({&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: reply})\n<\/code><\/pre>\n<p>Now it&#8217;s a real chatbot \u2014 it remembers everything earlier in the conversation, because that history is sent every turn.<\/p>\n<h2>Step 4: Set its personality with a system prompt<\/h2>\n<p>A generic assistant is rarely what you want. The <strong>system prompt<\/strong> defines the bot&#8217;s role, tone, and rules. It&#8217;s passed as a separate <code>system<\/code> parameter, not as a message.<\/p>\n<pre><code class=\"language-python\">SYSTEM_PROMPT = &quot;&quot;&quot;You are a friendly support assistant for a coffee\nsubscription service. Be warm, concise, and helpful. If a customer asks\nabout something you don't know, tell them you'll connect them to a human\nagent. Never discuss competitors.&quot;&quot;&quot;\n\nresponse = client.messages.create(\n    model=&quot;claude-sonnet-4-6&quot;,\n    max_tokens=1024,\n    system=SYSTEM_PROMPT,\n    messages=messages,\n)\n<\/code><\/pre>\n<p>The system prompt is your main tool for shaping behavior \u2014 invest time in it. 
Be specific about the role, the tone, what the bot should do when unsure, and any hard boundaries.<\/p>\n<h2>Step 5: Stream the response<\/h2>\n<p>In the examples above, you wait for the entire reply before anything appears. Real chat interfaces stream \u2014 the text arrives in small chunks as it is generated. The SDK makes this straightforward:<\/p>\n<pre><code class=\"language-python\">with client.messages.stream(\n    model=&quot;claude-sonnet-4-6&quot;,\n    max_tokens=1024,\n    system=SYSTEM_PROMPT,\n    messages=messages,\n) as stream:\n    for text in stream.text_stream:\n        print(text, end=&quot;&quot;, flush=True)\n    print()\n    # Keep the full reply so it can be appended to the history, as in Step 3\n    reply = stream.get_final_message().content[0].text\n\nmessages.append({&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: reply})\n<\/code><\/pre>\n<p>Streaming doesn&#8217;t make generation faster, but it makes the bot <em>feel<\/em> dramatically more responsive, because the user sees output immediately. Once the stream finishes, append the full reply to <code>messages<\/code> so the bot keeps its memory.<\/p>\n<h2>Step 6: Keep costs down with prompt caching<\/h2>\n<p>API calls are billed by tokens, and a chatbot resends a lot of the same text every turn \u2014 the system prompt, and a conversation history that only grows. <strong>Prompt caching<\/strong> lets you mark stable parts of the prompt so the API reuses them instead of reprocessing them, which cuts both cost and latency substantially.<\/p>\n<p>You add a cache marker to the content you want cached \u2014 typically the system prompt, and any long, fixed context:<\/p>\n<pre><code class=\"language-python\">response = client.messages.create(\n    model=&quot;claude-sonnet-4-6&quot;,\n    max_tokens=1024,\n    system=[\n        {\n            &quot;type&quot;: &quot;text&quot;,\n            &quot;text&quot;: SYSTEM_PROMPT,\n            &quot;cache_control&quot;: {&quot;type&quot;: &quot;ephemeral&quot;},\n        }\n    ],\n    messages=messages,\n)\n<\/code><\/pre>\n<p>For any chatbot that handles real traffic, enable prompt caching from the start \u2014 it&#8217;s one of the highest-impact optimizations available and takes only a small code change. Writing to the cache is billed at a small premium over normal input tokens, while cached reads are heavily discounted, so it pays off as soon as the same prefix is reused. Prompts shorter than the model&#8217;s minimum cacheable length are simply processed as normal.<\/p>\n<h2>Choosing a model<\/h2>\n<p>Claude comes in a few tiers. 
As a rule of thumb:<\/p>\n<ul>\n<li><strong>A fast, balanced model<\/strong> (such as the Sonnet tier) is the right default for most chatbots \u2014 strong quality, good speed, sensible cost.<\/li>\n<li><strong>The most capable model<\/strong> (the Opus tier) is worth it when the bot must handle hard reasoning or complex tasks.<\/li>\n<li><strong>The smallest, fastest model<\/strong> (the Haiku tier) suits simple, high-volume bots where speed and cost matter most.<\/li>\n<\/ul>\n<p>Start with the balanced tier and only move up or down once you see real usage.<\/p>\n<h2>Going to production<\/h2>\n<p>The code above is the working core. For a real deployment, add:<\/p>\n<ul>\n<li><strong>A web layer<\/strong> \u2014 wrap the logic in an API endpoint and connect a chat UI.<\/li>\n<li><strong>History limits<\/strong> \u2014 conversations grow forever; cap or summarize old turns so prompts don&#8217;t balloon.<\/li>\n<li><strong>Error handling<\/strong> \u2014 handle rate limits and transient failures with retries.<\/li>\n<li><strong>Knowledge<\/strong> \u2014 to answer from your own data, add <a href=\"\/rag-retrieval-augmented-generation-explained\/\">retrieval-augmented generation<\/a> so the bot pulls in relevant documents.<\/li>\n<li><strong>Safety<\/strong> \u2014 validate inputs and set clear boundaries in the system prompt.<\/li>\n<\/ul>\n<h2>FAQ<\/h2>\n<h3>How do I build a chatbot with the Claude API?<\/h3>\n<p>Install Anthropic&#8217;s SDK, get an API key, and call the Messages API: send a list of messages and Claude returns a reply. To make it conversational, keep the message history yourself and resend it each turn. Add a system prompt for personality and streaming for a responsive feel.<\/p>\n<h3>Does the Claude API remember previous messages?<\/h3>\n<p>No \u2014 the API is stateless. It only knows what you send in a given request. 
To give a chatbot memory, your application must store the conversation history and include it in the <code>messages<\/code> list on every call.<\/p>\n<h3>What is a system prompt?<\/h3>\n<p>The system prompt is a separate instruction that defines the chatbot&#8217;s role, tone, and rules \u2014 for example, &#8220;You are a concise support assistant; escalate to a human when unsure.&#8221; It&#8217;s passed as the <code>system<\/code> parameter and is the main way to shape how the bot behaves.<\/p>\n<h3>How much does it cost to run a Claude chatbot?<\/h3>\n<p>Cost depends on the model and how many tokens you process. A balanced model is inexpensive for typical chat traffic. Because chatbots resend the system prompt and growing history each turn, enabling prompt caching can cut costs significantly \u2014 it reuses stable parts of the prompt instead of reprocessing them.<\/p>\n<h3>Which Claude model should I use for a chatbot?<\/h3>\n<p>For most chatbots, start with a fast, balanced model (the Sonnet tier) \u2014 it offers strong quality at sensible speed and cost. Use the most capable model for complex reasoning tasks, and a smaller, faster model for simple high-volume bots.<\/p>\n<h2>Bottom line<\/h2>\n<p>Building a chatbot with the Claude API is mostly about the wiring, not the AI. The model handles understanding and response; you provide the loop. Keep a <code>messages<\/code> history and resend it for memory, use a <code>system<\/code> prompt for personality, stream for responsiveness, and turn on prompt caching to control cost.<\/p>\n<p>That core is genuinely an hour of work. The path to production is the familiar engineering around it \u2014 a web layer, history management, error handling, and <a href=\"\/rag-retrieval-augmented-generation-explained\/\">RAG<\/a> if the bot needs to know your data. 
Start with the simple loop above, get it talking, and build outward from there.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Build a working AI chatbot with Anthropic&#8217;s Claude API. This guide covers setup, conversation memory, system prompts, streaming, and the prompt caching that keeps costs low.<\/p>","protected":false},"author":0,"featured_media":62,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_themeisle_gutenberg_block_has_review":false,"footnotes":""},"categories":[9],"tags":[446,445,444,447,443],"class_list":["post-61","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tutorials","tag-ai-development","tag-anthropic-api","tag-build-ai-chatbot","tag-chatbot-tutorial","tag-claude-api"],"uagb_featured_image_src":{"full":["https:\/\/convly.ai\/wp-content\/uploads\/2026\/05\/build-ai-chatbot-claude-api.jpg",1200,630,false],"thumbnail":["https:\/\/convly.ai\/wp-content\/uploads\/2026\/05\/build-ai-chatbot-claude-api-150x150.jpg",150,150,true],"medium":["https:\/\/convly.ai\/wp-content\/uploads\/2026\/05\/build-ai-chatbot-claude-api-300x158.jpg",300,158,true],"medium_large":["https:\/\/convly.ai\/wp-content\/uploads\/2026\/05\/build-ai-chatbot-claude-api-768x403.jpg",768,403,true],"large":["https:\/\/convly.ai\/wp-content\/uploads\/2026\/05\/build-ai-chatbot-claude-api-1024x538.jpg",1024,538,true],"1536x1536":["https:\/\/convly.ai\/wp-content\/uploads\/2026\/05\/build-ai-chatbot-claude-api.jpg",1200,630,false],"2048x2048":["https:\/\/convly.ai\/wp-content\/uploads\/2026\/05\/build-ai-chatbot-claude-api.jpg",1200,630,false],"trp-custom-language-flag":["https:\/\/convly.ai\/wp-content\/uploads\/2026\/05\/build-ai-chatbot-claude-api-18x9.jpg",18,9,true]},"uagb_author_info":{"display_name":"","author_link":"https:\/\/convly.ai\/ar\/author\/"},"uagb_comment_info":0,"uagb_excerpt":"Build a working AI chatbot with Anthropic's Claude API. 
This guide covers setup, conversation memory, system prompts, streaming, and the prompt caching that keeps costs low.","_links":{"self":[{"href":"https:\/\/convly.ai\/ar\/wp-json\/wp\/v2\/posts\/61","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/convly.ai\/ar\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/convly.ai\/ar\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/convly.ai\/ar\/wp-json\/wp\/v2\/comments?post=61"}],"version-history":[{"count":1,"href":"https:\/\/convly.ai\/ar\/wp-json\/wp\/v2\/posts\/61\/revisions"}],"predecessor-version":[{"id":696,"href":"https:\/\/convly.ai\/ar\/wp-json\/wp\/v2\/posts\/61\/revisions\/696"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/convly.ai\/ar\/wp-json\/wp\/v2\/media\/62"}],"wp:attachment":[{"href":"https:\/\/convly.ai\/ar\/wp-json\/wp\/v2\/media?parent=61"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/convly.ai\/ar\/wp-json\/wp\/v2\/categories?post=61"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/convly.ai\/ar\/wp-json\/wp\/v2\/tags?post=61"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}