Building LLM Applications with LangChain: Go, Python, and AWS

LangChain has emerged as the de facto framework for building applications powered by Large Language Models. While the Python ecosystem dominates the AI landscape, Go developers aren’t left behind—LangChainGo brings the same abstractions to the Go world. This article explores practical implementations across both languages, from simple completions to full Retrieval-Augmented Generation (RAG) pipelines.

The LangChain Philosophy

LangChain provides composable building blocks for LLM applications:

  • Models: Unified interface to various LLM providers
  • Prompts: Templates and management for model inputs
  • Chains: Sequences of calls to models and utilities
  • Memory: State persistence across interactions
  • Retrieval: Integration with vector stores and document loaders

The key insight: LLM applications are pipelines, not single API calls.
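
To make the pipeline idea concrete, here is a minimal sketch in Python LangChain that composes a prompt template, a chat model, and an output parser into a single chain. It reuses the Bedrock model ID that appears later in this article and assumes AWS credentials are already configured; any other chat model could be swapped in.

from langchain.chat_models import init_chat_model
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Model: the same Bedrock-hosted Claude model used later in this article
model = init_chat_model(
    "eu.anthropic.claude-3-7-sonnet-20250219-v1:0",
    model_provider="bedrock_converse",
)

# Prompt: a reusable template with one input variable
prompt = ChatPromptTemplate.from_template("Explain {topic} in two sentences.")

# Chain: prompt -> model -> parser, each stage independently swappable
chain = prompt | model | StrOutputParser()

print(chain.invoke({"topic": "Retrieval-Augmented Generation"}))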

Getting Started: LangChain Go with Ollama

The simplest entry point uses a local LLM through Ollama. This avoids API costs and latency while prototyping.

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/tmc/langchaingo/llms"
    "github.com/tmc/langchaingo/llms/ollama"
)

func main() {
    llm, err := ollama.New(ollama.WithModel("llama2"))
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()
    completion, err := llms.GenerateFromSinglePrompt(
        ctx,
        llm,
        "Human: Who was the first man to walk on the moon?\nAssistant:",
        llms.WithTemperature(0.8),
        llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
            fmt.Print(string(chunk))
            return nil
        }),
    )
    if err != nil {
        log.Fatal(err)
    }

    // The full completion is already printed by the streaming callback above.
    _ = completion
}

Key points:

  • ollama.New() connects to a local Ollama instance
  • WithModel("llama2") selects the model to use
  • WithStreamingFunc enables real-time token streaming
  • WithTemperature(0.8) controls randomness in responses

Scaling Up: AWS Bedrock Integration in Go

For production workloads, AWS Bedrock provides managed access to foundation models including Claude, Llama, and Titan.

package main

import (
    "context"
    "flag"
    "fmt"
    "log"

    "github.com/tmc/langchaingo/llms"
    "github.com/tmc/langchaingo/llms/bedrock"
)

func main() {
    var (
        prompt    = flag.String("prompt", "Summarize the novel 'Fairy Tale'", "Prompt to send")
        awsRegion = flag.String("region", "eu-west-1", "AWS region")
        verbose   = flag.Bool("verbose", false, "Enable verbose output")
    )
    flag.Parse()

    ctx := context.Background()

    // Create the Bedrock LLM with Claude 3 Haiku. Credentials and region are
    // resolved through the standard AWS SDK configuration chain; the -region
    // flag above is only echoed in verbose output.
    opts := []bedrock.Option{
        bedrock.WithModel(bedrock.ModelAnthropicClaudeV3Haiku),
    }

    llm, err := bedrock.New(opts...)
    if err != nil {
        log.Fatalf("Failed to create Bedrock LLM: %v", err)
    }

    if *verbose {
        fmt.Printf("AWS Region: %s\n", *awsRegion)
        fmt.Printf("Prompt: %s\n", *prompt)
    }

    // Simple Call method
    response, err := llm.Call(ctx, *prompt)
    if err != nil {
        log.Printf("Error calling model: %v", err)
    } else {
        fmt.Printf("Response: %s\n", response)
    }

    // GenerateContent with structured messages
    messages := []llms.MessageContent{
        {
            Role: llms.ChatMessageTypeSystem,
            Parts: []llms.ContentPart{
                llms.TextPart("You are a helpful assistant."),
            },
        },
        {
            Role: llms.ChatMessageTypeHuman,
            Parts: []llms.ContentPart{
                llms.TextPart(*prompt),
            },
        },
    }

    resp, err := llm.GenerateContent(ctx, messages)
    if err != nil {
        log.Printf("Error generating content: %v", err)
    } else if len(resp.Choices) > 0 {
        fmt.Printf("Response: %s\n", resp.Choices[0].Content)
    }
}

The Go Bedrock integration provides:

  • Two calling patterns: Simple Call() for basic prompts, GenerateContent() for structured conversations
  • Message types: System, Human, and AI message roles
  • AWS credential handling: Uses standard AWS SDK credential chain

Full RAG Pipeline: Python with LangChain

For complex applications, Python’s LangChain offers the most mature ecosystem. Here’s a complete RAG implementation using AWS Bedrock, Amazon Titan embeddings, and a Qdrant vector store.

from langchain.chat_models import init_chat_model
from langchain_aws import BedrockEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
import os

# LLM Setup - Claude 3.7 Sonnet via AWS Bedrock
model = init_chat_model(
    "eu.anthropic.claude-3-7-sonnet-20250219-v1:0",
    model_provider="bedrock_converse"
)

# Embedding model - Amazon Titan
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")

# Vector store - Qdrant Cloud
qdrant_client = QdrantClient(
    url=os.getenv("QDRANT_CLOUD_URL"),
    api_key=os.getenv("QDRANT_CLOUD_KEY"),
)

vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name="langchainpy-aws-poc",
    embedding=embeddings,
)

Document Loading and Chunking

RAG begins with ingesting documents:

import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load web content with targeted parsing
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# Split into chunks for embedding
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
all_splits = text_splitter.split_documents(docs)

# Index in vector store
_ = vector_store.add_documents(documents=all_splits)

Key considerations:

  • chunk_size=1000: Balance between context and specificity
  • chunk_overlap=200: Prevents information loss at boundaries
  • Targeted parsing: BeautifulSoup filters relevant content

LangGraph Orchestration

LangGraph provides state management and workflow orchestration:

from langchain import hub
from langchain_core.documents import Document
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Pull a standard RAG prompt template
prompt = hub.pull("rlm/rag-prompt")

# Define application state
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

# Retrieval step
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}

# Generation step
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({
        "question": state["question"],
        "context": docs_content
    })
    response = model.invoke(messages)
    return {"answer": response.content}

# Build and compile the graph
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

# Execute
response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])

The pipeline (a streaming sketch follows the list):

  1. Retrieve: Vector similarity search finds relevant document chunks
  2. Generate: LLM synthesizes answer from retrieved context
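
Because the compiled graph exposes LangGraph's streaming interface, the two steps can also be observed as they finish. A small sketch, assuming the graph built above:

# Stream intermediate state updates instead of waiting for the final answer
for step in graph.stream(
    {"question": "What is Task Decomposition?"},
    stream_mode="updates",
):
    # One dict per node: first {"retrieve": {...}}, then {"generate": {...}}
    print(step)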

Architecture Comparison

Aspect          Go (LangChainGo)           Python (LangChain)
Maturity        Growing                    Mature
Providers       Ollama, Bedrock, OpenAI    50+ integrations
RAG Support     Basic                      Full ecosystem
LangGraph       Not available              Full support
Performance     Lower latency              More features
Use Case        Microservices, CLI tools   Complex AI apps

When to Use Each

Choose Go when:

  • Building microservices that need LLM capabilities
  • Performance and binary size matter
  • Simple completion or chat use cases
  • Your infrastructure is Go-based

Choose Python when:

  • Building complex RAG pipelines
  • Need LangGraph for orchestration
  • Require extensive integrations (document loaders, vector stores)
  • Prototyping and experimentation

Production Considerations

AWS Bedrock Setup

  1. Enable model access in the AWS Console
  2. Configure IAM permissions for bedrock:InvokeModel
  3. Use cross-region inference endpoints for newer models (see the sketch after this list)
  4. Monitor costs: embeddings and completions are billed separately
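
For the cross-region point, the model ID itself does the work: prefixing it with a region group such as eu. or us. selects an inference profile instead of a single-region model. A sketch, assuming model access is enabled and the caller's IAM identity allows bedrock:InvokeModel; passing region_name through init_chat_model to the underlying Bedrock client is an assumption here.

from langchain.chat_models import init_chat_model

# The "eu." prefix selects a cross-region inference profile; "us." works
# analogously for US regions.
model = init_chat_model(
    "eu.anthropic.claude-3-7-sonnet-20250219-v1:0",
    model_provider="bedrock_converse",
    region_name="eu-west-1",  # assumed to be forwarded to the Bedrock client
)

print(model.invoke("Ping from a cross-region inference profile").content)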

Vector Store Selection

Store       Best For
Qdrant      Production, managed cloud option
Pinecone    Serverless, auto-scaling
pgvector    PostgreSQL integration
FAISS       Local development, in-memory
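
For the local-development row in particular, an in-memory FAISS index can stand in for Qdrant without running a server. A sketch, assuming the all_splits and embeddings objects from the ingestion code above and the faiss-cpu package installed:

from langchain_community.vectorstores import FAISS

# Build an in-memory index from the already-chunked documents
faiss_store = FAISS.from_documents(all_splits, embeddings)

# Same similarity-search interface as the Qdrant-backed store
docs = faiss_store.similarity_search("What is Task Decomposition?", k=4)
print(len(docs), "chunks retrieved")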

Chunking Strategy

The chunk size affects retrieval quality (a short comparison sketch follows this list):

  • Smaller chunks (500-1000): More precise retrieval, may lose context
  • Larger chunks (1500-2000): Better context, noisier retrieval
  • Overlap (10-20%): Ensures continuity across chunk boundaries
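
One quick way to feel the trade-off is to re-split the same documents at several sizes and compare chunk counts. A sketch, assuming the docs loaded earlier and an overlap of roughly 20% of each chunk size:

from langchain_text_splitters import RecursiveCharacterTextSplitter

for chunk_size in (500, 1000, 2000):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=int(chunk_size * 0.2),  # ~20% overlap
    )
    chunks = splitter.split_documents(docs)
    print(f"chunk_size={chunk_size}: {len(chunks)} chunks")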

Conclusion

LangChain democratizes LLM application development by providing consistent abstractions across languages and providers. Start with Go for simple integrations, and graduate to Python for complex pipelines. AWS Bedrock offers a production-ready backend without requiring you to manage model-serving infrastructure.

The future of application development increasingly involves LLM components. LangChain ensures you’re not locked into any single provider while maintaining the flexibility to evolve your architecture.