Building LLM Applications with LangChain: Go, Python, and AWS

LangChain has emerged as the de facto framework for building applications powered by Large Language Models. While the Python ecosystem dominates the AI landscape, Go developers aren’t left behind—LangChainGo brings the same abstractions to the Go world. This article explores practical implementations across both languages, from simple completions to full Retrieval-Augmented Generation (RAG) pipelines.

The LangChain Philosophy

LangChain provides composable building blocks for LLM applications:

  • Models: Unified interface to various LLM providers
  • Prompts: Templates and management for model inputs
  • Chains: Sequences of calls to models and utilities
  • Memory: State persistence across interactions
  • Retrieval: Integration with vector stores and document loaders

The key insight: LLM applications are pipelines, not single API calls.
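
To make the pipeline idea concrete, here is a minimal sketch in Python LangChain that composes a prompt template, a chat model, and an output parser into a single chain. It reuses the Bedrock model ID that appears later in this article and assumes AWS credentials are already configured; any other chat model could be swapped in.

from langchain.chat_models import init_chat_model
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Model: the same Bedrock-hosted Claude model used later in this article
model = init_chat_model(
    "eu.anthropic.claude-3-7-sonnet-20250219-v1:0",
    model_provider="bedrock_converse",
)

# Prompt: a reusable template with one input variable
prompt = ChatPromptTemplate.from_template("Explain {topic} in two sentences.")

# Chain: prompt -> model -> parser, each stage independently swappable
chain = prompt | model | StrOutputParser()

print(chain.invoke({"topic": "Retrieval-Augmented Generation"}))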

Getting Started: LangChain Go with Ollama

The simplest entry point uses a local LLM through Ollama. This avoids API costs and latency while prototyping.

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/tmc/langchaingo/llms"
    "github.com/tmc/langchaingo/llms/ollama"
)

func main() {
    llm, err := ollama.New(ollama.WithModel("llama2"))
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()
    completion, err := llms.GenerateFromSinglePrompt(
        ctx,
        llm,
        "Human: Who was the first man to walk on the moon?\nAssistant:",
        llms.WithTemperature(0.8),
        llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
            fmt.Print(string(chunk))
            return nil
        }),
    )
    if err != nil {
        log.Fatal(err)
    }

    // The full completion is already printed by the streaming callback above.
    _ = completion
}

Key points:

  • ollama.New() connects to a local Ollama instance
  • WithModel("llama2") selects the model to use
  • WithStreamingFunc enables real-time token streaming
  • WithTemperature(0.8) controls randomness in responses

Scaling Up: AWS Bedrock Integration in Go

For production workloads, AWS Bedrock provides managed access to foundation models including Claude, Llama, and Titan.

package main

import (
    "context"
    "flag"
    "fmt"
    "log"

    "github.com/tmc/langchaingo/llms"
    "github.com/tmc/langchaingo/llms/bedrock"
)

func main() {
    var (
        prompt    = flag.String("prompt", "Summarize the novel 'Fairy Tale'", "Prompt to send")
        awsRegion = flag.String("region", "eu-west-1", "AWS region")
        verbose   = flag.Bool("verbose", false, "Enable verbose output")
    )
    flag.Parse()

    ctx := context.Background()

    // Create the Bedrock LLM with Claude 3 Haiku. Credentials and region are
    // resolved through the standard AWS SDK configuration chain; the -region
    // flag above is only echoed in verbose output.
    opts := []bedrock.Option{
        bedrock.WithModel(bedrock.ModelAnthropicClaudeV3Haiku),
    }

    llm, err := bedrock.New(opts...)
    if err != nil {
        log.Fatalf("Failed to create Bedrock LLM: %v", err)
    }

    if *verbose {
        fmt.Printf("AWS Region: %s\n", *awsRegion)
        fmt.Printf("Prompt: %s\n", *prompt)
    }

    // Simple Call method
    response, err := llm.Call(ctx, *prompt)
    if err != nil {
        log.Printf("Error calling model: %v", err)
    } else {
        fmt.Printf("Response: %s\n", response)
    }

    // GenerateContent with structured messages
    messages := []llms.MessageContent{
        {
            Role: llms.ChatMessageTypeSystem,
            Parts: []llms.ContentPart{
                llms.TextPart("You are a helpful assistant."),
            },
        },
        {
            Role: llms.ChatMessageTypeHuman,
            Parts: []llms.ContentPart{
                llms.TextPart(*prompt),
            },
        },
    }

    resp, err := llm.GenerateContent(ctx, messages)
    if err != nil {
        log.Printf("Error generating content: %v", err)
    } else if len(resp.Choices) > 0 {
        fmt.Printf("Response: %s\n", resp.Choices[0].Content)
    }
}

The Go Bedrock integration provides:

  • Two calling patterns: Simple Call() for basic prompts, GenerateContent() for structured conversations
  • Message types: System, Human, and AI message roles
  • AWS credential handling: Uses standard AWS SDK credential chain

Full RAG Pipeline: Python with LangChain

For complex applications, Python’s LangChain offers the most mature ecosystem. Here’s a complete RAG implementation using AWS Bedrock, Amazon Titan embeddings, and a Qdrant vector store.

from langchain.chat_models import init_chat_model
from langchain_aws import BedrockEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
import os

# LLM Setup - Claude 3.7 Sonnet via AWS Bedrock
model = init_chat_model(
    "eu.anthropic.claude-3-7-sonnet-20250219-v1:0",
    model_provider="bedrock_converse"
)

# Embedding model - Amazon Titan
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")

# Vector store - Qdrant Cloud
qdrant_client = QdrantClient(
    url=os.getenv("QDRANT_CLOUD_URL"),
    api_key=os.getenv("QDRANT_CLOUD_KEY"),
)

vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name="langchainpy-aws-poc",
    embedding=embeddings,
)

Document Loading and Chunking

RAG begins with ingesting documents:

import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load web content with targeted parsing
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# Split into chunks for embedding
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
all_splits = text_splitter.split_documents(docs)

# Index in vector store
_ = vector_store.add_documents(documents=all_splits)

Key considerations:

  • chunk_size=1000: Balance between context and specificity
  • chunk_overlap=200: Prevents information loss at boundaries
  • Targeted parsing: BeautifulSoup filters relevant content

LangGraph Orchestration

LangGraph provides state management and workflow orchestration:

from langchain import hub
from langchain_core.documents import Document
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Pull a standard RAG prompt template
prompt = hub.pull("rlm/rag-prompt")

# Define application state
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

# Retrieval step
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}

# Generation step
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({
        "question": state["question"],
        "context": docs_content
    })
    response = model.invoke(messages)
    return {"answer": response.content}

# Build and compile the graph
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

# Execute
response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])

The pipeline (a streaming sketch follows the list):

  1. Retrieve: Vector similarity search finds relevant document chunks
  2. Generate: LLM synthesizes answer from retrieved context
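
Because the compiled graph exposes LangGraph's streaming interface, the two steps can also be observed as they finish. A small sketch, assuming the graph built above:

# Stream intermediate state updates instead of waiting for the final answer
for step in graph.stream(
    {"question": "What is Task Decomposition?"},
    stream_mode="updates",
):
    # One dict per node: first {"retrieve": {...}}, then {"generate": {...}}
    print(step)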

Architecture Comparison

Aspect          Go (LangChainGo)           Python (LangChain)
Maturity        Growing                    Mature
Providers       Ollama, Bedrock, OpenAI    50+ integrations
RAG Support     Basic                      Full ecosystem
LangGraph       Not available              Full support
Performance     Lower latency              More features
Use Case        Microservices, CLI tools   Complex AI apps

When to Use Each

Choose Go when:

  • Building microservices that need LLM capabilities
  • Performance and binary size matter
  • Simple completion or chat use cases
  • Your infrastructure is Go-based

Choose Python when:

  • Building complex RAG pipelines
  • Need LangGraph for orchestration
  • Require extensive integrations (document loaders, vector stores)
  • Prototyping and experimentation

Production Considerations

AWS Bedrock Setup

  1. Enable model access in the AWS Console
  2. Configure IAM permissions for bedrock:InvokeModel
  3. Use cross-region inference endpoints for newer models (see the sketch after this list)
  4. Monitor costs: embeddings and completions are billed separately
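
For the cross-region point, the model ID itself does the work: prefixing it with a region group such as eu. or us. selects an inference profile instead of a single-region model. A sketch, assuming model access is enabled and the caller's IAM identity allows bedrock:InvokeModel; passing region_name through init_chat_model to the underlying Bedrock client is an assumption here.

from langchain.chat_models import init_chat_model

# The "eu." prefix selects a cross-region inference profile; "us." works
# analogously for US regions.
model = init_chat_model(
    "eu.anthropic.claude-3-7-sonnet-20250219-v1:0",
    model_provider="bedrock_converse",
    region_name="eu-west-1",  # assumed to be forwarded to the Bedrock client
)

print(model.invoke("Ping from a cross-region inference profile").content)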

Vector Store Selection

Store       Best For
Qdrant      Production, managed cloud option
Pinecone    Serverless, auto-scaling
pgvector    PostgreSQL integration
FAISS       Local development, in-memory
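
For the local-development row in particular, an in-memory FAISS index can stand in for Qdrant without running a server. A sketch, assuming the all_splits and embeddings objects from the ingestion code above and the faiss-cpu package installed:

from langchain_community.vectorstores import FAISS

# Build an in-memory index from the already-chunked documents
faiss_store = FAISS.from_documents(all_splits, embeddings)

# Same similarity-search interface as the Qdrant-backed store
docs = faiss_store.similarity_search("What is Task Decomposition?", k=4)
print(len(docs), "chunks retrieved")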

Chunking Strategy

The chunk size affects retrieval quality (a short comparison sketch follows this list):

  • Smaller chunks (500-1000): More precise retrieval, may lose context
  • Larger chunks (1500-2000): Better context, noisier retrieval
  • Overlap (10-20%): Ensures continuity across chunk boundaries
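
One quick way to feel the trade-off is to re-split the same documents at several sizes and compare chunk counts. A sketch, assuming the docs loaded earlier and an overlap of roughly 20% of each chunk size:

from langchain_text_splitters import RecursiveCharacterTextSplitter

for chunk_size in (500, 1000, 2000):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=int(chunk_size * 0.2),  # ~20% overlap
    )
    chunks = splitter.split_documents(docs)
    print(f"chunk_size={chunk_size}: {len(chunks)} chunks")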

Conclusion

LangChain democratizes LLM application development by providing consistent abstractions across languages and providers. Start with Go for simple integrations, and graduate to Python for complex pipelines. AWS Bedrock offers a production-ready backend without requiring you to manage model-serving infrastructure.

The future of application development increasingly involves LLM components. LangChain ensures you’re not locked into any single provider while maintaining the flexibility to evolve your architecture.