SOLID Principles for AI Systems: Why Your RAG Pipeline Needs Better Architecture


Your RAG pipeline works perfectly in staging. You deploy to production. 10,000 concurrent users hit it. Embeddings start timing out. Vector search fails silently. LLM calls retry infinitely because someone forgot to set a max. Your "AI-powered" feature is down.

The root cause? Not the model. Not the vector database. The code around it.

Here's what nobody tells you: AI tooling moves fast. New models drop every week, frameworks change APIs monthly. But bad architecture? That compounds faster than technical debt in a monolith. Everyone's racing to ship AI features. Very few are building systems that survive their first real load test.

This isn't about choosing LangChain over LlamaIndex. It's about the boring fundamentals that keep AI systems running when things go wrong.


The Problem: AI Code Ages in Dog Years

Walk into most AI codebases today and you'll find the same pattern: a massive AIService class doing everything. Prompt templating, embedding generation, vector retrieval, caching, monitoring - all in one place.

I've seen this exact setup blow up in three ways:

  1. Can't experiment safely.
     Want to A/B test two prompt strategies? Too bad. The prompt logic is tangled with your retrieval code. Every test requires a full redeploy.

  2. Vendor lock-in at scale.
     Switching from OpenAI to Claude means touching 40 files. That "simple" model swap becomes a two-week refactor because your business logic directly imports the OpenAI SDK.

  3. Testing costs real money.
     No clean interfaces means you can't mock LLM calls. Every test hits the actual API. Your CI bill is $500/month and climbing.

There's this myth floating around: "AI code is just glue - SOLID principles are overkill."

Here's the reality check. Your glue code IS your product. Those abstractions you skipped? They're costing you $10k/month in wasted LLM calls through retries and poor error handling. That tight coupling? Every model upgrade becomes a rewrite instead of a config change.

SOLID isn't academic theory. It's survival architecture for systems that change constantly. And AI systems? They change all the time.


What is SOLID? (And Why Should You Care)

SOLID is five design principles from object-oriented programming. They're not rules you follow blindly. Think of them as forcing functions that make your code:

  • Easy to change when you need to swap models or vendors

  • Safe to extend when you're adding features without breaking existing flows

  • Cheap to test because you can mock LLM calls instead of burning API credits

Here's what each principle does:

  1. Single Responsibility: One class, one reason to change. Your PromptBuilder shouldn't care about vector databases.

  2. Open/Closed: Extend behavior without editing stable code. Adding Claude support shouldn't require changing your OpenAI integration.

  3. Liskov Substitution: Swap implementations without breaking contracts. If you say your interface returns 1536-dimension vectors, all implementations better deliver exactly that.

  4. Interface Segregation: Don't force clients to depend on methods they don't use. Batch embedding models shouldn't implement streaming interfaces.

  5. Dependency Inversion: Depend on abstractions, not concrete vendors. Your business logic should talk to a ChatService interface, not import the OpenAI SDK directly.

These aren't "best practices" you memorize and apply everywhere. They're trade-off tools. The skill is knowing when to use them and when to skip them.


Single Responsibility: One Job Per Class

The core idea: A class should have one reason to change. Not one method. One reason someone would need to open the file and edit it.

In AI systems, this shows up everywhere. Prompt logic changes frequently. You're always tweaking templates. Embedding strategies change less often, maybe when you upgrade models. Vector retrieval logic? Even more stable.

When these three concerns live in the same class, every prompt tweak risks breaking your retrieval. Every embedding model upgrade requires regression testing your entire flow.

Here's what this looks like:

// ❌ Everything in one place
@Service
public class RAGService {
    private final OpenAI openAI;
    private final VectorStore vectorStore;

    public String answer(String question) {
        // Prompt building
        String systemPrompt = "You are a helpful assistant...";
        String context = retrieveContext(question);
        String fullPrompt = systemPrompt + "\n\nContext: " + context + "\n\nQuestion: " + question;

        // LLM call
        return openAI.complete(fullPrompt);
    }

    private String retrieveContext(String question) {
        // Embedding
        float[] embedding = openAI.embed(question);
        // Retrieval
        List<String> docs = vectorStore.search(embedding, 5);
        return String.join("\n", docs);
    }
}

Now you want to change your prompt strategy. Maybe add few-shot examples. You open RAGService. While you're there, you see the embedding code. And the retrieval logic. And suddenly you're wondering if that hardcoded "5" should be configurable. One simple change spirals into refactoring everything.

Here's the split:

// ✅ Each class has one job
@Service
public class PromptBuilder {
    public String buildPrompt(String question, String context) {
        return "You are a helpful assistant...\n\n" +
               "Context: " + context + "\n\n" +
               "Question: " + question;
    }
}

@Service
public class EmbeddingService {
    private final OpenAI openAI;

    public float[] embed(String text) {
        return openAI.embed(text);
    }
}

@Service
public class ContextRetriever {
    private final VectorStore vectorStore;
    private final EmbeddingService embeddingService;

    public String retrieve(String question) {
        float[] embedding = embeddingService.embed(question);
        List<String> docs = vectorStore.search(embedding, 5);
        return String.join("\n", docs);
    }
}

Now changing prompt templates doesn't touch embedding logic. Swapping vector databases doesn't affect prompt building. Each piece can evolve independently.
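What the recomposed pipeline looks like is worth spelling out: the orchestrator becomes a thin class that only wires the pieces together. A minimal sketch, with the collaborators declared as interfaces so it stands alone (in the app they're the @Service classes above, and LLMClient is a hypothetical stand-in for your provider):

```java
// Self-contained sketch: PromptBuilder and ContextRetriever mirror the
// classes above; LLMClient is a hypothetical stand-in for your provider.
interface PromptBuilder { String buildPrompt(String question, String context); }
interface ContextRetriever { String retrieve(String question); }
interface LLMClient { String complete(String prompt); }

class RAGService {
    private final PromptBuilder prompts;
    private final ContextRetriever retriever;
    private final LLMClient llm;

    RAGService(PromptBuilder prompts, ContextRetriever retriever, LLMClient llm) {
        this.prompts = prompts;
        this.retriever = retriever;
        this.llm = llm;
    }

    // Each line delegates to exactly one collaborator: one concern per class.
    public String answer(String question) {
        String context = retriever.retrieve(question);
        String prompt = prompts.buildPrompt(question, context);
        return llm.complete(prompt);
    }
}
```

The orchestrator itself now has only one reason to change: the order of the steps.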

Quick win: Next time you write a service that calls an LLM, ask yourself: "Am I mixing business logic with infrastructure?" If yes, split them.

When to skip it: Prototyping a new prompt technique? Keep it simple. One class is fine. Once you're running experiments or serving production traffic, refactor.


Open/Closed: Extend Without Editing

The core idea: Software should be open for extension but closed for modification. Add new behavior by writing new code, not editing existing code.

In AI systems, this is your defense against vendor lock-in and model churn. When a new model drops, or OpenAI latency spikes and you need to add Claude as a fallback, you shouldn't be editing your core business logic.

Here's the smell:

// ❌ Vendor logic embedded everywhere
@Service
public class ChatService {
    public String complete(String prompt) {
        OpenAI openAI = new OpenAI(apiKey);
        return openAI.chat()
            .model("gpt-5")
            .message(prompt)
            .execute()
            .getContent();
    }
}

Now you want to add Claude support. Maybe for cost comparison. Maybe as a fallback when OpenAI is down. You have two bad options: edit this class (risky) or copy-paste it into ClaudeChatService (now you have two places to maintain retry logic).

Here's the fix:

// ✅ Interface lets you add providers without editing existing code
public interface LLMProvider {
    String complete(String prompt);
}

@Component
public class OpenAIProvider implements LLMProvider {
    private final OpenAI client;

    @Override
    public String complete(String prompt) {
        return client.chat()
            .model("gpt-5")
            .message(prompt)
            .execute()
            .getContent();
    }
}

@Component
public class ClaudeProvider implements LLMProvider {
    private final Anthropic client;

    @Override
    public String complete(String prompt) {
        return client.messages()
            .model("claude-sonnet-4.5")
            .userMessage(prompt)
            .execute()
            .getText();
    }
}

Your business logic depends on LLMProvider. Adding a new model is just a new class implementing that interface. Zero edits to existing code. Zero regression risk.
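This also composes: a fallback chain is itself just another LLMProvider, so adding resilience is one more class, not an edit to existing ones. A hedged sketch (FallbackProvider is an illustrative name, not a library class; the interface is repeated so the snippet compiles on its own):

```java
import java.util.List;

// Sketch: extension by composition. FallbackProvider implements the same
// interface the business logic already depends on, so nothing upstream changes.
interface LLMProvider {
    String complete(String prompt);
}

class FallbackProvider implements LLMProvider {
    private final List<LLMProvider> providers;

    FallbackProvider(List<LLMProvider> providers) {
        this.providers = providers;
    }

    @Override
    public String complete(String prompt) {
        RuntimeException last = null;
        for (LLMProvider provider : providers) {
            try {
                return provider.complete(prompt); // first success wins
            } catch (RuntimeException e) {
                last = e; // remember the failure, try the next provider
            }
        }
        throw new IllegalStateException("All providers failed", last);
    }
}
```

Wire it as the LLMProvider bean and the rest of the system never learns fallback exists.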

Quick win: If you're hardcoding vendor SDKs in your service layer, extract an interface. Wire the concrete implementation in your Spring configuration.

When to skip it: If you know you're married to OpenAI for the next two years and won't even consider alternatives, the interface might be premature. But model APIs change. Bet accordingly.


Liskov Substitution: Contracts You Can Trust

The core idea: If your code expects type A, you should be able to substitute any subtype of A without breaking things. Implementations must honor the contract their interface promises.

In AI systems, this shows up with model swaps. You define an interface that says "this returns embeddings." Great. But does it return 768-dimensional vectors? 1536? 3072? If implementations differ, downstream code breaks.

Here's the silent failure:

// ❌ Interface doesn't enforce dimensions
public interface EmbeddingModel {
    float[] embed(String text);
}

@Component
public class FastEmbedding implements EmbeddingModel {
    public float[] embed(String text) {
        return new float[768]; // Small, fast model
    }
}

@Component  
public class HighQualityEmbedding implements EmbeddingModel {
    public float[] embed(String text) {
        return new float[1536]; // Better model, different dimensions
    }
}

Your vector database is configured for 768 dimensions. Someone swaps in HighQualityEmbedding via config. Ingestion fails with a cryptic dimension mismatch error. Debugging takes an hour because the interface lied: it said "embeddings" but didn't specify what kind.

Here's the fix:

// ✅ Contract enforces dimension consistency
public interface EmbeddingModel {
    float[] embed(String text);
    int getDimensions();
}

@Component
public class FastEmbedding implements EmbeddingModel {
    public float[] embed(String text) {
        return new float[768];
    }

    public int getDimensions() {
        return 768;
    }
}

// Now your VectorStore can validate at startup
@Service
public class VectorStore {
    private final EmbeddingModel embeddingModel;

    @Value("${vectorstore.dimensions}") // e.g. set in application.yml
    private int configuredDimensions;

    @PostConstruct
    public void validateDimensions() {
        if (embeddingModel.getDimensions() != configuredDimensions) {
            throw new IllegalStateException(
                "Embedding model returns " + embeddingModel.getDimensions() +
                " dimensions, but vector store expects " + configuredDimensions
            );
        }
    }
}

Fail fast at startup, not in production. Swap models safely because the contract is explicit.
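If you want the same guarantee even earlier, in CI, a shared contract check catches violations before anything is deployed. A minimal sketch (the interface is repeated here so the snippet stands alone; run verify against every implementation in your test suite):

```java
// Sketch: one contract check, applied to every EmbeddingModel implementation.
interface EmbeddingModel {
    float[] embed(String text);
    int getDimensions();
}

class EmbeddingContract {
    // Throws if an implementation's actual output violates its declared contract.
    static void verify(EmbeddingModel model, String sampleText) {
        float[] vector = model.embed(sampleText);
        if (vector.length != model.getDimensions()) {
            throw new AssertionError("Declared " + model.getDimensions()
                    + " dimensions but embed() returned " + vector.length);
        }
    }
}
```

New implementation, same test: Liskov violations become a red build instead of a production incident.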

Quick win: If your interfaces return "embeddings" or "predictions" without specifying shape or type, add methods that expose these properties. Make violations obvious.

When to skip it: If you control all implementations and they live in the same codebase, you might get away with implicit contracts. But the moment you're integrating third-party models, make it explicit.


Interface Segregation: Don't Force Unused Methods

The core idea: Don't force clients to implement methods they don't need. Big, kitchen-sink interfaces create friction and fake implementations.

In AI systems, this shows up with streaming vs batch models. Not every model supports streaming. But if your interface requires it, every implementation needs to fake it or throw UnsupportedOperationException.

Here's the friction:

// ❌ One interface tries to do everything
public interface AIModel {
    String complete(String prompt);
    Stream<String> completeStream(String prompt);
    List<String> completeBatch(List<String> prompts);
}

@Component
public class BatchEmbeddingModel implements AIModel {
    public String complete(String prompt) {
        throw new UnsupportedOperationException("Use batch method");
    }

    public Stream<String> completeStream(String prompt) {
        throw new UnsupportedOperationException("Streaming not supported");
    }

    public List<String> completeBatch(List<String> prompts) {
        // Actual implementation
    }
}

Two-thirds of the interface is noise. Tests need to handle these exceptions. Documentation needs to warn users. It's all friction.

Here's the split:

// ✅ Clients only depend on what they need
public interface SyncModel {
    String complete(String prompt);
}

public interface StreamingModel {
    Stream<String> completeStream(String prompt);
}

public interface BatchModel {
    List<String> completeBatch(List<String> prompts);
}

@Component
public class OpenAIChat implements SyncModel, StreamingModel {
    // Implements both because OpenAI supports it
}

@Component
public class BatchEmbedding implements BatchModel {
    // Only implements batch—no fake methods
}

Your code only imports the interfaces it actually uses. No exception handling for unsupported operations. Clean contracts.
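On the client side, the payoff is compile-time safety: declare only the capability you need, and a batch-only model can't even be injected by mistake. A sketch (StreamingChatHandler is a hypothetical name; the interface is repeated so the snippet stands alone):

```java
import java.util.stream.Stream;

// Sketch: the client depends on exactly one capability. Passing a batch-only
// model here is a compile error, not a runtime UnsupportedOperationException.
interface StreamingModel {
    Stream<String> completeStream(String prompt);
}

class StreamingChatHandler {
    private final StreamingModel model; // only the streaming capability

    StreamingChatHandler(StreamingModel model) {
        this.model = model;
    }

    String handle(String prompt) {
        // Collect the tokens; a real handler would flush them to the client as they arrive.
        return model.completeStream(prompt)
                .reduce("", String::concat);
    }
}
```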

Quick win: If you're implementing methods just to throw exceptions, your interface is too big. Split it.

When to skip it: If every implementation genuinely supports every method, one interface is fine. But in AI, capabilities vary widely across models. Split accordingly.


Dependency Inversion: Abstractions Over Concretions

The core idea: High-level business logic shouldn't depend on low-level implementation details. Both should depend on abstractions.

In AI systems, this means your core logic shouldn't import vendor SDKs directly. It should depend on interfaces. Wire concrete implementations through dependency injection.

Here's the coupling:

// ❌ Business logic imports OpenAI directly
@Service
public class CustomerSupportService {
    private final OpenAI openAI;

    public String handleQuery(String question) {
        String context = loadCustomerHistory();
        String prompt = buildPrompt(context, question);

        // Direct dependency on OpenAI SDK
        return openAI.chat()
            .model("gpt-4")
            .message(prompt)
            .execute()
            .getContent();
    }
}

Testing this requires hitting the real OpenAI API. Every test costs money. CI is slow. You can't test offline. And if OpenAI's API is down, your entire test suite fails.

Here's the inversion:

// ✅ Business logic depends on abstraction
public interface ChatCompletionService {
    String complete(String prompt);
}

@Service
public class CustomerSupportService {
    private final ChatCompletionService chatService;

    public CustomerSupportService(ChatCompletionService chatService) {
        this.chatService = chatService;
    }

    public String handleQuery(String question) {
        String context = loadCustomerHistory();
        String prompt = buildPrompt(context, question);
        return chatService.complete(prompt);
    }
}

// Wire the real implementation in config
@Configuration
public class AIConfig {
    @Value("${openai.api-key}")
    private String apiKey;

    @Bean
    public ChatCompletionService chatService() {
        return new OpenAIChatService(apiKey);
    }
}

// Mock in tests
@Test
public void testCustomerQuery() {
    ChatCompletionService mock = prompt -> "Mocked response";
    CustomerSupportService service = new CustomerSupportService(mock);

    String result = service.handleQuery("Test question");
    assertEquals("Mocked response", result);
}

Tests run instantly. No API costs. No network dependencies. You can test the business logic in complete isolation.

Quick win: If your service classes import vendor SDKs, extract an interface and inject it. The real implementation and the mock both implement the same contract.

When to skip it: Tiny scripts or one-off experiments don't need this. But production services? Always invert the dependency.


How Each Principle Protects Your AI System

| Principle | Stability Impact | Cost Impact | Velocity Impact |
|---|---|---|---|
| Single Responsibility | Isolate failures: an embedding timeout doesn't kill retrieval | Easier to optimize hot paths separately | Change prompts without QA-ing the entire pipeline |
| Open/Closed | Add fallback models without touching stable code | A/B test vendors without duplicating logic | New model = one new class, zero edits elsewhere |
| Liskov Substitution | Safe model swaps in production | No surprise dimension mismatches breaking ingestion | Config-driven model selection that actually works |
| Interface Segregation | Don't implement unused streaming retry logic | Less code means fewer bugs, less downtime | Smaller interfaces are faster to implement |
| Dependency Inversion | Mock LLMs in tests, zero API cost | Test without burning credits on every CI run | CI runs in seconds, not minutes waiting for APIs |

These aren't nice-to-haves. Each one either cuts costs or prevents downtime. That's the math.


Reality Check: When to Actually Use This

SOLID isn't about perfect code. It's about changing code safely. And AI systems have high change velocity plus high cost per mistake. Bad combination without guardrails.

Here's the honest breakdown. Building a weekend prototype to test if RAG works for your use case? Monolithic code is fine. Ship it. Learn fast.

Building a production RAG system serving 100,000 users? You need these abstractions. Because when you're doing 10 million LLM calls per month, a poorly designed retry mechanism costs you $15,000 in wasted tokens. When your embedding model changes, you need to know that swap won't break vector search for 50,000 existing documents.

The real test of your architecture is simple. Can you swap OpenAI for Claude in under two hours without redeploying 10 services? Can you A/B test two prompt strategies by changing a config flag? Can your tests run without an internet connection?

If the answer is no, your architecture is a liability. These principles fix that.
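For the config-flag question in particular, the shape is small: prompt strategies behind an interface, picked by a key that could come from application.yml or a feature-flag service. A sketch with assumed names (PromptStrategy and the selector are illustrative, not a framework API):

```java
import java.util.Map;

// Sketch: A/B testing prompt strategies by flipping a config value.
interface PromptStrategy {
    String build(String question, String context);
}

class PromptStrategySelector {
    private final Map<String, PromptStrategy> strategies;

    PromptStrategySelector(Map<String, PromptStrategy> strategies) {
        this.strategies = strategies;
    }

    // The flag value comes from config; unknown values fail loudly.
    PromptStrategy select(String flag) {
        PromptStrategy strategy = strategies.get(flag);
        if (strategy == null) {
            throw new IllegalArgumentException("Unknown prompt strategy: " + flag);
        }
        return strategy;
    }
}
```

Swapping "zero-shot" for "few-shot" is then a config change and a restart, not a redeploy of prompt code.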

Want the deep dive? DM me

Email: harsh@pragmaticbyharsh.com

Portfolio: Pragmatic By Harsh
