Agent architectures, LLM system design patterns, and the core concepts you need for AI engineering interviews.
How LLMs interact with external tools and APIs through structured function calling, including schema design and error handling.
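A minimal sketch of the tool-calling loop's server side: parse a model-emitted tool call, validate its arguments against a schema, and return either a result or a structured error the model can recover from. The `get_weather` tool, its schema shape, and the return format are illustrative assumptions, not any particular API.

```python
import json

# Hypothetical tool registry: name -> schema-style spec plus implementation.
TOOLS = {
    "get_weather": {
        "description": "Look up current weather for a city.",
        "parameters": {"city": str},
        "fn": lambda city: {"city": city, "temp_c": 21},  # stubbed result
    }
}

def call_tool(call_json: str) -> dict:
    """Parse a model-emitted tool call, validate arguments, execute the tool.

    Returns {"result": ...} on success or {"error": ...} so the agent loop
    can feed the failure back to the model instead of crashing.
    """
    try:
        call = json.loads(call_json)
        spec = TOOLS[call["name"]]
        args = call["arguments"]
        for param, ptype in spec["parameters"].items():
            if not isinstance(args.get(param), ptype):
                return {"error": f"argument '{param}' must be {ptype.__name__}"}
        return {"result": spec["fn"](**args)}
    except (json.JSONDecodeError, KeyError) as exc:
        # Malformed JSON or unknown tool name: surface a recoverable error.
        return {"error": f"malformed tool call: {exc!r}"}
```

Returning errors as data rather than raising is the key design choice: the model gets a chance to correct its own call on the next turn.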
The ReAct (Reasoning + Acting) loop, where agents interleave chain-of-thought reasoning with tool execution steps.
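The loop itself can be sketched in a few lines. This toy uses a scripted stand-in for the model (`fake_model`) and a single hard-coded "tool"; real implementations parse arbitrary actions and call a live LLM, but the interleaving of Thought/Action turns with Observations is the same.

```python
def fake_model(history: str) -> str:
    """Scripted stand-in for an LLM: reasons once, then answers."""
    if "Observation: 4" in history:
        return "Answer: 2 + 2 = 4"
    return "Thought: I should add the numbers.\nAction: add(2, 2)"

def run_agent(question: str, model, max_steps: int = 5) -> str:
    """ReAct-style loop: alternate model turns with tool observations."""
    history = f"Question: {question}"
    for _ in range(max_steps):
        turn = model(history)
        if turn.startswith("Answer:"):
            return turn
        # Execute the action and append the observation so the next
        # model turn can reason over the tool's result.
        observation = "unrecognized action"
        if "add(2, 2)" in turn:  # toy action parser
            observation = 4
        history += f"\n{turn}\nObservation: {observation}"
    return "Answer: step budget exhausted"
```

The `max_steps` cap matters in practice: without it, a confused agent can loop on the same failing action indefinitely.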
Coordinating multiple specialized agents — routing, delegation, handoffs, and shared state management.
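A sketch of the routing pattern, with a keyword heuristic standing in for the LLM classifier that would normally pick the specialist, and a shared dict carrying state across the handoff. Agent names and the routing rule are invented for illustration.

```python
def math_agent(task: str, state: dict) -> dict:
    state["history"].append(("math", task))  # record who handled what
    return state

def code_agent(task: str, state: dict) -> dict:
    state["history"].append(("code", task))
    return state

ROUTES = {"math": math_agent, "code": code_agent}

def route(task: str, state: dict) -> dict:
    """Pick a specialist agent and hand off the task plus shared state.

    A real router would use an LLM classifier (or let agents delegate to
    each other); the digit check is a toy stand-in.
    """
    choice = "math" if any(ch.isdigit() for ch in task) else "code"
    return ROUTES[choice](task, state)
```

Passing one mutable `state` object through every handoff is the simplest form of shared state; larger systems typically externalize it to a store.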
Strategies for managing conversation history, long-term memory, context windows, and summarization.
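The simplest window-management strategy can be sketched as a character-budget trim that keeps the most recent turns and collapses everything older into a summary placeholder. A real system would count tokens and generate the summary with an LLM; both are simplified here.

```python
def trim_history(messages: list[str], max_chars: int = 200) -> list[str]:
    """Keep the newest turns within a budget; stub out the rest.

    Walks backward from the most recent message, keeps what fits, and
    replaces dropped turns with a single summary marker (a real system
    would summarize them with an LLM call).
    """
    kept, used = [], 0
    for msg in reversed(messages):
        if used + len(msg) > max_chars:
            break
        kept.append(msg)
        used += len(msg)
    kept.reverse()
    dropped = len(messages) - len(kept)
    if dropped:
        kept.insert(0, f"[summary of {dropped} earlier turns]")
    return kept
```

Walking backward is deliberate: recency matters more than completeness once the window is full.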
Retrieval-Augmented Generation: chunking, embedding, vector search, re-ranking, and grounding LLM outputs in external knowledge.
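The vector-search step at the heart of RAG reduces to cosine similarity over embeddings. A brute-force sketch with hand-made 2-D "embeddings" (real pipelines use an embedding model and an ANN index, and often re-rank the candidates afterward):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec: list[float], corpus: list[tuple], k: int = 2) -> list[str]:
    """corpus: list of (chunk_text, embedding). Return top-k chunks.

    Exact (brute-force) search; production systems swap in an ANN index
    for scale, then optionally re-rank the top candidates.
    """
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```

The retrieved chunks are then prepended to the prompt, grounding the model's answer in the external text rather than its parametric memory.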
How to evaluate agent performance — task completion rates, trajectory analysis, regression testing, and human-in-the-loop evaluation.
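Two of those metrics are trivial to compute once trajectories are logged. A sketch, assuming each trajectory is a dict with a `success` flag and a `steps` list (the logging schema is an invented example):

```python
def agent_metrics(trajectories: list[dict]) -> tuple[float, float]:
    """Return (task completion rate, mean trajectory length).

    Completion rate answers "did it finish?"; mean trajectory length is a
    cheap proxy for efficiency in trajectory analysis — a rising step
    count at a flat success rate often signals a regression.
    """
    completed = sum(1 for t in trajectories if t["success"])
    avg_steps = sum(len(t["steps"]) for t in trajectories) / len(trajectories)
    return completed / len(trajectories), avg_steps
```

Running this over a fixed suite of tasks on every change is the regression-testing piece; human-in-the-loop review then covers what the numbers miss.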
Systematic approaches to prompting: few-shot, chain-of-thought, system prompts, and structured output formatting.
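These techniques compose mechanically: a prompt is a system message, a structured-output instruction, worked examples, then the new input. A sketch (the layout and JSON instruction are one reasonable convention, not a standard):

```python
def build_prompt(system: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt with a structured-output instruction.

    examples: (input, output) pairs demonstrating the desired format;
    the trailing bare "Output:" cues the model to complete the pattern.
    """
    parts = [system, 'Respond with JSON: {"label": ...}']
    for inp, out in examples:
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)
```

Chain-of-thought drops in the same way: include reasoning steps inside the example outputs and the model tends to reproduce the pattern.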
When to fine-tune a model versus using RAG, and the trade-offs in cost, latency, accuracy, and maintainability.
Designing pipelines for generating, storing, and querying embeddings — vector databases, indexing strategies, and similarity search.
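A toy flat (exact-search) vector store shows the storage and query halves of such a pipeline. One common design choice is modeled: L2-normalizing vectors at insert time so similarity search reduces to a dot product. Real systems replace the linear scan with an ANN index (HNSW, IVF, etc.).

```python
import math

class VectorStore:
    """Toy flat vector index: exact search over unit-normalized vectors."""

    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []  # (doc_id, unit vector)

    def add(self, doc_id: str, vec: list[float]) -> None:
        norm = math.sqrt(sum(x * x for x in vec))
        # Normalize once on insert; queries then need only a dot product.
        self.items.append((doc_id, [x / norm for x in vec]))

    def search(self, vec: list[float], k: int = 1) -> list[str]:
        norm = math.sqrt(sum(x * x for x in vec))
        q = [x / norm for x in vec]
        scored = [(sum(a * b for a, b in zip(q, v)), doc_id)
                  for doc_id, v in self.items]
        return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]
```

Flat search is exact but O(n) per query; indexing strategies trade a little recall for sublinear query time at scale.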
Input/output validation, content filtering, PII detection, prompt injection defense, and responsible AI practices.
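The simplest layer of PII detection is pattern-based redaction on inputs and outputs. A deliberately minimal sketch: the two patterns below are illustrative only, and production detection needs far broader coverage (names, addresses, locale-specific formats) plus ML-based detectors.

```python
import re

# Illustrative patterns only — nowhere near complete PII coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched PII with a typed placeholder like [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Typed placeholders (rather than blanket deletion) preserve enough structure for the model to still respond sensibly to the redacted text.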
Batching, model serving (vLLM, TensorRT), GPU optimization, auto-scaling, and multi-model deployment strategies.
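Batching is the highest-leverage idea on that list, and its core is just packing requests under a budget. A greedy token-budget sketch — a static, simplified cousin of the continuous batching that serving frameworks perform dynamically each iteration; the `(prompt, n_tokens)` input shape is an assumption:

```python
def batch_by_tokens(requests: list[tuple[str, int]],
                    max_tokens: int = 8) -> list[list[str]]:
    """Greedily pack (prompt, token_count) pairs into token-budget batches.

    Larger batches raise GPU utilization; the budget caps memory use.
    Static packing only — real servers re-form batches as requests
    arrive and finish.
    """
    batches, current, used = [], [], 0
    for prompt, n_tokens in requests:
        if current and used + n_tokens > max_tokens:
            batches.append(current)  # flush the full batch
            current, used = [], 0
        current.append(prompt)
        used += n_tokens
    if current:
        batches.append(current)
    return batches
```
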
Token management, caching strategies, model selection, prompt compression, and building cost-efficient LLM applications.
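Exact-match response caching is the simplest of those cost levers: identical calls are served from memory instead of the API. A sketch, assuming deterministic (temperature-0) responses so cached replies stay valid; the class and its interface are invented for illustration:

```python
import hashlib

class LLMCache:
    """Exact-match response cache keyed on (model, prompt)."""

    def __init__(self):
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get_or_call(self, model: str, prompt: str, call_fn) -> str:
        # Hash model + prompt so the key stays small for long prompts.
        key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call_fn(prompt)  # only pay for the API call on a miss
        self._store[key] = result
        return result
```

Exact matching only helps with repeated prompts; semantic caching (matching on embedding similarity) extends the idea to paraphrases, at the cost of occasional wrong hits.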