If you already work with language models, you know that the difference between a mediocre response and an exceptional one almost always lies in the prompt. With the release of Claude Opus 4.6, Anthropic significantly elevated reasoning capabilities, instruction following, and structured content generation. But to extract the maximum from this model, you need to master specific prompt engineering techniques that work with the Claude architecture — and many of them differ from what works with other LLMs.
I've been using Claude Opus as my primary work tool for over 8 months, both for software development and technical content production. What I've noticed during this period is that Opus 4.6 responds radically differently depending on how you structure your instructions. Prompts that worked perfectly with GPT-4 or even Claude 3.5 Sonnet produce completely different results here — and in most cases, Opus 4.6 delivers superior results when you understand the rules of the game. The part nobody talks about is that the model is extremely literal: if you don't explicitly ask for something, it simply won't do it. This might seem like a limitation, but in practice it's a huge advantage for anyone who knows how to write precise instructions.
What changed in Claude Opus 4.6 compared to previous versions
Claude Opus 4.6 represents a qualitative leap over previous versions of the Claude family. According to Anthropic's official documentation, the model supports a context window of up to 1 million tokens, which means you can feed it extensive documents, entire codebases, and long conversations without losing information. But what really matters for prompt engineering are three fundamental changes:
- Literal interpretation of instructions: Opus 4.6 doesn't try to guess what you want. It follows exactly what's written in the prompt. This eliminates the undesired "over-helping" behaviors that were common in previous versions.
- Native extended thinking support: the model can reason internally before responding, producing more accurate answers in complex analysis and coding tasks.
- Better adherence to structured formats: JSON, XML, tables, and code follow the requested format with much more consistency, reducing the need for retries and defensive parsing.
In practice, this means vague prompts produce vague results — but well-structured prompts produce exceptional results. The model rewards precision.
Structuring prompts with XML Tags — the most effective technique
Of all available prompt engineering techniques, the one that produces the most consistently excellent results with Claude Opus 4.6 is using XML tags to structure the prompt. As documented in Anthropic's official best practices, Claude was trained to recognize and respect the hierarchy of XML tags as semantic delimiters.
Instead of writing a long paragraph with mixed instructions, separate each prompt component into specific tags:
| XML Tag | Function | Usage Example |
|---|---|---|
| `<context>` | Background information the model needs | Project description, tech stack, constraints |
| `<instructions>` | What the model should do | Main task, expected output format |
| `<example>` | Input/output examples (few-shot) | Input/output pairs to calibrate format |
| `<constraints>` | Mandatory limitations and rules | Maximum size, language, forbidden format |
| `<output_format>` | Exact response structure | JSON schema, markdown template |
The reason XML works better than Markdown or numbered lists is that tags create clear semantic boundaries. The model knows exactly where context ends and instructions begin, which drastically reduces interpretation errors.
Practical example: code review prompt
A generic prompt like "review this code and suggest improvements" will produce a generic response. Compare it with a structured prompt that specifies exactly what you expect:
<context>
I'm developing a REST API in Node.js with Express.
The code below is a JWT authentication middleware.
Stack: Node 20, Express 4, jsonwebtoken 9.
</context>
<instructions>
Review the code below focusing on:
1. Security vulnerabilities (OWASP Top 10)
2. Incomplete error handling
3. Performance under high concurrency scenarios
</instructions>
<constraints>
- Don't suggest migrating to another framework
- Maintain Node 20 compatibility
- Prioritize security over readability
</constraints>
<code>
// paste code here
</code>
<output_format>
For each issue found:
- Severity: critical / high / medium / low
- Affected line(s)
- Problem
- Suggested fix (with code)
</output_format>
The quality difference between the two approaches is striking. The structured prompt directs the model's attention to the aspects that truly matter and eliminates generic responses like "consider adding more tests."
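If you call the model through the API rather than the chat interface, the same structured prompt travels as a single user message. Below is a minimal sketch with the Anthropic Python SDK; the model identifier and the file path are assumptions to adjust (check the current model list in the documentation):

```python
import anthropic

# The client reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

# Hypothetical path to the middleware being reviewed.
with open("middleware/auth.js") as f:
    code = f.read()

prompt = f"""<context>
I'm developing a REST API in Node.js with Express.
The code below is a JWT authentication middleware.
Stack: Node 20, Express 4, jsonwebtoken 9.
</context>

<instructions>
Review the code below focusing on:
1. Security vulnerabilities (OWASP Top 10)
2. Incomplete error handling
3. Performance under high concurrency scenarios
</instructions>

<constraints>
- Don't suggest migrating to another framework
- Maintain Node 20 compatibility
- Prioritize security over readability
</constraints>

<code>
{code}
</code>

<output_format>
For each issue found: severity (critical/high/medium/low), affected line(s), problem, suggested fix with code.
</output_format>"""

response = client.messages.create(
    model="claude-opus-4-6",  # assumption: check the docs for the exact model string
    max_tokens=4096,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)
```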
Chain of Thought: when and how to use extended thinking
Chain of Thought (CoT) is a technique where you instruct the model to reason step by step before delivering the final answer. In Claude Opus 4.6, this technique is enhanced by the extended thinking feature, which allows the model to process internally before generating visible output.
According to Anthropic's documentation, extended thinking works best with adaptive settings rather than manually forcing a thinking block. In practice, this means that for complex tasks — bug analysis, architecture decisions, mathematical problems — you should enable thinking and let the model decide the depth of reasoning.
However, for simple tasks like format conversion, direct translation, or boilerplate generation, extended thinking adds unnecessary latency. The practical rule is: if the task requires multi-step reasoning, enable it; if it's a direct transformation, disable it.
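In the API, extended thinking is toggled per request. Here is a minimal sketch; the model identifier and the token budgets are placeholders to adjust against the current documentation:

```python
import anthropic

client = anthropic.Anthropic()

def ask(prompt: str, needs_reasoning: bool) -> str:
    """Enable extended thinking only for tasks that need multi-step reasoning."""
    extra = {}
    if needs_reasoning:
        # The thinking budget must stay below max_tokens.
        extra["thinking"] = {"type": "enabled", "budget_tokens": 8000}

    response = client.messages.create(
        model="claude-opus-4-6",  # assumption: check the docs for the exact model string
        max_tokens=16000,
        messages=[{"role": "user", "content": prompt}],
        **extra,
    )
    # With thinking enabled, the response also carries thinking blocks;
    # keep only the final visible text.
    return "".join(block.text for block in response.content if block.type == "text")

# Complex debugging: let the model reason before answering.
print(ask("Analyze the data flow step by step and explain why this handler "
          "deadlocks under load: <code>...</code>", needs_reasoning=True))

# Direct transformation: skip the extra latency.
print(ask("Convert this YAML snippet to JSON: <input>...</input>", needs_reasoning=False))
```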
When CoT makes a real difference
- Complex debugging: ask the model to analyze the data flow step by step before suggesting the fix
- Architecture decisions: instruct it to list pros and cons before recommending an approach
- Requirements analysis: ask it to identify ambiguities and dependencies before proposing the solution
- Performance issues: request it to trace the critical execution path before optimizing
Few-Shot Prompting: calibrating response format
Few-shot prompting consists of providing concrete input/output examples within the prompt. In Claude Opus 4.6, this technique is especially powerful because the model excels at identifying patterns and replicating them consistently. As described in Anthropic's interactive prompt engineering tutorial, examples should be realistic and specific — not generic.
The technique works best when you need a very specific output format that's difficult to describe with text instructions alone. Instead of explaining the format in three paragraphs, show two examples and the model will replicate the pattern.
A common mistake is providing overly trivial examples. If your examples are simple but the real case is complex, the model will simplify its response to match the level of the examples. Always use examples that represent the real complexity of what you need.
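For instance, a few-shot prompt for classifying support tickets could look like the sketch below. The fields and tickets are invented for illustration, and the structure follows the same tag convention used earlier:

<instructions>
Classify each support ticket and return JSON following the exact format of the examples.
</instructions>

<example>
Input: "The checkout page freezes when I apply a discount coupon on mobile Safari."
Output: {"area": "checkout", "severity": "high", "platform": "mobile-safari", "summary": "Page freezes when applying a coupon"}
</example>

<example>
Input: "Password reset emails arrive up to 40 minutes late for users on corporate Exchange servers."
Output: {"area": "auth", "severity": "medium", "platform": "email", "summary": "Reset emails delayed for Exchange users"}
</example>

<input>
[real ticket here]
</input>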
Positive framing: say what to do, not what to avoid
An important discovery documented by the Claude user community is that the model responds significantly better to positive instructions than to negations. Instead of listing what the model shouldn't do, explicitly describe what it should do.
| Negative approach (less effective) | Positive approach (more effective) |
|---|---|
| "Don't use technical jargon" | "Write in accessible language for beginners" |
| "Don't repeat information" | "Each paragraph should add new information" |
| "Don't make generic lists" | "Include practical examples with code in each item" |
| "Don't be verbose" | "Limit each response to maximum 200 words" |
Another critical point: aggressive language hurts quality. Phrases like "CRITICAL!", "YOU MUST", "NEVER EVER" might seem emphatic to humans, but in Claude Opus 4.6 they cause over-triggering — the model becomes so focused on avoiding the error that it compromises the overall response quality. Calm, direct instructions consistently produce better results.
Prompt Chaining: breaking complex tasks into stages
Prompt chaining is the technique of dividing a large task into multiple sequential calls to the model, where each call receives the previous result as context. According to Anthropic, this approach is especially effective for tasks involving multiple reasoning or transformation steps.
In practice, prompt chaining works like a data pipeline where each stage has a unique, well-defined responsibility. For example, to generate technical documentation from source code, you can chain:
- Stage 1: analyze the code and extract the structure (classes, functions, dependencies)
- Stage 2: generate descriptions for each component based on the analysis
- Stage 3: assemble the final document with formatting and usage examples
- Stage 4: review consistency and completeness of the generated document
Each stage produces an intermediate result that's easier to validate and correct than trying to obtain the final document in a single call. Additionally, if a stage produces an unsatisfactory result, you can re-execute only that stage without wasting tokens on the others.
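Here is a minimal sketch of stages 1 to 3 of that pipeline with the Anthropic Python SDK; the model identifier, file path, and stage prompts are illustrative assumptions:

```python
import anthropic

client = anthropic.Anthropic()

def run_stage(prompt: str) -> str:
    """One pipeline stage: a single call whose output feeds the next stage."""
    response = client.messages.create(
        model="claude-opus-4-6",  # assumption: check the docs for the exact model string
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

with open("src/payments.js") as f:  # hypothetical source file
    source = f.read()

# Stage 1: extract the structure (classes, functions, dependencies).
structure = run_stage(
    f"<instructions>List every class, function and dependency in the code below "
    f"as a bullet list.</instructions>\n<code>\n{source}\n</code>"
)

# Stage 2: describe each component, using only the stage 1 output as context.
descriptions = run_stage(
    f"<context>\n{structure}\n</context>\n"
    f"<instructions>Write a one-paragraph description for each listed component.</instructions>"
)

# Stage 3: assemble the final document; stage 4 (review) would follow the same pattern.
document = run_stage(
    f"<context>\n{descriptions}\n</context>\n"
    f"<instructions>Assemble a Markdown reference document with a usage example "
    f"for each component.</instructions>"
)

print(document)
```

Because each stage is a separate call, you can inspect `structure` or `descriptions` before moving on and re-run only the stage that went wrong.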
When to use chaining vs. single prompt
Not every task needs chaining. Use a single prompt when the task is self-contained and the output format is simple. Use chaining when there are dependencies between stages, when intermediate results need validation, or when a single prompt's context window can't accommodate all necessary information.
Context management in long prompts
With 1 million context tokens, it's tempting to simply throw all available information into the prompt and let the model figure it out. But this is a mistake. Even with a massive context window, information organization directly impacts response quality.
The best practice is to provide a "roadmap" at the beginning of the prompt that tells the model what it will find and what to prioritize. Something like:
<roadmap>
This prompt contains:
1. API specification (sections 1-3): project context
2. Current payments module code (section 4): code to review
3. Recent error logs (section 5): bug evidence
4. Review instructions (section 6): what you should do
Prioritize section 6 to understand the task.
Use sections 1-3 as reference when needed.
</roadmap>
This prompt "indexing" technique is especially useful when working with long documents or multiple code files. The model navigates context better when it knows in advance what's available and where to find each piece of information.
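When the sections come from files or other sources, it helps to generate the roadmap automatically so it never drifts out of sync with the content. A small sketch of that idea in Python; the tag names and section labels are just an illustrative convention, not an official format:

```python
def build_indexed_prompt(sections: dict[str, str], task: str) -> str:
    """Assemble a long prompt with a roadmap listing what the model will find."""
    roadmap_items = "\n".join(
        f"{i}. {name} (section {i})" for i, name in enumerate(sections, start=1)
    )
    roadmap = (
        "<roadmap>\nThis prompt contains:\n"
        f"{roadmap_items}\n"
        "Prioritize the <task> section; use the other sections as reference when needed.\n"
        "</roadmap>\n\n"
    )
    body = "\n\n".join(
        f'<section name="{name}">\n{content}\n</section>'
        for name, content in sections.items()
    )
    return f"{roadmap}{body}\n\n<task>\n{task}\n</task>"

# Example: three sources feed the prompt, and the roadmap is derived from them.
prompt = build_indexed_prompt(
    {
        "API specification": open("docs/api-spec.md").read(),      # hypothetical paths
        "Payments module code": open("src/payments.js").read(),
        "Recent error logs": open("logs/errors.log").read(),
    },
    task="Review the payments module against the spec and explain the errors in the logs.",
)
```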
Prefill: controlling the response's beginning
Prefill is an advanced technique where you pre-define the beginning of the model's response. In Claude, this is done by including a partial assistant message as the last message in the API request. The model continues from where you left off, which guarantees the exact format from the first line.
Practical use cases for prefill:
- Force JSON format: start the response with `{` to ensure the model returns valid JSON
- Set language: begin with a sentence in the desired language to prevent the model from responding in English
- Structure analysis: start with the first section header to ensure the model follows the requested structure
- Eliminate preamble: avoid responses that start with "Sure, I'll help with that!" by pre-defining the beginning with direct content
Prefill is particularly useful in automations and API integrations, where you need parseable responses that follow a rigid format without variations.
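A minimal sketch of prefill with the Anthropic Python SDK: the final assistant message seeds the response with an opening brace so the model returns bare JSON. The model identifier and the input message are illustrative assumptions:

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",  # assumption: check the docs for the exact model string
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Extract name, email and plan as JSON from: "
                       "'Hi, I'm Ana (ana@example.com), I want the Pro plan.'",
        },
        # Prefill: the model continues from this "{", so no preamble can appear
        # and the output starts as valid JSON from the first character.
        {"role": "assistant", "content": "{"},
    ],
)

# Re-attach the prefilled prefix before parsing.
json_text = "{" + response.content[0].text
print(json_text)
```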
Common mistakes that destroy prompt quality
After months of intensive work with prompt engineering on Claude Opus 4.6, I identified the most frequent errors that lead to poor results — even with such a capable model:
- Contradictory instructions: asking "be concise" and "explain in detail" in the same prompt. The model tries to satisfy both and satisfies neither.
- Lack of format specificity: saying "format it well" without defining what "well" means. Specify: Markdown, JSON, table, bullets, paragraphs.
- Excessive constraints: each additional constraint reduces the solution space. Include only constraints that truly matter for the result.
- Irrelevant context: information that isn't needed for the task dilutes the model's attention. A small amount of relevant context beats a large volume of generic context.
- Not iterating: prompt engineering is an iterative process. The first prompt is rarely the best. Test variations, compare results, refine progressively.
An important personal learning: I used to write enormous prompts with dozens of constraints and examples, thinking that more context always meant better results. With Opus 4.6, I discovered the opposite is true. The best prompts I use daily are between 200 and 500 words — clear, structured with XML tags, with at most 2-3 examples and surgically chosen constraints. Prompt quality lies in precision, not volume.
Practical template to get started
To facilitate adopting these techniques, here's a template I use as a starting point for most of my professional prompts. Adapt according to task complexity:
<role>
You are a [specialty] with experience in [domain].
</role>
<context>
[Background information relevant to the task]
</context>
<task>
[Clear and specific description of what needs to be done]
</task>
<constraints>
- [Constraint 1]
- [Constraint 2]
</constraints>
<output_format>
[Exact expected format — ideally with example]
</output_format>
This template works for 80% of use cases. For more complex tasks, add `<example>` and `<thinking_instructions>` tags. For simple tasks, remove what's not needed — Claude Opus 4.6 doesn't need ceremony when the task is straightforward.
Conclusion
Prompt engineering with Claude Opus 4.6 isn't about tricks or hacks — it's about clear, structured communication with a model that rewards precision. The techniques that truly make a difference are surprisingly simple: use XML tags for structure, be specific about what you want, provide representative examples, prefer positive instructions, and iterate on your prompts as you would with any other engineering artifact. The model is an extraordinarily powerful tool, but like any tool, the quality of the result depends on who operates it. Invest time in learning to communicate with it and the returns will be exponential in your daily productivity.

