
Schema-Guided Reasoning (SGR)

Summary: Schema-Guided Reasoning (SGR) is a structured prompting technique that uses predefined schemas (via Structured Output) to guide large language models through explicit reasoning workflows, encoding expert cognitive processes directly into inference.

SGR is a pattern for prompting LLMs by defining explicit reasoning workflows through typed output schemas. Because it relies on Structured Outputs / Constrained Decoding, I used to call it SO CoT or Custom Chain of Thought in the past.

Instead of relying only on loose instructions, we encode exactly how the model should approach a complex task, including preliminary analysis, checks, and supporting evidence - essentially translating a domain expert's mental checklist into a structured schema.

(Figure: 2025-07-23-schema-guided-reasoning-sgr.png)

By enforcing strict schema structures, we ensure predictable and auditable reasoning, gain fine-grained control over inference quality, and easily validate intermediate results against test data.

In other words, the structure lets us control the layout of the response. This allows us to break tasks into smaller steps while enforcing mandatory checkpoints.

Here are some benefits:

  • Reproducible reasoning - we guarantee more consistent inference across repeated runs.
  • Auditable - SGR makes every reasoning step explicit and inspectable.
  • Debuggable and testable - intermediate outputs can be directly evaluated and improved; they link naturally to test datasets with evals.
  • Expert knowledge translates into executable prompts. Domain-Driven Design (DDD) works really well here.
  • Both reasoning transparency and output reliability improve. An accuracy boost of 5-10% is not uncommon.
  • Weaker local models gain reasoning capability, making them applicable to more workloads.

Note that we are not replacing the entire prompt with structured output. We simply stop relying on the prompt alone to force the LLM to follow a certain reasoning process precisely.

Examples

Let's go through a few examples that illustrate SGR:

  • simple math task
  • text-to-sql
  • document classification
  • advanced reasoning in compliance

1. Simple Math Task

Here is an easy example with a small reasoning customisation. Let's say we have the following Python classes:

from pydantic import BaseModel

class Step(BaseModel):
    explanation: str  # why this step is taken
    output: str       # the state of the equation after this step

class MathReasoning(BaseModel):
    steps: list[Step]  # the model must produce its reasoning first...
    final_answer: str  # ...and only then the final answer

Note that BaseModel comes from Python's pydantic library, frequently used to define response formats. Other languages can leverage different libraries and tools for this purpose.

Given this MathReasoning response format and the prompt "Solve the equation 8x + 7 = -23 step by step", the LLM can respond like this:

{
  "steps": [
    {
      "explanation": "Start with the equation 8x + 7 = -23.",
      "output": "8x + 7 = -23"
    },
    {
      "explanation": "Subtract 7 from both sides to isolate the term with the variable.",
      "output": "8x = -23 - 7"
    },
    {
      "explanation": "Simplify the right side of the equation.",
      "output": "8x = -30"
    },
    {
      "explanation": "Divide both sides by 8 to solve for x.",
      "output": "x = -30 / 8"
    },
    {
      "explanation": "Simplify the fraction.",
      "output": "x = -15 / 4"
    }
  ],
  "final_answer": "x = -15 / 4"
}

Note that the model thinks in a predefined way before providing an answer. This uses more tokens, investing them to think through the problem aloud, but improves model accuracy.
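For reference, here is a minimal sketch of how this schema could be wired into a structured-output call with the OpenAI Python SDK (the model name is illustrative):

from openai import OpenAI

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # any model with Structured Output support
    messages=[
        {"role": "user", "content": "Solve the equation 8x + 7 = -23 step by step."}
    ],
    response_format=MathReasoning,  # the schema constrains decoding
)

math = completion.choices[0].message.parsed  # a validated MathReasoning instance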

2. Text-to-SQL Example

In the real world, we want a more precise reasoning schema to boost answer accuracy. For example, when prompting an LLM to expand a human request into a precise SQL query over a predefined schema, adding Schema-Guided Reasoning (SGR) increases the accuracy by 6% out of the box.

In the image below, this was done by adding a strategy field before the sql_query field, forcing the LLM to perform analysis according to a custom checklist.

(Figure: wiki-custom-cot-sql-example.png)

In essence, we programmed the LLM to reason in a predefined way without writing any executable code.
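A sketch of what such a response format could look like (strategy and sql_query are the field names from the example above; the checklist fields inside Strategy are hypothetical and would be tailored to the actual database schema):

from pydantic import BaseModel, Field

class Strategy(BaseModel):
    # hypothetical checklist - adapt to your domain
    relevant_tables: list[str] = Field(..., description="Tables from the schema needed for this request")
    required_columns: list[str] = Field(..., description="Columns used for filtering, joining and output")
    join_plan: str = Field(..., description="How the tables connect, in plain words")

class SqlAnswer(BaseModel):
    strategy: Strategy  # the model must fill in the checklist first...
    sql_query: str      # ...before it is allowed to emit the SQL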

3. Document Classification Example

Here is an example of Schema-Guided Reasoning (SGR) from a system for classifying business documents in a RAG pipeline:

from typing import List, Literal

from pydantic import BaseModel, Field

DOCUMENT_TYPES = ["invoice", "contract", "receipt", "email", ...]
ENTITY_TYPES = ["payment", "risk", "regulator", "employee", ...]

class DocumentClassification(BaseModel):
    # each field is a step in the reasoning sequence
    document_type: Literal[tuple(DOCUMENT_TYPES)]
    brief_summary: str
    key_entities_mentioned: List[Literal[tuple(ENTITY_TYPES)]]
    keywords: List[str] = Field(..., description="Up to 10 keywords describing this document")

In this case, the LLM is forced to think through the classification challenge in steps:

  1. Identify the type of the document and pick it; Literal enforces a valid choice.
  2. Summarise the document.
  3. Identify the key entities mentioned in the document; List[Literal] ensures that the response is a list drawn from ENTITY_TYPES.
  4. Come up with keywords; List[str] ensures that the response is a list of strings, while the description kindly asks the LLM to keep the list at 10 items or fewer.

In this specific example, the first two fields are discarded from the response. They exist only to force the LLM to approach classification from a predefined angle and think about it a little. Ultimately this improved prompt accuracy in the task.
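A sketch of the downstream handling, assuming the same SDK call as in the math example (document_text is a placeholder for the actual document content):

from openai import OpenAI

client = OpenAI()
document_text = "..."  # placeholder for the actual document content

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # illustrative model name
    messages=[{"role": "user", "content": document_text}],
    response_format=DocumentClassification,
)

doc = completion.choices[0].message.parsed
# document_type and brief_summary did their job during inference;
# only the later fields travel downstream.
entities, keywords = doc.key_entities_mentioned, doc.keywords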

4. Advanced Reasoning in Compliance

This is an example of a more advanced workflow that is "packed" into a single prompt. While executing this schema, the model is forced to go through it sequentially.

(Figure: 2025-07-23-schema-guided-reasoning-sgr.png)

First, we instruct the model to do a preliminary analysis, where most of the analysis is encoded in the Applicability reasoning sub-routine (implemented as a reusable nested object). The task is phrased explicitly in the field description and field name.

The field name gets extra attention from the model, because the model copies it into the output just before it starts answering the question.

Afterwards, the model has to reason about concrete gaps in the document. These gaps, represented as a list of strings, become the mental notes that the model gathers before providing a final answer.

Note that the description field is passed to the LLM automatically by OpenAI. Other providers might not include it.

The answer itself is a fairly straightforward enum of three options. However, the reasoning doesn't stop there. Benchmarking has shown that this reasoning workflow sometimes gets too pessimistic and flags too many gaps. To handle that, we force a verification step after the answer:

  • reasonForNoncompliance - the model has to pick a category
  • gapSeverity - another pick from a list of categories

Information from these two fields is useful in three ways:

  • it allows us to prioritise important gaps by assigning scores to each category;
  • it allows us to test classification precision with our evals;
  • the model gets a chance to review all the information again and mark the gap as valid but less relevant.

And the final step is to list the most important supporting evidence for the concrete identified gap. It happens in the same prompt because all the information is already loaded in the context, so there is no need for a second prompt.
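Pulling the steps above together, the schema could look roughly like this (reasonForNoncompliance and gapSeverity are the field names mentioned above; the category values, the three answer options and the remaining field names are illustrative guesses):

from typing import List, Literal

from pydantic import BaseModel, Field

class Applicability(BaseModel):
    # reusable nested sub-routine for the preliminary analysis
    reasoning: str = Field(..., description="Which requirements apply to this document and why")

class ComplianceGapAnalysis(BaseModel):
    applicability: Applicability  # preliminary analysis comes first
    identified_gaps: List[str] = Field(..., description="Concrete gaps in the document - mental notes before the answer")
    answer: Literal["compliant", "partially_compliant", "noncompliant"]
    # verification step after the answer (hypothetical categories):
    reasonForNoncompliance: Literal["missing_clause", "ambiguous_wording", "outdated_reference"]
    gapSeverity: Literal["critical", "major", "minor"]
    supporting_evidence: List[str] = Field(..., description="Identifiers of chapters, clauses or snippets backing the most important gap")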

Plus, supporting evidence usually points to the unique identifiers of text chapters, clauses or snippets. This means we can also include this part of the reasoning in the test datasets that ensure the quality of the overall system. It would look like this:

(Figure: 20250723-testing-sgr.png)

This way Schema-Guided Reasoning helps to establish faster feedback loops that generate valuable test data. This works because with SGR we get more easily testable parameters per reasoning process.

(Figure: 20250723-sgr-testing-loop.png)

Production Uses

Schema-Guided Reasoning (SGR) is the single most widely applied LLM pattern in the AI cases I've observed. It has been used:

  • in manufacturing and construction - to extract and normalise information from purchase orders, data sheets and invoices in multiple languages (when used together with a visual LLM);
  • in business automation products - to automatically create tickets, issues and calendar entries from the calendar input;
  • in EU logistics - to normalise and extract information from diverse tax declaration forms;
  • in fintech - to accurately parse regulations for ingestion into compliance assistants, and then to run compliance gap analysis according to a defined checklist process;
  • in sales - to power lead generation systems that run web research through custom workflows.

Schema-Guided Reasoning (SGR) becomes even more important for locally capable models (models that can run on private servers, offline). Such models have much less cognitive capacity than what we get by querying the OpenAI or Anthropic APIs. In other words, local models are generally not as smart as cloud ones. SGR helps to work around this limitation.

Support

Schema-Guided Reasoning (SGR) works with modern cloud providers that support Structured Output. It doesn't require reasoning models, but it works well with models distilled from reasoning models.

Most modern inference engines support the necessary capability.


Next post in Ship with ChatGPT story: Structured Output

🤗 Check out my newsletter! It is about building products with ChatGPT and LLMs: latest news, technical insights and my journey. Check it out!