Schema-Guided Reasoning (SGR)
Schema-Guided Reasoning (SGR) is a technique that guides large language models (LLMs) to produce structured, clear, and predictable outputs by enforcing reasoning through predefined steps. By creating a specific schema (or structured template), you explicitly define:
- What steps the model must go through (preventing skipped or missed reasoning)
- In which order it must reason (ensuring logical flow)
- Where it should explicitly focus attention (improving depth and accuracy)
Instead of allowing free-form text completion (which can be inconsistent or ambiguous), the schema acts as a strict guideline. This guideline will be enforced upon the LLM via Constrained Decoding (Structured Output). You can think of it as giving the model a clear “checklist” or “structured script” to follow.
Here is one example of SGR in action from a project in the compliance/FinTech domain. It is a Pydantic data structure that forces the LLM to perform an analysis of a clause from an internal company procedure in a very specific order.
We translated the domain expert's mental checklist into a structured reasoning schema for the LLM.
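Below is a minimal sketch of what such a schema can look like. The field names and steps are illustrative rather than the actual project schema; the point is that under constrained decoding the model has to fill the fields in exactly the order they are declared.

```python
from typing import Literal

from pydantic import BaseModel, Field


# Illustrative SGR schema for analysing a single procedure clause.
# Field names and steps are hypothetical, not the real project schema;
# declaration order dictates the reasoning order under constrained decoding.
class ClauseComplianceAnalysis(BaseModel):
    # Step 1: restate the clause to confirm the model has actually read it
    clause_summary: str = Field(description="One-sentence restatement of the clause")
    # Step 2: extract the concrete obligations before judging anything
    obligations: list[str] = Field(description="Explicit obligations the clause imposes")
    # Step 3: list the regulations the checklist cares about
    applicable_regulations: list[str] = Field(description="Regulations relevant to this clause")
    # Step 4: only now reason about mismatches between the clause and the regulations
    gap_analysis: str = Field(description="Reasoning about gaps against the regulations")
    # Step 5: the verdict comes last and is constrained to a fixed vocabulary
    compliance_status: Literal["compliant", "partially_compliant", "non_compliant"]
    confidence: float = Field(ge=0, le=1)
```

The verdict field is deliberately last: the model cannot emit a compliance status until it has produced a summary, the obligations, and a gap analysis.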
See also SGR Patterns such as Cascade, Routing, and Cycle.
By enforcing strict schema structures, we ensure predictable and auditable reasoning, gain fine-grained control over inference quality, and easily validate intermediate results against test data.
In other words, the structure lets us control the layout of the response. This allows us to break tasks into smaller steps while ensuring mandatory checkpoints.
Here are some benefits:
- Reproducible reasoning - inference becomes more consistent across repeated runs.
- Auditable - SGR makes every reasoning step explicit and inspectable.
- Debuggable & Testable - intermediate outputs can be directly evaluated and improved (they can be linked to test datasets and evals).
- We can translate expert knowledge into executable prompts. Domain-Driven Design (DDD) works really well here.
- Enhances both reasoning transparency and output reliability. An accuracy boost of 5-10% is not uncommon.
- This improves the reasoning capabilities of weaker local models, making them applicable to a wider range of workloads.
Note that we are not replacing the entire prompt with structured output. We simply don't rely on the prompt alone to force the LLM to follow a specific reasoning process precisely.
Deep Dive
To dive deeper:
- Read through the SGR Patterns: Cascade, Routing, and Cycle.
- Go through a few SGR Examples that illustrate the application of SGR:
- simple math task (see the sketch after this list)
- text-to-sql
- document classification
- advanced reasoning in compliance
- Business Assistant demonstrates how to build a reasoning business assistant with tool use in 160 lines of Python.
- Adaptive Planning further explains how and why this simple agent demo is capable of adapting its plans to new circumstances on-the-fly.
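As a preview of the simple math task example, here is a minimal sketch assuming the OpenAI Python SDK's structured output parse helper. The schema fields, prompt, and model name are illustrative, not taken from the linked example.

```python
from pydantic import BaseModel
from openai import OpenAI


# SGR schema: the model must show its work before it may state the answer.
class MathSolution(BaseModel):
    restated_problem: str      # step 1: confirm understanding of the task
    solution_steps: list[str]  # step 2: work through the calculation
    final_answer: float        # step 3: only now commit to a number


client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",  # any model with Structured Outputs support
    messages=[
        {
            "role": "user",
            "content": "A train travels 120 km in 1.5 hours. What is its average speed in km/h?",
        }
    ],
    response_format=MathSolution,
)

solution = completion.choices[0].message.parsed
print(solution.solution_steps, solution.final_answer)
```

Because the answer field comes last, the model has to write out its solution steps before it is allowed to commit to a number.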
Production Uses
Schema-Guided Reasoning (SGR) is the single most widely applied LLM pattern in AI cases that I've observed. It was used:
- in manufacturing, construction - to extract and normalise information from purchase orders, data sheets and invoices in multiple languages (when used together with a Visual LLM);
- in business automation products - to automatically create tickets, issues and calendar entries from the calendar input;
- in EU logistics - to normalise and extract information from diverse tax declaration forms;
- in fintech - to accurately parse regulations for further ingestion into compliance assistants, then - to run compliance gap analysis according to the defined checklist process;
- in sales - to power lead generation systems that run web research powered by custom workflows.
Schema-Guided Reasoning (SGR) becomes even more important for locally capable models (models that can run offline on private servers). Such models have much less cognitive capacity than what we get by querying the OpenAI or Anthropic APIs. In other words, local models are generally not as smart as the cloud ones. SGR helps to work around this limitation.
Support
Schema-Guided Reasoning (SGR) works with modern cloud providers that support Structured Output. It doesn't require reasoning models, but it works well with models that were distilled from reasoning models.
- OpenAI - supported via Structured Outputs (including OpenAI on Azure)
- Mistral - supported via Custom Structured Output
- Google/Gemini - very limited support via Structured Output. Also note that it doesn't respect the order of fields by default.
- Grok - supported for multiple models: Structured Outputs.
- Fireworks AI - via JSON Schema.
- Cerebras - via Structured Outputs
- OpenRouter - depends on the downstream provider, maps to JSON Schema.
Most modern inference engines support the necessary capability:
- ollama - via Structured Outputs
- vllm - via xgrammar or guidance backends
- TensorRT-LLM - e.g. via GuidedDecoding
- SGLang - via Outlines, XGrammar or llguidance
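For local engines, the same idea can be expressed as a raw JSON Schema sent to an OpenAI-compatible endpoint. This is a minimal sketch; the URL, model name, and schema are placeholders, and exact support for the json_schema response format varies by engine and version (some engines take the schema through their own guided-decoding parameters instead).

```python
import json

from openai import OpenAI

# Point an OpenAI-compatible client at a locally hosted engine.
# Base URL, API key, and model name below are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# SGR schema as raw JSON Schema: "reasoning" is declared before "category",
# so a schema-respecting engine generates the reasoning first.
schema = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "category": {"type": "string", "enum": ["invoice", "purchase_order", "other"]},
    },
    "required": ["reasoning", "category"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="local-model",  # whatever your engine is serving
    messages=[
        {"role": "user", "content": "Classify this document: 'PO #4411, 200 units of part A-17...'"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "doc_classification", "schema": schema, "strict": True},
    },
)
print(json.loads(response.choices[0].message.content))
```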
References
- Video with more background on text-to-sql: NODES 2024 - LLM Query Benchmarks: Cypher vs SQL
- Talk by Andrej Karpathy from MSBuild 2023: State of GPT
Next post in Ship with ChatGPT story: SGR Patterns
🤗 Check out my newsletter! It is about building products with ChatGPT and LLMs: latest news, technical insights and my journey. Check it out!