
SGR Patterns

Here is a set of minimal Pydantic schemas that demonstrate foundational building blocks for Schema-Guided Reasoning (SGR). Each one illustrates how to encode a specific reasoning pattern that constrains and guides LLM generation.

1. Cascade

Cascade ensures that the LLM explicitly follows predefined reasoning steps while solving a problem. Each step allocates thinking budget to take the reasoning one step further.

For example, in a candidate interview evaluation we can force the model to:

  1. First summarize and review its knowledge of the candidate. This makes the knowledge explicit for the LLM (bringing it into the attention context) and for human reviewers later.
  2. Then rate the candidate's skill match on a scale from 1 to 10.
  3. Finally, make a decision: hire, reject, or hold.

This is how the corresponding Pydantic schema looks:

from pydantic import BaseModel, conint
from typing import Literal

class CandidateEvaluation(BaseModel):
    brief_candidate_summary: str
    rate_skill_match: conint(ge=1, le=10)
    final_recommendation: Literal["hire", "reject", "hold"]

The schema explicitly defines and constrains the order of reasoning: first summarize, then rate, and finally recommend. The LLM, driven by constrained decoding, will reason in this predefined logical sequence.

It can be plugged into an OpenAI-compatible library like this:

from openai import OpenAI
client = OpenAI()

user = "evaluate Sam Altman for DevOps Role at OpenAI"
completion = client.chat.completions.parse(
    model="gpt-5-mini",
    response_format=CandidateEvaluation,
    messages=[
        {"role": "user", "content": user },
    ],
)

and the model will be forced by constrained decoding to structure its response accordingly:

CandidateEvaluation(
    brief_candidate_summary=(
        'Sam Altman is a high-profile technology executive and entrepreneur '
        '(co-founder of Loopt, president of Y Combinator, CEO of OpenAI) with '
        'strong leadership, strategy, product and fundraising experience. '
        'Publicly available information highlights executive management and '
        'company-building skills rather than hands-on systems engineering, SRE, '
        'or platform/DevOps work. He would bring strategic vision and '
        'organizational leadership but not the typical deep, day-to-day '
        'operational expertise expected for an individual contributor DevOps '
        'role.'
    ),
    rate_skill_match=2,
    final_recommendation='reject'
)

Note that we order the fields to gradually focus and refine the information until we arrive at a concrete conclusion: start with a generic summary of the candidate, narrow down to a skill rating, and end with a concrete decision.

If the LLM starts misbehaving in some situations, we can load back the full SGR outlines for those cases and review them.
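A minimal sketch of how such outlines could be captured, reusing the completion from above (the JSONL file name and the log_outline helper are assumptions for illustration, not part of the original):

import json

def log_outline(user_prompt: str, evaluation: CandidateEvaluation):
    # Append the full SGR outline to a JSONL file for later review
    record = {"prompt": user_prompt, "outline": evaluation.model_dump()}
    with open("sgr_outlines.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

log_outline(user, completion.choices[0].message.parsed)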

2. Routing

Routing forces the LLM to explicitly choose one specific reasoning path out of many. For example, in support triage we can force the LLM to explicitly choose a path ("hardware", "software", or "unknown"), followed by filling in the details required for that path:

from pydantic import BaseModel
from typing import Literal, Union

class HardwareIssue(BaseModel):
    kind: Literal["hardware"]
    component: Literal["battery", "display", "keyboard"]

class SoftwareIssue(BaseModel):
    kind: Literal["software"]
    software_name: str

class UnknownIssue(BaseModel):
    kind: Literal["unknown"]
    category: str
    summary: str

class SupportTriage(BaseModel):
    issue: Union[HardwareIssue, SoftwareIssue, UnknownIssue]

By passing SupportTriage to response_format, we force the LLM to make a choice and pick one of the branches.

completion = client.chat.completions.parse(
    model="gpt-5-mini",
    response_format=SupportTriage,
    messages=[
        {"role": "developer", "content": "triage support"},
        {"role": "user", "content": "My laptop screen keeps flickering and sometimes turns black." }
    ],
)

print(completion.choices[0].message.parsed)

The parsed object will be of type HardwareIssue in this case:

SupportTriage(
    issue=HardwareIssue(kind='hardware', component='display')
)
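One optional refinement, not used in the examples above but supported by Pydantic: declare the union as a discriminated union on the kind field. This speeds up validation and produces clearer error messages when no branch matches:

from pydantic import BaseModel, Field
from typing import Union

class SupportTriage(BaseModel):
    # The "kind" literal in each branch serves as the discriminator tag
    issue: Union[HardwareIssue, SoftwareIssue, UnknownIssue] = Field(
        discriminator="kind"
    )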

Tools can be represented with branches as well. Consider this schema for a personal business assistant that has access to a few tools:

from pydantic import BaseModel
from typing import Union, Literal

class SendEmailTool(BaseModel):
    tool: Literal["send_email"]
    recipient_email: str
    subject: str
    message: str

class SearchKnowledgeBaseTool(BaseModel):
    tool: Literal["search_knowledge_base"]
    query: str

class CreateSupportTicketTool(BaseModel):
    tool: Literal["create_support_ticket"]
    customer_id: int
    issue_summary: str
    priority: Literal["low", "medium", "high"]


class Response(BaseModel):
    action: Union[SendEmailTool, SearchKnowledgeBaseTool, CreateSupportTicketTool]
    summary: str

Here is how we can use this in action:

system = "handle request of Rinat - support agent. Don't make things up"
user = "Email to jessica@example.com, tell that her refund has been processed"

completion = client.chat.completions.parse(
    model="gpt-5-mini",
    response_format=Response,
    messages=[
        {"role": "developer", "content": system },         
        {"role": "user", "content": user }
    ],
)

The response can look like this:

action = SendEmailTool(
    tool='send_email',
    recipient_email='jessica@example.com',
    subject='Your refund has been processed',
    message=(
        'Hi Jessica,\n\nYour refund has been processed. If you do not see the '
        'refund on your account or have any questions, please reply to this '
        'email and I will investigate.\n\nBest,\nRinat\nCustomer Support'
    )
)
summary = 'Email notifying Jessica that her refund has been processed.'

This is how we can wrap this code with actual tool calling:

from typing import Callable, Dict

# ----- Mock Tool Implementations -----
def send_email(recipient_email: str, subject: str, message: str):
    print(f"Sending email to {recipient_email} with subject '{subject}'")
    print(f"Body:\n{message}\n")

def search_knowledge_base(query: str):
    print(f"Searching KB for: {query}")

def create_support_ticket(customer_id: int, issue_summary: str, priority: str):
    print(f"Creating {priority} priority ticket for customer {customer_id}")
    print(f"Issue: {issue_summary}")

# Map tool tag to handler
TOOL_DISPATCH: Dict[str, Callable] = {
    "send_email": send_email,
    "search_knowledge_base": search_knowledge_base,
    "create_support_ticket": create_support_ticket,
}

# ----- LLM Wrapper -----
def handle_request(system_prompt: str, user_prompt: str):
    completion = client.chat.completions.parse(
        model="gpt-5-mini",
        response_format=Response,
        messages=[
            {"role": "developer", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )

    response = completion.choices[0].message.parsed

    print(f"Summary: {response.summary}")

    tool_type = response.action.tool
    if tool_type in TOOL_DISPATCH:
        # Pass the action fields (minus the discriminator tag) as keyword arguments
        TOOL_DISPATCH[tool_type](**response.action.model_dump(exclude={"tool"}))
    else:
        print(f"Unknown tool: {tool_type}")

3. Cycle

Cycle explicitly forces the LLM to repeat reasoning steps.

Here we force the LLM to come up with multiple risk factors: at least two, but no more than four.

from pydantic import BaseModel
from typing import Annotated, List, Literal
from annotated_types import MinLen, MaxLen

class RiskFactor(BaseModel):
    explanation: str
    severity: Literal["low", "medium", "high"]

class RiskAssessment(BaseModel):
    factors: Annotated[List[RiskFactor], MinLen(2), MaxLen(4)]

And the execution:

user = "The server room has poor ventilation and outdated surge protectors."

completion = client.chat.completions.parse(
    model="gpt-5-mini",
    response_format=RiskAssessment,
    messages=[
        {"role": "developer", "content": "be brief" },
        {"role": "user", "content": user }
    ],
)

A sample response:

factors = [
    RiskFactor(
        explanation=(
            "Poor ventilation leading to elevated temperatures, increased "
            "risk of thermal shutdown, shortened hardware lifespan, and "
            "potential downtime."
        ),
        severity="high"
    ),
    RiskFactor(
        explanation=(
            "Outdated surge protectors that may not adequately guard against "
            "voltage spikes or electrical faults, raising risk of hardware "
            "damage and data loss; replace with modern surge/UPS protection."
        ),
        severity="high"
    )
]

By the way, we can use Cycle to extend the schema from the tool calling example to enable parallel tool execution like this:

class Response(BaseModel):
    action: List[Union[SendEmailTool, SearchKnowledgeBaseTool, CreateSupportTicketTool]]
    summary: str

Now the response will contain a list of tool calls that we can dispatch in parallel before passing the results back to the LLM for further processing.
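A minimal sketch of such parallel dispatch using a thread pool, reusing TOOL_DISPATCH from above (the dispatch_action helper and its return value are assumptions for illustration):

from concurrent.futures import ThreadPoolExecutor

def dispatch_action(action) -> str:
    # Hypothetical helper: route one action through TOOL_DISPATCH
    # and return a result string for the next LLM turn
    TOOL_DISPATCH[action.tool](**action.model_dump(exclude={"tool"}))
    return f"{action.tool}: done"

response = completion.choices[0].message.parsed
with ThreadPoolExecutor() as pool:
    results = list(pool.map(dispatch_action, response.action))
# results can now be passed back to the LLM for further processing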

Next post in Ship with ChatGPT story: SGR Examples

🤗 Check out my newsletter! It is about building products with ChatGPT and LLMs: latest news, technical insights, and my journey.