
Custom Chain of Thought

Summary: Custom Chain of Thought uses Structured Output (forcing the LLM to respond according to a schema) to make the LLM reason in a predefined way before answering.

Here is a simple example. Let's say we have the following Python classes:

from pydantic import BaseModel

class Step(BaseModel):
    explanation: str
    output: str

class MathReasoning(BaseModel):
    steps: list[Step]
    final_answer: str

Note that BaseModel comes from pydantic, a Python data validation library frequently used to define response formats. Other languages might leverage different libraries and tools for this purpose.

Given this MathReasoning response format and the prompt "Solve the equation 8x + 7 = -23 step by step", the LLM can respond like this:

{
  "steps": [
    {
      "explanation": "Start with the equation 8x + 7 = -23.",
      "output": "8x + 7 = -23"
    },
    {
      "explanation": "Subtract 7 from both sides to isolate the term with the variable.",
      "output": "8x = -23 - 7"
    },
    {
      "explanation": "Simplify the right side of the equation.",
      "output": "8x = -30"
    },
    {
      "explanation": "Divide both sides by 8 to solve for x.",
      "output": "x = -30 / 8"
    },
    {
      "explanation": "Simplify the fraction.",
      "output": "x = -15 / 4"
    }
  ],
  "final_answer": "x = -15 / 4"
}

Note that the model thinks in a predefined way before providing an answer. This uses more tokens, investing them in thinking through the problem aloud, but it improves model accuracy.
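
To make this concrete, here is a minimal sketch of how such a response can be requested with the OpenAI Python SDK's structured-output helper. The model name and messages are illustrative; other providers and libraries expose similar mechanisms:

from openai import OpenAI

# Step and MathReasoning are the pydantic models defined above.
client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # illustrative model name
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "Solve the equation 8x + 7 = -23 step by step."},
    ],
    # The response is constrained to the MathReasoning schema.
    response_format=MathReasoning,
)

reasoning = completion.choices[0].message.parsed  # a MathReasoning instance
print(reasoning.final_answer)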

Text-to-SQL example

In the real world, we can inject precise reasoning steps into the response schema to boost answer accuracy. For example, when prompting an LLM to perform query expansion from a human request to a precise SQL query over a predefined schema, adding a custom chain of thought increases the accuracy by 6% out of the box.

In the image below, this was done by adding a strategy field before the sql_query field. It forced the LLM to perform its analysis according to a custom checklist.

(Image: wiki-custom-cot-sql-example.png)

In essence, we programmed the LLM to reason in a predefined way without writing any executable code.
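
A response schema for this could look roughly like the sketch below. The class name and checklist wording are only an illustration of the idea of placing a strategy field before sql_query; the exact checklist is specific to each project:

from pydantic import BaseModel, Field

class SqlAnswer(BaseModel):  # hypothetical name for illustration
    # Filled in first: the model walks through a custom checklist
    # (relevant tables, join keys, filters, aggregations) before writing SQL.
    strategy: str = Field(
        ...,
        description=(
            "Think step by step: which tables are needed, how they join, "
            "and which filters and aggregations the user asked for."
        ),
    )
    # Filled in last, after the reasoning above.
    sql_query: str = Field(..., description="The final SQL query over the predefined schema.")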

Production Uses

Custom Chain of Thought is the single most widely applied LLM pattern in the AI cases I've observed. It was used:

  • in manufacturing and construction - to extract information from purchase orders, data sheets and invoices in multiple languages (when used together with a Visual LLM);
  • in business automation products - to automatically create tickets, issues and calendar entries from the calendar input;
  • in EU logistics - to normalise and extract information from diverse tax declaration forms;
  • in fintech - to accurately parse regulations for further ingestion into compliance assistants;
  • in sales - to extract company information (contact details) from web search results in lead generation systems.

This list can go on and on, because this pattern is used everywhere. It helps to improve accuracy and to pack more reasoning steps into a single prompt.

Custom Chain of Thought becomes even more important for locally capable models (models that can run offline on private servers). Such models have much less cognitive capacity than what we get by querying the OpenAI or Anthropic APIs. In other words, local models are generally not as smart as the cloud ones. Custom Chain of Thought helps to work around this limitation.

Document classification example

Here is an example of a Structured Output Chain of Thought schema from a system for classifying business documents in a RAG pipeline:

from typing import List, Literal

from pydantic import BaseModel, Field

DOCUMENT_TYPES = ["invoice", "contract", "receipt", "email", ...]
ENTITY_TYPES = ["payment", "risk", "regulator", "employee", ...]

class DocumentClassification(BaseModel):
    document_type: Literal[tuple(DOCUMENT_TYPES)]
    brief_summary: str
    key_entities_mentioned: List[Literal[tuple(ENTITY_TYPES)]]
    keywords: List[str] = Field(..., description="Up to 10 keywords describing this document")

In this case, the LLM is forced to think through the classification challenge in steps:

  1. Identify the type of the document and pick it; Literal enforces that.
  2. Summarise the document.
  3. Identify key entities mentioned in the document; List[Literal] ensures that the response will be a list of values from ENTITY_TYPES.
  4. Come up with up to 10 keywords; List[str] ensures that the response is a list of strings, while the description kindly asks the LLM to keep the list at 10 items or fewer.

In this specific example, the first two fields are discarded from the response. They are used just to force the LLM to approach classification from a predefined angle and think about it a little. Ultimately, this improved prompt accuracy on this task.
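
Here is a minimal sketch of how such a response might be consumed, assuming pydantic v2 and that the elided lists above are filled in. The raw JSON reply is hypothetical; only the last two fields survive into the downstream index:

# Hypothetical raw reply from the LLM, already constrained to the schema above.
raw_response = """
{
  "document_type": "invoice",
  "brief_summary": "Invoice from ACME Ltd. for consulting services in March.",
  "key_entities_mentioned": ["payment"],
  "keywords": ["invoice", "consulting", "ACME", "March", "net-30"]
}
"""

classification = DocumentClassification.model_validate_json(raw_response)

# document_type and brief_summary only forced the model to "think";
# downstream we keep just the entities and keywords.
index_entry = {
    "entities": classification.key_entities_mentioned,
    "keywords": classification.keywords,
}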

References

Next post in Ship with ChatGPT story: Structured Output

🤗 Check out my newsletter! It is about building products with ChatGPT and LLMs: latest news, technical insights and my journey. Check it out.