© Rinat Abdullin

AI+Coding Kata

AI in Coding helps a lot with high-level tasks like prototyping, reasoning and finding bugs.

Please use your favourite tools (no limitations) to implement as much of this spec as possible in a language of your choice. This should be a parser that can parse any document in this spec. Imagine that your team will have to support this code for a few years, so you want to do a thorough job here.

There are no limits here. If you spot something unusual - use your best judgement.


BizDocumentAI Spec

Let's define a simple document format that could describe a contract, procedure, or any other business document in a structured way. It may be used to load this business data into AI Assistants (like in Enterprise RAG Challenge). We’ll work with the documents.

Our documents will consist of blocks. A block is a logical piece of text (like a paragraph). It can optionally have a head, number, and body. A block’s body can contain:

  • Another block
  • Text
  • A list
  • A dictionary

Blocks can contain heterogeneous content—texts, other blocks, dictionaries, etc. Lists can contain only similar block items that also have a number.

Document Layout

The document below describes a simple text format that can be deterministically parsed into JSON objects. This document is also a test suite! Code admonitions always come in pairs: first input and then json.

When the parser is implemented, parsing each input should always produce output that is structurally similar to the expected JSON. The headline before each pair of code blocks is the name of the test.
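One possible reading of "structurally similar" is that an omitted field and an empty field count as the same thing. The sketch below compares two JSON-like values under that assumption; the helper names `normalize` and `structurally_equal` are hypothetical and not part of the spec.

```python
from typing import Any


def normalize(node: Any) -> Any:
    """Recursively drop None values and empty lists from dicts, so that an
    omitted field and an empty field compare as equal."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            value = normalize(value)
            # Assumption: None and [] both mean "field not present".
            if value is None or value == []:
                continue
            cleaned[key] = value
        return cleaned
    if isinstance(node, list):
        return [normalize(item) for item in node]
    return node


def structurally_equal(actual: Any, expected: Any) -> bool:
    """Compare parser output with the expected JSON after normalization."""
    return normalize(actual) == normalize(expected)
```

Empty strings are kept on purpose, since the dictionary tests expect empty values to survive.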

Python Data Structures

Below is an example of how you might structure your data models in Python using Pydantic:

from typing import List, Optional, Union, Dict, Literal
from pydantic import BaseModel, Field

# This type alias helps with readability and forward references.
ContentNode = Union[str, "Block", "ListBlock", "Dictionary"]

class Dictionary(BaseModel):
    """
    A distinct dictionary structure for key-value pairs.
    """
    kind: Literal["dict"]
    items: Dict[str, str] = Field(default_factory=dict)

class Block(BaseModel):
    """
    A general-purpose container for a 'section' or item.

    - 'number' can store a section number (e.g., "5", "5.1") if applicable.
    - 'head' is an optional heading for the block.
    - 'body' can hold any mix of strings, sub-blocks, dictionaries, or lists.
    """
    kind: Literal["block"]
    number: Optional[str] = None
    head: Optional[str] = None
    body: List[ContentNode] = Field(default_factory=list)

class ListBlock(BaseModel):
    """
    A container for a list of items, each item being a 'Block'.
    """
    kind: Literal["list"]
    items: List[Block] = Field(default_factory=list)

# Important for forward references within union types
Block.model_rebuild()

Specifications

Empty text

Empty text results in an empty document block.

Input:


(there is no content)

JSON:

{
  "kind": "block"
}

Body

Plain text goes into the block body straight away. Different paragraphs are separated by new lines.

Input:

First paragraph.
Second paragraph.

JSON:

{
  "kind": "block",
  "body": [
    "First paragraph.",
    "Second paragraph."
  ]
}

Note that each line is stripped of surrounding whitespace, and empty lines are skipped!

Input:

First paragraph.

Second paragraph.

(An empty line in between)

JSON:

{
  "kind": "block",
  "body": [
    "First paragraph.",
    "Second paragraph."
  ]
}
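The two body tests above, together with the empty-text test, can be covered by a very small function. This is a minimal sketch that handles plain text only; the names `split_paragraphs` and `parse_plain_text` are hypothetical.

```python
from typing import Any, Dict, List


def split_paragraphs(text: str) -> List[str]:
    """Return stripped, non-empty lines; each one becomes a body paragraph."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_plain_text(text: str) -> Dict[str, Any]:
    """Build a root block from text that contains no tags at all."""
    block: Dict[str, Any] = {"kind": "block"}
    body = split_paragraphs(text)
    if body:  # leave "body" out entirely when there is no content
        block["body"] = body
    return block
```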

Head

Text marked with <head> goes directly into the head of the current block.

Input:

<head>Test Document</head>
Content

JSON:

{
  "kind": "block",
  "head": "Test Document",
  "body": [
    "Content"
  ]
}
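Recognizing a head line could be done with a single regular expression. The sketch below assumes that the opening and closing tags sit on the same line, as in the test above; `match_head` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: <head> and </head> always appear on one line.
HEAD_RE = re.compile(r"^<head>(.*)</head>$")


def match_head(line: str) -> Optional[str]:
    """Return the head text if the line is a <head> line, else None."""
    match = HEAD_RE.match(line.strip())
    return match.group(1).strip() if match else None
```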

Blocks

You've seen that the document is parsed into a root block. But everything is a block, and blocks can be nested explicitly:

Input:

<head>AI Coding Kata</head>
Let's get started with the kata
<block>
<head>Preface</head>
Here is a little story
</block>

JSON:

{
  "kind": "block",
  "head": "AI Coding Kata",
  "body": [
    "Let's get started with the kata",
    {
      "kind": "block",
      "head": "Preface",
      "body": [
        "Here is a little story"
      ]
    }
  ]
}
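Explicit nesting maps naturally onto a stack of open blocks. The following sketch handles only `<head>`, `<block>`, `</block>` and plain text lines, and assumes each tag sits on its own line; `parse_blocks` is a hypothetical name, and lists and dictionaries are left out.

```python
import re
from typing import Any, Dict, List

HEAD_RE = re.compile(r"^<head>(.*)</head>$")


def parse_blocks(text: str) -> Dict[str, Any]:
    """Parse text made of <head>, <block>, </block> and plain lines."""
    root: Dict[str, Any] = {"kind": "block"}
    stack: List[Dict[str, Any]] = [root]  # innermost open block is last
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # empty lines are skipped
        current = stack[-1]
        head = HEAD_RE.match(line)
        if head:
            current["head"] = head.group(1).strip()
        elif line == "<block>":
            child: Dict[str, Any] = {"kind": "block"}
            current.setdefault("body", []).append(child)
            stack.append(child)
        elif line == "</block>":
            if len(stack) == 1:
                raise ValueError("unexpected </block> at top level")
            stack.pop()
        else:
            current.setdefault("body", []).append(line)
    if len(stack) != 1:
        raise ValueError("unclosed <block>")
    return root
```

Raising on unbalanced tags is a design choice made for this sketch; the spec does not say how malformed input should be treated.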

Dictionaries

Dictionaries capture key-value pairs. By default, a key and its value are separated by a colon (:).

Input:

<dict sep=":">
Key One: Value One
Key Two: Value Two
Key Three: Value Three
</dict>

JSON:

{
  "kind": "block",
  "body": [
    {
      "kind": "dict",
      "items": {
        "Key One": "Value One",
        "Key Two": "Value Two",
        "Key Three": "Value Three"
      }
    }
  ]
}

We can also have a non-standard separator and empty values:

Input:

<dict sep="-">
Title - AI Coding - for TAT
Kata Number - 
</dict>

JSON:

{
  "kind": "block",
  "body": [
    {
      "kind": "dict",
      "items": {
        "Title": "AI Coding - for TAT",
        "Kata Number": ""
      }
    }
  ]
}
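Both dictionary tests are consistent with splitting each line on the first occurrence of the separator and stripping both sides. A minimal sketch under that assumption, with `parse_dict` as a hypothetical name:

```python
import re
from typing import Any, Dict

DICT_OPEN_RE = re.compile(r'^<dict(?:\s+sep="([^"]*)")?>$')


def parse_dict(text: str) -> Dict[str, Any]:
    """Parse one <dict sep="..."> ... </dict> fragment into a dict node."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    opening = DICT_OPEN_RE.match(lines[0]) if lines else None
    if opening is None or lines[-1] != "</dict>":
        raise ValueError("expected <dict ...> ... </dict>")
    sep = opening.group(1) or ":"  # assumption: ":" when sep is omitted
    items: Dict[str, str] = {}
    for line in lines[1:-1]:
        # Split on the first separator only, so values may contain it.
        key, found, value = line.partition(sep)
        if not found:
            raise ValueError(f"no separator {sep!r} in line: {line!r}")
        items[key.strip()] = value.strip()
    return {"kind": "dict", "items": items}
```

Using `str.partition` rather than `str.split` keeps "AI Coding - for TAT" intact as a value and yields an empty string when nothing follows the separator.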

Lists

Lists are very important! By default, each non-empty line is a list item. They go inside the root block.

There are multiple kinds:

  • . for ordered lists that are dot-separated
  • * for bulleted lists

Note that the list item’s text goes into head and the item number goes into number.


Ordered Lists

Input:

<list kind=".">
1. First
2. Second
</list>

JSON:

{
  "kind": "block",
  "body": [
    {
      "kind": "list",
      "items": [
        { "kind": "block", "number": "1.", "head": "First" },
        { "kind": "block", "number": "2.", "head": "Second" }
      ]
    }
  ]
}

As a convenience, nested lists are automatically detected:

Input:

<list kind=".">
1. First
2. Second
2.1. Subitem 1
2.2. Subitem 2
</list>

JSON:

{
  "kind": "block",
  "body": [
    {
      "kind": "list",
      "items": [
        {
          "kind": "block",
          "number": "1.",
          "head": "First"
        },
        {
          "kind": "block",
          "number": "2.",
          "head": "Second",
          "body": [
            {
              "kind": "list",
              "items": [
                { "kind": "block", "number": "2.1.", "head": "Subitem 1" },
                { "kind": "block", "number": "2.2.", "head": "Subitem 2" }
              ]
            }
          ]
        }
      ]
    }
  ]
}
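Automatic nesting of ordered lists can be derived from the number itself: "2.1." has two numeric components, so it belongs one level below "2.". The sketch below assumes that depth equals the count of dots in the number and that every line is a list item; `parse_ordered_items` is a hypothetical name.

```python
import re
from typing import Any, Dict, List, Tuple

# One or more "digits." groups, whitespace, then the item text.
ITEM_RE = re.compile(r"^((?:\d+\.)+)\s+(.*)$")


def parse_ordered_items(lines: List[str]) -> Dict[str, Any]:
    """Turn the lines inside <list kind="."> into a nested list node."""
    root: Dict[str, Any] = {"kind": "list", "items": []}
    # Stack of (depth, list node); the innermost open list is last.
    stack: List[Tuple[int, Dict[str, Any]]] = [(1, root)]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = ITEM_RE.match(line)
        if match is None:
            raise ValueError(f"not an ordered list item: {line!r}")
        number, head = match.group(1), match.group(2).strip()
        depth = number.count(".")
        # Close deeper lists before appending a shallower item.
        while len(stack) > 1 and stack[-1][0] > depth:
            stack.pop()
        top_depth, top_list = stack[-1]
        if depth > top_depth:
            if not top_list["items"]:
                raise ValueError(f"subitem without a parent: {line!r}")
            # Open a nested list inside the body of the previous item.
            nested: Dict[str, Any] = {"kind": "list", "items": []}
            top_list["items"][-1].setdefault("body", []).append(nested)
            stack.append((depth, nested))
            top_list = nested
        top_list["items"].append({"kind": "block", "number": number, "head": head})
    return root
```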

Unordered lists

We can have unordered lists too:

Input:

<list kind="*">
• First
• Second
• Third
</list>

JSON:

{
  "kind": "block",
  "body": [
    {
      "kind": "list",
      "items": [
        { "kind": "block", "number": "•", "head": "First" },
        { "kind": "block", "number": "•", "head": "Second" },
        { "kind": "block", "number": "•", "head": "Third" }
      ]
    }
  ]
}

And nesting can be done with "o":

Input:

<list kind="*">
• First
    o Subitem
• Second
• Third
</list>

JSON:

{
  "kind": "block",
  "body": [
    {
      "kind": "list",
      "items": [
        {
          "kind": "block",
          "number": "•",
          "head": "First",
          "body": [
            {
              "kind": "list",
              "items": [
                {
                  "kind": "block",
                  "number": "o",
                  "head": "Subitem"
                }
              ]
            }
          ]
        },
        {
          "kind": "block",
          "number": "•",
          "head": "Second"
        },
        {
          "kind": "block",
          "number": "•",
          "head": "Third"
        }
      ]
    }
  ]
}
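For bulleted lists, one simple approach is to let the marker decide the nesting level. The sketch below assumes exactly two levels, with "•" (or "*") at the top and "o" one level down, and ignores indentation; that mapping and the name `parse_bullets` are assumptions, not something the spec states.

```python
from typing import Any, Dict, List

# Assumption: the marker alone decides the nesting level.
MARKER_LEVEL = {"•": 0, "*": 0, "o": 1}


def parse_bullets(lines: List[str]) -> Dict[str, Any]:
    """Turn the lines inside <list kind="*"> into a list node."""
    root: Dict[str, Any] = {"kind": "list", "items": []}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        marker, _, head = line.partition(" ")
        if marker not in MARKER_LEVEL:
            raise ValueError(f"not a bullet item: {line!r}")
        item = {"kind": "block", "number": marker, "head": head.strip()}
        if MARKER_LEVEL[marker] == 0:
            root["items"].append(item)
            continue
        if not root["items"]:
            raise ValueError(f"subitem without a parent: {line!r}")
        body = root["items"][-1].setdefault("body", [])
        # Reuse the nested list if the previous line already opened one.
        last = body[-1] if body else None
        if not isinstance(last, dict) or last.get("kind") != "list":
            body.append({"kind": "list", "items": []})
        body[-1]["items"].append(item)
    return root
```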

Mixed lists

We can mix lists, but we need to designate different types separately with tags.

Input:

<list kind=".">
1. Beginning
2. Main
2.1. Subsection
<list kind="*">
* Bullet 1
* Bullet 2
</list>
3. Ending
</list>

JSON:

{
  "kind": "block",
  "body": [
    {
      "kind": "list",
      "items": [
        { "kind": "block", "number": "1.", "head": "Beginning" },
        {
          "kind": "block",
          "number": "2.",
          "head": "Main",
          "body": [
            {
              "kind": "list",
              "items": [
                { "kind": "block", "number": "*", "head": "Bullet 1" },
                { "kind": "block", "number": "*", "head": "Bullet 2" }
              ]
            }
          ]
        },
        { "kind": "block", "number": "3.", "head": "Ending" }
      ]
    }
  ]
}

Lists with content

Lists can also have additional content. If a line inside the current list doesn't match the list's item prefix, it is treated as body content of the preceding item:

Input:

<list kind=".">
1. First
First body
2. Second
Some more text
<dict sep=":">
Key: Value
Another Key: Another Value
</dict>
</list>

JSON:

{
  "kind": "block",
  "body": [
    {
      "kind": "list",
      "items": [
        {
          "kind": "block",
          "number": "1.",
          "head": "First",
          "body": [
            "First body"
          ]
        },
        {
          "kind": "block",
          "number": "2.",
          "head": "Second",
          "body": [
            "Some more text",
            {
              "kind": "dict",
              "items": {
                "Key": "Value",
                "Another Key": "Another Value"
              }
            }
          ]
        }
      ]
    }
  ]
}
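The prefix rule can be sketched for the plain-text case: a line that looks like a numbered item starts a new item, and anything else is appended to the body of the most recent item. This sketch deliberately leaves out nested tags such as `<dict>`, which a full parser would hand off to their own routines; `parse_list_with_text` is a hypothetical name.

```python
import re
from typing import Any, Dict, List

ITEM_RE = re.compile(r"^((?:\d+\.)+)\s+(.*)$")


def parse_list_with_text(lines: List[str]) -> Dict[str, Any]:
    """Parse a flat ordered list whose items may carry body text."""
    result: Dict[str, Any] = {"kind": "list", "items": []}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = ITEM_RE.match(line)
        if match:
            result["items"].append(
                {"kind": "block", "number": match.group(1), "head": match.group(2).strip()}
            )
        elif result["items"]:
            # No item prefix: the line belongs to the preceding item.
            result["items"][-1].setdefault("body", []).append(line)
        else:
            raise ValueError(f"content before the first list item: {line!r}")
    return result
```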

Published: April 06, 2025.