General

This page explains how to enhance Aider for mainframe development by integrating COBOL parsing and analysis.

What is Aider?

Aider is an open-source coding assistant (Apache-2.0 license) that you use directly from your terminal. It connects to both commercial and open-source large language models (LLMs) to help you write, edit, and improve your code.

How is Aider different from competitors?

Unlike other coding assistants such as Claude Code, Amazon Kiro, Cursor, Windsurf, and Gemini CLI, Aider is tightly integrated with Git: it automatically creates a Git repo for your project and makes commits as you work (you can turn this off if you want).
Aider supports more than 100 programming languages and works with your entire codebase. Every time it makes changes, it automatically lints (checks for errors in) your code. It also builds a map of all the programs in your repo, so only the necessary context is sent to the LLM, which makes the process more efficient and cost-effective.

Why Integrate COBOL into Aider?

Aider claims to support over 100 languages, but what does that mean in practice? Aider relies on the LLM you choose (GPT-4, Claude, etc.) to generate code in the programming language you ask for. Most modern models can generate COBOL code, but there’s a catch:

  • If you ask an LLM to write a lot of COBOL, or keep prompting it for more code, you may start seeing errors or odd formatting, because COBOL is very strict about syntax, indentation, and column positions.
  • By default, Aider can generate COBOL code but cannot lint it (check it for errors) or map the COBOL codebase. This means Aider won’t catch syntax mistakes or help you navigate large COBOL projects.
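
To make the column constraint concrete, here is a minimal sketch of a fixed-format column check. It assumes the standard fixed-format layout (columns 1 to 6 sequence area, column 7 indicator, columns 8 to 72 code, columns 73 to 80 identification area). The function name check_fixed_format and its messages are hypothetical and not part of Aider; real dialects accept more indicator characters than this simplified set.

```python
from typing import List, Tuple

# Simplified set of characters allowed in column 7 (assumption: common dialects).
VALID_INDICATORS = {" ", "*", "/", "-", "D", "d"}
MAX_CODE_COLUMN = 72  # columns 73-80 are the identification area


def check_fixed_format(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for basic column violations."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        text = line.rstrip()
        if not text:
            continue  # blank lines are always fine
        # Column 7 is index 6; short lines have an implicit blank indicator.
        indicator = text[6] if len(text) > 6 else " "
        if indicator not in VALID_INDICATORS:
            problems.append((number, f"unexpected character {indicator!r} in column 7"))
        elif indicator not in {"*", "/"} and len(text) > MAX_CODE_COLUMN:
            # Comment lines are skipped; code past column 72 is flagged.
            problems.append((number, "text extends past column 72"))
    return problems


sample = "\n".join([
    " " * 7 + "IDENTIFICATION DIVISION.",
    " " * 6 + "XPROGRAM-ID. DEMO.",        # stray character in column 7
    " " * 6 + "* A comment line is fine.",
])
for number, message in check_fixed_format(sample):
    print(f"line {number}: {message}")
```

A check like this only covers layout; catching real syntax errors needs a parser, which is what the rest of this post sets up.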

How does Aider perform linting and repo mapping?

  • Aider uses Tree-sitter, a parser library with grammars for 100+ programming languages, to read and understand the structure of your code. Both linting and repo mapping build on the parse tree that Tree-sitter produces.
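
Tree-sitter parsers recover from syntax errors and mark the affected regions with ERROR nodes or zero-width "missing" nodes, so a linter can simply walk the tree and report them. The sketch below illustrates that idea. It is not Aider's code: Node is a hypothetical stand-in that mimics the type, start_point, children, and is_missing attributes of a Tree-sitter node so the example runs without a grammar installed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a Tree-sitter syntax node."""
    type: str
    start_point: Tuple[int, int]  # (row, column), both 0-based
    children: List["Node"] = field(default_factory=list)
    is_missing: bool = False


def collect_syntax_errors(root: Node) -> List[Tuple[int, int, str]]:
    """Walk the tree depth-first and return (line, column, message) per problem."""
    problems = []
    stack = [root]
    while stack:
        node = stack.pop()
        row, col = node.start_point
        if node.type == "ERROR":
            problems.append((row + 1, col + 1, "syntax error"))
        elif node.is_missing:
            problems.append((row + 1, col + 1, f"missing {node.type}"))
        # Push children in reverse so they are visited in source order.
        stack.extend(reversed(node.children))
    return problems


# Hand-built tree standing in for the parse of a broken COBOL file.
tree = Node("program", (0, 0), [
    Node("identification_division", (0, 7)),
    Node("ERROR", (3, 7)),
    Node(".", (5, 20), is_missing=True),
])
for line, column, message in collect_syntax_errors(tree):
    print(f"{line}:{column}: {message}")
```

With a real grammar installed, the same walk would start from the root node of the parsed file instead of a hand-built tree.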

Steps to Integrate COBOL into Aider

  • With the right Tree-sitter grammar for COBOL, Aider can parse COBOL files. This enables automatic linting and repo mapping, making it much easier to handle large COBOL codebases without overloading the LLM.

Generating a Tree-sitter Grammar for COBOL

We identified a public GitHub repository (MIT license) that implements a COBOL grammar for Tree-sitter. We needed to adapt it to our requirements, in particular to build a repository map from a tags file that captures and organizes code structure. To do so, we cloned the Tree-Sitter-Cobol repository, used the Tree-sitter CLI to generate a new parser along with Python bindings, and created a custom tags query file (cobol-tags.scm) that defines our desired repository mapping.

Modifications after Cloning:

Grammar Customization (grammar.js)

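  • Exposed previously hidden nodes. In Tree-sitter, rules whose names start with an underscore (such as $._WORD) are hidden and do not appear in the parse tree, so we switched to visible named rules and named fields, which enables precise code mapping.
  • Wrapped program_name in the identification_division in a named field (prg_name) and made it use the visible WORD and LITERAL rules instead of the hidden _WORD and _LITERAL.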
// Before
    identification_division: $ => seq(
     $._IDENTIFICATION, $._DIVISION, '.',
     optional(
     seq($._PROGRAM_ID, '.',
     $.program_name,
     …
    ),
    program_name: $ => choice( $._WORD, $._LITERAL ),
// After
   identification_division: $ => seq(
   $._IDENTIFICATION, $._DIVISION, '.',
   optional(
   seq($._PROGRAM_ID, '.',
   field('prg_name', $.program_name),
   …
   ),
   program_name: $ => choice( $.WORD, $.LITERAL ),
  • Updated section_header and paragraph_header to use $.WORD instead of the hidden $._WORD, ensuring these nodes are visible to tags queries.
// Before
    section_header: $ => seq(
      field('name', choice($._WORD, $.integer)),
      ...
    ),
    paragraph_header: $ => seq(
      field('name', choice($._WORD, $.integer)),
      '.'
    ),
// After
    section_header: $ => seq(
      field('name', choice($.WORD, $.integer)),  
      ...
    ),
    paragraph_header: $ => seq(
      field('name', choice($.WORD, $.integer)), 
      '.'
    ),

Generating the Parser and Python Bindings

  • To build Python bindings for the modified COBOL grammar, we first initialized the local Tree-sitter CLI configuration, then regenerated the parser with ABI version 14 (the Tree-sitter runtime we targeted still expected ABI 14 at the time, not 15), compiled the parser, and finally installed the resulting Python package in the target environment where Aider is installed.
cd <Cloning directory>
tree-sitter init-config          # Create the per-user Tree-sitter CLI configuration file
tree-sitter generate --abi 14    # Regenerate the parser from grammar.js using ABI version 14
tree-sitter build                # Compile the generated parser into a shared library
pip install .                    # Build and install the grammar's Python binding package

Creating a Custom Tags Query File (cobol-tags.scm)

  • Created a cobol-tags.scm file aligned with the updated grammar, so that key elements can be extracted for repository mapping.
(identification_division
  prg_name: (_) @name.definition.program) @definition.program

(file_description_entry 
  (WORD) @name.definition.filename) @definition.filename

(
  section_header
    name: (_) @name.definition.section
) @definition.section

(
  paragraph_header
    name: (_) @name.definition.paragraph
) @definition.paragraph

(perform_procedure (_) @name.reference.paragraph) @reference.call
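
The capture names above follow a name.definition.<kind> / name.reference.<kind> convention. As a rough sketch of how a consumer could turn such captures into repo-map entries, the example below separates definitions from references by prefix. Tag and classify_captures are hypothetical names, and the input tuples stand in for what a Tree-sitter query would return; this illustrates the convention and is not Aider's implementation.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Tag(NamedTuple):
    kind: str      # "def" or "ref"
    category: str  # e.g. "program", "section", "paragraph"
    name: str
    line: int


def classify_captures(captures: Iterable[Tuple[str, str, int]]) -> List[Tag]:
    """Turn (capture_name, text, line) tuples into definition/reference tags."""
    tags = []
    for capture_name, text, line in captures:
        if capture_name.startswith("name.definition."):
            kind = "def"
        elif capture_name.startswith("name.reference."):
            kind = "ref"
        else:
            # Outer captures such as definition.section wrap the whole
            # construct and carry no symbol name, so they are skipped here.
            continue
        category = capture_name.rsplit(".", 1)[-1]
        tags.append(Tag(kind, category, text, line))
    return tags


# Hand-written captures standing in for a query run over demo.cbl.
captures = [
    ("definition.program", "IDENTIFICATION DIVISION. ...", 1),
    ("name.definition.program", "DEMO", 2),
    ("name.definition.paragraph", "MAIN-PARA", 10),
    ("name.reference.paragraph", "CALC-PARA", 11),
]
for tag in classify_captures(captures):
    print(tag.kind, tag.category, tag.name, tag.line)
```

Keeping the kind in the last segment of the capture name is what lets a single query file describe programs, sections, and paragraphs without extra configuration.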

Integrating COBOL Tree-sitter Grammar into Aider:

1️⃣ Add the COBOL tags Query File

  • Place your custom tags query file, named cobol-tags.scm, in: .../lib/pythonx.x/site-packages/aider/queries/tree-sitter-language-pack/
  • This allows Aider to recognize and map COBOL program structure during repository analysis.

2️⃣ Register COBOL File Extensions in Grep-AST

  • In your Aider environment, edit: .../lib/pythonx.x/site-packages/grep_ast/parsers.py (the package is published as grep-ast, but its installed directory uses an underscore)
  • Extend the PARSERS dictionary inside the USING_TSL_PACK block to associate COBOL file extensions with the "cobol" language:
# Before
if USING_TSL_PACK:
    # Replace the PARSERS dictionary with a comprehensive mapping based on the language pack
    PARSERS = { ....
        
        # C
        ".c": "c",
        ".h": "c",
              
   }

# After
if USING_TSL_PACK:
    # Replace the PARSERS dictionary with a comprehensive mapping based on the language pack
    PARSERS = { .....
        
        # C
        ".c": "c",
        ".h": "c",
        # Add COBOL file extension
        ".cob": "cobol",
        ".cbl": "cobol",
        ".cpy": "cobol",
        ".COB": "cobol",
        ".CBL": "cobol",
        ".CPY": "cobol",
        
   }
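
The reason we register both lower-case and upper-case extensions can be seen in a small sketch. It assumes that the language is resolved with a plain, case-sensitive dictionary lookup on the file extension; the function below is an illustrative stand-in with a shortened PARSERS mapping, not a copy of the grep-ast source.

```python
import os
from typing import Optional

# Shortened stand-in for the PARSERS mapping shown above.
PARSERS = {
    ".c": "c",
    ".h": "c",
    ".cob": "cobol",
    ".cbl": "cobol",
    ".cpy": "cobol",
    ".COB": "cobol",
    ".CBL": "cobol",
    ".CPY": "cobol",
}


def filename_to_lang(filename: str) -> Optional[str]:
    """Resolve a language name from the file extension (case-sensitive)."""
    extension = os.path.splitext(filename)[1]
    return PARSERS.get(extension)


print(filename_to_lang("payroll.cbl"))   # cobol
print(filename_to_lang("PAYROLL.CBL"))   # cobol
print(filename_to_lang("payroll.Cbl"))   # None: mixed case is not registered
```

Under this assumption, mixed-case extensions would still be missed. If your sources use them, add those variants to the mapping as well.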

3️⃣ Register COBOL in Supported Languages and Python Binding Import

  • The grep-ast module fetches the parser for each supported language through tree-sitter-language-pack. To enable COBOL support, open: .../lib/pythonx.x/site-packages/tree_sitter_language_pack/__init__.py
  • Add "cobol" to the SupportedLanguage literal
  • In the function that loads language bindings (get_binding), add explicit logic to import our custom COBOL parser:
# In get_binding, import the custom COBOL parser explicitly when the language name is "cobol"
def get_binding(language_name: SupportedLanguage) -> object:

    if language_name == "cobol":
        import tree_sitter_tree_sitter_cobol     # Use your actual package name if different
        return tree_sitter_tree_sitter_cobol.language()

    # ... the existing lookup for all other languages stays unchanged ...
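
If you expect to add more custom grammars later, the same idea can be written as a small lookup table instead of one if-branch per language. The sketch below is one possible variant under that assumption: CUSTOM_BINDINGS and get_custom_binding are hypothetical names that do not exist in tree-sitter-language-pack, and the module name is the one from our build, so adjust it to your package.

```python
import importlib
from typing import Optional

# Hypothetical override table: language name -> importable binding module.
CUSTOM_BINDINGS = {
    "cobol": "tree_sitter_tree_sitter_cobol",  # use your actual package name
}


def get_custom_binding(language_name: str) -> Optional[object]:
    """Return the binding for a custom grammar, or None if there is no override."""
    module_name = CUSTOM_BINDINGS.get(language_name)
    if module_name is None:
        return None  # caller falls back to the stock language-pack lookup
    module = importlib.import_module(module_name)  # ImportError if not installed
    return module.language()


# Languages without an override fall through to the normal lookup.
print(get_custom_binding("python"))

try:
    binding = get_custom_binding("cobol")
    print("COBOL binding loaded:", binding is not None)
except ImportError:
    print("COBOL binding package is not installed in this environment")
```

Inside get_binding, you would call this helper first and return its result whenever it is not None.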

With these changes, Aider will be able to lint COBOL source files and accurately map their structure using our custom tags file.

Demo

Generating COBOL Code with Aider:

  • We asked Aider to generate a COBOL program (demo.cbl) designed to read inputs, perform addition, and conditionally call multiplication or division procedures based on the result. Aider created the source file and committed the code, as shown in the output. The initial COBOL program followed standard structure, including the IDENTIFICATION DIVISION, DATA DIVISION, and the required procedures.

  • To demonstrate Aider’s linting capabilities, we intentionally introduced a typo in IDENTIFICATION DIVISION and an indentation error in the DATA DIVISION. When linting was run, Aider first flagged the keyword typo; after correction, it then detected the indentation issue, showing that errors are caught and reported sequentially, with clear guidance for each fix.
  • The structure of the COBOL program is mapped using the custom cobol-tags.scm file defined earlier. This mapping enables Aider to extract program entities, such as program names, sections, and paragraphs, directly from the parse tree, allowing for efficient codebase navigation and analysis.

With these features in place, Aider not only automates COBOL code generation but also provides robust linting and precise repo mapping based on our custom grammar and tags configuration. This ensures reliable error detection and a clear understanding of code structure throughout the development workflow.

Transparency Note
This blog post was drafted with the support of generative AI to help structure and formulate the content. However, the technical background was thoroughly researched by our team beforehand, and we consider the topic highly relevant and worth sharing. The final content has been carefully reviewed and approved by us before publication.