This project converts legacy Cobol programs into modern Java. A grammar-driven parser and AST transformation pipeline produce runtime-correct Java. The output is correct but verbose: getter/setter pairs, fixed-size arrays, numbered Filler groups, and CharSequence shims that emulate Cobol's byte layout.
An LLM stage then refactors this into records, streams and clean naming while preserving runtime behaviour. A second, optional LLM stage runs a profile-guided performance pass on hot methods. An integration test suite runs against each stage's output and flags any sample that diverges from the Cobol reference.
The example below is the INVCALC invoice calculation program, shown first as Cobol, then as transpiled Java, then as modernized Java. Around 220 lines of Cobol become 280 lines of transpiled Java, then 70 lines of modernized Java. All three produce the same output.
      *****************************************************************
      * Program name: INVCALC
      * Original author: Dave Nicolette
      *
      * Demonstrates typical Cobol business calculations.
      *****************************************************************
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INVCALC.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 FILLER.
           05 SALES-TAX-RATE PIC SV9(5) COMP-3 VALUE 0.065.
           05 WORKING-INDEX PIC S9(04) COMP.
           05 CUMULATIVE-PRICE-BEFORE-TAX PIC S9(07)V99 COMP-3.
           05 CUMULATIVE-PRICE-WITH-TAX PIC S9(07)V99 COMP-3.
           05 CUMULATIVE-SALES-TAX PIC S9(05)V9(03) COMP-3.
           05 LINE-WORKING-TOTAL PIC S9(07)V99 COMP-3.
           05 LINE-WORKING-TAX PIC S9(05)V9(03) COMP-3.
       01 INVOICE.
           05 INV-DATE PIC X(08).
           05 INV-NUMBER PIC X(08).
           05 INV-TOTAL-AMOUNT PIC S9(07)V99 COMP-3.
           05 INV-TOTAL-BEFORE-TAX PIC S9(07)V99 COMP-3.
           05 INV-TOTAL-SALES-TAX PIC S9(05)V9(03) COMP-3.
           05 INV-RETURN PIC X.
               88 IS-RETURN VALUE 'R'.
           05 INV-LINE-ITEM-COUNT PIC S9(05) COMP-3.
           05 INV-LINE OCCURS 1 TO 100
                   DEPENDING ON INV-LINE-ITEM-COUNT.
               10 INV-LINE-SKU PIC X(10).
               10 INV-LINE-UNIT-PRICE PIC S9(05)V99 COMP-3.
               10 INV-LINE-QUANTITY PIC S9(05) COMP-3.
               10 INV-LINE-TAXABLE PIC X.
                   88 TAXABLE-ITEM VALUE 'T'.
                   88 NONTAXABLE-ITEM VALUE 'N'.
       PROCEDURE DIVISION.
           INITIALIZE INVOICE
               REPLACING ALPHANUMERIC DATA BY SPACES
                         NUMERIC DATA BY ZEROES
           MOVE '20230914' TO INV-DATE
           MOVE 'Sample 1' TO INV-NUMBER
           MOVE 3 TO INV-LINE-ITEM-COUNT
           MOVE 'PROD004411' TO INV-LINE-SKU(1)
           MOVE 18.55 TO INV-LINE-UNIT-PRICE(1)
           MOVE 2 TO INV-LINE-QUANTITY(1)
           SET TAXABLE-ITEM(1) TO TRUE
           MOVE 'PROD004412' TO INV-LINE-SKU(2)
           MOVE 6.32 TO INV-LINE-UNIT-PRICE(2)
           MOVE 4 TO INV-LINE-QUANTITY(2)
           SET NONTAXABLE-ITEM(2) TO TRUE
           MOVE 'PROD004413' TO INV-LINE-SKU(3)
           MOVE 2.28 TO INV-LINE-UNIT-PRICE(3)
           MOVE 8 TO INV-LINE-QUANTITY(3)
           SET TAXABLE-ITEM(1) TO TRUE
           MOVE ZERO TO CUMULATIVE-PRICE-BEFORE-TAX
                        CUMULATIVE-PRICE-WITH-TAX
                        CUMULATIVE-SALES-TAX
                        WORKING-INDEX
           PERFORM WITH TEST BEFORE
                   VARYING WORKING-INDEX
                   FROM 1 BY 1
                   UNTIL WORKING-INDEX > INV-LINE-ITEM-COUNT
               IF INV-LINE-QUANTITY(WORKING-INDEX) IS NUMERIC
                  AND INV-LINE-UNIT-PRICE(WORKING-INDEX) IS NUMERIC
                   MOVE ZERO TO LINE-WORKING-TOTAL LINE-WORKING-TAX
                   MULTIPLY
                       INV-LINE-QUANTITY(WORKING-INDEX)
                       BY INV-LINE-UNIT-PRICE(WORKING-INDEX)
                       GIVING LINE-WORKING-TOTAL
                   END-MULTIPLY
                   ADD LINE-WORKING-TOTAL TO CUMULATIVE-PRICE-BEFORE-TAX
                   IF TAXABLE-ITEM(WORKING-INDEX)
                       MULTIPLY LINE-WORKING-TOTAL
                           BY SALES-TAX-RATE
                           GIVING LINE-WORKING-TAX
                       ADD LINE-WORKING-TAX TO LINE-WORKING-TOTAL
                   END-IF
                   ADD LINE-WORKING-TOTAL TO CUMULATIVE-PRICE-WITH-TAX
                   ADD LINE-WORKING-TAX TO CUMULATIVE-SALES-TAX
               ELSE
                   PERFORM INVALID-INVOICE-DATA
               END-IF
           END-PERFORM
           MOVE CUMULATIVE-SALES-TAX TO INV-TOTAL-SALES-TAX
           MOVE CUMULATIVE-PRICE-BEFORE-TAX TO INV-TOTAL-BEFORE-TAX
           MOVE CUMULATIVE-PRICE-WITH-TAX TO INV-TOTAL-AMOUNT
           PERFORM PRINT-INVOICE-DETAILS
           GOBACK
           .
package fi.vesas.translator;

import java.math.BigDecimal;
import java.math.RoundingMode;
import fi.vesas.translator.util.PicFormatter;
import org.apache.commons.lang3.StringUtils;

public class INVCALC {

    public class Filler0Type implements CharSequence {
        private BigDecimal salesTaxRate = new BigDecimal("0.065");
        public BigDecimal getSalesTaxRate() { return salesTaxRate; }
        public void setSalesTaxRate(BigDecimal value) { salesTaxRate = value; }
        private int workingIndex;
        public int getWorkingIndex() { return workingIndex; }
        public void setWorkingIndex(int value) { workingIndex = (int)(value % 10000L); }
        private BigDecimal cumulativePriceBeforeTax = BigDecimal.ZERO;
        public BigDecimal getCumulativePriceBeforeTax() { return cumulativePriceBeforeTax; }
        public void setCumulativePriceBeforeTax(BigDecimal value) { cumulativePriceBeforeTax = value; }
        private BigDecimal cumulativePriceWithTax = BigDecimal.ZERO;
        public BigDecimal getCumulativePriceWithTax() { return cumulativePriceWithTax; }
        public void setCumulativePriceWithTax(BigDecimal value) { cumulativePriceWithTax = value; }
        private BigDecimal cumulativeSalesTax = BigDecimal.ZERO;
        public BigDecimal getCumulativeSalesTax() { return cumulativeSalesTax; }
        public void setCumulativeSalesTax(BigDecimal value) { cumulativeSalesTax = value; }
        private BigDecimal lineWorkingTotal = BigDecimal.ZERO;
        public BigDecimal getLineWorkingTotal() { return lineWorkingTotal; }
        public void setLineWorkingTotal(BigDecimal value) { lineWorkingTotal = value; }
        private BigDecimal lineWorkingTax = BigDecimal.ZERO;
        public BigDecimal getLineWorkingTax() { return lineWorkingTax; }
        public void setLineWorkingTax(BigDecimal value) { lineWorkingTax = value; }
        public String toString() { return salesTaxRate.toPlainString() + String.format("%04d", workingIndex) + /* ... */ ""; }
        public int length() { return toString().length(); }
        public char charAt(int i) { return toString().charAt(i); }
        public CharSequence subSequence(int s, int e) { return toString().subSequence(s, e); }
    }
    private Filler0Type filler0 = new Filler0Type();
    public Filler0Type getFiller0() { return filler0; }

    public class InvoiceType implements CharSequence {
        private String invDate = " ";
        public String getInvDate() { return invDate; }
        public void setInvDate(String value) { invDate = StringUtils.rightPad(value == null ? "" : value, 8).substring(0, 8); }
        private String invNumber = " ";
        public String getInvNumber() { return invNumber; }
        public void setInvNumber(String value) { invNumber = StringUtils.rightPad(value == null ? "" : value, 8).substring(0, 8); }
        private BigDecimal invTotalAmount = BigDecimal.ZERO;
        public BigDecimal getInvTotalAmount() { return invTotalAmount; }
        public void setInvTotalAmount(BigDecimal value) { invTotalAmount = value; }
        // ... invTotalBeforeTax, invTotalSalesTax, invReturn, invLineItemCount
        // ... nested InvLineType with taxable/nontaxable 88-level booleans
        // ... InvLineType[] invLine = new InvLineType[100];
        // ... lazy init: if (invLine[idx] == null) invLine[idx] = new InvLineType();
        // ... CharSequence impl, toString concatenates all fields as Cobol byte layout
    }
    private InvoiceType invoice = new InvoiceType();
    public InvoiceType getInvoice() { return invoice; }

    // ... InvoiceFormattedType with PIC-clause emulated formatters
    // ... private InvoiceFormattedType invoiceFormatted = new InvoiceFormattedType();

    public void main() {
        getInvoice().setInvDate(" ");
        getInvoice().setInvNumber(" ");
        getInvoice().setInvTotalAmount(BigDecimal.ZERO);
        // ... 10 more zero-initializations
        getInvoice().setInvDate("20230914");
        getInvoice().setInvNumber("Sample 1");
        getInvoice().setInvLineItemCount(3);
        getInvoice().getInvLine(0).setInvLineSku("PROD004411");
        getInvoice().getInvLine(0).setInvLineUnitPrice(new BigDecimal("18.55"));
        getInvoice().getInvLine(0).setInvLineQuantity(2);
        getInvoice().getInvLine(0).getInvLineTaxable().setTaxableItem();
        // ... line 1 and 2 initialization (same pattern)
        for (getFiller0().setWorkingIndex(1);
             getFiller0().getWorkingIndex() <= getInvoice().getInvLineItemCount();
             getFiller0().setWorkingIndex(getFiller0().getWorkingIndex() + 1)) {
            getFiller0().setLineWorkingTotal(BigDecimal.ZERO);
            getFiller0().setLineWorkingTax(BigDecimal.ZERO);
            getFiller0().setLineWorkingTotal(
                BigDecimal.valueOf(getInvoice().getInvLine(getFiller0().getWorkingIndex() - 1).getInvLineQuantity())
                    .multiply(getInvoice().getInvLine(getFiller0().getWorkingIndex() - 1).getInvLineUnitPrice())
                    .setScale(2, RoundingMode.DOWN));
            getFiller0().setCumulativePriceBeforeTax(
                getFiller0().getCumulativePriceBeforeTax().add(getFiller0().getLineWorkingTotal()));
            if (getInvoice().getInvLine(getFiller0().getWorkingIndex() - 1).getInvLineTaxable().getTaxableItem()) {
                getFiller0().setLineWorkingTax(
                    getFiller0().getLineWorkingTotal().multiply(getFiller0().getSalesTaxRate()).setScale(3, RoundingMode.DOWN));
                getFiller0().setLineWorkingTotal(
                    getFiller0().getLineWorkingTotal().add(getFiller0().getLineWorkingTax()));
            }
            getFiller0().setCumulativePriceWithTax(
                getFiller0().getCumulativePriceWithTax().add(getFiller0().getLineWorkingTotal()));
            getFiller0().setCumulativeSalesTax(
                getFiller0().getCumulativeSalesTax().add(getFiller0().getLineWorkingTax()));
        }
        getInvoice().setInvTotalSalesTax(getFiller0().getCumulativeSalesTax());
        getInvoice().setInvTotalBeforeTax(getFiller0().getCumulativePriceBeforeTax());
        getInvoice().setInvTotalAmount(getFiller0().getCumulativePriceWithTax());
        printInvoiceDetails();
    }

    // ... printInvoiceDetails() with 30+ lines of StringUtils.overlay / PicFormatter calls

    public static void main(String[] args) throws Exception {
        INVCALC instance = new INVCALC();
        instance.main();
    }
}
Abridged for display. Full output is around 280 lines.
package fi.vesas.translator;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.List;

public class Invcalc {

    private static final BigDecimal SALES_TAX_RATE = new BigDecimal("0.065");

    public record InvoiceLine(String sku, BigDecimal unitPrice, int quantity, boolean taxable) {
        public BigDecimal baseTotal() {
            return unitPrice.multiply(BigDecimal.valueOf(quantity)).setScale(2, RoundingMode.DOWN);
        }
        public BigDecimal tax() {
            return taxable
                ? baseTotal().multiply(SALES_TAX_RATE).setScale(3, RoundingMode.DOWN)
                : BigDecimal.ZERO;
        }
        public BigDecimal totalWithTax() {
            return baseTotal().add(tax());
        }
    }

    public record Invoice(String date, String number, List<InvoiceLine> lines) {
        public BigDecimal totalBeforeTax() {
            return lines.stream().map(InvoiceLine::baseTotal).reduce(BigDecimal.ZERO, BigDecimal::add);
        }
        public BigDecimal totalSalesTax() {
            return lines.stream().map(InvoiceLine::tax).reduce(BigDecimal.ZERO, BigDecimal::add);
        }
        public BigDecimal totalAmount() {
            return lines.stream().map(InvoiceLine::totalWithTax).reduce(BigDecimal.ZERO, BigDecimal::add);
        }
    }

    public static void main(String[] args) {
        Invoice invoice = new Invoice(
            "20230914",
            "Sample 1",
            List.of(
                new InvoiceLine("PROD004411", new BigDecimal("18.55"), 2, true),
                new InvoiceLine("PROD004412", new BigDecimal("6.32"), 4, false),
                new InvoiceLine("PROD004413", new BigDecimal("2.28"), 8, false) // see reviewer note #3
            )
        );
        printInvoice(invoice);
    }

    private static void printInvoice(Invoice invoice) {
        System.out.println();
        System.out.println("----------------------------------------");
        System.out.printf("Invoice Number: %s%n", invoice.number());
        System.out.printf("Invoice Date: %s/%s/%s%n",
            invoice.date().substring(0, 4),
            invoice.date().substring(4, 6),
            invoice.date().substring(6, 8));
        System.out.printf("Total Amount: $%6.2f%n", invoice.totalAmount());
        System.out.printf("Total Before Tax: $%6.2f%n", invoice.totalBeforeTax());
        System.out.printf("Total Sales Tax: $%.3f%n", invoice.totalSalesTax());
        System.out.printf("Sales Tax Rate: %.5f%n", SALES_TAX_RATE);
        int lineNumber = 1;
        for (InvoiceLine line : invoice.lines()) {
            System.out.println();
            System.out.printf("Line %2d%n", lineNumber++);
            System.out.printf("SKU %s%n", line.sku());
            System.out.printf("Quantity %d%n", line.quantity());
            System.out.printf("Unit Price: $%6.2f%n", line.unitPrice());
            System.out.println(line.taxable() ? "Taxable Item" : "Nontaxable Item");
        }
    }
}
Around 70 lines. Runtime output matches the raw transpiled version, verified by the integration test suite.
Filler0Type held loop-local accumulators (cumulative*, lineWorking*). In Cobol these have to be declared in WORKING-STORAGE alongside the rest of the data division, but in Java they have no reason to be class fields. They collapse into stream reductions over InvoiceLine.
OCCURS becomes List<InvoiceLine>.
Cobol's OCCURS 1 TO 100 DEPENDING ON maps to a bounded array with lazy init in the raw Java. The bound isn't load-bearing (it's a legacy memory constraint), so modernization uses an unbounded List. Flag if callers relied on index-based mutation semantics.
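To make the index-based mutation caveat concrete, here is a minimal sketch (not project code). The modernized listing builds its lines with List.of, which returns an unmodifiable list, so a caller that used to overwrite INV-LINE(n) in place would need a mutable copy instead. LineMutationSketch and rejectsIndexedWrite are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class LineMutationSketch {

    // Returns true when the list refuses an in-place write at index 0.
    static boolean rejectsIndexedWrite(List<String> lines) {
        try {
            lines.set(0, lines.get(0)); // same value, so a mutable list is left unchanged
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> asModernized = List.of("PROD004411", "PROD004412", "PROD004413");
        List<String> mutableCopy = new ArrayList<>(asModernized);

        System.out.println(rejectsIndexedWrite(asModernized)); // true
        System.out.println(rejectsIndexedWrite(mutableCopy));  // false
    }
}
```

If a caller really does depend on in-place updates, wrapping the lines in an ArrayList at that call site keeps the record itself unchanged.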
SET TAXABLE-ITEM(1) typo preserved.
The Cobol source reads SET TAXABLE-ITEM(1) TO TRUE, but the statement sits in the line-3 initialization block, so it is likely a copy-paste typo for (3). At runtime line 3 is therefore never marked taxable. Modernized code preserves this (line 3 taxable = false) to maintain equivalence. Recommend a separate human-reviewed bug-fix pass.
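For reviewers weighing that bug-fix pass, the sketch below shows how much the flag on line 3 moves the sales-tax total. It reuses the arithmetic of InvoiceLine.tax() from the modernized listing. TaxableFlagImpact and its tax helper are hypothetical names, and the "if fixed" figure is an illustration of the suspected intent, not output of any pipeline stage.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class TaxableFlagImpact {

    static final BigDecimal SALES_TAX_RATE = new BigDecimal("0.065");

    // Same truncating arithmetic as InvoiceLine.tax() in the modernized listing.
    static BigDecimal tax(BigDecimal unitPrice, int quantity, boolean taxable) {
        BigDecimal base = unitPrice.multiply(BigDecimal.valueOf(quantity))
                                   .setScale(2, RoundingMode.DOWN);
        return taxable
            ? base.multiply(SALES_TAX_RATE).setScale(3, RoundingMode.DOWN)
            : BigDecimal.ZERO;
    }

    public static void main(String[] args) {
        BigDecimal line1 = tax(new BigDecimal("18.55"), 2, true);           // 37.10 * 0.065 -> 2.411
        BigDecimal line3AsWritten = tax(new BigDecimal("2.28"), 8, false);  // flag never set -> 0
        BigDecimal line3IfFixed = tax(new BigDecimal("2.28"), 8, true);     // 18.24 * 0.065 -> 1.185

        System.out.println("Sales tax as written: " + line1.add(line3AsWritten)); // 2.411
        System.out.println("Sales tax if (3):     " + line1.add(line3IfFixed));   // 3.596
    }
}
```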
IS NUMERIC check dropped.
The Cobol IF ... IS NUMERIC guard is redundant once types are Java int / BigDecimal. Both the raw transpiler and the modernized code elide it; no behavioural change on valid input. Flagged for auditors.
PicFormatter replaced by printf.
The InvoiceFormattedType class and PicFormatter utility emulate Cobol PIC clauses character-by-character. For display-only output, Java format strings match byte-for-byte and remove an entire helper class.
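Two details of Java format strings are worth keeping in mind when relying on a byte-for-byte match; the sketch below illustrates them. First, printf without an explicit Locale uses the JVM default, so the decimal separator can change between machines. Second, %f rounds half-up at format time, whereas the arithmetic in the listings truncates with RoundingMode.DOWN. PicFormatSketch, money and moneyTruncated are hypothetical helpers, and whether a given PIC clause expects truncation is an assumption to check against the Cobol reference.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Locale;

final class PicFormatSketch {

    // Same "$%6.2f" pattern as printInvoice, but with a fixed locale so the
    // decimal separator does not depend on the JVM default.
    static String money(BigDecimal amount) {
        return String.format(Locale.ROOT, "$%6.2f", amount);
    }

    // Variant that truncates to two decimals before formatting
    // (assumption: the emulated PIC clause truncates rather than rounds).
    static String moneyTruncated(BigDecimal amount) {
        return money(amount.setScale(2, RoundingMode.DOWN));
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("83.036"); // illustrative value, not program output
        System.out.println(money(value));          // $ 83.04
        System.out.println(moneyTruncated(value)); // $ 83.03
    }
}
```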
TAXABLE-ITEM / NONTAXABLE-ITEM (Cobol 88-levels on a single character field) become a boolean taxable. The InvLineTaxableType wrapper class disappears.
CharSequence shims dropped.
The raw Java implements CharSequence on every structure to preserve Cobol's byte-layout MOVE semantics. Not exercised in this program's call graph; removed. If other programs MOVE a whole structure to a PIC X field, keep the shim there.
Stage 0 is the Cobol source. Stage 1 is the ANTLR4-driven transpiler, producing verbose but runtime-correct Java. Stage 2 sends this Java and a compact summary of the Cobol AST to Claude Sonnet 4.6 with a style guide covering records, streams, naming, and Java idioms. The model returns a JSON envelope with the modernized source and reviewer notes. Stage 3 is described below.
Readability and performance are different goals, so they get different prompts. Stage 2 aims for readability. Stage 3 is a separate pass, driven by profiling.
It takes a JMH benchmark or a representative workload, identifies the hot methods, and rewrites only those.
Typical rewrites include BigDecimal to scaled long, streams to indexed loops, and pooled buffers for per-call allocations.
Cold code keeps its stage 2 form. A benchmark check confirms stage 3 is actually faster than stage 2.
The integration tests run against stage 3 output too, so a speed win that breaks semantics is rejected.
INVCALC above is not a performance case. It runs once and finishes in microseconds, so stage 3 was not applied here.
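Although stage 3 was not applied to INVCALC, its tax arithmetic is a convenient way to show what the BigDecimal-to-scaled-long rewrite looks like. This is a sketch under stated assumptions, not output of the pipeline: prices are held as whole cents, tax as mills (thousandths of a currency unit), and the class and method names are hypothetical. Integer division truncates toward zero, which matches RoundingMode.DOWN in the BigDecimal version.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class ScaledLongTax {

    // 0.065 expressed as 65 / 1000 (assumed fixed-point encoding of SALES-TAX-RATE).
    static final long TAX_RATE_PER_MILLE = 65;

    // Unit price in cents times an integer quantity is exact, so no rounding is needed.
    static long baseTotalCents(long unitPriceCents, int quantity) {
        return unitPriceCents * quantity;
    }

    // Tax in mills, truncated toward zero like setScale(3, RoundingMode.DOWN).
    static long taxMills(long baseTotalCents) {
        long baseMills = baseTotalCents * 10;
        return baseMills * TAX_RATE_PER_MILLE / 1000;
    }

    public static void main(String[] args) {
        long fast = taxMills(baseTotalCents(1855, 2));

        // Reference computation in the style of the modernized listing.
        BigDecimal reference = new BigDecimal("18.55")
            .multiply(BigDecimal.valueOf(2)).setScale(2, RoundingMode.DOWN)
            .multiply(new BigDecimal("0.065")).setScale(3, RoundingMode.DOWN);

        System.out.println(fast);                                  // 2411
        System.out.println(reference.movePointRight(3).longValueExact()); // 2411
    }
}
```

A rewrite like this is only safe while intermediate products stay within the range of long, which is one reason the integration tests and the benchmark check both run against stage 3 output.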
The style guide and few-shot examples are long and stable, around 8k tokens. Each file-level input is small.
The system prompt ends with a cache breakpoint, so later files in a batch read the shared prefix from the cache (visible as usage.cache_read_input_tokens > 0 in the response).
Only the small per-file input is processed fresh, which cuts per-file latency and cost on a multi-file codebase.
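The project's client code is not shown here, so the following is only a sketch of what a request with a cache breakpoint could look like. It assumes the Anthropic Messages API shape, where a system text block carries a cache_control field of type "ephemeral". CachedPromptRequest, requestBody and jsonString are hypothetical names, the max_tokens value is a placeholder, and the model identifier is left as a parameter rather than guessed.

```java
final class CachedPromptRequest {

    // Minimal JSON string escaping, enough for prompt text in this sketch.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // The long, stable style guide goes in the system block and is marked cacheable;
    // the small per-file input goes in the user message after the breakpoint.
    static String requestBody(String model, String styleGuide, String fileInput) {
        return "{"
            + "\"model\":" + jsonString(model) + ","
            + "\"max_tokens\":8192,"   // placeholder value
            + "\"system\":[{\"type\":\"text\",\"text\":" + jsonString(styleGuide) + ","
            + "\"cache_control\":{\"type\":\"ephemeral\"}}],"
            + "\"messages\":[{\"role\":\"user\",\"content\":" + jsonString(fileInput) + "}]"
            + "}";
    }

    public static void main(String[] args) {
        System.out.println(requestBody("<model-id>", "Style guide text", "public class INVCALC { }"));
    }
}
```

Keeping everything that varies per file after the breakpoint is what lets the prefix stay byte-identical across the batch.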
The transpiler ships with an integration test suite that runs the translated Java and compares output against the Cobol reference. Every stage after transpilation reuses this harness: compile the stage output, run the same inputs, diff against the reference. Stage 3 adds a benchmark check on top. A stage whose output diverges is flagged, not shown as clean.
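The diff step of that harness can be pictured as a line-by-line comparison that reports the first divergence. The sketch below is an assumption about its shape, not the project's actual harness: OutputDiff and firstDivergence are hypothetical names, and how the reference and stage outputs are captured is left out.

```java
import java.util.Optional;

final class OutputDiff {

    // Returns a description of the first differing line, or empty when outputs match.
    static Optional<String> firstDivergence(String reference, String candidate) {
        String[] expected = reference.split("\\R", -1);
        String[] actual = candidate.split("\\R", -1);
        int common = Math.min(expected.length, actual.length);
        for (int i = 0; i < common; i++) {
            if (!expected[i].equals(actual[i])) {
                return Optional.of("line " + (i + 1) + ": expected <" + expected[i]
                        + "> but got <" + actual[i] + ">");
            }
        }
        if (expected.length != actual.length) {
            return Optional.of("line count differs: expected " + expected.length
                    + " but got " + actual.length);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String reference = "Total Sales Tax: $2.411\nLine  1";
        String stageOutput = "Total Sales Tax: $3.596\nLine  1";
        // A stage whose output diverges is flagged rather than reported as clean.
        System.out.println(firstDivergence(reference, stageOutput).orElse("outputs match"));
    }
}
```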