AI-Assisted Autonomous Software Development: Experiments with Claude Code

A phased, markdown-based framework for autonomous project development with Claude Code CLI, built to explore the boundaries of AI-assisted development.

2024-12-01

An R&D Initiative: Testing AI as a Development Partner

Many developers approach AI tools with skepticism: “Code completion works, but I’m the one making the real decisions.” That skepticism is legitimate. But the real question is this: can AI support a project not just by writing code, but by planning, documenting decisions, and tracking progress?

We are moving past the era when AI tools in software development were used only for code completion or debugging. A bigger question looms: can AI plan and implement a project from start to finish? To answer that question, we designed an autonomous development framework using Claude Code CLI.

This is not a client project — it is a pure R&D initiative. We launched it to learn what works and what does not. The results were both surprising and illuminating.

Claude Code CLI: More Than Just a Chatbot

Claude Code is a CLI tool that runs in the terminal. It can read and write files, execute commands, and interpret terminal output. So rather than running a simple question-and-answer loop, it works inside a real development environment.

Our approach: define the project through markdown files containing all context and decisions, then run Claude Code with those files as reference.

The directory structure looks like this:

project/
  SPEC.md          # What we're building, why, constraints
  PLAN.md          # Phases, tasks, dependencies
  DECISIONS.md     # Architectural decisions and their rationale
  PROGRESS.md      # Completed work, blockers, notes
  src/
  tests/
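To make the "files as reference" idea concrete, here is a minimal sketch that concatenates the four context files into a single prompt and builds a non-interactive Claude Code invocation. Claude Code can read these files on its own, so inlining them is not strictly necessary; doing it explicitly just makes the context of each run reproducible. `build_context` and `claude_command` are hypothetical helpers, the file order is our assumption, and `-p` is Claude Code's print (non-interactive) mode; check `claude --help` for your installed version.

```python
from pathlib import Path

# File names follow the directory layout above; the order is our assumption.
CONTEXT_FILES = ("SPEC.md", "PLAN.md", "DECISIONS.md", "PROGRESS.md")


def build_context(project_dir: Path, task: str) -> str:
    """Concatenate the project's markdown files into one prompt, then append the task."""
    sections = []
    for name in CONTEXT_FILES:
        path = project_dir / name
        if path.exists():  # tolerate files that have not been written yet
            sections.append(f"=== {name} ===\n{path.read_text(encoding='utf-8').strip()}")
    sections.append(f"=== TASK ===\n{task}")
    return "\n\n".join(sections)


def claude_command(prompt: str) -> list[str]:
    """Argument list for a one-shot Claude Code run: -p prints the reply and exits."""
    return ["claude", "-p", prompt]
```

A run could then look like `subprocess.run(claude_command(build_context(Path("project"), "Complete the next unchecked task in PLAN.md")), check=True)`.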

Markdown-Based Workflow

For each phase of the project we prepare a separate markdown document. Claude Code reads these documents to understand the context and then executes the next step.

SPEC.md: What Are We Building?

# Project: Automated Invoice Processing System

## Purpose
A tool that extracts data from PDF invoices and transfers it to the accounting system.

## Constraints
- Python 3.11+
- PostgreSQL database
- Integration with existing ERP API
- < 3 seconds per transaction

## Success Criteria
- 95% accuracy rate
- 1,000 invoices/day capacity
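Because these criteria are measurable, they can become an automated acceptance check rather than a judgment call. The sketch below assumes accuracy means correctly extracted invoices over a labeled sample, and reads "< 3 seconds per transaction" as a limit on every invoice rather than on the average; the names are hypothetical. It also makes the capacity arithmetic explicit: 1,000 invoices at up to 3 seconds each is at most 3,000 seconds, about 50 minutes of sequential processing per day, so meeting the latency target also covers the daily volume without any parallelism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriteria:
    """Thresholds transcribed from SPEC.md."""
    min_accuracy: float = 0.95             # 95% accuracy rate
    max_seconds_per_invoice: float = 3.0   # < 3 seconds per transaction
    daily_volume: int = 1_000              # 1,000 invoices/day


def meets_spec(correct: int, total: int, latencies: list[float],
               criteria: SuccessCriteria = SuccessCriteria()) -> bool:
    """True if a test run meets the accuracy target and every invoice beats the latency limit."""
    if total == 0 or not latencies:
        return False
    accuracy = correct / total
    return accuracy >= criteria.min_accuracy and max(latencies) < criteria.max_seconds_per_invoice


def sequential_hours(criteria: SuccessCriteria = SuccessCriteria()) -> float:
    """Worst-case daily processing time if invoices are handled one at a time."""
    return criteria.daily_volume * criteria.max_seconds_per_invoice / 3600
```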

PLAN.md: Phased Development

## Phase 1: PDF Parsing (Days 1–2)
- [ ] Table extraction with pdfplumber
- [ ] Amount, date, company name detection with regex
- [ ] Unit tests

## Phase 2: Database Layer (Day 3)
- [ ] SQLAlchemy models
- [ ] Migration script
- [ ] CRUD operations

## Phase 3: ERP Integration (Days 4–5)
- [ ] API client
- [ ] Error handling and retry logic
- [ ] E2E tests
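For Phase 1, the regex step works on the text that pdfplumber extracts from each page (`page.extract_text()` returns a plain string). The sketch starts from that string so it runs without pdfplumber. The patterns assume one hypothetical invoice layout (ISO dates, a line starting with "Total", a "Vendor:" or "From:" line); real vendors will each need their own.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical patterns for a single invoice layout.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")                 # ISO date: 2024-11-30
TOTAL_RE = re.compile(r"(?im)^\s*total\s*:?\s*([\d,]+\.\d{2})\b")     # "Total: 1,234.50"
VENDOR_RE = re.compile(r"(?im)^\s*(?:from|vendor)\s*:\s*(.+?)\s*$")   # "Vendor: ACME Ltd"


def extract_fields(text: str) -> dict:
    """Pull date, total amount and company name out of one invoice's extracted text."""
    fields = {"date": None, "total": None, "company": None}
    if m := DATE_RE.search(text):
        try:
            fields["date"] = date(int(m[1]), int(m[2]), int(m[3]))
        except ValueError:  # e.g. 2024-13-45 matches the pattern but is not a real date
            pass
    if m := TOTAL_RE.search(text):
        # The ^ anchor keeps "Subtotal:" lines from matching; Decimal avoids float rounding.
        fields["total"] = Decimal(m[1].replace(",", ""))
    if m := VENDOR_RE.search(text):
        fields["company"] = m[1]
    return fields
```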

Phased Development: Spec → Plan → Implement → Test

At each phase we give Claude Code a task specific to that phase. We do not try to write the entire project in one shot — that approach increases error rates and makes the output harder to control.

Spec phase: Requirements are clarified. Claude Code asks about ambiguous points and documents constraints.

Plan phase: Tasks are broken into atomic pieces. Dependencies are mapped.
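Once dependencies are mapped, a valid execution order follows mechanically. A minimal sketch using Python's standard-library `graphlib` (3.9+); the task names and edges are our hypothetical reading of PLAN.md, not output produced by Claude Code.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
TASKS = {
    "pdf_tables": set(),
    "field_regex": {"pdf_tables"},
    "parser_tests": {"field_regex"},
    "db_models": set(),
    "migration": {"db_models"},
    "crud": {"migration"},
    "erp_client": {"crud", "field_regex"},
    "retry_logic": {"erp_client"},
    "e2e_tests": {"retry_logic", "parser_tests"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every task comes after all of its dependencies.

    Raises graphlib.CycleError if the plan contains a circular dependency.
    """
    return list(TopologicalSorter(graph).static_order())
```

A cycle error here flags a broken plan before any code is written. For parallel work, the same class's `prepare()`, `get_ready()` and `done()` methods hand out batches of tasks whose dependencies are already satisfied.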

Implement phase: One module at a time. PROGRESS.md is updated each time a module is completed.

Test phase: Unit tests are written, edge cases are defined, and results are reported.

What Worked

Repetitive code generation is excellent. CRUD operations, API client boilerplate, migration scripts: Claude Code writes these very quickly and with few errors. Developers spend far less time on routine work like this.
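The "error handling and retry logic" task from Phase 3 is typical of this boilerplate. A minimal sketch of such a wrapper, assuming the ERP client raises `ConnectionError` or `TimeoutError` on transient failures; the real exception types depend on the HTTP library and ERP API, which we do not name here. The injectable `sleep` keeps it testable without real delays.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    call: Callable[[], T],
    *,
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`, retrying transient errors with exponential backoff.

    Waits base_delay * 2**n seconds (0.5, 1, 2, ...) plus up to 10% random jitter
    between attempts; other exceptions, and the final failure, propagate unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Usage would look like `with_retry(lambda: erp_client.send(invoice))`, where `erp_client.send` is a hypothetical method.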

Documentation automation is impressive. It generates docstrings, README files, and API documentation alongside the code it writes, a step most teams neglect.

Debugging capability is strong. Given a stack trace and an error message, it finds root causes and suggests fixes about as well as an experienced developer.
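What works best is handing over the full trace rather than a paraphrase of it. A small sketch of a hypothetical `debug_prompt` helper that packages a caught exception, its complete stack trace, and optional context into one prompt using the standard `traceback` module.

```python
import traceback


def debug_prompt(exc: BaseException, context: str = "") -> str:
    """Turn a caught exception into a prompt that includes the full stack trace."""
    trace = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    parts = ["Find the root cause of this error and suggest a fix."]
    if context:
        parts.append(f"Context: {context}")
    parts.append(trace)
    return "\n\n".join(parts)
```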

What Did Not Work

Long-term context management is weak. As the project grows and files multiply, Claude Code struggles to reference previous decisions. DECISIONS.md has to be reintroduced as a reminder each time.
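One way to keep that reminder affordable as DECISIONS.md grows is to send only part of it. The sketch below keeps the file's preamble plus the last few `## ` sections; recency is a crude heuristic (an old decision can still be binding), so a status field or tags per decision may be a better filter. `recent_decisions` is a hypothetical helper.

```python
import re


def recent_decisions(markdown: str, limit: int = 5) -> str:
    """Keep the preamble and the last `limit` '## ' sections of DECISIONS.md, in order."""
    # Zero-width split before each level-2 heading; parts[0] is whatever precedes the first one.
    parts = re.split(r"(?m)^(?=## )", markdown)
    preamble, sections = parts[0], parts[1:]
    kept = sections[-limit:] if limit > 0 else []  # sections[-0:] would keep everything
    return preamble + "".join(kept)
```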

Architectural decisions still require a human. Answering “Should we build this service as a microservice or keep it inside the monolith?” requires knowing the company’s infrastructure, team capacity, and growth plans. Fully conveying this context to AI is difficult.

Test coverage is inconsistent. The happy path is tested well, but anticipating edge cases requires domain knowledge.
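The gap is easy to show in miniature with a hypothetical amount parser. The first test is the happy path; the others encode domain knowledge that has to come from someone who knows the invoices: the same string "1.234" means 1234 on a German invoice and 1.234 on a US one, and credit notes may be written in parentheses. Run it with `python -m unittest` or pytest.

```python
import unittest
from decimal import Decimal


def parse_amount(raw: str, decimal_sep: str = ".") -> Decimal:
    """Convert an invoice amount string to Decimal, given the vendor's decimal separator."""
    thousands_sep = "," if decimal_sep == "." else "."
    s = raw.strip()
    for symbol in ("€", "$", "£", " ", "\u00a0"):  # currency signs and (non-breaking) spaces
        s = s.replace(symbol, "")
    negative = s.startswith("(") and s.endswith(")")  # accounting-style negative
    s = s.strip("()").replace(thousands_sep, "").replace(decimal_sep, ".")
    value = Decimal(s)  # raises decimal.InvalidOperation on unparseable input
    return -value if negative else value


class ParseAmountTests(unittest.TestCase):
    def test_happy_path(self):
        self.assertEqual(parse_amount("1,234.50"), Decimal("1234.50"))

    # The cases below need domain knowledge, not just the happy path.
    def test_european_format(self):
        self.assertEqual(parse_amount("1.234,50 €", decimal_sep=","), Decimal("1234.50"))

    def test_same_string_different_locale(self):
        self.assertEqual(parse_amount("1.234"), Decimal("1.234"))
        self.assertEqual(parse_amount("1.234", decimal_sep=","), Decimal("1234"))

    def test_credit_note_in_parentheses(self):
        self.assertEqual(parse_amount("(120.00)"), Decimal("-120.00"))
```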

Takeaways for Software Teams

This experiment showed us that AI-assisted development is not a binary but a spectrum. The tools outperform humans on some tasks, match them on others, and on still others are simply not good enough yet.

The most productive mode is not “AI driver, human navigator” but “AI co-pilot, human decision-maker.” AI produces the technical output, but the human decides which output gets produced.

Our markdown-based workflow offers a framework for how this balance works in practice. We are considering making it open source.

Conclusion

AI-assisted software development does not reduce the number of developers — it changes where developers add value. Routine coding diminishes; architectural thinking and quality control gain importance.

If you are planning how to integrate AI tools into your team or want to modernize your development processes, we would be happy to share our experiences.