Building pymqrest: What I Learned Letting AI Write a Production Library
The author built pymqrest almost entirely with AI and wrote virtually none of the code by hand. The result is as good as or better than anything produced in a 40-year career. This is an honest account of what worked, what did not, and what surprised him.
I just published pymqrest [5], a Python wrapper for the IBM MQ administrative REST API. I built it almost entirely with AI – Claude and Codex – and I wrote virtually none of the code by hand. The result is as good as or better than anything I have produced in my entire career, and we did it in a few weeks.
This is not an AI evangelism piece. A few months ago, I was deeply skeptical of AI-assisted engineering. Part of me went into this project hoping to come out the other side saying, “Yeah, this sucks. You can’t use this to write code.” I could not have been more wrong. What follows is an honest account of what worked, what did not, and what surprised me.
25 years of infrastructure wrappers
I am a site reliability engineer. I build infrastructure. For nearly 40 years, my main interest has been making vendor products manageable through good APIs and high-level tooling.
That started in 1999, when I released the Perl 5 MQSeries module on CPAN – the first product released under an open-source license from Morgan Stanley [1]. The module wrapped IBM’s C API and added an object-oriented layer that made it easy to build both MQ applications and administrative tooling in Perl. It also wrapped the Programmable Command Format (PCF), the binary administrative messaging protocol, which was the only way to manage MQ programmatically at the time. That Perl API became the foundation for extremely powerful infrastructure management tooling at several firms over the following decade.
I went on to build similar wrappers for other products: AFS::Command for the Andrew File System, a NetApp filer management library, and several proprietary vendor wrappers that were never released. The pattern was always the same: vendor products ship with APIs that are technically functional but painful to use at scale, and a high-level wrapper that abstracts the pain is where the real value lives.
When IBM finally introduced a REST API for MQ administration several years ago, I was thrilled. Working with PCF is a major pain. Return values come back as numeric C macro constants that you have to map to usable representations. The REST API replaces all of that with standard JSON over HTTPS.
But there was a gap. In the Python ecosystem, the existing MQ libraries – pymqi, its IBM successor ibmmq, and their forks – all wrap the C MQI, the low-level messaging interface. They require native C libraries installed locally and use binary protocols. None of them touch the REST API [2]. People were writing ad-hoc Python scripts with requests and pycurl to hit the REST endpoints, but nobody had packaged a proper library. I decided to close that loop.
REST is the assembler of web programming
REST is a standard, well-understood paradigm. It is also tedious. The protocol mechanics – constructing HTTP requests, parsing responses, managing sessions, handling errors – are boilerplate. They are the same for every interaction you do. The real complexity lives in the payloads: knowing what to send in your requests and understanding what comes back in your responses. That is where all the application-specific knowledge concentrates.
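To make that tedium concrete, here is a sketch of the plumbing an ad-hoc script repeats for every call. The payload shape follows IBM's documented runCommandJSON format for the MQ administrative REST API, but the host, port, queue manager name, and token value are placeholders, and none of this is pymqrest code.

```python
import json

def build_mqsc_request(host: str, port: int, qmgr: str,
                       command: str, qualifier: str, name: str):
    """Assemble the URL, headers, and JSON body for one admin call.

    This is the boilerplate every ad-hoc script repeats; only the
    payload contents carry any MQ-specific knowledge.
    """
    url = f"https://{host}:{port}/ibmmq/rest/v2/admin/action/qmgr/{qmgr}/mqsc"
    headers = {
        "Content-Type": "application/json",
        # Required by the MQ REST API; the value itself is arbitrary.
        "ibm-mq-rest-csrf-token": "any-value",
    }
    body = json.dumps({
        "type": "runCommandJSON",
        "command": command,      # e.g. "display"
        "qualifier": qualifier,  # e.g. "qlocal"
        "name": name,            # e.g. "*"
    })
    return url, headers, body

url, headers, body = build_mqsc_request("mqhost", 9443, "QM1",
                                        "display", "qlocal", "*")
print(url)
```

Actually sending the request, checking the HTTP status, and unpacking the response JSON is another block of near-identical boilerplate on every call. That plumbing is what a library should factor out.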
If you are scripting in Python and writing utilities to manage MQ, you do not want to be thinking about JSON payload construction and HTTP error checking. You want to be thinking in the MQ domain: creating objects, updating them, querying their status. That is the goal of pymqrest – factor out all the plumbing so you think in MQ, not in HTTP.
One of the most important features the library adds is namespace translation. MQ’s attribute names are relics of the mainframe era – terse, cryptic MQSC names inherited from the 20th century mainframe platform. pymqrest maps these bidirectionally to clean Python snake_case names. When you send a request, your Python-friendly names are translated to MQSC format. When the response comes back, the MQSC names are translated back. You never have to think about the legacy namespace unless you want to.
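The translation itself can be sketched as a pair of dict lookups. The MQSC names below are real MQ queue attributes, but the snake_case equivalents and helper functions are illustrative; pymqrest's actual mapping table is far larger and was generated from IBM's documentation.

```python
# MQSC names (left) are real; the snake_case names (right) are
# illustrative, not necessarily pymqrest's actual choices.
MQSC_TO_SNAKE = {
    "MAXDEPTH": "max_depth",
    "CURDEPTH": "current_depth",
    "DEFPSIST": "default_persistence",
    "DESCR": "description",
}
SNAKE_TO_MQSC = {py: mqsc for mqsc, py in MQSC_TO_SNAKE.items()}

def to_mqsc(params: dict) -> dict:
    """Translate Python-friendly request parameters to MQSC names."""
    return {SNAKE_TO_MQSC.get(k, k): v for k, v in params.items()}

def from_mqsc(attrs: dict) -> dict:
    """Translate an MQSC-named response back to snake_case."""
    return {MQSC_TO_SNAKE.get(k, k): v for k, v in attrs.items()}

print(to_mqsc({"max_depth": 5000, "description": "orders queue"}))
# -> {'MAXDEPTH': 5000, 'DESCR': 'orders queue'}
print(from_mqsc({"CURDEPTH": 12}))
# -> {'current_depth': 12}
```

Because the mapping is applied on the way out and again on the way back in, calling code never sees an MQSC name at all.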
Extracting the data to build that namespace mapping from IBM’s publicly available documentation was one of the biggest early wins with AI. It would have taken a human days to systematically pull the attribute tables, normalize them, and validate the mappings. The AI did it in hours.
Design and approach
This is a problem space I have worked in for my entire career. I have been working with MQ products since 1998. I was working in a language I know well. I had a precise understanding of exactly what I wanted this library to look like and what I wanted it to be.
That matters. Domain expertise defined the design goals and the quality bar. The AI executed; I architected.
The development process was iterative and, in structure, identical to how I would manage any project with a team of humans. Define the design goals. Discuss them with the AI. Create issues. Work through them one at a time. Plan, implement, review, merge. I used both Claude and Codex at various points. There were subtle differences in how they behaved – I prefer Claude at this point because it is a bit more verbose, which suits my style – but the overall process was the same with both.
My original goal was to be the project manager and architect, not the hands-on developer, tester, or deployer. I wanted the AI to do as much of that work as possible. I was successful. I did not write a single line of production code in this repository. Maybe I touched a couple of files by hand early in the project, but then I committed to the role and stayed in it.
The asymmetry: code was easy, process was hard
The easy part, surprisingly, was getting the AI to write code that looked the way I wanted it to look. It is extremely good at that – quite terrifyingly good, in fact. As long as your request is not terribly ambiguous, you usually get something that is pretty much what you asked for.
The hard part was getting the AI to follow the operational rules of managing the code. I am talking about the step-by-step process: create a branch, commit the code, submit a PR, get it reviewed, merge the branch, clean up afterwards. The mundane logistics of being a developer working with git and GitHub to get a code change deployed into your base branch. That was surprisingly more difficult than I expected.
The more room I gave the agent to make its own decisions, the less predictable its behavior became. And there were certain things it got wrong every time. Claude seems not to understand the right way to write heredocs in bash: almost every time, the syntax breaks, it figures out why, fixes it, and moves on. Little mundane annoyances like that kept getting in the way.
I like the analogy that working with AI is like working with a genie – but not the evil genie who twists your wish into the worst possible interpretation. More like a genie who is trying earnestly to satisfy your wish but interprets it so literally that you get something that was not quite what you intended. Not out of maliciousness, but because you were not precise enough in what you asked for. That can be frustrating. It is also, in the grand scheme of things, very much like working with a human being. You give humans a description of something to do, you assume they will understand the context, and they may or may not do it perfectly.
Prompt engineering is learning how to ask in such a way that you get the result you want. That has been very doable. The weakness is not in the code – it is in the operational space.
Enforcement over trust
Human developers learn this rule in a hurry: do not commit code directly to the develop branch. You cannot push it to origin, because it is an eternal, protected branch. Yet even with that rule documented and referenced in the initialization files, more often than not the agent would try to hack on develop.
This is not a learning problem that gets better over time. Large language models are stateless at inference – model weights are fixed and read-only. As Anthropic’s own engineering team has written, “each new session begins with no memory of what came before” [3]. OpenAI’s API documentation is equally direct: “Each text generation request is independent and stateless” [4]. The agent adapts within a session – it sees its error, learns for that session – but the learning does not persist. Next session, same guardrail violation.
Product-level features like Claude Code’s auto-memory and ChatGPT’s saved memories partially compensate by injecting context from prior sessions into the prompt. But these are file-based scaffolding, not model learning. If you delete the files, the “memory” disappears. The model itself is a blank slate every time. This is exactly why the enforcement infrastructure I am about to describe is necessary.
The solution was aggressive enforcement. I developed more tooling to control AI agent behavior than I have ever needed with human developers:
- Pre-commit hooks that block commits to eternal branches. The first time the agent hits the hook, it sees the error and does the right thing for the rest of the session.
- Ruff with nearly all rules enabled. We started with everything turned on and worked backwards, adding exceptions only where rules were contradictory or unjustified. The goal was extremely aggressive auditing.
- mypy and ty for type checking, running both in parallel.
- GitHub Actions as additional hard gates, running the full validation suite on every PR.
- Python and shell scripts encapsulating every operational procedure – branching, releasing, validating, deploying – so the agent executes a script rather than improvising a workflow.
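As a sketch of the first item on that list, here is the shape of a pre-commit guard, demonstrated in a throwaway repository. The branch names and error wording are illustrative, not pymqrest's actual hook, which does more.

```shell
#!/bin/sh
# A minimal pre-commit hook that refuses commits on eternal branches.
hook='#!/bin/sh
branch=$(git symbolic-ref --short HEAD 2>/dev/null)
case "$branch" in
  develop|main)
    echo "ERROR: direct commits to $branch are blocked; use a feature branch." >&2
    exit 1 ;;
esac'

# Demonstrate in a throwaway repository.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git checkout -qb develop
git config user.email demo@example.com
git config user.name demo
printf '%s\n' "$hook" > .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

echo hello > file.txt
git add file.txt
if git commit -qm "direct commit" 2>/dev/null; then
  echo "commit on develop went through (hook failed)"
else
  echo "commit on develop blocked"
fi

git checkout -qb feature/demo
git commit -qm "feature commit" && echo "commit on feature branch allowed"
```

The point is the feedback loop: the agent hits the hard error once, reads it, and follows the rule for the rest of the session, exactly as described above.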
When you cannot audit a standard, implement it with a script. The more I reduced the agent’s decision-making latitude, the more predictable and correct its output became. That is the central operational lesson of this project.
What AI made possible
The payoff for all that enforcement infrastructure was enormous.
I was able to cover the entire MQ administrative command namespace from day one. Every command, every qualifier, every attribute – including z/OS-specific commands that I have no way to integration-test without access to a mainframe. Unit tests cover them; integration tests cover everything that can run on a Linux-based queue manager.
With a team of humans, we would never have done this. We would have implemented the subset of the API we needed immediately and deferred the rest to the backlog. If we never used a given subset, we might never implement support for it. But I wanted this to be a truly generic, fully featured library, and the AI made it feasible to go both broad and deep in a way that I could not do with humans.
The ability to send the agent off to do a complete sanity check – “compare this code against these standards, tell me what we missed” – replaced operations that would have taken humans days. Minutes, not days. Being able to iterate at a pace that is at least an order of magnitude faster than anything I have done in the past meant I could focus on covering edge conditions and corner cases that I would have otherwise skipped.
The documentation is incredibly thorough. There might even be too much of it. But when the cost of producing documentation drops to near zero, the calculus changes.
AI-independence of the artifact
One thing I tried very hard to get right: pymqrest was built with AI, but it does not require AI to use or maintain.
I made sure that as many of the procedures and processes as possible are scripted, automated, and clearly documented. If you want to make a code contribution but refuse to work with AI tools, I respect that; I was in the same camp a few short months ago. You should still be able to work with this code and make changes. I will review contributions and maintain them to fit the project’s standards – using AI-assisted tooling – but I have not set the bar so high that you must use AI to participate.
This matters especially for infrastructure tooling. The artifact was produced and is managed with AI, but it is independent of AI in its use. AI was leveraged for creation and maintenance, not embedded as a dependency.
The result
pymqrest 1.0 is published on PyPI [5]. I labeled it beta because calling a first release production-ready felt wrong, even though I think it is good enough to use for real.
The project took roughly two to three weeks. A significant chunk of that time was spent engineering the process rather than the product – my own growing pains learning to work with AI agents. If I blew this away and started from scratch with what I know now, I think I could get the entire thing done in under a week.
The end result looks as good as or better than anything I have produced in my entire career. I take complete and full responsibility for every single line of code in this repository. It is very important that there still be human management, human decision-making, and human ownership. I am that human for this product.
I am inviting people – especially AI skeptics, as I was – to look at this code and this documentation. Please, criticize it. Find the flaws. What did we miss?
The skeptic’s reversal
The most surprising part of this entire exercise was how much fun it was.
I have been developing, coding, scripting, and building enterprise-scale systems for nearly 40 years. The job has always been some mix of fun and frustration, and the fun was typically in creating things in code. This was a very different kind of fun. I did not write the code; I directed its creation. And I enjoyed it more than I expected.
I understand why so many people are pushing back against AI. It is disruptive to our industry and to society in ways we are all just beginning to understand. There is a valid reason to be deeply concerned about the role AI is playing in our culture and professional careers.
But the bottom line is this: I have seen in the last few months that the majority of my mechanical skill set is now obsolete. I could have written every line of code in this repository by hand. Nothing that was committed was something I did not understand. But it would have taken me many months. Instead, it took weeks. And the quality is higher than it would have been, because AI let me spend my time on architecture and edge cases instead of typing.
I went into this as a skeptic, expecting frustration. Part of me was hoping to get to the end and say, “This is crap. You cannot use this to write code.” I could not have been more wrong.
I am looking forward now to figuring out how to leverage this tooling to produce more of the software I have dreamed up over a career – to make ideas into reality faster than I ever could before.
Bibliography
1. W. Phillip Moore, “MQSeries,” CPAN, first released November 11, 1999. Copyright (c) 1999-2012 Morgan Stanley & Co. Incorporated. Artistic License v1.0. The next known Morgan Stanley open-source release was A+ (January 22, 2001), approximately 14 months later.
2. Based on a survey of PyPI, GitHub, IBM community resources, and MQ forums conducted February 2026. All existing Python MQ packages (pymqi, ibmmq, py3mqi, pymqi2) wrap the C MQI, not the REST API. IBM provides REST API admin samples in .NET and JavaScript but not Python. Ad-hoc Python scripts using requests and pycurl exist in blog posts, but no pip-installable library was found.
3. Anthropic Engineering, “Effective Harnesses for Long-Running Agents,” Anthropic Engineering Blog, November 26, 2025.
4. OpenAI, “Conversation State,” OpenAI Platform Documentation, 2025.
5. W. Phillip Moore, “pymqrest,” PyPI, first released February 2026. GNU General Public License v3.0 or later.