I've been a hardcore claude-4-sonnet + Cursor fan for a long time, but in the last 2 months my usage went through the roof. I started with the basic Cursor subscription, then upgraded to Pro, until I hit usage limits again. Then I started using my own Claude API key, but I was still paying ~$70 every 5 days, which is not sustainable for me. But since grok-code-fast-1 landed, I've been using it daily with Cursor and it's fantastic, fast and cheap (free so far). I've also been using GPT-5 lately through the official Codex VSCode extension, and it blows my mind. Last night I used gpt-5-medium to help me heavily refactor a react-native app, improving its structure and overall performance, something that would've taken me at least 2 days. Now I'm testing out gpt-5-medium-codex: I asked it to restructure the entire app routing, and it makes a lot of tool calls, understands, executes commands, and is very organized. Overall my stack from now on is Cursor + grok-code-fast-1 for daily use, and Codex/GPT when I need the brainz. Worth noting that I abused gpt-5-medium all day long yesterday and never hit any kind of limit (I just used my ChatGPT Plus account), for which I thank the OpenAI team.
dmix 34 minutes ago [-]
I also hit Cursor usage limits for the first time in a year. Hit limits on Claude, GPT, and then it started using Grok :)
I chose to turn on Cursor's pay per usage within the Pro plan (so I paid $25, $20+$5 usage, instead of upgrading to $60/m) in order to keep using Claude because it's faster than Grok
heymijo 1 hours ago [-]
What exactly did your work flow look like for the gpt-5-medium refactor you did?
I don't have a test like that on hand so I'm really curious what all you prompted the model, what it suggested, and how much your knowledge as a SWE enabled that workflow.
I'd like a more concrete understanding if the mind blowing nature is attainable for any average SWE, an average Joe that tinkers, or only a top decile engineer.
xwowsersx 32 minutes ago [-]
I've landed on more or less the same. grok-code-fast-1 has been working well for most coding tasks. I use it in opencode (I guess it's free for some amount of time? Because I haven't added any grok keys ¯\_(ツ)_/¯)
jumploops 23 hours ago [-]
Interesting, the new model's prompt is ~half the size (10KB vs. 23KB) of the previous prompt[0][1].
SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors (via internal refactor benchmark 33.9% -> 51.3%).
As someone who recently used Codex CLI (`gpt-5-high`) to do a relatively large refactor (multiple internal libs to dedicated packages), I kept running into bugs introduced when the model would delete a file and then rewrite it (missing crucial details). My approach would have been to just copy the file over and then make package-specific changes, so maybe better tool calling is at play here.
Additionally, they claim the new model is more steerable (both with AGENTS.md and generally). In my experience, Codex CLI w/gpt-5 is already a lot more steerable than Claude Code, but any improvements are welcome!
Change the url from x.com to xcancel.com to see it all.
pants2 23 hours ago [-]
Interestingly, "more steerable" can sometimes be a bad thing, as it will tend to follow your prompt to the letter even if that's against your interests. It requires better prompting and generally knowing what you're doing - might be worse for vibe-coders and better for experienced SWEs.
jumploops 20 hours ago [-]
Yes, given a similarly sparse prompt, Claude Code seems to perform "better" because it eagerly does things you don't necessarily know to ask
GPT-5 may underwhelm with the same sparse prompt, as it seems to do exactly what's asked, not more
You can still "fully vibe" with GPT-5, but the pattern works better in two steps:
1. Plan (iterate on high-level spec/PRD, split into actions)
2. Build (work through plans)
Splitting the context here is important, as any LLM will perform worse as the context gets more polluted.
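The two-step pattern above can be sketched roughly as follows. This is a minimal illustration, not how Codex or Claude Code work internally: `Model`, `plan_phase`, `build_phase` and `fake_model` are all hypothetical names, and `fake_model` is a stand-in for a real LLM call so the sketch runs on its own. The point it shows is that each build step gets a fresh context rather than inheriting the planning conversation.

```python
from typing import Callable, List

# Hypothetical model interface: takes a list of messages, returns text.
Model = Callable[[List[str]], str]


def plan_phase(model: Model, spec: str) -> List[str]:
    """Turn a spec into a list of actions, using a context of its own."""
    context = [f"Write a step-by-step plan for: {spec}"]
    plan_text = model(context)
    # One action per non-empty line of the returned plan.
    return [line.strip() for line in plan_text.splitlines() if line.strip()]


def build_phase(model: Model, actions: List[str]) -> List[str]:
    """Run each action in a fresh context so planning chatter never leaks in."""
    results = []
    for action in actions:
        context = [f"Implement exactly this step, nothing more: {action}"]
        results.append(model(context))
    return results


def fake_model(messages: List[str]) -> str:
    """Stand-in for a real LLM call, only here to make the sketch runnable."""
    prompt = messages[-1]
    if prompt.startswith("Write a step-by-step plan"):
        return "add route table\nmigrate screens\n"
    return f"done: {prompt.split(': ', 1)[1]}"


actions = plan_phase(fake_model, "restructure app routing")
results = build_phase(fake_model, actions)
print(actions)
print(results)
```

In practice the plan would be reviewed and edited by hand between the two phases; the sketch skips that step.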
Turskarama 17 hours ago [-]
The best of both worlds would surely be for the LLM to write what you've asked, but also write comments about other things it could have done so you can consider those extra bits when you check the output.
htrp 23 hours ago [-]
think they're indexing here for professional work (people in the VSCode terminal)
siva7 21 hours ago [-]
So you're all saying suddenly codex cli w gpt 5 codex is better than claude code? Hard to believe
jumploops 20 hours ago [-]
Not suddenly, it's been better since GPT-5 launched.
Prompting is different, but in a good way.
With Claude Code, you can use less prompting, and Claude will get token happy and expand on your request. Great for greenfield/vibing, bad for iterating on existing projects.
With Codex CLI, GPT-5 seems to handle instructions much more precisely. It won't just go off on its own and do a bunch of work, it will do what you ask.
I've found that being more specific up-front gets better results with GPT-5, whereas with Claude, being more specific doesn't necessarily stop the eagerness of its output.
As with all LLMs, you can't compare apples to oranges, so to clarify, my experiences are primarily with Typescript and Rust codebases.
srcreigh 20 hours ago [-]
Codex CLI of course will sometimes do the wrong thing, or sometimes do something extra that you didn't intend for it to do.
It seems about half my sessions quickly become "why did you do that? rip __ out and just do ___". Then again, most of the other sessions involve Codex correctly inferring what I wanted without having to be so specific.
elcritch 13 hours ago [-]
Yeah, I tried Claude Code CLI and never found it too useful, but that was the Claude 3.5 era. Still, Claude 3.7/4.0 via Cursor was much better, but it still had to be micromanaged.
GPT5 + Codex CLI has been pretty productive for me. It's able to get a lot right in a simple prompt without getting too distracted with other crap. It's not perfect, but it's pretty good.
I actually worry GPT5-Codex will make it worse on that aspect though. One of the best parts of GPT5/Codex CLI is that it tends to plan and research first, then make code.
mvieira38 3 hours ago [-]
This has been my experience even in Cursor. I often select the GPT-5 option because I know it will "know" better how much reasoning effort it needs
drob518 17 hours ago [-]
Yea, I have struggled with Claude to keep it focused on what I want and only what I want. I have no experience with GPT-5-Codex, but maybe I should.
j45 18 hours ago [-]
Sounds like the difference between finding what needs to be done, making a plan, and executing on it remains something to consider and be aware of.
Claude Code has been a revelation and a bit of a let down the past 45 days.
Some open acknowledgement would have been great, but in lieu of it, it seems it's best to hop on a new tool and make sure you learn how to prompt better and not rely on the model to read between the lines until usage is "optimized" and it no longer seems to work for those folks.
I've seen some interesting files that help any model treat a programming language as its strong suit, even one it might not be an expert in, and show how to best develop with it.
stpedgwdgfhgdd 6 hours ago [-]
Anthropic acknowledged there were bugs that are now resolved, see their status page for latest info:
The models themselves are responding differently to prior chat requests being run again.
strangescript 12 hours ago [-]
It's been better for a while, people are sleeping on it, just like they slept on claude code when it initially came out.
barrenko 10 hours ago [-]
People are using claude code + glm models as alternative too, some complaints flying around.
wahnfrieden 19 hours ago [-]
It is 100% true. And they are rapidly losing users to Codex. Charts were shared recently showing a massive migration underway.
CuriouslyC 17 hours ago [-]
Oh yeah, Sonnet performance has been in the toilet for me. They claim they've mitigated it but when 4.0 first dropped CC was really impressive, and now I constantly have to babysit it because any time it hits a challenge it'll just stop trying and make a simple toy version and declare false victory. If I don't catch it and I let it build on top of that bullshit, things get nasty in a hurry.
It's a shame because the plan is a great deal but the number of all caps and profanity laced messages I'm firing off at Claude is too damned high.
resonious 9 hours ago [-]
This hits home for me too. Claude feels like it has gotten more "yes-man"-y. I can no longer trust its judgement. Even if I come in with something dead wrong, I'm "absolutely right" and it finds amazing ways to spin my BS into something vaguely believable.
I am also bullying Claude more nowadays. Seeing this thread, I might give Codex another go (I was on Codex CLI before Claude Code. At that time, Claude blew Codex out of the water but something's changed)
dmazin 11 hours ago [-]
Yes, this. I feel like I’m going crazy. I pay for the extra Opus usage and I keep checking the model switcher to see if it has automatically switched to Sonnet. It has not. I just have a lot more experiences of it feeling anecdotally dumb lately.
wahnfrieden 14 hours ago [-]
GPT-5 is comparable to Opus without needing to constantly dip back down to Sonnet for cost management
I wonder if this means part of the prompt has been moved to a higher level somehow... or baked into the bread elsewhere.
groby_b 18 hours ago [-]
Small suggestion on refactors into packages: Move the files manually. Just tell codex "they used to be in different locations, fix it up so it builds".
It seems that the concept of file moving isn't something Codex (and other clis) handle well yet. (Same goes for removing. I've ~never seen success in tracking moves and removes in the git commit if I ask for one)
artemisart 18 hours ago [-]
Does refactoring mean moving things around for people? Why don't you use your IDE for this, it already handles fixing imports (or use find-replace) and it's faster and deterministic.
jumploops 17 hours ago [-]
Not necessarily -- in the case I posted about, we first abstracted some common functionality to internal libs, and then further abstracted that functionality into a number of packages (so they could be used by other clients).
So it was part simplification (dedupe+consolidate), and part moving files around.
robotswantdata 22 hours ago [-]
Codex CLI IDE just works, very impressed with the quality. If you tried it a while back and didn’t like it, try it again via the VSCode extension; generous usage is included with Plus.
Ditched my Claude Code Max sub for the ChatGPT Pro $200 plan. So much faster, and I haven't hit any limits yet.
Did anybody switch away from Aider for a compelling alternative? What was the feature?
gen220 3 hours ago [-]
I was a #1 aider stan for months, but ultimately switched to Claude Code.
For me it was (1) lack of MCP (2) excessive hand-holding required (3) availability of the Max plan meant I wouldn't have to monitor cost.
Claude Code's skill curve felt slightly longer than aider's, and had a higher ceiling.
I think if I were more cost-sensitive, I would give Aider another whirl, or Gemini CLI (based on what I've heard, I haven't used it yet).
dsrtslnd23 11 hours ago [-]
I still like aider but multi-step agentic flows are very useful so I mostly use codex and claude nowadays. If I want to do very specific edits I use aider.
I also use claude or codex but i do not find them much useful for what i do.
aitchnyu 8 hours ago [-]
Where does regular Aider fall short?
faangguyindia 6 hours ago [-]
bruh, i posted a video for you to see where it falls short. If you are an aider user you can tell based on video what it can or cannot do.
radheyj 25 minutes ago [-]
[dead]
raincole 10 hours ago [-]
Why not Gemini CLI tho?
robotswantdata 3 hours ago [-]
Do not let Gemini CLI write code for you. You will find out the hard way
theshrike79 9 hours ago [-]
It doesn't have a working plan mode.
If you ask it to plan something or suggest something, it'll write the suggestion and dive right into implementation with zero hesitation.
TomaszZielinski 2 hours ago [-]
It's not my experience, but then I have a „living” GEMINI.md doc where I add/clarify/tweak what to do and what not to do. And it's possible the initial revision already contained the correct spell :)
faangguyindia 9 hours ago [-]
gemini cli is such trash that it has ruined the gemini 2.5 pro model's reputation; now people think it's incapable of writing good code.
PrayagS 6 hours ago [-]
True. In my experience, it has been good to use for second opinions and code reviews. Usually in CC/Codex via an MCP like zen.
barrenko 10 hours ago [-]
it sucks.
raincole 10 hours ago [-]
The opposite of my experience, but okay.
barrenko 9 hours ago [-]
I'll give it another try, the gemini models themselves are great for me, tried the tool when it came out, didn't gel with it.
steinvakt2 21 hours ago [-]
I'm using Cursor with the $20 plan and hit rate limits after 15 days (so I'm paying extra for the rest of the month). What do you recommend I do?
robotswantdata 21 hours ago [-]
You could get two plus accounts? Or maybe a business account with two seats?
Well, it's really a VSCode extension that lets you run Codex CLI in the IDE. Not the "cloud" version of Codex... So GP is technically correct
poszlem 21 hours ago [-]
Wait, what? They now allow claude code like subscription instead of the API too?
robotswantdata 21 hours ago [-]
Yes for at least a month. Download the vscode extension and sign in with ChatGPT
Tiberium 20 hours ago [-]
Yes, just do "codex login" and it'll use your ChatGPT subscription.
energy123 10 hours ago [-]
Do you also get free Codex CLI usage?
gosasan 39 minutes ago [-]
I’ve been using Claude Code ($20/month) for about two weeks now, and with one of the token usage monitors I can handle most of what I need. I also have the $20/month ChatGPT plan. I know I could try Codex CLI, but I’ve been hesitant since I’ve seen people suddenly hit the limit and get locked out for a week. The problem is that there’s no way to check usage. So I’m wondering if this update improves token usage management, or if there’s now a way to actually see our usage so we don’t end up tripping the limit without any warning.
twalichiewicz 19 hours ago [-]
It's been interesting reading this thread and seeing that others have also switched to using Codex over Claude Code. I kept running into a huge issue with Claude Code creating mock implementations and general fakery when it was overwhelmed. I spent so much time tuning my input prompt just to keep it from making things worse that I eventually switched.
Granted, it's not an apples-to-apples comparison since Codex has the advantage of working in a fully scaffolded codebase where it only has to paint by numbers, but my overall experience has been significantly better since switching.
theshrike79 5 hours ago [-]
1) create plan in plan-mode
2) ask it to implement plan
That's the way to work with Claude.
Other systems don't have a bespoke "planning" mode and there you need to "tune your input prompt" as they just rush in to implementation by guessing what you wanted
klipklop 20 hours ago [-]
My observation over the past 2 weeks is that Claude Code is getting dramatically worse, with super low usage quotas, while OpenAI Codex is getting great and has a very generous usage quota in comparison.
For people that have not tried it in say ~1 month, give Codex CLI a try.
SecretDreams 16 hours ago [-]
All that matters to the end users is to never be trapped. Cross-shop these products around constantly and go for lowest price, highest performance ratios. We've seen over the last year all companies trade blows, but none are offering something novel within the current space. There is no reason to "stick to one service". But the services will try very hard to keep you stuck for that SaaS revenue.
theshrike79 9 hours ago [-]
Does it still go "your project is using git, let me just YOLO stuff" on first startup?
My essentials for any coding agent are proper whitelists for allowed commands (you can run uv run <anything>, but rm requires approval every time) and customisable slash commands.
I can live without hooks and subagents.
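As a rough illustration of the kind of command whitelist described above (`uv run <anything>` allowed, `rm` always needing approval), here is a minimal sketch. It is not the policy format of Codex CLI, Claude Code, or any other agent; `ALLOWED_PREFIXES`, `ALWAYS_ASK` and `decide` are hypothetical names, and the `git status` entry is an assumed example.

```python
import shlex

# Hypothetical policy: command prefixes that may run without asking.
ALLOWED_PREFIXES = [("uv", "run"), ("git", "status")]
# Hypothetical policy: commands that need approval every time.
ALWAYS_ASK = {"rm"}
# Shell metacharacters that could chain or redirect commands.
SHELL_METACHARS = set(";|&`$<>\n")


def decide(command: str) -> str:
    """Return 'allow' or 'ask' for a shell command string."""
    # Be conservative: anything that could smuggle in a second command asks.
    if any(ch in SHELL_METACHARS for ch in command):
        return "ask"
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are never auto-approved.
        return "ask"
    if not argv or argv[0] in ALWAYS_ASK:
        return "ask"
    for prefix in ALLOWED_PREFIXES:
        if tuple(argv[:len(prefix)]) == prefix:
            return "allow"
    return "ask"


print(decide("uv run pytest -q"))
print(decide("rm -rf build"))
print(decide("uv run x; rm -rf /"))
```

The default is "ask", so a command only runs unattended when it matches an explicit prefix and contains no chaining characters.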
kerpal 47 minutes ago [-]
First time I was able to install Codex with no NPM errors. Gonna give it a shot, seems a lot slower than Claude Code but I'm only using basic Pro with ChatGPT vs. Max 100.
epolanski 18 hours ago [-]
Question, how do I get the equivalent of Claude's "normal mode" in Codex CLI?
It is super annoying that it either vibe codes and just edits and uses tools, or it has a plan mode, but there's no in-between where it asks me whether it's fine to do a or b.
I'm not understanding why it lacks such a capability, why in the world would I want to choose between having to copy paste the edits or auto accept them by default...
klipklop 17 hours ago [-]
Usually I give it a prompt that includes telling it to formulate a plan and not do any coding until I approve. I will usually do several loops of that before I give it the instruction to go forward with the plan. I like to copy and paste the plan elsewhere because at times these LLMs can "forget" the plan. I usually do testing at each major milestone (either handed off to me, or builds/unit tests).
epolanski 17 hours ago [-]
Yeah, no way I'm doing copy pasting or allowing it to vibe it.
I want it to help me come up with a plan, execute it, and let me check and edit every single edit, with the UX offered by Claude. Codex is simply atrocious; I regret spending 23 euros on this.
I see the visual studio code extension does offer something like this, but the UX/UI is terrible, doesn't OAI have people testing those things?
The code is unreadable in that small window[1], doesn't show the lines above/below, it doesn't have IDE tooling (can't inspect types e.g.).
This is just not good, that's the kind of AI that slows me, doesn't help at all.
stopachka 20 hours ago [-]
Very impressive. I've been working on a shared background presence animation, and have been testing out Claude and Codex. (By shared presence, I mean imagine a page's background changing based on where everyone's cursor is)
Both were struggling yesterday, with Claude being a bit ahead. Their biggest problems came with being "creative" (their solutions were pretty "stock"), and they had trouble making the simulation.
Tried the same problem on Codex today. The design it came up with still felt a bit lackluster, but it did _a lot_ better on the simulation.
M4v3R 9 hours ago [-]
> Their biggest problems came with being "creative" (their solutions were pretty "stock")
LLM designed UIs will always look generic/stock if you don’t give it additional prompting because of how LLMs work - they’ve memorized certain design patterns and if you don’t specify what you want they will always default to a certain look.
Try adding additional UI instructions to your prompts. Tell it what color scheme you want, what design choices you prefer, etc. Or tell it to scan your existing app’s design and try to match it. Often the results will be much better this way.
bdcravens 16 hours ago [-]
I literally tried out Codex for the first time this weekend, and the results were ... weird. It'll be interesting to see if it does things differently. (It was a super simple prompt, standing up a Rails app in Docker Compose with a home page and Devise; it hard-coded each file to create inside of the bootstrap.sh, instead of actually creating the files to begin with)
thurn 5 hours ago [-]
Does the "caching containers for Codex Cloud" mean I have some chance of being able to reuse build artifacts between tasks? My Rust project takes around 20 minutes to set up from scratch in a new Codex environment, which seems extremely expensive.
kelvinjps 15 hours ago [-]
I bought ChatGPT last month and I think OpenAI is doing things right at the moment, mostly in the experience. For example, it has a better voice mode than Claude's, and I like their new model names better than the confusing ones they used to have; it simplified the whole thing. It's also better as a general assistant; by comparison, Claude is not that good for non-code things. And OpenAI keeps releasing tools and seems more reliable in its tooling.
Tiberium 24 hours ago [-]
Only a 1.7% upgrade on SWE-Bench compared to GPT-5, but 33.9% vs 51.3% on their internal code refactoring benchmark. This seems like an Opus 4.1-like upgrade, which is nice to see and means they're serious about Codex.
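To put the refactoring numbers in perspective, here is the arithmetic as a small sketch. It uses only the two scores quoted above; the variable names are just for illustration.

```python
# Scores quoted for the internal code refactoring benchmark, in percent.
before, after = 33.9, 51.3

points = after - before      # absolute gain in percentage points
relative = points / before   # gain relative to the GPT-5 score

print(f"{points:.1f} points, {relative:.1%} relative")
# prints "17.4 points, 51.3% relative"
```

The relative gain happening to be 51.3% as well is a coincidence of these two numbers, not a typo.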
ryandetzel 7 hours ago [-]
I don't know. Something that takes CC about a minute is now on minute 15 with this new model...
simianwords 20 hours ago [-]
The code review thing might be my favorite UX for AI based development. Largely stays out of your way and provides good comments.
I’m imagining if it can navigate the codebase and modify tests - like add new cases or break the tests by changing a few lines. This can actually verify if the tests were doing actual assertions and being useful.
Thorough reviewing like this probably benefits me the most - more than AI assisted development.
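The idea of breaking the code by changing a few lines to see whether the tests notice is essentially mutation testing. A toy sketch of it, with everything (`is_adult`, the mutant, both tests) made up for illustration:

```python
def is_adult(age: int) -> bool:
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    # Mutation: ">=" changed to ">", the kind of small break described above.
    return age > 18


def weak_test(fn) -> bool:
    """Passes for any implementation that accepts 30 and rejects 5."""
    return fn(30) is True and fn(5) is False


def strong_test(fn) -> bool:
    """Also pins down the boundary at 18."""
    return weak_test(fn) and fn(18) is True and fn(17) is False


def kills_mutant(test) -> bool:
    """A useful test passes on the original and fails on the mutant."""
    return test(is_adult) and not test(is_adult_mutant)


print(kills_mutant(weak_test))    # prints False: the weak test misses the bug
print(kills_mutant(strong_test))  # prints True: the boundary check catches it
```

A reviewer agent doing this for real would generate the mutants itself and report which tests let them through.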
georgeofjungle7 14 hours ago [-]
Cool upgrade, but I wonder how this plays with existing tools like Copilot and Cursor. Everyone’s moving toward “AI pair programmer in every IDE,” and it feels like the competition is less about raw model quality now and more about integration + workflow lock-in. Codex getting everywhere (terminal, GitHub, phone) sounds powerful
inerte 19 hours ago [-]
Oh, since when is Codex CLI included as part of a ChatGPT plan? 99% sure that wasn't the case before. Time to try to use it for real.
It's relatively new - they enabled it ~1 month ago or something.
sanex 17 hours ago [-]
I've considered swapping to Claude since the last update made talking to gpt absolutely terrible. I heavily make use of being able to put in PRs on mobile by working with codex, and if it wasn't for this I'd probably have switched. Excited to see the updates.
CuriouslyC 15 hours ago [-]
Don't. Claude is worse for everything but coding, and even then it's mostly better for coding in greenfield/small projects, and it makes a mess of large projects. The only thing really good about Claude was the plan economics, and now I'm not so sure about it.
theshrike79 6 hours ago [-]
It only makes a mess of large projects if your CLAUDE.md and docs/ are out of date.
It has a very specific style and if your project isn't in that style, it starts to enforce it -> "making a mess".
CuriouslyC 4 hours ago [-]
Nah bro. I have a Claude Code hall of shame, where Sonnet gets derailed by the most trivial shit, and instead of finishing actual research code that's been clearly outlined for it (like, file by file level instructions), it creates a broken toy implementation with fake/simulated output ("XXX isn't working, the user wants me to YYY, let me just try a simpler approach...") and it'll lie about it in the final report, so if you aren't watching the log, sucks to be you.
I have an extensive array of tripwires, provenance chain verifications and variance checks in my code, and I have to treat Claude as adversarial when I let it touch my research. Not a great sign.
alvis 23 hours ago [-]
It's interesting to see this quote:
`for the bottom 10% of user turns sorted by model-generated tokens (including hidden reasoning and final output), GPT‑5-Codex uses 93.7% fewer tokens than GPT‑5`
It sounds like it can make simple tasks much more efficient, which is impressive to me. Today's coding agents tend to pretend they're working hard by generating lots of unnecessary code. Hope it's true.
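For a sense of scale, "93.7% fewer tokens" works out as below. The 1000-token baseline is a made-up number purely for illustration; only the 93.7% figure comes from the quote.

```python
reduction = 0.937        # "93.7% fewer tokens", from the quote above
baseline_tokens = 1000   # hypothetical GPT-5 token count for an easy turn

codex_tokens = baseline_tokens * (1 - reduction)
factor = 1 / (1 - reduction)

print(f"{codex_tokens:.0f} tokens instead of {baseline_tokens}")
print(f"about {factor:.1f}x fewer")
```

So on the easiest 10% of turns, the claim amounts to roughly a sixteenfold reduction in generated tokens.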
bn-l 22 hours ago [-]
This is my issue with gpt-5. If you use the low or medium reasoning it’s garbage. If you use high, it’ll think for up to five minutes on something dead simple.
srcreigh 20 hours ago [-]
Can you be more specific about what type of code you're talking about, and what makes it garbage?
I'm happy with medium reasoning. My projects have been in Go, TypeScript, React, Dockerfiles, stuff like that. The code almost always works; it's usually not "Clean code" though.
esafak 2 hours ago [-]
Do they demand biometrics to use it?
ianbutler 22 hours ago [-]
I just want the codex models in the API, I won’t touch them until then.
And before someone says it, I do happen to have my own codex like environment complete with development containers, browser, github integration, etc.
And I'm happy to pay a mint for access to the best models.
greyb 21 hours ago [-]
They've said it's coming:
>For developers using Codex CLI via API key, we plan to make GPT‑5-Codex available in the API soon.
ianbutler 21 hours ago [-]
I saw that, but soon doesn’t inspire confidence, and is easy to overlook if they don’t. They didn’t with the previous Codex model.
codybontecou 19 hours ago [-]
Do they have a GitHub action to run in GitHub, similar to Claude?
It isn't quite the same. The Claude Code action can be easily integrated into a workflow to fire automatically (like when a PR is opened).
The Codex support at the moment requires adding a comment "@codex review" which then initiates a cloud based review.
You can, however, directly invoke Codex CLI from a GitHub workflow to do things like perform a code review.
Topfi 22 hours ago [-]
One major improvement I have seen today, even before I saw the announcement, was that the model is far more reliable in using the Task Completion interface to communicate what stage of the prompt is being implemented. Previously this was only shown sparingly (especially in the first few weeks) and, when it was, it didn't properly tick tasks, simply jumping from the first to completion at the end. Now this works very reliably and I do like this improvement, but if I didn't know better, I would have suspected this was merely the result of a system prompt change; considering how solid GPT-5's prompt adherence has been in my experience, this should have been fixable without a tuned model. Nevertheless, I like this improvement (arguably a fix of a previously broken feature).
Beyond that, purely anecdotal and subjective, but this model does seem to do extensive refactors with semi precise step-by-step guidance a bit faster (comparing GPT-5 Thinking (Medium) and GPT-5 Codex (Medium)), though adherence to prompts seems roughly equivalent between the two as of now. In any case, I really feel they should consider a more nuanced naming convention.
New Claude Sonnet 3.7 was a bit of a blunder, but overall, Anthropic has their marketing in tight order compared to OpenAI. Claude Code, Sonnet, Opus, those are great, clear differentiating names.
Codex meanwhile can mean anything from a service for code reviews with Github integration to a series of dedicated models going back to 2021.
Also, while I do enjoy the ChatGPT app integration for quick on-the-go work made easier with a Clicks keyboard, I am getting more annoyed by the drift between Codex VSCode, Codex Website and Codex in the ChatGPT mobile app. The Website has a very helpful Ask button, which can also be used to launch subtasks via prompts written by the model, but such a button is not present in the VSCode plugin, despite subtasks being something you can launch from the VSCode plugin if you have used Ask via the website first. Meanwhile, the iOS app has no Ask button and no sub task support and neither the app, nor VSCode plugin show remote work done beyond abbreviations, whereas the web page does show everything. Then there are the differences between local and remote via VSCode and the CLI, ... To people not using Codex, this must sound insane and barely understandable, but it seems that is the outcome of spreading yourself across so many fields. CLI, dedicated models, VSCode plugin, mobile app, code review, web page, some like Anthropic only work on one or two, others like Augment three, but no one else does that much, for better and worse.
I like using Codex, but it is a mess with such massive potential that it needs a dedicated team lead whose only focus is to untangle this mess before adding more features. Alternatively, maybe interview a few power users on their actual day-to-day experience, those that aren't just in one, but are using multiple or all parts of Codex. There is a lot of insight to be gained from someone who has an overview of the entire product stack, I think. Sending out a questionnaire to top users would be a good start; I'd definitely answer.
andybak 19 hours ago [-]
Wait. There's Codex support in the mobile app? But on iOS only?
Ffs...
Topfi 7 hours ago [-]
Yeah, it's a pity, but platform support differences between iOS and Android are still alive and well. Funnily enough, the 15 has been my first iPhone ever, mainly because Android device hardware has fewer and fewer differentiators (no jack, no expandable storage, no true camera hardware innovations as seen in the Huawei pre ban, ...). iOS is far from perfect, I have experienced some insane bugs despite Apple fanatics being adamant that there are none, but considering Google is now killing side loading, I see no reason to switch. A few small experimental manufacturers (Ikko, ...) notwithstanding, hardware is closer to Apple's product line than ever and software now offers neither flexibility nor added stability, so what's the point, especially as even Google makes more from iOS [0], so why keep setting money on fire running that platform...
Regardless of that, Codex needs to both come to Android for parity and the app features need to be expanded towards parity with the web page.
I have seen fluctuations in token/sec. Early yesterday, roughly equivalent to non-Codex GPT-5 (this branding ...), late yesterday I had a severe drop off in token/sec. Today, it seems to have improved again and with the lowered amount of unnecessary/rambling token output, GPT-5-Codex (Medium) seems faster overall. LLM rollouts always have this back and forth in token/sec, especially in the first few days.
e1g 18 hours ago [-]
Extremely slow for me - takes minutes to get anything done. Regular GPT5 was much faster. Hoping it’s mostly due to the launch day.
bigwheels 18 hours ago [-]
I've been using gpt-5 on effort=high but for gpt-5-codex, try: `-c model_reasoning_effort=medium`.
On high it is totally unusable.
replyfabric-ai 4 hours ago [-]
even on medium ... gpt-5 was way faster, at least that's my first impression
simianwords 20 hours ago [-]
OpenAI is starting its new era of specialized models. Guess they gave up on a monolithic model approach
CuriouslyC 15 hours ago [-]
If you try to optimize for everything you get a model that's good at nothing (or hyper expensive to train and run). Simple economics. There is no free lunch.
arthurcolle 16 hours ago [-]
Agent-1
king_magic 19 hours ago [-]
Doesn't seem ready for prime-time. I'll be impressed when it actually installs.
codex cli has been out for 5 months with 170k weekly downloads - it's definitely 'ready for primetime' if you can get past your bug. I don't have that issue.
king_magic 9 hours ago [-]
The install errors out after following two basic install commands. This is poor code hygiene on the part of the authors. That's "not ready for primetime" in my book.
Please tell me you are not running a NodeJS release that hasn't been supported for at least 3 years...
Optional chaining was added in v14 (2020), and it sure looks like that is the issue here.
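A quick way to rule this out is to compare the output of `node --version` against the v14 threshold mentioned above. A minimal sketch; the helper names are hypothetical and the version strings in the example are just sample inputs:

```python
def node_major(version: str) -> int:
    """Extract the major version from a string like 'v12.22.9'."""
    return int(version.strip().lstrip("v").split(".")[0])


def supports_optional_chaining(version: str) -> bool:
    # Threshold taken from the comment above: optional chaining arrived in v14.
    return node_major(version) >= 14


for sample in ("v12.22.9", "v14.0.0", "v20.11.0\n"):
    print(sample.strip(), supports_optional_chaining(sample))
```

If the installed Node reports a major version below 14, the install failure is most likely the runtime rather than the package.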
king_magic 8 minutes ago [-]
[delayed]
j45 21 hours ago [-]
Has anyone hit any programming usage limits with the ChatGPT 5 Pro account?
robotswantdata 20 hours ago [-]
None yet, feels unlimited.
Huge repos too
incomingpain 24 hours ago [-]
Still waiting on codex cli to support lm studio.
NitpickLawyer 21 hours ago [-]
? Isn't lmstudio API openai compatible? Codex cli already supports 3rd party models, you have to edit the config yaml file, and you can add many model providers.
incomingpain 16 hours ago [-]
I never managed to get it to work. I used chatgpt to try to do it for me :)
I have it go to openrouter, and then you just export the API key and run codex, works smoothly.
trilogic 20 hours ago [-]
[flagged]
laidoffamazon 20 hours ago [-]
I agree because I don’t want to get fired or get audited
trilogic 10 hours ago [-]
Jeez, -2! No sense of humor left; my comment was sent to the very end. That's sad and says a lot. Is it because I forgot to mention Grok that you are pissed? :)) It is a joke, for God's sake; laugh while you still can. What a narrative, damn.
SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors (via internal refactor benchmark 33.9% -> 51.3%).
As someone who recently used Codex CLI (`gpt-5-high`) to do a relatively large refactor (multiple internal libs to dedicated packages), I kept running into bugs introduced when the model would delete a file and then rewrite it (missing crucial or important details). My approach would have been to just copy the file over and then make package-specific changes, so maybe better tool calling is at play here.
Additionally, they claim the new model is more steerable (both with AGENTS.md and generally). In my experience, Codex CLI w/gpt-5 is already a lot more steerable than Claude Code, but any improvements are welcome!
[0]https://github.com/openai/codex/blob/main/codex-rs/core/gpt_...
[1]https://github.com/openai/codex/blob/main/codex-rs/core/prom...
SWE-bench is a great eval, but it's very narrow. Two models can have the same SWE-bench scores but very different user experiences.
Here's a nice thread on X about the things that SWE-bench doesn't measure:
https://x.com/brhydon/status/1953648884309536958
https://nitter.net/brhydon/status/1953648884309536958
GPT-5 may underwhelm with the same sparse prompt, as it seems to do exactly what's asked, not more
You can still "fully vibe" with GPT-5, but the pattern works better in two steps:
1. Plan (iterate on high-level spec/PRD, split into actions)
2. Build (work through plans)
Splitting the context here is important, as any LLM will perform worse as the context gets more polluted.
Prompting is different, but in a good way.
With Claude Code, you can use less prompting, and Claude will get token happy and expand on your request. Great for greenfield/vibing, bad for iterating on existing projects.
With Codex CLI, GPT-5 seems to handle instructions much more precisely. It won't just go off on its own and do a bunch of work; it will do what you ask.
I've found that being more specific up-front gets better results with GPT-5, whereas with Claude, being more specific doesn't necessarily stop the eagerness of its output.
As with all LLMs, you can't compare apples to oranges, so to clarify, my experiences are primarily with Typescript and Rust codebases.
It seems about half my sessions quickly become "why did you do that? rip __ out and just do ___". Then again, most of the other sessions involve Codex correctly inferring what I wanted without having to be so specific.
GPT5 + Codex CLI has been pretty productive for me. It's able to get a lot right in a simple prompt without getting too distracted with other crap. It's not perfect, but it's pretty good.
I actually worry GPT5-Codex will make it worse on that aspect though. One of the best parts of GPT5/Codex CLI is that it tends to plan and research first, then make code.
Claude Code has been a revelation and a bit of a let down the past 45 days.
Some open acknowledgement would have been great, but in lieu of it, it seems best to hop on a new tool, make sure you learn how to prompt better, and not rely on the model to read between the lines until usage is "optimized" and it no longer seems to work for those folks.
I've seen some interesting files that help any model treat a programming language as its strong suit, even one it might not be an expert in, and show it how best to develop with that language.
https://status.anthropic.com/
It's a shame because the plan is a great deal but the number of all caps and profanity laced messages I'm firing off at Claude is too damned high.
I am also bullying Claude more nowadays. Seeing this thread, I might give Codex another go (I was on Codex CLI before Claude Code. At that time, Claude blew Codex out of the water but something's changed)
It seems that the concept of file moving isn't something Codex (and other clis) handle well yet. (Same goes for removing. I've ~never seen success in tracking moves and removes in the git commit if I ask for one)
So it was part simplification (dedupe+consolidate), and part moving files around.
Ditched my Claude Code Max sub for the ChatGPT Pro $200 plan. So much faster, and I haven't hit any limits yet.
For me it was (1) lack of MCP (2) excessive hand-holding required (3) availability of the Max plan meant I wouldn't have to monitor cost.
Claude Code's skill curve felt slightly longer than aider's, and had a higher ceiling.
I think if I were more cost-sensitive, I would give Aider another whirl, or Gemini CLI (based on what I've heard, I haven't used it yet).
I also use Claude or Codex, but I do not find them very useful for what I do.
If you ask it to plan or suggest something, it'll write the suggestion and dive right into implementation with zero hesitation.
The $200 pro feels good value personally.
What?
Granted, it's not an apples-to-apples comparison since Codex has the advantage of working in a fully scaffolded codebase where it only has to paint by numbers, but my overall experience has been significantly better since switching.
2) ask it to implement plan
That's the way to work with Claude.
Other systems don't have a bespoke "planning" mode, and there you need to "tune your input prompt" as they just rush into implementation by guessing what you wanted.
For people that have not tried it in say ~1 month, give Codex CLI a try.
My essentials for any coding agent are proper whitelists for allowed commands (you can run uv run <anything>, but rm requires approval every time) and customisable slash commands.
I can live without hooks and subagents.
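To make the "uv run is fine, rm asks every time" idea concrete, here is a rough sketch of a prefix-based command allowlist. Everything in it (the `Rule` shape, the `decide` function, the metacharacter check) is a hypothetical illustration, not how Codex CLI or Claude Code actually implement approvals.

```typescript
// Hypothetical policy model; real agents configure approvals differently.
type Decision = "allow" | "ask";

interface Rule {
  prefix: string[]; // leading tokens that must match exactly
  decision: Decision;
}

const rules: Rule[] = [
  { prefix: ["uv", "run"], decision: "allow" }, // uv run <anything>
  { prefix: ["rm"], decision: "ask" },          // rm needs approval every time
];

function decide(command: string): Decision {
  // Never auto-allow anything that chains, pipes, redirects or substitutes,
  // otherwise "uv run x && rm -rf /" would slip through the prefix match.
  if (/[;&|<>`$]/.test(command)) return "ask";

  const tokens = command.trim().split(/\s+/);
  for (const rule of rules) {
    if (rule.prefix.every((tok, i) => tokens[i] === tok)) return rule.decision;
  }
  return "ask"; // commands with no matching rule default to asking
}

console.log(decide("uv run pytest -q")); // allow
console.log(decide("rm -rf build"));     // ask
```

Defaulting to "ask" for anything unmatched keeps the list a true whitelist: forgetting a rule costs you a prompt, not a deleted directory.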
It is super annoying that it either vibe codes and just edits and uses tools, or it has a plan mode, but there is no in-between where it asks me whether it's fine to do A or B.
I don't understand why it lacks such a capability. Why in the world would I want to choose between having to copy-paste the edits or auto-accepting them by default...
I want it to help me come up with a plan, execute it, and let me check and edit every single edit, but with the UX offered by Claude. Codex is simply atrocious; I regret spending 23 euros on this.
I see the visual studio code extension does offer something like this, but the UX/UI is terrible, doesn't OAI have people testing those things?
The code is unreadable in that small window[1], doesn't show the lines above/below, it doesn't have IDE tooling (can't inspect types e.g.).
https://i.imgur.com/mfPpMlI.png
This is just not good; that's the kind of AI that slows me down and doesn't help at all.
Both were struggling yesterday, with Claude being a bit ahead. Their biggest problems came with being "creative" (their solutions were pretty "stock"), and they had trouble making the simulation.
Tried the same problem on Codex today. The design it came up with still felt a bit lackluster, but it did _a lot_ better on the simulation.
LLM designed UIs will always look generic/stock if you don’t give it additional prompting because of how LLMs work - they’ve memorized certain design patterns and if you don’t specify what you want they will always default to a certain look.
Try adding additional UI instructions to your prompts. Tell it what color scheme you want, what design choices you prefer, etc. Or tell it to scan your existing app’s design and try to match it. Often the results will be much better this way.
I’m imagining if it can navigate the codebase and modify tests - like add new cases or break the tests by changing a few lines. This can actually verify if the tests were doing actual assertions and being useful.
Thorough reviewing like this probably benefits me the most - more than AI assisted development.
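What's described here is roughly mutation testing: deliberately break the code and see whether the tests notice. A toy sketch of the idea, with a hand-written mutant and test suites modelled as plain functions; all names are hypothetical, and a real reviewer agent would generate mutants and run the project's actual test runner.

```typescript
type Impl = (a: number, b: number) => number;

// Function under test and a hand-written mutant of it.
const add: Impl = (a, b) => a + b;
const addMutant: Impl = (a, b) => a - b; // "+" flipped to "-"

// A suite is modelled as a function returning true when all checks pass.
const weakSuite = (impl: Impl): boolean => impl(0, 0) === 0;
const strongSuite = (impl: Impl): boolean =>
  impl(2, 3) === 5 && impl(0, 0) === 0;

// A suite "kills" a mutant if it passes on the original but fails on the mutant.
function killsMutant(
  suite: (impl: Impl) => boolean,
  original: Impl,
  mutant: Impl,
): boolean {
  return suite(original) && !suite(mutant);
}

console.log(killsMutant(weakSuite, add, addMutant));   // false: the mutant survives
console.log(killsMutant(strongSuite, add, addMutant)); // true: the mutant is caught
```

A surviving mutant is the signal the comment is after: the test ran and passed, but it was not asserting anything that depends on the changed line.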
It has a very specific style and if your project isn't in that style, it starts to enforce it -> "making a mess".
I have an extensive array of tripwires, provenance chain verifications and variance checks in my code, and I have to treat Claude as adversarial when I let it touch my research. Not a great sign.
It sounds like it can make simple tasks much more correct. That's impressive to me. Today's coding agents tend to pretend they're working hard by generating lots of unnecessary code. Hope it's true.
I'm happy with medium reasoning. My projects have been in Go, Typescript, React, Dockerfiles, stuff like that. The code almost always works; it's usually not "Clean code" though.
And before someone says it, I do happen to have my own codex like environment complete with development containers, browser, github integration, etc.
And I'm happy to pay a mint for access to the best models.
>For developers using Codex CLI via API key, we plan to make GPT‑5-Codex available in the API soon.
The Codex support at the moment requires adding a comment "@codex review" which then initiates a cloud based review.
You can, however, directly invoke Codex CLI from a GitHub workflow to do things like perform a code review.
Beyond that, purely anecdotal and subjective, but this model does seem to do extensive refactors with semi precise step-by-step guidance a bit faster (comparing GPT-5 Thinking (Medium) and GPT-5 Codex (Medium)), though adherence to prompts seems roughly equivalent between the two as of now. In any case, I really feel they should consider a more nuanced naming convention.
New Claude Sonnet 3.7 was a bit of a blunder, but overall, Anthropic has their marketing in tight order compared to OpenAI. Claude Code, Sonnet, Opus, those are great, clear differentiating names.
Codex meanwhile can mean anything from a service for code reviews with Github integration to a series of dedicated models going back to 2021.
Also, while I do enjoy the ChatGPT app integration for quick on-the-go work made easier with a Clicks keyboard, I am getting more annoyed by the drift between Codex VSCode, Codex Website and Codex in the ChatGPT mobile app. The Website has a very helpful Ask button, which can also be used to launch subtasks via prompts written by the model, but such a button is not present in the VSCode plugin, despite subtasks being something you can launch from the VSCode plugin if you have used Ask via the website first. Meanwhile, the iOS app has no Ask button and no sub task support and neither the app, nor VSCode plugin show remote work done beyond abbreviations, whereas the web page does show everything. Then there are the differences between local and remote via VSCode and the CLI, ... To people not using Codex, this must sound insane and barely understandable, but it seems that is the outcome of spreading yourself across so many fields. CLI, dedicated models, VSCode plugin, mobile app, code review, web page, some like Anthropic only work on one or two, others like Augment three, but no one else does that much, for better and worse.
I like using Codex, but it is a mess with such massive potential that it needs a dedicated team lead whose only focus is to untangle this mess before adding more features. Alternatively, maybe interview a few power users on their actual day-to-day experience, those that aren't just in one, but are using multiple or all parts of Codex. There is a lot of insight to be gained from someone who has an overview of the entire product stack, I think. Sending out a questionnaire to top users would be a good start; I'd definitely answer.
Ffs...
Regardless of that, Codex needs to both come to Android for parity and the app features need to be expanded towards parity with the web page.
[0] https://searchengineland.com/report-75-percent-of-googles-mo...
npm ERR! code 1
npm ERR! path /usr/local/lib/node_modules/@openai/codex/node_modules/@vscode/ripgrep
npm ERR! command failed
npm ERR! command sh -c node ./lib/postinstall.js
npm ERR! /usr/local/lib/node_modules/@openai/codex/node_modules/@vscode/ripgrep/lib/download.js:199
npm ERR! zipFile?.close();
It is inevitable.