> GPT‑5.5 improves on GPT‑5.4’s scores while using fewer tokens.
This might be great if it translates to agentic engineering and not just benchmarks.
It seems some of the gains from Opus 4.6 to 4.7 required more tokens, not less.
Maybe more interesting is that they’ve used Codex to improve model inference latency. IIRC this is a new (and expectedly larger) pretrain, so it’s presumably slower to serve.
With Opus it’s hard to tell what was due to the tokenizer changes. Maybe using more tokens for the same prompt means the model effectively thinks more?
Some communities are very pro-AI, adding AI summary comments to each thread, encouraging AI-written posts, etc.[0]
Many subreddits are AI cautious[1][2], and a subset of those are fully anti-AI[3].
Apart from these "AI-focused" communities, it seems each "traditional" subreddit sits somewhere on the spectrum (photographers dealing with AI skepticism of their work[4], programmers mostly like it but still skeptical[5]).
Reddit (and more generally, human) groupthink in a nutshell. "Quick, clearly position yourself on this one-dimensional line (or, even better, sort yourself into one of these two sets) so we don't have to engage in that pesky nuance thing!"
It’s easier to view it in terms of DCF: the value of a cash-flow-generating asset is the present value of its expected cash flows, discounted back at a risk-adjusted rate. In other words, what you’ve invested in your existing assets is irrelevant; the cash flows generated by them, and by the growth assets acquired through future investment, are what matters.
Not to be that agentic coding guy, but I think this will become less of a problem than our historic biases suggest.
For context, I just built a streaming markdown renderer in Swift because there wasn’t an existing open source package that met my needs, something that would have taken me weeks/months previously (I’m not a Swift dev).
Porting all the C libraries you need isn’t necessarily an overnight task, but it’s no longer an insurmountable mountain in terms of dev time.
It's not necessary to rewrite perfectly fine libraries written by exceptional programmers. And whoever thinks it's an easy task (sorry, Rust guys) is severely suffering from the Dunning–Kruger effect.
Nah. We’re right on the money with this one. AI is a nice tool to have available, but you AI nuts are the ones being voluntarily and gladly fed the whole “you’re a bazillion times more productive with our AI!!!!” marketing spiel.
It’s a nice tool, nothing more, nothing less. Anything else is marketing nonsense.
> In a few rare instances during internal testing (<0.001% of interactions), earlier versions of Mythos Preview took actions they appeared to recognize as disallowed and then attempted to conceal them.
> after finding an exploit to edit files for which it lacked permissions, the model made further interventions to make sure that any changes it made this way would not appear in the change history on git
My issue with this is that a simple design can set you up for failure if you don’t foresee and account for future requirements.
Every abstraction adds some complexity. So maybe the PoC skips all abstractions. Then we need to add a variant to something. Well, a single if/else is simpler than an abstract base class with two concrete implementations. Adding the 3rd as another if clause is simpler than refactoring all of them to an ABC structure. And so on.
“Simple” is relative. Investing in a little complexity now can save your ass later. Weighing this decision takes skill and experience
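A hedged sketch of that trade-off, with invented names and formats just for illustration: two variants handled by a plain if/else, versus the same two behind an abstract base class.

```python
import json
from abc import ABC, abstractmethod

# Variant 1: the "simple" starting point -- two cases, one if/else.
def export(report, fmt):
    if fmt == "csv":
        return ",".join(report)
    else:  # json
        return json.dumps(report)

# Variant 2: the same two cases behind an ABC. More moving parts now,
# but a third or fourth format slots in without touching existing code.
class Exporter(ABC):
    @abstractmethod
    def export(self, report): ...

class CsvExporter(Exporter):
    def export(self, report):
        return ",".join(report)

class JsonExporter(Exporter):
    def export(self, report):
        return json.dumps(report)
```

Both produce identical output for the two existing cases; the question is which one you want to be holding when the fifth case arrives.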
I think what matters more than the abstract-class-vs-if-statement dichotomy is how well something maps to the problem domain's data structures and flows.
Sure, maybe it's fast to write that simple if statement, but if it doesn't capture the deeper problem you'll just keep running headfirst into edge cases - whereas if you're modelling the problem well, each new case comes as a natural extension of the code with very little tweaking _and_ it covers all edge cases in a clean way.
I’m aware I’m about to be “that guy”, but I really like how Rich Hickey’s “Simple Made Easy” clarifies simplicity here. In that model, what you’re describing is easy, not simple.
The same people who pursue economic incentives are the ones I hear talking about lines of code produced as a useful developer metric. I sense a worrying trend toward "more is better" with respect to output, when the north star IMHO should be to make something only as complex as necessary and as simple as possible. The best code is no code at all.