> GPT‑5.5 improves on GPT‑5.4’s scores while using fewer tokens.
This might be great if it translates to agentic engineering and not just benchmarks.
It seems some of the gains from Opus 4.6 to 4.7 required more tokens, not less.
Maybe more interesting is that they’ve used Codex to improve model inference latency. IIRC this is a new (and expectedly larger) pretrain, so it’s presumably slower to serve.
With Opus it’s hard to tell what was due to the tokenizer changes. Maybe using more tokens for the same prompt means the model effectively thinks more?
Some communities are very pro-AI, adding AI summary comments to each thread, encouraging AI-written posts, etc.[0]
Many subreddits are AI cautious[1][2], and a subset of those are fully anti-AI[3].
Apart from these "AI-focused" communities, it seems each "traditional" subreddit sits somewhere on the spectrum (photographers dealing with AI skepticism of their work[4], programmers mostly like it but still skeptical[5]).
Reddit (and more generally, human) groupthink in a nutshell. "Quick, clearly position yourself on this one-dimensional line (or, even better, sort yourself into one of these two sets) so we don't have to engage in that pesky nuance thing!"
It’s easier to view it in terms of DCF: the value of a cash-flow-generating asset is the present value of its expected cash flows, discounted back at a risk-adjusted rate. In other words, what you’ve invested in your existing assets is irrelevant; the cash flows generated by them, and by the growth assets acquired through future investment, are what matters.
Not to be that agentic coding guy, but I think this will become less of a problem than our historic biases suggest.
For context, I just built a streaming markdown renderer in Swift because there wasn’t an existing open source package that met my needs, something that would have taken me weeks/months previously (I’m not a Swift dev).
Porting all the C libraries you need isn’t necessarily an overnight task, but it’s no longer an insurmountable mountain in terms of dev time.
It's not necessary to rewrite perfectly fine libraries written by exceptional programmers. And whoever thinks it's an easy task (sorry, Rust guys) is severely suffering from the Dunning–Kruger effect.
Nah. We’re right on the money with this one. AI is a nice tool to have available, but you AI nuts are the ones being voluntarily and gladly fed the whole “you’re a bazillion times more productive with our AI!!!!” marketing spiel.
It’s a nice tool, nothing more, nothing less. Anything else is marketing nonsense.
> In a few rare instances during internal testing (<0.001% of interactions), earlier versions of Mythos Preview took actions they appeared to recognize as disallowed and then attempted to conceal them.
> after finding an exploit to edit files for which it lacked permissions, the model made further interventions to make sure that any changes it made this way would not appear in the change history on git
My issue with this is that a simple design can set you up for failure if you don’t foresee and account for future requirements.
Every abstraction adds some complexity. So maybe the PoC skips all abstractions. Then we need to add a variant to something. Well, a single if/else is simpler than an abstract base class with two concrete implementations. Adding the 3rd as another if clause is simpler than refactoring all of them to an ABC structure. And so on.
“Simple” is relative. Investing in a little complexity now can save your ass later. Weighing this decision takes skill and experience
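A hedged sketch of that trade-off, with invented names and formats just for illustration: two variants handled by a plain if/else, versus the same two behind an abstract base class.

```python
import json
from abc import ABC, abstractmethod

# Variant 1: the "simple" starting point -- two cases, one if/else.
def export(report, fmt):
    if fmt == "csv":
        return ",".join(report)
    else:  # json
        return json.dumps(report)

# Variant 2: the same two cases behind an ABC. More moving parts now,
# but a third or fourth format slots in without touching existing code.
class Exporter(ABC):
    @abstractmethod
    def export(self, report): ...

class CsvExporter(Exporter):
    def export(self, report):
        return ",".join(report)

class JsonExporter(Exporter):
    def export(self, report):
        return json.dumps(report)
```

Both produce identical output for the two existing cases; the question is which one you want to be holding when the fifth case arrives.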
I think what matters more than the abstract-class-vs-if-statement dichotomy is how well something maps to the problem domain's data structures and flows.
Sure, maybe it's fast to write that simple if statement, but if it doesn't capture the deeper problem you'll just keep running headfirst into edge cases - whereas if you're modelling the problem well, each new case comes as a natural extension of the code with very little tweaking _and_ it covers all edge cases in a clean way.
I’m aware I’m about to be “that guy”, but I really like how Rich Hickey’s “Simple Made Easy” clarifies simplicity here. In that model, what you’re describing is easy, not simple.
The same people who pursue economic incentives are the ones I hear talking about lines of code produced as a useful developer metric. I sense a worrying trend toward "more is better" with respect to output, when the north star IMHO should be to make something only as complex as necessary and as simple as possible. The best code is no code at all.