More

dmazin · 2026-04-24T19:07:00 1777057620

Has anyone actually heard Eno at the airport? What is it like? Does it actually calm you?

cholantesh · 2026-04-24T19:41:12 1777059672

I was hoping to see discussion of this - to my knowledge it was sold to a few airports who removed it after it was poorly received: https://www.cambridge.org/core/journals/twentieth-century-mu...

Personally 1/1 has been absolutely sublime for me as a tool for meditation, but I don't know that I could imagine it in an airport.

dmazin · 2026-04-25T07:31:05 1777102265

Yeah, 1/1 is one song I always keep downloaded to my phone for this reason!

Thanks for that article, love to read about well intended design being poorly received.

cholantesh · 2026-04-27T20:15:06 1777320906

Enjoy; I especially liked learning of the 'rebuttal' album that the Black Dog released lol

have_faith · 2026-04-24T19:42:53 1777059773

No, but I’ve heard Aphex Twin in an aquarium once. Bristol (UK) for anyone interested, which fits.

dmazin · 2026-04-25T07:27:43 1777102063

Was it in that kind of touristy area filled with children? I didn’t think to go in.

Is this a regular thing?!

dmazin · 2026-04-21T05:32:52 1776749572

I got sick of the inconsistency caused by Anthropic tinkering with Claude Code and had canceled my 20x. My plan was to switch to Codex so I could use it in Pi.

I am specifically talking about switching because of the harness, not model quality. Anyone else match my experience?

I wonder how many other people recently did the same. It would be prudent of Anthropic to let people use Pro/Max OAuth tokens with other harnesses I think. Even though I get why they want to own the eyeballs.

redrove · 2026-04-21T05:41:26 1776750086

I’ve been using Codex Pro since they lobotomized Opus 4.6. Codex is so much better, GPT 5.4 xhigh fast is definitely the smartest and fastest model available.

For a while there I had both Opus 4.6 and Codex access and I frequently pitted them against each other, I never once saw Opus come out ahead. Opus was good as a reviewer though, but as an implementer it just felt lazy compared to 5.4 xhigh.

One feature that I haven’t seen discussed that much is how codex has auto-review on tool runs. No longer are you a slave to all or nothing confirmations or endless bugging, it’s such a bad pattern.

Even in a week of heavy duty work and personal use I still haven’t been able to exhaust the usage on the $200 plan.

I’ll probably change my mind when (not IF) OpenAI rug pull, but for spring ‘26, codex is definitely the better deal.

walthamstow · 2026-04-21T07:12:28 1776755548

I also made the switch to OpenAI, the $20 plan, I dunno about "so much better" but it's more or less the same, which is great!

The models and tools levelling out is great for users because the cost of switching is basically nil. I'm reading people ITT saying they signed up for a year - big mistake. A year is a decade right now.

redrove · 2026-04-21T07:54:15 1776758055

I underscored using xhigh + fast mode when saying it’s so much better.

Now with Opus 4.7 of course the “burden” of adjusting reasoning effort has been taken away from you even at the API level.

In my experience people don’t change the thinking level at all.

sitkack · 2026-04-21T08:25:27 1776759927

What issues did you consider about sending your code base to OpenAI?

walthamstow · 2026-04-21T10:31:03 1776767463

None mate. Code is cheap, it's not worth anything any more, especially not my little personal projects

Scotchy · 2026-04-21T05:45:00 1776750300

Any alternative to Claude Design ? Tried Figma with Opus 4.6 but it doesn't come close in my experience.

Codex is abysmal for UI design imo.

dgb23 · 2026-04-21T05:57:48 1776751068

It really depends on what you‘re trying to do and what your skillset is.

But if you go information architecture first and have that codified in some way (espescially if you already have the templates), then you can nudge any agent to go straight into CSS and it will produce something reasonable.

joelmanner · 2026-04-21T14:38:32 1776782312

I've been using paper.design and it's been working well for me via mcp on claude code

makingstuffs · 2026-04-21T07:41:39 1776757299

Have you tried stitch.withgoogle.com?

freedomben · 2026-04-21T09:58:59 1776765539

Thanks for the tip! Hadn't seen that, but definitely giving it a try.

gbalduzzi · 2026-04-21T06:06:58 1776751618

I created some decent prototypes with stitch but I don't know how it compares to claude design

freedomben · 2026-04-21T09:58:33 1776765513

stitch.withgoogle.com

StrangeSound · 2026-04-21T09:59:50 1776765590

Google Stitch

tommica · 2026-04-21T05:37:01 1776749821

I left anthropic a while ago because of the similar shenanigans they had earlier. I went with opencode & zen.

I still have their subscription, but am using pi now, mainly because something happened that made my opencode sessions unusable (cannot continue them, just blanks out, I assume something in the sqlite is fucked), and I cannot be bothered to debug it.

For what I use the agents, the Chinese models are enough

hboon · 2026-04-21T05:42:17 1776750137

Doesn't using pi be against their terms of use about having to go through Claude Code cli for all Max plan usage? (I had use Droid with Max previously, it was a great combo).

the_mitsuhiko · 2026-04-21T08:22:53 1776759773

It's unclear right now. The current stance is that using pi or other coding harnesses eats into extra usage and that is the behavior one sees today. We have added a hint to pi now that warns you when you use an anthropic sub.

hboon · 2026-04-21T08:33:28 1776760408

Thanks for the great work.

tommica · 2026-04-21T06:21:46 1776752506

Probably - it was that kind of confusion that resulted in me switching providers.

Plus I like being able to switch a model.

resonious · 2026-04-21T05:46:59 1776750419

I also cancelled my 20x and switched to Codex. At this point even the Codex CLI seems to perform better than Claude Code... And so far I'm on the OpenAI Pro plan and haven't even needed to upgrade to their $100/mo plan. I'm getting more value for almost 10x cheaper.

hboon · 2026-04-21T06:17:06 1776752226

I switched to Droid+Opus (with Claude Max) many months ago and it was my favorite combo.

Had to stop because they don't like us proxying requests anymore.

serial_dev · 2026-04-21T06:16:58 1776752218

My experience is the opposite of this thread's consensus. Context: Full time SWE, working on large and messy codebase. Not working on crazy automations, working on fixing bugs, troubleshooting crashes, implementing features.

Anthropic models write much better code, they are easy to follow, reasonable and very close to what I would done if I had the time... OpenAI's on the other hand generate extremely complex solutions to the simplest problems.

I was so disappointed by non-Anthropic models, that for a couple of weeks I only used Anthropic models, but based on this thread, I'll go back and give it another try. It's good to go back and try things again every couple of weeks.

Of course, I was annoyed that they lobotomized 4.6, the difference was day and night, and Anthropic is certainly not a company I trust. In my opinion, it shows their willingness to rugpull, so I'm looking at other approaches. Since 4.7, things went back to normal, things you'd expect to work just work.

yokoprime · 2026-04-21T20:26:31 1776803191

I feel like Opus 4.7 vs GPT 5.4 is pretty much just flavor variants, the big difference is in the harness. I like the Claude Code CLI better than the Codex CLI, it just clicks with how I like to interact with agents. The codex app on the other hand is better than the Claude app in code view, so if I had to stick to an app it would be codex all the way.

athrowaway3z · 2026-04-21T16:25:02 1776788702

I've been on pi for a few months now, build a custom tmux plugin so i can use nested pi and mix and match codex / claude instances.

pi has been the better harness out of all the ones i tried, first and third party.

Ever since the Anthropic block i've just canceled all my claude subs. Used to be codex was a bit worse, now they're practically equal. Claude is slightly better at directing other agents but the difference is too minor and not worth the money.

Claude usage limits / costs are absurd.

Any 'principles' people praise anthropic for are not that relevant to me anyways because i'm not a US citizen.

ai-tamer · 2026-04-21T13:50:05 1776779405

(Disclosure: I work on tamer, an OSS supervisor for coding agents — biased.) Add one more to the count. The OAuth-across-harnesses idea would help, but it doesn't fix the shape of the problem. "Harness" has always felt off to me. Exoskeleton is closer — Claude Code, Codex, opencode wrap the model and augment it from the inside. What's missing is a layer above that's explicitly not an exoskeleton: a thin supervisor. A master that watches and guides, nothing more. It just relays I/O and hands approval back to the human.

uvu · 2026-04-21T06:30:26 1776753026

Same, I am from 5x plan and cancel and switched to codex as I want to use Pi.

KronisLV · 2026-04-21T06:48:44 1776754124

> I wonder how many other people recently did the same.

Some negative signal for better overall view on things: I'm still with Anthropic and will probably stay with them for the foreseeable future.

I think after DoD/DoW shenanigans (which in of itself felt like a reasonable take on the part of Anthrpic) they got a bunch of visibility and new users, so them hitting some scaling limits is pretty much inevitable - so some service disruption is inevitable. Couple this with the tokenizer changes and seeming decrease in model performance (adaptive thinking etc.), and lots of people will be rightfully pissed off, alongside increased downtime (doesn't matter that much for me, definitely does matter for anything time-sensitive).

At the same time, in practice I've only seen it do stupid things across 8 million tokens about 5 times (confusing user/assistant roles, not reading files that should be obvious for a given use case, and picking trivially wrong/stupid solutions when planning things), alongside another 4 times that tests/my ProjectLint tool caught that I would have missed. The error rate is still arguably lower than mine, though I work in a very well known and represented domain (webdev with a bunch of DevOps and also some ML stuff, and integration with various APIs etc.).

At the same time, the 85 EUR they gave to me for free has been enough to weather the instability in regards to pricing changes and peak usage. They've fixed most of the issues I had with Claude Code (notably performance), and the sub-agent support is great and it's way better than OpenCode in my experience. They also keep shipping new features that are pretty nice, like Dispatch and Routines and Design, those features also seem nice and not like something completely misdirected, so that's nice. The Opus 4.7 model quality with high reasoning is actually pretty nice as well and works better than most of the other models I've tried (OpenAI ones are good, I just prefer Claude phrasing/language/approaches/the overall vibe, not even sure what I'd call it exactly, all the stuff in addition to the technical capabilities).

At the same time, if they mess too much with the 100 USD tier, I bet I could go to OpenAI or try out the GLM 5.1 subscription without too many issues. For now they're replacing all the other providers for me. Oh also I find the subscription vs API token-based payment approach annoying, but I guess that's how they make their money.

benjx88 · 2026-04-21T05:58:39 1776751119

Because the Harness is the Moat and key IP not the Models themselves that is the why! now for both OpenAI and Anthropic with all their money raised and the compute they acquire and have in the books of course no one can easily replicate, whom can afford all those datacenters and Nvidia GPUs interconnected is why OpenAI throws you a bone and gives you an Open Source SDK Harness but not the one they actually use for ChatGPT. But now both of them have to deliver and do all the bull-shet they said this models can do... truth is they cannot. So now the bubbles burst and we will see what happens. We all have to buy iPhones or MacBooks so that makes sense, we all use Chrome or Google Search, Instagram, TikTok.

All these models and agents are shortcuts for all of us to be lazy and play games and watch YouTube or Netflix because we use them to work-less, well the party will be over soon.

dmazin · 2026-04-17T16:39:58 1776443998

Agreed.

That said if this bothers you I highly recommend not looking up how many Space Shuttle missions are classified.

dmazin · 2026-04-16T21:45:27 1776375927

Constraints can lead to innovation. Just two things that I think will get dramatically better now that companies have incentive to focus on them:

* harness design

* small models (both local and not)

I think there is tremendous low hanging fruit in both areas still.

com2kid · 2026-04-16T22:01:42 1776376902

China already operates like this. Low cost specialized models are the name of the game. Cheaper to train, easy to deploy.

The US has a problem of too much money leading to wasteful spending.

If we go back to the 80s/90s, remember OS/2 vs Windows. OS/2 had more resources, more money behind it, more developers, and they built a bigger system that took more resources to run.

Mac vs Lisa. Mac team had constraints, Lisa team didn't.

Unlimited budgets are dangerous.

tasoeur · 2026-04-17T08:04:10 1776413050

Though I do agree with you, I just came back from a trip to China (Shanghai more specifically) and while attending a couple AI events, the overwhelming majority of people there were using VPNs to access Claude code and codex :-/

coldtea · 2026-04-17T09:34:49 1776418489

Parent's point was about deployment, not agentic coding.

jeffhwang · 2026-04-17T15:07:29 1776438449

On the Mac vs Lisa team, I generally agree but wasn't there a strong tension on budget vs revenue on Mac vs Apple II? And that Apple II had even more constrained budget per machine sold which led to the conflict between Mac and Apple II teams. (Apple II team: "We bring in all the revenue+profit, we offer color monitors, we serve businesses and schools at scale. Meanwhile, Steve's Mac pirate ship is a money pit that also mocks us as the boring Navy establishment when we are all one company!")

By the logic of constraints (on a unit basis), Apple II should have continued to dominate Mac sales through the early 90s but the opposite happened.

phist_mcgee · 2026-04-17T06:59:20 1776409160

Perhaps its because american hyperscalers want unlimited upside for their capital?

jackcviers3 · 2026-04-18T10:01:05 1776506465

It has been a very bad bet that hardware will not evolve to exceed the performance requirements of today's software tomorrow, just as it is a bad bet that tomorrow someone will rewrite today's software to be slower.

yurishimo · 2026-04-18T10:11:24 1776507084

Eh, but then as hardware evolves, the software will also follow suit. We’ve had an explosion of compute performance and yet software is crawling for the same tasks we did a decade ago.

Better hardware ensures that software that is “finished” today will run at acceptable levels of performance in the future, and nothing more.

I think we won’t see software performance improve until real constraints are put on the teams writing it and leaders who prioritize performance as a North Star for their product roadmap. Good luck selling that to VCs though.

busfahrer · 2026-04-17T09:16:27 1776417387

> Low cost specialized models

Can you elaborate on this? Is this something that companies would train themselves?

tempoponet · 2026-04-17T16:19:07 1776442747

You can fine-tune a model, but there are also smaller models fine-tuned for specific work like structured output and tool calling. You can build automated workflows that are largely deterministic and only slot in these models where you specifically need an LLM to do a bit of inference. If frontier models are a sledgehammer, this approach is the scalpel.

A common example would be that people are moving tasks from their OpenClaw setup off of expensive Anthropic APIs onto cheaper models for simple tasks like tagging emails, summarizing articles, etc.

Combined with memory systems, internal APIs, or just good documentation, a lot of tasks don't actually require much compute.

cesarvarela · 2026-04-16T22:05:03 1776377103

Harness is a big one, Claude Code still has trouble editing files with tabs. I wonder how many tokens per day are wasted on Claude attempting multiple times to edit a file.

lpcvoid · 2026-04-17T06:37:15 1776407835

The future is now, I guess

aldanor · 2026-04-17T12:20:08 1776428408

Yep.

As a recent example in AI space itself. China had scarce GPU resources, quite obvious why => DeepSeek training team had to invent some wheels and jump through some hoops => some of those methods have since become 'industry standard' and adopted by western labs who are now jumping through the same hoops despite enjoying massive computeresources, for the sake of added efficiency.

drra · 2026-04-17T07:23:32 1776410612

Absolutely. Anyone working on inference token level knows how wasteful it all is especially in multimodal tokens.

christkv · 2026-04-17T06:12:00 1776406320

Could not agree more, this will spur innovation in all aspects of local models is my hunch.

dataviz1000 · 2026-04-16T22:14:41 1776377681

What do you mean by harness here?

Ifkaluva · 2026-04-16T22:22:05 1776378125

When you go to the command line and type “Claude”, there is an LLM, and everything else is the harness

dataviz1000 · 2026-04-16T23:03:52 1776380632

I'm having an hard time getting my mind to see this.

> Users should re-tune their prompts and harnesses accordingly.

I read this in the press release and my mind thought it meant test harness. Then there was a blog post about long running harnesses with a section about testing which lead me to a little more confusion.

Yes, the word 'harness' is consistently used in the context as a wrapper around the LLM model not as 'test harness'.

dboreham · 2026-04-17T07:11:01 1776409861

This field is chock full of people using terms incorrectly, defining new words for things that already had well known names, overloading terms already in use. E.g. shard vs partition. TUI which already meant "telephony user interface ". "Client" to mean "server" in blockchain.

suttontom · 2026-04-18T15:23:01 1776525781

Some people also call evaluations "tests". There are unexpected things that come along with new models, like the model in a workflow you'd set up suddenly starts calling a tool and never stops or decides to no longer call a particular tool, so running your existing evaluations to catch regressions like this and potentially updating the prompts is considered "testing" your prompts and harnesses.

kreig · 2026-04-17T18:57:59 1776452279

I understood this concept with this simple equation: Agent = LLM + harness

ElFitz · 2026-04-17T08:35:50 1776414950

It’s the tool that calls the model, give it access to the local file system, calls the actual tools and commands for the model, etc, and provide the initial system prompt.

Basically a clever wrapper around the Anthropic / OpenAI / whatever provider api or local inference calls.

codybontecou · 2026-04-16T22:29:01 1776378541

pi vs. claude code vs. codex These are all agent harnesses which run a model (in pi's case, any model) with a system prompt and their own default set of tools.

dmazin · 2026-04-01T17:43:18 1775065398

why do half the comments here read like ai trying to boost some sort of scam?

Capricorn2481 · 2026-04-01T22:17:22 1775081842

Because there's absolutely nothing stopping that from happening. There are bots on Reddit, there are of course bots on here, a VPN friendly site where you don't even need an email. But a lot of people don't want to admit it.

dmazin · 2026-03-31T07:09:52 1774940992

Maciej now has a Mars newsletter, which I obviously subscribed to immediately: https://mceglowski.substack.com/

I didn’t even have a strong interest in space before the dude started writing about it. Maciej could write about literal rocks and make it worthwhile to read.

cubefox · 2026-03-31T08:00:34 1774944034

I just read one blog post ("Musk on Mars") and it was indeed excellent. He seems to have quite a small readership though, judging from the Substack reactions.

decimalenough · 2026-03-31T08:23:29 1774945409

It's subscribers only and costs $5/month.

cubefox · 2026-03-31T13:42:48 1774964568

Yeah, though some posts a free. I think real problem is that he decided to start a Mars blog two weeks before SpaceX announced they are now focusing on the moon instead, and prior to that merging with xAI, effectively cancelling any Mars plans.

dmazin · 2026-03-27T12:44:15 1774615455

Well, it should only update what it says: security updates (from official Ubuntu sources) unless you change the configuration.

dmazin · 2026-03-19T20:09:58 1773950998

This is a lot less of a story than it seems.

It makes it sound like a rogue AI hacked Meta.

Instead, the "wild" thing here is that someone let an agent speak on their behalf with no review. The agent posted inaccurate instructions which someone else followed.

Those instructions lead to a brief gap in internal ACL controls, sounds like. I'm sorry, but given that the US government gave 14 year olds off incel Discords full access to Social Security data, this is not shocking by comparison.

To be clear, it is dumb and rude to let an agent speak on your behalf _without even reviewing it_.

This will eventually lead to a bigger snafu, of course. Security teams should control or at least review the agent permissions of every installation. Everyone is adopting this stuff, and a whole lot of people are going to set it up lazily/wrong (yolo mode at work).

BoneShard · 2026-03-19T22:24:28 1773959068

Yeah, a nothingburger for clicks.

dmazin · 2026-02-16T07:41:02 1771227662

Me: “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?” Opus 4.6, without searching the web: “Drive. You’re going to a car wash. ”

dmazin · 2026-02-12T16:00:24 1770912024

If you can get your hands on it, I recommend Other Networks: A Radical Technology Sourcebook by the same author. She covers barbed wire as well as many other ways to communicate. The book itself is gorgeous.

Luc · 2026-02-12T16:08:08 1770912488

It's on archive.org: https://archive.org/details/emerson-lori-other-networks-a-ra...