At least in elementary school I don't see the deficiency in common core math compared to what I had 30 years ago. My kid has been exposed to a wide variety of topics sooner than I was, and she's way stronger in word problems on top of that. Do people have a specific complaint with elementary school common core math that we should be teaching but aren't, or vice versa? Or is it more problematic later?
One thing I notice is there seem to be far more students who finish elementary school unable to comfortably do basic math in their head (stuff like 17+36 or 14×4, or even basic multiplication-table facts like 3×8).
I've been working on a client/server game in Unity the past few years and the LLM constantly forgets to update parts of the UI when I have it make changes. The codebase isn't even particularly large, maybe around 150k LOC in total.
A single complex change (defined as 'touching many parts') can take Claude Code a couple of hours. I could probably do it in a couple of hours myself, but I can have Claude do it (while I steer it) while I also think about other things.
My current guess is that LLMs are really good at web code because they've seen a shitload of it. My experience in arenas where there's less open source code has been less magical.
This is where the old line of "LLMs are just next token predictors" actually factors in. I don't know how you get a next token predictor that user input can't break out of. The answer is for the implementer to split out what they can and run pre/post validation. But I highly doubt it will ever be 100%; it's fundamental to the technology.
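To make the pre/post validation idea concrete, here's a minimal sketch. Everything in it is assumed for illustration: the injection patterns, the `CALL_TOOL(...)` output convention, and the allowlist are all made up, and real systems use far more sophisticated (and still imperfect) heuristics.

```python
import re

# Hypothetical pre-filter: flag user input that looks like an injection
# attempt before it ever reaches the model.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"reveal (the|your) system prompt", re.I),
]

def pre_validate(user_input: str) -> bool:
    """Return True if the input passes the (crude) injection heuristics."""
    return not any(p.search(user_input) for p in INJECTION_PATTERNS)

def post_validate(model_output: str, allowed_tools: set[str]) -> bool:
    """Hypothetical post-filter: reject outputs that try to invoke a tool
    outside an allowlist. The CALL_TOOL(name) syntax is an assumption."""
    for match in re.finditer(r"CALL_TOOL\((\w+)\)", model_output):
        if match.group(1) not in allowed_tools:
            return False
    return True
```

Note that neither filter controls what the model actually generates; they only inspect text on the way in and out, which is exactly why this can't be 100%.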
I think this is fundamental to any technology, including human brains.
Humans have a problem distinguishing "John from Microsoft" from somebody just claiming to be John from Microsoft. The reason why scamming humans is (relatively) hard is that each human is different. Discovering the perfect tactic to scam one human doesn't necessarily scale across all humans.
LLMs are the opposite; my ChatGPT is (almost) the same as your ChatGPT. It's the same model with the same system message; only the contexts differ. This makes LLM jailbreaks a lot more scalable, and hence a lot more worthwhile to discover.
LLMs are also a lot more static. With people, we have the phenomenon of "banner blindness", which LLMs don't really experience.
So people can withdraw their attention from parts of content, specifically parts they find irrelevant or adversarial (like ads). LLMs, on the other hand, pay attention to everything, and if they do focus on something, it is hard to steer them away from irrelevant or adversarial parts.
Banner blindness is a phenomenon where humans build resistance to previously effective ad formats, making them far less effective than they once were.
You can find a "hook" to effectively manipulate people with advertising, but that hook gets less and less effective as it is exploited. LLMs don't have this property, except across training generations.
Maybe it's my failing but I can't imagine what that would look like.
Right now, you train an LLM by showing it lots of text and telling it to come up with the best model for predicting the next word in any of that text, as accurately as possible across the corpus. Then you give it a chat template so it predicts what an AI assistant would say. Do some RLHF on top of that and you have Claude.
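As a toy illustration of "predict the next word": a bigram model built by counting stands in here for the transformer's learned distribution. Real LLMs learn this mapping over subword tokens with gradient descent, not by counting words, so this is only a sketch of the objective, not the mechanism.

```python
from collections import Counter, defaultdict

# Tiny "corpus"; a real training set is trillions of tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent next word seen after `word` in training."""
    return counts[word].most_common(1)[0][0]
```

The chat-template step then amounts to wrapping the conversation in a fixed text format so that "what the assistant says next" is just more next-token prediction.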
What would a model with multiple input layers look like? What is it training on, exactly?
It's hard in general, but for instruct/chat models in particular, which already assume a turn-based approach, could they not use a special token that switches control from LLM output to user input? The LLM architecture could be made so it's literally impossible for the model to even produce this token. In the example above, the LLM could then recognize this is not a legitimate user input, as it lacks the token. I'm probably overlooking something obvious.
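A minimal sketch of that idea, with everything (the reserved token ID, the vocabulary, the names) made up for illustration: the user-facing encoder simply has no path that produces the reserved ID, and only the framework can emit it. In a real model you'd additionally mask the corresponding logit so the model can't generate it either.

```python
# Hypothetical reserved control token: marks "control passes to user input".
USER_TURN_ID = 0
VOCAB = {"hello": 1, "world": 2, "<unk>": 3}

def encode_user_text(text: str) -> list[int]:
    """Encode user input. Anything not in the vocab, including a user
    literally typing a control marker, falls through to <unk>; there is
    no way for this function to emit USER_TURN_ID."""
    return [VOCAB.get(w, VOCAB["<unk>"]) for w in text.split()]

def build_prompt(user_text: str) -> list[int]:
    # Only the framework prepends the reserved ID.
    return [USER_TURN_ID] + encode_user_text(user_text)
```

The catch, as the sibling comments note, is that tagging the boundary doesn't force the model to *respect* it; that part is still just weights.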
Yes, and as you'd expect, this is how LLMs work today, in general, with control codes. But different models use different control codes for different purposes, such as separating the system prompt from the user prompt.
But even if you tag inputs, however good your tagging is, you can't force an LLM not to treat input type A as input type B; all you can do is weight against it. LLMs have no rules, only weights. Pre and post filters can try to help, but they can't directly control the LLM's text generation; they can only analyze and filter inputs/outputs using their own heuristics.
I wouldn't personally do so, but arguably those tens of thousands rest at our feet, considering the current government is political blowback from the US and UK regime-changing Iran back in the '50s.
It's even less likely to work because Trump has already claimed, publicly, to be arming the protestors. That alone makes any regime change look illegitimate: they can all be painted as foreign-backed agitators.
Bad code works fine until it doesn't. In my experience, with humans, doing the right thing is worth it over doing the bad thing if your time horizon is a few months. Once you're in years, absolutely do the right thing, you're actually throwing time away if you don't. And I don't mean "big refactor", I mean at-change-time, when you think "this change feels like an icky hack."
For LLMs, I don't really know. I only have a couple years experience at that.
It's similar to writing. Most people suck at writing so badly that the LLM/AI writing is almost always better when writing is "output".
Code is similar. Most programmers suck at programming so badly that LLM/AI production IS better than 90%+ of them (possibly 99%+). Remember, a huge number of programmers can't pass FizzBuzz. So, if you demand "output", Claude is probably better than most of your (especially enterprise) programming team.
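For anyone who hasn't seen it, FizzBuzz is the whole interview question: print Fizz for multiples of 3, Buzz for multiples of 5, FizzBuzz for both, and the number otherwise. One common solution:

```python
def fizzbuzz(n: int) -> str:
    """Classic screening question: multiples of 3 -> Fizz, of 5 -> Buzz,
    of both -> FizzBuzz, otherwise the number itself."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

for i in range(1, 16):
    print(fizzbuzz(i))
```

That this filters out a large fraction of applicants is the whole point of the question.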
The problem is that the Claude usage flood is simply identifying the fact that things that work do so because there is a competent human somewhere in the review pipeline who has been rejecting the vast majority of "output" from your programming team. And he is now overwhelmed.
Because of just how many programmers I've interviewed who can't pass FizzBuzz?
I also taught upper level CS and my first assignment was always "You have 10 days. Here is a 10 line program on this sheet of paper. Type it in, check it into source control, and make the automated tests go green. Warning: start today."
1/3 of the class couldn't finish that task and would drop.
"Perfectly implements" is doing a lot of work there. Enterprise software is very rarely perfect out of the box, and the issue with bad code is that it can make it extraordinarily hard to solve simple problems. I have personally seen tech-debt induced scenarios where "I want a new API to edit this field in an object" and "Let's do a dependency upgrade" respectively became multi-month projects.
> "Perfectly implements" is doing a lot of work there. Enterprise software is very rarely perfect out of the box
Fair; by “perfectly implements” I meant that it correctly implemented the core invariant of a double-entry ledger (debits = credits), not that it was 100% bug-free.
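That invariant is small enough to state in code. A minimal sketch, with the entry shape (`side`/`amount` fields, integer cents) assumed for illustration rather than taken from any real ledger:

```python
def is_balanced(entries: list[dict]) -> bool:
    """Core double-entry invariant: within one transaction,
    total debits must equal total credits (amounts in integer cents)."""
    debits = sum(e["amount"] for e in entries if e["side"] == "debit")
    credits = sum(e["amount"] for e in entries if e["side"] == "credit")
    return debits == credits
```

Getting this one property right is a much weaker claim than "bug-free", which was the point.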
Since most devs won't actually deal with fintech (I don't know the stats on HN, but I'm talking about devs as one industry), your first "a" example might actually be better than your first "b" example, depending on the complexity of the software. In lots (probably most) of industries, having a good codebase would mean the architecture decisions were solid, even if the domain/service layer is bad.

Maybe my experiences don't match most of the HN crowd, but usually I get stuck with very detailed domain/service rules while the architecture is the problem: too much memory or CPU being used just to abstract away the actual rules of the application (the purpose). Usually when I've been brought in to rebuild an application, the client is fine with the results, but they are upset over performance and/or the cost to run the application.

For anything of actual complexity, it's usually the supporting code that is the biggest failure, because complex apps usually have decent requirements. Now, if the requirements were bad, the architecture was bad, AND the domain/service layer is bad, I don't know if there's anything to fix that.
And it’s perfectly okay to fix and improve the code later.
Many super talented developers I know will say “Make it work, then make it good”. I think it’s okay to do this on a bigger scale than just the commit cycle.
But why not rewrite the app, change the name, and get shareholder value from a new product announcement? It shouldn't take a long time, the spec for the new product is the old product being rewritten.
Imagine thinking people losing their primary income source (usually 100% of it) is remotely comparable to the share price of a single company not going up 2%.
If you can’t lay off people then the economy won’t run and it affects everyone.
Sure, you can show easy empathy for the employees, but this is how the economy runs. A static economy where layoffs are hard or punished will lose to a more dynamic one.
> Sure you can show easy empathy for the employees but this is how economy runs. A static economy where layoffs are hard or punished will lose to a more dynamic one.
Is that why workers are generally happier in Europe even though on paper their economy loses?
I've always been skeptical of happiness statistics. In many cases, self-reporting happiness offers an objective floor for happiness, but the ceiling is entirely relative/subjective.
The floor is universal: starvation, suffering, death.
The ceiling...
For someone who's starving and facing death, it would simply be good health, easy access to food, a healthy family, a house and a car.
But the ceiling for someone who already has these things is different. The ceiling for a billionaire is different.
The only way I can imagine not doing this type of subjective self-reporting is... maybe you can draw blood from populations and record cortisol and oxytocin levels?