alecco's comments | Hacker News

Long chats are mostly cache. For coding agents this is like >90%.
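Rough numbers, purely illustrative (the token counts and the cached/fresh price ratio below are made up):

    # Back-of-envelope: how much of an agent turn is a prompt-cache hit.
    # All numbers are hypothetical, for illustration only.
    history_tokens = 180_000  # accumulated context, served from cache
    new_tokens = 8_000        # fresh tokens this turn, cache miss

    cached_share = history_tokens / (history_tokens + new_tokens)
    print(f"cache-hit share of input: {cached_share:.0%}")  # ~96%

    # If cached input is billed at ~10% of fresh input (a common vendor
    # ratio), the effective input cost collapses:
    fresh, cached = 1.0, 0.1  # arbitrary units per token
    cost = history_tokens * cached + new_tokens * fresh
    naive = (history_tokens + new_tokens) * fresh
    print(f"input cost vs. no cache: {cost / naive:.0%}")  # ~14%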

CoRecursive had a really good episode about this last August:

"Coding in the Red-Queen Era" https://corecursive.com/red-queen-coding/


Cool story, SEO bro.

Speak for yourself. I now dare to code much harder problems, and learning is bliss. No more sitting down to dig for a needle in a haystack of horrible documentation or random Stack Overflow posts.

LLMs are a magnificent tool if you use them correctly. They enable deep work like nothing before.

The problem is an education system focused on passivity (obedience), memorization, and standardized testing. And worst of all, aiming for the lowest common denominator. So most people are mentally lazy and go for the easy win, almost cheating. You get school cheating, interview cheating, and vibecoders.

But it's not the only way to use LLMs.

Similarly, on Wikipedia you can spend hours reading banal pop-slop content, or instead spend that time reading amazing articles about history, literature, arts, and science.


Perhaps the approach to, and leverage from, using AI is different for someone who's been active on HN for two decades, and junior devs who've been brought up on iPhones in the flawed school system you're describing?

As TFA says, the problem is that accumulating knowledge takes time and effort, and the AI hype and expectations around LLM-assisted coding help rationalize ever more short-sighted decisions that squander or hinder that process.


> Speak for yourself.

Even if you are the absolute unicorn who gets paid to "code much harder problems" and to keep "learning", the rest of the industry exists to deliver actual products and services.

So unless you nurture some type of https://xkcd.com/208/ fantasy, this is not just about you. The industry as a whole needs to find a way to work with LLMs without automating programming away entirely, and the industry as a whole needs to find a way to ensure that newcomers are able to be productive even if code-generation tools are taken away from them.


> in Wikipedia you can spend hours reading banal pop-slop content or instead spend that time reading amazing articles about history, literature, arts, and science.

I'm not saying you're personally doing anything wrong, but there's a parallel here: smart and curious people reading articles about history and literature and art and science, rather than engaging directly with the real thing.

Then there's the next level down, where creating amazing work in all of those domains depends on enough "slack" in the system for people to pursue deep work that will not be immediately profitable.

Do you see where I'm going with that? We (and I'm very much including myself: here I am on HN, instead of reading something more substantial) skim the (Wikipedia) surface, instead of diving truly deep. AIs (right now) are the ultimate surface-skimmers, and our fascination with and growing reliance on them reflects something in our current surface-skimming cultural mindset.


I meant it as an easy-to-understand parallel. Absolutely, deep reading and thought are much better than Wikipedia or an LLM chat.

If a 5090 has 32GB, and let's say somehow 1-bit quantization were possible and you needed no VRAM for anything else (forget the KV cache, etc.), it could fit a 256B-parameter 1-bit model at most. Just to picture, in the extreme, how unlikely this is.
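The arithmetic, as a quick sketch (decimal gigabytes, weights only):

    # 32 GB of VRAM, weights only, no KV cache or activations.
    vram_bytes = 32e9
    for bits in (16, 8, 4, 1):
        params = vram_bytes * 8 / bits  # parameters that fit at this width
        print(f"{bits:>2}-bit: ~{params / 1e9:.0f}B params")
    # 16-bit: ~16B, 8-bit: ~32B, 4-bit: ~64B, 1-bit: ~256B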

And the active parameters come from the experts. For each token, the router picks a few experts to run the forward pass (usually 2 to 4; I haven't read V4's papers). It's not always the same experts.
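A minimal sketch of what top-k expert routing looks like, to show what "active parameters" means; this is the generic MoE pattern, not DeepSeek's actual router:

    import numpy as np

    n_experts, d_model, top_k = 64, 16, 2
    rng = np.random.default_rng(0)
    router_w = rng.standard_normal((d_model, n_experts))
    # each "expert" is just a dense layer here
    experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

    def moe_layer(x):
        logits = x @ router_w
        chosen = np.argsort(logits)[-top_k:]      # top-k experts for this token
        w = np.exp(logits[chosen]); w /= w.sum()  # softmax over the chosen few
        # only the chosen experts' weights are "active" for this token
        return sum(wi * (x @ experts[i]) for wi, i in zip(w, chosen))

    out = moe_layer(rng.standard_normal(d_model))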

OTOH, being DeepSeek, I foresee a bunch of distilled FP8 V4 models fitting in a 5090 with tiny batches, with performance somewhere around 75 to 85% of V4. And that might be good enough for many everyday tasks.

Today is a good day for open models. Thank god for DeepSeek.


The paper is about previously unknown ways coffee affects the body.

Please don't slander the most open AI company in the world. It's even more open than some non-profit university labs. DeepSeek is famous for publishing everything. They might take a while to publish source code, but it's almost always there. And their papers are extremely pro-social, written to help the broader open AI community. This is why they struggle to get funded: investors hate openness. And in China they struggle against the political and hiring power of the big tech companies.

Just this week they published a serious foundational library for LLMs: https://github.com/deepseek-ai/TileKernels

Others worth mentioning:

https://github.com/deepseek-ai/DeepGEMM a competitive foundational library

https://github.com/deepseek-ai/Engram

https://github.com/deepseek-ai/DeepSeek-V3

https://github.com/deepseek-ai/DeepSeek-R1

https://github.com/deepseek-ai/DeepSeek-OCR-2

They have 33 repos and counting: https://github.com/orgs/deepseek-ai/repositories?type=all

And DeepSeek often comes up with very cool new approaches to AI that the rest of the field copies. Some of those copiers have 10x or 100x the GPU training budget, and that's their moat to stay competitive.

The models from Chinese Big Tech, and some of the smaller labs, are open weights only (and allegedly benchmaxxed; see https://xcancel.com/N8Programs/status/2044408755790508113). Not the same.


DeepSeek's models are indeed open weight. Why do you feel that pointing this out would be considered slander?

I think they were reading GP's comment as a correction, like "not open source, just open weights". I'm not sure their reading was accurate, but I enjoyed their high-effort comment nonetheless.

X is full of "open weights!" corrections used as a dog whistle by the anti-China crowd. And they are right about the models from Chinese Big Tech, but completely wrong about DeepSeek.

>> Truly open source coming from China.

> Open weight!

They clearly were implying it's not open source.


Correct. We have open-weight models from OpenAI, Facebook, Mistral, DeepSeek, Z.ai, MiniMax, and all sorts of other companies. Most of them have fantastic and open licensing terms.

If we can't build the weights, then we don't have the source. I'm not entirely sure what an open-source model would even look like, but I am confident that these binary blobs that we are loading into llama.cpp and vllm aren't the equivalent of source code. We have absolutely no idea what sort of data went into them.

This is fine. It isn't slanderous. It is what we have, and it is awesome. Just because it is awesome doesn't make it open source.


It’s not slander to say something true. These are open weights, not open source. They don’t provide the training data or the methodology required to reproduce these weights.

So you can’t see what facts are pruned out, what biases were applied, etc. Even more importantly, you can’t make a slightly improved version.

This model is as open source as a Windows XP installation ISO.


> These are open weights, not open source.

Did you even read my comment?


I did. Show me the source code.

> DeepSeek is famous for publishing everything. They might take a bit to publish source code but it's almost always there.

they-might-take-a-bit-to-publish


Related interesting find on Qwen.

"Qwen's base models live in a very exam-heavy basin - distinct from other base models like llama/gemma. Shown below are the embeddings from randomly sampled rollouts from ambiguous initial words like "The" and "A":"

https://xcancel.com/N8Programs/status/2044408755790508113


This makes a lot of my experience with Qwen make sense. I've watched all the benchmarks imply how close it should be to various GPT or Claude releases, but in my own use, chatting with it or trying to get it to do agentic tasks, it was nowhere near as smart as even GPT-3.5, for example. Meanwhile, Gemma 4 casually dropped and even the 4B models were performing better than Qwen 3.5 MoE in my chats. Benchmaxxing.

There isn't enough demand at the price they'd have to charge for inference.

They are definitely distilling it into a much smaller model that's ~98% as good, like everybody does.


Some people are speculating that Opus 4.7 is distilled from Mythos, because of the new tokenizer (which would mean Opus 4.7 is a new base model, not just an improved Opus 4.6).

The new tokenizer is interesting, but it is definitely possible to adapt a base model to a new tokenizer without too much additional training, especially if you're distilling from a model that already uses the new tokenizer (see, e.g., https://openreview.net/pdf?id=DxKP2E0xK2).
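One common trick, as a hand-wavy sketch (`old_tok`/`new_tok` are hypothetical tokenizer objects with an encode/decode interface, not any real library's API): warm-start each new token's embedding from the old embeddings of the pieces it decodes into, then let distillation or fine-tuning do the rest.

    import numpy as np

    def init_new_embeddings(old_emb, old_tok, new_tok):
        d_model = old_emb.shape[1]
        new_emb = np.zeros((new_tok.vocab_size, d_model), dtype=old_emb.dtype)
        for new_id in range(new_tok.vocab_size):
            text = new_tok.decode([new_id])
            old_ids = old_tok.encode(text)  # how the old vocab spelled it
            if old_ids:
                # average of the old subtoken embeddings as a warm start
                new_emb[new_id] = old_emb[old_ids].mean(axis=0)
        return new_emb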

Not impossible, but you have to be at least a little bit mad to deploy tokenizer replacement surgery at this scale.

They also changed the image encoder, so I'm thinking "new base model". Whatever base was powering 4.5/4.6 didn't last long, then.


Yes, I was thinking that. But it could just as well be the other way around: using the pretrained 4.7 (1T?) to speed up Mythos (10T?) pretraining by ~70%.

It's basically speculative decoding, but for training. If they did it at this scale, it's quite an achievement, because training is very fragile when doing these kinds of tricks.


Reverse distillation: using small models to bootstrap large models. You get richer signal early in the run when gradients are hectic, and it gets the large model past early-training instability hell. Mad, but it does work somewhat.
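A loose sketch of what such a loss could look like (entirely hypothetical; no lab has published a frontier-scale recipe): cross-entropy on the data, plus a KL term toward the small model's next-token distribution that is annealed away as training stabilizes.

    import numpy as np

    def log_softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    def loss(big_logits, small_logits, targets, step, anneal_steps=10_000):
        big_logp = log_softmax(big_logits)
        # standard next-token cross-entropy against the data
        ce = -np.take_along_axis(big_logp, targets[:, None], axis=-1).mean()
        # KL(small || big): pull the big model toward the small teacher
        small_p = np.exp(log_softmax(small_logits))
        kl = (small_p * (np.log(small_p + 1e-9) - big_logp)).sum(-1).mean()
        alpha = max(0.0, 1.0 - step / anneal_steps)  # fade the teacher out
        return ce + alpha * kl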

Not really similar to speculative decoding?

I don't think that's what they've done here, though. It's still black magic; I'm not sure any lab does it for frontier runs, let alone 10T-scale runs.


> They don't have demand for the price it would require for inference.

Citation needed. I find that hard to believe; I think there are more than enough people willing to spend $100/Mtok on frontier capabilities to justify dedicating a couple of racks or aisles.


Apple got it right with unified memory on a wide bus. That's why Mac Minis are flying off the shelves for local models. But they are ~10x less powerful in AI TOPS, and you can't upgrade the memory.

I really wish the AMD and Intel boards got replaced by competent people. They could do it in a very short time. Both have integrated GPUs using main memory. AMD and Intel have (or at least used to have) serious know-how in data buses and interconnects, respectively. But I don't see any of that happening.

ROCm? It can't even support decent attention kernels. It lacks a lot of features, and NVIDIA adds more each year. Soon they will reach escape velocity and nobody will catch them for a decade. smh


Aren't Mac Minis flying off the shelves for "local models" because people have no clue what they are doing?

All those people who bought them for openclaw did it because it was the trendy thing to do. None of those people are running local models on them.


> I really wish AMD and Intel boards get replaced by competent people.

Intel? Agreed. But AMD is making money hand over fist with enterprise AI stuff.

Right now, any effort that AMD or NVIDIA expends on the consumer sector is a waste of money they could be spending to make 10x more at the enterprise level on AI.


They aren't flying off the shelves outside the US or countries with similar salary levels.

Granted, I feel like NVIDIA GPU pricing is such that Mac Minis will be way less than 10x cheaper, if they aren't already, so one might still come out ahead purchasing a bulk order of Mac Minis...


A 5090 will cost you about the same amount of money as a Mac Studio M3 Ultra with eight times the RAM.

It's pretty insane how overpriced NVIDIA hardware is.


The 256GB Mac Studio (the one with "eight times the RAM") is listed for ~$2000 more than current 5090 prices, and the 80-core GPU variant is another $1500 on top. Only the "base" model with 96GB is at a remotely similar price, $3600-$4000.

And a 5090 has a little over 2x the memory bandwidth (~1790GB/s vs. ~820GB/s), plus significantly higher peak FLOPS.
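For decode, the back-of-envelope is that every generated token streams the active weights through memory once, so (illustrative numbers, dense-equivalent model):

    def tokens_per_sec(bandwidth_gb_s, active_params_b, bytes_per_param):
        # upper bound: weights streamed once per generated token
        return bandwidth_gb_s / (active_params_b * bytes_per_param)

    # e.g. a 30B-active-parameter model at 4-bit (~0.5 bytes/param)
    for name, bw in (("5090", 1790), ("M3 Ultra", 820)):
        print(name, f"~{tokens_per_sec(bw, 30, 0.5):.0f} tok/s ceiling")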

Sure, if the goal is to get the "cheapest single-device system with 256GB of RAM", it looks pretty good, but there are lots of other axes it falls down on. Great if you know you don't care about them, but not "Better In Every Way". Arguably, it's better in only a single way, but that single way may well be the one you need.

And the current 5090 price might be a transient peak: only three months ago they were closer to $2500, significantly less than half the $6000 base-spec 256GB Mac Studio, whose price has been constant.


Yes, but the 5090 can run games.

Running games on my loaded M4 Max is worse than on my 3090 despite the over-four-year generational gap.

Like, Pacific Drive will reach maybe 30fps at less than 1080p, whereas the 3090 runs it better even at 4K.

That could just be CrossOver's issue with Unreal Engine games, but "just play different games" is not a solution I like.


It seems like general improvements in RAM efficiency, such as those used in Gemma 4, mean memory bandwidth is back as the bottleneck, and total available memory size matters less. I'm also curious to see how much agent autonomy will reduce the need for low latency and shift the focus to throughput. That would make it easier to spread a model out over multiple smaller GPUs and use pipeline parallelism to keep them busy. It would also mean using RAM for market segmentation becomes less effective.
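The pipeline-parallel intuition in one formula (the textbook GPipe-style bubble estimate, not a benchmark of any real system): with enough micro-batches in flight, small GPUs stay busy, so throughput holds up even though per-token latency suffers.

    def pipeline_utilization(stages, micro_batches):
        # ideal work: m steps per stage; bubble: stages-1 fill/drain steps
        return micro_batches / (micro_batches + stages - 1)

    for m in (1, 4, 16, 64):
        print(f"{m:>2} micro-batches over 8 stages: "
              f"{pipeline_utilization(8, m):.0%} busy")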

But the 5090 can run Crysis

