Speak for yourself. I now dare to code much harder problems and learning is bliss. No more having to sit down to dig needle-in-haystack through horrible documentation or random Stack Overflow posts.
LLMs are a magnificent tool if you use them correctly. They enable deep work like nothing before.
The problem is the education system focused on passivity (obeyance), memorization, and standardized testing. And worst of all, aiming for the lowest common denominator. So most people are mentally lazy and go for the easy win, almost cheating. You get school and interview cheating and vivecoders.
But it's not the only way to use LLMs.
Similarly, in Wikipedia you can spend hours reading banal pop-slop content or instead spend that time reading amazing articles about history, literature, arts, and science.
Perhaps the approach to, and leverage from, using AI is different for someone who's been active on HN for two decades, and junior devs who've been brought up on iPhones in the flawed school system you're describing?
As TFA says, the problem is that accumulating knowledge takes time and effort, and the AI hype and expectations on LLM-assisted coding helps with rationalizing ever more short-sighted decisions that squander or hinder that process.
Even if you are the absolute unicorn who gets paid to "code much harder problems" and "learning", the rest of the industry exists to deliver actual products and services.
So unless you nurture some type of https://xkcd.com/208/ fantasy, this is not just about you. The industry as a whole needs to find a way to work with LLMs without automating programming away entirely, and the industry as a whole needs to find a way to ensure that newcomers are able to be productive even if code-generation tools are taken away from them.
> in Wikipedia you can spend hours reading banal pop-slop content or instead spend that time reading amazing articles about history, literature, arts, and science.
I'm not saying you're personally doing anything wrong, but there's a parallel here, when smart and curious people read articles about history and literature and art and science, rather than engaging directly with the real thing.
Or then the next level down, where creating amazing work in all of those domains depends on enough "slack" in the system for people to pursue deep work that will not be immediately profitable.
Do you see where I'm going with that? We (and I'm very much including myself: here I am on HN, instead of reading something more substantial) skim the (Wikipedia) surface, instead of diving truly deep. AIs (right now) are the ultimate surface-skimmers, and our fascination with and growing reliance on them reflects something in our current surface-skimming cultural mindset.
If 5090 has 32GB, and let's say somehow a 1-bit quantization is possible and you don't need more VRAM for anything else (forget KV cache etc), it would be able to fit a 256B 1-bit model. Just to picture it in extremes how unlikely this is.
And the active parameters come from the experts. For each token the model picks some experts to run the pass (usually 2 to 4, I haven't read V4's papers). It's not always the same experts.
OTOH, being DeepSeek, I foresee a bunch of V4 distilled FP8 models fitting in a 5090 with tiny batches and with performance close from 75 to 85% of V4. And this might be good enough for many everyday tasks.
Today is a good day for open models. Thank god for DeepSeek.
Please don't slander the most open AI company in the world. Even more open than some non-profit labs from universities. DeepSeek is famous for publishing everything. They might take a bit to publish source code but it's almost always there. And their papers are extremely pro-social to help the broader open AI community. This is why they struggle getting funded because investors hate openness. And in China they struggle against the political and hiring power of the big tech companies.
And DeepSeek often has very cool new approaches to AI copied by the rest. Many others copied their tech. And some of those have 10x or 100x the GPU training budget and that's their moat to stay competitive.
I think they were reading GP's comment as a correction. Like "not open-source, just open weight". I'm not sure if their reading was accurate but I enjoyed their high effort comment nonetheless
X is full of "open weights!" corrections as a dog whistle by the anti-China crowd. And they are right about models from the Chinese Big Tech, but completely wrong about DeepSeek.
Correct. We have open-weight models from OpenAI, Facebook, Mistral, DeepSeek, Z.ai, MiniMax, and all sorts of other companies. Most of them have fantastic and open licensing terms.
If we can't build the weights, then we don't have the source. I'm not entirely sure what an open-source model would even look like, but I am confident that these binary blobs that we are loading into llama.cpp and vllm aren't the equivalent of source code. We have absolutely no idea what sort of data went into them.
This is fine. It isn't slanderous. It is what we have, and it is awesome. Just because it is awesome doesn't make it open source.
It’s not slander to say something true. These are open weights, not open source. They don’t provide the training data or the methodology requires to reproduce these weights.
So you can’t see what facts are pruned out, what biases were applied, etc. Even more importantly, you can’t make a slightly improved version.
This model is as open source as a windows XP installation ISO.
"Qwen's base models live in a very exam-heavy basin - distinct from other base models like llama/gemma. Shown below are the embeddings from randomly sampled rollouts from ambiguous initial words like "The" and "A":"
This makes a lot of my experience with Qwen make sense. I’ve watched all the benchmarks imply how close it should be to various GPT or Claude releases, but in my own use chatting with it or trying to get it do agentic tasks it was nowhere near as smart as even GPT-3.5 for example. Meanwhile Gemma 4 casually dropped and even the 4B models were performing better than Qwen 3.5 MOE in my chats. Benchmaxxing.
Some people are speculating that Opus 4.7 is distilled from Mythos due to the new tokenizer (it means Opus 4.7 is a new base model, not just an improved Opus 4.6)
The new tokenizer is interesting, but it definitely is possible to adapt a base model to a new tokenizer without too much additional training, especially if you're distilling from a model that uses the new tokenizer. (see, e.g., https://openreview.net/pdf?id=DxKP2E0xK2).
Yes, I was thinking that. But it could as well be the other way around. Using the pretrained 4.7 (1T?) to speed up ~70% Mythos (10T?) pretraining.
It's just speculative decoding but for training. If they did at this scale it's quite an achievement because training is very fragile when doing these kinds of tricks.
Reverse distillation. Using small models to bootstrap large models. Get richer signal early in the run when gradients are hectic, get the large model past the early training instability hell. Mad but it does work somewhat.
Not really similar to speculative decoding?
I don't think that's what they've done here though. It's still black magic, I'm not sure if any lab does it for frontier runs, let alone 10T scale runs.
> They don't have demand for the price it would require for inference.
citation needed. I find it hard to believe; I think there are more than enough people willing to spend $100/Mtok for frontier capabilities to dedicate a couple racks or aisles.
Apple got it right with unified memory with wide bus. That's why Mac Minis are flying for local models. But they are 10x less powerful in AI TOPS. And you can't upgrade the memory.
I really wish AMD and Intel boards get replaced by competent people. They could do it in very short time. Both have integrated GPUs with main memory. AMD and Intel have (or at least used to have) serious know-how in data buses and interconnects, respectively. But I don't see any of that happening.
ROCm? It can't even support decent Attention. It lacks a lot of features and NVIDIA is adding more each year. Soon they will reach escape velocity and nobody will catch them for a decade. smh
Aren't mac minis flying for "local models" because people have no clue what they are doing?
All those people who bought them for openclaw just bought them because it was the trendy thing to do. No one of those people is running local models on there.
> I really wish AMD and Intel boards get replaced by competent people.
Intel? Agreed. But AMD is making money hand over fist with enterprise AI stuff.
Right now, any effort that AMD or NVIDIA expend on the consumer sector is a waste of money that they could be spending making 10x more at the enterprise level on AI.
Granted, I feel like NVIDIA GPU pricing is such that Mac minis will be way less than 10x cheaper if not already, so one might still get ahead purchasing a bulk order of Mac minis....
The 256GB Mac Studio (the one with "eight times the RAM") is listed for ~$2000 more than the current 5090 prices, and another additional $1500 for the 80-core GPU variant. Only the "base" model with 96gb is a remotely similar price, $3600-$4000.
And a 5090 has a little over 2x the memory bandwidth - ~820GB/s vs ~1790GB/s. And significantly higher peak FLOPS on the 5090 too.
Sure, if the goal is to get the "Cheapest single-device system with 256GB ram" it looks pretty good, but there's lots of other axes it falls down on. Great if you know you don't care about them, but not "Better In Every Way". Arguably, better in only a single way - but that single way may well be the one you need.
And the current 5090 price might be a transient peak - only three months ago they were closer to $2500 - significantly less than half the $6000 base-spec 256GB Mac Studio. While the Mac Studio has been constant.
It seems like general improvements in ram efficiency, such as that used in Gemma 4, means it’s back to memory bandwidth as the bottleneck and less about total available memory size. I’m also curious to see how much more agent autonomy will reduce less need for low latency and shift the focus to more throughput. Meaning it’s easier to spread the model out over multiple smaller GPUs and use pipeline parallelism to keep them busy. This would also mean using ram for market discrimination becomes less effective.
reply