I doubt this is representative of real-world usage. There's a difference between a few turns in a web chatbot and many-turn CLI usage on a real project.
Entire segments of the podcast sphere are making their money talking about these so-called unspeakable subjects. Why don't you share what you really think?
They are not if there aren't customers who are willing to pay more. For instance, imagine a widget that lasts 1 year and costs just under half the price of one that lasts 2 years. There may be high demand because it's the more economical option. If you raise the price so that it's half the price of the 2-year widget, then demand collapses without affecting supply.
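A tiny numeric sketch of that example, with made-up prices (the specific dollar amounts are assumptions, not from the comment):

    two_year_price = 10.00   # lasts 2 years -> $5.00 per year of use
    one_year_price = 4.50    # "just under 1/2 the price" -> $4.50 per year of use
    print(one_year_price / 1, two_year_price / 2)  # 4.5 vs 5.0: the cheap widget is the better deal

    one_year_price = 5.00    # raised to exactly half the 2-year widget's price
    print(one_year_price / 1, two_year_price / 2)  # 5.0 vs 5.0: no reason left to prefer it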
If customers were all willing to pay more, then a higher price wouldn't solve anything. The price is said to be too low exactly because people are trying to buy more than there is available to sell. The whole point of higher prices is to try to scare people away. Not enough supply and a price that's too low are the same thing.
48 GB is not consumer hardware. But fundamentally, there are economies of scale from batching, power distribution, better utilization, etc. that mean data-center tokens will be cheaper. Also, as the cost of training (frontier) models increases, it's not clear the Chinese companies will continue open-sourcing them. Notice, for example, that Qwen-Max is not open source.
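To illustrate just the batching point, a toy numpy sketch with made-up sizes (not how any real serving stack is written): the same weights serve every request in a batch, so the per-request cost falls as the batch grows, which is an advantage a busy data center has over a single user's machine.

    import time
    import numpy as np

    # Made-up layer size for illustration; the point is only that the weights
    # are read once per forward pass regardless of how many requests share it.
    D_IN, D_OUT = 4096, 4096
    W = np.random.rand(D_IN, D_OUT).astype(np.float32)

    def per_request_seconds(batch_size: int) -> float:
        x = np.random.rand(batch_size, D_IN).astype(np.float32)
        start = time.perf_counter()
        _ = x @ W  # one pass over W serves the whole batch
        return (time.perf_counter() - start) / batch_size

    print("batch=1 :", per_request_seconds(1))
    print("batch=32:", per_request_seconds(32))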
Nothing obviously prevents using this approach, e.g. for 3B-active or 10B-active models, which do run on consumer hardware. I'd love to see how the 3B performs with this on the MacBook Neo, for example. More relevantly, data-center-scale tokens are only cheaper for the specific type of tokens data centers sell. If you're willing to wait long enough for your inferences (and your overall volume is low enough that you can afford this), you can use approaches like OP's (offloading read-only data to storage) to handle inference on low-performing, slow "edge" devices.
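A minimal sketch of what that offloading can look like, assuming a hypothetical flat file of per-expert weights (the names, sizes, and layout below are made up for illustration, not OP's actual format): memory-map the read-only weights so the OS pages in only the experts a token is actually routed to.

    import numpy as np

    # Hypothetical layout: per-expert weight matrices in one flat, read-only file.
    N_EXPERTS, D_IN, D_OUT = 16, 512, 512
    PATH = "experts.f32"

    # One-time setup: write some random "expert" weights to disk (~16 MB here).
    np.random.rand(N_EXPERTS, D_IN, D_OUT).astype(np.float32).tofile(PATH)

    # Inference side: memory-map the file instead of loading it all into RAM.
    experts = np.memmap(PATH, dtype=np.float32, mode="r",
                        shape=(N_EXPERTS, D_IN, D_OUT))

    def apply_expert(x: np.ndarray, expert_id: int) -> np.ndarray:
        """Apply one routed expert; its weights are read from storage on demand."""
        w = experts[expert_id]  # lazy view into the mapped file
        return x @ w

    x = np.random.rand(1, D_IN).astype(np.float32)
    print(apply_expert(x, expert_id=3).shape)  # (1, 512)

The trade-off is the one you describe: each cold expert costs a storage read, so latency goes up, but peak RAM stays bounded by the experts actually in use.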
It is consumer hardware in the sense that MacBook Pros come with this RAM size as a base configuration and that you can buy them as a consumer, without having to sign a special B2B contract, show that your company is big and reputable enough, or order a minimum of 10 or 100.
Technically that's correct (which as we all know is the best kind of correct), but really, how many consumers are buying a high-end MacBook Pro with 48GB or more of RAM? That's a very small percentage of the population. In these kinds of discussions, "consumer" is being used as a proxy for "something your average home laptop buyer might have". And a 48GB MBP is not that.
I know it's annoying, because a 48GB MBP is indeed technically "consumer hardware", but please understand the context and don't be pedantic. You know what the GP meant. (And if not, that's... kinda on you.)
It's clearly true that there have been abuses as a result of this technology. And it's also clearly true that criminals have been caught as a result of the cams who otherwise would not have been.
If you believe the costs of the abuses, and potential abuses, exceed the benefits, then at least be honest about the trade-off, because there are real benefits.
Personally, I believe the benefits are, on net, worth the costs. And insofar as the costs can be further reduced without losing most of the benefits, great. This is not right or wrong. It's just a question of values, and how you weigh the costs against the benefits.
My question to you is: how are you assessing the costs? Do you know how many crimes have been stopped as a result of these cams? Do you know the extent to which our privacy is being lost and our data is being used against us or others?
I take into account publicly available information (news articles), factor in personal anecdotes, and reason about human nature and incentives. I know the extent of reported abuses, and I do my best to extrapolate. It's not perfect, but such is life.
To be clear, even if we all agreed on the data, I still would not expect everyone to take the same position. There are subjective differences in values.
Flock has put out a report claiming 10% of crime in the US is solved using their technology. There are, of course, counterarguments claiming this is not valid.
I speak daily in both English and Russian and have been using Gemini 3 Flash as my main transcription model for a few months. I haven't seen any model that provides better overall quality in terms of understanding, custom dictionary support, instruction following, and formatting. It's the best STT model in my experience. Gemini 3 Flash has somewhat uncomfortable latency though, and Flash Lite is much better in this regard.
Jeff Dean literally featured it in a tweet announcing the model. Personally, it feels absurd to believe they've put absolutely no thought into optimizing this type of SVG output, given the disproportionate amount of attention devoted to a specific test for over a year.
I wouldn't really even call it "cheating", since it has improved models' ability to generate artistic SVG imagery more broadly, but the days of this being an effective way to evaluate a model's "interdisciplinary" visual reasoning abilities have long since passed, IMO.
It's become yet another example in the ever-growing list of benchmaxxed targets whose original purpose was defeated by teaching to the test.
I mean, if you want to make your own benchmark, simply don't make it public and don't run it often. If your salamander on skis or whatever gets better with time, it likely has nothing to do with being benchmaxxed.