I doubt this is representative of real-world usage. There's a difference between a few turns in a web chatbot and many-turn CLI usage on a real project.
Entire segments of the podcast sphere are making their money talking about these so-called unspeakable subjects. Why don't you share what you really think?
They are not if there aren't customers who are willing to pay more. For instance, imagine a widget that lasts 1 year and costs just under half the price of one that lasts 2 years. There may be high demand because it's the more economical option. If you raise the price so that it's half the price of the 2-year widget, then demand collapses without affecting supply.
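A tiny numeric sketch of that example, with made-up prices (the specific dollar amounts are assumptions, not from the comment):

    two_year_price = 10.00   # lasts 2 years -> $5.00 per year of use
    one_year_price = 4.50    # "just under 1/2 the price" -> $4.50 per year of use
    print(one_year_price / 1, two_year_price / 2)  # 4.5 vs 5.0: the cheap widget is the better deal

    one_year_price = 5.00    # raised to exactly half the 2-year widget's price
    print(one_year_price / 1, two_year_price / 2)  # 5.0 vs 5.0: no reason left to prefer it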
If customers were all willing to pay more, then a higher price wouldn't solve anything. The price is said to be too low exactly because people are trying to buy more than there is available to sell. The whole point of higher prices is to try to scare people away. Not enough supply and a price that's too low are the same thing.
48 GB is not consumer hardware. But fundamentally, there are economies of scale from batching, power distribution, better utilization, etc. that mean data-center tokens will be cheaper. Also, as the cost of training (frontier) models increases, it's not clear the Chinese companies will continue open-sourcing them. Notice, for example, that Qwen-Max is not open source.
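To illustrate just the batching point, a toy numpy sketch with made-up sizes (not how any real serving stack is written): the same weights serve every request in a batch, so the per-request cost falls as the batch grows, which is an advantage a busy data center has over a single user's machine.

    import time
    import numpy as np

    # Made-up layer size for illustration; the point is only that the weights
    # are read once per forward pass regardless of how many requests share it.
    D_IN, D_OUT = 4096, 4096
    W = np.random.rand(D_IN, D_OUT).astype(np.float32)

    def per_request_seconds(batch_size: int) -> float:
        x = np.random.rand(batch_size, D_IN).astype(np.float32)
        start = time.perf_counter()
        _ = x @ W  # one pass over W serves the whole batch
        return (time.perf_counter() - start) / batch_size

    print("batch=1 :", per_request_seconds(1))
    print("batch=32:", per_request_seconds(32))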
Nothing obviously prevents using this approach, e.g. for 3B-active or 10B-active models, which do run on consumer hardware. I'd love to see how the 3B performs with this on the MacBook Neo, for example. More relevantly, data-center-scale tokens are only cheaper for the specific type of tokens data centers sell. If you're willing to wait long enough for your inferences (and your overall volume is low enough that you can afford this), you can use approaches like OP's (offloading read-only data to storage) to handle inference on low-performing, slow "edge" devices.
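A minimal sketch of what that offloading can look like, assuming a hypothetical flat file of per-expert weights (the names, sizes, and layout below are made up for illustration, not OP's actual format): memory-map the read-only weights so the OS pages in only the experts a token is actually routed to.

    import numpy as np

    # Hypothetical layout: per-expert weight matrices in one flat, read-only file.
    N_EXPERTS, D_IN, D_OUT = 16, 512, 512
    PATH = "experts.f32"

    # One-time setup: write some random "expert" weights to disk (~16 MB here).
    np.random.rand(N_EXPERTS, D_IN, D_OUT).astype(np.float32).tofile(PATH)

    # Inference side: memory-map the file instead of loading it all into RAM.
    experts = np.memmap(PATH, dtype=np.float32, mode="r",
                        shape=(N_EXPERTS, D_IN, D_OUT))

    def apply_expert(x: np.ndarray, expert_id: int) -> np.ndarray:
        """Apply one routed expert; its weights are read from storage on demand."""
        w = experts[expert_id]  # lazy view into the mapped file
        return x @ w

    x = np.random.rand(1, D_IN).astype(np.float32)
    print(apply_expert(x, expert_id=3).shape)  # (1, 512)

The trade-off is the one you describe: each cold expert costs a storage read, so latency goes up, but peak RAM stays bounded by the experts actually in use.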
It is consumer hardware in the sense that MacBook Pros come with this RAM size as a base configuration and that you can buy them as a consumer, without having to sign a special B2B contract, show that your company is big and reputable enough, or order a minimum of 10 or 100.
Technically that's correct (which as we all know is the best kind of correct), but really, how many consumers are buying a high-end MacBook Pro with 48GB or more of RAM? That's a very small percentage of the population. In these kinds of discussions, "consumer" is being used as a proxy for "something your average home laptop buyer might have". And a 48GB MBP is not that.
I know it's annoying, because a 48GB MBP is indeed technically "consumer hardware", but please understand the context and don't be pedantic. You know what the GP meant. (And if not, that's... kinda on you.)
It's clearly true that there have been abuses as a result of this technology. And it's also clearly true that criminals have been caught as a result of the cams who otherwise would not have been.
If you believe the costs of the abuses, and potential abuses, exceed the benefits, then at least be honest about the trade-off, because there are real benefits.
Personally, I believe the benefits are, on net, worth the costs. And insofar as the costs can be further reduced without losing most of the benefits, great. This is not right or wrong. It's just a question of values, and how you weigh the costs against the benefits.
My question to you is: how are you assessing the costs? Do you know how many crimes have been stopped as a result of these cams? Do you know the extent to which our privacy is being lost and our data is being used against us or others?
I take into account publicly available information (news articles), factor in personal anecdotes, and reason about human nature and incentives. I know the extent of reported abuses, and I do my best to extrapolate. It's not perfect, but such is life.
To be clear, even if we all agreed on the data, I still would not expect everyone to take the same position. There are subjective differences in values.
Flock has put out a report claiming 10% of crime in the US is solved using their technology. There are, of course, counterarguments claiming this is not valid.
I speak daily in both English and Russian and have been using Gemini 3 Flash as my main transcription model for a few months. I haven't seen any model that provides better overall quality in terms of understanding, custom dictionary support, instruction following, and formatting. It's the best STT model in my experience. Gemini 3 Flash has somewhat uncomfortable latency though, and Flash Lite is much better in this regard.
Jeff Dean literally featured it in a tweet announcing the model. Personally, it feels absurd to believe they've put absolutely no thought into optimizing this type of SVG output, given the disproportionate amount of attention devoted to a specific test for over a year.
I wouldn't really even call it "cheating", since it has improved models' ability to generate artistic SVG imagery more broadly, but the days of this being an effective way to evaluate a model's "interdisciplinary" visual reasoning abilities have long since passed, IMO.
It's become yet another example in the ever-growing list of benchmaxxed targets whose original purpose was defeated by teaching to the test.
I mean, if you want to make your own benchmark, simply don't make it public and don't run it often. If your salamander on skis or whatever gets better with time, it likely has nothing to do with being benchmaxxed.