Hacker News | chadcmulligan's comments

"why do neural networks work better than other models?" That sounds really interesting - any references (for a non-specialist)?

https://en.wikipedia.org/wiki/Universal_approximation_theore...

the better question is why does gradient descent work for them


The properties that the universal approximation theorem proves are not unique to neural networks.

Any models using an infinite dimensional Hilbert space, such as SVMs with RBF or polynomial kernels, Gaussian process regression, gradient boosted decision trees, etc. have the same property (though proven via a different theorem of course).

So the universal approximation theorem tells us nothing about why we should expect neural networks to perform better than those models.


Extremely well said. Universal approximation is necessary but not sufficient for the performance we are seeing. The secret sauce is implicit regularization, which comes about analogously to enforcing compression.

@hodgehog11 The grokking phenomenon (Power et al. 2022) is a puzzle for the compression view: models trained on algorithmic tasks like modular arithmetic memorize training data first (near-zero training loss, near-random test accuracy) and then, after many more gradient steps, suddenly generalize. The transition happens long after any obvious compression pressure would have fired. Do you think grokking is consistent with implicit regularization as compression, or does it require a separate mechanism - something more like a phase transition in the weight norms or the Fourier frequency structure?

>Do you think grokking is consistent with implicit regularization as compression

Pretty sure it's been shown that grokking requires L1 regularization which pushes model parameters towards zero. This can be viewed as compression in the sense of encoding the distribution in the fewest bits possible, which happens to correspond to better generalization.


Couldn't have said it better, although this is only for grokking with the modular addition task on networks with suitable architectures. L1 regularization is absolutely a clear form of compression. The modular addition example is one of the best cases to see the phenomenon in action.

Whenever people bring this up I like to remind them that linear interpolation is a universal function approximator.

I don't think that this is true. You need an infinite number of dimensions for this (think Taylor's expansion, Fourier expansion, infinitely wide or deep NNs..)

Can you expand on that?

I'll use 1NN as the interpolation strategy instead since I think it illustrates the same point and saves a few characters.

Recap: 1NN says that given a query Q you choose the pair (X,Y) from your learned "model" M (a finite set of (X,Y) pairs) minimizing |Q-X|. Your output is Y.

The following kind of argument works for linear interpolation too (you can even view 1NN as 1-point interpolation), but it's ever so slightly messier since definitions vary a fair bit, you potentially need to talk about the existence of >1 discrete "nearest" or "enclosing" set of neighbors, and proving that you can get away with fewer points than 1NN or have lower error than 1NN is itself also messier.

Pick your favorite continuous function on a compact domain embedded in some Euclidean space. For any target error you'd like to hit, the uniform continuity of that function guarantees that if your samples cover the domain well enough (no point in the domain is farther than some fixed distance from some point in your model, with smaller distances needed for lower errors), then the maximum error of a 1NN strategy is bounded by the associated error given by uniform continuity (which, again, you can make as small as you'd like by increasing the sampling resolution). The compactness of the domain means you can actually achieve those error bounds with finite sample sizes.

For a simple example, imagine fitting more and more, smaller and smaller, line segments to y=x^2 on [-1,1].
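As a sanity check (my own sketch, not from the comment), here's the y=x^2 example with 1NN in place of line segments: the maximum error over a dense grid shrinks as the sample cover of [-1,1] gets finer.

```python
import numpy as np

def one_nn_predict(x_samples, y_samples, queries):
    """1-nearest-neighbour 'model': answer each query with the y of the closest x."""
    idx = np.abs(queries[:, None] - x_samples[None, :]).argmin(axis=1)
    return y_samples[idx]

queries = np.linspace(-1, 1, 5001)  # dense evaluation grid
errs = []
for n in (10, 100, 1000):
    xs = np.linspace(-1, 1, n)      # cover the compact domain at resolution ~2/n
    err = np.max(np.abs(one_nn_predict(xs, xs**2, queries) - queries**2))
    errs.append(err)
    print(n, err)                   # max error shrinks as the cover gets finer
```

Since x^2 has Lipschitz constant 2 on [-1,1] and every query is within half the sample spacing of some sample, the max error is bounded by roughly the spacing itself, matching the uniform-continuity argument above.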


Universal approximation is like saying that a problem is computable

sure, that gives some relief - but it says nothing in practice, unlike e.g. knowing which side of the P/NP divide the problem is on


Asymptotics has been used to validate tons of statistical tools. This is just another tool being validated.

If you have a tool that you don't know works when data increases (n-> infinity), then you shouldn't use it.

So practically, I believe it has serious implications.


> unlike f.e. which side of P/NP divide the problem is on

Actually the P/NP divide is a similar case in my opinion. In practice a quadratic algorithm is sometimes unacceptably slow and an NP problem can be virtually solved. E.g. SAT problems are routinely solved at scale.
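To make "virtually solved" concrete, here's a toy brute-force SAT checker (my own illustration; real solvers use CDCL and handle instances with millions of variables): worst-case exponential in the number of variables, yet the instances you actually meet are often easy.

```python
from itertools import product

def brute_force_sat(clauses, n_vars):
    """Try every assignment; exponential in n_vars, but fine for small instances.
    Clauses use DIMACS-style literals: 3 means x3 is true, -3 means x3 is false."""
    for bits in product([False, True], repeat=n_vars):
        assign = {i + 1: b for i, b in enumerate(bits)}
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assign  # satisfying assignment found
    return None  # UNSAT

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3]]
print(brute_force_sat(clauses, 3))
```

The NP-hardness of SAT constrains the worst case of any such procedure, not the typical case, which is exactly the parent's point.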


An NP problem can contain instances that are not worst-case instances.

It's similar to the gap between pushdown automata and Turing machines. You can check whether a pushdown automaton will terminate; you can't do that for Turing machines, but this doesn't stop you from running, on a Turing machine, a pushdown-automaton algorithm with decidable termination.


I don't follow. Why wouldn't it work? It seems to me that a biased random walk down a gradient is about as universal as it gets. A bit like asking why walking uphill eventually results in you arriving at the top.

It wouldn't work if your landscape has more local minima than atoms in the known universe (which it does) and only some of them are good. Neural networks can easily fail, but there's a lot of things one can do to help ensure it works.

A funny thing is, in very high-dimensional space, like millions or billions of parameters, the chance that you'd get stuck in a local minimum is extremely small. Think about it like this: to be stuck in a local minimum in 2D, you only need 2 gradient components to be zero; in higher dimensions, you'd need every single one of them, millions upon millions of them, to be all zero. You'd only need 1 single gradient component to be non-zero and SGD can get you out of it. Now, SGD is a stochastic walk on that manifold, not entirely random, but rather noisy; the chance that you somehow walk into a local minimum is very, very low, unless it is a "really good" local minimum, in the sense that it dominates all other local minima in its neighborhood.
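A standard way to illustrate this numerically (my own sketch; it swaps the gradient-component argument for the closely related Hessian one) is to ask how often a random symmetric "Hessian" is positive definite, which a critical point needs in every direction to be a local minimum rather than a saddle:

```python
import numpy as np

rng = np.random.default_rng(0)

def frac_minima(dim, trials=500):
    """Fraction of random symmetric 'Hessians' that are positive definite,
    i.e. the fraction of random critical points that would be local minima."""
    count = 0
    for _ in range(trials):
        A = rng.normal(size=(dim, dim))
        H = (A + A.T) / 2                 # random symmetric matrix (GOE-style toy Hessian)
        if np.linalg.eigvalsh(H)[0] > 0:  # smallest eigenvalue positive => minimum
            count += 1
    return count / trials

print(frac_minima(2), frac_minima(20))  # the second is essentially always 0.0
```

Under this toy model the probability of a minimum decays roughly like exp(-c d^2) in the dimension d, so already at d=20 random critical points are essentially all saddles; with millions of parameters it is astronomically small.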

You are essentially correct, which is why stochastic gradient optimizers induce a low-sharpness bias. However, there is an awful lot more that complicates things. There are plenty of wide minima that it can get stuck in far away from where people typically initialise, so the initialisation scheme proves extremely important (but is mostly done for you).

Perhaps more important, just because it is easy to escape any local minimum does not mean that there is necessarily a trend towards a really good optimum, as it can just bounce between a bunch of really bad ones for a long time. This actually happens almost all the time if you try to design your entire architecture from scratch, e.g. highly connected networks. People who are new to the field sometimes don't seem to understand why SGD doesn't just always fix everything; this is why. You need very strong inductive biases in your architecture design to ensure that the loss (which is data-dependent so you cannot ascertain this property a priori) exhibits a global bowl-like shape (we often call this a 'funnel') to provide a general trajectory for the optimizer toward good solutions. Sometimes this only works for some optimizers and not others.

This is why architecture design is something of an art form, and explaining "why neural networks work so well" is a complex question involving a ton of parts, all of which contribute in meaningful ways. There are often plenty of counterexamples to any simpler explanation.


(‘Minimum’ is the singular of ‘minima’.)

>you'd need every single one of them, millions up millions of them, to be all zero

If they were all correlated with each other that does not seem far fetched.


Ok, but it's already known that you shouldn't initialize your network parameters to a single constant, and should instead initialize them with random numbers.

The model can converge towards such a state even if randomly initialized.

Both you and the comment above are correct; initializing with iid elements ensures that correlations are not disastrous for training, but strong correlations are baked into the weights during training, so pretty much anything could potentially happen.

Not a mathematician so I’m immediately out of my depth here (and butchering terminology), but it seems, intuitively, like the presence of a massive number of local minima wouldn’t really be relevant for gradient descent. A given local minimum would need to have a “well” at least as large as your step size to reasonably capture your descent.

E.g. you could land perfectly on a local minimum, but you won’t stay there unless your step size was minute or the minimum was quite substantial.


I believe what was meant was that assuming local minima of a sufficient size to capture your probe, given a sufficiently high density of those, you become extremely likely to get stuck. A counterpoint regarding dimensionality is made by the comment adjacent to yours.

The randomness (and exploration) encouraged by batch training also helps avoid 'real' minima, if they exist.

Interestingly, there exist problems which provably can't be learned via gradient descent.

If you dive into Analysis (the underlying theory behind calculus), "How to Think About Analysis" by Lara Alcock is the book I wish I had when I studied it. Calculus by Spivak is the book I learnt from; it is probably not the easiest, though it is very thorough.

The table of contents looks really helpful for understanding.

with vampires!


You really start wondering when they are introduced, and it all kind of clicks at the end, when we realize we had the rug pulled from under our feet when the book started, and we only know it by the time we land on our faces.


+1 for Stross, Egan and the Bobiverse - I haven't read the others so will have a look. Just wanted to add Stand on Zanzibar by Brunner; and if the Bobiverse is there, then MurderBot should be too.


Maybe OT - I find Claude Code hit or miss; I spend a lot of time removing dumb code or asking Claude to remove it, e.g. "why do you have a separate..." Claude: "Good catch — there's no real reason...." and so on.

Where I find it incredible: learning new things. I recently started flutter/dart dev - I just ask Claude to tell me about the bits, or to explain things to me, and it's truly revolutionary imho. I'm building things in flutter after a week without reading a book or manual. It's like a talking encyclopaedia, or having an expert on tap. Do many people use it like this, or am I just out of the loop? I always think of Star Trek when I'm doing it. I architected / designed a new system by asking Claude for alternatives, and it gave me an option I'd never considered for a problem; it's amazing for this. After all, it's read all the books and manuals in the world; it's just a matter of asking the right questions.


I've done a couple of exploratory learning sessions with AIs, and wow, could it help with learning.

Imo we may be messing up the economy with AIs. They should be engineering better workers, not being employed to make one person do the work of three poorly.

The power of AIs to smooth learning and raise expertise, rather than replace it, should be the adaptation goal. Obviously AIs as work assistants are powerful, but all the bullshitting CEOs overselling AI are really damaging at the whole-economy level.

Particularly because the current marketing leads to the next generation abandoning roles that AI bullshitters claim are perfectly replaced.

It's like the urbanization demographic bomb on steroids.


I find myself worrying the AI bubble will pop and we'll lose this aspect of AIs without it ever being properly explored. Instead of doomscrolling, now I find myself firing up Claude and saying 'explain ... to me', and it proceeds to tell me all about it. I can ask it questions and it seems fairly right - at least right enough for me to proceed. It's way better at this than building code, in my experience anyway.


When people say the "bubble will pop" it's meant in analogy to the dotcom era - businesses and investors lost money, but the internet (and its opportunities) didn't vanish.

Even open-weight local models are becoming good enough for teaching yourself quite a range of stuff, especially the beginner aspects. LLMs are not going to simply disappear because of a financial realignment. The worst thing might be not being able to access a super-duper frontier model for free?


Many people use it like this - this is playing to its strengths, rather than trying to work around its weaknesses. "What's the idiomatic X language way to do Y?" gets you a solid, useful answer in seconds.

But it's just a damn good tool, not the apocalypse/the thing that lets you finally fire everyone. So it kind of gets lost in the hype.


this is the only use case I'm super bullish on. And for this it is revolutionary. Agreed.


Recently started making a Flutter app and it is fantastic to use: cross-platform to everywhere, and Dart is a very nice language too.


Haven't these guys been to a Taiwanese restaurant? They have great mock meats, and of course vegetarians have great mock meats too - love a good black bean pattie. The hubris this company shows is amazing.


They are focusing on the American palate, which is averse to things like tofu, seitan, or tempeh, as they are considered not masculine enough by a significant portion of the population. This is reinforced by both genders.


Tofu is so ridiculously OP in terms of nutrition, production costs, and culinary versatility. It's a shame society here in the US is so strongly stymied by the manipulative meat lobby.


As someone who is frugal AF and open minded...

I think you are overstating Tofu.

But honestly I'm mostly eating for nutrition rather than taste. Tofu doesn't hit the numbers I need.


Tofu uses about 50x less land and 5x less water, and produces 15x less CO2 per gram of protein compared to beef. It's pretty remarkable by these metrics even compared to other plant foods. Tofu would certainly be understood as "OP" in any simulation game like Civilization. We in the US are disadvantaged just by our intractable attachment to beef.

Plus, all the other nice things about it (high in fiber, doesn't incur the bodily damage associated with red meat and saturated fat, is a complete protein, lasts for over a month in the fridge, can be produced shelf-stable, etc.). And it's not like you have to choose one or the other.

Meat is subsidized in the US, so while tofu is usually cheap, it's not by as much as those numbers would suggest. (About 3x, for those just tracking protein. At my Costco, it's about $30 for 4lb of 85% ground beef or $7 for 4lb of tofu. That works out to $0.10 per gram protein for the beef, or $0.03 per gram of protein for the tofu.)
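For anyone checking the arithmetic, a quick sketch (the protein contents per 100 g are my own rough assumptions, not figures from the comment):

```python
# Rough protein-cost check for the Costco figures above.
# Assumed protein contents (approximate): ~17 g protein per 100 g raw
# 85% lean ground beef, ~13 g per 100 g firm tofu.
GRAMS_PER_LB = 453.6

def usd_per_gram_protein(price_usd, pounds, protein_per_100g):
    protein_g = pounds * GRAMS_PER_LB * protein_per_100g / 100
    return price_usd / protein_g

beef = usd_per_gram_protein(30, 4, 17)  # ~$0.10 per gram of protein
tofu = usd_per_gram_protein(7, 4, 13)   # ~$0.03 per gram of protein
print(round(beef, 2), round(tofu, 2))
```

Under those assumed protein contents the ratio works out to roughly 3x, consistent with the comment.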


Had to use beef and not chicken? And I noticed you didn't mention nutrition but rather economics.


Tofu still compares favorably to chicken, it uses about 2x less water and produces about 3x less CO2 than chicken. Looking at my Costco, tofu is about the same price per gram protein as chicken.

I used common English words using simple grammatical structures in short paragraphs. My comment was short and easy to read. I articulated the nutritive qualities clearly and succinctly. Tofu compares favorably to beef and chicken in terms of nutrition as well. (And that's even if we're ignoring how commonly meats have added table and curing salts).

Further, we're discussing beef in this thread, and anyone reading this thread would have that context. You're implying that the tofu-to-beef comparison was intentionally chosen to disfavor the meat industry, which is either bad-faith rhetoric, or you are someone commenting without the context that beef is discussed pretty centrally throughout this thread.

Your reply does not seem like good-faith participation from a human. You are clearly not even reading the comments you are replying to.


What do you mean by overstating? It has really good protein numbers. The protein itself has an evenly balanced amino acid profile (or in other words, it is a "complete" protein). It has a good amount of calcium and iron, and is low in fat. You can technically make it yourself, and there are numerous ways of cooking and flavoring it.


Not good compared to chicken.


There's no hope trying to sell "plant-based hamburger" with any name to toxic masculinity advocates who think soy feminizes you (even though seitan isn't soy). These guys are getting hospitalized from eating all-beef diets because chicken is "too feminine".


They could wash it down with some Brawndo I suppose


Great news, thanks to some fantastic "journalism" about estrogen contents, artificial meat is now viewed as being feminizing in a very literal way.


Back in Europe I had many good meat alternatives in grocery stores that were quite budget friendly as well. Like vegetarian 'Schnitzel', 'chicken', 'fish'. Here in NA, most of the meat alternatives are breaded, or high in fat and salt. It's disappointing.


They used to be more common in the pre-power-board age (piggybacks and screw terminals were very common back in the day). I can't see a date anywhere on the page. 5-8 are more specialist.


As a fellow chad I concur. Though I am improving my poker skills - games of chance will still be around


You likely already know, but the "Pluribus" poker bot was beating humans back in 2019. Games of chance will be around if people are around, but you'll have to be careful to ensure you're playing against people, unassisted people.

https://en.wikipedia.org/wiki/Pluribus_(poker_bot)


Yeah, thanks, I only play live games. I'm in Australia, so online poker is illegal here. I was thinking of getting a VPN and having a play online, then I saw this recently: https://www.reddit.com/r/Damnthatsinteresting/comments/1qi69...


So many of these degenerate online gambling / "investment" platforms are illegal here for good reason. If you are just a normal person playing fairly, you are being scammed. Same for things like Polymarket: the only winners are the people with insider knowledge.


Even horse racing - it's a solved problem, and if you start winning they'll just cancel your account (happened to a friend of mine).


Misogynistic behaviors were cultural at the time; I agree they're abhorrent, but people are embedded in their culture. The same is said of Hitchcock (as an example), and his behaviour was unacceptable by today's standards. We've come some way from that, but still have a way to go.

From the "about the authors" in the OP's link: "Feynman was a remarkably effective educator. Of all his numerous awards, he was especially proud of the Oersted Medal for Teaching, which he won in 1972." He probably didn't do a lot of the stuff he popularised, but that was what he did; it is a skill, taking abstract stuff and making it coherent. I know when I did physics (in the 90's) many swore by his books, particularly for quantum. It was a bit of a secret: we'd have these incomprehensible books on quantum, and someone would say - have you seen "The Feynman Lectures"? They are good. I wish we had the videos available at the time.


> misogynistic behaviors were cultural at the time, I agree they're abhorrent but people are embedded in their culture.

Moral relativism is a thing, but I think a more useful way to think of it rather than just saying "misogyny was a thing back then, should we care he was a misogynist then?" is to ask "if he were to have lived and worked in the 2000s, would he associate with Epstein?" And to be honest… Feynman does strike me as the kind of person to have the intellect to attract Epstein's attention and also the, for lack of a better term, party attitude to go to a couple of Epstein's parties that would result in him having awkward press releases trying to explain that he just had no possible idea that Epstein was doing anything sexual with children and conveniently forgetting all the times he was on the private island for some party or another...

That's the real strong vibe I get from Surely You're Joking. He's the kind of person who wants to be seen as someone who gets up to wacky hijinks, to be seen as "cool," and he specifically interprets "cool" in a way that's misogynistic even at a time (when he was dictating the stories that led to Surely You're Joking) when misogyny was starting to become a professional hindrance.

(And one of the things that really worries me about Surely You're Joking is that it's often recommended as a sort of "look at the wacky hijinks you can get up to as a physicist," so recommending the book is a valorization of his wacky hijinks and... well, that's ultimately what Angela's video is about, that's a thing we need to stop doing.)


> That's the real strong vibe I get from Surely You're Joking. He's the kind of person who wants to be seen as someone who gets up to wacky hijinks, to be seen as "cool," and he specifically interprets "cool" in a way that's misogynistic even at a time (when he was dictating the stories that led to Surely You're Joking) when misogyny was starting to become a professional hindrance.

In my experience, everyone who says this is talking about exactly one chapter in Surely You're Joking, but they don't appear to actually have paid close attention to the story. It's a story that Feynman recounts about trying to pick up girls when he was younger. He was advised by an older, "cooler" man to be mean. Feynman tries it and it works, but he feels bad about it and says that he never did it again. People calling Feynman a misogynist for this story seem to have just skipped the end of the chapter.


It's been decades since I read Surely You're Joking, and I've completely forgotten about that chapter. It plays no part in my conscious recollection of the book.

The episode that really stuck in my mind was the episode about his competition with the abacus-user, who was better at math, which essentially ends with him giving up trying to explain how he could mental math a cube root faster, because the abacus-user was just someone who couldn't understand a math explanation.


I remembered enjoying the book, so having not read it in a long time, I tried sharing Surely You're Joking with my kids at bedtime.

That chapter wasn’t the only thing I ended up skipping or heavily editing.

* Picking a room at Los Alamos with a window facing the women’s housing, but being disappointed that a tree or something blocked his view. (Wasn’t he also married at this point?)

* Starting a new Uni faculty position and hanging out at student dances, dismayed that girls would stop chatting & dancing with him when they learned he was a prof and not a fellow student.

* Hanging out at strip clubs to practice his drawing skills.

* Considering a textbook sales rep’s offer to help him find “trouble” in Vegas.

So maybe that one chapter turns around some at the end, but it’s not the only cringe-worthy moment in the book, and I can see why some people may have an overall negative opinion.

If I were going to do this with my kids now that they are teens, I wouldn’t filter as much and use the more questionable events as points of discussion.


> would he associate with Epstein?

This is from Lawrence Krauss[0]'s email to Epstein[1]:

> ps. I have decided that Feynman would have done what I did... and I am therefore content.. no matter what... :)

> On Apr 6, 2011, at 3:56 PM, Jeffrey Epstein wrote:

> what evidence? no real sex.. where is she getting her so called facts

Krauss's letter is obviously horrible in its implications. What's interesting to me is his interpretation of what Feynman would have done. Is it his delusional justification of what he'd done with Epstein, or is it based on a certain reputation of Feynman in the science community?

[0] https://en.wikipedia.org/wiki/Lawrence_Krauss [1] https://www.epstein.media/files/house_oversight_030915/


> misogynistic behaviors were cultural at the time, I agree they're abhorrent but people are embedded in their culture. The same is said of Hitchcock, (as an example) and his behaviour was unacceptable by todays standards. We've come some way from that but still a way to go.

The video actually addresses this very point in the first few minutes:

> the second component of the Feynman lifestyle that the Feynman bro has to follow, you know as told in this book, is that women are inherently inferior to you and if you want to be the smartest big boy physicist in the room you need to make sure they know that I think people are sometimes shocked to hear this like that that exists in this book especially because as I said if you were a precocious teenager interested in physics people shoved this book at you they just put it into your hands like oh you want to be a physicist here's the coolest physicist ever

> I feel like it's at this point in the video when like Mr. Cultural Relativism is going to show up in the comments and be like how dare you judge people from the past on their actions that's not fair things were different back then women liked when men lied to them and pretended to be an undergrad so that-- it was fine back then it was fine and I just, no, actually this book was published 40 years ago which is just not that long ago Richard Feynman should have known that women were people 40 years ago like absolutely not it's not "how things were back then" what's wrong with you people, no, it's inappropriate then it's inappropriate now

Later the actual author of "Surely You're Joking, Mr. Feynman!", Ralph Leighton, is mentioned, so perhaps the responsibility for what was included is his more than Feynman's. I think the criticism stands that the degree of sexism effectively celebrated by inclusion was certainly less culturally accepted in 1985, when the book was published, than when the events occurred, and that's the point of raising the issue of why it was judged good and proper to include these marginalizing anecdotes when his actual contributions to physics and teaching were worthy of celebration.


I do not think Feynman was celebrating his activity in the book. From memory, he learnt the behaviour from other barflies at the bars he hung out in. And he expressed his surprise at how some women reacted. This was far from his upbringing and his experience with his fiancée.

The behaviour is hardly laudable, but "celebrated" it is not.


> I do not think Feynman was celebrating his activity in the book.

The argument presented in the video about this is that these are the stories Feynman edited and reworked over time, and shared with his friend Ralph Leighton, who then recorded them in the "Surely You're Joking" book.

The video also describes a change in his behavior later in life. In 1974, responding to a letter asking to reprint "What is Science?"[1] from 1966, he comments that "some of the remarks about the female mind might not be taken in the light spirit they were meant"[2]. This is cited in the video as Feynman becoming more progressive between 1966 and 1974. The "Surely" book is published in 1985, and yet still includes the misogynistic stories. The video's complaint is that there should be some contextualization about views changing, like was given in Feynman's reply in 1974, but there being none it comes across as an implicit endorsement. I don't recall from the video if Feynman reviewed or edited the "Surely" book, which leaves it as Ralph Leighton's perspective more than Feynman's.

It seems a legitimate criticism that this book held up as an example of a good role model in physics doesn't try to avoid perpetuating bad stereotypes. It's probably egregious to say the mere inclusion of the stories celebrates their actions. But it's equally egregious to fail to even try to address the bad behavior, especially when it's held out as a positive example.

[1] https://feynman.com/science/what-is-science/

[2] https://archive.org/details/perfectly-reasonable-deviations-...


And…who hasn’t done offensive things, before learning that what they’re doing is bad? It’s a matter of developing self control and awareness.


Certainly. But you're missing the point. Feynman chose to tell the stories to Ralph Leighton who then recorded them in the "Surely" book which was published in 1985, well after Feynman's own perspective seems to have changed about the more offensive things he'd said.

By many other accounts he was a kind, caring, thoughtful person, but some of the selected stories in "Surely" paint a significantly different picture. To me it's unclear, not having studied the life of Richard Feynman, what parts are exaggerated. But it does seem clear that these stories were refined and selected for inclusion, and were therefore considered endearing or representative for the intention of the book. And in the time and culture in which it was published that seems like a bit of a miss at the very least.

