I feel like most engineers I talk to still haven't realised what this is going to mean for the industry. The power loom for coding is here. Our skills still matter, but differently.
When the power loom came around, what happened to most seamstresses? Did they move on to become fashion designers, materials engineers creating new fabrics, or chemists creating new color dyes, or did they simply retire or get driven out of the workforce?
There were riots and many people died. Many people lost their jobs. I didn't say this is good, but it is happening. As individuals we should act to protect ourselves from these changes.
That might mean joining a union and trying to influence how AI is adopted where you work. It might mean changing which of your skills you lean on most. But just whining that AI is bad is how you end up like those seamstresses.
On the other hand, a lot of those jobs were offshored to places where labor is cheaper. It would be interesting to compare how many people work in the textile industry in Bangladesh today compared to the US 50 years ago.
> joining a union and trying to influence how AI is adopted where you work.
Did the strong unions at the Detroit car manufacturers protect the long-term stability of the profession? Did they ensure that the Rust Belt remained a thriving economic area?
> Just whining that AI is bad
I'm not whining. I just think that we are witnessing the end of "knowledge workers" and a further compression of the middle class. Given that I'm smack in the middle of my economically active years (turning 45 this year), I am trying to figure out where this puck is going and whether I will be fast enough to skate there to catch it.
> On the other hand, a lot of those jobs were offshored to places where labor is cheaper. It would be interesting to compare how many people work in the textile industry in Bangladesh today compared to the US 50 years ago.
I believe this is a major part of it. People cannot fathom what the countries that actually make things look like, because basically nothing is made in the West anymore. There are literally hundreds of millions of people, maybe billions, who work to make the Western economies profitable, who get paid nothing to do it and live in filthy, polluted slums for everyone else's benefit.
Looms might speed up the process, but I guarantee there are thousands of people working in the poorest countries on earth to make it all happen.
Interestingly, AI seems to be massively polluting, and while the West has absorbed some of that so far, it probably won't be long until more of the data centers are built in poorer countries where the environment can be exploited even harder.
> I'll make more progress than mentally wearing myself out reading a bunch of LLM generated code trying to figure out how to solve the problem manually.
Most engineers realize that there's currently more tech debt being created than ever before. And it will only get worse.
No, I think many realize it, but it's easier to deny the asteroid that's about to destroy your way of life than it is to think about optimizing for the reality after impact.
The Treo was great, and it was definitely possible to read webpages on it. I thought it was the best smartphone at the time. The screen size, web browsing, and email were all better on the iPhone.
Cultivating and leveraging fear is truly a cornerstone of Security™.
I don't think the claims about capability are ridiculous. The idea that the general capability is proprietary and that it will be exclusive to the trusted partners of one company is ridiculous.
All the big tech companies are getting access to mythos. If Anthropic is blowing smoke with regard to its ability to find vulnerabilities, it will leak very soon.
> All the big tech companies are getting access to mythos. If Anthropic is blowing smoke with regard to its ability to find vulnerabilities, it will leak very soon.
Will it? All the big companies are invested in AI themselves, and the people getting exclusive access to this are going to be under an NDA anyway.
It will leak eventually, but the news that it isn't much better than current models may not be enough to dispel the original claims.
What it can do is call multiple MCPs, dumping tons of crap into the context and then separately run some analysis on that data.
Composable MCPs would require some sort of external sandbox in which the agent can write small bits of code to transform and filter the results from one MCP to the next.
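A minimal sketch of the kind of glue code I mean, assuming a hypothetical `mcp_call(server, tool, args)` helper available inside the sandbox (none of these names are real APIs):

```python
# Hypothetical glue an agent might write inside a sandbox. mcp_call is an assumed
# helper that invokes a named tool on a named MCP server and returns parsed JSON.
issues = mcp_call("issue-tracker", "list_issues", {"label": "bug"})

# Filter and reshape locally instead of dumping every record into the model's context.
stale_ids = [i["id"] for i in issues
             if i["status"] == "open" and i["age_days"] > 90]

report = mcp_call("reporting", "create_report", {"issue_ids": stale_ids})
print(report["url"])  # only this small result needs to reach the model
```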
This is confusing to me. What is composability if not calling a program, getting its output, and feeding it into another program as input? Why does it matter if that output is stored in the LLM's context, or if it's stored in a file, or if it's stored ephemerally?
Maybe I'm misunderstanding the definition of composability, but it sounds like your issue isn't that MCP isn't composable, but that it's wasteful because it adds data from interstitial steps to the context. But there are numerous ways to circumvent this.
For example, it wouldn't be hard to create a tool that just runs an LLM, so when the main LLM convo calls this tool it's effectively a subagent. This subagent can do work, call MCPs, store their responses in its context, and thereby feed that data as input into other MCPs/CLIs, and continue in this way until it's done with its work, then return its final result and disappear. The main LLM will only get the result and its context won't be polluted with intermediary steps.
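Roughly, a minimal sketch of that subagent-as-a-tool idea; `call_llm`, `call_tool`, and `SUBAGENT_TOOLS` are assumed helpers here, not any particular SDK:

```python
# Sketch only: the subagent loops in its own private context and returns just the result.
def run_subagent(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        reply = call_llm(messages, tools=SUBAGENT_TOOLS)  # fresh, private context
        if reply.tool_call is None:
            return reply.text                             # only the final answer escapes
        result = call_tool(reply.tool_call)               # MCP/CLI output stays in here
        messages.append({"role": "tool", "content": result})
```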
It cannot do "anything" with the tools. Tools are very constrained in that the agent must insert the tool call into its context, and it can only receive the response of the tool directly back into its context.
Tools themselves also cannot be composed in any SOTA models. Composition is not a feature the tool schema supports and they are not trained on it.
Models obviously understand the general concept of function composition, but we don't currently provide the environments in which this is actually possible outside of highly generic tools like Bash or sandboxed execution environments like https://agenttoolprotocol.com/
Why are tokens not coloured? Would there just be too many params if we double the token count so the model could always tell input tokens from output tokens?
That's something I'm wondering as well. Not sure how it is with frontier models, but from what you can see on Hugging Face, the "standard" method to distinguish tokens still seems to be special delimiter tokens or even just formatting.
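For example, many open chat models render roles with ChatML-style delimiter tokens, roughly like:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
How do I read a file in Python?<|im_end|>
<|im_start|>assistant
```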
Are there technical reasons why you can't make the "source" of the token (system prompt, user prompt, model thinking output, model response output, tool call, tool result, etc) a part of the feature vector - or even treat it as a different "modality"?
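Concretely, I'm imagining something like a learned per-source embedding added on top of the token embedding, analogous to BERT's segment embeddings; a minimal sketch in PyTorch (the source ids are made up):

```python
import torch
import torch.nn as nn

class ColoredEmbedding(nn.Module):
    """Token embedding plus a learned 'source' embedding (system/user/assistant/tool...)."""
    def __init__(self, vocab_size: int, n_sources: int, d_model: int):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.src = nn.Embedding(n_sources, d_model)  # e.g. 0=system, 1=user, 2=assistant, 3=tool result

    def forward(self, token_ids: torch.Tensor, source_ids: torch.Tensor) -> torch.Tensor:
        # The same token keeps the same base vector wherever it appears;
        # the source embedding just adds a consistent "color" on top.
        return self.tok(token_ids) + self.src(source_ids)
```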
By the nature of the LLM architecture, I think if you "colored" the input via tokens the model would about 85% "unlearn" the coloring anyhow. Which is to say, it's going to figure out that "test" in the two different colors is the same thing. It kind of has to; after all, you don't want to be talking about a "test" in your prompt and have the model be completely unable to connect that to the concept of "test" in its own replies. The coloring would end up as just another language in an already multi-language model. It might slightly help, but I doubt it would be a solution to the problem. And possibly at an unacceptable loss of capability, as it would burn some of its capacity on that "unlearning".
So most training data would be grey and only a little bit coloured? Ok, that sounds plausible. But then maybe they tried it, and the current models already get it right 99.99% of the time, so observing any improvement is very hard.
They have a lot of data in the form: user input, LLM output.
Then the model learns what the previous LLM models produced, with all their flaws. The core LLM premise is that it learns from all available human text.
This hasn't been the full story for years now. All SOTA models are strongly post-trained with reinforcement learning to improve performance on specific problems and interaction patterns.
The vast majority of this training data is generated synthetically.
Because they're the main prompt injection vector, I think you'd want to distinguish tool results from user messages. By the time you go that far, you need colors for those two, plus system messages, plus thinking/responses. I have to think it's been tried and it just cost too much capability, but it may be the best opportunity for improvement at some point.
This has the potential to improve things a lot, though there would still be a failure mode when the user quotes the model or the model (e.g. in thinking) quotes the user.
I've been curious about this too - there's an obvious performance overhead to having an internal/external channel, but it might make training away this class of problems easier.
The models are already massively overtrained. Perhaps you could do something like initialise the two new token sets from the existing shared embeddings, then use existing chat logs to train the model to understand the difference between input and output content? That's only a single extra phase.
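A minimal sketch of that initialisation step, assuming `emb` is the existing [vocab, d_model] embedding matrix (names are hypothetical):

```python
import torch

def make_colored_vocab(emb: torch.Tensor) -> torch.Tensor:
    # Duplicate the vocabulary: ids [0, V) become the "input" copy, [V, 2V) the "output" copy.
    # Both copies start from the same pretrained vectors, so the model initially treats them
    # as identical; a short fine-tuning phase on chat logs then teaches it the distinction.
    return torch.cat([emb.clone(), emb.clone()], dim=0)
```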