You seem to be going off a title that is plainly incorrect and not what the paper says. The paper demonstrates HOW different models can learn similar representations due to "data, architecture, optimizer, and tokenizer".
"How Different Language Models Learn Similar Number Representations" (actual title) is distinctly different from "Different Language Models Learn Similar Number Representations" - the latter implying some immutable law of the universe.
"How Different Language Models Learn Similar Number Representations" (actual title) is distinctly different from "Different Language Models Learn Similar Number Representations" - the latter implying some immutable law of the universe.