I did yes :)
That o3 does well on the FrontierMath held-out set is impressive, no doubt
I think there is plenty of room for doubt still. elliotglazer on reddit writes:
Epoch's lead mathematician here. Yes, OAI funded this and has the dataset, which allowed them to evaluate o3 in-house. We haven't yet independently verified their 25% claim. To do so, *we're currently developing a hold-out dataset* and will be able to test their model without them having any prior exposure to these problems.
My personal opinion is that OAI's score is legit (i.e., they didn't train on the dataset), and that they have no incentive to lie about internal benchmarking performances. However, we can't vouch for them until our independent evaluation is complete.
(emphasis mine). So there is good reason to doubt that the "held-out dataset" even exists.
How do you figure? It's absolutely possible in principle that a quantum computer can efficiently perform computations which would be extremely expensive to perform on a classical computer.
I don't think it's very surprising. The various CS departments are extremely happy to ride the wave of easy funding and spend a lot of time boosting AI, just as, a few years ago, all the cryptographers were getting into blockchains. For instance, they added an entire new "AI" major while eliminating the electrical engineering major, on the grounds that "computation" is more important than electrical engineering.
My university sends me checks occasionally, like when they overcharged the premium on my dental insurance. No idea why they can't just do an electronic transfer like for my stipend.
and here i just assumed the name was original to ffxiv
yea i did try to read the lecture notes and got reminded very fast why i don't try to read physics writing lol
I think there is in fact a notion of continuous entropy where that is actually true, and it does appear to be used in statistical mechanics (but I am not a physicist). But there are clearly a lot of technical details which have been scrubbed away by the LW treatment.
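(For reference, the notion in question is presumably differential entropy; that's my guess at what's meant, not something the post states:

```latex
% differential entropy of a continuous random variable X with density f
h(X) = -\int_{-\infty}^{\infty} f(x)\,\ln f(x)\,\mathrm{d}x
```

One of those scrubbed details: this is not simply the limit of the discrete entropy as the bins shrink, since that limit diverges by a log-of-bin-width term, and h(X) can be negative.)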
the task the AI solves is writing test cases for finding the Least Common Multiple modulo a number.
From the image of the prompt, it looks more like a Chinese remainder theorem (CRT) computation to me.
It’s famously much easier to verify modular arithmetic than it is to actually compute it.
It's not particularly difficult to compute a CRT solution, though verifying the result afterwards is definitely trivial. But I'm not sure I'd agree that that's a general fact about modular arithmetic computations.
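To make the compute/verify gap concrete, here's a minimal sketch of a two-modulus CRT solve via the extended Euclidean algorithm (my own illustration, not the computation from the image):

```python
def ext_gcd(a, b):
    # Returns (g, x, y) with a*x + b*y == g == gcd(a, b).
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def crt(r1, m1, r2, m2):
    # Find x with x = r1 (mod m1) and x = r2 (mod m2),
    # assuming m1 and m2 are coprime.
    g, p, _ = ext_gcd(m1, m2)  # p is the inverse of m1 mod m2
    assert g == 1, "moduli must be coprime"
    k = ((r2 - r1) * p) % m2
    return (r1 + m1 * k) % (m1 * m2)

x = crt(2, 3, 3, 5)  # x = 8
# Verification is just two mod operations:
assert x % 3 == 2 and x % 5 == 3
```

Computing costs a gcd's worth of work; verifying costs two remainders. Both are cheap, which is the point: the gap is real but modest here.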
The article is very poorly written, but here's an explanation of what they're saying. An "inductive Turing machine" is a Turing machine which is allowed to run forever, but for each cell of the output tape there eventually comes a time after which it never modifies that cell again. We consider the machine's output to be the sequence of eventual limiting values of the cells. Such a machine is strictly more powerful than Turing machines in that it can compute more functions than just recursive ones. In fact it's an easy exercise to show that a function is computable by such a machine iff it is "limit computable", meaning it is the pointwise limit of a sequence of recursive functions. Limit computable functions have been well studied in mainstream computer science, whereas "inductive Turing machines" seem to mostly be used by people who want to have weird pointless arguments about the Church-Turing thesis.
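To illustrate limit computability with the classic example (my own sketch, not from the article): the halting function is not recursive, but it is the pointwise limit of the recursive approximations "halts within n steps", and the n-th approximation only ever changes its answer once, from 0 to 1, which is exactly the inductive-TM condition on output cells. Programs are modeled here as Python generators, one yield per step:

```python
def halts_within(program, n):
    # Recursive (always-terminating) approximation: run the program,
    # modeled as a generator, for at most n steps.
    g = program()
    for _ in range(n):
        try:
            next(g)
        except StopIteration:
            return 1  # halted within n steps
    return 0  # hasn't halted yet; may flip to 1 for larger n

def halts_after_three():
    for _ in range(3):
        yield

def loops_forever():
    while True:
        yield

# Each sequence of stage-n guesses changes at most once (0 -> 1),
# and its limit is the true halting answer.
print([halts_within(halts_after_three, n) for n in range(6)])  # [0, 0, 0, 0, 1, 1]
print([halts_within(loops_forever, n) for n in range(6)])      # [0, 0, 0, 0, 0, 0]
```

Of course, no algorithm can tell you *when* the guesses have settled, which is why this yields strictly more than recursive computability rather than contradicting it.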
are we really clutching our pearls because someone named themselves after a demon