I read the paper. It asserts a hell of a lot. For evidence it points only to other papers that, IMO, do not make claims as solid as these authors take them to be.
I suspect that with the right mix of skills, tools, and orchestration, a frontier-level LLM could review code as well as a human. But this handwavy paper offers no such detailed architectural description and provides scant data for its conclusions.
In short, this paper is hypetrash, quite apart from the state of the technology.