Provably unmasking malicious behavior through execution traces (arxiv.org)

42 points by PaulHoule 17 hours ago | 5 comments

thethirdone 15 hours ago

Based on Table 1, this method is actually worse than drawing a random number (0-100%, independent of the program) and flagging the program whenever the number is below 98.8%. That would achieve a better detection rate without increasing the false positive rate.
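To make the baseline concrete, here is a minimal sketch of that coin-flip detector. The only number taken from the thread is the 98.8% threshold; the function name, sample sizes, and seed are illustrative assumptions, and nothing here reproduces the paper's method or data. Because the flag ignores the program, the flag rate on malicious and benign programs is the same in expectation, so both land near 98.8%.

```python
import random


def random_baseline_rates(threshold, n_malicious, n_benign, seed=0):
    """Flag each program with probability `threshold`, ignoring the program.

    Hypothetical helper for illustration only. Returns the empirical
    (true positive rate, false positive rate) of the coin-flip detector.
    """
    rng = random.Random(seed)

    def flagged():
        # The decision depends only on the random draw, never on the input.
        return rng.random() < threshold

    tpr = sum(flagged() for _ in range(n_malicious)) / n_malicious
    fpr = sum(flagged() for _ in range(n_benign)) / n_benign
    return tpr, fpr


# 0.988 is the FPR quoted from Table 1; the sample sizes are arbitrary.
tpr, fpr = random_baseline_rates(0.988, 100_000, 100_000)
print(f"random baseline: TPR ~ {tpr:.3f}, FPR ~ {fpr:.3f}")
```

Any detector whose detection rate sits below its own false positive rate is beaten by this baseline at the same FPR, which is the comparison being made against Table 1.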

It doesn't seem worth it to try to follow the math to see if there is something interesting.

causalmodels 16 hours ago

Interesting direction but the 98.8% FPR in Table 1 seems like a dealbreaker. Anyone understand what's going on with the contradictory results between the text and tables?

dwattttt 16 hours ago

> Empirically, CTVP attains very good detection rates with reliable false positives

A novel use of the word "reliable"? Jokes aside, either they mean the FPR as the opposite of what you'd expect, the table is not representative of their approach, or they're just... really optimistic?

godelski 9 hours ago

> Anyone understand what's going on with the contradictory results between the text and tables?

Well, Figure 1 would also disagree. It shows an FPR of 47.5%.

From Sec. 3, end of the second-to-last paragraph:

> The protocol is deterministic given fixed RNG seeds, caches model outputs by program hash, and *bounds false positives via the chosen percentile and gap parameters.*

I believe the high FPR is a deliberate choice, though I think it is suspect that the FPR is pushed this high to get the TP results.

Disclaimer: I only gave this a very cursory skim, so don't rely on me too much.

Joel_Mckay 11 hours ago

"'Forbidden' AI Technique" (Computerphile)

https://www.youtube.com/watch?v=Xx4Tpsk_fnM

"The Hard Problem of Controlling Powerful AI Systems" (Computerphile)

https://www.youtube.com/watch?v=JAcwtV_bFp4

Attempting to guide the statistical salience of LLM reasoning-model procedures usually just creates an evasive interface facade in the output. =3
