A growing number of experts have called for these tests to be ditched, saying they boost AI hype and create “the illusion that [AI language models] have greater capabilities than what truly exists.” Read the full story here.
What stood out to me in Will’s story is that we know remarkably little about how AI language models work and why they generate the things they do. With these tests, we’re trying to measure and glorify their “intelligence” based on their outputs, without fully understanding how they function under the hood.
Other highlights:
Our tendency to anthropomorphize makes this messy: “People have been giving human intelligence tests, IQ tests and so on, to machines since the very beginning of AI,” says Melanie Mitchell, an artificial-intelligence researcher at the Santa Fe Institute in New Mexico. “The issue throughout has been what it means when you test a machine like this. It doesn’t mean the same thing that it means for a human.”
Kids vs. GPT-3: Researchers at the University of California, Los Angeles, gave GPT-3 a story about a magical genie transferring jewels between two bottles and then asked it how to transfer gumballs from one bowl to another, using objects such as a posterboard and a cardboard tube. The idea is that the story hints at ways to solve the problem. GPT-3 proposed elaborate but mechanically nonsensical solutions. “This is the sort of thing that children can easily solve,” says Taylor Webb, one of the researchers.
AI language models are not humans: “With large language models producing text that seems so human-like, it’s tempting to assume that human psychology tests will be useful for evaluating them. But that’s not true: human psychology tests rely on many assumptions that may not hold for large language models,” says Laura Weidinger, a senior research scientist at Google DeepMind.
Lessons from the animal kingdom: Lucy Cheke, a psychologist at the University of Cambridge, UK, suggests AI researchers could adapt techniques used to study animals, which have been developed to avoid jumping to conclusions based on human bias.
Nobody knows how language models work: “I think that the fundamental problem is that we keep focusing on test results rather than how you pass the tests,” says Tomer Ullman, a cognitive scientist at Harvard University.