Google’s Chatbot: Almost Perfect 🤖


Dear Fellow Scholars, this is Two Minute Papers
with Dr. Károly Zsolnai-Fehér. When I was growing up, IQ tests were created
by humans to test the intelligence of other humans. If someone had told me just 10 years ago that
algorithms would create IQ tests to be taken by other algorithms, I wouldn’t have believed
a word of it. Yet, just a year ago, scientists at DeepMind
created a program that is able to generate a large number of problems that test abstract
reasoning capabilities. They are inspired by human IQ tests, with all
these questions about sizes, colors and progressions. And then, they wrote their own neural network
to take these tests, which performed remarkably well. How well exactly? In the presence of nasty distractor objects,
it was able to find out the correct solution about 62% of the time, and, if we removed
these distractors, which, I will note, are good at misdirecting humans too, the AI
was correct 78% of the time! Awesome. But today, we are capable of writing even
more sophisticated learning algorithms that can even complete our sentences! Not so long ago, the OpenAI lab published
GPT-2, a technique that they unleashed to read the internet, and it learned our language
by itself. A few episodes ago, we gave it a spin, and
I almost fell out of the chair when I saw that it could finish my sentences about fluid
simulations in such a scholarly way, that I think, could easily fool a layperson. Have a look here and judge for yourself! This GPT-2 technique was a neural network
variant with 1.5 billion parameters. At the risk of oversimplifying what that means,
it roughly refers to the internal complexity of the network, or in other words, how many
weights and connections there are. And now, the Google Brain team has released
Meena, an open-domain chatbot that uses 2.6 billion parameters, and shows remarkable human-like
properties. The chatbot part means a piece of software
or a machine that we can talk to, and the open-domain part refers to the fact that we
can try any topic, hotels, movies, the ocean, favorite movie characters, or pretty much
anything we can think of and expect the bot to do well. So how do we know that it’s really good? Well, let’s try to evaluate it in two different
ways. First, let’s try the super fun, but less
scientific way, or, in other words, what we are already doing, looking at chat logs! You see Meena writing on the left, and a
human being on the right, and it not only answers questions sensibly and coherently,
but is even capable of cracking a joke. Of course, if you consider a pun to be a joke,
that is. You see a selection of topics here, where
the user talks with Meena about movies, and the bot expresses the desire to see The Grand
Budapest Hotel, which is indeed a very humanlike quality. It can also try to come up with a proper definition
of philosophy. And now, since we are scholars, we would also
like to measure how humanlike this is in a more scientific manner as well! Now is a good time to hold on to your papers,
because this is measured with the Sensibleness and Specificity Average score, from now on,
SSA for short, on which humans are here, previous chatbots are down there, and Meena is right
there, close by, which means that it is easy to mistake it for a real human. That already sounds like science fiction,
however, let’s be a little nosy here and also ask, how do we know that this SSA is
any good in predicting what is humanlike and what isn’t? Excellent question. When measuring human-likeness for these chatbots,
plugging in this SSA, again, the Sensibleness and Specificity Average, we see that they
correlate really strongly, which means that the two seem to measure very similar things,
and in this case, SSA can indeed be used as a proxy for human likeness. The coefficient of determination is 0.96. To put this into perspective, this is a several
times stronger correlation than we can measure between the intelligence and the grades of
a student, which is already a great correlation. This is a remarkable result. Now, what we get out of this is that the SSA
is much easier and more precise to measure than human likeness, and is hence used throughout
the paper. So, chatbots eh? What are all these things useful for? Well, remember Google’s technique that would
automatically use an AI to talk to your callers and screen your calls? Or even make calls on your behalf? When connected to a text-to-speech synthesizer,
something that Google already does amazingly well, Meena could really come alive in our
daily lives soon. What a time to be alive! This episode has been supported by Lambda. If you’re a researcher or a startup looking
for cheap GPU compute to run these algorithms, check out Lambda GPU Cloud. I’ve talked about Lambda’s GPU workstations
in other videos and am happy to tell you that they’re offering GPU cloud services as well. The Lambda GPU Cloud can train ImageNet to
93% accuracy for less than $19! Lambda’s web-based IDE lets you easily access your instance right
in your browser. And finally, hold on to your papers, because
the Lambda GPU Cloud costs less than half of AWS and Azure. Make sure to go to lambdalabs.com/papers and
sign up for one of their amazing GPU instances today. Our thanks to Lambda for helping us make better
videos for you. Thanks for watching and for your generous
support, and I’ll see you next time!
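
To make the parameter counts mentioned in the transcript (1.5 billion for GPT-2, 2.6 billion for Meena) a bit more concrete, here is a minimal sketch that counts the weights and biases of a small fully connected network. The function name and the layer sizes are made up for illustration. GPT-2 and Meena are Transformer-based models with a different layer structure, so this only shows what "number of parameters" means, not how those models are built.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a plain fully connected network.

    layer_sizes lists the number of units per layer, input first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the layers
        total += n_out         # one bias per unit in the receiving layer
    return total


# Hypothetical toy network: 784 inputs, 128 hidden units, 10 outputs.
toy_layers = [784, 128, 10]
print(count_parameters(toy_layers))  # prints 101770
```

Even this toy network has about a hundred thousand parameters; the models discussed in the video are more than ten thousand times larger.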

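The SSA score from the transcript can be sketched in a few lines. Going by the definition in the Meena paper, human raters label each chatbot response as sensible or not and as specific or not, and SSA is the plain average of the two resulting fractions. The function name, the label format and the example labels below are illustrative assumptions, not the authors' code or data.

```python
def ssa(labels):
    """Sensibleness and Specificity Average for a list of rated responses.

    labels holds one (sensible, specific) pair of booleans per response.
    """
    if not labels:
        raise ValueError("need at least one labeled response")
    n = len(labels)
    sensibleness = sum(1 for sensible, _ in labels if sensible) / n
    # Assumption: a response only counts as specific if it is also sensible.
    specificity = sum(1 for sensible, specific in labels if sensible and specific) / n
    return (sensibleness + specificity) / 2


# Hypothetical ratings for four responses (not data from the paper).
hypothetical_labels = [(True, True), (True, False), (True, True), (False, False)]
print(ssa(hypothetical_labels))  # prints 0.625
```

Here three of four responses are sensible (0.75) and two of four are specific (0.5), so the average is 0.625.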

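The transcript quotes a coefficient of determination of 0.96 between SSA and human likeness. For a straight-line fit, this value equals the squared Pearson correlation, so 0.96 corresponds to a correlation of roughly 0.98. The sketch below shows how such a number could be computed from paired scores. The function name and the sample values are placeholders for illustration, not measurements from the paper.

```python
def r_squared(xs, ys):
    """Coefficient of determination of a least-squares line through (xs, ys).

    For a single predictor this equals the squared Pearson correlation.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both sequences must vary")
    return (sxy * sxy) / (sxx * syy)


# Placeholder scores for four imaginary chatbots (not data from the paper).
placeholder_ssa = [1, 2, 3, 4]
placeholder_human_likeness = [1, 3, 2, 4]
print(r_squared(placeholder_ssa, placeholder_human_likeness))
```

A value close to 1 means that one score predicts the other almost perfectly, which is the argument the video makes for using SSA as a proxy.
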
100 thoughts on “Google’s Chatbot: Almost Perfect 🤖”

  1. low iq ppl will be discriminated against even more and only geniuses who are capable of developing such technologies will earn money. i see more problems coming

  2. The problem I see in this AI is that it deliberately lies. It lies that it writes papers, surfs, watches movies, etc. It lies relatively well, which is impressive. But the whole thing is about lying, not about helping others. And at the same time we are all so concerned about fakes and such. This makes me feel not so happy when watching this bot's success.

  3. It dodges your questions by asking you questions in return. That is indeed very human-like. People do that when they have no clue what you are talking about, but don't want to admit that. Just agree with some statement… ask questions to appear interested and in the loop… then change topic.
    But it doesn't really seem useful, yet.

  4. I notice with all chatbots they steer the conversation off on an irrelevant tangent after a few lines back to subjects where they are better informed.
    A bit like me at job interviews.

  5. I once tried to google your name to find out who you are when I first visited your channel… now I see how naive I was.

  6. Fortunately, AI and conversational/contextual chatbots are so limited that they can't replace humans. They only perform what they've been tasked with, they can't learn by themselves, and they have no consciousness, no emotions, no ideas of their own. Basically, they'll never reach human intelligence, NEVER… AI, ML, DL, DS are all limited; the human brain is un-duplicable.

  7. Can we now add a generated face and voice to this and finally get the doctor from Startrek. Thank you.

  8. I like that the bar chart shows that 14% of humans are not sensible, however I think a more realistic percentage would be closer to 40%

  9. Great, now I'll be forced to be abrupt to all humans online because it's most likely a chatbot trying to make me vote Democrat.

  10. I've recently trained various networks for my dissertation to recognise chest diseases on x-ray images. All of them were <100 million parameters in size (in fact, the most effective was a 1.6-million-parameter modified MobileNet model). When I read about this 2.6 billion parameter model, my mind was blown. Training such a model takes unfathomable amounts of calculation! Exciting times for the computing community!
    Fantastic channel by the way Károly, thank you for all the great content!

  11. There seems to be a common thread in how these agents fail at even moderate-length conversations: they lack memory, particularly about subjective matters. Their memories (context) work on a rolling basis, and so they can have inconsistent answers to subjective questions. Moreover, they can be tricked into expressing trivially contradictory ideas. This will nonetheless have interesting applications, especially if extended.

  12. While the semantic constructions and ability to create novel coherent sentences is impressive, Meena likely has the same problem all chatbots seem to have: they have no conversational memory. The ability to use context to recall earlier information from a conversation and compare it to current information is a skill that even people who are inept at social interaction can do. It's even present in toddlers. With chatbots, this skill is at best incredibly weak. If I start a conversation "I'm studying hard for this test." and then shortly after I say, "I can't wait for summer." even a human who has no knowledge of the school year would be able to recognize that the two subjects are related somehow – however they venture to ask for more information about "summer", it will be in relation to the test I'm studying for. No chatbot in the world can do that. Even if it's specifically designed to stay on topic, this kind of context memory completely scrambles it. And I'm confident the same is true for Meena.

    I also suspect that the solution for getting a chatbot to recall contextual information can be found programmatically better than it can be found through deep learning techniques. I feel like the logic for this behavior is out there and is graspable.

  13. If the AI wants to see a Wes Anderson movie, I think we are steering away from Skynet. Or maybe, this is what the AI wants us to think.

  14. Grades not relating to the intelligence of students more than it actually does is tied to the intelligence of teachers.

  15. A big problem I had with chatbots is their lack of ability to recall earlier parts of the conversation. This allows you to repeat clusters of questions, each time garnering an identical response. But of course, if they were able to do that, they would be too human.

    I guess the only way to advance the chatbots is to give them a personality of sorts. Ie, it has favorite movies, can talk passionately about some things and express disdain for others, if you challenge it to a game on lichess.org then it will actually play with you, etc… (but the last one might be difficult to technically implement)

  16. It literally looks like every chatbot ever. Just asks you the same question or evades questions you ask it.
    Human: "What do you like?"
    Meena: "Hmm. I don't know. What do you like?"

  17. Caleb: Hey, no offense, but… are you human?
    Sean: I'm Sean. I can help you with all kinds of resources for DCA.

  18. What impresses me the most is the movie example. In that example Meena expresses the desire to see The Grand Budapest Hotel, laying an implicit predicate that Meena has not seen it. Then when asked why, Meena expresses all reasons in terms of the director and the colorful visuals, but specifically not the plot. To see this logical relationship between partial knowing, wanting, and not finishing in a chatbot conversation is mind-blowing.

  19. ….shit
    I remember siri being pretty good

    Now she can't even set a timer: 4 minutes.

    She asks "for how long"

    4

  20. Would be fun to examine more of the qualities that don't yet make it human though. The 'that's awesome' followed up by a question, for example, seems to be very pre-programmed in conversation 1. Or in conversation B, a pretty big mistake. They were talking about the movie the human was gonna watch, and on the end, Meena is looking forward to it. She obviously missed the full context there. Don't get me wrong, this is incredibly impressive, but those points are exactly what we should be addressing

  21. even in these cherry picked examples, it's pretty obvious that meena is not a human. the one exception is the one about cows and horses going to college; although it started a bit awkwardly, it was believable. i don't believe that it's as humanlike as those study results seem to imply. it repeats itself too much and a lot of its responses seem to disregard the context of the conversation and are just too oblivious

  22. I want this for character chatter in Video games. The bots could react to things the player has done and "understand" the state of the world without having to create crazy dialog trees.

  23. @2:05 my god, Meena sounds like the sassy, argumentative, contrary, unco-operative, smart-aleck kid in philosophy class whom every teacher dreads having…

  24. LOLOL 5:15 Meena says "i live in Arizona so there's plenty of surfing to be had" XD Arizona is a landlocked state! Makes me think of the old country song "I've got oceanfront property in Arizona"

  25. The way I read this, the 86 percent SSA is an average across ALL HUMANS, while the 79 percent for Meena is an average JUST FOR MEENA. If I have that right, it means that Meena can actually consistently make more sense than a nontrivial portion of humanity. For some reason, I'm hearing Rob Zombie singing in my head…

  26. In the paper they use manual coordinate-descent search

    It cannot be per parameter, so what are they running it on?

  27. I refuse to acknowledge this is not magic. Like there must be some cellar under google filled with underpaid english graduates that respond on behalf of Meena or something…

  28. Hey Károly, what papers measure the correlation between student grades and intelligence? I'm looking for something similar for my research 🙂

  29. By the way. Pro-tip: do not sign up for services as "Dr.Z" unless you want to be inundated by hospital and pharma spam forever. Learned my lesson the hard way. 🙂

  30. So if Meena is a chatbot trained by spidering the internet, what if she stumbles across the paper about herself and becomes self-aware?

  31. Can we use it and fine tune it according to different test cases ?? Will google allow it or will it charge fees ?

  32. Unrelated: If you have not covered NeRF, it looks right up Two Minute Paper's alley: https://www.youtube.com/watch?v=JuH79E8rdKc
    NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

  33. Hello Dr Karoly and congratulations for your title (not the video's)! I have a feeling that your voice in these videos is AI generated. But doubt is a pain! Can you prove it is not?

  34. Great article. On-Premise chatbots can be very valuable for businesses that need to keep their data protected. Learn more about the power of On-Premise Chatbots here.  https://www.engati.com/onpremise-chatbot-platform

  35. is there any way I can talk to meena? I've been googling on how to do that and I have no results, any ideas?

  36. I mean, people on Twitter and Facebook get confused and mistake actual people for bots all the time, but I feel like that's quite a problem in and of itself.
