It’s the holy grail of computing: teaching computers to think the way humans do. The pioneer of “deep learning” says the key is to mimic the brain’s system of neural networks. Now Geoffrey Hinton is taking his work inside Google to help improve Android’s voice search and work on advancing language interpretation so computers can begin to interpret our musings and ramblings online. Tech Tuesday explores how computers learn and where the next breakthroughs will be.
MR. KOJO NNAMDIFrom WAMU 88.5 at American University in Washington, welcome to "The Kojo Nnamdi Show," connecting your neighborhood with the world. It's "Tech Tuesday." For a long time, the concept of artificial intelligence seemed futuristic and a little bit scary. What would it mean to have machines behaving like humans? Turns out, it's fun. So far, anyway. We can ask our phones to give us sports scores and restaurant reviews and directions. And they're happy to oblige. Some of this seemingly new machine intelligence is coming from the marriage of an old concept.
MR. KOJO NNAMDIAnd fast new computer processors. The concept that's hot in computer science today is called "deep learning." The idea that computers can mimic the human brain's web of neural networks to make sense of increasingly complex data. Google is buying up companies that work on "deep learning" and bringing their talent in-house, as are its competitors. And research universities are delving into deep learning as well. But, how far will artificial intelligence take us? Will computers ever really learn to understand spoken or written language as well as their human creators?
MR. KOJO NNAMDIJoining us to discuss this is Philip Resnik. He is a Professor in the Department of Linguistics and Institute for Advanced Computer Studies at the University of Maryland, and the Founder of React Labs. Philip Resnik, good to see you again.
MR. PHILIP RESNIKThanks. Great to be back.
NNAMDIJoining us from a Canadian Broadcasting Corporation studio in Toronto is Geoffrey Hinton. He is a distinguished Professor of Computer Science at the University of Toronto, and distinguished Researcher at Google. Geoffrey Hinton, thank you for joining us.
MR. GEOFFREY HINTONThank you for inviting me.
NNAMDIAnd joining us by phone from Stanford is Richard Socher, Stanford University PhD student in computer science and developer of the Neural Analysis of Sentiment algorithm. Richard Socher, thank you for joining us.
MR. RICHARD SOCHERHi, and thanks for having me.
NNAMDIHey, if you'd like to be a part of the conversation, give us a call at 800-433-8850. How would you rate your phone's voice recognition capabilities? How well does it understand what you're asking? 800-433-8850. Send us a tweet at kojoshow or email to kojo@wamu.org. Geoffrey Hinton, I'll start with you, and with the news. Over the weekend, we learned that Google is paying 400 million dollars for an artificial intelligence company called DeepMind, founded by a neuroscientist.
NNAMDIWhat capabilities will Google get with this startup?
HINTONWell, DeepMind is -- has a lot of expertise in what's called reinforcement learning. And what that means is, when you learn to ride a bicycle, for example, you don't get detailed feedback on how you should be twitching all your muscles. What you get is feedback when you fall off, that says you did something wrong in the past, but you don't know what. And you have to figure out how you should twitch your muscles so that you don't fall off. That's reinforcement learning. And they're very good at doing that kind of learning. They've been applying it recently to learning old computer games.
HINTONSo, they can just look at the pixels on the screen, and look at the rewards you get, in terms of scores, and figure out what to do with the buttons so that they do well at the game.
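A minimal sketch, in Python, of the flavor of reinforcement learning Hinton is describing: the program only ever sees a delayed reward signal, never the correct action, and has to work out which actions to prefer. The toy chain environment, the reward of 1 at the far end, and all the constants here are invented for illustration; this is not DeepMind's actual system.

```python
# Tabular Q-learning on a toy "chain" world: the agent is only told when
# it reaches the rewarding end, never which button press was right.
import random

n_states, n_actions = 5, 2                       # hypothetical toy environment
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2

def step(state, action):
    """Move right (+1) or left (-1); reward arrives only at the far end."""
    nxt = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward

for episode in range(500):
    s = 0
    for _ in range(20):
        # explore occasionally, otherwise act greedily on current estimates
        a = random.randrange(n_actions) if random.random() < epsilon \
            else max(range(n_actions), key=lambda x: Q[s][x])
        s2, r = step(s, a)
        # nudge the value estimate toward reward plus discounted future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print(Q)  # actions that lead toward the reward end up with higher values
```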
NNAMDIYou've been working for decades on teaching computers to think like humans by mimicking the neural networks found in the brain. Explain the concept behind deep learning for computers.
HINTONSo, most standard pattern recognition, until very recently, worked by the -- a computer programmer, or a scientist, figuring out what the good features were. So, if you want to recognize birds in an image, for example, someone might figure out that a beak is a good thing to look for. So, a beak is like two fairly straight lines that join at a sharp angle. And if you find that in an image, that's some evidence for a bird. And the way pattern recognition would work is, people would hand design a whole bunch of features like that, and then the computer would figure out how much weight to put on each of those features.
HINTONIt might decide that beak's a pretty good feature, but other features you thought were good, actually, don't really discriminate very well. So, that's old fashioned pattern recognition. The new advance is to get the computer to also learn what features it should use. So, now, in deep learning, what happens is the computer just looks at the pixels, and it tries to figure out what features it should extract from those pixels. And then from those features extract more complicated features, and so on, until it has really complicated features that allow it to recognize birds.
HINTONAnd all that's done by the computer, so it only needs the pixels to come in, plus information about what the right category is.
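A toy sketch of the setup Hinton describes, assuming a small fully connected PyTorch network and fake pixel data: the only inputs are pixels plus category labels, and the error signal sent backwards adjusts every layer of learned features. The layer sizes and data are made up for illustration.

```python
# The network sees only pixels and a label; it learns its own features.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(32 * 32, 128), nn.ReLU(),   # first layer: simple learned features
    nn.Linear(128, 64), nn.ReLU(),        # next layer: combinations of features
    nn.Linear(64, 10),                    # final layer: scores for 10 categories
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

pixels = torch.rand(8, 32 * 32)           # a fake batch of 8 tiny images
labels = torch.randint(0, 10, (8,))       # their (fake) category labels

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(pixels), labels)
    loss.backward()                        # send the error signal backwards
    optimizer.step()                       # revise every feature detector a little
```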
NNAMDIWell, one place where deep learning has had a big impact is voice recognition. You can ask an Android phone or an iPhone a question. It will give you an answer. How does the neural network model help the computer recognize and understand human speech, Geoff?
HINTONThere's actually two ways neural networks are relevant there. Probably the most significant one is, when I talk, I produce a sound wave. You can do some initial analysis on that sound wave, using something called (word?) analysis. And then you get something called a spectrogram. And a spectrogram tells you just how much energy there is at each frequency, at each moment in time. Then you have to look at the spectrogram, so it's now like a vision problem, and you have to figure out what phoneme is the person trying to say?
HINTONSo, phonemes are things like bah, and duh, and ooo. And you have to figure out from the spectrogram which phonemes they're probably trying to say. Of course, it's all quite noisy, so that's tricky. And neural nets are used to take things like the spectrogram and make bets about which piece of which phoneme is being said. Once you've made a whole set of bets like that, often thousands of bets, then, a later stage looks at the whole sequence of bets you've made and decides which is the most plausible sequence of phonemes.
HINTONAnd to do that, you need a model of what sequences of phonemes are likely to occur in this language? And you can also use neural nets for that. Although, they're not -- I don't think they're widely used in practice for that stage yet, but they will be. And the neural nets are good at predicting what are good sequences of phonemes.
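A rough sketch of the pipeline Hinton outlines, using SciPy to compute a spectrogram and a random, untrained weight matrix standing in for the neural network that places bets on phonemes. The sine-wave "utterance" and the 40-phoneme inventory are illustrative assumptions, not a real recognizer.

```python
# Waveform -> spectrogram -> per-frame phoneme probabilities ("bets").
import numpy as np
from scipy.signal import spectrogram

fs = 16000
t = np.arange(fs) / fs
wave = np.sin(2 * np.pi * 440 * t)             # a fake one-second "utterance"

freqs, times, Sxx = spectrogram(wave, fs)      # energy per frequency per time slice
log_spec = np.log(Sxx + 1e-10)                 # log energies are the usual input

n_phonemes = 40
rng = np.random.default_rng(0)
W = rng.normal(size=(n_phonemes, log_spec.shape[0]))   # stand-in for a trained net

scores = W @ log_spec                          # one score per phoneme per frame
scores -= scores.max(axis=0)                   # stabilize before softmax
probs = np.exp(scores) / np.exp(scores).sum(axis=0)    # per-frame phoneme bets
# a later stage (not shown) searches for the most plausible phoneme sequence
```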
NNAMDIIn case you're just joining us, it's a "Tech Tuesday" conversation on deep learning, teaching computers to think like humans. We've been talking with Geoffrey Hinton, distinguished Professor of Computer Science at the University of Toronto and distinguished Researcher at Google. Joining us in our Washington studio is Philip Resnik. He is a Professor in the Department of Linguistics and the Institute for Advanced Computer Studies at the University of Maryland. He's also the Founder of React Labs.
NNAMDIAnd Richard Socher is a Stanford University PhD student in computer science. He's the developer of the Neural Analysis of Sentiment algorithm, which we'll be talking about shortly. If you'd like to talk with us, give us a call at 800-433-8850. Richard, as I said, you've developed an algorithm that lets a computer analyze the sentiment expressed in a sentence. To decide whether it's positive or negative or, well, somewhere in between. Explain what your algorithm does.
SOCHERSo, the main insight was that -- so my main motivation was to use the kinds of techniques that Geoff had developed and a lot of other people, but I realized that if we wanted to use them for language, they wouldn't quite work, because they look at sort of fixed input -- a fixed, you know, image size or fixed sound wave chunk. And language and sentences can have variable lengths, and we're trying to capture sort of the grammatical structure of language also, and how words combine to longer, to more complex meanings.
SOCHERAnd so, the algorithm that we developed uses ideas based on recursive neural networks, where we combine single words into longer and longer phrases and try to abstract and get to the meaning of the whole sentence. The algorithm basically works by looking at the grammatical structure of the sentence. You basically learn representations and features, like Geoff described for images, but you learn these features for words. And then you also learn how these features of single words combine into longer phrases, and then you can basically classify, similar again to the image world, now that you have sort of a feature vector representation.
SOCHERFor each word and for the phrases, you can combine them and then classify them. For instance, good is a positive word, and very good is very positive, but once I combine not with the phrase very good, the algorithm has learned, from examples, that not very good becomes probably more neutral or negative.
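A bare-bones sketch of the recursive composition Socher describes, with made-up word vectors and untrained weights. His actual model learns the word vectors, the composition function, and the classifier from labeled parse trees; this only shows the shape of the computation for "not very good".

```python
# Combine word vectors bottom-up through a phrase, then classify the result.
import numpy as np

rng = np.random.default_rng(0)
dim = 4
word_vec = {w: rng.normal(size=dim) for w in ["not", "very", "good"]}

W = rng.normal(size=(dim, 2 * dim))            # composition weights (untrained)
C = rng.normal(size=(3, dim))                  # classifier: negative/neutral/positive

def compose(left, right):
    """Combine two child vectors into one parent phrase vector."""
    return np.tanh(W @ np.concatenate([left, right]))

very_good = compose(word_vec["very"], word_vec["good"])
not_very_good = compose(word_vec["not"], very_good)

scores = C @ not_very_good
probs = np.exp(scores) / np.exp(scores).sum()
print(dict(zip(["negative", "neutral", "positive"], probs.round(3))))
```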
NNAMDIMore about that later, but Philip Resnik, the next frontier for deep learning is teaching computers to understand not only the sentiment in a sentence, but to understand natural language. Why is it so hard to teach a computer to understand what we mean when we talk or write?
RESNIKWell, the short answer to that is that language is very complicated. The longer answer to that, without going too long, is to say that language involves a great deal that just isn't apparent in the input itself. Right? So, back in the 50s and 60s, Chomsky had sort of a first revolution in the study of language, whatever you think of the assumptions he brought into it, he really firmly established the idea that you gotta do more than just look at the stimulus and the response. There's something in the middle.
RESNIKThere was a second revolution around, you know, 1990 or so, where people who do computational work with language recognized that you can't build in knowledge about language by programming it into a computer by hand. That's what Geoff referred to when he was talking about this sort of, the hand-designed features. But there was still a very, very big gap, because in the observables in language, what you get when you are, say, looking at text, there is internal structure that's just not apparent there.
RESNIKTo the computer, the word orange and the word banana are about as related to each other as the word orange and the word staircase. And yet, in order to properly recognize language, and I wanna be very cautious about words like understanding and teaching computers to think, because that sets expectations that you wanna be very, very careful about. But that said, in order to make the next big advance, you needed a way of getting computers to have a more internal, abstract representation of what the units of language are, and how you put them together.
RESNIKAnd neural networks have been around for a very long time, but the revolution that's -- a third revolution that's in the process of taking place here, and by the way, it's very, very cool to be here with Geoff, who's like a true pioneer in this. And Richard, who's a great representative of like, the emerging generation of it, at the same time. What's happened here is they have figured out computationally effective ways of getting from the raw data to structured representations that actually have something closer to what we think of as meaning, rather than just viewing these things as opaque block like objects with no sense of how they should be combined.
NNAMDIBecause when I look at a sentence and understand it, I am bringing to that sentence the knowledge of everything I know in the world.
RESNIKAbsolutely. In fact, language understanding, what you and I are doing, is really less about language than it is about our knowledge of the world. When you get a funny sentence like, Iraqi head seeks arms, right, you notice that this is funny, right, because we know something about how the world works, and that this is an unexpected sense of these words. It's that kind of world knowledge that we've had an enormous difficulty, over the decades, of getting into the system, but by being able to observe huge quantities of data, and use it effectively, the deep learning approach is making some real strides.
NNAMDIIraqi head seeks arms sounds like a headline I'd find in The Onion. How do the rise of social media and the quirks of online communication make the task even harder?
RESNIKWell, language, even prior to social media, was an enormously variable phenomenon. If you look at the number of different ways that things can be put together, it's astonishingly variable. You take a simple sentence like, I see a bird, right, and you say, OK, well, if you did stuff in grammar school, you've got I as a pronoun, and see is a verb, and so forth. But, if you look at it and say, well, I could be a letter of the alphabet, or a pronoun. That's two possibilities. See could be the Holy See. So, that could be a noun or a verb.
RESNIKYou do that -- just two possibilities per word in a four word sentence, now you've got 16 possibilities. Right? And so, social media takes this yet another step by introducing huge quantities of additional variability that don't look like a lot of the stuff we've seen in the past with edited kinds of text. In many respects, it's a lot more like conversation. And then you have this pressure to keep things short, in things like texting and Twitter. And now you're relying even more on all of that world knowledge by encoding stuff in a way that you have to rely on the world knowledge, rather than what's visible on the page.
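The arithmetic Resnik is doing, written out: two readings per word across a four-word sentence multiplies out to sixteen combinations. The readings listed for "a" and "bird" are invented stand-ins to complete the example, since only "I" and "see" were spelled out on air.

```python
# Two possibilities per word in a four-word sentence: 2**4 = 16 combinations.
from itertools import product

readings = {
    "I":    ["pronoun", "letter of the alphabet"],
    "see":  ["verb", "noun (the Holy See)"],
    "a":    ["determiner", "letter of the alphabet"],      # illustrative stand-in
    "bird": ["noun", "verb (slang for birdwatching)"],     # illustrative stand-in
}
combos = list(product(*readings.values()))
print(len(combos))   # 16
```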
NNAMDIThe letter I, letter C, letter A, B-I-R-D. I see a bird. We're talking with Rachel in Washington, D.C. Rachel, you're on the air. Go ahead, please.
RACHELOh, hi Kojo. Thank you for having me. Yeah, like I told the producer, I guess, you know, I'm a new graduate with a Master's in Electrical Engineering, and my area of concentration is signals and image processing. And I do a lot of work independently.
NNAMDIYes. Image processing. You work independently. You're also conducting several conversations at the same time.
RACHELYeah, I work independently. I do a lot of research on my own, using neural networks. And my question is, is any of your guests interested in hiring somebody like me, because I'm job hunting.
NNAMDIYou know, I never realized that we were a job referral service here before.
RACHELNo, I know you're not.
NNAMDIBut you figured you'd try, anyway, huh?
RACHELYeah.
NNAMDIWell, thank you very much for your call. I don't think we can respond to it at this point, Rachel, but good luck to you. Geoff Hinton, how do you see the challenges of teaching computers to interpret natural language and where's the best place to start?
HINTONI think it's a very hard problem. It's very hard to predict how long it's gonna be before we can do it really well. Obviously, companies like Google are very interested in that, so they can produce better search results. I think Richard Socher started in a very good place. He took something that really does involve some understanding, but doesn't involve full understanding. So, in sentiment analysis, figuring out whether a movie review is positive or negative, all you have to do is come away with whether it's positive or neutral or negative.
HINTONYou don't have to really understand everything it said about the movie. And, with most of these very difficult scientific endeavors, the right approach is to take something that's simpler than the final task, but nevertheless contains the essence of it. And in sentiment analysis, you've got the essence of recursion. That is, you have to be able to take the sentiment of a piece of a sentence, and figure out how that combines with words like although or but or notwithstanding to get the sentiments of a larger chunk. And I think that's a very good way to start applying neural networks to language understanding.
NNAMDIGotta take a short break. When we come back, we'll continue our "Tech Tuesday" conversation on deep learning, teaching computers to think like humans. A conversation we're encouraging you to join. Do you think computers will be able to interpret human emotions in the messages and reviews we post online? Give us a call. 800-433-8850. Do you think artificial intelligence, computers learning to think, will change the way we live in the future? You can also shoot us an email to kojo@wamu.org or a tweet at kojoshow, using the hash tag tech Tuesday. I'm Kojo Nnamdi.
NNAMDIWelcome back. It's "Tech Tuesday." We're talking with Richard Socher. He is a Stanford University PhD student in Computer Science and the developer of the Neural Analysis of Sentiment algorithm. He joins us, by phone, from Stanford. Joining us from CBC Studios in Toronto is Geoffrey Hinton, distinguished Professor of Computer Science at the University of Toronto and distinguished researcher at Google. And here, in our Washington studio, is Philip Resnik, Professor in the Department of Linguistics and Institute For Advanced Computer Studies at the University of Maryland. And Founder of React Labs.
NNAMDIGeoff, another area that's using deep learning is object recognition. How do computers mimic the brain's neural networks to recognize patterns in an object and decide their significance?
HINTONSo, the way it works best, at present, is to get a big database of images, which have been hand labeled. So, for each image, you know what the prominent object in the image is, or you know at least one of the prominent objects in the image. And you now train a neural network to try and reproduce those hand labels. And the way the neural network works is it takes pixels as input. It then has many different layers of feature detectors, all of which learn, and as you go up through the network, you get more and more complex features.
HINTONAnd it tries to guess the right answer. And when it gets it wrong, you send a message backwards through the network telling it to revise all of its feature detectors a little bit. So, for example, if there was a bird in the image, and it had a low level feature detector for beak, but the low level feature detector didn't fire, because it was a slightly curved beak, and it wasn't used to that, then it probably wouldn't recognize the bird. And when you send the message backwards, saying all those feature detectors that are good for birds, please get a bit more active, then the beak detector will figure out that, OK, maybe I should get active even if it's a slightly curved beak.
HINTONAnd then the system will work better.
NNAMDIHow are computers able to recognize particular objects in photos? If I ask the computer to find all the jewelry in a photo posted on the web, what steps does it take to figure that out, Geoff?
HINTONOK. I'll try and give a quick overview of what happens. At the front end of the system, it's gonna have a large number of feature detectors, but they don't all need to be different. All you need to do is detect the same feature in different parts of the image. So, for example, the beak of a bird might be anywhere in the image, and you need something that will detect those straight lines so that later you can detect the beak. And so the neural networks that do this are called convolutional neural networks, which were developed by Yann LeCun. What they do is they have feature detectors that are copies of each other all over the image. And they learn those feature detectors, but there aren't very many different types of them.
HINTONThey'll typically detect things like little pieces of straight edge. Then at the next level up, they have copies of feature detectors too, but these are a bit more complicated. They might detect little conjunctions of straight edges, like the beak of a bird, and so on. And you go up through the network, making copies of the feature detectors in all the different positions, until you figure you've done enough of that, and then you have a few layers of neurons that just try and put things together and look over the whole image. And try and find familiar combinations of features.
HINTONAnd from those, they try and recognize the object, so one might have found a beak, another might have found a wing, another might have found something it's sure is a feather. And if you see all those three, it's a pretty good bet it's a bird.
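A compact sketch of the convolutional architecture Hinton attributes to Yann LeCun: the same small feature detectors are replicated across every position of the image, pooled, and combined by higher layers that put the parts together. The layer sizes and the fake image batch are arbitrary toy choices.

```python
# Shared (convolutional) feature detectors, then higher layers that combine them.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # 8 shared edge-like detectors
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # detectors for conjunctions of edges
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 8 * 8, 10),                   # put the parts together, guess the object
)

fake_images = torch.rand(4, 1, 32, 32)           # a fake batch of 32x32 grayscale images
print(model(fake_images).shape)                  # torch.Size([4, 10]) category scores
```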
NNAMDIOne of the goals of deep learning is for computers to be able to recognize and categorize data without a person first having to label each image or word or object. Philip Resnik, I'll start with you, but I'd like to go around on this one. Why is there so much interest in removing the human from the equation?
RESNIKWell, we've known for a long time, especially in language processing, that the algorithms that work best have a specific task, and they're the best when you're able to train them on a big collection of the right answers. These would be the, for example, the labeled photographs that Geoff was talking about. Or sentences that have a label saying, yes, this is positive or negative. The challenge here is that getting labels is very, very difficult, potentially very, very expensive. If you want a computer to do a task, you have to actually give it lots and lots of data to train on, and that can be a real bottleneck.
RESNIKThere are ways of making that cheaper. You can crowdsource that, and, you know, for pennies a task on Amazon's Mechanical Turk marketplace, you can get people to do it. Or you can use found data. So, for example, in machine translation, every time the United Nations releases something in two different languages, you now have correct answers to the question of, how do I translate this sentence from language A to language B? But, the real advance that needs to take place is one where you can learn more general purpose representations that are effective across a variety of tasks.
RESNIKAnd there are ways that people are exploring that, but until we can move much further in that direction, and I think it's gonna take quite a long time to really get there, what we think of when we say understanding, has to be in scare quotes. Because Watson, on Jeopardy, Siri, Google translate, these are not understanding anything in anything resembling the sense that you and I usually mean when we say understanding. They're using algorithms and representations and learning to perform a task. We need to be able to get away from you need answers to the task you're trying to get it to do, to a more general representation.
RESNIKSo that they can learn to put things together in new ways, and generalize to tasks they haven't seen before.
NNAMDIRichard Socher, same question to you. So much interest in removing humans from the equation.
SOCHERYeah, so, I want to pick up, first, on two things that Philip said, which is that not all information is in language, and that we want to learn these general representations, and then connect that to your question. Basically, that led us to some interesting work on grounding meaning, trying to connect the meaning of words to, for instance, their visual counterparts, or the factual knowledge that we have. And that kind of model can learn to connect that -- I may have a word like plateau, or bank, and the bank could be a sand bank, or it could be the bank where I put my money in.
SOCHERAnd, you know, understanding the meaning of words, but then also what that actually means that I can, you know, a bank is connected because it's owned by a certain company if you go to a specific instance for banks. So, connecting these representations to factual knowledge that we have in databases, but also to visual knowledge that we could have in the visual domain. Models that Geoff and other people developed, for instance, at Google. And if we have these kind of representations, then we can, you know, not necessarily get humans out of the loop. I think everybody wants to keep humans in the loop on sort of important and creative tasks.
SOCHERBut, you know, make our lives easier. If a company can go through all of the Twitter feeds of everybody on the internet, or that's using Twitter, and then can automatically analyze what people are saying about that company or about their product that they just released, and so on, that would be immensely useful. And then you don't have to have humans sit there and read every single tweet and to make a judgment whether the marketing campaign worked well or whether you should maybe trade certain stock very quickly, and so on.
NNAMDIGeoffrey Hinton, anything you'd like to -- oh, go ahead, please, Richard.
SOCHERTrying to make our lives easier, in many ways.
NNAMDIGeoffrey Hinton, anything you'd like to add to that?
HINTONUh, yeah. A couple of things. I think one example of why you can't have people label everything is if you consider YouTube. So, there's many hours of YouTube video uploaded every minute. You couldn't possibly keep up with that, if you wanted to label all the frames in a YouTube video. But it would be nice for a computer to understand what was happening in YouTube videos. So, you could, for example, say find me a video where someone walks into a lamp post, because their attention was distracted.
HINTONNow, Google would love to be able to do that, so it can search YouTube videos better. And you have to ask, how's it going to learn to do that if nobody's telling it what's happening in the videos? So one idea that's been around for a long time, and is actually being pushed very hard by someone called Jeff Hawkins, is to try and predict the next frame in the video. So, that's an example of a task that doesn't require a human labeler. What you want is, from the video so far, to see if you can predict exactly what's gonna happen in the next frame.
HINTONIn order to do that well, you need to understand what's going on in the video. And so that's a signal you could use for learning.
NNAMDIHere is Phil in Fairfax, Virginia. Phil, you're on the air. Go ahead, please.
PHILHi, Kojo. Thanks for taking my call. Great show today. If we were to assume that some of the breakthroughs that have been discussed so far come to pass, and now we've got a computer that does really, really well with the tasks that we're giving it in English, how hard is it gonna be to go to the next language? And is something like German gonna be easier than Arabic or Vietnamese? I'm assuming it's more than just pouring in additional vocabulary, but I'd like to hear your guests' thoughts on that. I'll take it off the air.
NNAMDIPhilip Resnik.
RESNIKThanks, Phil. That's a great question. Dealing with multiple languages is, indeed, a challenge, and there are inherent reasons for that, having just to do with the differences between language and the complexities of them. But there's also a practical reason for that, and that is that for many of these approaches, the ones that work best, they -- a big piece of what makes them work is by throwing a lot of data at them. So, if you're dealing with language, you're going to do best when you have lots and lots of data to throw at it.
RESNIKWhat this means is that the majority languages of the world, or the languages that have strategic or economic importance, like, say, Chinese and Arabic, are languages where there's a lot of resources available to try to build systems. When you get to less frequently spoken languages, when you get all the way to minority languages, within natural language processing as it's been practiced up until now -- and I think, and Geoff and Richard can confirm or comment, within deep learning approaches too -- the challenge is going to be exacerbated.
RESNIKBecause you simply have less data to work with.
NNAMDIGeoff?
HINTONWell, one thing I can say about that is, not so much in terms of understanding the meaning of the language, but in actually hearing what's being said, you have the same problem for recognition, where there are some languages where there isn't much data. And researchers at Google and other places have discovered that it works very well to build a system that will recognize English, for example, and then to use the same low level features for recognizing much rarer languages.
HINTONSo, what you do is you simply take the system that was built for recognizing English words, you strip off -- or English phonemes, you strip off the top layers, replace them by top layers for some new language, but you can reuse all the low level features. And -- or you can train a net to do all languages at once. In which case, the languages that don't have much data won't have much say about what the low level features should be, but they'll get to bootstrap on the back of the languages with a lot of data. And that works much better for languages where there's not much data.
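A hedged sketch of the reuse Hinton describes, in PyTorch: strip the top layer off a network trained on a data-rich language, freeze the shared lower layers, and attach a new top layer for the low-resource language. The layer sizes and the "pretrained" English model here are placeholders, not a real trained system.

```python
# Reuse low-level features across languages; retrain only the top layer.
import torch.nn as nn

english_model = nn.Sequential(
    nn.Linear(120, 256), nn.ReLU(),      # low-level acoustic features
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 40),                  # English phoneme layer
)

# strip off the top layer and reuse everything below it
shared_layers = nn.Sequential(*list(english_model.children())[:-1])
for p in shared_layers.parameters():
    p.requires_grad = False              # keep the reused features fixed

new_language_model = nn.Sequential(
    shared_layers,
    nn.Linear(256, 55),                  # new top layer for the rare language's phonemes
)
# only the new top layer would be trained on the rare language's small dataset
```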
NNAMDIRichard?
SOCHERYes, I think there are certain knowledge-transfer ideas that we're also exploring at Stanford, where we try to sort of initialize representations that we learned for single words in English. And then use just a little bit of machine translation or, you know, these bitexts that Philip mentioned, where, you know, they can begin -- Parliament, for instance, has, you know, English and French transcriptions, and things like that. And then try to initialize the vectors or the representations of words in the other language.
SOCHERBut I agree with Philip that, in general, it's very hard for most machine learning techniques to try to learn from data, to learn from very little data. And that's certainly one of the open research problems.
NNAMDIWe got an email from Robin, who said, my father, one of the first systems analysts, used to tell the true story of a team at one of the Southern California Universities in the 60s, that attempted to develop a program to translate English into Russian and vice versa. After months of debugging, they finally got the program up and running. They were very excited to test it, but realized they had no immediate access to a Russian speaker. Finally, they decided to input a phrase in English and translate it to Russian and back to English.
NNAMDIThey thought that if it came back the same way it went in, the program would be a success. They inputted the phrase, out of sight, out of mind. Translated it to Russian and back into English. It came back, invisible idiot. That's why computers, at least back then, couldn't be translating machines. Philip, talk about the difference between supervised and unsupervised learning for computers. How does each pose a challenge for teaching computers to think like humans?
RESNIKI was -- I'll answer that, but I also have to comment on the email.
NNAMDIPlease do.
RESNIKIt's the exact same story, but there's another, more classic punch line to that, which is that the scientists put in the -- the spirit is willing, but the flesh is weak, and out came back the vodka is good, but the meat is rotten.
NNAMDIMakes sense to me.
RESNIKSo, supervised and unsupervised learning. So, there's really sort of four points on a spectrum. At one end of the spectrum, you manually craft, build in your rules about whatever process it is you're trying to get the computer to do. This is the sort of 1980s artificial intelligence approach where you build things entirely by hand.
RESNIKThen, there is what is called supervised learning, which we've been referring to. The idea of supervision. It's like having a teacher. It's somebody who tells you what the right answer is, so that you can learn from it. And earlier in the conversation, there was discussion about taking signal about the mistakes you made, whether it's falling off a bike or mispredicting the next frame in a video, and being able then to update the neural network so that it can learn to do better the next time.
RESNIKAt the far end of the spectrum is unsupervised learning, which is essentially the idea that the system is learning to represent what's out there in the world without having a specific correct answer given to it.
RESNIKSo people in the listening audience may be familiar with data clustering, where you actually have lots of points in a graph and systems automatically identify this region of things looks like it's dense with points. I think there must be something significant about it. That's a kind of unsupervised process.
RESNIKAnd, finally, in the middle of those last two is the notion of semi-supervised learning, and that gets back a bit to what Richard was talking about because semi-supervised learning is about having small amounts of labeled data and then using large, ideally vast, amounts of unlabeled data and getting leverage from both of those. And so one of the big innovations that I'm seeing in the deep learning community is this idea of unsupervised training of features, in other words, kind of self-organizing ways of representing what's there in the world, whether it's the word orange or the concept of a bird.
RESNIKBut then taking those representations and applying them in supervised problems, where, as Geoff points out, it's very useful to have a specific task so that you can actually update the neural network, which you can think of intuitively as the idea of updating, you know, the weights of connections between neurons and the brain so that it can perform better at that specific task.
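A small synthetic contrast between the two ends of the spectrum Resnik lays out, using scikit-learn: clustering groups the data with no answers provided, while the supervised model fits itself to given labels. The two-blob dataset is invented purely for illustration.

```python
# Unsupervised clustering (no labels used) vs. supervised classification.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)        # labels exist, but only the second model sees them

clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)   # unsupervised: y never used
classifier = LogisticRegression().fit(X, y)                  # supervised: learns from y

print(clusters[:5], classifier.predict(X[:5]))
```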
NNAMDIYou mentioned earlier Canadian government parliamentary proceedings in both French and English. Talk about the concept of found data. How does it help computers learn?
RESNIKWell, so, as we were talking about a little bit earlier, it can be difficult to get data to learn from, labeled training data in the jargon. And Geoff mentioned that there are workarounds for this, for example, formulating problems. So, for example, in speech, what the system is learning to do is to try to predict the next word that I'm going to say. And then it can just look at lots and lots of speech or lots and lots of text. No human has to go and label things.
RESNIKFor many real world tasks, what we're trying to do is get the computer to do something that humans do anyway. For example, in machine translation, there are people worldwide right now, and they think that what they're doing is producing translations for some customer. And what I think they're doing is building me training data for my machine translation system.
RESNIKSimilarly in medical coding, you know, when you go to the doctor and they have the, you know, procedure or diagnosis code that goes to the insurance company to bill it, there's a huge industry in this country of people who are manually reading through a doctor's dictations and deciding what the billing code should be.
RESNIKWell, it turns out that, especially with new developments in the medical world, we're going to need a lot of help from computers to do this task over the next year or so. And what those medical coders are doing, they think that what they're doing is helping the hospital revenue cycle. And what I think they're doing is providing training data. That's all found data.
NNAMDIGot to take a short break. When we come back, we'll continue this conversation. If you have called, 800-433-8850, we'll get to your call. You can also send email to kojo@wamu.org. What would you like to see a computer learn to do? What do you think computers can learn from the way the human brain works? 800-433-8850. I'm Kojo Nnamdi.
NNAMDIIt's Tech Tuesday. We're discussing deep learning, teaching computers to think like humans with Philip Resnik, professor in the Department of Linguistics and Institute for Advanced Computer Studies at the University of Maryland, and founder of React Labs. Geoffrey Hinton is distinguished professor of computer science at the University of Toronto and distinguished researcher at Google.
NNAMDIAnd Richard Socher is at Stanford University. He's a PhD student in computer science and developer of the Neural Analysis of Sentiment algorithm. Richard, you're essentially crowdsourcing your effort to teach your algorithm the ways of the world, setting up an online demo and letting users try it out and make corrections. How is that demo doing?
SOCHERExcellent, I think. What's fun about it -- basically, this is for -- the people not familiar, if you can go to nlp.stanford.edu/sentiment, you can basically play around with our algorithm, this Neural Analysis of Sentiment algorithm. And what's fun is that people, you know, can actually see how it analyzes a sentence. And a lot of people have fun trying to break it. And it's just sort of what Philip brought up before the break. That is essentially giving us new training data when people are, you know, trying to break it.
SOCHERAnd they will come up with very interesting cases. One person said, this movie is like poking my eyes out with a rusty nail. And our algorithm had never seen the word rusty -- in its, you know, training time before. And so it didn't really know what it means to have eyes and what it means to be poking them out and that that's probably negative.
SOCHERSo it just said this is neutral, basically giving up and not knowing what's going on. But once we saw that sentence, we can now go in quickly, either crowdsource that to label, or even use the labeling interface so people can provide insights, and then train it and give that as additional training data. And that has allowed us to sort of come up with lots of improvements and basically allow the algorithm to learn more and come up with interesting new ways.
SOCHERFor instance, it had also never seen the negation no one. So if I say no one liked this movie, you know, that's clearly negative. But if you've never seen no one with something in some negative context of any sort, then it's going to be hard to predict that that's negative. And so lots of interesting new training examples. It also has shown us, in terms of new research directions, what we cannot yet do.
SOCHERSo one person, for instance, said, oh, why isn't this sentence negative? The iPhone screen is better than the Android. And, you know, you think about it, and you realize, well, clearly, that is actually ambiguous, right? It's positive for one of the phones and negative for the other. And so trying to, you know, incorporate those kinds of ideas is what we're now working on also.
NNAMDIGeoff, before I go back to the phones, deep learning has improved in recent years as computer processors have gotten faster and more powerful. Can you explain why you need all that computing power?
HINTONSo in the human brain, you have trillions of connections. And most of them are changing their strengths, and that's how you learn things. So your knowledge is in the strengths of trillions of connections. And if we want to mimic something that powerful in a computer, we need to have trillions of weights that we're changing. And that just requires a lot of compute power. And it turns out that the boards that were developed for computer graphics are actually very good at doing the kinds of computations we want.
HINTONSo curiously, governments put lots of work into developing super computers. But the thing that really paid off for us was not that but the work that companies put in to trying to satisfy the needs of people playing computer games to render the graphics very fast. And the same computations, or very similar computations, get used for learning. And so we can use those boards, which are much cheaper than super computers, and very effective for what we're doing.
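A minimal illustration of the point Hinton makes about graphics boards, assuming PyTorch with CUDA available: the core operation of training, multiplying large weight matrices, is the same whether it runs on the CPU or on the GPU, and the GPU hardware built for game graphics happens to do exactly this quickly.

```python
# The same large matrix multiplication, on the CPU and (if present) the GPU.
import torch

a = torch.rand(4096, 4096)
b = torch.rand(4096, 4096)

c_cpu = a @ b                                    # runs on the CPU

if torch.cuda.is_available():                    # use the graphics board if present
    c_gpu = (a.cuda() @ b.cuda()).cpu()          # same multiplication, done on the GPU
```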
NNAMDIHere now is Lee in Waldorf, Md. Lee, you're on the air. Go ahead, please.
LEEHi, Kojo. Thanks. It's a fabulous program. I have a down to earth real world question. I've been active for many years on a political forum that Amazon has. And it's a spirited exchange of different views. But recently we've had some odd people creep in who appear to be able to declare what may be a paid sentiment, because it's always very consistently conservative, and right in the middle of one of the conversations, the guy blurted out, I just picked up a fault in the AE35 unit. It's going to 100 percent failure in 72 hours. I'm...
NNAMDIWhat was that?
LEEI have no idea. He's been asked repeatedly to explain it. He's apparently not programmed with a response to that kind of question.
NNAMDII guess not.
LEEAnd so I looked up sentiment analysis in Wikipedia and read it -- breezed through it as I was looking. And I -- you know, the influence industry is probably the biggest in the country and probably makes the most money, and it's probably the most deeply invested in getting past the skepticism about advertising and creating sort of a cyber word of mouth, which was always the gold standard. And I'm wondering to what extent sentiment analysis is penetrating online communication.
NNAMDIRichard Socher.
SOCHERTo what extent is sentiment analysis penetrating human -- or I guess online communication? I think the biggest application right now is probably in trying to aggregate information about what people like online. So on Twitter, companies want to know, are people saying positive things about their new product or about a launch? Is there some kind of indication that a certain new ad campaign is improving, you know, the general image that the public has about a certain product or company?
SOCHERAnd then restaurant review pages, in general, review pages on Amazon or Yelp try to sort of aggregate and give you a summary of what people really liked about a certain restaurant or about a certain national park and so on, so you know, can you immediately go there and get a summary of, like, don't go there, eat this, and things like that. So I think those two are probably the biggest impacts.
NNAMDIGeoff, you're a pioneer in the field of artificial intelligence and the effort to teach computers to mimic the human brain. You started this work in the early 1980s and pressed on for three decades while the rest of mainstream academia apparently was not that interested in your work. What made you persevere?
HINTONI just thought it was obviously the way we had to go if we wanted to compete with the human brain. If you're dealing with a system that has trillions of numbers inside it and the knowledge is encoded in those trillions of numbers, you're not going to be able to get similar performance out of something that only has a million numbers in it. And so it just seemed to me evident that you could never program stuff in -- you could never program in enough to have common sense or to be able to do object recognition where there might be 100,000 different things you could recognize.
HINTONSo it had to be a learning algorithm, and it had to be operating in a system that had a huge number of weights. So for me it just always seemed like the only possible approach. I guess I was lucky. It's beginning to work.
NNAMDIThink there's more than luck involved here. Here's Donna in Washington, Va. Donna, you're on the air. Go ahead, please.
DONNAThank you. My question is, how does a computer use voice recognition to arrive at the correct context and interpret the real intent of our natural language without being misled or even polluted by all the sort of junk or filler words we use in our daily language, such as you know, I mean, or the ubiquitous word, like? It seems that all of those sounds would somehow mislead the final result. So that's my question. I'll hang up and listen to you for...
NNAMDIThank you for your call. Philip Resnik?
RESNIKWell, so the first thing to say is that computers are only so good at recognizing intent. If you look at a system like Siri, for example, it's got a set of things that it's very, very good at. And then it's got a -- the rest of the world, and perhaps the greatest cleverness in that system is recognizing that people are going to stray into those parts of the world that the system isn't good at and pre-programming a whole bunch of really clever responses.
RESNIKSo when people say, hey, Siri, would you like to go on a date with me, it actually has something to say rather than there's been a failure in the AE35 unit or whatever it was. The question here really is one about naturally occurring language as opposed to carefully edited language. And that does -- that's most definitely a challenge. The fact of the matter is that most systems that are attempting to analyze sentiment -- Richard's being actually a notable exception -- or otherwise create a label of some kind on language, even, is this document relevant or not, in a search engine.
RESNIKMost of those things are actually not using the full structure of the language to do it. Those things are actually taking the language that they're given, treating it as something equivalent to or not too far off of a bag of words -- it's as if you put all the words into a bag and jumbled them up, and you don't know what order they're in -- and then using that representation of things to draw a conclusion.
RESNIKIn a system like that, the fact that you've got junk words is not going to be that important because junk words are going to occur in an unsystematic way across lots of different possibilities, whereas there's going to be signal in the meaningful words that lead you to the right conclusion, and systems that learn can pick up on that.
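A tiny bag-of-words sentiment model of the kind Resnik describes, using scikit-learn on five invented training sentences: word order is thrown away, and filler words carry little weight because they show up on both sides of the data.

```python
# Bag-of-words sentiment: junk words wash out, meaningful words carry the signal.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "you know this movie was like really great",
    "I mean the acting was wonderful",
    "this was a terrible boring film you know",
    "like the plot was awful I mean really bad",
    "a wonderful and moving story",
]
labels = [1, 1, 0, 0, 1]                  # 1 = positive, 0 = negative (invented)

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["you know it was like a great film"]))   # likely [1]
```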
NNAMDIGeoff...
SOCHERSo there's actually...
NNAMDIGo ahead, please, Richard.
SOCHERCan I -- there's actually a really funny side story on sentiment and understanding context. So some people are also using sentiment for stock trading. And it turns out that a couple of years ago, Anne Hathaway brought out a good movie. It got good reviews. And as soon as the good reviews came out on the Internet, the stocks for Berkshire Hathaway, the company, went up a lot, like, more than chance.
SOCHERAnd so people were like, what's going on? And I think it was clear that people were running sentiment analysis algorithms there, but not understanding the context right and not this (word?) what entities are being talked about correctly? And so, you know, that might also be a cautionary tale of, you know, trying to...
NNAMDIWhich brings me to this, Geoffrey Hinton. We don't have a lot of time left. But now all the big Internet companies are interested in deep learning. Google bought a company that you founded with a couple of graduate students. Now you're spending part of your time working for Google. How are companies like Google, Facebook, Microsoft, and IBM using deep learning and neural networks today? You have about a minute.
HINTONOK. So the main uses I know about are for speech recognition -- all the companies are doing that now -- for object recognition. So in Google Plus, you can upload your own photos to the Web. You don't have to put any tags, and you can search for concepts like jewelry. Or you could even search for Yoda, and it'll tell you if you've got any pictures with Yoda in. They're also using them in -- to help with translation. One very interesting use for language is we've stopped treating words as discrete symbols.
HINTONAnd, now, for each word, we produce a bunch of real numbers, say, 100 numbers, that capture the meaning of that word. So the hundred numbers for Paris would be very similar to the hundred numbers for Rome, for example.
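A sketch of the word-vector idea Hinton closes with, using made-up four-dimensional vectors rather than the learned hundred-dimensional ones: related words such as Paris and Rome get similar numbers, and that similarity can be measured directly on the vectors.

```python
# Each word is a list of numbers; related words have similar numbers.
import numpy as np

vectors = {
    "paris":    np.array([0.81, 0.10, 0.95, 0.20]),   # invented values
    "rome":     np.array([0.78, 0.15, 0.90, 0.25]),
    "broccoli": np.array([0.05, 0.92, 0.10, 0.70]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["paris"], vectors["rome"]))       # high: similar meanings
print(cosine(vectors["paris"], vectors["broccoli"]))   # lower: unrelated words
```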
NNAMDII'm afraid that's...
HINTONAnd that's a big development in natural language.
NNAMDII'm afraid that's all the time we have. Geoffrey Hinton is distinguished professor of computer science at the University of Toronto. Philip Resnik is professor in the Department of Linguistics and Institute for Advanced Computer Studies at the University of Maryland. And Richard Socher is a Stanford University PhD student in computer science and developer of the Neural Analysis of Sentiment algorithm. Thank you all for joining us. And thank you all for listening. I'm Kojo Nnamdi.