New Frontiers in Political Polling: Social Media and "Sentiment Analysis"

MR. KOJO NNAMDI

12:06:42
From WAMU 88.5 at American University in Washington, welcome to "The Kojo Nnamdi Show," connecting your neighborhood with the world. It's Tech Tuesday. We used to argue about politics around the dinner table and the water cooler. Today, we post links on Twitter and rants on our Facebook page. American politics is migrating onto social media platforms. And a new kind of pollster is following close behind, trying to make sense of an expanding ocean of new real-time political data, musings, links and snarky comments.

MR. KOJO NNAMDI

12:07:25
Taken alone, it's hard to grab any deep insight from a single tweet of 140 characters, but a couple of million tweets might just be better and quicker than a traditional poll or focus group, if you can figure out how to interpret them and pull the signal from the noise. That's where people like Philip Resnik come in. He's working to design algorithms and software programs to interpret language and emotions in social media, a field at the intersection of linguistics and computer science known as sentiment analysis.

MR. KOJO NNAMDI

12:08:01
It may sound futuristic and maybe a little bit ominous, but it's already happening. In fact, Facebook may already be interpreting your public and private political musings and sharing it with a major news outlet without you knowing. Philip Resnik joins us in studio. He is a professor in the Department of Linguistics and the Institute for Advanced Computer Studies at the University of Maryland. He's also lead scientist at a company called Converseon, a social media consultancy. Philip Resnik, thank you so much for joining us.

PROF. PHILIP RESNIK

12:08:35
Thank you. It's great to be here.

NNAMDI

12:08:36
You, too, can join the conversation. Call us at 800-433-8850. What do you know about sentiment analysis? Do you think it's the future of political polling? 800-433-8850. You can send us a tweet by going to #TechTuesday, email to kojo@wamu.org, or go to our website, kojoshow.org, and join the conversation there. This month, Facebook struck a unique deal with Politico, granting the newspaper and website exclusive access to data about its users' political rants and raves.

NNAMDI

12:09:12
Some privacy advocates cried foul because Facebook is analyzing private postings and sharing them with another, but others say this could be a peek into the future. Tell us a little bit about this deal and what it actually points to.

RESNIK

12:09:26
Well, it's kind of an interesting deal from a technologist's perspective, because in my field, the saying goes, more data is better data. At the same time, generally, you don't go after private data. I've dug into this a little bit. It actually looks as if they're not sharing the posts themselves but merely numbers based on the posts. Still, the idea of analyzing people's private data in order to get information, however valuable it might be, is definitely a little troubling.

NNAMDI

12:09:59
And so how would it work?

RESNIK

12:10:01
Well, the basic idea is actually pretty simple. The Facebook folks would be collecting up all posts where there are mentions of particular politicians' names, presumably the Republican primary candidates. And there's actually a field called computational linguistics, which is...

NNAMDI

12:10:20
Your field.

RESNIK

12:10:20
Yeah, which is my field, and it has two components to it. Some of us are scientists interested in using computational models to understand language and how it works better. And some of us are engineers trying to do useful things with language. The Facebook folks include engineers trying to do useful things with it, and so they are using a program called Linguistic Inquiry and Word Count, which is something that's been out there a while.

RESNIK

12:10:43
It's been developed by Jamie Pennebaker at U.T. Austin. And it is basically doing matching of words in particular categories. It's a very simple approach and yet surprisingly effective. And so what they're doing essentially is feeding that information into aggregate statistics and going from there to numbers that they pass out to the political folks to analyze.
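
For illustration, the word-matching idea described here can be sketched in a few lines of Python. This is a minimal sketch, not the Linguistic Inquiry and Word Count program itself: the category word lists, the sample posts and the function name category_counts are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical category word lists, invented for this example.
# They are NOT taken from the actual program mentioned in the conversation.
CATEGORIES = {
    "positive": {"exciting", "great", "hope", "love"},
    "negative": {"awful", "angry", "hate", "worried"},
}

def category_counts(posts):
    """Count how many words across all posts fall into each category."""
    counts = Counter()
    for post in posts:
        # Lowercase the post and split it into simple word tokens.
        for word in re.findall(r"[a-z']+", post.lower()):
            for category, words in CATEGORIES.items():
                if word in words:
                    counts[category] += 1
    return counts

# Made-up posts standing in for the collected mentions of a candidate.
posts = [
    "This primary is so exciting, hope it stays close",
    "Worried about the economy and angry at both parties",
]
print(category_counts(posts))
```

Only the aggregate counts leave the function, which mirrors the point that numbers based on the posts, rather than the posts themselves, are what get passed along.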

NNAMDI

12:11:03
Is this the future of political polling?

RESNIK

12:11:07
Yes and no. I think, on the one hand, the future of political polling really is going to move away from asking questions and moving more toward paying attention to the conversations that people have, not just what they're saying in response to a pollster but what they're saying to each other. And now, we don't do it at the water cooler. We do it out on Facebook. We do it on Twitter.

RESNIK

12:11:33
We do it out in social media. On the other hand, this particular kind of approach is very crude. There's a lot more that can be done than what the Facebook people are doing and this particular program is doing because it's ignoring a lot of the linguistic issues. So there's a lot of work still to be done.

NNAMDI

12:11:51
Let's circle back to the polls as they're currently being conducted in Florida, where Republicans there are voting on their choice of presidential candidate. We look at polls that are taken in advance, or we look at entry polls. But these polls are kind of different because what we say to a pollster is often putting on our best face. However, what we say in our public tweets and in our private Facebook postings are often reflections of how we really feel about issues. So one can imagine how that might be -- give us a better take on what's likely to happen in the Florida Republican primary.

RESNIK

12:12:28
I think that's right, not just for that primary, but in general. In general, the ability to look at what people are saying to each other means that, instead of focusing on what the pollsters are asking -- and sometimes those questions are manipulated -- instead, what you're doing is looking at language in its natural form as people are talking to each other. And one of the things that means is that you're going to be able to pick up what people really are talking about, not just the voices that have the most money or are speaking the most loudly.

NNAMDI

12:12:57
Again, the number to call, 800-433-8850. Do you think these kinds of programs, in fact, do give us a more representative view than traditional polling? Call us at 800-433-8850. Join this conversation about sentiment analysis with Philip Resnik. He's a professor in the Department of Linguistics and the Institute for Advanced Computer Studies at the University of Maryland.

NNAMDI

12:13:20
You can also send a tweet to #TechTuesday or go to our website, kojoshow.org, and ask a question or make a comment there. A lot of Americans, Philip, are disturbed by how the media and political establishment conduct elections. News outlets have a tendency to present elections as horse races. We often talk about polling data as if it's the same thing as actual voting data.

NNAMDI

12:13:43
We also now know that political campaigns have begun to use extremely sophisticated programs to slice voters into smaller and smaller demographic groups to target with negative ads or campaign solicitations. What if sentiment analysis just makes this all worse?

RESNIK

12:14:02
It certainly could go in that direction. And it's something that people should be familiar with. When you have Facebook ads showing up on your Facebook page that seem directed toward the content, when you have Gmail, these are situations where algorithms are analyzing private data in order to do something that arguably might be of value. So there are two sides to the coin.

RESNIK

12:14:23
On the one hand, you worry about the worst kind of micromarketing where they're going to take whatever message they want to feed you and just find better ways to get you to listen to it. And on the flipside of that, this is, again, an environment where, instead of simply looking at the consumer or looking at the consumer of politics, the voter, as somebody to be fed to, there is an opportunity here to listen in ways that haven't been done before.

NNAMDI

12:14:51
Language is a particularly tricky kind of data source because one word can mean different things in different contexts. Consider these two hypothetical tweets about the Republican primary. One reads, "This Florida GOP primary is so unpredictable and exciting. Hope @Newt Gingrich brings down the establishment." The other reads, "@Newt Gingrich is so unpredictable. Hope the Florida primary doesn't bring down the GOP."

NNAMDI

12:15:20
You can't design a program, I guess, just to look for the term unpredictable when it can mean two entirely different things depending on the context. How do you teach a computer to recognize the difference?

RESNIK

12:15:33
That's a -- it's a very good question. In fact, unpredictable as a word is one that I use as an example of this. The usual example I give is that unpredictable is something good with movie plots. But if you're talking about your car's steering, not so good.

NNAMDI

12:15:43
Not so good.

RESNIK

12:15:44
So the idea, which is what the Facebook method is doing right now, of simply matching on words, it really doesn't get at the full set of issues. Another example that I like -- you'd normally think that breakthrough, for example, is something...

NNAMDI

12:15:58
Yeah.

RESNIK

12:15:58
...that's going to be positive, unless you're reading reviews about toilet paper, in which case, people are not quite...

RESNIK

12:16:03
...as happy about it. So the way that you address this problem has a couple of different dimensions to it. But the most important way that you address the problem is by collecting data and using automatic techniques to learn that the words have different meanings in different contexts, and so people in my field have been working for years, starting with things like online reviews from Amazon or other sources.

RESNIK

12:16:28
We have information about whether people like things or not in the form of star ratings. And you can actually do analysis in order to figure out the extent to which particular terms are positive or negative in particular contexts because you know the truth about what's positive or negative based on the star ratings that people give. So there's a whole element of machine learning that takes place that can actually lead to more effective systems than just these kinds of simple word-matching approaches.
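
The star-rating idea can be illustrated with a small Python sketch. It is a simplified stand-in for the machine learning methods mentioned, not any particular system: each word is scored by the average rating of the reviews that contain it, relative to the average rating overall. The reviews and the function name word_scores are invented for the example.

```python
from collections import defaultdict

def word_scores(reviews):
    """Score each word as (mean stars of reviews containing it) minus (overall mean stars).

    A positive score means the word tends to appear in better-than-average reviews.
    """
    overall = sum(stars for _, stars in reviews) / len(reviews)
    totals = defaultdict(lambda: [0, 0])  # word -> [sum of stars, number of reviews]
    for text, stars in reviews:
        # Use a set so a word repeated within one review is counted once.
        for word in set(text.lower().split()):
            totals[word][0] += stars
            totals[word][1] += 1
    return {word: total / count - overall for word, (total, count) in totals.items()}

# Toy (text, stars) reviews invented for this example, one list per product domain.
movie_reviews = [
    ("unpredictable plot", 5),
    ("unpredictable ending", 4),
    ("dull plot", 1),
    ("boring ending", 2),
]
car_reviews = [
    ("unpredictable steering", 1),
    ("unpredictable brakes", 2),
    ("smooth steering", 5),
    ("reliable brakes", 4),
]

movie_scores = word_scores(movie_reviews)
car_scores = word_scores(car_reviews)
print(movie_scores["unpredictable"], car_scores["unpredictable"])
```

Scoring each domain separately is what lets the same word come out positive for movie plots and negative for car steering in this toy data.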

NNAMDI

12:16:54
I'm still trying to separate breakthrough and toilet paper in my head.

NNAMDI

12:16:58
It's stuck there for some reason. Here is Beth in Bowie, Md. Beth, you're on the air. Go ahead, please.

BETH

12:17:04
Yeah. My question is actually similar to that one. And it's, how do you deal with sarcasm in postings on Facebook and Twitter that might mention a political figure but might have a context that's more sarcastic or referential to something else rather than to actually what they're talking about?

NNAMDI

12:17:22
Indeed, Beth, consider this hypothetical tweet. Vote @mittromney, a man who tells it like it is. If a liberal Florida Democrat tweets that, its meaning is very different than if a Romney supporter uses the exact same words. Does sarcasm matter, Philip?

RESNIK

12:17:39
It does, and it doesn't. It does because it changes the meaning. Then there's a whole set of phenomena that you need to look at -- sarcasm, metaphor, irony and even simple negation, just having a statement and having somebody put "not" or some kind of marker for irony at the end can completely turn something around. On the other hand, what you have to realize is that this is not a game of individual tweets or individual postings.

RESNIK

12:18:03
This is a game of analyzing a large quantity of data in order to discern the signal from the noise. And so we can do awfully well, surprisingly well using these kinds of word-based methods and then more sophisticated statistical methods even if the sarcasm, the irony and so forth is not handled well on an individual level. You just have to think of this as signal processing. There's a signal, and there's noise. And if you can manage to pull out useful signal from the noise, then you've succeeded.
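
The signal-and-noise point can be made concrete with a little arithmetic, sketched here in Python. The sketch rests on simplifying assumptions that do not come from the conversation: every post is labeled positive or negative, each label is wrong independently with the same probability, and that accuracy is known and better than chance. The function names and the numbers are hypothetical.

```python
def observed_positive_share(true_share, accuracy):
    """Expected share of posts labeled positive when each label is correct
    with probability `accuracy` (assumed equal for both classes)."""
    return true_share * accuracy + (1 - true_share) * (1 - accuracy)

def corrected_share(observed_share, accuracy):
    """Invert the formula above to estimate the true positive share.
    Only meaningful when accuracy is known and greater than 0.5."""
    return (observed_share - (1 - accuracy)) / (2 * accuracy - 1)

# Two hypothetical candidates whose true positive shares are 60% and 45%,
# measured by a classifier that gets only 80% of individual posts right.
for true_share in (0.60, 0.45):
    observed = observed_positive_share(true_share, accuracy=0.8)
    print(true_share, observed, corrected_share(observed, accuracy=0.8))
```

Under these assumptions the noisy labels shrink the gap between the two candidates but keep them in the same order, which is one way a method that mishandles individual posts can still carry useful signal in the aggregate.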

NNAMDI

12:18:32
In that debate, if you will, do you need -- is there a way to weight, if you will, for sarcasm?

RESNIK

12:18:40
You have to detect it first, and that's a hard problem. And so, in general, the machine learning kinds of methods that people use have automatic kinds of weighting that are done on the basis of things. So, for example, if you have a term that's often used sarcastically but also often used non-sarcastically, it might wind up with a lesser weight simply because it's just not a good predictor.

NNAMDI

12:19:03
And, of course, we're saying W-E-I-G-H-T as in weight, not as in stay a longer time. Here is Julie in Gambrills, Md. Julie, you're on the air. Go ahead, please.

JULIE

12:19:14
Hi. Yes. I'm wondering how -- Facebook is as a source for understanding how people really feel about issues. I know that I have friends with varying political beliefs and about social issues. And I've discovered the hard way that I should keep my mouth shut or, you know, keep my comments not reflective of my true beliefs. I just keep it light and avoid when people post something inflammatory to me.

JULIE

12:19:53
I just steer away from it now instead of saying what -- because I've had people un-friend me on Facebook and had really, really feminist arguments happen because one doesn't like something I've said. So I'm wondering if social reinforcement of being polite keeps us, you know, (word?) these topics and if the polling is -- or not polling, but what you're doing is -- I'm sorry, I lost my train of thought. You understand what I'm saying.

NNAMDI

12:20:32
Well, I think I do understand your train of thought. You're wondering whether on the one hand we have a private face, and, on the other hand, we have a public face. But are social networks like Facebook becoming our public-private space?

RESNIK

12:20:44
Right. So, I mean, one of the things to remember here is that the social media conversation has a lot of the same properties as real conversations. And what that means is that people disagree with each other. People sometimes defer to other people's opinions when there's stuff they don't believe. There are, again, two sides of the coin here. On the one hand, you might have people, like the caller, avoiding topics.

RESNIK

12:21:07
And, on the other hand, you have the danger of what's sometimes called the echo chamber effect, where people within particular social groups simply tend to talk to each other and reinforce what they're saying. So I think that the takeaway here is that, in many respects, this domain is not different from any other kind of conversation in a lot of respects. And, in fact, it's kind of interesting. From a linguistic point of view, the social media conversations look a lot more like speech than they do like written language in a lot of situations.

NNAMDI

12:21:38
Got to take a short break. When we come back, we'll continue this Tech Tuesday conversation about the future of sentiment analysis frontiers and political polling, social media and sentiment analysis. You can still call us at 800-433-8850, or go to our website, kojoshow.org, and ask a question or make a comment there. Do we have a reasonable expectation of privacy when it comes to our political postings on social media? Send us an email to kojo@wamu.org. The number again, 800-433-8850. It's Tech Tuesday. I'm Kojo Nnamdi.

NNAMDI

12:24:02
Welcome back to our Tech Tuesday conversation with Philip Resnik about sentiment analysis. He's a professor in the Department of Linguistics and the Institute for Advanced Computer Studies at the University of Maryland. Philip is also a lead scientist at a company called Converseon, which is a social media consultancy. We're taking your calls at 800-433-8850. You can send email to kojo@wamu.org.

NNAMDI

12:24:28
We got an email from Becky, who says, "I've always had a big problem with pollsters getting to decide what questions are worth asking and who gets to answer them. Never mind that they always call when I'm trying to get my kids' dinner ready. Is your guest optimistic that new technology can allow more diverse voices and topics to be heard?"

RESNIK

12:24:47
Well, that's -- that is certainly an annoyance, and it raises a real substantive issue, too, which has to do with the selection bias. Who actually is willing to take the time to respond to calls like this, and who, in fact, has landlines now that they take these calls on? I'm actually optimistic about the idea of social media analysis in the long-run providing a democratizing influence on the way that we look at how people are feeling about political issues, policy issues and issues in general.

RESNIK

12:25:17
One of the things that technology allows you to do is, instead of coming in with a presupposed set of topics -- you know, this is about foreign policy. This is about the abortion debate. This is about a particular set of things. There's a set of techniques called topic modeling that actually allow the automatic discovery of topics and trends, clusters and groups of issues, that can provide a bottom-up, if you will, picture of what's going on out there. That is, in fact, a lot more sophisticated than simply looking at trending terms on Twitter.

NNAMDI

12:25:51
And it would appear that I have a lot more sentiments than I am currently aware. Let's talk a little bit about the actual sentiments and emotions. I tend to classify my emotions in a very broad way. I'm either happy, I'm sad, I'm bored, I'm angry, I'm excited. One of the companies working in this space has built a program that sorts human emotions into 120 categories. A hundred and twenty categories, Philip?

RESNIK

12:26:17
Interesting. So one of the things that you have to do when you're trying to develop technology like this is define what it means to be successful.

NNAMDI

12:26:27
Yes.

RESNIK

12:26:28
And one of the ways we define being successful is, can we make distinctions as well as people can make them? So you take, for example, a blog posting or a tweet, and you ask somebody, can you identify the emotion or the sentiment in this particular piece of text? And you do this with multiple people independently, and you see how well they agree with each other, right? And that's not going to be 100 percent. In fact, it might be only 80 percent in the case of sort of positive versus negative.
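
The agreement check described here can be sketched in Python as follows. The labels are invented so that the two annotators agree on 80 percent of the items. The sketch also computes Cohen's kappa, a standard chance-corrected agreement measure that is not mentioned in the conversation and is included only as an assumption about how such a comparison is commonly reported.

```python
from collections import Counter

def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented labels from two people judging the same ten posts independently.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
annotator_2 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "neg", "pos"]

print(percent_agreement(annotator_1, annotator_2))
print(cohens_kappa(annotator_1, annotator_2))
```

With more categories the chance of two people landing on the same label drops, so human agreement on a 120-way scheme would be expected to fall well below what is seen for positive versus negative.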

RESNIK

12:26:52
If you're making an attempt to distinguish among 120 different categories -- I'm not familiar with this particular company that you're mentioning, but I'd be willing to bet that getting people to make these fine-grained distinctions is probably not something that they're succeeding well at. And so if they're doing that, how well can they say the technology is succeeding?

NNAMDI

12:27:14
This is a relatively new frontier in American politics, but private companies have been using sentiment analysis for quite some time. You're the head scientist, as we mentioned earlier, at a company called Converseon, which does this kind of analytics. What do companies look for?

RESNIK

12:27:30
Well, companies like Converseon and many others are interested in what you might call broadly the voice of the customer, and there are other good buzz words that go into that same space, where, essentially, the idea is that other companies that have brands and products -- and this is going to relate to politics, too, because, in some sense, politicians are themselves brands and products -- are interested in what people have to say.

RESNIK

12:27:55
And there are a whole bunch of reasons that people do that. One is crisis management if, all of a sudden, you start to see something trending negative that you weren't aware of. Another is to look for the kinds of facets that people are paying attention to. Is it the price of the car that they care about? Are people reacting positively, despite the fact that it has a high price, you know, because it has other features that they like? Another aspect that people look for is the intent of the people who are out there talking about things. Is this person a potential buyer? Is this person simply discussing it?

RESNIK

12:28:27
Is that person looking to sell? People are using this kind of technology to generate sales leads. People are using this in order to manage relationships with the customer base. So, I mean, the bottom line is, as soon as you have a source of data that tells you something that might be valuable to know as you're developing a product strategy or as you're developing a marketing campaign, people are going to be interested in tapping into that source of data.

NNAMDI

12:28:55
Here is Richard in Greenbelt, Md., about exactly how you look at that data. Richard, your turn.

RICHARD

12:29:04
Yes, hi. Well, actually, it's about sampling. And given that all the efforts that (unintelligible) or anyone in Post, Times puts into, you know, defining the 171 people that represent the American people, how -- does that concern you? And wouldn't it be true that, at first blush, anybody that's on Facebook writing about politics is already engaged, you know? That is, if they weren't engaging, you know -- I don't know how (unintelligible).

NNAMDI

12:29:37
They wouldn't be talking about it. Is that your point? If they weren't...

RICHARD

12:29:42
I'm sorry?

NNAMDI

12:29:43
If they weren't engaged, they wouldn't be talking about it.

RICHARD

12:29:44
Right. They wouldn't have...

NNAMDI

12:29:46
OK. Here's Philip.

RESNIK

12:29:49
Yeah. So I -- this is -- Richard asks a question that has a lot of substance to it. And, you know, statisticians and people who do this in politics have had years, decades, of experience developing methods and paying attention to what assumptions they make, how do you sample, how do you sample properly. And that notion is really in its infancy here. At a high level, going for a very, very large sample enables you to slice and dice in ways that you might not be able to, and so you can address some of the sampling issues that way.

RESNIK

12:30:22
And there's a longer statistical conversation that other people who do this, you know, could be having about that. The thing that worries me the most, though, is the fact that the folks who are doing this kind of sentiment analysis -- and this applies to the Politico-Facebook partnership. When I look at The Washington Post's Mention Machine and other things, people are actually behaving as if the algorithms are perfect and simply giving you numbers. And you don't even do that in a normal polling scenario.

RESNIK

12:30:50
You say plus or minus three, and there's a -- you know, there's a set of assumptions there. And there's an entire domain of study that we need to look at that says, now that we've got this set of techniques, what are the assumptions being made? And how do we present this information in a way that accurately reflects the confidence you should have and the confidence you shouldn't have rather than potentially a misleading sound bite?
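
For reference, the "plus or minus three" figure in a traditional poll comes from a standard formula, sketched here in Python. The sketch assumes a simple random sample and a 95 percent confidence level (z = 1.96); the sample size of 1,067 is an illustrative choice, not a number from the conversation.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Margin of error for a proportion estimated from a simple random sample.

    proportion=0.5 gives the widest (most conservative) margin;
    z=1.96 corresponds to a 95 percent confidence level.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# A sample of about 1,067 respondents gives roughly plus or minus 3 points.
print(round(100 * margin_of_error(1067), 1))
```

The formula depends on the random-sampling assumption, which social media posts do not satisfy, so it cannot simply be reused for sentiment numbers. That gap is the kind of unexamined assumption being raised here.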

NNAMDI

12:31:11
I am glad you brought that up because that becomes -- and, Richard, thank you for your call -- particularly important, for some people crucial, in the area of the stock market and investments. We mentioned the deal between Facebook and Politico. Well, Twitter has also cut deals with private companies giving them special access to user data. One company called Topsy Labs is marketing a sentiment analysis service as a possible leading indicator of stock price movements. What do you know about that?

RESNIK

12:31:39
Well, I don't know anything specifically about Topsy Labs. It's actually kind of interesting. Years and years ago, I had a conversation with some folks who were interested in stock market prediction, and this was before the advent of the Web. And they're simply -- we decided not to pursue it because there just wasn't enough information out there.

RESNIK

12:32:01
The -- there's -- just as I said in terms of, you know, businesses that are looking to market their products or to understand how people are reacting better to their brands, where there's money involved, especially where there's potentially big money, there's going to be a lot of interest. There are some results that suggest that you can, in fact, get useful information and signal from social media text.

RESNIK

12:32:26
So, for example, Noah Smith, a professor at Carnegie Mellon University, is a leader in what's called text-driven forecasting. And he and colleagues looked at a paper where they were predicting the, you know, consumer confidence and showing that a measure of sentiment was actually tracking the consumer confidence, and so that lends promise to this. Although one of the things that's interesting to point out -- and Noah himself pointed it out -- is that if you go back to 2008, you see a really strange spike in the graph where his models predicted a lot of positivity.

RESNIK

12:32:58
But consumer confidence didn't look like that. When you drill down further there, what you discover is the way that they were finding economically related tweets was they were looking for the word jobs, and...

NNAMDI

12:33:08
Not a good idea in retrospect, is it?

RESNIK

12:33:10
Well, in retrospect, when you go back and look, you discover that one of the things that was happening in 2008 was the release of the iPhone.

RESNIK

12:33:21
And so what you found at this time was a very non-predictive spike. A similar...

NNAMDI

12:33:28
As in Steve, yes?

RESNIK

12:33:29
Yeah, exactly. And as a similar anecdote, there's a story out there -- I'm not sure whether or not it's true -- that every time Anne Hathaway gets mentioned in the -- you know, in the press, Berkshire Hathaway stock goes up because people like her. So there are a lot of subtleties here. The bottom line, again, is this is a game of signal and noise. And people have been trying forever to find new sources of signal to predict the stock market. They're not about to stop now.

NNAMDI

12:33:52
We're taking your calls at 800-433-8850. What do you think? Is sentiment analysis the future of political polling? Share your thoughts, 800-433-8850. Send us a tweet at #TechTuesday or an email to kojo@wamu.org. Here is Emil in Washington, D.C. Emil, you're on the air. Go ahead, please.

EMIL

12:34:14
Hi, guys. As I've been listening, my question kind of became more and more widespread. Initially I was going to point out that if you were to take at random a sampling of my friends' posts on Facebook, for example, you would find (word?) to be measured but not any predictor of who's going to show up at the polls, who's going to, you know, go vote, who's going to involve themselves with campaigns. And yet I know that there are all sorts of systems in place for measuring likely voter turnout and so forth.

EMIL

12:34:44
Just how reliable is that? Because I know there seems to be an inverse relationship, at least among my friend circles, between how loudly people talk and how much they talk, let's say, about politics and how likely they are to do actually anything about it. The people who get involved in campaigns very infrequently post anything political.

NNAMDI

12:35:04
What do you say, Philip?

RESNIK

12:35:06
Well, I think the deep issue here has to do with the relationship between what people say in social media or anywhere else and real-world behavior. And this is a tough nut to crack. I mean, pollsters can ask, are you likely to vote? But they have to follow-up with the individuals to find out whether they actually voted. One of the advantages of a text-driven forecasting kind of approach, where you're predicting things in the aggregate, is you get the truth, if you wait for it.

RESNIK

12:35:31
So if you're making predictions from text about the way the stock market is going to move, it either moved that way, or it didn't. If you make predictions about who is going to win the election, you get an answer to that. Sometimes you have to wait a while for it. When it comes to predicting individual behavior and connecting with individual behavior, there are so many variables. And, like the old water cooler conversations, we don't have access to those things as much anymore.

RESNIK

12:35:56
Now, it might be the case that on Facebook, when people go ahead and click the link that gives them the I Voted badge on their page, you're now going to have a source of information that you can tap into and try to correlate things they said with that. The question, really, is what moves into being accessible online and what remains hidden from view in terms of these algorithms?

NNAMDI

12:36:17
Emil, thank you very much for your call. A few months back, musician and teen idol Justin Bieber had a public relations nightmare on his hands. His handlers were worried about accusations he'd fathered a child out of wedlock, that they would cut into his fan base and his standing on social media, so they employed a new technique to head off any problems before they got started.

NNAMDI

12:36:38
They hired a sentiment analysis company to monitor all mentions of Bieber and assess whether the language of the tweets and Facebook postings was positive or negative.

RESNIK

12:36:50
Yep.

RESNIK

12:36:52
I -- my reaction to this actually was, I have to tell you, was -- if you had to pick out of those 120 emotions, I actually reacted to this in a joyful way because -- not anything to do with Justin Bieber, but my reaction is, if you're seeing an article as an academic about your field of research being discussed in Entertainment Weekly, you know things have gotten really, really interesting. This is simply another example of marketing and protecting a brand. And Justin Bieber is a brand.

RESNIK

12:37:23
And what these folks did was they monitored this, and they got information. And in this particular case, it looked as if there was, you know, a big groundswell of support. And they breathed a sigh of relief and weren't as unhappy about it. Detecting whether people are supportive or unsupportive, detecting a set of basic emotions, I probably would go more with an inventory of six, Ekman's six basic emotions.

RESNIK

12:37:47
This is probably something where you can do useful things. And we're going to see more of it. If Justin Bieber's doing it, if the politicians are doing it, I think what we're seeing is the wave of the future.

NNAMDI

12:37:56
You're working from six. I mentioned earlier the company that has 120 different emotions. Well, that company is called Kanjoya. It's a social analytics company that was profiled in a recent LA Times article. In many ways, though, people like Justin Bieber are a kind of product that is used to sell other products. Is that how we should think about our elected leaders, like Mitt Romney and Newt Gingrich and Barack Obama? Are they just competing brands, if you will?

RESNIK

12:38:24
Well, I'm sad to say that I think there is a lot of branding-type conversation. I expect that that's the case in politics. Both as an academic and as a citizen, I find what's potentially happening with social media encouraging rather than discouraging because, when people think about brands, they think about the facets of brands, you know, the particular features of the product. What are people speaking to?

RESNIK

12:38:55
One of the opportunities we have here is to look bottom-up, not at what people are saying about the questions you know to ask, but what are they saying about the things that you didn't think to ask? There's going to be an interplay. There's going to be, in some sense, the same kind of adversarial relationship that you have with advertisers, you know, out there, as well as the same kind of relationship where, you know, advertisers are trying to push things that they think will be useful to you. Nothing is really different here. It's just moving into a new territory.

NNAMDI

12:39:24
We got an email from Harvey in Manassas West. "Would you ask your guest how he gains access to Facebook and tweet communications in order to do his analysis?" I will combine that with a tweet we got from @chrisnorris. "Tech Tuesday, does Facebook give the comment and name to political postings?"

RESNIK

12:39:44
OK. So I should emphasize that I'm not involved with the Politico and Facebook partnership. My understanding, from what I've read about it, is that Facebook is not actually giving posts to Politico. My understanding is that they're -- what they were doing is they were doing analysis and then basically giving numbers, essentially charts, graphs, numbers, tables. That doesn't necessarily mean anybody's happier about it when they're analyzing their private posts.

RESNIK

12:40:10
What those of us who work on this in academia do involves a whole variety of techniques. There are -- Twitter and other companies do make some data available for public and for research use. There are also people who aggregate blogs and make those things available for research use. And, you know, there's -- there are basically lots of sources, including scraping Web pages.

RESNIK

12:40:38
If you go back to the early days of this, you would find simply, you know, a grad student somewhere in a lab writing a crawler, you know, to go through a set of pages, parse out the HTML and, you know, try and find how many stars the review had and what the text was. The data is really a golden resource here, and so academics and researchers in industry are finding it any which way they can. That said, Twitter, if you pay enough, you can get the Firehose. They call it the Firehose.
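
The early crawler approach described above, parsing a page to recover each review's star count and text, might be sketched as follows. The markup it expects (a `div` with class `review` and a `data-stars` attribute, containing a `p` with class `text`) is a hypothetical page layout assumed for this example; any real site would need its own rules, and its terms of use should be checked before scraping.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collect (stars, text) pairs from a hypothetical review page layout."""

    def __init__(self):
        super().__init__()
        self.reviews = []       # finished (stars, text) pairs
        self._stars = None      # star count of the review being read
        self._in_text = False   # True while inside the review's text <p>
        self._chunks = []       # pieces of text seen so far

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "review" in classes:
            # Assumed convention: the rating is stored in data-stars.
            self._stars = int(attrs.get("data-stars") or 0)
        elif tag == "p" and "text" in classes and self._stars is not None:
            self._in_text = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_text:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_text:
            text = "".join(self._chunks).strip()
            self.reviews.append((self._stars, text))
            self._in_text = False
            self._stars = None


def extract_reviews(html):
    """Return a list of (stars, text) tuples found in the given HTML."""
    parser = ReviewParser()
    parser.feed(html)
    return parser.reviews
```

This only parses a page that has already been downloaded; fetching the set of pages is a separate step.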

RESNIK

12:41:11
You can get everything that's coming through on Twitter. And other sources of data, I'm sure, have similar sorts of arrangements.

NNAMDI

12:41:17
It's a Tech Tuesday conversation on sentiment analysis. We're taking your calls at 800-433-8850. Do you think these programs -- these kinds of programs give us a more representative view than traditional polling? 800-433-8850 or go to our website, kojoshow.org. Join the Tech Tuesday conversation there. I'm Kojo Nnamdi.

NNAMDI

12:43:27
It's Tech Tuesday. We're talking about sentiment analysis, new frontiers in political and other polling, social media and sentiment analysis. Our guest is Philip Resnik. He is a professor in the Department of Linguistics and the Institute for Advanced Computer Studies at the University of Maryland and lead scientist at a company known as Converseon, a social media consultancy. Philip, we got this tweet from @lousylinguist: "I've never seen an NLP..." -- which is short for neuro-linguistic programming.

NNAMDI

12:43:55
"I've never seen an NLP tech spark the imagination of business as fast as sentiment analysis at 'Kojo Show.'" Tell us a little bit, Phyllis (sic) about -- Philip about the field of computational linguistics. It may sound really esoteric, but it's fueling a lot of exciting innovations.

RESNIK

12:44:13
Sure. I'd be happy to. And I will point out, actually, NLP in this context, going to ambiguity, actually stands for natural language processing. You sometimes find confusion with neuro-linguistic programming, which is a somewhat sketchy field where people try to influence each other. You know...

NNAMDI

12:44:31
I understand natural language programming much better -- process...

RESNIK

12:44:33
Natural language processing -- natural language meaning human language -- as opposed to computer languages is a field that's been around pretty much since the advent of computers, initially attempts at machine translation, for example, within this field. People who work in natural language processing basically are trying to make computers smarter about human language. So think about Watson answering questions on "Jeopardy." Think about Siri as a virtual assistant. In the early dreams of this in the '70s and '80s, think about (word?) or Data on "Star Trek: The Next Generation."

RESNIK

12:45:09
These days, natural language processing is out there in a way that it didn't used to be, and that's partly because of the advent of the Web and then the new wave of interest in social media. People have been working in this field for quite a long time. But it takes something like this to really bring it to the popular attention. And at this point, I'll say hopefully, I guess, because it's my field, I don't think there's any turning back. Language is the medium out there for so much of what goes on.

RESNIK

12:45:42
And the fact that it's available online all of a sudden makes it a natural resource, both something to be studied as scientists and something to be worked with as engineers.

NNAMDI

12:45:51
Well now that it's come to popular awareness, I guess people want to know. In some ways, we seem to be moving towards some sort of computer awareness of human emotions and human preferences. Is this a form of artificial intelligence?

RESNIK

12:46:06
So what I often say to people is if they tell you they've written a program that actually understands what you're saying, you should put your hand on your wallet and keep it there. Artificial intelligence is something that people have studied for a long time, and natural language processing is -- it can be considered a branch of AI, artificial intelligence. That said, notions like awareness, consciousness, you know, all of the philosophical conversation, bear very little relation to what people are actually doing today, both on the scientific side and the engineering side.

RESNIK

12:46:40
What we're trying to do is understand how language works, and we're trying to take advantage of computational methods in order to create machines that will do useful things for us. And my own personal view on this is that the best way to go about this is to find ways of taking advantage of the data and the computational power to complement, not to replace, human insight.

NNAMDI

12:47:04
On to Liz in Alexandria, Va. Liz, your turn.

LIZ

12:47:10
Hello. Good afternoon. My name is Liz, and I was very interested in this conversation because this is what I do as part of my museum consulting business in the D.C. area, and I actually wrote a paper about it that was published in The Exhibitionist academic journal.

LIZ

12:47:29
So I essentially will go into social media sites like Yelp or Trip Advisor and look at a museum's ratings, take the reviews and then code them for different factors that talk about experience and then come up with a report that will tell the client, like, what essentially is happening as part of their visit experience so they can make changes in the real world that actually will then change or drive up reviews or satisfaction in the virtual world.

NNAMDI

12:48:01
Sentiment...

LIZ

12:48:02
I had no idea.

RESNIK

12:48:04
Yeah. Actually, it's -- I love the fact that your call came in right after I was commenting about the importance of human insight. What you call coding, people in my field often call annotation. It's basically taking unstructured material and labeling it with categories that are useful to you for a particular problem. The important thing to emphasize here is that it's not just a question of taking technology and throwing it at the wall and seeing if it sticks.
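
To make the coding-or-annotation idea concrete, here is a minimal sketch of what happens after a human has labeled reviews with categories: the labels are tallied so that each category gets a count and an average star rating. The category names and the data layout are assumptions for illustration, not the caller's actual method.

```python
from collections import defaultdict


def summarize(annotated_reviews):
    """Summarize human-coded reviews.

    annotated_reviews: iterable of (stars, labels), where labels is the set
    of categories a human annotator assigned to that review.
    Returns {label: (review_count, average_stars)}.
    """
    totals = defaultdict(lambda: [0, 0])  # label -> [count, sum of stars]
    for stars, labels in annotated_reviews:
        for label in labels:
            totals[label][0] += 1
            totals[label][1] += stars
    return {label: (n, star_sum / n) for label, (n, star_sum) in totals.items()}
```

The labeling itself stays with a person; the code only aggregates what the annotator decided.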

RESNIK

12:48:32
What it's really a question of is using technology to extract information that we combine with human interpretation and insight and then use in order to solve problems, to address issues, to solve tasks, to provide value of some kind. The human in the loop is really, really central here. There's an analogy to the pollsters, right? What questions are being asked, and how are you asking them? There's something very crucial about having human insight in a particular domain.

RESNIK

12:49:02
And I -- actually, that's one of the reasons that I'm actually working with political scientists and not just jumping into this. And, you know, on the academic side, I'm collaborating with some political scientists in order to try to understand the set of issues better.

NNAMDI

12:49:14
Thank you very much for your call, Liz. There have been 19 Republican debates to this point -- I'm so glad you mentioned politics -- a process that's been both fascinating and exhausting. Millions of people have been using Twitter while watching these debates, using the same hashtags and launching a kind of community experience, if you will. You're working on a new project that would give -- build on that experience or that -- the project would just simply build on the experience. Tell us about React Labs.

RESNIK

12:49:42
Well, so in the React Labs project, we're developing a mobile app that runs on smartphones -- or you can use it on your computer -- that allows people to react to live events, such as political debates, in real time. And it, you know, presents the kind of, you know, tweet your comments functionality that people are doing. You see the -- you know, you can see tweets fly by if you go watch these debates. But what it's doing differently is it actually presents you with a panel of buttons so that you can actually tap, agree, disagree, spin, dodge, and it...

NNAMDI

12:50:19
So that you won't have to spend your own time writing everything.

RESNIK

12:50:22
Not only don't you have to spend your own time writing it, it's well-defined -- this goes back to coding and categories, like the previous caller mentioned -- and it -- what it does is collect that material and put it up on an associated Web type -- website so that it is unfolding in real time from second to second. So what you've really got here -- unlike polling, which is after the fact -- is something that's actually in the moment and very, very fine grained.

RESNIK

12:50:48
And we're doing this in order to look at people's reactions, not, in general, to how people answered questions, but to the specific statements that were made. And you can get a whole bunch of variety varying just second by second, and there's no other technique that we've come across that can do that.
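
The second-by-second collection described above could be sketched as a simple tally of button taps per second. The event format (a timestamp in seconds since the start of the debate plus a button name) is an assumption for this example; the program does not describe how React Labs actually stores its data.

```python
from collections import Counter, defaultdict

# The four buttons named in the conversation.
BUTTONS = ("agree", "disagree", "spin", "dodge")


def tally_by_second(events):
    """Group button taps into one-second buckets.

    events: iterable of (seconds_since_start, button_name).
    Returns {second: Counter of button taps in that second}.
    """
    timeline = defaultdict(Counter)
    for timestamp, button in events:
        if button in BUTTONS:  # ignore anything that is not a known button
            timeline[int(timestamp)][button] += 1
    return dict(timeline)
```

Lining up the resulting timeline against a debate transcript is what would let a researcher tie a spike in "spin" or "dodge" taps to a specific statement.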

NNAMDI

12:51:02
Well, we used to yell at the TV on such occasions...

RESNIK

12:51:05
Yes.

NNAMDI

12:51:05
...or vent over the dinner table, but those comments would disappear into the ether as soon as we uttered them. Now that we have these transcripts of reactions to events like debates, we have a dataset that could be very interesting and very useful to political scientists and campaigns in terms of understanding which words and concepts resonate and how.

RESNIK

12:51:27
Absolutely. So my interest in this started with an interest in spin, how our -- how is language being used in order to manipulate our perception of issues? And one of the things that I realized is, A, it's hard to define, and, B, it's hard to get data about it. And so the genesis of React Labs was actually partly about wanting to create a new platform for political engagement, but also partly about wanting to have data that we could look at and see the large quantity of reactions providing converging evidence that a particular statement was perceived as being spin, or was perceived as dodging the question and so forth.

RESNIK

12:52:04
So these datasets, I'm hoping, will be particularly valuable. And if it's OK to say, if people want to sign up for a beta test, there is a website, reactlabs.org. And...

NNAMDI

12:52:15
There's a link at our website, kojoshow.org, for reactlabs.org.

RESNIK

12:52:18
Terrific. Yep.

NNAMDI

12:52:20
This kind of analysis could have very interesting uses in the real world, away from partisan politics. Take, for instance, Yen (sp?) in Alexandria, Va., and what he saw a week or two ago. Yen, you're on the air. Go ahead, please.

YEN

12:52:35
Yes. Good afternoon. This is a fascinating discussion. I'm really enjoying listening to it. And I just remember that, last week, I saw a news story about the FBI planning to develop an early warning system based on material scraped, so to speak, from social networks. And my question is, how is the technology that your guest is using similar or different to what the FBI is planning to use?

NNAMDI

12:53:02
You know, All Tech Considered, the technology segment on ATC, "All Things Considered," recently profiled the technologist who's doing this kind of work for the FBI and Homeland Security.

RESNIK

12:53:10
Absolutely. So a lot of work in natural language processing is driven, in the research community, by government funding. And governments, you know, are particularly concerned about understanding what's going on out there, the same way that companies and politicians are, albeit for different reasons. So the Defense Advanced Research Projects Agency or DARPA, the Intelligence Advanced Research Projects Activity, IARPA, there's a lot of activity on developing these basic techniques. Now, it is worth emphasizing -- because I can already feel, from out there, people's skin beginning to crawl.

NNAMDI

12:53:48
Yes. Where can I move to?

RESNIK

12:53:49
Right. So, in my experience, the folks who are running these programs are extraordinarily cautious, in fact, insanely worried and cautious, about the idea that these techniques could be applied, you know, with some kind of a government imprimatur to communications within the U.S.

NNAMDI

12:54:10
Yeah. We've seen a recent story about two British nationals who were denied access to the U.S. because of their social media postings.

RESNIK

12:54:16
Yeah. The focus that I've seen out there in the community is largely on multilingual applications, not surprisingly for languages like Arabic and Chinese, but many others as well. So you absolutely find government involvement in supporting these techniques in the same way that government research, you know, led to the Internet in the first place. You're going to see that. That said, I think that there -- certainly among the people doing this work and, I strongly believe, among the people sponsoring it, there is a clear sense of boundaries here as well.

NNAMDI

12:54:52
Yen, thank you very much for your call. Here is Sam in Centreville, Va. Sam, you're on the air. Go ahead, please.

SAM

12:55:00
I have a couple of comments or questions. The first one is that -- about recognizing sentiments. There's a whole lot of -- there are a whole lot of people out in the -- who just cannot recognize things like sarcasm and -- because even humans can't do it, it seems very, very difficult to expect computers to do it. My next thing has to do with selecting -- there are many, many people who either don't or can't -- social media, and it seems like they are being left out of your statistics gathering...

NNAMDI

12:55:50
Yeah. I can see Monty Python driving a computer crazy, but go ahead.

SAM

12:55:56
...that, you know, the man-on-the-street interviews way back when selected against the -- many people because you would think, OK, what about the people who were in at work at the time that these polls were being taken? And who are these people who are actually being found by the pollsters? I have...

NNAMDI

12:56:24
What would be your concern now with the use of social media instead of people being selected by pollsters?

SAM

12:56:32
Well, the -- it's -- there are people who cannot access the social media or people who will not. For instance...

NNAMDI

12:56:42
Who choose not to access it. We're running out of time, which is why I was going into shorthand for you, Sam, because we also got a posting by Jen on our website. "Wouldn't the digital divide skew the results as mapped to the general population? Would this create a larger divide if the results of the analysis gear future debates and issues to those who have access to Twitter, Facebook, the Internet?"

RESNIK

12:57:03
Yeah. So Sam and also in the -- on the website raises, actually, a really very important issue, the idea of who is being sampled here, right. There are really two sides to this. One is whether there is some kind of systematic bias that's taking place. And the thing to note here is there is already bias that takes place that people know that they have to adjust for in traditional polls. The other aspect of this is that we're heading toward the future, and there's going to be an increasingly large number of people whose conversations are -- about this are, in fact, taking place on social media.

RESNIK

12:57:39
So there are definitely issues of access, and this is really new territory. And we're going to have to address those.
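
One common way pollsters adjust for a sample that over-represents some groups is post-stratification weighting, sketched below. It is offered only as an example of the kind of adjustment alluded to above, not as the method any speaker uses; the group names and population shares in the usage note are made up.

```python
def poststratified_share(sample, population_share):
    """Estimate overall support after reweighting groups to population size.

    sample: iterable of (group, supports) pairs, supports being True/False.
    population_share: {group: fraction of the real population in that group}.
    Groups with no respondents in the sample contribute nothing, which is a
    limitation of this simple sketch.
    """
    by_group = {}  # group -> (respondents, supporters)
    for group, supports in sample:
        n, yes = by_group.get(group, (0, 0))
        by_group[group] = (n + 1, yes + int(supports))
    # Weight each group's observed support rate by its population share.
    return sum(
        population_share[group] * yes / n
        for group, (n, yes) in by_group.items()
    )
```

For instance, if 8 of 10 respondents are heavy social media users and 6 of those 8 are supportive while neither of the other 2 is, the raw share is 60 percent; if the two groups are actually equal in the population, the reweighted estimate is 37.5 percent.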

NNAMDI

12:57:48
Sam, thank you very much for your call. You do raise an important issue. Philip Resnik, thank you so much for joining us.

RESNIK

12:57:53
Thank you.

NNAMDI

12:57:53
Philip Resnik is a professor in the Department of Linguistics and the Institute for Advanced Computer Studies at the University of Maryland. He's also lead scientist at a company called Converseon, a social media consultancy. Thank you all for listening to this edition of Tech Tuesday. I'm Kojo Nnamdi.
Transcripts of WAMU programs are available for personal use. Transcripts are provided "As Is" without warranties of any kind, either express or implied. WAMU does not warrant that the transcript is error-free. For all WAMU programs, the broadcast audio should be considered the authoritative version. Transcripts are owned by WAMU 88.5 FM American University Radio and are protected by laws in both the United States and international law. You may not sell or modify transcripts or reproduce, display, distribute, or otherwise use the transcript, in whole or in part, in any way for any public or commercial purpose without the express written permission of WAMU. All requests for uses beyond personal and noncommercial use should be referred to (202) 885-1200.
The Kojo Nnamdi Show is produced by member-supported WAMU 88.5 in Washington DC.