Whether you like it or not, your online status updates are now fueling America’s political machine. Campaigns are mining social networks for data on potential supporters. Facebook is teaming up with Politico, granting the news outlet exclusive access to user data. The goal: using sophisticated computer programs to interpret our language, emotions and political dispositions. Tech Tuesday explores the technical challenges and ethical grey areas of “sentiment analysis” in politics.
- Philip Resnik, Professor, Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland
The Young Turks’ explanatory piece on the Facebook/Politico partnership:
“Sentiment Analysis,” Real-Time Reactions, And Social Networks
Earlier this month, Facebook struck a deal with Politico, granting the news outlet exclusive access to data about its users' political preferences. Under the deal, Facebook used a sophisticated algorithm to analyze the language of its users' public and private postings, flagging all mentions of Republican presidential candidates by name and analyzing whether those comments were positive or negative. This “sentiment analysis” data was then shared with Politico to produce reports and commentary. As Mashable’s Alex Fitzpatrick reported, Politico can also use the Facebook information to provide more personal data about users, like age and location.
The Facebook-Politico deal prompted mixed reactions. Some observers said the arrangement raised strong privacy concerns. Others expressed hope that this kind of data-analysis could end up deepening our understanding of American political attitudes, even as they raised concerns about the arrangement itself.
The Future of Political Polling?
Traditionally, public opinion polls and focus groups were the only reliable sources of data about attitudes toward politics and consumer goods. But many campaigns and corporations believe that “sentiment analysis” could someday replace traditional polling. Last year, researchers at Carnegie Mellon University found that sentiment analysis of Twitter data tracked the results of traditional polling. In the lead-up to the Iowa Caucuses, an analysis of Twitter data by Mashable / Global Point Research correctly predicted a surge by Republican Rick Santorum before traditional pollsters did.
“Sentiment analysis” exists at the intersection of computer science and linguistics. Writing on Language Log, guest Philip Resnik recently explored the technical challenges and ethical implications of building algorithms to interpret and quantify natural human language on the web.
Resnik is currently working with colleagues on an app that would let people react to live events (like political debates) through their smartphones, with their comments displayed in real time on an associated web page. Resnik thinks this kind of tool will let people engage deeply with the events and participants as they watch, rather than just yelling at the TV or venting on Twitter. Scientists would also be able to study exactly which of the candidates' statements people reacted to, down to the millisecond.
MR. KOJO NNAMDIFrom WAMU 88.5 at American University in Washington, welcome to "The Kojo Nnamdi Show," connecting your neighborhood with the world. It's Tech Tuesday. We used to argue about politics around the dinner table and the water cooler. Today, we post links on Twitter and rants on our Facebook page. American politics is migrating onto social media platforms. And a new kind of pollster is following close behind, trying to make sense of an expanding ocean of new real-time political data, musings, links and snarky comments.
MR. KOJO NNAMDITaken alone, it's hard to grab any deep insight from a single tweet of 140 characters, but a couple of million tweets might just be better and quicker than a traditional poll or focus group, if you can figure out how to interpret them and pull the signal from the noise. That's where people like Philip Resnik come in. He's working to design algorithms and software programs to interpret language and emotions in social media, a field at the intersection of linguistics and computer science known as sentiment analysis.
MR. KOJO NNAMDIIt may sound futuristic and maybe a little bit ominous, but it's already happening. In fact, Facebook may already be interpreting your public and private political musings and sharing it with a major news outlet without you knowing. Philip Resnik joins us in studio. He is a professor in the Department of Linguistics and the Institute for Advanced Computer Studies at the University of Maryland. He's also lead scientist at a company called Converseon, a social media consultancy. Philip Resnik, thank you so much for joining us.
PROF. PHILIP RESNIKThank you. It's great to be here.
NNAMDIYou, too, can join the conversation. Call us at 800-433-8850. What do you know about sentiment analysis? Do you think it's the future of political polling? 800-433-8850. You can send us a tweet by going to #TechTuesday, email to email@example.com, or go to our website, kojoshow.org, and join the conversation there. This month, Facebook struck a unique deal with Politico, granting the newspaper and website exclusive access to data about its users' political rants and raves.
NNAMDISome privacy advocates cried foul because Facebook is analyzing private postings and sharing them with another party, but others say this could be a peek into the future. Tell us a little bit about this deal and what it actually points to.
RESNIKWell, it's kind of an interesting deal from a technologist's perspective, because in my field, the saying goes, more data is better data. At the same time, generally, you don't go after private data. I've dug into this a little bit. It actually looks as if they're not sharing the posts themselves but merely numbers based on the posts. Still, the idea of analyzing people's private data in order to get information, however valuable it might be, is definitely a little troubling.
NNAMDIAnd so how would it work?
RESNIKWell, the basic idea is actually pretty simple. The Facebook folks would be collecting up all posts where there are mentions of particular politicians' names, presumably the Republican primary candidates. And there's actually a field called computational linguistics, which is...
RESNIKYeah, which is my field, and it has two components to it. Some of us are scientists interested in using computational models to understand language and how it works better. And some of us are engineers trying to do useful things with language. The Facebook folks include engineers trying to do useful things with it, and so they are using a program called Linguistic Inquiry and Word Count, which is something that's been out there a while.
RESNIKIt's been developed by Jamie Pennebaker at U.T. Austin. And it is basically doing matching of words in particular categories. It's a very simple approach and yet surprisingly effective. And so what they're doing essentially is feeding that information into aggregate statistics and going from there to numbers that they pass out to the political folks to analyze.
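The word-category matching Resnik describes can be sketched in a few lines of Python. The word lists and candidate names below are invented placeholders for illustration; LIWC's actual category dictionaries are far larger and more nuanced.

```python
import re
from collections import Counter

# Illustrative stand-ins -- NOT LIWC's real category dictionaries.
POSITIVE = {"great", "hope", "win", "support", "love"}
NEGATIVE = {"bad", "lose", "angry", "fail", "hate"}
CANDIDATES = {"romney", "gingrich", "santorum", "paul"}

def score_post(text):
    """Count candidate mentions and positive/negative words in one post."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    mentions = {c: counts[c] for c in CANDIDATES if counts[c]}
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return mentions, pos, neg

mentions, pos, neg = score_post("I hope Romney can win, but I hate the attack ads.")
# Per-post tallies like these are then summed into the aggregate
# statistics that get handed to the analysts.
```

Simple as it is, this is essentially the pipeline described for the Facebook/Politico numbers: match words against category lists, then aggregate.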
NNAMDIIs this the future of political polling?
RESNIKYes and no. I think, on the one hand, the future of political polling really is going to move away from asking questions and moving more toward paying attention to the conversations that people have, not just what they're saying in response to a pollster but what they're saying to each other. And now, we don't do it at the water cooler. We do it out on Facebook. We do it on Twitter.
RESNIKWe do it out in social media. On the other hand, this particular kind of approach is very crude. There's a lot more that can be done than what the Facebook people are doing and this particular program is doing because it's ignoring a lot of the linguistic issues. So there's a lot of work still to be done.
NNAMDILet's circle back to the polls as they're currently being conducted in Florida, where Republicans there are voting on their choice of presidential candidate. We look at polls that are taken in advance, or we look at entry polls. But these polls are kind of different because what we say to a pollster is often putting on our best face. However, what we say in our public tweets and in our private Facebook postings are often reflections of how we really feel about issues. So one can imagine how that might be -- give us a better take on what's likely to happen in the Florida Republican primary.
RESNIKI think that's right, not just for that primary, but in general. In general, the ability to look at what people are saying to each other means that, instead of focusing on what the pollsters are asking -- and sometimes those questions are manipulated -- instead, what you're doing is looking at language in its natural form as people are talking to each other. And one of the things that means is that you're going to be able to pick up what people really are talking about, not just the voices that have the most money or speaking the most loudly.
NNAMDIAgain, the number to call, 800-433-8850. Do you think these kinds of programs, in fact, do give us a more representative view than traditional polling? Call us at 800-433-8850. Join this conversation about sentiment analysis with Philip Resnik. He's a professor in the Department of Linguistics and the Institute for Advanced Computer Studies at the University of Maryland.
NNAMDIYou can also send a tweet to #TechTuesday or go to our website, kojoshow.org, and ask a question or make a comment there. A lot of Americans, Philip, are disturbed by how the media and political establishment conduct elections. News outlets have a tendency to present elections as horse races. We often talk about polling data as if it's the same thing as actual voting data.
NNAMDIWe also now know that political campaigns have begun to use extremely sophisticated programs to slice voters into smaller and smaller demographic groups to target with negative ads or campaign solicitations. What if sentiment analysis just makes this all worse?
RESNIKIt certainly could go in that direction. And it's something that people should be familiar with. When you have Facebook ads showing up on your Facebook page that seem directed toward the content, when you have Gmail showing ads keyed to your messages, these are situations where algorithms are analyzing private data in order to do something that arguably might be of value. So there are two sides to the coin.
RESNIKOn the one hand, you worry about the worst kind of micromarketing where they're going to take whatever message they want to feed you and just find better ways to get you to listen to it. And on the flipside of that, this is, again, an environment where, instead of simply looking at the consumer or looking at the consumer of politics, the voter, as somebody to be fed to. There is an opportunity here to listen in ways that haven't been done before.
NNAMDILanguage is a particularly tricky kind of data source because one word can mean different things in different contexts. Consider these two hypothetical tweets about the Republican primary. One reads, "This Florida GOP primary is so unpredictable and exciting. Hope @Newt Gingrich brings down the establishment." The other reads, "@Newt Gingrich is so unpredictable. Hope the Florida primary doesn't bring down the GOP."
NNAMDIYou can't design a program, I guess, just to look for the term unpredictable when it can mean two entirely different things depending on the context. How do you teach a computer to recognize the difference?
RESNIKThat's a -- it's a very good question. In fact, unpredictable as a word is one that I use as an example of this. The usual example I give is that unpredictable is something good with movie plots. But if you're talking about your car's steering, not so good.
NNAMDINot so good.
RESNIKSo the idea, which is what the Facebook method is doing right now, of simply matching on words, it really doesn't get at the full set of issues. Another example that I like -- you'd normally think that breakthrough, for example, is something...
RESNIK...that's going to be positive, unless you're reading reviews about toilet paper, in which case, people are not quite...
RESNIK...as happy about it. So the way that you address this problem has a couple of different dimensions to it. But the most important way that you address the problem is by collecting data and using automatic techniques to learn that the words have different meanings and different contexts, and so people in my field have been working for years, starting with things like online reviews from Amazon or other sources.
RESNIKWe have information about whether people like things or not in the form of star ratings. And you can actually do analysis in order to figure out the extent to which particular terms are positive or negative in particular contexts because you know the truth about what's positive or negative based on the star ratings that people give. So there's a whole element of machine learning that takes place that can actually lead to more effective systems than just these kinds of simple word-matching approaches.
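Resnik's point about learning polarity from star ratings can be illustrated with a toy example. The reviews below are invented, and real systems use large corpora and proper machine learning rather than a simple average, but the mechanism is the same: the star ratings supply the ground truth.

```python
from collections import defaultdict

# Invented reviews: (star rating out of 5, text).
reviews = [
    (5, "great movie with an unpredictable plot"),
    (4, "unpredictable ending and great acting"),
    (1, "unpredictable steering made this car bad"),
]

totals, counts = defaultdict(float), defaultdict(int)
for stars, text in reviews:
    for word in set(text.split()):   # count each word once per review
        totals[word] += stars
        counts[word] += 1

# Average star rating of the reviews each word appears in.
polarity = {w: totals[w] / counts[w] for w in totals}
```

Here "great" comes out at 4.5, while "unpredictable" lands near 3.3: its positive uses (movie plots) are dragged down by the negative one (steering), which is exactly why context-aware models outperform single-word lexicons.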
NNAMDII'm still trying to separate breakthrough and toilet paper in my head.
NNAMDIIt's stuck there for some reason. Here is Beth in Bowie, Md. Beth, you're on the air. Go ahead, please.
BETHYeah. My question is actually similar to that one. And it's, how do you deal with sarcasm in postings on Facebook and Twitter that might mention a political figure but might have a context that's more sarcastic or preferential to something else rather than to actually what they're talking about?
NNAMDIIndeed, Beth, consider this hypothetical tweet: "Vote @mittromney, a man who tells it like it is." If a liberal Florida Democrat tweets that, its meaning is very different than if a Romney supporter uses the exact same words. Does sarcasm matter, Philip?
RESNIKIt does, and it doesn't. It does because it changes the meaning. Then there's a whole set of phenomena that you need to look at -- sarcasm, metaphor, irony and even simple negation. Just having a statement and having somebody put not or some kind of marker for irony at the end can completely turn something around. On the other hand, what you have to realize is that this is not a game of individual tweets or individual postings.
RESNIKThis is a game of analyzing a large quantity of data in order to discern the signal from the noise. And so we can do awfully well, surprisingly well using these kinds of word-based methods and then more sophisticated statistical methods even if the sarcasm, the irony and so forth is not handled well on an individual level. You just have to think of this as signal processing. There's a signal, and there's noise. And if you can manage to pull out useful signal from the noise, then you've succeeded.
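The signal-versus-noise argument can be made concrete with a small simulation: even a per-tweet classifier that is wrong 30 percent of the time recovers the true aggregate share quite accurately, provided you correct for its known error rate. The accuracy and true share below are assumed values, not measurements.

```python
import random
random.seed(0)

p_true, acc = 0.6, 0.7   # assumed true positive share and per-tweet accuracy
n = 100_000

observed_pos = 0
for _ in range(n):
    is_pos = random.random() < p_true    # the tweet's real sentiment
    correct = random.random() < acc      # does the classifier get it right?
    observed_pos += (is_pos == correct)  # classifier outputs "positive"
obs = observed_pos / n                   # approx p*acc + (1-p)*(1-acc)

# Invert the noise model to recover the true share from the noisy count.
p_hat = (obs - (1 - acc)) / (2 * acc - 1)
```

With these numbers the raw count sits near 54 percent, yet the corrected estimate lands close to the true 60 percent: aggregation plus a known error model pulls the signal out of the noise, even though any individual tweet may be misread.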
NNAMDIIn that debate, if you will, do you need -- is there a way to weight, if you will, for sarcasm?
RESNIKYou have to detect it first, and that's a hard problem. And so, in general, the machine learning kinds of methods that people use have automatic kinds of weighting that are done on the basis of things. So, for example, if you have a term that's often used sarcastically but also often used non-sarcastically, it might wind up with a lesser weight simply because it's just not a good predictor.
NNAMDIAnd, of course, we're saying W-E-I-G-H-T as in weight, not as in stay a longer time. Here is Julie in Gambrills, Md. Julie, you're on the air. Go ahead, please.
JULIEHi. Yes. I'm wondering how -- Facebook is as a source for understanding how people really feel about issues. I know that I have friends with varying political beliefs and about social issues. And I've discovered the hard way that I should keep my mouth shut or, you know, keep my comments not reflective of my true beliefs. I just keep it light and avoid when people post something inflammatory to me.
JULIEI just steer away from it now instead of saying what -- because I've had people un-friend me on Facebook and had really, really feminist arguments happen because one doesn't like something I've said. So I'm wondering if social reinforcement of being polite keeps us, you know, (word?) these topics and if the polling is -- or not polling, but what you're doing is -- I'm sorry, I lost my train of thought. You understand what I'm saying.
NNAMDIWell, I think I do understand your train of thought. You're wondering whether on the one hand we have a private face, and, on the other hand, we have a public face. But are social networks like Facebook becoming our public-private space?
RESNIKRight. So, I mean, one of the things to remember here is that the social media conversation has a lot of the same properties as real conversations. And what that means is that people disagree with each other. People sometimes defer to other people's opinions when there's stuff they don't believe. There are, again, two sides of the coin here. On the one hand, you might have people, like the caller, avoiding topics.
RESNIKAnd, on the other hand, you have the danger of what's sometimes called the echo chamber effect, where people within particular social groups simply tend to talk to each other and reinforce what they're saying. So I think that the takeaway here is that, in many respects, this domain is not different from any other kind of conversation in a lot of respects. And, in fact, it's kind of interesting. From a linguistic point of view, the social media conversations look a lot more like speech than they do like written language in a lot of situations.
NNAMDIGot to take a short break. When we come back, we'll continue this Tech Tuesday conversation about the future of political polling, social media and sentiment analysis. You can still call us at 800-433-8850, or go to our website, kojoshow.org, and ask a question or make a comment there. Do we have a reasonable expectation of privacy when it comes to our political postings on social media? Send us an email to firstname.lastname@example.org. The number again, 800-433-8850. It's Tech Tuesday. I'm Kojo Nnamdi.
NNAMDIWelcome back to our Tech Tuesday conversation with Philip Resnik about sentiment analysis. He's a professor in the Department of Linguistics and the Institute for Advanced Computer Studies at the University of Maryland. Philip is also a lead scientist at a company called Converseon, which is a social media consultancy. We're taking your calls at 800-433-8850. You can send email to email@example.com.
NNAMDIWe got an email from Becky, who says, "I've always had a big problem with pollsters getting to decide what questions are worth asking and who gets to answer them. Never mind that they always call when I'm trying to get my kids' dinner ready. Is your guest optimistic that new technology can allow more diverse voices and topics to be heard?"
RESNIKWell, that's -- that is certainly an annoyance, and it raises a real substantive issue, too, which has to do with selection bias. Who actually is willing to take the time to respond to calls like this, and who, in fact, has landlines now that they take these calls on? I'm actually optimistic about the idea of social media analysis in the long run providing a democratizing influence on the way that we look at how people are feeling about political issues, policy issues and issues in general.
RESNIKOne of the things that technology allows you to do is, instead of coming in with a presupposed set of topics -- you know, this is about foreign policy. This is about the abortion debate. This is about a particular set of things. There's a set of techniques called topic modeling that actually allow the automatic discovery of topics and trends, clusters and groups of issues, that can provide a bottom-up, if you will, picture of what's going on out there. That is, in fact, a lot more sophisticated than simply looking at trending terms on Twitter.
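Topic modeling of the kind Resnik mentions is commonly done with latent Dirichlet allocation (LDA). The following is a toy collapsed Gibbs sampler on four invented miniature "documents"; production work would use a library such as MALLET or gensim on far more data, but the bottom-up discovery of topics works the same way.

```python
import random
random.seed(42)

# Four invented miniature "documents" on two themes.
docs = [
    "jobs economy taxes jobs economy".split(),
    "economy taxes jobs taxes".split(),
    "abortion rights court abortion".split(),
    "court rights abortion rights".split(),
]
K, alpha, beta = 2, 0.1, 0.01          # topics and Dirichlet priors
vocab = sorted({w for d in docs for w in d})
V = len(vocab)

# z[d][i] is the topic of word i in document d; the rest are count tables.
z = [[random.randrange(K) for _ in d] for d in docs]
ndk = [[0] * K for _ in docs]                    # document-topic counts
nkw = [{w: 0 for w in vocab} for _ in range(K)]  # topic-word counts
nk = [0] * K                                     # words per topic
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]
        ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1

for _ in range(200):                             # collapsed Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]                          # remove word's assignment
            ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
            weights = [(ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                       for k in range(K)]
            t = random.choices(range(K), weights)[0]
            z[d][i] = t                          # resample and restore
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1

top_words = [max(vocab, key=lambda w: nkw[k][w]) for k in range(K)]
```

On data this clean the two themes usually separate, but the sampler is stochastic; the point is the mechanism, not the toy output: no one told the model in advance that "the economy" or "abortion" were topics.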
NNAMDIAnd it would appear that I have a lot more sentiments than I am currently aware of. Let's talk a little bit about the actual sentiments and emotions. I tend to classify my emotions in a very broad way. I'm either happy, I'm sad, I'm bored, I'm angry, I'm excited. One of the companies working in this space has built a program that sorts human emotions into 120 categories. A hundred and twenty categories, Philip?
RESNIKInteresting. So one of the things that you have to do when you're trying to develop technology like this is define what it means to be successful.
RESNIKAnd one of the ways we define being successful is, can we make distinctions as well as people can make them? So you take, for example, a blog posting or a tweet, and you ask somebody, can you identify the emotion or the sentiment in this particular piece of text? And you do this with multiple people independently, and you see how well they agree with each other, right? And that's not going to be 100 percent. In fact, it might be only 80 percent in the case of sort of positive versus negative.
RESNIKIf you're making an attempt to distinguish among 120 different categories -- I'm not familiar with this particular company that you're mentioning, but I'd be willing to bet that getting people to make these fine-grained distinctions consistently is probably not something that they're succeeding at. And so if that's the case, how well can they say the technology is succeeding?
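The agreement measure Resnik alludes to is often computed as Cohen's kappa, which discounts the agreement two annotators would reach by chance. A minimal sketch, with invented labels from two hypothetical annotators:

```python
def cohens_kappa(a, b):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    # Chance agreement from each annotator's label frequencies.
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Invented labels from two hypothetical annotators on the same eight tweets.
ann1 = ["pos", "pos", "neg", "pos", "neg", "neg", "pos", "neg"]
ann2 = ["pos", "neg", "neg", "pos", "neg", "pos", "pos", "neg"]
kappa = cohens_kappa(ann1, ann2)   # 75% raw agreement -> kappa of 0.5
```

If humans themselves only reach moderate kappa on a labeling task, that human ceiling bounds how well any algorithm can meaningfully be said to perform, which is the heart of the 120-category objection.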
NNAMDIThis is a relatively new frontier in American politics, but private companies have been using sentiment analysis for quite some time. You're the head scientist, as we mentioned earlier, at a company called Converseon, which does this kind of analytics. What do companies look for?
RESNIKWell, companies like Converseon and many others are interested in what you might call broadly the voice of the customer, and there are other good buzz words that go into that same space, where, essentially, the idea is that other companies that have brands and products -- and this is going to relate to politics, too, because, in some sense, politicians are themselves brands and products -- are interested in what people have to say.
RESNIKAnd there are a whole bunch of reasons that people do that. One is crisis management if, all of a sudden, you start to see something trending negative that you weren't aware of. Another is to look for the kinds of facets that people are paying attention to. Is it the price of the car that they care about? Are people reacting positively, despite the fact that it has a high price, you know, because it has other features that they like? Another aspect that people look for is the intent of the people who are out there talking about things. Is this person a potential buyer? Is this person simply discussing it?
RESNIKIs that person looking to sell? People are using this kind of technology to generate sales leads. People are using this in order to manage relationships with the customer base. So, I mean, the bottom line is, as soon as you have a source of data that tells you something that might be valuable to know as you're developing a product strategy or as you're developing a marketing campaign, people are going to be interested in tapping into that source of data.
NNAMDIHere is Richard in Greenbelt, Md., about exactly how you look at that data. Richard, your turn.
RICHARDYes, hi. Well, actually, it's about sampling. And given that all the efforts that (unintelligible) or anyone in Post, Times puts into, you know, defining the 171 people that represent the American people, how -- does that concern you? And wouldn't it be true that, at first blush, anybody that's on Facebook writing about politics is already engaged, you know? That is, if they weren't engaging, you know -- I don't know how (unintelligible).
NNAMDIThey wouldn't be talking about it. Is that your point? If they weren't...
NNAMDIIf they weren't engaged, they wouldn't be talking about it.
RICHARDRight. They wouldn't have...
NNAMDIOK. Here's Philip.
RESNIKYeah. So I -- this is -- Richard asks a question that has a lot of substance to it. And, you know, statisticians and people who do this in politics have had years, decades, of experience developing methods and paying attention to what assumptions they make, how do you sample, how do you sample properly. And that notion is really in its infancy here. At a high level, going for a very, very large sample enables you to slice and dice in ways that you might not be able to, and so you can address some of the sampling issues that way.
RESNIKAnd there's a longer statistical conversation that other people who do this, you know, could be having about that. The thing that worries me the most, though, is the fact that the folks who are doing this kind of sentiment analysis -- and this applies to the Politico-Facebook partnership. When I look at The Washington Post's Mention Machine and other things, people are actually behaving as if the algorithms are perfect and simply giving you numbers. And you don't even do that in a normal polling scenario.
RESNIKYou say plus or minus three, and there's a -- you know, there's a set of assumptions there. And there's an entire domain of study that we need to look at that says, now that we've got this set of techniques, what are the assumptions being made? And how do we present this information in a way that accurately reflects the confidence you should have and the confidence you shouldn't have rather than potentially a misleading sound bite?
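The "plus or minus three" discipline Resnik asks for carries over to sentiment estimates directly, for example with a bootstrap confidence interval. The per-post labels below are simulated; the point is the error bar, not the number.

```python
import random
random.seed(1)

# Simulated per-post labels: 1 = classified positive, 0 = negative.
labels = [1] * 540 + [0] * 460
point = sum(labels) / len(labels)          # the headline "54% positive"

# Resample with replacement to see how much the estimate wobbles.
boots = []
for _ in range(2000):
    resample = [random.choice(labels) for _ in range(len(labels))]
    boots.append(sum(resample) / len(resample))
boots.sort()
lo, hi = boots[49], boots[1949]            # rough 95% interval
```

Reporting "54 percent positive, give or take about three points" is the sentiment-analysis analogue of a poll's margin of error; handing over the bare 54 overstates the confidence the algorithm has earned.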
NNAMDII am glad you brought that up because that becomes -- and, Richard, thank you for call -- particularly important for some people crucial in the area of the stock market and investments. We mentioned the deal between Facebook and Politico. Well, Twitter has also cut deals with private companies giving them special access to user data. One company called Topsy Labs is marketing a sentiment analysis service as a possible leading indicator of stock price movements. What do you know about that?
RESNIKWell, I don't know anything specifically about Topsy Labs. It's actually kind of interesting. Years and years ago, I had a conversation with some folks who were interested in stock market prediction, and this was before the advent of the Web. And they're simply -- we decided not to pursue it because there just wasn't enough information out there.
RESNIKThe -- there's -- just as I said in terms of, you know, businesses that are looking to market their products or to understand how people are reacting better to their brands, where there's money involved, especially where there's potentially big money, there's going to be a lot of interest. There are some results that suggest that you can, in fact, get useful information and signal from social media text.
RESNIKSo, for example, Noah Smith, a professor at Carnegie Mellon University, is a leader in what's called text-driven forecasting. And he and colleagues wrote a paper where they were predicting, you know, consumer confidence and showing that a measure of sentiment was actually tracking the consumer confidence, and so that lends promise to this. Although one of the things that's interesting to point out -- and Noah himself pointed it out -- is that if you go back to 2008, you see a really strange spike in the graph where his models predicted a lot of positivity.
RESNIKBut consumer confidence didn't look like that. When you drill down further there, what you discover is the way that they were finding economically related tweets was they were looking for the word jobs, and...
NNAMDINot a good idea in retrospect, is it?
RESNIKWell, in retrospect, when you go back and look, you discover that one of the things that was happening in 2008 was the release of the iPhone.
RESNIKAnd so what you found at this time was a very non-predictive spike. A similar...
NNAMDIAs in Steve, yes?
RESNIKYeah, exactly. And as a similar anecdote, there's a story out there -- I'm not sure whether or not it's true -- that every time Anne Hathaway gets mentioned in the -- you know, in the press, Berkshire Hathaway stock goes up because people like her. So there are a lot of subtleties here. The bottom line, again, is this is a game of signal and noise. And people have been trying forever to find new sources of signal to predict the stock market. They're not about to stop now.
NNAMDIWe're taking your calls at 800-433-8850. What do you think? Is sentiment analysis the future of political polling? Share your thoughts, 800-433-8850. Send us a tweet at #TechTuesday or an email to firstname.lastname@example.org. Here is Emil in Washington, D.C. Emil, you're on the air. Go ahead, please.
EMILHi, guys. As I've been listening, my question kind of became more and more widespread. Initially I was going to point out that if you were to take at random a sampling of my friends' posts on Facebook, for example, you would find (word?) to be measured but not any predictor of who's going to show up at the polls, who's going to, you know, go vote, who's going to involve themselves with campaigns. And yet I know that there are all sorts of systems in place for measuring likely voter turnout and so forth.
EMILJust how reliable is that? Because I know there seems to be an inverse relationship, at least among my friend circles, between how loudly people talk and how much they talk, let's say, about politics and how likely they are to do actually anything about it. The people who get involved in campaigns very infrequently post anything political.
NNAMDIWhat do you say, Philip?
RESNIKWell, I think the deep issue here has to do with the relationship between what people say in social media or anywhere else and real-world behavior. And this is a tough nut to crack. I mean, pollsters can ask, are you likely to vote? But they have to follow up with the individuals to find out whether they actually voted. One of the advantages of a text-driven forecasting kind of approach, where you're predicting things in the aggregate, is you get the truth, if you wait for it.
RESNIKSo if you're making predictions from text about the way the stock market is going to move, it either moved that way, or it didn't. If you make predictions about who is going to win the election, you get an answer to that. Sometimes you have to wait a while for it. When it comes to predicting individual behavior and connecting with individual behavior, there are so many variables. And, like the old water cooler conversations, we don't have access to those things as much anymore.
RESNIKNow, it might be the case that on Facebook, when people go ahead and click the link that gives them the I Voted badge on their page, you're now going to have a source of information that you can tap into and try to correlate things they said with that. The question is, really, is what moves into being accessible online and what remains hidden from view in terms of these algorithms?
NNAMDIEmil, thank you very much for your call. A few months back, musician and teen idol Justin Bieber had a public relations nightmare on his hands. His handlers were worried that accusations he'd fathered a child out of wedlock would cut into his fan base and his standing on social media, so they employed a new technique to head off any problems before they got started.
NNAMDIThey hired a sentiment analysis company to monitor all mentions of Bieber and assess whether the language of the tweets and Facebook postings was positive or negative.
RESNIKI -- my reaction to this actually was, I have to tell you, was -- if you had to pick out of those 120 emotions, I actually reacted to this in a joyful way because -- not anything to do with Justin Bieber, but my reaction is, if you're seeing an article as an academic about your field of research being discussed in Entertainment Weekly, you know things have gotten really, really interesting. This is simply another example of marketing and protecting a brand. And Justin Bieber is a brand.
RESNIKAnd what these folks did was they monitored this, and they got information. And in this particular case, it looked as if there was, you know, a big groundswell of support. And they breathed a sigh of relief and weren't as unhappy about it. Detecting whether people are supportive or unsupportive, detecting a set of basic emotions, I probably would go more with an inventory of six, Ekman's six basic emotions.
RESNIKThis is probably something where you can do useful things. And we're going to see more of it. If Justin Bieber's doing it, if the politicians are doing it, I think what we're seeing is the wave of the future.
NNAMDIYou're working from six. I mentioned earlier the company that has 120 different emotions. Well, that company is called Kanjoya. It's a social analytics company that was profiled in a recent LA Times article. In many ways, though, people like Justin Bieber are a kind of product that is used to sell other products. Is that how we should think about our elected leaders, like Mitt Romney and Newt Gingrich and Barack Obama? Are they just competing brands, if you will?
RESNIKWell, I'm sad to say that I think there is a lot of branding-type conversation. I expect that that's the case in politics. As both an academic and as a citizen, I find what's potentially happening with social media encouraging rather than discouraging because, when people think about brands, they think about the facets of brands, you know, the particular features of the product. What are people speaking to?
RESNIKOne of the opportunities we have here is to look bottom-up, not at what people are saying about the questions you know to ask, but what are they saying about the things that you didn't think to ask? There's going to be an interplay. There's going to be, in some sense, the same kind of adversarial relationship that you have with advertisers, you know, out there, as well as the same kind of relationship where, you know, advertisers are trying to push things that they think will be useful to you. Nothing is really different here. It's just moving into a new territory.
NNAMDIWe got an email from Harvey in Manassas West. "Would you ask your guest how he gains access to Facebook and tweet communications in order to do his analysis?" I will combine that with a tweet we got from @chrisnorris. "Tech Tuesday, does Facebook give the comment and name to political postings?"
RESNIKOK. So I should emphasize that I'm not involved with the Politico and Facebook partnership. My understanding, from what I've read about it, is that Facebook is not actually giving posts to Politico. My understanding is that they're -- what they were doing is they were doing analysis and then basically giving numbers, essentially charts, graphs, numbers, tables. That doesn't necessarily mean anybody's happier about it when they're analyzing their private posts.
RESNIKWhat those of us who work on this in academia do involves a whole variety of techniques. There are -- Twitter and other companies do make some data available for public and for research use. There are also people who aggregate blogs and make those things available for research use. And, you know, there's -- there are basically lots of sources, including scraping Web pages.
RESNIKIf you go back to the early days of this, you would find simply, you know, a grad student somewhere in a lab writing a crawler, you know, to go through a set of pages, parse out the HTML and, you know, try and find how many stars the review had and what the text was. The data is really a golden resource here, and so academics and researchers in industry are finding any which way they can. That said, Twitter, if you pay enough, you can get the Firehose. They call it the Firehose.
RESNIKYou can get everything that's coming through on Twitter. And other sources of data, I'm sure, have similar sorts of arrangements.
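The grad-student crawler Resnik describes, stripped to its core, is just pattern-matching over a page's HTML to recover each review's star rating and text. A minimal sketch (the page structure here is invented; any real site would need its own parsing logic, and robust scrapers use a proper HTML parser rather than regular expressions):

```python
import re

# A hypothetical fragment of a fetched review page -- in a real crawler
# this string would come from an HTTP request, one page at a time.
html = """
<div class="review"><span class="stars">4</span>
<p>Solid phone, battery could be better.</p></div>
<div class="review"><span class="stars">1</span>
<p>Broke after a week.</p></div>
"""

# Pull out (star rating, review text) pairs from the markup.
pattern = re.compile(r'<span class="stars">(\d)</span>\s*<p>(.*?)</p>', re.DOTALL)

for stars, text in pattern.findall(html):
    print(int(stars), text.strip())
```

The star rating serves as a free sentiment label for the accompanying text, which is why review sites were such a "golden resource" for early sentiment-analysis research.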
NNAMDIIt's a Tech Tuesday conversation on sentiment analysis. We're taking your calls at 800-433-8850. Do you think these programs -- these kinds of programs give us a more representative view than traditional polling? 800-433-8850 or go to our website, kojoshow.org. Join the Tech Tuesday conversation there. I'm Kojo Nnamdi.
NNAMDIIt's Tech Tuesday. We're talking about sentiment analysis, new frontiers in political and other polling, social media and sentiment analysis. Our guest is Philip Resnik. He is a professor in the Department of Linguistics and the Institute for Advanced Computer Studies at the University of Maryland and lead scientist at a company known as Converseon, a social media consultancy. Philip, we got this tweet from @lousylinguist: "I've never seen an NLP..." -- which is short for neuro-linguistic programming.
NNAMDI"I've never seen an NLP tech spark the imagination of business as fast as sentiment analysis at "Kojo Show." Tell us a little bit, Phyllis (sic) about -- Philip about the field of computational linguistics. It may sound really esoteric, but it's fueling a lot of exciting innovations.
RESNIKSure. I'd be happy to. And I will point out, actually, NLP in this context, going to ambiguity, actually stands for natural language processing. You sometimes find confusion with neuro-linguistic programming, which is a somewhat sketchy field where people try to influence each other. You know...
NNAMDII understand natural language programming much better -- process...
RESNIKNatural language processing -- natural language meaning human language -- as opposed to computer languages is a field that's been around pretty much since the advent of computers, initially attempts at machine translation, for example, within this field. People who work in natural language processing basically are trying to make computers smarter about human language. So think about Watson answering questions on "Jeopardy." Think about Siri as a virtual assistant. In the early dreams of this in the '70s and '80s, think about (word?) or Data on "Star Trek: The Next Generation."
RESNIKThese days, natural language processing is out there in a way that it didn't used to be, and that's partly because of the advent of the Web and then the new wave of interest in social media. People have been working in this field for quite a long time. But it takes something like this to really bring it to the popular attention. And at this point, I'll say hopefully, I guess, because it's my field, I don't think there's any turning back. Language is the medium out there for so much of what goes on.
RESNIKAnd the fact that it's available online all of a sudden makes it a natural resource, both something to be studied as scientists and something to be worked with as engineers.
NNAMDIWell now that it's come to popular awareness, I guess people want to know. In some ways, we seem to be moving towards some sort of computer awareness of human emotions and human preferences. Is this a form of artificial intelligence?
RESNIKSo what I often say to people is if they tell you they've written a program that actually understands what you're saying, you should put your hand on your wallet and keep it there. Artificial intelligence is something that people have studied for a long time, and natural language processing is -- it can be considered a branch of AI, artificial intelligence. That said, notions like awareness, consciousness, you know, all of the philosophical conversation, bear very little relation to what people are actually doing today, both on the scientific side and the engineering side.
RESNIKWhat we're trying to do is understand how language works, and we're trying to take advantage of computational methods in order to create machines that will do useful things for us. And my own personal view on this is that the best way to go about this is to find ways of taking advantage of the data and the computational power to complement, not to replace, human insight.
NNAMDIOn to Liz in Alexandria, Va. Liz, your turn.
LIZHello. Good afternoon. My name is Liz, and I was very interested in this conversation because this is what I do as part of my museum consulting business in the D.C. area, and I actually wrote a paper about it that was published in The Exhibitionist academic journal.
LIZSo I essentially will go into social media sites like Yelp or Trip Advisor and look at a museum's ratings, take the reviews and then code them for different factors that talk about experience and then come up with a report that will tell the client, like, what essentially is happening as part of their visit experience so they can make changes in the real world that actually will then change or drive up reviews or satisfaction in the virtual world.
LIZI had no idea.
RESNIKYeah. Actually, it's -- I love the fact that your call came in right after I was commenting about the importance of human insight. What you call coding, people in my field often call annotation. It's basically taking unstructured material and labeling it with categories that are useful to you for a particular problem. The important thing to emphasize here is that it's not just a question of taking technology and throwing it at the wall and seeing if it sticks.
RESNIKWhat it's really a question of is using technology to extract information that we combine with human interpretation and insight and then use in order to solve problems, to address issues, to solve tasks, to provide value of some kind. The human in the loop is really, really central here. There's an analogy to the pollsters, right? What questions are being asked, and how are you asking them? There's something very crucial about having human insight in a particular domain.
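The coding (annotation) workflow the caller and Resnik describe -- humans labeling unstructured reviews with categories, then aggregating the labels into a report -- can be sketched in a few lines. The categories and review snippets below are invented for illustration:

```python
from collections import Counter

# Each pair is (review snippet, human-assigned category codes).
# The codes come from a human "coding"/annotation pass, not from software.
annotated = [
    ("Lines at the entrance were very long", ["wait-time"]),
    ("The docent tour was fantastic", ["staff", "tours"]),
    ("Exhibits felt cramped and crowded", ["crowding", "layout"]),
    ("Another hour-long wait for tickets", ["wait-time"]),
]

# Aggregate the labels into counts a client could act on.
report = Counter(code for _, codes in annotated for code in codes)
for code, n in report.most_common():
    print(f"{code}: {n}")
```

In research settings, a labeled set like this also becomes training data: the "human in the loop" supplies the categories and example judgments, and machine-learned classifiers scale that insight to text no human could read in full.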
RESNIKAnd I -- actually, that's one of the reasons that I'm actually working with political scientists and not just jumping into this. And, you know, on the academic side, I'm collaborating with some political scientists in order to try to understand the set of issues better.
NNAMDIThank you very much for your call, Liz. There have been 19 Republican debates to this point -- I'm so glad you mentioned politics -- a process that's been both fascinating and exhausting. Millions of people have been using Twitter while watching these debates, using the same hashtags and launching a kind of community experience, if you will. You're working on a new project that would give -- build on that experience or that -- the project would just simply build on the experience. Tell us about React Labs.
RESNIKWell, so in the React Labs project, we're developing a mobile app that runs on smartphones -- or you can use it on your computer -- that allows people to react to live events, such as political debates, in real time. And it, you know, provides the kind of, you know, tweet your comments functionality that people are doing. You see the -- you know, you can see tweets fly by if you go watch these debates. But what it's doing differently is it actually presents you with a panel of buttons so that you can actually tap, agree, disagree, spin, dodge, and it...
NNAMDISo that you won't have to spend your own time writing everything.
RESNIKNot only don't you have to spend your own time writing it, it's well-defined -- this goes back to coding and categories, like the previous caller mentioned -- and it -- what it does is collect that material and put it up on an associated Web type -- website so that it is unfolding in real time from second to second. So what you've really got here -- unlike polling, which is after the fact -- is something that's actually in the moment and very, very fine grained.
RESNIKAnd we're doing this in order to look at people's reactions, not, in general, to how people answered questions, but to the specific statements that were made. And you can get a whole bunch of variation just second by second, and there's no other technique that we've come across that can do that.
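The second-by-second, fine-grained data Resnik describes amounts to bucketing timestamped button presses by second. A minimal sketch (the event data is invented; React Labs' actual pipeline is not public in this transcript):

```python
from collections import defaultdict

# Each pair is (seconds into the debate, button the viewer tapped).
events = [
    (12.1, "agree"), (12.4, "agree"), (12.9, "spin"),
    (13.2, "dodge"), (13.3, "dodge"), (13.8, "agree"),
]

# Bucket the presses into per-second tallies of each reaction type.
timeline = defaultdict(lambda: defaultdict(int))
for t, button in events:
    timeline[int(t)][button] += 1

for second in sorted(timeline):
    print(second, dict(timeline[second]))
```

A spike of "dodge" presses aligned against the debate transcript is the "converging evidence" Resnik mentions next: many independent viewers flagging the same statement at the same moment.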
NNAMDIWell, we used to yell at the TV on such occasions...
NNAMDI...or vent over the dinner table, but those comments would disappear into the ether as soon as we uttered them. Now that we have these transcripts of reactions to events like debates, we have a dataset that could be very interesting and very useful to political scientists and campaigns in terms of understanding which words and concepts resonate and how.
RESNIKAbsolutely. So my interest in this started with an interest in spin, how our -- how is language being used in order to manipulate our perception of issues? And one of the things that I realized is, A, it's hard to define, and, B, it's hard to get data about it. And so the genesis of React Labs was actually partly about wanting to create a new platform for political engagement, but also partly about wanting to have data that we could look at and see the large quantity of reactions providing converging evidence that a particular statement was perceived as being spin, or was perceived as dodging the question and so forth.
RESNIKSo these datasets, I'm hoping, will be particularly valuable. And if it's OK to say, if people want to sign up for a beta test, there is a website, reactlabs.org. And...
NNAMDIThere's a link at our website, kojoshow.org, for reactlabs.org.
NNAMDIThis kind of analysis could have very interesting uses in the real world, away from partisan politics. Take, for instance, Yen (sp?) in Alexandria, Va., and what he saw a week or two ago. Yen, you're on the air. Go ahead, please.
YENYes. Good afternoon. This is a fascinating discussion. I'm really enjoying listening to it. And I just remember that, last week, I saw a news story about the FBI planning to develop an early warning system based on material scraped, so to speak, from social networks. And my question is, how is the technology that your guest is using similar or different to what the FBI is planning to use?
NNAMDIYou know, All Tech Considered, the technology segment on ATC, "All Things Considered," recently profiled the technologist who's doing this kind of work for the FBI and Homeland Security.
RESNIKAbsolutely. So a lot of work in natural language processing is driven, in the research community, by government funding. And governments, you know, are particularly concerned about understanding what's going on out there, the same way that companies and politicians are, albeit for different reasons. So the Defense Advanced Research Projects Agency or DARPA, the Intelligence Advanced Research Projects Activity, IARPA, there's a lot of activity on developing these basic techniques. Now, it is worth emphasizing -- because I can already feel, from out there, people's skin beginning to crawl.
NNAMDIYes. Where can I move to?
RESNIKRight. So, in my experience, the folks who are running these programs are extraordinarily cautious, in fact, insanely worried and cautious, about the idea that these techniques could be applied, you know, with some kind of a government imprimatur to communications within the U.S.
NNAMDIYeah. We've seen a recent story about two British nationals who were denied access to the U.S. because of their social media postings.
RESNIKYeah. The focus that I've seen out there in the community is largely on multilingual applications, not surprisingly for languages like Arabic and Chinese, but many others as well. So you absolutely find government involvement in supporting these techniques in the same way that government research, you know, led to the Internet in the first place. You're going to see that. That said, I think that there -- certainly among the people doing this work and, I strongly believe, among the people sponsoring it, there is a clear sense of boundaries here as well.
NNAMDIYen, thank you very much for your call. Here is Sam in Centreville, Va. Sam, you're on the air. Go ahead, please.
SAMI have a couple of comments or questions. The first one is that -- about recognizing sentiments. There's a whole lot of -- there are a whole lot of people out in the -- who just cannot recognize things like sarcasm and -- because even humans can't do it, it seems very, very difficult to expect computers to do it. My next thing has to do with selecting -- there are many, many people who either don't or can't use social media, and it seems like they are being left out of your statistics gathering...
NNAMDIYeah. I can see Monty Python driving a computer crazy, but go ahead.
SAM...that, you know, the man-on-the-street interviews way back when selected against the -- many people because you would think, OK, what about the people who were at work at the time that these polls were being taken? And who are these people who are actually being found by the pollsters? I have...
NNAMDIWhat would be your concern now with the use of social media instead of people being selected by pollsters?
SAMWell, the -- it's -- there are people who cannot access the social media or people who will not. For instance...
NNAMDIWho choose not to access it. We're running out of time, which is why I was going into shorthand for you, Sam, because we also got a posting by Jen on our website. "Wouldn't the digital divide skew the results as mapped to the general population? Would this create a larger divide if the results of the analysis gear future debates and issues to those who have access to Twitter, Facebook, the Internet?"
RESNIKYeah. So Sam, and also the posting on the website, raises a really very important issue, the idea of who is being sampled here, right. There are really two sides to this. One is whether there is some kind of systematic bias that's taking place. And the thing to note here is there is already bias that takes place that people know that they have to adjust for in traditional polls. The other aspect of this is that we're heading toward the future, and there's going to be an increasingly large number of people whose conversations about this are, in fact, taking place on social media.
RESNIKSo there are definitely issues of access, and this is really new territory. And we're going to have to address those.
NNAMDISam, thank you very much for your call. You do raise an important issue. Philip Resnik, thank you so much for joining us.
NNAMDIPhilip Resnik is a professor in the Department of Linguistics and the Institute for Advanced Computer Studies at the University of Maryland. He's also lead scientist at a company called Converseon, a social media consultancy. Thank you all for listening to this edition of Tech Tuesday. I'm Kojo Nnamdi.