On August 12, as part of the Graham School’s Master of Science in Analytics (MScA) Speaker Series, Kristian Hammond spoke at the University of Chicago Gleacher Center to a room of current and prospective MScA students and Chicago-area analytics enthusiasts. Hammond, Professor of Computer Science and Journalism at Northwestern University and Co-Founder and Chief Scientist at Narrative Science, discussed natural language generation (NLG), its bearing on the role of today’s data scientist, and Narrative Science’s broader goal of automatically generating narratives from data.
Preparing to start his eighteenth year at Northwestern, Hammond conducts research in human-machine interaction, context-driven information systems, and artificial intelligence. He founded Northwestern’s Intelligent Information Laboratory (InfoLab), where his team creates technology that bridges the gap between people and the information they need, a goal in line with the work he has been doing at Narrative Science for more than six years.
Calling language at once miraculous and uniquely human, and noting that no other creature can tell stories and share its understanding of the world, Hammond began by outlining the special capacity that our ability to communicate with one another gives us, and how it differs both from the signs of intelligence seen in certain animals and from the processing capacities of machines.
“When we’re talking about language,” he explained, “we are talking about things that are more than words. We’re talking about stories, explanations, we’re talking about connecting things together. Just understanding specific words, which is what Alexa, Cortana, and Siri do, is not understanding language,” he stressed. “Language is contextual and offers us access to the world. It’s a window into the world. A spreadsheet or visualization of data doesn’t tell you about the world. It shows you data.”
And data, Hammond noted, is something that we have in phenomenal quantities. It ranges from data taken from our cars and bodies, to people’s movements around the country, to how much energy we use, to census and housing data, not to mention data in the form of cat videos. The equivalent of 250,000 Libraries of Congress is currently being recorded daily as part of our effort to meter and monitor everything.
“There’s hardly anything in the world that we don’t have some data on,” he said. “We have all this data: it’s phenomenal. Everyone wants to make informed decisions and you do that by gathering data. But we have it now on a scale beyond anything that we as humans can consider. It’s at the data layer of machines. It means machines know a tremendous amount, but, unfortunately, they can’t communicate much.”
The task of communicating the information extracted from the data layer, Hammond continued, has traditionally fallen to the data scientist, an unenviable role in his view. Stuck between IT and the business side, data scientists spend their time shuttling back and forth translating one side’s message for the other. It means the data scientist spends his or her career in conversation with people who don’t quite know what he or she is talking about, a situation Hammond caricatured by means of a cartoon showing a presumed data scientist in conversation with a chicken.
“Welcome to data analytics,” he joked to the room of prospective data scientists. “But there’s a solution to this problem: natural language generation. It involves the generation of language from a set of core facts that the machine knows. It automates the process of being a data scientist and of understanding what the data mean. Then it turns that meaning into language.”
He went on to unpack the steps leading to natural language generation. Through analyzing data, he explained, we arrive at a set of relevant facts. They emerge by means of drawing inferences, which separates the facts that pertain to the question at hand from those which don’t. Rather than serving as a starting point, then, facts emerge from the data. In the end, telling stories with language involves taking these pertinent facts and making decisions about their articulation in language.
“That’s what Quill does,” Hammond said, referring to Narrative Science’s NLG platform. “It goes through this process of analysis, inference, and generation. It generates ideas from data and then language from ideas. That’s the world for us—data, facts, and language. Given that we can hand a machine this data, we can open up the world. The machine will give us stories that it’s figured out through the lens of the data.”
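The three stages Hammond names (analysis, inference, generation) can be illustrated with a toy sketch. This is not Quill’s implementation, which the talk did not detail; the function names, the two supported questions, and the sample readings are all hypothetical and chosen only to show facts emerging from data before any language is produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A named quantity derived from the raw data (hypothetical structure)."""
    name: str
    value: float


def analyze(readings):
    """Analysis: reduce raw numbers to a set of candidate facts."""
    return [
        Fact("minimum", min(readings)),
        Fact("maximum", max(readings)),
        Fact("change", readings[-1] - readings[0]),
    ]


def infer(facts, question):
    """Inference: keep only the facts that pertain to the question at hand."""
    wanted = {"trend": {"change"}, "range": {"minimum", "maximum"}}[question]
    return [fact for fact in facts if fact.name in wanted]


def generate(facts, subject, unit):
    """Generation: decide how to articulate the pertinent facts in language."""
    by_name = {fact.name: fact.value for fact in facts}
    if "change" in by_name:
        change = by_name["change"]
        if change > 0:
            return f"{subject} rose by {change:g} {unit}."
        if change < 0:
            return f"{subject} fell by {-change:g} {unit}."
        return f"{subject} held steady."
    low, high = by_name["minimum"], by_name["maximum"]
    return f"{subject} ranged from {low:g} to {high:g} {unit}."


def narrate(readings, question, subject, unit):
    """Run the full data -> facts -> language pipeline."""
    return generate(infer(analyze(readings), question), subject, unit)


# Made-up sample readings, for illustration only.
print(narrate([60, 62, 65], "trend", "Water temperature", "degrees"))
print(narrate([60, 62, 65], "range", "Water temperature", "degrees"))
```

The same readings yield different sentences depending on the question asked, which mirrors the point that facts are selected by inference rather than taken as a starting point.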
Hammond went on to describe the variety of real-world uses for such an application, noting that the basic platform is abstract enough to answer relevant questions in nearly any data-based context. One example translated a complex grid of data on Lake Michigan swimming conditions into clear language relevant to the user’s needs; another relieved workers in the financial services industry of the time-consuming task of self-reporting. In every scenario he described, the tools offered by Narrative Science support a more efficient relationship to data as well as a more productive use of one’s time.
“How many of you, by a show of hands, spent time looking at a spreadsheet over the past week?” Hammond asked the room, leaving scarcely a single arm unraised. “The fact that you do that is a failure of technology. You did something a machine is supposed to do. Our goal is to create an integrated intelligence, which is an intelligence involving a machine and a person where each one is doing what they do best.”
Hammond concluded the afternoon with a lively question-and-answer session, which gave him an opportunity to expand on his views regarding artificial intelligence and its impact on the future of jobs and on the world more broadly. One audience member, perhaps suspecting that the talk had thrown the value of the field into question, asked whether there was any reason to study data analytics given the scale and imminence of widespread automation.
“Absolutely,” Hammond replied. “The entire point of data analytics is to understand how data works and how you can turn that data into meaning. The goal is to find the correct solution. We want to be right. But if you’re right, then in fact you’re not necessary anymore. The problem’s solved and you can build yourself. But where a new algorithm is needed, where we need to explore a context further, where we don’t understand the data yet, that’s a green field for people who care about analytics. And there will always be something more. If you like doing analytics, then do it. Absolutely.”