LOS ALAMOS — Not everyone has the aptitude to comprehend computer languages. But a group of astute La Cueva High School students who do came up with a way for computers to understand relationships between words in human language through statistical analysis, thereby winning the top prize at this year’s New Mexico Supercomputing Challenge.
The announcement came during an awards ceremony last week at the Church of Christ in Los Alamos.
“We thought it was important to have a project that had a practical application,” said Ari Echt-Wilson, who along with her younger brother, Eli, and Justin Sanchez formed the winning team.
Their program could help search engines better identify articles related to a particular subject, help Internet radio stations build playlists around a certain topic, or even analyze whether a political candidate is trying to evade a question during a debate.
In fact, the trio used transcripts from the three presidential debates and the vice presidential debate in 2012 as data for their project, “Learning and Analyzing Topics in Human Language.”
“We set out to discover how strongly one word relates to another, then see if the computer could learn to use them to analyze topics,” Justin said.
“We wanted to know, ‘Can this be done?’ And we got really good results,” added Eli.
The team identified words that typically appear together and implemented the algorithm in Java. To speed up the process, they wrote a parallel program in Hadoop MapReduce, and they found its results to be surprisingly accurate.
“We then used correlations between words to group words into topics,” they wrote in the executive summary of their report. “We used a straightforward clustering algorithm to find groups of words that had high correlations with each other.”
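For readers curious how such an approach might look in practice, the following is a minimal sketch in Python, not the team's Java or Hadoop code. It assumes that "correlation" means the phi coefficient between two words' presence in the same passage and that "clustering" means linking any pair of words whose correlation clears a cutoff. The function names, the sample passages and the 0.9 cutoff are all invented for illustration.

```python
from itertools import combinations
from math import sqrt


def cooccurrence_correlation(documents):
    """Return {(a, b): phi} for every word pair, where phi is the correlation
    between the two words' presence/absence across the documents."""
    n = len(documents)
    doc_sets = [set(doc.lower().split()) for doc in documents]
    vocab = sorted(set().union(*doc_sets))
    count = {w: sum(w in d for d in doc_sets) for w in vocab}
    corr = {}
    for a, b in combinations(vocab, 2):
        both = sum(a in d and b in d for d in doc_sets)
        denom = sqrt(count[a] * (n - count[a]) * count[b] * (n - count[b]))
        if denom == 0:
            continue  # a word in every document has no defined correlation
        corr[(a, b)] = (n * both - count[a] * count[b]) / denom
    return corr


def cluster_words(corr, threshold):
    """Group words by linking every pair whose correlation is at least
    `threshold` (single-linkage grouping via union-find)."""
    parent = {}

    def find(w):
        parent.setdefault(w, w)
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path compression
            w = parent[w]
        return w

    for (a, b), value in corr.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in list(parent):
        groups.setdefault(find(w), set()).add(w)
    return list(groups.values())


# Made-up passages, not taken from the debate transcripts.
passages = [
    "immigration amnesty citizenship",
    "immigration amnesty pathway citizenship",
    "taxes deficit budget",
    "taxes budget deficit",
]
correlations = cooccurrence_correlation(passages)
topics = cluster_words(correlations, threshold=0.9)
```

In this toy data, "immigration," "amnesty" and "citizenship" always appear together and so fall into one group, while the budget words form another. "Pathway" appears in only one passage, so its correlation with the immigration words is too weak to clear the 0.9 cutoff; a lower cutoff would pull it in.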
For example, they found words such as “amnesty,” “citizenship” and “pathway” to be among the words that had a high correlation with “immigration” during the debates. So they would expect to see such words used in a response to a question about immigration posed to a candidate.
If not, or if such words were scarce in the response, the candidate may have been evading the question.
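A rough illustration of that check, again in Python and again only a sketch: it assumes evasion can be approximated by the share of words in an answer that belong to the topic's word group. The function name, the word list, the sample answers and any cutoff a reader might apply are hypothetical and are not drawn from the students' report.

```python
def topic_coverage(response, topic_words):
    """Return the fraction of words in `response` that belong to `topic_words`."""
    tokens = [t.strip('.,;:!?"\'()').lower() for t in response.split()]
    tokens = [t for t in tokens if t]  # drop tokens that were only punctuation
    if not tokens:
        return 0.0
    hits = sum(t in topic_words for t in tokens)
    return hits / len(tokens)


# Hypothetical topic group and invented answers, for illustration only.
immigration_words = {"immigration", "amnesty", "citizenship", "pathway"}

on_topic = topic_coverage(
    "We need a pathway to citizenship, not amnesty.", immigration_words
)
off_topic = topic_coverage("My opponent raised taxes.", immigration_words)
```

Here the first answer uses three topic words out of eight, while the second uses none. A low score would only suggest, not prove, that a speaker steered away from the question.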
To test how well their program performed, they used SurveyMonkey to collect a separate set of judgments from people.
“The computer’s answers correlated well to the results of a survey of human judgment,” they concluded.
Justin plans to attend Harvard next year to study physics and computer science, while Ari is Stanford-bound with the intent of pursuing a career in international relations. Eli will be back at La Cueva next year for his junior year.
The team beat out the 2011 champion, Cole Kendrick, a freshman at Los Alamos High School, who took second place this year for his computer simulation of Saturn’s ring structure.
La Cueva’s Alexandra Porter placed third for her project “Simulation of Approximate Computing to Numerical Methods.”