Copyright © 2020 Albuquerque Journal
For scientists, doing research on different topics related to COVID-19 can be like looking for a needle in a haystack.
There are thousands of peer reviews, scientific research papers and pieces of literature to sort through, a process that could take days to complete.
But a software program developed by Sandia National Laboratories may help scientists narrow that search to just a matter of minutes.
“We take algorithms that are more advanced that we use on various types of data and get that into a format that people who are not data scientists can make use of,” Sandia computer scientist Travis Bauer told the Journal. He said the work was done without using specialized software. “… Now, we can try to rapidly adapt that to their needs and created some software … that would support some of their questions.”
The project that created the software took about a week to complete, Bauer said. Thousands of documents released by the White House were used and put into the hands of subject matter experts. The software differs from a search engine such as Google in that the peer reviews and studies are not available on the internet. Sandia’s program runs in Python, a general-purpose programming language, and draws on peer-reviewed published scientific studies conducted in the U.S. and abroad. The program groups results graphically based on how similar the documents are to one another, and then colors them based on their similarity to the snippet of text or other information the user provides.
Bauer worked with a team of data scientists, engineers, a human-factors expert, and experts in virology, genetics, public health, biosecurity and biodefense. Algorithms and data compression techniques were used to compare and analyze the documents.
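The article does not describe Sandia's algorithms in detail. As a rough illustration of how compression can be used to compare documents, the sketch below uses a well-known general technique, normalized compression distance, built on Python's standard zlib module. The function names and the idea of ranking documents against a snippet are assumptions made for this example, not Sandia's actual code.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes the text occupies after zlib compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two texts.

    Texts that share a lot of wording compress well together, so the
    distance is small. Unrelated texts score close to 1.
    """
    size_a = compressed_size(a)
    size_b = compressed_size(b)
    size_ab = compressed_size(a + " " + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


def rank_by_similarity(snippet: str, documents: list[str]) -> list[str]:
    """Hypothetical helper: order documents from most to least similar."""
    return sorted(documents, key=lambda doc: ncd(snippet, doc))


if __name__ == "__main__":
    snippet = "spike protein binding to the ACE2 receptor"
    docs = [
        "Municipal budgets for road maintenance grew modestly during the last fiscal quarter.",
        "Structural studies show spike protein binding to the ACE2 receptor on human airway cells.",
    ]
    for doc in rank_by_similarity(snippet, docs):
        print(round(ncd(snippet, doc), 2), doc)
```

On very short texts the scores are noisy, so a sketch like this is more useful on full abstracts or papers than on single sentences.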
“It’s a lot like if someone read all of the documents and put them all over the floor in different piles,” he said. “Documents that were on similar topics were closer together. Those that were not were further apart. So, all of the documents are sort of laid out in little piles in a space that the users can view.”
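The "piles" Bauer describes can be pictured with a toy grouping routine. The sketch below is only an assumption about how such grouping might work in principle: it measures word overlap between documents (Jaccard similarity) and drops each document onto the first pile whose top document is similar enough. The function names and the 0.3 threshold are hypothetical and chosen for the example.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased set of the words and numbers that appear in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words that two texts have in common (0 to 1)."""
    words_a, words_b = words(a), words(b)
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def group_into_piles(documents: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedily place each document on the first pile it resembles."""
    piles: list[list[str]] = []
    for doc in documents:
        for pile in piles:
            # Compare against the first document placed on the pile.
            if jaccard(doc, pile[0]) >= threshold:
                pile.append(doc)
                break
        else:
            # No existing pile was similar enough, so start a new one.
            piles.append([doc])
    return piles


if __name__ == "__main__":
    docs = [
        "spike protein binds ACE2 receptor",
        "ACE2 receptor binds spike protein tightly",
        "hospital ventilator supply shortage",
    ]
    for number, pile in enumerate(group_into_piles(docs), start=1):
        print(f"Pile {number}: {pile}")
```

A real tool would also need to place the piles in a viewable space, which this sketch leaves out.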
During the effort, scientists were able to whittle down more than 29,000 published coronavirus studies to just 87 in 10 minutes using identifying language and character similarities.
The software is downloadable from the internet, Bauer and Sandia spokesman Luke Frank said. But it does require knowledge of the Python programming language.