Finding the needles in 'big data' haystacks - Albuquerque Journal

Finding the needles in ‘big data’ haystacks

The SmartTensors tool developed by Los Alamos National Laboratory can sort through massive data streams to find the key features making the data understandable. (Courtesy of Los Alamos National Laboratory)

Copyright © 2021 Albuquerque Journal

A seemingly bottomless ocean of “big data” has flooded our world. Bits and bytes are pouring in from sources ranging from satellites and MRI scans to massive computer simulations and seismic-sensor networks, from security cameras to smartphones, from genome sequencing of SARS-CoV-2 to COVID-19 test results, from social networks to texts zipping from phone to phone.

Making sense of this ever-increasing racket is vital to national security, economic stability, individual health and practically every branch of science – and the job is getting easier, thanks to the SmartTensors artificial intelligence tool we have developed at Los Alamos National Laboratory.

Without any human guidance, this technology sifts through millions of millions of bytes (terabytes) of diverse data to find the hidden patterns and features that make the data understandable, revealing its underlying processes or causes. SmartTensors also can identify just how many features are needed to make sense of enormous, multidimensional datasets.

In analyzing data, finding that optimal number reduces a massive set of data to a scale that’s manageable for computers to process and subject matter experts to analyze. The features that SmartTensors extracts are explainable and understandable chunks of data.

What makes a face?

Take, for example, facial recognition algorithms, which rely on large datasets. A face combines many features, some key to recognition and others that matter less – noses, eyes, eyebrows, ears, mouths, cheeks, foreheads, jawlines, hairlines and chins. Pointed at a large number of photos of faces, SmartTensors can isolate the features that matter most for telling faces apart. It also can determine how many of those features – the optimal number – are required to do the job accurately and reliably. For instance, maybe only specific shapes of eyes, noses and mouths are needed for facial recognition. It might also be essential to group together all the faces that have oval eyes and slim noses.

In other datasets, the features needed to represent the whole might not be that obvious. Very large sets of data – measured in billions of millions of bytes, or petabytes – typically are made up of unknown features obscured by a torrent of less useful information and noise.

Vast datasets, such as COVID-19 test results or information from earthquake sensors, consist exclusively of things we can observe directly. But in big-data analytics, it is difficult to link these observables directly to the underlying processes that control the behavior and generate the data. These processes, or hidden features, are not directly observable and are confusingly mixed with each other, with unimportant features, and with noise.

Cocktail party problem

The problem is similar to extracting the individual voices at a noisy cocktail party with a set of microphones recording the chatter. How do you isolate one or more conversations while individuals are moving around and talking? The number of hidden features here is the number of individual voices and their characteristics, which might include the pitch and tone of each person’s voice, for instance. Once that’s determined, it’s easier to follow a conversational thread or a person.
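The cocktail-party idea can be tried at toy scale. The sketch below is an illustration only: it uses non-negative matrix factorization with multiplicative updates, a standard textbook technique swapped in here because the article does not describe SmartTensors' own algorithm. The two voices, the three microphones, the mixing weights and the helper names (`matmul`, `sq_error`, `nmf`) are all hypothetical. The code mixes the voices into microphone recordings, then tries to rebuild the recordings from one, two and three hidden features and records the leftover error for each guess.

```python
import random

# Hypothetical loudness of two voices over eight time steps (non-negative).
voices = [
    [4.0, 3.0, 4.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 2.0, 5.0, 4.0, 3.0, 5.0],
]
# Assumed mixing weights: how strongly each of three microphones hears each voice.
mixing = [[1.0, 0.2], [0.5, 0.5], [0.1, 1.0]]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*a)]


def sq_error(x, w, h):
    """Squared reconstruction error between x and the product w*h."""
    approx = matmul(w, h)
    return sum((p - q) ** 2 for rx, ra in zip(x, approx) for p, q in zip(rx, ra))


def nmf(x, k, iters=500, seed=0, eps=1e-9):
    """Factor x into non-negative w (rows x k) and h (k x cols)."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in x]
    h = [[rng.random() + 0.1 for _ in x[0]] for _ in range(k)]
    start = sq_error(x, w, h)
    for _ in range(iters):
        # Update the hidden features h while holding the weights w fixed.
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
             for i in range(k)]
        # Update the weights w while holding the hidden features h fixed.
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(len(w))]
    return w, h, start, sq_error(x, w, h)


recordings = matmul(mixing, voices)  # what the microphones actually capture

# Try one, two and three hidden features and keep the final error of each fit.
errors = {}
for k in (1, 2, 3):
    _, _, _, errors[k] = nmf(recordings, k)
```

Because the toy recordings were built from exactly two voices, one would expect the error to fall sharply between one and two features and to change little with a third. Reading the number of features off that kind of leveling-off is a simple heuristic, not a description of how SmartTensors makes the choice.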

Similarly, to sort out the important information in a dataset, SmartTensors organizes the information into a data cube, or tensor, that has three or more dimensions. Each dimension is a particular category of information within that data. So, in the cocktail party example, the pitch of a voice might be one dimension, its tonal qualities another, its volume a third, and so on. If you think of the data cube as being made up of many small, stacked cubes, each small cube holds information about some or all of the features of the data. Representing the data as a tensor allows fast processing as the AI churns through all the data.
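The data-cube picture can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs, not SmartTensors code: the voice samples, the bin edges and the names `bin_index` and `build_tensor` are hypothetical. Each sample has a pitch, a tonal quality and a volume, and each small cube in the tensor counts how many samples fall into one combination of those three categories.

```python
from bisect import bisect_right

# Hypothetical voice samples: (pitch in Hz, tonal brightness 0-1, volume in dB).
samples = [
    (110.0, 0.20, 55.0),
    (115.0, 0.25, 58.0),
    (220.0, 0.70, 62.0),
    (225.0, 0.75, 70.0),
    (118.0, 0.22, 57.0),
]

# Assumed bin edges that split each dimension into categories.
pitch_edges = [165.0]        # low / high pitch
tone_edges = [0.5]           # dark / bright tone
volume_edges = [60.0, 68.0]  # quiet / medium / loud


def bin_index(value, edges):
    """Return the index of the bin that `value` falls into."""
    return bisect_right(edges, value)


def build_tensor(rows, edges_per_dim):
    """Count samples per cell of a 3-D tensor stored as nested lists."""
    n0, n1, n2 = (len(e) + 1 for e in edges_per_dim)
    tensor = [[[0] * n2 for _ in range(n1)] for _ in range(n0)]
    for row in rows:
        i, j, k = (bin_index(v, e) for v, e in zip(row, edges_per_dim))
        tensor[i][j][k] += 1
    return tensor


tensor = build_tensor(samples, [pitch_edges, tone_edges, volume_edges])
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
```

Here `tensor[i][j][k]` is one of the small stacked cubes: the number of samples with pitch category `i`, tone category `j` and volume category `k`. Real datasets would have far more categories per dimension, and often more than three dimensions.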

As you might expect, we’ve applied SmartTensors to more important problems than separating individual conversations at a cocktail party. SmartTensors is helping us understand climate processes, watershed mechanisms, hidden geothermal resources, carbon sequestration processes, chemical reactions, protein structures, pharmaceutical molecules, cancerous mutations in human genomes, and more. In a world swimming in big data, this kind of tool just might help us all keep our heads above water.

Boian Alexandrov is an AI expert and principal investigator on the SmartTensors project in the Physics and Chemistry of Materials group at Los Alamos National Laboratory. Velimir “Monty” Vesselinov is an expert in machine learning, data analytics and model diagnostics in the Computational Earth Science group at Los Alamos, and also a principal investigator on the project. SmartTensors was funded by the Laboratory Directed Research and Development (LDRD) program at Los Alamos. For more information, visit the SmartTensors website.
