Finding the needles in 'big data' haystacks - Albuquerque Journal

The SmartTensors tool developed by Los Alamos National Laboratory can sort through massive data streams to find the key features making the data understandable. (Courtesy of Los Alamos National Laboratory)

Copyright © 2021 Albuquerque Journal

A seemingly bottomless ocean of “big data” has flooded our world. Bits and bytes are pouring in from sources ranging from satellites and MRI scans to massive computer simulations and seismic-sensor networks, from security cameras to smartphones, from genome sequencing of SARS-CoV-2 to COVID-19 test results, from social networks to texts zipping from phone to phone.

Making sense of this ever-increasing racket is vital to national security, economic stability, individual health and practically every branch of science – and the job is getting easier, thanks to the SmartTensors artificial intelligence tool we have developed at Los Alamos National Laboratory.

Without any human guidance, this technology sifts through terabytes – millions of millions of bytes – of diverse data to find the hidden patterns and features that make the data understandable, revealing its underlying processes or causes. SmartTensors also can identify just how many features are needed to make sense of enormous, multidimensional datasets.

In analyzing data, finding that optimal number reduces a massive set of data to a scale that’s manageable for computers to process and subject matter experts to analyze. The features that SmartTensors extracts are explainable and understandable chunks of data.

What makes a face?

Take, for example, facial recognition algorithms, which rely on large datasets. A face combines many features – noses, eyes, eyebrows, ears, mouths, cheeks, foreheads, jawlines, hairlines and chins – some of which matter more for recognition than others. Pointed at a large number of photos of faces, SmartTensors can isolate the features that are important for telling faces apart. It also can determine how many of those features – the optimal number – are required to do the job accurately and reliably. For instance, maybe only specific shapes of eyes, noses and mouths are needed for facial recognition. It might also turn out to be essential to group together all the faces that have oval eyes and slim noses.

In other datasets, the features needed to represent the whole might not be that obvious. Very large sets of data – measured in petabytes, or billions of millions of bytes – typically are made up of unknown features obscured by a torrent of less useful information and noise.

Vast datasets, such as COVID-19 test results or information from earthquake sensors, consist exclusively of things we can observe directly. But in big-data analytics, it is difficult to link these observables directly to the underlying processes that generate the data and control its behavior. These processes, or hidden features, are not directly observable and are confusingly mixed with each other, with unimportant features, and with noise.

Cocktail party problem

The problem is similar to extracting the individual voices at a noisy cocktail party with a set of microphones recording the chatter. How do you isolate one or more conversations while individuals are moving around and talking? The number of hidden features here is the number of individual voices and their characteristics, which might include the pitch and tone of each person’s voice, for instance. Once that’s determined, it’s easier to follow a conversational thread or a person.
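For readers who want to see the idea in miniature, here is a rough sketch of the cocktail party problem in plain Python. It is not the SmartTensors code. It assumes that each microphone records a nonnegative blend of the voices, and it uses a standard textbook technique, nonnegative matrix factorization with multiplicative updates, as a stand-in. The two toy "voices", the mixing weights, the `choose_rank` helper and its 5 percent tolerance are all made up for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(x, w, h):
    """Size of the mismatch between the data X and the product W H."""
    wh = matmul(w, h)
    return sum((p - q) ** 2 for xr, ar in zip(x, wh) for p, q in zip(xr, ar)) ** 0.5

def nmf(x, k, iters=500, seed=0, eps=1e-9):
    """Factor nonnegative X (m x n) into W (m x k) and H (k x n)
    with multiplicative updates; k is the assumed number of hidden features."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element by element
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (X H^T) / (W H H^T), element by element
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return w, h

def choose_rank(errors, data_norm, tol=0.05):
    """Hypothetical rule: smallest k whose relative error is under tol,
    falling back to the largest k tried."""
    for k in sorted(errors):
        if errors[k] / data_norm < tol:
            return k
    return max(errors)

# Two made-up voices (loudness over 8 time steps) heard by 4 microphones.
voices = [[3.0, 2.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 3.0]]
mixing = [[1.0, 0.2], [0.7, 0.5], [0.3, 0.9], [0.1, 1.0]]
recordings = matmul(mixing, voices)  # what the microphones actually capture

data_norm = sum(v * v for row in recordings for v in row) ** 0.5
errors = {k: frobenius_error(recordings, *nmf(recordings, k)) for k in (1, 2, 3)}
print(errors, choose_rank(errors, data_norm))
```

The mismatch should drop sharply once the guess reaches the true number of voices and barely improve after that, which is the signal used here to pick the number of features. The recovered voices come back in arbitrary order and scale, and a production tool would use a more robust selection criterion than this simple threshold.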

Similarly, to sort out the important information in a dataset, SmartTensors organizes the information into a data cube, or tensor, that’s made of three or more dimensions. Each dimension is a particular category of information within that data. So, in the cocktail party example, the pitch of a voice might be one dimension, its tonal qualities another, its volume a third, and so on. If you think of the data cube as being made up of many small, stacked cubes, each one represents information about some or all of the features of the data. The representation of the data in the form of a tensor allows fast processing as the AI churns through all the data.
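To make the data cube concrete, the short Python sketch below arranges a handful of measurements into a three-dimensional tensor. The axis choices (microphone, time window, frequency band), the sample values and the function names are assumptions made for illustration, not the layout SmartTensors uses.

```python
def build_tensor(records, mics, windows, bands):
    """Arrange (mic, window, band, value) records into a 3-D nested list.
    Cells with no measurement are left at 0.0."""
    mic_idx = {name: i for i, name in enumerate(mics)}
    win_idx = {w: j for j, w in enumerate(windows)}
    band_idx = {b: k for k, b in enumerate(bands)}
    tensor = [[[0.0 for _ in bands] for _ in windows] for _ in mics]
    for mic, window, band, value in records:
        tensor[mic_idx[mic]][win_idx[window]][band_idx[band]] = value
    return tensor

def shape(tensor):
    """Length of each of the three axes."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))

def unfold_first_axis(tensor):
    """Flatten each first-axis slice into one row, giving an ordinary
    matrix with one row per microphone."""
    return [[v for row in slab for v in row] for slab in tensor]

# Hypothetical loudness readings: (microphone, time window, frequency band, value)
mics = ["mic_A", "mic_B"]
windows = [0, 1, 2]
bands = ["low", "mid", "high"]
records = [
    ("mic_A", 0, "low", 0.8),
    ("mic_A", 1, "high", 0.3),
    ("mic_B", 0, "low", 0.5),
    ("mic_B", 2, "mid", 0.9),
]

cube = build_tensor(records, mics, windows, bands)
print(shape(cube))                  # axis lengths: microphones, windows, bands
print(unfold_first_axis(cube)[0])   # everything mic_A heard, as one flat row
```

Each innermost number is one of the small stacked cubes described above. Flattening one axis at a time, as `unfold_first_axis` does, is a common way to hand a tensor to matrix-based methods.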

As you might expect, we’ve applied SmartTensors to more important problems than separating individual conversations at a cocktail party. SmartTensors is helping us understand climate processes, watershed mechanisms, hidden geothermal resources, carbon sequestration processes, chemical reactions, protein structures, pharmaceutical molecules, cancerous mutations in human genomes, and more. In a world swimming in big data, this kind of tool just might help us all keep our heads above water.

Boian Alexandrov is an AI expert and principal investigator on the SmartTensors project in the Physics and Chemistry of Materials group at Los Alamos National Laboratory. Velimir “Monty” Vesselinov is an expert in machine learning, data analytics and model diagnostics in the Computational Earth Science group at Los Alamos, and also a principal investigator on the project. SmartTensors was funded by the Laboratory Directed Research and Development (LDRD) program at Los Alamos. For more information, visit the SmartTensors website.
