Finding the needles in 'big data' haystacks - Albuquerque Journal

Finding the needles in ‘big data’ haystacks

The SmartTensors tool developed by Los Alamos National Laboratory can sort through massive data streams to find the key features that make the data understandable. (Courtesy of Los Alamos National Laboratory)

Copyright © 2021 Albuquerque Journal

A seemingly bottomless ocean of “big data” has flooded our world. Bits and bytes are pouring in from sources ranging from satellites and MRI scans to massive computer simulations and seismic-sensor networks, from security cameras to smartphones, from genome sequencing of SARS-CoV-2 to COVID-19 test results, from social networks to texts zipping from phone to phone.

Making sense of this ever-increasing racket is vital to national security, economic stability, individual health and practically every branch of science – and the job is getting easier, thanks to the SmartTensors artificial intelligence tool we have developed at Los Alamos National Laboratory.

Without any human guidance, this technology sifts through trillions of bytes of diverse data to find the hidden patterns and features that make the data understandable, revealing its underlying processes or causes. SmartTensors can also identify just how many features are needed to make sense of enormous, multidimensional datasets.

In analyzing data, finding that optimal number reduces a massive set of data to a scale that’s manageable for computers to process and subject matter experts to analyze. The features that SmartTensors extracts are explainable and understandable chunks of data.
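To make the idea of an "optimal number" of features concrete, here is a minimal sketch on made-up data. It is not the SmartTensors algorithm: it swaps in a much simpler stand-in, counting how many singular values are needed to explain almost all of a dataset. The synthetic data, the variable names and the 99.9% cutoff are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 3 hidden features blended into 50 observed channels.
n_hidden, n_channels, n_samples = 3, 50, 200
features = rng.random((n_hidden, n_samples))   # the hidden signals
mixing = rng.random((n_channels, n_hidden))    # how each channel blends them
noise = 0.01 * rng.standard_normal((n_channels, n_samples))
data = mixing @ features + noise

# Singular values measure how much of the data each extra feature explains.
singular_values = np.linalg.svd(data, compute_uv=False)
energy = np.cumsum(singular_values**2) / np.sum(singular_values**2)

# Smallest number of features explaining 99.9% of the data (assumed cutoff).
estimated_k = int(np.searchsorted(energy, 0.999) + 1)
print("estimated number of features:", estimated_k)
```

The 50 observed channels collapse to a handful of features, which is the kind of reduction that makes a large dataset manageable for both computers and experts.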

What makes a face?

Take, for example, facial recognition algorithms, which rely on large datasets. A face combines key features with features that matter less: noses, eyes, eyebrows, ears, mouths, cheeks, foreheads, jawlines, hairlines and chins. Pointed at a large number of photos of faces, SmartTensors can isolate the features that are important for recognizing faces. It also can determine how many of those features – the optimal number – are required to do the job accurately and reliably. For instance, maybe only specific shapes of eyes, noses and mouths are needed for facial recognition. It might also be essential to group together all the faces that share a combination of features, such as oval eyes and slim noses.

In other datasets, the features needed to represent the whole might not be that obvious. Very large sets of data – measured in quadrillions of bytes – typically are made up of unknown features obscured by a torrent of less useful information and noise.

Vast datasets, such as COVID-19 test results or information from earthquake sensors, consist exclusively of things we can observe directly. But in big-data analytics, it is difficult to directly link these observables to the underlying processes that control the behavior and generate the data. These processes or hidden features are not directly observable and are confusingly mixed with each other, with unimportant features, and with noise.

Cocktail party problem

The problem is similar to extracting the individual voices at a noisy cocktail party with a set of microphones recording the chatter. How do you isolate one or more conversations while individuals are moving around and talking? The number of hidden features here is the number of individual voices and their characteristics, which might include the pitch and tone of each person’s voice, for instance. Once that’s determined, it’s easier to follow a conversational thread or a person.
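The cocktail party setup can be sketched in a few lines. The example below is an illustration under stated assumptions, not the SmartTensors method: it uses plain non-negative matrix factorization with the standard multiplicative update rule, and the two "voices", the five "microphones" and every variable name are made up.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "voices": two non-negative signals over 300 time steps.
t = np.linspace(0, 6 * np.pi, 300)
voices = np.vstack([np.abs(np.sin(t)), (t % 2.0) / 2.0])   # shape (2, 300)

# Each of 5 microphones records a different non-negative blend of the voices.
blend = rng.random((5, 2))
recordings = blend @ voices                                 # shape (5, 300)

def relative_error(V, W, H):
    """Size of the mismatch between V and W @ H, relative to V."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# Random non-negative starting guess for k hidden features.
k = 2
W = rng.random((5, k)) + 0.1      # estimated blend per microphone
H = rng.random((k, 300)) + 0.1    # estimated hidden signals
initial_error = relative_error(recordings, W, H)

# Multiplicative updates keep W and H non-negative while shrinking the mismatch.
eps = 1e-12
for _ in range(500):
    H *= (W.T @ recordings) / (W.T @ W @ H + eps)
    W *= (recordings @ H.T) / (W @ H @ H.T + eps)

final_error = relative_error(recordings, W, H)
print(f"relative error: {initial_error:.3f} -> {final_error:.3f}")
```

The rows of `H` are the estimated hidden signals and `W` says how strongly each microphone hears them. A factorization like this is not guaranteed to return the original voices exactly, since several different blends can explain the same recordings.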

Similarly, to sort out the important information in a dataset, SmartTensors organizes the information into a data cube, or tensor, that has three or more dimensions. Each dimension is a particular category of information within that data. So, in the cocktail party example, the pitch of a voice might be one dimension, its tonal qualities another, its volume a third, and so on. If you think of the data cube as being made up of many small, stacked cubes, each one represents information about some or all of the features of the data. Representing the data as a tensor allows fast processing as the AI churns through all of it.
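For readers who want to picture the data cube, here is a small sketch using a three-dimensional NumPy array. The axes (microphones, pitch bands and time windows) and the random values are assumptions for illustration, not the layout SmartTensors uses.

```python
import numpy as np

# Hypothetical axes: 4 microphones x 6 pitch bands x 10 time windows.
n_mics, n_bands, n_windows = 4, 6, 10
rng = np.random.default_rng(1)
cube = rng.random((n_mics, n_bands, n_windows))

# One small "stacked cube": the value at microphone 2, pitch band 0, window 5.
cell = cube[2, 0, 5]

# A slice fixes one dimension and keeps the other two.
one_microphone = cube[2]                               # shape (6, 10)

# Unfolding flattens the cube into a matrix so matrix tools can process it.
unfolded = cube.reshape(n_mics, n_bands * n_windows)   # shape (4, 60)

print(cube.shape, one_microphone.shape, unfolded.shape)
```

Each entry of `cube` plays the role of one small stacked cube, and slicing or unfolding rearranges the same numbers without changing them.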

As you might expect, we’ve applied SmartTensors to more important problems than separating individual conversations at a cocktail party. SmartTensors is helping us understand climate processes, watershed mechanisms, hidden geothermal resources, carbon sequestration processes, chemical reactions, protein structures, pharmaceutical molecules, cancerous mutations in human genomes, and more. In a world swimming in big data, this kind of tool just might help us all keep our heads above water.

Boian Alexandrov is an AI expert and principal investigator on the SmartTensors project in the Physics and Chemistry of Materials group at Los Alamos National Laboratory. Velimir “Monty” Vesselinov is an expert in machine learning, data analytics and model diagnostics in the Computational Earth Science group at Los Alamos, and also a principal investigator on the project. SmartTensors was funded by the Laboratory Directed Research and Development (LDRD) program at Los Alamos. For more information, visit the SmartTensors website.


