Using 1 trillion files helps scientist find a needle in a haystack - Albuquerque Journal

A Los Alamos National Laboratory scientist having trouble solving a stubborn research problem needed some help – his scientific simulations had generated a sea of data, but it took so long to search the data that he couldn’t find the information he needed. He found himself looking for the proverbial needle in a haystack. At the same time, the lab’s storage research team had been hard at work on another classic big data problem: creating massive numbers of files as quickly as possible. The day the team met with the scientist, you could say that Big Science and Big Data put their heads together – and now they’re making history.

A modern laptop typically holds just under a million files across all of its folders, and the Trinity supercomputer can create a million files in about 5 seconds. But in extreme-scale simulation, researchers often deal with quantities far beyond a million. In this physicist’s simulation, he needed to generate trillions of particles – on the order of a million times a million – and then look at the trajectory of only a few of them. Imagine you’re standing in the Sahara looking at trillions of grains of sand around your feet. Your challenge is to locate just one of them, then track its every movement as a dust devil whips through.

If the lab scientist tried to create a file for each of those trillion particles using Trinity, it would take 57 days just to create the files in that folder – and the supercomputer wouldn’t be doing anything else during that time. Trinity is too important to the lab’s stockpile stewardship mission to simply perform this one task for 57 days. A typical day in the life of Trinity supports multiple scientists, each pursuing important research projects in materials science, plasma physics, fluid dynamics – you name it. There had to be a better solution.
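The 57-day figure follows from the numbers quoted above: a million files in about 5 seconds, scaled up to a trillion files. A minimal sketch of that arithmetic, assuming the creation rate stays constant at any scale (a simplification; the function and constant names here are illustrative, not from the lab):

```python
# Back-of-the-envelope estimate using only figures quoted in the article.
BASELINE_FILES = 1_000_000           # files Trinity creates...
BASELINE_SECONDS = 5                 # ...in about this many seconds
TARGET_FILES = 1_000_000_000_000     # one file per particle, a trillion in all
SECONDS_PER_DAY = 24 * 60 * 60


def creation_days(target_files, baseline_files, baseline_seconds):
    """Days needed to create target_files at a constant baseline rate."""
    files_per_second = baseline_files / baseline_seconds
    return target_files / files_per_second / SECONDS_PER_DAY


days = creation_days(TARGET_FILES, BASELINE_FILES, BASELINE_SECONDS)
print(f"{days:.1f} days")
```

At 200,000 files a second, a trillion files takes 5 million seconds, or just under 58 days, which is consistent with the article's figure of 57 days.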

The Ultrascale Systems Research Center in the lab’s High Performance Computing Division is tasked with realizing the next generation of supercomputing. With efforts in storage research, novel computer architectures and extreme scale platform management, the center is uniquely positioned to tackle these seemingly impossible computing challenges. In particular, a collaboration between the center and Carnegie Mellon University had developed an experimental file system designed to support unprecedented numbers of files and folders. It wasn’t obvious it would work, but it seemed like a chance worth taking.

In February and March this year, the scientist began using the experimental file system to track particles on Trinity. It had been a long journey with many obstacles to overcome, but success was finally in sight. Still, it was not until May 2018 that Trinity churned out a trillion files in about two minutes for the first time. That staggering rate translates to about 7 billion files a second, approximately 20,000 times faster than running on Trinity without the new file system. Days later, the pace had jumped to two trillion files in two and a half minutes. The team never set out to create a trillion files. They simply wanted to improve data management for scientists. But when they looked down and saw the trillion files, they felt a brief moment of satisfaction: High-performance computing at Los Alamos continues to lead the way on extreme scale science.
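The rates can be roughly checked from the durations given. The sketch below assumes exactly 2 minutes and 2.5 minutes for the two runs and uses the earlier figure of a million files in 5 seconds as the baseline. Because the article gives only rounded times ("about two minutes"), the results match its quoted rate and speedup in order of magnitude only, not exactly:

```python
def files_per_second(files, minutes):
    """Average creation rate for a run of the given length."""
    return files / (minutes * 60)


# Durations are the article's rounded values, taken here as exact (an assumption).
first_run = files_per_second(1e12, 2.0)     # a trillion files in about two minutes
second_run = files_per_second(2e12, 2.5)    # two trillion in two and a half minutes

# Baseline without the new file system: a million files in about 5 seconds.
baseline = 1_000_000 / 5

print(f"first run:  {first_run / 1e9:.1f} billion files/s")
print(f"second run: {second_run / 1e9:.1f} billion files/s")
print(f"speedup of first run over baseline: {first_run / baseline:,.0f}x")
```

Under these assumptions both runs come out at billions of files a second, and the speedup is in the tens of thousands.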

In the high-performance computing universe, speed and efficiency in handling mind-boggling amounts of information are everything. Supercomputers enable previously impossible science, turning lifetimes of data-gathering into minutes. With new tools, scientists can manage ever-growing data streams faster and more efficiently than ever before. And the future of research depends on it.

Next challenge? So-called exascale computing. Running 50 times faster than today’s fastest supercomputers, exascale machines will help scientists simulate complex natural and engineered systems that range from the atomic to the cosmic. That research will include grand challenges in biology, astrophysics, materials and earth systems. Projects like the trillion-file effort are steps toward that exascale goal, and it’s almost here. The U.S. Department of Energy Exascale Project plans to have the next superfast generation of computers running by 2021.

Stay tuned.

Bradley Wade Settlemyer is a systems programmer and leads the storage systems research efforts at Los Alamos National Laboratory’s Ultrascale Systems Research Center. The team that enabled the trillion-file milestone includes Settlemyer and Gary Grider from Los Alamos, and collaborators from Carnegie Mellon University. This article was provided by LANL.

