  • PAYING TOO MUCH FOR A BAD MACHINE
    (04-18-04 guest commentary)

    Journal North
     
    Paying Too Much for a Bad Machine

    By Chris Mechels
    Guest Commentary
        In a March 15 column in the Journal North, John Morrison, leader of Los Alamos National Laboratory's computer division, argued that a recent story raising concerns about the performance of the ASCI Q supercomputer was off the mark, and that Q was really quite successful in many ways. He claimed that "The Q system is delivering everything it's been asked to do," and that outside experts "endorsed Q's accuracy, performance and reliability." Unfortunately, Mr. Morrison relied on argument and hyperbole, not facts; he provided no data and cited no documents that would allow us to check his claims. I have recent information that challenges many of them.
        First, let's examine Morrison's claim that Q's hardware problems are "routine in every experimental, high-performance computer system." A commonly used figure of merit for hardware reliability in large supercomputers such as the 20-teraflop Q is 50 hours mean time to failure. (One teraflop equals one trillion floating-point operations per second.) In fact, the upcoming Sandia 40-teraflop "Red Storm" computer, scheduled for 2004 delivery, has such a requirement. So does the LANL contract for Q, a copy of which I have. The 2003 figure for Q is nowhere near 50 hours mean time to failure; it is 3.82 hours. This information is from LANL's own records, obtained through a public records request. This incredibly bad reliability is driven by processor failures, which average 107 per month. To put this in perspective, the LANL Blue Mountain machine, Q's predecessor, has more processors than Q but averaged about one processor failure every other month. Morrison claims that this extreme failure rate is due to the large number of Q processors and "Los Alamos' high altitude." This is simply not true. The Pittsburgh Supercomputing Center machine uses the same processor as Q and has a mean time to failure of 11 hours, the same order of reliability problem as Q. Pittsburgh is not at a high altitude.
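To put the reliability figures quoted above side by side, here is a small Python sketch. It uses only numbers from this column: the 50-hour requirement, Q's 3.82 hours, Pittsburgh's 11 hours, and the processor failure counts. The 730-hour average month and the helper name `failures_per_month` are assumptions introduced for illustration, not anything taken from LANL's records.

```python
# Back-of-the-envelope comparison of the reliability figures cited in the column.
# Assumption (not from the column): an average month is 8760 / 12 = 730 hours.
HOURS_PER_MONTH = 8760 / 12


def failures_per_month(mttf_hours: float) -> float:
    """Hypothetical helper: expected failures in an average month for a given MTTF."""
    return HOURS_PER_MONTH / mttf_hours


# Mean time to failure, in hours, as quoted in the column.
MTTF_HOURS = {
    "contract requirement": 50.0,
    "ASCI Q (2003)": 3.82,
    "Pittsburgh machine": 11.0,
}

# How far Q falls short of the 50-hour requirement.
q_shortfall = MTTF_HOURS["contract requirement"] / MTTF_HOURS["ASCI Q (2003)"]

# Processor failures per month: Q versus Blue Mountain (about one every other month).
q_cpu_failures = 107.0
blue_mountain_cpu_failures = 0.5
cpu_failure_ratio = q_cpu_failures / blue_mountain_cpu_failures

if __name__ == "__main__":
    for name, hours in MTTF_HOURS.items():
        per_month = failures_per_month(hours)
        print(f"{name}: MTTF {hours} h, about {per_month:.0f} failures per month")
    print(f"Q misses the requirement by a factor of about {q_shortfall:.1f}")
    print(f"Q processor failures vs. Blue Mountain: about {cpu_failure_ratio:.0f}x")
```

Under these assumptions, Q falls short of the 50-hour requirement by roughly a factor of 13, and its monthly processor failure count is a couple of hundred times Blue Mountain's.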
       
    Compensation costs
        The failures are due to a Compaq design decision, which Morrison mentions. LANL knew very early on that the design would make Q unreliable, but elected to proceed with the procurement. While one can compensate for this bad design, the compensation itself costs performance and is not foolproof. The result is that computing tasks are lost to failures, and some large tasks become impractical because of the unreliability. For more on Q's problems, see http://www.cs.sandia.gov/SOS7/presentations/morrison.ppt, a presentation by Morrison himself. It contains a frank discussion of Q's problems, quite unlike Morrison's comments in the Journal.
        Let's look at the claim that "as the home of the first Cray machines," LANL has experience in dealing with the problems of "essentially unique machines." I was the Cray employee most responsible for the first Cray at LANL in 1976, before I joined LANL. I observed that for the first four years the machine was essentially useless for its intended purposes because of poor or unavailable software. It was sort of like trying to use your PC without Windows. It was not until LANL's sister lab, Lawrence Livermore, got its first Cray in 1980 that the LANL machine became useful, because we brought in software from Livermore. So the LANL expertise, which Morrison lauded, did not measure up. Livermore saved the day...
       
    Not a unique machine
        As to Morrison's claim that problems are "inevitable in machines built to run applications of unprecedented size and complexity," this does not apply to Q. First, Q is not unique. Two previous installations (at facilities here and in France) exposed the problems of this machine before LANL's was installed. There was time to change horses, but LANL chose not to. Pacific Northwest National Laboratory took delivery of a supercomputer from the same vendor, Hewlett-Packard, but with a different processor type, and it seems to work quite well. The Livermore supercomputers, from IBM, and the Sandia "Red" supercomputer, from Intel, do not have the reliability problems exhibited by Q.
        The JASON report, which Morrison claims gave Q glowing marks, did no such thing. Go look for yourself at http://www.fas.org/irp/agency/dod/jason/asci.pdf. In fact, one of the JASON members, in an interview, thought it remarkable that LANL was getting useful work from Q "given its poor reliability." Section 5.5.3 of the report criticizes another serious LANL shortcoming, software development: "There was a striking difference between the high quality of software engineering at Sandia as compared to Los Alamos National Laboratory" and "at Los Alamos in particular, better ways ... must be found." Los Alamos has long resisted modern software development practices, such as software engineering and software configuration management.
        It has always been risky to accept LANL claims about supercomputing successes, and this continues with Q. For four decades, LANL has made claims that cannot be supported by data. The U.S. taxpayers did not get what they paid for in Q; quite the contrary. LANL paid $168 million for a very unreliable machine, delivered over one year late... Almost immediately after completing the Q acquisition, in the summer of 2003, LANL bought another supercomputer, an 11-teraflop machine, for $10 million. That machine is much more reliable and usable than Q. At that price per teraflop, a machine of Q's 20-teraflop capability would cost roughly $18 million. I conclude that LANL spent $150 million more than necessary for a machine of Q's capability, and got a very unreliable machine in the bargain. We can hope that LANL, and Morrison, learned from the experience, but their history offers little encouragement. We are left to wonder how much money they will waste on the next supercomputer.
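The $150 million figure can be checked with simple cost-per-teraflop arithmetic. The Python sketch below uses only the prices and machine sizes quoted in this column. Pricing a 20-teraflop machine by scaling the smaller machine's cost linearly is an assumption made here for illustration; the column does not spell out the method.

```python
# Cost-per-teraflop arithmetic using only figures quoted in the column.
# Assumption (for illustration): cost scales linearly with teraflops.
Q_COST_MILLIONS = 168.0      # price paid for Q
Q_TERAFLOPS = 20.0           # Q's rated capability
OTHER_COST_MILLIONS = 10.0   # price of the machine bought in summer 2003
OTHER_TERAFLOPS = 11.0       # that machine's rated capability

# Price per teraflop for each machine, in millions of dollars.
q_cost_per_tf = Q_COST_MILLIONS / Q_TERAFLOPS
other_cost_per_tf = OTHER_COST_MILLIONS / OTHER_TERAFLOPS

# What 20 teraflops would cost at the second machine's price per teraflop.
equivalent_cost = other_cost_per_tf * Q_TERAFLOPS

# Difference between what was paid for Q and that equivalent cost.
overspend = Q_COST_MILLIONS - equivalent_cost

if __name__ == "__main__":
    print(f"Q: ${q_cost_per_tf:.2f} million per teraflop")
    print(f"Second machine: ${other_cost_per_tf:.2f} million per teraflop")
    print(f"20 teraflops at the second machine's rate: ${equivalent_cost:.1f} million")
    print(f"Implied overspend on Q: about ${overspend:.0f} million")
```

Under the linear-scaling assumption, the difference comes to about $150 million, matching the conclusion above.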
        Mechels is a retired Los Alamos National Laboratory employee.