Science AMA Series: We are Livermore Computing, home of the supercomputers at Lawrence Livermore National Lab! AUA!


We are members of Livermore Computing (LC) at the Lawrence Livermore National Laboratory (LLNL) in Livermore, California. LC is home to some of the world's fastest supercomputers, including Sequoia, the 4th fastest in the world. Scientists use our High Performance Computing (HPC) machines to run physical simulations: from geology, astronomy, and cardiac arrhythmia, to the US nuclear stockpile and other problems of national interest. We bring in the machines, keep them running fast, and provide scientists with the tools they need to run these simulations.

We have varying roles in system administration, software development, data archiving, visualization, operations, facilities management, user interfaces to the center and data, user support, and research. Our developers lead and contribute to many open source projects: from Linux kernel infrastructure such as the Lustre and ZFS on Linux file systems; to industry-spanning cluster management tools such as SLURM, Flux, and pdsh; and beyond, to all aspects of scientific and cluster computing with Spack, STAT, and SCR. For more info about our various open source efforts, visit. For more information about our center, visit.

So if you have a question about any part of running or using supercomputers at HPC centers, feel free to ask! We'll be back at 1 pm ET and will answer as many questions as we can.

EDIT: Good Morning from the West Coast! We see that everyone has started asking fantastic questions! We will start answering some questions!

EDIT 2: Thanks for all the great questions. We hope to come back soon. Next time, we plan to try to answer your questions in parallel! Learn more, contact, or apply to join us here:

Our thanks to Reddit and r/Science for providing us with the opportunity to have this AUA!

We leave you with a photo of some of us in front of Sequoia today!

Have a nice day everyone! :)

What is one thing about your work that the general public pushes back against and what one thing would you like them to understand?


Sometimes people ask us why we need ever more powerful supercomputers. No matter how much computing power we have provided, our national missions create a need for more complex simulations and thus a need for more powerful supercomputers.

For more information on the nation's Exascale Computing Project (ECP), see its site. To see examples of application areas requiring exascale computing, check out the ECP Research Areas.

It seems like there's a shift towards distributed computing to perform computationally expensive calculations because it's generally cheaper. Do the tasks that are performed on LC's supercomputers necessarily need to be performed on supercomputers (instead of cheaper distributed computing)? If so, why?


/u/_perpetual_student_ is correct on the general character of things simulated on supercomputers. A more casual take on the topic is to think of the whole system being simulated. If every part of the system constantly influences every other part, the computing resources simulating each part must be able to communicate quickly with one another; this requires very low latency. If the system is only locally coupled, or interactions are more infrequent, loosely coupled distributed computing with higher latency might be more appropriate.

Amusingly, even on these supercomputers, we’re often interested in where jobs get scheduled, and can see major slowdowns based on where on the supercomputer jobs get scheduled.
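The tightly coupled case can be sketched with a toy 1-D diffusion stencil (a hypothetical example, not our production code): each cell's next value depends on its immediate neighbors, so a worker that owns a slice of the domain must exchange boundary ("halo") cells with its neighbors on every step, and the job can run no faster than those exchanges.

```python
# Sketch: a 1-D diffusion stencil split across "workers".
# Each step, every interior cell averages with its neighbors, so each
# worker must first receive halo cells from its neighbors -- the tight
# coupling that demands a low-latency interconnect.

def step_serial(u):
    """One diffusion step on the whole domain (fixed boundary cells)."""
    return [u[0]] + [(u[i-1] + u[i] + u[i+1]) / 3.0
                     for i in range(1, len(u) - 1)] + [u[-1]]

def step_distributed(u, nworkers=2):
    """Same update, but each worker owns a contiguous slice and first
    fetches one halo cell from each neighboring worker."""
    n = len(u)
    cut = n // nworkers
    out = []
    for w in range(nworkers):
        lo, hi = w * cut, n if w == nworkers - 1 else (w + 1) * cut
        # "Receive" halos from neighbors (a send/recv pair in MPI).
        left_halo = u[lo - 1] if lo > 0 else None
        right_halo = u[hi] if hi < n else None
        local = u[lo:hi]
        padded = ([left_halo] if left_halo is not None else []) + local + \
                 ([right_halo] if right_halo is not None else [])
        start = 1 if left_halo is not None else 0
        for i in range(len(local)):
            if lo + i == 0 or lo + i == n - 1:
                out.append(local[i])          # fixed boundary cells
            else:
                j = i + start
                out.append((padded[j-1] + padded[j] + padded[j+1]) / 3.0)
    return out

u0 = [0.0] * 8
u0[0], u0[-1] = 100.0, 100.0                  # hot walls
serial = step_serial(step_serial(u0))
dist = u0
for _ in range(2):
    dist = step_distributed(dist)
```

Both paths compute the same answer; the difference on real hardware is that the distributed version stalls whenever a halo exchange is slow, which is why interconnect latency dominates tightly coupled workloads.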

How excited are you for Sierra? I'm surprised you didn't mention it in your opening statement since it should be delivered soon! There are some crazy things that it will bring to the table: POWER architecture instead of Intel (welcome back IBM, I guess?), HBM and NVMe memory, coherent memory, and a large distributed file system. I'm really looking forward to seeing both Summit and Sierra in operation early next year, and I'd love to see how the new 3D stacked memory affects large scale supercomputing applications. And performance boosts of 4-5x over Sequoia? Incredible!

Do you think we'll see more supercomputers being built with power efficiency in mind? Utilizing GPUs and other hardware accelerators seems to be the new trend here in the US, while the Chinese have gone for a RISC-like simple processor that is replicated as many times as possible. Which do you think will win out for future supercomputers?

Thanks for doing what you guys do, and contributing back to the open source community. One of the best things about this field is how much the community helps each other out to further our understanding.


We are extremely excited for Sierra!

We have people developing tools and programming models, porting millions of lines of code, and generally trying new and interesting things, much of it open source. We have sysadmins getting familiar with new hardware, and our networking folks are looking at how to make networks keep up with the incredible computational speedup. We also have several teams focused on helping application developers prepare to run efficiently on Sierra.

Power efficiency is extremely important. We are proud to be active contributors, users, and supporters of the open source community. For info on our software go to

Do the future prospects of Artificial Intelligence ring a positive or negative tune, and why?


It depends on how AI is used and the intentions of the user. With the increasing complexity of programs (multiple millions of lines of code), we might reach a point where further development of some applications requires help from an AI agent. One useful area to explore is monitoring the condition of supercomputers and predicting failures. We will always need more intelligence, both human and artificial. Several LLNL projects are investigating AI use in HPC. One effort leverages the TrueNorth brain-inspired supercomputer; you can read more at

Hello, thank you for doing this.

What are you looking for in an application to perform calculations using your HPC cluster?

I'm working my way up to some expensive molecular modeling calculations and one of my goals is to get to the point where I'm ready to submit them to a supercomputer like yours over the next few years (PhD thesis work). Any hints I can get to make sure my application is top notch would be appreciated.


Broadly speaking, we try to run calculations that a) need a supercomputer to be successful and b) can actually make use of one. If you want to run on supercomputers, most HPC centers require you to submit a proposal for what you want to do and justify the cycles you need. So to start with, you need to make a case that your code will scale and that it’s solving an important problem.

Generally, to run on our machines you need to be collaborating with someone at LLNL or labs we work with. There are many proposal processes that fund people here — our computer scientists work on LDRD proposals ( with academic collaborators, and we have larger, ongoing collaboration with many universities through the PSAAP program ( We also collaborate with industry through our newer HPC4Mfg program ( Other labs also have proposal processes for CPU time — the DOE INCITE program is one of the most popular ( — it’ll get you time on machines at ANL and ORNL.

Or, you can always find someone at LLNL who’s interested in what you’re working on and come join us for an internship! We have researchers doing work in molecular dynamics — check out this simulation in our ddcMD code with 9 billion atoms ( or the writeup on the Gordon Bell prize we won with that code ( Also, Berni Alder still comes in for lunch at the central cafeteria (, so you might even catch him there.

Is alcohol bad for computing?

Or does it hurt your Livermore?


Alcohol is bad for computers, especially the non-liquid cooled ones. Livermore is known for its wine. Alcohol is good for computer scientists. See the following for reference:

What day-to-day maintenance does your job require to keep the supercomputers in working shape?


It's a complex operation to keep a supercomputing center running. Our 24/7 staff constantly monitors the systems and does hardware repairs like replacing RAM modules, CPUs, hard drives, etc. Other day-to-day maintenance tasks include software updates, hardware upgrades, account maintenance, facility maintenance, tool development, and many other tasks.

How do you start working in your field? What degrees or certifications are you looking for in prospective candidates?


Well... we have a geologist who is a Linux kernel hacker!

Seriously, we have a range of degrees represented across Livermore Computing. Some of us have no degree at all. We have employees with associate's degrees, bachelor's, master's, and PhDs. Some are straight out of school; others have been in HPC since before the internet existed.

Fields range from Computer Science and Computer Engineering to Mathematics and Statistics degrees. We also have staff who come from the sciences directly, including the physical and life sciences such as Computational Biology, Physics, and others!

Look for employment opportunities here: Apply for an internship: Here’s the page for this summer’s HPC Cluster Engineer Academy:

This is a kinda dumb question but how do super computers work


No such thing as a dumb question, only an opportunity to learn! Supercomputers, or HPC systems, work by distributing a task across numerous computers. They benefit from high-speed communication between the computers working on the parts of the task, and they use message passing between the processes on the various computers to coordinate work on the overall task. This high-speed, coordinated, distributed computation across numerous computers is what makes a supercomputer work.
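The basic pattern is decompose, compute, combine. A minimal sketch of that pattern in Python, with threads standing in for compute nodes (on a real machine each chunk would run on a separate node and the partial results would be combined with an MPI reduction):

```python
# Sketch: the divide / compute / combine structure behind most HPC jobs.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Work done independently by one 'node'."""
    lo, hi = chunk
    return sum(i * i for i in range(lo, hi))

def parallel_sum_of_squares(n, nworkers=4):
    # 1. Decompose the problem into independent chunks.
    step = n // nworkers
    chunks = [(w * step, n if w == nworkers - 1 else (w + 1) * step)
              for w in range(nworkers)]
    # 2. Each worker computes its piece concurrently.
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # 3. Combine the partial results (an MPI "reduce" in real codes).
    return sum(partials)

total = parallel_sum_of_squares(1000)
```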

At the software development end of things, what are some common tasks you have to accomplish?


There are many layers of the software stack that runs on supercomputers, from highly parallel scientific applications, to operating systems kernels and filesystems, to scripts that automate system administration. The specific deliverables depend on the project and the nature of the software being developed. Most of our projects use industry standard tools and methodologies for revision control, code review, issue tracking, and continuous integration. To get an idea of the wide variety of open source software we develop, check out

For someone that is majoring in computer engineering, do you have any tips or advice for getting into the hardware industry?


One of our computer engineers mentioned the following areas as hot topics in the field.

One of the biggest supercomputing concerns these days is power efficiency. Designing and building low energy, high performance hardware is at the core of the move towards Exascale Computing.

Another big concern being tackled in Computer Engineering is driving down the latency of CPU <-> Memory communications as well as those between CPUs both on the same chip and across multiple CPUs.

What's the silliest things that can be done on supercomputers?


We often need to test the heat tolerance of supercomputers, so one of our engineers was asked to write a computation to generate heat. The goal wasn't to do otherwise productive work, just to get the computer as hot as we possibly could.

Hi, we are from the Secretary of Energy from Mexico. We're working on setting up a national energy supercomputing center here in Mexico. We're fortunate to have the guidance of Horst Simon, Mateo Valero and other incredible advisors. What would be your key recommendation towards setting up a successful project? We'd love to chat sometime too!


Greetings from the north!

That is a very broad topic and we would encourage you to reach out via

What is the overlap like between the work done by CASC and the HPC groups at LLNL? Do many projects span from the algorithmic level to the low-level infrastructure work that your group does, or is the work and software developed mostly independently? How is this collaboration, organization, and consultation with HPC users/domain scientists organized?


That’s a big question! It’s hard to provide a short answer because there are so many ways that CASC, Livermore Computing (LC) staff, and the rest of the lab work together. Livermore Computing is tasked with running the computing center and supporting users. Part of that is ensuring that we keep buying machines they can use. We have a team that keeps track of the hardware outlook for the next 5 or 10 years, and we work closely with vendors to understand how our applications will perform on new systems. LC also maintains software like Lustre, SLURM, and TOSS, and we also do advanced development, e.g. on Flux, our next-generation resource manager/scheduler. We also have staff who sit directly with code teams and help them to optimize their algorithms.

CASC is a research organization, and its staff work with LC in all of those areas. Teams often include people from both organizations. As an example, we have a project called “Sonar” where LC staff are working with CASC researchers to set up a data analytics cluster, with the aim of understanding performance data from all the applications that run on our clusters. LC admins and developers are helping to set up monitoring services, hardware, databases, etc., and CASC researchers help with building the data model and analyzing it with Spark and some home-grown analysis tools. Flux ( is a similar project — it’s developed primarily by LC staff, but CASC folks are involved doing research into ways to do power- or storage-aware scheduling. The lines can be blurry — some people in LC work on research projects and some people in CASC write code and do development to support them.

Beyond LC and CASC, both organizations also work with code teams, which can include software developers, computational scientists, and domain experts. Typically these folks come from a program that is funding the work, but they also work with LC and CASC researchers on algorithms and optimization. A good example of that might be BLAST ( and MFEM ( BLAST is a higher-order hydrodynamics code developed collaboratively between researchers in CASC and code developers working for the programs. It allows people to simulate fluids much more accurately using curved meshes. MFEM is the meshing library it uses. LC staff have been involved with optimizing the performance of the code, as well as helping to get it running on GPUs. Another example would be Apollo (, a CASC project that automatically tunes the performance of application codes that use RAJA.

TL;DR, the lab is a big place. Organizations can be fluid, and there are many collaborations between different teams. People at LLNL are encouraged to work across organizations. All in all it’s a pretty vibrant environment!

First, thanks for the work you and everyone at Livermore does for the benefit of all of us. Contributing some of your work to open source projects is great too!

I imagine much of the work you do is highly customized to support the goals of each scientific project, so you must have templates and frameworks in place so you're not always reinventing the wheel. But can you tell us any stories of projects where there was just no prior work to start from and you had to "bake the pie from scratch"?


We have a long history of developing solutions where no prior work existed. This tradition goes back to the 1960s, when LLNL developed a time-sharing operating system to run on mainframe supercomputers.

In the early 2000s, when we first fielded Linux clusters, we lacked cluster management tools. We developed the SLURM resource manager and the various scalable utilities necessary to run large clusters, including pdsh, munge, conman, powerman, and many others. Check out for more examples.

Existing debugging tools were not able to scale to the level of concurrency of our supercomputers, so we developed the Stack Trace Analysis Tool (STAT). This was developed in collaboration with the University of Wisconsin and received an R&D award in 2011. We've used STAT to debug jobs running on the order of 3 million MPI tasks.

Do you have to get every response reviewed and approved before you post it?


This answer is currently under review... Yes.

This is where my uncle works! Say hi to Greg T for me!!!

Now for my question: Given the rise of cryptocurrency and its dependence on computing power to solve blocks and earn currency, are there any plans to use existing supercomputing power to mine for cryptocurrency?


Greg says hi!

DOE supercomputers are government resources for national missions. Bitcoin mining would be a misuse of government funds.

In general, though, it’s fun to think about how you could use lots of supercomputing power for Bitcoin mining, but even our machines aren’t big enough to break the system. The number of machines mining bitcoin worldwide has been estimated to have a hash rate many thousands of times faster than all the Top 500 machines combined, so we wouldn’t be able to decide to break the blockchain by ourselves ( Also, mining bitcoins requires a lot of power, and it’s been estimated that even if you used our Sequoia system to mine bitcoin, you’d only make $40/day ( The amount we pay every day to power the machine is a lot more than that. So even if it were legal to mine bitcoins with DOE supercomputers, there’d be no point. The most successful machines for mining bitcoins use low-power custom ASICs built specifically for hashing, and they’ll be more cost-effective than a general purpose CPU or GPU system any day.
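For the curious, mining itself is just a brute-force hash search. A toy proof-of-work loop (illustrative only; real Bitcoin difficulty is astronomically higher, which is part of why even a Top 500 machine is uncompetitive against purpose-built ASICs):

```python
# Toy proof-of-work: find a nonce whose SHA-256 digest starts with a
# required number of zero hex digits. This is the entire "work" that
# mining hardware performs, trillions of times per second.
import hashlib

def mine(block_data: bytes, difficulty: int = 4) -> int:
    """Return the first nonce whose SHA-256 digest begins with
    `difficulty` zero hex characters."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

nonce = mine(b"example block", difficulty=4)
```

Each extra zero of difficulty multiplies the expected search by 16, which is why specialized low-power ASICs beat general-purpose CPUs and GPUs at this particular task.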

Why do you think Livermore has attracted so many high-level scientists? It seems to be just a lil suburb of SF, but it's also the destination for so many with postdocs.

Edit: I'll be there this weekend; which local winery would you recommend?


It's the people, the world-class facilities, and the national missions! Many postdocs tell us they were attracted by the HPC facilities, capabilities, and the science.

As for the wineries, there are a lot of great ones in Livermore. Visitors often appreciate the wine tours to find their favorites.

Have a safe and fun trip! :)

We've already got like a million ways to work with software on HPC computers and your team comes out with Spack. Why?


Mainly because the existing ones didn't do what we wanted. We manage codes with upwards of 100 dependencies, multiple languages (Python, C, C++, Fortran), and we want to be able to build them with different compilers, MPI implementations, and dependency versions. Spack makes it easy to build lots of different configurations, and to share them with others. Also, we get a lot out of the community we built around it. We started with ~150 packages written at LLNL and now there are more than 1,700, mostly contributed by folks at other labs and universities. Check out the recent RFC podcast on Spack for more:
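One small piece of what a tool like Spack must do is compute a build order over the dependency graph, so every package is built after everything it depends on. A toy sketch (the package names are made up, and Spack additionally solves for versions, compilers, and variants):

```python
# Toy build-order computation: a depth-first topological sort over a
# package dependency graph. With 100+ dependencies per code, doing
# this by hand -- per compiler, per MPI, per version -- is hopeless,
# which is the problem Spack automates.

def build_order(deps):
    """deps maps package -> list of packages it depends on.
    Returns a list in which every package follows its dependencies.
    (No cycle detection; real tools must handle that too.)"""
    order, seen = [], set()
    def visit(pkg):
        if pkg in seen:
            return
        seen.add(pkg)
        for d in deps.get(pkg, []):
            visit(d)
        order.append(pkg)
    for pkg in deps:
        visit(pkg)
    return order

deps = {
    "mycode": ["hdf5", "mpi"],
    "hdf5": ["zlib", "mpi"],
    "mpi": [],
    "zlib": [],
}
order = build_order(deps)
```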

I also work at another leading national laboratory for supercomputing but in a different field. Hello from the east coast!

How is working with the international super computing community?

And do you guys have pizza parties when you've reached a new record or "first?"



How is working with the international super computing community?

It is really amazing! We collaborate with people all over the world. We have a lot of projects in the LLNL GitHub organization which see contributions from labs across the country as well as users all over the world. Being able to take advantage of such a multitude of perspectives and requirements in designing software has led to much stronger products.

And do you guys have pizza parties when you’ve reached a new record or “first?”

Occasionally we do! Also, when we mess up, we have a tradition of learning from the mistakes. If someone messes up, it is customary for them to bring in donuts, explain what went wrong, and have a discussion about how to improve for next time.

Say I might live nearby, do you do tours of nonclassified areas? Or is pretty much everything off-limits?


LLNL has free community tours that include NIF, CAMS, NARAC, and other parts of the lab. Some of us have organized tours for our college alumni groups, and local high schools come to tour the lab as well. Tours are typically ~3 hours starting at 9am. You can contact us about tours at

I'm a computer science student about to graduate with a deep interest in HPC. What's the best way to enter the field?


For somebody already graduating, we do advertise our jobs; both the Department of Energy and the National Labs have career pages.

For people who are still students there are also internships, Research Experiences for Undergraduates, and we believe all of those career sites also have internships.

We love to see computer scientists going into HPC system administration because we find that it is not often taught in college. So we created the HPC Cluster Engineer Academy (info at We have a wide range of internship opportunities; see for more information.

The best way to gain experience as a student is through internships at HPC centers!

Hey, I've got a friend interning there this summer.

How often do supercomputer(s) go down, and what's the procedure when one does? Can you also talk about the daily maintenance that goes into running them?


Our supercomputers rarely go down unexpectedly. We do take them down occasionally for maintenance and upgrades to the computer and facilities. When one does go down we have 24/7 staff who are trained to bring it back up and return it to users. We can usually bring a system back online within a few hours of it going down unexpectedly.

Hi, thanks for taking the time!

How do you help users cope with diversity of environments between small-scale experiments that can run on a laptop or local desktop and actual long computations happening on the HPC you are providing?

The more requirements the code has (e.g. a specific library compiled with a specific MKL or BLAS version and architecture, etc.), the more of a nightmare this becomes. This is a problem we are facing with large imaging data from 10K+ subjects, and it's not obvious how to figure out a pipeline that works both in "dev" and in "prod". We are experimenting with containers for this, but what are your thoughts?

Thanks !


Our involvement with the open source community really helps. Users who run on our machines, users who run in HPC setups at other labs or in the EU, and curious people who try things out on their own machines all introduce different requirements and show us different bugs.

The more requirements the code has (e.g specific library compiled with specific MKL or BLAS version and architecture etc) the more of a nightmare this becomes.

This is exactly why we invented Spack. See our previous response regarding Spack here. Also, we encourage you to check out Spack's GitHub page at

We are also investigating the use of containers and virtualization in the HPC environment. We’ve found that nothing beats getting the software into the hands of a diverse group of users who will put it through its paces and tell you when it falls over.

does the lab have tours?


In fact, we do! Our Discovery Center routinely hosts field trips for middle school and high school students, inspiring the next generation of scientists, computer scientists, and engineers. For more information on tours go to and for more information about our Discovery Center go to

Thanks for all the hard work on ZFS on Linux. Very excited about the recent 0.7.0 release; it's a huge leap forward.


Thanks! We've been running the 0.7.0-rc releases on some of our large filesystems. We'll be running 0.7.0 very soon.

Not a question, but wanted to say that my father worked at the Lab from its inception until 1989. He worked on Project Chariot/Plowshares with Teller and other test analytics. Welcome to Reddit!


Thank you! We’re glad to be here! :)

How's the wine tasting in Livermore?


Excellent. Napa makes auto parts. Livermore makes wine.

What kind of hardware accelerators do you use? I'm imagining giant FPGA clusters. Any specialised ASICS? Or just good old GPU power?


GPUs figure heavily in our future. See our previous response about Sierra here.

For the average Joe like myself just how much more computing power do your computers have than a standard household desktop? What makes them "super"?


Cape not included... Just to give you an idea of the scope: the average home computer has 2-8 CPU cores and 4-16 GB of RAM, whereas our current biggest supercomputer, Sequoia, has 1.6 million CPU cores and a total of 1.6 PB of RAM (

Is the pool still there? I used to go to a nearby summer daycare and I learned to swim in the pool at Lawrence Livermore.


The pool has been closed for several years, as too many scientists melted when they got wet. Survivors now swim for fitness at the LARPD pool on East Ave (a mile or two away from the Lab).

How do you link so many CPUs and GPUs to run simultaneously?


This one is at the core of a lot of the work we’re doing right now.

The computers are linked via high-speed networks. There is a software abstraction called the Message Passing Interface (MPI) that allows applications to use all the CPUs of the various computers together. We have an abstraction called RAJA which lets us run loops on a GPU or CPU (threaded CPU) without too much code change. The really tough question is "how do you move your data between CPUs and GPUs if you want to change your mind mid-computation?" For this the vendors have some solutions (Unified Memory), but we also have projects like CHAI, and we're well on our way to having these million-line codes able to move between CPUs and GPUs quickly.
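The RAJA idea can be sketched in a few lines of Python (an analogy only; RAJA itself is a C++ library with CUDA and OpenMP backends): the loop body is written once, and an execution policy decides where it runs.

```python
# Sketch of the write-once, run-anywhere loop abstraction. "serial"
# and "threaded" stand in for CPU and GPU backends; the application
# code never changes when the backend does.
from concurrent.futures import ThreadPoolExecutor

def forall(n, body, policy="serial"):
    """Apply body(i) for i in range(n) under the chosen policy."""
    if policy == "serial":
        for i in range(n):
            body(i)
    elif policy == "threaded":
        with ThreadPoolExecutor() as pool:
            list(pool.map(body, range(n)))
    else:
        raise ValueError(policy)

a = [1.0] * 8
b = [2.0] * 8
out_serial = [0.0] * 8
out_threaded = [0.0] * 8

# Same loop body, two execution policies.
forall(8, lambda i: out_serial.__setitem__(i, a[i] + b[i]), "serial")
forall(8, lambda i: out_threaded.__setitem__(i, a[i] + b[i]), "threaded")
```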

How do you handle patching of machines? Do you do rolling patches as nodes finish jobs? Do have whole-cluster outages for certain tasks? I know of an university cluster that patches once per academic semester (every 3-4 months). Do you follow a similar policy?


Livermore Computing actually develops and maintains our own distribution of Linux called TOSS, a fork of the RHEL operating system. We update frequently and as needed. Typically updates are performed as jobs finish, but certain updates (for example, BIOS updates) require a complete system outage. Our goal is to keep our systems secure, highly available, consistent, and FAST! More info on TOSS can be found at

What is the typical job queue backlog for your systems or what percentage of time do they sit idle?


Our computers are fully utilized. The queue backlog depends on a number of factors, including job size (large jobs are typically favored over smaller jobs), job wall clock duration, project priority, and project utilization. We give users tools within our job scheduler/resource manager to estimate when a particular job will start, as well as to backfill jobs into idle nodes.
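Backfill itself is a simple idea: a lower-priority job may start early if it fits on the currently idle nodes and will finish before the reserved start time of the job at the head of the queue. A toy model (greatly simplified compared to a real scheduler like SLURM):

```python
# Toy backfill scheduler: decide which queued jobs can start right now
# without delaying the highest-priority job's reserved start time.

def backfill_order(total_nodes, running, queue):
    """running: list of (nodes, finish_time) for jobs on the machine.
    queue: priority-ordered list of (name, nodes, duration).
    Returns the names of queued jobs that can start immediately."""
    idle = total_nodes - sum(n for n, _ in running)
    head_name, head_nodes, _ = queue[0]
    started = []
    if head_nodes <= idle:
        started.append(head_name)        # head job fits immediately
        idle -= head_nodes
    else:
        # Reserve a start time for the head job: the moment enough
        # running jobs have finished to free the nodes it needs.
        need, head_start = head_nodes - idle, 0
        for nodes, finish in sorted(running, key=lambda j: j[1]):
            need -= nodes
            head_start = finish
            if need <= 0:
                break
        # Backfill: a later job may start now only if it fits in the
        # idle nodes AND finishes before the head job's reserved start.
        for name, nodes, duration in queue[1:]:
            if nodes <= idle and duration <= head_start:
                started.append(name)
                idle -= nodes
    return started

# 10-node machine: 8 nodes busy until t=5; the big job needs all 10.
now = backfill_order(
    total_nodes=10,
    running=[(8, 5)],
    queue=[("big", 10, 20), ("small", 2, 4), ("medium", 2, 9)],
)
```

Here "small" (4 time units on 2 nodes) slips in before the big job's reserved start at t=5, while "medium" would overrun it and must wait.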

Do you feel that GPUs have a promising future in scientific computing?

Should we all start rewriting our codes to run on GPUs, or is it not worth the time investment? (Obviously this is somewhat problem dependent; I am mostly interested in turbulent fluid simulations, but your opinion on the general trend in scientific computing would be interesting to hear.)


Not all applications are a good fit for GPUs, but some are a great fit and use languages such as CUDA to get the best possible performance.

Our next HPC system (Sierra) contains more than 16,000 GPUs, so we definitely see a very bright future for general purpose GPU computing in HPC. Our strategy is to make use of abstraction layers such as OpenMP or RAJA to expose the parallelism in our applications in such a way that they will work on multiple different architectures. This way, time spent exposing parallelism in applications will be time well spent regardless of which future architectures are most successful.

There are so many great questions here already, but I'd like to know if someone could shed a bit of light on the software development side? How much software maintenance is typically involved with supercomputers? Do you develop software for the computers specifically or more of a client requirement? Thanks for doing the AMA!


You're welcome! There is a great deal of software maintenance involved with supercomputers. Running systems at this scale on bleeding edge hardware means we are often the first to discover bugs that don't affect other sites. Major software rollouts involve an extensive period of testing and stabilization. We also maintain dozens of compilers and communication libraries that enable HPC users to optimize their application performance. As to your last question, the Livermore Computing center develops systems software to provide general services needed by supercomputers, such as parallel filesystems, job schedulers, and resource managers. Users of the supercomputers develop software based on the requirements of their research projects. The center also has a group of developers devoted to creating applications to help our users manage and visualize large amounts of data. Another group focuses on debugging at scale, checkpoint restarts, and I/O tools, among other things.

For a more extensive list, go to
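Checkpoint restart, mentioned above, is the pattern of periodically saving application state so a long run can resume after a failure rather than starting over. A minimal sketch (illustrative only; production tools like SCR handle this efficiently at scale):

```python
# Sketch of application-level checkpoint/restart: periodically persist
# the simulation state, and resume from the last checkpoint on restart.
import json, os, tempfile

def run_simulation(steps, checkpoint_path, checkpoint_every=10):
    # Resume from the last checkpoint if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "value": 0}
    while state["step"] < steps:
        state["value"] += state["step"]      # the "physics"
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            # Write to a temp file, then atomically rename: an
            # interrupted write never corrupts the last good checkpoint.
            tmp = checkpoint_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, checkpoint_path)
    return state["value"]

ckpt = os.path.join(tempfile.mkdtemp(), "state.json")
partial = run_simulation(25, ckpt)     # pretend this run was killed...
resumed = run_simulation(50, ckpt)     # ...then restarted later
```

The resumed run replays only the few steps since the last checkpoint, then continues, ending with the same result as an uninterrupted run.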

Hello from Pacific Northwest National Laboratory!


Hello PNNL!

Do you guys use SLURM to manage your cluster?


Managing jobs on our clusters is what we created SLURM for! We have a wide range of MPI implementations available to our users, and many do use MVAPICH2.

More info at

I noticed that you have some ansible playbooks on your software page. What software do you primarily use to manage your infrastructure? Do you use anything else, like, cfengine, puppet, chef, ansible, or your own in-house tools? Would the outside tools even be well-suited for your environment?


LLNL uses a variety of different configuration management systems. In Livermore Computing our Supercomputers use a combination of CFEngine and custom built tools.

Why do many supercomputers use CPUs over GPUs? I know some systems have GPU nodes for visualizations but most are CPUs.


CPUs are more general purpose than GPUs. By primarily having CPUs we can have a "one size fits all" for the many needs of our users.

Do you do any research on quantum computing? If so, what do you think the future holds for QC?


Yes, we recently had a great talk from our Quantum Simulation Group in our Physics and Life Sciences Division.

Check it out at

What is the greatest "bottleneck" a super computer can have ?


This really depends on the jobs the supercomputer is doing. Different jobs have different bottlenecks. Some common bottlenecks are CPU/RAM performance, network performance, and storage performance.

What's something the supercomputer does that can't be done on normal computers?


High bandwidth and low latency interconnections, combined with large parallel filesystems (as much as 10PB per file system) enable complex, large-scale simulations.

Quantum computers: have you discussed how they can be implemented, to democratize them?




This article and its reviews are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and redistribution in any medium, provided that the original author and source are credited.