Everything about large language models is big — giant models train on massive datasets across thousands of NVIDIA GPUs.
A team of experienced scientists and developers at Amazon Web Services has been using NVIDIA NeMo for the past several months to create Amazon Titan foundation models for Amazon Bedrock, a generative AI service for foundation models.
“One key reason for us to work with NeMo is that it is extensible, comes with optimizations that allow us to run with high GPU utilization while also enabling us to scale to larger clusters so we can train and deliver models to our customers faster,” said Leonard Lausen, a senior applied scientist at AWS.
Think Big, Really Big
Parallelism techniques in NeMo enable efficient LLM training at scale. Coupled with the Elastic Fabric Adapter (EFA) from AWS, they allowed the team to spread its LLM across many GPUs to accelerate training.
EFA provides AWS customers with an UltraCluster Networking infrastructure that can directly connect more than 10,000 GPUs and bypass the operating system and CPU using NVIDIA GPUDirect.
The combination allowed the AWS scientists to deliver excellent model quality — something that’s not possible at scale when relying solely on data parallelism approaches.
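To make the contrast with data parallelism concrete, here is a minimal sketch of how a GPU cluster is commonly divided when tensor, pipeline and data parallelism are combined. The article does not say how the Titan models were partitioned, so the class name `ParallelLayout`, the cluster size and the parallel degrees below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper: splits a cluster into parallelism groups."""

    world_size: int         # total number of GPUs in the job
    tensor_parallel: int    # GPUs that share each layer's weights
    pipeline_parallel: int  # consecutive stages the layers are split into

    @property
    def model_parallel(self) -> int:
        # Number of GPUs that together hold one full copy of the model.
        return self.tensor_parallel * self.pipeline_parallel

    @property
    def data_parallel(self) -> int:
        # Number of model replicas, each fed a different slice of the batch.
        if self.world_size % self.model_parallel != 0:
            raise ValueError("world_size must be divisible by TP * PP")
        return self.world_size // self.model_parallel


def approx_params_per_gpu(total_params: float, layout: ParallelLayout) -> float:
    # Rough estimate only: assumes weights divide evenly across the
    # model-parallel group and ignores optimizer state and activations.
    return total_params / layout.model_parallel


# Illustrative numbers, not taken from the article.
hybrid = ParallelLayout(world_size=1024, tensor_parallel=8, pipeline_parallel=4)
data_only = ParallelLayout(world_size=1024, tensor_parallel=1, pipeline_parallel=1)

print(hybrid.data_parallel)                      # replicas in the hybrid layout
print(approx_params_per_gpu(100e9, hybrid))      # weights per GPU, hybrid
print(approx_params_per_gpu(100e9, data_only))   # weights per GPU, data parallel only
```

With data parallelism alone, every GPU has to hold the entire model, which stops being feasible once the model outgrows a single GPU's memory. Splitting the model across a tensor- and pipeline-parallel group removes that limit while still leaving room for multiple replicas.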
Framework Fits All Sizes
“The flexibility of NeMo,” Lausen said, “allowed AWS to tailor the training software for the specifics of the new Titan model, datasets and infrastructure.”
AWS’s innovations include efficient streaming from Amazon Simple Storage Service (Amazon S3) to the GPU cluster. “It was easy to incorporate these improvements because NeMo builds upon popular libraries like PyTorch Lightning that standardize LLM training pipeline components,” Lausen said.
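The article does not describe how AWS implemented its S3 streaming, so the following is only a generic sketch of the idea: reading training records chunk by chunk instead of downloading whole files first. The function name `stream_records`, the newline-delimited record format and the chunk size are assumptions. In-memory buffers stand in for remote objects here; any object with a `read(n)` method would work the same way.

```python
import io
from typing import BinaryIO, Iterable, Iterator


def stream_records(
    shards: Iterable[BinaryIO], chunk_size: int = 1 << 16
) -> Iterator[bytes]:
    """Yield newline-delimited records, holding at most one chunk in memory."""
    for shard in shards:
        leftover = b""  # partial record carried over between chunks
        while True:
            chunk = shard.read(chunk_size)
            if not chunk:
                break
            # Everything before the last newline is complete; the tail may
            # be the start of a record that continues in the next chunk.
            *complete, leftover = (leftover + chunk).split(b"\n")
            yield from complete
        if leftover:
            # Final record of a shard that does not end with a newline.
            yield leftover


# Stand-ins for remote objects; a tiny chunk size forces records to
# straddle chunk boundaries.
shards = [io.BytesIO(b"one\ntwo\nthree"), io.BytesIO(b"four\n")]
records = list(stream_records(shards, chunk_size=3))
print(records)
```

Because the generator never materializes a full shard, the same loop could feed a training data loader while later shards are still being fetched.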
AWS and NVIDIA aim to infuse products like NVIDIA NeMo and services like Amazon Titan with lessons learned from their collaboration for the benefit of customers.