Speeding Up AI: Bringing Google Colossus to PyTorch via GCSFS and Rapid Bucket

Cloud

Trinadh Kotturu, Senior Product Manager
Martin Durant, maintainer of fsspec at Anaconda, Inc.

Today, we’re unveiling a significant performance upgrade for AI and ML workloads that use the PyTorch ecosystem on Google Cloud. By integrating Rapid Storage, powered by Google’s Colossus storage architecture, directly with PyTorch via the industry-standard fsspec interface, we are enabling researchers and developers to keep their GPUs busier than ever before.

The challenge: Keeping GPUs fed

As models scale up, data loading and checkpointing frequently become the main performance bottlenecks during training. Model training requires data preparation tasks that involve retrieving and processing terabytes or even petabytes of data from remote storage systems such as object stores. Standard REST-based storage access often fails to deliver the massive throughput and ultra-low latency demanded by modern distributed training, resulting in underutilized GPUs.

Rapid Bucket provides high-speed storage through bidirectional gRPC

Our new Rapid Bucket solution delivers high-performance object storage within dedicated zonal buckets.
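To make the "keeping GPUs fed" problem concrete, here is a minimal back-of-the-envelope sketch in Python. It is not part of GCSFS or PyTorch; the function names and all the numbers (batch size, step time, storage throughput) are hypothetical, chosen only to illustrate how storage throughput can cap GPU utilization. It assumes a simple model in which data loading is either fully overlapped with compute (prefetching) or not overlapped at all.

```python
def required_throughput_gb_s(batch_bytes: float, compute_s: float) -> float:
    """Storage throughput (GB/s) needed to deliver one batch per compute step."""
    if compute_s <= 0:
        raise ValueError("compute_s must be positive")
    return batch_bytes / compute_s / 1e9


def gpu_utilization(compute_s: float, load_s: float, overlap: bool = True) -> float:
    """Fraction of each training step the GPU spends computing.

    With overlap (prefetching), a step takes as long as the slower of
    compute and load. Without overlap, the two run back to back.
    """
    if compute_s <= 0 or load_s < 0:
        raise ValueError("compute_s must be positive and load_s non-negative")
    step_s = max(compute_s, load_s) if overlap else compute_s + load_s
    return compute_s / step_s


if __name__ == "__main__":
    # Hypothetical workload: 2 GB of training data per step, 0.5 s of GPU compute.
    batch_bytes = 2e9
    compute_s = 0.5
    print("needed GB/s:", required_throughput_gb_s(batch_bytes, compute_s))

    # Hypothetical storage throughputs, in GB/s.
    for gb_s in (1.0, 4.0, 8.0):
        load_s = batch_bytes / (gb_s * 1e9)
        print(gb_s, "GB/s ->", gpu_utilization(compute_s, load_s))
```

In this simplified model, any storage path slower than the required throughput leaves the GPU idle for part of every step, even with perfect prefetching. Real pipelines add latency, decoding, and checkpoint writes on top of this.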

  Google Developers Blog