Cloud. Senior Product Manager Trinadh Kotturu. Martin Durant, maintainer of fsspec at Anaconda, Inc.

Today, we’re unveiling a significant performance upgrade for AI and ML workloads that run on PyTorch in Google Cloud. By natively integrating Rapid Storage, which is built on Google’s Colossus architecture, into PyTorch through the standard fsspec interface, we’re helping researchers and developers keep their GPUs fully utilized.

The core challenge: keeping GPUs continuously fed with data

As model sizes grow, data loading and checkpointing often become the primary bottlenecks in training. Training depends on data preparation tasks that retrieve and process terabytes or even petabytes of data from remote storage systems such as object stores. Standard REST-based storage access often cannot deliver the high throughput and low latency that modern distributed training demands, so GPUs sit idle while they wait for data.

Rapid Bucket: high-speed storage powered by bidirectional gRPC

Our new Rapid Bucket solution delivers high-performance object storage within dedicated zonal buckets.
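Because the integration goes through the standard fsspec interface, a PyTorch input pipeline can read objects by URL without storage-specific client code. The following is a minimal sketch, not the integration’s actual implementation: `FsspecShardDataset` is a hypothetical name (it is not part of fsspec or PyTorch) for a map-style dataset that lists objects with `glob`, reads one object per sample with `cat_file`, and reads a whole batch with a single `fs.cat` call. For async fsspec backends such as gcsfs, that batch call issues the requests concurrently. The sketch assumes fsspec is installed and uses the in-process `memory://` filesystem so it runs without cloud credentials. Pointing it at a Rapid bucket would presumably mean swapping in a `gs://` URL handled by gcsfs. That protocol choice and any Rapid-specific storage options are assumptions to check against the gcsfs documentation, not details given in this post.

```python
from __future__ import annotations

import fsspec


class FsspecShardDataset:
    """Hypothetical map-style dataset that reads whole objects through fsspec.

    It implements __len__ and __getitem__, the protocol that
    torch.utils.data.DataLoader expects from a map-style dataset, so this
    sketch does not need to import torch itself.
    """

    def __init__(self, url_glob: str, **storage_options):
        # url_to_fs picks the filesystem from the URL protocol (memory://, gs://, ...)
        # and returns the protocol-stripped path; storage_options go to that backend.
        self.fs, pattern = fsspec.core.url_to_fs(url_glob, **storage_options)
        # Sort so the index -> object mapping is stable across workers and runs.
        self.paths = sorted(self.fs.glob(pattern))

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> bytes:
        # One request per sample: read the full object as bytes.
        return self.fs.cat_file(self.paths[idx])

    def __getitems__(self, indices: list[int]) -> list[bytes]:
        # Batch fetch: fs.cat on a list returns {path: bytes}; async backends
        # such as gcsfs fetch the listed objects concurrently.
        wanted = [self.paths[i] for i in indices]
        blobs = self.fs.cat(wanted)
        # Rebuild the list in the requested order (repeated indices are fine).
        return [blobs[p] for p in wanted]


# Demo: put two tiny objects in the in-memory filesystem and read them back.
mem = fsspec.filesystem("memory")
mem.pipe_file("/demo-shards/b.bin", b"world")
mem.pipe_file("/demo-shards/a.bin", b"hello")

ds = FsspecShardDataset("memory://demo-shards/*.bin")
print(len(ds), ds[0], ds.__getitems__([1, 0]))
```

With PyTorch installed, the same object can be passed to `torch.utils.data.DataLoader`. Recent PyTorch releases call `__getitems__` for batched fetches when a dataset defines it, which is where concurrent object reads can help keep GPUs fed. A real pipeline would also decode each object into tensors, either in the dataset or in a custom `collate_fn`.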
Google Developers Blog