Together AI Launches Instant Clusters with NVIDIA GPU Support

September 9, 2025

Alvin Lang
Sep 09, 2025 21:10

Together AI announces the general availability of Instant Clusters, providing self-service NVIDIA GPU clusters for rapid AI training and inference, enhancing scalability and efficiency.

Together AI has made Instant Clusters generally available. The service offers self-service GPU clusters equipped with NVIDIA H100 and B200 GPUs, and aims to streamline AI infrastructure by providing ready-to-use clusters for both training and inference, reducing setup time and operational complexity.

Efficient and Scalable AI Solutions

The Instant Clusters are designed to meet the demands of AI-native companies, allowing them to manage sudden surges in computational needs efficiently. According to Together AI, these clusters can be provisioned in minutes, eliminating lengthy procurement processes and enabling rapid scaling of AI operations. This service is particularly advantageous for organizations requiring large-scale reinforcement learning and distributed training.

Advanced Cloud Ergonomics

Developers can expect a cloud experience that is API-first and self-service, aligning with modern cloud computing standards. The Instant Clusters simplify the traditionally complex setup of multi-node GPU clusters by automating the integration of drivers, schedulers, and network fabrics. This approach not only enhances productivity but also ensures consistency across various environments.

Optimized for High-Performance Training

Equipped with NVIDIA Quantum-2 InfiniBand and NVLink technologies, the clusters offer the ultra-low-latency, high-throughput communication that demanding multi-node training jobs require. The clusters support both Kubernetes and Slurm orchestration, providing flexibility and reproducibility in AI deployments.
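
For readers wondering what Slurm orchestration looks like from inside a multi-node training job, here is a minimal Python sketch. It is not Together AI's code and does not use any Together AI API. The helper name slurm_rank_info is hypothetical, and the fallback values for running outside Slurm are an assumption. The environment variables themselves are the standard ones Slurm sets for each task launched with srun.

```python
import os


def slurm_rank_info(env=None):
    """Derive distributed-training rank settings from Slurm's per-task environment.

    Hypothetical helper for illustration. Falls back to single-process
    defaults when the Slurm variables are absent (an assumption made here
    so the function also works on a laptop).
    """
    env = os.environ if env is None else env
    return {
        # Global index of this task across the whole job
        "rank": int(env.get("SLURM_PROCID", 0)),
        # Index of this task on its own node (often used to pick a GPU)
        "local_rank": int(env.get("SLURM_LOCALID", 0)),
        # Total number of tasks in the job
        "world_size": int(env.get("SLURM_NTASKS", 1)),
        # Number of nodes allocated to the job
        "num_nodes": int(env.get("SLURM_NNODES", 1)),
    }


if __name__ == "__main__":
    # Simulated environment: the second task on the second node of a
    # 2-node job running 8 tasks per node (one task per GPU).
    fake_env = {
        "SLURM_PROCID": "9",
        "SLURM_LOCALID": "1",
        "SLURM_NTASKS": "16",
        "SLURM_NNODES": "2",
    }
    print(slurm_rank_info(fake_env))
```

A training script would typically pass values like these to its distributed framework at startup, so that each process knows its place in the job and which local GPU to use.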

Scalable Inference Capabilities

As AI applications scale, the need for increased inference capacity becomes critical. Together AI's solution allows cluster size to be adjusted quickly, maintaining performance during peak usage without architectural changes. This eases the transition from testing environments to full-scale production.

Reliable Performance and Pricing

To ensure reliability, Together AI implements thorough testing and monitoring of its clusters, including burn-in and connectivity checks. Pricing for the service is straightforward, with options available for hourly, daily, and longer-term usage, providing flexibility to match various business needs.

The introduction of Together Instant Clusters represents a significant advancement in AI infrastructure, offering a robust, scalable solution that caters to the evolving needs of AI-driven enterprises.

