At the Google Cloud Next 2022 event in October last year, the OpenXLA project officially surfaced. Google cooperated with the open source AI framework promoted by technology companies including Alibaba, AMD, Arm, Amazon, Intel, Nvidia and other technology companies. Committed to bringing together different machine learning frameworks to enable machine learning developers to proactively choose frameworks and hardware.
On Wednesday, Google announced that the OpenXLA project is officially open source.
Project link: https://github.com/openxla/xla
By creating a unified machine learning compiler that works with multiple different machine learning frameworks and hardware platforms, OpenXLA can accelerate the delivery of machine learning applications and provide greater code portability. This is a significant project for AI research and applications, and Jeff Dean also promoted it on social networks.
Today, machine learning development and deployment are impacted by fragmented infrastructure that can be compromised by frameworks, Varies by hardware and use case. This isolation limits the speed at which developers can work and creates barriers to model portability, efficiency, and production.
On March 8, Google and others took a major step toward removing these barriers with the opening of the OpenXLA project, which includes the XLA, StableHLO, and IREE repositories.
OpenXLA is an open source ML compiler ecosystem co-developed by AI/machine learning industry leaders, with contributors including Alibaba, AWS, AMD, Apple, Arm, Cerebras, Google, Graphcore, Hugging Face, Intel, Meta and Nvidia. It enables developers to compile and optimize models from all leading machine learning frameworks for efficient training and serving on a variety of hardware. Developers using OpenXLA can observe significant improvements in training time, throughput, service latency, and ultimately release and compute costs.
As AI technology enters the practical stage, development teams in many industries are using machine learning to address real-world challenges. Examples include disease prediction and prevention, personalized learning experiences, and exploration of black hole physics.
With the number of model parameters growing exponentially and the amount of computation required by deep learning models doubling every six months, developers are seeking maximum performance and utilization of their infrastructure . A large number of teams are leveraging a variety of hardware models, from energy-efficient machine learning-specific ASICs in the data center to AI edge processors that provide faster response times. Accordingly, in order to improve efficiency, these hardware devices use customized and unique algorithms and software libraries.
But on the other hand, if there is no universal compiler to bridge different hardware devices to the multiple frameworks in use today (such as TensorFlow, PyTorch), people will need to put in a lot of effort to Run machine learning efficiently. In practice, developers must manually optimize model operations for each hardware target. This means using custom software libraries or writing device-specific code requires domain expertise.
This is a paradox, using proprietary technology for efficiency only results in siled, non-generalizable paths across frameworks and hardware resulting in high maintenance costs and in turn vendor lock-in , slowing down the progress of machine learning development.
The OpenXLA project provides a state-of-the-art ML compiler that scales across the complexity of ML infrastructure. Its core pillars are performance, scalability, portability, flexibility and ease of use. With OpenXLA, we aspire to realize the greater potential of AI in the real world by accelerating the development and delivery of AI.
OpenXLA aims to:
The challenges we face today in machine learning infrastructure are enormous and no one organization can effectively do it alone address these challenges. The OpenXLA community brings together developers and industry leaders operating at different levels of the AI stack—from frameworks to compilers, runtimes, and chips—and is therefore ideally suited to address the fragmentation we see in the ML space.
As an open source project, OpenXLA adheres to the following principles:
OpenXLA removes barriers for machine learning developers with a modular tool chain that makes it universal The compiler interface is supported by all leading frameworks, leverages portable standardized model representations, and provides domain-specific compilers with powerful target-specific and hardware-specific optimizations. The toolchain includes XLA, StableHLO, and IREE, all of which leverage MLIR: a compiler infrastructure that enables machine learning models to be represented, optimized, and executed consistently on hardware.
Scope of Machine Learning Use Cases
Current usage of OpenXLA spans the range of ML use cases, including full training of models such as DeepMind’s AlphaFold, GPT2 and Swin Transformer on Alibaba Cloud, as well as multi-modal training on Amazon.com LLM training. Customers such as Waymo leverage OpenXLA for in-vehicle real-time inference. Additionally, OpenXLA is used to optimize Stable Diffusion services on local machines equipped with AMD RDNA™ 3.
Best Performance, Out of the Box
OpenXLA eliminates the need for developers to write device-specific code, You can easily speed up model performance. It features overall model optimization capabilities, including simplifying algebraic expressions, optimizing in-memory data layout, and improving scheduling to reduce peak memory usage and communication overhead. Advanced operator fusion and kernel generation help improve device utilization and reduce memory bandwidth requirements.
Easily scale workloads
Developing efficient parallelization algorithms is time-consuming and requires expertise. With features like GSPMD, developers only need to annotate a subset of key tensors, which can then be used by the compiler to automatically generate parallel computations. This eliminates the significant effort required to partition and efficiently parallelize models across multiple hardware hosts and accelerators.
Portability and Option
OpenXLA provides out-of-the-box support for a variety of hardware devices, Including AMD and NVIDIA GPUs, x86 CPUs and Arm architectures, and ML accelerators such as Google TPU, AWS Trainium and Inferentia, Graphcore IPU, Cerebras Wafer-Scale Engine, and more. OpenXLA also supports TensorFlow, PyTorch, and JAX through StableHLO, a portable layer used as an input format for OpenXLA.
flexibility
OpenXLA provides users with the flexibility to manually adjust model hotspots. Extension mechanisms such as custom calls enable users to write deep learning primitives in CUDA, HIP, SYCL, Triton, and other kernel languages to take full advantage of hardware features.
StableHLO
StableHLO is a portability layer between ML frameworks and ML compilers and is a support A set of high-level operations (HLO) operations for dynamics, quantization, and sparsity. Additionally, it can be serialized to MLIR bytecode to provide compatibility guarantees. All major ML frameworks (JAX, PyTorch, TensorFlow) can produce StableHLO. In 2023, Google plans to work closely with the PyTorch team to achieve integration with PyTorch version 2.0.
The above is the detailed content of Unified AI development: Google OpenXLA is open source and integrates all frameworks and AI chips. For more information, please follow other related articles on the PHP Chinese website!