Papers
arxiv:2302.11405

ML-driven Hardware Cost Model for MLIR

Published on Feb 14, 2023
Authors:

Abstract

During early optimization passes, compilers must make predictions about machine-dependent characteristics such as execution unit utilization, number of register spills, latency, and throughput in order to generate better code. Often a hand-written static/analytical hardware cost model is built into the compiler. However, the need for more sophisticated and varied predictions has become more pronounced with the development of deep learning compilers, which need to optimize dataflow graphs. Such compilers usually employ a much higher-level MLIR form as an IR representation before lowering to traditional LLVM IR. A static/analytical cost model in such a scenario is cumbersome and error-prone, as the opcodes represent very high-level algebraic/arithmetic operations. Hence, we develop a machine learning-based cost model for high-level MLIR which can predict different target variables of interest such as CPU/GPU/xPU utilization, instructions executed, and register usage. By treating the incoming MLIR as a text input, à la NLP models, we can apply well-known techniques from modern NLP research to predict hardware characteristics more accurately. We expect such precise ML-driven hardware cost models to guide our deep learning compiler in graph-level optimizations such as operator fusion, local memory allocation, and kernel scheduling, as well as in many kernel-level optimizations such as loop interchange, LICM, and unrolling. We report early work-in-progress results of developing such models on high-level MLIR representing dataflow graphs emitted by PyTorch/TensorFlow-like frameworks, as well as on lower-level dialects such as affine. We show that these models can provide reasonably good estimates with low error bounds for various hardware characteristics of interest and can be a go-to mechanism for hardware cost modelling in the future.
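To make the "MLIR as text" idea concrete, here is a minimal sketch of what such a learned cost model could look like. This is not the paper's actual architecture; the tokenizer, model sizes, vocabulary, and target names (cpu_utilization, instructions_executed, register_usage) are all illustrative assumptions. It encodes the textual MLIR of a kernel with a small Transformer and regresses several hardware cost targets at once, in PyTorch:

```python
# Hypothetical sketch, assuming a Transformer-based regressor over tokenized
# MLIR text. All hyperparameters and target names are illustrative.
import torch
import torch.nn as nn

TARGETS = ["cpu_utilization", "instructions_executed", "register_usage"]

class MLIRCostModel(nn.Module):
    def __init__(self, vocab_size=8192, d_model=256, n_heads=4,
                 n_layers=4, max_len=1024, n_targets=len(TARGETS)):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_targets)  # one scalar per target

    def forward(self, token_ids):                   # (batch, seq_len)
        seq_len = token_ids.size(1)
        pos = torch.arange(seq_len, device=token_ids.device)
        x = self.tok_emb(token_ids) + self.pos_emb(pos)
        x = self.encoder(x)
        # Mean-pool the token embeddings, then predict all targets jointly.
        return self.head(x.mean(dim=1))

def tokenize(mlir_text, vocab, max_len=1024):
    """Toy whitespace tokenizer; a real model would likely use a learned
    subword vocabulary over MLIR opcodes, types, and attributes."""
    ids = [vocab.get(tok, 0) for tok in mlir_text.split()][:max_len]
    return torch.tensor([ids], dtype=torch.long)

# Usage sketch: the predicted costs could feed back into the compiler's
# pass pipeline, e.g. to rank candidate fusion or loop-interchange choices.
vocab = {"affine.for": 1, "arith.mulf": 2, "arith.addf": 3}
model = MLIRCostModel()
ids = tokenize("affine.for %i = 0 to 128 { arith.mulf arith.addf }", vocab)
costs = model(ids)  # shape (1, 3); untrained here, so values are meaningless
```

The multi-target regression head reflects the abstract's goal of predicting several hardware characteristics from a single shared encoding of the IR, rather than training one model per target.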
