| T | |
| Tensor Parallelism (TP) | Parallelism technique for training on multiple GPUs in which each tensor is split up into multiple chunks, so instead of having the whole tensor reside on a single GPU, each shard of the tensor resides on its designated GPU. |
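A minimal single-process sketch of the idea, assuming a column-wise split of a linear layer's weight across two simulated ranks (the `num_gpus` count and shapes here are illustrative; a real setup would place each shard on its own device and combine partial outputs with a collective such as all-gather):

```python
import torch

torch.manual_seed(0)
num_gpus = 2                       # simulated tensor-parallel ranks (assumption)
x = torch.randn(4, 8)              # input batch: (batch, in_features)
full_weight = torch.randn(8, 16)   # full weight: (in_features, out_features)

# Shard the weight along the output dimension: each rank holds its own chunk of columns.
shards = torch.chunk(full_weight, num_gpus, dim=1)

# Each rank multiplies the (replicated) input by only its own weight shard.
partial_outputs = [x @ w_shard for w_shard in shards]

# Reassemble the full output; in a real setup this would be an all-gather across ranks.
y_parallel = torch.cat(partial_outputs, dim=1)

# The sharded computation matches the unsharded one.
assert torch.allclose(y_parallel, x @ full_weight)
```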