SageMaker generation speed, timeouts

#33
by elanmarkowitz - opened

I deployed an endpoint on SageMaker using an ml.g5.48xlarge instance.

However, generation seems much slower than with other models, and requests frequently time out.

Has anyone else seen this issue, or does anyone know of ways to increase generation speed?

Deployed using this image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.1-gpu-py310-cu121-ubuntu20.04-v1.0

@elanmarkowitz What does your config look like?
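For reference, "config" here most likely means the environment variables passed to the TGI container plus the arguments given to `deploy()`. Below is a minimal sketch of what that usually looks like, not the configuration used in this thread. The helper name `build_tgi_env`, the placeholder model ID, and the token limits are all hypothetical; the GPU count of 8 assumes all GPUs on an ml.g5.48xlarge are used for sharding.

```python
import json


def build_tgi_env(model_id: str, num_gpus: int,
                  max_input_length: int, max_total_tokens: int) -> dict:
    """Build the environment dict for the Hugging Face TGI container.

    Hypothetical helper for illustration. The variable names are the ones
    the TGI SageMaker image reads; the values are assumptions.
    """
    # The prompt budget must be smaller than the total budget
    # (prompt tokens + generated tokens).
    if max_input_length >= max_total_tokens:
        raise ValueError("max_input_length must be smaller than max_total_tokens")
    # SageMaker passes environment variables as strings.
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
    }


# Placeholder model ID and illustrative limits (assumptions, not from the thread).
env = build_tgi_env("<your-model-id>", num_gpus=8,
                    max_input_length=2048, max_total_tokens=4096)
print(json.dumps(env, indent=2))

# With the SageMaker Python SDK installed, this dict would typically be passed as:
#   from sagemaker.huggingface import HuggingFaceModel
#   model = HuggingFaceModel(image_uri=<image above>, env=env, role=<your IAM role>)
#   model.deploy(initial_instance_count=1,
#                instance_type="ml.g5.48xlarge",
#                container_startup_health_check_timeout=600)  # illustrative value
```

Posting the equivalent of this dict and the `deploy()` arguments makes it easier to tell whether the model is actually sharded across all GPUs. Note also that SageMaker real-time endpoints cap each invocation at 60 seconds, so long generations can time out even when the container itself is healthy.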
