AI & ML interests

AI inference, AI in the cloud, AI at the edge, software acceleration of AI workloads on hardware, efficient AI deployments, GPU-free AI inference, AI model optimization.

AmpereComputing's collections (18)

DeepSeek R1
Ampere's quantization formats (Q4_K_4 / Q8R16) require the Ampere-optimized build of llama.cpp, available at: https://hub.docker.com/r/amperecomputingai/llama.cpp