Process audio and generate text output based on instructions
Calculate memory usage from model configurations
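
As a rough illustration of the memory calculation, here is a minimal sketch that estimates weight memory as parameter count times bytes per parameter. The config keys (`num_parameters`, `dtype`), the helper names, and the formula itself are assumptions made for this example and are not taken from the source; a real estimate would also need to account for activations, optimizer state, and caches.

```python
# Hypothetical sketch: the config keys and the formula are illustrative assumptions.

# Bytes needed to store one parameter in each supported data type.
BYTES_PER_DTYPE = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def estimate_weight_memory_bytes(config: dict) -> int:
    """Estimate memory for model weights only: parameters x bytes per parameter."""
    num_params = config["num_parameters"]
    dtype = config.get("dtype", "float32")  # assume float32 when unspecified
    return num_params * BYTES_PER_DTYPE[dtype]


def format_gib(num_bytes: int) -> str:
    """Render a byte count in GiB (2**30 bytes) with two decimals."""
    return f"{num_bytes / 2**30:.2f} GiB"


if __name__ == "__main__":
    # Hypothetical configuration for a 7-billion-parameter model in half precision.
    example = {"num_parameters": 7_000_000_000, "dtype": "float16"}
    print(format_gib(estimate_weight_memory_bytes(example)))
```

The lookup table keeps the dtype-to-size mapping explicit, so an unsupported dtype raises a `KeyError` instead of silently producing a wrong estimate.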