Chapter 3. Known issues
- When you deploy a model on an NVIDIA T4 accelerator using the Red Hat AI Inference Server 3.1 container image with the --enable-chunked-prefill flag enabled, any chat or text completion query causes the model inference to crash and restart. To work around this issue, remove the --enable-chunked-prefill flag when you deploy the model.
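As an illustrative sketch of the workaround, a container deployment command without the problematic flag might look like the following. The registry path, image tag, port, and model name are placeholders for this example, not values taken from this document:

```shell
# Hypothetical deployment on a T4 host with the workaround applied.
# Replace the image reference and <model-name> with your actual values.
podman run --rm -it \
  --device nvidia.com/gpu=all \
  -p 8000:8000 \
  registry.example.com/rhaiis/vllm:3.1 \
  --model <model-name> \
  --max-model-len 4096
# Note: --enable-chunked-prefill is intentionally omitted above,
# per the known issue with NVIDIA T4 accelerators.
```

If your existing deployment manifest or launch script includes --enable-chunked-prefill, deleting that single flag and redeploying is sufficient; no other settings need to change.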