Has anyone got this running with vLLM using the vllm/vllm-openai:gptoss image?
#1 opened by doramonk
I got this error. Relevant packages from pip list:
pytorch-triton 3.4.0+git11ec6354
transformers 4.55.0
triton 3.4.0+git663e04e8
triton_kernels 1.0.0
vllm 0.10.1+gptoss
(EngineCore_0 pid=35) INFO 08-26 07:04:43 [parallel_state.py:1102] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_0 pid=35) WARNING 08-26 07:04:44 [__init__.py:2073] The following intended overrides are not keyword args and will be dropped: {'truncation'}
(EngineCore_0 pid=35) WARNING 08-26 07:04:44 [registry.py:183] InternVLProcessor did not return `BatchFeature`. Make sure to match the behaviour of `ProcessorMixin` when implementing custom processors.
(EngineCore_0 pid=35) WARNING 08-26 07:04:44 [__init__.py:2073] The following intended overrides are not keyword args and will be dropped: {'truncation'}
(EngineCore_0 pid=35) INFO 08-26 07:04:44 [topk_topp_sampler.py:49] Using FlashInfer for top-p & top-k sampling.
(EngineCore_0 pid=35) INFO 08-26 07:04:44 [gpu_model_runner.py:1913] Starting to load model OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview...
(EngineCore_0 pid=35) INFO 08-26 07:04:45 [gpu_model_runner.py:1945] Loading model from scratch...
(EngineCore_0 pid=35) INFO 08-26 07:04:45 [cuda.py:286] Using Triton backend on V1 engine.
(EngineCore_0 pid=35) WARNING 08-26 07:04:45 [rocm.py:29] Failed to import from amdsmi with ModuleNotFoundError("No module named 'amdsmi'")
(EngineCore_0 pid=35) WARNING 08-26 07:04:45 [rocm.py:40] Failed to import from vllm._rocm_C with ModuleNotFoundError("No module named 'vllm._rocm_C'")
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] EngineCore failed to start.
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] Traceback (most recent call last):
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 510, in __init__
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] self.model_executor = executor_class(vllm_config)
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] self._init_executor()
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 49, in _init_executor
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] self.collective_rpc("load_model")
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2948, in run_method
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] return func(*args, **kwargs)
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 211, in load_model
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1946, in load_model
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] self.model = model_loader.load_model(
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 44, in load_model
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] model = initialize_model(vllm_config=vllm_config,
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/internvl.py", line 1048, in __init__
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] self.language_model = init_vllm_registered_model(
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 316, in init_vllm_registered_model
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] return initialize_model(vllm_config=vllm_config, prefix=prefix)
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gpt_oss.py", line 241, in __init__
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] self.model = GptOssModel(
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 183, in __init__
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gpt_oss.py", line 214, in __init__
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] TransformerBlock(
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gpt_oss.py", line 183, in __init__
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] self.attn = OAIAttention(config, prefix=f"{prefix}.attn")
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gpt_oss.py", line 64, in __init__
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] config.rope_ntk_beta,
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 211, in __getattribute__
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] return super().__getattribute__(key)
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=35) ERROR 08-26 07:04:45 [core.py:718] AttributeError: 'GptOssConfig' object has no attribute 'rope_ntk_beta'. Did you mean: 'rope_theta'?
[rank0]:[W826 07:04:46.900869145 ProcessGroupNCCL.cpp:1522] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=13) Traceback (most recent call last):
(APIServer pid=13) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=13) sys.exit(main())
(APIServer pid=13) ^^^^^^
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=13) args.dispatch_function(args)
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 50, in cmd
(APIServer pid=13) uvloop.run(run_server(args))
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=13) return __asyncio.run(
(APIServer pid=13) ^^^^^^^^^^^^^^
(APIServer pid=13) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=13) return runner.run(main)
(APIServer pid=13) ^^^^^^^^^^^^^^^^
(APIServer pid=13) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=13) return self._loop.run_until_complete(task)
(APIServer pid=13) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=13) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=13) return await main
(APIServer pid=13) ^^^^^^^^^^
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1827, in run_server
(APIServer pid=13) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1847, in run_server_worker
(APIServer pid=13) async with build_async_engine_client(
(APIServer pid=13) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=13) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=13) return await anext(self.gen)
(APIServer pid=13) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 167, in build_async_engine_client
(APIServer pid=13) async with build_async_engine_client_from_engine_args(
(APIServer pid=13) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=13) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=13) return await anext(self.gen)
(APIServer pid=13) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 209, in build_async_engine_client_from_engine_args
(APIServer pid=13) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=13) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 1520, in inner
(APIServer pid=13) return fn(*args, **kwargs)
(APIServer pid=13) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 173, in from_vllm_config
(APIServer pid=13) return cls(
(APIServer pid=13) ^^^^
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 119, in __init__
(APIServer pid=13) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=13) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 101, in make_async_mp_client
(APIServer pid=13) return AsyncMPClient(*client_args)
(APIServer pid=13) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 733, in __init__
(APIServer pid=13) super().__init__(
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 421, in __init__
(APIServer pid=13) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=13) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=13) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=13) next(self.gen)
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 697, in launch_core_engines
(APIServer pid=13) wait_for_engine_startup(
(APIServer pid=13) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 750, in wait_for_engine_startup
(APIServer pid=13) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=13) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
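For context, the failure is inside OAIAttention.__init__ in vLLM's gpt_oss.py, which reads rope_ntk_beta (and the related NTK/YaRN attributes) directly off the config, while this checkpoint's GptOssConfig apparently only carries its YaRN parameters under rope_scaling. Below is a minimal sketch of the kind of fallback that addresses it; the beta_fast/beta_slow mapping and the default values are assumptions on my part, not the exact contents of the patched file from the model card.

# Hypothetical helper illustrating the kind of edit to gpt_oss.py:
# fall back to the YaRN values under config.rope_scaling when the flat
# rope_ntk_alpha / rope_ntk_beta attributes are missing from GptOssConfig.
def resolve_ntk_params(config):
    rope_scaling = getattr(config, "rope_scaling", None) or {}
    ntk_alpha = getattr(config, "rope_ntk_alpha", rope_scaling.get("beta_slow", 1.0))
    ntk_beta = getattr(config, "rope_ntk_beta", rope_scaling.get("beta_fast", 32.0))
    return ntk_alpha, ntk_beta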
I edited /usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gpt_oss.py to work around that attribute error and hit a new error:
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_0 pid=36) INFO 08-26 07:28:52 [parallel_state.py:1102] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_0 pid=36) WARNING 08-26 07:28:54 [__init__.py:2073] The following intended overrides are not keyword args and will be dropped: {'truncation'}
(EngineCore_0 pid=36) WARNING 08-26 07:28:54 [registry.py:183] InternVLProcessor did not return `BatchFeature`. Make sure to match the behaviour of `ProcessorMixin` when implementing custom processors.
(EngineCore_0 pid=36) WARNING 08-26 07:28:54 [__init__.py:2073] The following intended overrides are not keyword args and will be dropped: {'truncation'}
(EngineCore_0 pid=36) INFO 08-26 07:28:54 [topk_topp_sampler.py:49] Using FlashInfer for top-p & top-k sampling.
(EngineCore_0 pid=36) INFO 08-26 07:28:54 [gpu_model_runner.py:1913] Starting to load model OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview...
(EngineCore_0 pid=36) INFO 08-26 07:28:54 [gpu_model_runner.py:1945] Loading model from scratch...
(EngineCore_0 pid=36) INFO 08-26 07:28:54 [cuda.py:286] Using Triton backend on V1 engine.
(EngineCore_0 pid=36) WARNING 08-26 07:28:54 [rocm.py:29] Failed to import from amdsmi with ModuleNotFoundError("No module named 'amdsmi'")
(EngineCore_0 pid=36) WARNING 08-26 07:28:54 [rocm.py:40] Failed to import from vllm._rocm_C with ModuleNotFoundError("No module named 'vllm._rocm_C'")
(EngineCore_0 pid=36) INFO 08-26 07:28:54 [triton_attn.py:263] Using vllm unified attention for TritonAttentionImpl
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] EngineCore failed to start.
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] Traceback (most recent call last):
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 510, in __init__
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] self.model_executor = executor_class(vllm_config)
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] self._init_executor()
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 49, in _init_executor
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] self.collective_rpc("load_model")
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2948, in run_method
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] return func(*args, **kwargs)
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 211, in load_model
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1946, in load_model
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] self.model = model_loader.load_model(
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 44, in load_model
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] model = initialize_model(vllm_config=vllm_config,
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/internvl.py", line 1048, in __init__
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] self.language_model = init_vllm_registered_model(
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 316, in init_vllm_registered_model
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] return initialize_model(vllm_config=vllm_config, prefix=prefix)
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gpt_oss.py", line 259, in __init__
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] self.model = GptOssModel(
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] ^^^^^^^^^^^^
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 183, in __init__
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gpt_oss.py", line 213, in __init__
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] TransformerBlock(
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gpt_oss.py", line 183, in __init__
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] self.mlp = MLPBlock(config,
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] ^^^^^^^^^^^^^^^^
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gpt_oss.py", line 153, in __init__
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] self.experts = FusedMoE(num_experts=config.num_local_experts,
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=36) ERROR 08-26 07:28:55 [core.py:718] TypeError: FusedMoE.__init__() got an unexpected keyword argument 'has_bias'
[rank0]:[W826 07:28:55.362850557 ProcessGroupNCCL.cpp:1522] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=14) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
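The second failure is a signature mismatch rather than a config issue: the edited gpt_oss.py passes has_bias to FusedMoE, but the FusedMoE bundled in the gptoss image predates that argument. A quick way to check what the installed build accepts (the import path is an assumption based on how gpt_oss.py itself imports the layer):

# Quick diagnostic: does the installed FusedMoE accept has_bias?
# The import path is assumed; adjust it if your vLLM build differs.
import inspect
from vllm.model_executor.layers.fused_moe import FusedMoE

print("has_bias" in inspect.signature(FusedMoE.__init__).parameters)

If this prints False, the base image's FusedMoE is too old for the patched gpt_oss.py, which is consistent with the workaround below of starting from the vllm/vllm-openai:v0.10.1.1 image instead.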
I got this running by first pulling the vllm/vllm-openai:v0.10.1.1 Docker image. Next, you need to update the gpt_oss.py file as instructed on the model card. I did this by creating a new custom image from this Dockerfile:
# Start with the official vLLM image
FROM vllm/vllm-openai:v0.10.1.1
# Copy the modified file to the correct path inside the container
COPY gpt_oss.py /usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gpt_oss.py
Note that the code for gpt_oss.py is in the vLLM section of the model card (https://huggingface.co/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview).
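Once the custom image is built and serving the model, a quick sanity check against the OpenAI-compatible endpoint can look like the sketch below; the base URL, placeholder API key, and served model name are assumptions based on default vllm serve settings.

# Minimal sanity check against a running vLLM OpenAI-compatible server.
# Base URL, API key, and model name are assumptions (vLLM defaults).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(resp.choices[0].message.content)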