Omar AX LoRA Adapter for SKT A.X-4.0-Light
This is a LoRA adapter fine-tuned on top of skt/A.X-4.0-Light, SKT's Korean language model.
Model Details
- Base Model: skt/A.X-4.0-Light
- LoRA Rank: 24
- LoRA Alpha: 12
- LoRA Dropout: 0.1
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Task Type: CAUSAL_LM
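For reference, the settings above correspond to the following `peft.LoraConfig` (a sketch reconstructed from this list; the authoritative values live in the adapter's `adapter_config.json`):

```python
from peft import LoraConfig

# LoRA hyperparameters as listed in Model Details above.
lora_config = LoraConfig(
    r=24,
    lora_alpha=12,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```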
Usage with vLLM
The adapter loads at vLLM server startup. The startup log originally pasted here (vLLM 0.10.1) recorded these non-default arguments: base model `skt/A.X-4.0-Light` (resolved architecture `Qwen2ForCausalLM`, max model length 16384), `enable_lora=True`, and a LoRA module named `omar_ax` pointing at `playdatakoo/omar-ax-lora-skt-4.0`. Reconstructed as a command line, that is roughly `vllm serve skt/A.X-4.0-Light --enable-lora --lora-modules omar_ax=playdatakoo/omar-ax-lora-skt-4.0`; since this adapter's rank is 24 and vLLM caps LoRA rank at 16 by default, you will likely also need `--max-lora-rank 32`.

That logged run aborted with `ValueError: Free memory on device (2.96/44.34 GiB) on startup is less than desired GPU memory utilization (0.9, 39.91 GiB)`. If you hit the same error, lower `--gpu-memory-utilization` or free GPU memory held by other processes.
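For offline inference you can also attach the adapter per request through vLLM's Python API. A minimal sketch, assuming vLLM with LoRA support installed; `max_lora_rank=32` is needed because the default cap of 16 is below this adapter's rank of 24:

```python
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Fetch the adapter locally; LoRARequest takes a local path.
lora_path = snapshot_download("playdatakoo/omar-ax-lora-skt-4.0")

# Raise max_lora_rank to the next supported size above the adapter's rank (24).
llm = LLM(model="skt/A.X-4.0-Light", enable_lora=True, max_lora_rank=32)

outputs = llm.generate(
    ["대한민국의 수도는 어디인가요?"],  # "What is the capital of South Korea?"
    SamplingParams(temperature=0.7, max_tokens=128),
    lora_request=LoRARequest("omar_ax", 1, lora_path),
)
print(outputs[0].outputs[0].text)
```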
API Usage
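Once the server is up, the adapter is reachable through vLLM's OpenAI-compatible endpoints under its registered module name. A sketch using the official `openai` client; the `localhost:8000` URL is vLLM's default, and the dummy API key assumes an unauthenticated local server:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# model="omar_ax" routes through the LoRA adapter;
# model="skt/A.X-4.0-Light" targets the plain base model instead.
resp = client.chat.completions.create(
    model="omar_ax",
    messages=[{"role": "user", "content": "대한민국의 수도는 어디인가요?"}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```

Switching the `model` field between the adapter name and the base model name per request is the base/LoRA hot-swap measured under Performance below.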
Usage with PEFT
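Outside vLLM, the adapter can be applied with transformers + peft. A minimal sketch, assuming bfloat16 (the dtype the vLLM log recorded) and that the base model ships a chat template; the tokenizer is loaded from this repo, since tokenizer files are included (see Files below):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then wrap it with the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "skt/A.X-4.0-Light", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("playdatakoo/omar-ax-lora-skt-4.0")
model = PeftModel.from_pretrained(base, "playdatakoo/omar-ax-lora-skt-4.0")
model.eval()

messages = [{"role": "user", "content": "대한민국의 수도는 어디인가요?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    out = model.generate(input_ids=input_ids, max_new_tokens=128)
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```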
Performance
- Hot-swap Speed: 0.18s (switching between base and LoRA)
- GPU Memory: ~14.8 GiB
- Inference Speed: 0.18-0.19s per request after warmup
Files
- `adapter_config.json`: LoRA configuration
- `adapter_model.safetensors`: LoRA weights (SafeTensors format)
- Tokenizer files included for convenience