[2025-01-17 17:09:47,023 I 28071 28071] (gcs_server) gcs_server_main.cc:52: Ray cluster metadata ray_version=2.40.0 ray_commit=22541c38dbef25286cd6d19f1c151bf4fd62f2ed
[2025-01-17 17:09:47,023 I 28071 28071] (gcs_server) io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2025-01-17 17:09:47,028 I 28071 28071] (gcs_server) event.cc:493: Ray Event initialized for GCS
[2025-01-17 17:09:47,029 I 28071 28071] (gcs_server) event.cc:493: Ray Event initialized for EXPORT_NODE
[2025-01-17 17:09:47,029 I 28071 28071] (gcs_server) event.cc:493: Ray Event initialized for EXPORT_ACTOR
[2025-01-17 17:09:47,029 I 28071 28071] (gcs_server) event.cc:493: Ray Event initialized for EXPORT_DRIVER_JOB
[2025-01-17 17:09:47,029 I 28071 28071] (gcs_server) event.cc:324: Set ray event level to warning
[2025-01-17 17:09:47,035 I 28071 28071] (gcs_server) gcs_server.cc:73: GCS storage type is StorageType::IN_MEMORY
[2025-01-17 17:09:47,036 I 28071 28071] (gcs_server) gcs_init_data.cc:42: Loading job table data.
[2025-01-17 17:09:47,036 I 28071 28071] (gcs_server) gcs_init_data.cc:54: Loading node table data.
[2025-01-17 17:09:47,036 I 28071 28071] (gcs_server) gcs_init_data.cc:80: Loading actor table data.
[2025-01-17 17:09:47,036 I 28071 28071] (gcs_server) gcs_init_data.cc:93: Loading actor task spec table data.
[2025-01-17 17:09:47,036 I 28071 28071] (gcs_server) gcs_init_data.cc:66: Loading placement group table data.
[2025-01-17 17:09:47,036 I 28071 28071] (gcs_server) gcs_init_data.cc:46: Finished loading job table data, size = 0
[2025-01-17 17:09:47,036 I 28071 28071] (gcs_server) gcs_init_data.cc:58: Finished loading node table data, size = 0
[2025-01-17 17:09:47,036 I 28071 28071] (gcs_server) gcs_init_data.cc:84: Finished loading actor table data, size = 0
[2025-01-17 17:09:47,036 I 28071 28071] (gcs_server) gcs_init_data.cc:97: Finished loading actor task spec table data, size = 0
[2025-01-17 17:09:47,036 I 28071 28071] (gcs_server) gcs_init_data.cc:71: Finished loading placement group table data, size = 0
[2025-01-17 17:09:47,036 I 28071 28071] (gcs_server) gcs_server.cc:162: No existing server cluster ID found. Generating new ID: 4abe1c98cef6db5bdc0214c42f730bab4c1e2feec081023be1a12880
[2025-01-17 17:09:47,037 I 28071 28071] (gcs_server) gcs_server.cc:644: Autoscaler V2 enabled: 0
[2025-01-17 17:09:47,042 I 28071 28071] (gcs_server) grpc_server.cc:134: GcsServer server started, listening on port 52229.
[2025-01-17 17:09:47,313 I 28071 28071] (gcs_server) gcs_server.cc:245: Gcs Debug state:
GcsNodeManager:
- RegisterNode request count: 0
- DrainNode request count: 0
- GetAllNodeInfo request count: 0
GcsActorManager:
- RegisterActor request count: 0
- CreateActor request count: 0
- GetActorInfo request count: 0
- GetNamedActorInfo request count: 0
- GetAllActorInfo request count: 0
- KillActor request count: 0
- ListNamedActors request count: 0
- Registered actors count: 0
- Destroyed actors count: 0
- Named actors count: 0
- Unresolved actors count: 0
- Pending actors count: 0
- Created actors count: 0
- owners_: 0
- actor_to_register_callbacks_: 0
- actor_to_restart_callbacks_: 0
- actor_to_create_callbacks_: 0
- sorted_destroyed_actor_list_: 0
GcsResourceManager:
- GetAllAvailableResources request count: 0
- GetAllTotalResources request count: 0
- GetAllResourceUsage request count: 0
GcsPlacementGroupManager:
- CreatePlacementGroup request count: 0
- RemovePlacementGroup request count: 0
- GetPlacementGroup request count: 0
- GetAllPlacementGroup request count: 0
- WaitPlacementGroupUntilReady request count: 0
- GetNamedPlacementGroup request count: 0
- Scheduling pending placement group count: 0
- Registered placement groups count: 0
- Named placement group count: 0
- Pending placement groups count: 0
- Infeasible placement groups count: 0
Publisher:
[runtime env manager] ID to URIs table:
[runtime env manager] URIs reference table:
GcsTaskManager:
-Total num task events reported: 0
-Total num status task events dropped: 0
-Total num profile events dropped: 0
-Current num of task events stored: 0
-Total num of actor creation tasks: 0
-Total num of actor tasks: 0
-Total num of normal tasks: 0
-Total num of driver tasks: 0
GcsAutoscalerStateManager:
- last_seen_autoscaler_state_version_: 0
- last_cluster_resource_state_version_: 0
- pending demands:
[2025-01-17 17:09:47,314 I 28071 28071] (gcs_server) gcs_server.cc:843: Main service Event stats:
Global stats: 25 total (5 active)
Queueing time: mean = 99.421 ms, max = 275.096 ms, min = 3.239 us, total = 2.486 s
Execution time: mean = 11.073 ms, total = 276.826 ms
Event stats:
GcsInMemoryStore.Put - 9 total (0 active), Execution time: mean = 30.569 ms, total = 275.123 ms, Queueing time: mean = 213.145 ms, max = 274.410 ms, min = 3.239 us, total = 1.918 s
GcsInMemoryStore.GetAll - 5 total (0 active), Execution time: mean = 13.003 us, total = 65.016 us, Queueing time: mean = 81.917 us, max = 88.340 us, min = 74.930 us, total = 409.584 us
PeriodicalRunner.RunFnPeriodically - 4 total (2 active, 1 running), Execution time: mean = 3.824 us, total = 15.294 us, Queueing time: mean = 137.508 ms, max = 275.096 ms, min = 274.937 ms, total = 550.033 ms
event_loop_lag_probe - 2 total (0 active), Execution time: mean = 11.691 us, total = 23.383 us, Queueing time: mean = 6.517 ms, max = 12.751 ms, min = 283.093 us, total = 13.034 ms
GcsInMemoryStore.Get - 1 total (0 active), Execution time: mean = 19.887 us, total = 19.887 us, Queueing time: mean = 4.575 us, max = 4.575 us, min = 4.575 us, total = 4.575 us
NodeInfoGcsService.grpc_server.GetClusterId - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeInfoGcsService.grpc_server.GetClusterId.HandleRequestImpl - 1 total (0 active), Execution time: mean = 1.579 ms, total = 1.579 ms, Queueing time: mean = 3.746 ms, max = 3.746 ms, min = 3.746 ms, total = 3.746 ms
RayletLoadPulled - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[2025-01-17 17:09:47,314 I 28071 28071] (gcs_server) gcs_server.cc:847: task_io_context Event stats:
Global stats: 5 total (1 active)
Queueing time: mean = 448.546 us, max = 1.420 ms, min = 11.735 us, total = 2.243 ms
Execution time: mean = 476.681 us, total = 2.383 ms
Event stats:
event_loop_lag_probe - 3 total (0 active), Execution time: mean = 789.405 us, total = 2.368 ms, Queueing time: mean = 714.334 us, max = 1.420 ms, min = 11.735 us, total = 2.143 ms
GcsTaskManager.GcJobSummary - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
PeriodicalRunner.RunFnPeriodically - 1 total (0 active), Execution time: mean = 15.188 us, total = 15.188 us, Queueing time: mean = 99.728 us, max = 99.728 us, min = 99.728 us, total = 99.728 us
[2025-01-17 17:09:47,314 I 28071 28071] (gcs_server) gcs_server.cc:847: pubsub_io_context Event stats:
Global stats: 5 total (1 active)
Queueing time: mean = 2.030 ms, max = 8.656 ms, min = 9.050 us, total = 10.152 ms
Execution time: mean = 24.955 us, total = 124.775 us
Event stats:
event_loop_lag_probe - 3 total (0 active), Execution time: mean = 32.790 us, total = 98.370 us, Queueing time: mean = 2.904 ms, max = 8.656 ms, min = 9.050 us, total = 8.712 ms
Publisher.CheckDeadSubscribers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
PeriodicalRunner.RunFnPeriodically - 1 total (0 active), Execution time: mean = 26.405 us, total = 26.405 us, Queueing time: mean = 1.441 ms, max = 1.441 ms, min = 1.441 ms, total = 1.441 ms
[2025-01-17 17:09:47,314 I 28071 28071] (gcs_server) gcs_server.cc:847: ray_syncer_io_context Event stats:
Global stats: 5 total (0 active)
Queueing time: mean = 453.947 us, max = 2.190 ms, min = 8.269 us, total = 2.270 ms
Execution time: mean = 485.215 us, total = 2.426 ms
Event stats:
event_loop_lag_probe - 3 total (0 active), Execution time: mean = 807.981 us, total = 2.424 ms, Queueing time: mean = 738.633 us, max = 2.190 ms, min = 8.269 us, total = 2.216 ms
RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.065 us, total = 2.130 us, Queueing time: mean = 26.919 us, max = 29.631 us, min = 24.208 us, total = 53.839 us
[2025-01-17 17:09:48,595 I 28071 28071] (gcs_server) gcs_node_manager.cc:85: Registering node info, address = 192.168.0.2, node name = 192.168.0.2 node_id=0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207
[2025-01-17 17:09:48,595 I 28071 28071] (gcs_server) gcs_node_manager.cc:91: Finished registering node info, address = 192.168.0.2, node name = 192.168.0.2, is_head_node = 1 node_id=0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207
[2025-01-17 17:09:48,595 I 28071 28071] (gcs_server) gcs_placement_group_manager.cc:819: A new node: 0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207 registered, will try to reschedule all the infeasible placement groups.
[2025-01-17 17:09:48,602 I 28071 28156] (gcs_server) ray_syncer.cc:377: Get connection node_id=0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207
[2025-01-17 17:09:55,315 I 28071 28071] (gcs_server) gcs_job_manager.cc:90: Adding job, job id = 01000000, driver pid = 28003
[2025-01-17 17:09:55,315 I 28071 28071] (gcs_server) gcs_job_manager.cc:111: Finished adding job, job id = 01000000, driver pid = 28003
[2025-01-17 17:09:57,058 W 28071 28094] (gcs_server) metric_exporter.cc:105: [1] Export metrics to agent failed: RpcError: RPC Error message: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:35011: Failed to connect to remote host: Connection refused; RPC Error details: . This won't affect Ray, but you can lose metrics from the cluster.
[2025-01-17 17:10:01,310 I 28071 28071] (gcs_server) gcs_actor_manager.cc:393: Registering actor job_id=01000000 actor_id=fbd2f8f7cc7e7b4ac0fd0b3201000000
[2025-01-17 17:10:01,310 I 28071 28071] (gcs_server) gcs_actor_manager.cc:398: Registered actor, job id = 01000000, actor id = fbd2f8f7cc7e7b4ac0fd0b3201000000
[2025-01-17 17:10:01,312 I 28071 28071] (gcs_server) gcs_actor_manager.cc:479: Creating actor job_id=01000000 actor_id=fbd2f8f7cc7e7b4ac0fd0b3201000000
[2025-01-17 17:10:01,312 I 28071 28071] (gcs_server) gcs_actor_scheduler.cc:313: Start leasing worker from node 0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207 for actor fbd2f8f7cc7e7b4ac0fd0b3201000000, job id = 01000000
[2025-01-17 17:10:01,314 I 28071 28071] (gcs_server) gcs_actor_scheduler.cc:633: Finished leasing worker from 0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207 for actor fbd2f8f7cc7e7b4ac0fd0b3201000000, job id = 01000000
[2025-01-17 17:10:01,315 I 28071 28071] (gcs_server) gcs_actor_scheduler.cc:448: Start creating actor fbd2f8f7cc7e7b4ac0fd0b3201000000 on worker 2f2abf4327dfc8ec0c674120298d6cd1d20b4cf0c1448a7e6f0a3002 at node 0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207, job id = 01000000
[2025-01-17 17:10:02,057 I 28071 28071] (gcs_server) gcs_actor_scheduler.cc:485: Finished actor creation task for actor fbd2f8f7cc7e7b4ac0fd0b3201000000 on worker 2f2abf4327dfc8ec0c674120298d6cd1d20b4cf0c1448a7e6f0a3002 at node 0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207, job id = 01000000
[2025-01-17 17:10:02,057 I 28071 28071] (gcs_server) gcs_actor_manager.cc:1530: Actor created successfully job_id=01000000 actor_id=fbd2f8f7cc7e7b4ac0fd0b3201000000
[2025-01-17 17:10:02,058 I 28071 28071] (gcs_server) gcs_actor_manager.cc:494: Finished creating actor. Status: OK job_id=01000000 actor_id=fbd2f8f7cc7e7b4ac0fd0b3201000000
[2025-01-17 17:10:06,872 I 28071 28071] (gcs_server) gcs_actor_manager.cc:393: Registering actor job_id=01000000 actor_id=0b10e3a5771cad428b642af201000000
[2025-01-17 17:10:06,872 I 28071 28071] (gcs_server) gcs_actor_manager.cc:398: Registered actor, job id = 01000000, actor id = 0b10e3a5771cad428b642af201000000
[2025-01-17 17:10:06,874 I 28071 28071] (gcs_server) gcs_actor_manager.cc:479: Creating actor job_id=01000000 actor_id=0b10e3a5771cad428b642af201000000
[2025-01-17 17:10:06,874 I 28071 28071] (gcs_server) gcs_actor_scheduler.cc:313: Start leasing worker from node 0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207 for actor 0b10e3a5771cad428b642af201000000, job id = 01000000
[2025-01-17 17:10:06,952 I 28071 28071] (gcs_server) gcs_actor_scheduler.cc:633: Finished leasing worker from 0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207 for actor 0b10e3a5771cad428b642af201000000, job id = 01000000
[2025-01-17 17:10:06,953 I 28071 28071] (gcs_server) gcs_actor_scheduler.cc:448: Start creating actor 0b10e3a5771cad428b642af201000000 on worker a235db546c9cc02ab54504e49e89e366e21f24b693d4fbd560019419 at node 0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207, job id = 01000000
[2025-01-17 17:10:06,972 I 28071 28071] (gcs_server) gcs_actor_manager.cc:393: Registering actor job_id=01000000 actor_id=08faab1998d0b88e2d27195d01000000
[2025-01-17 17:10:06,972 W 28071 28071] (gcs_server) gcs_actor_manager.cc:403: Failed to register actor: NotFound: Actor with name 'AutoscalingRequester' already exists in the namespace AutoscalingRequester job_id=01000000 actor_id=08faab1998d0b88e2d27195d01000000
[2025-01-17 17:10:06,979 I 28071 28071] (gcs_server) gcs_actor_scheduler.cc:485: Finished actor creation task for actor 0b10e3a5771cad428b642af201000000 on worker a235db546c9cc02ab54504e49e89e366e21f24b693d4fbd560019419 at node 0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207, job id = 01000000
[2025-01-17 17:10:06,979 I 28071 28071] (gcs_server) gcs_actor_manager.cc:1530: Actor created successfully job_id=01000000 actor_id=0b10e3a5771cad428b642af201000000
[2025-01-17 17:10:06,979 I 28071 28071] (gcs_server) gcs_actor_manager.cc:494: Finished creating actor. Status: OK job_id=01000000 actor_id=0b10e3a5771cad428b642af201000000
[2025-01-17 17:10:34,327 I 28071 28071] (gcs_server) gcs_job_manager.cc:149: Finished marking job state, job id = 01000000
[2025-01-17 17:10:34,505 I 28071 28071] (gcs_server) gcs_node_manager.cc:366: Removing node, node name = 192.168.0.2, death reason = EXPECTED_TERMINATION, death message = received SIGTERM node_id=0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207
[2025-01-17 17:10:34,505 I 28071 28071] (gcs_server) gcs_placement_group_manager.cc:789: Node failed, rescheduling the placement groups on the dead node. node_id=0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207
[2025-01-17 17:10:34,505 I 28071 28071] (gcs_server) gcs_actor_manager.cc:1274: Node failed, reconstructing actors. node_id=0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207
[2025-01-17 17:10:34,505 I 28071 28071] (gcs_server) gcs_actor_manager.cc:1397: Actor is failed on worker 2f2abf4327dfc8ec0c674120298d6cd1d20b4cf0c1448a7e6f0a3002 at node 0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207, need_reschedule = 1, death context type = ActorDiedErrorContext, remaining_restarts = 0 job_id=01000000 actor_id=fbd2f8f7cc7e7b4ac0fd0b3201000000
[2025-01-17 17:10:34,505 I 28071 28071] (gcs_server) gcs_actor_manager.cc:936: Actor name datasets_stats_actor is cleand up.
[2025-01-17 17:10:34,505 I 28071 28071] (gcs_server) gcs_actor_manager.cc:1397: Actor is failed on worker a235db546c9cc02ab54504e49e89e366e21f24b693d4fbd560019419 at node 0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207, need_reschedule = 1, death context type = ActorDiedErrorContext, remaining_restarts = -1 job_id=01000000 actor_id=0b10e3a5771cad428b642af201000000
[2025-01-17 17:10:34,505 I 28071 28071] (gcs_server) gcs_job_manager.cc:454: Node failed, mark all jobs from this node as finished node_id=0df5098b63b61b261d6b38f00795725b5909c1b1e4c347b784459207
[2025-01-17 17:10:34,505 I 28071 28071] (gcs_server) gcs_actor_manager.cc:1023: Destroying actor job_id=01000000 actor_id=fbd2f8f7cc7e7b4ac0fd0b3201000000
[2025-01-17 17:10:35,505 I 28071 28071] (gcs_server) gcs_server_main.cc:130: GCS server received SIGTERM, shutting down...
[2025-01-17 17:10:35,650 I 28071 28071] (gcs_server) gcs_server.cc:267: Stopping GCS server.