JayKimDevolved/deepseek
2025-01-15 18:16:20,502 INFO log_monitor.py:155 -- Starting log monitor with [max open files=200], [is_autoscaler_v2=False]
2025-01-15 18:16:20,504 INFO log_monitor.py:273 -- Beginning to track file raylet.err
2025-01-15 18:16:20,504 INFO log_monitor.py:273 -- Beginning to track file monitor.log
2025-01-15 18:16:20,504 INFO log_monitor.py:273 -- Beginning to track file gcs_server.err
2025-01-15 18:16:21,820 INFO log_monitor.py:273 -- Beginning to track file worker-acea9a61ee1069ddb6fabce683f53778dd7d8b787490ed6e46b93a46-ffffffff-522306.out
2025-01-15 18:16:21,820 INFO log_monitor.py:273 -- Beginning to track file worker-acea9a61ee1069ddb6fabce683f53778dd7d8b787490ed6e46b93a46-ffffffff-522306.err
2025-01-15 18:16:21,922 INFO log_monitor.py:273 -- Beginning to track file worker-6b8be585b555dea9399d586dd0624ac70e9dc315aa57782ea53a2e13-ffffffff-522312.out
2025-01-15 18:16:21,922 INFO log_monitor.py:273 -- Beginning to track file worker-014bf3dc158421ac12b50087bb1eab5843261c5008553d41374edf1f-ffffffff-522310.out
2025-01-15 18:16:21,922 INFO log_monitor.py:273 -- Beginning to track file worker-6b8be585b555dea9399d586dd0624ac70e9dc315aa57782ea53a2e13-ffffffff-522312.err
2025-01-15 18:16:21,922 INFO log_monitor.py:273 -- Beginning to track file worker-014bf3dc158421ac12b50087bb1eab5843261c5008553d41374edf1f-ffffffff-522310.err
2025-01-15 18:16:21,923 INFO log_monitor.py:273 -- Beginning to track file worker-deb2de52dd7f814a844bf0e1dbcc9da0f2fb1e349a693d4b6199f853-ffffffff-522324.out
2025-01-15 18:16:21,923 INFO log_monitor.py:273 -- Beginning to track file worker-2cfba5e917b46bc0198205a2c9aba75974041c286bb5e5a38401d368-ffffffff-522305.out
2025-01-15 18:16:21,923 INFO log_monitor.py:273 -- Beginning to track file worker-deb2de52dd7f814a844bf0e1dbcc9da0f2fb1e349a693d4b6199f853-ffffffff-522324.err
2025-01-15 18:16:21,923 INFO log_monitor.py:273 -- Beginning to track file worker-2cfba5e917b46bc0198205a2c9aba75974041c286bb5e5a38401d368-ffffffff-522305.err
2025-01-15 18:16:21,923 INFO log_monitor.py:273 -- Beginning to track file worker-4615456bf1b219d616307a025d9399aa0b35d0a49d5476057444fa0f-ffffffff-522317.out
2025-01-15 18:16:21,923 INFO log_monitor.py:273 -- Beginning to track file worker-9ee57e075376b2d829c83f7ecf234b8331856dfabcf2a87d31e103b6-ffffffff-522311.out
2025-01-15 18:16:21,924 INFO log_monitor.py:273 -- Beginning to track file worker-4615456bf1b219d616307a025d9399aa0b35d0a49d5476057444fa0f-ffffffff-522317.err
2025-01-15 18:16:21,924 INFO log_monitor.py:273 -- Beginning to track file worker-9ee57e075376b2d829c83f7ecf234b8331856dfabcf2a87d31e103b6-ffffffff-522311.err
2025-01-15 18:16:21,924 INFO log_monitor.py:273 -- Beginning to track file worker-6bdfdcb24ea64b97e4173d8f71ccf8783a0a122874cadaccad6cb438-ffffffff-522315.out
2025-01-15 18:16:21,924 INFO log_monitor.py:273 -- Beginning to track file worker-6bdfdcb24ea64b97e4173d8f71ccf8783a0a122874cadaccad6cb438-ffffffff-522315.err
2025-01-15 18:16:21,924 INFO log_monitor.py:273 -- Beginning to track file worker-aee98ee4863dbd2ecae414c6fb8f21eb248b1d378bd03b4125355184-ffffffff-522316.out
2025-01-15 18:16:21,924 INFO log_monitor.py:273 -- Beginning to track file worker-ec3f3b9e5bf8c6b523bfa9bbf9d11e9c7272b33053a9143a8ff7818a-ffffffff-522313.out
2025-01-15 18:16:21,924 INFO log_monitor.py:273 -- Beginning to track file worker-5f35dbcb4060c173ec8b34821282594cae466bab3c3a2677ef88304e-ffffffff-522319.out
2025-01-15 18:16:21,925 INFO log_monitor.py:273 -- Beginning to track file worker-aee98ee4863dbd2ecae414c6fb8f21eb248b1d378bd03b4125355184-ffffffff-522316.err
2025-01-15 18:16:21,925 INFO log_monitor.py:273 -- Beginning to track file worker-1963bf2627a8bf7f0a95476180dff5e2d8af7f50338bc02d99c62fa6-ffffffff-522308.out
2025-01-15 18:16:21,925 INFO log_monitor.py:273 -- Beginning to track file worker-ec3f3b9e5bf8c6b523bfa9bbf9d11e9c7272b33053a9143a8ff7818a-ffffffff-522313.err
2025-01-15 18:16:21,925 INFO log_monitor.py:273 -- Beginning to track file worker-5f35dbcb4060c173ec8b34821282594cae466bab3c3a2677ef88304e-ffffffff-522319.err
2025-01-15 18:16:21,925 INFO log_monitor.py:273 -- Beginning to track file worker-1963bf2627a8bf7f0a95476180dff5e2d8af7f50338bc02d99c62fa6-ffffffff-522308.err
2025-01-15 18:16:21,925 INFO log_monitor.py:273 -- Beginning to track file worker-8a657dc31f6671a502c4c013bacdb623a13a64ce8be5999d6398f223-ffffffff-522307.out
2025-01-15 18:16:21,926 INFO log_monitor.py:273 -- Beginning to track file worker-8a657dc31f6671a502c4c013bacdb623a13a64ce8be5999d6398f223-ffffffff-522307.err
2025-01-15 18:16:21,926 INFO log_monitor.py:273 -- Beginning to track file worker-282a4fea3dec00fc664b38361ddaad3178289e4bb231c41c819c8dfa-ffffffff-522323.out
2025-01-15 18:16:21,926 INFO log_monitor.py:273 -- Beginning to track file worker-282a4fea3dec00fc664b38361ddaad3178289e4bb231c41c819c8dfa-ffffffff-522323.err
2025-01-15 18:16:21,926 INFO log_monitor.py:273 -- Beginning to track file worker-eb073a4e8f05e3d8228ecd987d0c1524b1192bddbaeb549be34151bc-ffffffff-522321.out
2025-01-15 18:16:21,926 INFO log_monitor.py:273 -- Beginning to track file worker-eb073a4e8f05e3d8228ecd987d0c1524b1192bddbaeb549be34151bc-ffffffff-522321.err
2025-01-15 18:16:21,926 INFO log_monitor.py:273 -- Beginning to track file worker-3151a242163e735910eaa2ba527804c8f9d0c27cdd8a801e30f97628-ffffffff-522322.out
2025-01-15 18:16:21,927 INFO log_monitor.py:273 -- Beginning to track file worker-3151a242163e735910eaa2ba527804c8f9d0c27cdd8a801e30f97628-ffffffff-522322.err
2025-01-15 18:16:21,927 INFO log_monitor.py:273 -- Beginning to track file worker-21d91e3214a9ef8ba8f0eadd8380acf5be57071ac2cae18459445bc8-ffffffff-522314.out
2025-01-15 18:16:21,927 INFO log_monitor.py:273 -- Beginning to track file worker-21d91e3214a9ef8ba8f0eadd8380acf5be57071ac2cae18459445bc8-ffffffff-522314.err
2025-01-15 18:16:21,927 INFO log_monitor.py:273 -- Beginning to track file worker-0dc9fa5d099b69e4524196fa3413f4cf04e3889754eeefcf30c4fc90-ffffffff-522309.out
2025-01-15 18:16:21,927 INFO log_monitor.py:273 -- Beginning to track file worker-0dc9fa5d099b69e4524196fa3413f4cf04e3889754eeefcf30c4fc90-ffffffff-522309.err
2025-01-15 18:16:21,927 INFO log_monitor.py:273 -- Beginning to track file worker-c4f60d1ed2c3e9938afaea697b5fceda751cfe95eba34f8253af0a33-ffffffff-522318.out
2025-01-15 18:16:21,928 INFO log_monitor.py:273 -- Beginning to track file worker-c4f60d1ed2c3e9938afaea697b5fceda751cfe95eba34f8253af0a33-ffffffff-522318.err
2025-01-15 18:16:21,928 INFO log_monitor.py:273 -- Beginning to track file worker-7c8187d120445f3a7c0903df91d0fecc52af5d5078290dad7f0b335d-ffffffff-522320.out
2025-01-15 18:16:21,928 INFO log_monitor.py:273 -- Beginning to track file worker-7c8187d120445f3a7c0903df91d0fecc52af5d5078290dad7f0b335d-ffffffff-522320.err
2025-01-15 18:16:23,994 INFO log_monitor.py:273 -- Beginning to track file worker-3b56424e9a2e5670e75dabbe3e4f2b92d7c0012f6894cd0971af8a3b-01000000-524007.out
2025-01-15 18:16:23,995 INFO log_monitor.py:273 -- Beginning to track file worker-3b56424e9a2e5670e75dabbe3e4f2b92d7c0012f6894cd0971af8a3b-01000000-524007.err
2025-01-15 18:16:25,240 INFO log_monitor.py:273 -- Beginning to track file worker-dda7384c4695bde2040f51bd4ea9dfe1ea365840a5f87883089cb692-01000000-524108.out
2025-01-15 18:16:25,240 INFO log_monitor.py:273 -- Beginning to track file worker-dda7384c4695bde2040f51bd4ea9dfe1ea365840a5f87883089cb692-01000000-524108.err
2025-01-15 18:16:26,381 INFO log_monitor.py:273 -- Beginning to track file worker-b97ab9b718397eb08aca79f462bb3146719e9641a18515a523b0b3a6-01000000-524209.out
2025-01-15 18:16:26,381 INFO log_monitor.py:273 -- Beginning to track file worker-b97ab9b718397eb08aca79f462bb3146719e9641a18515a523b0b3a6-01000000-524209.err
2025-01-15 18:16:27,522 INFO log_monitor.py:273 -- Beginning to track file worker-6b73cf235ecd61e8cca6e7e0e05c395ac2b0cf165c946f461ac7f685-01000000-524310.out
2025-01-15 18:16:27,523 INFO log_monitor.py:273 -- Beginning to track file worker-6b73cf235ecd61e8cca6e7e0e05c395ac2b0cf165c946f461ac7f685-01000000-524310.err
2025-01-15 18:16:28,366 ERROR log_monitor.py:359 -- Failed to publish log messages {'ip': '192.168.0.2', 'pid': 'raylet', 'job': None, 'is_err': True, 'lines': ['[2025-01-15 18:16:28,297 C 522173 522173] (raylet) node_manager.cc:1043: [Timeout] Exiting because this node manager has mistakenly been marked as dead by the GCS: GCS failed to check the health of this node for 5 times. This is likely because the machine or raylet has become overloaded.', '*** StackTrace Information ***', '/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xbdf73a) [0x55cefe38d73a] ray::operator<<()', '/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xbe1b21) [0x55cefe38fb21] ray::RayLog::~RayLog()', '/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x323299) [0x55cefdad1299] ray::raylet::NodeManager::NodeRemoved()', '/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x536e69) [0x55cefdce4e69] ray::gcs::NodeInfoAccessor::HandleNotification()', '/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x669e98) [0x55cefde17e98] EventTracker::RecordExecution()', '/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x664e8e) [0x55cefde12e8e] std::_Function_handler<>::_M_invoke()', '/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x665306) [0x55cefde13306] boost::asio::detail::completion_handler<>::do_complete()', '/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc53f9b) [0x55cefe401f9b] boost::asio::detail::scheduler::do_run_one()', '/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc56529) [0x55cefe404529] boost::asio::detail::scheduler::run()', '/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0xc56a42) [0x55cefe404a42] boost::asio::io_context::run()', '/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x1e9155) [0x55cefd997155] main', 
'/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fc9b614fd90]', '/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fc9b614fe40] __libc_start_main', '/usr/local/lib/python3.10/dist-packages/ray/core/src/ray/raylet/raylet(+0x243277) [0x55cefd9f1277]', ''], 'actor_name': None, 'task_name': None}
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/log_monitor.py", line 357, in flush
    self.publisher.publish_logs(data)
  File "python/ray/_raylet.pyx", line 3122, in ray._raylet.GcsPublisher.publish_logs
  File "python/ray/includes/common.pxi", line 104, in ray._raylet.check_status
ray.exceptions.RaySystemError: System error: CANCELLED