[2025-01-26 08:58:45,699] torch.distributed.run: [WARNING] [2025-01-26 08:58:45,699] torch.distributed.run: [WARNING] ***************************************** [2025-01-26 08:58:45,699] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. [2025-01-26 08:58:45,699] torch.distributed.run: [WARNING] ***************************************** [W socket.cpp:436] [c10d] The server socket has failed to bind to [::]:7852 (errno: 98 - Address already in use). [W socket.cpp:436] [c10d] The server socket has failed to bind to 0.0.0.0:7852 (errno: 98 - Address already in use). [E socket.cpp:472] [c10d] The server socket has failed to listen on any local network address. Traceback (most recent call last): File "/home/chaeyun/.conda/envs/risall/bin/torchrun", line 8, in sys.exit(main()) File "/home/chaeyun/.conda/envs/risall/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper return f(*args, **kwargs) File "/home/chaeyun/.conda/envs/risall/lib/python3.9/site-packages/torch/distributed/run.py", line 806, in main run(args) File "/home/chaeyun/.conda/envs/risall/lib/python3.9/site-packages/torch/distributed/run.py", line 797, in run elastic_launch( File "/home/chaeyun/.conda/envs/risall/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/home/chaeyun/.conda/envs/risall/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 255, in launch_agent result = agent.run() File "/home/chaeyun/.conda/envs/risall/lib/python3.9/site-packages/torch/distributed/elastic/metrics/api.py", line 124, in wrapper result = f(*args, **kwargs) File "/home/chaeyun/.conda/envs/risall/lib/python3.9/site-packages/torch/distributed/elastic/agent/server/api.py", line 736, in run result = self._invoke_run(role) File "/home/chaeyun/.conda/envs/risall/lib/python3.9/site-packages/torch/distributed/elastic/agent/server/api.py", line 871, in _invoke_run self._initialize_workers(self._worker_group) File "/home/chaeyun/.conda/envs/risall/lib/python3.9/site-packages/torch/distributed/elastic/metrics/api.py", line 124, in wrapper result = f(*args, **kwargs) File "/home/chaeyun/.conda/envs/risall/lib/python3.9/site-packages/torch/distributed/elastic/agent/server/api.py", line 705, in _initialize_workers self._rendezvous(worker_group) File "/home/chaeyun/.conda/envs/risall/lib/python3.9/site-packages/torch/distributed/elastic/metrics/api.py", line 124, in wrapper result = f(*args, **kwargs) File "/home/chaeyun/.conda/envs/risall/lib/python3.9/site-packages/torch/distributed/elastic/agent/server/api.py", line 546, in _rendezvous store, group_rank, group_world_size = spec.rdzv_handler.next_rendezvous() File "/home/chaeyun/.conda/envs/risall/lib/python3.9/site-packages/torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py", line 54, in next_rendezvous self._store = TCPStore( # type: ignore[call-arg] RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:7852 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:7852 (errno: 98 - Address already in use).