mw-lifecycle-analysis/p2/quest/parallel-mw-olmo-info-cat.log
2025-09-04 10:12:34 -05:00

setting up the environment by loading in conda environment at Thu Sep 4 10:04:55 CDT 2025
running the bertopic job at Thu Sep 4 10:04:55 CDT 2025
----------------------------------------
srun job start: Thu Sep 4 10:04:55 CDT 2025
Job ID: 3272179
Username: nws8519
Queue: gengpu
Account: p32852
----------------------------------------
The following variables are not
guaranteed to be the same in the
prologue and the job run script
----------------------------------------
PATH (in prologue) : /home/nws8519/.conda/envs/olmo/bin:/software/miniconda3/4.12.0/condabin:/home/nws8519/.local/bin:/home/nws8519/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/lpp/mmfs/bin:/hpc/usertools
WORKDIR is: /home/nws8519
----------------------------------------
W0904 10:05:10.900000 1845275 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766]
W0904 10:05:10.900000 1845275 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.900000 1845275 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0904 10:05:10.900000 1845275 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.900000 1845276 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766]
W0904 10:05:10.900000 1845276 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.900000 1845276 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0904 10:05:10.900000 1845276 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.906000 1400307 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766]
W0904 10:05:10.906000 1400307 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.906000 1400307 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0904 10:05:10.906000 1400307 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.907000 1400308 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766]
W0904 10:05:10.907000 1400308 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.907000 1400308 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0904 10:05:10.907000 1400308 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py:117: DtypeWarning: Columns (21) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv("/home/nws8519/git/mw-lifecycle-analysis/p2/quest/072525_pp_biberplus_labels.csv")
/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py:117: DtypeWarning: Columns (21) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv("/home/nws8519/git/mw-lifecycle-analysis/p2/quest/072525_pp_biberplus_labels.csv")
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 178, in <module>
[rank0]: main()
[rank0]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 122, in main
[rank0]: dataset = SentenceDataset(comment_texts, comment_types, priming, typology, instructions)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 76, in __init__
[rank0]: sentences = split_to_sentences(cleaned_comment)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 106, in split_to_sentences
[rank0]: return nltk.sent_tokenize(text)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
[rank0]: tokenizer = _get_punkt_tokenizer(language)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
[rank0]: return PunktTokenizer(language)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
[rank0]: self.load_lang(lang)
[rank0]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
[rank0]: lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/data.py", line 579, in find
[rank0]: raise LookupError(resource_not_found)
[rank0]: LookupError:
[rank0]: **********************************************************************
[rank0]: Resource punkt_tab not found.
[rank0]: Please use the NLTK Downloader to obtain the resource:
[rank0]: >>> import nltk
[rank0]: >>> nltk.download('punkt_tab')
[rank0]: 
[rank0]: For more information see: https://www.nltk.org/data.html
[rank0]: Attempted to load tokenizers/punkt_tab/english/
[rank0]: Searched in:
[rank0]: - '/home/nws8519/nltk_data'
[rank0]: - '/home/nws8519/.conda/envs/olmo/nltk_data'
[rank0]: - '/home/nws8519/.conda/envs/olmo/share/nltk_data'
[rank0]: - '/home/nws8519/.conda/envs/olmo/lib/nltk_data'
[rank0]: - '/usr/share/nltk_data'
[rank0]: - '/usr/local/share/nltk_data'
[rank0]: - '/usr/lib/nltk_data'
[rank0]: - '/usr/local/lib/nltk_data'
[rank0]: **********************************************************************
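[editor's note] The LookupError above is NLTK failing to find the punkt_tab sentence-tokenizer data in any of the listed search paths, and the traceback itself names the remedy (`nltk.download('punkt_tab')`). A minimal sketch of a guard that downloads the resource only when it is missing; `find` and `download` are injected so the sketch stays self-contained, but in practice they would be `nltk.data.find` and `nltk.download` (or a cluster-local mirror, if compute nodes lack network access):

```python
def ensure_punkt_tab(find, download):
    """Check for the punkt_tab resource via `find`; call `download` only
    when it is missing. Returns True if a download was triggered.

    `find` should raise LookupError when the resource is absent, matching
    the behavior of nltk.data.find seen in the traceback above."""
    try:
        find("tokenizers/punkt_tab/english/")
        return False  # resource already present, nothing to do
    except LookupError:
        download("punkt_tab")
        return True
```

Calling a guard like this once at the top of `main()`, before `SentenceDataset` is constructed, would avoid all four ranks dying on the same missing resource.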
[rank2]: Traceback (most recent call last):
[rank2]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 178, in <module>
[rank2]: main()
[rank2]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 122, in main
[rank2]: dataset = SentenceDataset(comment_texts, comment_types, priming, typology, instructions)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 76, in __init__
[rank2]: sentences = split_to_sentences(cleaned_comment)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 106, in split_to_sentences
[rank2]: return nltk.sent_tokenize(text)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
[rank2]: tokenizer = _get_punkt_tokenizer(language)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
[rank2]: return PunktTokenizer(language)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
[rank2]: self.load_lang(lang)
[rank2]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
[rank2]: lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/data.py", line 579, in find
[rank2]: raise LookupError(resource_not_found)
[rank2]: LookupError:
[rank2]: **********************************************************************
[rank2]: Resource punkt_tab not found.
[rank2]: Please use the NLTK Downloader to obtain the resource:
[rank2]: >>> import nltk
[rank2]: >>> nltk.download('punkt_tab')
[rank2]: 
[rank2]: For more information see: https://www.nltk.org/data.html
[rank2]: Attempted to load tokenizers/punkt_tab/english/
[rank2]: Searched in:
[rank2]: - '/home/nws8519/nltk_data'
[rank2]: - '/home/nws8519/.conda/envs/olmo/nltk_data'
[rank2]: - '/home/nws8519/.conda/envs/olmo/share/nltk_data'
[rank2]: - '/home/nws8519/.conda/envs/olmo/lib/nltk_data'
[rank2]: - '/usr/share/nltk_data'
[rank2]: - '/usr/local/share/nltk_data'
[rank2]: - '/usr/lib/nltk_data'
[rank2]: - '/usr/local/lib/nltk_data'
[rank2]: **********************************************************************
/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py:117: DtypeWarning: Columns (21) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv("/home/nws8519/git/mw-lifecycle-analysis/p2/quest/072525_pp_biberplus_labels.csv")
/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py:117: DtypeWarning: Columns (21) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv("/home/nws8519/git/mw-lifecycle-analysis/p2/quest/072525_pp_biberplus_labels.csv")
[rank1]: Traceback (most recent call last):
[rank1]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 178, in <module>
[rank1]: main()
[rank1]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 122, in main
[rank1]: dataset = SentenceDataset(comment_texts, comment_types, priming, typology, instructions)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 76, in __init__
[rank1]: sentences = split_to_sentences(cleaned_comment)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 106, in split_to_sentences
[rank1]: return nltk.sent_tokenize(text)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
[rank1]: tokenizer = _get_punkt_tokenizer(language)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
[rank1]: return PunktTokenizer(language)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
[rank1]: self.load_lang(lang)
[rank1]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
[rank1]: lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/data.py", line 579, in find
[rank1]: raise LookupError(resource_not_found)
[rank1]: LookupError:
[rank1]: **********************************************************************
[rank1]: Resource punkt_tab not found.
[rank1]: Please use the NLTK Downloader to obtain the resource:
[rank1]: >>> import nltk
[rank1]: >>> nltk.download('punkt_tab')
[rank1]: 
[rank1]: For more information see: https://www.nltk.org/data.html
[rank1]: Attempted to load tokenizers/punkt_tab/english/
[rank1]: Searched in:
[rank1]: - '/home/nws8519/nltk_data'
[rank1]: - '/home/nws8519/.conda/envs/olmo/nltk_data'
[rank1]: - '/home/nws8519/.conda/envs/olmo/share/nltk_data'
[rank1]: - '/home/nws8519/.conda/envs/olmo/lib/nltk_data'
[rank1]: - '/usr/share/nltk_data'
[rank1]: - '/usr/local/share/nltk_data'
[rank1]: - '/usr/lib/nltk_data'
[rank1]: - '/usr/local/lib/nltk_data'
[rank1]: **********************************************************************
[rank3]: Traceback (most recent call last):
[rank3]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 178, in <module>
[rank3]: main()
[rank3]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 122, in main
[rank3]: dataset = SentenceDataset(comment_texts, comment_types, priming, typology, instructions)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 76, in __init__
[rank3]: sentences = split_to_sentences(cleaned_comment)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 106, in split_to_sentences
[rank3]: return nltk.sent_tokenize(text)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
[rank3]: tokenizer = _get_punkt_tokenizer(language)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
[rank3]: return PunktTokenizer(language)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
[rank3]: self.load_lang(lang)
[rank3]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
[rank3]: lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/data.py", line 579, in find
[rank3]: raise LookupError(resource_not_found)
[rank3]: LookupError:
[rank3]: **********************************************************************
[rank3]: Resource punkt_tab not found.
[rank3]: Please use the NLTK Downloader to obtain the resource:
[rank3]: >>> import nltk
[rank3]: >>> nltk.download('punkt_tab')
[rank3]: 
[rank3]: For more information see: https://www.nltk.org/data.html
[rank3]: Attempted to load tokenizers/punkt_tab/english/
[rank3]: Searched in:
[rank3]: - '/home/nws8519/nltk_data'
[rank3]: - '/home/nws8519/.conda/envs/olmo/nltk_data'
[rank3]: - '/home/nws8519/.conda/envs/olmo/share/nltk_data'
[rank3]: - '/home/nws8519/.conda/envs/olmo/lib/nltk_data'
[rank3]: - '/usr/share/nltk_data'
[rank3]: - '/usr/local/share/nltk_data'
[rank3]: - '/usr/lib/nltk_data'
[rank3]: - '/usr/local/lib/nltk_data'
[rank3]: **********************************************************************
[rank2]:[W904 10:05:56.100290280 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W904 10:05:56.107999460 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
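[editor's note] The two NCCL warnings above note that `destroy_process_group()` was never called before exit, which can leak resources. A minimal sketch of the cleanup pattern, with the init/destroy callables injected so the sketch stays self-contained; in the real script they would be `torch.distributed.init_process_group` and `torch.distributed.destroy_process_group`:

```python
import contextlib

@contextlib.contextmanager
def process_group(init, destroy):
    """Run a distributed section, guaranteeing teardown on exit.

    `destroy` runs in the finally block, so the process group is torn
    down whether the worker body succeeds or raises (as it did here)."""
    init()
    try:
        yield
    finally:
        destroy()  # runs on success and on exceptions alike
```

Wrapping the body of `main()` in a guard like this would have silenced the warning even though the workers crashed on the missing NLTK resource.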
W0904 10:05:57.705000 1400307 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py:900] Sending process 1400332 closing signal SIGTERM
W0904 10:05:57.720000 1400308 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py:900] Sending process 1400334 closing signal SIGTERM
E0904 10:05:57.770000 1400307 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: 1) local_rank: 0 (pid: 1400331) of binary: /home/nws8519/.conda/envs/olmo/bin/python3.11
Traceback (most recent call last):
File "/home/nws8519/.conda/envs/olmo/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
run(args)
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
elastic_launch(
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 270, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-09-04_10:05:57
host : qgpu0203
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1400331)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
E0904 10:05:57.885000 1400308 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: 1) local_rank: 0 (pid: 1400333) of binary: /home/nws8519/.conda/envs/olmo/bin/python3.11
Traceback (most recent call last):
File "/home/nws8519/.conda/envs/olmo/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
run(args)
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
elastic_launch(
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 270, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-09-04_10:05:57
host : qgpu0203
rank : 2 (local_rank: 0)
exitcode : 1 (pid: 1400333)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Traceback (most recent call last):
File "/home/nws8519/.conda/envs/olmo/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
run(args)
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
elastic_launch(
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 265, in launch_agent
if result.is_failed():
^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'is_failed'
Traceback (most recent call last):
File "/home/nws8519/.conda/envs/olmo/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
run(args)
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
elastic_launch(
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 265, in launch_agent
if result.is_failed():
^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'is_failed'
srun: error: qgpu0203: tasks 2-3: Exited with exit code 1
srun: error: qgpu0202: tasks 0-1: Exited with exit code 1
unsupervised olmo categorization pau at Thu Sep 4 10:05:58 CDT 2025