setting up the environment by loading in conda environment at Thu Sep 4 10:04:55 CDT 2025
running the bertopic job at Thu Sep 4 10:04:55 CDT 2025
----------------------------------------
srun job start: Thu Sep 4 10:04:55 CDT 2025
Job ID: 3272179
Username: nws8519
Queue: gengpu
Account: p32852
----------------------------------------
The following variables are not
guaranteed to be the same in the
prologue and the job run script
----------------------------------------
PATH (in prologue) : /home/nws8519/.conda/envs/olmo/bin:/software/miniconda3/4.12.0/condabin:/home/nws8519/.local/bin:/home/nws8519/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/lpp/mmfs/bin:/hpc/usertools
WORKDIR is: /home/nws8519
----------------------------------------
W0904 10:05:10.900000 1845275 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766]
W0904 10:05:10.900000 1845275 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.900000 1845275 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0904 10:05:10.900000 1845275 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.900000 1845276 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766]
W0904 10:05:10.900000 1845276 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.900000 1845276 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0904 10:05:10.900000 1845276 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.906000 1400307 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766]
W0904 10:05:10.906000 1400307 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.906000 1400307 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0904 10:05:10.906000 1400307 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.907000 1400308 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766]
W0904 10:05:10.907000 1400308 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
W0904 10:05:10.907000 1400308 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0904 10:05:10.907000 1400308 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py:766] *****************************************
/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py:117: DtypeWarning: Columns (21) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv("/home/nws8519/git/mw-lifecycle-analysis/p2/quest/072525_pp_biberplus_labels.csv")
/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py:117: DtypeWarning: Columns (21) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv("/home/nws8519/git/mw-lifecycle-analysis/p2/quest/072525_pp_biberplus_labels.csv")
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 178, in <module>
[rank0]: main()
[rank0]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 122, in main
[rank0]: dataset = SentenceDataset(comment_texts, comment_types, priming, typology, instructions)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 76, in __init__
[rank0]: sentences = split_to_sentences(cleaned_comment)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 106, in split_to_sentences
[rank0]: return nltk.sent_tokenize(text)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
[rank0]: tokenizer = _get_punkt_tokenizer(language)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
[rank0]: return PunktTokenizer(language)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
[rank0]: self.load_lang(lang)
[rank0]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
[rank0]: lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/data.py", line 579, in find
[rank0]: raise LookupError(resource_not_found)
[rank0]: LookupError:
[rank0]: **********************************************************************
[rank0]: Resource punkt_tab not found.
[rank0]: Please use the NLTK Downloader to obtain the resource:

[rank0]: >>> import nltk
[rank0]: >>> nltk.download('punkt_tab')

[rank0]: For more information see: https://www.nltk.org/data.html

[rank0]: Attempted to load tokenizers/punkt_tab/english/

[rank0]: Searched in:
[rank0]: - '/home/nws8519/nltk_data'
[rank0]: - '/home/nws8519/.conda/envs/olmo/nltk_data'
[rank0]: - '/home/nws8519/.conda/envs/olmo/share/nltk_data'
[rank0]: - '/home/nws8519/.conda/envs/olmo/lib/nltk_data'
[rank0]: - '/usr/share/nltk_data'
[rank0]: - '/usr/local/share/nltk_data'
[rank0]: - '/usr/lib/nltk_data'
[rank0]: - '/usr/local/lib/nltk_data'
[rank0]: **********************************************************************
[rank2]: Traceback (most recent call last):
[rank2]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 178, in <module>
[rank2]: main()
[rank2]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 122, in main
[rank2]: dataset = SentenceDataset(comment_texts, comment_types, priming, typology, instructions)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 76, in __init__
[rank2]: sentences = split_to_sentences(cleaned_comment)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 106, in split_to_sentences
[rank2]: return nltk.sent_tokenize(text)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
[rank2]: tokenizer = _get_punkt_tokenizer(language)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
[rank2]: return PunktTokenizer(language)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
[rank2]: self.load_lang(lang)
[rank2]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
[rank2]: lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/data.py", line 579, in find
[rank2]: raise LookupError(resource_not_found)
[rank2]: LookupError:
[rank2]: **********************************************************************
[rank2]: Resource punkt_tab not found.
[rank2]: Please use the NLTK Downloader to obtain the resource:

[rank2]: >>> import nltk
[rank2]: >>> nltk.download('punkt_tab')

[rank2]: For more information see: https://www.nltk.org/data.html

[rank2]: Attempted to load tokenizers/punkt_tab/english/

[rank2]: Searched in:
[rank2]: - '/home/nws8519/nltk_data'
[rank2]: - '/home/nws8519/.conda/envs/olmo/nltk_data'
[rank2]: - '/home/nws8519/.conda/envs/olmo/share/nltk_data'
[rank2]: - '/home/nws8519/.conda/envs/olmo/lib/nltk_data'
[rank2]: - '/usr/share/nltk_data'
[rank2]: - '/usr/local/share/nltk_data'
[rank2]: - '/usr/lib/nltk_data'
[rank2]: - '/usr/local/lib/nltk_data'
[rank2]: **********************************************************************
/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py:117: DtypeWarning: Columns (21) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv("/home/nws8519/git/mw-lifecycle-analysis/p2/quest/072525_pp_biberplus_labels.csv")
/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py:117: DtypeWarning: Columns (21) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv("/home/nws8519/git/mw-lifecycle-analysis/p2/quest/072525_pp_biberplus_labels.csv")
[rank1]: Traceback (most recent call last):
[rank1]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 178, in <module>
[rank1]: main()
[rank1]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 122, in main
[rank1]: dataset = SentenceDataset(comment_texts, comment_types, priming, typology, instructions)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 76, in __init__
[rank1]: sentences = split_to_sentences(cleaned_comment)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 106, in split_to_sentences
[rank1]: return nltk.sent_tokenize(text)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
[rank1]: tokenizer = _get_punkt_tokenizer(language)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
[rank1]: return PunktTokenizer(language)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
[rank1]: self.load_lang(lang)
[rank1]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
[rank1]: lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/data.py", line 579, in find
[rank1]: raise LookupError(resource_not_found)
[rank1]: LookupError:
[rank1]: **********************************************************************
[rank1]: Resource punkt_tab not found.
[rank1]: Please use the NLTK Downloader to obtain the resource:

[rank1]: >>> import nltk
[rank1]: >>> nltk.download('punkt_tab')

[rank1]: For more information see: https://www.nltk.org/data.html

[rank1]: Attempted to load tokenizers/punkt_tab/english/

[rank1]: Searched in:
[rank1]: - '/home/nws8519/nltk_data'
[rank1]: - '/home/nws8519/.conda/envs/olmo/nltk_data'
[rank1]: - '/home/nws8519/.conda/envs/olmo/share/nltk_data'
[rank1]: - '/home/nws8519/.conda/envs/olmo/lib/nltk_data'
[rank1]: - '/usr/share/nltk_data'
[rank1]: - '/usr/local/share/nltk_data'
[rank1]: - '/usr/lib/nltk_data'
[rank1]: - '/usr/local/lib/nltk_data'
[rank1]: **********************************************************************
[rank3]: Traceback (most recent call last):
[rank3]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 178, in <module>
[rank3]: main()
[rank3]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 122, in main
[rank3]: dataset = SentenceDataset(comment_texts, comment_types, priming, typology, instructions)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 76, in __init__
[rank3]: sentences = split_to_sentences(cleaned_comment)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py", line 106, in split_to_sentences
[rank3]: return nltk.sent_tokenize(text)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
[rank3]: tokenizer = _get_punkt_tokenizer(language)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
[rank3]: return PunktTokenizer(language)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
[rank3]: self.load_lang(lang)
[rank3]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
[rank3]: lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/nltk/data.py", line 579, in find
[rank3]: raise LookupError(resource_not_found)
[rank3]: LookupError:
[rank3]: **********************************************************************
[rank3]: Resource punkt_tab not found.
[rank3]: Please use the NLTK Downloader to obtain the resource:

[rank3]: >>> import nltk
[rank3]: >>> nltk.download('punkt_tab')

[rank3]: For more information see: https://www.nltk.org/data.html

[rank3]: Attempted to load tokenizers/punkt_tab/english/

[rank3]: Searched in:
[rank3]: - '/home/nws8519/nltk_data'
[rank3]: - '/home/nws8519/.conda/envs/olmo/nltk_data'
[rank3]: - '/home/nws8519/.conda/envs/olmo/share/nltk_data'
[rank3]: - '/home/nws8519/.conda/envs/olmo/lib/nltk_data'
[rank3]: - '/usr/share/nltk_data'
[rank3]: - '/usr/local/share/nltk_data'
[rank3]: - '/usr/lib/nltk_data'
[rank3]: - '/usr/local/lib/nltk_data'
[rank3]: **********************************************************************
[rank2]:[W904 10:05:56.100290280 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W904 10:05:56.107999460 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
W0904 10:05:57.705000 1400307 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py:900] Sending process 1400332 closing signal SIGTERM
W0904 10:05:57.720000 1400308 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py:900] Sending process 1400334 closing signal SIGTERM
E0904 10:05:57.770000 1400307 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: 1) local_rank: 0 (pid: 1400331) of binary: /home/nws8519/.conda/envs/olmo/bin/python3.11
Traceback (most recent call last):
File "/home/nws8519/.conda/envs/olmo/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
run(args)
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
elastic_launch(
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 270, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-09-04_10:05:57
host : qgpu0203
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1400331)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
E0904 10:05:57.885000 1400308 /gpfs/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: 1) local_rank: 0 (pid: 1400333) of binary: /home/nws8519/.conda/envs/olmo/bin/python3.11
Traceback (most recent call last):
File "/home/nws8519/.conda/envs/olmo/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
run(args)
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
elastic_launch(
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 270, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/nws8519/git/mw-lifecycle-analysis/p2/quest/python_scripts/olmo_parallel_cat.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-09-04_10:05:57
host : qgpu0203
rank : 2 (local_rank: 0)
exitcode : 1 (pid: 1400333)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Traceback (most recent call last):
File "/home/nws8519/.conda/envs/olmo/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
run(args)
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
elastic_launch(
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 265, in launch_agent
if result.is_failed():
^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'is_failed'
Traceback (most recent call last):
File "/home/nws8519/.conda/envs/olmo/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
run(args)
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
elastic_launch(
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nws8519/.conda/envs/olmo/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 265, in launch_agent
if result.is_failed():
^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'is_failed'
srun: error: qgpu0203: tasks 2-3: Exited with exit code 1
srun: error: qgpu0202: tasks 0-1: Exited with exit code 1
unsupervised olmo categorization pau at Thu Sep 4 10:05:58 CDT 2025
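
All four ranks fail with the same LookupError: tokenizers/punkt_tab/english/ is absent from every directory NLTK searched, and the later ChildFailedError and AttributeError tracebacks from torchrun follow from those worker exits. A minimal pre-flight sketch is given here, assuming the resource can be fetched once (for example from a node with network access) into one of the searched directories before the job is submitted. The helper names find_resource_dir and ensure_punkt_tab are hypothetical and are not part of olmo_parallel_cat.py; nltk.data.find and nltk.download are the calls named in the traceback and in the error message itself.

```python
import os
from typing import Iterable, Optional


def find_resource_dir(resource: str, search_dirs: Iterable[str]) -> Optional[str]:
    """Return the first search directory that contains `resource`, else None.

    Hypothetical helper: it only checks for an unpacked directory, which is
    a simplification of what nltk.data.find does.
    """
    for base in search_dirs:
        if os.path.isdir(os.path.join(base, resource)):
            return base
    return None


def ensure_punkt_tab(download_dir: str) -> None:
    """Download punkt_tab into download_dir if NLTK cannot already find it.

    Hypothetical helper; download_dir should be one of the directories on
    NLTK's search path, otherwise the lookup will keep failing.
    """
    import nltk  # imported lazily so find_resource_dir works without NLTK

    try:
        nltk.data.find("tokenizers/punkt_tab/english/")
    except LookupError:
        nltk.download("punkt_tab", download_dir=download_dir)
```

Calling ensure_punkt_tab("/home/nws8519/nltk_data") once before submitting the job would place the resource in the first directory listed under "Searched in:". Whether the compute nodes can reach the download server is not shown in this log, so running it ahead of time is the safer assumption.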