2025-06-23 22:02:07,072 - __main__ - INFO - Got --pdfs argument, going to add to the work queue
2025-06-23 22:02:07,096 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf as PDF document
2025-06-23 22:02:07,114 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/002-barcomb.pdf as PDF document
2025-06-23 22:02:07,140 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf as PDF document
2025-06-23 22:02:07,152 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf as PDF document
2025-06-23 22:02:07,167 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/005-crowston-shamshurin.pdf as PDF document
2025-06-23 22:02:07,184 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/006-franke.pdf as PDF document
2025-06-23 22:02:07,205 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf as PDF document
2025-06-23 22:02:07,221 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/008-geiger.pdf as PDF document
2025-06-23 22:02:07,237 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/009-hsieh.pdf as PDF document
2025-06-23 22:02:07,254 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/010-hu.pdf as PDF document
2025-06-23 22:02:07,300 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/011-jahanshahi.pdf as PDF document
2025-06-23 22:02:07,322 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/012-jensen-scacchi.pdf as PDF document
2025-06-23 22:02:07,349 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/013-klug.pdf as PDF document
2025-06-23 22:02:07,388 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf as PDF document
2025-06-23 22:02:07,409 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/015-santos.pdf as PDF document
2025-06-23 22:02:07,426 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf as PDF document
2025-06-23 22:02:07,441 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/017-wessel.pdf as PDF document
2025-06-23 22:02:07,457 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/018-yin.pdf as PDF document
2025-06-23 22:02:07,470 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/019_ding.pdf as PDF document
2025-06-23 22:02:07,489 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf as PDF document
2025-06-23 22:02:07,520 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf as PDF document
2025-06-23 22:02:07,543 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf as PDF document
2025-06-23 22:02:07,578 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf as PDF document
2025-06-23 22:02:07,602 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/024_zhou.pdf as PDF document
2025-06-23 22:02:07,625 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/025_venturini.pdf as PDF document
2025-06-23 22:02:07,644 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/026_vendome.pdf as PDF document
2025-06-23 22:02:07,667 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/027_vendome.pdf as PDF document
2025-06-23 22:02:07,680 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/028_meloca.pdf as PDF document
2025-06-23 22:02:07,691 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/029_heinemann.pdf as PDF document
2025-06-23 22:02:07,715 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/030_abdalkareem.pdf as PDF document
2025-06-23 22:02:07,744 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf as PDF document
2025-06-23 22:02:07,769 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf as PDF document
2025-06-23 22:02:07,787 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/033_businger.pdf as PDF document
2025-06-23 22:02:07,804 - __main__ - INFO - Loading file at /home/nws8519/git/adaptation-slr/studies_pdfs/034_zhang.pdf as PDF document
2025-06-23 22:02:07,804 - __main__ - INFO - Found 34 total pdf paths to add
2025-06-23 22:02:08,224 - __main__ - INFO - Calculated items_per_group: 21 based on average pages per PDF: 23.79
2025-06-23 22:02:08,375 - __main__ - INFO - Starting pipeline with PID 2493381
2025-06-23 22:02:08,375 - __main__ - INFO - Downloading model with hugging face 'allenai/olmOCR-7B-0225-preview'
2025-06-23 22:04:25,408 - __main__ - WARNING - Attempt 1: Please wait for vllm server to become ready...
2025-06-23 22:04:26,456 - __main__ - WARNING - Attempt 2: Please wait for vllm server to become ready...
2025-06-23 22:04:27,509 - __main__ - WARNING - Attempt 3: Please wait for vllm server to become ready...
2025-06-23 22:04:28,563 - __main__ - WARNING - Attempt 4: Please wait for vllm server to become ready...
2025-06-23 22:04:29,618 - __main__ - WARNING - Attempt 5: Please wait for vllm server to become ready...
2025-06-23 22:04:30,672 - __main__ - WARNING - Attempt 6: Please wait for vllm server to become ready...
2025-06-23 22:04:31,730 - __main__ - WARNING - Attempt 7: Please wait for vllm server to become ready...
2025-06-23 22:04:32,783 - __main__ - WARNING - Attempt 8: Please wait for vllm server to become ready...
2025-06-23 22:04:33,837 - __main__ - WARNING - Attempt 9: Please wait for vllm server to become ready...
2025-06-23 22:04:33,974 - vllm - INFO - INFO 06-23 22:04:33 [__init__.py:244] Automatically detected platform cuda.
2025-06-23 22:04:33,974 - __main__ - INFO - INFO 06-23 22:04:33 [__init__.py:244] Automatically detected platform cuda.
2025-06-23 22:04:34,890 - __main__ - WARNING - Attempt 10: Please wait for vllm server to become ready...
2025-06-23 22:04:35,944 - __main__ - WARNING - Attempt 11: Please wait for vllm server to become ready...
2025-06-23 22:04:37,000 - __main__ - WARNING - Attempt 12: Please wait for vllm server to become ready...
2025-06-23 22:04:38,056 - __main__ - WARNING - Attempt 13: Please wait for vllm server to become ready...
2025-06-23 22:04:39,110 - __main__ - WARNING - Attempt 14: Please wait for vllm server to become ready...
2025-06-23 22:04:40,164 - __main__ - WARNING - Attempt 15: Please wait for vllm server to become ready...
2025-06-23 22:04:41,218 - __main__ - WARNING - Attempt 16: Please wait for vllm server to become ready...
2025-06-23 22:04:41,456 - vllm - INFO - INFO 06-23 22:04:41 [api_server.py:1287] vLLM API server version 0.9.1
2025-06-23 22:04:41,456 - __main__ - INFO - INFO 06-23 22:04:41 [api_server.py:1287] vLLM API server version 0.9.1
2025-06-23 22:04:41,819 - vllm - INFO - INFO 06-23 22:04:41 [cli_args.py:309] non-default args: {'port': 30024, 'uvicorn_log_level': 'warning', 'model': 'allenai/olmOCR-7B-0225-preview', 'served_model_name': ['Qwen/Qwen2-VL-7B-Instruct'], 'gpu_memory_utilization': 0.8, 'disable_log_requests': True}
2025-06-23 22:04:41,819 - __main__ - INFO - INFO 06-23 22:04:41 [cli_args.py:309] non-default args: {'port': 30024, 'uvicorn_log_level': 'warning', 'model': 'allenai/olmOCR-7B-0225-preview', 'served_model_name': ['Qwen/Qwen2-VL-7B-Instruct'], 'gpu_memory_utilization': 0.8, 'disable_log_requests': True}
2025-06-23 22:04:42,278 - __main__ - WARNING - Attempt 17: Please wait for vllm server to become ready...
2025-06-23 22:04:43,332 - __main__ - WARNING - Attempt 18: Please wait for vllm server to become ready...
2025-06-23 22:04:44,385 - __main__ - WARNING - Attempt 19: Please wait for vllm server to become ready...
2025-06-23 22:04:45,439 - __main__ - WARNING - Attempt 20: Please wait for vllm server to become ready...
2025-06-23 22:04:46,492 - __main__ - WARNING - Attempt 21: Please wait for vllm server to become ready...
2025-06-23 22:04:47,546 - __main__ - WARNING - Attempt 22: Please wait for vllm server to become ready...
2025-06-23 22:04:48,608 - __main__ - WARNING - Attempt 23: Please wait for vllm server to become ready...
2025-06-23 22:04:49,661 - __main__ - WARNING - Attempt 24: Please wait for vllm server to become ready...
2025-06-23 22:04:50,713 - __main__ - WARNING - Attempt 25: Please wait for vllm server to become ready...
2025-06-23 22:04:51,767 - __main__ - WARNING - Attempt 26: Please wait for vllm server to become ready...
2025-06-23 22:04:52,819 - __main__ - WARNING - Attempt 27: Please wait for vllm server to become ready...
2025-06-23 22:04:53,878 - __main__ - WARNING - Attempt 28: Please wait for vllm server to become ready...
2025-06-23 22:04:54,931 - __main__ - WARNING - Attempt 29: Please wait for vllm server to become ready...
2025-06-23 22:04:55,182 - vllm - INFO - INFO 06-23 22:04:55 [config.py:823] This model supports multiple tasks: {'reward', 'classify', 'score', 'embed', 'generate'}. Defaulting to 'generate'.
2025-06-23 22:04:55,182 - __main__ - INFO - INFO 06-23 22:04:55 [config.py:823] This model supports multiple tasks: {'reward', 'classify', 'score', 'embed', 'generate'}. Defaulting to 'generate'.
2025-06-23 22:04:55,221 - vllm - INFO - INFO 06-23 22:04:55 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=2048.
2025-06-23 22:04:55,221 - __main__ - INFO - INFO 06-23 22:04:55 [config.py:2195] Chunked prefill is enabled with max_num_batched_tokens=2048.
2025-06-23 22:04:55,983 - __main__ - WARNING - Attempt 30: Please wait for vllm server to become ready...
2025-06-23 22:04:57,034 - __main__ - WARNING - Attempt 31: Please wait for vllm server to become ready...
2025-06-23 22:04:57,160 - vllm - INFO - WARNING 06-23 22:04:57 [env_override.py:17] NCCL_CUMEM_ENABLE is set to 0, skipping override. This may increase memory overhead with cudagraph+allreduce: https://github.com/NVIDIA/nccl/issues/1234
2025-06-23 22:04:57,160 - __main__ - INFO - WARNING 06-23 22:04:57 [env_override.py:17] NCCL_CUMEM_ENABLE is set to 0, skipping override. This may increase memory overhead with cudagraph+allreduce: https://github.com/NVIDIA/nccl/issues/1234
2025-06-23 22:04:58,089 - __main__ - WARNING - Attempt 32: Please wait for vllm server to become ready...
2025-06-23 22:04:59,149 - __main__ - WARNING - Attempt 33: Please wait for vllm server to become ready...
2025-06-23 22:05:00,206 - __main__ - WARNING - Attempt 34: Please wait for vllm server to become ready...
2025-06-23 22:05:01,260 - __main__ - WARNING - Attempt 35: Please wait for vllm server to become ready...
2025-06-23 22:05:01,839 - vllm - INFO - INFO 06-23 22:05:01 [__init__.py:244] Automatically detected platform cuda.
2025-06-23 22:05:01,840 - __main__ - INFO - INFO 06-23 22:05:01 [__init__.py:244] Automatically detected platform cuda.
2025-06-23 22:05:02,314 - __main__ - WARNING - Attempt 36: Please wait for vllm server to become ready...
2025-06-23 22:05:03,368 - __main__ - WARNING - Attempt 37: Please wait for vllm server to become ready...
2025-06-23 22:05:04,422 - __main__ - WARNING - Attempt 38: Please wait for vllm server to become ready...
2025-06-23 22:05:05,483 - __main__ - WARNING - Attempt 39: Please wait for vllm server to become ready...
2025-06-23 22:05:06,054 - vllm - INFO - INFO 06-23 22:05:06 [core.py:455] Waiting for init message from front-end.
2025-06-23 22:05:06,054 - __main__ - INFO - INFO 06-23 22:05:06 [core.py:455] Waiting for init message from front-end.
2025-06-23 22:05:06,066 - vllm - INFO - INFO 06-23 22:05:06 [core.py:70] Initializing a V1 LLM engine (v0.9.1) with config: model='allenai/olmOCR-7B-0225-preview', speculative_config=None, tokenizer='allenai/olmOCR-7B-0225-preview', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-VL-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
2025-06-23 22:05:06,066 - __main__ - INFO - INFO 06-23 22:05:06 [core.py:70] Initializing a V1 LLM engine (v0.9.1) with config: model='allenai/olmOCR-7B-0225-preview', speculative_config=None, tokenizer='allenai/olmOCR-7B-0225-preview', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen2-VL-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
2025-06-23 22:05:06,535 - __main__ - WARNING - Attempt 40: Please wait for vllm server to become ready...
2025-06-23 22:05:07,588 - __main__ - WARNING - Attempt 41: Please wait for vllm server to become ready...
2025-06-23 22:05:08,152 - vllm - INFO - WARNING 06-23 22:05:08 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in
2025-06-23 22:05:08,152 - __main__ - INFO - WARNING 06-23 22:05:08 [utils.py:2737] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in
2025-06-23 22:05:08,639 - vllm - INFO - INFO 06-23 22:05:08 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
2025-06-23 22:05:08,639 - __main__ - INFO - INFO 06-23 22:05:08 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
2025-06-23 22:05:08,641 - __main__ - WARNING - Attempt 42: Please wait for vllm server to become ready...
2025-06-23 22:05:09,363 - vllm - INFO - Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
2025-06-23 22:05:09,363 - __main__ - INFO - Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
2025-06-23 22:05:09,694 - __main__ - WARNING - Attempt 43: Please wait for vllm server to become ready...
2025-06-23 22:05:10,340 - vllm - INFO - You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.
2025-06-23 22:05:10,340 - __main__ - INFO - You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.
2025-06-23 22:05:10,753 - __main__ - WARNING - Attempt 44: Please wait for vllm server to become ready...
2025-06-23 22:05:11,807 - __main__ - WARNING - Attempt 45: Please wait for vllm server to become ready...
2025-06-23 22:05:12,861 - __main__ - WARNING - Attempt 46: Please wait for vllm server to become ready...
2025-06-23 22:05:13,914 - __main__ - WARNING - Attempt 47: Please wait for vllm server to become ready...
2025-06-23 22:05:14,172 - vllm - INFO - Unused or unrecognized kwargs: return_tensors.
2025-06-23 22:05:14,172 - __main__ - INFO - Unused or unrecognized kwargs: return_tensors.
2025-06-23 22:05:14,675 - vllm - INFO - INFO 06-23 22:05:14 [topk_topp_sampler.py:49] Using FlashInfer for top-p & top-k sampling.
2025-06-23 22:05:14,675 - __main__ - INFO - INFO 06-23 22:05:14 [topk_topp_sampler.py:49] Using FlashInfer for top-p & top-k sampling.
2025-06-23 22:05:14,783 - vllm - INFO - INFO 06-23 22:05:14 [gpu_model_runner.py:1595] Starting to load model allenai/olmOCR-7B-0225-preview...
2025-06-23 22:05:14,783 - __main__ - INFO - INFO 06-23 22:05:14 [gpu_model_runner.py:1595] Starting to load model allenai/olmOCR-7B-0225-preview...
2025-06-23 22:05:14,968 - __main__ - WARNING - Attempt 48: Please wait for vllm server to become ready...
2025-06-23 22:05:15,062 - vllm - INFO - INFO 06-23 22:05:15 [gpu_model_runner.py:1600] Loading model from scratch...
2025-06-23 22:05:15,062 - __main__ - INFO - INFO 06-23 22:05:15 [gpu_model_runner.py:1600] Loading model from scratch...
2025-06-23 22:05:15,806 - vllm - INFO - WARNING 06-23 22:05:15 [vision.py:91] Current `vllm-flash-attn` has a bug inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend.
2025-06-23 22:05:15,806 - __main__ - INFO - WARNING 06-23 22:05:15 [vision.py:91] Current `vllm-flash-attn` has a bug inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend.
2025-06-23 22:05:16,025 - __main__ - WARNING - Attempt 49: Please wait for vllm server to become ready...
2025-06-23 22:05:16,058 - vllm - INFO - INFO 06-23 22:05:16 [cuda.py:252] Using Flash Attention backend on V1 engine.
2025-06-23 22:05:16,059 - __main__ - INFO - INFO 06-23 22:05:16 [cuda.py:252] Using Flash Attention backend on V1 engine.
2025-06-23 22:05:16,451 - vllm - INFO - INFO 06-23 22:05:16 [weight_utils.py:292] Using model weights format ['*.safetensors']
2025-06-23 22:05:16,451 - __main__ - INFO - INFO 06-23 22:05:16 [weight_utils.py:292] Using model weights format ['*.safetensors']
2025-06-23 22:05:16,774 - vllm - INFO - Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00 [Errno 104] Connection reset by peer
2025-06-23 22:08:33,587 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-7 to allow server restart
2025-06-23 22:08:33,587 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-9: [Errno 104] Connection reset by peer
2025-06-23 22:08:33,587 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-9 to allow server restart
2025-06-23 22:08:34,131 - __main__ - INFO - Queue remaining: 0
2025-06-23 22:08:34,131 - __main__ - INFO -
Metric Name                        Lifetime (tokens/sec)     Recently (tokens/sec)
----------------------------------------------------------------------------------
2025-06-23 22:08:34,131 - __main__ - INFO -
Worker ID | started
----------+--------
0         | 533
1         | 276
2025-06-23 22:08:37,686 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-35: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,686 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-35 to allow server restart
2025-06-23 22:08:37,686 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-23: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,687 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-23 to allow server restart
2025-06-23 22:08:37,687 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-42: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,687 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-42 to allow server restart
2025-06-23 22:08:37,687 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-44: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,687 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-44 to allow server restart
2025-06-23 22:08:37,688 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-27: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,688 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-27 to allow server restart
2025-06-23 22:08:37,688 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-41: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,688 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-41 to allow server restart
2025-06-23 22:08:37,689 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-43: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,689 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-43 to allow server restart
2025-06-23 22:08:37,689 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-38: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-38 to allow server restart
2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-45: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-45 to allow server restart
2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-38: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-38 to allow server restart
2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-36: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-36 to allow server restart
2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-37: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-37 to allow server restart
2025-06-23 22:08:37,690 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-29: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,690 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-29 to allow server restart
2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-7: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-7 to allow server restart
2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-25: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-25 to allow server restart
2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-32: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-32 to allow server restart
2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-34: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-34 to allow server restart
2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-24: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,691 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-24 to allow server restart
2025-06-23 22:08:37,691 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-31: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,692 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-31 to allow server restart
2025-06-23 22:08:37,692 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-11: [Errno 104] Connection reset by peer
2025-06-23 22:08:37,692 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-11 to allow server restart
2025-06-23 22:08:41,816 - vllm - INFO - INFO 06-23 22:08:41 [loggers.py:118] Engine 000: Avg prompt throughput: 8479.5 tokens/s, Avg generation throughput: 191.0 tokens/s, Running: 60 reqs, Waiting: 151 reqs, GPU KV cache usage: 77.4%, Prefix cache hit rate: 0.0%
2025-06-23 22:08:41,816 - __main__ - INFO - vllm running req: 60 queue req: 151
2025-06-23 22:08:43,932 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-7
2025-06-23 22:08:43,942 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/032_capiluppi.pdf-9
2025-06-23 22:08:44,132 - __main__ - INFO - Queue remaining: 0
2025-06-23 22:08:44,132 - __main__ - INFO -
Metric Name                        Lifetime (tokens/sec)     Recently (tokens/sec)
----------------------------------------------------------------------------------
2025-06-23 22:08:44,133 - __main__ - INFO -
Worker ID | started
----------+--------
0         | 533
1         | 276
2025-06-23 22:08:48,490 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-45
2025-06-23 22:08:48,725 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-24
2025-06-23 22:08:48,775 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-35
2025-06-23 22:08:48,785 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-41
2025-06-23 22:08:48,808 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-38
2025-06-23 22:08:48,809 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-34
2025-06-23 22:08:48,829 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-44
2025-06-23 22:08:48,856 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-29
2025-06-23 22:08:48,882 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-25
2025-06-23 22:08:48,906 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-32
2025-06-23 22:08:48,907 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-31
2025-06-23 22:08:48,909 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-23
2025-06-23 22:08:48,938 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-36
2025-06-23 22:08:48,981 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-11
2025-06-23 22:08:48,989 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-43
2025-06-23 22:08:49,007 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-37
2025-06-23 22:08:49,026 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-38
2025-06-23 22:08:49,026 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-42
2025-06-23 22:08:49,062 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf-27
2025-06-23 22:08:49,096 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/020_hilton.pdf-7
2025-06-23 22:08:49,459 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-24: [Errno 104] Connection reset by peer
2025-06-23 22:08:49,459 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-24 to allow server restart
2025-06-23 22:08:49,459 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-6: [Errno 104] Connection reset by peer
2025-06-23 22:08:49,459 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-6 to allow server restart
2025-06-23 22:08:49,980 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-1: [Errno 104] Connection reset by peer
2025-06-23 22:08:49,980 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-1 to allow server restart
2025-06-23 22:08:49,980 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-14: [Errno 104] Connection reset by peer
2025-06-23 22:08:49,980 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-14 to allow server restart
2025-06-23 22:08:49,980 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-15: [Errno 104] Connection reset by peer
2025-06-23 22:08:49,980 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-15 to allow server restart
2025-06-23 22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-11: [Errno 104] Connection reset by peer
2025-06-23 22:08:49,981 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-11 to allow server restart
2025-06-23 22:08:49,981
- __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-8: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,981 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-8 to allow server restart 2025-06-23 22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-4: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,981 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-4 to allow server restart 2025-06-23 22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-5: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,981 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-5 to allow server restart 2025-06-23 22:08:49,981 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-31: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-31 to allow server restart 2025-06-23 22:08:49,982 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-11: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-11 to allow server restart 2025-06-23 22:08:49,982 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-34: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-34 to allow server restart 
2025-06-23 22:08:49,982 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-36: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-36 to allow server restart 2025-06-23 22:08:49,982 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-37: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,982 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-37 to allow server restart 2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-40: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-40 to allow server restart 2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-2: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-2 to allow server restart 2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-33: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-33 to allow server restart 2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-41: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-41 to 
allow server restart 2025-06-23 22:08:49,983 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-35: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,983 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-35 to allow server restart 2025-06-23 22:08:49,984 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-42: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,984 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-42 to allow server restart 2025-06-23 22:08:49,984 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-39: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,984 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-39 to allow server restart 2025-06-23 22:08:49,984 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-3: [Errno 104] Connection reset by peer 2025-06-23 22:08:49,984 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-3 to allow server restart 2025-06-23 22:08:52,006 - vllm - INFO - INFO 06-23 22:08:52 [loggers.py:118] Engine 000: Avg prompt throughput: 5136.4 tokens/s, Avg generation throughput: 852.2 tokens/s, Running: 75 reqs, Waiting: 261 reqs, GPU KV cache usage: 99.8%, Prefix cache hit rate: 21.6% 2025-06-23 22:08:52,006 - __main__ - INFO - vllm running req: 75 queue req: 261 2025-06-23 22:08:52,530 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-24: [Errno 104] Connection reset by peer 2025-06-23 22:08:52,530 - __main__ - INFO - Sleeping for 10 seconds on 
/home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-24 to allow server restart 2025-06-23 22:08:52,530 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-23: [Errno 104] Connection reset by peer 2025-06-23 22:08:52,530 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-23 to allow server restart 2025-06-23 22:08:52,531 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-10: [Errno 104] Connection reset by peer 2025-06-23 22:08:52,531 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-10 to allow server restart 2025-06-23 22:08:52,531 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-28: [Errno 104] Connection reset by peer 2025-06-23 22:08:52,531 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-28 to allow server restart 2025-06-23 22:08:52,531 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-8: [Errno 104] Connection reset by peer 2025-06-23 22:08:52,531 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-8 to allow server restart 2025-06-23 22:08:52,531 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-27: [Errno 104] Connection reset by peer 2025-06-23 22:08:52,531 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-27 to allow server restart 2025-06-23 22:08:53,043 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-5: [Errno 104] Connection reset by peer 2025-06-23 
22:08:53,043 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-5 to allow server restart 2025-06-23 22:08:53,043 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-4: [Errno 104] Connection reset by peer 2025-06-23 22:08:53,043 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-4 to allow server restart 2025-06-23 22:08:53,044 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-5: [Errno 104] Connection reset by peer 2025-06-23 22:08:53,044 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-5 to allow server restart 2025-06-23 22:08:53,044 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-6: [Errno 104] Connection reset by peer 2025-06-23 22:08:53,044 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-6 to allow server restart 2025-06-23 22:08:53,044 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-2: [Errno 104] Connection reset by peer 2025-06-23 22:08:53,044 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-2 to allow server restart 2025-06-23 22:08:53,045 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-1: [Errno 104] Connection reset by peer 2025-06-23 22:08:53,045 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-1 to allow server restart 2025-06-23 22:08:53,045 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/015-santos.pdf-12: [Errno 104] 
Connection reset by peer 2025-06-23 22:08:53,045 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/015-santos.pdf-12 to allow server restart 2025-06-23 22:08:53,045 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/011-jahanshahi.pdf-46: [Errno 104] Connection reset by peer 2025-06-23 22:08:53,045 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/011-jahanshahi.pdf-46 to allow server restart 2025-06-23 22:08:54,134 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:08:54,134 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- 2025-06-23 22:08:54,134 - __main__ - INFO - Worker ID | started ----------+-------- 0 | 533 1 | 276 2025-06-23 22:08:56,626 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-36: [Errno 104] Connection reset by peer 2025-06-23 22:08:56,627 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-36 to allow server restart 2025-06-23 22:08:56,627 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-47: [Errno 104] Connection reset by peer 2025-06-23 22:08:56,627 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-47 to allow server restart 2025-06-23 22:08:56,627 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-39: [Errno 104] Connection reset by peer 2025-06-23 22:08:56,627 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-39 to allow server restart 2025-06-23 22:08:59,800 - __main__ - INFO - Built page query for 
/home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-24 2025-06-23 22:08:59,806 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-6 2025-06-23 22:09:00,593 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-3 2025-06-23 22:09:00,636 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-2 2025-06-23 22:09:00,663 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-4 2025-06-23 22:09:00,794 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-41 2025-06-23 22:09:00,830 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-1 2025-06-23 22:09:00,945 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-37 2025-06-23 22:09:00,971 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-39 2025-06-23 22:09:00,975 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-5 2025-06-23 22:09:00,975 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-42 2025-06-23 22:09:00,992 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-11 2025-06-23 22:09:01,003 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-8 2025-06-23 22:09:01,003 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-15 2025-06-23 22:09:01,005 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-14 2025-06-23 22:09:01,044 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-40 
2025-06-23 22:09:01,069 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-35 2025-06-23 22:09:01,075 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-33 2025-06-23 22:09:01,162 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-31 2025-06-23 22:09:01,223 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-11 2025-06-23 22:09:01,243 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-36 2025-06-23 22:09:01,247 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/001-adams.pdf-34 2025-06-23 22:09:02,117 - vllm - INFO - INFO 06-23 22:09:02 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 2075.3 tokens/s, Running: 68 reqs, Waiting: 383 reqs, GPU KV cache usage: 99.8%, Prefix cache hit rate: 42.7% 2025-06-23 22:09:02,117 - __main__ - INFO - vllm running req: 68 queue req: 383 2025-06-23 22:09:03,397 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-28 2025-06-23 22:09:03,497 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-24 2025-06-23 22:09:03,516 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-23 2025-06-23 22:09:03,564 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-10 2025-06-23 22:09:03,614 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-8 2025-06-23 22:09:03,688 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-27 2025-06-23 22:09:03,736 - __main__ - INFO - Built page query for 
/home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-4 2025-06-23 22:09:03,910 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-5 2025-06-23 22:09:03,964 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/015-santos.pdf-12 2025-06-23 22:09:03,973 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/011-jahanshahi.pdf-46 2025-06-23 22:09:03,977 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-2 2025-06-23 22:09:03,986 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/022_lotter.pdf-1 2025-06-23 22:09:03,996 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/023_abdalkareem.pdf-5 2025-06-23 22:09:04,135 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:09:04,135 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.00 0.00 server_input_tokens 4.16 5.79 server_output_tokens 0.92 1.28 2025-06-23 22:09:04,136 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 1 | 533 1 | 0 | 276 2025-06-23 22:09:04,192 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-6 2025-06-23 22:09:04,307 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-4: [Errno 104] Connection reset by peer 2025-06-23 22:09:04,307 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-4 to allow server restart 2025-06-23 22:09:04,307 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-9: [Errno 104] Connection reset by peer 2025-06-23 22:09:04,307 - __main__ - INFO - Sleeping for 10 
seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-9 to allow server restart 2025-06-23 22:09:04,307 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-51: [Errno 104] Connection reset by peer 2025-06-23 22:09:04,308 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-51 to allow server restart 2025-06-23 22:09:04,308 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-50: [Errno 104] Connection reset by peer 2025-06-23 22:09:04,308 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-50 to allow server restart 2025-06-23 22:09:04,308 - __main__ - WARNING - Client error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-49: [Errno 104] Connection reset by peer 2025-06-23 22:09:04,308 - __main__ - INFO - Sleeping for 10 seconds on /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-49 to allow server restart 2025-06-23 22:09:06,960 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-36 2025-06-23 22:09:06,963 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-47 2025-06-23 22:09:06,973 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/031_businger.pdf-39 2025-06-23 22:09:12,138 - vllm - INFO - INFO 06-23 22:09:12 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 1360.2 tokens/s, Running: 62 reqs, Waiting: 489 reqs, GPU KV cache usage: 98.2%, Prefix cache hit rate: 42.5% 2025-06-23 22:09:12,139 - __main__ - INFO - vllm running req: 62 queue req: 489 2025-06-23 22:09:14,137 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:09:14,137 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently 
(tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.03 0.05 server_input_tokens 72.62 103.42 server_output_tokens 19.92 28.37 2025-06-23 22:09:14,137 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 14 | 533 1 | 0 | 276 2025-06-23 22:09:14,721 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-49 2025-06-23 22:09:14,787 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-51 2025-06-23 22:09:14,793 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/003-bogart.pdf-50 2025-06-23 22:09:14,890 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-9 2025-06-23 22:09:14,945 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-4 2025-06-23 22:09:22,266 - vllm - INFO - INFO 06-23 22:09:22 [loggers.py:118] Engine 000: Avg prompt throughput: 1037.0 tokens/s, Avg generation throughput: 1265.2 tokens/s, Running: 60 reqs, Waiting: 610 reqs, GPU KV cache usage: 99.3%, Prefix cache hit rate: 41.6% 2025-06-23 22:09:22,266 - __main__ - INFO - vllm running req: 60 queue req: 610 2025-06-23 22:09:24,139 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:09:24,139 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.06 0.08 server_input_tokens 135.92 198.10 server_output_tokens 38.88 56.67 2025-06-23 22:09:24,139 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 25 | 533 1 | 0 | 276 2025-06-23 22:09:32,267 - vllm - INFO - INFO 06-23 22:09:32 [loggers.py:118] Engine 000: Avg prompt throughput: 4705.7 tokens/s, Avg generation throughput: 737.0 tokens/s, Running: 64 reqs, Waiting: 709 reqs, GPU KV cache usage: 
99.4%, Prefix cache hit rate: 38.7% 2025-06-23 22:09:32,267 - __main__ - INFO - vllm running req: 64 queue req: 709 2025-06-23 22:09:34,141 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:09:34,142 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.08 0.13 server_input_tokens 215.97 321.97 server_output_tokens 62.79 93.60 2025-06-23 22:09:34,142 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 38 | 533 1 | 0 | 276 2025-06-23 22:09:42,268 - vllm - INFO - INFO 06-23 22:09:42 [loggers.py:118] Engine 000: Avg prompt throughput: 3143.5 tokens/s, Avg generation throughput: 1032.8 tokens/s, Running: 68 reqs, Waiting: 694 reqs, GPU KV cache usage: 99.2%, Prefix cache hit rate: 41.5% 2025-06-23 22:09:42,269 - __main__ - INFO - vllm running req: 68 queue req: 694 2025-06-23 22:09:44,143 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:09:44,144 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.11 0.16 server_input_tokens 277.54 423.01 server_output_tokens 81.61 124.38 2025-06-23 22:09:44,144 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 49 | 533 1 | 0 | 276 2025-06-23 22:09:52,269 - vllm - INFO - INFO 06-23 22:09:52 [loggers.py:118] Engine 000: Avg prompt throughput: 3296.0 tokens/s, Avg generation throughput: 1125.6 tokens/s, Running: 65 reqs, Waiting: 683 reqs, GPU KV cache usage: 97.6%, Prefix cache hit rate: 38.5% 2025-06-23 22:09:52,269 - __main__ - INFO - vllm running req: 65 queue req: 683 2025-06-23 22:09:54,145 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:09:54,146 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- 
completed_pages 0.13 0.21 server_input_tokens 353.36 550.36 server_output_tokens 106.73 166.23 2025-06-23 22:09:54,146 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 63 | 533 1 | 0 | 276 2025-06-23 22:10:02,270 - vllm - INFO - INFO 06-23 22:10:02 [loggers.py:118] Engine 000: Avg prompt throughput: 2454.5 tokens/s, Avg generation throughput: 1322.9 tokens/s, Running: 65 reqs, Waiting: 674 reqs, GPU KV cache usage: 99.3%, Prefix cache hit rate: 33.5% 2025-06-23 22:10:02,271 - __main__ - INFO - vllm running req: 65 queue req: 674 2025-06-23 22:10:04,147 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:10:04,147 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.15 0.24 server_input_tokens 389.51 619.66 server_output_tokens 118.93 189.19 2025-06-23 22:10:04,147 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 71 | 533 1 | 0 | 276 2025-06-23 22:10:12,272 - vllm - INFO - INFO 06-23 22:10:12 [loggers.py:118] Engine 000: Avg prompt throughput: 538.7 tokens/s, Avg generation throughput: 1524.0 tokens/s, Running: 61 reqs, Waiting: 671 reqs, GPU KV cache usage: 99.0%, Prefix cache hit rate: 34.1% 2025-06-23 22:10:12,273 - __main__ - INFO - vllm running req: 61 queue req: 671 2025-06-23 22:10:14,148 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:10:14,148 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.16 0.27 server_input_tokens 428.79 696.43 server_output_tokens 135.56 220.17 2025-06-23 22:10:14,148 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 80 | 533 1 | 0 | 276 2025-06-23 22:10:22,272 - vllm - INFO - INFO 06-23 22:10:22 [loggers.py:118] Engine 000: Avg prompt throughput: 4194.4 tokens/s, Avg 
generation throughput: 802.4 tokens/s, Running: 63 reqs, Waiting: 654 reqs, GPU KV cache usage: 99.4%, Prefix cache hit rate: 32.3% 2025-06-23 22:10:22,273 - __main__ - INFO - vllm running req: 63 queue req: 654 2025-06-23 22:10:24,149 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:10:24,149 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.19 0.31 server_input_tokens 490.36 812.79 server_output_tokens 160.71 266.37 2025-06-23 22:10:24,150 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 94 | 533 1 | 0 | 276 2025-06-23 22:10:32,274 - vllm - INFO - INFO 06-23 22:10:32 [loggers.py:118] Engine 000: Avg prompt throughput: 3463.1 tokens/s, Avg generation throughput: 1030.5 tokens/s, Running: 62 reqs, Waiting: 643 reqs, GPU KV cache usage: 99.8%, Prefix cache hit rate: 36.2% 2025-06-23 22:10:32,274 - __main__ - INFO - vllm running req: 62 queue req: 643 2025-06-23 22:10:34,150 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:10:34,151 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.21 0.36 server_input_tokens 543.42 918.85 server_output_tokens 179.77 303.97 2025-06-23 22:10:34,151 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 105 | 533 1 | 2 | 276 2025-06-23 22:10:42,274 - vllm - INFO - INFO 06-23 22:10:42 [loggers.py:118] Engine 000: Avg prompt throughput: 3233.2 tokens/s, Avg generation throughput: 995.9 tokens/s, Running: 62 reqs, Waiting: 630 reqs, GPU KV cache usage: 99.6%, Prefix cache hit rate: 34.9% 2025-06-23 22:10:42,274 - __main__ - INFO - vllm running req: 62 queue req: 630 2025-06-23 22:10:44,154 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:10:44,154 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently 
(tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.23 0.40 server_input_tokens 596.17 1027.93 server_output_tokens 200.04 344.92 2025-06-23 22:10:44,154 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 117 | 533 1 | 3 | 276 2025-06-23 22:10:52,275 - vllm - INFO - INFO 06-23 22:10:52 [loggers.py:118] Engine 000: Avg prompt throughput: 2268.2 tokens/s, Avg generation throughput: 1248.2 tokens/s, Running: 61 reqs, Waiting: 623 reqs, GPU KV cache usage: 99.7%, Prefix cache hit rate: 35.7% 2025-06-23 22:10:52,275 - __main__ - INFO - vllm running req: 61 queue req: 623 2025-06-23 22:10:54,155 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:10:54,155 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.24 0.42 server_input_tokens 632.42 1111.50 server_output_tokens 217.21 381.76 2025-06-23 22:10:54,156 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 122 | 533 1 | 5 | 276 2025-06-23 22:11:02,276 - vllm - INFO - INFO 06-23 22:11:02 [loggers.py:118] Engine 000: Avg prompt throughput: 3053.6 tokens/s, Avg generation throughput: 1048.6 tokens/s, Running: 63 reqs, Waiting: 612 reqs, GPU KV cache usage: 99.1%, Prefix cache hit rate: 37.1% 2025-06-23 22:11:02,276 - __main__ - INFO - vllm running req: 63 queue req: 612 2025-06-23 22:11:04,156 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:11:04,157 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.25 0.45 server_input_tokens 663.06 1187.46 server_output_tokens 229.91 411.74 2025-06-23 22:11:04,157 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 126 | 533 1 | 9 | 276 2025-06-23 22:11:12,277 - vllm - 
INFO - INFO 06-23 22:11:12 [loggers.py:118] Engine 000: Avg prompt throughput: 2501.2 tokens/s, Avg generation throughput: 1177.7 tokens/s, Running: 63 reqs, Waiting: 602 reqs, GPU KV cache usage: 98.2%, Prefix cache hit rate: 34.2% 2025-06-23 22:11:12,277 - __main__ - INFO - vllm running req: 63 queue req: 602 2025-06-23 22:11:14,158 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:11:14,159 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.26 0.48 server_input_tokens 703.41 1283.18 server_output_tokens 244.87 446.69 2025-06-23 22:11:14,159 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 132 | 533 1 | 13 | 276 2025-06-23 22:11:22,278 - vllm - INFO - INFO 06-23 22:11:22 [loggers.py:118] Engine 000: Avg prompt throughput: 3782.5 tokens/s, Avg generation throughput: 956.0 tokens/s, Running: 62 reqs, Waiting: 590 reqs, GPU KV cache usage: 98.3%, Prefix cache hit rate: 34.5% 2025-06-23 22:11:22,279 - __main__ - INFO - vllm running req: 62 queue req: 590 2025-06-23 22:11:24,160 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:11:24,160 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.28 0.53 server_input_tokens 754.96 1402.39 server_output_tokens 261.06 484.94 2025-06-23 22:11:24,160 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 140 | 533 1 | 18 | 276 2025-06-23 22:11:32,281 - vllm - INFO - INFO 06-23 22:11:32 [loggers.py:118] Engine 000: Avg prompt throughput: 3195.2 tokens/s, Avg generation throughput: 1069.6 tokens/s, Running: 64 reqs, Waiting: 577 reqs, GPU KV cache usage: 98.9%, Prefix cache hit rate: 33.8% 2025-06-23 22:11:32,281 - __main__ - INFO - vllm running req: 64 queue req: 577 2025-06-23 22:11:34,161 - __main__ - INFO - 
Queue remaining: 0 2025-06-23 22:11:34,161 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.30 0.57 server_input_tokens 799.80 1512.33 server_output_tokens 279.24 528.01 2025-06-23 22:11:34,161 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 147 | 533 1 | 23 | 276 2025-06-23 22:11:42,281 - vllm - INFO - INFO 06-23 22:11:42 [loggers.py:118] Engine 000: Avg prompt throughput: 3252.9 tokens/s, Avg generation throughput: 1100.7 tokens/s, Running: 64 reqs, Waiting: 565 reqs, GPU KV cache usage: 99.1%, Prefix cache hit rate: 34.7% 2025-06-23 22:11:42,281 - __main__ - INFO - vllm running req: 64 queue req: 565 2025-06-23 22:11:44,162 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:11:44,163 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.31 0.60 server_input_tokens 833.40 1603.66 server_output_tokens 293.39 564.55 2025-06-23 22:11:44,163 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 152 | 533 1 | 28 | 276 2025-06-23 22:11:52,281 - vllm - INFO - INFO 06-23 22:11:52 [loggers.py:118] Engine 000: Avg prompt throughput: 3266.0 tokens/s, Avg generation throughput: 1115.2 tokens/s, Running: 63 reqs, Waiting: 556 reqs, GPU KV cache usage: 99.0%, Prefix cache hit rate: 35.6% 2025-06-23 22:11:52,281 - __main__ - INFO - vllm running req: 63 queue req: 556 2025-06-23 22:11:54,163 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:11:54,164 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.33 0.64 server_input_tokens 882.61 1727.78 server_output_tokens 313.58 613.85 2025-06-23 22:11:54,164 - __main__ - INFO - Worker ID | 
finished | started ----------+----------+-------- 0 | 158 | 533 1 | 34 | 276 2025-06-23 22:12:02,281 - vllm - INFO - INFO 06-23 22:12:02 [loggers.py:118] Engine 000: Avg prompt throughput: 2427.6 tokens/s, Avg generation throughput: 1288.0 tokens/s, Running: 61 reqs, Waiting: 550 reqs, GPU KV cache usage: 97.7%, Prefix cache hit rate: 36.1% 2025-06-23 22:12:02,281 - __main__ - INFO - vllm running req: 61 queue req: 550 2025-06-23 22:12:04,165 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:12:04,165 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.34 0.67 server_input_tokens 908.52 1808.77 server_output_tokens 322.85 642.77 2025-06-23 22:12:04,165 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 161 | 533 1 | 40 | 276 2025-06-23 22:12:12,282 - vllm - INFO - INFO 06-23 22:12:12 [loggers.py:118] Engine 000: Avg prompt throughput: 2678.3 tokens/s, Avg generation throughput: 1085.5 tokens/s, Running: 62 reqs, Waiting: 537 reqs, GPU KV cache usage: 99.1%, Prefix cache hit rate: 36.7% 2025-06-23 22:12:12,282 - __main__ - INFO - vllm running req: 62 queue req: 537 2025-06-23 22:12:14,166 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:12:14,167 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.35 0.71 server_input_tokens 944.56 1912.03 server_output_tokens 333.03 674.14 2025-06-23 22:12:14,167 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 162 | 533 1 | 50 | 276 2025-06-23 22:12:22,284 - vllm - INFO - INFO 06-23 22:12:22 [loggers.py:118] Engine 000: Avg prompt throughput: 3142.0 tokens/s, Avg generation throughput: 1143.2 tokens/s, Running: 63 reqs, Waiting: 526 reqs, GPU KV cache usage: 99.2%, Prefix cache hit rate: 37.8% 2025-06-23 
22:12:22,284 - __main__ - INFO - vllm running req: 63 queue req: 526 2025-06-23 22:12:24,167 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:12:24,168 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.36 0.74 server_input_tokens 978.24 2012.82 server_output_tokens 345.07 710.01 2025-06-23 22:12:24,168 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 162 | 533 1 | 61 | 276 2025-06-23 22:12:32,284 - vllm - INFO - INFO 06-23 22:12:32 [loggers.py:118] Engine 000: Avg prompt throughput: 4035.8 tokens/s, Avg generation throughput: 822.3 tokens/s, Running: 66 reqs, Waiting: 510 reqs, GPU KV cache usage: 99.0%, Prefix cache hit rate: 37.5% 2025-06-23 22:12:32,284 - __main__ - INFO - vllm running req: 66 queue req: 510 2025-06-23 22:12:34,169 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:12:34,169 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.37 0.78 server_input_tokens 1021.46 2135.79 server_output_tokens 365.30 763.81 2025-06-23 22:12:34,169 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 167 | 533 1 | 68 | 276 2025-06-23 22:12:42,286 - vllm - INFO - INFO 06-23 22:12:42 [loggers.py:118] Engine 000: Avg prompt throughput: 3874.3 tokens/s, Avg generation throughput: 1082.5 tokens/s, Running: 66 reqs, Waiting: 501 reqs, GPU KV cache usage: 99.5%, Prefix cache hit rate: 37.7% 2025-06-23 22:12:42,286 - __main__ - INFO - vllm running req: 66 queue req: 501 2025-06-23 22:12:44,170 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:12:44,170 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.38 0.82 
server_input_tokens 1057.70 2246.84 server_output_tokens 383.96 815.63 2025-06-23 22:12:44,170 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 171 | 533 1 | 74 | 276 2025-06-23 22:12:52,287 - vllm - INFO - INFO 06-23 22:12:52 [loggers.py:118] Engine 000: Avg prompt throughput: 3513.4 tokens/s, Avg generation throughput: 1063.8 tokens/s, Running: 64 reqs, Waiting: 490 reqs, GPU KV cache usage: 99.4%, Prefix cache hit rate: 39.5% 2025-06-23 22:12:52,287 - __main__ - INFO - vllm running req: 64 queue req: 490 2025-06-23 22:12:54,172 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:12:54,172 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.40 0.86 server_input_tokens 1088.49 2348.52 server_output_tokens 392.48 846.82 2025-06-23 22:12:54,172 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 175 | 533 1 | 83 | 276 2025-06-23 22:13:02,288 - vllm - INFO - INFO 06-23 22:13:02 [loggers.py:118] Engine 000: Avg prompt throughput: 2743.1 tokens/s, Avg generation throughput: 1161.5 tokens/s, Running: 61 reqs, Waiting: 481 reqs, GPU KV cache usage: 97.6%, Prefix cache hit rate: 38.7% 2025-06-23 22:13:02,288 - __main__ - INFO - vllm running req: 61 queue req: 481 2025-06-23 22:13:04,173 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:13:04,174 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.41 0.90 server_input_tokens 1123.77 2462.11 server_output_tokens 402.62 882.11 2025-06-23 22:13:04,174 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 177 | 533 1 | 92 | 276 2025-06-23 22:13:12,289 - vllm - INFO - INFO 06-23 22:13:12 [loggers.py:118] Engine 000: Avg prompt throughput: 3174.5 tokens/s, Avg generation 
throughput: 1182.3 tokens/s, Running: 58 reqs, Waiting: 472 reqs, GPU KV cache usage: 99.4%, Prefix cache hit rate: 39.0% 2025-06-23 22:13:12,289 - __main__ - INFO - vllm running req: 58 queue req: 472 2025-06-23 22:13:14,175 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:13:14,175 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.42 0.94 server_input_tokens 1152.91 2564.40 server_output_tokens 411.53 915.35 2025-06-23 22:13:14,175 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 185 | 533 1 | 97 | 276 2025-06-23 22:13:22,291 - vllm - INFO - INFO 06-23 22:13:22 [loggers.py:118] Engine 000: Avg prompt throughput: 3734.4 tokens/s, Avg generation throughput: 917.6 tokens/s, Running: 61 reqs, Waiting: 456 reqs, GPU KV cache usage: 99.4%, Prefix cache hit rate: 38.3% 2025-06-23 22:13:22,292 - __main__ - INFO - vllm running req: 61 queue req: 456 2025-06-23 22:13:24,176 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:13:24,176 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.43 0.98 server_input_tokens 1189.02 2684.34 server_output_tokens 424.32 957.96 2025-06-23 22:13:24,177 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 192 | 533 1 | 102 | 276 2025-06-23 22:13:32,292 - vllm - INFO - INFO 06-23 22:13:32 [loggers.py:118] Engine 000: Avg prompt throughput: 2386.4 tokens/s, Avg generation throughput: 1273.2 tokens/s, Running: 59 reqs, Waiting: 449 reqs, GPU KV cache usage: 98.5%, Prefix cache hit rate: 39.6% 2025-06-23 22:13:32,293 - __main__ - INFO - vllm running req: 59 queue req: 449 2025-06-23 22:13:34,177 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:13:34,177 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently 
(tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.44 1.01 server_input_tokens 1207.97 2767.41 server_output_tokens 430.83 987.00 2025-06-23 22:13:34,178 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 199 | 533 1 | 104 | 276 2025-06-23 22:13:42,293 - vllm - INFO - INFO 06-23 22:13:42 [loggers.py:118] Engine 000: Avg prompt throughput: 3770.0 tokens/s, Avg generation throughput: 903.0 tokens/s, Running: 62 reqs, Waiting: 436 reqs, GPU KV cache usage: 99.2%, Prefix cache hit rate: 38.9% 2025-06-23 22:13:42,293 - __main__ - INFO - vllm running req: 62 queue req: 436 2025-06-23 22:13:44,179 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:13:44,179 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.45 1.04 server_input_tokens 1232.57 2864.85 server_output_tokens 438.51 1019.23 2025-06-23 22:13:44,180 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 206 | 533 1 | 106 | 276 2025-06-23 22:13:52,293 - vllm - INFO - INFO 06-23 22:13:52 [loggers.py:118] Engine 000: Avg prompt throughput: 2824.9 tokens/s, Avg generation throughput: 1106.9 tokens/s, Running: 59 reqs, Waiting: 427 reqs, GPU KV cache usage: 97.3%, Prefix cache hit rate: 38.3% 2025-06-23 22:13:52,293 - __main__ - INFO - vllm running req: 59 queue req: 427 2025-06-23 22:13:54,181 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:13:54,181 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.46 1.08 server_input_tokens 1269.62 2993.30 server_output_tokens 449.30 1059.29 2025-06-23 22:13:54,181 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 218 | 533 1 | 107 | 276 2025-06-23 22:14:02,293 - 
vllm - INFO - INFO 06-23 22:14:02 [loggers.py:118] Engine 000: Avg prompt throughput: 3528.5 tokens/s, Avg generation throughput: 918.0 tokens/s, Running: 65 reqs, Waiting: 412 reqs, GPU KV cache usage: 98.4%, Prefix cache hit rate: 39.7% 2025-06-23 22:14:02,294 - __main__ - INFO - vllm running req: 65 queue req: 412 2025-06-23 22:14:04,182 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:14:04,182 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.47 1.11 server_input_tokens 1296.63 3094.40 server_output_tokens 457.29 1092.08 2025-06-23 22:14:04,182 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 227 | 533 1 | 107 | 276 2025-06-23 22:14:12,294 - vllm - INFO - INFO 06-23 22:14:12 [loggers.py:118] Engine 000: Avg prompt throughput: 2983.9 tokens/s, Avg generation throughput: 1132.6 tokens/s, Running: 62 reqs, Waiting: 402 reqs, GPU KV cache usage: 98.3%, Prefix cache hit rate: 37.6% 2025-06-23 22:14:12,294 - __main__ - INFO - vllm running req: 62 queue req: 402 2025-06-23 22:14:14,183 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:14:14,184 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.48 1.11 server_input_tokens 1322.77 3103.36 server_output_tokens 466.04 1101.46 2025-06-23 22:14:14,184 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 240 | 533 1 | 107 | 276 2025-06-23 22:14:22,296 - vllm - INFO - INFO 06-23 22:14:22 [loggers.py:118] Engine 000: Avg prompt throughput: 2252.7 tokens/s, Avg generation throughput: 1373.4 tokens/s, Running: 60 reqs, Waiting: 398 reqs, GPU KV cache usage: 99.7%, Prefix cache hit rate: 37.6% 2025-06-23 22:14:22,297 - __main__ - INFO - vllm running req: 60 queue req: 398 2025-06-23 22:14:24,185 - 
__main__ - INFO - Queue remaining: 0 2025-06-23 22:14:24,185 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.48 1.10 server_input_tokens 1332.38 3084.51 server_output_tokens 468.30 1097.13 2025-06-23 22:14:24,185 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 246 | 533 1 | 108 | 276 2025-06-23 22:14:32,297 - vllm - INFO - INFO 06-23 22:14:32 [loggers.py:118] Engine 000: Avg prompt throughput: 2789.6 tokens/s, Avg generation throughput: 977.0 tokens/s, Running: 61 reqs, Waiting: 385 reqs, GPU KV cache usage: 97.6%, Prefix cache hit rate: 33.9% 2025-06-23 22:14:32,297 - __main__ - INFO - vllm running req: 61 queue req: 385 2025-06-23 22:14:34,186 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:14:34,186 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.49 1.09 server_input_tokens 1364.44 3076.82 server_output_tokens 479.98 1102.01 2025-06-23 22:14:34,186 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 252 | 533 1 | 113 | 276 2025-06-23 22:14:42,297 - vllm - INFO - INFO 06-23 22:14:42 [loggers.py:118] Engine 000: Avg prompt throughput: 4087.2 tokens/s, Avg generation throughput: 933.2 tokens/s, Running: 64 reqs, Waiting: 371 reqs, GPU KV cache usage: 99.3%, Prefix cache hit rate: 34.7% 2025-06-23 22:14:42,298 - __main__ - INFO - vllm running req: 64 queue req: 371 2025-06-23 22:14:44,187 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:14:44,188 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.50 1.09 server_input_tokens 1391.13 3088.64 server_output_tokens 489.22 1110.56 2025-06-23 22:14:44,188 - __main__ 
- INFO - Worker ID | finished | started ----------+----------+-------- 0 | 259 | 533 1 | 118 | 276 2025-06-23 22:14:52,298 - vllm - INFO - INFO 06-23 22:14:52 [loggers.py:118] Engine 000: Avg prompt throughput: 3368.6 tokens/s, Avg generation throughput: 1008.5 tokens/s, Running: 65 reqs, Waiting: 359 reqs, GPU KV cache usage: 98.7%, Prefix cache hit rate: 33.5% 2025-06-23 22:14:52,298 - __main__ - INFO - vllm running req: 65 queue req: 359 2025-06-23 22:14:52,928 - __main__ - WARNING - JSON decode error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) 2025-06-23 22:14:52,929 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:14:53,075 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:14:54,190 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:14:54,190 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.50 1.08 server_input_tokens 1412.24 3073.89 server_output_tokens 496.47 1107.94 2025-06-23 22:14:54,190 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 262 | 533 1 | 123 | 276 2025-06-23 22:15:02,300 - vllm - INFO - INFO 06-23 22:15:02 [loggers.py:118] Engine 000: Avg prompt throughput: 1827.1 tokens/s, Avg generation throughput: 1439.4 tokens/s, Running: 62 reqs, Waiting: 354 reqs, GPU KV cache usage: 99.5%, Prefix cache hit rate: 35.4% 2025-06-23 22:15:02,300 - __main__ - INFO - vllm running req: 62 queue req: 354 2025-06-23 22:15:04,191 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:15:04,191 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) 
---------------------------------------------------------------------------------- completed_pages 0.51 1.08 server_input_tokens 1425.85 3074.71 server_output_tokens 500.91 1108.67 2025-06-23 22:15:04,192 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 269 | 533 1 | 126 | 276 2025-06-23 22:15:12,301 - vllm - INFO - INFO 06-23 22:15:12 [loggers.py:118] Engine 000: Avg prompt throughput: 3419.9 tokens/s, Avg generation throughput: 1065.1 tokens/s, Running: 63 reqs, Waiting: 343 reqs, GPU KV cache usage: 99.2%, Prefix cache hit rate: 37.2% 2025-06-23 22:15:12,302 - __main__ - INFO - vllm running req: 63 queue req: 343 2025-06-23 22:15:14,193 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:15:14,193 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.52 1.09 server_input_tokens 1450.64 3118.13 server_output_tokens 508.76 1117.77 2025-06-23 22:15:14,193 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 275 | 533 1 | 131 | 276 2025-06-23 22:15:22,303 - vllm - INFO - INFO 06-23 22:15:22 [loggers.py:118] Engine 000: Avg prompt throughput: 2790.3 tokens/s, Avg generation throughput: 1168.1 tokens/s, Running: 62 reqs, Waiting: 331 reqs, GPU KV cache usage: 99.2%, Prefix cache hit rate: 38.2% 2025-06-23 22:15:22,303 - __main__ - INFO - vllm running req: 62 queue req: 331 2025-06-23 22:15:22,740 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-14 2025-06-23 22:15:22,740 - __main__ - WARNING - ValueError on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-14: - Response exceeded model_max_context, cannot use this response 2025-06-23 22:15:23,013 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-14 2025-06-23 22:15:24,194 - __main__ - 
INFO - Queue remaining: 0 2025-06-23 22:15:24,194 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.52 1.08 server_input_tokens 1466.97 3085.92 server_output_tokens 514.14 1100.03 2025-06-23 22:15:24,194 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 279 | 533 1 | 138 | 276 2025-06-23 22:15:30,120 - __main__ - WARNING - JSON decode error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-5: Invalid \escape: line 1 column 1851 (char 1850) 2025-06-23 22:15:30,121 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-5 2025-06-23 22:15:30,428 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/021_he.pdf-5 2025-06-23 22:15:32,305 - vllm - INFO - INFO 06-23 22:15:32 [loggers.py:118] Engine 000: Avg prompt throughput: 3054.3 tokens/s, Avg generation throughput: 1156.1 tokens/s, Running: 64 reqs, Waiting: 323 reqs, GPU KV cache usage: 99.4%, Prefix cache hit rate: 39.2% 2025-06-23 22:15:32,305 - __main__ - INFO - vllm running req: 64 queue req: 323 2025-06-23 22:15:34,195 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:15:34,196 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.53 1.06 server_input_tokens 1479.70 3070.82 server_output_tokens 518.29 1093.48 2025-06-23 22:15:34,196 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 282 | 533 1 | 142 | 276 2025-06-23 22:15:42,305 - vllm - INFO - INFO 06-23 22:15:42 [loggers.py:118] Engine 000: Avg prompt throughput: 2381.0 tokens/s, Avg generation throughput: 1207.6 tokens/s, Running: 62 reqs, Waiting: 314 reqs, GPU KV cache usage: 97.3%, Prefix cache hit rate: 39.0% 2025-06-23 
22:15:42,305 - __main__ - INFO - vllm running req: 62 queue req: 314 2025-06-23 22:15:44,198 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:15:44,198 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.53 1.05 server_input_tokens 1496.43 3056.93 server_output_tokens 525.01 1088.33 2025-06-23 22:15:44,198 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 289 | 533 1 | 145 | 276 2025-06-23 22:15:52,307 - vllm - INFO - INFO 06-23 22:15:52 [loggers.py:118] Engine 000: Avg prompt throughput: 3543.4 tokens/s, Avg generation throughput: 1015.0 tokens/s, Running: 64 reqs, Waiting: 301 reqs, GPU KV cache usage: 99.7%, Prefix cache hit rate: 39.9% 2025-06-23 22:15:52,307 - __main__ - INFO - vllm running req: 64 queue req: 301 2025-06-23 22:15:54,200 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:15:54,200 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.54 1.07 server_input_tokens 1520.12 3093.15 server_output_tokens 533.95 1098.93 2025-06-23 22:15:54,201 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 294 | 533 1 | 152 | 276 2025-06-23 22:16:02,307 - vllm - INFO - INFO 06-23 22:16:02 [loggers.py:118] Engine 000: Avg prompt throughput: 2900.1 tokens/s, Avg generation throughput: 1116.9 tokens/s, Running: 64 reqs, Waiting: 289 reqs, GPU KV cache usage: 98.7%, Prefix cache hit rate: 40.1% 2025-06-23 22:16:02,308 - __main__ - INFO - vllm running req: 64 queue req: 289 2025-06-23 22:16:04,203 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:16:04,203 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.55 1.08 
server_input_tokens 1536.45 3107.65 server_output_tokens 539.99 1097.50 2025-06-23 22:16:04,203 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 304 | 533 1 | 154 | 276 2025-06-23 22:16:12,309 - vllm - INFO - INFO 06-23 22:16:12 [loggers.py:118] Engine 000: Avg prompt throughput: 4759.1 tokens/s, Avg generation throughput: 715.2 tokens/s, Running: 60 reqs, Waiting: 276 reqs, GPU KV cache usage: 97.5%, Prefix cache hit rate: 39.3% 2025-06-23 22:16:12,309 - __main__ - INFO - vllm running req: 60 queue req: 276 2025-06-23 22:16:14,205 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:16:14,205 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.56 1.10 server_input_tokens 1565.93 3147.35 server_output_tokens 550.10 1109.60 2025-06-23 22:16:14,205 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 319 | 533 1 | 155 | 276 2025-06-23 22:16:22,309 - vllm - INFO - INFO 06-23 22:16:22 [loggers.py:118] Engine 000: Avg prompt throughput: 3168.1 tokens/s, Avg generation throughput: 1088.3 tokens/s, Running: 61 reqs, Waiting: 263 reqs, GPU KV cache usage: 97.6%, Prefix cache hit rate: 40.9% 2025-06-23 22:16:22,309 - __main__ - INFO - vllm running req: 61 queue req: 263 2025-06-23 22:16:24,207 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:16:24,208 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.57 1.10 server_input_tokens 1588.32 3144.31 server_output_tokens 556.64 1108.31 2025-06-23 22:16:24,208 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 331 | 533 1 | 156 | 276 2025-06-23 22:16:26,616 - __main__ - INFO - Reducing anchor text len to 3000 for 
/home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-18 2025-06-23 22:16:26,616 - __main__ - WARNING - ValueError on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-18: - Response exceeded model_max_context, cannot use this response 2025-06-23 22:16:26,943 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/007-gamalielsson.pdf-18 2025-06-23 22:16:32,312 - vllm - INFO - INFO 06-23 22:16:32 [loggers.py:118] Engine 000: Avg prompt throughput: 2709.9 tokens/s, Avg generation throughput: 1197.2 tokens/s, Running: 60 reqs, Waiting: 257 reqs, GPU KV cache usage: 98.6%, Prefix cache hit rate: 40.4% 2025-06-23 22:16:32,312 - __main__ - INFO - vllm running req: 60 queue req: 257 2025-06-23 22:16:34,209 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:16:34,210 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.57 1.08 server_input_tokens 1594.86 3105.04 server_output_tokens 557.18 1084.24 2025-06-23 22:16:34,210 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 337 | 533 1 | 157 | 276 2025-06-23 22:16:42,313 - vllm - INFO - INFO 06-23 22:16:42 [loggers.py:118] Engine 000: Avg prompt throughput: 1616.9 tokens/s, Avg generation throughput: 1380.9 tokens/s, Running: 60 reqs, Waiting: 250 reqs, GPU KV cache usage: 99.8%, Prefix cache hit rate: 40.8% 2025-06-23 22:16:42,313 - __main__ - INFO - vllm running req: 60 queue req: 250 2025-06-23 22:16:44,212 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:16:44,213 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.57 1.07 server_input_tokens 1591.23 3069.89 server_output_tokens 555.73 1069.02 2025-06-23 22:16:44,213 - __main__ - INFO - Worker ID | finished | 
started ----------+----------+-------- 0 | 340 | 533 1 | 159 | 276 2025-06-23 22:16:52,313 - vllm - INFO - INFO 06-23 22:16:52 [loggers.py:118] Engine 000: Avg prompt throughput: 1644.1 tokens/s, Avg generation throughput: 1215.9 tokens/s, Running: 58 reqs, Waiting: 243 reqs, GPU KV cache usage: 98.2%, Prefix cache hit rate: 41.2% 2025-06-23 22:16:52,313 - __main__ - INFO - vllm running req: 58 queue req: 243 2025-06-23 22:16:54,214 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:16:54,214 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.57 1.06 server_input_tokens 1609.73 3033.38 server_output_tokens 564.13 1054.70 2025-06-23 22:16:54,214 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 348 | 533 1 | 162 | 276 2025-06-23 22:17:02,316 - vllm - INFO - INFO 06-23 22:17:02 [loggers.py:118] Engine 000: Avg prompt throughput: 4747.5 tokens/s, Avg generation throughput: 821.8 tokens/s, Running: 62 reqs, Waiting: 228 reqs, GPU KV cache usage: 99.2%, Prefix cache hit rate: 38.0% 2025-06-23 22:17:02,316 - __main__ - INFO - vllm running req: 62 queue req: 228 2025-06-23 22:17:04,216 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:17:04,217 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.58 1.07 server_input_tokens 1630.51 3077.18 server_output_tokens 576.79 1085.11 2025-06-23 22:17:04,217 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 357 | 533 1 | 164 | 276 2025-06-23 22:17:12,316 - vllm - INFO - INFO 06-23 22:17:12 [loggers.py:118] Engine 000: Avg prompt throughput: 2757.1 tokens/s, Avg generation throughput: 1122.0 tokens/s, Running: 61 reqs, Waiting: 218 reqs, GPU KV cache usage: 98.5%, Prefix cache hit rate: 42.2% 2025-06-23 
22:17:12,317 - __main__ - INFO - vllm running req: 61 queue req: 218 2025-06-23 22:17:14,219 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:17:14,219 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.59 1.07 server_input_tokens 1647.83 3080.03 server_output_tokens 581.99 1088.56 2025-06-23 22:17:14,219 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 367 | 533 1 | 166 | 276 2025-06-23 22:17:22,318 - vllm - INFO - INFO 06-23 22:17:22 [loggers.py:118] Engine 000: Avg prompt throughput: 3352.2 tokens/s, Avg generation throughput: 947.6 tokens/s, Running: 64 reqs, Waiting: 204 reqs, GPU KV cache usage: 97.5%, Prefix cache hit rate: 41.3% 2025-06-23 22:17:22,318 - __main__ - INFO - vllm running req: 64 queue req: 204 2025-06-23 22:17:24,221 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:17:24,221 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.59 1.07 server_input_tokens 1663.26 3073.03 server_output_tokens 588.56 1089.67 2025-06-23 22:17:24,221 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 375 | 533 1 | 168 | 276 2025-06-23 22:17:32,321 - vllm - INFO - INFO 06-23 22:17:32 [loggers.py:118] Engine 000: Avg prompt throughput: 3580.5 tokens/s, Avg generation throughput: 1014.6 tokens/s, Running: 69 reqs, Waiting: 189 reqs, GPU KV cache usage: 99.5%, Prefix cache hit rate: 40.6% 2025-06-23 22:17:32,321 - __main__ - INFO - vllm running req: 69 queue req: 189 2025-06-23 22:17:34,222 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:17:34,222 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.60 1.07 
server_input_tokens 1685.07 3072.93 server_output_tokens 594.50 1073.86 2025-06-23 22:17:34,223 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 385 | 533 1 | 170 | 276 2025-06-23 22:17:42,321 - vllm - INFO - INFO 06-23 22:17:42 [loggers.py:118] Engine 000: Avg prompt throughput: 4972.9 tokens/s, Avg generation throughput: 796.0 tokens/s, Running: 69 reqs, Waiting: 173 reqs, GPU KV cache usage: 98.3%, Prefix cache hit rate: 37.8% 2025-06-23 22:17:42,321 - __main__ - INFO - vllm running req: 69 queue req: 173 2025-06-23 22:17:44,224 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:17:44,224 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.61 1.08 server_input_tokens 1706.87 3096.57 server_output_tokens 600.92 1067.50 2025-06-23 22:17:44,224 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 396 | 533 1 | 172 | 276 2025-06-23 22:17:52,321 - vllm - INFO - INFO 06-23 22:17:52 [loggers.py:118] Engine 000: Avg prompt throughput: 3408.6 tokens/s, Avg generation throughput: 1231.2 tokens/s, Running: 60 reqs, Waiting: 166 reqs, GPU KV cache usage: 96.6%, Prefix cache hit rate: 35.1% 2025-06-23 22:17:52,322 - __main__ - INFO - vllm running req: 60 queue req: 166 2025-06-23 22:17:54,225 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:17:54,225 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.62 1.09 server_input_tokens 1731.66 3125.79 server_output_tokens 604.26 1062.50 2025-06-23 22:17:54,226 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 412 | 533 1 | 173 | 276 2025-06-23 22:18:02,324 - vllm - INFO - INFO 06-23 22:18:02 [loggers.py:118] Engine 000: Avg prompt throughput: 2932.3 tokens/s, Avg generation 
throughput: 1077.2 tokens/s, Running: 63 reqs, Waiting: 152 reqs, GPU KV cache usage: 99.4%, Prefix cache hit rate: 34.0% 2025-06-23 22:18:02,324 - __main__ - INFO - vllm running req: 63 queue req: 152 2025-06-23 22:18:04,227 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:18:04,228 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.62 1.10 server_input_tokens 1737.98 3124.56 server_output_tokens 605.55 1061.80 2025-06-23 22:18:04,228 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 421 | 533 1 | 173 | 276 2025-06-23 22:18:12,324 - vllm - INFO - INFO 06-23 22:18:12 [loggers.py:118] Engine 000: Avg prompt throughput: 2062.9 tokens/s, Avg generation throughput: 1291.2 tokens/s, Running: 63 reqs, Waiting: 144 reqs, GPU KV cache usage: 99.7%, Prefix cache hit rate: 33.2% 2025-06-23 22:18:12,324 - __main__ - INFO - vllm running req: 63 queue req: 144 2025-06-23 22:18:14,230 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:18:14,230 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.62 1.08 server_input_tokens 1748.20 3089.42 server_output_tokens 609.36 1053.60 2025-06-23 22:18:14,230 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 427 | 533 1 | 176 | 276 2025-06-23 22:18:22,325 - vllm - INFO - INFO 06-23 22:18:22 [loggers.py:118] Engine 000: Avg prompt throughput: 3189.8 tokens/s, Avg generation throughput: 1003.6 tokens/s, Running: 63 reqs, Waiting: 131 reqs, GPU KV cache usage: 98.9%, Prefix cache hit rate: 31.4% 2025-06-23 22:18:22,325 - __main__ - INFO - vllm running req: 63 queue req: 131 2025-06-23 22:18:24,232 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:18:24,232 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently 
(tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.63 1.08 server_input_tokens 1768.02 3093.29 server_output_tokens 615.63 1050.90 2025-06-23 22:18:24,232 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 442 | 533 1 | 176 | 276 2025-06-23 22:18:32,325 - vllm - INFO - INFO 06-23 22:18:32 [loggers.py:118] Engine 000: Avg prompt throughput: 4033.2 tokens/s, Avg generation throughput: 920.9 tokens/s, Running: 62 reqs, Waiting: 116 reqs, GPU KV cache usage: 97.0%, Prefix cache hit rate: 30.7% 2025-06-23 22:18:32,326 - __main__ - INFO - vllm running req: 62 queue req: 116 2025-06-23 22:18:34,234 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:18:34,234 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.64 1.10 server_input_tokens 1788.03 3117.25 server_output_tokens 621.54 1058.56 2025-06-23 22:18:34,235 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 456 | 533 1 | 176 | 276 2025-06-23 22:18:42,327 - vllm - INFO - INFO 06-23 22:18:42 [loggers.py:118] Engine 000: Avg prompt throughput: 3174.4 tokens/s, Avg generation throughput: 1168.8 tokens/s, Running: 64 reqs, Waiting: 106 reqs, GPU KV cache usage: 99.6%, Prefix cache hit rate: 30.2% 2025-06-23 22:18:42,328 - __main__ - INFO - vllm running req: 64 queue req: 106 2025-06-23 22:18:44,236 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:18:44,237 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.64 1.10 server_input_tokens 1796.36 3113.99 server_output_tokens 625.59 1062.35 2025-06-23 22:18:44,237 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 464 | 533 1 | 177 | 276 2025-06-23 22:18:52,329 
- vllm - INFO - INFO 06-23 22:18:52 [loggers.py:118] Engine 000: Avg prompt throughput: 3094.5 tokens/s, Avg generation throughput: 1075.1 tokens/s, Running: 64 reqs, Waiting: 95 reqs, GPU KV cache usage: 99.5%, Prefix cache hit rate: 35.2% 2025-06-23 22:18:52,329 - __main__ - INFO - vllm running req: 64 queue req: 95 2025-06-23 22:18:54,238 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:18:54,238 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.65 1.09 server_input_tokens 1813.96 3097.64 server_output_tokens 631.77 1062.07 2025-06-23 22:18:54,238 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 476 | 533 1 | 177 | 276 2025-06-23 22:19:02,330 - vllm - INFO - INFO 06-23 22:19:02 [loggers.py:118] Engine 000: Avg prompt throughput: 2686.2 tokens/s, Avg generation throughput: 1085.1 tokens/s, Running: 66 reqs, Waiting: 83 reqs, GPU KV cache usage: 98.4%, Prefix cache hit rate: 36.3% 2025-06-23 22:19:02,330 - __main__ - INFO - vllm running req: 66 queue req: 83 2025-06-23 22:19:04,239 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:19:04,239 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.65 1.09 server_input_tokens 1824.70 3087.65 server_output_tokens 632.50 1051.53 2025-06-23 22:19:04,239 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 485 | 533 1 | 177 | 276 2025-06-23 22:19:12,331 - vllm - INFO - INFO 06-23 22:19:12 [loggers.py:118] Engine 000: Avg prompt throughput: 3121.4 tokens/s, Avg generation throughput: 1113.3 tokens/s, Running: 66 reqs, Waiting: 71 reqs, GPU KV cache usage: 98.3%, Prefix cache hit rate: 36.5% 2025-06-23 22:19:12,332 - __main__ - INFO - vllm running req: 66 queue req: 71 2025-06-23 22:19:14,241 - __main__ 
- INFO - Queue remaining: 0 2025-06-23 22:19:14,241 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.66 1.09 server_input_tokens 1839.87 3109.67 server_output_tokens 636.03 1053.52 2025-06-23 22:19:14,241 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 496 | 533 1 | 177 | 276 2025-06-23 22:19:22,334 - vllm - INFO - INFO 06-23 22:19:22 [loggers.py:118] Engine 000: Avg prompt throughput: 4383.2 tokens/s, Avg generation throughput: 819.6 tokens/s, Running: 67 reqs, Waiting: 58 reqs, GPU KV cache usage: 99.0%, Prefix cache hit rate: 36.0% 2025-06-23 22:19:22,334 - __main__ - INFO - vllm running req: 67 queue req: 58 2025-06-23 22:19:24,243 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:19:24,243 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.66 1.11 server_input_tokens 1864.18 3171.49 server_output_tokens 643.90 1075.60 2025-06-23 22:19:24,244 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 510 | 533 1 | 178 | 276 2025-06-23 22:19:32,335 - vllm - INFO - INFO 06-23 22:19:32 [loggers.py:118] Engine 000: Avg prompt throughput: 2733.1 tokens/s, Avg generation throughput: 1271.4 tokens/s, Running: 67 reqs, Waiting: 48 reqs, GPU KV cache usage: 99.2%, Prefix cache hit rate: 36.8% 2025-06-23 22:19:32,335 - __main__ - INFO - vllm running req: 67 queue req: 48 2025-06-23 22:19:34,245 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:19:34,245 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.67 1.11 server_input_tokens 1870.93 3132.95 server_output_tokens 646.43 1061.19 2025-06-23 22:19:34,245 - __main__ - INFO - 
Worker ID | finished | started ----------+----------+-------- 0 | 517 | 533 1 | 180 | 276 2025-06-23 22:19:38,303 - __main__ - WARNING - JSON decode error on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/025_venturini.pdf-8: Expecting ',' delimiter: line 1 column 2504 (char 2503) 2025-06-23 22:19:38,303 - __main__ - INFO - Reducing anchor text len to 3000 for /home/nws8519/git/adaptation-slr/studies_pdfs/025_venturini.pdf-8 2025-06-23 22:19:38,539 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/025_venturini.pdf-8 2025-06-23 22:19:42,336 - vllm - INFO - INFO 06-23 22:19:42 [loggers.py:118] Engine 000: Avg prompt throughput: 3463.6 tokens/s, Avg generation throughput: 1051.7 tokens/s, Running: 67 reqs, Waiting: 35 reqs, GPU KV cache usage: 99.7%, Prefix cache hit rate: 36.1% 2025-06-23 22:19:42,337 - __main__ - INFO - vllm running req: 67 queue req: 35 2025-06-23 22:19:44,246 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:19:44,246 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.67 1.11 server_input_tokens 1885.46 3133.66 server_output_tokens 650.63 1058.22 2025-06-23 22:19:44,246 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 518 | 533 1 | 192 | 276 2025-06-23 22:19:52,337 - vllm - INFO - INFO 06-23 22:19:52 [loggers.py:118] Engine 000: Avg prompt throughput: 1922.4 tokens/s, Avg generation throughput: 1288.0 tokens/s, Running: 63 reqs, Waiting: 28 reqs, GPU KV cache usage: 97.3%, Prefix cache hit rate: 34.9% 2025-06-23 22:19:52,337 - __main__ - INFO - vllm running req: 63 queue req: 28 2025-06-23 22:19:54,249 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:19:54,249 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- 
completed_pages 0.68 1.12 server_input_tokens 1893.00 3122.97 server_output_tokens 652.22 1050.72 2025-06-23 22:19:54,249 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 519 | 533 1 | 202 | 276 2025-06-23 22:20:02,337 - vllm - INFO - INFO 06-23 22:20:02 [loggers.py:118] Engine 000: Avg prompt throughput: 4802.2 tokens/s, Avg generation throughput: 846.5 tokens/s, Running: 64 reqs, Waiting: 12 reqs, GPU KV cache usage: 99.9%, Prefix cache hit rate: 34.9% 2025-06-23 22:20:02,338 - __main__ - INFO - vllm running req: 64 queue req: 12 2025-06-23 22:20:04,250 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:20:04,250 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.68 1.14 server_input_tokens 1913.56 3177.61 server_output_tokens 659.94 1072.11 2025-06-23 22:20:04,250 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 521 | 533 1 | 216 | 276 2025-06-23 22:20:12,338 - vllm - INFO - INFO 06-23 22:20:12 [loggers.py:118] Engine 000: Avg prompt throughput: 3430.7 tokens/s, Avg generation throughput: 1007.1 tokens/s, Running: 63 reqs, Waiting: 1 reqs, GPU KV cache usage: 98.3%, Prefix cache hit rate: 34.3% 2025-06-23 22:20:12,339 - __main__ - INFO - vllm running req: 63 queue req: 1 2025-06-23 22:20:14,251 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:20:14,252 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.69 1.14 server_input_tokens 1927.76 3180.25 server_output_tokens 664.76 1074.28 2025-06-23 22:20:14,252 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 523 | 533 1 | 225 | 276 2025-06-23 22:20:21,974 - __main__ - INFO - Reducing anchor text len to 3000 for 
/home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-13 2025-06-23 22:20:21,975 - __main__ - WARNING - ValueError on attempt 0 for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-13: - Response exceeded model_max_context, cannot use this response 2025-06-23 22:20:22,274 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf-13 2025-06-23 22:20:22,339 - vllm - INFO - INFO 06-23 22:20:22 [loggers.py:118] Engine 000: Avg prompt throughput: 555.6 tokens/s, Avg generation throughput: 1757.9 tokens/s, Running: 42 reqs, Waiting: 0 reqs, GPU KV cache usage: 70.8%, Prefix cache hit rate: 34.0% 2025-06-23 22:20:22,340 - __main__ - INFO - vllm running req: 42 queue req: 0 2025-06-23 22:20:24,253 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:20:24,253 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.70 1.17 server_input_tokens 1961.53 3284.33 server_output_tokens 676.07 1109.23 2025-06-23 22:20:24,253 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 527 | 533 1 | 241 | 276 2025-06-23 22:20:32,341 - vllm - INFO - INFO 06-23 22:20:32 [loggers.py:118] Engine 000: Avg prompt throughput: 329.3 tokens/s, Avg generation throughput: 1344.3 tokens/s, Running: 22 reqs, Waiting: 0 reqs, GPU KV cache usage: 41.2%, Prefix cache hit rate: 33.9% 2025-06-23 22:20:32,341 - __main__ - INFO - vllm running req: 22 queue req: 0 2025-06-23 22:20:34,254 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:20:34,255 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.71 1.22 server_input_tokens 2004.27 3416.28 server_output_tokens 690.33 1153.40 2025-06-23 22:20:34,255 - __main__ - INFO - Worker ID | finished | started 
----------+----------+-------- 0 | 528 | 533 1 | 261 | 276 2025-06-23 22:20:42,342 - vllm - INFO - INFO 06-23 22:20:42 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 859.4 tokens/s, Running: 9 reqs, Waiting: 0 reqs, GPU KV cache usage: 18.9%, Prefix cache hit rate: 33.9% 2025-06-23 22:20:42,342 - __main__ - INFO - vllm running req: 9 queue req: 0 2025-06-23 22:20:44,256 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:20:44,256 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.72 1.23 server_input_tokens 2020.84 3488.51 server_output_tokens 697.63 1182.37 2025-06-23 22:20:44,256 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 529 | 533 1 | 271 | 276 2025-06-23 22:20:46,223 - __main__ - WARNING - JSON decode error on attempt 1 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) 2025-06-23 22:20:46,223 - __main__ - INFO - Reducing anchor text len to 1500 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:20:46,372 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:20:52,343 - vllm - INFO - INFO 06-23 22:20:52 [loggers.py:118] Engine 000: Avg prompt throughput: 180.7 tokens/s, Avg generation throughput: 434.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 12.0%, Prefix cache hit rate: 33.9% 2025-06-23 22:20:52,343 - __main__ - INFO - vllm running req: 6 queue req: 0 2025-06-23 22:20:54,257 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:20:54,258 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.71 1.19 server_input_tokens 2019.62 
3397.47 server_output_tokens 699.92 1157.77 2025-06-23 22:20:54,258 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 529 | 533 1 | 275 | 276 2025-06-23 22:21:02,344 - vllm - INFO - INFO 06-23 22:21:02 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 333.2 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 11.1%, Prefix cache hit rate: 33.9% 2025-06-23 22:21:02,344 - __main__ - INFO - vllm running req: 5 queue req: 0 2025-06-23 22:21:04,260 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:21:04,260 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.71 1.19 server_input_tokens 2001.86 3397.47 server_output_tokens 693.77 1157.77 2025-06-23 22:21:04,260 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 529 | 533 1 | 275 | 276 2025-06-23 22:21:06,731 - __main__ - INFO - Finished TaskGroup for worker on a33f691ea15b24c747ed3f2369ced021b03cea55 2025-06-23 22:21:06,732 - __main__ - INFO - Got 13 docs for a33f691ea15b24c747ed3f2369ced021b03cea55 2025-06-23 22:21:06,749 - __main__ - INFO - Writing 13 markdown files for a33f691ea15b24c747ed3f2369ced021b03cea55 2025-06-23 22:21:06,758 - __main__ - INFO - Worker 1 exiting due to empty queue 2025-06-23 22:21:11,889 - __main__ - WARNING - JSON decode error on attempt 2 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) 2025-06-23 22:21:11,889 - __main__ - INFO - Reducing anchor text len to 750 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:21:12,037 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:21:12,345 - vllm - INFO - INFO 06-23 22:21:12 [loggers.py:118] Engine 000: Avg prompt 
throughput: 144.2 tokens/s, Avg generation throughput: 261.5 tokens/s, Running: 3 reqs, Waiting: 0 reqs, GPU KV cache usage: 5.9%, Prefix cache hit rate: 33.9% 2025-06-23 22:21:12,345 - __main__ - INFO - vllm running req: 3 queue req: 0 2025-06-23 22:21:14,261 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:21:14,262 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.70 1.11 finished_input_tokens 671.32 2567.49 finished_output_tokens 234.75 897.82 server_input_tokens 1992.10 3210.25 server_output_tokens 694.44 1106.62 2025-06-23 22:21:14,262 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 530 | 533 1 | 276 | 276 2025-06-23 22:21:22,345 - vllm - INFO - INFO 06-23 22:21:22 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 118.7 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.9%, Prefix cache hit rate: 33.9% 2025-06-23 22:21:22,346 - __main__ - INFO - vllm running req: 1 queue req: 0 2025-06-23 22:21:24,265 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:21:24,265 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.70 1.10 finished_input_tokens 665.51 2567.49 finished_output_tokens 232.72 897.82 server_input_tokens 1980.54 3184.50 server_output_tokens 694.17 1112.84 2025-06-23 22:21:24,265 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 532 | 533 1 | 276 | 276 2025-06-23 22:21:31,565 - __main__ - WARNING - JSON decode error on attempt 3 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) 2025-06-23 22:21:31,565 - __main__ - INFO - Reducing anchor text len to 375 for 
/home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:21:31,720 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:21:32,347 - vllm - INFO - INFO 06-23 22:21:32 [loggers.py:118] Engine 000: Avg prompt throughput: 124.8 tokens/s, Avg generation throughput: 70.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.5%, Prefix cache hit rate: 33.9% 2025-06-23 22:21:32,347 - __main__ - INFO - vllm running req: 1 queue req: 0 2025-06-23 22:21:34,267 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:21:34,268 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.69 1.05 finished_input_tokens 659.81 2567.49 finished_output_tokens 230.73 897.82 server_input_tokens 1964.80 3064.29 server_output_tokens 689.40 1079.47 2025-06-23 22:21:34,268 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 532 | 533 1 | 276 | 276 2025-06-23 22:21:42,347 - vllm - INFO - INFO 06-23 22:21:42 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 72.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.8%, Prefix cache hit rate: 33.9% 2025-06-23 22:21:42,348 - __main__ - INFO - vllm running req: 1 queue req: 0 2025-06-23 22:21:44,269 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:21:44,269 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.69 1.05 finished_input_tokens 654.21 2567.49 finished_output_tokens 228.77 897.82 server_input_tokens 1948.11 3064.29 server_output_tokens 683.55 1079.47 2025-06-23 22:21:44,269 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 532 | 533 1 | 276 | 276 2025-06-23 22:21:48,089 - 
__main__ - WARNING - JSON decode error on attempt 4 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) 2025-06-23 22:21:48,089 - __main__ - INFO - Reducing anchor text len to 187 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:21:48,235 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:21:52,348 - vllm - INFO - INFO 06-23 22:21:52 [loggers.py:118] Engine 000: Avg prompt throughput: 112.2 tokens/s, Avg generation throughput: 70.7 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.6%, Prefix cache hit rate: 33.9% 2025-06-23 22:21:52,349 - __main__ - INFO - vllm running req: 1 queue req: 0 2025-06-23 22:21:54,271 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:21:54,272 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.68 1.01 finished_input_tokens 648.70 2567.49 finished_output_tokens 226.84 897.82 server_input_tokens 1932.75 2953.05 server_output_tokens 678.78 1041.06 2025-06-23 22:21:54,272 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 532 | 533 1 | 276 | 276 2025-06-23 22:22:02,349 - vllm - INFO - INFO 06-23 22:22:02 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 72.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.9%, Prefix cache hit rate: 33.9% 2025-06-23 22:22:02,349 - __main__ - INFO - vllm running req: 1 queue req: 0 2025-06-23 22:22:04,273 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:22:04,273 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.67 1.01 finished_input_tokens 643.28 2567.49 
finished_output_tokens 224.95 897.82 server_input_tokens 1916.61 2953.05 server_output_tokens 673.11 1041.06 2025-06-23 22:22:04,273 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 532 | 533 1 | 276 | 276 2025-06-23 22:22:10,051 - __main__ - WARNING - JSON decode error on attempt 5 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) 2025-06-23 22:22:10,052 - __main__ - INFO - Reducing anchor text len to 93 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:22:10,198 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:22:12,351 - vllm - INFO - INFO 06-23 22:22:12 [loggers.py:118] Engine 000: Avg prompt throughput: 106.9 tokens/s, Avg generation throughput: 70.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.5%, Prefix cache hit rate: 33.9% 2025-06-23 22:22:12,351 - __main__ - INFO - vllm running req: 1 queue req: 0 2025-06-23 22:22:14,274 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:22:14,274 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.67 0.94 finished_input_tokens 637.95 2567.49 finished_output_tokens 223.08 897.82 server_input_tokens 1901.66 2717.47 server_output_tokens 668.83 948.51 2025-06-23 22:22:14,274 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 532 | 533 1 | 276 | 276 2025-06-23 22:22:22,351 - vllm - INFO - INFO 06-23 22:22:22 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 72.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.8%, Prefix cache hit rate: 33.9% 2025-06-23 22:22:22,351 - __main__ - INFO - vllm running req: 1 queue req: 0 2025-06-23 22:22:24,276 - __main__ - INFO - Queue 
remaining: 0 2025-06-23 22:22:24,276 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.66 0.94 finished_input_tokens 632.71 2567.49 finished_output_tokens 221.25 897.82 server_input_tokens 1886.04 2717.47 server_output_tokens 663.34 948.51 2025-06-23 22:22:24,277 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 532 | 533 1 | 276 | 276 2025-06-23 22:22:32,353 - vllm - INFO - INFO 06-23 22:22:32 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 71.7 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.1%, Prefix cache hit rate: 33.9% 2025-06-23 22:22:32,353 - __main__ - INFO - vllm running req: 1 queue req: 0 2025-06-23 22:22:32,624 - __main__ - WARNING - JSON decode error on attempt 6 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) 2025-06-23 22:22:32,624 - __main__ - INFO - Reducing anchor text len to 46 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:22:32,779 - __main__ - INFO - Built page query for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 2025-06-23 22:22:34,278 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:22:34,278 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.66 0.85 finished_input_tokens 627.55 2567.49 finished_output_tokens 219.45 897.82 server_input_tokens 1871.54 2475.65 server_output_tokens 659.25 864.54 2025-06-23 22:22:34,278 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 532 | 533 1 | 276 | 276 2025-06-23 22:22:42,354 - vllm - INFO - INFO 06-23 22:22:42 [loggers.py:118] Engine 000: Avg prompt throughput: 106.9 tokens/s, 
Avg generation throughput: 71.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.7%, Prefix cache hit rate: 34.0% 2025-06-23 22:22:42,354 - __main__ - INFO - vllm running req: 1 queue req: 0 2025-06-23 22:22:44,280 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:22:44,280 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.65 0.85 finished_input_tokens 622.48 2567.49 finished_output_tokens 217.67 897.82 server_input_tokens 1856.41 2475.65 server_output_tokens 653.92 864.54 2025-06-23 22:22:44,280 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 532 | 533 1 | 276 | 276 2025-06-23 22:22:52,355 - vllm - INFO - INFO 06-23 22:22:52 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 71.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.0%, Prefix cache hit rate: 34.0% 2025-06-23 22:22:52,355 - __main__ - INFO - vllm running req: 1 queue req: 0 2025-06-23 22:22:54,281 - __main__ - INFO - Queue remaining: 0 2025-06-23 22:22:54,282 - __main__ - INFO - Metric Name Lifetime (tokens/sec) Recently (tokens/sec) ---------------------------------------------------------------------------------- completed_pages 0.65 0.85 finished_input_tokens 617.49 2567.49 finished_output_tokens 215.93 897.82 server_input_tokens 1841.53 2475.65 server_output_tokens 648.67 864.54 2025-06-23 22:22:54,282 - __main__ - INFO - Worker ID | finished | started ----------+----------+-------- 0 | 532 | 533 1 | 276 | 276 2025-06-23 22:22:55,207 - __main__ - WARNING - JSON decode error on attempt 7 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56: Unterminated string starting at: line 1 column 125 (char 124) 2025-06-23 22:22:55,208 - __main__ - INFO - Reducing anchor text len to 23 for /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 
2025-06-23 22:22:55,208 - __main__ - ERROR - Failed to process /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf-56 after 8 attempts.
2025-06-23 22:22:55,229 - __main__ - ERROR - Document /home/nws8519/git/adaptation-slr/studies_pdfs/014-norskov.pdf has 1 fallback pages out of 58 exceeding max_page_error_rate of 0.004, discarding document.
2025-06-23 22:22:55,230 - __main__ - INFO - Finished TaskGroup for worker on 2e60af4aea64f23cc30c38b01b3cf7f0b1c0a024
2025-06-23 22:22:55,230 - __main__ - INFO - Got 20 docs for 2e60af4aea64f23cc30c38b01b3cf7f0b1c0a024
2025-06-23 22:22:55,254 - __main__ - INFO - Writing 20 markdown files for 2e60af4aea64f23cc30c38b01b3cf7f0b1c0a024
2025-06-23 22:22:55,274 - __main__ - INFO - Worker 0 exiting due to empty queue
2025-06-23 22:22:55,274 - __main__ - INFO - ================================================================================
2025-06-23 22:22:55,275 - __main__ - INFO - FINAL METRICS SUMMARY
2025-06-23 22:22:55,275 - __main__ - INFO - ================================================================================
2025-06-23 22:22:55,275 - __main__ - INFO - Total elapsed time: 1248.38 seconds
2025-06-23 22:22:55,275 - __main__ - INFO - Total Server Input tokens: 2,298,171
2025-06-23 22:22:55,275 - __main__ - INFO - Total Server Output tokens: 810,762
2025-06-23 22:22:55,275 - __main__ - INFO - Finished input tokens: 2,143,438
2025-06-23 22:22:55,276 - __main__ - INFO - Finished output tokens: 772,257
2025-06-23 22:22:55,276 - __main__ - INFO - Completed pages: 808
2025-06-23 22:22:55,276 - __main__ - INFO - Failed pages: 1
2025-06-23 22:22:55,276 - __main__ - INFO - Page Failure rate: 0.12%
2025-06-23 22:22:55,276 - __main__ - INFO - Server Input tokens/sec rate: 1840.92
2025-06-23 22:22:55,276 - __main__ - INFO - Server Output tokens/sec rate: 649.45
2025-06-23 22:22:55,276 - __main__ - INFO - Finished Input tokens/sec rate: 1716.97
2025-06-23 22:22:55,276 - __main__ - INFO - Finished Output tokens/sec rate: 618.61
2025-06-23 22:22:55,276 - __main__ - INFO - ================================================================================
2025-06-23 22:22:55,277 - __main__ - INFO - Work done
2025-06-23 22:22:55,277 - __main__ - INFO - Got cancellation request for VLLM server