I have a workstation equipped with two Nvidia cards: a Quadro K6000 and a Tesla K40. I've been noticing that the K6000 processes jobs much more slowly than the K40, even though the two should be comparable in performance (same chip). The K6000 takes around 220 seconds to complete a job, while the K40 finishes them in ~45 seconds. When I check GPU utilization with nvidia-smi, the K6000 never goes above 25% utilization, but the K40 runs at 90-100%.

When I check the workunit output, the only significant difference I notice is in "blocks/chunk" and the number of chunks. (For example, the K40 gets 73 blocks/chunk with "num chunks: 1", while the K6000 shows 5 blocks/chunk with "num chunks: 15".) Is this the source of the problem? And is there a way to speed up the calculations on the K6000? Sample outputs are included below.

BOINC client startup log:

```
Mon 12:44:56 AM PST | | Starting BOINC client version 7.9.3 for x86_64-pc-linux-gnu
Mon 12:44:56 AM PST | | log flags: file_xfer, sched_ops, task
Mon 12:44:56 AM PST | | Data directory: /var/lib/boinc-client
Mon 12:44:56 AM PST | | CUDA: NVIDIA GPU 0: Tesla K40c (driver version 418.87, CUDA version 10.1, compute capability 3.5, 4096MB, 4007MB available, 4291 GFLOPS peak)
Mon 12:44:56 AM PST | | CUDA: NVIDIA GPU 1: Quadro K6000 (driver version 418.87, CUDA version 10.1, compute capability 3.5, 4096MB, 4007MB available, 5193 GFLOPS peak)
Mon 12:44:56 AM PST | | OpenCL: NVIDIA GPU 0: Tesla K40c (driver version 418.87.01, device version OpenCL 1.2 CUDA, 11441MB, 4007MB available, 4291 GFLOPS peak)
Mon 12:44:56 AM PST | | OpenCL: NVIDIA GPU 1: Quadro K6000 (driver version 418.87.01, device version OpenCL 1.2 CUDA, 11434MB, 4007MB available, 5193 GFLOPS peak)
Mon 12:44:57 AM PST | | Processor: 32 GenuineIntel Intel(R) Xeon(R) CPU E5-2698 v3 2.30GHz
```

Workunit output from the K6000:

```
milkyway_separation 1.46 Linux x86_64 double OpenCL
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
Error loading Lua script 'astronomy_parameters.txt': :1: '' expected near '4'
Setting process priority to 0 (13): Permission denied
Estimated Nvidia GPU GFLOP/s: 865 SP GFLOP/s, 108 DP FLOP/s
Device 'Quadro K6000' (NVIDIA Corporation:0x10de) (CL_DEVICE_TYPE_GPU)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Switching to Parameter File 'astronomy_parameters.txt'
Using a block size of 7680 with 5 blocks/chunk
Using clWaitForEvents() for polling with initial wait of 12 ms (mode 0)
Average time per iteration = 31.164673 ms
```

Here is the output from nvidia-smi while running jobs:

```
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr.
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
```
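To put the numbers side by side, here is a quick back-of-the-envelope check (a Python sketch; the utilization figures are approximate, and the idea that every chunk pays the 12 ms initial polling wait is my assumption, not something the logs state):

```python
# Rough figures quoted above; utilization percentages are approximate.
k40_time_s, k6000_time_s = 45.0, 220.0
print(f"slowdown: {k6000_time_s / k40_time_s:.1f}x")  # ~4.9x slower per job

# Total blocks per job are nearly identical on both cards...
k40_blocks = 73 * 1    # 73 blocks/chunk, "num chunks: 1"
k6000_blocks = 5 * 15  # 5 blocks/chunk, "num chunks: 15"
print(f"blocks: K40={k40_blocks}, K6000={k6000_blocks}")  # 73 vs 75

# ...but the K6000 splits the work into 15 launches instead of 1.
# If (an assumption) each chunk incurs the 12 ms initial clWaitForEvents()
# polling wait, the extra launches add idle time on every pass:
extra_wait_ms = (15 - 1) * 12
print(f"extra polling wait per pass: ~{extra_wait_ms} ms")
```

If that assumption holds, the many small chunk launches on the K6000 would leave the GPU idle between kernels, which would be consistent with the low utilization reading, so the chunking difference does look like the prime suspect.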