if(use_sleep){
    //R: spins with Sleep(1) until readback has finished
    cl_event ev;
    clEnqueueMarker(cq,&ev);
    clFlush(cq);
    size_t wait_time=0;
    cl_int ret;
    do{
        SwitchToThread();/*nanosleep(100);*//*Sleep(use_sleep_ex);*/
        wait_time++;
        err=clGetEventInfo(ev,CL_EVENT_COMMAND_EXECUTION_STATUS,sizeof(ret),&ret,NULL);
    }while(ret>CL_COMPLETE);
    cl_ulong start=0,end=0;
    err=clGetEventProfilingInfo(ev,CL_PROFILING_COMMAND_QUEUED,sizeof(cl_ulong),&start,NULL);
    err|=clGetEventProfilingInfo(ev,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&end,NULL);
    OCL_LOG_ERR("clGetEventProfilingInfo");
    //profiling timestamps are in ns, so this is ms per loop iteration
    float cur_quantum=(end-start)/(wait_time*1e6);
    clReleaseEvent(ev);
    if(use_sleep_ex==1 && wait_time>7)SleepQuantumCounter::update(cur_quantum);
    if(verbose==6){
        if(use_sleep_ex==1)fprintf(stderr,"current sleep quantum %2.4gms\t",cur_quantum);
        //wait_time is size_t, so cast it to match the format specifier
        fprintf(stderr,"Sleep before triplet result map: Awaited %lu iterations for completion; elapsed %2.4gms\n",
            (unsigned long)wait_time,(end-start)/1e6);
    }
}
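The quantum arithmetic in the snippet above can be checked in isolation. This is only a minimal sketch, not part of the app: `sleep_quantum_ms` is a hypothetical helper name, and it assumes the two timestamps are in nanoseconds, as OpenCL profiling counters are.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not in the original code): average wall time per
// polling-loop iteration, in milliseconds.
// start_ns / end_ns: nanosecond timestamps, e.g. the QUEUED and END values
// read via clGetEventProfilingInfo. iterations: number of loop passes.
inline double sleep_quantum_ms(std::uint64_t start_ns,
                               std::uint64_t end_ns,
                               std::size_t iterations)
{
    assert(iterations > 0 && end_ns >= start_ns);
    // ns -> ms, then divide by the number of yields/sleeps performed
    const double elapsed_ms = static_cast<double>(end_ns - start_ns) / 1e6;
    return elapsed_ms / static_cast<double>(iterations);
}
```

For example, 8 iterations over 8,000,000 ns of elapsed time give a quantum of 1 ms, which is roughly what the default timer resolution would suggest for Sleep(1) with a 1 ms timer period.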
Now going to the hospital to see my new grandchild. It's just 10 hours old.
I don't think changing the high-precision timer is a good idea for stock development, especially for hosts which are used for things other than crunching.
In the next testing window, please also try with the CPU busy and with the mandatory -use_sleep option. There is one very important difference between your GPU and my C-60 regarding this test: my C-60 is one of the slowest ATi devices, so it gets the low-perf path with sleep enabled by default. Hence I omitted the option in my testing (still going, BTW, on the netbook). All changes between these builds are wrapped in if(use_sleep), so enabling sleep is a requirement.
EDIT: and as usual these days, it seems you need some full-length tasks, not the PG set; the GPU is too fast. For example, all the CPU time you see most probably came from the startup code, not from the icfft loop processing.
No tuning line at all, so fully default. CPU fixation to 1GHz reapplied after reboot. Binaries used for this test are attached so the reader can repeat it on any host equipped with an ATi GPU.
Sleep0:  class SleepQuantum: total=91.016396, N=40, <>=2.2754099, min=0.076670475 max=4.2926121
Sleep1:  class SleepQuantum: total=66.940231, N=62, <>=1.0796812, min=0.80534756 max=8.826087
STT:     class SleepQuantum: total=162.07121, N=33, <>=4.9112489, min=4.226912 max=6.0012178
default: class SleepQuantum: total=46.431198, N=47, <>=0.98789783, min=0.90345198 max=1.0177377
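For anyone reading the numbers above: the `<>` column looks like the plain mean, total/N (e.g. 91.016396/40 ≈ 2.2754 for Sleep0). A rough sketch of an accumulator that would produce such a line is below. `QuantumStats` and its members are hypothetical names of mine; the real SleepQuantumCounter source is not shown in this thread and may well differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <limits>

// Hypothetical stand-in for SleepQuantumCounter (assumption: it keeps a
// running total, sample count, minimum and maximum of the quantum in ms).
struct QuantumStats {
    double total = 0.0;
    std::size_t n = 0;
    double min_ms = std::numeric_limits<double>::max();
    double max_ms = 0.0;

    // Record one per-iteration quantum measurement (milliseconds).
    void update(double quantum_ms) {
        total += quantum_ms;
        ++n;
        min_ms = std::min(min_ms, quantum_ms);
        max_ms = std::max(max_ms, quantum_ms);
    }

    // Mean quantum; corresponds to the "<>" field in the report line.
    double mean() const {
        assert(n > 0);
        return total / static_cast<double>(n);
    }

    // Print one summary line in the same layout as the results above.
    void print(const char* label) const {
        std::fprintf(stderr,
                     "%s class SleepQuantum: total=%.8g, N=%lu, <>=%.8g, min=%.8g max=%.8g\n",
                     label, total, static_cast<unsigned long>(n),
                     mean(), min_ms, max_ms);
    }
};
```

Feeding it two samples of 1.0 ms and 3.0 ms would report total=4, N=2 and a mean of 2, with min and max at the two sample values.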