Seti@Home optimized science apps and information
Optimized Seti@Home apps => Windows => Topic started by: Raistmer on 19 Nov 2007, 10:35:24 am
-
Hello.
Has anybody tried building with the "Flush denormal results to zero" option enabled?
AFAIK it may give some improvement in speed and simulate the situation we will have on GPU chips without denormalized floating-point support. Or will it produce results that fail validation?
-
I have looked into manually thresholding parts of the signals for that sort of thing (NaNs etc.) using special IPP functions for that purpose, but haven't gotten around to it yet. There are supposedly massive speed increases possible without significant precision penalty (NaNs are already NaNs, after all). VTune shows there are significant speed penalties in the FFT/IFFT for processing denormal data [and in the other SSE routines too, for that matter].
If you have an app that produces valid results now, maybe you could try a separate build with "Flush denormal results to zero" on for the optimiser .cpp files only. It may not validate, since the precision penalty may be high for the brute-force 'flush to zero', but even if the results are not valid, the timing data may be valuable.
It would give an indication of how much speedup might be obtainable with more careful use of the more precise thresholding routines instead. [Also, nothing stops us from trying custom thresholding, or fine-tuning any existing data maintenance already in the code (there is a little there, I think).]
Jason
[Afterthought: Joe has mentioned to me in the past that there is only one place, I think in the baseline smoothing, that requires full reversibility, so I am thinking that maybe selective, liberal use of heavy-handed thresholding in all the other places might be possible...]
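To make the "manual thresholding" idea above concrete, here is a minimal portable sketch. It is not the IPP routine Jason mentions and not code from the SETI@home sources; the helper name `flush_subnormals` is made up for illustration. It only zeroes values that the standard library classifies as subnormal and leaves everything else (including NaNs and infinities) alone.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (an assumption for illustration, not SETI@home or IPP
// code): replace every subnormal sample with zero and return how many
// samples were replaced. Normal values, zeros, infinities and NaNs are
// left untouched.
std::size_t flush_subnormals(std::vector<float>& samples) {
    std::size_t flushed = 0;
    for (float& s : samples) {
        if (std::fpclassify(s) == FP_SUBNORMAL) {
            s = 0.0f;
            ++flushed;
        }
    }
    return flushed;
}
```

Counting the flushed samples on a real data buffer would also be a cheap way to see whether a given WU contains enough denormals for any speedup to be expected.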
-
Well, I rebuilt with "Flush denormals to zero" ON for the Optimizer and seti_boinc projects. Results are still strongly similar, but there was no improvement in run time (on testWU-1 at least).
-
I suppose that could imply there is not much effect from underflows with that data. It's been a while since I ran VTune with TestWU-1, so when I build again I'll make sure to grab some fresh profile data and look for SSE and x87 input assists.
I'm glad you found the WU still validates. That may mean we could be more aggressive later if needed.
Jason
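For anyone who wants to try the same experiment without changing the project-wide compiler option, here is a hedged sketch of switching flush-to-zero on at runtime through the SSE control register. The function names are invented for this example; `_MM_SET_FLUSH_ZERO_MODE` comes from `<xmmintrin.h>`. The sketch is limited to x86-64 builds, where float arithmetic is done in SSE registers; a 32-bit build that still uses x87 code would not be affected by this bit.

```cpp
#include <cassert>
#include <cfloat>

#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON
#define HAVE_MXCSR_FTZ 1
#else
#define HAVE_MXCSR_FTZ 0
#endif

// Hypothetical helper: turn on flush-to-zero for SSE arithmetic in the
// calling thread. Returns false when the build target is not x86-64.
// Note: the control register is per thread, so each worker thread would
// have to call this itself.
bool enable_flush_to_zero() {
#if HAVE_MXCSR_FTZ
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    return true;
#else
    return false;
#endif
}

// Multiplies the smallest normal float by 0.5. Without flush-to-zero the
// result is a subnormal number; with it, the result is flushed to zero.
// volatile keeps the compiler from folding the multiplication at build time.
float half_of_smallest_normal() {
    volatile float tiny = FLT_MIN;
    volatile float half = 0.5f;
    volatile float result = tiny * half;
    return result;
}
```

Calling `enable_flush_to_zero()` once at the start of the worker thread and timing the run before and after would give the same kind of comparison as the rebuilt app.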
-
An app built with the "flush denormals to zero" option enabled throws an exception on restart... so not a viable option :/
ERROR: Invalid parameter detected in function (null). File: (null) Line: 0
ERROR: Expression: (null)
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7D61002D
Engaging BOINC Windows Runtime Debugger...
********************
BOINC Windows Runtime Debugger Version 5.10.20
Dump Timestamp : 11/23/07 12:01:48
LoadLibraryA( srcsrv.dll ): GetLastError = 126
Debugger Engine : 4.0.5.0
Symbol Search Path: D:\BTR\SETI\seti_boinc\client\win_build\Release32-NoGFX-xW;D:\BTR\SETI\seti_boinc\client\win_build\Release32-NoGFX-xW;srv*G:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\2\symbols*http://msdl.microsoft.com/download/symbols
ModLoad: 00400000 001ac000 KWSN_R_2.4_SSE2_x86_flush_to_zero.exe (2.3.0.7) (PDB Symbols Loaded)
File Version : 2, 3, 0, 7
Company Name : Lunatics.at
Product Name : SETI@Home Enhanced Worker
Product Version: 5, 0, 1, 5
ModLoad: 7d600000 000f0000 ntdll.dll (5.2.3790.1830) (-exported- Symbols Loaded)
File Version : 5.2.3790.1830 (srv03_sp1_rtm.050324-1447)
Company Name : Microsoft Corporation
Product Name : MicrosoftR WindowsR Operating System
Product Version: 5.2.3790.1830
ModLoad: 7d4c0000 00130000 kernel32.dll (5.2.3790.2919) (-exported- Symbols Loaded)
File Version : 5.2.3790.2919 (srv03_sp1_gdr.070417-2346)
Company Name : Microsoft Corporation
Product Name : MicrosoftR WindowsR Operating System
Product Version: 5.2.3790.2919
ModLoad: 10000000 00017000 ippcore-5.3.dll (5.3.85.461) (-exported- Symbols Loaded)
File Version : 5,3,85,461
Company Name : Intel Corporation.
Product Name : core. IntelR Integrated Performance Primitives. Core Library.
Product Version: 5.3 build 85.13
ModLoad: 005b0000 0004e000 libguide40.dll (4.0.2007.602) (-exported- Symbols Loaded)
File Version : 20070602
Company Name : Intel Corporation
Product Name : Intel(R) OMP Runtime Library
Product Version: 4.0
ModLoad: 00600000 0003d000 ipps-5.3.dll (5.3.85.498) (-exported- Symbols Loaded)
File Version : 5,3,85,498
Company Name : Intel Corporation.
Product Name : ippSP. IntelR Integrated Performance Primitives. Signal Processing.
Product Version: 5.3 build 85.13
ModLoad: 7d930000 000d0000 USER32.dll (5.2.3790.2892) (-exported- Symbols Loaded)
File Version : 5.2.3790.2892 (srv03_sp1_gdr.070301-0030)
Company Name : Microsoft Corporation
Product Name : MicrosoftR WindowsR Operating System
Product Version: 5.2.3790.2892
ModLoad: 7d800000 00090000 GDI32.dll (5.2.3790.2960) (-exported- Symbols Loaded)
File Version : 5.2.3790.2960 (srv03_sp1_gdr.070620-2335)
Company Name : Microsoft Corporation
Product Name : MicrosoftR WindowsR Operating System
Product Version: 5.2.3790.2960
ModLoad: 77f50000 0009c000 ADVAPI32.dll (5.2.3790.1830) (-exported- Symbols Loaded)
File Version : 5.2.3790.1830 (srv03_sp1_rtm.050324-1447)
Company Name : Microsoft Corporation
Product Name : MicrosoftR WindowsR Operating System
Product Version: 5.2.3790.1830
ModLoad: 7da20000 000e0000 RPCRT4.dll (5.2.3790.1830) (-exported- Symbols Loaded)
File Version : 5.2.3790.1830 (srv03_sp1_rtm.050324-1447)
Company Name : Microsoft Corporation
Product Name : MicrosoftR WindowsR Operating System
Product Version: 5.2.3790.1830
ModLoad: 71c20000 00012000 tsappcmp.dll (5.2.3790.1830) (-exported- Symbols Loaded)
File Version : 5.2.3790.1830 (srv03_sp1_rtm.050324-1447)
Company Name : Microsoft Corporation
Product Name : MicrosoftR WindowsR Operating System
Product Version: 5.2.3790.1830
ModLoad: 77ba0000 0005a000 msvcrt.dll (7.0.3790.1830) (-exported- Symbols Loaded)
File Version : 7.0.3790.1830 (srv03_sp1_rtm.050324-1447)
Company Name : Microsoft Corporation
Product Name : MicrosoftR WindowsR Operating System
Product Version: 7.0.3790.1830
ModLoad: 7dee0000 00060000 IMM32.DLL (5.2.3790.1830) (-exported- Symbols Loaded)
File Version : 5.2.3790.1830 (srv03_sp1_rtm.050324-1447)
Company Name : Microsoft Corporation
Product Name : MicrosoftR WindowsR Operating System
Product Version: 5.2.3790.1830
ModLoad: 01dc0000 002da000 ippst7-5.3.dll (5.3.85.498) (-exported- Symbols Loaded)
File Version : 5,3,85,498
Company Name : Intel Corporation.
Product Name : ippSP. IntelR Integrated Performance Primitives. Signal Processing.
Product Version: 5.3 build 85.13
ModLoad: 6d580000 000a8000 dbghelp.dll (5.2.3790.1830) (-exported- Symbols Loaded)
File Version : 5.2.3790.1830 (srv03_sp1_rtm.050324-1447)
Company Name : Microsoft Corporation
Product Name : MicrosoftR WindowsR Operating System
Product Version: 5.2.3790.1830
ModLoad: 77b90000 00008000 VERSION.dll (5.2.3790.1830) (-exported- Symbols Loaded)
File Version : 5.2.3790.1830 (srv03_sp1_rtm.050324-1447)
Company Name : Microsoft Corporation
Product Name : MicrosoftR WindowsR Operating System
Product Version: 5.2.3790.1830
ModLoad: 03770000 00082000 symsrv.dll (6.6.3.5) (-exported- Symbols Loaded)
File Version : 6.6.0003.5 (vbl_core_fbrel(DrewB).051021-1446)
Company Name : Microsoft Corporation
Product Name : Debugging Tools for Windows(R)
Product Version: 6.6.0003.5
*** Dump of the Process Statistics: ***
- I/O Operations Counters -
Read: 104, Write: 0, Other 178
- I/O Transfers Counters -
Read: 0, Write: 106, Other 0
- Paged Pool Usage -
QuotaPagedPoolUsage: 82064, QuotaPeakPagedPoolUsage: 82064
QuotaNonPagedPoolUsage: 5328, QuotaPeakNonPagedPoolUsage: 5328
- Virtual Memory Usage -
VirtualSize: 66924544, PeakVirtualSize: 66924544
- Pagefile Usage -
PagefileUsage: 20967424, PeakPagefileUsage: 21651456
- Working Set Size -
WorkingSetSize: 22929408, PeakWorkingSetSize: 22929408, PageFaultCount: 5869
*** Dump of the Worker thread (904): ***
- Information -
Status: Ready, Base Priority: Above Normal, Priority: Above Normal, Kernel Time: 312500.000000, User Time: 17187500.000000, Wait Time: 344692.000000
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7D61002D
- Registers -
eax=0000001a ebx=0012f2dc ecx=004b0058 edx=0012e704 esi=0053ec98 edi=0012f16c
eip=7d61002d esp=0012f04c ebp=0012f2b8
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000206
- Callstack -
ChildEBP RetAddr Args to Child
0012f2b8 004a3519 00000002 0012fd9c 00000002 0053ecb8 ntdll!DbgBreakPoint+0x0
0012f978 004a2d26 00000000 00000000 00000000 00000000 KWSN_R_2.4_SSE2_x86_flush_to_ze!read_wu_state+0xa (..\worker.cpp:155)
0012fbb8 0041419a 7d61c8f9 7d4e3397 00000084 00000003 KWSN_R_2.4_SSE2_x86_flush_to_ze!worker+0x0 (..\worker.cpp:243)
0012fd84 00413e70 00000001 0012fd9c 00222450 0012fd9c KWSN_R_2.4_SSE2_x86_flush_to_ze!main+0x0 (..\main.cpp:289)
0012ff28 004b3298 00400000 00000000 002224b3 00000001 KWSN_R_2.4_SSE2_x86_flush_to_ze!WinMain+0xb (..\main.cpp:310) FPO: [4,100,0]
0012ffc0 7d4e992a 00000000 00000000 7efde000 00000000 KWSN_R_2.4_SSE2_x86_flush_to_ze!__tmainCRTStartup+0x1c (f:\rtm\vctools\crt_bld\self_x86\crt\src\crt0.c:315)
0012fff0 00000000 004b3301 00000000 00000000 00000000 kernel32!BaseProcessInitPostImport+0x0 (f:\rtm\vctools\crt_bld\self_x86\crt\src\crt0.c:315)
*** Dump of the Timer thread (560): ***
- Information -
Status: Waiting, Wait Reason: ExecutionDelay, Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 344662.000000
- Registers -
eax=00000000 ebx=020b50d8 ecx=00000000 edx=0000001c esi=00000000 edi=022fff38
eip=7d61cca4 esp=022ffefc ebp=022fff60
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
- Callstack -
ChildEBP RetAddr Args to Child
022fff60 7d4d14ef 000003e8 00000000 022fffb0 004c9252 ntdll!ZwDelayExecution+0x0
022fff70 004c9252 000003e8 005050bb 00000000 53a217f4 kernel32!Sleep+0x0
022fffb0 00505160 00000000 7d4dfff1 020b50d8 00000000 KWSN_R_2.4_SSE2_x86_flush_to_ze!ippsFFTFwd_CToC_32fc+0x0
022fffb8 7d4dfff1 020b50d8 00000000 00000000 020b50d8 KWSN_R_2.4_SSE2_x86_flush_to_ze!_threadstartex+0x5 (f:\rtm\vctools\crt_bld\self_x86\crt\src\threadex.c:326)
022fffec 00000000 005050e1 020b50d8 00000000 00000000 kernel32!FlsSetValue+0x0 (f:\rtm\vctools\crt_bld\self_x86\crt\src\threadex.c:326)
*** Debug Message Dump ****
*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0
Exiting...
-
Oh well, it was worth a try :D
That might be telling us something. Maybe there is some denormal data somewhere that needs to stay that way? Maybe some random data in the benchmarks. Good to know this stuff :D
Jason
-
It seems it's not "flush to zero"... it's something else... maybe the commented-out constructor. A rebuild without the option failed to restart too.
Probably my changes for the VS 2005 build broke checkpointing :/
@Jason
You made analogous changes in the code. Does your build handle restarts OK?
-
Yes, it built okay, though it has been a while since I rebuilt it. I'd first be wondering which BOINC API version you built against.
I think I know the commented-out constructor you mean. I felt uneasy about that too, so I restored it but commented out the '0' parameter instead, allowing use of the default constructor. Not sure if it made any difference.
Jason
-
Well, does the BOINC API do anything with restoring app state from a checkpoint? I thought that was too app-specific to rely on the BOINC API.
In the SETI case it's state.sah that holds the checkpoint info, I believe. The original 2.4 app version restores fine from a checkpoint created by my build, but my own build throws an exception.
So something is wrong with initialization. I tried your method and commented out only the zero. It builds fine but gives the exception too.
"
SETI_WU_INFO::SETI_WU_INFO( void ) :
track_mem <SETI_WU_INFO> ( "SETI_WU_INFO" ),
data_class( 0 ),
start_ra( 0 ),
start_dec( 0 ),
end_ra( 0 ),
end_dec( 0 ),
true_angle_range( 0 ),
time_recorded( 0 ),
subband_center( 0 ),
subband_base( 0 ),
subband_sample_rate( 0 ),
fft_len( 0 ),
ifft_len( 0 ),
subband_number( 0 ),
nsamples( 0 ),
bits_per_sample( 0 ),
position_history(/* 0*/ ),
num_positions( 0 ),
.....
"
(The current SVN version of this file shows " position_history(), " so it should be OK now, but....)
The call stack shows some trouble in this statement
best_triplet->pot_min = pot_min;
in function parse_state_file.
Please try whether your app will continue from this checkpoint (I attached the needed files).
[attachment deleted by admin]
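A small sketch of why the two spellings of the initializer may not matter. The thread does not show the type of `position_history`, so this assumes it is a standard container such as `std::vector`; the struct names and the `Position` type are hypothetical stand-ins, not the real SETI_WU_INFO. Under that assumption both forms leave the member empty, which would be consistent with the crash appearing either way.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in element type; the real one is not shown in the thread.
struct Position {
    double ra;
    double dec;
};

// Variant with the explicit 0: constructs a vector holding zero elements.
struct WuInfoWithZero {
    std::vector<Position> position_history;
    int num_positions;
    WuInfoWithZero() : position_history(0), num_positions(0) {}
};

// Variant with the 0 commented out: value-initializes the member, which for
// std::vector calls the default constructor and also gives an empty vector.
struct WuInfoDefault {
    std::vector<Position> position_history;
    int num_positions;
    WuInfoDefault() : position_history(/* 0 */), num_positions(0) {}
};
```

If the real member is some other class, the two forms could call different constructors, so this only narrows the search under the stated assumption.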
-
Deleted result file first:
Ran with -bench first to check --- > no probs there
It has now been running [not -bench] for a couple of minutes, no probs yet, watching it.
Are you using knabench or maybe your own batch file? If your own, then you might not have a valid init_data.xml in the folder?
Jason
[Hope it isn't a long one!... It's election day here in Oz and so I have to go and get some beer to be well lubricated for the trip to the polling booth :D ]
.. 10 mins so far, no crash [slow machine or long workunit .. or both :D]
Note ... after it finishes I will check that it does indeed start off from your checkpoint.
-
Stopped mine, gotta go soon,
Your checkpoint : Restarted at 65.72 percent.
going now 2 mins
-
No, you only need to run it for 1-2 min, just to see if it restores from the checkpoint. No need to complete the task :)
And what happens if result.sah is not deleted? In a normal run under BOINC the result file stays...
I run the app just by starting the exe file from the same directory that contains the data files. The minimal run environment for SETI is the exe file plus these 3 files; init_data.xml will change some app defaults but is not required (as my previous experiments show).
-
Ok, thank you!
So the problem is in reading/applying the state from the saved file; the files themselves are OK...
-
Oh, for your checkpoint run I kept your state.sah & result.sah when starting (I had saved them in a folder).
Hmm, I always had problems if there was no init_data.xml, so I've always used one of the ones from the test packages.
-
When the task is completed, app_info.xml will be generated with some default values (mostly zero in all fields :) ) and the CPU time of the completed task.
-
<wu_cpu_time>45502.234375</wu_cpu_time>
I guess that this includes your ~65% time? Seems like a rather long workunit ... or both our computers / apps are really slow :D [I think it spent a shade under 3 hours for me to finish that ~35% :o ... but my weekly virus scan did start, so my part might represent a smaller proportion of that ~46k seconds]
-
:)
Not sure about CPU time being kept along with the intermediate results. It seems a flops counter is kept instead, plus the % of work done. I didn't notice a CPU time field in state.sah.
-
CPU time is kept in the init_data.xml file. When working with BOINC that's in the slot directory so you get a fresh copy when starting a new WU. When the app shuts down it adds its current internal elapsed time to whatever was there before. So stopping a standalone run and deleting state.sah to restart at the beginning just keeps adding to the <wu_cpu_time> in init_data.xml.
The internal elapsed time is only updated when checkpointing, so using <wu_cpu_time> as an indicator of speed involves very poor granularity. It's not the same CPU time which is reported to BOINC.
Joe
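For standalone testing it can be handy to pull `<wu_cpu_time>` out of init_data.xml programmatically. The following is only a minimal sketch with a made-up helper name, `read_tag_value`; it is a plain string search, not the BOINC XML parser, and it assumes the tag appears once with a plain numeric body, as in the `<wu_cpu_time>45502.234375</wu_cpu_time>` line quoted earlier in the thread.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of BOINC or SETI@home): find the first
// <tag>...</tag> pair in xml and parse its body as a double. Returns true
// and stores the number in value on success; returns false and leaves
// value unchanged if the tag is missing or its body is not numeric.
bool read_tag_value(const std::string& xml, const std::string& tag,
                    double& value) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::string::size_type start = xml.find(open);
    if (start == std::string::npos) return false;
    const std::string::size_type begin = start + open.size();
    const std::string::size_type end = xml.find(close, begin);
    if (end == std::string::npos) return false;
    const std::string text = xml.substr(begin, end - begin);
    char* parse_end = nullptr;
    const double parsed = std::strtod(text.c_str(), &parse_end);
    if (parse_end == text.c_str()) return false;  // nothing was parsed
    value = parsed;
    return true;
}
```

Because the app keeps adding to the stored value across restarts, reading it before and after a run and taking the difference is safer than trusting the absolute number.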
-
It's not the same CPU time which is reported to BOINC.
But in conjunction with the % done field of state.sah? Are they written "atomically" at checkpointing? I mean, if for example state.sah says 50% done in both cases and init_data.xml shows different times, is that the difference BOINC would show at the 50% done moment for those tasks?
-
If you have two separate runs stopped when the <prog> field in state.sah indicates 50% the <wu_cpu_time> in init_data.xml does reflect the time associated with the checkpoint. If the <prog> values are quite close, doing a speed comparison that way should be reliable.
However, the progress calculation is not extremely linear. It's based on approximations of the relative time to do FFTs, chirping, and the various kinds of signal searches. Those approximations were last adjusted before setiathome_enhanced was released to the main project and even then it only produced roughly linear progress. With optimizations applied, a single set of approximations cannot be accurate. Theoretically an adjustment could be applied based on the relative timings of the "standard" routines and the chosen optimized routines, but it would take a fair amount of programming to implement and much testing time to get right.
Because all my hosts are single CPU, my preference has been to adjust the chirp limits in a WU to give a useful full run time, do the standalone testing with very little else running, and use the elapsed wall time in speed comparison. The extremely shortened testWU-1 and similar are definitely a compromise between accuracy and how much time is reasonable to ask of volunteer testers. I'd prefer test WUs with the chirp limits scaled down by a factor of 5 at most; TestWU-1 was scaled by 20 relative to the old 20 and 50 limits, and that's about 30 or 40 relative to MB WUs.
For multi-CPU systems I think what would work best is using full run test WUs, but divide the final <wu_cpu_time> from init_data.xml by the <prog> value in the final state.sah to adjust for the partial checkpoint interval at the end.
Joe
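The adjustment described in the last paragraph is a single division, sketched below. The function names are invented for illustration, and the code assumes that `<prog>` in state.sah is stored as a fraction between 0 and 1 rather than a percentage; if it is a percentage, divide it by 100 first.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: scale the checkpointed CPU time up to a full run.
// wu_cpu_time is the seconds value from init_data.xml; prog is the progress
// value from the final state.sah, assumed here to be a fraction in (0, 1].
// Returns a negative number if prog is outside that range.
double estimate_full_run_seconds(double wu_cpu_time, double prog) {
    if (!(prog > 0.0) || prog > 1.0) return -1.0;
    return wu_cpu_time / prog;
}

// Hypothetical helper: how many times faster the candidate build is than
// the baseline build, given both adjusted run times in seconds.
double speed_ratio(double baseline_seconds, double candidate_seconds) {
    return baseline_seconds / candidate_seconds;
}
```

For example, a run whose last checkpoint was written at a progress of 0.98 with 980 seconds recorded would be estimated at about 1000 seconds for the whole WU. As noted above, progress is only roughly linear, so the estimate is most trustworthy when the final progress value is close to 1.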