BTW, if you feel current VLAR processing times are too long for the production app on main, you could continue to use my V5b, which will abort VLARs and release the task to be processed by another host, possibly one with a CPU opt app (which will process it much faster and more efficiently). This is not perfect, but it is one way of load balancing between CUDA and CPU in current conditions.
(I need some time to catch up on the threads now, so sorry if this was proposed already.)