Improving Execution Efficiency of Just-in-time Compilation based Query Processing on GPUs
On-GPU Thread-Data Remapping for Branch Divergence Reduction
Parallelizing and Optimizing Programs for GPU Acceleration using CUDA
Characterization and transformation of unstructured control flow in bulk synchronous GPU applications - Haicheng Wu, Gregory Diamos, Jin Wang, Si Li, Sudhakar Yalamanchili, 2012