Manipal Journal of Science and Technology


This paper describes a strategy for parallelizing the multi-plane tomosynthetic image reconstruction algorithm used in common x-ray fluoroscopy systems, such as the Scanning Beam Digital X-Ray (SBDX) system, and for implementing it in real time on high-performance computing platforms such as general-purpose Graphics Processing Units (GPUs). The authors contrast two parallelization schemes, the detector-centric and the pixel-centric approaches, under the specific assumptions of a regular detector grid and separability of the x and y dimensions. In both schemes, look-up tables (LUTs) reduce run-time computation but require effective memory management strategies. An optimal GPU implementation of either scheme must also maintain a high level of achieved occupancy by setting an appropriate thread-block configuration. The paper reports the results of implementing the two parallelization schemes and the associated optimizations on NVIDIA GPUs and demonstrates 15 fps performance using a single GeForce GTX 690 card. The paper concludes that the pixel-centric approach has better arithmetic intensity and superior scalability, which make it ideal for use in multi-GPU systems.
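The difference between the two schemes can be illustrated with a toy, CPU-only sketch in Python. It is not the authors' implementation and does not model the SBDX geometry: the linear detector-to-pixel mapping, the function names (`build_lut`, `invert_lut`, `detector_centric`, `pixel_centric`) and the parameters `scale` and `shift` are all hypothetical. The sketch assumes only what the abstract states, namely a regular detector grid and separable x and y dimensions, so that one 1-D look-up table per axis is enough. In the detector-centric form each detector element scatters its value into a pixel; in the pixel-centric form each pixel gathers the detector elements that map to it.

```python
import math


def build_lut(n_det, n_pix, scale, shift):
    """1-D LUT: detector index -> pixel index along one axis (-1 if off-plane).

    The linear mapping is a hypothetical stand-in for the real geometry.
    """
    lut = []
    for d in range(n_det):
        p = math.floor(d * scale + shift)
        lut.append(p if 0 <= p < n_pix else -1)
    return lut


def invert_lut(lut, n_pix):
    """Inverse LUT: for each pixel index, the detector indices that map to it."""
    inv = [[] for _ in range(n_pix)]
    for d, p in enumerate(lut):
        if p >= 0:
            inv[p].append(d)
    return inv


def detector_centric(det, lut_y, lut_x, n_pix_y, n_pix_x):
    """Scatter: one unit of work per detector element.

    On a GPU, several detector elements may target the same pixel, so the
    accumulation below would need atomic adds or a reduction step.
    """
    img = [[0] * n_pix_x for _ in range(n_pix_y)]
    for dy, row in enumerate(det):
        py = lut_y[dy]
        if py < 0:
            continue
        for dx, value in enumerate(row):
            px = lut_x[dx]
            if px >= 0:
                img[py][px] += value
    return img


def pixel_centric(det, inv_y, inv_x):
    """Gather: one unit of work per output pixel, with no write conflicts.

    Separability means the contributors to pixel (py, px) are the Cartesian
    product of inv_y[py] and inv_x[px].
    """
    img = []
    for dys in inv_y:
        out_row = []
        for dxs in inv_x:
            out_row.append(sum(det[dy][dx] for dy in dys for dx in dxs))
        img.append(out_row)
    return img
```

Both functions produce the same image for the same LUTs; what changes is which index space the work is divided over. The gather form writes each output pixel exactly once, which is one plausible reason a pixel-centric scheme partitions cleanly across several devices, although the sketch itself says nothing about GPU performance.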

Included in

Engineering Commons