Acceleration of seismic modeling of the near subsurface of the earth through heterogeneous architectures
Like atmospheric modeling, seismic modeling is a domain that has evolved alongside supercomputing technologies for many years. As shown in Fig. 1, my work in this area is divided into two parts: (1) seismic modeling and imaging for oil and gas exploration, shown on the left half; and (2) seismic modeling for natural earthquakes, shown on the right half.
Seismic modeling and imaging for oil and gas exploration
This line of research, seismic modeling for exploration, originates from my postdoctoral period at Stanford, where I worked in the Center for Computational Earth and Environmental Sciences (CEES). Looking back, the two years of postdoctoral work formed the starting point of my research connecting geoscience and HPC. Since then, I have explored various algorithms, architectures, and tuning techniques that enable more accurate and faster seismic modeling. Over the years, the algorithms have evolved from linear acoustic modeling to nonlinear elastic modeling, and then to elastic models with Q attenuation.
Our first project investigated the potential of FPGAs for performing acoustic reverse time migration. We proposed a dataflow-oriented solution for the Reverse Time Migration (RTM) algorithm, which is one of the most computationally demanding imaging algorithms in oil and gas exploration. Through a full set of algorithmic and architectural optimizations, we achieved a balanced utilization of the different resources in the system (computational units, buffering units, and memory bandwidth), so that none of them became a performance bottleneck. Using our design, a single FPGA chip provides performance equivalent to 72 Intel CPU cores, with 10 times better power efficiency [FPGA11, IEEE Micro14].
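To make the structure of RTM concrete, the following is a minimal 1D sketch in plain Python, not the FPGA dataflow design described above. It assumes a second-order finite-difference acoustic scheme, fixed boundaries, a single source and a single receiver, and a zero-lag cross-correlation imaging condition. All names (`step`, `propagate`, `rtm_image`, `ricker`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def ricker(t, f0):
    """Ricker wavelet with peak frequency f0, delayed by one period."""
    a = (math.pi * f0 * (t - 1.0 / f0)) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)

def step(prev, curr, vel, dt, dx):
    """Advance a 1D acoustic wavefield by one time step (second-order leapfrog)."""
    n = len(curr)
    nxt = [0.0] * n  # end points stay zero (simple fixed boundaries)
    for i in range(1, n - 1):
        lap = (curr[i - 1] - 2.0 * curr[i] + curr[i + 1]) / (dx * dx)
        nxt[i] = 2.0 * curr[i] - prev[i] + (vel[i] * dt) ** 2 * lap
    return nxt

def propagate(vel, inject_idx, signal, dt, dx):
    """Inject signal[t] at inject_idx while time stepping; return every snapshot."""
    n = len(vel)
    prev, curr = [0.0] * n, [0.0] * n
    snapshots = []
    for amp in signal:
        nxt = step(prev, curr, vel, dt, dx)
        nxt[inject_idx] += amp
        snapshots.append(nxt)
        prev, curr = curr, nxt
    return snapshots

def rtm_image(vel, src_idx, rec_idx, wavelet, recorded, dt, dx):
    """Cross-correlate the forward source wavefield with the back-propagated data."""
    nt = len(wavelet)
    src = propagate(vel, src_idx, wavelet, dt, dx)
    # Feeding the recorded trace in reverse order propagates it backward in time.
    rec = propagate(vel, rec_idx, recorded[::-1], dt, dx)
    image = [0.0] * len(vel)
    for t in range(nt):
        s, r = src[t], rec[nt - 1 - t]  # align both wavefields at the same physical time
        for i in range(len(vel)):
            image[i] += s[i] * r[i]
    return image

# Illustrative setup (all values are assumptions): a velocity jump at grid index 60.
nx, nt, dx, dt = 101, 400, 10.0, 0.002
true_vel = [2000.0 if i < 60 else 3000.0 for i in range(nx)]
smooth_vel = [2000.0] * nx  # migration velocity without the reflector
wavelet = [ricker(t * dt, 15.0) for t in range(nt)]
recorded = [snap[6] for snap in propagate(true_vel, 5, wavelet, dt, dx)]
image = rtm_image(smooth_vel, 5, 6, wavelet, recorded, dt, dx)
```

Even in this toy form, the imaging loop needs the source wavefield at every time step, and each grid update performs only a few arithmetic operations per value read. This gives a rough sense of why compute, buffering, and memory bandwidth have to be balanced in a real 3D implementation.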
In collaboration with Dr. Guojie Song from the Department of Mathematics, we started to explore the potential of elastic forward modeling on GPUs and MICs [IPDPSW13, IJHPCA14].
With these research results, our group gradually became well known in the oil and gas industry for its HPC expertise, especially on emerging heterogeneous accelerators. As a result, in the following years we started a number of collaborative projects with major oil and gas companies.
Sponsored by Statoil (the national oil and gas company of Norway), we designed a highly efficient GPU-based beam migration. By parallelizing both the ray tracing and the beam mapping kernels across millions of GPU threads and using an asynchronous I/O scheme, we derived a parallel beam migration design that fits current CPU-GPU hybrid clusters, with a 2 to 6 times speedup over a parallel 16-core CPU design [SEG15a].
Sponsored by BGP (Bureau of Geophysical Prospecting) of CNPC (China National Petroleum Corporation), we explored the potential performance of the innovative Explicit Time Evolution (ETE) method on GPUs. We presented a set of new optimization strategies for ETE stencils tailored to the memory hierarchy of NVIDIA GPUs. By combining these strategies with existing spatial and temporal stencil blocking schemes on a state-of-the-art GPU architecture, we achieved 9.6x and 9.9x speedups over a well-tuned 12-core CPU version for the 37-point and 73-point ETE stencils, respectively. Our designs lead to an ETE method that is 31.2x faster than the conventional CPU-FD method, making it a practical seismic imaging technology [SEG15b, ICPADS15]. The work was also extended to the MIC platform, where it significantly improves the performance of stencil operations in a real seismic imaging application and introduces a new option for writing highly efficient memory-bound stencil-like loops [HiPC16].
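As a rough illustration of what a stencil with spatially variable coefficients looks like, the sketch below applies a 3D star-shaped stencil in plain Python. The star shape is an assumption: the text does not state the geometry of the ETE stencils, although a star of radius r touches 6r + 1 points, which matches 37 and 73 points for radii 6 and 12. The function names and the coefficient layout `coeff[k][z][y][x]` are hypothetical, and the actual ETE coefficients are not reproduced here.

```python
def star_offsets(radius):
    """Offsets (dz, dy, dx) of a 3D star stencil: the center plus `radius` points per axis direction."""
    offsets = [(0, 0, 0)]
    for r in range(1, radius + 1):
        offsets += [(r, 0, 0), (-r, 0, 0), (0, r, 0), (0, -r, 0), (0, 0, r), (0, 0, -r)]
    return offsets

def apply_star_stencil(u, coeff, radius):
    """Apply a star stencil whose weights vary per grid point.

    u is a nested list indexed [z][y][x]; coeff[k][z][y][x] is the weight of
    offset k at output point (z, y, x). Points closer than `radius` to the
    boundary are left at zero.
    """
    nz, ny, nx = len(u), len(u[0]), len(u[0][0])
    offsets = star_offsets(radius)
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(radius, nz - radius):
        for y in range(radius, ny - radius):
            for x in range(radius, nx - radius):
                acc = 0.0
                for k, (dz, dy, dx) in enumerate(offsets):
                    # One coefficient load and one field load per multiply-add.
                    acc += coeff[k][z][y][x] * u[z + dz][y + dy][x + dx]
                out[z][y][x] = acc
    return out

# Point counts for the two radii that would match the stencil sizes in the text.
points_r6 = len(star_offsets(6))
points_r12 = len(star_offsets(12))

# Tiny check on a 3x3x3 grid with Laplacian-like weights (illustrative values only).
n = 3
u = [[[1.0] * n for _ in range(n)] for _ in range(n)]
weights = [-6.0] + [1.0] * 6
coeff = [[[[w] * n for _ in range(n)] for _ in range(n)] for w in weights]
result = apply_star_stencil(u, coeff, 1)
```

Because every multiply-add reads a separate coefficient in addition to the field value, loops of this kind tend to be limited by memory traffic rather than arithmetic, which is why blocking and careful use of the memory hierarchy matter.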
In collaboration with the SEP (Stanford Exploration Project) group at Stanford, we also derived an efficient GPU design for the most advanced and most complex case, elastic Q modeling [EAGE17].
Seismic modeling for natural earthquakes
Building on our continuous work on seismic modeling in small-scale exploration scenarios, we recently extended similar technologies to the simulation of large-scale natural earthquakes, which brings computational challenges of a completely different scale.
Given a simulation domain of 300 km by 300 km by 50 km and a resolution of 20 m, the problem involves over 500 billion grid points. With 30 to 40 variables and over 500 flops per grid point in a typical nonlinear model, such a simulation translates into a total memory space of about 150 TB and about 100 Exa-flops of computation, and is thus only feasible on leadership supercomputers.
Over the years, my group has accumulated rich experience in performing highly efficient seismic modeling, for both linear and nonlinear cases, on heterogeneous architectures. However, both the physics and the numerical schemes in the domain of large-scale natural earthquakes were unknown territory for us. Fortunately, in this project we received tremendous support and advice from the team led by Dr. Yifeng Cui from SCEC (Southern California Earthquake Center), and Prof. Xiaofei Chen’s research group from Southern University of Science and Technology joined as our earthquake experts.
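The sizing figures above can be checked with a few lines of arithmetic. The sketch below is a back-of-the-envelope estimate only. The choice of 35 variables (the midpoint of 30 to 40), 8 bytes per value, and the reading of "500 flops" as a per-point, per-time-step cost are assumptions that the text does not state explicitly.

```python
# Domain and resolution taken from the text.
domain_m = (300_000, 300_000, 50_000)   # 300 km x 300 km x 50 km, in meters
resolution_m = 20

nx, ny, nz = (extent // resolution_m for extent in domain_m)
grid_points = nx * ny * nz               # 15000 * 15000 * 2500

# Assumptions (not stated in the text): midpoint variable count, double precision.
variables_per_point = 35
bytes_per_value = 8
memory_tb = grid_points * variables_per_point * bytes_per_value / 1e12

# Assumption: "over 500 flops" is the cost per grid point per time step.
flops_per_point_per_step = 500
flops_per_step = grid_points * flops_per_point_per_step

# Number of time steps implied by a total of 100 exa floating-point operations.
total_flops = 100e18
implied_steps = total_flops / flops_per_step

print(f"grid points: {grid_points:.3e}")
print(f"memory: {memory_tb:.1f} TB")
print(f"implied time steps: {implied_steps:.0f}")
```

Under these assumptions the grid has 5.625 x 10^11 points and needs roughly 157.5 TB, consistent with the quoted 150 TB, and the 100 Exa-flops total corresponds to a few hundred thousand time steps.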
Starting from the AWP-ODC (from SCEC) and CG-FDM (from Prof. Xiaofei Chen’s research group) codes, we developed a fully optimized large-scale nonlinear earthquake simulation software on Sunway TaihuLight. Combining a customized parallelization scheme, an elaborate memory scheme, and on-the-fly compression, we managed to overcome the memory constraints of Sunway TaihuLight, achieving over 15% of the system's peak performance. This is better than the 11.8% efficiency achieved by similar software running on Titan, whose byte-to-flop ratio is 5 times better than that of TaihuLight. The extreme cases demonstrate a sustained performance of over 18.9 Pflops, enabling the simulation of the Tangshan earthquake as an 18-Hz scenario with an 8-meter resolution [SC17b]. This work won the 2017 ACM Gordon Bell Prize, with myself as the lead author.
Again, I led collaborations across different groups, disciplines, universities, and even countries, which provided an indispensable foundation for addressing the complex scientific challenges of our projects.
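The efficiency figure can be related to the sustained performance with one division. The peak value used below is an assumption taken from the published TOP500 figure for Sunway TaihuLight (about 125.4 Pflops), not from the text itself.

```python
sustained_pflops = 18.9   # sustained performance quoted in the text
peak_pflops = 125.4       # assumption: TaihuLight peak as listed on TOP500, not stated in the text

efficiency = sustained_pflops / peak_pflops
print(f"fraction of peak: {efficiency:.1%}")
```

With that peak figure, 18.9 Pflops corresponds to slightly more than 15% of peak, in line with the efficiency quoted above.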
Key Publications for Seismic Modeling Software
[SC17b] Haohuan Fu, Conghui He, Bingwei Chen, et al., “18.9-Pflops Nonlinear Earthquake Simulation on Sunway TaihuLight: Enabling Depiction of 18-Hz and 8-Meter Scenarios”, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC17), 12 pages, ACM Gordon Bell Prize, 2017.
[EAGE17] Conghui He, Haohuan Fu, Yi Shen, Robert G. Clapp, and Guangwen Yang, “Approximating Q Propagations for Elastic Modeling on GPUs”, in 79th EAGE Conference and Exhibition, 2017.
[HiPC16] Jiarui Fang, Haohuan Fu and Guangwen Yang, “Cache-friendly Design for Complex Spatially-variable Coefficient Stencils on Many-core Architectures”, in Proceedings of the IEEE 23rd International Conference on High Performance Computing, Data, and Analytics (HiPC), pp. 222-231, Hyderabad, India, 2016.
[ICPADS15] Jiarui Fang, Haohuan Fu, He Zhang, Wei Wu, Nanxun Dai, Gan Lin and Guangwen Yang, “Optimizing Complex Spatially-Variant Coefficient Stencils For Seismic Modeling on GPU”, in Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems (ICPADS), pp. 641-648, Melbourne, Australia, 2015.
[SEG15a] Conghui He, Haohuan Fu, Bangtian Liu, Huabin Ruan, Guangwen Yang, Hui Yang, and Are Osen, “A GPU-based Parallel Beam Migration Design”, Expanded Abstract, 83rd Society of Exploration Geophysicists (SEG) Meeting, pp. 4313-4317, 2015.
[SEG15b] Jiarui Fang, Haohuan Fu, Wei Wu, Nanxun Dai, and Guangwen Yang, “GPU-based explicit time evolution method”, Expanded Abstract, 83rd Society of Exploration Geophysicists (SEG) Meeting, pp. 3549-3553, 2015.
[IJHPCA14] Yang You, Haohuan Fu*, Shuaiwen Song, et al., “Evaluating Multi-core and Many-core Architectures through Accelerating the Three-Dimensional Lax-Wendroff Correction Stencil”, International Journal of High Performance Computing Applications, vol. 28, no. 3, pp. 301-318, 2014.
[IEEE Micro14] Haohuan Fu, Lin Gan, Robert G. Clapp, et al., “Scaling Reverse Time Migration Performance Through Reconfigurable Data Flow Engines”, IEEE Micro, vol. 34, no. 1, pp. 30-40, 2014.
[IPDPSW13] Yang You, Haohuan Fu*, Xiaomeng Huang, Guojie Song, Lin Gan, Wenjian Yu, and Guangwen Yang, “Accelerating the 3D Elastic Wave Forward Modeling on GPU and MIC”, in Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS) Workshops, pp. 1088-1096, 2013.
[FPGA11] Haohuan Fu and Robert G. Clapp, “Eliminating the Memory Bottleneck: An FPGA-based Solution for 3D Reverse Time Migration”, in Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA), pp. 65-74, 2011.