Haohuan Fu


Mail: haohuan@tsinghua dot edu dot cn

Room: S-817, Meng Minwei Science Building, Center for Earth System Science, Tsinghua University, Beijing, 100084


  • 2015 - present: Deputy Director, National Supercomputing Center in Wuxi
  • 2010 - present: Associate Professor, Ministry of Education Key Laboratory for Earth System Modeling and Department of Earth System Science, Tsinghua University
  • 2009 - 2010: Postdoctoral Research Fellow, Stanford University
  • 2005 - 2009: PhD, Computing, Imperial College London
  • 2003 - 2005: MPhil, CS, City University of Hong Kong
  • 1999 - 2003: B.E., CS, Tsinghua University, Beijing

Research Focus

  • Extreme-Scale Computing on Heterogeneous Supercomputers: one of my research interests is to explore numerical methods, parallelization schemes, performance tuning techniques, and redesign strategies for mapping some of the most challenging scientific applications, such as atmospheric modeling and earthquake simulation, onto emerging many-core and reconfigurable architectures, and to enable the depiction of extreme-scale scientific scenarios on leadership peta- to exa-scale supercomputers.
  • Data Mining Methods for Analyzing Scientific Data Sets: facing fast growth in the volume of data from both observations and simulations, we explore parallel data mining methods, enabled by emerging heterogeneous architectures, to accelerate the process of making new discoveries.
  • Programming Tools: with the dramatic architectural changes in recent peta-scale supercomputers, I am also working on techniques and tools that can facilitate or even automate the transition of large-scale scientific application code to heterogeneous many-core architectures.


  • One of the Top 10 Research Achievements of Tsinghua University in 2017, 12/2017
  • 2017 ACM Gordon Bell Prize (lead author), SC17, Denver, 11/2017
  • People of the Year Award 2016, Scientific Chinese magazine, Beijing, 06/2017
  • 2016 ACM Gordon Bell Prize (third author), SC16, Salt Lake City, 11/2016
  • Tsinghua-Inspur Computational Earth Science Young Researcher Award for the Year of 2015, Beijing, 02/2016
  • Best Paper Award (3 out of 278 submissions), IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Vietri sul Mare, 11/2015
  • Significant Paper in 25 Years, one of the 27 papers selected from the 1,765 publications in the first 25 years of the International Conference on Field Programmable Logic and Applications (FPL), London, 09/2015
  • 2015 IBM Global Shared University Research Award, 10/2015
  • 2014 IBM Global Shared University Research Award, 10/2014
  • Best Paper Award, International Conference on Field Programmable Technology (FPT), Taipei, 12/2008
  • Best Poster Award, Computer System Research Day, Imperial College London, London, 09/2007

Representative Works

  • 18.9-Pflops non-linear earthquake simulation on Sunway TaihuLight: We adopt comprehensive memory-related optimizations to resolve TaihuLight's bandwidth constraints, and propose on-the-fly compression, which doubles the maximum problem size and further improves the performance by 24%; the resulting design achieves 15% of the peak, using 160,000 MPI processes, 10,400,000 cores, and enables simulation for 18-Hz and 8-meter scenarios (2017 ACM Gordon Bell Prize, one of the Top 10 Research Achievements of Tsinghua University in 2017).
  • Redesigning CAM-SE for Peta-Flops Performance on Sunway TaihuLight: First refactoring and redesign of the entire Community Atmosphere Model (over half a million lines of code) for Sunway TaihuLight, achieving 3.4 SYPD for 25-km global atmospheric simulation and a close-to-observation simulation of the lifecycle of Hurricane Katrina; a sustained double-precision performance of over 3.3 PFlops (750-m resolution) for the dynamical core using 10,075,000 cores (2017 ACM Gordon Bell Prize Finalist).
  • An ultra-scalable fully-implicit solver for nonhydrostatic atmospheric simulations: The solver scales to 10.5 million heterogeneous cores on Sunway TaihuLight at an unprecedented 488-m resolution with 770 billion unknowns, sustaining 7.95 Pflops performance in double precision with 0.07 simulated-years-per-day (SYPD) (2016 ACM Gordon Bell Prize).
  • Deep Learning Environment on Sunway TaihuLight: We develop both the underlying library (swDNN) and the framework (swCaffe) to enable big data analytics on Sunway TaihuLight. Guided by performance modeling, we adopt a systematic approach that achieves learning performance of over 54% of the theoretical peak in swDNN. swCaffe already supports parallel training with up to 256 Sunway CPUs, and supports scenarios ranging from audio recognition and medical image classification to land cover mapping and object detection in remote sensing data.
  • A generalized highly scalable framework for atmospheric modeling on heterogeneous supercomputers: We cover a number of different accelerators, ranging from GPUs (Graphics Processing Units) and MIC (Many Integrated Core) coprocessors to FPGAs (Field Programmable Gate Arrays), achieving sustained double-precision performance of 581 Tflops on Tianhe-1A (using 3750 nodes) and 3.74 Pflops on Tianhe-2 (using 8644 nodes).
  • A hybrid CPU-FPGA algorithm for solving shallow water equations: We utilize single and multiple FPGAs to compute the upwind stencil for the global shallow water equations. Through mixed-precision arithmetic, we manage to build a deep pipeline on a single FPGA with 428 floating-point and 235 fixed-point operations per cycle. The algorithm using four FPGAs is 14 times faster and 9 times more power efficient than a hybrid CPU-GPU node (selected as one of the 27 Significant Papers out of the 1,765 publications in the first 25 years of the FPL conference).
  • Highly-efficient parallel machine learning methods: Our work includes an FPGA-based Gaussian Mixture Model clustering engine (with one FPGA providing equivalent performance to over 500 Intel CPU cores), an Intel MIC-based SVM library (achieving 4.4 to 84 times speedup against the popular LIBSVM), and a novel mutation strategy for differential evolution (Best Paper Award in ICTAI 2015).
  • Deep learning based methods for remote sensing data analysis: We explore the potential of deep learning based methods for both land cover mapping and object detection in remote sensing data. Examples include: stacked autoencoder (SAE) based African land cover mapping, with advantages of SAE in both accuracy and prediction time over random forest and SVM; and CNN-based oil palm tree detection and counting, with over 96% accuracy in the study area.