VSCSE - Virtual School of Computational Science and Engineering

Proven Algorithmic Techniques for Many-core Processors

August 15–19, 2011


NEW DRAFT Schedule


Center for Computation & Technology, Louisiana State University, Baton Rouge, LA

Institute for Cyber Enabled Research, Michigan State University, East Lansing, MI

Institute for Data and High Performance Computing, Georgia Institute of Technology, Atlanta, GA

National Center for Supercomputing Applications, Urbana, IL

National Center for Supercomputing Applications ACCESS Center, Arlington, VA

Ohio Supercomputer Center, Ohio State University, Columbus, OH

Princeton Institute for Computational Science and Engineering, Princeton University, Princeton, NJ

University of Michigan, Ann Arbor, MI

University of Texas at El Paso, El Paso, TX

University of Utah, Salt Lake City, UT

Vanderbilt University, Nashville, TN


  • Experience working in a Unix environment
  • Experience developing and running scientific codes written in C or C++
  • Basic knowledge of CUDA (a short online course, Introduction to CUDA, will be available to registered on-site students who need help meeting this prerequisite)

Students who took the course Many-core Processors in 2009 are encouraged to take this follow-on course, which includes new topics and lab exercises.

Wen-mei W. Hwu, professor of electrical and computer engineering and principal investigator of the CUDA Center of Excellence, University of Illinois at Urbana-Champaign

David Kirk, NVIDIA Fellow

Draft course outline:

  • Introduction
    • why problem formulation and algorithm design choices can have a dramatic effect on performance
    • common algorithmic strategies for high performance
  • Increasing locality in dense arrays
    • tiling of data access and layout
  • Improving efficiency and vectorization in dense arrays
    • granularity coarsening
  • Reducing output interference
    • conversion from scatter to gather
    • parallelizing reductions and histograms
  • Dealing with non-uniform data
    • data sorting and binning
  • Dealing with sparse data
    • sorting and packing
  • Dealing with dynamic data
    • parallel queue-based algorithms
  • Improving data efficiency in large data traversal
    • stencil and other grid-based computation
  • Extending beyond many-core processors
    • MPI+CUDA
    • MPI+OpenCL
  • Overview of use of techniques in application domains
    • molecular dynamics
    • computational fluid dynamics
    • medical imaging
    • computer vision
    • gene sequencing
  • Case studies
    • molecular dynamics (NAMD/VMD, MPI, use of algorithm strategies)
    • medical imaging
    • gene sequencing, financial analysis, etc.
  • Hands-on Lab

NOTE: Students are required to provide their own laptops.