

Open|SpeedShop (O|SS) is an open source, multi-platform performance tool enabling performance analysis of HPC applications running on both single-node and large-scale systems based on Intel, AMD, ARM, Intel Xeon Phi, PowerPC, POWER8, and GPU processors, including Cray and IBM Blue Gene platforms.

O|SS is a community effort by The Krell Institute with current direct funding from DOE NNSA. Argo Navis Technologies, LLC, a for-profit company working with Krell, recently won a Phase II NASA SBIR to further develop O|SS. O|SS builds on top of a broad list of community infrastructures, most notably Dyninst and MRNet from UW, libmonitor from Rice, and PAPI from UTK.

O|SS gathers and displays several types of information to aid in solving performance problems, including: program counter sampling for a quick overview of the application’s performance, call path profiling to add caller/callee context and locate critical time-consuming paths, access to machine hardware counter information, input/output tracing for finding I/O performance problems, MPI function call tracing for MPI load-imbalance detection, memory analysis, POSIX thread tracing, NVIDIA CUDA analysis, and OpenMP analysis. O|SS offers a command-line interface (CLI), a graphical user interface (GUI), and a Python scripting API.
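Each of the experiment types above is typically launched through a convenience script that wraps the normal application run. A minimal sketch, assuming a standard O|SS installation; the application name `./app` and the rank count are placeholders:

```shell
# Program counter sampling: quick overview of where time is spent.
osspcsamp "mpirun -np 64 ./app"

# Call path profiling: adds caller/callee context to find hot paths.
ossusertime "mpirun -np 64 ./app"

# I/O tracing and MPI function call tracing experiments.
ossio "mpirun -np 64 ./app"
ossmpi "mpirun -np 64 ./app"
```

Each script runs the application under the chosen experiment and writes a `.openss` database file that can be examined afterwards in any of the user interfaces.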

The base functionality includes:

  • Program Counter Sampling
  • Support for Call Path Analysis
  • Hardware Performance Counters
  • MPI Profiling and Event Tracing
  • I/O Call Profiling and Tracing
  • OpenMP Profiling and Analysis
  • Memory Analysis
  • POSIX Thread Analysis
  • NVIDIA CUDA Tracing and Analysis
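To illustrate the idea behind the first item, program counter sampling: a sampling profiler periodically interrupts the program and records where it is executing, so the hottest code accumulates the most samples. A purely conceptual sketch in Python (this is not Open|SpeedShop code; it samples Python stack frames rather than real program counters):

```python
import collections
import signal
import time

samples = collections.Counter()

def on_tick(signum, frame):
    # On each timer tick, record which function the program was executing,
    # analogous to recording the program counter.
    if frame is not None:
        samples[frame.f_code.co_name] += 1

def busy():
    # A CPU-bound loop; most samples should land here.
    end = time.time() + 0.3
    while time.time() < end:
        pass

signal.signal(signal.SIGPROF, on_tick)
signal.setitimer(signal.ITIMER_PROF, 0.005, 0.005)  # sample every ~5 ms of CPU time
busy()
signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop sampling

# The sample counts approximate where CPU time was spent.
for name, count in samples.most_common():
    print(name, count)
```

Real tools like O|SS do this at the machine level with far lower overhead, and map sampled addresses back to source lines using symbol information.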

Open|SpeedShop development is hosted by the Krell Institute. The infrastructure and base components of Open|SpeedShop are released as open source code, primarily under the LGPL.


Key benefits include:

  • No need to recompile the user’s application
  • Comprehensive performance analysis for sequential, multi-threaded, MPI, and hybrid applications
  • Supports both first analysis steps as well as deeper analysis options for performance experts
  • Easy to use GUI and fully scriptable through a command line interface and Python
  • Supports Linux Systems and Clusters with Intel, ARM, AMD, Power, and other processors
  • Extensible through new performance analysis plugins ensuring consistent look and feel
  • In production use on all major cluster platforms at LANL, LLNL, and SNL


Additional features include:

  • Four user interface options: batch, command line interface, graphical user interface, and Python scripting API.
  • Supports multi-platform single system image (SSI) and traditional clusters.
  • Scales to large numbers of processes, threads, and ranks.
  • Ability to automatically create and attach to both sequential and parallel jobs from within Open|SpeedShop.
  • View performance data using multiple customizable views.
  • Save and restore performance experiment data and symbol information for post-experiment performance analysis.
  • View performance data for all of the application’s lifetime or for smaller time slices.
  • Compare performance results across processes, threads, or ranks, and between a previous experiment and the current experiment.
  • GUI Wizard facility and context sensitive help.
  • Interactive CLI help facility which lists the CLI commands, syntax, and typical usage.
  • Python Scripting API accesses Open|SpeedShop functionality corresponding to CLI commands.
  • Option to automatically group like performing processes, threads, or ranks.
  • Create traces in OTF (Open Trace Format).

How to Use Open|SpeedShop: HPC-Admin Magazine and Scientific Computing Articles

In this article, we will describe how to use Open|SpeedShop through step-by-step examples illustrating how to find a number of different performance bottlenecks. Additionally, we will describe the tool’s most common usage model (workflow) and provide several performance data viewing options.
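The common workflow can be sketched as: run the application under an experiment, then open the resulting database in one of the interfaces. A minimal sketch, assuming a standard O|SS installation; the application and database file names are placeholders:

```shell
# 1. Run the application under PC sampling; this writes a .openss database.
osspcsamp "mpirun -np 64 ./app"

# 2. Open the database in the command-line interface for post-run analysis.
openss -cli -f app-pcsamp.openss
# Inside the CLI, 'expview' prints the default performance view.

# Or open the same database in the GUI instead.
openss -f app-pcsamp.openss
```

Because the performance data is saved to a database, the analysis step can be repeated or refined later without rerunning the application.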

See Look for Bottlenecks with Open|SpeedShop
See Opening Up Performance with OpenSpeedShop, an Open Source Profiler

Open|SpeedShop now on GitHub

The Open|SpeedShop and Component Based Tool Framework (CBTF) sources are now available on GitHub.
The repositories may be found at these locations:

The Open|SpeedShop release tarballs will be moving to GitHub in the future. The release tarballs are currently still hosted on SourceForge.

Open|SpeedShop at SC16

Members of the Open|SpeedShop team will be at Supercomputing 2016 (SC16). This year we will once again have a booth (1442) and will be giving on-demand demonstrations throughout the show at the Open|SpeedShop booth.

To schedule a meeting with Jim or Don please send email to jeg AT krellinst.org.

Members of our team are presenting the “How to Analyze the Performance of Parallel Codes 101” tutorial on Monday, 11/14/16 (8:30am-5:00pm). The slides for this year are not available at this time, but the slides from last year’s tutorial are available here.
