Overview

Open|SpeedShop (O|SS) is an open source, multi-platform performance tool for analyzing HPC applications on both single-node and large-scale systems based on Intel, AMD, ARM, Intel Phi, PowerPC, POWER8, and GPU processors, including Cray and IBM Blue Gene platforms.

O|SS is a community effort that builds on a broad set of community infrastructure projects, most notably Dyninst and MRNet from UW, libmonitor from Rice, and PAPI from UTK.

O|SS gathers and displays several types of information to aid in solving performance problems: a high-level performance summary; program counter sampling, which gives a lightweight flat profile that pinpoints where slowdowns occur; call path profiling, which adds caller/callee context and locates critical time-consuming paths; machine hardware counter data; input/output tracing for finding I/O performance problems; MPI function call tracing for detecting MPI load imbalance; memory analysis; POSIX thread tracing; NVIDIA CUDA analysis; and OpenMP analysis. O|SS offers a command-line interface (CLI), a graphical user interface (GUI), and a Python scripting API.
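To make the idea of a flat profile from program counter sampling concrete, here is a minimal sketch in plain Python. It does not use any Open|SpeedShop API; the symbol table, the addresses, the sample values, and the names `function_for` and `flat_profile` are all hypothetical and chosen only for illustration. The sketch assumes each sample is a program counter value read at a fixed interval and attributes it to the function whose address range contains it.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start_address, function_name), sorted by address.
SYMBOLS = [
    (0x1000, "main"),
    (0x1400, "compute_forces"),
    (0x2000, "exchange_halo"),
]
STARTS = [start for start, _ in SYMBOLS]


def function_for(pc):
    """Map a sampled program counter to the function whose range contains it."""
    index = bisect_right(STARTS, pc) - 1
    return SYMBOLS[index][1] if index >= 0 else "<unknown>"


def flat_profile(samples):
    """Return (function, sample_count, percent) rows, hottest first."""
    counts = Counter(function_for(pc) for pc in samples)
    total = sum(counts.values())
    return [(name, n, 100.0 * n / total) for name, n in counts.most_common()]


# Made-up samples, as if the program counter were read at a fixed interval.
samples = [0x1410, 0x1420, 0x1800, 0x2004, 0x1010, 0x1500, 0x1404, 0x2100]
for name, n, pct in flat_profile(samples):
    print(f"{name:16s} {n:3d} {pct:6.1f}%")
```

Because only a counter per function is kept, the cost per sample is small, which is why this style of profile is described as lightweight; the trade-off is that it carries no caller/callee context.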

The base functionality includes:

High Level Overview/Summary
Program Counter Sampling
Support for Call Path Analysis
Hardware Performance Counters
MPI Profiling and Event Tracing
I/O Call Profiling and Tracing
OpenMP Profiling and Analysis
Memory Analysis
POSIX Thread Analysis
NVIDIA CUDA Tracing and Analysis
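
MPI tracing is described above as a way to detect load imbalance. As a rough illustration of what such a check can look like, the sketch below computes one common imbalance measure, (max - mean) / mean, over per-rank times. This is an assumption made for illustration, not the metric Open|SpeedShop itself reports; the function name `load_imbalance` and the timing values are hypothetical.

```python
from statistics import mean


def load_imbalance(seconds_per_rank):
    """Return (max - mean) / mean; 0.0 means the ranks are perfectly balanced."""
    average = mean(seconds_per_rank)
    if average == 0:
        return 0.0
    return (max(seconds_per_rank) - average) / average


# Made-up per-rank compute times in seconds; the last rank is the straggler.
compute_seconds = [10.0, 10.0, 10.0, 14.0]
print(f"imbalance: {load_imbalance(compute_seconds):.1%}")
```

A value near zero suggests the ranks do similar amounts of work; a large value suggests that most ranks spend time waiting for the slowest one.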

The infrastructure and base components of Open|SpeedShop are released as open source code primarily under LGPL.

Highlights

No need to recompile the user’s application to get performance data at the function and library level. The debug option “-g” is needed in order to view statement, loop, and vector instruction level information.
Comprehensive performance analysis for sequential, multi-threaded, MPI, and hybrid applications
Supports both first analysis steps and deeper analysis options for performance experts
Easy-to-use GUI, fully scriptable through a command line interface and Python
Supports Linux Systems and Clusters with Intel, ARM, AMD, Power, and other processors
Extensible through new performance analysis plugins ensuring consistent look and feel
Detection of vector instructions, showing address, opcode, time spent, and hardware maximum operand size for the vector instruction.
In production use on all major cluster platforms at LANL, LLNL, and SNL

Features

Four user interface options: batch, command line interface, graphical user interface and Python scripting API.
Supports multi-platform single system image (SSI) and traditional clusters.
See the performance data at several levels of granularity: per library, per function, per loop, per statement, and per vector instruction (vector instruction level on Intel platforms only; helps in AVX512 detection).
Scales to large numbers of processes, threads, and ranks.
Ability to automatically create and attach to both sequential and parallel jobs from within Open|SpeedShop.
View performance data using multiple customizable views.
Save and restore performance experiment data and symbol information for post experiment performance analysis
View performance data for the application’s entire lifetime or for smaller time slices.
Compare performance results between processes, threads, or ranks, or between a previous experiment and the current experiment.
GUI context sensitive help.
Interactive CLI help facility which lists the CLI commands, syntax, and typical usage.
Python Scripting API accesses Open|SpeedShop functionality corresponding to CLI commands.
Option to automatically group similarly performing processes, threads, or ranks.
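
The last feature, grouping ranks that perform alike, can be pictured with a small sketch. The approach below (sort by a per-rank time, then start a new group whenever a value exceeds the group's smallest time by more than a relative tolerance) is a deliberately simple stand-in and is not claimed to be the algorithm Open|SpeedShop uses; the function name `group_similar`, the `tolerance` parameter, and the timings are hypothetical.

```python
def group_similar(times_by_rank, tolerance=0.10):
    """Group ranks whose time is within `tolerance` (relative) of the
    smallest time in their group. Returns a list of sorted rank lists."""
    groups = []
    ordered = sorted(times_by_rank.items(), key=lambda item: item[1])
    for rank, seconds in ordered:
        if groups and seconds <= groups[-1]["base"] * (1.0 + tolerance):
            groups[-1]["ranks"].append(rank)
        else:
            # This rank is too far from the current group: start a new one.
            groups.append({"base": seconds, "ranks": [rank]})
    return [sorted(group["ranks"]) for group in groups]


# Made-up per-rank times in seconds: three fast ranks and two slow ones.
times = {0: 10.1, 1: 10.3, 2: 19.8, 3: 10.0, 4: 20.5}
print(group_similar(times))
```

Grouping like this lets a view show a few representative groups instead of one row per rank, which matters once a job has thousands of ranks.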

Open|SpeedShop at SC18

Members of the Open|SpeedShop team will be at Supercomputing 2018 (SC18). Once again we will have a booth (2840), where we will give on-demand demonstrations throughout the show. We are presenting our new high-level, lightweight performance overview tool, which gathers performance information for a number of metrics, including MPI, OpenMP, I/O, memory, and hardware performance counters, to give a high-level view of application performance in one run.

To schedule a meeting with Jim or Don please send email to jeg AT krellinst.org or stop by the booth.

Members of our team are presenting the “How to Analyze the Performance of Parallel Codes 101” tutorial on Monday, 11/11/18 (8:30am-5:00pm). The SC18 tutorial slides are now available: Here is the URL.

Open|SpeedShop at SC17

Members of the Open|SpeedShop team will be at Supercomputing 2017 (SC17). Once again we will have a booth (833), where we will give on-demand demonstrations throughout the show.

To schedule a meeting with Jim or Don please send email to jeg AT krellinst.org or stop by the booth.

Members of our team are presenting the “How to Analyze the Performance of Parallel Codes 101” tutorial on Monday, 11/13/17 (8:30am-5:00pm). The SC17 tutorial slides are now available: Here is the URL.

How-To-Use Open|SpeedShop HPC-Admin Magazine Article and Scientific Computing Article

In this article, we will describe how to use Open|SpeedShop through step-by-step examples illustrating how to find a number of different performance bottlenecks. Additionally, we will describe the tool’s most common usage model (workflow) and provide several performance data viewing options.

See “Look for Bottlenecks with Open|SpeedShop”
See “Opening Up Performance with OpenSpeedShop an Open Source Profiler”

Open|SpeedShop now on GitHub

The Open|SpeedShop and Component Based Tool Framework (CBTF) sources are now available on GitHub.
The repositories may be found at these locations:

The Open|SpeedShop release tarballs will move to GitHub in the future. For now, they are still hosted on SourceForge.

Open|SpeedShop at SC16

Members of the Open|SpeedShop team will be at Supercomputing 2016 (SC16). Once again we will have a booth (1442), where we will give on-demand demonstrations throughout the show.

To schedule a meeting with Jim or Don please send email to jeg AT krellinst.org.

Members of our team are presenting the “How to Analyze the Performance of Parallel Codes 101” tutorial on Monday, 11/14/16 (8:30am-5:00pm). The slides are not available at this time, but here are the slides from last year’s tutorial: Here is the URL.