  • Brian J. N. Wylie, Judit Giménez, Christian Feld, Markus Geimer, Germán Llort, Sandra Mendez, Estanislao Mercadal, Anke Visser, Marta García-Gasulla: 15+ years of joint parallel application performance analysis/tools training with Scalasca/Score-P and Paraver/Extrae toolsets. Future Generation Computer Systems, 162:Article No. 107472, 13 pages, January 2025.
    URL       DOI       BibTeX 

  • Gregor Corbin, Nour Daoud, Bernd Mohr, Gustavo de Morais, Felix Wolf: Are Noise-Resilient Logical Timers useful for Performance Analysis? In Proc. of the Workshop on Programming and Performance Visualization Tools (ProTools), held in conjunction with the Supercomputing Conference (SC24), Atlanta, GA, USA, pages 1519–1530, IEEE, November 2024.
    DOI       BibTeX 

  • Isabel Thärigen, Marc-André Hermanns, Markus Geimer: An Event Model for Trace-Based Performance Analysis of MPI Partitioned Point-to-Point Communication. In Proc. of the Workshop on Programming and Performance Visualization Tools (ProTools), held in conjunction with the Supercomputing Conference (SC23), Denver, CO, USA, pages 1357–1367, ACM, November 2023.
    URL       DOI       BibTeX 

  • Christian Feld, Markus Geimer, Marc-André Hermanns, Pavel Saviankou, Anke Visser, Bernd Mohr: Detecting Disaster Before It Strikes: On the Challenges of Automated Building and Testing in HPC Environments. In Tools for High Performance Computing 2018 / 2019, pages 3–26, Springer International Publishing, 2021.
    URL       DOI       BibTeX 

  • Brian J. N. Wylie: Exascale potholes for HPC: Execution performance and variability analysis of the flagship application code HemeLB. In Proc. of 2020 IEEE/ACM International Workshop on HPC User Support Tools (HUST) and the Workshop on Programming and Performance Visualization Tools (ProTools), held in conjunction with the Supercomputing Conference (SC20), pages 59–70, IEEE, November 2020.
    URL       DOI       BibTeX 

  • Marcus Ritter, Alexandru Calotoiu, Sebastian Rinke, Thorsten Reimann, Torsten Hoefler, Felix Wolf: Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling. In Proc. of the 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA, USA, pages 884–895, IEEE, May 2020.
    PDF       DOI       BibTeX 

  • Christian Feld, Simon Convent, Marc-André Hermanns, Joachim Protze, Markus Geimer, Bernd Mohr: Score-P and OMPT: Navigating the Perils of Callback-Driven Parallel Runtime Introspection. In Proc. of the 15th International Workshop on OpenMP (IWOMP 2019, September 11–13, 2019, Auckland, New Zealand), volume 11718 of Lecture Notes in Computer Science, pages 21–35, Springer, Cham, 2019.
    PDF       DOI       BibTeX 

  • Marc-André Hermanns, Nathan T. Hjelm, Michael Knobloch, Kathryn Mohror, Martin Schulz: The MPI_T events interface: An early evaluation and overview of the interface. Parallel Computing, 85:119–130, 2019.
    PDF       URL       DOI       BibTeX 

  • Jan-Patrick Lehr, Alexandru Calotoiu, Christian Bischof, Felix Wolf: Automatic Instrumentation Refinement for Empirical Performance Modeling. In Proc. of the Workshop on Programming and Performance Visualization Tools (ProTools), held in conjunction with the Supercomputing Conference (SC19), Denver, CO, USA, pages 40–47, November 2019.
    PDF       DOI       BibTeX 

  • Alexandru Calotoiu, Thomas Höhl, Heiko Mantel, Toni Nguyen, Felix Wolf: Designing Efficient Parallel Software via Compositional Performance Modeling. In Proc. of the Workshop on Programming and Performance Visualization Tools (ProTools), held in conjunction with the Supercomputing Conference (SC19), Denver, CO, USA, pages 17–24, November 2019.
    PDF       DOI       BibTeX 

  • Marc Schlütter, Christian Feld, Pavel Saviankou, Michael Knobloch, Marc-André Hermanns, Bernd Mohr: SCIPHI Score-P and Cube Extensions for Intel Phi. In Tools for High Performance Computing 2017, pages 85–104, Cham, Springer International Publishing, September 2019.
    PDF       DOI       BibTeX 

  • Sergei Shudler, Yannick Berens, Alexandru Calotoiu, Torsten Hoefler, Alexandre Strube, Felix Wolf: Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations. IEEE Transactions on Parallel and Distributed Systems, 30(8):1768–1785, August 2019.
    PDF       DOI       BibTeX 

  • Aamer Shah, Chihsong Kuo, Akihiro Nomura, Satoshi Matsuoka, Felix Wolf: How File-access Patterns Influence the Degree of I/O Interference between Cluster Applications. Supercomputing Frontiers and Innovations, 6(2):29–55, July 2019.
    PDF       DOI       BibTeX 

  • Marc-André Hermanns: Understanding the formation of wait states in one-sided communication. PhD thesis, RWTH Aachen University, Jülich, 2018.
    URL       DOI       BibTeX 

  • Philip C. Roth, Kevin Huck, Ganesh Gopalakrishnan, Felix Wolf: Using Deep Learning for Automated Communication Pattern Characterization: Little Steps and Big Challenges. In Proc. of the 5th Workshop on Visual Performance Analysis (VPA), held in conjunction with the Supercomputing Conference (SC18), Dallas, TX, USA, volume 11027 of Lecture Notes in Computer Science, pages 265–272, Springer, November 2018.
    PDF       DOI       BibTeX 

  • Sergei Shudler, Jadran Vrabec, Felix Wolf: Understanding the Scalability of Molecular Simulation using Empirical Performance Modeling. In Proc. of the 7th Workshop on Extreme Scale Programming Tools (ESPT), held in conjunction with the Supercomputing Conference (SC18), Dallas, TX, USA, volume 11027 of Lecture Notes in Computer Science, pages 125–143, Springer, November 2018.
    PDF       DOI       BibTeX 

  • Michael Burger, Christian Bischof, Alexandru Calotoiu, Felix Wolf, Thomas Wunderer, Johannes Buchmann: Exploring the Performance Envelope of the LLL Algorithm. In CSE 2018 - 21st IEEE International Conference on Computational Science and Engineering, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Romania, pages 36–43, IEEE, October 2018.
    PDF       DOI       BibTeX 

  • Alexandru Calotoiu, Alexander Graf, Torsten Hoefler, Daniel Lorenz, Sebastian Rinke, Felix Wolf: Lightweight Requirements Engineering for Exascale Co-design. In Proc. of the 2018 IEEE International Conference on Cluster Computing (CLUSTER), Belfast, UK, pages 201–211, IEEE, September 2018.
    PDF       DOI       BibTeX 

  • Marc-André Hermanns, Nathan T. Hjelm, Michael Knobloch, Kathryn Mohror, Martin Schulz: Enabling callback-driven runtime introspection via MPI_T. In 25th European MPI Users' Group Meeting (EuroMPI'18), September 23–26, 2018, Barcelona, Spain, New York, NY, USA, ACM, September 2018.
    DOI       BibTeX 

  • Aamer Shah, Matthias S. Müller, Felix Wolf: Estimating the Impact of External Interference on Application Performance. In Proc. of the 24th Euro-Par Conference, Turin, Italy, volume 11014 of Lecture Notes in Computer Science, pages 46–58, Springer, August 2018.
    PDF       DOI       BibTeX 

  • Wendy Sharples, Ilya Zhukov, Markus Geimer, Klaus Görgen, Sebastian Lührs, Thomas Breuer, Bibi Naz, Ketan Kulkarni, Slavko Brdar, Stefan Kollet: A run control framework to streamline profiling, porting, and tuning simulation runs and provenance tracking of geoscientific applications. Geoscientific Model Development, 11(7):2875–2895, July 2018.
    DOI       BibTeX 

  • Sergei Shudler: Scalability Engineering for Parallel Programs Using Empirical Performance Models. PhD thesis, Technische Universität Darmstadt, Darmstadt, Germany, June 2018.
    URL       BibTeX 

  • Marc-André Hermanns, Markus Geimer, Bernd Mohr, Felix Wolf: Trace-based Detection of Lock Contention in MPI One-Sided Communication. In Tools for High Performance Computing 2016, Proc. of the 10th Parallel Tools Workshop, Stuttgart, Germany, October 2016, pages 97–114, Springer, 2017.
    URL       DOI       BibTeX 

  • Daniele Tafani, Marc Schlütter, Markus Geimer, Bernd Mohr, Mathias Nachtmann, José Gracia: The Mont-Blanc Project: Second Phase successfully finished. Innovatives Supercomputing in Deutschland (inSiDE), 15(1):134–141, 2017.
    URL       BibTeX 

  • Alexandru Calotoiu: Automatic Empirical Performance Modeling of Parallel Programs. PhD thesis, Technische Universität Darmstadt, Darmstadt, Germany, October 2017.
    URL       BibTeX 

  • Patrick Reisert, Alexandru Calotoiu, Sergei Shudler, Felix Wolf: Following the Blind Seer – Creating Better Performance Models Using Less Information. In Proc. of the 23rd Euro-Par Conference, Santiago de Compostela, Spain, volume 10417 of Lecture Notes in Computer Science, pages 106–118, Springer, August 2017.
    PDF       DOI       BibTeX 

  • Kashif Ilyas, Alexandru Calotoiu, Felix Wolf: Off-Road Performance Modeling – How to Deal with Segmented Data. In Proc. of the 23rd Euro-Par Conference, Santiago de Compostela, Spain, volume 10417 of Lecture Notes in Computer Science, pages 36–48, Springer, August 2017.
    PDF       DOI       BibTeX 

  • Daniel Lorenz, Christian Feld: Scaling Score-P to the next level. In Proc. of the International Conference of Computational Science Workshops, pages 2180–2189, Elsevier, June 2017.
    PDF       DOI       BibTeX 

  • Hristo Iliev, Marc-André Hermanns, Jens Henrik Göbbert, René Halver, Christian Terboven, Bernd Mohr, Matthias S. Müller: Performance Optimization of Parallel Applications in Diverse On-Demand Development Teams. In High-Performance Scientific Computing – First JARA-HPC Symposium 2016, October 4–5, 2016, Aachen, Germany, volume 10164 of Lecture Notes in Computer Science, pages 187–199, Springer International Publishing, March 2017.
    URL       DOI       BibTeX 

  • Sergei Shudler, Alexandru Calotoiu, Torsten Hoefler, Felix Wolf: Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications. In Proc. of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Austin, TX, USA, pages 131–143, ACM, February 2017.
    PDF       DOI       BibTeX 

  • Tom Vierjahn, Marc-André Hermanns, Bernd Mohr, Matthias S. Müller, Torsten W. Kuhlen, Bernd Hentschel: Using Directed Variance to Identify Meaningful Views in Call-path Performance Profiles. In Proceedings of the 3rd International Workshop on Visual Performance Analysis (VPA '16), pages 9–16, Piscataway, NJ, USA, IEEE Press, 2016.
    URL       DOI       BibTeX 

  • Alexandru Calotoiu, David Beckingsale, Christopher W. Earl, Torsten Hoefler, Ian Karlin, Martin Schulz, Felix Wolf: Fast Multi-Parameter Performance Modeling. In Proc. of the 2016 IEEE International Conference on Cluster Computing (CLUSTER), Taipei, Taiwan, pages 172–181, IEEE, September 2016.
    PDF       DOI       BibTeX 

  • David Böhme, Markus Geimer, Lukas Arnold, Felix Voigtländer, Felix Wolf: Identifying the root causes of wait states in large-scale parallel applications. ACM Transactions on Parallel Computing, 3(2):Article No. 11, 24 pages, July 2016.
    PDF       DOI       BibTeX 

  • Monika Harlacher, Alexandru Calotoiu, John Dennis, Felix Wolf: Analysing the Scalability of Climate Codes Using New Features of Scalasca. In Proc. of the John von Neumann Institute for Computing (NIC) Symposium 2016, Juelich, Germany, volume 48 of NIC Series, pages 343–352. Forschungszentrum Jülich, John von Neumann-Institut for Computing, February 2016.
    BibTeX 

  • Ilya Zhukov, Christian Feld, Markus Geimer, Michael Knobloch, Bernd Mohr, Pavel Saviankou: Scalasca v2: Back to the Future. In Proc. of Tools for High Performance Computing 2014, pages 1–24, Springer, 2015.
    DOI       BibTeX 

  • Laura von Rüden, Marc-André Hermanns, Michael Behrisch, Daniel Keim, Bernd Mohr, Felix Wolf: Separating the Wheat from the Chaff: Identifying Relevant and Similar Performance Data with Visual Analytics. In Proc. of the 2nd Workshop on Visual Performance Analysis (VPA), held in conjunction with the Supercomputing Conference (SC15), Austin, TX, USA, pages 4:1–4:8, ACM, 2015.
    PDF       DOI       BibTeX 

  • Daniel Lorenz, Sergei Shudler, Felix Wolf: Preventing the explosion of exascale profile data with smart thread-level aggregation. In Proc. of the 4th Workshop on Extreme Scale Programming Tools (ESPT), held in conjunction with the Supercomputing Conference (SC15), Austin, TX, USA, pages 1–10, ACM, November 2015.
    PDF       DOI       BibTeX 

  • Andreas Vogel, Alexandru Calotoiu, Alexandre Strube, Sebastian Reiter, Arne Nägel, Felix Wolf, Gabriel Wittum: 10,000 Performance Models per Minute - Scalability of the UG4 Simulation Framework. In Proc. of the 21st Euro-Par Conference, Vienna, Austria, volume 9233 of Lecture Notes in Computer Science, pages 519–531, Springer, August 2015.
    PDF       DOI       BibTeX 

  • Christian Iwainsky, Sergei Shudler, Alexandru Calotoiu, Alexandre Strube, Michael Knobloch, Christian Bischof, Felix Wolf: How Many Threads will be too Many? On the Scalability of OpenMP Implementations. In Proc. of the 21st Euro-Par Conference, Vienna, Austria, volume 9233 of Lecture Notes in Computer Science, pages 451–463, Springer, August 2015.
    PDF       DOI       BibTeX 

  • Sergei Shudler, Alexandru Calotoiu, Torsten Hoefler, Alexandre Strube, Felix Wolf: Exascaling Your Library: Will Your Implementation Meet Your Expectations? In Proc. of the International Conference on Supercomputing (ICS), Newport Beach, CA, USA, pages 165–175, ACM, June 2015.
    PDF       DOI       BibTeX 

  • Pavel Saviankou, Michael Knobloch, Anke Visser, Bernd Mohr: Cube v4: From Performance Report Explorer to Performance Analysis Tool. Procedia Computer Science, 51:1343–1352, June 2015.
    PDF       DOI       BibTeX 

  • Jie Jiang, Peter Philippen, Michael Knobloch, Bernd Mohr: Performance Measurement and Analysis of Transactional Memory and Speculative Execution on IBM Blue Gene/Q. In Proceedings of Euro-Par 2014 Parallel Processing, volume 8632 of Lecture Notes in Computer Science, pages 26–37, Springer International Publishing, 2014.
    PDF       URL       DOI       BibTeX 

  • Christian Rössel, Bernd Mohr, Markus Geimer, Daniel Becker: Successful Technology Transfer with Siemens – The RAPID Project. Innovatives Supercomputing in Deutschland (inSiDE), 12(3):72–75, 2014.
    URL       BibTeX 

  • Fabian Gasper, Klaus Görgen, Prabhakar Shrestha, Mauro Sulis, Jehan Rihani, Markus Geimer, Stefan Kollet: Implementation and scaling of the fully coupled Terrestrial Systems Modeling Platform (TerrSysMP v1.0) in a massively parallel supercomputing environment – a case study on JUQUEEN (IBM Blue Gene/Q). Geoscientific Model Development, 7(5):2531–2543, October 2014.
    PDF       URL       DOI       BibTeX 

  • Daniel Lorenz, Robert Dietrich, Ronny Tschüter, Felix Wolf: A comparison between OPARI2 and the OpenMP tools interface in the context of Score-P. In Proc. of the 10th International Workshop on OpenMP (IWOMP), Salvador, Brazil, September 2014, volume 8766 of LNCS, pages 161–172, Springer, September 2014.
    PDF       DOI       BibTeX 

  • Guoyong Mao, David Böhme, Marc-André Hermanns, Markus Geimer, Daniel Lorenz, Felix Wolf: Catching Idlers with Ease: A Lightweight Wait-State Profiler for MPI Programs. In EuroMPI '14: Proc. of the 21st European MPI Users' Group Meeting, Kyoto, Japan, pages 103–108, ACM, September 2014.
    PDF       DOI       BibTeX 

  • Chihsong Kuo, Aamer Shah, Akihiro Nomura, Satoshi Matsuoka, Felix Wolf: How File Access Patterns Influence Interference Among Cluster Applications. In Proc. of the IEEE International Conference on Cluster Computing (CLUSTER), Madrid, Spain, pages 1–8, IEEE, September 2014.
    PDF       DOI       BibTeX 

  • Felix Wolf, Christian Bischof, Torsten Hoefler, Bernd Mohr, Gabriel Wittum, Alexandru Calotoiu, Christian Iwainsky, Alexandre Strube, Andreas Vogel: Catwalk: A Quick Development Path for Performance Models. In Euro-Par 2014: Parallel Processing Workshops, volumes 8805 and 8806 of Lecture Notes in Computer Science, Springer, September 2014.
    DOI       BibTeX 

  • Alexandru Calotoiu, Torsten Hoefler, Felix Wolf: Mass-producing Insightful Performance Models. In Workshop on Modeling & Simulation of Systems and Applications, University of Washington, Seattle, Washington, August 2014.
    PDF       URL       BibTeX 

  • Marc Schlütter, Peter Philippen, Laurent Morin, Markus Geimer, Bernd Mohr: Profiling Hybrid HMPP Applications with Score-P on Heterogeneous Hardware. In Parallel Computing: Accelerating Computational Science and Engineering (CSE), volume 25 of Advances in Parallel Computing, pages 773–782, IOS Press, March 2014.
    PDF       URL       DOI       BibTeX 

  • Julien Jaeger, Peter Philippen, Eric Petit, Andres Charif Rubial, Christian Rössel, William Jalby, Bernd Mohr: Binary Instrumentation for Scalable Performance Measurement of OpenMP Applications. In Parallel Computing: Accelerating Computational Science and Engineering (CSE), volume 25 of Advances in Parallel Computing, pages 783–792, IOS Press, March 2014.
    URL       DOI       BibTeX 

  • David Böhme: Characterizing Load and Communication Imbalance in Parallel Applications. PhD thesis, RWTH Aachen University, volume 23 of IAS Series, Forschungszentrum Jülich, February 2014, ISBN 978-3-89336-940-9.
    URL       DOI       BibTeX 

  • Ilya Zhukov, Brian J. N. Wylie: Assessing Measurement and Analysis Performance and Scalability of Scalasca 2.0. In Proc. of the Euro-Par 2013: Parallel Processing Workshops, volume 8374 of Lecture Notes in Computer Science, pages 627–636, Springer, January 2014.
    PDF       DOI       BibTeX 

  • Andreas Knüpfer, Robert Dietrich, Jens Doleschal, Markus Geimer, Marc-André Hermanns, Christian Rössel, Ronny Tschüter, Bert Wesarg, Felix Wolf: Generic Support for Remote Memory Access Operations in Score-P and OTF2. In Tools for High Performance Computing 2012, Proc. of the 6th Parallel Tools Workshop, Stuttgart, Germany, September 2012, pages 57–74, Springer, 2013.
    DOI       BibTeX 

  • Daniel Lorenz, David Böhme, Bernd Mohr, Alexandre Strube, Zoltán Szebenyi: Extending Scalasca’s Analysis Features. In Tools for High Performance Computing 2012, pages 115–126, Springer Berlin Heidelberg, 2013.
    PDF       DOI       BibTeX 

  • Alexandre E. Eichenberger, John M. Mellor-Crummey, Martin Schulz, Michael Wong, Nawal Copty, John DelSignore, Robert Dietrich, Xu Liu, Eugene Loh, Daniel Lorenz: OMPT: OpenMP Tools Application Programming Interfaces for Performance Analysis. In Proc. of the 9th International Workshop on OpenMP (IWOMP), Canberra, Australia, LNCS, pages 171–185, Berlin / Heidelberg, Springer, 2013.
    PDF       DOI       BibTeX 

  • Bernd Mohr, Vladimir Voevodin, Judit Giménez, Erik Hagersten, Andreas Knüpfer, Dmitry A. Nikitenko, Mats Nilsson, Harald Servat, Aamer Shah, Frank Winkler, Felix Wolf, Ilya Zhukov: The HOPSA Workflow and Tools. In Tools for High Performance Computing 2012, Proc. of the 6th Parallel Tools Workshop, Stuttgart, Germany, September 2012, pages 127–146, Springer, 2013.
    PDF       DOI       BibTeX 

  • Alexandru Calotoiu, Torsten Hoefler, Marius Poke, Felix Wolf: Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes. In Proc. of the ACM/IEEE Conference on Supercomputing (SC13), Denver, CO, USA, pages 1–12, ACM, November 2013.
    PDF       DOI       BibTeX 

  • Marc-André Hermanns, Manfred Miklosch, David Böhme, Felix Wolf: Understanding the formation of wait states in applications with one-sided communication. In EuroMPI '13: Proc. of the 20th European MPI Users' Group Meeting, Madrid, Spain, September 15–18, 2013, pages 73–78, New York, NY, USA, ACM, September 2013.
    PDF       DOI       BibTeX 

  • Aamer Shah, Felix Wolf, Sergey Zhumatiy, Vladimir Voevodin: Capturing inter-application interference on clusters. In Proc. of the IEEE International Conference on Cluster Computing (CLUSTER), Indianapolis, IN, USA, pages 1–5, IEEE, September 2013.
    PDF       DOI       BibTeX 

  • Brian J. N. Wylie, Wolfgang Frings: Scalasca support for MPI+OpenMP parallel applications on large-scale HPC systems based on Intel Xeon Phi. In Proc. XSEDE'13 Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery (San Diego, CA, USA), ACM, July 2013.
    DOI       BibTeX 

  • Daniel Becker, Markus Geimer, Rolf Rabenseifner, Felix Wolf: Extending the scope of the controlled logical clock. Cluster Computing, 16(1):171–189, March 2013.
    PDF       DOI       BibTeX 

  • Marc-André Hermanns, Sriram Krishnamoorthy, Felix Wolf: A scalable infrastructure for the performance analysis of passive target synchronization. Parallel Computing, 39(3):132–145, March 2013.
    PDF       DOI       BibTeX 

  • Markus Geimer, Pavel Saviankou, Alexandre Strube, Zoltán Szebenyi, Felix Wolf, Brian J. N. Wylie: Further improving the scalability of the Scalasca toolset. In Proc. of PARA 2010: State of the Art in Scientific and Parallel Computing, Part II: Minisymposium Scalable tools for High Performance Computing, Reykjavik, Iceland, June 6–9 2010, volume 7134 of Lecture Notes in Computer Science, pages 463–474, Springer, 2012.
    PDF       DOI       BibTeX 

  • Dieter an Mey, Scott Biersdorff, Christian Bischof, Kai Diethelm, Dominic Eschweiler, Michael Gerndt, Andreas Knüpfer, Daniel Lorenz, Allen D. Malony, Wolfgang E. Nagel, Yury Oleynik, Christian Rössel, Pavel Saviankou, Dirk Schmidl, Sameer S. Shende, Michael Wagner, Bert Wesarg, Felix Wolf: Score-P: A Unified Performance Measurement System for Petascale Applications. In Proc. of the CiHPC: Competence in High Performance Computing, HPC Status Konferenz der Gauß-Allianz e.V., Schwetzingen, Germany, June 2010, pages 85–97. Gauß-Allianz, Springer, 2012.
    PDF       DOI       BibTeX 

  • Dominic Eschweiler, Michael Wagner, Markus Geimer, Andreas Knüpfer, Wolfgang E. Nagel, Felix Wolf: Open Trace Format 2 - The Next Generation of Scalable Trace Formats and Support Libraries. In Proc. of the Intl. Conference on Parallel Computing (ParCo), Ghent, Belgium, August 30 – September 2 2011, volume 22 of Advances in Parallel Computing, pages 481–490, IOS Press, 2012.
    PDF       DOI       BibTeX 

  • Ulf Andersson, Brian J. N. Wylie: Performance engineering of GemsFDTD computational electromagnetics solver. In Proc. of PARA 2010: State of the Art in Scientific and Parallel Computing, Reykjavík, Iceland, Part I, volume 7133 of Lecture Notes in Computer Science, pages 314–324, Springer, 2012.
    PDF       DOI       BibTeX 

  • Andreas Knüpfer, Christian Rössel, Dieter an Mey, Scott Biersdorff, Kai Diethelm, Dominic Eschweiler, Markus Geimer, Michael Gerndt, Daniel Lorenz, Allen D. Malony, Wolfgang E. Nagel, Yury Oleynik, Peter Philippen, Pavel Saviankou, Dirk Schmidl, Sameer S. Shende, Ronny Tschüter, Michael Wagner, Bert Wesarg, Felix Wolf: Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir. In Tools for High Performance Computing 2011, Proc. of the 5th Parallel Tools Workshop, Dresden, Germany, September 2011, pages 79–91, Springer, 2012.
    PDF       DOI       BibTeX 

  • Zoltán Szebenyi: Capturing Parallel Performance Dynamics. PhD thesis, RWTH Aachen University, volume 12 of IAS Series, Forschungszentrum Jülich, 2012, ISBN 978-3-89336-798-6.
    URL       BibTeX 

  • Christian Rössel, Bernd Mohr, Michael Gerndt, Felix Wolf: Performance Dynamics of Massively Parallel Codes. Innovatives Supercomputing in Deutschland (inSiDE), 10(2):72–73, 2012.
    PDF       URL       BibTeX 

  • David Böhme, Marc-André Hermanns, Felix Wolf: Scalasca. In Entwicklung und Evolution von Forschungssoftware, Rolduc, November 2011, volume 14 of Aachener Informatik-Berichte, Software Engineering, pages 43–48, Shaker, 2012.
    BibTeX 

  • Christian Rössel, Bernd Mohr, Felix Wolf: Score-P. In Entwicklung und Evolution von Forschungssoftware, Rolduc, Niederlande, November 2011, volume 14 of Aachener Informatik-Berichte, Software Engineering, pages 23–30, Shaker, 2012.
    BibTeX 

  • Daniel Lorenz, Peter Philippen, Dirk Schmidl, Felix Wolf: Profiling of OpenMP tasks with Score-P. In Proc. of the 41st International Conference on Parallel Processing Workshops (ICPPW), Workshop on Parallel Software Tools and Tool Infrastructures (PSTI), pages 444–453, September 2012.
    PDF       DOI       BibTeX 

  • Marc-André Hermanns, Markus Geimer, Bernd Mohr, Felix Wolf: Scalable detection of MPI-2 remote memory access inefficiency patterns. Intl. Journal of High Performance Computing Applications (IJHPCA), 26(3):227–236, August 2012.
    PDF       DOI       BibTeX 

  • Alexandru Calotoiu, Christian Siebert, Felix Wolf: Pattern-Independent Detection of Manual Collectives in MPI Programs. In Proc. of the 18th Euro-Par Conference, Rhodes Island, Greece, volume 7484 of Lecture Notes in Computer Science, pages 28–39, Springer, August 2012.
    PDF       DOI       BibTeX 

  • Dirk Schmidl, Peter Philippen, Daniel Lorenz, Christian Rössel, Markus Geimer, Dieter an Mey, Bernd Mohr, Felix Wolf: Performance Analysis Techniques for Task-Based OpenMP Applications. In Proc. of the 8th International Workshop on OpenMP (IWOMP), Rome, Italy, volume 7312 of Lecture Notes in Computer Science, pages 196–209, Berlin / Heidelberg, Springer, June 2012.
    PDF       DOI       BibTeX 

  • David Böhme, Bronis R. de Supinski, Markus Geimer, Martin Schulz, Felix Wolf: Scalable Critical-Path Based Performance Analysis. In Proc. of the 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Shanghai, China, pages 1330–1340, IEEE, May 2012.
    PDF       DOI       BibTeX 

  • David Böhme, Markus Geimer, Felix Wolf: Characterizing Load and Communication Imbalance in Large-Scale Parallel Applications. In Proc. of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops and PhD Forum (IPDPSW), Shanghai, China, pages 2538–2541, IEEE, May 2012.
    PDF       DOI       BibTeX 

  • Felix Wolf: Understanding the Formation of Wait States in Parallel Programs. Innovatives Supercomputing in Deutschland (inSiDE), 9(1):94–95, 2011.
    URL       BibTeX 

  • Felix Wolf: Scalasca. In Encyclopedia of Parallel Computing, pages 1775–1785, Springer, October 2011.
    URL       BibTeX 

  • Jan Mußler, Daniel Lorenz, Felix Wolf: Reducing the overhead of direct application instrumentation using prior static analysis. In Proc. of the 17th Euro-Par Conference, Bordeaux, France, volume 6852 of Lecture Notes in Computer Science, pages 65–76, Springer, September 2011.
    PDF       DOI       BibTeX 

  • Markus Geimer, Marc-André Hermanns, Christian Siebert, Felix Wolf, Brian J. N. Wylie: Scaling Performance Tool MPI Communicator Management. In Proc. of the 18th European MPI Users' Group Meeting (EuroMPI), Santorini, Greece, volume 6960 of Lecture Notes in Computer Science, pages 178–187, Springer, September 2011.
    PDF       DOI       BibTeX 

  • Marc-André Hermanns, Sriram Krishnamoorthy, Felix Wolf: A Scalable Replay-based Infrastructure for the Performance Analysis of One-sided Communication. In Proc. of the 1st Intl. Workshop on High-performance Infrastructure for Scalable Tools (WHIST), held in conjunction with the International Conference on Supercomputing (ICS), Tucson, AZ, USA, June 2011.
    PDF       BibTeX 

  • Zoltán Szebenyi, Todd Gamblin, Martin Schulz, Bronis R. de Supinski, Felix Wolf, Brian J. N. Wylie: Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs. In Proc. of the 25th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Anchorage, AK, USA, pages 640–648, IEEE, May 2011.
    PDF       DOI       BibTeX 

  • Brian J. N. Wylie, Markus Geimer: Large-scale performance analysis of PFLOTRAN with Scalasca. In Proc. of the 53rd Cray User Group meeting, Fairbanks, AK, USA, Cray User Group Inc., May 2011.
    PDF       URL       BibTeX 

  • Zoltán Szebenyi, Felix Wolf, Brian J. N. Wylie: Performance Analysis of Long-running Applications. In Proc. of the 25th IEEE International Parallel and Distributed Processing Symposium (IPDPS) PhD Forum, Anchorage, AK, USA, pages 2100–2103, IEEE, May 2011.
    PDF       DOI       BibTeX 

  • Markus Geimer, Felix Wolf, Brian J. N. Wylie, Daniel Becker, David Böhme, Wolfgang Frings, Marc-André Hermanns, Bernd Mohr, Zoltán Szebenyi: Recent Developments in the Scalasca Toolset. In Tools for High Performance Computing 2009, Proc. of the 3rd Parallel Tools Workshop, Dresden, Germany, September 2009, chapter 4, pages 39–51, Springer, 2010.
    PDF       DOI       BibTeX 

  • Brian J. N. Wylie: Improved Scalasca toolset support for performance analysis of Cray XT systems. In HPC-Europa2: Science and Supercomputing in Europe - Research Highlights 2009, page 67, CINECA Consorzio Interuniversitario, Casalecchio di Reno (Bologna), Italy, 2010.
    URL       BibTeX 

  • Bernd Mohr, Brian J. N. Wylie, Felix Wolf: Performance measurement and analysis tools for extremely scalable systems. Concurrency and Computation: Practice and Experience, 22(16):2212–2229, 2010 (ISC 2008 Award).
    PDF       DOI       BibTeX 

  • Daniel Becker: Timestamp Synchronization of Concurrent Events. PhD thesis, RWTH Aachen University, volume 4 of IAS Series, Forschungszentrum Jülich, 2010, ISBN 978-3-89336-625-5.
    URL       DOI       BibTeX 

  • Marc-André Hermanns: HPC-Europa2: Science and Supercomputing in Europe research highlights 2009. In HPC-Europa2: Science and Supercomputing in Europe research highlights 2010, page 101, CINECA Consorzio Interuniversitario, Casalecchio di Reno (Bologna), Italy, 2010.
    PDF       BibTeX 

  • Brian J. N. Wylie, Markus Geimer, Bernd Mohr, David Böhme, Zoltán Szebenyi, Felix Wolf: Large-scale performance analysis of Sweep3D with the Scalasca toolset. Parallel Processing Letters, 20(4):397–414, December 2010.
    PDF       DOI       BibTeX 

  • David Böhme, Markus Geimer, Felix Wolf, Lukas Arnold: Identifying the root causes of wait states in large-scale parallel applications. In Proc. of the 39th International Conference on Parallel Processing (ICPP), San Diego, CA, USA, pages 90–100, IEEE, September 2010, Best Paper Award.
    PDF       DOI       BibTeX 

  • Daniel Becker, Markus Geimer, Rolf Rabenseifner, Felix Wolf: Synchronizing the Timestamps of Concurrent Events in Traces of Hybrid MPI/OpenMP Applications. In Proc. of IEEE International Conference on Cluster Computing (CLUSTER), Heraklion, Greece, pages 38–47, IEEE, September 2010.
    PDF       DOI       BibTeX 

  • Daniel Lorenz, Bernd Mohr, Christian Rössel, Dirk Schmidl, Felix Wolf: How to reconcile event-based performance analysis with tasking in OpenMP. In Proc. of 6th Int. Workshop on OpenMP (IWOMP), Tsukuba, Japan, volume 6132 of Lecture Notes in Computer Science, pages 109–121, Springer, June 2010.
    PDF       DOI       BibTeX 

  • Brian J. N. Wylie, David Böhme, Wolfgang Frings, Markus Geimer, Bernd Mohr, Zoltán Szebenyi, Daniel Becker, Marc-André Hermanns, Felix Wolf: Scalable performance analysis of large-scale parallel applications on Cray XT systems with Scalasca. In Proc. 52nd Cray User Group Meeting, Edinburgh, Scotland, Cray User Group Incorporated, May 2010.
    PDF       URL       BibTeX 

  • Markus Geimer, Felix Wolf, Brian J. N. Wylie, Erika Ábrahám, Daniel Becker, Bernd Mohr: The Scalasca performance toolset architecture. Concurrency and Computation: Practice and Experience, 22(6):702–719, April 2010.
    PDF       DOI       BibTeX 

  • Brian J. N. Wylie, David Böhme, Bernd Mohr, Zoltán Szebenyi, Felix Wolf: Performance analysis of Sweep3D on Blue Gene/P with the Scalasca toolset. In Proc. 24th International Parallel and Distributed Processing Symposium and Workshops (IPDPS), Atlanta, GA, USA, IEEE, April 2010.
    PDF       DOI       BibTeX 

  • David Böhme, Marc-André Hermanns, Markus Geimer, Felix Wolf: Performance Simulation of Non-blocking Communication in Message-Passing Applications. In Proc. of the 2nd Workshop on Productivity and Performance (PROPER) in conjunction with Euro-Par 2009, Delft, The Netherlands, volume 6043 of Lecture Notes in Computer Science, pages 208–217, Springer, March 2010.
    PDF       DOI       BibTeX 

  • Felix Wolf, David Böhme, Markus Geimer, Marc-André Hermanns, Bernd Mohr, Zoltán Szebenyi, Brian J. N. Wylie: Performance Tuning in the Petascale Era. In Proc. of the John von Neumann Institute for Computing (NIC) Symposium 2010, Jülich, Germany, volume 3 of IAS Series, pages 339–346, Forschungszentrum Jülich, John von Neumann Institute for Computing, February 2010.
    PDF       BibTeX 

  • Zoltán Szebenyi, Brian J. N. Wylie, Felix Wolf: Scalasca Parallel Performance Analyses of PEPC. In Proc. of the 1st Workshop on Productivity and Performance (PROPER) in conjunction with Euro-Par 2008, Las Palmas de Gran Canaria, Spain, volume 5415 of Lecture Notes in Computer Science, pages 305–314, Springer, 2009.
    PDF       DOI       BibTeX 

  • Felix Wolf: Performance Tools for Petascale Systems. Innovatives Supercomputing in Deutschland (inSiDE), 7(2):38–39, 2009.
    URL       BibTeX 

  • Daniel Becker, Rolf Rabenseifner, Felix Wolf, John Linford: Scalable timestamp synchronization for event traces of message-passing applications. Parallel Computing, 35(12):595–607, December 2009.
    PDF       DOI       BibTeX 

  • Zoltán Szebenyi, Felix Wolf, Brian J. N. Wylie: Space-Efficient Time-Series Call-Path Profiling of Parallel Applications. In Proc. of the ACM/IEEE Conference on Supercomputing (SC09), Portland, OR, USA, ACM, November 2009.
    PDF       DOI       BibTeX 

  • Wolfgang Frings, Felix Wolf, Ventsislav Petkov: Scalable Massively Parallel I/O to Task-Local Files. In Proc. of the ACM/IEEE Conference on Supercomputing (SC09), Portland, OR, USA, ACM, November 2009.
    PDF       DOI       BibTeX 

  • Marc-André Hermanns, Markus Geimer, Bernd Mohr, Felix Wolf: Scalable Detection of MPI-2 Remote Memory Access Inefficiency Patterns. In Proc. of the 16th European PVM/MPI Users' Group Meeting (EuroPVM/MPI), Espoo, Finland, volume 5759 of Lecture Notes in Computer Science, pages 31–41, Springer, September 2009.
    PDF       DOI       BibTeX 

  • Markus Geimer, Felix Wolf, Brian J. N. Wylie, Bernd Mohr: A scalable tool architecture for diagnosing wait states in massively parallel applications. Parallel Computing, 35(7):375–388, July 2009.
    PDF       DOI       BibTeX 

  • Markus Geimer, Sameer S. Shende, Allen D. Malony, Felix Wolf: A Generic and Configurable Source-Code Instrumentation Component. In Proc. of the International Conference on Computational Science (ICCS), Baton Rouge, LA, USA, volume 5545 of Lecture Notes in Computer Science, pages 696–705, Springer, May 2009.
    PDF       DOI       BibTeX 

  • Marc-André Hermanns: Trace-based performance simulation of large-scale applications. University of Hagen, May 2009.
    PDF       URL       BibTeX 

  • Daniel Becker, Rolf Rabenseifner, Felix Wolf, John Linford: Replay-based synchronization of timestamps in event traces of massively parallel applications. Scalable Computing: Practice and Experience, 10(1):49–60, March 2009.
    PDF       URL       BibTeX 

  • Marc-André Hermanns, Markus Geimer, Felix Wolf, Brian J. N. Wylie: Verifying Causality Between Distant Performance Phenomena in Large-Scale MPI Applications. In Proc. of the 17th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), Weimar, Germany, pages 78–84, IEEE, February 2009.
    PDF       DOI       BibTeX 

  • Brian J. N. Wylie, Markus Geimer, Felix Wolf: Performance measurement and analysis of large-scale parallel applications on leadership computing systems. Scientific Programming, 16(2-3):167–181, 2008.
    PDF       URL       DOI       BibTeX 

  • Felix Wolf, Brian J. N. Wylie, Erika Ábrahám, Daniel Becker, Wolfgang Frings, Karl Fürlinger, Markus Geimer, Marc-André Hermanns, Bernd Mohr, Shirley Moore, Matthias Pfeifer, Zoltán Szebenyi: Usage of the SCALASCA Toolset for Scalable Performance Analysis of Large-Scale Parallel Applications. In Tools for High Performance Computing, Proc. of the 2nd Parallel Tools Workshop, Stuttgart, Germany, July 2008, pages 157–167, Springer, 2008.
    PDF       DOI       BibTeX 

  • Ventsislav Petkov: SIONlib - Scalable I/O Library for Native Parallel Access to Binary Files. In Beiträge zum Wissenschaftlichen Rechnen – Ergebnisse des Gaststudentenprogramms 2008 des John von Neumann-Instituts für Computing, Technical Report FZJ-JSC-IB-2008-07, Forschungszentrum Jülich, pages 93–105, December 2008.
    PDF       BibTeX 

  • Daniel Becker, Rolf Rabenseifner, Felix Wolf: Implications of non-constant clock drifts for the timestamps of concurrent events. In Proc. of the IEEE International Conference on Cluster Computing (CLUSTER), Tsukuba, Japan, pages 59–68, IEEE, September 2008.
    PDF       DOI       BibTeX 

  • Daniel Becker, John Linford, Rolf Rabenseifner, Felix Wolf: Replay-based synchronization of timestamps in event traces of massively parallel applications. In Proc. of the International Conference on Parallel Processing Workshops (ICPPW), 1st International Workshop on Simulation and Modelling in Emergent Computational Systems (SMECS), Portland, OR, USA, pages 212–219, IEEE, September 2008.
    PDF       DOI       BibTeX 

  • Daniel Becker, Morris Riedel, Achim Streit, Felix Wolf: Grid-Based Workflow Management for Automatic Performance Analysis of Massively Parallel Applications. In Proc. of the 3rd CoreGRID Workshop on Grid Middleware, Barcelona, Spain, CoreGRID Series, pages 103–118, Springer, June 2008.
    PDF       DOI       BibTeX 

  • Zoltán Szebenyi, Brian J. N. Wylie, Felix Wolf: SCALASCA Parallel Performance Analyses of SPEC MPI2007 Applications. In Proc. of the 1st SPEC International Performance Evaluation Workshop (SIPEW), Darmstadt, Germany, volume 5119 of Lecture Notes in Computer Science, pages 99–123, Springer, June 2008.
    PDF       DOI       BibTeX 

  • Markus Geimer, Felix Wolf, Brian J. N. Wylie, Erika Ábrahám, Daniel Becker, Bernd Mohr: The SCALASCA Performance Toolset Architecture. In International Workshop on Scalable Tools for High-End Computing (STHEC), Kos, Greece, pages 51–65, June 2008.
    PDF       BibTeX 

  • Oscar Hernandez, Fengguang Song, Barbara Chapman, Jack Dongarra, Bernd Mohr, Shirley Moore, Felix Wolf: Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications. In Proc. of the 2nd International Workshop on OpenMP (IWOMP 2006), Reims, France, volume 4315 of Lecture Notes in Computer Science, pages 267–278, Springer, June 2008.
    PDF       DOI       BibTeX 

  • Marc-André Hermanns, Markus Geimer, Felix Wolf, Brian J. N. Wylie: Verifying Causal Connections between Distant Performance Phenomena in Large-Scale Message-Passing Applications. Technical Report FZJ-JSC-IB-2008-05, Forschungszentrum Jülich, April 2008.
    PDF       BibTeX 

  • Daniel Becker, Wolfgang Frings, Felix Wolf: Performance Evaluation and Optimization of Parallel Grid Computing Applications. In Proc. of the 16th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Toulouse, France, pages 193–199, IEEE, February 2008.
    PDF       DOI       BibTeX 

  • Felix Wolf, Daniel Becker, Markus Geimer, Brian J. N. Wylie: Scalable Performance Analysis Methods for the Next Generation of Supercomputers. In Proc. of the John von Neumann Institute for Computing (NIC) Symposium, Jülich, Germany, volume 39 of NIC-Series, pages 315–322, February 2008.
    PDF       BibTeX 

  • Markus Geimer, Felix Wolf, Andreas Knüpfer, Bernd Mohr, Brian J. N. Wylie: A Parallel Trace-Data Interface for Scalable Performance Analysis. In Proc. of the 8th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA), Umeå, Sweden, June 2006, volume 4699 of Lecture Notes in Computer Science, pages 398–408, Springer, 2007.
    PDF       DOI       BibTeX 

  • Brian J. N. Wylie, Felix Wolf, Bernd Mohr, Markus Geimer: Integrated Runtime Measurement Summarisation and Selective Event Tracing for Scalable Parallel Execution Performance Diagnosis. In Proc. of the 8th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA), Umeå, Sweden, June 2006, volume 4699 of Lecture Notes in Computer Science, pages 460–469, Springer, 2007.
    PDF       DOI       BibTeX 

  • Christian Bischof, Felix Wolf: Produktivität versus Performanz in der Simulation. RWTH Themen, 2:38–39, 2007.
    BibTeX 

  • M. Behbahani, Marek Behr, Christian Bischof, Felix Wolf: Kranken Herzen helfen. RWTH Themen, 1:44–46, 2007.
    BibTeX 

  • Daniel Becker, Wolfgang Frings, Felix Wolf: Performance Evaluation and Optimization of Metacomputing Applications. In Proc. of the 3rd Workshop on Communication in Cluster- and Grid-Systems (KiCC, Kommunikation in Clusterrechnern und Clusterverbundsystemen), Aachen, Germany, pages 32–39. RWTH Aachen University, December 2007.
    PDF       URL       BibTeX 

  • John Linford: CESRI 2007 Research Report - Implementation and Validation of the Extended Controlled Logical Clock. Technical Report FZJ-JSC-IB-2007-11, Forschungszentrum Jülich, November 2007.
    PDF       BibTeX 

  • Markus Geimer, Björn Kuhlmann, Farzona Pulatova, Felix Wolf, Brian J. N. Wylie: Scalable Collation and Presentation of Call-Path Profile Data with CUBE. In Proc. of the Conference on Parallel Computing (ParCo), Aachen/Jülich, Germany, pages 645–652, September 2007, Minisymposium Scalability and Usability of HPC Programming Tools.
    PDF       BibTeX 

  • Daniel Becker, Rolf Rabenseifner, Felix Wolf: Timestamp Synchronization for Event Traces of Large-Scale Message-Passing Applications. In Proc. of the 14th European PVM/MPI Users' Group Meeting (EuroPVM/MPI), Paris, France, volume 4757 of Lecture Notes in Computer Science, pages 315–325, Springer, September 2007.
    PDF       DOI       BibTeX 

  • Brian J. N. Wylie, Markus Geimer, Mike Nicolai, Markus Probst: Performance analysis and tuning of the XNS CFD solver on BlueGene/L. In Proc. of the 14th European PVM/MPI Users' Group Meeting (EuroPVM/MPI), Paris, France, volume 4757 of Lecture Notes in Computer Science, pages 107–116, Springer, September 2007.
    PDF       BibTeX 

  • Allen D. Malony, Sameer S. Shende, Alan Morris, Felix Wolf: Compensation of Measurement Overhead in Parallel Performance Profiling. International Journal of High Performance Computing Applications, 21(2):174–194, May 2007.
    PDF       DOI       BibTeX 

  • Brian J. N. Wylie: Scalable performance analysis of large-scale parallel applications on MareNostrum. In Science and Supercomputing in Europe, pages 453–461, CINECA Consorzio Interuniversitario, Casalecchio di Reno (Bologna), Italy, April 2007. Also available as SSCinEurope 2007 CD.
    PDF       URL       BibTeX 

  • Daniel Becker, Felix Wolf, Wolfgang Frings, Markus Geimer, Brian J. N. Wylie, Bernd Mohr: Automatic Trace-Based Performance Analysis of Metacomputing Applications. In Proc. of the International Parallel and Distributed Processing Symposium (IPDPS), Long Beach, CA, USA, IEEE, March 2007.
    PDF       DOI       BibTeX 

  • Felix Wolf, Bernd Mohr, Jack Dongarra, Shirley Moore: Automatic analysis of inefficiency patterns in parallel applications. Concurrency and Computation: Practice and Experience, 19(11):1481–1496, February 2007.
    PDF       DOI       BibTeX 

  • Markus Geimer, Felix Wolf, Brian J. N. Wylie, Bernd Mohr: Scalable Parallel Trace-Based Performance Analysis. Innovatives Supercomputing in Deutschland (inSiDE), 4(2):16–19, 2006.
    PDF       URL       BibTeX 

  • Markus Geimer, Felix Wolf, Brian J. N. Wylie, Bernd Mohr: Scalable Parallel Trace-Based Performance Analysis. In Proc. of the 13th European PVM/MPI Users' Group Meeting (EuroPVM/MPI), Bonn, Germany, volume 4192 of Lecture Notes in Computer Science, pages 303–312, Springer, September 2006.
    PDF       DOI       BibTeX 

  • Andrej Kühnal, Marc-André Hermanns, Bernd Mohr, Felix Wolf: Specification of Inefficiency Patterns for MPI-2 One-sided Communication. In Proc. of the 12th Euro-Par Conference, Dresden, Germany, volume 4128 of Lecture Notes in Computer Science, pages 47–62, Springer, August 2006.
    PDF       DOI       BibTeX 

  • Gaby Aguilera, Patricia J. Teller, Michaela Taufer, Felix Wolf: A Systematic Multi-step Methodology for Performance Analysis of Communication Traces of Distributed Applications based on Hierarchical Clustering. In Proc. of the 5th International Workshop on Performance Modeling, Evaluation, and Organization of Parallel and Distributed Systems (PMEO-PDS, in conjunction with IPDPS 2006), Rhodes Island, Greece, IEEE, April 2006.
    PDF       DOI       BibTeX 

  • Felix Wolf, Felix Freitag, Bernd Mohr, Shirley Moore, Brian J. N. Wylie: Large Event Traces in Parallel Performance Analysis. In Proc. of the 8th Workshop on Parallel Systems and Algorithms (PASA), Frankfurt, Germany, volume P-81 of Lecture Notes in Informatics, pages 264–273, Gesellschaft für Informatik, March 2006.
    PDF       BibTeX 

  • Felix Wolf, Allen D. Malony, Sameer S. Shende, Alan Morris: Trace-Based Parallel Performance Overhead Compensation. In Proc. of the International Conference on High Performance Computing and Communications (HPCC), Sorrento, Italy, volume 3726 of Lecture Notes in Computer Science, pages 617–628, Springer, September 2005.
    PDF       DOI       BibTeX 

  • Shirley Moore, Felix Wolf, Jack Dongarra, Sameer S. Shende, Allen D. Malony, Bernd Mohr: A Scalable Approach to MPI Application Performance Analysis. In Proc. of the 12th European PVM/MPI Users' Group Meeting (EuroPVM/MPI), Sorrento, Italy, volume 3666 of Lecture Notes in Computer Science, pages 309–316, Springer, September 2005.
    PDF       DOI       BibTeX 

  • Brian J. N. Wylie, Bernd Mohr, Felix Wolf: Holistic Hardware Counter Performance Analysis of Parallel Programs. In Proc. of the Conference on Parallel Computing (ParCo), Malaga, Spain, pages 187–194, September 2005.
    PDF       BibTeX 

  • Bernd Mohr, Andrej Kühnal, Marc-André Hermanns, Felix Wolf: Performance Analysis of One-sided Communication Mechanisms. In Proc. of the Conference on Parallel Computing (ParCo), Malaga, Spain, September 2005, Minisymposium Performance Analysis.
    PDF       BibTeX 

  • Marc-André Hermanns, Bernd Mohr, Felix Wolf: Event-based Measurement and Analysis of One-sided Communication. In Proc. of the 11th Euro-Par Conference, Lisboa, Portugal, volume 3648 of Lecture Notes in Computer Science, pages 156–165, Springer, August 2005.
    PDF       DOI       BibTeX 

  • Bernd Mohr, Luiz A. DeRose, Jeffrey S. Vetter: A Performance Measurement Infrastructure for Co-Array Fortran. In Proc. of the 11th Euro-Par Conference, Lisboa, Portugal, volume 3648 of Lecture Notes in Computer Science, pages 156–165, Springer, August 2005.
    PDF       DOI       BibTeX 

  • Nikhil Bhatia, Fengguang Song, Felix Wolf, Bernd Mohr, Jack Dongarra, Shirley Moore: Automatic Experimental Analysis of Communication Patterns in Virtual Topologies. In Proc. of the International Conference on Parallel Processing (ICPP), Oslo, Norway, pages 465–472, IEEE Society, June 2005.
    PDF       DOI       BibTeX 

  • P. Worley, J. Candy, L. Carrington, K. Huck, T. Kaiser, G. Mahinthakumar, Allen D. Malony, Shirley Moore, D. Reed, P. Roth, H. Shan, Sameer S. Shende, A. Snavely, S. Sreepathi, Felix Wolf, Y. Zhang: Performance Analysis of GYRO: A Tool Evaluation. In Proc. of the 2005 SciDAC Conference, San Francisco, CA, USA, June 2005.
    PDF       BibTeX 

  • Nikhil Bhatia, Shirley Moore, Felix Wolf, Jack Dongarra, Bernd Mohr: A Pattern-Based Approach to Automated Application Performance Analysis. In Workshop on Patterns in High Performance Computing (patHPC 2005), Urbana-Champaign, IL, USA, May 2005.
    PDF       BibTeX 

  • Shirley Moore, Felix Wolf, Jack Dongarra, Bernd Mohr: Improving Time to Solution with Automated Performance Analysis. In 2nd Workshop on Productivity and Performance in High-End Computing (P-PHEC), San Francisco, CA, USA, February 2005.
    PDF       BibTeX 

  • Fengguang Song, Felix Wolf: CUBE User Manual. Technical Report ICL-UT-04-01, University of Tennessee, Innovative Computing Laboratory, 2004.
    PDF       BibTeX 

  • Felix Wolf: EARL - API Documentation. Technical Report ICL-UT-04-03, University of Tennessee, Innovative Computing Laboratory, October 2004.
    PDF       BibTeX 

  • Felix Wolf, Bernd Mohr, Jack Dongarra, Shirley Moore: Efficient Pattern Search in Large Traces through Successive Refinement. In Proc. of the 10th Euro-Par Conference, Pisa, Italy, volume 3149 of Lecture Notes in Computer Science, pages 47–54, Springer, August 2004.
    PDF       DOI       BibTeX 

  • Fengguang Song, Felix Wolf, Nikhil Bhatia, Jack Dongarra, Shirley Moore: An Algebra for Cross-Experiment Performance Analysis. In Proc. of the International Conference on Parallel Processing (ICPP), Montreal, Canada, pages 63–72, IEEE Society, August 2004.
    PDF       DOI       BibTeX 

  • Philip Mucci, Jack Dongarra, Rick Kufrin, Shirley Moore, Fengguang Song, Felix Wolf: Automating the Large-Scale Collection and Analysis of Performance Data on Linux Clusters. In 5th LCI International Conference on Linux Clusters: The HPC Revolution, Austin, TX, USA, May 2004.
    PDF       URL       BibTeX 

  • Felix Wolf, Bernd Mohr: Automatic performance analysis of hybrid MPI/OpenMP applications. Journal of Systems Architecture, 49(10-11):421–439, November 2003.
    PDF       DOI       BibTeX 

  • Felix Wolf, Bernd Mohr: Hardware-Counter Based Automatic Performance Analysis of Parallel Programs. In Proc. of the Conference on Parallel Computing (ParCo), Dresden, Germany, volume 13 of Advances in Parallel Computing, pages 753–760, Elsevier, September 2003, Minisymposium Performance Analysis.
    PDF       DOI       BibTeX 

  • Felix Wolf, Bernd Mohr: KOJAK - A Tool Set for Automatic Performance Analysis of Parallel Applications. In Proc. of the 9th Euro-Par Conference, Klagenfurt, Austria, volume 2790 of Lecture Notes in Computer Science, pages 1301–1304, Springer, August 2003, Demonstrations of Parallel and Distributed Computing.
    PDF       DOI       BibTeX 

  • Felix Wolf: Automatic Performance Analysis on Parallel Computers with SMP Nodes. PhD thesis, RWTH Aachen, Forschungszentrum Jülich, February 2003, NIC Series Volume 17, ISBN 3-00-010003-2.
    URL       BibTeX 

  • Felix Wolf, Bernd Mohr: Automatic Performance Analysis of Hybrid MPI/OpenMP Applications. In Proc. of the 11th Euromicro Workshop on Parallel, Distributed and Network-Based Processing (PDP), Genoa, Italy, pages 13–22, IEEE, February 2003.
    PDF       DOI       BibTeX 

  • Bernd Mohr, Allen D. Malony, H. C. Hoppe, F. Schlimbach, G. Haab, J. Hoeflinger, S. Shah: A Performance Monitoring Interface for OpenMP. In Proc. of the 4th European Workshop on OpenMP (EWOMP), Rome, Italy, September 2002.
    PDF       BibTeX 

  • Luiz A. DeRose, Felix Wolf: CATCH – A Call-Graph Based Automatic Tool for Capture of Hardware Performance Metrics for MPI and OpenMP Applications. In Proc. of the 8th Euro-Par Conference, Paderborn, Germany, volume 2400 of Lecture Notes in Computer Science, pages 167–176, Springer, August 2002.
    PDF       DOI       BibTeX 

  • Bernd Mohr, Allen D. Malony, Sameer S. Shende, Felix Wolf: Design and Prototype of a Performance Tool Interface for OpenMP. The Journal of Supercomputing, 23(1):105–128, August 2002.
    PDF       DOI       BibTeX 

  • Bernd Mohr, Allen D. Malony, Sameer S. Shende, Felix Wolf: Design and Prototype of a Performance Tool Interface for OpenMP. In 2nd Annual Los Alamos Computer Science Institute Symposium (LACSI), Santa Fe, NM, USA, October 2001.
    PDF       BibTeX 

  • Felix Wolf, Bernd Mohr: Specifying Performance Properties of Parallel Applications Using Compound Events. Parallel and Distributed Computing Practices, 4(3):301–317, September 2001.
    PDF       URL       BibTeX 

  • Bernd Mohr, Allen D. Malony, Sameer S. Shende, Felix Wolf: Towards a Performance Tool Interface for OpenMP: An Approach based on Directive Rewriting. In 3rd European Workshop on OpenMP (EWOMP), Barcelona, Spain, September 2001.
    PDF       BibTeX 

  • Thomas Fahringer, Michael Gerndt, Bernd Mohr, G. Riley, J. L. Träff, Felix Wolf: Knowledge Specification for Automatic Performance Analysis. Technical Report FZJ-ZAM-IB-2001-08, ESPRIT IV Working Group APART, Forschungszentrum Jülich, August 2001, Revised version.
    PDF       BibTeX 

  • K. A. Lindlan, J. Cuny, Allen D. Malony, Bernd Mohr, R. Rivenburgh, C. Rasmussen: A Tool Framework for Static and Dynamic Analysis of Object-Oriented Software with Templates. In Proc. of the Supercomputing Conference (SC2000), Dallas, TX, USA, November 2000.
    PDF       BibTeX 

  • Felix Wolf, Bernd Mohr: Automatic Performance Analysis of MPI Applications Based on Event Traces. In Proc. of the 6th Euro-Par Conference, Munich, Germany, volume 1900 of Lecture Notes in Computer Science, pages 123–132, Springer, August 2000.
    PDF       DOI       BibTeX 

  • Michael Gerndt, Hans-Georg Eßer: Specification Techniques for Automatic Performance Analysis Tools. In Proc. of the 8th International Workshop on Compilers for Parallel Computers (CPC), Aussois, France. Ecole Normale Supérieure Lyon, January 2000.
    PDF       BibTeX 

  • Felix Wolf, Bernd Mohr: EARL - A Programmable and Extensible Toolkit for Analyzing Event Traces of Message Passing Programs. In Proc. of the 7th International Conference on High Performance Computing and Networking Europe (HPCN), Amsterdam, The Netherlands, volume 1593 of Lecture Notes in Computer Science, pages 503–512, Springer, April 1999.
    PDF       DOI       BibTeX 

  • Michael Gerndt, Bernd Mohr, Felix Wolf, Mario Pantano: Performance Analysis on Cray T3E. In Proc. of the 7th Euromicro Workshop on Parallel and Distributed Processing (PDP), Funchal, Madeira, Portugal, pages 241–248, IEEE, February 1999.
    PDF       URL       BibTeX 

  • Michael Gerndt, Bernd Mohr, Mario Pantano, Felix Wolf: Automatic Performance Analysis for Cray T3E. In Proc. of the 7th Workshop on Compilers for Parallel Computers (CPC), University of Linköping, Sweden, pages 69–78, June 1998.
    BibTeX