Intensity Frontier

Common Offline Documentation:

art Workbook and Users Guide

Alpha Release 0.90

June 2, 2016

This version of the documentation is written for version August2015 of the art-workbook code.

Scientific Computing Division

Future Programs and Experiments Department

Scientific Software Infrastructure Group

Principal Author: Rob Kutschke
Other contributors: Marc Paterno, Mike Wang
Editor: Anne Heavey

art Developers: L. Garren, C. Green, K. Knoepfel,
J. Kowalkowski, M. Paterno and P. Russo


List of Chapters

Detailed Table of Contents iv


List of Figures xx


List of Tables xxii


List of Code and Output Listings xxii

I  Introduction 1


1 How to Read this Documentation 2

 1.1 If you are new to HEP Software...
 1.2 If you are an HEP Software expert...
 1.3 If you are somewhere in between...

2 Conventions Used in this Documentation 4

 2.1 Terms in Glossary
 2.2 Typing Commands
 2.3 Listing Styles
 2.4 Procedures to Follow
 2.5 Important Items to Call Out
 2.6 Site-specific Information

3 Introduction to the art Event Processing Framework 7

 3.1 What is art and Who Uses it?
 3.2 Why art?
 3.3 C++ and C++11
 3.4 Getting Help
 3.5 Overview of the Documentation Suite
  3.5.1 The Introduction
  3.5.2 The Workbook
  3.5.3 Users Guide
  3.5.4 Reference Manual
  3.5.5 Technical Reference
  3.5.6 Glossary
 3.6 Some Background Material
  3.6.1 Events and Event IDs
  3.6.2 art Modules and the Event Loop
  3.6.3 Module Types
  3.6.4 art Data Products
  3.6.5 art Services
  3.6.6 Dynamic Libraries and art
  3.6.7 Build Systems and art
  3.6.8 External Products
  3.6.9 The Event-Data Model and Persistency
  3.6.10 Event-Data Files
  3.6.11 Files on Tape
 3.7 The Toy Experiment
  3.7.1 Toy Detector Description
  3.7.2 Workflow for Running the Toy Experiment Code
 3.8 Rules, Best Practices, Conventions and Style

4 Unix Prerequisites 34

 4.1 Introduction
 4.2 Commands
 4.3 Shells
 4.4 Scripts: Part 1
 4.5 Unix Environments
  4.5.1 Building up the Environment
  4.5.2 Examining and Using Environment Variables
 4.6 Paths and $PATH
 4.7 Scripts: Part 2
 4.8 bash Functions and Aliases
 4.9 Login Scripts
 4.10 Suggested Unix and bash References

5 Site-Specific Setup Procedure 45


6 Get your C++ up to Speed 48

 6.1 Introduction
 6.2 File Types Used and Generated in C++ Programming
 6.3 Establishing the Environment
  6.3.1 Initial Setup
  6.3.2 Subsequent Logins
 6.4 C++ Exercise 1: Basic C++ Syntax and Building an Executable
  6.4.1 Concepts to Understand
  6.4.2 How to Compile, Link and Run
  6.4.3 Discussion
   Primitive types, Initialization and Printing Output
   Arrays
   Equality testing
   Conditionals
   Some C++ Standard Library Types
   Pointers
   References
   Loops
 6.5 C++ Exercise 2: About Compiling and Linking
  6.5.1 What You Will Learn
  6.5.2 The Source Code for this Exercise
  6.5.3 Compile, Link and Run the Exercise
  6.5.4 Alternate Script build2
  6.5.5 Suggested Homework
 6.6 C++ Exercise 3: Libraries
  6.6.1 What You Will Learn
  6.6.2 Building and Running the Exercise
 6.7 Classes
  6.7.1 Introduction
  6.7.2 C++ Exercise 4 v1: The Most Basic Version
  6.7.3 C++ Exercise 4 v2: The Default Constructor
  6.7.4 C++ Exercise 4 v3: Constructors with Arguments
  6.7.5 C++ Exercise 4 v4: Colon Initializer Syntax
  6.7.6 C++ Exercise 4 v5: Member functions
  6.7.7 C++ Exercise 4 v6: Private Data and Accessor Methods
   Setters and Getters
   What’s the deal with the underscore?
   An example to motivate private data
  6.7.8 C++ Exercise 4 v7: The inline Specifier
  6.7.9 C++ Exercise 4 v8: Defining Member Functions within the Class Declaration
  6.7.10 C++ Exercise 4 v9: The Stream Insertion Operator and Free Functions
  6.7.11 Review
 6.8 Overloading functions
 6.9 C++ References

7 Using External Products in UPS 107

 7.1 The UPS Database List: PRODUCTS
 7.2 UPS Handling of Variants of a Product
 7.3 The setup Command: Syntax and Function
 7.4 Current Versions of Products
 7.5 Environment Variables Defined by UPS
 7.6 Finding Header Files
  7.6.1 Introduction
  7.6.2 Finding art Header Files
  7.6.3 Finding Headers from Other UPS Products
  7.6.4 Exceptions: The Workbook, ROOT and Geant4

II  Workbook 119


8 Preparation for Running the Workbook Exercises 120

 8.1 Introduction
 8.2 Getting Computer Accounts on Workbook-enabled Machines
 8.3 Choosing a Machine and Logging In
 8.4 Launching new Windows: Verify X Connectivity
 8.5 Choose an Editor

9 Exercise 1: Running Pre-built art Modules 124

 9.1 Introduction
 9.2 Prerequisites
 9.3 What You Will Learn
 9.4 The art Run-time Environment
 9.5 The Input and Configuration Files for the Workbook Exercises
 9.6 Setting up to Run Exercise 1
  9.6.1 Log In and Set Up
   Initial Setup Procedure using Standard Directory
   Initial Setup Procedure allowing Self-managed Working Directory
   Setup for Subsequent Exercise 1 Login Sessions
 9.7 Execute art and Examine Output
 9.8 Understanding the Configuration
  9.8.1 Some Bookkeeping Syntax
  9.8.2 Some Physics Processing Syntax
  9.8.3 art Command line Options
  9.8.4 Maximum Number of Events to Process
  9.8.5 Changing the Input Files
  9.8.6 Skipping Events
  9.8.7 Identifying the User Code to Execute
  9.8.8 Paths and the art Workflow
   Paths and the art Workflow: Details
   Order of Module Execution
  9.8.9 Writing an Output File
 9.9 Understanding the Process for Exercise 1
  9.9.1 Follow the Site-Specific Setup Procedure (Details)
  9.9.2 Make a Working Directory (Details)
  9.9.3 Setup the toyExperiment UPS Product (Details)
  9.9.4 Copy Files to your Current Working Directory (Details)
  9.9.5 Source makeLinks.sh (Details)
  9.9.6 Run art (Details)
 9.10 How does art find Modules?
 9.11 How does art find FHiCL Files?
  9.11.1 The -c command line argument
  9.11.2 #include Files
 9.12 Review
 9.13 Test your Understanding
  9.13.1 Tests
  9.13.2 Answers

10 Exercise 2: Building and Running Your First Module 162

 10.1 Introduction
 10.2 Prerequisites
 10.3 What You Will Learn
 10.4 Initial Setup to Run Exercises
  10.4.1 “Source Window” Setup
  10.4.2 Examine Source Window Setup
   About git and What it Did
   Contents of the Source Directory
  10.4.3 “Build Window” Setup
   Standard Procedure
   Using Self-managed Working Directory
  10.4.4 Examine Build Window Setup
 10.5 The art Development Environment
 10.6 Running the Exercise
  10.6.1 Run art on first.fcl
  10.6.2 The FHiCL File first.fcl
  10.6.3 The Source Code File First_module.cc
   The #include Statements
   The Declaration of the Class First, an Analyzer Module
   An Introduction to Analyzer Modules
   The Constructor for the Class First
   Aside: Omitting Argument Names in Function Declarations
   The Member Function analyze and the Representation of an Event
   Representing an Event Identifier with art::EventID
   DEFINE_ART_MACRO: The Module Maker Macros
   Some Alternate Styles
 10.7 What does the Build System Do?
  10.7.1 The Basic Operation
  10.7.2 Incremental Builds and Complete Rebuilds
  10.7.3 Finding Header Files at Compile Time
  10.7.4 Finding Dynamic Library Files at Link Time
  10.7.5 Build System Details
 10.8 Suggested Activities
  10.8.1 Create Your Second Module
  10.8.2 Use artmod to Create Your Third Module
  10.8.3 Running Many Modules at Once
  10.8.4 Access Parts of the EventID
 10.9 Final Remarks
  10.9.1 Why is there no First_module.h File?
  10.9.2 The Three-File Module Style
 10.10 Flow of Execution from Source to FHiCL File
 10.11 Review
 10.12 Test Your Understanding
  10.12.1 Tests
  10.12.2 Answers
   FirstBug01
   FirstBug02

11 General Setup for Login Sessions 218

 11.1 Source Window
 11.2 Build Window

12 Keeping Up to Date with Workbook Code and Documentation 220

 12.1 Introduction
 12.2 Special Instructions for Summer 2014
 12.3 How to Update
  12.3.1 Get Updated Documentation
  12.3.2 Get Updated Code and Build It
  12.3.3 See which Files you have Modified or Added

13 Exercise 3: Some other Member Functions of Modules 226

 13.1 Introduction
 13.2 Prerequisites
 13.3 What You Will Learn
 13.4 Setting up to Run this Exercise
 13.5 The Source File Optional_module.cc
  13.5.1 About the begin* Member Functions
  13.5.2 About the art::*ID Classes
  13.5.3 Use of the override Identifier
  13.5.4 Use of const References
  13.5.5 The analyze Member Function
 13.6 Running this Exercise
 13.7 The Member Function beginJob versus the Constructor
 13.8 Suggested Activities
  13.8.1 Add the Matching end Member functions
  13.8.2 Run on Multiple Input Files
  13.8.3 The Option --trace
 13.9 Review
 13.10 Test Your Understanding
  13.10.1 Tests
  13.10.2 Answers

14 Exercise 4: A First Look at Parameter Sets 238

 14.1 Introduction
 14.2 Prerequisites
 14.3 What You Will Learn
 14.4 Setting up to Run this Exercise
 14.5 The Configuration File pset01.fcl
 14.6 The Source code file PSet01_module.cc
 14.7 Running the Exercise
 14.8 Member Function Templates and their Arguments
  14.8.1 Types Known to ParameterSet::get<T>
  14.8.2 User-Defined Types
 14.9 Exceptions (as in “Errors”)
  14.9.1 Error Conditions
  14.9.2 Error Handling
  14.9.3 Suggested Exercises
 14.10 Parameters and Data Members
 14.11 Optional Parameters with Default Values
  14.11.1 Policies About Optional Parameters
 14.12 Numerical Types: Precision and Canonical Forms
  14.12.1 Why Have Canonical Forms?
  14.12.2 Suggested Exercises
   Formats
   Fractional versus Integral Types
 14.13 Dealing with Invalid Parameter Values
 14.14 Review
 14.15 Test Your Understanding
  14.15.1 Tests
  14.15.2 Answers

15 Exercise 5: Making Multiple Instances of a Module 264

 15.1 Introduction
 15.2 Prerequisites
 15.3 What You Will Learn
 15.4 Setting up to Run this Exercise
 15.5 The Source File Magic_module.cc
 15.6 The FHiCL File magic.fcl
 15.7 Running the Exercise
 15.8 Discussion
  15.8.1 Order of Analyzer Modules is not Important
  15.8.2 Two Meanings of Module Label
 15.9 Review
 15.10 Test Your Understanding
  15.10.1 Tests
  15.10.2 Answers

16 Exercise 6: Accessing Data Products 271

 16.1 Introduction
 16.2 Prerequisites
 16.3 What You Will Learn
 16.4 Background Information for this Exercise
  16.4.1 The Data Type GenParticleCollection
  16.4.2 Data Product Names
  16.4.3 Specifying a Data Product
  16.4.4 The Data Product used in this Exercise
 16.5 Setting up to Run this Exercise
 16.6 Running the Exercise
 16.7 Understanding the First Version, ReadGens1
  16.7.1 The Source File ReadGens1_module.cc
  16.7.2 Adding a Link Library to CMakeLists.txt
  16.7.3 The FHiCL File readGens1.fcl
 16.8 The Second Version, ReadGens2
 16.9 The Third Version, ReadGens3
 16.10 Suggested Activities
 16.11 Review
 16.12 Test Your Understanding
  16.12.1 Tests
  16.12.2 Answers

17 Exercise 7: Making a Histogram 291

 17.1 Introduction
 17.2 Prerequisites
 17.3 What You Will Learn
 17.4 Setting up to Run this Exercise
 17.5 The Source File FirstHist1_module.cc
  17.5.1 Introducing art::ServiceHandle
  17.5.2 Creating a Histogram
  17.5.3 Filling a Histogram
  17.5.4 A Few Last Comments
 17.6 The Configuration File firstHist1.fcl
 17.7 The file CMakeLists.txt
 17.8 Running the Exercise
 17.9 Inspecting the Histogram File
  17.9.1 A Short Cut: the browse command
  17.9.2 Using CINT Scripts
 17.10 Finding ROOT Documentation
  17.10.1 Overwriting Histogram Files
  17.10.2 Changing the Name of the Histogram File
  17.10.3 Changing the Module Label
  17.10.4 Printing From the TBrowser
 17.11 Review
 17.12 Test Your Understanding
  17.12.1 Tests
  17.12.2 Answers

18 Exercise 8: Looping Over Collections 318

 18.1 Introduction
 18.2 Prerequisites
 18.3 What You Will Learn
 18.4 Setting Up to Run Exercise
 18.5 The Class GenParticle
  18.5.1 The Included Header Files
  18.5.2 Particle Parent-Child Relationships
  18.5.3 The Public Interface for the Class GenParticle
  18.5.4 Conditionally Excluded Sections of Header File
 18.6 The Module LoopGens1
 18.7 CMakeLists.txt
 18.8 Running the Exercise
 18.9 Variations on the Exercise
  18.9.1 LoopGens2_module.cc
  18.9.2 LoopGens3_module.cc
  18.9.3 LoopGens3a_module.cc
 18.10 Review
 18.11 Test Your Understanding
  18.11.1 Test 1
  18.11.2 Test 2
  18.11.3 Test 3
  18.11.4 Answers
   Test 1
   Test 2
   Test 3

19 3D Event Displays 342

 19.1 Introduction
 19.2 Prerequisites
 19.3 What You Will Learn
 19.4 Setting up to Run this Exercise
 19.5 Running the Exercise
  19.5.1 Startup and General Layout
  19.5.2 The Control Panel
   The List-Tree Widget and Context-Sensitive Menus
   The Event-Navigation Pane
  19.5.3 Main EVE Display Area
 19.6 Understanding How the 3D Event Display Module Works
  19.6.1 Overview of the Source Code File EventDisplay3D_module.cc
  19.6.2 Class Declaration and Constructor
  19.6.3 Creating the GUI and Drawing the Static Detector Components in the beginJob() Member Function
   The Default GUI
   Adding the Global Elements
   Customizing the GUI
   Adding the Navigation Pane
  19.6.4 Drawing the Generated Hits and Tracks in the analyze() Member Function

20 Troubleshooting 379

 20.1 Updating Workbook Code
 20.2 XWindows (xterm and Other XWindows Products)
  20.2.1 Mac OSX 10.9
 20.3 Trouble Building
 20.4 art Won’t Run

III  User’s Guide 381


21 git 382

 21.1 Aside: More Details about git
  21.1.1 Central Repository, Local Repository and Working Directory
   Files that you have Added
   Files that you have Modified
   Files with Resolvable Conflicts
   Files with Unresolvable Conflicts
  21.1.2 git Branches
  21.1.3 Seeing which Files you have Modified or Added

22 art Run-time and Development Environments 391

 22.1 The art Run-time Environment
 22.2 The art Development Environment

23 art Framework Parameters 399

 23.1 Parameter Types
 23.2 Structure of art Configuration Files
 23.3 Services
  23.3.1 System Services
  23.3.2 FloatingPointControl
  23.3.3 Message Parameters
  23.3.4 Optional Services
  23.3.5 Sources
  23.3.6 Modules

24 Job Configuration in art: FHiCL 405

 24.1 Basics of FHiCL Syntax
  24.1.1 Specifying Names and Values
  24.1.2 FHiCL-reserved Characters and Identifiers
 24.2 FHiCL Identifiers Reserved to art
 24.3 Structure of a FHiCL Run-time Configuration File for art
 24.4 Order of Elements in a FHiCL Run-time Configuration File for art
 24.5 The physics Portion of the FHiCL Configuration
 24.6 Choosing and Using Module Labels and Path Names
 24.7 Scheduling Strategy in art
 24.8 Scheduled Reconstruction using Trigger Paths
 24.9 Reconstruction On-Demand
 24.10 Bits and Pieces

IV  Appendices 424


A Obtaining Credentials to Access Fermilab Computing Resources 425

 A.1 Kerberos Authentication
 A.2 Fermilab Services Account

B Installing Locally 427

 B.1 Install the Binary Distributions: A Cheat Sheet
 B.2 Preparing the Site Specific Setup Script
 B.3 Links to the Full Instructions

C art Completion Codes 432


D Viewing and Printing Figure Files 435

 D.1 Viewing Figure Files Interactively
 D.2 Printing Figure Files


 E.1 Introduction
 E.2 Multiple Meanings of Vector in CLHEP
 E.3 CLHEP Documentation
 E.4 CLHEP Header Files
  E.4.1 Naming Conventions and Syntax
  E.4.2 .icc Files
 E.5 The CLHEP Namespace
  E.5.1 using Declarations and Directives
 E.6 The Vector Package
  E.6.1 CLHEP::Hep3Vector
   E.6.1.1 Some Fragile Member Functions
  E.6.2 CLHEP::HepLorentzVector
   E.6.2.1 HepBoost
 E.7 The Matrix Package
 E.8 The Random Package

F Include Guards 453

V  Index 455


Index 456


Detailed Table of Contents

List of Figures
List of Tables
List of Code and Output Listings
I  Introduction
1 How to Read this Documentation
 1.1 If you are new to HEP Software...
 1.2 If you are an HEP Software expert...
 1.3 If you are somewhere in between...
2 Conventions Used in this Documentation
 2.1 Terms in Glossary
 2.2 Typing Commands
 2.3 Listing Styles
 2.4 Procedures to Follow
 2.5 Important Items to Call Out
 2.6 Site-specific Information
3 Introduction to the art Event Processing Framework
 3.1 What is art and Who Uses it?
 3.2 Why art?
 3.3 C++ and C++11
 3.4 Getting Help
 3.5 Overview of the Documentation Suite
  3.5.1 The Introduction
  3.5.2 The Workbook
  3.5.3 Users Guide
  3.5.4 Reference Manual
  3.5.5 Technical Reference
  3.5.6 Glossary
 3.6 Some Background Material
  3.6.1 Events and Event IDs
  3.6.2 art Modules and the Event Loop
  3.6.3 Module Types
  3.6.4 art Data Products
  3.6.5 art Services
  3.6.6 Dynamic Libraries and art
  3.6.7 Build Systems and art
  3.6.8 External Products
  3.6.9 The Event-Data Model and Persistency
  3.6.10 Event-Data Files
  3.6.11 Files on Tape
 3.7 The Toy Experiment
  3.7.1 Toy Detector Description
  3.7.2 Workflow for Running the Toy Experiment Code
 3.8 Rules, Best Practices, Conventions and Style
4 Unix Prerequisites
 4.1 Introduction
 4.2 Commands
 4.3 Shells
 4.4 Scripts: Part 1
 4.5 Unix Environments
  4.5.1 Building up the Environment
  4.5.2 Examining and Using Environment Variables
 4.6 Paths and $PATH
 4.7 Scripts: Part 2
 4.8 bash Functions and Aliases
 4.9 Login Scripts
 4.10 Suggested Unix and bash References
5 Site-Specific Setup Procedure
6 Get your C++ up to Speed
 6.1 Introduction
 6.2 File Types Used and Generated in C++ Programming
 6.3 Establishing the Environment
  6.3.1 Initial Setup
  6.3.2 Subsequent Logins
 6.4 C++ Exercise 1: Basic C++ Syntax and Building an Executable
  6.4.1 Concepts to Understand
  6.4.2 How to Compile, Link and Run
  6.4.3 Discussion
 6.5 C++ Exercise 2: About Compiling and Linking
  6.5.1 What You Will Learn
  6.5.2 The Source Code for this Exercise
  6.5.3 Compile, Link and Run the Exercise
  6.5.4 Alternate Script build2
  6.5.5 Suggested Homework
 6.6 C++ Exercise 3: Libraries
  6.6.1 What You Will Learn
  6.6.2 Building and Running the Exercise
 6.7 Classes
  6.7.1 Introduction
  6.7.2 C++ Exercise 4 v1: The Most Basic Version
  6.7.3 C++ Exercise 4 v2: The Default Constructor
  6.7.4 C++ Exercise 4 v3: Constructors with Arguments
  6.7.5 C++ Exercise 4 v4: Colon Initializer Syntax
  6.7.6 C++ Exercise 4 v5: Member functions
  6.7.7 C++ Exercise 4 v6: Private Data and Accessor Methods
  6.7.8 C++ Exercise 4 v7: The inline Specifier
  6.7.9 C++ Exercise 4 v8: Defining Member Functions within the Class Declaration
  6.7.10 C++ Exercise 4 v9: The Stream Insertion Operator and Free Functions
  6.7.11 Review
 6.8 Overloading functions
 6.9 C++ References
7 Using External Products in UPS
 7.1 The UPS Database List: PRODUCTS
 7.2 UPS Handling of Variants of a Product
 7.3 The setup Command: Syntax and Function
 7.4 Current Versions of Products
 7.5 Environment Variables Defined by UPS
 7.6 Finding Header Files
  7.6.1 Introduction
  7.6.2 Finding art Header Files
  7.6.3 Finding Headers from Other UPS Products
  7.6.4 Exceptions: The Workbook, ROOT and Geant4
II  Workbook
8 Preparation for Running the Workbook Exercises
 8.1 Introduction
 8.2 Getting Computer Accounts on Workbook-enabled Machines
 8.3 Choosing a Machine and Logging In
 8.4 Launching new Windows: Verify X Connectivity
 8.5 Choose an Editor
9 Exercise 1: Running Pre-built art Modules
 9.1 Introduction
 9.2 Prerequisites
 9.3 What You Will Learn
 9.4 The art Run-time Environment
 9.5 The Input and Configuration Files for the Workbook Exercises
 9.6 Setting up to Run Exercise 1
  9.6.1 Log In and Set Up
 9.7 Execute art and Examine Output
 9.8 Understanding the Configuration
  9.8.1 Some Bookkeeping Syntax
  9.8.2 Some Physics Processing Syntax
  9.8.3 art Command line Options
  9.8.4 Maximum Number of Events to Process
  9.8.5 Changing the Input Files
  9.8.6 Skipping Events
  9.8.7 Identifying the User Code to Execute
  9.8.8 Paths and the art Workflow
  9.8.9 Writing an Output File
 9.9 Understanding the Process for Exercise 1
  9.9.1 Follow the Site-Specific Setup Procedure (Details)
  9.9.2 Make a Working Directory (Details)
  9.9.3 Setup the toyExperiment UPS Product (Details)
  9.9.4 Copy Files to your Current Working Directory (Details)
  9.9.5 Source makeLinks.sh (Details)
  9.9.6 Run art (Details)
 9.10 How does art find Modules?
 9.11 How does art find FHiCL Files?
  9.11.1 The -c command line argument
  9.11.2 #include Files
 9.12 Review
 9.13 Test your Understanding
  9.13.1 Tests
  9.13.2 Answers
10 Exercise 2: Building and Running Your First Module
 10.1 Introduction
 10.2 Prerequisites
 10.3 What You Will Learn
 10.4 Initial Setup to Run Exercises
  10.4.1 “Source Window” Setup
  10.4.2 Examine Source Window Setup
  10.4.3 “Build Window” Setup
  10.4.4 Examine Build Window Setup
 10.5 The art Development Environment
 10.6 Running the Exercise
  10.6.1 Run art on first.fcl
  10.6.2 The FHiCL File first.fcl
  10.6.3 The Source Code File First_module.cc
 10.7 What does the Build System Do?
  10.7.1 The Basic Operation
  10.7.2 Incremental Builds and Complete Rebuilds
  10.7.3 Finding Header Files at Compile Time
  10.7.4 Finding Dynamic Library Files at Link Time
  10.7.5 Build System Details
 10.8 Suggested Activities
  10.8.1 Create Your Second Module
  10.8.2 Use artmod to Create Your Third Module
  10.8.3 Running Many Modules at Once
  10.8.4 Access Parts of the EventID
 10.9 Final Remarks
  10.9.1 Why is there no First_module.h File?
  10.9.2 The Three-File Module Style
 10.10 Flow of Execution from Source to FHiCL File
 10.11 Review
 10.12 Test Your Understanding
  10.12.1 Tests
  10.12.2 Answers
11 General Setup for Login Sessions
 11.1 Source Window
 11.2 Build Window
12 Keeping Up to Date with Workbook Code and Documentation
 12.1 Introduction
 12.2 Special Instructions for Summer 2014
 12.3 How to Update
  12.3.1 Get Updated Documentation
  12.3.2 Get Updated Code and Build It
  12.3.3 See which Files you have Modified or Added
13 Exercise 3: Some other Member Functions of Modules
 13.1 Introduction
 13.2 Prerequisites
 13.3 What You Will Learn
 13.4 Setting up to Run this Exercise
 13.5 The Source File Optional_module.cc
  13.5.1 About the begin* Member Functions
  13.5.2 About the art::*ID Classes
  13.5.3 Use of the override Identifier
  13.5.4 Use of const References
  13.5.5 The analyze Member Function
 13.6 Running this Exercise
 13.7 The Member Function beginJob versus the Constructor
 13.8 Suggested Activities
  13.8.1 Add the Matching end Member functions
  13.8.2 Run on Multiple Input Files
  13.8.3 The Option --trace
 13.9 Review
 13.10 Test Your Understanding
  13.10.1 Tests
  13.10.2 Answers
14 Exercise 4: A First Look at Parameter Sets
 14.1 Introduction
 14.2 Prerequisites
 14.3 What You Will Learn
 14.4 Setting up to Run this Exercise
 14.5 The Configuration File pset01.fcl
 14.6 The Source code file PSet01_module.cc
 14.7 Running the Exercise
 14.8 Member Function Templates and their Arguments
  14.8.1 Types Known to ParameterSet::get<T>
  14.8.2 User-Defined Types
 14.9 Exceptions (as in “Errors”)
  14.9.1 Error Conditions
  14.9.2 Error Handling
  14.9.3 Suggested Exercises
 14.10 Parameters and Data Members
 14.11 Optional Parameters with Default Values
  14.11.1 Policies About Optional Parameters
 14.12 Numerical Types: Precision and Canonical Forms
  14.12.1 Why Have Canonical Forms?
  14.12.2 Suggested Exercises
 14.13 Dealing with Invalid Parameter Values
 14.14 Review
 14.15 Test Your Understanding
  14.15.1 Tests
  14.15.2 Answers
15 Exercise 5: Making Multiple Instances of a Module
 15.1 Introduction
 15.2 Prerequisites
 15.3 What You Will Learn
 15.4 Setting up to Run this Exercise
 15.5 The Source File Magic_module.cc
 15.6 The FHiCL File magic.fcl
 15.7 Running the Exercise
 15.8 Discussion
  15.8.1 Order of Analyzer Modules is not Important
  15.8.2 Two Meanings of Module Label
 15.9 Review
 15.10 Test Your Understanding
  15.10.1 Tests
  15.10.2 Answers
16 Exercise 6: Accessing Data Products
 16.1 Introduction
 16.2 Prerequisites
 16.3 What You Will Learn
 16.4 Background Information for this Exercise
  16.4.1 The Data Type GenParticleCollection
  16.4.2 Data Product Names
  16.4.3 Specifying a Data Product
  16.4.4 The Data Product used in this Exercise
 16.5 Setting up to Run this Exercise
 16.6 Running the Exercise
 16.7 Understanding the First Version, ReadGens1
  16.7.1 The Source File ReadGens1_module.cc
  16.7.2 Adding a Link Library to CMakeLists.txt
  16.7.3 The FHiCL File readGens1.fcl
 16.8 The Second Version, ReadGens2
 16.9 The Third Version, ReadGens3
 16.10 Suggested Activities
 16.11 Review
 16.12 Test Your Understanding
  16.12.1 Tests
  16.12.2 Answers
17 Exercise 7: Making a Histogram
 17.1 Introduction
 17.2 Prerequisites
 17.3 What You Will Learn
 17.4 Setting up to Run this Exercise
 17.5 The Source File FirstHist1_module.cc
  17.5.1 Introducing art::ServiceHandle
  17.5.2 Creating a Histogram
  17.5.3 Filling a Histogram
  17.5.4 A Few Last Comments
 17.6 The Configuration File firstHist1.fcl
 17.7 The file CMakeLists.txt
 17.8 Running the Exercise
 17.9 Inspecting the Histogram File
  17.9.1 A Short Cut: the browse command
  17.9.2 Using CINT Scripts
 17.10 Finding ROOT Documentation
  17.10.1 Overwriting Histogram Files
  17.10.2 Changing the Name of the Histogram File
  17.10.3 Changing the Module Label
  17.10.4 Printing From the TBrowser
 17.11 Review
 17.12 Test Your Understanding
  17.12.1 Tests
  17.12.2 Answers
18 Exercise 8: Looping Over Collections
 18.1 Introduction
 18.2 Prerequisites
 18.3 What You Will Learn
 18.4 Setting Up to Run Exercise
 18.5 The Class GenParticle
  18.5.1 The Included Header Files
  18.5.2 Particle Parent-Child Relationships
  18.5.3 The Public Interface for the Class GenParticle
  18.5.4 Conditionally Excluded Sections of Header File
 18.6 The Module LoopGens1
 18.7 CMakeLists.txt
 18.8 Running the Exercise
 18.9 Variations on the Exercise
  18.9.1 LoopGens2_module.cc
  18.9.2 LoopGens3_module.cc
  18.9.3 LoopGens3a_module.cc
 18.10 Review
 18.11 Test Your Understanding
  18.11.1 Test 1
  18.11.2 Test 2
  18.11.3 Test 3
  18.11.4 Answers
19 3D Event Displays
 19.1 Introduction
 19.2 Prerequisites
 19.3 What You Will Learn
 19.4 Setting up to Run this Exercise
 19.5 Running the Exercise
  19.5.1 Startup and General Layout
  19.5.2 The Control Panel
  19.5.3 Main EVE Display Area
 19.6 Understanding How the 3D Event Display Module Works
  19.6.1 Overview of the Source Code File EventDisplay3D_module.cc
  19.6.2 Class Declaration and Constructor
  19.6.3 Creating the GUI and Drawing the Static Detector Components in the beginJob() Member Function
  19.6.4 Drawing the Generated Hits and Tracks in the analyze() Member Function
20 Troubleshooting
 20.1 Updating Workbook Code
 20.2 XWindows (xterm and Other XWindows Products)
  20.2.1 Mac OSX 10.9
 20.3 Trouble Building
 20.4 art Won’t Run
III  User’s Guide
21 git
 21.1 Aside: More Details about git
  21.1.1 Central Repository, Local Repository and Working Directory
  21.1.2 git Branches
  21.1.3 Seeing which Files you have Modified or Added
22 art Run-time and Development Environments
 22.1 The art Run-time Environment
 22.2 The art Development Environment
23 art Framework Parameters
 23.1 Parameter Types
 23.2 Structure of art Configuration Files
 23.3 Services
  23.3.1 System Services
  23.3.2 FloatingPointControl
  23.3.3 Message Parameters
  23.3.4 Optional Services
  23.3.5 Sources
  23.3.6 Modules
24 Job Configuration in art : FHiCL
 24.1 Basics of FHiCL Syntax
  24.1.1 Specifying Names and Values
  24.1.2 FHiCL-reserved Characters and Identifiers
 24.2 FHiCL Identifiers Reserved to art
 24.3 Structure of a FHiCL Run-time Configuration File for art
 24.4 Order of Elements in a FHiCL Run-time Configuration File for art
 24.5 The physics Portion of the FHiCL Configuration
 24.6 Choosing and Using Module Labels and Path Names
 24.7 Scheduling Strategy in art
 24.8 Scheduled Reconstruction using Trigger Paths
 24.9 Reconstruction On-Demand
 24.10 Bits and Pieces
IV  Appendices
A Obtaining Credentials to Access Fermilab Computing Resources
 A.1 Kerberos Authentication
 A.2 Fermilab Services Account
B Installing Locally
 B.1 Install the Binary Distributions: A Cheat Sheet
 B.2 Preparing the Site Specific Setup Script
 B.3 Links to the Full Instructions
C art Completion Codes
D Viewing and Printing Figure Files
 D.1 Viewing Figure Files Interactively
 D.2 Printing Figure Files
 E.1 Introduction
 E.2 Multiple Meanings of Vector in CLHEP
 E.3 CLHEP Documentation
 E.4 CLHEP Header Files
  E.4.1 Naming Conventions and Syntax
  E.4.2 .icc Files
 E.5 The CLHEP Namespace
  E.5.1 using Declarations and Directives
 E.6 The Vector Package
  E.6.1 CLHEP::Hep3Vector
  E.6.2 CLHEP::HepLorentzVector
 E.7 The Matrix Package
 E.8 The Random Package
F Include Guards


List of Figures

3.1 Principal components of the art documentation suite
3.2 Flowchart describing the art event loop
3.3 Geometry of the toy experiment’s detector
3.4 Event display of a simulated event in the toy detector.
3.5 Event display of another simulated event in the toy detector
3.6 Invariant mass of reconstructed pairs of oppositely charged tracks
4.1 Computing environment hierarchies
6.1 Memory diagram at the end of a run of Classes/v1/ptest.cc
6.2 Memory diagram at the end of a run of Classes/v6/ptest.cc
9.1 Elements of the art run-time environment for the first exercise
10.1 Representation of reader’s source directory structure
10.2 Representation of reader’s build directory structure
10.3 Elements of the art development environment and information flow
10.4 Reader’s directory structure once development environment is established
17.1 TBrowser window after opening output/firstHist1.root
17.2 TBrowser window after displaying the histogram hNGens;1.
17.3 Figure made by running the CINT script drawHist1.C.
18.1 Histograms made by loopGens1.fcl
19.1 The TEveBrowser is a specialization of the ROOT TBrowser for the ROOT EVE Event Visualization Environment. Shown above is the one used in this workbook exercise which is divided into three major regions: 1) a control panel, 2) a main EVE display area, and 3) a ROOT command console.
a Top-level items
b Second-level items
19.3 Shown above are two different views of the list-tree widget, showing the top-level items in (a), and expanded to second-level items in (b).
a WindowManager
b Viewers
19.5 Expanded views of the (a) WindowManager and (b) Viewers list-tree items.
a Scenes
b Event
19.7 Expanded views of the (a) Scenes and (b) Event list-tree items.
19.8 The context-sensitive menu below the list-tree widget changes in response to the selected list-tree item. Shown above is the menu for a viewer-type item. In this example, we have enabled an interactive clipping plane.
19.9 Shown above is the context-sensitive menu displayed below the list-tree widget when a track element is selected.
19.10 The Event Nav pane on the control panel.
19.11 The orthographic XY and RZ views in the Ortho Views tabbed pane of the main EVE display panel.
19.12 Hovering the mouse cursor over the lower edge of the title bar in a viewport reveals a pull-down menu bar with more options.
a Track pop-up
b Hit pop-up
19.14 Tooltips with relevant information show up when the mouse cursor is hovered over (a) track and (b) hit elements
21.1 Illustration of git branches, simple
21.2 Illustration of git branches
22.1 art run-time environment (same as Figure 9.1)
22.2 art run-time environment (everything pre-built)
22.3 art run-time environment (with officially tracked inputs)
22.4 art development environment for Workbook (same as Figure 10.3)
22.5 art development environment (for building full code base)
22.6 art development environment (for building against prebuilt base)


List of Tables

3.1 Compiler flags for the optimization levels defined by cetbuildtools
3.2 Units used in the Workbook
5.1 Site-specific setup procedures for experiments that run art
7.1 Namespaces for selected UPS products
8.1 Experiment-specific Information for new users
8.2 Login machines for running the Workbook exercises
9.1 Input files provided for the Workbook exercises
10.1 Compiler and linker flags for a profile build
14.1 Canonical forms of numerical values in FHiCL files
23.1 art Floating Point Parameters
23.2 art Message Parameters
C.1 art completion status codes. The return code is the least significant byte of the status code.
E.1 Selected member functions of CLHEP::Hep3Vector


List of Code and Output Listings





Part I

Introduction

Chapter 1
How to Read this Documentation

The art document suite, which is currently in an alpha release form, consists of an introductory section and the first few exercises of the Workbook1 , plus a glossary and an index. There are also some preliminary (incomplete and unreviewed) portions of the Users Guide included in the compilation.

The Workbook exercises require you to download some code to edit, execute and evaluate. Both the documentation and the code it references are expected to undergo continual development throughout 2013 and 2014. The latest is always available at the art Documentation website. Chapter 12 tells you how to keep up-to-date with improvements and additions to the Workbook code and documentation.

1.1 If you are new to HEP Software...

Read Parts I and II (the introductory material and the Workbook) from start to finish. The Workbook is aimed at an audience who is familiar with (although not necessarily expert in) Unix, C++ and Fermilab’s UPS product management system, and who understands the basic art framework concepts. The introductory chapters prepare the “just starting out” reader in all these areas.

1.2 If you are an HEP Software expert...

Read Chapters 2 and 3: this is where key terms and concepts used throughout the art document suite get defined. Skip the rest of the introductory material and jump straight into running Exercise 1 in Chapter 9 of the Workbook. Take the approach of: Don’t need it? Don’t read it.

1.3 If you are somewhere in between...

Read Chapters 2 and 3 and skim the remaining introductory material in Part I to glean what you need. Along with the experts, you can take the approach of: Don’t need it? Don’t read it.

Chapter 2
Conventions Used in this Documentation

Most of the material in this introduction and in the Workbook is written so that it can be understood by those new to HEP computing; if it is not, please let us know (see Section 3.4)!

2.1 Terms in Glossary

The first instance of each term that is defined in the glossary is written in italics followed by a γ (Greek letter gamma), e.g., framework(γ).

2.2 Typing Commands

Unix commands that you must type are shown in the format unix command. Portions of the command for which you must substitute values are shown in slanted font within the command; e.g., you would type your actual username where you see username.

While art supports OS X as well as flavors of Linux, the instructions for using art are nearly identical for all supported systems. When operating-system specific instructions are needed they are noted in the exercises.

When an example Unix command line would overflow the page width, this documentation will use a trailing backslash to indicate that the command is continued on the next line. We indent the second line to make clear that it is not a separate command from the first line. For example:

mkdir -p $ART_WORKBOOK_WORKING_BASE/username/workbook-tutorial/\

You can type the entire command on a single line if it fits, without typing the backslash, or on two lines with the backslash as the final character of the first line. Do not leave a space before the backslash unless it is required in the command syntax, e.g., before an option, as in

mkdir \
-p mydir

2.3 Listing Styles

Code listings in C++ are shown as:

// This is a C++ file listing.
float* pa = &a;

Code listings in FHiCL are shown as:

// This is a FHiCL file listing.
source: {
   module_type : RootInput
}

Other script or file content is denoted:

This represents script contents.

Computer output from a command is shown as:

This is output from a command.

2.4 Procedures to Follow

Step-by-step procedures that the reader is asked to follow are set off from the surrounding text in numbered procedure boxes.

2.5 Important Items to Call Out

Occasionally, text will be called out to make sure that you don’t miss it. Important or tricky terms and concepts will be marked with a “pointing finger” symbol in the margin, as shown at right.

Items that are even trickier will be marked with a “bomb” symbol in the margin, as shown at right. You really want to avoid the problems they describe.

In some places it will be necessary for a paragraph or two to be written for experts. Such paragraphs will be marked with a “dangerous bends” symbol in the margin, as shown at right. Less experienced users can skip these sections on first reading and come back to them at a later time.

2.6 Site-specific Information

Text that refers in particular to Fermilab-specific information is marked with a Fermilab picture, as shown at right.

Text that refers in particular to information about using art at non-Fermilab sites is marked with a “generic site” picture, as shown at right. A site is defined as a unique combination of experiment and location, and is used to refer to a set of computing resources configured for use by a particular experiment at a particular location. Two examples of sites are the Fermilab-supplied resources used by your experiment and the computing resources of an institution that collaborates on your experiment. If you have the necessary software installed on your own laptop or desktop, it is also a site.

Experiment-specific information will be kept to an absolute minimum; wherever it appears, it will be marked with an experiment-specific icon, e.g., the Mu2e icon at right.

Chapter 3
Introduction to the art Event Processing Framework

3.1 What is art and Who Uses it?

art(γ) is an event-processing framework(γ) developed and supported by the Fermilab Scientific Computing Division (SCD). The art framework is used to build physics programs by loading physics algorithms, provided as plug-in modules. Each experiment or user group may write and manage its own modules. art also provides infrastructure for common tasks, such as reading input, writing output, provenance tracking, database access and run-time configuration.

The initial clients of art are the Fermilab Intensity Frontier experiments, but nothing prevents other experiments from using it as well. The name art is always written in italic lower case; it is not an acronym.

art is written in C++ and is intended to be used with user code written in C++. (User code includes experiment-specific code and any other user-written, non-art, non-external-product(γ) code.)

art has been designed for use in most places that a typical HEP experiment might require a software framework, including:

art is not designed for use in real-time environments, such as the direct interface with data-collection hardware.

The Fermilab SCD has also developed a related product named artdaq(γ), a layer that lives on top of art and provides features to support the construction of data-acquisition (DAQ(γ)) systems based on commodity servers. Further discussion of artdaq is outside the scope of this documentation; for more information consult the artdaq home page:

A technical paper on artdaq is available at http://inspirehep.net/record/1229212?ln=en.

The design of art has been informed by the lessons learned by the many High Energy Physics (HEP) experiments that have developed C++ based frameworks over the past 20 years. In particular, it was originally forked from the framework for the CMS experiment, cmsrun.

Experiments using art are listed at the art Documentation website under “Experiments using art.”

3.2 Why art?

In all previous experiments at Fermilab, and in most previous experiments elsewhere, infrastructure software (i.e., the framework, broadly construed – mostly forms of bookkeeping) has been written in-house by each experiment, and each implementation has been tightly coupled to that experiment’s code. This tight coupling has made it difficult to share the framework among experiments, resulting in both great duplication of effort and mixed quality.

art was created as a way to share a single framework across many experiments. In particular, the design of art draws a clear boundary between the framework and the user code; the art framework (and other aspects of the infrastructure) is developed and maintained by software engineers who are specialists in the field of HEP infrastructure software. This provides a robust, professionally maintained foundation upon which physicists can develop the code for their experiments. Experiments use art as an external package. Despite some constraints that this separation imposes, it has improved the overall quality of the framework and reduced the duplicated effort.

3.3 C++ and C++11

In 2011, the International Standards Committee voted to approve a new standard for C++, called C++ 11.

Much of the existing user code was written prior to the adoption of the C++ 11 standard and has not yet been updated. As you work on your experiment, you are likely to encounter both code written the new way and code written the old way. Therefore, the Workbook will often illustrate both practices.

A very useful compilation of what is new in C++ 11 can be found at


This reference material is written for advanced C++ users.

3.4 Getting Help

Please send your questions and comments to art-users@fnal.gov. More support information is listed at https://web.fnal.gov/project/ArtDoc/SitePages/Support.aspx.

3.5 Overview of the Documentation Suite

When complete, this documentation suite will contain several principal components, or volumes: the introduction that you are reading now, a Workbook, a Users Guide, a Reference Manual, a Technical Reference and a Glossary. At the time of writing, drafts exist for the Introduction, the Workbook, the Users Guide and the Glossary. The components in the documentation suite are illustrated in Figure 3.1.



3.5.1 The Introduction

This introductory volume is intended to set the stage for using art. It introduces art, provides background material, describes some of the software tools on which art depends, describes its interaction with related software and identifies prerequisites for successfully completing the Workbook exercises.

3.5.2 The Workbook

The Workbook is a series of standalone, self-paced exercises that will introduce the building blocks of the art framework and the concepts around which it is built, show practical applications of this framework, and provide references to other portions of the documentation suite as needed. It is targeted towards physicists who are new users of art, with the understanding that such users will frequently be new to the field of computing for HEP and to C++.

One of the Workbook’s primary functions is training readers how and where to find more extensive documentation on both art and external software tools; they will need this information as they move on to develop and use the scientific software for their experiment.

The Workbook assumes some basic computing skills and some basic familiarity with the C++ computing language; Chapter 6 provides a tutorial/refresher for readers who need to improve their C++ skills.

The Workbook is written using recommended best practices that have become current since the adoption of C++ 11 (see Section 3.8).

Because art is being used by many experiments, the Workbook exercises are designed around a toy experiment that is greatly simplified compared to any actual detector, but that incorporates enough richness to illustrate most of the features of art. The goal is to enable the physicists who work through the exercises to translate the lessons learned there into the environment of their own experiments.

3.5.3 Users Guide

The Users Guide is targeted at physicists who have reached an intermediate level of competence with art and its underlying tools. It contains detailed descriptions of the features of art, as seen by the physicists. The Users Guide will provide references to the external products(γ) on which art depends, information on how art uses these products, and as needed, documentation that is missing from the external products’ own documentation.

3.5.4 Reference Manual

The Reference Manual will be targeted at physicists who already understand the major ideas underlying art and who need a compact reference to the Application Programmer Interface (API(γ)). The Reference Manual will likely be generated from annotated source files, possibly using Doxygen(γ).

3.5.5 Technical Reference

The Technical Reference will be targeted at the experts who develop and maintain art; few physicists will ever want or need to consult it. It will document the internals of art so that a broader group of people can participate in development and maintenance.

3.5.6 Glossary

The glossary will evolve as the documentation set grows. At the time of writing, it includes definitions of art-specific terms as well as some HEP, Fermilab, C++ and other relevant computing-related terms used in the Workbook and the Users Guide.

3.6 Some Background Material

This section defines some language and some background material about the art framework that you will need to understand before starting the Workbook.

3.6.1 Events and Event IDs

In almost all HEP experiments, the core idea underlying all bookkeeping is the event(γ). In a triggered experiment, an event is defined as all of the information associated with a single trigger; in an untriggered, spill-oriented experiment, an event is defined as all of the information associated with a single spill of the beam from the accelerator. Another way of saying this is that an event contains all of the information associated with some time interval, but the precise definition of the time interval changes from one experiment to another1. Typically these time intervals are a few nanoseconds to a few tens of microseconds. The information within an event includes both the raw data read from the Data Acquisition System (DAQ) and all information that is derived from that raw data by the reconstruction and analysis algorithms. An event is the smallest unit of data that art can process at one time.

In a typical HEP experiment, the trigger or DAQ system assigns an event identifier (event ID) to each event; this ID uniquely identifies each event, satisfying a critical requirement imposed by art that each event be uniquely identifiable by its event ID. This requirement also applies to simulated events.

The simplest event ID is a monotonically increasing integer. A more common practice is to define a multi-part ID; art has chosen to use a three-part ID consisting of a run number, a subRun number and an event number.

There are two common methods of using this event ID scheme and art allows experiments to choose either:

  1. When an experiment takes data, the event number is incremented every event. When some predefined condition occurs, the event number is reset to 1 and the subRun number is incremented, keeping the run number unchanged. This cycle repeats until some other predefined condition occurs, at which time the event number is reset to 1, the subRun number is reset to 0 (0 not 1 for historical reasons) and the run number is incremented.
  2. The second method is the same as the first except that the event number monotonically increases throughout a run and does not reset to 1 on subRun boundaries. The event number does reset to 1 at the start of each run.

art does not define what conditions cause these transitions; those decisions are left to each experiment. Typically experiments will choose to start new runs or new subRuns when one of the following happens: a preset number of events is acquired; a preset time interval expires; a disk file holding the output reaches a preset size; or certain running conditions change.

art requires only that a subRun contain zero or more events and that a run contain zero or more subRuns.

When an experiment takes data, events read from the DAQ are typically written to disk files, with copies made on tape. The events in a single subRun may be spread over several files; conversely, a single file may contain many runs, each of which contains many subRuns.

3.6.2 art Modules and the Event Loop

Users provide executable code to art in pieces called art modules(γ)2 that are dynamically loaded as plugins and that operate on event data. The concept of reading events and, in response to each new event, calling the appropriate member functions of each module, is referred to as the event loop(γ). The concepts of the art module and the event loop will be illustrated via the following discussion of how art processes a job.

The simplest command to run art looks like:

art -c filename.fcl

The argument to -c is the run-time configuration file(γ), a text file that tells one run of art what it should do. Run-time configuration files for art are written in the Fermilab Hierarchical Configuration Language FHiCL(γ) (pronounced “fickle”) and the filenames end in .fcl. As you progress through the Workbook, this language and the conventions used in the run-time configuration file will be explained; the full details are available in Chapter 24 of the Users Guide. (The run-time configuration file is often referred to as simply the configuration file or even more simply as just the configuration(γ).)
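
To make this concrete, here is a minimal sketch of what such a configuration file can look like. The process name, the module label myAnalyzer, the module type MyAnalyzer and the input file name are all hypothetical; the real structure and vocabulary are explained in Chapter 24.

```fhicl
# A minimal run-time configuration sketch; all names here are
# hypothetical placeholders.
process_name : myProcess

source : {
  module_type : RootInput
  fileNames   : [ "inputFile.root" ]
}

physics : {
  analyzers : {
    myAnalyzer : { module_type : MyAnalyzer }
  }
  e1        : [ myAnalyzer ]
  end_paths : [ e1 ]
}
```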

When art starts up, it reads the configuration file to learn what input files it should read, what user code it should run and what output files it should write. As mentioned above, an experiment’s code (including any code written by individual experimenters) is provided in units called art modules. A module is simply a C++ class, provided by the experiment or user, that obeys a set of rules defined by art and whose source code(γ) file gets compiled into a dynamic library(γ) that can be loaded at run-time by art.

These rules will be explained as you work through the Workbook and they are summarized in a future chapter in the User’s Guide.

The code base of a typical experiment will contain many C++ classes. Only a small fraction of these will be modules; most of the rest will be ordinary C++ classes that are used within modules3 .

A user can tell art the order in which modules should be run by specifying that order in the configuration file. A user can also tell art to determine, on its own, the correct order in which to run modules; the latter option is referred to as reconstruction on demand.

Imagine the processing of each event as the assembly of a widget on an assembly line and imagine each module as a worker that needs to perform a set task on each widget. Each worker has a task that must be done on each widget that passes by; in addition some workers may need to do some start-up or close-down jobs. Following this metaphor, art requires that each module provide code that will be called once for every event, and it allows any module to provide code that will be called at other well-defined times: at the start and end of the job, at the start and end of each run, and at the start and end of each subRun.

For those of you who are familiar with inheritance in C++, a module class (i.e., a “module”) must inherit from one of a few different module base classes. Each module class must override one pure-virtual member function from the base class and it may override other virtual member functions from the base class.

After art completes its initialization phase (intentionally not detailed here), it executes the event loop, illustrated in Figure 3.2, and enumerated below.



The event loop

  1. calls the constructor(γ) of every module in the configuration.
  2. calls the beginJob member function(γ) of every module that provides one.
  3. reads one event from the input source, and for that event
    1. determines if it is from a run different from that of the previous event (true for first event in loop);
    2. if so, calls the beginRun member function of each module that provides one;
    3. determines if the event is from a subRun different from that of the previous event (true for first event in loop);
    4. if so, calls the beginSubRun member function of each module that provides one;
    5. calls each module’s (required) per-event member function.
  4. reads the next event and repeats the above per-event steps until it encounters a new subRun.
  5. closes out the current subRun by calling the endSubRun member function of each module that provides one.
  6. repeats steps 4 and 5 until it encounters a new run.
  7. closes out the current run by calling the endRun member function of each module that provides one.
  8. repeats steps 3 through 7 until it reaches the end of the input source.
  9. calls the endJob member function of each module that provides one.
  10. calls the destructor(γ) of each module.

This entire set of steps comprises the event loop. One of art’s most visible jobs is controlling the event loop.

3.6.3 Module Types

Every art module must be one of the following five types, which are defined by the ways in which they interact with each event and with the event loop:

analyzer module(γ)
May inspect information found in the event but may not add new information to the event.
producer module(γ)
May inspect information found in the event and may add new information to the event.
filter module(γ)
Same functions as a producer module but may also tell art to skip the processing of some, or all, modules for the current event; may also control which events are written to which output.
source module(γ)
Reads events, one at a time, from some source; art requires that every art job contain exactly one source module. A source is often a disk file but other options exist and will be described in the Workbook and Users Guide.
output module(γ)
Reads selected data products from memory and writes them to an output destination; an art job may contain zero or more output modules. An output destination is often a disk file but other options exist and will be described in the Users Guide.

Note that no module may change information that is already present in an event. PIC

What does an analyzer do if it may neither alter information in an event nor add to it? Typically it creates printout and it creates ROOT files containing histograms, trees(γ) and ntuples(γ) that can be used for downstream analysis. (If you have not yet encountered these terms, the Workbook will provide explanations as they are introduced.)

Most novice users will only write analyzer modules and filter modules; readers with a little more experience may also write producer modules. The Workbook will provide examples of all three. Few people other than art experts and each experiment’s software experts will write source or output modules; however, the Workbook will teach you what you need to know about configuring source and output modules.

3.6.4 art Data Products

This section introduces more ideas and terms dealing with event information that you will need as you progress through the Workbook.

The term data product(γ) is used in art to mean the unit of information that user code may add to an event or retrieve from an event. Data products are created in a number of ways.

  1. The DAQ system will package the raw data into data products, perhaps one or two data products for each major subsystem.
  2. Each module in the reconstruction chain will create one or more data products.
  3. Some modules in the analysis chain will produce data products; others may just make histograms and write information in non-art formats for analysis outside of art; they may, for example, write user-defined ROOT TTrees.
  4. The simulation chain will usually create many data products. Some will be simulated event-data while others will describe the true properties of the simulated event. These data products can be used to study the response of the detector to simulated events; they can also be used to develop, debug and characterize the reconstruction algorithms.

Because these data products are intrinsically experiment-dependent, each experiment defines its own data products. In the Workbook, you will learn about a set of data products designed for use with the toy experiment. There are a small number of data products that are defined by art and that hold bookkeeping information; these will be described as you encounter them in the Workbook.

A data product is just a C++ type(γ) (a class, struct(γ) or typedef) that obeys a set of rules defined by art; these rules are very different than the rules that must be followed for a class to be a module; when the sections that describe these rules in detail have been prepared, we will add references here. A data product can be a single integer, a large complex class hierarchy, or anything in between.


Very often, a data product is a collection(γ) of some experiment-defined type. The C++ standard libraries define many sorts of collection types; art supports many of these and also provides a custom collection type named cet::map_vector. Workbook exercises will clarify the data product and collection type concepts.

3.6.5 art Services

Previous sections of this Introduction have introduced the concept of C++ classes that have to obey a certain set of rules defined by art, in particular, modules in Section 3.6.2 and data products in Section 3.6.4. art services(γ) are yet other examples of this.

In a typical art job, two sorts of information need to be shared among the modules. The first sort is stored in the data products themselves and is passed from module to module via the event. The second sort is not associated with each event, but rather is valid for some aggregation of events, subRuns or runs, or over some other time interval. Three examples of this second sort include the geometry specification, the conditions information4 and, for simulations, the table of particle properties.

To provide managed access to the second sort of information, art supports an idea named art services (again, shortened to services). Services may also be used to provide certain types of utility functions. Again, a service in art is just a C++ class that obeys a set of rules defined by art. The rules for services are different than those for modules or data products.

art implements a number of services that it uses for internal functions, a few of which you will encounter in the first couple of Workbook exercises. The message service(γ) is used by both art and experiment-specific code to limit printout of messages with a low severity level and to route messages to appropriate destinations. It can be configured to provide summary information at the end of the art job. The TFileService(γ) and the RandomNumberGenerator service are not used internally by art, but are used by most experiments. Experiments may also create and implement their own services.

After art completes its initialization phase and before it constructs any modules (see Section 3.6.2), it

  1. reads the configuration to learn what services are requested, and
  2. calls the constructor of each requested service.

Once a service has been constructed, any code in any module can ask art for a smart pointer(γ) to that service and use the features provided by that service. Because services are constructed before modules, they are available for use by modules over the full life cycle of each module.

It is also legal for one service to request information from another service as long as the dependency chain does not have any loops. That is, if Service A uses Service B, then Service B may not use Service A, either directly or indirectly.

For those of you familiar with the C++ Singleton Design Pattern, an art service has some differences from and some similarities to a Singleton. The most important difference is that the lifetime of a service is managed by art, which calls the constructors of all services at a well-defined time in a well-defined order. Contrast this with the behavior of Singletons, for which the order of initialization is not defined by the C++ standard and is an accident of the implementation details of the loader. art also includes services under the umbrella of its powerful run-time configuration system; in the Singleton Design Pattern this issue is simply not addressed.

3.6.6 Dynamic Libraries and art

When code is executed within the art framework, art, not the experiment, provides the main executable. The experiment provides its code to the art executable in the form of dynamic libraries that art loads at run time; these libraries are also called dynamic load libraries, shareable object libraries, or plugins. On Linux, their filenames typically end in .so; on OS X, the suffixes .dylib and .so are both used.

3.6.7 Build Systems and art

To make an experiment’s code available to art, the source code must be compiled and linked (i.e., built) to produce dynamic libraries (Section 3.6.6). The tool that creates the dynamic libraries from the C++ source files is called a build system(γ).

Experiments that use art are free to choose their own build systems, as long as the system follows the conventions that allow art to find the name of the .so file given the name of the module class, as discussed in Section ??. The Workbook will use a build system named cetbuildtools, which is a layer on top of cmake.

The cetbuildtools system defines three standard compiler optimization levels, called “debug”, “profile” and “optimized”; the last two are often abbreviated “prof” and “opt”. When code is compiled with the “opt” option, it runs as quickly as possible but is difficult to debug. When code is compiled with the “debug” option, it is much easier to debug but it runs more slowly. When code is compiled with the “prof” option the speed is almost as fast as for an “opt” build and the most useful subset of the debugging information is retained. The “prof” build retains enough debugging information that one may use a profiling tool to identify in which functions the program spends most of its time; hence its name “profile”. The “prof” build provides enough information to get a useful traceback from a core dump. Most experiments using art use the “prof” build for production and the “debug” build for development.

The compiler options corresponding to the three levels are listed in Table 3.1.



Name   Flags

debug  -O0 -g
prof   -O3 -g -fno-omit-frame-pointer -DNDEBUG
opt    -O3 -DNDEBUG


3.6.8 External Products

As you progress through the Workbook, you will see that the exercises use some software packages that are part of neither art nor the toy experiment’s code. The Workbook code, art and the software for your experiment all rely heavily on some external tools and, in order to be an effective user of art-based HEP software, you will need at least some familiarity with them; you may, in fact, need to become expert in some.

These packages and tools are referred to as external products(γ) (sometimes called simply products).

An initial list of the external products you will need to become familiar with includes:

  • art: the event processing framework
  • FHiCL: the run-time configuration language used by art
  • cetlib: a utility library used by art
  • messagefacility: a message facility that is used by art and by (some) experiments that use art
  • ROOT: an analysis, data presentation and data storage tool widely used in HEP
  • CLHEP: a set of utility classes; the name is an acronym for Class Library for HEP
  • Boost: a class library with new functionality that is being prototyped for inclusion in future C++ standards
  • gcc: the GNU C++ compiler and run-time libraries; both the core language and the standard library are used by art and by your experiment’s code
  • git: a source code management system that is used for the Workbook and by some experiments; similar in concept to the older CVS and SVN, but with enhanced functionality
  • cetbuildtools: the build system that is used by the art Workbook (and by art itself)
  • UPS: a Fermilab-developed system for accessing software products; the name is an acronym for Unix Product Support
  • UPD: a Fermilab-developed system for distributing software products; the name is an acronym for Unix Product Distribution
  • jobsub: tools for submitting jobs to the Fermigrid batch system and monitoring them
  • an interface that allows art to use SAM(γ) as an external run-time agent that can deliver remote files to local disk space and can copy output files to tape; SAM is a Fermilab-supplied resource that provides the functions of a file catalog, a replica manager and some functions of a batch-oriented workflow manager

Any particular line of code in a Workbook exercise may use elements from, say, four or five of these packages. Knowing how to parse a line and identify which feature comes from which package is a critical skill. The Workbook will provide a tour of the above packages so that you will recognize elements when they are used and you will learn where to find the necessary documentation.

For the art Workbook, external products are made available to your code via a mechanism called UPS, which will be described in Section 7. Many Fermilab experiments also use UPS to manage their external products; this is not required by art and you may choose to manage external products whichever way you prefer. UPS is, itself, just another external product. From the point of view of your experiment, art is an external product. From the point of view of the Workbook code, both art and the code for the toy experiment are external products.

Finally, it is important to recognize an overloaded word, products. When a line of documentation simply says products, it may be referring either to data products or to external products. If it is not clear from the context which is meant, please let us know (see Section 3.4).

3.6.9 The Event-Data Model and Persistency

Section 3.6.4 introduced the idea of art data products. In a small experiment, a fully reconstructed event may contain on the order of ten data products; in a large experiment there may be hundreds.

While each experiment will define its own data product classes, there is a common set of questions that art users on any experiment need to consider:

  1. How does my module access data products that are already in the event?
  2. How does my module publish a data product so that other modules can see it?
  3. How is a data product represented in the memory of a running program?
  4. How does an object in one data product refer to an object in another data product?
  5. What metadata is there to describe each data product? (Such metadata might include: the module that created it; the run-time configuration of that module; the data products read by that module; the code version of the module that created it.)
  6. How does my module access the metadata associated with a particular data product?

The answers to these questions form what is called the Event-Data Model(γ) (EDM) that is supported by the framework.

A question that is closely related to the EDM is: what technologies are supported to write data products from memory to a disk file and to read them from the disk file back into memory in a separate art job? A framework may support several such technologies. art currently supports only one disk file format, a ROOT-based format, but the art EDM has been designed so that it will be straightforward to support other disk file formats as it becomes useful to do so.

A few other related terms that you will encounter include:

  1. transient representation: the in-memory representation of a data product
  2. persistent representation: the on-disk representation of a data product
  3. persistency: the technology to convert data products back and forth between their persistent and transient representations

3.6.10 Event-Data Files

When you read data from an experiment and write the data to a disk file, that disk file is usually called a data file.

When you simulate an experiment and write a disk file that holds the information produced by the simulation, what should you call the file? The Particle Data Group has recommended that this not be called a “data file” or a “simulated data file;” they prefer that the word “data” be strictly reserved for information that comes from an actual experiment. They recommend that we refer to these files as “files of simulated events” or “files of Monte Carlo events”. Note the use of “events,” not “data.”

This leaves us with a need for a collective noun to describe both data files and files of simulated events. The name in current use is event-data files(γ); yes, this does contain the word “data,” but the hyphenated form “event-data” is unambiguous and has become the standard name.

3.6.11 Files on Tape

Many experiments do not have access to enough disk space to hold all of their event-data files, ROOT files and log files. The solution is to copy a subset of the disk files to tape and to read them back from tape as necessary.

At any given time, a snapshot of an experiment’s files will show some on tape only, some on tape with copies on disk, and some on disk only. For any given file, there may be multiple copies on disk and those copies may be distributed across many sites(γ), some at Fermilab and others at collaborating laboratories or universities.

Conceptually, two pieces of software are used to keep track of which files are where, a File Catalog and a Replica Manager. One software package that fills both of these roles is called SAM, which is an acronym for “Sequential data Access via Metadata.” SAM also provides some tools for Workflow management. SAM is in wide use at Fermilab and you can learn more about SAM at:
https://cdcvs.fnal.gov/redmine/projects/sam-main/wiki.

3.7 The Toy Experiment

The Workbook exercises are based around a made-up (toy) experiment. The code for the toy experiment is deployed as a UPS product named toyExperiment. The rest of this section will describe the physics content of toyExperiment; the discussion of the code in the toyExperiment UPS product will unfold in the Workbook, in parallel to the exposition of art.

The software for the toy experiment is designed around a toy detector, which is shown in Figure 3.3. The toyExperiment code contains many C++ classes: some modules, some data products, some services and some plain old C++ classes. About half of the modules are producers that individually perform either one step of the simulation process or one step of the reconstruction/analysis process. The other modules are analyzers that make histograms and ntuples of the information produced by the producers. There are also event display modules.

3.7.1 Toy Detector Description



The toy detector is a central detector made up of 15 concentric shells, with their axes centered on the z axis; the left-hand part of Figure 3.3 shows an xy view of these shells and the right shows the radius vs z view. The inner five shells are closely spaced radially and are short in z; the ten outer shells are more widely spaced radially and are longer in z. The detector sits in a uniform magnetic field of 1.5 T oriented in the +z direction. The origin of the coordinate system is at the center of the detector. The detector is placed in a vacuum.

Each shell is a detector that measures (φ,z), where φ is the azimuthal angle of a line from the origin to the measurement point. Each measurement has perfectly gaussian measurement errors and the detector always has perfect separation of hits that are near to each other. The geometry of each shell, its efficiency and resolution are all configurable at run-time.

All of the code in the toyExperiment product works in the set of units described in Table 3.2. Because the code in the Workbook is built on toyExperiment, it uses the same units. art itself is not unit-aware and places no constraints on which units your experiment may use.

The first six units listed in Table 3.2 are the base units defined by the CLHEP SystemOfUnits package. These are also the units used by Geant4.



Quantity Unit

Length mm
Energy MeV
Time ns
Plane Angle radian
Solid Angle steradian
Electric Charge Charge of the proton = +1
Magnetic Field Tesla


3.7.2 Workflow for Running the Toy Experiment Code

The workflow of the toy experiment code includes five steps: three simulation steps, a reconstruction step and an analysis step:

  1. event generation
  2. detector simulation
  3. hit-making
  4. track reconstruction
  5. analysis of the mass resolution

For each event, the event generator creates some signal particles and some background particles. The first signal particle is generated with the following properties:

  • Its mass is the rest mass of the ϕ meson; the event generator does not simulate a natural width for this particle.
  • It is produced at the origin.
  • It has a momentum that is chosen randomly from a distribution that is uniform between 0 and 2000 MeV∕c.
  • Its direction is chosen randomly on the unit sphere.

The event generator then decays this particle to K+K-; the center-of-mass decay angles are chosen randomly on the unit sphere.

The background particles are generated by the following algorithm:

  • Background particles are generated in pairs, one π+ and one π-.
  • The number of pairs in each event is a random variate chosen from a Poisson distribution with a mean of 0.75.
  • Each of the pions is generated as follows:
    • It is produced at the origin.
    • It has a momentum that is chosen randomly from a distribution that is uniform between 0 and 800 MeV∕c.
    • Its direction is chosen randomly on the unit sphere.

The above algorithm generates events with a total charge of zero but there is no concept of momentum or energy balance. About 47% of these events will not have any background tracks.
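The 47% figure can be checked directly: the number of background pairs is Poisson-distributed with mean μ = 0.75, so the probability of generating zero pairs is

```latex
P(n=0) \;=\; \frac{\mu^{0}\,e^{-\mu}}{0!} \;=\; e^{-0.75} \;\approx\; 0.472
```

that is, about 47% of events contain no background tracks.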

In the detector simulation step, particles neither scatter nor lose energy when they pass through the detector cylinders; nor do they decay. Therefore, the charged particles follow a perfectly helical trajectory. The simulation follows each charged particle until it either exits the detector or until it completes the outward-going arc of the helix. When the simulated trajectory crosses one of the detector shells, the simulation records the true point of intersection. All intersections are recorded; at this stage in the simulation, there is no notion of inefficiency or resolution. The simulation does not follow the trajectory of the ϕ meson because it was decayed in the generator.

Figure 3.4 shows an event display of a simulated event that has no background tracks. In this event the ϕ meson was travelling at close to 90° to the z axis and it decayed nearly symmetrically; both tracks intersect all 15 detector cylinders. The left-hand figure shows an xy view of the event; the solid lines show the trajectory of the kaons, red for K+ and blue for K-; the solid dots mark the intersections of the trajectories with the detector shells. The right-hand figure shows the same event but in an rz view.



Figure 3.5 shows an event display of another simulated event, one that has four background tracks, all drawn in green. In the xy view it is difficult to see the two π- tracks, which have very low transverse momentum, but they are clear in the rz view. Look at the K+ track, drawn in red; its trajectory just stops in the middle of the detector. Why does this happen? In order to keep the exercises focused on art details, not geometric corner cases, the simulation stops a particle when it completes the outward-going arc of the helix and starts to curl back towards the z axis; it does this even if the particle is still inside the detector.



The third step in the simulation chain (hit-making) is to inspect the intersections produced by the detector simulation and turn them into data-like hits. In this step, a simple model of inefficiency is applied and some intersections will not produce hits. Each hit represents a 2D measurement (φ,z); each component is smeared with a gaussian distribution.

The three simulation steps use tools provided by art to record the truth information(γ) about each hit. Therefore it is possible to navigate from any hit back to the intersection from which it is derived, and from there back to the particle that made the intersection.

The fourth step is the reconstruction step. The toyExperiment does not yet have properly working reconstruction code; instead it mocks up credible looking results. The output of this code is a data product that represents a fitted helix; it contains the fitted track parameters of the helix, their covariance matrix and a collection of smart pointers that point to the hits that are on the reconstructed track. When we write proper track finding and track fitting code for the toyExperiment, the classes that describe the fitted helix will not change. Because the main point of the Workbook exercises is to illustrate the bookkeeping features in art, this is good enough for the task at hand. The mocked-up reconstruction code will only create a fitted helix object if the number of hits on a track is greater than some minimum value. Therefore there may be some events in which the output data product is empty.



The fifth step in the workflow does a simulated analysis using the fitted helices from the reconstruction step. It forms all distinct pairs of tracks and requires that they be oppositely charged. It then computes the invariant mass of the pair, under the assumption that both fitted helices are kaons. This module is an analyzer module and does not make any output data product. But it does make some histograms, one of which is a histogram of the reconstructed invariant mass of all pairs of oppositely charged tracks; this histogram is shown in Figure 3.6. When you run the Workbook exercises, you will make this plot and can compare it to Figure 3.6. In the figure you can see a clear peak that is created when the two reconstructed tracks are the two true daughters of the generated φ meson. You can also see an almost flat contribution that occurs when at least one of the reconstructed tracks comes from one of the generated background particles.

3.8 Rules, Best Practices, Conventions and Style

In many places, the Workbook will recommend that you write fragments of code in a particular way. The reason for any particular recommendation may be one of the following:

  • It is a hard rule enforced by the C++ language or by one of the external products.
  • It is a recommended best practice that might not save you time or effort now but will in the long run.
  • It is a convention that is widely adopted; C++ is a rich enough language that it will let you do some things in many different ways. Code is much easier to understand and debug if an experiment chooses to always write code fragments with similar intent using a common set of conventions.
  • It is simply a question of style.

It is important to be able to distinguish between rules, best practices, conventions and styles: you must follow the rules; it is wise to use best practices and established conventions; but style suggestions are just that, suggestions. This documentation will distinguish among these options when discussing the recommendations that it makes.

If you follow the recommendations for best practices and common conventions, it will be easier to verify that your code is correct and your code will be easier to understand, develop and maintain.

Chapter 4
Unix Prerequisites

4.1 Introduction

You will work through the Workbook exercises on a computer that is running some version of the Unix operating system. This chapter describes where to find information about Unix and gives a list of Unix commands that you should understand before starting the Workbook exercises. This chapter also describes a few ideas that you will need immediately but which are usually not covered in the early chapters of standard Unix references.

If you are already familiar with Unix and the bash(γ) shell, you can safely skip this chapter.

4.2 Commands

In the Workbook exercises, most of the commands you will enter at the Unix prompt will be standard Unix commands, but some will be defined by the software tools that are used to support the Workbook. The non-standard commands will be explained as they are encountered. To understand the standard Unix commands, any standard Linux or Unix reference will do. Section 4.10 provides links to Unix references.

Most Unix commands are documented via the man page system (short for “manual”). To get help on a particular command, type the following at the command prompt, replacing command-name with the actual name of the command:

man command-name

In Unix, everything is case sensitive; so the command man must be typed in lower case. You can also try the following; it works on some commands and not others:

command-name --help


command-name -?

Before starting the Workbook, make sure that you understand the basic usage of the following Unix commands:

   cat, cd, cp, echo, export, gzip, head, less, ln -s, ls,

   mkdir, more, mv, printenv, pwd, rm, rmdir, tail, tar

You also need to be familiar with the following Unix concepts:

  • filename vs pathname
  • absolute path vs relative path
  • directories and subdirectories (equivalent to folders in the Windows and Mac worlds)
  • current working directory
  • home directory (aka login directory)
  • ../ notation for viewing the directory above your current working directory
  • environment variables (discussed briefly in Section 4.5)
  • paths(γ) (in multiple senses; see Section 4.6)
  • file protections (read-write-execute, owner-group-other)
  • symbolic links
  • stdin, stdout and stderr
  • redirecting stdin, stdout and stderr
  • putting a command in the background via the & character
  • pipes
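A few of these concepts can be demonstrated in a short bash session; the file names here are placeholders:

```shell
# Redirect stdout and stderr to separate files.
echo "a message" > out.txt            # stdout goes to out.txt
ls /no/such/path 2> err.txt || true   # stderr goes to err.txt (ls fails; ignore it)

# A pipe: stdout of the first command becomes stdin of the second.
printf 'one\ntwo\nthree\n' | wc -l

# An environment variable, and ../ for the directory above the current one.
echo "$HOME"
ls .. > /dev/null
```

Working through examples like this in a scratch directory is a good way to confirm you understand each concept before starting the exercises.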

4.3 Shells

When you type a command at the prompt, a command-line interpreter called a Unix shell, or simply a shell, reads your command and figures out what to do. Most versions of Unix support a variety of different shells, e.g., bash or csh. The art Workbook code expects to be run in the bash shell. You can see which shell you’re running by entering:

echo $SHELL

For those of you with accounts on a Fermilab machine, your login shell was initially set to the bash shell.

If you are working on a non-Fermilab machine and bash is not your default shell, consult a local expert to learn how to change your login shell to bash.

Some commands are executed internally by the shell; other commands are dispatched to an appropriate program or script, which is launched in a child shell (of the same variety) called a subshell.

4.4 Scripts: Part 1

In order to automate repeated operations, you may write multiple Unix commands into a file and tell bash to run all of the commands in the file as if you had typed them sequentially. Such a file is an example of a shell script or a bash script. The bash scripting language is a powerful language that supports looping, conditional execution, tests to learn about properties of files and many other features.

Throughout the Workbook exercises you will run many scripts. You should understand the big picture of what they do, but you don’t need to understand the details of how they work.

If you would like to learn more about bash, some references are listed in Section 4.10.

4.5 Unix Environments

4.5.1 Building up the Environment

Very generally, a Unix environment is a set of information that is made available to programs so that they can find everything they need in order to run properly. The Unix operating system itself defines a generic environment, but often this is insufficient for everyday use. However, an environment sufficient to run a particular set of applications doesn’t just pop out of the ether, it must be established or set up, either manually or via a script. Typically, on institutional machines at least, system administrators provide a set of login scripts that run automatically and enhance the generic Unix environment. This gives users access to a variety of system resources, including, for example:

  • disk space to which you have read access
  • disk space to which you have write access
  • commands, scripts and programs that you are authorized to run
  • proxies and tickets that authorize you to use resources available over the network
  • the actual network resources that you are authorized to use, e.g., tape drives and DVD drives

This constitutes a basic working environment or computing environment. Environment information is largely conveyed by means of environment variables that point to various program executable locations, data files, and so on. A simple example of an environment variable is HOME, the variable whose value is the absolute path to your home directory. Environment variables are inherited by subshells; a subshell is a child process launched by a shell or a shell script.

Particular programs (e.g., art) usually require extra information, e.g., paths to the program’s executable(s) and to its dependent programs, paths indicating where it can find input files and where to direct its output, and so on. In addition to environment variables, the art-enabled computing environment includes some aliases and bash functions that have been defined; these are discussed in Section 4.8.

In turn, the Workbook code, which must work for all experiments and at Fermilab as well as at collaborating institutions, requires yet more environment configuration – a site-specific configuration.

Given the different experiments using art and the variety of laboratories and universities at which the users work, a site(γ) in art is a unique combination of experiment and institution. It is used to refer to a set of computing resources configured for use by a particular experiment at a particular institution. Setting up your site-specific environment will be discussed in Section 4.7.

When you finish the Workbook and start to run real code, you will set up your experiment-specific environment on top of the more generic art-enabled environment, in place of the Workbook’s. To switch between these two environments, you will log out and log back in, then run the script appropriate for the environment you want. Because of potential naming “collisions,” it is not guaranteed that these two environments can be overlain and always work properly.

This concept of the environment hierarchy is illustrated in Figure 4.1.



4.5.2 Examining and Using Environment Variables

One way to see the value of an environment variable is to use the printenv command:

printenv HOME

At any point in an interactive command or in a shell script, you can tell the shell that you want the value of the environment variable by prefixing its name with the $ character:

echo $HOME

Here, echo is a standard Unix command that copies its arguments to its output, in this case the screen.

By convention, environment variables are virtually always written in all capital letters.

There may be times when the Workbook instructions tell you to set an environment variable to some value. To do so, type the following at the command prompt:

export ENVNAME=value

If you read bash scripts written by others, you may see the following two-line variant, which accomplishes the same thing:

ENVNAME=value
export ENVNAME
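The effect of export can be seen by asking a subshell to print an exported and a non-exported variable; the variable names here are invented:

```shell
export DEMO_EXPORTED="visible"
DEMO_LOCAL="hidden"               # set in this shell, but not exported

# A subshell inherits only exported variables.
bash -c 'echo "${DEMO_EXPORTED:-unset}"'   # prints: visible
bash -c 'echo "${DEMO_LOCAL:-unset}"'      # prints: unset
```

This is why scripts that configure your environment use export: without it, programs you run afterwards would never see the settings.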

4.6 Paths and $PATH

Path (or PATH) is an overloaded word in computing. Here are the ways in which it is used:

Lowercase path
can refer to the location of a file or a directory; a path may be absolute or relative, e.g.,
/absolute/path/to/mydir/myfile or
relative/path/to/mydir/myfile
PATH
refers to the standard Unix environment variable set by your login scripts and updated by other scripts that extend your environment; it is a colon-separated list of directory names, e.g., /usr/local/bin:/usr/bin:/bin. It contains the list of directories that the shell searches to find programs/files required by Unix shell commands (i.e., PATH is used by the shell to “resolve” commands).
PATH-like variable
generically, refers to any environment variable whose value is a colon-separated list of directory names.

In addition, art defines a fourth idea, also called a path, that is unrelated to any of the above; it will be described as you encounter it in the Workbook, e.g., Section 9.8.8.

All of these path concepts are important to users of art. In addition to PATH itself, there are three PATH-like environment variables (colon-separated lists of directory names) that are particularly important:

LD_LIBRARY_PATH (Linux only) used by art to resolve dynamic libraries
DYLD_LIBRARY_PATH (OS X only) used by art to resolve dynamic libraries
PRODUCTS used by UPS to resolve external products
FHICL_FILE_PATH used by FHiCL to resolve #include directives.

When you source the scripts that set up your environment for art, these variables will be defined and additional colon-separated elements will be added to your PATH. To look at the value of PATH (or of the others), enter:

printenv PATH

To make the output easier to read by replacing all of the colons with newline characters, enter:

printenv PATH | tr : \\n

In the above line, the vertical bar is referred to as a pipe and tr is a standard Unix command. A pipe takes the output of the command to its left and makes that the input of the command to its right. The tr command replaces patterns of characters with other patterns of characters; in this case it replaces every occurrence of the colon character with the newline character. To learn why a double back slash is needed, read bash documentation to learn about escaping special characters.
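The same pipe works on any PATH-like variable. Here is a sketch with an invented variable, also showing how a new directory is typically prepended to such a list:

```shell
# A made-up PATH-like variable: a colon-separated list of directories.
DEMO_PATH="/usr/bin:/bin"

# Prepend a new directory, preserving the existing entries.
DEMO_PATH="/opt/tools/bin:$DEMO_PATH"

# Display one directory per line, as in the text.
echo "$DEMO_PATH" | tr : \\n
```

Prepending (rather than appending) matters: the shell searches the directories in order, so the earliest match wins.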

4.7 Scripts: Part 2

There are two ways to run a bash script (actually three, but two of them are the same). Suppose that you are given a bash script named file.sh. You can run any of these commands:

file.sh
source file.sh
. file.sh

The first version, file.sh, starts a new bash shell, called a subshell, and it executes the commands from file.sh in that subshell; upon completion of the script, control returns to the parent shell. At the startup of a subshell, the environment of that subshell is initialized to be a copy of the environment of its parent shell. If file.sh modifies its environment, then it will modify only the environment of the subshell, leaving the environment of the parent shell unchanged. This version is called executing the script.

The second and third versions are equivalent. They do not start a subshell; they execute the commands from file.sh in your current shell. If file.sh modifies any environment variables, then those modifications remain in effect when the script completes and control returns to the command prompt. This is called sourcing the script.

Some shell scripts are designed so that they must be sourced and others are designed so that they must be executed. Many shell scripts will work either way.

If the purpose of a shell script is to modify your working environment then it must be sourced, not executed. As you work through the Workbook exercises, pay careful attention to which scripts it tells you to source and which to execute. In particular, the scripts that set up your environment (the first scripts you will run) are bash scripts that must be sourced because their purpose is to configure your environment so that it is ready to run the Workbook exercises.

Some people adopt the convention that all bash scripts end in .sh; others adopt the convention that only scripts designed to be sourced end in .sh while scripts that must be executed have no file-type ending (no “.something” at the end). Neither convention is uniformly applied either in the Workbook or in HEP in general.

If you would like to learn more about bash, some references are listed in Section 4.10.

4.8 bash Functions and Aliases

The bash shell also has the notion of a bash function. Typically bash functions are defined by sourcing a bash script; once defined, they become part of your environment and they can be invoked as if they were regular commands. The setup product “command” that you will sometimes need to issue, described in Chapter 7, is an example. A bash function is similar to a bash script in that it is just a collection of bash commands that are accessible via a name; the difference is that bash holds the definition of a function as part of the environment, whereas it must open a file every time that a bash script is invoked.

You can see the names of all defined bash functions using:

declare -F

The bash shell also supports the idea of aliases; this allows you to define a new command in terms of other commands. You can see the definition of all aliases using:

alias

You can read more about bash shell functions and aliases in any standard bash reference.
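
As a sketch (the names greet and ll are invented for illustration), a function and an alias might be defined like this:

```shell
# A bash function: a named collection of commands held in the environment.
greet() {
  echo "Hello, $1!"
}
greet Workbook        # prints: Hello, Workbook!

# An alias: a simple textual substitution for a command.
alias ll='ls -l'
```

Note that in a non-interactive script, bash does not expand aliases unless shopt -s expand_aliases is in effect; functions have no such restriction, which is one reason they are generally preferred.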

When you type a command at the command prompt, bash will resolve the command using the following order:

  1. Is the command a known alias?
  2. Is the command a bash keyword, such as if or declare?
  3. Is the command a shell function?
  4. Is the command a shell built-in command?
  5. Is the command found in $PATH?

To learn how bash will resolve a particular command, enter:

type command-name
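
For example (the exact wording of the output varies with your shell and configuration):

```shell
type cd       # cd is a shell builtin
type if       # if is a shell keyword
type ls       # e.g. "ls is /bin/ls", or an alias if one is defined
```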

4.9 Login Scripts

When you first log in to a computer running the Unix operating system, the system will look for specially named files in your home directory; these are scripts that set up your working environment. If it finds these files, it will source them before you first get a shell prompt. As mentioned in Section 4.5, these scripts modify your PATH and define bash functions, aliases and environment variables. All of these become part of your environment.

When your account on a Fermilab computer was first created, you were given standard versions of the files .profile and .bashrc; these files are used by bash. You can read about login scripts in any standard bash reference. You may add to these files but you should not remove anything that is present.

If you are working on a non-Fermilab computer, inspect the login scripts to understand what they do.

It can be useful to inspect the login scripts of your colleagues to find useful customizations.

If you read generic Unix documentation, you will see that there are other login scripts with names like .login, .cshrc and .tcshrc. These are used by the csh family of shells and are not relevant for the Workbook exercises, which require the bash shell.

4.10 Suggested Unix and bash References

The following cheat sheet provides some of the basics:


A more comprehensive summary is available from:

Information about writing bash scripts and using bash interactive features can be found in:

The first of these is a compact introduction and the second is more comprehensive.

The above guides were all found at the Linux Documentation Project:

Books about Unix are numerous, of course. Examples include Mark Sobell’s A practical guide to the UNIX system and Graham Glass’ UNIX for programmers and users: a complete guide, both of which are in the Fermilab library along with many others (http://ccd.fnal.gov/library/).

Chapter 5
Site-Specific Setup Procedure

Section 4.5 discussed the notion of a working environment on a computer. This chapter answers the question: How do I make sure that my environment is configured so that I can run the Workbook exercises or my experiment’s code?

This chapter will explain how to do this in several different situations:

  1. If you are logged in to one of your experiment’s computers.
  2. If you are logged in to one of the machines supported for the August 2015 art/LArSoft course; at this writing there are two machines named alcourse.fnal.gov and alcourse2.fnal.gov; more may be added.
  3. If you install art and its tool chain to your own computer.

On every computer that hosts the Workbook, a procedure must be established that every user is expected to follow once per login session. In most cases (NOνA being a notable exception), the procedure involves only sourcing a shell script (recall the discussion in Section 4.7). In this documentation, we refer to this procedure as the “site-specific setup procedure.” It is the responsibility of the people who maintain the Workbook software for each site(γ) to ensure that this procedure does the right thing on all the site’s machines.

As a user of the Workbook, you will need to know what the procedure is and you must remember to follow it each time that you log in.

For all of the Intensity Frontier experiments at Fermilab, the site-specific setup procedure defines all of the environment variables that are necessary to create the working environment for either the Workbook exercises or for the experiment’s own code.

Table 5.1 lists the site-specific setup procedure for each experiment. You will follow the procedure when you get to Section 9.6.



Experiment Site-Specific Setup Procedure

ArgoNeut See the instructions for MicroBoone

Darkside source /ds50/app/ds50/ds50.sh

LArIAT Will be available in a future release of the workbook

DUNE source /grid/fermiapp/lbne/software/setup_lbne.sh

MicroBoone source /cvmfs/uboone.opensciencegrid.org/products/setup_uboone.sh
also, the following will work on the Fermilab site
source /grid/fermiapp/products/uboone/setup_uboone.sh

Muon g-2 source /grid/fermiapp/gm2/setup

Mu2e setup mu2e

NOνA See Listing 5.1

art LArSoft Course source /products/course_setup.sh

Private machine See Appendix B


NOνA users should check that their login scripts do not set up any of the UPS products related to art. Remove any lines that do; then log out and log in again. In particular, make sure that nothing in your login scripts, either directly or indirectly, executes the following line:

source /grid/fermiapp/nova/novaart/novasvn/srt/srt.sh

Once you have a clean login, follow the procedure given in Listing 5.1.

Listing 5.1: NOvA setup procedure
source /nusoft/app/externals/setups
export PRODUCTS=$PRODUCTS:/grid/fermiapp/products/common/db
export ART_WORKBOOK_WORKING_BASE=/nova/app/users
export ART_WORKBOOK_QUAL=s12:e7:nu
export ART_WORKBOOK_OUTPUT_BASE=/nova/app/users

Chapter 6
Get your C++ up to Speed

6.1 Introduction

There are two goals for this chapter. The first is to provide an overview of the features of C++ that will be important for users of art, especially those features that will be used in the Workbook exercises. It does not attempt to cover C++ comprehensively.

You will need to consult standard documentation to learn about any of the features with which you are not already familiar. The examples and exercises in this chapter will in many cases only skim the surface of the C++ features that you will need as you work through the Workbook exercises and then use C++ code with art in your own work.

The second goal is to explain the process of turning source code files into an executable program. The two steps in this process are compiling and linking. In informal writing, the word build is sometimes used to mean just compiling or just linking, but usually it refers to the two together.

This chapter is designed around a handful of exercises, each of which you will first build and run, then “pick apart” to understand how the results were obtained.

6.2 File Types Used and Generated in C++ Programming

A typical program consists of many source code files, each of which contains a human-readable description of one or more components of the program. In the Workbook, you will see source code files written in the C++ computer language; these files have names that end in .cc. In C++, there is a second sort of source code file, called a header file. These typically have names that end in .h; in most cases, but not all, a source file has an associated header file with the same base name but with a different suffix. A header file can be thought of as the “parts list” for its corresponding source file; you will see how these are used in Section 6.5.

In the compilation step each source file is translated into machine code, also called binary code or object code, which is a set of instructions, in the computer’s native language, to do the tasks described by the source code. The output of the compilation step is called an object file; in the examples you will see in the Workbook, object files always end in .o. But an object file, by itself, is not an executable program. It is not executable primarily because it lacks the instructions that tell the operating system how to start executing the instructions in the file.

It is often convenient to collect related groups of object files and put them into libraries. There are two kinds of library files, static libraries and dynamic libraries. Static libraries are not used by art, and we do not discuss them further; when this document refers to a library, it means a dynamic library. Putting many object files into a single library allows you to use them as a single coherent entity. We will defer further discussion of libraries until more background information has been provided.

The job of the linking step is to read the information found in the various libraries and object files and form them into either a dynamic library or an executable program. When you run the linker, you tell it the name of the file it is to create. It is a common, but not universal, practice that the filename of an executable program has no extension (i.e., no .something at the end). Dynamic libraries on Linux typically have the extension .so, and on OS X they typically have the extension .dylib.

After the linker has finished, you can run your executable program by typing the filename of the program at the bash command prompt. If the current directory is not on your PATH, you need to preface the filename of the program with ./ (a dot followed by a forward slash). At this point, the loader performs the final steps of linking the program, allowing it to use the instructions and data in the dynamic libraries to which it is linked.

A typical program links both to libraries that were built from the program’s source code and to libraries from other sources. Some of these other libraries might have been developed by the same programmer as general purpose tools to be used by his or her future programs; other libraries are provided by third parties, such as art or your experiment. Many C++ language features are made available to your program by telling the linker to use libraries provided by the C++ compiler vendor. Other libraries are provided by the operating system.

Now that you know about libraries, we can give a second reason why an object file, by itself, is not an executable program: until it is linked, it does not have access to the functions provided by any of the external libraries. Even the simplest program will need to be linked against some of the libraries supplied by the compiler vendor and by the operating system.

The list of all of the libraries and object files that you give to the linker is called the link list.

6.3 Establishing the Environment

6.3.1 Initial Setup

To start these exercises for the first time, do the following:


[Listing of commands not reproduced here.]

After these steps, you are ready to begin the exercise in Section 6.4.

6.3.2 Subsequent Logins

If you log out and log back in again, reestablish your environment by following these steps:

[Listing of commands not reproduced here.]


6.4 C++ Exercise 1: Basic C++ Syntax and Building an Executable

6.4.1 Concepts to Understand

This section provides a program that illustrates the concepts in C++ that are assumed knowledge for the Workbook material. Brief explanations are provided, but in many cases you will need to consult other sources to gain the level of understanding that you will need. Several C++ references are listed in Section 6.9.

This sample program will introduce you to the following C++ concepts and features:

  • how to indicate comments
  • what is a main program
  • how to compile, link and run the main program
  • how to distinguish between source, object, library, and executable files
  • how to print to standard output, std::cout
  • what is a type
  • how to declare and define variables(γ) of some of the frequently used built-in types: int, float, double, bool
  • the {} initializer syntax (in addition to other forms)
  • assignment of values to variables
  • what are arrays, and how to declare and define them
  • several forms of looping
  • comparisons: ==, !=, <, >, >=, <=
  • if and if-else
  • what are pointers, and how to declare and define them
  • what are references, and how to declare and define them
  • std::string (a type from the C++ Standard Library (std(γ))
  • what is the class template from the standard library, std::vector<T>

The above list explicitly does not include classes, objects and inheritance, which will be discussed in Section 6.7 and in a future section on inheritance.
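
As a taste of several of these features, here is a small sketch written for this documentation (it is not the Workbook's t1.cc; the function names are invented):

```cpp
#include <string>
#include <vector>

// Build a vector of the first n squares, using a loop, the {}
// initializer syntax, and the std::vector<T> class template.
std::vector<int> firstSquares(int n) {
  std::vector<int> v{};             // an empty vector, brace-initialized
  for (int i = 0; i != n; ++i) {    // a for loop with a comparison
    v.push_back(i * i);
  }
  return v;
}

// Pick a label using the bool type and an if-else statement.
std::string signLabel(double x) {
  bool negative = (x < 0.0);
  if (negative) {
    return "negative";
  } else {
    return "non-negative";
  }
}
```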

6.4.2 How to Compile, Link and Run

In this section you will learn how to compile, link and run the small C++ program that illustrates the features of C++ that are considered prerequisites to the Workbook exercises.

Run the following procedure. The idea here is for you to get used to the steps and see what results you get. Then in Section 6.4.3 you will examine the source file and output.

To compile, link and run the sample C++ program, called t1:

[Listing of build-and-run commands not reproduced here.]

Just to see how the exercise was built, look at the script BasicSyntax/v1/build that you ran to compile and link t1.cc; the following command was issued:

c++ -Wall -Wextra -pedantic -std=c++11 -o t1 t1.cc

This turned the source file t1.cc into an executable program, named t1 (the argument to the -o (for “output”) option). We will discuss compiling and linking in Section 6.5.

6.4.3 Discussion

Look at the file t1.cc, in particular the relationship between the lines in the program and the lines in the output, and see how much you understand. Remember, you will need to consult standard documentation to learn about any of the features that you are not already familiar with; some are listed in Section 6.9. Note that some questions may be answered in Section 6.4.3.

In the source file, it is important to first point out the function called the main program. Every program needs one, and execution of the program takes place within the braces of this function, which is written

int main() {
     ...executable code...
}

Compare your output with the standard example:

diff t1.log t1_example.log

There will almost certainly be a handful of differences, which we will discuss below.

The following sections correspond to sections of the code in BasicSyntax/v1/t1.cc and provide supplementary information.

Primitive types, Initialization and Printing Output

All variables, parameters, arguments, and so on in C++ need to have a type, e.g., int, float, bool, or another so-called primitive (or built-in) type, or a more complicated type defined by a class or structure. The code in this exercise introduces the primitive types.

Now, about the handful of differences in the output of one run versus another. There are two main sources of the differences: (1) an uninitialized variable and (2) variation in object addresses from run to run.

In t1.cc, the line int k; declares that k is a variable whose type is int but it does not initialize the variable. Therefore the value of the variable k is whatever value happened to be sitting in the memory location that the program assigned to k. Each time that the program runs, the operating system will put the program into whatever region of memory makes sense to the operating system; therefore the address of any variable, and thus the value returned, may change unpredictably from run to run.

This line is also the source of the warning message produced by the build script. This line was included to make it clear what we mean by initialized variables and uninitialized variables. Uninitialized variables are frequent sources of errors in code and therefore you should always initialize your variables. In order to help you PIC establish this good coding habit, the remaining exercises in this series and in the Workbook include the compiler option -Werror. This tells the compiler to promote warning messages to error level and to stop compilation without producing an output file.
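
For reference, here are several ways to initialize a variable so that it never holds an unpredictable value (a small sketch; the function name is invented):

```cpp
// Each variable below starts life with a well-defined value.
int initializedSum() {
  int a = 1;    // copy initialization
  int b(2);     // direct initialization
  int c{3};     // brace initialization (the {} syntax, new in C++11)
  int d{};      // value initialization: d is guaranteed to be zero
  return a + b + c + d;   // always 6, on every run
}
```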

See Section for other output that may vary between program runs.

Arrays

The next section of the example code introduces arrays, sometimes called C-style arrays to distinguish them from std::array, a class template element of the C++ Standard Library. Classes will be discussed in Section 6.7, and class templates will first be used in Chapter 6.7.

While you might find use of arrays in existing code, we recommend avoiding them in new code, using either std::vector or std::array instead. See Section for an introduction to these types.

Equality testing

Two variables which refer to different objects that contain the same value (either by design or by coincidence) are equal. Equality is tested using the equality testing operator, ==. It is important to distinguish between the assignment operator (=) and the equality testing operator. Using = where == is intended is a common mistake.
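
A sketch of the distinction (the function name is invented):

```cpp
// == asks a question and yields a bool; = changes its left-hand side.
bool assignmentVsEquality() {
  int i = 3;                     // assignment: i now holds 3
  int j = 4;
  bool equalBefore = (i == j);   // equality test: false, since 3 != 4
  j = i;                         // assignment: j now also holds 3
  bool equalAfter = (i == j);    // equality test: true
  return !equalBefore && equalAfter;
}
```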

Another distinction to be made is that of two variables being identical versus equal. In contrast to equality, two variables are identical if they refer to the same object, and thus have the same memory address. How can two variables be identical? One common case can be seen in the call to a function like maxSize:

#include <algorithm>
#include <string>

std::size_t maxSize(std::string const& a,
                    std::string const& b) {
  return std::max(a.size(), b.size());
}
If we consider the call:

std::string s("cow");
auto sz = maxSize(s, s);

then, in the body of the function maxSize, and for this call, the variables a and b refer to the same object—so they are identical.

Conditionals

The primary conditional statements in C++ are if and if-else. There is also the ternary operator, ?:. The ternary operator, which evaluates an expression as true or false then chooses a value based on the result, can replace more cumbersome code with a single line. It is especially useful in places where an expression, not a statement, is needed. It works this way (shown here on three lines to emphasize the syntax):

// Note: this is pseudocode, not C++
type variable-to-initialize = (expression-to-evaluate) ?
                                      value-if-true :
                                      value-if-false;
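
A concrete (invented) instance of the pattern:

```cpp
#include <string>

// The ternary operator supplies an expression, so it can appear
// directly in an initializer.
std::string parityLabel(int n) {
  std::string label = (n % 2 == 0) ? "even" : "odd";
  return label;
}
```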

An example is shown in the code.

Some C++ Standard Library Types

The C++ Standard Library is quite large, and contains many classes, functions, class templates, and function templates. Our sample code introduces only three: the class std::string, and the class templates std::vector<T> and std::array.

A std::vector<T> behaves much like an array of objects of some type T (e.g., int or std::string). It has the extra capability that its size can change as needed, unlike a C-style array, whose size is fixed.
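
A sketch of this capability, together with the fixed-size std::array for contrast (the function names are invented):

```cpp
#include <array>
#include <cstddef>
#include <vector>

// A std::vector grows as elements are added.
std::size_t grownSize() {
  std::vector<int> v;    // starts empty: size() == 0
  v.push_back(10);
  v.push_back(20);
  v.push_back(30);
  return v.size();       // now 3
}

// A std::array has a size fixed at compile time, but unlike a
// C-style array it knows that size and can be copied.
std::size_t fixedSize() {
  std::array<int, 4> a{1, 2, 3, 4};
  return a.size();       // always 4
}
```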

The std::array is new in C++11, and should be used in preference to the older C-style array discussed in Section due to its greater capabilities. Unlike the C-style array, std::array knows its own size, can be copied, and can be returned from a function.

Pointers

A pointer is a variable whose value is the memory address of another object. The type of the pointer must correspond to the type of the object to which it points.

In addition to the sources of difference in the program output between runs discussed in Section, another stems from the line:

float* pa = &a;  

This line declares a variable pa and initializes it to be the memory address of the variable a. The variable a is of type float; therefore pa must be declared as type pointer(γ) to float.

Note that this line could have been written with the asterisk next to pa:

float *pa = &a;  

This latter style is common in the C community. In the C++ community, the former style is preferred, because it emphasizes the type of the variable pa, rather than the type of the expression *pa.

Since the address may change from run to run, so may the printout that starts pa =.

The next line,

std::cout << "*pa = " << *pa << std::endl;

shows how to access the value to which a pointer points. This is called dereferencing the pointer, or following the pointer, and is done with the dereferencing operator, *. The expression *pa dereferences the pointer pa, yielding the value of the object to which pa points, in this case, the value of a.

In Section 6.7 you will learn about classes. One attribute of classes is that they have separately addressable parts, called members. Members of a class are selected using syntax like classname.membername. The combination of dereferencing a pointer and selecting a member of the pointed-to object (the pointee) can be done in two steps, first dereferencing then selecting, or in one step using the member selection operator, operator->(). The following two expressions are equivalent:

(*p).membername
p->membername

In the example code, the lines

std::cout << "The size of animal is: "
          << (*panimal).size() << std::endl;
std::cout << "The size of animal is: "
          << panimal->size() << std::endl;

do exactly the same thing. Note that the parentheses in the first line are necessary because the precedence of . is higher than that of *.

Note that in many situations, the compiler is free to convert an array-of-T into a pointer-to-T. In such cases, the value of the pointer-to-T is the address of the initial element in the array.

References

A reference is a variable that acts as an alias for another object, and it cannot be re-seated to refer to a different object. It is not an object itself, and thus a reference does not have an address of its own. The address-of operator, operator&, when used on a reference, yields the address of the referent:

float  a; 
float& ra = a; 
float* p = &a; 
float* q = &ra;  

The values of p and q will be the same. Because they print memory addresses, the lines in the printout that start &a = and &ra = may also change from run to run.

Loops

Loops, also called iteration statements, appear in several forms in C++. The most prevalent is the for loop. New in C++11 is the range-based for loop; this is the looping construction that should be preferred for cases to which it applies.
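
A sketch of the range-based for loop (the function name is invented):

```cpp
#include <vector>

// Visits every element of v in order; no index bookkeeping needed.
int sumAll(std::vector<int> const& v) {
  int sum = 0;
  for (int x : v) {
    sum += x;
  }
  return sum;
}
```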

6.5 C++ Exercise 2: About Compiling and Linking

6.5.1 What You Will Learn

In the previous exercise, the user code was found in a single file and the build script performed compiling and linking in a single step. For all but the smallest programs, this is not practical. It would mean, for example, that you would need to recompile and relink everything when you made even the smallest change anywhere in the code; generally this would take much too long. To address this, some computer languages, including C++, allow you to break up a large program into many smaller files and rebuild only a small subset of files when you make changes in one.

There are two exercises in this section. In the first one the source code consists of three files. This example has enough richness to discuss the details of what happens during compiling and linking, without being overwhelming. The second exercise introduces the idea of libraries.

6.5.2 The Source Code for this Exercise

The source code for this exercise is found in Build/v1, relative to your working directory. The relevant files are t1.cc, times2.cc and times2.h. Open these files and read along with the discussion below.

The file t1.cc contains the source code for the function main for this exercise. Every C++ program must have one and only one function named main, which is where the program actually starts execution. Note that the term main program sometimes refers to this function, but other times refers to the source file that contains it. In either case, main program refers to this function, either directly or indirectly. For more information, consult any standard C++ reference. The file times2.h is a header file that declares a function named times2. The file times2.cc is another source code file; it provides the definition of that function.

Look at t1.cc: it both declares and defines the program’s function main, with the signature int main(): it takes no arguments, and returns an int. A function with this signature(γ) has special meaning to the compiler and the linker: they recognize it as a C++ main program. There are other signatures that the compiler and linker will recognize as a C++ main program; consult the standard C++ documentation.

To be recognized as a main program, there is one more requirement: main must be declared in the global namespace.

The body of the main program (between the braces) declares and defines a variable a of type double and initializes it to the value 3.0; it prints out the value of a. Then it calls a function that takes a as an argument and prints out the value returned by that function.

You, as the programmer using that function, need to know what the function does but the C++ compiler doesn’t. It only needs to know the name, argument list and return type of the function — information that is provided in the header file, times2.h. This file contains the line

double times2(double);  

This line is called the declaration(γ) of the function. It says (1) that the identifier times2 is the name of a function that (2) takes an argument of type double (the “double” inside the parentheses) and (3) returns a value of type double (the “double” at the start of the line). The file t1.cc includes this header file, thereby giving the compiler the three pieces of information it needs to know about the function.

The other three lines in times2.h make up an include guard, described in Appendix F. In brief, they deal with the following scenario: suppose that we have two header files, A.h and B.h, and that A.h includes B.h; there are many scenarios in which it makes good sense for a third file, either .h or .cc, to include both A.h and B.h. The include guards ensure that, when all of the includes have been expanded, the compiler sees exactly one copy of B.h.
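
A sketch of the pattern (the macro name is illustrative; the real times2.h may spell it differently):

```cpp
// What an include guard in times2.h might look like:
#ifndef TIMES2_H
#define TIMES2_H

double times2(double);

#endif // TIMES2_H

// For this self-contained sketch only, the definition that would
// normally live in times2.cc:
double times2(double i) { return 2 * i; }
```

On the first expansion of the header, TIMES2_H is undefined, so the declaration is seen and the macro is defined; any later expansion in the same translation unit is skipped.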

Finally, the file times2.cc contains the source code for the function named times2:

double times2(double i) {
  return 2 * i;
}
It names its argument i, multiplies this argument by two and returns that value. This code fragment is called the definition of the function or the implementation(γ) of the function. (The C++ standard uses the word definition but implementation is in common use.)

We now have a rich enough example to discuss a case in which the same word is frequently used for two different things — instead of two words used for the same thing.

Sometimes people use the phrase “the source code of the function named times2” to refer collectively to both times2.h and times2.cc; sometimes they use it to refer exclusively to times2.cc. Unfortunately the only way to distinguish the two uses is from context.

The phrase header file always refers unambiguously to the .h file. The term implementation file is used to refer unambiguously to the .cc file. This name follows from its contents: it describes how to implement the items declared in the header file.

Based on the above description, when this exercise is run, we expect it to print out:

  a =       3 
  times2(a) 6

6.5.3 Compile, Link and Run the Exercise

To perform this exercise, first log in and cd to your working directory if you haven’t already, then

[Listing of commands not reproduced here.]

This matches the expected printout.

Look at the file build that you just ran. It has three steps:

  1. It compiles the main program, t1.cc, into the object file (with the default name) t1.o (which will now be the thing that the term main program refers to):
    c++ -Wall -Wextra -pedantic -Werror -std=c++11 -c t1.cc
  2. It (separately) compiles times2.cc into the object file times2.o:
    c++ -Wall -Wextra -pedantic -Werror -std=c++11 -c times2.cc
  3. It links t1.o and times2.o (and some system libraries) to form the executable program t1 (the name of the main program is the argument of the -o option):
    c++ -std=c++11 -o t1 t1.o times2.o

You should have noticed that the same command, c++, is used both for compiling and linking. The full story is that when you run the command c++, you are actually running a program that parses its command line to determine which, if any, files need to be compiled and which, if any, files need to be linked. It also determines which of its command line arguments should be forwarded to the compiler and which to the linker. It then runs the compiler and linker as many times as required.

If the -c option is present, it tells c++ to compile only, and not to link. If -c is specified, the source file(s) to compile must also be specified. Each of the files will be compiled to create its corresponding object file and then processing stops. In our example, the first two commands each compile a single source file. Note that if any object files are given on the command line, c++ will issue a warning and ignore them.

The third command (with no -c option) is the linking step. Even if the -c option is missing, c++ will first look for source files on the command line; if it finds any, it will compile them and put the output into temporary object files. In our example, there are none, so it goes straight to linking. The two just-created object files are specified (at the end, here, but the order is not important); the -o t1 portion of the command tells the linker to write its output (the executable) to the file t1.

As it is compiling the main program, t1.cc, the compiler recognizes every function that is defined within the file and every function that is called by the code in the file. It recognizes that t1.cc defines a function main and that main calls a function named times2, whose definition is not found inside t1.cc. At the point that main calls times2, the compiler will write to t1.o all of the machine code needed to prepare for the call; it will also write all of the machine code needed to use the result of times2. In between these two pieces, the compiler will write machine code that says “call the function whose memory address is” but it must leave an empty placeholder for the address. The placeholder is empty because the compiler does not know the memory address of that function.

The compiler also makes a table that lists all functions defined by the file and all functions that are called by code within the file. The name of each entry in the table is called a linker symbol and the table is called a symbol table. When the compiler was compiling t1.cc and it found the definition of the main program, it created a linker symbol for the main program and added a notation to say that this file contains the definition of that symbol. When the compiler was compiling t1.cc and it encountered the call to times2, it created a linker symbol for this function; it marked this symbol as an undefined reference (because it could not find the definition of times2 within t1.cc). The symbol table also lists all of the places in the machine code of t1.o that are placeholders that must be updated once the memory address of times2 is known. In this example there is only one such place.

When the compiler writes an object file, it writes out both the compiled code and the table of linker symbols.

The symbol table in the file times2.o is simple; it says that this file defines a function named times2 that takes a single argument of type double and that returns a double. The name in the symbol table encodes not only the function name, but also the number and types of the function arguments. These are necessary for overload resolution(γ).
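You can inspect these symbol tables yourself with the Unix nm utility. The mangled name shown below, which encodes the argument type needed for overload resolution, assumes the Itanium C++ ABI used by GCC and Clang on Linux; on other platforms the exact spelling may differ.

```shell
# Compile both sources to object files (as the build script does).
c++ -Wall -Wextra -pedantic -Werror -std=c++11 -c t1.cc times2.cc

# In t1.o the symbol for times2(double) is an undefined reference ('U');
# in times2.o the same symbol is defined in the code section ('T').
nm t1.o | grep times2
nm times2.o | grep times2

# c++filt translates a mangled linker symbol back into a C++ signature.
echo _Z6times2d | c++filt
```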

The job of the linker (also invoked by the command c++) is to play match-maker. First it inspects the symbol tables inside all of the object files listed on the command line and looks for a linker symbol that defines the location of the main program. If it cannot find one, or if it finds more than one, it will issue an error message and stop. In this example:

  1. The linker will find the definition of a main program in t1.o.
  2. It will start to build the executable (output) file by copying the machine code from t1.o to the output file.
  3. Then it will try to resolve the unresolved references listed in the symbol table of t1.o; it does this by looking at the symbol tables of the other object files on the command line. It also knows to look at the symbol tables from a standard set of compiler-supplied and system-supplied libraries.
  4. It will discover that times2.o resolves one of the external references from t1.o. So it will copy the machine code from times2.o to the executable file.
  5. It will discover that the other unresolved references in t1.o are found in the compiler-supplied dynamic libraries. It will put into the executable the necessary information to resolve these references at the time when the program is run.
  6. Once all of the machine code has been copied into the executable, the linker knows the memory address of every function, or where to find it at run-time. The linker can then go into the machine code, find all of the placeholders and update them with the correct memory addresses.

Sometimes resolving one unresolved reference will generate new ones. The linker iterates until (a) all references are resolved and no new unresolved references appear (success) or (b) the same unresolved references continue to appear (error). In the former case, the linker writes the output to the file specified by the -o option; if no -o option is specified the linker will write its output to a file named a.out. In the latter case, the linker issues an error message and does not write the output file.

After the link completes, the files t1.o and times2.o are no longer needed because everything that was useful from them was copied into the executable t1. You may delete the object files, and the executable will still run.

6.5.4 Alternate Script build2

The script build2 shows an equivalent way of building t1 that is commonly used for small programs; it does it all on one line. To exercise this script:

[Terminal listing omitted.]

Look at the script build2; it contains only one command:

c++ -Wall -Wextra -pedantic -Werror -std=c++11 -o t1 t1.cc times2.cc

This script automatically does the same operations as build but it knows that the object files are temporaries. Perhaps the command c++ kept the contents of the two object files in memory and never actually wrote them out as disk files. Or, perhaps, the command c++ did explicitly create disk files and deleted them when it was finished. In either case you don’t see them when you use build2.

6.5.5 Suggested Homework

It takes a bit of experience to decipher the error messages issued by a C++ compiler. The three exercises in this section are intended to introduce you to them so that you (a) get used to looking at them and (b) understand these particular errors if/when you encounter them later.

Each of the following three exercises is independent of the others. Therefore, when you finish with each exercise, you will need to undo the changes you made in the source file(s) before beginning the next exercise.

  1. In Build/v1/t1.cc, comment out the include directive for times2.h; rebuild and observe the error message.
  2. In Build/v1/times2.cc, change the return type to float; rebuild and observe the error message.
  3. In Build/v1/t1.cc, change double a = 3. to float a = 3.; rebuild and run. This will work without error and will produce the same output as before.

The first homework exercise will issue the diagnostic:

t1.cc: In function 'int main()': 
t1.cc:9:40: error: 'times2' was not declared in this scope

When you see a message like this one, you can guess that either you have not included a required header file or you have misspelled the name of the function.

The second homework exercise will issue the diagnostic (second and last lines split into two here):

times2.cc: In function 'float times2(double)': 
times2.cc:3:22: error: new declaration 'float times2(double)' 
 float times2(double i) { 
In file included from times2.cc:1:0: 
times2.h:4:8: error: ambiguates old declaration 'double times2(double)' 
 double times2(double); 
        ^

This error message says that the compiler has found two functions that have the same signature but different return types. The compiler does not know which of the two functions you want it to use.

The bottom line here is that you must ensure that the definition of a function is consistent with its declaration; and you must ensure that the use of a function is consistent with its declaration.

The third homework exercise illustrates the C++ idea of implicit (type) conversion; in this case the compiler will make a temporary variable of type double and set its value to that of a, as if the code included:

double tmp = a; 
std::cout << times2(tmp) << std::endl;

Consult the standard C++ documentation to understand when implicit type conversions may occur; see Section 6.9.

6.6 C++ Exercise 3: Libraries

Multiple object files can be grouped into a single file known as a library, obviating the need to specify each and every object file when linking; you can reference the libraries instead. This simplifies the reuse and sharing of software components. Libraries were introduced in Section 6.1; here we introduce the building of libraries.

6.6.1 What You Will Learn

In this section you will repeat the example of Section 6.5 with a variation. You will create a library from times2.o and use that library in the link step. This pattern generalizes easily to the case that you will encounter in your experiment software, where object libraries will typically contain many object files.

6.6.2 Building and Running the Exercise

To perform this exercise, do the following:

[Terminal listing and printout omitted.]

This matches the expected printout. Now let’s look at the script build. It has four parts which do the following things:

  1. Compiles times2.cc; the same as the previous exercise:
    c++ -Wall -Wextra -pedantic -Werror -std=c++11 -c times2.cc
  2. Creates the library named libpackage1.so (on OS X, the standard suffix for a dynamic library is different, and the library is called libpackage1.dylib) from times2.o:
    c++ -o libpackage1.so -shared times2.o
    c++ -o libpackage1.dylib -shared times2.o
    Note that the name of the library must come before the name of the object file. The flag -shared directs the linker to create a dynamic library rather than an executable image; without this flag, this command would produce an error complaining about the lack of a main function.
  3. Compiles t1.cc; the same as the previous exercise:
    c++ -Wall -Wextra -pedantic -Werror -std=c++11 -c t1.cc
  4. Links the main program against the dynamic library (either libpackage1.so or libpackage1.dylib) and, implicitly, the necessary system libraries:
    c++ -o t1 t1.o libpackage1.so
    c++ -o t1 t1.o libpackage1.dylib

Note that from this point on, in order to reduce the verbosity of some library descriptions, we will use the Linux form of library names (e.g. libpackage1.so). If you are working on OS X, you will need to translate all these to the OS X form (e.g. libpackage1.dylib).

The two new features are in step 2, which creates the dynamic library, and step 4, in which times2.o is replaced in the link list with the dynamic library. If you have many object files to add to the library, you may add them one at a time by repeating step 2 or you may add them all in one command. When you do the latter you may name each object file separately or may use a wildcard:

c++ -o libpackage1.so -shared *.o

In the filename libpackage1.so the string package1 has no special meaning; it was an arbitrary name chosen for this exercise.

The other parts of the name, the prefix lib and the suffix .so, are part of a long-standing Unix convention. Some Unix tools presume that libraries are named following this convention, so you should always follow it. The use of this convention is illustrated by the scripts build2 and build3.

To perform the exercise using build2, stay in the same directory and clean up and rebuild as follows:

[Terminal listings omitted.]

The only difference between build and build2 is the link line. The version from build is:

c++ -o t1 t1.o libpackage1.so

while that from build2 is:

c++ -o t1 t1.o -L. -lpackage1

In the script build, the path to the library, relative or absolute, is written explicitly on the command line. In the script build2, two new elements are introduced. The command line may contain any number of -L options; the argument of each option is the name of a directory. The ensemble of all of the -L options forms a search path to look for named libraries; the path is searched in the order in which the -L options appear on the line. The names of libraries are specified with the -l options (this is a lower case letter L, not the numeral one); if a -l option has an argument of XXX (or package1), then the linker will search the path defined by the -L options for a file with the name libXXX.so (or libpackage1.so).

In the above, the dot in -L. is the usual Unix pathname that denotes the current working directory. It is important that there be no whitespace between a -L or a -l option and its value.

This syntax generalizes to multiple libraries in multiple directories as follows. Suppose that the libraries libaaa.so, libbbb.so and libccc.so are in the directory L1 and that the libraries libddd.so, libeee.so and libfff.so are in the directory L2. In this case, the link list would look like:

-Lpath-to-L1 -laaa -lbbb -lccc -Lpath-to-L2 -lddd -leee -lfff

The -L -l syntax is in common use throughout many Unix build systems: if your link list contains many object libraries from a single directory then it is not necessary to repeatedly specify the path to the directory; once is enough. If you are writing link lists by hand, this is very convenient. In a script, if the path name of the directory is very long, this convention makes a much more readable link list.

To perform the exercise using build3, stay in the same directory and clean up and rebuild as follows:

[Terminal listings omitted.]

The difference between build2 and build3 is that build3 compiles the main program and links it all on one line instead of two.

6.7 Classes

6.7.1 Introduction

The comments in the sample program used in Section 6.4 emphasized that every variable has a type: int, float, std::string, std::vector<std::string>, and so on. One of the basic building blocks of C++ is that users may define their own types; user-defined types may be built-up from all types, including other user-defined types.

The language features that allow users to define types are the class(γ) and the struct(γ). As you work through the Workbook exercises, you will see classes that are defined by the Workbook itself; you will also see classes defined by the toyExperiment UPS product; you will see classes defined by art and you will see classes defined by the many UPS products that support art. You will also write some classes of your own. When you work with the software for your experiment you will work with classes defined within your experiment’s software.

Classes and structures (types introduced by either class or struct) are called user-defined types. std::string, etc., although defined by the Standard Library, are still called user-defined types.

In general, a class is specified by both a definition (that describes what objects of that class’s type consist of) and an implementation(γ) (that describes how the class works). The definition specifies the data that comprise each object of the class; these data are called data members or member data. The definition also specifies some functions (called member functions) that will operate on that data. It is legal for a class declaration (and therefore, a class) to contain only data or only functions. A class definition has the form shown in Listing 6.1.

class MyClassName { 
  // required: declarations of all members of the class 
  // optional: definitions of some members of the class 
};
Listing 6.1: Layout of a class.

class is a keyword that is reserved by C++ and may not be used for any user-defined identifier. This construct tells the C++ compiler that MyClassName is the name of a class; everything that is between the braces is part of the class definition.

A class declaration (which you will rarely use) presents the name of the newly defined type, and states that the type is a class:

class MyClassName;  

Class declarations are rarely used because a class definition also acts as a class declaration.
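A declaration alone suffices wherever the compiler needs only the type's name, such as declaring a pointer or reference to it; the full definition is needed before any member is accessed. A small sketch (the free function is illustrative, not from the workbook):

```cpp
class Point;                 // declaration only: Point is a class

// Legal with only the declaration: a reference does not require
// the compiler to know the size or members of Point.
double firstCoordinate(const Point& p);

class Point {                // the definition, which is also a declaration
 public:
  double x;
  double y;
};

// Now that the definition is visible, members may be accessed.
double firstCoordinate(const Point& p) { return p.x; }
```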

The remainder of Section 6.7 will give many examples of members of a class.

In a class definition, the semi-colon after the closing brace is important.

The upcoming sections will illustrate some features of classes, with an emphasis on features that will be important in the first few Workbook exercises. This is not intended to be a comprehensive description of classes. To illustrate, we will show nine versions of a type named Point that represents a point in a plane. The first version will be simple and each subsequent version will add features.

This documentation will use technically correct language so that you will find it easier to read the standard reference materials. We will point out colloquial usage as necessary.

Note that the C++ Standard uses the phrase a type of class-type to mean a type that is either a class or a structure. In this document we will usually use class rather than the more formal a type of class-type; we will indicate when you need to distinguish between types that are classes and types that are structures.

6.7.2 C++ Exercise 4 v1: The Most Basic Version

Here you will see a very basic version of the class Point and an illustration of how Point can be used. The ideas of data members (sometimes called member data), objects and instantiation will be defined.

To build and run this example:

[Terminal listing omitted.]

The values printed out in the first line of the output may be different when you run the program (remember initialization?). When you look at the code you will see that p0 is not initialized and therefore contains unpredictable data. The last three lines of output may also differ when you run the program; they are memory addresses.

Look at the header file Point.h, reproduced in Listing 6.2, which shows the basic version of the class Point.

1  #ifndef Point_h 
2  #define Point_h 
3
4  class Point { 
5   public: 
6    double x; 
7    double y; 
8  };
9
10 #endif /* Point_h */
Listing 6.2: File Point.h with the simplest version of the class Point.

The three lines starting with # make up an include guard, described in Appendix F.

Line 4 introduces the name Point, and states that Point is a class.

The body of the class definition begins on line 4, with the opening brace; the body of the class definition ends on line 8, with the closing brace. The definition of the class is followed by a semicolon. Line 5 states that the following members of the class are public, which means they are accessible from code outside the class (that is, they are accessible not only by member functions of this class, but by member functions of other classes and by free functions). Members can also be private or protected. Section 6.7.7 addresses the meaning of private. The use of protected is beyond the scope of this introduction. Lines 6 and 7 declare the data members x and y, both of which are of type double.

In this exercise there is no file Point.cc because the class has no user-defined member functions to implement.

Look at the function main (the main program) in ptest.cc, reproduced in Listing 6.3. This function illustrates the use of the class Point.

1  #include "Point.h" 
2  #include <iostream> 
3
4  int main() { 
5    Point p0; 
6    std::cout << "p0: (" << p0.x << ", " << p0.y << ")" 
7              << std::endl; 
8
9    p0.x = 1.0; 
10   p0.y = 2.0; 
11   std::cout << "p0: (" << p0.x << ", " << p0.y << ")" 
12             << std::endl; 
13
14   Point p1; 
15   p1.x = 3.0; 
16   p1.y = 4.0; 
17   std::cout << "p1: (" << p1.x << ", " << p1.y << ")" 
18             << std::endl; 
19
20   Point p2 = p0; 
21   std::cout << "p2: (" << p2.x << ", " << p2.y << ")" 
22             << std::endl; 
23
24   std::cout << "Address of p0: " << &p0 << std::endl; 
25   std::cout << "Address of p1: " << &p1 << std::endl; 
26   std::cout << "Address of p2: " << &p2 << std::endl; 
27
28   return 0; 
29 }
Listing 6.3: The contents of v1/ptest.cc.

ptest.cc includes Point.h so that the compiler will know about the class Point. It also includes the Standard Library header <iostream> which enables printing with std::cout.

When the first line of code in the main function,

 Point p0;  

is executed, the program will ensure that memory has been allocated to hold the data members of p0. If the class Point contained code to initialize data members then the program would also run that, but Point does not have any such code. Therefore the data members take on whatever values happened to preexist in the memory that was allocated for them.

Some other standard pieces of C++ nomenclature can now be defined:

  1. The identifier p0 refers to an object in memory. Recall that the C++ meaning of object is a region of memory.
  2. The type of this identifier is Point. The compiler uses this type to interpret the bytes stored in the object.
  3. When the running program executes line 5 of the main program, it constructs(γ) the object(γ) named by the identifier p0.
  4. The object associated with the identifier p0 is an instance(γ) of the class Point.

An important take-away from the above is that a variable is an identifier in a source code file that refers to some object, while an object is something that exists in the computer memory. Most of the time a one-to-one correspondence exists between variables in the source code and objects in memory. There are exceptions, however: for example, sometimes a compiler needs to make anonymous temporary objects that do not correspond to any variable in the source code, and sometimes two or more variables in the source code can refer to the same object in memory.

We have now seen multiple meanings for the word object:

  1. An object is a file containing machine code, the output of a compiler.
  2. An object is a region of memory.
  3. An object is an instance of a class.

Which is meant must be determined from context. In this Workbook, we will use “class instance” rather than “object” to distinguish between the second and third meanings in any place where such differentiation is necessary.

The last section of the main program (and of ptest.cc itself) prints the address of each of the three objects, p0, p1 and p2. The addresses are represented in hexadecimal (base 16) format. On almost all computers, the size of a double is eight bytes. Therefore an object of type Point will have a size of 16 bytes. If you look at the printout made by ptest you will see that the addresses of p0, p1 and p2 are separated by 16 bytes; therefore the three objects are contiguous in memory.

Figure 6.1 shows a diagram of the computer memory at the end of running ptest; the outer box (blue outline) represents the memory of the computer; each filled colored box represents one of the three class instances in this program. The diagram shows them in contiguous memory locations, which is not necessary; there could have been gaps between the memory locations.



Now, for a bit more terminology: each of the objects referred to by the variables p0, p1 and p2 has the three attributes required of an object:

  1. a state(γ), given by the values of its data members;
  2. the ability to have operations performed on it: e.g., setting/reading in value of a data member, assigning value of object of a given type to another of the same type;
  3. a unique address in memory, and therefore a unique identity.

6.7.3 C++ Exercise 4 v2: The Default Constructor

This exercise expands the class Point by adding a user-written default constructor(γ).

To build and run this example:

[Terminal build commands and expected printout omitted.]

When you run the code, all of the printout should match the above printout exactly.

Look at Point.h. There is one new line in the body of the class definition:

  Point();
The parentheses tell you that this new member is some sort of function. A C++ class may have several different kinds of functions.

A function that has the same name as the class itself has a special role and is called a constructor; if a constructor can be called with no arguments it is called a default constructor. In informal written material, the word constructor is sometimes written as c’tor.

Point.h declares that the class Point has a default constructor, but does not define it (i.e., provide an implementation). The definition (implementation) of the constructor is found in the file Point.cc.

Look at the file Point.cc. It #includes the header file Point.h because the compiler needs to know all about this class before it can compile the code that it finds in Point.cc. The rest of the file contains a definition of the constructor. The syntax Point:: says that the function to the right of the :: is part of (a member of) the class Point. The body of the constructor gives initial values to the two data members, x and y:

Point::Point() { 
  x = 0.; 
  y = 0.; 
}

Look at the program ptest.cc. The first line of the main function is again

Point p0;  

When the program executes this line, the first step is the same as before: it ensures that memory has been allocated for the data members of p0. This time, however, it also calls the default constructor of the class Point (declared in Point.h), which initializes the two data members (per Point.cc) such that they have well defined initial values. This is reflected in the printout made by the next line.

The next block of the program assigns new values to the data members of p0 and prints them out.

In the previous example, Classes/v1/ptest.cc, a few things happened behind the scenes that will make more sense now that you know what a constructor is.

  1. Since the source code for class Point did not contain any user-defined constructor, the compiler generated a default constructor for you; this is required by the C++ Standard and will be done for any class that has no user-written constructor.
  2. The compiler puts the generated constructor code directly into the object file; it does not affect the source file.
  3. The generated default constructor will default construct each data member of the class.
  4. Default construction of an object of a primitive type leaves that object uninitialized; this is why the data members x and y of version 1 of Point were uninitialized.

6.7.4 C++ Exercise 4 v3: Constructors with Arguments

This exercise introduces four new ideas:

  1. constructors with arguments,
  2. the copy constructor,
  3. the implicitly generated constructor,
  4. single-phase construction vs. two-phase construction.

To build and run this exercise, cd to the directory Classes/v3 and follow the same instructions as in Section 6.7.3. When you run the ptest program, you should see the following output:

  p0: (1, 2) 
  p1: (1, 2)

Look at the file Point.h. This contains one new line:

Point( double ax, double ay);  

This line declares a second constructor; we know it is a constructor because it is a function whose name is the same as the name of the class. It is distinguishable from the default constructor because its argument list is different than that of the default constructor. As before, the file Point.h contains only the declaration of this constructor, not its definition (implementation).

Look at the file Point.cc. The new content in this file is the implementation of the new constructor; it assigns the values of its arguments to the data members. The names of the arguments, ax and ay, have no meaning to the compiler; they are just identifiers. It is good practice to choose names that bear an obvious relationship to those of the data members. One convention that is sometimes used is to make the name of the argument be the same as that of the data member, but with a prefix letter a, for argument. Whatever convention you (or your experiment) choose(s), use it consistently. When you update code that was initially written by someone else, we strongly recommend that you follow whatever convention they adopted. Choices of style should be made to reinforce the information present in the code, not to fight it.

Look at the file ptest.cc. The first line of the main function is now:

Point p0(1.,2.);  

This line declares the variable p0 and initializes it by calling the new constructor defined in this section. The next line prints the value of the data members.

The next line of code

  Point p1(p0);  

uses the copy constructor. A copy constructor is used by code (like the above) that wants to create a copy (e.g., p1) of an existing object (e.g., p0). The default meaning of copying is data-member-by-data-member copying. Under the appropriate conditions (to be described later), the compiler will implicitly generate a copy constructor, with public access, for a class; this definition of Point meets the necessary conditions. As is done for a generated default constructor, the compiler puts the generated code directly into the object file; it does not affect the source file.

We recommend that for any class whose data members are either built-in types, of which Point is an example, or simple aggregates of built-in types, you let the compiler write the copy constructor for you.

If your class has data members that are pointers, or data members that manage some external resource, such as a file that you are writing to, those members should be smart pointers, such as std::shared_ptr<T> or std::unique_ptr<T>. This will allow the compiler-generated copy constructor to give the correct behavior. For a description of smart pointers, consult the standard C++ references (listed in Section 6.9). There are rare cases in which you will need to write your own copy constructor, but discussing them here is beyond the scope of this document. When you need to write your own copy constructor, you can learn how to do so from any standard C++ reference.

The next line in the file prints the values of the data members of p1 and you can see that the copy constructor worked as expected.

Notice that in the previous version of ptest.cc, the variable p0 was initialized in three lines:

Point p0; 
p0.x = 3.1; 
p0.y = 2.7;  

This is called two-phase construction. In contrast, the present version uses single-phase construction in which the variable p0 is initialized in one line:

Point p0(1.,2.);  

We strongly recommend using single-phase construction whenever possible. Obviously it takes less real estate, but more importantly:

  1. Single-phase construction more clearly conveys the intent of the programmer: the intent is to initialize the object p0. The second version says this directly. In the first version you needed to do some extra work to recognize that the three lines quoted above formed a logical unit distinct from the remainder of the program. This is not difficult for this simple class, but it can become so with even a little additional complexity.
  2. Two-phase construction is less robust. It leaves open the possibility that a future maintainer of the code might not recognize all of the follow-on steps that are part of construction and will use the object before it is fully constructed. This can lead to difficult-to-diagnose run-time errors.
  3. Single-phase construction can be more efficient than two-phase construction.
  4. Single-phase construction is the only way to initialize variables that are declared const. It is good practice to declare const any variable that is not intended to be changed.

6.7.5 C++ Exercise 4 v4: Colon Initializer Syntax

This version of the class Point introduces colon-initializer syntax for constructors.

To build and run this exercise, cd to the directory Classes/v4 and follow the same instructions as in the previous two sections. When you run the ptest program you should see the following output:

  p0: (1, 2) 
  p1: (1, 2)

The file Point.h is unchanged between this version and the previous one.

Now look at the file Point.cc, which contains the definitions of both constructors. The first thing to look at is the default constructor, which has been rewritten using colon-initializer syntax. The rules for the colon-initializer syntax are:

  1. A colon must immediately follow the closing parenthesis of the argument list.
  2. There must be a comma-separated list of data members, each one initialized by calling one of its constructors.
  3. Data members are guaranteed to be initialized in the order in which they appear in the class declaration. Therefore it is good practice to use the same order for the initialization list.
  4. The body of the constructor, enclosed in braces, must follow the initializer list. The body of the constructor will most often be empty.
  5. If a data member is missing from the initializer list, that member will be default-constructed. Thus data members that are of a primitive type and are missing from the initializer list will not be initialized.
  6. If no initializer list is present, the compiler will call the default constructor of every data member, and it will do so in the order in which data members were specified in the class declaration.
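Following these rules, the two constructors of Point can be written with colon-initializer syntax as in this sketch (consistent with the text's description of v4, though not copied from the workbook file):

```cpp
class Point {
 public:
  Point();
  Point(double ax, double ay);
  double x;
  double y;
};

// The colon follows the argument list; members are initialized in the
// order in which they are declared in the class; the body is empty.
Point::Point() : x(0.), y(0.) {
}

Point::Point(double ax, double ay) : x(ax), y(ay) {
}
```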

If you think about these rules carefully, you will see that in Classes/v3/Point.cc, the compiler did the following when compiling the default constructor.

  1. The compiler did not find an initializer list, so it generated machine code that created uninitialized x and y.
  2. It then wrote the machine code to make the assignments x=0 and y=0.

On the other hand, when the compiler compiled the source code for the default constructor in Classes/v4/Point.cc, it wrote the machine code to initialize x and y each to zero.

Therefore, the machine code for the v3 version might do more work than that for the v4 version. In practice Point is a sufficiently simple class that the compiler likely recognized and elided all of the unnecessary steps in v3; it is likely that the compiler actually produced identical code for the two versions of the class. For a class containing more complex data, however, the compiler may not be able to recognize meaningless extra work and it will write the machine code to do that extra work.

In some cases it does not matter which of these two ways you use to write a constructor; but on those occasions that it does matter, the right answer is always the colon-initializer syntax. So we strongly recommend that you always use the colon-initializer syntax. In the Workbook, all classes are written with colon-initializer syntax.

Now look at the second constructor in Point.cc; it also uses colon-initializer syntax but it is laid out differently. The difference in layout has no meaning to the compiler; whitespace is whitespace. Choose whichever seems natural to you.

Look at ptest.cc. It is the same as the v3 version and it makes the same printout.

6.7.6 C++ Exercise 4 v5: Member functions

This section will introduce member functions(γ), both const member functions(γ) and non-const member functions. It will also introduce the header <cmath>. Suggested homework for this material follows.

To build and run this exercise, cd to the directory Classes/v5 and follow the same instructions as in Section 6.7.3. When you run the ptest program you should see the following output:

Before p0: (1, 2)  Magnitude: 2.23607  Phi: 1.10715 
After  p0: (3, 6)  Magnitude: 6.7082  Phi: 1.10715

Look at the file Point.h. Compared to version v4, this version contains three additional lines:

double mag() const; 
double phi() const; 
void scale(double factor);

All three lines declare member functions. As the name suggests, a member function is a function that can be called and it is a member of the class. Contrast this with a data member, such as x or y, which is not a function. A member function may access any or all of the member data of the class.

The first of these member functions is named Point::mag. The name indicates this function is a member of class Point. Point::mag does not take any arguments and it returns a double; you will see that the value of the double is the magnitude of the 2-vector from the origin (0, 0) to (x,y). The qualifier const represents a contract between the definition/implementation of mag and any code that uses mag; it “promises” that the implementation of Point::mag will not modify the value of any data members. The consequences of breaking the contract are illustrated in the homework at the end of this subsection.

Similarly, the member function named Point::phi takes no arguments, returns a value of type double and has the const qualifier. You will see that the value of the double is the azimuthal angle of the vector from the origin (0, 0) to the point (x,y).

The third member function, Point::scale, takes one argument, factor. Its return type is void, which means that it returns nothing. You will see that this member function multiplies both x and y by factor (i.e., changing their values). This function declaration does not have the const qualifier because it actually does modify member data.

If a member function does not modify any data members, you should always declare it const simply as a matter of course. Any negative consequences of not doing so might only become apparent later, at which point a lot of tedious editing will be required to make everything right.
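Putting the pieces together, the interface described above might be sketched as follows. This is illustrative only: the bodies are written inline here for brevity, whereas the Workbook's Point defines them in Point.cc.

```cpp
#include <cmath>

// Illustrative sketch of the v5-style Point: two const member functions
// and one non-const member function.
class Point {
public:
  Point(double ax, double ay) : x(ax), y(ay) {}
  double mag() const { return std::sqrt(x*x + y*y); } // promises not to modify x or y
  double phi() const { return std::atan2(y, x); }     // also a const calculation
  void scale(double factor) { x *= factor; y *= factor; } // modifies members: not const
  double x;
  double y;
};
```

With this sketch, a Point p(3., 4.) has p.mag() equal to 5, and p.scale(2.) doubles both coordinates while leaving p.phi() unchanged.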

Look at Point.cc. Near the top of the file an additional include directive has been added; <cmath> is a header from the C++ standard library that declares a set of functions for computing common mathematical operations and transformations. Functions from this library are in the namespace(γ) std.

Later on in Point.cc you will find the definition of Point::mag, which computes the magnitude of the 2-vector from the origin (0, 0) to (x,y). To do so, it uses std::sqrt, a function declared in the <cmath> header. This function takes the square root of its argument. The qualifier const that was present in the declaration of Point::mag must also be present in its definition (it must be present in these two places, but not at calling points).

The next part of Point.cc contains the definition of the member function phi. To do its work, this member function uses the atan2 function from the standard library.

The next part of Point.cc contains the definition of the member function Point::scale. You can see that this member function simply multiplies the two data members by the value of the argument.

The file ptest.cc contains a main function that illustrates these new features. The first line of this function declares and initializes an object, p0, of type Point. It then prints out the value of its data members, the value returned from calling the function Point::mag and the value returned from calling Point::phi. This shows how to invoke a member function: you write the name of the variable, followed by a dot (the member selection operator), followed by the unqualified name of the member function, followed by the argument list in the function call parentheses. The unqualified name of a member function is the part of the name that follows the double-colon scope resolution operator (::). Thus the unqualified name of Point::phi is just phi.

The next line calls the member function Point::scale with the argument 3. The printout verifies that the call to Point::scale had the intended effect.

One final comment is in order. Many other modern computer languages have ideas very similar to C++ classes and C++ member functions; in some of those languages, the name method is the technical term corresponding to member function in C++. The name method is not part of the formal definition of C++, but is commonly used nonetheless. In this documentation, the two terms can be considered synonymous.

Here we suggest four activities as homework to help illustrate the meaning of const and to familiarize you with the error messages produced by the C++ compiler. Before moving to a subsequent activity, undo the changes that you made in the current activity.

  1. In the definition of the member function Point::mag(), found in Point.cc, before taking the square root, multiply the member datum x by 2.
    double Point::mag() const { 
      x *= 2.; 
      return std::sqrt( x*x + y*y ); 
    }
    Then build the code again; you should see the following diagnostic message:

    Point.cc: In member function double Point::mag() const: 
    Point.cc:13:8: error: assignment of member Point::x in 
         read-only object
  2. In ptest.cc, change the first line to
    Point const p0(1,2);  

    Then build the code again; you should see the following diagnostic message:

    ptest.cc: In function int main(): 
    ptest.cc:13:14: error: no matching function for call to 
    ptest.cc:13:14: note: candidate is: 
    In file included from ptest.cc:1:0: 
    Point.h:13:8: note: void Point::scale(double) <near match> 
    Point.h:13:8: note:   no known conversion for implicit 
         this parameter from const Point* to Point*

These first two homework exercises illustrate how the compiler enforces the contract defined by the qualifier const that is present at the end of the declaration of Point::mag and that is absent in the definition of the member function Point::scale. The contract says that the definition of Point::mag may not modify the values of any data members of the Point object on which it is called; users of the class Point may count on this behaviour. The contract also says that the definition of the member function Point::scale may modify the values of data members of the class Point; users of the class Point must assume that Point::scale will indeed modify member data and act accordingly.8

In the first homework exercise, the value of a member datum is modified, thereby breaking the contract. The compiler detects it and issues a diagnostic message.

In the second homework exercise, the variable p0 is declared const; therefore the code may not call non-const member functions of p0, only const member functions. When the compiler sees the call p0.mag() it recognizes that this is a call to const member function and compiles the call; when it sees the call p0.scale(3.) it recognizes that this is a call to a non-const member function and issues a diagnostic message.

  3. In Point.h, remove the const qualifier from the declaration of the member function Point::mag:
    double mag();  

    Then build the code again; you should see the following diagnostic message:

    Point.cc:12:8: error: prototype for double Point::mag() 
    const does not match any in class Point 
    In file included from Point.cc:1:0: 
    Point.h:11:10: error: candidate is: double Point::mag()
  4. In Point.cc, remove the const qualifier in the definition of the member function Point::mag. Then build the code again; you should see the following diagnostic message:
    Point.cc:12:8: error: prototype for double Point::mag() 
           does not match any in class Point 
    In file included from Point.cc:1:0: 
    Point.h:11:10: error: candidate is: 
           double Point::mag() const

The third and fourth homework exercises illustrate that the compiler considers two member functions that are identical except for the presence of the const qualifier to be different functions9 . In homework exercise 3, when the compiler tried to compile the const-qualified version of Point::mag in Point.cc, it looked at the class definition in Point.h and could not find a matching member function declaration; there was a close, but not exact, match. Therefore it issued a diagnostic message, telling us about the close match, and then stopped. Similarly, in homework exercise 4, it also could not find a match.

6.7.7 C++ Exercise 4 v6: Private Data and Accessor Methods (Setters and Getters)

This version of the class Point is used to illustrate the following ideas:

  1. The class Point has been redesigned to have private data members with access to them provided by accessor functions and setter functions.
  2. The keyword this, which in the body of a (non-static) member function is an expression that has the value of the address of the object on which the function is called.
  3. Even if there are many objects of type Point in memory, there is only one copy of the code.

A 2D point class, with member data in Cartesian coordinates, is not a good example of why it is often a good idea to have private data. But it does have enough richness to illustrate the mechanics, which is the purpose of this section. A later subsection discusses an example in which having private data makes obvious sense.

To build and run this exercise, cd to the directory Classes/v6 and follow the same instructions as in Section 6.7.3. When you run the ptest program you should see the following output:

Before p0: (1, 2)  Magnitude: 2.23607  Phi: 1.10715 
After  p0: (3, 6)  Magnitude: 6.7082  Phi: 1.10715 
p1: (0, 1)  Magnitude: 1  Phi: 1.5708 
p1: (1, 0)  Magnitude: 1  Phi: 0 
p1: (3, 6)  Magnitude: 6.7082  Phi: 1.10715

Look at Point.h. Compare it to the version in v5:

diff -wb Point.h ../v5/

Relative to version v5 the following changes were made:

  1. four new member functions have been declared,
    1. double x() const;
    2. double y() const;
    3. void set(double ax, double ay);
    4. void set(Point const& p);
  2. the data members have been declared private
  3. the data members have been renamed from x and y to x_ and y_

Yes, there are two functions named set. At the site of any function call that uses the name set, the compiler makes use of the signature of the function to decide which function with that name to call. In C++ the signature of a member function encodes all of the following information:

  1. the name of the class it is in;
  2. the unqualified name of the member function;
  3. the number, type and order of arguments in the argument list;
  4. whether or not the function is qualified as const;
  5. other qualifications (reference-qualification and volatile-qualification, both of which are beyond the scope of this introduction).

The two member functions named Point::set are completely different member functions with different signatures. A set of different functions with the same name but with different signatures is called an overload set. As you work through the Workbook you will encounter many of these, and you should develop the habit of looking at the full function signature (i.e., all the parts), not just the function name. In order to distinguish between members of an overload set, C++ compilers typically rely on name mangling. Name mangling is the process of decorating a function name with information that encodes the signature of the function. The mangled name associated with each function is the symbol emitted by the compiler and used by the linker to identify which member of an overload set is associated with a specific function call. Each C++ compiler does this a little differently.
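The overload set can be sketched as follows (a condensed, hypothetical version of the v6 interface, with the bodies written inline here for brevity); at each call site the compiler selects between the two set functions purely by matching the argument list:

```cpp
// Condensed sketch of the v6-style Point with an overload set named set.
class Point {
public:
  Point(double ax, double ay) : x_(ax), y_(ay) {}
  double x() const { return x_; }
  double y() const { return y_; }
  void set(double ax, double ay) { x_ = ax; y_ = ay; }      // chosen for p.set(3., 6.)
  void set(Point const& p) { x_ = p.x_; y_ = p.y_; }        // chosen for p.set(otherPoint)
private:
  double x_;
  double y_;
};
```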

If you want to see what mangled names are created for the class Point, you can do the following:

c++ -Wall -Wextra -pedantic -Werror -std=c++11 -c Point.cc
nm Point.o

You can understand the output of nm by reading its man page.

In a class declaration, if any of the keywords public, private, or protected appears, then all members following that keyword, and before the next such keyword, have the named property. In Point.h the two data members are private and all other members are public.

Look at Point.cc. Compare it to the version in v5:

diff -wb Point.cc ../v5/

Relative to version v5 the following changes were made:

  1. the data members have been renamed from x and y to x_ and y_
  2. an implementation is present for each of the four new member functions

Inspect the code in the implementation of each of the new member functions. The member function Point::x simply returns the value of the data member x_; similarly for the member function Point::y. These member functions are called accessors, accessor functions, or getters10. The notion of accessor is often extended to include any member function that returns the value of simple, non-modifying calculations on a subset of the member data; in this sense, the functions Point::mag and Point::phi are considered accessors.

The member functions in the overload set for the name Point::set each set the member data of the Point object on which they are called. These are, not surprisingly, called setters, setter functions or modifiers.

There is no requirement that there be accessors and setters for every data member of a class; indeed, many classes provide no such member functions for many of their data members. If a data member is important for managing internal state but is of no direct interest to a user of the class, then you should certainly not provide an accessor or a setter.

Now that the data members of Point are private, only the code within Point is permitted to access these data members directly. All other code must access this information via the accessor and setter functions.

Look at ptest.cc. Compare it to the version in v5:

diff -wb ptest.cc ../v5/

Relative to version v5 the following changes were made:

  1. the printout has been changed to use the accessor functions, and
  2. a new section has been added to illustrate the use of the two set methods.

Figure 6.2 shows a diagram of the computer memory at the end of running this version of ptest. The two boxes with the blue outlines represent sections of the computer memory; the box on the left represents the part that is reserved for storing data (such as objects) and the box on the right represents the part that holds the executable code. This is a big oversimplification because, in a real running program, there are many parts of the memory reserved for different sorts of data and many parts reserved for executable code.



The key point in Figure 6.2 is that each object has its own member data but there is only one copy of the code. Even if there are thousands of objects of type Point, there will only be one copy of the code. When a line of code asks for p0.mag(), the computer will pass the address of p0 as an argument to the function Point::mag, which will then do its work. When a line of code asks for p1.mag(), the computer will pass the address of p1 as an argument to the function Point::mag, which will then do its work. This address is available in the body of the member function as the value of the expression this, which acts as a pointer to the object on which the function was called. In a member function declared as const, this acts as a pointer that is const-qualified.

Initially this sounds a little weird: the previous paragraph talks about passing an argument to the function Point::mag but, according to the source code, Point::mag does not take any arguments! The answer is that every member function has an implied argument that must always be present: the address of the object that the member function will do work on. Because it must always be there, and because the compiler knows that it must always be there, there is no point in actually writing it in the source code. It is by using this so-called hidden argument that the code for Point::mag knows that x_ means one thing for p0 and something else for p1.

For example, the accessor Point::x could have been written:

double x() const { return this->x_; }  

This version of the syntax makes it much clearer how there can be one copy of the code even though there are many objects in memory; but it also makes the code harder to read once you have understood how the magic works. There are not many places in which you need to explicitly use the keyword this, but there will be some. For further information, consult standard C++ documentation (listed in Section 6.9).

What’s the deal with the underscore?

C++ will not permit you to use the same name for both a data member and its accessor. Since the accessor is part of the public interface, it should get the simple, obvious, PICT PICT easy-to-type name. Therefore the name of the data member needs to be decorated to make it distinct.

The convention used in the Workbook exercises and in the toyExperiment UPS product is that the names of member data end in an underscore character. There are some other conventions that you may encounter:


You may also see the choice of a leading underscore followed by a capital letter, or a double underscore. Never do this. Such names are reserved for use by C++ implementations; use of such names may produce accidental collisions with names used in an implementation, and cause errors that might be very difficult to diagnose. While this is a very small risk, it seems wise to adopt habits that guarantee that it can never happen.

It is common to extend the pattern for decorating the names of member data to all member data, even those without accessors. One reason for doing so is just symmetry. A second reason has to do with writing member functions; the body of a member function will, in general, use both member data and variables that are local to the member function. If the member data are decorated differently than the local variables, it can make the member functions easier to understand.

An example to motivate private data

This section describes a class for which it makes sense to have private data: a 2D point class that has data members r and phi instead of x and y. The author of such a class might wish to define a standard representation in which it is guaranteed that r be non-negative and that phi be in the domain 0 ≤ ϕ < 2π. If the data are public, the class cannot make these guarantees; any code can modify the data members and break the guarantee.

If this class is implemented with private data manipulated by member functions, then the constructors and member functions can enforce the guarantees.
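A hedged sketch of such a class follows (the class name and normalization choices here are invented for illustration, not taken from the Workbook). Because the data are private, the constructor and the setter are the only ways to change them, so they can normalize their inputs and the invariant always holds:

```cpp
#include <cmath>

// Illustrative r-phi point class that enforces the invariant
// r >= 0 and 0 <= phi < 2*pi.
class PolarPoint {
public:
  PolarPoint(double r, double phi) { set(r, phi); }
  double r()   const { return r_; }
  double phi() const { return phi_; }
  void set(double r, double phi) {
    double const pi = std::acos(-1.0);
    if (r < 0.) {       // a negative radius points the opposite way
      r = -r;
      phi += pi;
    }
    phi = std::fmod(phi, 2.*pi);  // reduce phi to (-2*pi, 2*pi)...
    if (phi < 0.) phi += 2.*pi;   // ...and then to [0, 2*pi)
    r_ = r;
    phi_ = phi;
  }
private:
  double r_;    // invariant: r_ >= 0
  double phi_;  // invariant: 0 <= phi_ < 2*pi
};
```

Any code that constructs or modifies a PolarPoint goes through set, so no user of the class can ever observe a negative radius or an out-of-range angle.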

The language used in software engineering texts is that a guaranteed relationship among the data members is called an invariant. If a class has an invariant then the class must have private data.

If a class has no invariant then one is free to choose public data. The Workbook and the toyExperiment never make this choice. One reason is that classes that begin life without an invariant sometimes acquire one as the design matures — we recommend that you plan for this unless you are 100% sure that the class will never have an invariant. A second reason is that many people who are just starting to learn C++ find it confusing to encounter some classes with private data and others with public data.

6.7.8 C++ Exercise 4 v7: The inline Specifier

This section introduces the inline specifier.

To build and run this exercise, cd to the directory Classes/v7 and follow the same instructions as in Section 6.7.3. When you run the ptest program you should see the following output:

p0: ( 1, 2 )  Magnitude: 2.23607  Phi: 1.10715

Look at Point.cc and compare it to the version in v6. You will see that the implementations of the accessors Point::x and Point::y have been removed.

Comparing Point.h to the version in v6, you will see that it now contains the implementation of the accessor member functions — an almost exact copy of what was previously found in the file Point.cc. Note that these accessors are defined outside of the class declaration in Point.h and are now preceded by the specifier inline.
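The layout just described might look like the following sketch (illustrative, not a copy of the Workbook file): declarations inside the class, inline definitions outside the class but still in the header:

```cpp
// v7-style header sketch: accessors declared in the class...
class Point {
public:
  Point(double ax, double ay) : x_(ax), y_(ay) {}
  double x() const;
  double y() const;
private:
  double x_;
  double y_;
};

// ...and defined below, outside the class declaration, marked inline.
inline double Point::x() const { return x_; }
inline double Point::y() const { return y_; }
```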

The inline specifier on a function sends a suggestion to the compiler to inline the function. If the compiler chooses to inline the function, the body of the function is substituted directly into the machine code at the point of each call to it, instead of the program making a run-time function call.

The specifier does not force inlining on the compiler. Why the option? In some cases inlining is a net positive, in other cases it is a net negative; based on heuristics, the compiler will determine which, and choose. For some functions, offering the option at all (i.e., including the specifier inline) is a net negative no matter which option the compiler would choose; this means that you as the programmer need to know when to use it and when not to.

Specifying a function as inline is typically a good thing only for small and/or simple functions, e.g., an accessor. The compiler will be likely to inline it because this option is likely to

  • reduce the memory footprint11
  • execute more quickly than a function call
  • allow additional compiler optimizations to be performed.

In the “decline-to-inline” case, the compiler will write a copy of the function once for each source file in which a definition of the function appears12 . During linking, the copy of the compiled function in the same object file will be used to satisfy calls to the function. Result: a larger memory footprint, but no reduction in execution time. Clearly, for a bigger or more complex function, use of the inline specifier is disadvantageous.

C++ does not permit you to force inlining; an inline declaration is only a hint to the compiler that a function is appropriate for inlining.

The bottom line is that you should always declare simple accessors and simple setters inline. Here the adjective simple means that they do not do any significant computation and that they do not contain any if statements or loops. The decision to inline anything else should only follow careful analysis of information produced by a profiling tool.

Look at the definition of the member function Point::y in Point.h. Compared to the definition of the member function Point::x there is only a small change in whitespace, and of course the specifier inline. This whitespace difference is not meaningful to the compiler.

6.7.9 C++ Exercise 4 v8: Defining Member Functions within the Class Declaration

The version of Point in this section introduces the feature that allows you to provide the definition (implementation) of any member function inside the declaration of the class to which it belongs, right at the point where the function is declared. You will occasionally see this syntax used in the Workbook. The definition of a non-member function (see Section 6.7.10) must remain outside the class declaration.

To build and run this exercise, cd to the directory Classes/v8 and follow the same instructions as in Section 6.7.3. When you run the ptest program you should see the following output:

p0: ( 1, 2 )  Magnitude: 2.23607  Phi: 1.10715

This is the same output made by v7. The files Point.cc and ptest.cc are unchanged with respect to v7, only Point.h has changed.

Relative to v7, the definition of the accessor methods Point::x and Point::y in Point.h has been moved into the Point class declaration. Notice that the function names are no longer prefixed with the class name and the inline specifiers have been removed.

When you define a member function inside the class declaration, the function is implicitly declared inline. Section 6.7.8 discussed some cautions about inappropriate use of inlining; those same cautions apply when a member function is defined inside the class declaration.
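For comparison, here is a v8-style sketch (illustrative, not the Workbook file itself) with the definitions inside the class declaration; no inline specifier and no Point:: prefix is needed:

```cpp
// v8-style sketch: definitions inside the class declaration are
// implicitly inline.
class Point {
public:
  Point(double ax, double ay) : x_(ax), y_(ay) {}
  double x() const { return x_; } // unqualified name, no inline keyword
  double y() const { return y_; }
private:
  double x_;
  double y_;
};
```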

When you define a member function within the class declaration, you must not prefix the function name with the class name and the scope resolution operator; that is,

double Point::x() const { return x_; }  

would produce a compiler diagnostic.

In summary, there are two ways to write inlined definitions of member functions. In most cases, the two are entirely equivalent and the choice is simply a matter of style. The one exception occurs when you are writing a class that will become part of an art data product, due to limitations imposed by art. In this case it is recommended that you write the definitions of member functions outside of the class declaration.

When writing an art data product, the code inside the associated header file is parsed by software that determines how to write objects of that type to the output disk files and how to read objects of that type from input disk files. The software that does the parsing has some limitations and we need to work around them. The workarounds are easiest if any member function definitions in the header file are placed outside of the class declarations. For details see https://cdcvs.fnal.gov/redmine/projects/art/wiki/Data_Product_Design_Guide#Issues-mostly-related-to-ROOT.

6.7.10 C++ Exercise 4 v9: The Stream Insertion Operator and Free Functions

This section illustrates how to write a stream insertion operator for a type, in this case for the class Point. This is the piece of code that lets you print an object of a given type without having to print each data member by hand, for example:

Point p0(1,2); 
std::cout << p0 << std::endl;  

instead of

Point p0(1, 2); 
std::cout << "p0: (" << p0.x() << ", " << p0.y() << ")"  

To build and run this exercise, cd to the directory Classes/v9 and follow the same instructions as in Section 6.7.3. When you run the ptest program you should see the following output:

p0: ( 1, 2 )  Magnitude: 2.23607  Phi: 1.10715

This is the same output made by v7 and v8.

Look at Point.h. The changes relative to v8 are the following two additions:

  1. an include directive for the header <iosfwd>
  2. a declaration for the stream insertion operator, which appears in the file after the declaration of the class Point.

Look at Point.cc. The changes relative to v8 are the following two additions:

  1. an include directive for the header <ostream>
  2. the definition of the stream insertion operator, operator<<.

Look at ptest.cc. The only change relative to v8 is that the printout now uses the stream insertion operator for p0 instead of inserting each data member of p0 by hand.

std::cout << "p0: " << p0

In Point.h, the stream insertion operator is declared as (shown here on two lines)

std::ostream& 
operator<<(std::ostream& ost, Point const& p);  

If the class whose type is used as second argument is declared in a namespace (which it is not, in this case), then the stream insertion operator PIC must be declared in the same namespace.

When the compiler sees the use of a << operator that has an object of type std::ostream on its left hand side and an object of type Point on its right hand side, then the compiler will look for a function named operator<< whose first argument is of type std::ostream& and whose second argument is of type Point const&. If it finds such a function it will call that function to do the work; if it cannot find such a function it will issue a compiler diagnostic.

We write operator<< with a return type of std::ostream& so that one may chain together multiple uses of the << operator:

Point p0(1,2), p1(3,4); 
std::cout << p0 << " " << p1 << std::endl;  

The C++ compiler parses this left to right. First it recognizes the expression std::cout << p0. Because std::cout is of type std::ostream, and because p0 is of type Point, the compiler calls our stream insertion operator to do this work. The return type of the function call is std::ostream&, and so the next expression is recognized as a call to the stream insertion operator for an array of characters (" "). The next is another call to our stream insertion operator for class Point, this time using the object p1. This also returns a std::ostream&, allowing the last part of the expression to be recognized as a call to the stream insertion operator for std::endl, which writes a newline and flushes the output stream.

Look at the implementation of the stream insertion operator in Point.cc:

 std::ostream& operator<<(std::ostream& ost, 
                          Point const& p) { 
   ost << "( " 
       << p.x() << ", " 
       << p.y() 
       << " )"; 
   return ost; 
 }
The first argument, ost, is a reference to an object of type std::ostream; the name ost has no special meaning to C++. When writing the implementation for this operator we don’t know and don’t care what the output stream will be connected to; perhaps a file; perhaps standard output. In any case, you send output to ost just as you do to std::cout, which is just another variable of type std::ostream. In this example we chose to enclose the values of x_ and y_ in parentheses in the printout and to separate them with a comma; this is simply our choice, not something required by C++ or by art.

In this example, the stream insertion operator does not end by inserting a newline into ost. This is a very common choice as it allows the user of the operator to have full control over line breaks. For a class whose printout is very long and covers many lines, you might decide that this operator should end by inserting a newline character; it is your choice.

If you wish to write a stream insertion operator for another class, just follow the pattern used here.
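For example, here is the same pattern applied to a hypothetical class TimeStamp (this class and its members are invented for illustration and are not part of the Workbook):

```cpp
#include <ostream>
#include <sstream>  // only needed to demonstrate the operator below

// A small invented class with two accessors...
class TimeStamp {
public:
  TimeStamp(int run, int event) : run_(run), event_(event) {}
  int run()   const { return run_; }
  int event() const { return event_; }
private:
  int run_;
  int event_;
};

// ...and its stream insertion operator, a free function following the
// pattern used for Point: take and return a std::ostream& so that calls
// can be chained.
std::ostream& operator<<(std::ostream& ost, TimeStamp const& t) {
  ost << "( " << t.run() << ", " << t.event() << " )";
  return ost;
}
```

Writing a TimeStamp to a std::ostringstream (or to std::cout) then produces output such as ( 1, 42 ).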

If you want to understand more about why the operator is written the way that it is, consult the standard C++ references; see Section 6.9.

The stream insertion operator is a free function(γ), not a member function of the class Point; the tie to the class Point is via its second argument. Because this function is a free function, it could have been declared in its own header file and its implementation could have been provided in its own source file. However that is not common practice. Instead the common practice is as shown in this example: to include it in Point.h and Point.cc.

The choice of whether to put the declaration of the stream insertion operator (or any other free function) into (1) the header file containing a class declaration or (2) its own header file is a tradeoff between the following two criteria:

  1. It may be convenient to have it in the class header file; otherwise users would have to remember to include an additional header file when they want to use this operator (or function).
  2. One can imagine many simple free functions that take an object of type Point as an argument. If they are all inside Point.h, and if each is only infrequently used, then the compiler will waste time processing the declarations every time Point.h is included somewhere.

The definition of this operator is typically put into the implementation file, rather than being inlined. Such functions are generally poor candidates for inlining.

Ultimately this is a judgment call and the code in this example follows the recommendations made by the art development team. Their recommendation is that the following sorts of free functions, and only these sorts, should be included in header files containing a class declaration:

  1. the stream insertion operator for that class
  2. out-of-class arithmetic and comparison operators

With one exception, if including a function declaration in Point.h requires the inclusion of an additional header in Point.h, declare that function in a different header file. The exception is that it is okay to include <iosfwd>.

6.7.11 Review

The class Point is an example of a class that is primarily concerned with providing convenient access to the data it contains. Not all classes are like this; when you work through the Workbook, you will write some classes that are primarily concerned with packaging convenient access to a set of related functions. Before moving on, make sure you understand the following terms, all of which were introduced in this chapter:

  1. class
  2. object
  3. identifier
  4. free function
  5. member function

6.8 Overloading functions

A more complete description of overload sets, and an introduction to the rules for overload resolution, will go here. This should give an example illustrating the kinds of error messages given by the compiler when no suitable overload can be found and also an example of the kind of error message that results when the match from an overload set is ambiguous.

6.9 C++ References

This section lists some recommended C++ references, both text books and online materials.

The following references describe the C++ core language:

The following references describe the C++ Standard Library:

The following contains an introductory tutorial. Many copies of this book are available at the Fermilab library. It is a very good introduction to the big ideas of C++ and Object Oriented Programming but it is not a fast entry point to the C++ skills needed for HEP. It also has not been updated for the current C++ standard.

  • Andrew Koenig and Barbara E. Moo, “Accelerated C++: Practical Programming by Example” Addison-Wesley, 2000. ISBN 0-201-70353-X.

The following contains a discussion of recommended best practices. It has not been updated for the current C++ standard.

  • Herb Sutter and Andrei Alexandrescu, “C++ Coding Standards: 101 Rules, Guidelines, and Best Practices.”, Addison-Wesley, 2005. ISBN 0-321-11358-6.

Chapter 7
Using External Products in UPS

Section 3.6.8 introduced the idea of external products. For the Intensity Frontier experiments (and for Fermilab-based experiments in general), access to external products is provided by a Fermilab-developed product-management package called Unix Product Support (UPS). An important UPS feature – demanded by most experiments as their code evolves – is its support for multiple versions of a product and multiple builds (e.g., for different platforms) per version.

Another notable feature is its capacity to handle multiple databases of products. So, for example, on Fermilab computers, login scripts (see Section 4.9) set up the UPS system, providing access to a database of products commonly used at Fermilab.

The art Workbook and your experiment’s code will require additional products (available in other databases). For example, each experiment will provide a copy of the toyExperiment product in its experiment-specific UPS database.

In this chapter you will learn how to see which products UPS makes available, how UPS handles variants of a given product, how you use UPS to initialize a product provided in one of its databases and about the environment variables that UPS defines.

7.1 The UPS Database List: PRODUCTS

The act of setting up UPS defines a number of environment variables (discussed in Section 7.5), one of which is PRODUCTS. This particularly important environment variable merits its own section.

The environment variable PRODUCTS is a colon-delimited list of directory names, i.e., it is a path (see Section 4.6). Each directory in PRODUCTS is the name of a UPS database, meaning simply that each directory functions as a repository of information about one or more products. When UPS looks for a product, it checks each directory in PRODUCTS, in the order listed, and takes the first match.

If you are on a Fermilab machine, you can look at the value of PRODUCTS just after logging in, before sourcing your site-specific setup script. Run printenv:

printenv PRODUCTS

It should have a value of


This generic Fermilab UPS database contains a handful of software products commonly used at Fermilab; most of these products are used by all of the Intensity Frontier Experiments. This database does not contain any of the experiment-specific software nor does it contain products such as ROOT(γ), Geant4(γ), CLHEP or art. While these last few products are indeed used by multiple experiments, they are often custom-built for each experiment and as such are distributed via the experiment-specific (i.e., separate) UPS databases.

After you source your site-specific setup script, look at PRODUCTS again. It will probably contain multiple directories, thus making many more products available in your “site” environment. For example, on the DS50+Fermilab site, after running the DS50 setup script, PRODUCTS contains:


You can see which products PRODUCTS contains by running ls on its directories, one-by-one, e.g.,

ls /grid/fermiapp/products/common/db
afs   git      ifdhc         mu2e             python          ... 
cpn   gitflow  jobsub_tools  oracle_tnsnames  ... 
encp  gits     login         perl             setpath         ...
ls /ds50/app/products
art                cetpkgsupport  g4neutronxs  libxml2          ... 
artdaq             clhep          g4nucleonxs  messagefacility  ... 
art_suite          cmake          g4photon     mpich            ... 
art_workbook_base  cpp0x          g4pii        mvapich2         ... 
boost              cppunit        g4radiative  python           ... 
caencomm           ds50daq        g4surface    root             ... 

Each directory name in these listings corresponds to the name of a UPS product. If you are on a different experiment, the precise contents of your experiment’s product directory may be slightly different. Among other things, both databases contain a subdirectory named ups; this is for the UPS system itself. In this sense, all these products, including art, toyExperiment and even the product(s) containing your experiment’s code, regard UPS as just another external product.

7.2 UPS Handling of Variants of a Product

An important feature of UPS is its capacity to make multiple variants of a product available to users. This of course includes different versions, but beyond that, a given version of a product may be built more than one way, e.g., for use by different operating systems (what UPS distinguishes as flavors). For example, a product might be built once for use with SLF5 and again for use with SLF6. A product may be built with different versions of the C++ compiler, e.g., with the production version and with a version under test. A product may be built with full compiler optimization or with maximum debugging features enabled. Many variants can exist. UPS provides a way to select a particular build via a mechanism called qualifiers.

The full identifier of a UPS product includes its product name, its version, its flavor and its full set of qualifiers. In Section 7.3, you will see how to fully identify a product when you set it up.

7.3 The setup Command: Syntax and Function

Any given UPS database contains several to many, many products. To select a product and make it available for use, you use the setup command.

In most cases the correct flavor can be detected automatically by setup and need not be specified. However, if needed, the flavor can be specified, along with various qualifiers and options; these are listed in the UPS documentation referenced later in this section. The version, if specified, must directly follow the product name on the command line, e.g.:

setup options product-name product-version -f flavor  -q qualifiers

Putting in real-looking values, it would look something like:

setup -R myproduct v3_2 -f SLF5 -q BUILD_A

What does the setup command actually do? It may do any or all of the following:

  • define some environment variables
  • define some bash functions
  • define some aliases
  • add elements to your PATH
  • set up additional products on which it depends

Setting up dependent products works recursively. In this way, a single setup command may trigger the setup of, say, 15 or 20 products.

When you follow a given site-specific setup procedure, the PRODUCTS environment variable will be extended to include your experiment-specific UPS repository.

setup is a bash function (defined by the UPS product when it was initialized) that shadows a Unix system-configuration command also named setup, usually found in /usr/bin/setup or /usr/sbin/setup. Running the right ‘setup’ should work automatically as long as UPS is properly initialized. If it’s not, setup returns the error message:

You are attempting to run "setup" which requires administrative 
privileges, but more information is needed in order to do so.

If this happens, the simplest solution is to log out and log in again. Make sure that you carefully follow the instructions for the site-specific setup procedure.

Few people will need to know more than the above about the UPS system. Those who do can consult the full UPS documentation at:


7.4 Current Versions of Products

For some UPS products, but not all, the site administrator may define a particular fully-qualified version of the product as the default version. In the language of UPS this notion of default is called the current version. If a current version has been defined for a product, you can set up that product with the command:

setup product-name

When you run this, the UPS system will automatically insert the version number and qualifiers of the version that has been declared current.

Having a current version is a handy feature for products that add convenience features to your interactive environment; as improvements are added, you automatically get them.

However the notion of a current version is very dangerous if you want to ensure that software built at one site will build in exactly the same way on all other sites. For this reason, the Workbook fully specifies the version number and qualifiers of all products that it requires; and in turn, the products used by the Workbook make fully qualified requests for the products on which they depend.

7.5 Environment Variables Defined by UPS

When your login script or site-specific setup script initializes UPS, it defines many environment variables in addition to PRODUCTS (Section 7.1), one of which is UPS_DIR, the root directory of the currently selected version of UPS. The script also adds $UPS_DIR/bin to your PATH, which makes some UPS-related commands visible to your shell. Finally, it defines the bash function setup (see Sections 4.8 and 7.3). When you use the setup command, as illustrated below, it is this bash function that does the work.

In discussing the other important variables, the toyExperiment product will be used as an example; for a different product, replace “toyExperiment” or “TOYEXPERIMENT” in the following text with that product’s name. Once you have followed your appropriate setup procedure (Table 5.1) you can issue the following command (this is informational for the purposes of this section; you don’t need to do it until you start running the first Workbook exercise):

setup toyExperiment v0_00_29 -q e2:prof

The version and qualifiers shown here are the ones to use for the Workbook exercises. When the setup command returns, the following environment variables will be defined:

TOYEXPERIMENT_DIR defines the root DIRectory of the chosen UPS product
TOYEXPERIMENT_INC defines the path to the root directory of the C++ header files that are provided by this product (so called because the header files are INCluded)
TOYEXPERIMENT_LIB defines the directory that contains all of the dynamic object LIBraries (ending in .so) that are provided by this product

Almost all UPS products that you will use in the Workbook define these three environment variables. Several, including toyExperiment, define many more. Once you’re running the exercises, you will be able to see all of the environment variables defined by the toyExperiment product by issuing the following command:

printenv | grep TOYEXPERIMENT

Many software products have version numbers that contain dot characters. UPS requires that version numbers not contain any dot characters; by convention, version dots are replaced with underscores. Therefore v0.00.14 becomes v0_00_14. Also by convention, the environment variables are all upper case, regardless of the case used in the product names.

7.6 Finding Header Files

7.6.1 Introduction

Header files were introduced in Section 6.4.2. Recall that a header file typically contains the “parts list” for its associated .cc source file and is “included” in the .cc file.

The software for the Workbook depends on a large number of external products; the same is true, on an even larger scale, for the software in your experiment. The preceding sections in this chapter discussed how to establish a working environment in which all of these software products are available for use.

When you are working with the code in the Workbook, and when you are working on your experiment, you will frequently encounter C++ classes and functions that come from these external products. An important skill is to be able to identify them when you see them and to be able to follow the clues back to their source and documentation. This section will describe how to do that.

An important aid to finding documentation is the use of namespaces; if you are not familiar with namespaces, consult the standard C++ documentation.

7.6.2 Finding art Header Files

This subsection will use the example of the class art::Event to illustrate how to find header files from the art UPS product; this will serve as a model for finding header files from most other UPS products.

The class that holds the art abstraction of an HEP event is named art::Event; that is, the class Event is in the namespace art. In fact, all classes and functions defined by art are in the namespace art. The primary reason for this is to minimize the chances of accidental name collisions between art and other codes; but it also serves a very useful documentation role and is one of the clues you can use to find header files.

If you look at code that uses art::Event you will almost always find that the file includes the following header file:

#include "art/Framework/Principal/Event.h"

The art UPS product has been designed so that the relative path used to include any art header file starts with the directory art; this is another clue that the class or function of interest is part of art.

When you setup the art UPS product, it defines the environment variable ART_INC, which points to the root of the header file tree for art. You now have enough information to discover where to find the header file for art::Event; it is at


You can follow this same pattern for any class or function that is part of art. This will only work if you are in an environment in which ART_INC has been defined, which will be described in Chapters 9 and 10.

If you are new to C++, you will likely find this header file difficult to understand; you do not need to understand it when you first encounter it but, for future reference, you do need to know where to find it.

Earlier in this section, you read that if a C++ file uses art::Event, it would almost always include the appropriate header file. Why almost always? Because the header file Event.h might already be included within one of the other headers that are included in your file. If Event.h is indirectly included in this way, it does not hurt also to include it explicitly, but it is not required that you do so.

We can summarize this discussion as follows: if a C++ source file uses art::Event it must always include the appropriate header file, either directly or indirectly.

art does not rigorously follow the pattern that the name of a file is the same as the name of the class or function that it defines. The reason is that some files define multiple classes or functions; in most such cases the file is named after the most important class that it defines.

Finally, from time to time, you will need to dig through several layers of header files to find the information you need.

There are two code browsing tools that you can use to help navigate the layering of header files and to help find class declarations that are not in a file named for the class:

  1. use the art redmine(γ) repository browser:
  2. use the LXR code browser: http://cdcvs.fnal.gov/lxr/art/

(In the above, both URLs are live links.)

7.6.3 Finding Headers from Other UPS Products

Section 3.7 introduced the idea that the Workbook is built around a UPS product named toyExperiment, which describes a made-up experiment. All classes and functions defined in this UPS product are defined in the namespace tex, which is an acronym-like shorthand for toyExperiment (ToyEXperiment). (This shorthand makes it (a) easier to focus on the name of each class or function rather than the namespace and (b) quicker to type.)

One of the classes from the toyExperiment UPS product is tex::GenParticle, which describes particles created by the event generator, the first part of the simulation chain (see Section 3.7.2). The include directive for this class looks like

#include "toyExperiment/MCDataProducts/GenParticle.h"

As for headers included from art, the first element in the relative path to the included file is the name of the UPS product in which it is found. Similarly to art, the header file can be found using the environment variable TOYEXPERIMENT_INC:


With a few exceptions, discussed in Section 7.6.4, if a class or function from a UPS product is used in the Workbook code, it will obey the following pattern:

  1. The class will be in a namespace that is unique to the UPS product; the name of the namespace may be the full product name or a shortened version of it.
  2. The lead element of the path specified in the include directive will be the name of the UPS product.
  3. The UPS product setup command will define an environment variable named
    PRODUCT-NAME_INC, where PRODUCT-NAME is in all capital letters.

Using this information, the name of the header file will always be


This pattern holds for all of the UPS products listed in Table 7.1.



UPS Product       Namespace

art               art
boost             boost
cetlib            cet
clhep             CLHEP
fhiclcpp          fhicl
messagefacility   mf
toyExperiment     tex


A table listing git- and LXR-based code browsers for many of these UPS products can be found near the top of the web page:

7.6.4 Exceptions: The Workbook, ROOT and Geant4

There are three exceptions to the pattern described in Section 7.6.3:

  • the Workbook itself
  • ROOT
  • Geant4

The Workbook is so tightly coupled to the toyExperiment UPS product that all classes in the Workbook are also in its namespace, tex. Note, however, that classes from the Workbook and the toyExperiment UPS product can still be distinguished by the leading element of the relative path found in the include directives for their header files:

  • art-workbook for the Workbook
  • toyExperiment for the toyExperiment

The ROOT package is a CERN-supplied software package that is used by art to write data to disk files and to read it from disk files. It also provides many data analysis and data presentation tools that are widely used by the HEP community. Major design decisions for ROOT were frozen before namespaces were a stable part of the C++ language, therefore ROOT does not use namespaces. Instead ROOT adopts the following conventions:

  1. All class names defined by ROOT start with the capital letter T followed by another upper case letter; for example, TFile, TH1F, and TCanvas.
  2. With very few exceptions, all header files defined by ROOT also start with the same pattern; for example, TFile.h, TH1F.h, and TCanvas.h.
  3. The names of all global objects defined by ROOT start with a lower case letter g followed by an upper case letter; for example gDirectory, gPad and gFile.

The rule for writing an include directive for a header file from ROOT is to write its name without any leading path elements:

#include "TFile.h"

All of the ROOT header files are found in the directory that is pointed to by the environment variable ROOT_INC. For example, to see the contents of the header TFile.h you could enter:

less $ROOT_INC/TFile.h

Or you can learn about this class using the reference manual at the CERN web site: http://root.cern.ch/root/html534/ClassIndex.html

You will not see the Geant4 package in the Workbook, but it will be used by the software for your experiment, so it is described here for completeness. Geant4 is a toolkit for modeling the propagation of particles in electromagnetic fields and for modeling the interactions of particles with matter; it is the core of all detector simulation codes in HEP and is also widely used in both the Medical Imaging community and the Particle Astrophysics community.

As with ROOT, Geant4 was designed before namespaces were a stable part of the C++ language. Therefore Geant4 adopted the following conventions.

  1. The names of all identifiers begin with G4; for example, G4Step and G4Track.
  2. All header file names defined by Geant4 begin with G4; for example, G4Step.h and G4Track.h.

Most of the header files defined by Geant4 are found in a single directory, which is pointed to by the environment variable G4INCLUDE.

The rule for writing an include directive for a header file from Geant4 is to write its name without any leading path elements:

#include "G4Step.h"

The Workbook does not set up a version of Geant4; therefore G4INCLUDE is not defined. If it were, you could look at this file with:

less $G4INCLUDE/G4Step.h

Both ROOT and Geant4 define many thousands of classes, functions and global variables. In order to avoid collisions with these identifiers, do not define any identifiers that begin with any of (case-sensitive):

  • T, followed by an upper case letter
  • g, followed by an upper case letter
  • G4


Part II


Chapter 8
Preparation for Running the Workbook Exercises

8.1 Introduction

The Workbook exercises can be run in several environments:

  1. on a computer that is maintained by your experiment, either at Fermilab or at another institution.
  2. on one of the computers supplied for the art/LArSoft course.
  3. on your own computer on which you install the necessary software. For details see Appendix B.

Many details of the working environment change from site to site, and these differences are parameterized so that (a) it is easy to establish the required environment, and (b) the Workbook exercises behave the same way at all sites. In this chapter you will learn how to find and log into the right machine remotely from your local machine (laptop or desktop), and make sure it can support your Workbook work.

8.2 Getting Computer Accounts on Workbook-enabled Machines

In order to run the exercises in the Workbook, you will need an account on a machine that can access your site’s installation of the Workbook code. The experiments provide instructions for getting computer accounts on their machines (and various other information for new users) on web pages that they maintain, as listed in Table 8.1. The URLs in the table are live hyperlinks.

Currently, each of the experiments using art has installed the Workbook code on one of its experiment machines in the Fermilab General Purpose Computing Farm (GPCF).

At time of writing, the new-user instructions for all LArSoft-based experiments are at the LArSoft site; there are no separate instructions for each experiment.

If you are planning to take the art/LArSoft course, see the course web site to learn how to get an account on the machines reserved for the course.

If you would like a computer account on a Fermilab computer in order to evaluate art, contact the art team (see Section 3.4).

8.3 Choosing a Machine and Logging In

The experiment-specific machines confirmed to host the Workbook code are listed in Table 8.2. In most cases the name given is not the name of an actual computer, but rather a round-robin alias for a cluster. For example, if you log into mu2evm, you will actually be connected to one of the five computers mu2egpvm01 through mu2egpvm05. These Mu2e machines share all disks that are relevant to the Workbook exercises, so if you need to log in multiple times, it is perfectly OK if you are logged into two different machines; you will still see all of the same files.



Experiment Name of Login Node

ArgoNeut argoneutvm.fnal.gov
Darkside ds50.fnal.gov
DUNE lbnevm.fnal.gov
MicroBoone uboonevm.fnal.gov
Muon g-2 gm2gpvm.fnal.gov
Mu2e mu2egpvm0x.fnal.gov, for x=1,2,3,4,5
NOνA nova-offline.fnal.gov

art/LArSoft Course alcourse.fnal.gov


Each experiment’s web page has instructions on how to log in to its computers from your local machine.

8.4 Launching new Windows: Verify X Connectivity

Some of the Workbook exercises will launch an X window from the remote machine that opens on your local machine. To test that this works, run:

xterm &

This should, without any messages, give you a new command prompt. After a few seconds, a new shell window should appear on your laptop screen; if you are logging into a Fermilab computer from a remote site, this may take up to 10 seconds. If the window does not appear, or if the command issues an error message, contact a computing expert on your experiment.

To close the new window, type exit at the command prompt in the new window:


If you have a problem with xterm, it could be a problem with your Kerberos and/or ssh configurations. Try logging in again with ssh -Y.

8.5 Choose an Editor

As you work through the Workbook exercises you will need to edit files. Familiarize yourself with one of the editors available on the computer that is hosting the Workbook. Most Fermilab computers offer four reasonable choices: emacs, vi, vim and nedit. Of these, nedit is probably the most intuitive and user-friendly. All are very powerful once you have learned to use them. Most other sites offer at least the first three choices. You can always contact your local system administrator to suggest that other editors be installed.

A future version of this documentation suite will include recommended configurations for each editor and will provide links to documentation for each editor.

Chapter 9
Exercise 1: Running Pre-built art Modules

9.1 Introduction

In this first exercise of the Workbook, you will be introduced to the FHiCL(γ) configuration language and you will run art on several modules that are distributed as part of the toyExperiment UPS product. You will not compile or link any code.

9.2 Prerequisites

Before running any of the exercises in this Workbook, you need to be familiar enough with the material discussed in Part I (Introduction) of this documentation set and with Chapter 8 to be able to find information as needed.

If you are following the instructions below on an older Mac computer (OSX 10.6, Snow Leopard, or earlier), and if you are reading the instructions from a PDF file, be aware that if you use the mouse or trackpad to cut and paste text from the PDF file into your terminal window, the underscore characters will be turned into spaces. You will have to fix them before the commands will work.

9.3 What You Will Learn

In this exercise you will learn:

  • how to use the site-specific setup procedure, which you must do once at the start of each login session.
  • a little bit about the art run-time environment (Section 9.4)
  • how to set up the toyExperiment UPS product (Section 9.6.1)
  • how to run an art job (Section 9.6.1)
  • how to control the number of events to process (Section 9.8.4)
  • how to select different input files (Section 9.8.5)
  • how to start at a run, subRun or event that is not the first one in the file (Section 9.8.6)
  • how to concatenate input files (Section 9.8.5)
  • how to write an output file (Section 9.8.9)
  • some basics about the grammar and structure of a FHiCL file (Section 9.8)
  • how art finds modules and configuration (FHiCL) files. (Sections 9.10 and 9.11)

9.4 The art Run-time Environment

This discussion is aimed to help you understand the process described in this chapter as a whole and how the pieces fit together in the art run-time environment. This environment is summarized in Figure 9.1. In this figure the boxes refer either to locations in memory or to files on a disk.



At the center of the figure is a box labelled “art executable;” this represents the art main program resident in memory after being loaded. When the art executable starts up, it reads its run-time configuration (FHiCL) file, represented by the box to its left. Following instructions from the configuration file, art will load dynamic libraries from toyExperiment, from art, from ROOT, from CLHEP and from other UPS products. All of these dynamic libraries (.so or .dylib files) will be found in the appropriate UPS products in LD_LIBRARY_PATH (DYLD_LIBRARY_PATH for OS X), which points to directories in the UPS products area (box at upper right). Also following instructions from the FHiCL file, art will look for input files (box labeled “Event-data input files” at right). The FHiCL file will tell art to write its event-data and histogram output files to a particular directory (box at lower right).

One remaining box in the figure (at right, second from bottom) is not encountered in the first Workbook exercise but has been provided for completeness. In most art jobs it is necessary to access experiment-related geometry and conditions information; in a mature experiment, these are usually stored in a database that stands apart from the other elements in the picture.

The arrows in Figure 9.1 show the direction in which information flows. Everything but the output flows into the art executable.

9.5 The Input and Configuration Files for the Workbook Exercises

Several event-data input files have been provided for use by the Workbook exercises. These input files are packaged as part of the toyExperiment UPS product. Table 9.1 lists the range of event IDs found in each file. You will need to refer back to this table as you proceed.



File Name    Run  SubRun  Range of Event Numbers

input01.art  1    0       1–10
input02.art  2    0       1–10
input03.art  3    0       1–5
             3    1       1–5
             3    2       1–5
input04.art  4    0       1–1000


A run-time configuration (FHiCL) file has been provided for each exercise. For Exercise 1 it is hello.fcl.

9.6 Setting up to Run Exercise 1

9.6.1 Log In and Set Up

The intent of this section is for the reader to start from “zero” and execute an art job, without necessarily understanding each step, just to get familiar with the process. A detailed discussion of what these steps do will follow in Section 9.9.

Some steps are written as statements, others as commands to issue at the prompt. Notice that art takes the argument -c hello.fcl; this points art to the run-time configuration file that will tell it what to do and where to find the “pieces” on which to operate.

Most readers: follow the steps in “Initial Setup Procedure using Standard Directory” below, then proceed directly to Section 9.7.

If you wish to manage your working directory yourself, skip that procedure and instead follow the steps in “Initial Setup Procedure allowing Self-managed Working Directory,” then proceed to Section 9.7.

If you log out and wish to log back in to continue this exercise, follow the procedure outlined in “Setup for Subsequent Exercise 1 Login Sessions.”

Initial Setup Procedure using Standard Directory

[Procedure listing not reproduced here.]

Proceed to Section 9.7.

Initial Setup Procedure allowing Self-managed Working Directory

[Procedure listing not reproduced here.]

Proceed to Section 9.7.

Setup for Subsequent Exercise 1 Login Sessions

If you log out and later wish to log in again to work on this exercise, you need to do the following:

[Procedure listing not reproduced here.]

Compare this with the list given in Section 9.6.1. You will see that three steps are missing because they only need to be done the first time.

You are now ready to run art as you were before.

9.7 Execute art and Examine Output

From your working directory, execute art on the FHiCL file hello.fcl and send the output to output/hello.log:

art -c hello.fcl >& output/hello.log

Compare the output you produced (in the file output/hello.log) against Listing 9.1; the only differences should be the timestamps and some line breaking. art will have processed the first file listed in Table 9.1.

Listing 9.1: Sample output from running hello.fcl
%MSG-i MF_INIT_OK:  art 27-Apr-2013 21:22:13 CDT JobSetup 
Messagelogger initialization complete. 
27-Apr-2013 21:22:14 CDT  Initiating request to open file 
27-Apr-2013 21:22:14 CDT  Successfully opened file 
Begin processing the 1st record. run: 1 subRun: 0 event: 1 at 
27-Apr-2013 21:22:14 CDT 
Hello World! This event has the id: run: 1 subRun: 0 event: 1 
Begin processing the 2nd record. run: 1 subRun: 0 event: 2 at 
27-Apr-2013 21:22:14 CDT 
Hello World! This event has the id: run: 1 subRun: 0 event: 2 
Hello World! This event has the id: run: 1 subRun: 0 event: 3 
Hello World! This event has the id: run: 1 subRun: 0 event: 4 
Hello World! This event has the id: run: 1 subRun: 0 event: 5 
Hello World! This event has the id: run: 1 subRun: 0 event: 6 
Hello World! This event has the id: run: 1 subRun: 0 event: 7 
Hello World! This event has the id: run: 1 subRun: 0 event: 8 
Hello World! This event has the id: run: 1 subRun: 0 event: 9 
Hello World! This event has the id: run: 1 subRun: 0 event: 10 
27-Apr-2013 21:22:14 CDT  Closed file inputFiles/input01.art 
TrigReport ---------- Event  Summary ------------ 
TrigReport Events total = 10 passed = 10 failed = 0 
TrigReport ------ Modules in End-Path: e1 ------------ 
TrigReport  Trig Bit#    Visited     Passed     Failed      Error Name 
TrigReport     0    0         10         10          0          0 hi 
TimeReport ---------- Time  Summary ---[sec]---- 
TimeReport CPU = 0.004000 Real = 0.002411 
Art has completed and will exit with status 0.

Every time you run art, the first thing to check is the last line in your output or log file. It should be Art has completed and will exit with status 0. If the status is not 0, or if this line is missing, it is an error; please contact the art team as described in Section 3.4.

A future version of these instructions will specify how much disk space is needed, including space for all output files.

9.8 Understanding the Configuration

The file hello.fcl shown in Listing 9.2 gives art its run-time configuration.

Listing 9.2: Listing of hello.fcl
 1  #include "fcl/minimalMessageService.fcl"

 3  process_name : hello

 5  source : {
 6    module_type : RootInput
 7    fileNames   : [ "inputFiles/input01.art" ]
 8  }

10  services : {
11    message : @local::default_message
12  }

14  physics : {
15    analyzers : {
16      hi : {
17        module_type : HelloWorld
18      }
19    }

21    e1        : [ hi ]
22    end_paths : [ e1 ]
23  }

This file is written in the Fermilab Hierarchical Configuration Language (FHiCL, pronounced “fickle”), a language that was developed at Fermilab to support run-time configuration for several projects, including art. By convention, files written in FHiCL end in .fcl. As you work through the Workbook, the features of FHiCL that are relevant for each exercise will be explained.

art accepts some command line options that can be used in place of items in the FHiCL file. You will encounter some of these in this section.

The full details of the FHiCL language, plus the details of how it is used by art, are given in the Users Guide, Chapter 24. Most people will find it much easier to follow the discussion in the Workbook documentation than to digest the full documentation up front.

9.8.1 Some Bookkeeping Syntax

In a FHiCL file, the start of a comment is marked either by the hash sign character (#) or by a C++ style double slash (//); a comment may begin in any column.

The hash sign has one other use. If the first eight characters of a line are exactly #include, followed by whitespace and a quoted file path, then the line will be interpreted as an include directive and the line containing it will be replaced by the contents of the file named in the include directive.
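For example, the first line of hello.fcl (Listing 9.2) is such a directive; when the file is parsed, this line is replaced by the contents of fcl/minimalMessageService.fcl:

```
#include "fcl/minimalMessageService.fcl"
```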

The basic element of FHiCL is the definition, which has the form

1  name : value

A group of FHiCL definitions delimited by braces {} is called a table(γ). Within art, a FHiCL table gets turned into a C++ object called a parameter set(γ); this documentation set will often refer to a FHiCL table as a parameter set.
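Both forms appear in hello.fcl. For example, a simple definition, and a table that art will turn into a parameter set:

```
process_name : hello      # a definition: name : value

hi : {                    # a table (parameter set)
  module_type : HelloWorld
}
```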

The fragment of hello.fcl shown below contains the FHiCL table that configures the source(γ) of events that art will read in and operate on.

5  source : {
6    module_type : RootInput
7    fileNames   : [ "inputFiles/input01.art" ]

The name source (line 5, above) is an identifier in art; i.e., the name source has no special meaning to FHiCL but it does have a special meaning to art. To be precise, it only has a special meaning to art if it is at the outermost scope(γ) of a FHiCL file; i.e., not inside any braces {} within the file. When art sees a parameter set named source at the outermost scope, it will interpret that parameter set to be the description of the source of events for this run of art.

Within the source parameter set, module_type (line 6) is an identifier in art that tells art the name of a module that it should load and run, RootInput in this case. RootInput is one of the standard source modules provided by art and it reads disk files containing event-data written in an art-defined ROOT-based format. The default behavior of the RootInput module is to start at the first event in the first file and read to the end of the last event in the last file.1

The string fileNames (line 7) is again an identifier, but this time defined in the RootInput module. It gives the input module a list of filenames from which to read events. The list is delimited by square brackets and contains a comma-separated list of filenames. This example shows only one filename, but the square brackets are still required. The proper FHiCL name for a comma-separated list delimited by square brackets is a sequence(γ).

In most cases the filenames in the sequence must be enclosed in quotes. FHiCL, like many other languages, has the following rule: if a string contains white space or any special characters, then quoting it is required; otherwise quotes are optional.

FHiCL has its own set of special characters; these include anything except all upper and lower case letters, the numbers 0 through 9 and the underscore character. art restricts the use of the underscore character in some circumstances; these will be discussed as they arise.
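A small sketch of the quoting rule; the parameter name tag is hypothetical, used only for illustration:

```
# quotes optional: only letters, digits and underscores
tag       : loose_cuts

# quotes required: the slash and dot are special characters
fileNames : [ "inputFiles/input01.art" ]
```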

It is implied in the foregoing discussion that a FHiCL value need not be a simple thing, such as a number or a quoted string. For example, in hello.fcl, the value of source is a parameter set (of two parameters) and the value of fileNames is a (single-item) sequence.

9.8.2 Some Physics Processing Syntax

The identifier physics(γ), when found at the outermost scope, is an identifier reserved to art. The physics parameter set is so named because it contains most of the information needed to describe the physics workflow of an art job.

The fragment of hello.fcl below shows a rather long-winded way of telling art to find a module named HelloWorld and execute it. Why so long-winded? art has very powerful features that enable execution of multiple complex chains of modules; the price is that specifying something simple takes a lot of keystrokes.

14  physics : {
15    analyzers : {
16      hi : {
17        module_type : HelloWorld
18      }
19    }
20    e1        : [ hi ]
21    end_paths : [ e1 ]

At the outermost scope of the FHiCL file, art will interpret the physics parameter set as the description of the physics workflow for this run of art. Within the physics parameter set, notice the identifier analyzers on line 15. When found as a top-level identifier within the physics scope, as shown here, it is recognized as a keyword reserved to art. The analyzers parameter set defines the run-time configuration for all of the analyzer modules that are part of the job — in this case, only HelloWorld (specified on line 17).

For our current purposes, the module HelloWorld does only one thing of interest, namely for every event it prints one line (shown here as three):

Hello World! This event has the id: run: <RR> 
                                    subRun: <SS> 
                                    event: <EE>

where RR, SS and EE are substituted with the actual run, subRun and event number of each event.

If you look back at Listing 9.1, you will see that this line appears ten times, once each for events 1 through 10 of run 1, subRun 0 (as expected, according to Table 9.1). The remainder of the listing is standard output generated by art.

On line 20, e1 (an arbitrary identifier) is called a path; it is a FHiCL sequence of module labels. On line 21, end_paths — an identifier reserved to art — is a FHiCL sequence of path names. Together, these two identifiers specify the workflow; this will be discussed in Section 9.8.8.

The remainder of the lines in hello.fcl appears below. Line 3 (different line number than in Listing 9.2), starting with process_name(γ), tells art that this job has a name and that the name is “hello”; it has no real significance in these simple exercises. However the name of the process must not contain any underscore characters; the reason for this restriction will be explained in Section 16.4.2.

1  #include "fcl/minimalMessageService.fcl"

3  process_name : hello

5  services : {
6    message : @local::default_message
7  }

The services parameter set (lines 5-7) provides the run-time configuration for all the required art services for the job, in this case only the message service. For our present purposes, it is sufficient to know that the configuration for the message service itself is found inside the file that is included in line 1. The message service controls the limiting and routing of debug, informational, warning and error messages generated by art or by user code; it does not control information written directly to std::cout or std::cerr.

9.8.3 art Command line Options

art supports some command line options. To see what they are, type the following command at the bash prompt:

art --help

Note that some options have both a short form and a long form. This is a common convention for Unix programs; the short form is convenient for interactive use and the long form makes scripts more readable. It is also a common convention that the short form of an option begins with a single dash character, while the long form begins with two dash characters, for example --help above.

9.8.4 Maximum Number of Events to Process

By default art will read all events from all of the specified input files. You can set a maximum number of events in two ways; one is from the command line:

art -c hello.fcl -n 5 >& output/hello-n5.log
art -c hello.fcl --nevts 4 >& output/hello-nevts4.log

Run each of these commands and observe their output.

The second way is within the FHiCL file. Start by making a copy of hello.fcl:

cp hello.fcl hi.fcl

Edit hi.fcl and add the following line anywhere in the source parameter set:

1maxEvents   : 3
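After the edit, the source parameter set in hi.fcl should read roughly as follows (a sketch; the exact layout of your file may differ):

```
source : {
  module_type : RootInput
  fileNames   : [ "inputFiles/input01.art" ]
  maxEvents   : 3
}
```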

By convention this is added after the fileNames definition but it can go anywhere inside the source parameter set because the order of parameters within a FHiCL table is not important. Run art again, using hi.fcl:

art -c hi.fcl >& output/hi.log

You should see output from the HelloWorld module for only the first three events.

To configure the file for art to process all the events, i.e., to run until art reaches the end of the input files, either leave off the maxEvents parameter or give it a value of -1.

If the maximum number of events is specified both on the command line and in the FHiCL file, then the command line takes precedence. Compare the outputs of the following commands:

art -c hi.fcl >& output/hi2.log
art -c hi.fcl -n 5 >& output/hi-n5.log
art -c hi.fcl -n -1 >& output/hi-neg1.log

9.8.5 Changing the Input Files

For historical reasons, there are multiple ways to specify the input event-data file (or the list of input files) to an art job:

  • within the FHiCL file’s source parameter set
  • on the art command line via the -s option (you may specify one input file only)
  • on the art command line via the -S option (you may specify a text file that lists multiple input files)
  • on the art command line, after the last recognized option (you may specify one or more input files)

If input file names are provided both in the FHiCL file and on the command line, the command line takes precedence.

Let’s run a few examples.

We’ll start with the -s command line option (second bullet). Run art without it (again), for comparison (or recall its output from Listing 9.1):

art -c hello.fcl >& output/hello.log

To see what you should expect given the following input file, check Table 9.1, then run:

art -c hello.fcl -s inputFiles/input02.art >& output/hello-s.log

Notice that the ten events in this output are from run 2 subRun 0, in contrast to the previous printout which showed events from run 1. Notice also that the command line specification overrode that in the FHiCL file. The -s (lower case) command line syntax will only permit you to specify a single filename.

This time, edit the source parameter set inside the hi.fcl file (first bullet); change it to:

1  source : {
2    module_type : RootInput
3    fileNames   : [ "inputFiles/input01.art",
4                    "inputFiles/input02.art" ]
5    maxEvents   : -1
6  }

(Notice that you also added maxEvents : -1.) The names of the two input files could have been written on a single line but this example shows that, in FHiCL, newlines are treated simply as white space.

Check Table 9.1 to see what you should expect, then rerun art as follows:

art -c hi.fcl >& output/hi-2nd.log

You will see 20 lines from the HelloWorld module; you will also see messages from art at the open and close operations on each input file.

Back to the -s command-line option, run:

art -c hi.fcl -s inputFiles/input03.art >& output/run3.log

This will read only inputFiles/input03.art and will ignore the two files specified in the hi.fcl. The output from the HelloWorld module will be the 15 events from the three subRuns of run 3.

There are several ways to specify multiple files at the command line. One choice is to use the -S (upper case) [--source-list] command line option (third bullet), which takes as its argument the name of a text file containing the filename(s) of the input event-data file(s). An example of such a file has been provided, inputs.txt. Look at the contents of this file:

cat inputs.txt

Now run art using inputs.txt to specify the input files:

art -c hi.fcl -S inputs.txt >& output/file010203.log

You should see the HelloWorld output from the 35 events in the three files; you should also see the messages from art about the opening and closing of the three files.

Finally, you can list the input files at the end of the art command line (fourth bullet):

art -c hi.fcl inputFiles/input02.art inputFiles/input03.art >&\

(Remember the Unix convention about a trailing backslash marking a command that continues on another line; see Chapter 2.) In this case you should see the HelloWorld output from the 25 events in the two files.

In summary, there are three ways to specify input files from the command line; all of them override any input files specified in the FHiCL file. Do not try to use two or more of these methods on a single art command line; the art job will run without issuing any messages but the output will likely be different than you expect.

9.8.6 Skipping Events

The source parameter set supports a syntax to start execution at a given event number or to skip a given number of events at the start of the job. Look, for example, at the file skipEvents.fcl, which differs from hello.fcl by the addition of two lines to the source parameter set:

1  firstEvent  : 5 
2  maxEvents   : 3

art will process events 5, 6, and 7 of run 1, subRun 0. Try it:

art -c skipEvents.fcl >& output/skipevents1.log
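For reference, the source parameter set of skipEvents.fcl therefore reads roughly as follows (a sketch assembled from hello.fcl plus the two added lines; the actual file may differ in layout):

```
source : {
  module_type : RootInput
  fileNames   : [ "inputFiles/input01.art" ]
  firstEvent  : 5
  maxEvents   : 3
}
```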

An equivalent operation can be done from the command line in two different ways. Try the following two commands and compare the output:

art -c hello.fcl -e 5 -n 3 >& output/skipevents2.log
art -c hello.fcl --nskip 4 -n 3 >& output/skipevents3.log

You can also specify the initial event to process relative to a given event ID (which, recall, contains the run, subRun and event numbers). Edit hi.fcl and change the source parameter set as follows:

1  source : {
2    module_type : RootInput
3    fileNames   : [ "inputFiles/input03.art" ]
4    firstRun    : 3
5    firstSubRun : 1
6    firstEvent  : 6
7  }

When you run this job, art will process events starting from run 3, subRun 2, event 1 — because there are only five events in subRun 1.

art -c hi.fcl >& output/startatrun3.log

9.8.7 Identifying the User Code to Execute

Recall from Section 9.8.2 that the physics parameter set contains the physics content for the art job. Within this parameter set, art must be able to determine which (user code) modules to process. These must be referenced via module labels(γ), which, as you will see, represent the pairing of a module name and a run-time configuration.

Look back at the listing on page 207, which contains the physics parameter set from hello.fcl. The analyzers parameter set, nested inside the physics parameter set, contains the definition:

hi : { 
  module_type : HelloWorld 

The identifier hi is a module label (defined by the user, not by FHiCL or art) whose value must be a parameter set that art will use to configure a module. The parameter set for a module label must contain (at least) a FHiCL definition of the form:

module_type : best-module-name

  Here module_type is an identifier reserved to art and best-module-name tells art the name of the module to load and execute. (Since it is within the analyzers parameter set, the module must be of type EDAnalyzer; i.e., the base type of best-module-name must be EDAnalyzer.)

Module labels are fully described in Section 24.5.

In this example art will look for a module named HelloWorld, which it will find as part of the toyExperiment product. Section 9.10 describes how art uses best-module-name to find the dynamic library that contains code for the HelloWorld module. A parameter set that is used to configure a module may contain additional lines; if present, the meaning of those lines is understood by the module itself; those lines have no meaning either to art or to FHiCL.

Now look at the FHiCL fragment below that starts with analyzers. We will use it to reinforce some of the ideas discussed in the previous paragraph.

art allows you to write a FHiCL file that uses a given module more than once. For example you may want to run an analysis twice, once with a loose mass cut on some intermediate state and once with a tight mass cut on the same intermediate state. In art you can do this by writing one module and making the cuts “run-time configurable.” This idea will be developed further in Chapter 15.

 1  analyzers : {

 3    loose : {
 4      module_type : MyAnalysis
 5      mass_cut    : 20.
 6    }

 8    tight : {
 9      module_type : MyAnalysis
10      mass_cut    : 15.
11    }
12  }

When art processes this fragment it will look for a module named MyAnalysis (lines 4 and 9) and instantiate it twice, once using the parameter set labeled loose (line 3) and once using the parameter set labeled tight (line 8). The two instances of the module MyAnalysis are distinguished by their different module labels, tight and loose.

art requires that module labels be unique within a FHiCL file. Module labels may contain only upper- and lower-case letters and the numerals 0 to 9.

In the FHiCL files in this exercise, all of the modules are analyzer modules. Since analyzers do not make data products, these module labels are nothing more than identifiers inside the FHiCL file. For producer modules, however, which do make data products, the module label becomes part of the data product identifier and therefore has real significance. All module labels must conform to the same naming rules.

Within art there is no notion of reserved names or special names for module labels; however your experiment will almost certainly have established some naming conventions.

9.8.8 Paths and the art Workflow

In the physics parameter set in hello.fcl the two parameter definitions shown below, taken together, specify the workflow of the art job. Workflow refers to the modules art should run and the order in which to run them.2

1  physics : {
2    ...
3    e1        : [ hi ]
4    end_paths : [ e1 ]

In this exercise there is only one module to run (the analyzer HelloWorld with the label hi from Section 9.8.7), so the workflow is trivial: for each event, run the module with the label hi. As you work through the Workbook you will encounter workflows that are more complex and they will be described as you encounter them.

The FHiCL parameter e1 is called a path. A path is simply a FHiCL sequence of module labels. The name of a path can be any user-defined name that satisfies the following:

  1. It must be defined as part of the physics parameter set, i.e., “at physics scope”.
  2. It must be a valid FHiCL name.
  3. It must be unique within the art job.
  4. It must NOT be one of the following five names that are reserved to art: analyzers, filters, producers, end_paths and trigger_paths.
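A sketch of a legal, user-chosen path name; the name myPath is hypothetical:

```
physics : {
  analyzers : {
    hi : { module_type : HelloWorld }
  }

  myPath    : [ hi ]     # any legal, unique, non-reserved name will do
  end_paths : [ myPath ]
}
```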

An art job may contain many paths, each of which is a FHiCL sequence of module labels. When many groups are working on a common project, this helps to maximize the independence of each work group.

Recall from Section 9.8.2 that the parameter end_paths is not itself a path. Rather it is a FHiCL sequence of path names. It is the end_paths parameter that tells art the workflow it should execute.

Note that any path listed in the end_paths parameter may only contain module labels for analyzer and output modules. A similar mechanism is used to specify the workflow of producer and filter modules; that mechanism will be discussed when you encounter it. If you need a reminder about the types of modules, see Section 3.6.3.

If the end_paths parameter is absent or defined as an empty FHiCL sequence,

1  end_paths : [ ]

both of which are allowable, art will understand that this job has no analyzer modules and no output modules to execute.

As is standard in FHiCL, if the definition of end_paths appears more than once, the last definition takes precedence.
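For example, if both of the following definitions appeared in the same FHiCL file, only the second would take effect:

```
end_paths : [ e1 ]
end_paths : [ ]   # the later definition wins: no analyzer or output modules run
```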

The notion of path introduced in this section is the third thing in the art documentation suite that is called a path. The other two, as you may recall from Section 4.6, are the notion of a path in a filesystem and the notion of an environment variable that is a colon-delimited set of directory names. The use should be clear from the context; if it is not, please let the authors of the Workbook know; see Section 3.4.

The above description is intended to be sufficient for completing the Workbook exercises. If you want to learn more, now or later, the following subsection provides more detail.

Paths and the art Workflow: Details

This section is optional; it contains more details about the material just described in Section 9.8.8. It is not really a “dangerous bend” section for experts — just a side trip.

Exercise 1 is not rich enough to illustrate how to specify an art workflow, so let’s construct a richer example.

Suppose that there are two groups of people working on a large collaborative project; the project leaders are Anne and Rob. Each group has a workflow that requires running five or six module instances; some of the module instances may be in the workflow for both groups. Recall that an instance of a module refers to the name of a module plus its parameter set, and a module instance is specified by giving its module label. For this example let’s have eight module instances with the unimaginative names a through h. The workflow for this example might look something like:

1  anne      : [ a, b, c, d, e, h ]
2  rob       : [ a, b, f, c, g ]
3  end_paths : [ anne, rob ]

That is, Anne defines the modules that her group needs to run and Rob defines the modules that his group needs to run. Anne and Rob do not need to know anything about each other’s list. The parameter definitions anne and rob are called paths; each is a list of module labels. The rules for legal path names were given in Section 9.8.8.

The parameter named end_paths is not itself a path; rather it is a FHiCL sequence of paths. Moreover it has a special meaning to art. During art’s initialization phase, art needs to learn the workflow for the job. The first step is to find the parameter named end_paths, defined within the physics parameter set. When art processes the definition of end_paths it will form the set of all module labels found in the contributing paths, with any duplicates removed. For this example, the list might look something like: [a, b, c, d, e, h, f, g]. When art processes an event, this is the set of module instances that it will execute. The order in which the module instances are executed is discussed below, under “Order of Module Execution.”

The above machinery probably seems a little heavyweight for the example given. But consider a workflow like that needed to design the trigger for the CMS experiment, which requires about 200 paths and many hundreds of modules. Finding the set of unique module labels is not a task that is best done by hand! By introducing the idea of paths, the design allows each group to focus on its own work, unaffected by the other groups.

Actually, the above is only part of the story: the module labels given in the paths anne and rob may only be the labels of analyzer or output modules. There is a parallel mechanism to specify the workflow for producer and filter modules.

To illustrate this parallel mechanism let’s continue the above example of two work groups led by Rob and Anne. In this case let there be filter modules with labels f0, f1, f2, … and producer modules with labels p0, p1, p2, … . In this example, a workflow might look something like:

1  t_anne        : [ p0, p1, p2, f0, p3, f1 ]
2  t_rob         : [ p0, p1, f2, p2, f0, p4 ]
3  trigger_paths : [ t_anne, t_rob ]

5  e_anne        : [ a, b, c, d, e ]
6  e_rob         : [ a, b, f, c, g ]
7  end_paths     : [ e_anne, e_rob ]

Here the parameters t_anne, e_anne, t_rob, and e_rob are all names of paths. All must be legal FHiCL parameter names, be unique within an art job and not conflict with identifiers reserved to art at physics scope. In this example the path names are prefixed with t_ for paths that will be put into the trigger_paths parameter and with e_ for paths that will be put into the end_paths parameter. This is just to make it easier for you to follow the example; the prefixes have no intrinsic meaning.

During art’s initialization phase it processes trigger_paths in the same way that it processes end_paths: it forms the set of all module labels found in the contributing paths, with duplicates removed. Again, the order of execution is discussed below, under “Order of Module Execution.”

Now, what happens if you define a path with a mix of modules from the two groups? It might look like this:

1   bad_path      : [ p0, p1, p2, f0, p3, f1, a, b ] 
2   end_paths     : [ e_anne, e_rob, bad_path ]

In this case art (not FHiCL) will recognize that producer and filter modules are specified in a path that contributes to end_paths; art will then print a diagnostic message and stop. This will occur very early in art’s initialization phase so you will get reasonably prompt feedback. Similarly, if art discovers analyzer or output modules in any of the paths contributing to trigger_paths, it will print a diagnostic message and stop.

Furthermore, if you put a module label directly into either end_paths or trigger_paths, art will print a diagnostic message and stop. This is also true if you put a path name into the definition of another path.

Now it’s time to define two really badly chosen names:3 trigger paths and end paths, both written without underscores. In the above fragment the paths prefixed with t_ are called trigger paths (no underscore); they are so named because they contain module labels for only producer and filter modules; therefore they are paths that satisfy the rules for inclusion in the definition of the trigger_paths parameter.

Similarly, the paths prefixed with e_ are called end paths because they satisfy the rules for inclusion in the definition of the end_paths parameter.

This documentation will try to avoid confusion between trigger paths and trigger_paths, and between end paths and end_paths.

Order of Module Execution

If the trigger_paths parameter contains a single trigger path, then art will execute the modules in that trigger path in the order that they are specified.

When more than one trigger path is present in trigger_paths, art will choose one of the trigger paths and execute its module instances in order. It will then choose a second trigger path. If any module instances in this path were already executed in the first trigger path, art will not execute them a second time; it will execute the remaining module instances in the order specified by the second trigger path. And so on for any remaining trigger paths.
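Applying this rule to the t_anne/t_rob example above, and assuming art happens to process t_anne first (the choice of order among trigger paths is up to art):

```
# t_anne : [ p0, p1, p2, f0, p3, f1 ]   -> runs p0, p1, p2, f0, p3, f1
# t_rob  : [ p0, p1, f2, p2, f0, p4 ]   -> p0, p1, p2 and f0 already ran;
#                                          the rest run in t_rob's order: f2, p4
#
# Net execution order: p0, p1, p2, f0, p3, f1, f2, p4
```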

The rules for order of execution of module instances named in an end path are different. Since analyzer and output modules may neither add new information to the event nor communicate with each other except via the event, the processing order is not important. By definition, then, art may run analyzer and output modules in any order. In a simple art job with a single path, art will, in fact, run the modules in their order of appearance in the path, but do not write code that depends on execution order because art is free to change it.

9.8.9 Writing an Output File

The file writeFile.fcl gives an example of writing an output file. Open the file in an editor and find the parts of the file that are discussed below.

Output files are written by output modules; one module can write one file. An art job may run zero or more output modules.

If you wish to add an output module to an art job there are three steps:

  1. Create a parameter set named outputs at the outermost scope of the FHiCL file. The name outputs is prescribed by art.
  2. Inside the outputs parameter set, add a parameter set to configure an output module. In writeFile.fcl this parameter set has the module label output1.
  3. Add the module label of the output module to an end path (not to the end_paths parameter but to one of the paths that is included in end_paths). In writeFile.fcl the module label output1 is added to the end path e1.

If you wish to add more output modules, repeat steps 2 and 3 for each additional output file.
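Following the three steps above, the relevant fragments of writeFile.fcl look roughly like this (a sketch reconstructed from the description in this section; consult the actual file for the details):

```
outputs : {
  output1 : {
    module_type : RootOutput
    fileName    : "output/writeFile.art"
  }
}

physics : {
  # ... analyzers etc. as in hello.fcl ...
  e1        : [ hi, output1 ]
  end_paths : [ e1 ]
}
```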

The parameter set output1 tells art to make a module whose type is RootOutput. The class RootOutput is a standard module that is part of art and that writes events from memory to a disk file in an art-defined, ROOT-based format. The fileName parameter specifies the name of the output file; this parameter is processed by the RootOutput module. Files written by the module RootOutput can be read by the module RootInput. The identifier output1 is just another module label that obeys the rules discussed in Section 9.8.7.

In the example of writeFile.fcl the output module takes its default behaviour: it will write all of the information about each input event to the output file. RootOutput can be configured to:

  1. write only selected events
  2. for each event write only a subset of the available data products.

How to do this will be described in a section that will be written later.

Before running the exercise, look at the source parameter set of writeFile.fcl; note that it is configured to read only events 4, 5, 6, and 7.
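For reference, a source parameter set that selects events 4 through 7 might look like the sketch below; the input file name is made up for the example and the exact contents of writeFile.fcl may differ:

```fhicl
source : {
  module_type : RootInput
  fileNames   : [ "inputFiles/input01_data.root" ]  # illustrative name
  firstEvent  : 4   # start reading at event 4
  maxEvents   : 4   # read four events: 4, 5, 6 and 7
}
```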

To run writeFile.fcl and check that it worked correctly:

art -c writeFile.fcl
ls -s output/writeFile.art
art -c hello.fcl -s output/writeFile.art

The first command will write the output file; the second will check that the output file was created and will tell you its size; the last one will read back the output file and print the event IDs for all of the events in the file. You should see the HelloWorld printout for events 4, 5, 6 and 7.

9.9 Understanding the Process for Exercise 1

Section 9.6.1 contained a list of steps needed to run this exercise; this section will describe each of those steps in detail. When you understand what is done in these steps, you will understand art's run-time environment. As a reminder, the steps are listed again here. The commands that span two lines can be typed on a single line.

[Listing: the eight steps from Section 9.6.1, repeated here for reference; the listing is not reproduced in this version of the document.]

Steps 1 and 4 should be self-explanatory and will not be discussed further.

When reading this section, you do not need to run any of the commands given here; this is commentary on commands that you have already run.

9.9.1 Follow the Site-Specific Setup Procedure (Details)

The site-specific setup procedure, described in Chapter 5, ensures that the UPS system is properly initialized and that the UPS database (containing all of the UPS products needed to run the Workbook exercises) is present in the PRODUCTS environment variable.

This procedure also defines two environment variables, set by your experiment, that allow you to run the Workbook exercises on its computer(s):

ART_WORKBOOK_WORKING_BASE: the top-level directory in which users create their working directories for the Workbook exercises
ART_WORKBOOK_OUTPUT_BASE: the top-level directory in which users create their output directories for the Workbook exercises; this is used by the script makeLinks.sh

If these environment variables are not defined, ask a system administrator on your experiment.

9.9.2 Make a Working Directory (Details)

On the Fermilab computers the home disk areas are quite small, so most experiments ask their collaborators to work in some other disk space. This situation is common at other sites as well, so we recommend working in a separate space as a best practice; the Workbook is designed to require it.

This step, shown on two lines as:

mkdir -p $ART_WORKBOOK_WORKING_BASE/username/workbook-tutorial/\

creates a new directory to use as your working directory. Its location is defined relative to an environment variable described in Section 9.9.1. This step only needs to be done the first time that you log in to work on Workbook exercises.

If you follow the rest of the naming scheme, you will guarantee that you have no conflicts with other parts of the Workbook.

As discussed in an earlier section, you may of course choose your own working directory on any disk that has adequate space.

9.9.3 Setup the toyExperiment UPS Product (Details)

This step is the main event in the eight-step process.

setup toyExperiment v0_00_14 -q$ART_WORKBOOK_QUAL:prof

This command tells UPS to find a product named toyExperiment, with the specified version and qualifiers, and to setup that product, as described in Section 7.3.

The required qualifiers may change from one experiment to another and even from one site to another within the same experiment. To deal with this, the site-specific setup procedure defines the environment variable ART_WORKBOOK_QUAL, whose value is the qualifier string that is correct for that site.

The complete UPS qualifier for toyExperiment has two components, separated by a colon: the string defined by ART_WORKBOOK_QUAL plus a qualifier describing the compiler optimization level with which the product was built, in this case “prof”; see Section 3.6.7 for information about the optimization levels.
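As a concrete illustration, if ART_WORKBOOK_QUAL happened to have the value e9 at your site (a made-up value for this example; use whatever your site-specific setup defines), the complete qualifier passed on the setup command line would be assembled like this:

```shell
# The value e9 is illustrative only.
ART_WORKBOOK_QUAL=e9
echo "$ART_WORKBOOK_QUAL:prof"   # prints e9:prof
```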

Each version of the toyExperiment product knows that it requires a particular version and qualifier of the art product. In turn, art knows that it depends on particular versions of ROOT, CLHEP, boost and so on. When this recursive setup has completed, over 20 products will have been setup. All of these products define environment variables and about two-thirds of them add new elements to the environment variables PATH and LD_LIBRARY_PATH.

If you are interested, you can inspect your environment before and after doing this setup. To do this, log out and log in again. Before doing the setup, run the following commands:

printenv > env.before
printenv PATH | tr : \\n > path.before
printenv LD_LIBRARY_PATH | tr : \\n > ldpath.before

Then setup toyExperiment and capture the environment afterwards (env.after). Compare the before and after files: the after files will have many, many additions to the environment. (The fragment  | tr : \\n  pipes the output of printenv through the tr program, which replaces every occurrence of the colon character with a newline; this makes the output much easier to read.)
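To see the tr idiom in isolation, you can try it on a short, hand-made PATH-like string; the directories here are invented purely for the demonstration:

```shell
# Translate each colon into a newline, one directory per line.
echo '/usr/bin:/usr/local/bin:/opt/art/bin' | tr : \\n
```

This prints the three directories on three separate lines, which is exactly the transformation applied to PATH and LD_LIBRARY_PATH above.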

9.9.4 Copy Files to your Current Working Directory (Details)

The step:

cp $TOYEXPERIMENT_DIR/HelloWorldScripts/* .

only needs to be done the first time that you log in to work on the Workbook.

In this step you copied the files that you will use for the exercises into your current working directory. You should see these files:

hello.fcl  makeLinks.sh  skipEvents.fcl  writeFile.fcl PICT PICT

9.9.5 Source makeLinks.sh (Details)

This step:

source makeLinks.sh

only needs to be done the first time that you log in to work on the Workbook. It created some symbolic links that art will use.

The FHiCL files used in the Workbook exercises look for their input files in the subdirectory inputFiles. This script made a symbolic link, named inputFiles, that points to the directory in which the necessary input files are found.

This script also ensures that there is an output directory that you can write into when you run the exercises and adds a symbolic link from the current working directory to this output directory. The output directory is made under the directory $ART_WORKBOOK_OUTPUT_BASE; this environment variable was set by the site-specific setup procedure and it points to disk space that will have enough room to hold the output of the exercises.
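The link-making step itself is ordinary shell. A minimal sketch of what makeLinks.sh does, with made-up directory and link names rather than the Workbook's real ones, is:

```shell
# Create a target directory, then make a symbolic link to it;
# -sfn replaces any existing link of the same name.
# All names here are illustrative only.
mkdir -p /tmp/demo-output
ln -sfn /tmp/demo-output output-link
readlink output-link   # prints /tmp/demo-output
```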

9.9.6 Run art (Details)

Issuing the command:

art -c hello.fcl

runs the art main program, which is found in $ART_FQ_DIR/bin. This directory was added to your PATH when you setup toyExperiment. You can inspect your PATH to see that this directory is indeed there.
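One generic way to check whether a particular directory appears on your PATH is the case-statement idiom below; the directory shown is illustrative, and in a real session you would substitute $ART_FQ_DIR/bin:

```shell
# Wrap PATH in colons so the match also works for the first
# and last entries. The directory is illustrative only.
dir=/usr/bin
case ":$PATH:" in
  *":$dir:"*) echo "$dir is on PATH" ;;
  *)          echo "$dir is not on PATH" ;;
esac
```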

9.10 How does art find Modules?

When you ran hello.fcl, how did art find the module HelloWorld?

It looked at the environment variable LD_LIBRARY_PATH, which is a colon-delimited set of directory names defined when you setup the toyExperiment product. We saw the value of LD_LIBRARY_PATH in Section 9.9.3; to see it again, type the following:

printenv LD_LIBRARY_PATH | tr : \\n

The output should look similar to that shown in Listing 9.3.

Listing 9.3: Example of the value of LD_LIBRARY_PATH