hipo2root

Convert HIPO files to ROOT RNTuple format.

Synopsis

hipo2root [options] <input_files...>

Description

hipo2root converts HIPO files to ROOT RNTuple format using RNTupleParallelWriter for true concurrent writing: each thread writes directly to the same output file through its own FillContext, so no per-thread temporary files are needed.
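The converter's exact internals are not reproduced here, but the pattern described above looks roughly like the following sketch. Field names, the thread count, and the bank-decoding step are placeholders, and in ROOT releases where RNTuple is still experimental these classes live in the ROOT::Experimental namespace.

// Minimal, illustrative sketch of the RNTupleParallelWriter / FillContext pattern;
// this is not hipo2root's actual code.
#include <ROOT/RNTupleModel.hxx>
#include <ROOT/RNTupleParallelWriter.hxx>
#include <thread>
#include <vector>

// In newer ROOT some of these classes have moved out of Experimental into ROOT::.
using namespace ROOT::Experimental;

void parallel_write_sketch() {
    // The parallel writer takes a "bare" model (one without a default entry).
    auto model = RNTupleModel::CreateBare();
    model->MakeField<std::vector<int>>("REC_Particle_pid");

    auto writer = RNTupleParallelWriter::Recreate(std::move(model), "hipo", "output.root");

    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t) {
        workers.emplace_back([&writer] {
            // Each worker owns its own FillContext and entry, so filling needs no locking.
            auto ctx = writer->CreateFillContext();
            auto entry = ctx->CreateEntry();
            auto pid = entry->GetPtr<std::vector<int>>("REC_Particle_pid");
            for (int ev = 0; ev < 1000; ++ev) {
                pid->assign({11, 211});   // placeholder for decoded bank rows
                ctx->Fill(*entry);
            }
            // The FillContext flushes its clusters to the shared file when destroyed.
        });
    }
    for (auto &w : workers)
        w.join();
    // Destroying the writer commits the RNTuple metadata to output.root.
}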

ROOT Required

This tool requires ROOT 6.30+ to be installed on your system (for RNTuple support). All other hipo-utils tools work without ROOT.

Options

Option                        Description                                          Default
-i, --input <files>           Input HIPO files (comma-separated, supports globs)   -
-o, --output <dir>            Output directory                                     .
-m, --merged <file>           Merged output filename                               merged.root
-t, --tree <name>             RNTuple name in output file                          hipo
-j, --jobs <n>                Number of parallel jobs                              CPU cores
-n, --nevt <n>                Events to inspect per file to find banks with data   1000
                              (increase for rare banks)
-I, --include-banks <banks>   Only include specified banks (comma-separated)       -
-E, --exclude-banks <banks>   Exclude specified banks (comma-separated)            -

Examples

Basic Conversion

# Convert single file
hipo2root data.hipo

# Convert to specific directory
hipo2root -o output/ data.hipo

# Using -i option
hipo2root -i data.hipo -o output/

Multiple Files

# Convert all HIPO files in data/
hipo2root data/*.hipo -o output/

# Using -i with comma-separated files
hipo2root -i file1.hipo,file2.hipo,file3.hipo -m output.root

# Custom output filename
hipo2root -m output.root file1.hipo file2.hipo

Parallel Processing

Records within each file are processed in parallel across threads using RNTupleParallelWriter:

# Use 4 threads for parallel writing
hipo2root --jobs 4 data/*.hipo

# Use all CPU cores (default)
hipo2root data/*.hipo

# Single file benefits from parallelism too
hipo2root --jobs 8 large_file.hipo

Bank Selection

# Include only specific banks
hipo2root -I REC::Particle,REC::Calorimeter data.hipo

# Exclude banks
hipo2root -E "RAW::*" data.hipo

Custom RNTuple Name

# Use custom RNTuple name
hipo2root -t events -m output.root data.hipo

Output Structure

The ROOT file contains an RNTuple with fields for each bank column:

BankName_ColumnName (e.g., REC_Particle_pid, REC_Particle_px)

All fields are stored as std::vector<T> to handle multi-row banks.

Reading RNTuple Output

// C++ example
#include <ROOT/RNTupleReader.hxx>
#include <iostream>
#include <vector>

auto reader = ROOT::RNTupleReader::Open("hipo", "output.root");
std::cout << "Entries: " << reader->GetNEntries() << std::endl;

// Access fields
auto pidView = reader->GetView<std::vector<int>>("REC_Particle_pid");
for (auto i : reader->GetEntryRange()) {
    for (int pid : pidView(i)) {
        // process particle IDs
    }
}
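
Depending on the ROOT release, the converted file can also be analyzed with RDataFrame; newer versions open an RNTuple with the usual constructor, while older ones provide ROOT::RDF::Experimental::FromRNTuple. A minimal sketch, assuming your RDataFrame build has RNTuple support and reusing the default ntuple name and the example field above:

// Sketch: reading the converted RNTuple with RDataFrame (illustrative only).
#include <ROOT/RDataFrame.hxx>
#include <iostream>

void read_with_rdataframe() {
    // Older ROOT: auto df = ROOT::RDF::Experimental::FromRNTuple("hipo", "output.root");
    ROOT::RDataFrame df("hipo", "output.root");

    auto nEvents = df.Count();
    auto pidHist = df.Histo1D("REC_Particle_pid");   // one histogram entry per particle

    std::cout << "Events: " << *nEvents
              << ", particles: " << pidHist->GetEntries() << std::endl;
}
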
# Python example (requires ROOT with RNTuple support)
import ROOT

reader = ROOT.RNTupleReader.Open("hipo", "output.root")
print(f"Entries: {reader.GetNEntries()}")

Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Worker 0   │     │  Worker 1   │     │  Worker N   │
│ FillContext │     │ FillContext │     │ FillContext │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                    ┌──────▼───────┐
                    │   RNTuple    │
                    │ParallelWriter│
                    │(single file) │
                    └──────────────┘

Notes

  • True parallel writing: Uses RNTupleParallelWriter where each thread has its own FillContext for lock-free concurrent writes to the same file
  • No temporary files: all input files stream directly to a single output file; no intermediate temp files or post-merge step is required
  • Files are processed sequentially, but each file fully utilizes all threads for record processing
  • Schema discovery inspects N events per file (configurable with -n) to find banks that contain data. Increase this value if you have rare banks that only appear occasionally
  • Event order within the output may differ from the original sequential order when using multiple threads (records are processed in parallel)
  • Output uses 64 MB cluster sizes for optimal parallel I/O performance (see the sketch below)
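
The cluster target mentioned in the last bullet is a standard RNTuple write option. A minimal sketch of how such a value is set, shown for illustration rather than as the tool's exact code (the class is ROOT::Experimental::RNTupleWriteOptions in older releases):

// Sketch: requesting ~64 MB compressed clusters via RNTupleWriteOptions.
// The options object can be passed as the final argument to RNTupleParallelWriter::Recreate.
#include <ROOT/RNTupleWriteOptions.hxx>

ROOT::RNTupleWriteOptions MakeWriteOptions() {
    ROOT::RNTupleWriteOptions options;
    options.SetApproxZippedClusterSize(64 * 1024 * 1024);   // target size of a compressed cluster
    return options;
}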

See Also