
Data Model

Core Concepts

HIPO organizes physics data in a hierarchy:

File
  +-- Record 0 (compressed block of events)
  |     +-- Event 0
  |     |     +-- Bank "REC::Particle" (group=300, item=1)
  |     |     +-- Bank "REC::Event"    (group=301, item=1)
  |     |     +-- ...
  |     +-- Event 1
  |     +-- ...
  +-- Record 1
  +-- ...

Key Terms

  • Record: A compressed block containing multiple events. The fundamental I/O unit.
  • Event: A single physics event (particle collision). Contains multiple banks.
  • Bank: A typed, columnar table within an event. Defined by a schema.
  • Schema: Column names, types, and layout for a bank.
  • Dictionary: Collection of all schemas in a file.

Banks as Tables

A bank is conceptually a table where each column has a fixed type and each row represents one entry (e.g., one particle):

 pid (Short)   px (Float)   py (Float)   pz (Float)
          11         0.52        -0.31         2.10
        -211        -0.88         0.15         1.43
        2212         0.01         0.72         3.55

The schema defines the column layout:

hipo::schema s("REC::Particle", 300, 1);
s.parse("pid/S,px/F,py/F,pz/F");

Each schema has a unique (group, item) pair that identifies its bank within an event.

Columnar Storage

Banks store data column-major (Structure of Arrays), not row-major. For the table above:

Memory layout:  [pid0, pid1, pid2, px0, px1, px2, py0, py1, py2, pz0, pz1, pz2]

This is efficient when you only need a few columns: the values of each column are contiguous, so the CPU cache loads only the data you actually read instead of striding over unused columns in every row.

Buffer Reuse

Bank objects are designed to be reused across events:

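hipo::reader reader;
reader.open("data.hipo");      // placeholder file name
hipo::dictionary dict;
reader.readDictionary(dict);
hipo::event event;

// Create the bank once, outside the event loop
hipo::bank particles(dict.getSchema("REC::Particle"));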

while (reader.next()) {
    reader.read(event);
    event.read(particles);  // overwrites particles' buffer in-place
    // use particles...
}

Warning

Do not keep pointers or references to bank data across events. The internal buffer is overwritten on each event.read(bank) call; copy any values you need to keep.

Schema Identification

Every bank type is identified by a (group, item) pair:

  • group: A 16-bit integer identifying the bank category (e.g., 300 for REC::Particle in the example above)
  • item: An 8-bit integer identifying the specific bank within that group

When event.read(bank) is called, it searches the event buffer for a structure matching the bank's (group, item).

Self-Describing Files

Each HIPO file embeds its complete schema dictionary in the first record. This means:

  • Files are self-contained -- no external schema files needed
  • The reader can discover all available banks by reading the dictionary
  • Schema evolution is handled gracefully: banks are looked up by (group, item), so a reader that does not know a newer bank never requests it and simply skips it

hipo::dictionary dict;
reader.readDictionary(dict);

// List all available banks
for (auto& name : dict.getSchemaList()) {
    printf("Bank: %s\n", name.c_str());
}

Event Tags

Events can be tagged with an integer value for filtering:

// Writer: events with different tags go to separate records
event.setTag(1);

// Reader: only read events with specific tags
reader.setTags({1, 2});

Events with different tags are stored in separate records within the file, so a reader filtering on tags can skip whole records whose tag was not requested without decompressing them.