Data Model

Core Concepts

HIPO organizes physics data in a hierarchy: a file contains compressed records, each record contains many events, and each event contains one or more typed banks identified by a (group, item) pair.

flowchart TB
    F[File] --> R0[Record 0<br/>compressed]
    F --> R1[Record 1]
    F --> Rdots[…]
    R0 --> E0[Event 0]
    R0 --> E1[Event 1]
    R0 --> Edots[…]
    E0 --> B0["Bank REC::Particle<br/>group=300 item=1"]
    E0 --> B1["Bank REC::Event<br/>group=301 item=1"]
    E0 --> B2["Bank REC::Calorimeter<br/>group=302 item=1"]
    B0 -. defined by .-> S[Schema<br/>columns + types]

Key Terms

  • Record: A compressed block containing multiple events. The fundamental I/O unit.
  • Event: A single physics event (particle collision). Contains one or more banks.
  • Bank: A typed, columnar table within an event. Defined by a schema.
  • Schema: Column names, types, and layout for a bank.
  • Dictionary: Collection of all schemas in a file.

Banks as Tables

A bank is conceptually a table where each column has a fixed type and each row represents one entry (e.g., one particle):

pid (Short)   px (Float)   py (Float)   pz (Float)
11            0.52         -0.31        2.10
-211          -0.88        0.15         1.43
2212          0.01         0.72         3.55

The schema defines the column layout:

hipo::schema s("REC::Particle", 300, 1);
s.parse("pid/S,px/F,py/F,pz/F");

Each schema has a unique (group, item) pair that identifies its bank within an event.
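
To make the format string concrete, here is a minimal standalone sketch of how a string such as "pid/S,px/F,py/F,pz/F" could be split into named, typed columns. It does not use the HIPO library: Column, typeSize, parseColumns, and rowBytes are hypothetical names, and the sketch assumes that S is a 2-byte integer and F is a 4-byte float. Other type codes are deliberately left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one schema column; not a HIPO type.
struct Column {
    std::string name;
    char type;         // 'S' or 'F' in this sketch
    std::size_t size;  // bytes per entry (assumed: S = 2, F = 4)
};

// Byte size for the two type codes that appear in this section.
inline std::size_t typeSize(char code) {
    switch (code) {
        case 'S': return 2;
        case 'F': return 4;
        default:
            throw std::invalid_argument("type code not covered by this sketch");
    }
}

// Split a comma-separated "name/T" list into columns.
inline std::vector<Column> parseColumns(const std::string& format) {
    std::vector<Column> columns;
    std::size_t start = 0;
    while (start < format.size()) {
        std::size_t end = format.find(',', start);
        if (end == std::string::npos) end = format.size();
        const std::string field = format.substr(start, end - start);
        const std::size_t slash = field.find('/');
        // Require a non-empty name followed by exactly one type character.
        if (slash == std::string::npos || slash == 0 || slash + 2 != field.size()) {
            throw std::invalid_argument("expected name/T, got: " + field);
        }
        const char code = field[slash + 1];
        columns.push_back({field.substr(0, slash), code, typeSize(code)});
        start = end + 1;
    }
    return columns;
}

// Bytes needed for one row across all columns.
inline std::size_t rowBytes(const std::vector<Column>& columns) {
    std::size_t total = 0;
    for (const Column& c : columns) total += c.size;
    return total;
}
```

Under these assumptions, the example schema yields four columns and 14 bytes per row (2 for pid plus 4 each for px, py, and pz).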

Columnar Storage

Banks store data column-major (Structure of Arrays), not row-major. For the table above:

Memory layout:  [pid0, pid1, pid2, px0, px1, px2, py0, py1, py2, pz0, pz1, pz2]

This is efficient when you only need a few columns -- each column is one contiguous block, so the CPU cache fills with values you actually read instead of unused columns from every row.
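
The layout above can be sketched as byte offsets into a single buffer. This is an illustration only: ParticleColumns is a hypothetical class, not part of HIPO, and it assumes a 2-byte pid, 4-byte floats, and no header or padding bytes, none of which this page specifies. Only the column ordering is taken from the layout shown above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

// Hypothetical column-major buffer for the pid/px/py/pz example.
class ParticleColumns {
public:
    explicit ParticleColumns(std::size_t rows)
        : rows_(rows), bytes_(rows * (2 + 3 * 4)) {}

    // All pid values come first and occupy rows * 2 bytes.
    std::size_t pidOffset(std::size_t row) const { return row * 2; }

    // Float columns follow in order: column 0 = px, 1 = py, 2 = pz.
    std::size_t floatOffset(std::size_t column, std::size_t row) const {
        return rows_ * 2 + column * rows_ * 4 + row * 4;
    }

    void setPid(std::size_t row, std::int16_t value) {
        std::memcpy(&bytes_[pidOffset(row)], &value, sizeof value);
    }
    std::int16_t pid(std::size_t row) const {
        std::int16_t value;
        std::memcpy(&value, &bytes_[pidOffset(row)], sizeof value);
        return value;
    }
    void setFloat(std::size_t column, std::size_t row, float value) {
        std::memcpy(&bytes_[floatOffset(column, row)], &value, sizeof value);
    }
    float getFloat(std::size_t column, std::size_t row) const {
        float value;
        std::memcpy(&value, &bytes_[floatOffset(column, row)], sizeof value);
        return value;
    }
    std::size_t byteSize() const { return bytes_.size(); }

private:
    std::size_t rows_;
    std::vector<std::uint8_t> bytes_;
};
```

With three rows, px starts at byte 6, py at byte 18, and pz at byte 30, so scanning pz alone reads 12 adjacent bytes rather than striding across whole rows.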

Buffer Reuse

Bank objects are designed to be reused across events:

hipo::bank particles(dict.getSchema("REC::Particle"));

while (reader.next()) {
    reader.read(event);
    event.read(particles);  // overwrites particles' buffer in-place
    // use particles...
}

Warning

Do not store references or pointers to bank data across events. The internal buffer is overwritten on each event.read(bank) call, so copy any values you need to keep before reading the next event.
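
The hazard can be reproduced without the library. In the sketch below, ReusableColumn is a hypothetical stand-in for one column of a reused bank, with fill() playing the role of event.read(bank). The fixed capacity of 64 entries is an arbitrary choice for the illustration and says nothing about how HIPO sizes its buffers.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a reused bank column: one buffer, refilled per event.
class ReusableColumn {
public:
    // Overwrites the buffer in place; extra values beyond capacity are dropped.
    void fill(const std::vector<float>& values) {
        count_ = values.size() < buffer_.size() ? values.size() : buffer_.size();
        for (std::size_t i = 0; i < count_; ++i) buffer_[i] = values[i];
    }

    // View into the live buffer; contents change on the next fill().
    const float* data() const { return buffer_.data(); }
    std::size_t size() const { return count_; }

    // Independent copy that survives later fill() calls.
    std::vector<float> snapshot() const {
        return std::vector<float>(buffer_.begin(), buffer_.begin() + count_);
    }

private:
    std::array<float, 64> buffer_{};
    std::size_t count_ = 0;
};
```

A pointer obtained from data() before the second fill() sees the new event's values afterwards, while a vector obtained from snapshot() still holds the old ones.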

Schema Identification

Every bank type is identified by a (group, item) pair:

  • group: A 16-bit integer identifying the bank category (e.g., 300 for REC::Particle in the diagram above)
  • item: An 8-bit integer identifying the specific bank within that group

When event.read(bank) is called, it searches the event buffer for a structure matching the bank's (group, item).
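
The lookup can be pictured as a scan over the structures in an event. The sketch below is a simplified model, not the library's implementation: Structure and findStructure are hypothetical names, and the in-memory representation is an assumption, since this page does not describe the binary layout of an event. Only the field widths (16-bit group, 8-bit item) come from the text above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical in-memory model of one structure inside an event.
struct Structure {
    std::uint16_t group;                // 16-bit bank category
    std::uint8_t item;                  // 8-bit bank id within the group
    std::vector<std::uint8_t> payload;  // raw bank bytes
};

// Linear scan for the first structure with a matching (group, item).
// Returns nullptr when the event holds no such bank.
inline const Structure* findStructure(const std::vector<Structure>& event,
                                      std::uint16_t group, std::uint8_t item) {
    for (const Structure& s : event) {
        if (s.group == group && s.item == item) return &s;
    }
    return nullptr;
}
```

In this model, a reader simply never asks for a structure whose (group, item) it does not know, which is one way to picture how unknown banks can be skipped.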

Self-Describing Files

Each HIPO file embeds its complete schema dictionary in the first record. This means:

  • Files are self-contained -- no external schema files needed
  • The reader can discover all available banks by reading the dictionary
  • Schema evolution is tolerated -- older readers can skip banks they do not recognize

hipo::dictionary dict;
reader.readDictionary(dict);

// List all available banks
for (const auto& name : dict.getSchemaList()) {
    printf("Bank: %s\n", name.c_str());
}

Event Tags

Events can be tagged with an integer value for filtering:

// Writer: events with different tags go to separate records
event.setTag(1);

// Reader: only read events with specific tags
reader.setTags({1, 2});

Tagged events are stored in separate records within the file, enabling efficient tag-based filtering without decompressing all records.
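
A rough sketch of why per-record tags make filtering cheap: if each record's tag can be read without decompressing the record, the reader can choose which records to open before touching any event data. RecordInfo and selectRecords are hypothetical names, and the idea of a per-record index entry holding the tag is an assumption made for this illustration, not a description of the HIPO file layout.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical per-record index entry, readable without decompression.
struct RecordInfo {
    int tag;                 // tag shared by every event in the record
    std::size_t eventCount;  // number of events stored in the record
};

// Return the positions of records whose tag is in the wanted set.
// Records not listed never need to be decompressed.
inline std::vector<std::size_t> selectRecords(const std::vector<RecordInfo>& index,
                                              const std::set<int>& wanted) {
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < index.size(); ++i) {
        if (wanted.count(index[i].tag) != 0) selected.push_back(i);
    }
    return selected;
}
```

Filtering on tags {1, 2}, as in the reader call above, would select only the records carrying those tags and leave every other record untouched.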