Data Model¶
Core Concepts¶
HIPO organizes physics data in a hierarchy: a file contains compressed records, each
record contains many events, and each event contains one or more typed banks identified
by a (group, item) pair.
flowchart TB
F[File] --> R0[Record 0<br/>compressed]
F --> R1[Record 1]
F --> Rdots[…]
R0 --> E0[Event 0]
R0 --> E1[Event 1]
R0 --> Edots[…]
E0 --> B0["Bank REC::Particle<br/>group=300 item=1"]
E0 --> B1["Bank REC::Event<br/>group=301 item=1"]
E0 --> B2["Bank REC::Calorimeter<br/>group=302 item=1"]
B0 -. defined by .-> S[Schema<br/>columns + types]
Key Terms¶
- Record: A compressed block containing multiple events. The fundamental I/O unit.
- Event: A single physics event (particle collision). Contains one or more banks.
- Bank: A typed, columnar table within an event. Defined by a schema.
- Schema: Column names, types, and layout for a bank.
- Dictionary: Collection of all schemas in a file.
Banks as Tables¶
A bank is conceptually a table where each column has a fixed type and each row represents one entry (e.g., one particle):
| pid (Short) | px (Float) | py (Float) | pz (Float) |
|---|---|---|---|
| 11 | 0.52 | -0.31 | 2.10 |
| -211 | -0.88 | 0.15 | 1.43 |
| 2212 | 0.01 | 0.72 | 3.55 |
The schema defines the column layout: the name and type of each column, in a fixed order.
Each schema also has a unique (group, item) pair that identifies its bank within an event.
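As a rough illustration of what a schema carries, the sketch below models the four-column example table in plain C++. It does not use the HIPO library: `ColumnType`, `Column`, `Schema`, `typeSize`, `bytesPerRow`, and `exampleParticleSchema` are hypothetical names, the group and item values are the ones shown in the diagram above, and the widths assume a 2-byte Short and a 4-byte Float.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not the HIPO API.
enum class ColumnType { Short, Float };

struct Column {
    std::string name;
    ColumnType type;
};

struct Schema {
    std::string name;
    std::uint16_t group;  // bank category
    std::uint8_t item;    // specific bank within the group
    std::vector<Column> columns;
};

// Width in bytes of one value (assumed: Short = 2, Float = 4).
inline std::size_t typeSize(ColumnType t) {
    return t == ColumnType::Short ? 2 : 4;
}

// Bytes needed to hold one row, summed over all columns.
inline std::size_t bytesPerRow(const Schema& s) {
    std::size_t total = 0;
    for (const Column& c : s.columns) total += typeSize(c.type);
    return total;
}

// The example table: pid (Short), px, py, pz (Float).
inline Schema exampleParticleSchema() {
    return Schema{"REC::Particle", 300, 1,
                  {{"pid", ColumnType::Short},
                   {"px", ColumnType::Float},
                   {"py", ColumnType::Float},
                   {"pz", ColumnType::Float}}};
}
```

Under these assumptions one row of the example table occupies 2 + 4 + 4 + 4 = 14 bytes.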
Columnar Storage¶
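Banks store data column-major (Structure of Arrays), not row-major. For the table above, each column's values are stored contiguously: all three pid values together, then all three px values, and so on.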
This is efficient when you only need a few columns -- the CPU cache loads contiguous column data rather than skipping over unused columns in each row.
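To make the difference concrete, the following sketch computes where a value would sit in a column-major buffer versus a row-major one. It is a simplified model, not HIPO's actual buffer code: the helper names are hypothetical, and it assumes columns are packed in schema order with no padding or header.

```cpp
#include <cstddef>
#include <vector>

// Byte widths of the example columns: pid (Short), px, py, pz (Float).
// Hypothetical helper for illustration; not part of the HIPO API.
inline std::vector<std::size_t> exampleColumnWidths() { return {2, 4, 4, 4}; }

// Column-major: all values of column 0, then all values of column 1, ...
inline std::size_t columnMajorOffset(const std::vector<std::size_t>& widths,
                                     std::size_t rows, std::size_t col,
                                     std::size_t row) {
    std::size_t offset = 0;
    for (std::size_t c = 0; c < col; ++c) offset += widths[c] * rows;
    return offset + row * widths[col];
}

// Row-major, for comparison: all columns of row 0, then all columns of row 1, ...
inline std::size_t rowMajorOffset(const std::vector<std::size_t>& widths,
                                  std::size_t col, std::size_t row) {
    std::size_t rowBytes = 0;  // size of one full row
    std::size_t within = 0;    // offset of `col` inside a row
    for (std::size_t c = 0; c < widths.size(); ++c) {
        if (c < col) within += widths[c];
        rowBytes += widths[c];
    }
    return row * rowBytes + within;
}
```

With three rows, the px values land at byte offsets 6, 10, and 14 in the column-major model (a 4-byte stride), but at 2, 16, and 30 in the row-major one (a 14-byte stride that steps over pid, py, and pz each time).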
Buffer Reuse¶
Bank objects are designed to be reused across events:
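// Assumes `reader` is already open and `dict` was filled via reader.readDictionary(dict).
hipo::event event;
hipo::bank particles(dict.getSchema("REC::Particle"));
while (reader.next()) {
    reader.read(event);     // load the next event into `event`
    event.read(particles);  // overwrites particles' buffer in-place
    // use particles...
}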
Warning
Do not store references to bank data across events. The internal buffer is overwritten on each event.read(bank) call.
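The hazard can be reproduced with a small stand-in class. `ReusableBank` below is hypothetical and far simpler than `hipo::bank`; it only mimics the one property that matters here, namely that each read overwrites the same internal buffer. The sketch also shows the usual workaround: copy the values you need before the next read.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a reusable bank; not the HIPO API.
class ReusableBank {
public:
    explicit ReusableBank(std::size_t capacity) : buffer_(capacity), rows_(0) {}

    // Overwrites the internal buffer in place, as a per-event read would.
    void read(const std::vector<float>& eventData) {
        rows_ = std::min(eventData.size(), buffer_.size());
        std::copy(eventData.begin(), eventData.begin() + rows_, buffer_.begin());
    }

    std::size_t rows() const { return rows_; }

    // View into the shared buffer; its contents change on the next read().
    const float* data() const { return buffer_.data(); }

    // Safe way to keep values past the next read(): take a copy.
    std::vector<float> copy() const {
        return std::vector<float>(buffer_.begin(), buffer_.begin() + rows_);
    }

private:
    std::vector<float> buffer_;
    std::size_t rows_;
};
```

A pointer obtained from `data()` before a second `read()` still points at valid memory in this model, but it now shows the second event's values. Only the vector returned by `copy()` keeps the first event's data.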
Schema Identification¶
Every bank type is identified by a (group, item) pair:
- group: A 16-bit integer identifying the bank category (e.g., 300 for REC:: banks)
- item: An 8-bit integer identifying the specific bank within that group
When event.read(bank) is called, it searches the event buffer for a structure matching the bank's (group, item).
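The lookup can be pictured as a scan over the structures in an event. The sketch below is an assumption about the general shape of such a search, not HIPO's implementation: `StructureEntry` and `findStructure` are hypothetical, and a real event buffer stores headers and payloads in a packed binary form rather than a vector of structs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one structure inside an event buffer.
struct StructureEntry {
    std::uint16_t group;  // 16-bit bank category
    std::uint8_t item;    // 8-bit bank id within the group
    std::size_t offset;   // where the payload starts
    std::size_t length;   // payload size in bytes
};

// Linear scan for the first structure matching (group, item).
// Returns nullptr when the event does not contain that bank.
inline const StructureEntry* findStructure(const std::vector<StructureEntry>& event,
                                           std::uint16_t group, std::uint8_t item) {
    for (const StructureEntry& s : event) {
        if (s.group == group && s.item == item) return &s;
    }
    return nullptr;
}
```

Returning a null pointer for a missing bank reflects the fact that not every event needs to contain every bank type; callers of a lookup like this should handle the empty case.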
Self-Describing Files¶
Each HIPO file embeds its complete schema dictionary in the first record. This means:
- Files are self-contained -- no external schema files needed
- The reader can discover all available banks by reading the dictionary
- Schema evolution is handled gracefully -- old readers can skip unknown banks
hipo::dictionary dict;
reader.readDictionary(dict);
// List all available banks
for (const auto& name : dict.getSchemaList()) {
    printf("Bank: %s\n", name.c_str());
}
Event Tags¶
Events can be tagged with an integer value for filtering:
// Writer: events with different tags go to separate records
event.setTag(1);
// Reader: only read events with specific tags
reader.setTags({1, 2});
Tagged events are stored in separate records within the file, enabling efficient tag-based filtering without decompressing all records.
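The benefit of per-tag records can be sketched with a toy record index. Everything here is hypothetical (`RecordInfo`, `selectRecords`, and the idea that the index is a simple list of tag and event count), and treating an empty tag set as "read everything" is an assumption of this sketch, not documented HIPO behavior.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical index entry, one per record; not the HIPO file format.
struct RecordInfo {
    int tag;                 // tag shared by every event in the record
    std::size_t eventCount;  // number of events stored in the record
};

// Returns the indices of records whose tag is in `wanted`.
// An empty `wanted` set selects every record (assumption of this sketch).
inline std::vector<std::size_t> selectRecords(const std::vector<RecordInfo>& index,
                                              const std::set<int>& wanted) {
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < index.size(); ++i) {
        if (wanted.empty() || wanted.count(index[i].tag) > 0) selected.push_back(i);
    }
    return selected;
}
```

Because the decision is made from the index alone, records with other tags never have to be decompressed in this model.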