Data Model
Core Concepts
HIPO organizes physics data in a hierarchy:
File
+-- Record 0 (compressed block of events)
|   +-- Event 0
|   |   +-- Bank "REC::Particle" (group=300, item=1)
|   |   +-- Bank "REC::Event" (group=301, item=1)
|   |   +-- ...
|   +-- Event 1
|   +-- ...
+-- Record 1
+-- ...
Key Terms
- Record: A compressed block containing multiple events. The fundamental I/O unit.
- Event: A single physics event (particle collision). Contains multiple banks.
- Bank: A typed, columnar table within an event. Defined by a schema.
- Schema: Column names, types, and layout for a bank.
- Dictionary: Collection of all schemas in a file.
Banks as Tables
A bank is conceptually a table where each column has a fixed type and each row represents one entry (e.g., one particle):
| pid (Short) | px (Float) | py (Float) | pz (Float) |
|---|---|---|---|
| 11 | 0.52 | -0.31 | 2.10 |
| -211 | -0.88 | 0.15 | 1.43 |
| 2212 | 0.01 | 0.72 | 3.55 |
The schema defines the column layout:
hipo::schema s("REC::Particle", 300, 1);
s.parse("pid/S,px/F,py/F,pz/F");
Each schema has a unique (group, item) pair that identifies its bank within an event.
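To make the format string concrete, here is a minimal sketch of how a string such as "pid/S,px/F,py/F,pz/F" could be split into named, typed columns. It is not the library's parser: Column, typeSize, and parseFormat are hypothetical names introduced for illustration, and the byte sizes assigned to the type codes are an assumption of this sketch.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical column descriptor; not part of the HIPO API.
struct Column {
    std::string name;
    char type;         // one-letter type code, e.g. 'S' or 'F'
    std::size_t size;  // bytes per value
};

// Bytes per value for each type code (mapping assumed for this sketch).
std::size_t typeSize(char code) {
    switch (code) {
        case 'B': return 1;  // byte
        case 'S': return 2;  // short
        case 'I': return 4;  // int
        case 'F': return 4;  // float
        case 'D': return 8;  // double
        case 'L': return 8;  // long
        default: throw std::invalid_argument("unknown type code");
    }
}

// Split "pid/S,px/F" into one Column per comma-separated "name/T" entry.
std::vector<Column> parseFormat(const std::string& format) {
    std::vector<Column> columns;
    std::istringstream stream(format);
    std::string entry;
    while (std::getline(stream, entry, ',')) {
        const std::size_t slash = entry.find('/');
        // Require a non-empty name followed by exactly one type character.
        if (slash == std::string::npos || slash == 0 ||
            slash + 2 != entry.size()) {
            throw std::invalid_argument("expected name/T, got: " + entry);
        }
        const char code = entry[slash + 1];
        columns.push_back({entry.substr(0, slash), code, typeSize(code)});
    }
    return columns;
}
```

Calling parseFormat("pid/S,px/F,py/F,pz/F") yields four columns, the first a 2-byte pid and the remaining three 4-byte floats, which matches the table header shown earlier.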
Columnar Storage
Banks store data column-major (Structure of Arrays), not row-major. For the table above:
Memory layout: [pid0, pid1, pid2, px0, px1, px2, py0, py1, py2, pz0, pz1, pz2]
This layout is efficient when an analysis reads only a few columns: each column's values are contiguous, so the CPU cache is filled with data that is actually used rather than with the unused columns of every row.
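The difference between the two layouts can be worked out as byte offsets. The sketch below assumes that columns are packed back to back with no padding and that a Short takes 2 bytes and a Float 4 bytes; both are assumptions of this example, and columnMajorOffset and rowMajorOffset are hypothetical helpers, not HIPO functions.

```cpp
#include <cstddef>
#include <vector>

// Byte offset of element (column, row) in a column-major buffer.
// columnSizes[i] is the bytes-per-value of column i; rows is the row count.
std::size_t columnMajorOffset(const std::vector<std::size_t>& columnSizes,
                              std::size_t column, std::size_t row,
                              std::size_t rows) {
    std::size_t offset = 0;
    for (std::size_t i = 0; i < column; ++i) {
        offset += columnSizes[i] * rows;  // skip every earlier column in full
    }
    return offset + row * columnSizes[column];
}

// Offset of the same element in a row-major (Array of Structures) buffer.
std::size_t rowMajorOffset(const std::vector<std::size_t>& columnSizes,
                           std::size_t column, std::size_t row) {
    std::size_t rowBytes = 0;  // total bytes in one row
    std::size_t within = 0;    // bytes preceding this column inside a row
    for (std::size_t i = 0; i < columnSizes.size(); ++i) {
        if (i < column) within += columnSizes[i];
        rowBytes += columnSizes[i];
    }
    return row * rowBytes + within;
}
```

With column sizes {2, 4, 4, 4} and three rows, the px values sit at byte offsets 6, 10, and 14 in the column-major buffer, a stride of 4 bytes. In a row-major buffer they would sit at 2, 16, and 30, a stride of 14 bytes, so reading px alone would pull the other three columns through the cache as well.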
Buffer Reuse
Bank objects are designed to be reused across events:
hipo::bank particles(dict.getSchema("REC::Particle"));
while (reader.next()) {
    reader.read(event);
    event.read(particles); // overwrites particles' buffer in-place
    // use particles...
}
Warning
Do not hold references or pointers to bank data across events. Each event.read(bank) call overwrites the bank's internal buffer, so copy any values you need before reading the next event.
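The hazard can be reproduced without the library. In this sketch, ToyBank is a hypothetical stand-in for a reusable bank and fill() plays the role of a per-event read; neither name comes from HIPO. The point is only that a view into the buffer changes under you, while a copy does not.

```cpp
#include <vector>

// Hypothetical stand-in for a reusable bank: one buffer, refilled per event.
struct ToyBank {
    std::vector<float> px;

    // Overwrite the buffer in place, as a per-event read would.
    void fill(const std::vector<float>& values) {
        px.assign(values.begin(), values.end());
    }
};

// Copy the column out so the values outlive the next fill().
std::vector<float> snapshotPx(const ToyBank& bank) {
    return bank.px;
}
```

If a caller keeps a reference to bank.px, fills the bank with a second event, and then reads through the reference, it sees the second event's values. A vector returned by snapshotPx before the second fill still holds the first event's values.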
Schema Identification
Every bank type is identified by a (group, item) pair:
- group: A 16-bit integer identifying the bank category (e.g., 300 for REC:: banks)
- item: An 8-bit integer identifying the specific bank within that group
When event.read(bank) is called, it searches the event buffer for a structure matching the bank's (group, item).
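The lookup can be pictured as a scan over the structures in an event. The sketch below is a toy model only: ToyStructure and findStructure are hypothetical, the event is modelled as a vector rather than a raw byte buffer, and the linear scan is an assumption about the search strategy, not a description of the library's internals.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical in-memory view of one structure inside an event.
struct ToyStructure {
    std::uint16_t group;                // 16-bit bank category
    std::uint8_t item;                  // 8-bit bank id within the group
    std::vector<std::uint8_t> payload;  // raw bank bytes
};

// Scan the event for the first structure matching (group, item).
// Returns nullptr when the event does not contain that bank.
const ToyStructure* findStructure(const std::vector<ToyStructure>& event,
                                  std::uint16_t group, std::uint8_t item) {
    for (const ToyStructure& s : event) {
        if (s.group == group && s.item == item) {
            return &s;
        }
    }
    return nullptr;
}
```

A null result corresponds to an event that simply does not contain the requested bank, which is a normal case rather than an error when banks are optional.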
Self-Describing Files
Each HIPO file embeds its complete schema dictionary in the first record. This means:
- Files are self-contained -- no external schema files needed
- The reader can discover all available banks by reading the dictionary
- Schema evolution is handled gracefully -- old readers can skip unknown banks
hipo::dictionary dict;
reader.readDictionary(dict);
// List all available banks
for (const auto& name : dict.getSchemaList()) {
    printf("Bank: %s\n", name.c_str());
}
Event Tags
Events can be tagged with an integer value for filtering:
// Writer: events with different tags go to separate records
event.setTag(1);
// Reader: only read events with specific tags
reader.setTags({1, 2});
Tagged events are stored in separate records within the file, enabling efficient tag-based filtering without decompressing all records.
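The benefit of per-record tags can be illustrated with a toy model. ToyRecord and countSelectedEvents below are hypothetical and do not reflect the file format; the sketch assumes each record carries a single tag that can be checked before its contents are decompressed.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical record summary: the tag shared by its events and their count.
struct ToyRecord {
    long tag;
    std::size_t eventCount;
};

// Count the events a tag-filtered reader would deliver. Records whose tag is
// not selected are skipped whole; recordsOpened counts the records that would
// actually need to be decompressed.
std::size_t countSelectedEvents(const std::vector<ToyRecord>& records,
                                const std::set<long>& tags,
                                std::size_t& recordsOpened) {
    std::size_t events = 0;
    recordsOpened = 0;
    for (const ToyRecord& record : records) {
        if (tags.count(record.tag) == 0) {
            continue;  // tag not selected: skip without decompressing
        }
        ++recordsOpened;
        events += record.eventCount;
    }
    return events;
}
```

For records tagged 0, 1, 2, 0 holding 100, 10, 5, and 50 events, selecting tags {1, 2} delivers 15 events while opening only two of the four records.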