File Format
File Structure Overview
+----------------------------------+
| FILE HEADER (72 bytes) | Magic: 0x4F504948 ("HIPO")
| - uniqueid, filenumber |
| - recordCount, version |
| - trailerPosition |
+----------------------------------+
| DICTIONARY RECORD | Schema definitions
| (first record in file) |
+----------------------------------+
| DATA RECORD 0 |
| +----------------------------+ |
| | Record Header (56 bytes) | | Magic: 0xc0da0100
| | - eventCount, compression | |
| +----------------------------+ |
| | Event Index Array | | 4 bytes per event
| +----------------------------+ |
| | Compressed Event Data | | LZ4 compressed
| +----------------------------+ |
+----------------------------------+
| DATA RECORD 1 ... N |
+----------------------------------+
| FILE TRAILER (optional) | Record offset index
+----------------------------------+
File Header (72 bytes)
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 4 | uniqueid | File format magic number |
| 4 | 4 | filenumber | Split file number |
| 8 | 4 | headerLength | Header length in 32-bit words (always 14) |
| 12 | 4 | recordCount | Total number of records |
| 16 | 4 | indexArrayLength | Index array length in bytes |
| 20 | 4 | bitInfoVersion | Packed: bits[0:7]=version, bits[8:31]=flags |
| 24 | 4 | userHeaderLength | Dictionary length in bytes |
| 28 | 4 | magicNumber | Endianness marker: 0xc0da0100 |
| 32 | 8 | userRegister | Reserved |
| 40 | 8 | trailerPosition | File offset to trailer (0 = no trailer) |
| 48 | 4 | userIntegerOne | User-defined |
| 52 | 4 | userIntegerTwo | User-defined |
Format Identification
| Value | Hex | Format |
|---|---|---|
| "HIPO" | 0x4F504948 | HIPO format |
| "CLAS" | 0x43455248 | HIPO (alternate) |
| "EVIO" | 0x4556494F | EVIO-compatible |
bitInfoVersion Flags
| Bits | Field |
|---|---|
| 0-7 | Version number |
| 8 | Dictionary included |
| 9 | Has "first event" |
| 10 | Trailer with index exists |
| 20-21 | Padding 1 |
| 22-23 | Padding 2 |
| 28-31 | Header type: 1=EVIO, 5=HIPO |
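The bit layout above can be unpacked with a few shifts and masks. The following is a minimal sketch, not the library's own code: the `BitInfo` struct, its member names, and `decodeBitInfo()` are hypothetical names introduced for illustration, and only the bit positions come from the table.

```cpp
#include <cstdint>

// Hypothetical holder for the decoded fields; names are illustrative only.
struct BitInfo {
    std::uint32_t version;      // bits 0-7
    bool hasDictionary;         // bit 8
    bool hasFirstEvent;         // bit 9
    bool hasTrailerIndex;       // bit 10
    std::uint32_t padding1;     // bits 20-21
    std::uint32_t padding2;     // bits 22-23
    std::uint32_t headerType;   // bits 28-31 (1 = EVIO, 5 = HIPO)
};

// Unpack the 32-bit bitInfoVersion word according to the table above.
inline BitInfo decodeBitInfo(std::uint32_t word) {
    BitInfo info;
    info.version         = word & 0xFFu;
    info.hasDictionary   = ((word >> 8) & 1u) != 0;
    info.hasFirstEvent   = ((word >> 9) & 1u) != 0;
    info.hasTrailerIndex = ((word >> 10) & 1u) != 0;
    info.padding1        = (word >> 20) & 0x3u;
    info.padding2        = (word >> 22) & 0x3u;
    info.headerType      = (word >> 28) & 0xFu;
    return info;
}
```

The word must already be in native byte order before it is decoded.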
Endianness Detection
The reader compares magicNumber (offset 28) against 0xc0da0100. If the field instead reads as the byte-swapped value 0x0001dac0, the file was written with the opposite byte order, and all fields are converted with __builtin_bswap32() / __builtin_bswap64().
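The check can be sketched as follows. This is an illustrative example only: `swap32()`, `ByteOrder`, and `detectByteOrder()` are hypothetical names, and a portable shift-based swap stands in for `__builtin_bswap32()` so that the snippet compiles with any compiler.

```cpp
#include <cstdint>

constexpr std::uint32_t kMagic = 0xc0da0100u;

// Portable 32-bit byte swap (stand-in for __builtin_bswap32).
inline std::uint32_t swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

enum class ByteOrder { Native, Swapped, Invalid };

// Classify the value read from offset 28 of the header.
inline ByteOrder detectByteOrder(std::uint32_t magicField) {
    if (magicField == kMagic) return ByteOrder::Native;
    if (swap32(magicField) == kMagic) return ByteOrder::Swapped;
    return ByteOrder::Invalid;  // neither form matches: not a valid header
}
```

When the result is `ByteOrder::Swapped`, every multi-byte header field needs the same swap before it is interpreted.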
Record Header (56 bytes)
| Offset | Word | Field | Description |
|---|---|---|---|
| 0 | [0] | recordLength | Total record length in 32-bit words |
| 4 | [1] | (reserved) | |
| 8 | [2] | headerLength | Header length in words (always 14) |
| 12 | [3] | numberOfEvents | Events in this record |
| 16 | [4] | indexDataLength | Index array length in bytes |
| 20 | [5] | bitInfo | Packed version + padding info |
| 24 | [6] | userHeaderLength | Optional user header length |
| 28 | [7] | signatureString | Endianness marker: 0xc0da0100 |
| 32 | [8] | recordDataLength | Uncompressed data length |
| 36 | [9] | compressedWord | Packed compression info |
| 40 | [10-11] | userWordOne | User-defined 64-bit value |
| 48 | [12-13] | userWordTwo | User-defined 64-bit value |
compressedWord (offset 36)
| Bits | Field |
|---|---|
| 0-27 | Compressed data length in 32-bit words |
| 28-31 | Compression type: 0=none, 1=LZ4 |
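As a rough illustration of the packing, the sketch below splits compressedWord into its two fields. `CompressionInfo`, `decodeCompressedWord()`, and `compressedBytes()` are hypothetical names used only for this example.

```cpp
#include <cstdint>

// Hypothetical holder for the two packed fields.
struct CompressionInfo {
    std::uint32_t compressedWords;  // bits 0-27: length in 32-bit words
    std::uint32_t compressionType;  // bits 28-31: 0 = none, 1 = LZ4
};

inline CompressionInfo decodeCompressedWord(std::uint32_t word) {
    return { word & 0x0FFFFFFFu, (word >> 28) & 0xFu };
}

// The length is stored in 32-bit words, so multiply by 4 for bytes.
inline std::uint64_t compressedBytes(const CompressionInfo& c) {
    return static_cast<std::uint64_t>(c.compressedWords) * 4u;
}
```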
Decompressed Record Layout
After LZ4 decompression, the record buffer contains:
+---------------------------------------------+
| Event Index Array | 4 bytes x numberOfEvents
| (cumulative byte offsets) |
+---------------------------------------------+
| User Header + Padding |
+---------------------------------------------+
| Event Data |
| +----------++----------+ +----------+ |
| | Event 0 || Event 1 | ... | Event N | |
| +----------++----------+ +----------+ |
+---------------------------------------------+
Index Array Conversion
On disk, the index array stores event sizes. During readRecord(), these are converted in-place to cumulative byte positions for O(1) event extraction.
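The conversion amounts to an in-place prefix sum. The sketch below is an assumption-laden illustration, not the code of readRecord(): `sizesToCumulative()` is a hypothetical name, and it assumes that after conversion entry i holds the end offset of event i, measured from the start of the event data. The library may use a different convention, such as start offsets.

```cpp
#include <cstdint>
#include <vector>

// Replace each per-event size with the running total of sizes so far.
// Assumption: entry i ends up holding the END offset of event i.
inline void sizesToCumulative(std::vector<std::uint32_t>& index) {
    std::uint32_t running = 0;
    for (auto& entry : index) {
        running += entry;
        entry = running;
    }
}
```

Under that assumption, event i occupies bytes from `index[i-1]` (or 0 for the first event) up to but not including `index[i]`, so both bounds are available with two array reads.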
Event Buffer (16-byte header)
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 4 | signature | ASCII "EVNT" (0x45564E54) |
| 4 | 4 | eventSize | Total event size including header |
| 8 | 4 | tag | Event tag (for filtering) |
| 12 | 4 | reserved | Always 0 |
| 16 | N | structures | Concatenated bank entries |
Structure Entry (8-byte header)
Each bank inside an event:
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 2 | group | Bank group ID |
| 2 | 1 | item | Bank item ID |
| 3 | 1 | type | Data type |
| 4 | 4 | sizeWord | bits[0:23]=totalSize, bits[24:31]=headerLength |
| 8 | N | data | Column-major data payload |
Structure Scanning
To find a bank within an event, the library performs a linear scan from byte 16:
Position 16: [group|item|type|size][data...]
match? -> return position
no match -> advance by (8 + size)
Position 16+8+size: [group|item|type|size][data...]
...
Since events typically contain 5-20 banks, a lookup inspects at most a couple dozen 8-byte headers, so the scan stays cheap.
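A minimal sketch of this scan is shown below. `findStructure()` is a hypothetical name, not the library's API. The sketch assumes the buffer is already in native byte order and that the low 24 bits of sizeWord give the number of payload bytes following the 8-byte header, which is how the "advance by (8 + size)" step above reads.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Return the byte position of the first structure header matching
// (group, item), or -1 if the event holds no such structure.
inline std::ptrdiff_t findStructure(const std::uint8_t* event,
                                    std::size_t eventSize,
                                    std::uint16_t group,
                                    std::uint8_t item) {
    std::size_t pos = 16;  // structures start right after the event header
    while (pos + 8 <= eventSize) {
        std::uint16_t g;
        std::uint32_t sizeWord;
        std::memcpy(&g, event + pos, sizeof g);                 // offset 0: group
        const std::uint8_t it = event[pos + 2];                 // offset 2: item
        std::memcpy(&sizeWord, event + pos + 4, sizeof sizeWord);  // offset 4
        if (g == group && it == item) {
            return static_cast<std::ptrdiff_t>(pos);
        }
        const std::size_t size = sizeWord & 0x00FFFFFFu;  // bits 0-23
        pos += 8 + size;  // skip this header and its payload
    }
    return -1;
}
```

The bounds check on `pos + 8` keeps the loop from reading a partial header when the buffer is truncated.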
Bank Data Layout (Columnar)
Data is stored column-major (Structure of Arrays). For schema pid/S,px/F,py/F,pz/F with 3 rows:
Offset 0: pid[0](2B) pid[1](2B) pid[2](2B) <- column 0: 3 x int16_t
Offset 6: px[0] (4B) px[1] (4B) px[2] (4B) <- column 1: 3 x float
Offset 18: py[0] (4B) py[1] (4B) py[2] (4B) <- column 2: 3 x float
Offset 30: pz[0] (4B) pz[1] (4B) pz[2] (4B) <- column 3: 3 x float
Total: 42 bytes (= rowLength(14) x nrows(3))
The byte offset for column c, row r with N total rows:
offset = columnOffset[c] * N + r * typeSize[c]
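The formula can be checked against the pid/S,px/F,py/F,pz/F example with a short sketch. `Column` and `elementOffset()` are hypothetical names. Here `rowOffset` plays the role of columnOffset[c], that is, the column's byte offset within a single row (0, 2, 6, 10 for this schema).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-column description derived from the schema string.
struct Column {
    std::size_t rowOffset;  // byte offset of the column within one row
    std::size_t typeSize;   // element size in bytes
};

// offset = columnOffset[c] * N + r * typeSize[c]
inline std::size_t elementOffset(const std::vector<Column>& schema,
                                 std::size_t c, std::size_t r,
                                 std::size_t nrows) {
    return schema[c].rowOffset * nrows + r * schema[c].typeSize;
}
```

For three rows, column 1 (px) starts at 2 x 3 = 6 and column 3 (pz) at 10 x 3 = 30, matching the layout above.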
Dictionary Record
The first record contains schema definitions:
Dictionary Record
+-- Event 0: Structure(group=120, item=2) -> schema string for bank 0
+-- Event 1: Structure(group=120, item=2) -> schema string for bank 1
+-- ...
+-- Event N+1: Structure(group=32555, item=1) -> user config key
| Structure(group=32555, item=2) -> user config value
File Trailer
The trailer enables random event access via reader.gotoEvent(). It contains a bank (group=32111, item=1) with one row per data record:
| Column | Type | Description |
|---|---|---|
| position | Long | Absolute file offset of the record |
| length | Int | Record length in bytes |
| entries | Int | Number of events in the record |
| userWordOne | Long | User word from record header |
| userWordTwo | Long | User word from record header |
The readerIndex uses binary search on cumulative event counts for O(log R) random access, where R is the number of records.
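The lookup can be sketched with `std::upper_bound` over the running totals of the `entries` column. This is an illustration of the technique rather than the readerIndex implementation: `EventLocation` and `locateEvent()` are hypothetical names, and the sketch assumes `cumulative[i]` holds the number of events in records 0 through i.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: which record holds the event, and where.
struct EventLocation {
    std::size_t record;           // index of the record containing the event
    std::uint64_t indexInRecord;  // event position inside that record
    bool valid;                   // false if eventNumber is out of range
};

// Assumption: cumulative[i] = total number of events in records 0..i.
inline EventLocation locateEvent(const std::vector<std::uint64_t>& cumulative,
                                 std::uint64_t eventNumber) {
    // First record whose cumulative count exceeds the zero-based event number.
    auto it = std::upper_bound(cumulative.begin(), cumulative.end(), eventNumber);
    if (it == cumulative.end()) {
        return {0, 0, false};
    }
    const std::size_t rec = static_cast<std::size_t>(it - cumulative.begin());
    const std::uint64_t before = (rec == 0) ? 0 : cumulative[rec - 1];
    return {rec, eventNumber - before, true};
}
```

With records of 100, 150, and 50 events, the cumulative array is {100, 250, 300}, and event 100 resolves to the first event of record 1.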