File Format

File Structure Overview

+----------------------------------+
|  FILE HEADER (72 bytes)          |  Magic: 0x4F504948 ("HIPO")
|  - uniqueid, filenumber          |
|  - recordCount, version          |
|  - trailerPosition               |
+----------------------------------+
|  DICTIONARY RECORD               |  Schema definitions
|  (first record in file)          |
+----------------------------------+
|  DATA RECORD 0                   |
|  +----------------------------+  |
|  | Record Header (56 bytes)   |  |  Magic: 0xc0da0100
|  | - eventCount, compression  |  |
|  +----------------------------+  |
|  | Event Index Array          |  |  4 bytes per event
|  +----------------------------+  |
|  | Compressed Event Data      |  |  LZ4 compressed
|  +----------------------------+  |
+----------------------------------+
|  DATA RECORD 1 ... N             |
+----------------------------------+
|  FILE TRAILER (optional)         |  Record offset index
+----------------------------------+

File Header (72 bytes)

Offset  Size  Field             Description
0       4     uniqueid          File format magic number
4       4     filenumber        Split file number
8       4     headerLength      Header length in 32-bit words (always 14)
12      4     recordCount       Total number of records
16      4     indexArrayLength  Index array length in bytes
20      4     bitInfoVersion    Packed: bits[0:7]=version, bits[8:31]=flags
24      4     userHeaderLength  Dictionary length in bytes
28      4     magicNumber       Endianness marker: 0xc0da0100
32      8     userRegister      Reserved
40      8     trailerPosition   File offset to trailer (0 = no trailer)
48      4     userIntegerOne    User-defined
52      4     userIntegerTwo    User-defined
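
The table above can be read field by field from a raw byte buffer. The following is a minimal sketch, not the library's own code: the names FileHeader, readAt and parseFileHeader are hypothetical, the struct covers only the fields listed in the table (offsets 0 through 55), and the buffer is assumed to already be in the host's byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the file header table; field names follow the table.
struct FileHeader {
    std::uint32_t uniqueid;
    std::uint32_t filenumber;
    std::uint32_t headerLength;
    std::uint32_t recordCount;
    std::uint32_t indexArrayLength;
    std::uint32_t bitInfoVersion;
    std::uint32_t userHeaderLength;
    std::uint32_t magicNumber;
    std::uint64_t userRegister;
    std::uint64_t trailerPosition;
    std::uint32_t userIntegerOne;
    std::uint32_t userIntegerTwo;
};

// Copy a value of type T out of the buffer at a byte offset.
// memcpy avoids unaligned or type-punned pointer reads.
template <typename T>
T readAt(const unsigned char* buf, std::size_t offset) {
    T value;
    std::memcpy(&value, buf + offset, sizeof(T));
    return value;
}

// Assumes buf holds at least 56 bytes in host byte order.
FileHeader parseFileHeader(const unsigned char* buf) {
    FileHeader h;
    h.uniqueid         = readAt<std::uint32_t>(buf, 0);
    h.filenumber       = readAt<std::uint32_t>(buf, 4);
    h.headerLength     = readAt<std::uint32_t>(buf, 8);
    h.recordCount      = readAt<std::uint32_t>(buf, 12);
    h.indexArrayLength = readAt<std::uint32_t>(buf, 16);
    h.bitInfoVersion   = readAt<std::uint32_t>(buf, 20);
    h.userHeaderLength = readAt<std::uint32_t>(buf, 24);
    h.magicNumber      = readAt<std::uint32_t>(buf, 28);
    h.userRegister     = readAt<std::uint64_t>(buf, 32);
    h.trailerPosition  = readAt<std::uint64_t>(buf, 40);
    h.userIntegerOne   = readAt<std::uint32_t>(buf, 48);
    h.userIntegerTwo   = readAt<std::uint32_t>(buf, 52);
    return h;
}
```

Reading each field at its explicit offset keeps the sketch independent of compiler struct padding; a trailerPosition of 0 then signals that the file has no trailer.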

Format Identification

Value   Hex         Format
"HIPO"  0x4F504948  HIPO format
"HREC"  0x43455248  HIPO (alternate)
"EVIO"  0x4556494F  EVIO-compatible

bitInfoVersion Flags

Bits   Field
0-7    Version number
8      Dictionary included
9      Has "first event"
10     Trailer with index exists
20-21  Padding 1
22-23  Padding 2
28-31  Header type: 1=EVIO, 5=HIPO
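
The bit layout above can be unpacked with shifts and masks. This is an illustrative sketch only: BitInfo and decodeBitInfo are hypothetical names, and the field widths are taken directly from the table.

```cpp
#include <cstdint>

// Hypothetical container for the unpacked bitInfoVersion word.
struct BitInfo {
    unsigned version;       // bits 0-7
    bool hasDictionary;     // bit 8
    bool hasFirstEvent;     // bit 9
    bool hasTrailerIndex;   // bit 10
    unsigned padding1;      // bits 20-21
    unsigned padding2;      // bits 22-23
    unsigned headerType;    // bits 28-31 (1 = EVIO, 5 = HIPO)
};

BitInfo decodeBitInfo(std::uint32_t word) {
    BitInfo b;
    b.version         = word & 0xFFu;
    b.hasDictionary   = ((word >> 8) & 1u) != 0;
    b.hasFirstEvent   = ((word >> 9) & 1u) != 0;
    b.hasTrailerIndex = ((word >> 10) & 1u) != 0;
    b.padding1        = (word >> 20) & 0x3u;
    b.padding2        = (word >> 22) & 0x3u;
    b.headerType      = (word >> 28) & 0xFu;
    return b;
}
```

For example, a word built as (5u << 28) | (1u << 10) | (1u << 8) | 6u decodes to version 6, a HIPO header type, a dictionary, and a trailer index, with no "first event".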

Endianness Detection

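The reader compares magicNumber (offset 28) against 0xc0da0100. If the value instead reads as the byte-swapped 0x0001dac0, the file was written with the opposite byte order, and every 32-bit and 64-bit header field is converted with __builtin_bswap32() or __builtin_bswap64() respectively.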
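
The check can be sketched as follows. ByteOrder, detectByteOrder, fix32 and fix64 are hypothetical helper names introduced for illustration; __builtin_bswap32() and __builtin_bswap64() are GCC/Clang builtins, so the sketch assumes one of those compilers.

```cpp
#include <cstdint>

constexpr std::uint32_t kMagic        = 0xc0da0100u;
constexpr std::uint32_t kMagicSwapped = 0x0001dac0u;

enum class ByteOrder { Native, Swapped, Invalid };

// Classify the header by the value found in the magicNumber field.
ByteOrder detectByteOrder(std::uint32_t magicNumber) {
    if (magicNumber == kMagic)        return ByteOrder::Native;
    if (magicNumber == kMagicSwapped) return ByteOrder::Swapped;
    return ByteOrder::Invalid;  // not a recognizable header
}

// Convert a 32-bit field to host order if the file is byte-swapped.
std::uint32_t fix32(std::uint32_t v, ByteOrder order) {
    return order == ByteOrder::Swapped ? __builtin_bswap32(v) : v;
}

// Same for 64-bit fields such as trailerPosition.
std::uint64_t fix64(std::uint64_t v, ByteOrder order) {
    return order == ByteOrder::Swapped ? __builtin_bswap64(v) : v;
}
```

Treating any other magic value as Invalid lets a caller reject files that are truncated or not in this format before interpreting further fields.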

Record Header (56 bytes)

Offset  Word     Field             Description
0       [0]      recordLength      Total record length in 32-bit words
4       [1]      (reserved)
8       [2]      headerLength      Header length in words (always 14)
12      [3]      numberOfEvents    Events in this record
16      [4]      indexDataLength   Index array length in bytes
20      [5]      bitInfo           Packed version + padding info
24      [6]      userHeaderLength  Optional user header length
28      [7]      signatureString   Endianness marker: 0xc0da0100
32      [8]      recordDataLength  Uncompressed data length
36      [9]      compressedWord    Packed compression info
40      [10-11]  userWordOne       User-defined 64-bit value
48      [12-13]  userWordTwo       User-defined 64-bit value

compressedWord (offset 36)

Bits   Field
0-27   Compressed data length in 32-bit words
28-31  Compression type: 0=none, 1=LZ4
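
Unpacking compressedWord follows the same shift-and-mask pattern. In this sketch, CompressionInfo, decodeCompressedWord and compressedBytes are hypothetical names; only the bit positions come from the table.

```cpp
#include <cstdint>

// Hypothetical container for the two packed fields.
struct CompressionInfo {
    std::uint32_t compressedWords;  // bits 0-27, length in 32-bit words
    std::uint32_t type;             // bits 28-31, 0 = none, 1 = LZ4
};

CompressionInfo decodeCompressedWord(std::uint32_t word) {
    return {word & 0x0FFFFFFFu, (word >> 28) & 0xFu};
}

// The length field counts 32-bit words, so the byte length is 4x larger.
std::uint64_t compressedBytes(const CompressionInfo& info) {
    return static_cast<std::uint64_t>(info.compressedWords) * 4u;
}
```

A word of (1u << 28) | 250u therefore describes an LZ4 payload of 250 words, or 1000 bytes.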

Decompressed Record Layout

After LZ4 decompression, the record buffer contains:

+---------------------------------------------+
|  Event Index Array                          |  4 bytes x numberOfEvents
|  (cumulative byte offsets)                  |
+---------------------------------------------+
|  User Header + Padding                      |
+---------------------------------------------+
|  Event Data                                 |
|  +----------++----------+     +----------+  |
|  | Event 0  || Event 1  | ... | Event N  |  |
|  +----------++----------+     +----------+  |
+---------------------------------------------+

Index Array Conversion

On disk, the index array stores one 4-byte size per event. During readRecord(), these sizes are converted in place to cumulative byte positions, so the byte range of any event can be looked up in O(1) without summing the preceding sizes.
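
The conversion amounts to a running sum over the array. The sketch below assumes an inclusive sum, so that entry i ends up holding the end position of event i and event i starts where event i-1 ends; the text above does not pin down this convention, and sizesToPositions and eventRange are hypothetical names. Positions are relative to the start of the event data region.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Replace each event size with the running total of sizes, in place.
// Assumption: inclusive sum, so index[i] becomes the END of event i.
void sizesToPositions(std::vector<std::uint32_t>& index) {
    std::uint32_t running = 0;
    for (auto& entry : index) {
        running += entry;
        entry = running;
    }
}

// O(1) lookup of the [begin, end) byte range of event i after conversion.
std::pair<std::uint32_t, std::uint32_t>
eventRange(const std::vector<std::uint32_t>& index, std::size_t i) {
    std::uint32_t begin = (i == 0) ? 0u : index[i - 1];
    return {begin, index[i]};
}
```

With sizes {100, 40, 60}, the array becomes {100, 140, 200}, and event 1 occupies bytes 100 up to (but not including) 140.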

Event Buffer (16-byte header)

Offset  Size  Field       Description
0       4     signature   ASCII "EVNT" (0x45564E54)
4       4     eventSize   Total event size including header
8       4     tag         Event tag (for filtering)
12      4     reserved    Always 0
16      N     structures  Concatenated bank entries

Structure Entry (8-byte header)

Each bank inside an event:

Offset  Size  Field     Description
0       2     group     Bank group ID
2       1     item      Bank item ID
3       1     type      Data type
4       4     sizeWord  bits[0:23]=totalSize, bits[24:31]=headerLength
8       N     data      Column-major data payload

Structure Scanning

To find a bank within an event, the library performs a linear scan from byte 16:

Position 16:        [group|item|type|size][data...]
                         match? -> return position
                         no match -> advance by (8 + size)
Position 16+8+size: [group|item|type|size][data...]
                         ...

Events typically contain 5-20 banks, so a lookup inspects at most about 20 eight-byte headers and the linear scan stays cheap.

Bank Data Layout (Columnar)

Data is stored column-major (Structure of Arrays). For schema pid/S,px/F,py/F,pz/F with 3 rows:

Offset 0:   pid[0](2B) pid[1](2B) pid[2](2B)    <- column 0: 3 x int16_t
Offset 6:   px[0] (4B) px[1] (4B) px[2] (4B)    <- column 1: 3 x float
Offset 18:  py[0] (4B) py[1] (4B) py[2] (4B)    <- column 2: 3 x float
Offset 30:  pz[0] (4B) pz[1] (4B) pz[2] (4B)    <- column 3: 3 x float
Total: 42 bytes (= rowLength(14) x nrows(3))

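The byte offset of row r in column c, for a bank with N rows, is:

offset = columnOffset[c] * N + r * typeSize[c]

Here columnOffset[c] is the byte offset of column c within a single row (0, 2, 6 and 10 in the example above) and typeSize[c] is the size in bytes of one element of that column.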
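
The formula can be checked against the pid/px/py/pz example with a short sketch. Column, buildColumns and elementOffset are hypothetical names; the per-row column offsets are derived here by accumulating the element sizes in schema order.

```cpp
#include <cstddef>
#include <vector>

// Per-column layout data: offset within one row, and element size.
struct Column {
    std::size_t rowOffset;  // columnOffset[c] in the formula
    std::size_t typeSize;   // typeSize[c] in the formula
};

// Derive per-row offsets by accumulating element sizes in schema order.
std::vector<Column> buildColumns(const std::vector<std::size_t>& typeSizes) {
    std::vector<Column> cols;
    std::size_t offset = 0;
    for (std::size_t size : typeSizes) {
        cols.push_back({offset, size});
        offset += size;
    }
    return cols;
}

// offset = columnOffset[c] * N + r * typeSize[c]
std::size_t elementOffset(const std::vector<Column>& cols,
                          std::size_t c, std::size_t r, std::size_t nrows) {
    return cols[c].rowOffset * nrows + r * cols[c].typeSize;
}
```

For the schema above, buildColumns({2, 4, 4, 4}) yields row offsets 0, 2, 6 and 10; with 3 rows, px[0] lands at byte 6 and pz[2] at byte 38, matching the layout shown earlier.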

Dictionary Record

The first record contains schema definitions:

Dictionary Record
  +-- Event 0: Structure(group=120, item=2) -> schema string for bank 0
  +-- Event 1: Structure(group=120, item=2) -> schema string for bank 1
  +-- ...
  +-- Event N+1: Structure(group=32555, item=1) -> user config key
  |              Structure(group=32555, item=2) -> user config value

File Trailer

The trailer enables random event access via reader.gotoEvent(). It contains a bank (group=32111, item=1) with one row per data record:

Column       Type  Description
position     Long  Absolute file offset of the record
length       Int   Record length in bytes
entries      Int   Number of events in the record
userWordOne  Long  User word from record header
userWordTwo  Long  User word from record header

The readerIndex uses binary search on cumulative event counts for O(log R) random access, where R is the number of records.
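
The lookup can be sketched with std::upper_bound over the running totals of the entries column. This is an illustration of the technique rather than the readerIndex implementation: Location and locateEvent are hypothetical names, and cumulative[i] is assumed to hold the total number of events in records 0 through i.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Where a global event number lives: which record, and which slot in it.
struct Location {
    std::size_t record;
    std::size_t eventInRecord;
};

// cumulative[i] = total events in records 0..i (assumed precomputed).
// Returns false if the event number is past the end of the file.
bool locateEvent(const std::vector<std::size_t>& cumulative,
                 std::size_t event, Location& out) {
    // First record whose running total exceeds the event number.
    auto it = std::upper_bound(cumulative.begin(), cumulative.end(), event);
    if (it == cumulative.end()) {
        return false;
    }
    std::size_t rec = static_cast<std::size_t>(it - cumulative.begin());
    std::size_t before = (rec == 0) ? 0 : cumulative[rec - 1];
    out = {rec, event - before};
    return true;
}
```

With records holding 100, 50 and 75 events, the totals are {100, 150, 225}; event 100 resolves to record 1, slot 0, and event 225 is out of range.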