File Format

File Structure Overview

+----------------------------------+
|  FILE HEADER (72 bytes)          |  Magic: 0x4F504948 ("HIPO")
|  - uniqueid, filenumber          |
|  - recordCount, version          |
|  - trailerPosition               |
+----------------------------------+
|  DICTIONARY RECORD               |  Schema definitions
|  (first record in file)          |
+----------------------------------+
|  DATA RECORD 0                   |
|  +----------------------------+  |
|  | Record Header (56 bytes)   |  |  Magic: 0xc0da0100
|  | - eventCount, compression  |  |
|  +----------------------------+  |
|  | Event Index Array          |  |  4 bytes per event
|  +----------------------------+  |
|  | Compressed Event Data      |  |  LZ4 compressed
|  +----------------------------+  |
+----------------------------------+
|  DATA RECORD 1 ... N             |
+----------------------------------+
|  FILE TRAILER (optional)         |  Record offset index
+----------------------------------+

File Header (72 bytes)

Offset	Size	Field	Description
0	4	`uniqueid`	File format magic number
4	4	`filenumber`	Split file number
8	4	`headerLength`	Header length in 32-bit words (always 14)
12	4	`recordCount`	Total number of records
16	4	`indexArrayLength`	Index array length in bytes
20	4	`bitInfoVersion`	Packed: bits[0:7]=version, bits[8:31]=flags
24	4	`userHeaderLength`	Dictionary length in bytes
28	4	`magicNumber`	Endianness marker: `0xc0da0100`
32	8	`userRegister`	Reserved
40	8	`trailerPosition`	File offset to trailer (0 = no trailer)
48	4	`userIntegerOne`	User-defined
52	4	`userIntegerTwo`	User-defined
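
The fields in the table can be mirrored by a plain struct. The sketch below is illustrative only: the names `FileHeader`, `readFileHeader`, and `hasTrailer` are assumptions rather than the library's API, it covers just the 56 bytes that the listed fields occupy (offsets 0 through 55), and it assumes the bytes are already in host byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the listed file header fields (offsets 0-55).
struct FileHeader {
    std::uint32_t uniqueid;          // offset 0
    std::uint32_t filenumber;        // offset 4
    std::uint32_t headerLength;      // offset 8, in 32-bit words
    std::uint32_t recordCount;       // offset 12
    std::uint32_t indexArrayLength;  // offset 16, in bytes
    std::uint32_t bitInfoVersion;    // offset 20
    std::uint32_t userHeaderLength;  // offset 24, in bytes
    std::uint32_t magicNumber;       // offset 28
    std::uint64_t userRegister;      // offset 32
    std::uint64_t trailerPosition;   // offset 40, 0 = no trailer
    std::uint32_t userIntegerOne;    // offset 48
    std::uint32_t userIntegerTwo;    // offset 52
};

// Compile-time checks that the struct layout matches the table.
static_assert(offsetof(FileHeader, magicNumber) == 28, "magicNumber offset");
static_assert(offsetof(FileHeader, trailerPosition) == 40, "trailerPosition offset");
static_assert(sizeof(FileHeader) == 56, "listed fields span 56 bytes");

// Copy the listed fields out of a raw buffer holding at least 56 bytes.
// memcpy avoids alignment and strict-aliasing problems.
FileHeader readFileHeader(const unsigned char* bytes) {
    FileHeader h;
    std::memcpy(&h, bytes, sizeof(FileHeader));
    return h;
}

// True when the header records a trailer position.
bool hasTrailer(const FileHeader& h) { return h.trailerPosition != 0; }
```

The `static_assert` lines make the compiler confirm that the struct layout agrees with the offsets in the table, so a mistyped field width is caught at build time.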

Format Identification

Value	Hex	Format
`"HIPO"`	`0x4F504948`	HIPO format
`"HREC"`	`0x43455248`	HIPO (alternate)
`"EVIO"`	`0x4556494F`	EVIO-compatible
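
To see how a hex identifier maps to its ASCII tag, the following sketch decodes a 32-bit value least significant byte first, which is the order in which the bytes appear in a little-endian file. The helper names `magicToString` and `isKnownFormat` are illustrative assumptions, not library functions.

```cpp
#include <cstdint>
#include <string>

// Decode a 32-bit identifier into four ASCII characters,
// least significant byte first.
std::string magicToString(std::uint32_t magic) {
    std::string s(4, ' ');
    for (int i = 0; i < 4; ++i) {
        s[i] = static_cast<char>((magic >> (8 * i)) & 0xFFu);
    }
    return s;
}

// Hypothetical helper: true if the identifier is one of the three listed values.
bool isKnownFormat(std::uint32_t uniqueid) {
    return uniqueid == 0x4F504948u ||
           uniqueid == 0x43455248u ||
           uniqueid == 0x4556494Fu;
}
```

Decoded this way, `0x4F504948` spells "HIPO" and `0x43455248` spells "HREC". `0x4556494F` spells "EVIO" only when read most significant byte first, so the same helper returns "OIVE" for it.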

bitInfoVersion Flags

Bits	Field
0-7	Version number
8	Dictionary included
9	Has "first event"
10	Trailer with index exists
20-21	Padding 1
22-23	Padding 2
28-31	Header type: 1=EVIO, 5=HIPO
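
The bit layout above can be unpacked with shifts and masks. This is a minimal sketch; the struct `BitInfo` and the function `unpackBitInfo` are hypothetical names introduced for illustration, and the word is assumed to be in host byte order already.

```cpp
#include <cstdint>

// Fields unpacked from the bitInfoVersion word, following the bit table.
struct BitInfo {
    std::uint32_t version;     // bits 0-7
    bool hasDictionary;        // bit 8
    bool hasFirstEvent;        // bit 9
    bool hasTrailerIndex;      // bit 10
    std::uint32_t padding1;    // bits 20-21
    std::uint32_t padding2;    // bits 22-23
    std::uint32_t headerType;  // bits 28-31 (1 = EVIO, 5 = HIPO)
};

BitInfo unpackBitInfo(std::uint32_t word) {
    BitInfo b{};
    b.version         = word & 0xFFu;
    b.hasDictionary   = ((word >> 8) & 1u) != 0;
    b.hasFirstEvent   = ((word >> 9) & 1u) != 0;
    b.hasTrailerIndex = ((word >> 10) & 1u) != 0;
    b.padding1        = (word >> 20) & 0x3u;
    b.padding2        = (word >> 22) & 0x3u;
    b.headerType      = (word >> 28) & 0xFu;
    return b;
}
```

For example, a word built as `(5u << 28) | (1u << 10) | (1u << 8) | 6u` unpacks to version 6 with the dictionary and trailer flags set and header type 5.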

Endianness Detection

The reader compares magicNumber at offset 28 against 0xc0da0100. If the value instead reads as 0x0001dac0, the file was written with the opposite byte order, and every 32-bit and 64-bit header field is converted with __builtin_bswap32() / __builtin_bswap64().
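
A minimal sketch of that check follows. The enum `ByteOrder` and the helpers `detectByteOrder`, `fix32`, and `fix64` are hypothetical names; `__builtin_bswap32` and `__builtin_bswap64` are GCC/Clang builtins, so the example assumes one of those compilers.

```cpp
#include <cstdint>

constexpr std::uint32_t kMagic        = 0xc0da0100u;
constexpr std::uint32_t kMagicSwapped = 0x0001dac0u;

enum class ByteOrder { Native, Swapped, Unknown };

// Classify the file's byte order from the word read at offset 28.
ByteOrder detectByteOrder(std::uint32_t magicNumber) {
    if (magicNumber == kMagic) return ByteOrder::Native;
    if (magicNumber == kMagicSwapped) return ByteOrder::Swapped;
    return ByteOrder::Unknown;  // not a recognizable header
}

// Convert a 32-bit field to host order when the file is byte-swapped.
std::uint32_t fix32(std::uint32_t v, ByteOrder order) {
    return order == ByteOrder::Swapped ? __builtin_bswap32(v) : v;
}

// Convert a 64-bit field (e.g. trailerPosition) the same way.
std::uint64_t fix64(std::uint64_t v, ByteOrder order) {
    return order == ByteOrder::Swapped ? __builtin_bswap64(v) : v;
}
```

Returning `Unknown` for any other value lets a caller reject a file that is not in this format instead of silently swapping garbage.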

Record Header (56 bytes)

Offset	Word	Field	Description
0	[0]	`recordLength`	Total record length in 32-bit words
4	[1]	(reserved)	Reserved
8	[2]	`headerLength`	Header length in words (always 14)
12	[3]	`numberOfEvents`	Events in this record
16	[4]	`indexDataLength`	Index array length in bytes
20	[5]	`bitInfo`	Packed version + padding info
24	[6]	`userHeaderLength`	Optional user header length
28	[7]	`signatureString`	Endianness marker: `0xc0da0100`
32	[8]	`recordDataLength`	Uncompressed data length
36	[9]	`compressedWord`	Packed compression info
40	[10-11]	`userWordOne`	User-defined 64-bit value
48	[12-13]	`userWordTwo`	User-defined 64-bit value

compressedWord (offset 36)

Bits	Field
0-27	Compressed data length in 32-bit words
28-31	Compression type: 0=none, 1=LZ4
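
Splitting the packed word into its two fields takes one mask and one shift. The sketch below uses hypothetical names (`CompressionInfo`, `unpackCompressedWord`) and assumes the word is already in host byte order.

```cpp
#include <cstdint>

struct CompressionInfo {
    std::uint32_t lengthWords;  // bits 0-27: compressed length in 32-bit words
    std::uint32_t lengthBytes;  // same length expressed in bytes
    std::uint32_t type;         // bits 28-31: 0 = none, 1 = LZ4
};

CompressionInfo unpackCompressedWord(std::uint32_t word) {
    CompressionInfo c{};
    c.lengthWords = word & 0x0FFFFFFFu;
    c.lengthBytes = c.lengthWords * 4u;  // at most 0x3FFFFFFC, fits in 32 bits
    c.type        = (word >> 28) & 0xFu;
    return c;
}
```

For instance, the word `(1u << 28) | 250u` describes 250 words (1000 bytes) of LZ4-compressed data.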

Decompressed Record Layout

After LZ4 decompression, the record buffer contains:

+---------------------------------------------+
|  Event Index Array                          |  4 bytes x numberOfEvents
|  (cumulative byte offsets)                  |
+---------------------------------------------+
|  User Header + Padding                      |
+---------------------------------------------+
|  Event Data                                 |
|  +----------++----------+     +----------+  |
|  | Event 0  || Event 1  | ... | Event N  |  |
|  +----------++----------+     +----------+  |
+---------------------------------------------+

Index Array Conversion

On disk, the index array stores event sizes. During readRecord(), these are converted in-place to cumulative byte positions for O(1) event extraction.
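
The conversion amounts to an in-place prefix sum. The sketch below is not the library's readRecord() code. It assumes that, after conversion, entry i holds the end offset of event i relative to the start of the event data, so event i spans the range from entry i-1 (or 0 for the first event) up to entry i. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Turn per-event sizes into running end offsets, in place.
void sizesToOffsets(std::vector<std::uint32_t>& index) {
    std::uint32_t position = 0;
    for (std::uint32_t& entry : index) {
        position += entry;  // entry still holds the event size here
        entry = position;   // overwrite it with the cumulative end offset
    }
}

// Byte range [begin, end) of event i, relative to the start of the event data.
// Requires i < index.size().
std::pair<std::uint32_t, std::uint32_t>
eventRange(const std::vector<std::uint32_t>& index, std::size_t i) {
    const std::uint32_t begin = (i == 0) ? 0u : index[i - 1];
    return {begin, index[i]};
}
```

With sizes {100, 40, 60} the array becomes {100, 140, 200}, and event 1 occupies bytes 100 up to 140. Each lookup reads at most two entries, which is where the O(1) extraction comes from.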

Event Buffer (16-byte header)

Offset	Size	Field	Description
0	4	`signature`	ASCII `"EVNT"` (`0x45564E54`)
4	4	`eventSize`	Total event size including header
8	4	`tag`	Event tag (for filtering)
12	4	`reserved`	Always 0
16	N	`structures`	Concatenated bank entries

Structure Entry (8-byte header)

Each bank inside an event:

Offset	Size	Field	Description
0	2	`group`	Bank group ID
2	1	`item`	Bank item ID
3	1	`type`	Data type
4	4	`sizeWord`	bits[0:23]=totalSize, bits[24:31]=headerLength
8	N	`data`	Column-major data payload

Structure Scanning

To find a bank within an event, the library performs a linear scan from byte 16:

Position 16:        [group|item|type|size][data...]
                         match? -> return position
                         no match -> advance by (8 + size)
Position 16+8+size: [group|item|type|size][data...]
                         ...

Because events typically contain 5-20 banks and only each bank's 8-byte header is inspected, the scan finishes after examining at most about 20 headers.
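
The scan can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the low 24 bits of sizeWord are taken as the number of payload bytes following the 8-byte header (matching the "advance by (8 + size)" step), all fields are in host byte order, and `makeEmptyEvent`, `appendStructure`, and `findStructure` are hypothetical names. The first two exist only to build a buffer for experimenting.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kEventHeaderSize  = 16;
constexpr std::size_t kStructHeaderSize = 8;

// Hypothetical helper: an event holding only its 16-byte header.
std::vector<std::uint8_t> makeEmptyEvent() {
    std::vector<std::uint8_t> event(kEventHeaderSize, 0);
    const std::uint32_t size = kEventHeaderSize;
    std::memcpy(event.data() + 4, &size, 4);  // eventSize at offset 4
    return event;
}

// Hypothetical helper: append a structure with a zero-filled payload.
void appendStructure(std::vector<std::uint8_t>& event, std::uint16_t group,
                     std::uint8_t item, std::uint8_t type,
                     std::uint32_t payloadSize) {
    const std::size_t pos = event.size();
    event.resize(pos + kStructHeaderSize + payloadSize, 0);
    const std::uint32_t sizeWord = payloadSize & 0x00FFFFFFu;
    std::memcpy(event.data() + pos, &group, 2);
    event[pos + 2] = item;
    event[pos + 3] = type;
    std::memcpy(event.data() + pos + 4, &sizeWord, 4);
    const std::uint32_t total = static_cast<std::uint32_t>(event.size());
    std::memcpy(event.data() + 4, &total, 4);  // keep eventSize current
}

// Linear scan from byte 16. Returns the position of the matching
// structure header, or -1 if no structure matches.
long findStructure(const std::vector<std::uint8_t>& event,
                   std::uint16_t group, std::uint8_t item) {
    if (event.size() < kEventHeaderSize) return -1;
    std::uint32_t eventSize = 0;
    std::memcpy(&eventSize, event.data() + 4, 4);
    const std::size_t end = std::min<std::size_t>(eventSize, event.size());
    std::size_t pos = kEventHeaderSize;
    while (pos + kStructHeaderSize <= end) {
        std::uint16_t g = 0;
        std::uint32_t sizeWord = 0;
        std::memcpy(&g, event.data() + pos, 2);
        std::memcpy(&sizeWord, event.data() + pos + 4, 4);
        if (g == group && event[pos + 2] == item) return static_cast<long>(pos);
        pos += kStructHeaderSize + (sizeWord & 0x00FFFFFFu);  // skip this bank
    }
    return -1;
}
```

Bounding the loop by both eventSize and the buffer length keeps a corrupted size field from walking past the end of the buffer.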

Bank Data Layout (Columnar)

Data is stored column-major (Structure of Arrays). For schema pid/S,px/F,py/F,pz/F with 3 rows:

Offset 0:   pid[0](2B) pid[1](2B) pid[2](2B)    <- column 0: 3 x int16_t
Offset 6:   px[0] (4B) px[1] (4B) px[2] (4B)    <- column 1: 3 x float
Offset 18:  py[0] (4B) py[1] (4B) py[2] (4B)    <- column 2: 3 x float
Offset 30:  pz[0] (4B) pz[1] (4B) pz[2] (4B)    <- column 3: 3 x float
Total: 42 bytes (= rowLength(14) x nrows(3))

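The byte offset of row r in column c, for a bank with N rows, where columnOffset[c] is the column's byte offset within a single row (0, 2, 6, 10 in the example above) and typeSize[c] is the size of one element: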

offset = columnOffset[c] * N + r * typeSize[c]
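
The formula can be checked against the example layout with a few lines of code. The arrays below hard-code the pid/S,px/F,py/F,pz/F schema, and the names `kTypeSize`, `kColumnOffset`, and `elementOffset` are illustrative assumptions.

```cpp
#include <cstddef>

// Element size of each column: int16_t, float, float, float.
constexpr std::size_t kTypeSize[4]     = {2, 4, 4, 4};
// Byte offset of each column within one row (running sum of the sizes).
constexpr std::size_t kColumnOffset[4] = {0, 2, 6, 10};

// Byte offset of row r in column c for a bank with nrows rows.
constexpr std::size_t elementOffset(std::size_t c, std::size_t r,
                                    std::size_t nrows) {
    return kColumnOffset[c] * nrows + r * kTypeSize[c];
}

// Spot checks against the 3-row layout shown above.
static_assert(elementOffset(1, 0, 3) == 6,  "px[0] starts at byte 6");
static_assert(elementOffset(2, 0, 3) == 18, "py[0] starts at byte 18");
static_assert(elementOffset(3, 0, 3) == 30, "pz[0] starts at byte 30");
```

With 3 rows, py[1] lands at 6 * 3 + 1 * 4 = 22 and pz[2] at 10 * 3 + 2 * 4 = 38, the last four bytes of the 42-byte payload.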

Dictionary Record

The first record contains schema definitions:

Dictionary Record
  +-- Event 0: Structure(group=120, item=2) -> schema string for bank 0
  +-- Event 1: Structure(group=120, item=2) -> schema string for bank 1
  +-- ...
  +-- Event N+1: Structure(group=32555, item=1) -> user config key
  |              Structure(group=32555, item=2) -> user config value

File Trailer

The trailer enables random event access via reader.gotoEvent(). It contains a bank (group=32111, item=1) with one row per data record:

Column	Type	Description
`position`	Long	Absolute file offset of the record
`length`	Int	Record length in bytes
`entries`	Int	Number of events in the record
`userWordOne`	Long	User word from record header
`userWordTwo`	Long	User word from record header

The readerIndex uses binary search on cumulative event counts for O(log R) random access, where R is the number of records.
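
The lookup can be sketched with `std::upper_bound` over the cumulative counts built from the trailer's `entries` column. This is an illustration, not the readerIndex code: the names `EventLocation`, `cumulativeCounts`, and `locateEvent` are hypothetical, and event numbers are assumed to start at 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct EventLocation {
    bool found;              // false if eventNumber is past the last event
    std::size_t record;      // index of the record holding the event
    std::uint64_t local;     // event index within that record
};

// cumulative[i] = total number of events in records 0..i.
std::vector<std::uint64_t>
cumulativeCounts(const std::vector<std::uint32_t>& entries) {
    std::vector<std::uint64_t> cumulative;
    cumulative.reserve(entries.size());
    std::uint64_t total = 0;
    for (std::uint32_t n : entries) {
        total += n;
        cumulative.push_back(total);
    }
    return cumulative;
}

// Binary search for the first record whose cumulative count exceeds
// eventNumber; that record contains the event.
EventLocation locateEvent(const std::vector<std::uint64_t>& cumulative,
                          std::uint64_t eventNumber) {
    auto it = std::upper_bound(cumulative.begin(), cumulative.end(), eventNumber);
    if (it == cumulative.end()) return {false, 0, 0};
    const std::size_t record = static_cast<std::size_t>(it - cumulative.begin());
    const std::uint64_t before = (record == 0) ? 0 : cumulative[record - 1];
    return {true, record, eventNumber - before};
}
```

For records holding 10, 5, and 20 events the cumulative array is {10, 15, 35}, so event 10 resolves to record 1, local index 0, and event 35 is reported as out of range.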