From DarkSoulsDev

BHD5 files are the header files for BDT archives. All values are little-endian.

Header

Some simple header data, describing the entry records.

struct BhdHeader {
  /* 0x00 */  uint32_t magic;           // 'BHD5', 0x35444842
  /* 0x04 */  uint32_t unk1;            // always 0xFF ?
  /* 0x08 */  uint32_t unk2;            // always 0x01 ?
  /* 0x0C */  uint32_t file_size;       // size in bytes of this file
  /* 0x10 */  uint32_t num_records;     // number of entry records
  /* 0x14 */  uint32_t records_offset;  // absolute offset to the entry records
};

records_offset is an absolute offset from the beginning of the file. It seems to always be 0x18, which is exactly the size of the header, so the entry records immediately follow it.
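A minimal Python sketch of reading this header with the standard `struct` module, assuming the whole BHD5 file has already been loaded into a `bytes` object. The names `BHD_HEADER` and `parse_bhd_header`, and the returned dictionary, are illustrative only and do not come from any existing tool.

```python
import struct

# Six little-endian uint32 fields, 0x18 bytes in total.
BHD_HEADER = struct.Struct("<6I")

def parse_bhd_header(data: bytes) -> dict:
    """Parse the BHD5 header at the start of `data` (hypothetical helper)."""
    (magic, unk1, unk2, file_size,
     num_records, records_offset) = BHD_HEADER.unpack_from(data, 0)
    if magic != 0x35444842:  # b"BHD5" read as a little-endian uint32
        raise ValueError("not a BHD5 file")
    return {
        "unk1": unk1,
        "unk2": unk2,
        "file_size": file_size,
        "num_records": num_records,
        "records_offset": records_offset,
    }
```

Precompiling the format with `struct.Struct` keeps the layout in one place, and `unpack_from` reads at a given offset without slicing the buffer.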

Entry records

These records are used to locate the actual data entries. There are BhdHeader.num_records of them, stored starting at BhdHeader.records_offset.

It is not yet clear why the data entries are grouped into such records. Could they be directories?

struct BhdEntryRecord {
  /* 0x00 */  uint32_t num_entries;     // number of entries in this record
  /* 0x04 */  uint32_t entries_offset;  // absolute offset to the data entries
};
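Reading the records amounts to stepping through num_records consecutive 8-byte structures starting at records_offset. A Python sketch in the same spirit; the helper name `parse_entry_records` is hypothetical, and it takes the two header values as plain arguments so that it stands on its own.

```python
import struct

# Two little-endian uint32 fields per record: num_entries, entries_offset.
BHD_ENTRY_RECORD = struct.Struct("<2I")

def parse_entry_records(data: bytes, num_records: int, records_offset: int) -> list:
    """Return one (num_entries, entries_offset) tuple per entry record."""
    return [
        BHD_ENTRY_RECORD.unpack_from(data, records_offset + i * BHD_ENTRY_RECORD.size)
        for i in range(num_records)
    ]
```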

Data entries

Information on how to access the data in the BDT content file.

struct BhdDataEntry {
  /* 0x00 */  uint32_t hash;    // hash of the file name
  /* 0x04 */  uint32_t size;    // size of the data
  /* 0x08 */  uint32_t offset;  // absolute offset to the data in the BDT file
  /* 0x0C */  uint32_t unk;     // always 0 ?
};
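A possible way to read the data entries of one record and then pull a file's bytes out of the BDT content file, assuming both files are loaded as `bytes`. The helpers `parse_data_entries` and `read_file_data` are hypothetical names; `unk` is returned untouched since its meaning is unknown.

```python
import struct

# Four little-endian uint32 fields per data entry: hash, size, offset, unk.
BHD_DATA_ENTRY = struct.Struct("<4I")

def parse_data_entries(data: bytes, num_entries: int, entries_offset: int) -> list:
    """Return (hash, size, offset, unk) tuples for the entries of one record."""
    return [
        BHD_DATA_ENTRY.unpack_from(data, entries_offset + i * BHD_DATA_ENTRY.size)
        for i in range(num_entries)
    ]

def read_file_data(bdt: bytes, size: int, offset: int) -> bytes:
    """Slice one file's contents out of the BDT file (offset is absolute)."""
    return bdt[offset:offset + size]
```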

Calculating the hash from a file name is easy. From the Alexandria dev:

To hash a string, start at 0. For each step multiply the hash code by 37,
then add the next lower-case character from the file name
(probably in Unicode).

To reformulate that a bit, you start with a hash of 0, then for each character of the lowercased file name, multiply the hash by 37 and add the Unicode value of the character.
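Unrolling that loop gives a closed form. For a lowercased name of $n$ characters with code points $c_0, \dots, c_{n-1}$, the hash is the polynomial below; the reduction modulo $2^{32}$ reflects that only the last eight hexadecimal digits, a 32-bit value, are kept.

```latex
h = \left( \sum_{i=0}^{n-1} c_i \cdot 37^{\,n-1-i} \right) \bmod 2^{32}
```

The last character is weighted by $37^0 = 1$, the one before it by $37$, and so on.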

It can be implemented this way:

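file_name = "/dir/CAPS/Example.slt"
file_name = file_name.lower()  # the hash is computed on the lowercased name

full_hash = 0
for character in file_name:
    full_hash *= 37
    full_hash += ord(character)  # Unicode code point of the character

# Only the low 32 bits are stored, i.e. the last eight hex digits.
# Masking and zero-padding also handles hashes shorter than eight digits.
eight_last_chars = "{:08X}".format(full_hash & 0xFFFFFFFF)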

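There, eight_last_chars holds your hash as found in the hash field of the BHD5 data entry (mind the endianness though: the field is a little-endian uint32, so a hex dump of the file shows its bytes in reverse order).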
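To illustrate the endianness remark, here is the same hashing loop wrapped in a function (the name `bhd5_hash` is hypothetical), plus a look at how the resulting value is laid out as little-endian bytes. Masking at every step gives the same result as masking once at the end, since both compute the hash modulo $2^{32}$.

```python
import struct

def bhd5_hash(file_name: str) -> int:
    """Hash a file name as described above, truncated to 32 bits."""
    h = 0
    for character in file_name.lower():
        h = (h * 37 + ord(character)) & 0xFFFFFFFF  # keep only the low 32 bits
    return h

h = bhd5_hash("/dir/CAPS/Example.slt")
print("{:08X}".format(h))          # the hash as an 8-digit hex string
print(struct.pack("<I", h).hex())  # the same value as laid out in the file
```

The second line prints the bytes of the first in reverse order, which is what you see when searching for a hash in a hex editor.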

This hash system is annoying because we lack a direct mapping from names to files, but it is also weak enough to allow some tricks: if two file names differ only in their last characters, for instance when they share the same base name but have different extensions, their hashes will be very close, because those last characters are multiplied by the smallest powers of 37. This is useful for finding corresponding pairs of BHF and BDF files.
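A quick check of that observation: if two names differ only in their second-to-last character, their hashes differ by 37 times the difference between those two characters (modulo $2^{32}$). For a .bhf/.bdf pair this is ('h' - 'd') * 37 = 4 * 37 = 148. The file names below are made up for illustration, and `bhd5_hash` and `hash_distance` are hypothetical helper names.

```python
def bhd5_hash(file_name: str) -> int:
    """32-bit file name hash: h = h * 37 + code point, over the lowercased name."""
    h = 0
    for character in file_name.lower():
        h = (h * 37 + ord(character)) & 0xFFFFFFFF
    return h

def hash_distance(a: int, b: int) -> int:
    """Difference a - b, wrapped into the unsigned 32-bit range."""
    return (a - b) & 0xFFFFFFFF

# Hypothetical names for a header/data pair sharing the same base name.
bhf = bhd5_hash("/chr/c0000.bhf")
bdf = bhd5_hash("/chr/c0000.bdf")
print(hash_distance(bhf, bdf))  # 148, i.e. ('h' - 'd') * 37
```

So, given the hash of a known BHF file, one could look for a data entry whose hash differs from it by exactly 148 modulo $2^{32}$; a match is only a strong hint, since an unrelated name could in principle land on the same value.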