higan Manifests

Overview

Manifests are used to describe games, as well as their PCB components.

The base format is a minimal encoding to express every possible variation of PCB that exists in licensed form, as well as most unlicensed PCBs by extension.

The format is extensible, so that new fields can be added on an as-needed basis, yet in a backward-compatible fashion for existing games.

I feel it's best to start with examples, and then move on to explanations.

Simple Example

game
  sha256:   b7209ec3a5a0d28724f5867343195aef7cb85aeb453aa84a6cbe201b61b0d083
  label:    ドレミファンタジー ミロンのドキドキ大冒険
  name:     DoReMi Fantasy - Milon no Dokidoki Daibouken
  region:   SHVC-AM4J-JPN
  revision: SHVC-AM4J-0
  board:    SHVC-1J0N-20
    memory
      type: ROM
      size: 0x200000
      content: Program

Complex Example

game
  sha256:   89ad4ba02a2518ca792cf96b61b36613f86baac92344c9c10d7fab5433bebc16
  label:    Super Mario Kart
  name:     Super Mario Kart
  region:   SNS-MK-USA
  revision: SNS-MK-0
  board:    SHVC-1K1B-01
    memory
      type: ROM
      size: 0x80000
      content: Program
    memory
      type: RAM
      size: 0x800
      content: Save
    memory
      type: ROM
      size: 0x1800
      content: Program
      manufacturer: NEC
      architecture: uPD7725
    memory
      type: ROM
      size: 0x800
      content: Data
      manufacturer: NEC
      architecture: uPD7725
    memory
      type: RAM
      size: 0x200
      content: Data
      manufacturer: NEC
      architecture: uPD7725
      volatile
    oscillator
      frequency: 7600000
  note: DSP1

Encoding

Manifests are stored in plain-text files named manifest.bml. The BML extension refers to the markup language used. For manifests specifically, only a simplified subset of the BML syntax is used.

The file format is always UTF-8 without a BOM (byte order mark). If a BOM is present (eg one written by Notepad), the document will be considered invalid.

Line endings should consist only of line feeds (0x0a); however, it is permissible to use carriage return + line feed pairs (0x0d, 0x0a). Carriage returns alone (eg the classic Mac OS 9 format) are not permitted.
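The byte-level rules above can be checked before any parsing takes place. The following is a minimal sketch in Python rather than higan's own code; the function name check_encoding and the wording of the problem messages are made up for illustration.

```python
UTF8_BOM = b"\xef\xbb\xbf"

def check_encoding(raw):
    """Return a list of problems found; an empty list means the bytes pass."""
    problems = []
    if raw.startswith(UTF8_BOM):
        problems.append("BOM present")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    # Remove every CR+LF pair; any carriage return left over stands alone.
    if b"\r" in raw.replace(b"\r\n", b""):
        problems.append("bare carriage return")
    return problems
```

A file that passes returns an empty list, so a caller can reject the manifest as soon as the list is non-empty.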

Tree-structure nesting is permitted by way of indenting, either by tabs or by spaces. It is up to the user how many tabs or spaces each indentation level uses, however it must be applied consistently to the entire document to be considered valid. In the above example, two spaces are used. This is the recommended default choice. Tabs have a tendency to display at inconsistent widths between editors, and to be eaten by web forum software.

Each node name can contain the characters [A-Za-z0-9-_], but in practice only [a-z] is used for manifests.

Each node can contain a value. To specify a value, a : separator is used. After the : separator, any characters can appear except carriage returns and line feeds; however, control characters should not be used.

For manifests specifically, all leading and trailing whitespace (both tabs and spaces) will be erased from values. In the above examples, extra whitespace is used to vertically align the root-node elements, but this is strictly optional. If desired, each node can be encoded as name:value.

Each node can have child nodes, even if the parent node is assigned a value.

The attribute and multi-line value features of BML are not used. This is so that parsing manifests is simpler than parsing full BML syntax.
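To give a feel for how small this subset is, here is a rough Python sketch of a parser for it. It is not the parser higan uses. The name parse_manifest and the dictionary layout of each node are invented for this example, and it assumes that only the whitespace surrounding a value is trimmed, since the labels in the examples keep their internal spaces. It does not check that indentation is applied consistently.

```python
def parse_manifest(text):
    """Parse the simplified manifest subset into a list of root-level nodes."""
    roots = []
    stack = []  # (indent width, node) pairs for the currently open ancestors
    for line in text.replace("\r\n", "\n").split("\n"):
        if not line.strip():
            continue  # skip blank lines
        body = line.lstrip(" \t")
        indent = len(line) - len(body)
        # Only the first ':' separates name from value, so values may contain ':'.
        name, sep, value = body.partition(":")
        node = {
            "name": name.strip(),
            # Assumption: trim surrounding whitespace only; None means no value.
            "value": value.strip() if sep else None,
            "children": [],
        }
        # Close any open nodes that are not ancestors of this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        siblings = stack[-1][1]["children"] if stack else roots
        siblings.append(node)
        stack.append((indent, node))
    return roots
```

Because the result is a list, documents with more than one root-level node are handled without any special casing.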

Specification

Every manifest must have a root-level game node. All relevant information is stored under this node, so as to enable direct transposition of game database entries into individual manifest files.

The sha256 field is optional for individual games, and may be omitted for rewritable media such as BS-X Satellaview memory cartridges and Nintendo Power cartridges. The SHA256 value itself is the combined hash of all relevant ROM files. The hashing order for this is program ROM, then data ROM, then character ROM, then firmware boot ROM, then firmware program ROM, then firmware data ROM. No known games contain more than one coprocessor firmware. Writable memory such as EEPROM, Flash, and RAM components is not considered for the SHA256 hash computation.
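The hashing order can be sketched in Python as follows. This is an illustration only: it assumes that "combined hash" means a single SHA256 over the ROM contents fed in the listed order, and the dictionary keys and the name combined_sha256 are hypothetical.

```python
import hashlib

# Hashing order as listed above; the short keys are hypothetical labels.
HASH_ORDER = [
    "program", "data", "character",
    "firmware_boot", "firmware_program", "firmware_data",
]

def combined_sha256(roms):
    """Hash the ROM contents present in `roms` (a dict of bytes) in order."""
    digest = hashlib.sha256()
    for key in HASH_ORDER:
        if key in roms:
            digest.update(roms[key])
    # Keys outside HASH_ORDER (eg save RAM) never reach the digest.
    return digest.hexdigest()
```

The order in which the caller supplies the ROMs does not matter here; only the fixed order in HASH_ORDER affects the result.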

The label field is mandatory. This is used by higan to display the game title in the title bar. The idea is that this field represents what is printed on the game labels. It should be in the native language of the game itself, and should include any special characters that can be encoded in UTF-8, excluding emoji. The simple example above has the name of the game written in Japanese, for instance.

The name field is optional for standalone gamepaks, but is mandatory for database entries, so that icarus can import the games and name them appropriately. This field represents a filesystem-safe filename. It should only use characters between 0x20 and 0x7e, and should not use reserved filename characters such as <>/:*?|. It is very strongly recommended to have a name field, even if it is identical to the label field as in the complex example above. It is especially useful in the case of the simple example, so that the title can be read by those who are not fluent in Japanese.
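A check for the character rules above might look like the following Python sketch. The reserved set contains only the characters listed in this document, which are given as examples, so a real tool may well need a longer list; the name is_safe_name is invented here.

```python
# Only the reserved characters named in the text; treat this list as incomplete.
RESERVED = set("<>/:*?|")

def is_safe_name(name):
    """True when every character is within 0x20..0x7e and none is reserved."""
    return all(0x20 <= ord(c) <= 0x7e and c not in RESERVED for c in name)
```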

The region field is mandatory, and is used by higan to determine the regional hardware and TV frequency variants used for proper emulation. When game serial numbers are present that include the regional information within them, these values are used. In the simple example above, the region ends in -JPN, which indicates the game is from Japan, and uses the NTSC standard. In the complex example, -USA obviously implies the game is from the US. If the game serials are not known, the value will be up to the emulated system's requirements. The NES/Famicom will need either NTSC or PAL; the Mega Drive/Genesis will need NTSC-J, NTSC-U, or PAL. The PC Engine will always be NTSC. Technically, the region could be omitted for the WonderSwan, but for consistency, it should always be included.

The revision field is mandatory, and is used to uniquely identify game revisions. If the game cartridge ROM chip has a serial number that also includes revision information, this should be used. In the simple example above, the -0 indicates a 1.0 revision. A -1 would indicate a 1.1 revision, and so forth. If a game does not have clear ROM serials, then the revision will start at 1.0 and increment with each newer release sequentially. No known cases exist of a revision field exceeding 1.5. In the case of the Sufami Turbo, there are no known revisions of any games released for the device, and so although the field is technically not required here, it is included anyway for consistency. higan will not actually change its emulation based on this field.
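The serial-to-revision convention described above (-0 means 1.0, -1 means 1.1) can be written out as a short Python sketch. higan does not act on this field, so this is purely a reading aid; the function name revision_from_serial is made up.

```python
def revision_from_serial(serial):
    """Map a trailing "-N" in a revision serial to the string "1.N"."""
    suffix = serial.rsplit("-", 1)[-1]
    if not suffix.isdecimal():
        raise ValueError("no numeric revision suffix in %r" % serial)
    return "1.%d" % int(suffix)
```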

The note field is optional. Anything can be included here.

The ordering of these fields is unimportant, and is up to user preference, so long as all required fields are present.
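Since ordering does not matter, checking for the mandatory fields comes down to a set lookup. The Python sketch below covers only the fields described above as always mandatory (label, region, revision). It leaves out name, which is mandatory only for database entries, and board, which is mandatory only for games distributed on PCBs. The name missing_game_fields is hypothetical.

```python
# Fields the text above describes as always mandatory under the game node.
REQUIRED_GAME_FIELDS = ("label", "region", "revision")

def missing_game_fields(child_names):
    """Return the mandatory fields absent from the game node's child names."""
    present = set(child_names)
    return [field for field in REQUIRED_GAME_FIELDS if field not in present]
```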

Specification — Boards

The board field is mandatory for games that are distributed on printed circuit boards. It describes the components (memory, oscillators, etc) that are present on each individual game board.

If the game PCB contains a serial code that can be used to deduce the board layout, this is provided. In the complex example above, the 1K1B component indicates a DSP1 game with a 74LS memory mapper.

If the game does not contain a serial code, or it is not known due to poor game preservation of the given system, then it will be up to the individual system what appropriate value should go here to inform higan how to emulate the cartridge in question properly.

The long-term intention is to have per-system documentation to describe the board value for each. In the cases that all PCBs are uniform, as in the PC Engine, this field would just be left blank, but still present, so that child node components could be present.

memory

The memory node is used to describe memory. Instead of being per-physical chip, it is meant to describe per-purpose functionality. So for instance, the same game might be released initially on two 8mbit program ROM chips, but later re-released on a single 16mbit program ROM chip. Instead of trying to encode a detail not used by emulation, only one memory entry will be present to describe the full 16mbit program ROM. However, a data ROM, say from the SPC7110, or a character ROM, say from the Famicom/NES, will be described separately, as this detail is very important to emulation.

The type field encodes the type of memory used. ROM indicates truly read-only memory; typically mask ROM or EPROM. RAM indicates rewritable memory, such as SRAM or PSRAM. RTC indicates a real-time clock. EEPROM and Flash represent memory which can be reprogrammed by software. Very obviously, this field is mandatory.

Both RAM and RTC types can have an optional child specifier to indicate if a battery is connected to ensure non-volatile memory. If the battery child node is not present on type, then the memory contents will not be loaded or saved by higan. This is important as there exist PCBs that have CR2032 battery connection slots, yet are not populated. As such, this detail has to be encoded on a per-game basis.

The size field encodes the number of bytes present on the memory chip. This can be expressed as either a decimal value (eg 256), or a hex value (eg 0x100). Hex is preferred. This field does not support logical units such as 8mbit, because many memory sizes do not fit neatly into these descriptions, such as the 0xc00-byte data ROM for the Cx4 coprocessor. The size field is required because some boards can support multiple ROM sizes.
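Reading a size value in either notation takes only a few lines. This Python sketch assumes that hex values always carry a 0x prefix, as they do in the examples; parse_size is an invented name.

```python
def parse_size(value):
    """Parse a size given as decimal ("256") or hex ("0x100") into bytes."""
    value = value.strip()
    if value.lower().startswith("0x"):
        return int(value, 16)
    return int(value, 10)
```

With this, the 0x200000 from the simple example comes out as 2097152 bytes, and the Cx4 data ROM size 0xc00 as 3072 bytes.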

The content field is mandatory, and describes the purpose of the memory chip. This can contain values such as Program, Data, Character, Save, Time, and so forth.

The manufacturer field is only sometimes mandatory, to disambiguate coprocessor firmware by the same content names, as well as to indicate specific flash-based chips with different protocols in combination with the memory size.

The architecture field is only sometimes mandatory. It should be populated when the manufacturer field is, if the value is known, and represents the ISA of a given CPU. So for instance, the NEC uPD7725 contains internal program and data ROM, plus internal data RAM. This memory may exist inside of another chip, but as the memory nodes contain memory by use, they are listed separately, and are tagged as being part of said coprocessor. The architecture field can also further disambiguate edge cases such as NEC releasing both the uPD7725 and the uPD96050.

With all of these fields combined, the exact ordering of memory nodes becomes position-independent. That is to say, it doesn't matter how they are ordered, each should be uniquely identifiable.

higan locates memory files via lowercase(architecture.content.type), or if the architecture field is empty, lowercase(content.type). So for instance, program.rom, save.ram, or upd7725.data.rom.
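The naming rule can be expressed directly in Python. This is a sketch of the rule as stated, not higan's own code; the function name memory_filename and its parameter names are made up.

```python
def memory_filename(content, type_, architecture=""):
    """Build lowercase(architecture.content.type), omitting an empty architecture."""
    parts = [content, type_]
    if architecture:
        parts.insert(0, architecture)
    return ".".join(parts).lower()
```

For the uPD7725 data ROM node in the complex example, this produces upd7725.data.rom.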

Overriding of filenames is not permitted, so as to enable manifest-free operation and fewer database fields for easier hashing and loading of individual component files.

oscillator

The oscillator node is used to describe clocks that exist on PCBs. A game may optionally include a quartz crystal or ceramic clock, and this component node documents such cases. For example, the NEC uPD96050 may ship with an oscillator with an effective frequency of either 11MHz or 15MHz. This field disambiguates this case.

The frequency field describes the exact oscillator frequency in Hz (hertz). No suffix is added. So in the complex example above, 7600000 implies a clock rate of 7.6MHz.

Specification — Extensibility

Manifests may contain any number of additional fields. Although higan will not use them at the present time, it may in the future add additional fields as needed to emulate new edge cases discovered.

As an example, one may wish to describe an oscillator in more detail:

oscillator
  frequency: 11000000
  divider: 2
  type: quartz

This is still a valid manifest; the extra fields will simply be ignored.
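One way a reader can tolerate unknown fields is to pick out only the names it understands and leave the rest alone. The Python sketch below does this for an oscillator node represented as a plain dictionary; that representation and the name read_oscillator are assumptions made for illustration.

```python
# The only oscillator field this document defines.
KNOWN_OSCILLATOR_FIELDS = {"frequency"}

def read_oscillator(fields):
    """Keep only the fields this reader understands; ignore the rest."""
    return {k: v for k, v in fields.items() if k in KNOWN_OSCILLATOR_FIELDS}
```

Given the oscillator above, only frequency survives, while divider and type are dropped without raising an error.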

There can even be additional root-level nodes present, as unlike XML, BML does not require only one root-level node per document.