Experimenters: Audio style file format

pjd · April 09, 2018, 03:54:33 PM

Ever since Yamaha distributed the audio styles for Genos, I've been meaning to take a look inside of an audio style file. Here's a little preliminary information.

An audio style file is an IFF-like container, just like a Standard MIDI File (SMF). In fact, an audio style file has the same internal organization as a regular style file, which we know to be a Type 0 SMF with extra chunks.

An audio style file has the following chunks (in order):


    Type    Purpose
    ----    ------------------------------------
    MThd    SMF header chunk
    MTrk    SMF track chunk
    CASM    Yamaha CASM chunk
    AASM    Audio assembly (descriptor) chunk
    AFil    Audio file (waveform) chunk
    OTSc    Yamaha OTS chunk

The AASM and AFil chunks are new, additional chunks beyond the known MIDI, CASM and OTS chunks. All chunks have a four byte chunk identifier and a four byte chunk size. The chunk size does not include the identifier or chunk size bytes, as usual.
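
For anyone who wants to poke at a file themselves, here is a minimal Python sketch of walking chunks laid out this way. It is an illustration only, not pj's Java code. It assumes the chunk size is big-endian (as in an SMF) and that there is no pad byte after odd-sized chunks; the name `iter_chunks` and the sample bytes are made up for the example.

```python
import struct


def iter_chunks(data, start=0, end=None):
    """Yield (chunk_id, payload_offset, size) for consecutive chunks in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4].decode("ascii", errors="replace")
        # Assumption: 32-bit big-endian size, as in a Standard MIDI File.
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
        yield chunk_id, pos + 8, size
        # The size excludes the 8 header bytes; no pad byte is assumed.
        pos += 8 + size


# Tiny synthetic example (not real style data): two chunks back to back.
sample = (
    b"MThd" + struct.pack(">I", 6) + bytes(6)
    + b"CASM" + struct.pack(">I", 3) + b"abc"
)
chunks = list(iter_chunks(sample))
print(chunks)
```

On a real file you would pass the whole file contents as `data` and expect to see the chunk types from the table above in order.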

The AASM chunk is relatively small, about 2,500 bytes. It consists of 15 variable length ASEG subchunks. The ASEG subchunk has a four byte subchunk size. Each ASEG corresponds to a style section; that's why there are fifteen of them.

An ASEG subchunk has three parts:


    Type    Purpose
    ----    ------------------------------------
    Adec    Identifies the style section
    Atab    Identifies the audio file; other functions unknown
    AMix    Function unknown

The Adec part is variable length, having an explicit four byte size. The Atab and AMix parts appear to be fixed length (101 and 28 bytes, respectively). Each starts with its own four byte size field (97 and 24 in the example below), which is counted in those lengths.

The Adec part is ASCII text and is a style section name like "Main A" or "Fill In DD". That is the only information in Adec.

I don't know what the Atab or AMix parts do. The Atab part contains an ASCII string which identifies the audio file associated with the style section. This string is clearly visible in a dump. (Example below.) All of the Atab and AMix parts in the test audio file have the same values except for the audio file names.


File Offset:       36965
Subchunk type:     'ASEG'
Subchunk size:     151
Section name:      Main D
Atab type:         'Atab'
   0    0    0   97    0   32   32   32 | 00 00 00 61 00 20 20 20 | ...a.
  32   32   32   32   32   41   56   48 | 20 20 20 20 20 29 38 30 |      )80
 115   67   97  110   97  100  105   97 | 73 43 61 6E 61 64 69 61 | sCanadia
 110   82  111   99  107   95   77   97 | 6E 52 6F 63 6B 5F 4D 61 | nRock_Ma
 105  110   32   68    0    0    0    0 | 69 6E 20 44 00 00 00 00 | in D....
   0    0    0    0    0    0    0    0 | 00 00 00 00 00 00 00 00 | ........
   0    0    0    0    0    0    0    0 | 00 00 00 00 00 00 00 00 | ........
   0    0    0    0    0    0    0    0 | 00 00 00 00 00 00 00 00 | ........
   1   15   -1    7   -1   -1   -1   -1 | 01 0F FF 07 FF FF FF FF | ........
   0    0    0  127    0    0    0    0 | 00 00 00 7F 00 00 00 00 | ........
 127    0    0    0    0    0  127    0 | 7F 00 00 00 00 00 7F 00 | ........
   0    0    0    0  127    0    0    0 | 00 00 00 00 7F 00 00 00 | ........
   0    0    0    0    0    0    0    0 | 00 00 00 00 00 00 00 00 | ........
AMix type:         'AMix'
   0    0    0   24    7 -128    0   -1 | 00 00 00 18 07 80 00 FF | ........
  88    4    4    2   24    8    0  -80 | 58 04 04 02 18 08 00 B0 | X.......
   7   71    0   10   64    0   91    0 | 07 47 00 0A 40 00 5B 00 | .G..@.[.
   0   -1   47    0    0    0    0    0 | 00 FF 2F 00 00 00 00 00 | ../.....
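
The numbers in the dump add up if each part inside an ASEG is a four byte identifier, a four byte size, and then the payload: 14 + 105 + 32 = 151. Here is a rough Python sketch of splitting an ASEG body on that assumption. The helper names (`split_aseg`, `make_part`) are hypothetical, and the Atab and AMix payloads are zero filler of the right length, not the real bytes.

```python
import struct


def split_aseg(body):
    """Split an ASEG body into {part_id: payload}.

    Assumes each part is a 4-byte ASCII id, a 4-byte big-endian size,
    and then that many payload bytes.
    """
    parts, pos = {}, 0
    while pos + 8 <= len(body):
        part_id = body[pos:pos + 4].decode("ascii")
        (size,) = struct.unpack(">I", body[pos + 4:pos + 8])
        parts[part_id] = body[pos + 8:pos + 8 + size]
        pos += 8 + size
    return parts


def make_part(part_id, payload):
    """Build one part for the synthetic example below."""
    return part_id + struct.pack(">I", len(payload)) + payload


# Payload sizes taken from the dump: Adec 6 ("Main D"), Atab 97 (0x61), AMix 24 (0x18).
body = (
    make_part(b"Adec", b"Main D")
    + make_part(b"Atab", bytes(97))   # zero filler, not the real Atab bytes
    + make_part(b"AMix", bytes(24))   # zero filler, not the real AMix bytes
)
parts = split_aseg(body)
print(len(body), parts["Adec"].decode("ascii"))  # prints: 151 Main D
```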

I'm still working on the AFil chunk. It has substructure, too. The AFil has the following (sub)chunk types:


    ADSg
    ANdc
    AWav
    WAVE
    Afmt
    Sfmt
    SPnt
    Sdec
    Adat
    Atmp

ADSg seems to be the container chunk. Like ASEG, there are fifteen of everything. The ANdc subchunk contains the audio file name, which matches up with the name in the ASEG. AWav looks to be a container, too. (I need to verify this hunch.)

As you might guess, the AFil chunk is pretty big because it contains waveform data.

The audio "file" format is WAV-like, but it is not exactly WAV (Microsoft RIFF). I was able to play back the audio by importing the audio style file as a raw (untyped) audio file. The audio format seems to be 44,100 Hz, 16-bit stereo. No compression or encryption. It shouldn't be too hard to dump the audio. Or, maybe replace the audio, thereby making it possible to create a new audio style.
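
If the waveform bytes really are plain PCM, wrapping them in a WAV header is straightforward. The following Python sketch uses the standard `wave` module. It assumes little-endian signed 16-bit samples (the byte order is not confirmed above), and `raw_to_wav` is just a name chosen for the example.

```python
import io
import wave


def raw_to_wav(pcm, out, sample_rate=44100, channels=2, sample_width=2):
    """Wrap headerless PCM bytes in a RIFF/WAVE header.

    `out` may be a file name or a writable binary file object.
    Assumes the samples are little-endian signed PCM.
    """
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)


# Synthetic example: one second of silence (44,100 frames of 4 bytes each).
pcm = bytes(44100 * 2 * 2)
buffer = io.BytesIO()
raw_to_wav(pcm, buffer)

buffer.seek(0)
with wave.open(buffer, "rb") as check:
    info = (check.getnchannels(), check.getsampwidth(),
            check.getframerate(), check.getnframes())
print(info)
```

If the result plays back as noise, the byte order assumption is the first thing to revisit.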

I've got some Java code that I will eventually share as soon as I get the AFil chunk worked out.

All the best -- pj

pjd · April 09, 2018, 04:50:23 PM

By the way, I don't know if there is any proprietary monkey business in either the CASM or MIDI parts of the audio style file. It would be sort of cool if we could strap AASM and AFil chunks to an existing style file and get an audio style. Kind of like Jørgen's split/splice.

Take care -- pj

vlbrgt · April 09, 2018, 05:47:49 PM

pjd,

AMix looks like midi event codes.

AMix : header
00 00 00 18 : length of data
07 80 : 0780 hex = 1920 decimal (PPQN ?)
00 : delta time
FF 58 04 04 02 18 08 : meta event Time signature 4/4
00 : delta time
B0 07 47 : controller volume
00 : delta time
0A 40 : controller Panpot
00 : delta time
5B 00 : Controller Reverb send level
00 : delta time
FF 2F 00 : meta event End of Track
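
A quick way to check this reading is to run the 24 data bytes from pjd's dump through a tiny MIDI event decoder. This Python sketch is only an illustration: it treats the first two bytes as a 16-bit value (possibly PPQN, which is still a guess) and handles just meta events and control changes with running status, which is all this payload seems to contain. The function names are made up for the example.

```python
# The 24 AMix data bytes from the dump, after the four byte size field.
AMIX = bytes.fromhex(
    "07 80"
    " 00 FF 58 04 04 02 18 08"
    " 00 B0 07 47"
    " 00 0A 40"
    " 00 5B 00"
    " 00 FF 2F 00"
)


def read_varlen(data, pos):
    """Read a MIDI variable-length quantity; return (value, next_pos)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte < 0x80:
            return value, pos


def decode_events(data):
    """Decode meta events and control changes; anything else raises."""
    events, status, pos = [], None, 0
    while pos < len(data):
        _delta, pos = read_varlen(data, pos)
        if data[pos] == 0xFF:                      # meta event: FF type length data
            meta_type = data[pos + 1]
            length, pos = read_varlen(data, pos + 2)
            events.append(("meta", meta_type, data[pos:pos + length]))
            pos += length
            if meta_type == 0x2F:                  # End of Track
                break
            continue
        if data[pos] & 0x80:                       # new status byte
            status = data[pos]
            pos += 1
        if status is None or status & 0xF0 != 0xB0:
            raise ValueError("only control changes are handled in this sketch")
        events.append(("cc", status & 0x0F, data[pos], data[pos + 1]))
        pos += 2
    return events


header_value = int.from_bytes(AMIX[:2], "big")     # 0x0780 = 1920, maybe PPQN
events = decode_events(AMIX[2:])
controllers = {e[2]: e[3] for e in events if e[0] == "cc"}
print(header_value, controllers)
```

With these bytes the controllers come out as volume (7) = 71, pan (10) = 64 and reverb send (91) = 0, all on the first MIDI channel.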

Regards
Etienne

pjd · April 09, 2018, 07:55:57 PM

Quote from: vlbrgt on April 09, 2018, 05:47:49 PM
AMix looks like midi event codes.

Beautiful observation!

Thanks -- pj

pjd · April 10, 2018, 04:17:54 PM

Here are just a few quick observations while working out the AFil chunk.

An AFil chunk consists of 15 ADSg subchunks. The following table shows the offset and length information for the first ADSg in the example AFil:


    Type    Offset      Size      Purpose
    ----   -------  --------      ------------------------------------
    AFil     37287  15261858
    ADSg     37295   1219275      Container for an audio file
    ANdc     37303        50      File name
    AWav     37361   1219209      Container for audio waveform
    WAVE     37369       n/a      Marker (no subchunk size)
    Afmt     37373        16      Audio format information
    Sfmt     37397       217      Container for section information
    Sdec     37608         6      Section name, e.g., Main A
    Adat     37622   1218300      Waveform data
    AInf   1255930       640      Container for audio information
    BPnt   1255938       136
    OPnt   1256082       240
    APnt   1256330       232
    ATmp   1256570         0      Empty, subchunk size is 0
    ADSg   1256578

The container relationships are important because the containers and subchunks are nested:


    AFil contains ADSg
    ADSg contains ANdc, AWav
    AWav contains WAVE, Afmt, Sfmt, Sdec, Adat, AInf
    AInf contains BPnt, OPnt, APnt, ATmp
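
The nesting can be cross-checked from the offsets and sizes alone, since a child has to fit inside its parent's payload. Here is a small Python sketch that does the arithmetic on the first ADSg from the table. It assumes every subchunk has an 8-byte header (identifier plus size) except WAVE, which is treated as a bare 4-byte marker; the function names are invented for the example.

```python
# (type, offset, size) for the first ADSg, copied from the table.
# WAVE has no size field, so its size is None.
TABLE = [
    ("ADSg", 37295, 1219275),
    ("ANdc", 37303, 50),
    ("AWav", 37361, 1219209),
    ("WAVE", 37369, None),
    ("Afmt", 37373, 16),
    ("Sfmt", 37397, 217),
    ("Sdec", 37608, 6),
    ("Adat", 37622, 1218300),
    ("AInf", 1255930, 640),
    ("BPnt", 1255938, 136),
    ("OPnt", 1256082, 240),
    ("APnt", 1256330, 232),
    ("ATmp", 1256570, 0),
]


def extent(offset, size):
    """Return (start, end): 8 header bytes plus payload, or 4 bytes for a bare marker."""
    if size is None:
        return offset, offset + 4
    return offset, offset + 8 + size


def find_parents(table):
    """Map each subchunk to the innermost earlier subchunk whose payload encloses it."""
    result = {}
    for index, (name, offset, size) in enumerate(table):
        start, end = extent(offset, size)
        result[name] = None
        for p_name, p_offset, p_size in table[:index]:
            if p_size is None:
                continue
            p_start, p_end = extent(p_offset, p_size)
            if p_start + 8 <= start and end <= p_end:
                # The table is sorted by offset, so a later match is nested deeper.
                result[name] = p_name
    return result


parents = find_parents(TABLE)
for name, parent in parents.items():
    print(name, "is inside", parent)
```

By this arithmetic ANdc and AWav sit directly in ADSg, and BPnt, OPnt, APnt and ATmp sit in AInf, as listed. Sdec also ends exactly where Sfmt ends, so it appears to live inside Sfmt, which would fit Sfmt being a container for section information.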

Thanks to all of this nesting, it's gonna take a little while to extend the Java code.

I appreciate everyone taking a look and making suggestions and comments, especially about the meaning of this stuff.

Take care -- pj

pjd · April 13, 2018, 11:03:31 PM

Quick status...

I made a little more progress coding. I can read and recognize all of the major (sub)chunks. The code is a little brittle in that it assumes the (sub)chunks are in a particular order.

As a side benefit, my program dumps all 15 of the audio phrases in an audio style in pure raw format. The raw audio file can be read into a DAW and converted to WAV.

I got distracted by the new Audio Phraser application and am trying it out.

Yamaha assume that all of the MIDI, NTR/NTT and OTS editing will be performed on a keyboard (e.g., Genos), not on a PC. After more experience with Audio Phraser, I may start another thread about use cases (maybe in the General area of the Forum because this topic touches all YEM and audio style capable keyboards).

All the best -- pj

pjd · April 16, 2018, 09:21:27 PM

Now that you know a little bit about what's inside of an audio style file, here is a brief overview of what the Audio Phraser program generates.

Audio Phraser generates an MThd MIDI file header chunk, a single MTrk chunk (Type 0), an ASEG chunk for each audio waveform, an AFil chunk (containing an ADSg subchunk for each audio file) and a CASM chunk.

The MIDI tempo and time signature are the same as those set in Audio Phraser. The MIDI song title is set to "Audio Phraser".

The MIDI track contains the usual markers at the beginning: SFF2 and SInt. A single SysEx message is generated after SInt: General MIDI System ON (F0 7E 7F 09 01 F7). The key signature is set to C/Am, followed by:

SMPTE Offset
Sequencer-specific MIDI meta event: ff 7f 04 43 00 01 00 00

Oddly, MIDI channel 4 has four whack-looking NOTE OFF events:


    NOTE OFF G#9
    NOTE OFF G5
    NOTE OFF C0
    NOTE OFF C0

A bug? The remaining markers indicate the start of the style sections. The section length corresponds to the length of the audio waveform for the section. Thus, if the audio waveform for "Main A" is 2 bars, then the MIDI section for "Main A" is 2 bars long.

The CASM chunk is minimal and sets NTR/NTT for MIDI channel 9 (Subrhythm). NTR is "Root Fixed" and NTT is "Bypass/Bass Off". No NTR/NTT is given for channel 10 (rhythm/drums).

Audio Phraser does not generate an OTSc (One Touch Settings) chunk.

Audio Phraser creates an AWI file for each waveform that it imports into an audio style file. The AWI file most likely holds the results of Audio Phraser's analysis (i.e., beat detection and so forth). It would be interesting and informative to compare the contents of an AWI file against the ASEG and AInf chunks in the resulting audio style file. I'm guessing that the AWI file is the "prototype" for the ASEG and AInf chunks.

CamiloCross · April 13, 2020, 12:08:45 PM

Quote from: vlbrgt on April 09, 2018, 05:47:49 PM
AMix looks like midi event codes.

This really helps when adding reverb to the audio style or changing its volume for each main variation.
Now I could use the RHY1/RHY2 parts for instruments when the audio style contains all of the drums and percussion. Really hoping that the new MIDI protocol lets you work with more than 8 channels for style creation.

CamiloCross · April 13, 2020, 12:09:28 PM

I'm really grateful for all your contributions because they have helped me on the journey of creating and understanding audio styles. So far I have created two of them, and although it is a time-consuming task compared to regular styles, the result is really satisfying.

alanstechnical · May 14, 2022, 04:22:23 AM

Yamaha doesn't allow me to copy my own audio style from User to USB. Is there any way around it? I have seen that editing an audio style on the keyboard (adding parts to it) and changing the bars can remove the audio part of the style. With that, can I merge the audio and the parts?

I don't know how to work with hex files.