YEM: Voice volume mystery solved

Started by BogdanH, October 22, 2023, 12:38:47 PM

BogdanH

I just realized that in this section, among the last four topics, I started three of them... Well, simply don't read it if you think it's a waste of space and time.. or moderators can remove it -no hard feelings  :)

This article is meant for those who create custom voices in YEM and are familiar with the term dB.

To create a custom voice, we need wav samples which have properly adjusted loudness level. That is, all samples must have the same loudness level -for notes to sound equally loud in complete range that voice covers. Of course, we can make corrections later in YEM, but that can be very difficult, because in this case the result depends solely on our guessing (and that's never a good approach).
Sidenote: I mean main samples in first place (and not effects which are maybe added in separate layers).

In audio processing, loudness is measured in LUFS, which reflects loudness as perceived by humans. Like dB, the LUFS scale is logarithmic and usually goes from somewhere around -40 (very quiet) up to 0 (maximum).

So, how loud should it be? In general, as loud as possible -as long as there's no clipping. But at the same time, our custom voice should have similar loudness to the already existing voices -because we don't want our custom voice to rip off the speaker membrane when we switch to it.

To keep it short: loudness of our samples should be between -12 and -15LUFS. Btw. that's also the loudness that Youtube (among others) recommends for published audio content. It doesn't really matter if we decide for -12 or -15LUFS. What's important is that all samples have a similar LUFS value. And one more thing: it makes no sense to increase loudness above -12LUFS, because in that case we will need to additionally decrease volume later in YEM anyway. Why additionally? Because at a loudness of -12LUFS, the voice is already too loud compared to existing (preset) voices.
So why not start at (for example) -20LUFS, so we wouldn't need to decrease volume later in YEM? We can do that, but maybe one day we will use these samples for some other keyboard (which could have overall louder voices than our current keyboard).
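
For those who like to see it in numbers, here is a minimal Python sketch of how samples could be matched to one target. It assumes the LUFS value of each sample was already measured in an audio editor, and that a plain gain change of G dB shifts the measured loudness by about G LU. The file names, the measured values and the -14 target are made up for illustration; positive gain still has to be checked for clipping.

```python
def gain_to_target(measured_lufs: float, target_lufs: float = -14.0) -> float:
    """Gain in dB that moves a sample from its measured loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


# Hypothetical measurements (LUFS) for three samples of one voice.
measured = {"C3.wav": -18.2, "C4.wav": -13.1, "C5.wav": -15.6}

for name, lufs in measured.items():
    gain = gain_to_target(lufs)
    print(f"{name}: {gain:+.1f} dB (multiply amplitude by {db_to_linear(gain):.3f})")
```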

And we're finally at the topic... When talking about voice loudness, YEM doesn't deal with LUFS and dB. There are just volume sliders which range from 0 to 100 and that's it. Ok, 0 is silence and 100 is maximum. But how much is 50? If you think that's half loudness compared to 100, then you're wrong.  And if you think that reducing volume from 100 to 90 has the same impact as reducing volume from 80 to 70, then you're wrong again.

The problem is, when we adjust volumes of a newly created voice in YEM, we can't hear the result of our adjustments: we need to guess. And after we install the new voice on the keyboard, we realize that we need to decrease volume just "a little" more (say, by 3dB), but we have no idea how much on the 0..100 scale is needed for "a little". And so we guess again.. install.. and guess again. Ok, truth be told, our hearing isn't that perfect anyway, and so in my opinion the best solution would be if volume adjustments in YEM were on a dB scale.

But you know what? Internally YEM actually uses dB for volume settings! That is, volume 100 is 0dB (=no volume reduction) and volume 0 is -96dB (=silent). Why -96dB for silent? Because that's the dynamic range of 16bit audio (CD quality wav files are 16bit, 44100Hz; it's the bit depth that sets the range, not the sample rate). Btw. you can check these values if you open a ppf file with a hex editor.
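
As a quick check of where the 96 comes from, the usual rule of thumb is 20 x log10 of the number of quantization levels. The small Python sketch below only illustrates that arithmetic; the function name is made up and it says nothing about how YEM itself computes anything.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM with the given bit depth."""
    return 20 * math.log10(2 ** bits)


# 16 bits gives roughly 96 dB, which matches the -96 stored for "silent".
print(f"{dynamic_range_db(16):.1f} dB")
```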
Why Yamaha decided not to use dB calibrated sliders in YEM, is a mystery to me.

The question that remains is, how does the 0..100 volume translate into the -96..0 dB range? Because by knowing this, we could answer two main questions:
1. How much is loudness reduced if I decrease slider position from N1 to N2?
2. To what position do I need to move the slider if I wish to reduce volume by X dB?

Answer to 1st question: dB = 40x(log(N2)-log(N1)), where log is the base-10 logarithm
Example:
Current slider position is N1=80 and I moved it to N2=70. How much is loudness reduced compared to slider at N1?
40x(log(70)-log(80)) = -2.3dB
How much is total loudness reduction?
40x(log(70)-log(100))= -6.2dB
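
For anyone who prefers a script over a calculator, the first formula can be sketched in Python like this. The function name is my own choice, log is taken as base 10 (as the examples above require), and both slider positions must be greater than zero.

```python
import math


def slider_change_db(n1: float, n2: float) -> float:
    """Loudness change in dB when the slider moves from n1 to n2 (both > 0)."""
    return 40 * (math.log10(n2) - math.log10(n1))


print(round(slider_change_db(80, 70), 1))   # -2.3
print(round(slider_change_db(100, 70), 1))  # -6.2
```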

Answer to 2nd question: N2 = 10^(log(N1)-X/40)
Example:
Current slider position (N1) is 80 and we wish to reduce loudness by X=6dB. What's new slider position?
10^(log(80)-6/40) =57
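
The second formula, again as a small Python sketch under the same assumptions (base-10 log, starting position above zero, function name made up). The result is rounded because the slider only accepts whole numbers.

```python
import math


def slider_for_reduction(n1: float, x_db: float) -> float:
    """Slider position that is x_db quieter than position n1 (n1 > 0)."""
    return 10 ** (math.log10(n1) - x_db / 40)


print(round(slider_for_reduction(80, 6)))  # 57
```

Passing a negative X gives the position for a volume increase, as long as the result stays at or below 100.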

The above formulas obviously don't work for the slider at position 0 (because the log of zero is undefined). That is, if the volume slider is at position zero, then YEM simply uses the value -96.

How did I arrive at the above formulas? I simply checked what YEM writes at certain slider settings and I saw the pattern:
50 (half of 100) =-12dB,.. 25 (half of 50) =-24dB, ...
-that is, each time the volume value is halved, loudness is reduced by a fixed amount (12dB), and so the conversion is obviously logarithmic.
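
Putting the pattern and the zero special case together gives one absolute mapping from slider position to dB. The Python sketch below is inferred from the formulas in this post, not from any YEM documentation; the names are made up, and how YEM rounds the value it stores is not covered here.

```python
import math

SILENCE_DB = -96.0  # value used for slider position 0, as described in the post


def slider_to_db(n: float) -> float:
    """Approximate dB value for a slider position in the 0..100 range."""
    if n <= 0:
        return SILENCE_DB
    return 40 * math.log10(n / 100)


# Each halving of the slider costs about 12 dB; position 0 is the special case.
for n in (100, 50, 25, 0):
    print(n, round(slider_to_db(n), 1))
```

With this mapping, slider position 1 comes out at -80dB, so the -96 at position 0 looks like a separate "off" value rather than a continuation of the curve.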

Some will probably ask: where's the benefit of knowing all this?
First, those who have a feel for dB (probably every experienced audio editor) will surely appreciate knowing what's going on in YEM.
And second, if someone finds that volume sliders in YEM don't act as expected, here's the reason why.

I hope that is of some help for at least one person  ;)

Bogdan
PSR-SX700 on K&M-18820 stand
Playing for myself on Youtube

travlin-easy

Love Those Yammies...

Amwilburn

Thanks Bogdan, this will save me a ton of trial and error time! (I gave up trying to make my own samples because of trying and failing to set loop points, but now I can DB adjust other unlocked samples that are too loud)

Mark

BogdanH

hi Mark,
I'm glad to hear that you find this useful. What you describe was the main reason why I thought it was worth sharing this info: to make the process of voice creation just a little bit easier (or at least more understandable).

"..failing to set loop points"? -can you explain what you mean with that?

And speaking of looped samples... If we plan to use a soundfont (SF2) as voice source, then it's always important to check for looped wav files before importing the SF2 into YEM. As we know, SF2 holds two separate wav files for L and R channel (stereo). And it can happen that these two channels use different loop lengths. That's not a problem if the SF2 is used in an SF2 player (e.g. Polyphone), because each channel is processed (looped) separately and then the result is combined as stereo sound.
But we can't import L/R channels separately into YEM. They need to be converted into a single (stereo) wav first -and such a file can only have one loop length for both channels. That means the result will not be exactly the same as what we hear from the SF2 player. In the worst case, we can hear a "click" in the loop.
.. there are surprises everywhere..  :)

Greetings,
Bogdan

Amwilburn

Oh I mean it was easy enough to create voices based on a simple wav file,  but it was memory inefficient to sample everything for like 12 seconds; instead I wanted to sample just a small segment and repeat it, but couldn't find the software or notation that I needed to navigate that. (My PC is 14 years old, FYI)


BogdanH

Thank you for the clarification, Mark
Yes, certain voice types (woodwind, violins, etc.) are usually looped, because it saves a LOT of memory. For loop creation, I mainly use LoopAuditioneer -not only because it's free! Don't let its simplistic interface fool you: it does perfectly what it's supposed to do. Actually, I couldn't find any other audio editor that even comes close to LoopAuditioneer. It's only 12MB and so I'm sure your ancient PC can handle it  ;)
For info (in case you decide to try it): it only finds (and marks) the best loop start/end points. For everything else you need a separate audio editor.

Greetings,
Bogdan