Discrepancy Between Specs And Sound Quality

SOULNOTE chief designer Kato continues his series of essays on design philosophy with “Discrepancy Between Specs and Sound Quality.”
The specifications (static specs) referred to here are so-called catalog specs, such as distortion rate, frequency response, and signal-to-noise ratio. They are easily quantified figures of performance, measured mainly with sine waves.
In the audio industry, everyone knows that sound quality cannot be judged by specifications (static specs) alone. Anyone who loves audio also knows that the sound changes with cables and racks, even though the specs, however precisely they are measured, show no difference!
To anyone but an audiophile, this would seem odd. In this age of scientific universalism, it seems impossible that people could sense small differences that even precise measurement with high-end instruments cannot detect. Human hearing is not that keen, and its frequency range is only 20 Hz to 20 kHz at best. (But that is for sine waves!)
That is why, even though we knew that sound quality cannot be described by specs alone, I think part of us still could not go against the specs. In other words, throughout the history of audio, no one has been able to refute the opinion that “sound is a matter of taste, so you are free to choose, but there is no doubt that a sound with better specifications is a more correct sound.”
For example, suppose you were developing a product, and by working on the circuitry you improved the specs in some way. And suppose the sound changed. In that case, most engineers would assume that the sound with the better specs is the “better sound.” Furthermore, when a major manufacturer develops a new device, the bosses and salespeople will usually not allow its release if its specs are inferior to the previous product’s, no matter how good it sounds. This is especially true if the manufacturer has been explaining sound quality in terms of specs.
Let me tell you an old story.
When I was a student, I loved music but had no money, so I built amplifiers and speakers as a hobby. At first I had no proper measuring instruments, but that didn’t matter: as long as I could enjoy listening to music, that was all I cared about. My amplifier was a device I was proud of because it let me enjoy music, and it sounded a whole lot better than my friend’s high-end amplifier.
However, one day I acquired a measuring instrument. When I measured my amplifier, the results were terrible, so I wanted to improve the measured values as much as possible. After various improvements, the readings were much better, and I was shocked: listening to music through it had become completely boring. Why? I have been thinking about this for the 40 years since, and I have arrived at a way of thinking about it.
Imagine for a moment.
What if I could explain to you that specs don’t mean much for sound quality? Furthermore, what if I could explain that improving the specifications may even degrade the sound quality? Wouldn’t that amount to a change in values?
I can explain this. It is not that difficult. It is all because of a certain curse.
There are plenty of examples where value cannot be determined by specifications (measurements).
In today’s age of universal science, everyone thinks that it is impossible for humans to hear differences that even the most advanced measuring instruments cannot detect. But is this really so? In fact, there are many values around us that cannot easily be quantified.
Take cooking, for example. Suppose we measure the mass of each ingredient with a state-of-the-art instrument and match them exactly to the nearest 0.0001 g. Even so, if one dish were made by a world-famous chef and the other by me, the two would naturally taste different. The reason is that although the ingredients are exactly the same, the cooking skills differ. But can cooking skill be quantified? Can taste be quantified? Both are quite difficult. Even today, the only way to evaluate the taste of a dish is to try it.
Take automobiles, for example. If two cars with precisely matched engine power and weight were driven on a circuit by the same driver, would they set the same lap time? They would not. Body rigidity and suspension settings can completely change the lap time, because they change cornering performance. Yet car catalogs have no section on cornering performance. In other words, you cannot know the performance of a car until you drive it. Even in cutting-edge F1, where everything is electronic and all kinds of simulation are possible, at the end of the day the only way to tune the car is for the driver to actually drive it.
I gave examples of food and cars, and I am sure someone will scold me: “That has nothing to do with audio!” That’s right. But there are often values that cannot be expressed in numbers. I am just showing you examples.
Now, sound quality in audio is different from these, because not only can sound quality not be measured by specs, but better specs can actually make the sound worse.
The usual approach is to improve the catalog specs first and then tune the sound quality. But there is a trap in that.
In the last installment, I wrote about examples showing that, even today, some kinds of performance cannot be expressed in numbers. In the same way, audio may have factors that change the sound but cannot be expressed in numbers, such as the factors that make cables change the sound. Some of these may not yet be generally known, or may simply be overlooked.
Well, even if there are factors that cannot be expressed in numbers, an engineer at an ordinary audio manufacturer would think as follows: “Just improve the distortion rate, signal-to-noise ratio, frequency response, and other catalog specs, and then make the sound better!” This has been the conventional wisdom. Catalog spec competition was especially fierce in the past, and even now it is fierce in digital audio. Everyone assumes there is no way the sound can get worse by improving the numbers. That is the trap: improving catalog specs can result in bad sound, and this is not uncommon. In many cases, pursuing catalog specs beyond what is necessary is accompanied by a deterioration of sound quality. The reasons are described below. It will be a bit long, but please read it, because we will reach a conclusion that no one has ever told you before! However, this is not yet a theory proven by formal experiments, and my subjective opinion is involved, especially when it comes to evaluating sound quality. I will be the first to admit that. Still, I am confident that the sound quality obtained by this method will resonate with many people.

Sound exists on only two axes: the amplitude axis (voltage axis) and the time axis.

To begin with, sound is made up of the amplitude axis and the time axis, the vertical and horizontal axes of a graph. Music sources in audio are also recorded as amplitude (voltage values) over time, and this is basically the same for digital and analog sources. Without the time axis, sound cannot exist. As proof, video has the “still image,” but there is no such thing as a “still sound.” You have never heard a still sound, have you?

A catalog spec is a measure of performance that ignores the time axis.

Sine waves are used to measure catalog specs such as distortion rate, frequency response, and signal-to-noise ratio because they are convenient for quantification. A sine wave is a signal of a single frequency that lasts forever: a static signal with no dynamic changes. I said there is no such thing as a still sound, but a sine wave comes close. As a result, the measured results hardly reflect any temporal component. Sound has two axes, the “amplitude axis” and the “time axis,” but catalog specs are measurements that almost ignore the “time axis” in order to make quantification easier.
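To make this concrete, here is a minimal sketch of how a distortion figure is typically derived from a steady sine. Everything in it is an assumption for illustration: the hypothetical `device` stands in for an amplifier with a small cubic nonlinearity, the record is sized so the test tone completes a whole number of cycles (coherent sampling), and a plain DFT replaces an analyzer’s FFT. The result is a single number computed from an unchanging tone.

```python
import cmath
import math


def dft_bin_magnitude(x, k):
    """Magnitude of DFT bin k of the real sequence x (naive sum, fine for short records)."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x)))


def thd(x, fundamental_bin):
    """Total harmonic distortion: RMS sum of the harmonics divided by the fundamental.

    Assumes coherent sampling: the test sine completes an integer number of
    cycles in the record, so every harmonic lands exactly on a DFT bin.
    """
    n = len(x)
    fundamental = dft_bin_magnitude(x, fundamental_bin)
    harmonics = [
        dft_bin_magnitude(x, h * fundamental_bin)
        for h in range(2, n // (2 * fundamental_bin) + 1)  # up to the Nyquist bin
    ]
    return math.sqrt(sum(m * m for m in harmonics)) / fundamental


def device(v, a=0.01):
    """Hypothetical device under test: a small cubic nonlinearity."""
    return v + a * v ** 3


N, K = 256, 4  # 256-sample record holding exactly 4 cycles of the test sine
sine = [math.sin(2 * math.pi * K * i / N) for i in range(N)]
output = [device(v) for v in sine]
print(f"THD = {100 * thd(output, K):.3f} %")  # about 0.248 %
```

The figure summarizes the steady state only; it carries no information about how the output evolves when the input changes.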

The Curse of Fourier

We often use FFT (Fast Fourier Transform) analyzers to analyze sound. Simply put, the analyzer converts the time axis into the frequency axis to make analysis easier: assuming that a segment of the signal of a certain length repeats forever, it decomposes that segment into its frequency components and lays them out. This decomposition is the Fourier transform, and the FFT is a fast algorithm for computing it. The familiar frequency response graph is itself a product of the Fourier transform. It normally plots only magnitude against frequency and leaves out phase, which is where the timing information lives, so here too the time axis is completely ignored.
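The following minimal sketch illustrates what a magnitude-only view throws away. It builds two hypothetical test signals from the same five harmonics with the same amplitudes, changing only their phases (the phase values are arbitrary choices). A naive DFT, standing in for an FFT analyzer, reports identical magnitude spectra, yet the waveforms in time are plainly different.

```python
import cmath
import math


def magnitude_spectrum(x):
    """|X[k]| for k = 0..N/2 via a naive DFT (fine for short records)."""
    n = len(x)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x)))
        for k in range(n // 2 + 1)
    ]


def harmonic_signal(n, phases):
    """Sum of harmonics 1..len(phases), each with amplitude 1/h and the given phase."""
    return [
        sum(math.sin(2 * math.pi * h * i / n + p) / h
            for h, p in enumerate(phases, start=1))
        for i in range(n)
    ]


N = 64
a = harmonic_signal(N, [0.0, 0.0, 0.0, 0.0, 0.0])  # all harmonics start in phase
b = harmonic_signal(N, [0.0, 1.0, 2.5, 0.7, 4.0])  # same amplitudes, phases moved

spec_a, spec_b = magnitude_spectrum(a), magnitude_spectrum(b)
print(max(abs(p - q) for p, q in zip(spec_a, spec_b)))  # tiny: rounding error only
print(f"{a[0]:.2f} vs {b[0]:.2f}")  # 0.00 vs 0.63: different waveforms
```

On a magnitude plot the two signals are indistinguishable; only the phase, which the plot omits, tells them apart.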
In culinary terms, the Fourier transform is like turning a dish into paste in a blender and then separating it into its components with a centrifuge. The chef’s skill is ignored.
Somehow we have come to think of sound quality in terms of the frequency axis, and somehow we have forgotten the time axis. I call this the curse of Fourier.
When I was a child, I used to think that a perfect graphic equalizer would give me the freedom to create any sound quality I wanted. But of course, even if you match the frequency response, the sound quality will not be the same. So we look for the answer in the signal-to-noise ratio or distortion rate. But that is the curse of Fourier: it makes us forget the time axis. It is like puzzling over the difference in taste between two dishes made with the same ingredients in the same quantities (identical on the frequency axis) without even considering the cook’s skill, such as the order in which the ingredients are added or the simmering time (the time axis). It is truly a curse.
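A textbook way to see that matching the frequency response does not fix the waveform is the all-pass filter. The sketch below assumes a first-order digital all-pass with an arbitrary coefficient `a = 0.5`; it illustrates the principle and is not a model of any particular amplifier. Its magnitude response is exactly flat, so on a frequency-response graph it is indistinguishable from a plain wire, yet a single impulse comes out smeared over several samples.

```python
import cmath
import math


def allpass_response(a, w):
    """H(e^{jw}) for the first-order all-pass H(z) = (a + z^-1) / (1 + a z^-1)."""
    z_inv = cmath.exp(-1j * w)
    return (a + z_inv) / (1 + a * z_inv)


def allpass_filter(x, a):
    """Apply the same all-pass in the time domain: y[n] = a*x[n] + x[n-1] - a*y[n-1]."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for v in x:
        out = a * v + x_prev - a * y_prev
        y.append(out)
        x_prev, y_prev = v, out
    return y


a = 0.5  # arbitrary coefficient; |a| < 1 keeps the filter stable
grid = [math.pi * i / 100 for i in range(101)]  # 0 up to the Nyquist frequency
worst = max(abs(abs(allpass_response(a, w)) - 1.0) for w in grid)
print(f"largest deviation from a flat magnitude: {worst:.1e}")

impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(allpass_filter(impulse, a))  # [0.5, 0.75, -0.375, 0.1875, -0.09375, 0.046875]
```

A graphic equalizer, however perfect, adjusts only the magnitude, so it can neither detect nor undo this kind of change.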

Static performance and dynamic performance

From here on, I will call frequency-axis performance that can be quantified as catalog specs, such as distortion rate, frequency response, and signal-to-noise ratio, static performance.
On the other hand, I will call performance related to the time axis, which is difficult to quantify, dynamic performance.
Dynamic performance is the lost performance that does not appear in ordinary catalog specs. To name just a few, rise time, the impulse response waveform, and clock jitter are aspects of dynamic performance. However, they are difficult to quantify and visualize, because they seem to affect the sound only over very short spans of time.
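Rise time is one of the few dynamic quantities that is easy to put into code. The sketch below measures the conventional 10 % to 90 % rise time of a sampled step response by interpolating between samples. To have something to measure, it assumes an idealized first-order RC low-pass with corner frequency `f_c` (a hypothetical model, not a description of any SOULNOTE circuit), whose exact rise time is ln(9)/(2πf_c), so the measurement can be checked.

```python
import math


def rise_time(samples, dt, lo=0.1, hi=0.9):
    """10-90 % rise time of a sampled step response that settles at samples[-1].

    Each threshold crossing is located by linear interpolation between samples.
    """
    final = samples[-1]

    def crossing(level):
        target = level * final
        for i in range(1, len(samples)):
            if samples[i - 1] < target <= samples[i]:
                frac = (target - samples[i - 1]) / (samples[i] - samples[i - 1])
                return (i - 1 + frac) * dt
        raise ValueError("step response never crosses the threshold")

    return crossing(hi) - crossing(lo)


def rc_step(f_c, dt, n):
    """Hypothetical first-order low-pass: step response 1 - exp(-t / tau), tau = 1 / (2 pi f_c)."""
    tau = 1 / (2 * math.pi * f_c)
    return [1 - math.exp(-i * dt / tau) for i in range(n)]


DT = 10e-9  # 10 ns sample spacing
for f_c in (20e3, 100e3):
    measured = rise_time(rc_step(f_c, DT, 20_000), DT)
    predicted = math.log(9) / (2 * math.pi * f_c)
    print(f"f_c = {f_c / 1e3:.0f} kHz: measured {measured * 1e6:.2f} us, "
          f"predicted {predicted * 1e6:.2f} us")
```

The measured and predicted columns agree to the printed precision; the same `rise_time` function can be pointed at any sampled step response.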
Dynamic performance is like the chef’s skill in cooking, or the cornering performance of a car. Interestingly, the time axis is a factor in these kinds of performance too, and they are also difficult to quantify. Humans seem to be good at quantifying things by ignoring time. The only way to determine the essence is to eat or to drive. Likewise, dynamic performance in audio can be understood by listening; it can only be judged by listening.
And there is something even trickier. Beyond a certain level, static performance and dynamic performance in audio become a trade-off. The reason lies in the characteristics of human hearing.


Here is an extreme example of how pursuing static characteristics too far degrades dynamic characteristics. I am sorry to use the car analogy again.
A car built for drag racing, a competition in straight-line acceleration over 400 m, is called a drag car. It accelerates in a straight line much faster than an F1 car, but it cannot turn. Let’s apply this to audio.
The performance required to listen to music is similar to the performance required to drive a car fast on a circuit. What matters is the ability to trace (reproduce) various circuits (sound sources) faithfully, in other words, dynamic characteristics. A straight line, on the other hand, corresponds to a sine wave in audio, so the performance that can be measured is exactly the static characteristic. An audio product that places too much emphasis on static characteristics, like a drag car, cannot reproduce music properly.
With audio equipment, it is common to first improve static performance and then tune the sound quality. But is that enough?
With cars, this is not the case.
You cannot raise cornering speed by tuning a car that was built for straight-line performance first; the basic design of an F1 car cannot even be conceived without cornering performance.
A car that specializes in static performance (straight-line performance) cannot drive on a circuit.

Frequency Brain

Finally, I will write about human hearing. It seems to me that the conventional wisdom about hearing is also distorted by the curse of Fourier, which centers on static performance. When we evaluate sounds, we unconsciously think in terms of the frequency axis: bass, midrange, treble, and so on. I call this cursed way of thinking the “frequency brain.”
Humans can sense the frequency range above 20 kHz.
It is common knowledge that humans cannot hear above 20 kHz. Of course, I cannot hear it either. However, that applies to sine waves.
Let me put it this way.
“Humans cannot hear above 20 kHz in the case of a sine wave, but they can sense the slowing of the rise of a musical waveform when the band above 20 kHz is cut off.”
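The slowing of the rise can be given a rough worked form. Assume, purely for illustration, that the band limit is a first-order low-pass with corner frequency $f_c$ (the essay does not specify the filter; real band-limiting filters in digital audio are much steeper, and their step responses also ring). With time constant $\tau$, the step response and its 10 % to 90 % rise time are:

```latex
\begin{aligned}
\tau &= \frac{1}{2\pi f_c}, \qquad v(t) = 1 - e^{-t/\tau} \\
t_L &= -\tau \ln(1 - L) \quad \text{(time to reach level } L\text{)} \\
t_r &= t_{0.9} - t_{0.1} = \tau\left(\ln 10 - \ln\tfrac{10}{9}\right) = \tau \ln 9 \approx \frac{0.35}{f_c} \\
f_c &= 20\ \text{kHz} \;\Rightarrow\; t_r \approx 17.5\ \mu\text{s}, \qquad
f_c = 100\ \text{kHz} \;\Rightarrow\; t_r \approx 3.5\ \mu\text{s}
\end{aligned}
```

In this model the rise time scales as $1/f_c$, so cutting the band at 20 kHz instead of 100 kHz makes the edge five times slower.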
In other words, experiments that emphasize static performance with the frequency brain and experiments that emphasize dynamic performance with the time axis taken into account give different results. Let me illustrate this with my favorite food, sushi.
Let’s compare two pieces of sushi made by a sushi chef and by an amateur, using exactly the same fish and rice. The frequency-brain experiment goes like this: crush the sushi in a blender and analyze the components with a centrifuge. The conclusion will be that, because there is no difference in the ingredients, the taste is the same, and that who shapes the sushi by hand makes no difference to the taste. Of course, I would not be able to taste any difference in sushi that has been crushed into mush. I wouldn’t even want to eat it.
The sine wave experiment is an experiment that ignores the time axis: the mush sushi experiment. Why not just eat the sushi and compare? Because tasting cannot be quantified and is subjective, the argument goes, so the component analysis of the mush counts for more. That is audio today, held captive by the frequency brain. No matter how good the sound is, the verdict is “if it doesn’t measure right, it can’t sound right!” This static-performance view is the universal opinion, and no one can clearly refute it. That is where things stand. Isn’t it ridiculous?

Sound image localization

There are any number of phenomena generally recognized in modern audio that contradict the assumption that humans do not perceive anything above 20 kHz. Take, for example, sound image localization. If the equipment is excellent, we can perceive three-dimensional sound image localization from two speakers. If you are one of those who say “I don’t believe that!”, there is no need to read any further. It is true that some people do not perceive it, but it is also true that some people do. If the claim that “humans cannot hear above 20 kHz, so that range is unnecessary” were correct, three-dimensional localization of the sound image could not be explained at all. This is because the phase difference needed to produce finely spread sound image localization, when converted into frequency, far exceeds 20 kHz.
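The essay does not spell out how a time difference is “converted into frequency.” One plausible reading, sketched here with an illustrative interchannel time difference $\Delta t = 10\ \mu\text{s}$ (an assumed value, not a figure from the essay), is to take the frequency whose period equals $\Delta t$, alongside the phase shift that the same delay produces on a 20 kHz tone:

```latex
\begin{aligned}
f_{\text{eq}} &= \frac{1}{\Delta t} = \frac{1}{10\ \mu\text{s}} = 100\ \text{kHz} \\
\varphi &= 360^\circ \cdot f \cdot \Delta t = 360^\circ \times 20\ \text{kHz} \times 10\ \mu\text{s} = 72^\circ
\end{aligned}
```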

Sound Differences Caused by the Clock Generator

This is also a fairly well-known phenomenon. In today’s audio, it is becoming common knowledge that a 10 MHz clock generator makes a very large difference in sound quality. This is exactly a matter of the time axis. As I said before, sound is made up of only an amplitude axis and a time axis. The reference for the amplitude axis is GND, and the reference for the time axis is the clock signal, so the clock signal governs half of the sound. It is no surprise, then, that it has a significant effect on the sound. However, it has no effect on the results of measurements made with the frequency brain: no matter how much jitter (timing fluctuation) the clock signal has, as long as its average period is correct, the fluctuation is averaged out and makes no difference to the readings.
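The averaging argument can be seen in a small simulation. The sketch below assumes a hypothetical 10 MHz clock whose edges carry independent Gaussian timing errors of 1 ps RMS (an arbitrary figure). An averaged reading of the period, the kind a frequency counter reports, stays essentially exact, while the spacing between individual edges varies by about √2 times the jitter.

```python
import random
import statistics


def jittered_edges(period, jitter_rms, n, seed=0):
    """Edge times i*period + e_i, where e_i is an independent Gaussian timing error."""
    rng = random.Random(seed)
    return [i * period + rng.gauss(0.0, jitter_rms) for i in range(n + 1)]


T = 1 / 10e6  # nominal period of a 10 MHz clock: 100 ns
edges = jittered_edges(T, jitter_rms=1e-12, n=20_000)  # 1 ps RMS jitter (assumed)

periods = [b - a for a, b in zip(edges, edges[1:])]
average_period = (edges[-1] - edges[0]) / (len(edges) - 1)  # counter-style average

print(f"error of the averaged period: {abs(average_period - T) * 1e15:.4f} fs")
print(f"spread of individual periods: {statistics.pstdev(periods) * 1e12:.2f} ps")
```

A measurement that looks only at the average period therefore cannot tell this clock apart from a perfect one.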
Thinking about clock generators is a chance to break free of the frequency brain. It is proof that humans can perceive minute behavior at 10 MHz, not just at 20 kHz.

LPF (Low Pass Filter) Experiment

This is an easy experiment. For example, the analog amplifier stage of the D-2 or S-3 is basically flat, but it has a built-in LPF, attenuating by 8 dB at 100 kHz, that can be switched in or bypassed. The LPF is simple: a mechanical relay switches a capacitor in and out, and it has no effect on the audible band below 20 kHz. Nevertheless, anyone can recognize the difference. We removed the LPF from the S-3 Reference and D-3 because, of course, the sound quality is better without it.

Ferrite core experiment

The experiment of putting a ferrite core, which attenuates at 10 MHz and above, on a line cable or speaker cable is simple: just snap it into place and listen. With excellent equipment, few people fail to notice a change in the sound, for better or worse. This proves that humans can sense changes in signal waveforms at 10 MHz. Whether the cause is the reduction of high-frequency noise or the dulling of the signal waveform, the difference can be felt. I believe a proper blind experiment would show a meaningful difference. However, it requires good equipment and good listeners. It is impossible for someone who has never eaten sushi to do a sushi comparison.
Can you hear above 20 kHz? The various experiments conducted on this question in the past are full of flaws, such as super-tweeter experiments that ignore waveform synthesis and experiments with randomly chosen listeners. These too are the work of the frequency brain.
In the next article, I will finally explain why static performance and dynamic performance become a trade-off beyond a certain level; in other words, why raising static performance more than necessary degrades dynamic performance.