# REW Accuracy/Precision



## johnr (Jan 6, 2007)

Hello,

First off, REW is an amazing piece of software...but you all know that already!

I have not seen any analysis of the accuracy or precision of REW. Has this been done? Are there tests one can perform that could validate these things, or would a more controlled environment be required?

Thanks,

John


----------



## tonyvdb (Sep 5, 2007)

The accuracy all depends on the mic used; if it's an ECM8000 then you will likely not find anything better.


----------



## johnr (Jan 6, 2007)

tonyvdb said:


> The accuracy all depends on the mic used; if it's an ECM8000 then you will likely not find anything better.


I imagine a large component of the overall system accuracy/precision depends on the mic (and soundcard), but my question is about the REW software itself. For example, the following areas might involve transformations that could impact the quality of the measurement:


- Pulling the data off the sound card and transforming it as required
- Sound card calibration and subsequent application of that calibration
- Application of mic calibration
- Rendering of results

There are probably other areas where accuracy or precision could be impacted within REW itself.

John


----------



## JohnM (Apr 11, 2006)

The principal limitations in acoustic measurements are the environment and its background noise levels; the mic preamp noise floor and dynamic range; and the microphone itself. All other factors pale into insignificance by comparison. 

For electrical measurements (e.g. connecting directly to the line outputs of a device to measure it) the next limiting factor is the soundcard data resolution. REW uses JavaSound to access the soundcard and, whilst word lengths greater than 16 bits can in principle be supported, in practice they are not. That is not such a limitation as it might seem, since the log sweep measurement method has considerable process gain, so a measurement dynamic range of 160dB or more is still comfortably achievable.

Moving down another step the soundcard sample clock stability can have an influence, though not as significant as poor quality resampling of data from the soundcard on its way through the OS if the card is running at a rate other than that REW requests, which can generate artefacts at the -80dB FS level and below. 

Calibration data is only used in the presentation of the frequency responses. Essentially, the calibration data gets added to the raw response, so inaccuracies in it carry through to the displayed results.

Contributions from the numerical precision effects within the underlying internal calculations are comfortably below the other factors mentioned.

Sweep or spectrum measurements of a soundcard loopback connection are the easiest way to assess the underlying limits.


----------



## johnr (Jan 6, 2007)

Thanks for the thorough response.


----------



## CraigNZ (Apr 22, 2010)

We must also remember that accuracy and precision are two different things. The best analogy I have seen of this is a target on a shooting range. Precision is a measure of how 'closely packed' the holes on the target are. A very high concentration indicates precision; a scattered pattern indicates lower precision. Accuracy is how close to the center of the target the holes are. Notice that you can have high precision but low accuracy: for example, you can have all the holes tightly patterned but way off at the edge of the target. You can also have high accuracy but low precision: for instance, a scattered pattern of holes whose geometric center is the center of the target.

In the case of electrical measurements, precision would denote the variance in measurement of the data. So if the readings are all within .001 volt then we say we have a measuring precision of 1 mV. If the readings vary by, say, .1 volt then we have a precision of .1 volt. As for accuracy, notice that we can be recording completely the wrong voltage, but with great precision. For example, the actual voltage is, say, 1 V and we make 10 measurements which have a variance of .001 volt, but the average of the readings is .995 volts. So the precision is .001 volt and the accuracy is +/- .005 volts.
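The voltage example can be sketched in a few lines of Python. This is only an illustration: the ten readings are made-up values chosen to average .995 V, and the helper name `accuracy_and_precision` is hypothetical. It uses the sample standard deviation as the measure of spread, which is one common choice rather than the only one.

```python
from statistics import mean, stdev


def accuracy_and_precision(readings, true_value):
    """Return (bias, spread) for repeated readings of a known reference.

    bias   = how far the average reading is from the true value (accuracy)
    spread = sample standard deviation of the readings (precision)
    """
    bias = mean(readings) - true_value
    spread = stdev(readings)
    return bias, spread


# Hypothetical readings of a 1.000 V reference (not real measurements).
readings = [0.995, 0.996, 0.994, 0.995, 0.995,
            0.996, 0.994, 0.995, 0.995, 0.995]

bias, spread = accuracy_and_precision(readings, true_value=1.000)
print(f"bias = {bias:+.4f} V, spread = {spread:.4f} V")
```

The readings are tightly grouped (good precision) yet sit about 5 mV below the true value (poorer accuracy), which is the "tight group at the edge of the target" case.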

Two completely different aspects of the data. This is not a completely accurate definition of accuracy and precision, but without going into the statistics of measurements and data, it is good enough to get a handle on what we are seeing.


----------



## scharfsj (Jun 16, 2010)

I've done a formal "precision to tolerance" study using REW 4.0 and shown that, even using a dbs systems calibrated mike, the precision to tolerance ratio is very good, with a P/T around 4.0% for even the least precise frequencies. Conventions for statistically valid Gage R&R studies state that the precision to tolerance ratio should be 5.0% or less.

The precision to tolerance study was conducted per standard six sigma conventions, using the formula 5.15*std dev/tolerance; in this case the tolerance was 0.3 dB (or ± 0.15 dB either side of the measurement). I took eight measurements at the frequencies shown, because these were the frequencies I was using in a Design of Experiments to find the sub settings that give the optimal in-room response.
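For anyone who wants to try the same calculation on their own repeated measurements, here is a minimal sketch of the 5.15*std dev/tolerance formula. The eight readings are invented for illustration (they are not the study data), and the function name `precision_to_tolerance` is hypothetical.

```python
from statistics import stdev


def precision_to_tolerance(measurements, tolerance, k=5.15):
    """P/T ratio as a percentage: k * sample standard deviation / tolerance."""
    return 100.0 * k * stdev(measurements) / tolerance


# Hypothetical repeated SPL readings (dB) at a single frequency.
readings_db = [72.301, 72.299, 72.303, 72.300,
               72.298, 72.302, 72.300, 72.297]

pt = precision_to_tolerance(readings_db, tolerance=0.3)
print(f"P/T = {pt:.1f}%")
```

Note how small the run-to-run spread has to be: with a 0.3 dB tolerance, a P/T of 5% corresponds to a standard deviation of roughly 0.003 dB.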










Accuracy can only be measured by comparing the mike to, ultimately, a NIST traceable standard. That is, accuracy is a measure of "the truth": how representative of the _actual_ frequency response was the response of the microphone? In this case, the closer the mike is to the reference standard, the more accurate it is.


----------



## johnr (Jan 6, 2007)

scharfsj said:


> I've done a formal "precision to tolerance" study using REW 4.0 and shown that, even using a dbs systems calibrated mike, the precision to tolerance ratio is very good, with a P/T around 4.0% for even the least precise frequencies. Conventions for statistically valid Gage R&R studies state that the precision to tolerance ratio should be 5.0% or less.


Interesting...thanks for sharing this.

Do you have an explanation for the variance in precision? Specifically, 30 Hz has a P/T approximately half that of 20 Hz and 50 Hz, even though it lies between them in frequency.

John


----------



## scharfsj (Jun 16, 2010)

johnr said:


> Interesting...thanks for sharing this.
> 
> Do you have an explanation for the variance in precision? Specifically, 30 Hz has a P/T approximately half that of 20 Hz and 50 Hz, even though it lies between them in frequency.
> 
> John


Unfortunately, I don't. It's just what I measured. I can post the results of the Design of Experiments if folks are interested, though it's fairly technical. It did allow me to arrive at close to optimal settings for integrating the sub w/o using equalization.


----------



## DanTheMan (Oct 12, 2009)

That's interesting, and I'm interested.

Thank you for posting that. It very much mirrors what I've seen (a slight deviation upon repeating that doesn't amount to much), but I've been too lazy/stupid to get into that detail.

You East Bay guys are alright sometimes 

Thanks!

Dan


----------



## scharfsj (Jun 16, 2010)

I'm a scientist by profession. Having recently acquired a REL subwoofer, and with the data acquisition system that Tim helped (greatly... thanks, Tim) to set me up with, it naturally occurred to me that one could use a formal system of experimentation called Design of Experiments (henceforth referred to here as DOE) to formally and reproducibly optimize the integration of the sub with the mains to provide the ideal room response.

Design of Experiments is different than the classic scientific "change one factor, leave everything else the same" (One Factor At A Time, or OFAT) approach that scientists and engineers were taught in school and are still primarily using to this day (much to their detriment in many cases). While the basic concepts of DOE have been around since the 1700s, it was originally formalized by the statistician R.A. Fisher. The difference between DOE and the classic OFAT approach is that you can change _multiple factors_ at a time, set the factors to different "levels" (effectively, high or low), run a defined pattern of experiments (even in random order), and measure the response (the effect you are interested in). The DOE will then tell you which factors are most important to get the response you want and, more importantly, what INTERACTIONS, if any, there are between factors.

The ability to examine interactions between factors is one of the most important aspects of DOE. If there are interactions between a complex set of factors and their resultant responses, you will never be able to deconvolute them without DOE; or if you can, it will usually be by pure chance or luck, or by spending a lot more time, money, and effort. Often you will still not figure out which of the factors have the biggest effect from a statistically valid point of view.

Given this, I set out to perform a set of experiments to see if I could determine the settings of the speakers, the REL sub and other factors (grilles on or off, speaker toe-in, port plugs, etc.) that would optimize the in-room response.

My next post will be a brief introduction to DOE, and setting up the precepts of the experiments.


----------



## scharfsj (Jun 16, 2010)

Description
Design of experiments (DOE) is a powerful tool that can be used in a variety of experimental situations. DOE allows for multiple input factors to be manipulated determining their effect on a desired output (response). By manipulating multiple inputs at the same time, DOE can identify important interactions that may be missed when experimenting with one factor at a time. All possible combinations can be investigated (full factorial) or only a portion of the possible combinations (fractional factorial). Fractional factorials will not be discussed here.

When to Use DOE
Use DOE when more than one input factor is suspected of influencing an output. For example, it may be desirable to understand the effect of temperature and pressure on the strength of a glue bond.

DOE can also be used to confirm suspected input/output relationships and to develop a predictive equation suitable for performing what-if analysis. 

DOE Procedure
Acquire a full understanding of the inputs and outputs being investigated. A process flow diagram or process map can be helpful. Utilize subject matter experts as necessary.

Determine the appropriate measure for the output. A variable measure is preferable. Attribute measures (pass/fail) should be avoided. Ensure the measurement system is stable and repeatable.

Create a design matrix for the factors being investigated. The design matrix will show all possible combinations of high and low levels for each input factor. These high and low levels can be generically coded as +1 and -1. For example, a 2 factor experiment will require 4 experimental runs:

| Experiment | Input A Level | Input B Level |
|---|---|---|
| #1 | -1 | -1 |
| #2 | -1 | +1 |
| #3 | +1 | -1 |
| #4 | +1 | +1 |


Note: The required number of experimental runs can be calculated using the formula 2^n, where n is the number of factors.
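A two-level full factorial design matrix is easy to generate programmatically. The sketch below is one possible way to do it with Python's standard library; the function name `full_factorial` and the factor names are placeholders for illustration. Each factor doubles the number of runs, so 2 factors give 4 runs and 3 factors give 8.

```python
from itertools import product


def full_factorial(factor_names):
    """Return every combination of coded low (-1) and high (+1) levels."""
    runs = []
    for levels in product((-1, +1), repeat=len(factor_names)):
        runs.append(dict(zip(factor_names, levels)))
    return runs


# Two factors -> four runs, in the same order as the table above.
design = full_factorial(["A", "B"])
for number, run in enumerate(design, start=1):
    print(f"Experiment #{number}: A={run['A']:+d}, B={run['B']:+d}")
```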

For each input, determine the extreme but realistic high and low levels you wish to investigate. In some cases the extreme levels may be beyond what is currently in use. The extreme levels selected should be realistic, not absurd. For example:

| Factor | Input -1 Level | Input +1 Level |
|---|---|---|
| Temperature | 100 degrees | 200 degrees |
| Pressure | 50 psi | 100 psi |

Enter the factors and levels for the experiment into the design matrix. Perform each experiment and record the results. For example:

| Experiment | Temperature | Pressure | Strength |
|---|---|---|---|
| #1 | 100 degrees | 50 psi | 21 lbs |
| #2 | 100 degrees | 100 psi | 42 lbs |
| #3 | 200 degrees | 50 psi | 51 lbs |
| #4 | 200 degrees | 100 psi | 57 lbs |

Calculate the effect of a factor by averaging the data collected at the low level and subtracting it from the average of the data collected at the high level. For example:

Effect of Temperature on strength: 
(51 + 57)/2 - (21 + 42)/2 = 22.5 lbs 

Effect of Pressure on strength:
(42 + 57)/2 - (21 + 51)/2 = 13.5 lbs

The interaction between two factors can be calculated in the same fashion. First, the design matrix must be amended to show the high and low levels of the interaction. The levels are calculated by multiplying the coded levels for the input factors acting in the interaction. For example: 

| Experiment | Input A Level | Input B Level | Interaction |
|---|---|---|---|
| #1 | -1 | -1 | +1 |
| #2 | -1 | +1 | -1 |
| #3 | +1 | -1 | -1 |
| #4 | +1 | +1 | +1 |


Calculate the effect of the interaction as before.

Effect of the interaction on strength:
(21 + 57)/2 - (42 + 51)/2 = -7.5 lbs
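The three effect calculations above follow one pattern: average the responses at the +1 level and subtract the average at the -1 level. Here is a short Python sketch of that pattern. The strength values are the ones used in the calculations above, arranged in the run order that those calculations imply (temperature as Input A, pressure as Input B); the function name `effect` is just an illustrative choice.

```python
def effect(levels, responses):
    """Mean response at the +1 level minus mean response at the -1 level."""
    high = [r for level, r in zip(levels, responses) if level == +1]
    low = [r for level, r in zip(levels, responses) if level == -1]
    return sum(high) / len(high) - sum(low) / len(low)


# Coded levels for the four runs and the measured bond strength (lbs).
temperature = [-1, -1, +1, +1]
pressure = [-1, +1, -1, +1]
strength = [21, 42, 51, 57]

# The interaction column is the product of the two coded factor columns.
interaction = [t * p for t, p in zip(temperature, pressure)]

temperature_effect = effect(temperature, strength)
pressure_effect = effect(pressure, strength)
interaction_effect = effect(interaction, strength)

print(temperature_effect, pressure_effect, interaction_effect)
```

The same `effect` function works unchanged for designs with more factors, since each column is still just a list of -1 and +1 values.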

The experimental data can be plotted in a 3D Bar Chart.









The effect of each factor can be plotted in a Pareto Chart.

http://photos.imageevent.com/puma_cat/fujif31andf20photos/doe-tutorial2.gif

The negative effect of the interaction is most easily seen when the pressure is set to 50 psi and Temperature is set to 100 degrees. Keeping the temperature at 200 degrees will avoid the negative effect of the interaction and help ensure a strong glue bond.
_________________

This simple-minded example above shows that there is an interaction between temperature and pressure in the strength of the glue bond. This is one feature of DOEs that is particularly useful when looking at the effect of a number of factors and their effect on the critical functional response. 

Now that that is out of the way as intro, let's look at the specific experiment I had in mind in the next post.


----------



## scharfsj (Jun 16, 2010)

The experiment I was trying to reproduce was based on the graph that Tim linked to a while ago, from research by JBL, showing that a much preferred, perceived "flat" room response actually looked more like this than perfectly flat...










With that in mind, I set out to see what I could do with my set up to emulate that via DOE. 

The desired responses were to maximize the 20 and 50 Hz responses in dB, and minimize the 150 and 500 Hz responses. These 150 and 500 Hz responses were nodes that I wanted to minimize, if possible.

The factors I used for the DOE were sub gain (as clicks up from zero), sub crossover (likewise as clicks up from zero), and plug or no plug in the speaker reflex port. So, four responses were measured as the result of 3 factors at two different levels (low and high, as in the examples shown above).

I set up a full-factorial DOE in JMP (a statistical package). Here is the experimental matrix I ran, per JMP's output for the experimental design, along with the measurements (as measured in dB by Room EQ Wizard).










An example trace, my starting point for reference, is shown; the goal here was to maximize the 20-50 Hz range while smoothing the 70 Hz node and minimizing the 105, 155 and 500 Hz node peaks, so as to emulate the JBL graph.








The red trace is the one of interest, as this is the one with the sub engaged. The brown trace is the one w/o the sub engaged.

Here is the result of analyzing the 20 Hz response of the DOE (warning: large image)










We can see from the actual by predicted plot that the R^2 and adjusted R^2 are high, at 0.97 and 0.94 respectively, indicating that our results fit our model pretty well. Also, the leverage plots (for the factors affecting the response, the 20 Hz output) show that sub gain has a large effect, and that the interaction of sub gain and sub crossover quite possibly has an effect too: its p-value is just barely above 0.05, which suggests there may be a significant effect if we gather more data or relax our confidence level from 95% to 90%. In fact, if we relax our alpha from 5% (the chance we are willing to accept that we are wrong) to 10%, then the sub crossover point becomes significant. In addition, our analysis of variance, with a Prob > F of 0.0016, suggests that the probability that this result occurred under our null hypothesis (that the model does not predict the sub's 20 Hz performance) is very low.
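For readers who have not met these two statistics, here is a rough sketch of how R^2 and adjusted R^2 are commonly computed from actual and predicted values. The numbers below are invented purely to exercise the functions (they are not the JMP results), and the choice of 3 model terms is an assumption for the example only.

```python
def r_squared(actual, predicted):
    """Fraction of the variation in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


def adjusted_r_squared(r2, n_runs, n_terms):
    """Penalise R^2 for the number of model terms (excluding the intercept)."""
    return 1 - (1 - r2) * (n_runs - 1) / (n_runs - n_terms - 1)


# Hypothetical 8-run experiment: measured vs model-predicted levels (dB).
actual = [60.0, 62.0, 70.0, 74.0, 61.0, 63.0, 71.0, 75.0]
predicted = [60.5, 61.5, 70.5, 73.5, 60.5, 63.5, 70.5, 75.5]

r2 = r_squared(actual, predicted)
r2_adj = adjusted_r_squared(r2, n_runs=len(actual), n_terms=3)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")
```

Adjusted R^2 is always a little lower than R^2, and the gap widens as more terms are fitted to the same small number of runs, which is why both are worth reporting for an 8-run design.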

The interaction profile plots also suggest that there is a likely interaction between sub XO setting and sub gain, as the plot lines are not perfectly parallel but appear to intersect.

Looking at the Prediction Profiler from JMP, comparing sub gain with sub XO, we can see that the max 20 Hz response is obtained with the sub gain at 12 clicks up and the sub XO set at zero clicks, and we could expect a level of 75.1875 dB.










JMP is also cool because it will show you a 3-D image of two factors at one time as they affect the desired response, in this case, 20 Hz:










These analyses show how you can utilize DOE to predict with good confidence how setting the factors (crossover point and gain) will give the desired response at 20 Hz. I'll show data later for the other target responses of 50, 155 and 500 Hz, and you can see how we might be able to arrive at a set of settings that maximizes the responses we want and minimizes the responses we don't want.

Interesting stuff to mull over....


----------



## JohnM (Apr 11, 2006)

As it happens, REW's EQ optimiser uses a Simultaneous Perturbation Stochastic Approximation algorithm. On the original measurement repeatability results, the main cause of variations between runs is low frequency noise from traffic, wind and other external factors.
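For the curious, the general idea of Simultaneous Perturbation Stochastic Approximation (SPSA) can be sketched briefly. This is a generic textbook-style illustration, not REW's code: the function name, the gain constants and the toy loss function are all assumptions made for the example. The key point is that every parameter is nudged at once by a random +/- step, so each iteration needs only two evaluations of the loss, however many parameters there are.

```python
import random


def spsa_minimize(loss, start, iterations=2000, a=0.2, c=0.1,
                  stability=10.0, alpha=0.602, gamma=0.101, seed=0):
    """Minimise `loss` using two loss evaluations per iteration."""
    rng = random.Random(seed)
    x = list(start)
    for k in range(iterations):
        step = a / (k + 1 + stability) ** alpha   # decaying step size
        perturb = c / (k + 1) ** gamma            # decaying probe size
        # Perturb every parameter simultaneously by a random +/-1 direction.
        delta = [rng.choice((-1.0, 1.0)) for _ in x]
        plus = [xi + perturb * d for xi, d in zip(x, delta)]
        minus = [xi - perturb * d for xi, d in zip(x, delta)]
        diff = loss(plus) - loss(minus)
        # Gradient estimate per parameter: diff / (2 * perturb * delta_i).
        x = [xi - step * diff / (2.0 * perturb * d)
             for xi, d in zip(x, delta)]
    return x


# Toy problem: find the point closest to a hypothetical target.
target = [3.0, -2.0]


def squared_error(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


result = spsa_minimize(squared_error, start=[0.0, 0.0])
print(result)
```

In contrast to the factorial designs discussed earlier in the thread, which test a fixed grid of settings, this kind of search steps iteratively toward better settings and tolerates noisy evaluations.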


----------



## scharfsj (Jun 16, 2010)

I just noticed this... my base room level SPL went down about 3-4 dB when I turned off the AC in the house.


----------

