Any good engineer or scientist will tell you that the proof of any concept requires measurements.  Measurements provide verification.  Without validation of a concept, it is just a hypothesis.  But where does that leave the DIY builder?  If building loudspeakers, can good results be obtained without the ability to perform measurements?  Are their properties so complex that one is completely blind without measurements?  If on a budget, which measured data is most important?  What kinds of tests can be done reliably?
These are some of the things I've pondered over the years.  When I first built loudspeakers, I had an oscilloscope, meters, good microphones, an accurate signal generator and an SPL meter.  That equipment put me ahead of the game, with more visibility than 99% of the hobbyists that might undertake building their own loudspeakers at that time.  But I still wouldn't have wanted to publish response graphs made by plotting individual data points.  A plot like that is too coarse, and doesn't give an accurate picture.
Nowadays, a guy can use the same PC he uses to play video games and check e-mail and have a pretty good measurement system.  That wasn't so just a few years ago, when it was difficult to obtain measurements that were worth doing.  But several measurement system programs have been written that work pretty well.  Using a PC and its built-in sound card, you can connect a $10.00 microphone and actually perform some pretty good measurements.
But how good are they?  The answer, in my opinion, is that they are very good for helping a hobbyist check response between 500Hz and 5kHz, where crossover points are likely to be.  That in itself is worth the admission price, because it makes good design work much easier.  But I would still be leery of exchanging them with others to make overall performance comparisons.  There are too many places for uncertainty and ambiguity to creep in.
Hobbyist measurements performed on homebrew test equipment are useful, but probably should be limited to fine-tuning and not used for critical comparisons and performance evaluations, in my opinion.  At the least, one should be careful to realize what they are looking at when interpreting data like this.  Understand that what is measured by one person with his test setup may be wholly different than what another person finds, even if they are testing the exact same loudspeakers on the very same sound system.  So comparisons should only be made if the conditions of the test can be controlled, or measurement data should be taken with a grain of salt.
I've made comments like this before, and some have characterized me as a person that doesn't like measurements.  Nothing could be further from the truth.  The problem is that I don't like ambiguous measurements.
Set the clock back to a time before PCs.  This is when the only measurement systems were pretty expensive.  A DIY audio hobbyist couldn't make measurements reliably, so he had to depend on shops that could.  But he could pretty well trust reactive circuit formulas, so he could expect to understand the crossover.  He could use Thiele/Small data and model a sealed or ported box with reasonable certainty.  Electro-mechanical measurements aren't nearly so hard to obtain on a budget.  So these are things that were reasonable to do.  Using the data that was available, one could get an accurate picture of some features, and use mathematical models to determine the rest.
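As a minimal sketch of the kind of paper-and-pencil modeling described above, the following assumes the standard lossless sealed-box relations (fc = fs * sqrt(1 + Vas/Vb) and Qtc = Qts * sqrt(1 + Vas/Vb)) and textbook first-order crossover values for a purely resistive load.  The function names and the driver numbers in the usage note are hypothetical, chosen only for illustration; a real driver's impedance is not a pure resistance, so treat the crossover values as a starting point.

```python
import math


def sealed_box_alignment(fs_hz, qts, vas, box_volume):
    """Estimate system resonance and total Q for a driver in a sealed box.

    Assumes a lossless, unstuffed box.  vas and box_volume must use the
    same volume unit (liters, cubic feet, ...).
    """
    alpha = vas / box_volume            # compliance ratio Vas / Vb
    scale = math.sqrt(1.0 + alpha)
    fc_hz = fs_hz * scale               # resonance rises in the box
    qtc = qts * scale                   # total Q rises by the same factor
    return fc_hz, qtc


def first_order_crossover(f_hz, load_ohms):
    """Return (capacitor in farads, inductor in henries) for a first-order
    crossover at f_hz, assuming the load is a pure resistance."""
    c_farads = 1.0 / (2.0 * math.pi * f_hz * load_ohms)   # series cap for the tweeter
    l_henries = load_ohms / (2.0 * math.pi * f_hz)        # series coil for the woofer
    return c_farads, l_henries
```

For a hypothetical driver with fs = 30 Hz, Qts = 0.35 and Vas = 90 liters in a 30 liter box, sealed_box_alignment(30, 0.35, 90, 30) gives a resonance of 60 Hz and a Qtc of 0.7.  For a 1 kHz crossover into 8 ohms, first_order_crossover(1000, 8) works out to roughly 19.9 uF and 1.27 mH.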
Using limited measurement equipment, one probably couldn't see the fine-grain features of response that would illustrate anomalies due to crossover interaction.  Maybe a lucky hit on a certain frequency might show a null, but it was much more likely that they would be missed.  Even if you've done the math and know what to look for, the tolerances involved would make it difficult to hit the frequencies of interest to within better than about 10%.  The tolerance of the parts and the system involved prevents it.  So the best thing, in my opinion, was to use mathematical models to determine crossover and physical placements that would work best.
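To put a rough number on the tolerance problem, here is a small sketch that assumes a simple LC section whose characteristic frequency is f0 = 1 / (2 * pi * sqrt(L * C)), and assumes both parts can be off by the same percentage.  The 10% part tolerance and the component values in the usage note are illustrative assumptions, not figures from any particular crossover.

```python
import math


def resonant_frequency(l_henries, c_farads):
    """Characteristic frequency of an ideal LC section, in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henries * c_farads))


def frequency_spread(l_nominal, c_nominal, tolerance):
    """Return (lowest, nominal, highest) frequency when both parts may be
    off by +/- tolerance (0.10 means 10%).

    The lowest frequency occurs when both parts are high, and the highest
    when both parts are low.
    """
    nominal = resonant_frequency(l_nominal, c_nominal)
    lowest = resonant_frequency(l_nominal * (1.0 + tolerance),
                                c_nominal * (1.0 + tolerance))
    highest = resonant_frequency(l_nominal * (1.0 - tolerance),
                                 c_nominal * (1.0 - tolerance))
    return lowest, nominal, highest
```

With a hypothetical 1 mH coil and 20 uF capacitor, frequency_spread(1e-3, 20e-6, 0.10) puts the nominal frequency at roughly 1.1 kHz, and the worst cases land about 9% below and 11% above that.  So with 10% parts, a predicted null can easily sit far enough from the calculated frequency that a spot check at that one frequency misses it.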
Now fast-forward to today, when measurement software is available to everyone.  A motivated person can set up a measurement system and generate useful data with practically no investment.  It is still difficult to measure some things, but it's worlds easier than it was before.  An example of something that I don't expect to measure is the response of a basshorn in eighth-space, decoupled from the room it's placed in.  The two conditions are tied together.  So this is a case where I still find some merit in comparing models as opposed to measurements, to prevent two people from comparing their rooms more than comparing their speakers.  But as for checking response through the crossover region, it's much easier to measure than to model these days.  You can do the math to know what to expect, and depend on measurements to confirm or disprove your models, making changes if necessary.
There are still problems to avoid though.  This is probably what was most important for me to say, and why I took the time to write this here.
Testing is a science all by itself.  Engineers certainly need to perform tests to validate and confirm predictions, and scientists perform tests to see if they're on the right track.  But it is important to realize that testing is a pretty sophisticated thing all by itself.  Doing a test right makes all the difference.
As an engineer, I think test results are extremely helpful.  An example is putting a part that will be subjected to physical stress in a hydraulic press to see how much it deflects.  Another example would be to check airflow through an orifice or current flow through an electronic component at specific voltages and frequencies.  These kinds of tests are just part and parcel of the design process.
Where complications arise is when a system is to be tested for overall performance evaluation.  Here again, there are some places where performance testing is appropriate and unambiguous.  I might time an automobile through the traps or check its gas mileage.  Or I might test the speed of a computer doing a particular set of instructions.  That can give me a good feel for the design, and can assure me that the design goals have been met.  But after the design is completed and it comes to the point where a battery of "signoff" tests must be done, this is where I think the engineer or scientist must hand off to an unbiased test group.  The engineer is too close to his work and a true evaluation really cannot be performed by the designer or design team.  It should be done by an impartial and qualified testing person or organization.
If an impartial test can't be done, as is often the case with small shops and individuals, then other measures might be considered.  It might be cost prohibitive to send out products to an independent testing facility, for example.  But the problem still remains:  How reliable is a comparative performance evaluation when the test is performed by someone who is affected by the test outcome?  Should it be done by an engineer or design group that wants their project to be successful?  Should it be done by a competitor?  Even if everyone is objective and ethical, you can see how there is a problem here, or at least the potential for one.
Hobbyists are sometimes just as emotionally tied to their favorite equipment as designers are, sometimes even more so.  They are certainly not immune to bias.  And another complication presents itself, which is their varied abilities.  Some hobbyists are technically inclined, but others are not.
Even though loudspeakers are very simple, acoustics testing requires many variables be considered.  It's a little like the weather, in that it's a simple subject but it isn't easy to nail down.  Engineers and scientists have a hard time dealing with some of the issues because they can be difficult to solve.  Things like boundary reinforcement and reflections can make certain kinds of tests impossible or at least somewhat ambiguous.  And emotional attachment can blind even the most objective people and tempt them to see what they want to see instead of what is.
Now add to this the fact that some hobbyists obtain tools to perform acoustic tests with their PCs, but may be entirely unqualified to do them.  This adds a whole new layer of ambiguity to the picture.  In one sense, it is very good that these affordable measurement systems are available to hobbyists, but on the other hand, it makes it possible to put a credible face on an entirely bogus dataset.
I've seen more than one occasion where wholly false data was presented in a pretty format and was more believable to laymen than better, more accurate data presented in raw form, which is less impressive looking.  It isn't always a case where data is falsified on purpose, although that is sometimes the case.  Sometimes it is just wishful thinking, throwing out good datasets in favor of less accurate data that looks better.  The opposite case can be made too, if someone has a bone to pick.  Sometimes erroneous charts are just a result of improper setup or calibration.  But whatever the case, the point is that even a good measurement system can provide ambiguous data if used in the wrong way.
Just like a mechanic's tool can be misused and broken, so too can a tool like an audio measurement system.  A system like this in the hands of an amateur will make professional looking charts that lend credibility to the dataset.  But the data is no good whatsoever if the system isn't set up right.  The environment may be unsuitable, the hardware may not be adequate or the system may just be misconfigured.
Sophisticated test equipment in the hands of hobbyists is a two-edged sword.  Not everyone can use the equipment properly and yet practically anyone can make a professional looking chart.  It's easy to think that test data is accurate when it is presented to you in a professional format, but if you don't know the conditions of the test, it really should be considered with skepticism.
So that's my problem with trusting measurements.  I guess what I'm saying is that if testing isn't done by someone I know to be qualified, reliable and unbiased, I am skeptical.  And even if I know and trust the person doing it, if they are too close to the subject emotionally, it may be difficult for them to be objective.  Test results reflect conditions under test as well as devices under test, so a person wanting to find a particular outcome may throw out important test results that contradict the expected outcomes.  But those may be the most reliable datasets.  And when you put this in front of amateurs, the problem becomes even more acute.
If you get a group that is already bent on finding superiority in their particular pet project, it makes it pretty difficult to overcome this prejudice.  As an example, it's almost a foregone conclusion that Chevy guys will like their Chevy better than a Ford, even if the Ford is a better product.  Probably best to let an independent and unbiased testing group do the tests than a Chevy club or an engineer for Chevrolet.  Otherwise, even if the charts and graphs look very professional and the people involved have unimpeachable credentials, it's easy to see how they might find the Chevy as the better car and produce data to back it up.
Take it or leave it.  Certainly there are plenty of people that can perform good tests, and that are objective enough to come up with something useful.  But do be careful and diligent because there are plenty of ways to screw it up.  And there are also lots of people with emotional attachments that make it very difficult to really get to the bottom of things.