There are pros and cons to both approaches. Multi-way tends to improve extension at both ends of the audio spectrum, and it reduces intermodulation distortion. Two-way tends to reduce phase distortion, which in turn improves response by reducing ripple. It also improves off-axis response, when designed properly, which makes the reverberant field more uniform. The main thing is summing through the crossover region, both on-axis and off-axis.
If you can make a three-way or four-way speaker with very good summing through the crossover regions, then you've got it licked. It's not usually a problem at low frequencies except for basshorns, which introduce propagation delay because of path length. But at high frequencies, it is difficult or impossible to get drivers within 1/4 wavelength of each other at the crossover point. This causes interference problems, which lead to response ripple. The ripple is different throughout the room too, so the reverberant field becomes non-uniform.
My favorite approach is to cross over between subsystems where directivity is matched and where sound sources are within 1/4 wavelength. This is easy to do at low frequencies, because wavelengths are long and directivity is wide. If directivity is set by a horn, this can be matched with an adjacent horn with a similar pattern, or by a direct radiator operated high enough that its pattern starts to narrow. This makes a convenient crossover point for two-way loudspeakers, because the directivity-matched crossover point is generally around 1kHz to 2kHz, which is above the fundamentals of vocals and many instruments, into the low overtone region. That's also a pretty good place to cross over, because wavelengths aren't so small as to make it difficult to arrange drivers within 1/4 wavelength.
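To put rough numbers on the 1/4 wavelength rule, here is a minimal Python sketch of the arithmetic. The speed of sound (343 m/s, roughly room temperature) and the example frequencies are my own assumptions for illustration, and `quarter_wavelength_m` is a hypothetical helper name, not anything from a library.

```python
# Assumed speed of sound in air at about 20 C, in metres per second.
SPEED_OF_SOUND = 343.0


def quarter_wavelength_m(freq_hz, c=SPEED_OF_SOUND):
    """Return one quarter of the wavelength, in metres, at freq_hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return c / freq_hz / 4.0


# Example crossover frequencies (illustrative choices, not measured data).
for f in (100, 1000, 2000, 10000):
    spacing_cm = quarter_wavelength_m(f) * 100.0
    print(f"{f:>6} Hz: 1/4 wavelength is about {spacing_cm:.2f} cm")
```

Under those assumptions, 1/4 wavelength works out to roughly 8.6 cm at 1kHz and 4.3 cm at 2kHz, which is tight but workable, while at 10kHz it is under 1 cm, smaller than most drivers themselves.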
Adding a super-tweeter is more problematic. You can do it, but there's no way to add one and have it be within 1/4 wavelength. It will always be several cycles off. You can position it so on-axis sums true, but off-axis will always lobe, and it sounds phasey to my ears. Any movement in the room causes the listener to pass through several high and low amplitude nodes, something that is clearly audible.
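The off-axis lobing can be sketched with a simplified model: two equal-level point sources that are in phase on-axis, separated by some distance, observed in the far field. That is an idealization (real drivers have their own directivity, and the crossover sets the relative level and phase), and the 15 cm spacing and 8kHz crossover below are hypothetical numbers chosen only for illustration. The function names are my own.

```python
import math

# Assumed speed of sound in air at about 20 C, in metres per second.
SPEED_OF_SOUND = 343.0


def relative_level_db(spacing_m, freq_hz, angle_deg, c=SPEED_OF_SOUND):
    """Summed level of two equal in-phase point sources, relative to on-axis.

    angle_deg is measured from the on-axis direction, in the plane that
    contains both sources. Far-field approximation (assumed).
    """
    path_diff = spacing_m * math.sin(math.radians(angle_deg))
    phase = 2.0 * math.pi * freq_hz * path_diff / c
    magnitude = abs(math.cos(phase / 2.0))
    # Floor the magnitude so an exact null does not produce log10(0).
    return 20.0 * math.log10(max(magnitude, 1e-6))


def null_angles_deg(spacing_m, freq_hz, c=SPEED_OF_SOUND):
    """Angles (0 to 90 degrees) where the two sources cancel completely."""
    wavelength = c / freq_hz
    angles = []
    n = 0
    while True:
        # Cancellation occurs when the path difference is an odd
        # number of half wavelengths.
        s = (n + 0.5) * wavelength / spacing_m
        if s > 1.0:
            break
        angles.append(math.degrees(math.asin(s)))
        n += 1
    return angles


# Hypothetical super-tweeter case: 15 cm spacing, 8 kHz crossover.
print("nulls:", [round(a, 1) for a in null_angles_deg(0.15, 8000)])

# Same model with the sources exactly 1/4 wavelength apart at 1 kHz.
quarter = SPEED_OF_SOUND / 1000 / 4.0
print("worst case at 1/4 wavelength:",
      round(relative_level_db(quarter, 1000, 90), 1), "dB")
```

In this model, sources spaced several wavelengths apart produce multiple full nulls between on-axis and 90 degrees, while sources within 1/4 wavelength never null at all and stay within about 3 dB of the on-axis level at any angle.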