If interaction between sound sources causes frequency anomalies - cancellation at certain frequencies - then you'll notice an unusual sound. It won't sound natural. But this is really a manifestation of the conditions that cause nulls, not so much of the small delays themselves. Time shifts of less than 1/4λ are inaudible. Our hearing isn't sensitive to time alignment, probably because the conditions of our environment make these cues less important. When you hear a time alignment problem, what you hear is purely its manifestation in the frequency domain, not the actual misalignment. Until the sound sources are so far apart as to cause an audible echo, delay is impossible to hear, except through the frequency notches that may result.
Physically aligning sound sources is pretty easy. But you're right that a horn tweeter is longer than a direct radiating midwoofer, so physical alignment might be sort of awkward. Also consider that many things introduce time shifts that vary with frequency, so you would never be able to obtain perfect time alignment anyway. Both the woofer and the tweeter are reactive, which means their phase moves with respect to frequency. This makes each of them act as though it were moving in physical space.
The woofer has a resonance down low which will move its apparent position by as much as several feet. Then, through the crossover region, there is a phase shift too. The horn introduces several phase peaks near its flare frequency, and only at higher frequency does phase become relatively constant. As you can see, phase jumps all around through the audio band, so perfect time alignment is not possible.
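To put rough numbers on how a phase shift acts like a change in position, here is a minimal Python sketch. It converts a phase shift at a given frequency into the equivalent distance, using distance = (phase / 360) × λ. The speed of sound (343 m/s) and the example phase/frequency pairs are assumptions chosen for illustration, not measurements of any particular driver.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about room temperature
M_TO_FT = 3.28084


def apparent_offset_m(phase_deg, freq_hz, c=SPEED_OF_SOUND_M_S):
    """Distance that corresponds to a given phase shift at one frequency."""
    wavelength_m = c / freq_hz
    return (phase_deg / 360.0) * wavelength_m


if __name__ == "__main__":
    # Hypothetical examples: a phase shift near a low woofer resonance,
    # and the same kind of shift in a crossover region.
    examples = [
        ("near woofer resonance", 45.0, 40.0),
        ("crossover region", 90.0, 1600.0),
    ]
    for label, phase_deg, freq_hz in examples:
        offset_m = apparent_offset_m(phase_deg, freq_hz)
        print(f"{label}: {phase_deg:.0f} deg at {freq_hz:.0f} Hz "
              f"= {offset_m:.3f} m ({offset_m * M_TO_FT:.2f} ft)")
```

Because wavelengths are long at low frequency, even a modest phase shift there works out to several feet, while the same phase shift at crossover frequencies is only a few inches.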
What is possible, and I think most important, is to limit phase shifts to less than 1/4λ (90 degrees) between two adjacent and/or overlapping sound sources. You don't want destructive interference between two overlapping sound sources because it will cause a frequency null. And since position and path length enter into the picture, the interaction between sound sources is different throughout the room. The goal of a designer is to limit interactions in the target listening area to those which will not cause destructive interference, and that means keeping them within 1/4λ.
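The 1/4λ guideline can be checked with a simple model. The sketch below assumes two idealized sources of equal level playing the same frequency, with no room reflections, and computes the summed level relative to one source alone as the path length difference grows. The function names, the 343 m/s speed of sound, and the 1600 Hz example frequency are all illustrative assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about room temperature


def quarter_wavelength_m(freq_hz, c=SPEED_OF_SOUND_M_S):
    """Path difference equal to 1/4 wavelength at the given frequency."""
    return c / freq_hz / 4.0


def summed_level_db(path_diff_m, freq_hz, c=SPEED_OF_SOUND_M_S):
    """Level of two equal sources summed, in dB relative to one source.

    For equal amplitudes the sum has magnitude 2 * |cos(pi * d / wavelength)|.
    """
    wavelength_m = c / freq_hz
    magnitude = 2.0 * abs(math.cos(math.pi * path_diff_m / wavelength_m))
    if magnitude < 1e-12:
        return float("-inf")  # complete cancellation (a null)
    return 20.0 * math.log10(magnitude)


if __name__ == "__main__":
    freq_hz = 1600.0  # hypothetical crossover frequency
    quarter_m = quarter_wavelength_m(freq_hz)
    print(f"1/4 wavelength at {freq_hz:.0f} Hz: {quarter_m * 100:.1f} cm")
    for fraction in (0.0, 0.125, 0.25, 0.375, 0.5):
        path_diff_m = fraction * SPEED_OF_SOUND_M_S / freq_hz
        level_db = summed_level_db(path_diff_m, freq_hz)
        print(f"path difference {fraction:.3f} wavelengths: {level_db:+.1f} dB")
```

In this model the two sources still add to about 3 dB above a single source at 1/4λ, and the level only collapses into a null as the difference approaches 1/2λ, which is why staying within 1/4λ keeps the interaction constructive.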