There's a lot more going on time-wise than just the position of the drivers. There's the resistive/reactive nature of the driver's voice coil and crossover. There's the resistive/reactive nature of the driver's mechanical movement, i.e. mass/suspension. There's the resistive/reactive nature of the cabinet, horn, baffle, etc. And there are several messier, non-linear components in each of these sets of parameters as well, such as cone flex (breakup modes), non-linear BL, non-linear inductance, non-linear suspension compliance, etc. So even though loudspeakers are pretty simple as machines go, some complex behaviours arise in the system.

One thing that's very important is summing. Good summing is a matter of staying within a range of values, not hitting one dead-set position. This is why people talk about keeping center-to-center spacing under a certain distance for best performance up to a certain frequency. The idea is to keep the sources within 1/4λ of each other at the highest frequency they share, where possible. Once the spacing approaches 1/2λ, a listener off-axis can end up with a path-length difference of 1/2λ between the two sources. That's 180° out of phase, so the two cancel and the response forms a null.
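To put rough numbers on the 1/4λ guideline, here is a minimal sketch that converts a frequency into a maximum center-to-center spacing. The speed of sound (343 m/s, roughly room-temperature air) and the function names are my own assumptions for illustration, not anything from a particular design tool.

```python
# Assumed speed of sound in air at about 20 degrees C (illustrative value).
SPEED_OF_SOUND = 343.0  # m/s


def wavelength_m(freq_hz, c=SPEED_OF_SOUND):
    """Wavelength in metres for a given frequency in Hz."""
    return c / freq_hz


def max_spacing_m(freq_hz, fraction=0.25, c=SPEED_OF_SOUND):
    """Center-to-center spacing equal to `fraction` of a wavelength.

    fraction=0.25 is the 1/4-wavelength guideline discussed above;
    fraction=0.5 gives the spacing at which off-axis nulls can appear.
    """
    return fraction * wavelength_m(freq_hz, c)


if __name__ == "__main__":
    for f in (500, 1000, 2000, 4000):
        quarter_mm = max_spacing_m(f) * 1000
        half_mm = max_spacing_m(f, fraction=0.5) * 1000
        print(f"{f:>5} Hz: 1/4 wavelength = {quarter_mm:6.1f} mm, "
              f"1/2 wavelength = {half_mm:6.1f} mm")
```

At 1 kHz the quarter wavelength works out to a bit under 86 mm, which shows why tight spacing gets hard as the crossover frequency goes up.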
You don't want the difference in distance from the listener to two sound sources to be 1/2λ, or an odd multiple of 1/2λ (even multiples are whole wavelengths, so the sources are back in phase). One way around this, however, is to place sound sources so that some drivers are constructive where others are destructive. This is called dense interference, and it's another way to smooth the sound field when the number of sound sources is high or when the distances involved are necessarily large with respect to frequency.
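The effect of path-length difference on summing can be sketched with a simple two-source model. This assumes two equal-level, same-polarity sources in free field and ignores the small level difference caused by the unequal distances. The function name is hypothetical.

```python
import cmath
import math


def relative_level_db(path_diff_wavelengths):
    """Summed level of two equal sources relative to one source, in dB.

    path_diff_wavelengths is the difference in distance from the listener
    to each source, expressed in wavelengths (0.5 means 1/2 wavelength).
    Assumes equal amplitude and same polarity (an idealised model).
    """
    # The second source arrives delayed by 2*pi radians per wavelength of path difference.
    phase = -2j * math.pi * path_diff_wavelengths
    magnitude = abs(1 + cmath.exp(phase))
    if magnitude < 1e-9:
        # Treat numerically tiny sums as a complete null.
        return float("-inf")
    return 20 * math.log10(magnitude)


if __name__ == "__main__":
    for x in (0.0, 0.125, 0.25, 0.375, 0.5, 0.75, 1.0, 1.5):
        print(f"path difference {x:5.3f} wavelengths: {relative_level_db(x):7.2f} dB")
```

In this model the pair sums to about +6 dB with no path difference and is still about +3 dB at 1/4λ, then falls into a full null at 1/2λ and at every odd multiple of it. At whole wavelengths the sum is back to about +6 dB.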
That ventures a little bit off the subject, but they're related issues. Here are some other links that might be of interest: