I ported the sound test program to the SDL library. With SDL I get stable sound at 0.02 s latency on Vista, using a built-in Intel sound card. It is even almost stable at 0.01 s, with rare audible crackles only when the system is heavily loaded.
Unfortunately, using SDL does not really solve the problem, as SDL does not have an API to report the playback position. It uses a callback mechanism: when it needs another chunk of audio data, it calls a user function. The easiest approach is to generate sound in the callback based on whatever is happening on the screen at that moment, but then you lose temporal resolution, because every event is effectively rounded to a chunk boundary. There are ways to improve the resolution, such as writing sound events to a log with a time stamp for each event, then playing the log back in the callback, stretching or shrinking the gaps between events as required. However, this may add unneeded complexity.
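To make the event log idea more concrete, here is a minimal sketch in plain C. It is not taken from the test program: `SoundEvent`, `EventLog`, `log_push` and `log_render` are hypothetical names, time stamps are assumed to be already converted to sample indices, and each event is just a constant level for a given number of samples. An SDL audio callback would call something like `log_render` to fill its buffer. Locking between the main thread and the audio thread is left out here and would be needed in a real program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOG_CAPACITY 64

/* One logged sound event (hypothetical layout). */
typedef struct {
    uint64_t start;   /* desired start, in samples since playback began */
    uint32_t length;  /* remaining duration, in samples */
    int16_t  level;   /* constant sample value to mix in */
} SoundEvent;

typedef struct {
    SoundEvent ev[LOG_CAPACITY];
    size_t     n;       /* number of pending events */
    uint64_t   played;  /* samples already handed to the audio device */
} EventLog;

/* Called by the game logic. Returns 0 on success, -1 if the log is full. */
static int log_push(EventLog *log, uint64_t start, uint32_t length, int16_t level)
{
    assert(log != NULL);
    if (log->n == LOG_CAPACITY)
        return -1;
    log->ev[log->n].start = start;
    log->ev[log->n].length = length;
    log->ev[log->n].level = level;
    log->n++;
    return 0;
}

/* Called from the audio callback: fill `out` with the next `frames` mono samples. */
static void log_render(EventLog *log, int16_t *out, size_t frames)
{
    uint64_t begin = log->played;
    uint64_t end = begin + frames;
    size_t kept = 0;

    memset(out, 0, frames * sizeof *out);

    for (size_t i = 0; i < log->n; i++) {
        SoundEvent e = log->ev[i];

        /* A late event has its gap shrunk: it starts at the top of this chunk. */
        if (e.start < begin)
            e.start = begin;

        if (e.start < end) {
            uint64_t stop = e.start + e.length;
            if (stop > end)
                stop = end;
            for (uint64_t s = e.start; s < stop; s++) {
                int v = out[s - begin] + e.level;   /* mix with clipping */
                if (v > INT16_MAX) v = INT16_MAX;
                if (v < INT16_MIN) v = INT16_MIN;
                out[s - begin] = (int16_t)v;
            }
            e.length -= (uint32_t)(stop - e.start);
            e.start = stop;   /* the rest continues in the next chunk */
        }

        if (e.length > 0)
            log->ev[kept++] = e;   /* keep unfinished and future events */
    }

    log->n = kept;
    log->played = end;
}
```

With this layout an event lands on the exact sample its time stamp asks for, rather than on a chunk boundary, as long as it is logged before the callback reaches that sample.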
SDL uses the DirectSound API on Windows. It seems to use IDirectSoundBuffer::GetCurrentPosition to find out what is being played. I think this is a viable replacement for waveOutGetPosition. It does not seem to be affected by the Vista bug, and I guess it can be adopted easily, without big changes to the program logic.
As a side note, neither the waveOut nor the DirectSound position reporting API is very accurate. An undetermined amount of time passes between the driver getting position information from the hardware and that information finally reaching the application: time spent in context switches, interrupts, and other tasks. ALSA solved this problem many years ago by providing a time stamp at which the position was sampled, allowing the application to determine the current position by extrapolation, with high accuracy. The same method is used in Vista's WASAPI.
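The extrapolation itself is simple arithmetic. The sketch below assumes the application already has a (position, time stamp) pair from the audio API and a clock reading in the same time base. `PositionSample` and `extrapolate_position` are hypothetical names, not part of ALSA or WASAPI, and the sample rate is assumed to be constant.

```c
#include <assert.h>
#include <stdint.h>

/* A playback position together with the time at which it was sampled. */
typedef struct {
    uint64_t frames;    /* position reported by the driver, in frames */
    uint64_t stamp_ns;  /* when that position was sampled, in nanoseconds */
} PositionSample;

/* Estimate the position at time now_ns, assuming playback has continued
 * at a steady rate_hz frames per second since the sample was taken. */
static uint64_t extrapolate_position(PositionSample s, uint64_t now_ns, uint32_t rate_hz)
{
    const uint64_t NS_PER_SEC = 1000000000ULL;

    assert(rate_hz > 0);
    if (now_ns <= s.stamp_ns)
        return s.frames;   /* clock went backwards or no time has passed */

    uint64_t elapsed = now_ns - s.stamp_ns;
    /* Split into whole seconds and remainder to avoid 64-bit overflow. */
    uint64_t secs = elapsed / NS_PER_SEC;
    uint64_t rem = elapsed % NS_PER_SEC;

    return s.frames + secs * rate_hz + rem * rate_hz / NS_PER_SEC;
}
```

For example, a position of 48000 frames sampled 10 ms ago at 48 kHz extrapolates to 48480 frames, regardless of how long the report took to reach the application.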
If you are interested in the SDL-based sound test, it is available from my SVN repository. The repository can also be browsed using a web interface.