HOW AUDIO EMULATION WORKS IN QEMU: ================================== Things are a bit tricky, but here's a rough description: QEMUSoundCard: models a given emulated sound card SWVoiceOut: models an audio output from a QEMUSoundCard SWVoiceIn: models an audio input from a QEMUSoundCard HWVoiceOut: models an audio output (backend) on the host. HWVoiceIn: models an audio input (backend) on the host. Each voice can have its own settings in terms of sample size, endianess, rate, etc... Emulation for a given soundcard typically does: 1/ Create a QEMUSoundCard object and register it with AUD_register_card() 2/ For each emulated output, call AUD_open_out() to create a SWVoiceOut object. 3/ For each emulated input, call AUD_open_in() to create a SWVoiceIn object. Note that you must pass a callback function to AUD_open_out() and AUD_open_in(); more on this later. Each SWVoiceOut is associated to a single HWVoiceOut, each SWVoiceIn is associated to a single HWVoiceIn. However you can have several SWVoiceOut associated to the same HWVoiceOut (same thing for SWVoiceIn/HWVoiceIn). SOUND PLAYBACK DETAILS: ======================= Each HWVoiceOut has the following too: - A fixed-size circular buffer of stereo samples (for stereo). whose format is either floats or int64_t per sample (depending on build configuration). - A 'samples' field giving the (constant) number of sample pairs in the stereo buffer. - A target conversion function, called 'clip()' that is used to read from the stereo buffer and write into a platform-specific sound buffers (e.g. WinWave-managed buffers on Windows). - A 'rpos' offset into the circular buffer which tells where to read the next samples from the stereo buffer for the next conversion through 'clip'. |<----------------- samples ----------------------->| | | | rpos | | |_______v___________________________________________| | | | | | | |_______|___________________________________________| - A 'run_out' method that is called each time to tell the output backend to send samples from the stereo buffer to the host sound card/server. This method shall also modify 'rpos' and returns the number of samples 'played'. A more detailed description of this process appears below. - A 'write' method callback used to write a buffer of emulated sound samples from a SWVoiceOut into the stereo buffer. Currently all backends simply call the generic function audio_pcm_sw_write() to implement this. According to malc, the audio sub-system's original author, this is to allow a backend to use a platform-specific function to do the same thing if available. (Similarly, all input backends have a 'read' methods which simply calls 'audio_pcm_sw_read') Each SWVoiceOut has the following: - a 'conv()' function used to read sound samples from the emulated sound card and copy/mix them to the corresponding HWVoiceOut's stereo buffer. - a 'total_hw_samples_mixed' which correspond to the number of samples that have already been mixed into the target HWVoiceOut stereo buffer (starting from the HWVoiceOut's 'rpos' offset). NOTE: this is a count of samples in the HWVoiceOut stereo buffer, not emulated hardware sound samples, which can have different properties (frequency, size, endianess). ______________ | | | SWVoiceOut2 | |______________| ______________ | | | | | SWVoiceOut1 | | thsm<N> := total_hw_samples_mixed |______________| | for SWVoiceOut<N> | | | | |<-----|------------thsm2-->| | | | |<---thsm1-------->| | _______|__________________v________|_______________ | |111111111111111111| v | | |222222222222222222222222222| | |_______|___________________________________________| ^ | HWVoiceOut stereo buffer rpos - a 'ratio' value, which is the ratio of the target HWVoiceOut's frequency by the SWVoiceOut's frequency, multiplied by (1 << 32), as a 64-bit integer. So, if the HWVoiceOut has a frequency of 44kHz, and the SWVoiceOut has a frequency of 11kHz, then ratio will be (44/11*(1 << 32)) = 0x4_0000_0000 - a callback provided by the emulated hardware when the SWVoiceOut is created. This function is used to mix the SWVoiceOut's samples into the target HWVoiceOut stereo buffer (it must also perform frequency interpolation, volume adjustment, etc..). This callback normally calls another helper functions in the audio subsystem (AUD_write()) to to the mixing/volume-adjustment from emulated hardware sample buffers. Here's a small graphics that explains it better: SWVoiceOut: emulated hardware sound buffers: | | (mixed through AUD_write() called from user-provided | callback which is itself called on each audio timer | tick). v HWVoiceOut: stereo sample circular buffer | | (sent through HWVoiceOut's 'clip' function, which is | invoked from the 'run_out' method, also called on each | audio timer tick) v backend-specific sound buffers The function audio_timer() in audio/audio.c is called periodically and it is used as a pulse to perform sound buffer transfers and mixing. More specifically for audio output voices: - For each HWVoiceOut, find the number of active SWVoiceOut, and the minimum number of 'total_hw_samples_mixed' that have already been written to the buffer. We will call this value the number of 'live' samples in the stereo buffer. - if 'live' is 0, call the callback of each active SWVoiceOut to fill the stereo buffer, if needed, then exit. - otherwise, call the 'run_out' method of the HWVoiceOut object. This will change the value of 'rpos' and return the number of samples played. Then the 'total_hw_samples_mixed' field of all active SWVoiceOuts is decremented by 'played', and the callback is called to re-fill the stereo buffer. It's important to note that the SWVoiceOut callback: - takes a 'free' parameter which is the number of stereo sound samples that can be sent to the hardware stereo buffer (before rate adjustment, i.e. not the number of sound samples in the SWVoiceOut emulated hardware sound buffer). - must call AUD_write(sw, buff, count), where 'buff' points to emulated sound samples, and their 'count', which must be <= the 'free' parameter. - the implementation of AUD_write() will call the 'write' method of the target HWVoiceOut, which in turns calls the function audio_pcm_sw_write() which does standard rate/volume adjustment before mixing the conversion into the target stereo buffer. It also increases the 'total_hw_samples_mixed' value of the SWVoiceOut. - audio_pcm_sw_write() returns the number of sound sample *bytes* that have been mixed into the stereo buffer, and so does AUD_write(). So, in the end, we have the pseudo-code: every sound timer ticks: for hw in list_HWVoiceOut: live = MIN([sw.total_hw_samples_mixed for sw in hw.list_SWVoiceOut ]) if live > 0: played = hw.run_out(live) for sw in hw.list_SWVoiceOut: sw.total_hw_samples_mixed -= played for sw in hw.list_SWVoiceOut: free = hw.samples - sw.total_hw_samples_mixed if free > 0: sw.callback(sw, free) SOUND RECORDING DETAILS: ======================== Things are similar but in reverse order. I.e. the HWVoiceIn acquires sound samples in its stereo sound buffer, and the SWVoiceIn objects must consume them as soon as they can.