To an electronic musician, a formant is a peak in an audio signal’s spectral envelope. A formant is specified by its peak frequency, peak amplitude, and bandwidth. To synthesize a sung or spoken vowel or voiced consonant, one often specifies a fundamental pitch and/or a noise source, along with the formants the synthesized spectrum should have. To analyze a recorded vocal signal, one measures its fundamental frequency (or determines that the sound is unpitched), and measures the spectral envelope to determine its formants.
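As a rough illustration of the three parameters, the sketch below stores a formant as a small record and evaluates a toy spectral envelope from a list of formants. The names `Formant`, `gain`, `envelope`, and `AH` are hypothetical. The bell-shaped response curve is an assumption chosen so that the bandwidth equals the width between the two -3 dB points. The numeric values are illustrative, not measurements.

```python
import math
from dataclasses import dataclass


@dataclass
class Formant:
    frequency: float  # peak frequency in Hz
    amplitude: float  # linear gain at the peak
    bandwidth: float  # width in Hz between the two -3 dB points

    def gain(self, f: float) -> float:
        """Assumed bell-shaped response: full amplitude at the peak,
        amplitude / sqrt(2) (about -3 dB) at frequency +/- bandwidth / 2."""
        x = (f - self.frequency) / (self.bandwidth / 2.0)
        return self.amplitude / math.sqrt(1.0 + x * x)


def envelope(formants, f):
    """Toy spectral envelope at f: the strongest formant response there."""
    return max(fm.gain(f) for fm in formants)


# Illustrative values only, loosely in the range often quoted for an "ah" vowel.
AH = [
    Formant(700.0, 1.0, 130.0),
    Formant(1220.0, 0.5, 70.0),
    Formant(2600.0, 0.25, 160.0),
]
```

Calling `envelope(AH, f)` over a range of frequencies traces a curve with one peak per formant. A real analysis would go the other way, estimating these three numbers per peak from a measured envelope.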
A spoken or sung voice is usually synthesized in one of two ways. The traditional way is to model the voice as a source (periodic or noisy) passed through a time-varying filter or filterbank whose resonances shape the spectral envelope into formants. This can be done using linear predictive coding (LPC) or using Fourier-based analysis and filtering. Charles Dodge wrote an influential computer music piece using LPC in 1972 [Dodge 89].
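The source-filter idea can be sketched with a periodic impulse train fed through a bank of two-pole resonators, one per formant. This is a minimal illustration rather than LPC or a Fourier-based method. The function names, the parallel arrangement, and the pole-radius formula `exp(-pi * bandwidth / sample_rate)` (a common approximation) are assumptions. The formants are held fixed here, whereas a real voice model would vary them over time.

```python
import math


def impulse_train(f0, duration, sample_rate):
    """Periodic source: one unit impulse per pitch period, zeros elsewhere."""
    period = round(sample_rate / f0)
    n_samples = round(duration * sample_rate)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, frequency, bandwidth, sample_rate):
    """Two-pole resonant filter: y[n] = x[n] + a1 * y[n-1] + a2 * y[n-2]."""
    r = math.exp(-math.pi * bandwidth / sample_rate)  # pole radius sets bandwidth
    theta = 2.0 * math.pi * frequency / sample_rate   # pole angle sets center
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def vowel(f0, formants, duration, sample_rate=16000):
    """Sum of parallel resonators; formants is a list of
    (frequency_hz, amplitude, bandwidth_hz) tuples."""
    source = impulse_train(f0, duration, sample_rate)
    bands = [
        [amp * s for s in resonator(source, freq, bw, sample_rate)]
        for freq, amp, bw in formants
    ]
    return [sum(samples) for samples in zip(*bands)]
```

For example, `vowel(110.0, [(700.0, 1.0, 130.0), (1220.0, 0.5, 70.0)], 0.5)` returns half a second of unnormalized samples. Replacing the impulse train with white noise would give the noisy-source case.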
Alternatively, for vowels or voiced consonants, one can use classical synthesis techniques such as frequency modulation to generate formants directly, starting with a sinusoid or pair of sinusoids tuned to the desired resonant frequency, and modulating them to achieve the desired bandwidth [Chowning 89].
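A simplified sketch of the frequency modulation approach follows. It is not a reproduction of the procedure in [Chowning 89]. It assumes a single carrier placed on the harmonic of the fundamental nearest the desired formant frequency, and a modulator at the fundamental, so that all sidebands fall on harmonics. A larger modulation index spreads energy over more sidebands, which widens the formant. The function names and parameters are hypothetical.

```python
import math


def nearest_harmonic(f0, formant_freq):
    """Frequency of the harmonic of f0 closest to formant_freq (at least the 1st)."""
    return max(1, round(formant_freq / f0)) * f0


def fm_formant(f0, formant_freq, index, amplitude, duration, sample_rate=16000):
    """One FM-synthesized formant: amplitude * sin(2*pi*fc*t + index * sin(2*pi*f0*t))."""
    fc = nearest_harmonic(f0, formant_freq)
    n_samples = round(duration * sample_rate)
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        phase = 2.0 * math.pi * fc * t + index * math.sin(2.0 * math.pi * f0 * t)
        out.append(amplitude * math.sin(phase))
    return out


def fm_vowel(f0, formants, duration, sample_rate=16000):
    """Sum several FM formants; formants is a list of
    (frequency_hz, amplitude, modulation_index) tuples."""
    parts = [
        fm_formant(f0, freq, idx, amp, duration, sample_rate)
        for freq, amp, idx in formants
    ]
    return [sum(samples) for samples in zip(*parts)]
```

With `index` set to 0 each formant reduces to a single sinusoid at the carrier frequency. Raising it gradually is a simple way to hear the bandwidth increase.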
[Dodge 89] Dodge, Charles. “On speech songs.” Current Directions in Computer Music Research. MIT Press, 1989.
[Chowning 89] Chowning, John M. “Frequency modulation synthesis of the singing voice.” Current Directions in Computer Music Research. MIT Press, 1989.