Vocal timbre, or voice quality, refers to a particular multidimensional quality or tone color of vocal sound production that is perceived by a listener. It is the perceptual correlate of spectral characteristics of vocalized sound shaped by the regularity of the vibration cycle (period) of the vocal folds. Vocal timbre is often defined in terms of what it is not: the aspect of vocal sound that is not frequency (pitch), not amplitude (volume), and not time (duration). As Kreiman and Gerratt demonstrate, “voice quality is an interaction between an acoustic voice stimulus and a listener; the acoustic signal itself does not possess vocal quality, it evokes it in the listener. For this reason, acoustic measures are meaningful primarily to the extent that they correspond to what listeners hear” (1998: 1598).
Kreiman, Jody, and Bruce R. Gerratt. 1998. “Validity of rating scale measures of voice quality.” The Journal of the Acoustical Society of America 104 (3): 1598-1608. (See also related works by these scholars.)
“Harmonic singing” refers to various styles of singing or chanting in which individual harmonic components (overtones) are perceived by listeners, such as in certain types of Tibetan Buddhist chanting. Practices such as “Western overtone singing” or “throat-singing” also employ vocal tract resonance to produce reinforced harmonics.
This term refers to a variety of broken-voice, high-pressure, glottal (or other) singing techniques not typically used in Western bel canto singing. The traditional throat-singing practice called xöömei is most commonly associated with the Republic of Tuva (Russia) and Western Mongolia (where it is called höömii) and refers to a number of solo-voice drone singing techniques involving the production of reinforced harmonics or overtone melodies. Related throat-singing styles are practiced in the Republics of Altai (kai) and Xakassia (xai), Bashkortostan (özläü), and in other parts of Central Asia under different names. Other styles, such as umngqokolo singing of the Xhosa in South Africa, involve different techniques for voice production and overtone manipulation. Some vocal practices referred to as throat-singing do not produce overtones per se, such as katajjaq throat games indigenous to the Inuit peoples of Arctic Canada as well as rekuhkara of the Ainu peoples in Hokkaido, Japan.
Intervocality is a term that ethnomusicologist Steven Feld has used to signify “the inherently dialogic and embodied qualities of speaking and hearing. Intervocality underscores the link between the felt audition of one’s own voice, and the cumulatively embodied experience of aural resonance and memory” (Feld 1998: 471).
Feld, Steven. 1998. “They Repeatedly Lick Their Own Things.” Critical Inquiry 24 (2). (See also related works by this scholar.)
Bel canto singing refers to the classical (Italian) operatic singing tradition, in which emphasis is placed on bringing sung pitch and the resonance of vowels into an ideal relationship consistent with cultural and musical aesthetics.
The word entered the lexicon two years after the release of the first feature-length film with synchronized dialogue, The Jazz Singer (1927), and refers to the temporal matching of sound with movements in film and other multimedia images, as when dialogue or singing is matched to lip movements so as to suggest the sound’s point of origin (sounds without a source in the diegesis, the world of the story, such as most film music, are “extra-diegetic”; most offscreen sounds are eventually assigned an onscreen source, thereby extending the diegesis beyond the confines of the frame). Film and media theorists have long argued that sync sound supports realism, a genre of representation that, as many theorists in a range of disciplines have shown, rose to prominence with capitalist modernity and both reflects and helps reproduce it. For that reason, the Marxist Soviet director Sergei Eisenstein asserted that rather than “marrying” sound and image for an effect of realism, sound editors should strive to create a discontinuity like that associated with the techniques of “intellectual montage” he developed in silent cinema for joining shots so as to disrupt the apparent continuity of spatio-temporal relations between them. He believed discontinuities could spark new ideas, much as the combination of separate elements of meaning in an ideogram did.
Extending and developing this idea, filmmakers and theorists of experimental cinema have argued for the value of asynchronous sound as anti-realist, promoting critical reflection on, rather than immersion in, the world of the film. Sound editing points to the way the voice and other sounds tend to divorce themselves from, rather than remain married or synced to, the source from which they originate, a property of language in general, since even so-called “motivated” or “natural” signs, which resemble or have an existential connection with what they represent (American pragmatist philosopher Charles Sanders Peirce’s “icons” and “indexes”), have a symbolic and arbitrary dimension too, as the differences in onomatopoeias in different languages show. That signs have a metaphoric relation to their referents is a key claim of deconstructive philosophy, which with psychoanalysis and semiotics emphasizes that language is a play of such substitutions; hence shrieking violins can replace the scream of a murder victim in Alfred Hitchcock’s Psycho, or as a more general practice, location sounds are “sweetened” in postproduction, dialogue is replaced through looping (the reverse of “lip-synching”), and some “natural” sounds are figured by surprising Foley equivalents (as when cucumbers and melons are reamed to simulate the noise of bullets tearing through flesh). This denaturalization of the bond between sound and image means that sounds have an uncanny, spectral dimension like that which psychoanalyst Jacques Lacan associates with the voice as objet (petit) a. Media theorist Michel Chion terms “acousmatic” any sounds unanchored to a source in the cinematic diegesis and notes their importance in genres striving for an effect of anxiety and horror.
From the late 1950s, poststructuralist psychoanalyst Jacques Lacan used the phrase “objet (petit) a” (object (little) a), or the algebraic letter “a,” for the “object of desire” we seek in others. In his seminars from the mid-1960s on, he increasingly associated it with what he called the “Real” and the trauma of constitutive lack and alienation due to the fact that human subjects are creatures of language, which derails any “natural” relation to objects that might meet instinctual needs. Objet a is the “cause” of desire, a fragment of the phallus the subject believes it lost with the symbolic castration that ended its pretensions to satisfying the desires of the (M)Other but has found again in the love object. Objet a inaugurates and sustains desire as lack, rather than satisfaction, and it is signaled by the anxiety associated with the enigmatic desire of the Other with which the subject has identified, conveyed by Lacan in Italian to signal its foreign origin, “Che vuoi?” (“What do you want?”).
Lacan theorizes the gaze and the voice as exemplary objets a, sources of an excessive pleasure and anxiety caught up in the erotic drives of seeing and hearing that derive but diverge from instinctual functions through their libidinization, circling repeatedly around an object they never attain. He preferred that the term for that object remain untranslated to underline its foreign, disquieting, and uncanny dimension as something that belongs to the “Other,” rather than the self. If desire is “desire of the Other,” as Lacan insisted from the outset, so that what is most intimate is strange and estranging, so too are the gaze and the voice. They exceed symbolization as a way of containing threats to identity and object relations, just as desire exceeds the demands to which it gives rise and responds. The gaze is even best represented by something heard, rather than seen, such as the noise of an eavesdropper, while the voice can be figured by a look that seems to speak volumes. The gaze is the blind spot in the reassuring picture we would make of the world as we look at and relate to it through how we name it, the point from which we are seen as something out of place, a stain, while the voice is the incomprehensible silence or noise in speech or music, in which we are heard as a dissonance. Gaze and voice thus make for a lack of satisfaction with the objects we see and hear, disturbing our fantasy of a harmonious relationship with them and unsettling our sense of self or identity.
Associated with the temporal arts such as film, video, animation, theater, dance, and music, montage, initially a cinematographic term, and its spatial equivalents collage and assemblage, are the most important formal techniques for combining elements into a composition in 20th- and 21st-century art forms. They call into question 19th-century monistic notions of absolute and homogeneous time and space and instead construct a “whole” that cannot be totalized, not quite “one” but partial, fragmentary, and dislocated. Loosely articulated and plural, sometimes a composite structured literally by chance (as with texts produced by automatic writing), such a whole has meanings and a very identity that are relative to the point of view of the perceiver. In this broader sense, psychoanalysts have theorized sexuality as a “montage” of partial drives (oral-cannibalistic, anal-sadistic, phallic, scopic, and invocatory) that never cohere in the service of a genital whole and a natural instinct to procreate. Similarly, the whole of any film is in fact a montage of shots filmed in different times and places, but in mainstream cinema their spatio-temporal discontinuities are hidden by the techniques of “continuity editing” that developed in the 1910s, notably with U.S. director D.W. Griffith. Their effect, according to film theorists, is to immerse spectators in the diegetic world of the film through “realism,” the narrative and visual genre that expresses and naturalizes a 19th-century monistic and homogeneous universe.
By contrast, as Soviet filmmaker Sergei Eisenstein argued when theorizing “intellectual montage,” editing shots to create discontinuities could generate new ideas not present in any one shot on its own, just as the juxtaposition of elements of meaning in an ideogram did, which others have theorized promotes critical reflection, rather than immersion. Typically, a montage style of editing involves the rapid alternation between two sets of related shots whose meaning arises from their collision (the pace, and sometimes unusual camera angles, contribute to the sequence’s “denaturalizing” effect—although the technique Eisenstein developed for a proletarian spectator now dominates film and television advertising). In music, montage refers to a combination of elements without a shared key or scale and tonality, or in which relationships of theme and variation do not apply; in literature, it refers to a discontinuous juxtaposition of lines of poetry, dramatic scenes, or narrative actions; in architecture it refers to postmodern pastiche and hybridity, such as one finds in the buildings of Las Vegas. In all these mediums, heterogeneity, fragmentation and decentering predominate, and any sort of naturalized whole or harmony is called into question in a fashion that Fredric Jameson, writing of the postmodern arts, has described as schizophrenic, underlining its relation to a critique of humanist ideas of a coherent, expressive, and fully conscious self.
The distinction between utterance and enunciation in a range of contemporary theories of communication derives from linguistics and refers to the two angles one can take on communication: an abstract focus on its statements, regardless of their performative context, and a focus on that context and the agency that has assumed or actualized those statements in it (in linguistics proper, enunciation refers to “shifters,” those elements of an utterance whose meaning alters with different contexts of enunciation and thus are indexical signs of the latter, such as “I” and “you,” “here” and “there,” “now” and “then,” etc.). Structuralist and post-structuralist anti-humanist theories have argued that language speaks man, who is constituted by it, rather than having an instrumental relationship to it as something anterior and external. French linguist Emile Benveniste has shown that the “I” is a fictive effect of the pronoun itself, so that an expressive agency seems to precede and take it up, although in reality it is entirely coeval with it: “Ego is he who says ‘ego.’” Benveniste draws attention to the splitting of the ego that follows, as the “I” of the utterance and the “I” of the enunciation never coincide, a fact that explains psychoanalytic phenomena like negation (in which a subject admits a censored desire to consciousness through denying it) and repression (in which a censored desire returns in disguise as a compromise formation). It also accounts for the broader suspicion that all human speech is deceptively polysemic, saying something more and other than it seems to, as is obvious in the classic paradox of the Cretan liar (in which the poet Epimenides avers “All Cretans are liars,” but as he is a Cretan himself, the truth or falsity of his utterance cannot be determined). This shows that the subjective position of the enunciation can be at odds with the utterance: there is more than one enunciation of the “same” utterance.
Freud noted this when explicating the beating phantasies of some of his patients, who he discovered could identify with the masochistic position of the person being beaten, the sadistic position of the person giving the beating, or the position of a disembodied gaze on the scene itself. Some film theorists drew on this dimension of fantasy to escape the deadlock of ideological critiques of cinema like those of feminist Laura Mulvey, who had argued the spectator assumed the patriarchal film fantasy as his own by identifying with the “male gaze” of the camera that was in turn identified with the male hero as its enunciator, since his desires structured the film’s actions, cinematography, and editing (in Lacan’s terms, the spectator identified with the “desire of the Other,” in this case, a patriarchal figure). If there were multiple enunciations of the same film fantasy or utterance, it might not reproduce patriarchal roles and desires. Postcolonial studies theorist Gayatri Spivak argued similarly that there are several enunciations of the “rescue fantasy” underwriting colonial interventions in indigenous cultures in the name of local women, so that white women, for instance, might identify with a white male savior of a woman of color victimized by men of her community or with the oppressed woman of color, as, for example, in responses to sati or widow-burning in India.
In the theory of subjectivity of Bulgarian linguist, psychoanalyst, and sometime “French feminist” Julia Kristeva, the “chora” is a space shared by mother and child at the time of their initial and partial separation through processes associated with the semiotic (and later, the abject), before the subsequent and more definitive differentiation through the accession to language proper and symbolic castration, when the child ceases to be the mother’s creature (for psychoanalyst Jacques Lacan, her imaginary phallus) and assumes a name and identity in the community to which both belong. In attending to the importance of the maternal, pre-Oedipal object relations, and archaic, pre-phallic drives directed towards the mother, Kristeva, like other feminists, supplements the emphasis in Freud and Lacan on the role in subject formation of the father, language, law, the phallus, and castration. She adopts the term chora from Plato’s Timaeus (better known for the first story of Atlantis), where it is the (non) place of the inscription of the forms or ideas (“eidos,” linked etymologically to vision), a matrix of becoming and change, and thus womb-like (Plato calls it a nurse and mother by contrast with the forms as father and their worldly copies as offspring). Plato contrasts it with the two other realities he theorizes, the sensible (the earthly, material realm of illusions, mere shadows of the forms) and the intelligible (the transcendental realm of the forms themselves as the ultimate reality); it is therefore neither percepts nor concepts but what gives rise to them.
For deconstructive philosopher Jacques Derrida, Plato’s chora thus enables difference to be thought at all, such as the metaphysical opposition between the sensible and the intelligible. It is prior to any particular, nameable differences (signs), in effect unnameable and unlocalizable, no-thing, no-where; it cannot be maternal and feminine in the usual sense of those terms (for Kristeva it is only “maternally connoted,” an “archaic disposition of primary narcissism”), since the latter are already caught up in the system of differences the chora precedes, though feminists have sometimes characterized it as such (similarly, the “arche-writing” giving rise to “speech” and “writing” proper is not the same as the latter, according to Derrida). If some feminist object-relations theorists have romanticized the pre-Oedipal as a utopic space of the oneness of mother and child, Kristeva highlights the ambivalence of the kind of suffocating fusion it entails, since the chora at once enables and threatens to prevent the child’s individuation. It is characterized by what Kristeva terms the “semiotic,” rhythmic articulations of light, gesture, and sound, primarily infant laughter, babble, and echolalias, proto-signifiers without signifieds that project preliminary differences into what has been a sensorial continuum and prepare for language and signification, in which difference and identity are stabilized; the semiotic chora “arrests and absorbs” drive energy and primitive demands for nurturance. Signs replace experience with representation and bind polymorphous and anarchic pre-Oedipal drive energy in the secondary processes associated with the Symbolic, consciousness, and the reality principle that restricts pleasure or jouissance. 
The Symbolic is in fact formed through the repression of the semiotic, although the latter finds an outlet in aesthetic sublimation, particularly in experimental art, where meaningful, propositional representation breaks down and the choric and semiotic and their archaic drive energies reemerge. By destabilizing fixed meanings and identities and putting subjects, objects, and their signs “in process” and “on trial” (Kristeva’s French expression, “en procès,” is a double entendre), the maternal chora and its “revolution in poetic language” of avant-garde texts (to recall the title of the book based on her dissertation about this) revivify the mortifying, reifying paternal Symbolic, even if they must be contained again for a reordered world of stable differences to emerge.
The term derives from the Greek “akouō,” for “hearing” (as opposed to deaf) and a range of associated meanings: to hear, to listen or attend to something heard, to understand (in particular, what was heard), to learn (through listening), and even to obey (a word itself arising from the Latin “audire,” to hear, as if the open ear canal were a conduit for suggestion and the implantation of the desires of others, to which there could be no resistance). It was first applied to the disembodied voice of the classical Greek mathematician and philosopher Pythagoras, as he required his students to listen to his teaching from behind a curtain so that they would not be distracted by his appearance, a usage that resonates with the arguments by film and media theorists that the embodied voice is experienced as partial and limited, rather than authoritative and commanding; hence the source of the classic documentary voice-over is never revealed (and gods too must not be seen but heard). As defined by French film scholar Michel Chion, who borrowed the term from Pierre Schaeffer (a French composer, engineer, and musicologist), acousmatic sound differs from visualized sound because it lacks a clear source in the image (for Schaeffer, it referred to a media-saturated culture whose sound sources are not readily apparent). It evokes anxiety because it seems omnipotent, omniscient, and omnipresent. All offscreen sound is therefore acousmatic, and the diegetic world of narrative film is extended beyond the bounds of the frame by visualizing sounds first heard through cinematography or editing that finally locate them onscreen, revealing their source.
The reverse is also true: visualized sound can be acousmatized in shots that succeed those in which it was introduced and assigned an origin, bridging space (and time). Sound, in particular sync sound, which is visualized sound, thus contributes to the effect of realism in film, while acousmatic sounds introduce a spectral, uncanny dimension to an otherwise realist diegetic world, which is why they are an important element of gothic and horror films (by contrast, the loss of sync disrupts filmic realism and seems to mock the speaker or singer whose voice floats free of his or her body and apparent control). Media involving the telepresence of a voice unanchored from its speaker, such as the telephone, phonograph, and radio, have been experienced as weird and ghostly when first introduced, and hearing one’s own voice on an answering machine or other recording device still produces a similarly unsettling sensation, as one does not sound like oneself when air alone is the primary transducer of sounds typically conducted through bone and soft tissue at the same time. The voice as theorized by psychoanalyst Jacques Lacan is always acousmatic, only loosely articulated with the body from which it emanates, since the desire and the language to which it gives rise come from the Other. Voice as the vehicle of authentic self-expression, as in feminist, ethnic nationalist, and other “New Left” social movement discourses, is for him largely an illusion, since the voice is always alien and alienating. For Lacan, voice belongs first and foremost to the Other as the cause of desire, an unattainable object around which the “invocatory drive” endlessly circles, as the subject tries to sustain a distance from and pacify the siren call that might be its undoing.
Both the material substrate for signifying elements and a signifying element in its own right, silence functions like the white space between the marks of writing or drawing, a background that is also a foreground and a potential for meaningful difference that is always already actualized as such once recognized. In its paradoxical generativity, silence is what a deconstructive critic might term “arche-silence”: just as both “speech” and “writing” emerge from “arche-writing,” according to deconstructive philosopher Jacques Derrida, so sound and silence arise from a hypothetical muteness that is displaced and figured by “room tone” in realist cinema, the background noise of a diegetic space that does not even register as sound, which instead emerges from meaningful contrasts with it, even as it also admixes with it, so that the absence of room tone itself speaks volumes. Similarly, human speech arises out of, against, and perfused with what philosopher Jean-Luc Nancy discusses as “borborygmos,” the bodily rumblings that form the “silent” background of and blend with any speech. “Arche-silence” thus functions like Plato’s chora, the non-place in which the difference between the intelligible and the sensible can be thought, or in linguist and psychoanalyst Julia Kristeva’s appropriation of that Platonic concept, a virtual space out of which mother and child can emerge as separate individuals, in large part through the infant’s nonsense vocalizations, like laughter, crying, babble, and echolalias; the latter cut into and mark the limits of the background hum of a fusional, oceanic oneness and function as proto-signifiers, anticipations of the meaningful differences of language assumed with the sexed identity resolving what psychoanalyst Jacques Lacan terms Symbolic castration.
Lacan suggests that silence can figure the voice or even the gaze as “objet (petit) a,” object (little) a, the cause of desire, since it seems to say a great deal without saying anything at all, thus invoking anxiety about what is behind it and the subject’s fantasmatic resolution of that enigma; for that reason, psychoanalyst Sigmund Freud associates silence—like darkness and solitude—with the uncanny.
According to silent cinema theorists, a visual sign evokes sound better than an acoustic one; they lamented the loss to which the addition of sound subjected the film audience in the name of greater fidelity to reality, just as for Romantic poet John Keats, “Heard melodies are sweet, but those unheard/Are sweeter….” In a similar vein, Marxist theorist Theodor Adorno notes the way the noise of the recording medium that partially obstructs the voice of a singer actually makes it all the more powerful–the function of what Lacan terms the screen, which creates the illusion of something beyond it. Derrida identifies the silence of the authentic voice with the metaphysics of presence, as in phenomenological reasoning about the pure, silent speech of auto-affection, in which the alien materiality of acoustic or written signifiers and the violence of naming that brought the subject and objects into being in the first place are rendered mute and invisible in “self-presence.”
Jean-Luc Nancy, a former student of French deconstructive philosopher Jacques Derrida, opens his book on listening by wondering if philosophy is capable of it, suggesting that “understanding” tends to displace “listening” in the discipline (he draws on the double sense of the French “entendre,” which means both “to hear” and “to understand,” and contrasts it with “écouter,” “to listen”). Literary critics such as J. Hillis Miller and Catherine Belsey have reflected on reading in a similar vein, since it seems to veer between the poles of readerly projection into a text (fantasmatic “understanding” in Nancy’s sense of the latter term, in which the reader finds only what he anticipated in the words of the text) and ideological subjection to a text (with “interpellation” the inverse of understanding, in which the text “comprehends” the reader, taking hold of him or her, as the etymology of the word indicates, commanding submission to its sense and effects—the root of “obey” is in the Latin for “to listen, to lend an ear to, pay attention to,” as if to hear were always already to obey). If psychoanalysis is “the talking cure,” it presumes a listener as Nancy envisions the latter, one who does not rush to understand and impose an interpretation but who instead listens with an evenly suspended attention to everything, with an open mind, all ears, on the lookout (“être à l’écoute”) without quite knowing for what, rather than a little bit deaf to anything other than what the analyst expected or hoped to hear (as in counter-transference, when he projects his own desires onto the analysand in response to the latter’s transferential projections). 
Nancy’s argument overall is that the senses, including the auditory, must not only “make sense,” or “logos,” but also “sense” or perceive, dwelling with the anxiety arising from an open receptivity to or “resonance” with the world experienced before experience is named and understood, its power to communicate–to connect and move–in surprising ways defensively defused.
If listening to speech strains toward a sense beyond sound, as if speech were first music, and listening to music strains toward a sound beyond sense, as if music always said something–as when it is used to mobilize identities through marches and national anthems–listening aims at or is aroused by the reverberation of sound and sense in and through each other, which produces a crisis of self as an intelligible identity. This notion of listening resonates with Lacan’s discussion of the voice as “objet (petit) a”—object (little) a—the cause of desire as a lack of satisfaction, which sustains the circulation of the invocatory drive around it. Listening subjects do not always hear what they want to—and do not even want to hear what they think they want to; their ears strain to catch unsettling frequencies. Reason is thus undone by resonance, “on edge,” as Nancy says, because on the edge of meaning, sound sensed, not simply made sense of. The subject of listening is finally an echo chamber, summoned by the sounds of the world which resound in it without definitively naming and identifying it; the call of the Other is not yet or only an interpellation and symbolic mandate.
Structurally, multimedia texts involve eavesdropping, as their viewer is also an active and engaged listener, hearing speech and sometimes thoughts represented as private inasmuch as the characters with whom they are aligned generally seem unaware of the presence of auditors (and could not be aware of the theater’s auditors, of course). Just as cinema provides a point of view for the spectator, which has been extensively theorized, so too it provides a point of hearing, a subject of increasing interest to film and media theorists in the last quarter century. In doing so, films orchestrate the placement of sound in the diegesis or world on screen and the theater or other listening space of the spectator. There may be little congruence between point of view and point of audition: eavesdropping is often facilitated through close-up sound matched or synced to a much more distant source onscreen, and audiences may be privy to both sides of a telephone conversation even if the camera does not crosscut between the characters talking with one another.
Classically, sound tracks (dialogue, noise, and music) have been mixed so as to privilege human speech as the medium for conveying narrative information, including character motivation; thus, a noisy environment is partially muted so that the auditor does not miss key dialogue. However, the non-diegetic music in some television shows and films of the last ten years sometimes dominates and overwhelms the dialogue track, suggesting a different conception of the contributions of music and dialogue—and, arguably, narrative–to the media experience even of narrative multimedia, which at times resemble music videos. While films typically construct a third person point of audition, so that the auditor (over)hears more than any one character, enabling an effect of omniscience recalling the “dominant specularity” film theorists have ascribed to classic realism and its (re)production of bourgeois subjectivity, films can deploy aural masking or distorting techniques and other signs of a more limited and character-identified point of audition, sometimes signaled by visual signs of a subjective camera, although such sound may also be anchored to a character simply through close-ups of him or her that alternate with views of the source of what is heard (or unexpectedly muted).
Just as dominant specularity has been theorized as empowering the viewer, so too has its auditory analogue, and Mary Ann Doane and Kaja Silverman have developed a feminist analysis of sound in cinema that in many respects echoes Laura Mulvey’s analysis of the image track, showing how women’s voices and ears are as narratively disempowered as their gazes in classic Hollywood cinema. As Silverman explains it, the eavesdropper is like a voyeur, maintaining an authoritative distance and difference from those to whom he listens unbeknownst to them, and the mic is as gendered as the camera in the relay of listening Hollywood cinema constructs, as the film auditor eavesdrops on a female character along with the male protagonist. Women’s voices too participate in a gendered dynamic that is disempowering. Silverman and Chion both argue that horror narratives strive to make women characters scream, reducing their speech to nonsense that confirms their inability to master the diegetic world through which they move. Doane has shown that the authoritative voice-over of classic documentary is typically male because a feminine voice is heard as embodied and therefore partial—biased and limited in its perspective, rather than knowledgeable. Thus we can speak of sadistic eavesdropping as we do sadistic voyeurism, and the scream is perhaps the exemplary instance of an “aural exhibitionism” through which women participate in what for Mulvey is perverse patriarchal desire, as they do when they dress and display themselves for a fetishistic “male gaze.”
While theorists might have extended and developed this reasoning to argue for racist, imperialist, and/or bourgeois relays of listening, for the most part, they have not done so, no doubt because at about the same time as theories of a patriarchal sound track were developing, Mulvey’s work was coming under fire as “universalizing,” insufficiently attentive to differences between men, between women, and between film genres, all of which impact character relations and cinematic reception; from a somewhat different, more psychoanalytic and less sociological perspective, other theorists were arguing that fantasies, including those on offer in cinema, have multiple enunciations, enabling cross-identifications that complicate Mulvey’s assumptions about the reproduction of patriarchal subjectivities through the relay of gazes.
The concept, from the Greek “semeion,” “distinctive mark, sign,” was adapted by Bulgarian linguist, psychoanalyst, and sometime feminist Julia Kristeva from “semiology,” Swiss linguist Ferdinand de Saussure’s “science of signs,” the study of the life of signs in a society, developed independently and at about the same time, around the turn of the 20th century, in somewhat similar terms, by American pragmatist philosopher Charles Sanders Peirce, who called it “semiotics.” Semiotics focuses on “signs,” minimal meaningful elements, and the codes or “grammar” governing their selection or combination into utterances (so that one might speak of a “grammar” of the “language of traffic lights”). “Structuralism,” a dominant critical method in the humanities, arts, and some social sciences from the 1950s through the 1980s, drew on the methodology of semiotics for the analysis of social phenomena as “discourses” structured by a grammar; discursive systems it analyzed included genres, individual poems or stories, menus, Hollywood cinema, and fashion statements, among others. Articulating psychoanalysis with structuralism (including Belgian anthropologist Claude Lévi-Strauss’s structuralist work on the language of kinship systems), French psychoanalyst Jacques Lacan theorized that there was a grammar of the unconscious.
Formed by the Symbolic law condensing rules concerning sexual relationships and erotic desire with rules concerning language and its discrimination of identities and differences, the unconscious was characterized by what Sigmund Freud termed the “primary processes” organizing its functioning, “condensation” and “displacement,” which were similar to metaphor and metonymy in the “secondary processes” associated with consciousness and much studied by structuralist linguists and literary critics. Like Lacan, Kristeva put psychoanalysis in dialogue with structuralism and the later poststructuralist critiques of it; however, she uses the term “semiotic” to refer not to rule-governed relations between signs but rather to uncoded and unstable pre-signifying differences that are prior to, prepare for, and problematize the systematic and enduring distinctions signs articulate in the Symbolic order, the mode of relating to reality that Lacan associates with the resolution of the castration complex and the consequent accession to language, a sexed identity, and regulated exchanges (a community’s ordering of identities and relationships).
Whereas the Symbolic binds drive energies (“libido”) in the secondary processes linked to consciousness and the reality principle that restricts pleasure or jouissance, the semiotic comprises rhythmic articulations of light, gesture, and sound (primarily the latter, such as infant laughter, babble, and echolalias) through which polymorphous libido is loosely channeled into preliminary and shifting differences projected into the sensorial continuum, allowing for greater mobility and discharge of impulses directed toward the mother in the pre-Oedipal relation, a fusional space of mother-and-child which Kristeva terms the “chora.” The semiotic is unified and organized in a fashion that anticipates the Symbolic in two key “thetic” moments that establish an “identification of the subject and its object as preconditions of propositionality,” which characterizes the realm of signification. The first is the mirror stage that inaugurates the Imaginary, when the baby invests libido in the image of itself and thus posits a primordial ego that distinguishes it from others (though it continues to confuse self and others until the resolution of the castration complex) and serves as the “Imaginary phallus” the mother is thought to lack and desire. At about the same time, the baby also begins to appropriate signifiers from the demands of the mother and use them for its own demands, which furthers the process of separating itself as a subject from objects through representation and also assigns provisional qualities to the ideal ego and Imaginary phallus, though such preliminary signifiers lack the stable signifieds of mature language use (they begin as rudimentary vocal oppositions like the “ooh” and “ah” that Freud reports his toddler grandson Ernst attached to the comings and goings of his mother).
The second, and definitive, thetic moment occurs when the child resolves the Oedipal complex by resolving the castration complex, separating from the Imaginary phallus that would satisfy the desire of the mother, which made the baby her extension, by recognizing instead that the father has the Symbolic phallus the mother desires. The semiotic is repressed as the pre-Oedipal and Oedipal phallic drives are finally hierarchized and unified under the primacy of the genital and the social regulation of object relations, though it finds an outlet in experimental art in particular, where reified meanings and identities are called into question through a semiotic play with uncoded but rhythmic relations between colors, shapes, gestures, and sounds that disrupt the logic of grammar, the body, and object relations. In developing the notion of the semiotic, Kristeva, like some of Freud’s early 20th century female disciples and other feminists interested in psychoanalysis, ascribes an importance to pre-Oedipal object relations with the archaic mother, and the pre-phallic drives directed towards her, that supplements the emphasis in Freud and Lacan on the role in subject formation of the father, language, law, the phallus, and castration.
The theory of the semiotic also complicates Lacan’s claim about a grammar of the unconscious, since Kristeva links it to the pre-verbal, dyadic, and narcissistic relation with the mother, involving color, gesture, and sound, rather than to the Symbolic, quaternary subject-object relations arising in the wake of the paternal intervention in that relation, which are mediated by language as the vehicle of substitutive, compromise formations associated with the restricted mobility and discharge of libido in the interests of sociality. The semiotic is closer to the body, the senses, and the drives than is Symbolic language, as with abjection, Kristeva’s other contribution to theorizing the archaic pre-Oedipal, and an engagement with either can unsettle and restructure the defensive reifications of the Symbolic and Imaginary, playing as revolutionary a role as a destabilization of the sociopolitical or economic sphere.
Frequency is the primary acoustic correlate of the perceived pitch of a voice. The fundamental frequency corresponds to the rate of vocal fold vibration, and also to the frequency of the lowest harmonic of the voice spectrum.
Pitch is also partly related to the resonant frequencies of the speaker’s vocal tract, so that the voice of a speaker with a small vocal tract will sound higher pitched than that of a speaker with a larger vocal tract, other things being equal.
The quality (or timbre) of a voice is by definition everything that allows a listener to distinguish two voices that are the same in pitch and loudness. By definition, then, quality is multivariate and multidimensional. The acoustic correlates of perceived voice quality have not been identified (possibly because of the difficulty of the multivariate problem quality poses), in contrast to the well-understood relationships between frequency and pitch, and amplitude and loudness.
Prosody is sometimes defined as “the melody of speech.” At a minimum, it comprises changes in pitch, loudness, speaking rate, and speech rhythm. Prosodic patterns distinguish both individual talkers and speakers of different languages.
As with frequency and pitch, loudness is the perceptual correlate of the amplitude of a sound wave (which roughly corresponds to how much acoustic energy the sound has). In psychoacoustic theory, a sound is completely characterized by its pitch, loudness, and quality.
In contrast to time-domain waveform displays, which show the amplitude of the sound as a function of time, spectral displays show the amount of energy a sound has as a function of frequency. For a sound with a clear pitch, energy can be found at the fundamental frequency and at every frequency that is a whole number multiple of the fundamental.
The vocal tract is usually defined as including the pharynx, oral cavity, and nasal cavity. It acts as a variable resonator. Movements of the tongue, jaw, lips, and soft palate change the shape of the vocal tract, which produces changes in the sound that is articulated.
Cognitive, behavioral, and neuroimaging studies all indicate that perception of voice quality is best thought of as a problem in the perception of a complex, integral acoustic pattern. Voices are difficult to analyze into component “dimensions,” and listeners have a very difficult time isolating a single dimension in the complex voice pattern.
The process of producing a voice is called “phonation.” To begin phonating, speakers must first approximate the vocal folds so that they are close together and block the airway. Air pressure from the lungs next blows them apart (so that a puff of air travels through the glottis), and then the ongoing rush of air creates a negative pressure between the folds that sucks them back together. These opening-and-closing movements interrupt airflow from the lungs, producing the changes in the air pressure that we hear as sound.
Psychoacoustics is the discipline concerned with the relationship between acoustic signals and the perception of sounds. Psychoacoustic research first flowered in Germany in the late 1800s. William James once stated that the only thing proved by psychoacoustics was that it is impossible to bore a German.
Biomechanics is the study of mechanical properties of human tissue. In particular, the complex layered structure of the vocal folds means that they have very complex mechanical properties that determine whether or not they will vibrate. A combination of biological and engineering approaches has provided many new insights into how phonation begins and how we sustain it across an utterance.
The voice carries a significant amount of information about the speaker. This includes not just the person’s identity (the part that corresponds to their name), but also information about their physical characteristics (age, sex, health), emotional state, regional origin, relative status in a group of talkers, and so on.
The ability to recognize a familiar voice is not unique to humans. Many species recognize the voices of mates, parents, and/or offspring; some (for example, penguins) use voice quality as their sole means of recognizing others. The ability to recognize another who is not a family member is less common, but also occurs in non-human animals. This ability is crucial to the functioning of social groups in many species.
The vocal folds (also called the vocal cords) are tiny, complex layered structures that sit in the neck behind the Adam’s apple. Their primary purpose is protection of the airway: The folds can move together across the airway (for example, during swallowing) to prevent foreign matter from entering the lungs. Because this function is quite basic biologically, the vocal folds are considered an evolutionarily “conservative” structure, and their anatomy and physiology is surprisingly similar across species.
When a recorded audio signal is periodic and its period is within the range permitting audible pitch, it can be expanded into a Fourier series, a sum of sinusoids. The amplitudes of these sinusoids determine the timbre of the signal. (Their phases will change the waveshape of the signal but not its timbre; one can’t hear the phases.) This definition of timbre is only useful in very special situations since most signals aren’t periodic. One can speak loosely of a time-varying timbre in case a signal is nearly periodic (that is, its pitch or waveform is changing slowly compared to the period of repetition). For signals that do not have a clear pitch, sometimes the (time-varying) spectral envelope can be used to describe timbre.
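The claim about amplitudes and phases can be checked numerically. The following is a minimal Python sketch, not a definitive implementation; the period length (80 samples, as for a 100 Hz tone at 8000 Hz), the three harmonic amplitudes, and all function names are assumptions chosen for illustration. It builds one period twice from the same amplitudes but different phases, then measures each harmonic's amplitude with a direct Fourier sum.

```python
import cmath
import math

def one_period(amplitudes, phases, n_samples):
    """Sum of harmonics 1, 2, 3, ... sampled over exactly one period."""
    return [
        sum(a * math.cos(2 * math.pi * (k + 1) * n / n_samples + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases)))
        for n in range(n_samples)
    ]

def harmonic_amplitude(signal, k):
    """Amplitude of the k-th harmonic, from a direct Fourier sum over one period."""
    n_samples = len(signal)
    total = sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n, x in enumerate(signal))
    return 2 * abs(total) / n_samples

N = 80                    # assumed: samples per period (8000 Hz / 100 Hz)
amps = [1.0, 0.5, 0.25]   # assumed harmonic amplitudes
wave_a = one_period(amps, [0.0, 0.0, 0.0], N)
wave_b = one_period(amps, [0.0, 1.5, 3.0], N)  # same amplitudes, shifted phases

measured_a = [harmonic_amplitude(wave_a, k) for k in (1, 2, 3)]
measured_b = [harmonic_amplitude(wave_b, k) for k in (1, 2, 3)]
max_waveform_difference = max(abs(a - b) for a, b in zip(wave_a, wave_b))
```

The two waveforms differ sample by sample, yet the measured harmonic amplitudes agree, which is the sense in which the amplitudes alone carry the timbre in this idealized periodic case.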
Subjectively, the word timbre denotes the quality of a sound, so that the pitch, the loudness, and the timbre make a total description of a sound’s character. Of these, the pitch and loudness are somewhat understood and timbre is really a catch-all term for everything that is not pitch or loudness, that is, not understood. In special situations, aspects of the timbre of a sound can be predicted by making measurements of the corresponding audio signal as described above. For example, the disposition of formants in a sound’s spectral envelope can make the sound suggest an audible vowel. That would be considered an aspect of the sound’s timbre.
Resonance, as an acoustical phenomenon, describes the ability of a physical enclosure to amplify incident sounds at particular resonant frequencies. The vocal tract has time-varying resonances which act to draw extra power from the glottis (or other energy source) at those frequencies; a portion of this extra power is projected from the nose and mouth. The radiated vocal sound then may have formants corresponding to the resonant frequencies.
In electronic music practice, resonators, usually called resonant filters, also amplify sounds selectively to relatively enhance one or more resonant frequencies; in this case, though, an external voltage source can provide power needed for amplification. A typical resonant filter has one resonant frequency, whereas physically resonant bodies can have many.
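A resonant filter with a single resonant frequency can be sketched digitally as a two-pole resonator. This is one common design and an assumption here, since the text does not name a filter type; the center frequency, bandwidth, sample rate, and function names are likewise illustrative. The sketch passes a test tone through the filter and compares the gain on and off resonance.

```python
import math

def resonant_filter(signal, center_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: y[n] = x[n] + 2r*cos(theta)*y[n-1] - r^2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius from bandwidth
    theta = 2 * math.pi * center_hz / sample_rate         # pole angle from center frequency
    b1, b2 = 2 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def rms(signal):
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def gain_at(freq_hz, center_hz=1000.0, bandwidth_hz=50.0, sample_rate=8000):
    """Steady-state gain for a sine at freq_hz, measured after the transient has decayed."""
    n = sample_rate  # one second of signal
    tone = [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    filtered = resonant_filter(tone, center_hz, bandwidth_hz, sample_rate)
    half = n // 2
    return rms(filtered[half:]) / rms(tone[half:])

gain_on_resonance = gain_at(1000.0)
gain_off_resonance = gain_at(3000.0)
```

A tone at the resonant frequency comes out much stronger than a tone far from it; narrowing the bandwidth sharpens this contrast.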
To an electronic musician, a formant is a peak in an audio signal’s spectral envelope. Formants are specified by giving their peak frequency, peak amplitude, and bandwidth. To synthesize a sung or spoken vowel or voiced consonant, one often specifies a fundamental pitch and/or a noise source, and some number of formants the synthesized spectrum should have. To analyze a recorded vocal signal, one measures its fundamental frequency (or determines that the sound is unpitched), and measures the spectral envelope to determine its formants.
Synthesizing a spoken or sung voice is done in two different ways. The traditional way is to model the vocal tract as a source (periodic or noisy), passed through a time-varying filter or filterbank having resonances that modify the spectral envelope to make formants. This can be done using linear predictive coding (LPC) or using Fourier-based analysis and filtering. Charles Dodge wrote an influential computer music piece using LPC in 1972 [Dodge 89].
Alternatively, to synthesize vowels or voiced consonants, one can turn to classical synthesis techniques such as frequency modulation to synthesize formants directly, starting with a sinusoid or pair of sinusoids tuned to the desired resonant frequency, and modulating them to achieve the desired bandwidth [Chowning 89].
[Dodge 89] Dodge, Charles. On speech songs. MIT Press, 1989.
[Chowning 89] Chowning, John M. “Frequency modulation synthesis of the singing voice.” Current Directions in Computer Music Research. MIT Press, 1989.
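The frequency modulation approach to a single formant can be sketched as follows. This is a loose illustration under stated assumptions, not a reconstruction of the cited method: placing the carrier on the harmonic nearest the formant, using the fundamental as the modulating frequency, and every numerical value and name below are choices made for this example. A larger modulation index spreads energy over more sidebands around the carrier, which widens the resulting spectral peak.

```python
import math

def fm_formant(f0_hz, formant_hz, index, duration_s, sample_rate=8000):
    """One FM carrier/modulator pair approximating a single formant."""
    harmonic = max(1, round(formant_hz / f0_hz))
    carrier_hz = harmonic * f0_hz  # assumed: carrier sits on the nearest harmonic
    n_samples = round(duration_s * sample_rate)
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        # Carrier phase plus a sinusoidal phase deviation at the fundamental
        phase = (2 * math.pi * carrier_hz * t
                 + index * math.sin(2 * math.pi * f0_hz * t))
        out.append(math.sin(phase))
    return carrier_hz, out

carrier_hz, vowel_like = fm_formant(
    f0_hz=200.0, formant_hz=730.0, index=1.0, duration_s=0.1
)
```

Because both carrier and modulator are multiples of the fundamental, the output repeats once per fundamental period, so it keeps a clear pitch while its spectral energy clusters near the formant.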
Liveness seems to have different meanings to different musicians and musical researchers [Hagan 14]. From the outset of recording technology until perhaps about 1960, a performance was live to those who personally witnessed it and recorded as it was listened to afterward; recordings were of live performances. More recently, these are heard as “live” (in contrast with recordings of sounds that were never produced on stage but rather were assembled in a studio). Such an artifact is paradoxically called a “live recording” to distinguish it from a “studio recording”.
In laptop performance, even if the music is being generated in real time, the perceived “liveness” of the performance can depend on whether, and how meaningfully, the performer is actually controlling the generation of sound. This could take the form of playing an instrument or adjusting a more abstract collection of controls. Liveness appears less to be a yes-or-no question than to lie along many possible dimensions.
Liveness is used quite differently by acousticians to describe the reflectivity of the surfaces that make up the interior of an enclosed space.
[Hagan 14] Kerry Hagan. “How Live is Real-Time?” Online proceedings, Electroacoustic Music Studies Conference, Berlin, 2014.
Starting in the mid 1940s, recording became understood as a technique for music production as well as for merely reproducing pre-existing music. An early practitioner [Schaeffer 66] listed three potentialities of the new medium: permanence (the ability to fetch a sound later that one had made previously); reproducibility (the fact that, if you have one copy of a sound, you can have as many others as you like); and reversibility (the fact that you can play the sound backward; today, we would generalize this to say “manipulability”).
In the digital era, every sound that is manipulated is first recorded in order to get it into the computer, even if it is then used and erased within a few milliseconds. Recorded sounds are streams of numbers, and manipulations on them are described by mathematical formulas.
[Schaeffer 66] Pierre Schaeffer. Traité des Objets Musicaux. Paris: Éditions du Seuil, 1966.
Available in English translation.
Synthesis is the artificial production of sound. An expansive definition would include all sound-making that does not come from nature or from the human voice, but in common usage the word is usually reserved for electronic modes of sound generation. Synthesis can be described as an alternative to playback of recorded sounds (sampling, for example), and to processing (manipulating sounds in real time).
In the “classical studio era” (roughly, 1950-1965), synthesis was done using electrical equipment originally designed for laboratory use. The mid 1960s saw the appearance of the first synthesizers, which were built expressly for music making. Between about 1985 and 2000, digital synthesizers became predominant, although analog synthesizers still appeal to many musicians.
Although prosody is a quality of speech rather than singing, it is of vital importance to anyone setting text to music. The rising and falling pitches of a melody, along with rhythm and sometimes dynamic markings, impose a particular prosody on the sung text, that can appear natural or not, as a matter of musical choice. Musical settings of text can shift the emphasis of words in a phrase, or even help bring out alternative meanings hidden in the text. When setting tonal-language texts to music, a melodic setting could even contradict the text grammatically.
If a recorded audio signal is nearly periodic (as is much voiced speech), it is said to have a fundamental frequency, sometimes denoted “f0” in speech research, equal to one over the period of the signal. The signal can then be approximated as a sum of sinusoids whose frequencies are multiples of f0. Unvoiced speech (fricative consonants, for example) does not have a fundamental frequency. Fundamental frequency is a physical property of a sound or a sound recording.
Many algorithms have been proposed to estimate the fundamental frequency of the voice and/or other recorded audio signals. These are useful for all sorts of analytical tasks, and are also used in various pitch-synchronous processing tasks such as pitch shifting. In real-time machine accompaniment of live instruments or voice, the fundamental frequency is usually the most salient information used to keep track of the instrument or voice’s progress through a piece of music.
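One of the simplest such estimators uses autocorrelation: a nearly periodic signal resembles itself most strongly when shifted by one period. The sketch below is a minimal Python illustration, assuming a search range of 80 to 400 Hz, an 8000 Hz sample rate, and a synthetic test tone; the text does not single out any particular algorithm, and practical estimators add refinements this sketch omits.

```python
import math

def estimate_f0(signal, sample_rate, min_hz=80.0, max_hz=400.0):
    """Return sample_rate / lag for the lag that maximizes the autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and a copy of itself delayed by `lag` samples
        score = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

SAMPLE_RATE = 8000  # assumed sample rate
# Synthetic test tone: a 200 Hz fundamental plus a weaker second harmonic
test_tone = [
    math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 400 * n / SAMPLE_RATE)
    for n in range(1024)
]
estimated_f0 = estimate_f0(test_tone, SAMPLE_RATE)
```

Restricting the lag range matters: without it, a lag of two or three periods scores almost as well as one period, and the estimate can drop by an octave or more.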
Pitch is a perceptual quality that, in speech, usually corresponds to fundamental frequency (although unvoiced sounds such as wheezes and some whistles can have a perceptible pitch without any clear fundamental frequency). The pitch of the spoken or sung voice normally varies constantly.
In Western music practice, singing is separated into notes, each of which is assigned a single pitch. If the singing consists of words, individual syllables of the words are often identified with notes, although a single syllable can be held over a series of notes. In any of these cases, there is a complicated relationship between the musical pitch (that of the note being sung) and the continuous pitch of the voice.
Loudness is a perceptual quality of sound that correlates somewhat with the sound’s amplitude or power, but also depends on its short-time spectrum. The loudness of sinusoids depends on their frequencies and amplitudes according to measured and codified equal-loudness contours. The loudness of more complex sounds is harder to determine, particularly in cases where it may vary in time. The loudness of a sum of sinusoids lying in non-overlapping critical bands is roughly the sum of their individual loudnesses, but the combined loudness of two or more sinusoids within a single critical band, assuming they do not beat audibly, can be estimated as the loudness of a single sinusoid of equal power [Rossing 02].
A recorded signal is a stream of numbers called samples. Each such sample denotes an instantaneous amplitude, theoretically corresponding to the air pressure (or sometimes velocity in a specified direction) at a fixed point in space.
Power is a physical measure of an audio signal, proportional to the square of its amplitude. One can also specify or measure a physical sound’s total power flux through a fixed area in space, which also varies continuously. Power may be averaged over a period of time; such an average is referred to as RMS (root mean square) power.
[Rossing 02] Thomas Rossing, et al., The Science of Sound, third edition. Addison-Wesley, 2002.
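The relationship between sample amplitudes and averaged power can be made concrete with a short calculation. This Python sketch assumes a sinusoid with peak amplitude 0.5 and 80 samples per period (illustrative values, as are the function names); it computes the mean of the squared samples and its square root, the RMS amplitude.

```python
import math

def mean_power(samples):
    """Average of the squared sample amplitudes."""
    return sum(x * x for x in samples) / len(samples)

def rms_amplitude(samples):
    """Square root of the mean power."""
    return math.sqrt(mean_power(samples))

# 100 whole periods of a sinusoid with peak amplitude 0.5 (assumed values)
tone = [0.5 * math.sin(2 * math.pi * n / 80) for n in range(8000)]
tone_power = mean_power(tone)    # close to 0.5**2 / 2
tone_rms = rms_amplitude(tone)   # close to 0.5 / sqrt(2)

# Doubling every amplitude should quadruple the power
louder_power = mean_power([2 * x for x in tone])
```

The last line illustrates the square law: scaling the amplitude by two scales the averaged power by four.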
The spectrum of audible sound is the interval from roughly 20 to 20000 Hz. Any particular audio signal whose total duration is finite has a power spectrum which gives the distribution of its (finite) power over the audible spectrum. This will be a continuous function of frequency that describes the power density, or power per unit of frequency.
A periodic sound has infinite duration (and infinite total power) but it may be represented by an audio signal describing a single period. The power spectrum can then be identified as the (period averaged) power of each of its harmonics. Such a spectrum is called discrete, although one can also assign discrete spectra to inharmonic sums of sinusoids (whose components are not tuned to any fundamental frequency). When one encounters a discrete spectrum one can assign it an (ill-defined) continuous spectral envelope, made by drawing a smooth curve through the points of the spectrum.
From any audio signal one can also calculate a series of short-time power spectra, each being the power spectrum of the sound windowed into a short interval of time, typically between 10 and 100 milliseconds. The attainable frequency resolution varies inversely with the length of time window chosen. The short-time spectrum of a sound is associated with the sound’s pitch, loudness, and timbre, but not in any simple way.
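A series of short-time power spectra can be sketched in a few lines of Python. The example assumes an 8000 Hz sample rate, a 10 millisecond Hann window, a hop of half a window, and a 1000 Hz test tone; these values and the function name are illustrative, and the direct Fourier sum is used for clarity rather than speed.

```python
import cmath
import math

def short_time_power_spectra(signal, window_size, hop_size):
    """Power spectra of successive Hann-windowed frames (bins 0..window_size // 2)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_size)
              for n in range(window_size)]
    spectra = []
    for start in range(0, len(signal) - window_size + 1, hop_size):
        frame = [signal[start + n] * window[n] for n in range(window_size)]
        spectrum = []
        for k in range(window_size // 2 + 1):
            bin_value = sum(x * cmath.exp(-2j * math.pi * k * n / window_size)
                            for n, x in enumerate(frame))
            spectrum.append(abs(bin_value) ** 2)
        spectra.append(spectrum)
    return spectra

SAMPLE_RATE = 8000                           # assumed sample rate
WINDOW_SIZE = 80                             # 10 ms at 8000 Hz
bin_spacing_hz = SAMPLE_RATE / WINDOW_SIZE   # spacing between analysis frequencies
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(400)]
spectra = short_time_power_spectra(tone, WINDOW_SIZE, hop_size=40)
peak_bins = [frame.index(max(frame)) for frame in spectra]
```

With a 10 millisecond window the analysis frequencies are 100 Hz apart; a 100 millisecond window (800 samples here) would place them 10 Hz apart, at the cost of blurring changes that happen within that longer interval.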
A periodic audio signal can be represented as a sum of harmonically tuned sinusoids called harmonics. Their frequencies are multiples of the signal’s fundamental frequency, and their amplitudes and phases completely determine the signal.
An audio signal that cannot be analyzed as a sum of sinusoids can, alternatively, be characterized as noise. (These two possibilities, sums of sinusoids and noise, aren’t exhaustive but are rather two extremes with many other intermediate possibilities.) Measuring a noisy signal’s short-time power spectrum will give a random result whose average is the power spectrum of the noise. If this is a constant independent of frequency, the noise is characterized as white, in analogy with the visible spectrum.
White noise can be synthesized as a sequence of independent random amplitudes (samples). Noise that is not white can be derived from white noise by filtering. Noise generators and oscillators make up the traditional palette of synthesis options in electronic music.
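Both steps can be sketched in Python: white noise as independent random samples, and non-white noise obtained by filtering it. The uniform distribution, the one-pole lowpass filter, its coefficient, and the function names are assumptions for illustration; any filter would serve to color the noise.

```python
import random

def white_noise(n_samples, seed=0):
    """Independent, uniformly distributed samples in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def one_pole_lowpass(signal, coefficient=0.9):
    """y[n] = (1 - c) * x[n] + c * y[n-1]; attenuates high frequencies."""
    out, previous = [], 0.0
    for x in signal:
        previous = (1.0 - coefficient) * x + coefficient * previous
        out.append(previous)
    return out

def lag_one_correlation(signal):
    """Correlation between neighboring samples; near zero for white noise."""
    numerator = sum(a * b for a, b in zip(signal, signal[1:]))
    return numerator / sum(x * x for x in signal)

white = white_noise(20000)
colored = one_pole_lowpass(white)
white_correlation = lag_one_correlation(white)
colored_correlation = lag_one_correlation(colored)
```

Neighboring samples of the white noise are essentially uncorrelated, while the filtered noise changes slowly from sample to sample, which is the time-domain counterpart of its spectrum no longer being flat.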
Vowels, being relatively long-lived vocal sounds that usually can be assigned a pitch, are the parts of sung words that are assigned notes in musical settings. (In both vocal exercises and in vocal compositions, often a vowel is specified without any surrounding word; the presence of language in singing is optional.)
Consonants are classified as fricatives (generated by moving air turbulently through some part of the vocal apparatus), unvoiced plosives (bursts of air resulting from opening a passage previously closed), and voiced plosives (fast changes in spectrum brought about by opening or closing a passage). Some phonemes considered as consonants (‘l’, ‘w’) are better described as diphthongs, continuous changes in vowel formation. Fricatives and diphthongs may be prolonged at will, but plosives have short, characteristic durations in the tens of milliseconds. The sound quality of plosives (as measured, for example, in a short-time spectrum) depends strongly on the adjoining vowel(s).
There is a pervasive tendency in the academy to assume (and theorize) equivalence between language, writing, and text. Despite the strenuous attempts of many scholars to dismantle this axiom, the academy generally still seems to be in that pernicious historical moment where languages that count are written, where all texts that count are written, and where language is uninteresting outside of that which can be faithfully disciplined via recording (and again, by record I refer to the act of taking something down ‘for the books’). I have listed the three here to define a general problem in thinking, and to suggest that perhaps the myriad of attempts to resist this habit (attempts that, for example, think dancing as a text, or emphasize the fugitive worlds of sound) have not succeeded in rooting out this pattern of thought NOT because they are not excellent examples of scholarship, but in part because we scholars, on the whole, have failed to recognize this formatting of thought (wherein language = writing = text) as a specific historical move with calculated effects. Works that point to the complicity of literacy in maintaining slavery, the use of spoken and written English in dismantling Native tribes and perpetrating the genocide of Native languages, and so on, are examples of scholarship that describe well the effects of this set of equivalencies. In listing this axiom here I call for research that examines not only the effects of, but the historic and structural causes that inaugurated the ascendancy of this habit of thought.
Rhetoric is, to most, an outdated field of study: one that smacks of the 19th century and other questionable historic fads, such as ethnographic display and say, manifest destiny. Rhetoric persists in the academy however, and could be best described as the artful arrangement of words so as to best convey argument or effect. It is the arranging business of rhetoric that I’d like to emphasize. This is the theatrical element of rhetoric. How words are staged vis a vis one another, vis a vis a listener, vis a vis a speaking body determines how they are received and the effect or affects they can produce. Rhetoric is merely an historic way of saying that form and content are intrinsically intertwined. Historically however, Rhetoric also referred not only to the artful arrangement of the meaning of words, but the theatrical dramaturging of their sense. To clarify, this meant that Rhetoric was invested in how the words (so artfully arranged) were delivered; rhetoric was interested in the performance of the words, and the artful arrangement of the performing body was part and parcel of the artful arrangement of speech. Though today Rhetoric seems to refer only to the limited study of texts and is frequently another way of explaining University wide writing requirements and English curriculum, this earlier understanding of Rhetoric has allowed me to think of voice as never just a biological given, but a social actor whose performance has been artfully arranged by long-standing protocols for social speech.
As an object of study in the humanities, voice is frequently discussed as existing (confusingly) in between and across the famous ‘Deconstructionist era’ dichotomies of presence and absence, liveness and deadness. Voice is at once something that signals the presence of someone or something (the being or thing that voiced), and it is generally assumed that the presence that voices is a live one (computers, smart phones, GPS, talking dolls and text to speech software uncannily complicate this deep-seated association of voice with living being by housing “dead” voices). By the same token, in its ability to be recorded (either in transcribed text or on an analog or digital sound file) voice also has the ability to make itself present in the absence of the original being or object that produced it. Complicating matters, voice travels and can be transmitted, and so can persist as a live phenomenon absented of or outside of its originating body. Voice can also be generated through a present being but outside of that being’s body, as in the case of artificial larynx users or individuals using TTS software. If these threads seem tangled, it’s because they are. These dichotomies (not so dichotomous it seems) were used originally to theorize texts and textuality. While it may no longer be useful to employ deconstructionist approaches to voice studies, it does seem pertinent to ask a larger question: why are questions of ontology (being) traditionally raised by the specter of voice? How do we expect voice to arbitrate questions of being, etc?
As an ablebodied and classically trained singer and actor, articulation to me is that practical space where lips, teeth, tongue and so on (those organs known in anatomy and physiology etc. as the ‘articulators’) combine to shape the sound emanating from the vocal source into a comprehensible/socially recognizable stream of…well: words…or any other utterance that has some degree of meaning. To “articulate!” in both these art forms is a short-hand expression used to cajole a performer into working their speech organs more consciously to produce and communicate a crisper meaning. This gets a bit tricky however, since every good artist knows that communication doesn’t come down strictly to how clearly or perfectly one pronounces or projects. Meaning comes from the ‘how’ something is articulated, not just from the ‘what’ of semantic meaning that is ostensibly the target of clear articulation. When I start thinking about the multiple, socially impacted dimensions of the ‘how’ of articulation, I find Stuart Hall’s “Race, articulation, and societies structured in dominance” a helpful tool. Hall uses the concept of ‘articulation’ to describe how meaning-laden social structures are assembled, gain force, and present the illusion of totality, and, I might add, the illusion of being ‘natural.’ As a researcher my interest in the ‘how’ of articulation is in the meaning transmitted by the material that carries the ‘what’ of articulation for most ablebodied individuals: the voice. When thinking about the power that the ‘how’ of articulation wields through voice, I borrow from Hall’s thinking to trace out how biology and the social work together to give meaning to not what is said, but the way that saying sounds.
For many folks who work with voice day in and day out, the persistent use of voice as a figure or metaphor is tiring and cliché. Voice is something we do, our voice practice is complex, and frequently has little to do with how voice, as a metaphor, is batted around in common understanding. What is ‘the voice of the people’ besides a democratic truism? How does ‘giving voice’ to minoritized communities actually counter real, structural inequalities? The general objection to the use of voice as metaphor is that such figures of speech don’t account for the real, on-the-ground cultural work voice does. For example, metaphor alone can’t explain how Oum Kalthoum’s voice came to galvanize a particular brand of Egyptian nationalism. In reaction to a strain of scholarship that has, particularly since the literary turn, thought little of the voice beyond its discursive life, a number of scholars of late have begun to concentrate on the material voice, or voice (and the many persons and things that voice) as a lived and performed entity. What are the material dimensions of voice? How were these dimensions assembled? How do they function and what do they produce? While these questions are of paramount importance in refocusing humanistic discussions around voice, I think metaphor also deserves a second glance. How is metaphor made? Surely all metaphors have material beginnings, or vice versa. How does voice, as an entity with a representational or metaphoric life, intervene in/engage with/negotiate the material realities of figures that speak? Restated, I think the current challenge in Voice Studies is to think how the discursive (generally thought in the Humanities) and the material (generally thought in the Sciences) co-form voice.
Technique is a common word among artists, referring to an individual’s training in and competency with the aesthetic conventions of an art form. A technique is also a technology that mediates and transforms material conditions to produce a desired effect. Thus, jazz singing, classical singing, or extended singing all seek to literally shape and mold the performing body and to condition its habitus as a means of producing three distinctly different sets of sound with three distinctly different sets of social meaning. Importantly, technique is also required in everyday acts of voicing, and becomes noticeable when one’s voice fails to produce what are understood as normal vocal sounds. We might think of the technique required to sound female, or to sound authoritative and competent, or the technique required to sound healthy or to sound native.
A neurological disorder that affects the ability to process music. Amusia can affect the perception of music (e.g., recognizing a familiar melody) or the production of music (e.g., playing a previously known musical instrument). Amusia can be acquired (e.g., from brain damage) or congenital (present at birth). The term amusia was coined in 1888 by the German physician August Knoblauch. Amusia is thought to be caused by damage to the areas of the brain that are responsible for music, including the frontal and temporal lobes; however, the specific areas of brain damage are related to the specific symptoms and clinical presentations of amusia.
A neurological disorder that affects the ability to recognize familiar voices. Persons with phonagnosia have difficulty recognizing or distinguishing familiar voices. Notably, persons with phonagnosia often retain the ability to analyze the perceptual aspects of voices and are able to recognize emotions conveyed in voices. Phonagnosia was first described by Van Lancker and Canter (1982).
There are more than two decades of research about the neurobiological underpinnings of human singing behavior. Singing engages both the vocal motor and sensory networks in highly complex and precise ways. The process of vocalization begins with the vocal tract and extends to the brain network involved in singing and its integration with sensory feedback. Vocal motor control in humans is both hierarchical and parallel, based on observations of preserved and impaired skills in persons with brain damage. Music training modifies the basic brain networks used for vocalization.
Our voice changes as we age. Although the most obvious voice changes occur in childhood and adolescence, there are a number of changes that occur with the voice later in life (e.g., after age 60). After the seventh decade, a number of changes in the voice often become more evident. For example, aging voices can become weaker and less reliable (e.g., hoarse) because of structural and physiological changes to the vocal motor and respiratory systems.
Singing was often considered in late-eighteenth-century discussions about the origins of language. Several scholars hypothesized that original languages had musical properties and linked the early languages to singing. That is, from an evolutionary perspective, there was a singing language prior to the development of a spoken language. For example, the German philosopher and theologian Johann Gottfried Herder (1744-1803) considered singing in his 1772 essay on the origins of language (Herder, 1772). An interest in the singing origins of language continues in contemporary literature (e.g., Brown, 2000).
A community choir can be defined as an inclusive choir that draws its membership from the community at large. In the US, choir singing is the most popular arts hobby, and 32.5 million adults regularly sing in approximately 270,000 choirs (Chorus America).
The (often narrational) vocal audio track one sometimes hears layered “over” the image track of a film. This voice can be omniscient, fallible, bodiless, or transiently embodied. It can represent the psychic musings of a character (as in Adaptation (2002)), or perform the function of an extra-diegetic storyteller (as in Maleficent (2014)). A voiceover can also be ambiguous, troubling the boundary between the world of the film and the world without. Importantly, the vocal track of a voiceover does not actually hover “over” the film’s image track. It is printed on celluloid, alongside the visual track.
A term popularized by the work of Michel Chion. In his 1982 book The Voice in Cinema, Chion details the presence of an acousmatic voice in cinema, a voice not linked to a body (e.g., a voiceover). In other words, when we hear a voice and yet see no body with which to synchronize it, that is, when we cannot locate a source onscreen for an audible voice, this voice is acousmatic. Accordingly, the being constituted by this bodiless voice is an acousmêtre. When an acousmatic voice finds a body, or becomes associated with a corporeal form, this voice is said to de- or dis-acousmatize.
The correspondence of sound and image, voice and body, spoken words and moving lips, such that the audible appears to come from the visible. In synch sound cinema, synchrony largely means that the movements of characters’ lips “match” the sounds that emanate from them (see the work of Rick Altman). Asynchrony is synchrony’s opposite. In structuralism, synchronic linguistics studies language as a system, whereas diachronic linguistics studies language as an evolving historical phenomenon.
A practice whereby the moving lips of a person or thing are synchronized with the voice of another person or thing, often a person-shaped thing. Popularized in the Vaudevillian period, during which mostly male ventriloquists performed with mostly male-gendered wooden dummies with moveable jaws. Often used as a metaphor for power relations (e.g., Dick Cheney was often accused of acting as Dubya’s ventriloquist, and Dubya accused of serving as Cheney’s puppet). Also used as a metaphor for synch sound film (Altman). Ventriloquism’s frequent metaphoric usage despite its mythological cultural anachronism bespeaks a desire to ascribe a source to every voice (or other expression of power). Many Lacanian thinkers (Chion, Mladen Dolar, Žižek) have asserted that all vocalization is ventriloquism.
A term used by Roland Barthes to describe the materiality of the embodied voice: “The ‘grain’ is the body of the voice as it sings, the hand as it writes, the limb as it performs” (1972). The grain is what is nonlinguistic about the voice, and in a way marks its uniqueness (somewhat like Barthes’ punctum). According to Barthes, much popular singing lacks grain, as though the grain of the popular voice has been burnished to absolute, textureless smoothness. Many critics have taken issue with Barthes’ formulation of “grain,” accusing him of romanticism and the like. Those who believe the voice to be fundamentally ventriloquial will doubtless find this notion hard to swallow.
Produced or delivered by many voices. And these voices do not have to come from separate sources, or be acknowledged as coming from separate sources. For instance, in a Richard Pryor performance, there’s a crucial polyvocality, and yet these utterances visibly emerge from the same body. Similarly, when a ventriloquist performs on the stage, in the traditional Vaudevillian mode, she activates a certain polyvocality in her practice: we know, yet disavow, that the two voices we hear are in fact emanating from the human body before us. And yet do we know this?
A ventriloquial practice wherein the ventriloquist’s lips move one way and her tongue moves another way. The ventriloquist is thus able to produce the effect of asynchrony when speaking “normally.” In other words, when bifurcating, the ventriloquist’s lips don’t move in synchrony with her voice, and the real-time, live effect is the same as that produced by an “out of synch” sound film. Bifurcation is uncanny. It is also at play in all forms of ventriloquism, and, debatably, in all varieties of vocalization.
A term that first emerged in the context of speech act theory (J. L. Austin), and which was later popularized by queer critical theorists Judith Butler and Eve Sedgwick. When an expression is performative, it is both a figure of speech and an action. One paradigmatic example is the “I do” of the marriage ceremony, where the words themselves enact the legality of getting married. Similarly, “I promise” itself carries out a promise. One could say that performativity is an example of the voice’s nonlinguistic force, as much as it exemplifies the power of language.
Popularly understood, “noise” indicates both unpleasant incursion and resistant cultural practice (noisy protest, noise music). Laura Marks suggests that noise indexes the infinite, constituting not merely that which renders images, sounds, and other signals less recognizable, but rather, embodying the “stuff whose patterns we can’t recognize” (2013). Along a related but distinct line of thinking, Saidiya Hartman argues for the presence and continued emergence of what she terms “black noise”: a post-slavery sound rife with “wildly utopian,” anti-capitalist longings in the face of a silencing political rationality (2008). Noise remains resistant to a singular definition, and thus pregnant with possibility.