Auditory Scene Analysis: The Perceptual Organization of Sound, by Albert S. Bregman. In perception and psychophysics, auditory scene analysis (ASA) is a proposed model for the basis of auditory perception: the process by which the human auditory system organizes sound into perceptually meaningful elements. The term was coined by psychologist Albert Bregman.
|Published (Last):||16 March 2006|
This massive tome is the culmination of more than two decades of research by one of the leading figures in auditory perception, Albert Bregman.
Over the years, a constant series of papers has issued forth from Bregman’s Montreal lab — nearly all dealing with the formation of auditory images. In this book, Bregman has brought it all together and given us a lucid and masterful theory of how listeners make sense of the world of sound. Bregman begins by asking the question, what is the purpose of perception?
He suggests that our perceptual faculties evolved as a means of allowing us to construct a useful representation of reality. Perception is functional and ecological — providing us with the what, when, and where of the events around us. The primary task of the auditory system is to arrange the cacophony of frequency wisps into meaningful clumps that correspond to various real-world activities.
In short, the act of hearing may be likened to the work of a cartographer constantly drafting maps of the auditory scene. Some sounds such as the slamming of a door mark the occurrence of unique events. But the world of sound is not merely a succession of momentary incidents.
Even discrete sounds — such as a series of footsteps or the dripping of a tap — are often caused by an on-going coherent activity. Most sounds have a lineage or history.
The mental images we form of such “lines of sound” Bregman has dubbed auditory streams, and the study of the behavior of such images is the study of auditory streaming. Since the recognition of events depends upon the proper assignment of auditory properties to sound sources, auditory streaming is fundamental to the process of scene analysis, which in turn is fundamental to music perception.
There is a long history of research pertaining to the formation of auditory images. In the realm of music, the work of Leo van Noorden is especially outstanding. However, none of the above researchers have pursued the topic with such sustained conviction and in such detail as Al Bregman.
Auditory streaming entails two complementary domains of study. How sounds cohere to form a sense of continuation is the subject of stream fusion. Since more than one source can sound concurrently, a second domain of study is how concurrent activities retain their independent identities, which is the subject of stream segregation. In general, individual sounds tend to coalesce into a single percept in response to the physical correlations shared by the parts.
In addition, when sounds evolve with respect to time, it is possible for them to share similarities by virtue of evolving in the same way. In Gestalt psychology, this perceptual co-evolution of parts is known as the principle of common fate. Bregman has pointed out that the formation of an auditory stream is governed largely by this principle.
However, one difficulty with this view is that spatial location is a relatively weak factor contributing to auditory streaming. One would have thought that location would provide the strongest cue in the construction of an ecological representation, since one of the best generalizations that can be made about independent sound sources is that they normally occupy distinct positions in space.
Bregman suggests that due to reverberation and the transparency of sound, localization cues are comparatively unreliable p. Although reverberation can indeed confound localization, the arguments here are not especially compelling.
There is no mention of the Haas or precedence effect, nor any citation of the literature demonstrating surprisingly good monaural localization abilities. The relative unimportance of localization in stream formation suggests that the ecological account may be incomplete. An important distinction Bregman makes is between primitive segregation and schema-based segregation.
Primitive segregation is a bottom-up process whereby streams are parsed according to the correlations of acoustical cues. By contrast, schema-based segregation is a top-down process that arises from experiential and cognitive factors.
Schema-based streaming is characterized by voluntary or effortful listening — an active “hearing-out” for a given pattern. Bregman postulates several differences by which primitive streaming can be distinguished from schema-based streaming. He suggests that in primitive streaming all frequencies will be assigned to one or another stream with no “unstreamed” residual components.
In schema-based streaming, fusion of the foreground elements does not automatically result in the collective fusion of background elements. In other words, schema-based streaming may leave isolated “embellishment tones” that do not themselves cohere.
Primitive and schema-based streaming differ also with regard to tempo. In primitive streaming, increasing the tempo of presentation always enhances the within-stream integration and between-stream segregation.
However, in schema-based streaming, Bregman suggests that beyond a certain tempo, increasing the speed of presentation may tend to worsen the perceptual integration of the target stream since recognition of a familiar or predicted pattern may be lost.
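The tempo dependence of primitive streaming can be made concrete with a toy model. The sketch below is an illustration only, not Bregman's model: it assumes that a tone joins an existing stream when the implied rate of pitch change (semitones per second) stays under a threshold, and otherwise starts a new stream. The function names, the greedy assignment rule, and the threshold `max_slope` are all hypothetical choices made for the example. Schema-based effects, such as integration worsening at very fast tempi, are not modeled.

```python
from typing import List, Tuple

# (onset time in seconds, pitch in semitones, e.g. a MIDI note number)
Tone = Tuple[float, float]


def assign_streams(tones: List[Tone], max_slope: float = 40.0) -> List[int]:
    """Greedily label each tone with a stream index.

    max_slope (semitones per second) is a hypothetical threshold chosen
    for illustration; it is not a value taken from the book.
    """
    last_tone: List[Tone] = []  # most recent tone of each stream
    labels: List[int] = []
    for onset, pitch in tones:
        best = None
        best_leap = None
        for index, (prev_onset, prev_pitch) in enumerate(last_tone):
            gap = onset - prev_onset
            leap = abs(pitch - prev_pitch)
            # A tone may join a stream only if the implied pitch slope is small.
            if gap > 0 and leap / gap <= max_slope:
                # Among eligible streams, prefer the closest in pitch.
                if best is None or leap < best_leap:
                    best, best_leap = index, leap
        if best is None:
            last_tone.append((onset, pitch))  # start a new stream
            best = len(last_tone) - 1
        else:
            last_tone[best] = (onset, pitch)
        labels.append(best)
    return labels


def alternating(low: float, high: float, ioi: float, count: int = 8) -> List[Tone]:
    """Build a low-high-low-high sequence with a fixed inter-onset interval."""
    return [(k * ioi, low if k % 2 == 0 else high) for k in range(count)]


slow = assign_streams(alternating(60, 67, ioi=0.4))
fast = assign_streams(alternating(60, 67, ioi=0.1))
print(slow)  # one stream: every label is 0
print(fast)  # two streams: labels alternate 0, 1, 0, 1, ...
```

With the same seven-semitone leap, the slow sequence stays in one stream while the fast sequence splits into a low and a high stream, which is the qualitative pattern the text describes for primitive streaming.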
One factor that Bregman claims does not contribute to schema-based segregation is pitch trajectory. The evidence against the auditory system extrapolating an existing pitch trajectory is both impressive and initially counter-intuitive. This phenomenon contrasts with the situation in vision where extrapolated motion is fundamental to the maintenance of visual images.
Bregman suggests that there is some merit in the visual system expecting objects to behave in accordance with Newton’s first law regarding momentum.
But sound sources have no reason to act this way: In fact, the opposite could be true. Bregman neatly summarizes these results in the motto “interpolation not extrapolation”.
In the chapter concerning auditory organization in music, Bregman suggests that music may be regarded as a sort of “auditory fiction”. Contrasted with other listening experiences, musical streams do not necessarily correspond with real sources in the world. Of course individual instruments such as trumpets and violins are truly real sources, but musicians like to combine such sources to form supra-source objects such as multi-instrument “voices”.
Stephen McAdams has called these virtual sources; Pierre Boulez has called them phantasmagoric instruments; Bregman proposes the term chimeric percepts: We use the word chimera metaphorically to refer to an image derived as a composition of other images. An example of an auditory chimera would be a heard sentence that was created by the accidental composition of the voices of two persons who just happened to be speaking at the same time. Natural hearing tries to avoid chimeric percepts, but music often tries to create them.
It may want the listener to accept the simultaneous roll of the drum, clash of the cymbal, and brief pulse of noise from the woodwinds as a single coherent event with its own striking emergent properties. The sound is chimeric in the sense that it does not belong to any single environmental object.
There is plenty of evidence to suggest that a melody is a species of auditory stream. There is similarly plenty of evidence indicating that polyphonic music-making accords with the principles of auditory streaming. Bregman does not directly address the perception of homophonic textures, although he does suggest that music may be conceived in terms of hierarchies of streams.
Partials may cohere into tones while tones may constitute chimeric entities normally called chords. Since chords may presumably be perceived as single entities, chord sequences might be able to form a single stream. There are several unexplored repercussions to this view.
Normally the formation of a stream is signalled by (1) the opacity of its constituent parts, and (2) the concurrent appearance of the emergent properties of the new whole. But how can a stream be hierarchically constituted of subordinate streams if its parts are supposed to be opaque? A good response might be that a stream can be regarded as an object of attention at whatever level. In this case, when perceiving a chord, the amalgamated partials of a constituent chordal tone don’t really form a stream per se, but rather form a potential stream that is realized only with a shift of attention from chord to constituent tone.
Since Bregman proposes that primitive streaming is pre-attentive, the implication is that hierarchical stream organization is necessarily schema-based or at least attention-driven. One of the most musically innovative ideas in the book is the theory of dissonance developed in conjunction with James Wright.
Wright and Bregman suggest that when two concurrent tones are captured by independent streams, their potential dissonance is suppressed or neutralized. Thus the degree to which a major seventh interval is perceived as dissonant depends upon how well the constituent tones are integrated into their respective horizontal voices. The theory is illustrated in Figure 1. Due to the close within-voice pitch proximity, the two diatonic scales in Figure 1a segregate well from each other.
At the same time there is little or no perceived dissonance in example 1a. However, if the fourth through seventh intervals are extracted and rearranged so as to reduce the pitch proximity, and so reduce the horizontal streaming, the dissonances become evident (Figure 1b). From this principle a full-fledged theory of non-chordal notes is developed. Wright and Bregman propose that the potential dissonance arising from non-chordal notes is controlled by ensuring good streaming.
In practice, this means that most non-chordal notes will maintain close within-voice pitch proximity i. Passing notes, neighbor tones, suspensions, and anticipations all conform to these stringent streaming conditions — whereas appoggiaturas and escape tones conform less well to the pitch proximity constraints. The most common types of non-chordal tones appear to be those that most contribute to the within-voice stream fusion. Wright has suggested that the increasing dissonance over the course of the history of western music is reflected in the manner by which dissonant intervals are prepared.
In short, the historical increase in musical dissonance is less attributable to the increasing prevalence of dissonant vertical moments, and more attributable to the weakening of horizontal streaming. Stumpf argued that the degree of perceived consonance in intervals is proportional to their tendency to fuse into a single percept.
Intervals exhibiting simple frequency ratios are especially prone to tonal fusion Verschmelzung — and hence are perceived as being most consonant. Stumpf later retracted this explanation.
With the classic paper by Plomp and Levelt, Helmholtz’s theory of consonance and dissonance arising from the aggregate beating of adjacent partials was vindicated, with an important modification arising from the influence of critical bands.
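To see what “aggregate beating of adjacent partials” within critical bands amounts to numerically, here is a rough sketch. It is not taken from the review or from Bregman's book. It assumes a two-exponential curve fit to the Plomp and Levelt data of the kind popularized by William Sethares; the constants below are quoted as assumptions and should be checked against a primary source before being relied on. The tone model (six harmonics with a geometric amplitude rolloff) is likewise a hypothetical choice.

```python
import math
from itertools import combinations


def pair_roughness(f1: float, f2: float, a1: float = 1.0, a2: float = 1.0) -> float:
    """Roughness contributed by two pure tones (frequencies in Hz).

    The scale factor widens the curve at higher frequencies, standing in
    for the growth of the critical band. All constants are assumed values.
    """
    f_lo, f_hi = min(f1, f2), max(f1, f2)
    scale = 0.24 / (0.0207 * f_lo + 18.96)
    x = scale * (f_hi - f_lo)
    return a1 * a2 * (math.exp(-3.51 * x) - math.exp(-5.75 * x))


def harmonic_tone(f0: float, n_partials: int = 6, rolloff: float = 0.88):
    """Return (frequency, amplitude) pairs for a simple harmonic tone."""
    return [(f0 * k, rolloff ** (k - 1)) for k in range(1, n_partials + 1)]


def interval_roughness(f0: float, ratio: float, n_partials: int = 6) -> float:
    """Sum pairwise roughness over all partials of two simultaneous tones."""
    partials = harmonic_tone(f0, n_partials) + harmonic_tone(f0 * ratio, n_partials)
    return sum(
        pair_roughness(fa, fb, aa, ab)
        for (fa, aa), (fb, ab) in combinations(partials, 2)
    )


fifth = interval_roughness(250.0, 3 / 2)
major_seventh = interval_roughness(250.0, 15 / 8)
print(f"perfect fifth: {fifth:.3f}")
print(f"major seventh: {major_seventh:.3f}")
```

Under these assumptions the major seventh comes out rougher than the perfect fifth, because several of its partials fall close together without coinciding. Note that this kind of model is purely vertical: it knows nothing about how tones are streamed into voices, which is the factor Wright and Bregman add.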
First, consider the notational examples in Figure 2. There is little difficulty hearing the two tones constituting the major seventh interval in Figure 2a. By adding the “e” and “g” in Figure 2b, two things happen: More importantly, by adding pitches, the dissonance of the major seventh interval has been considerably softened. Second, Wright and Bregman argue that the existence of a good horizontal streaming permits the addition of non-chordal tones without suffering the penalty of undue dissonance.
It may be that the goal of good horizontal streaming leads composers to add non-chordal tones in order to enhance the voice segregation. In short, the purpose of non-chordal tones may not be to add the spice of dissonance without being too spicy; an equally plausible explanation may be that non-chordal tones are used to enhance the horizontal fusion of individual voices. The fact that even monophonic melodies make use of “non-chordal” tones such as passing tones lends credence to the idea that part of their purpose is to enhance horizontal streaming rather than to add dissonance.
Without doubt, this volume is destined to be a classic treatise in hearing sciences. However, unlike much of the hearing sciences literature, Auditory Scene Analysis deals with the type of higher level issues that begin to intersect significantly with truly musical concerns. The book’s sheer length and technical detail are apt to intimidate many musician readers, but I cannot recommend this book too highly for serious scholars of music perception. Auditory Scene Analysis is a first-rate reference work that provides an exhaustive account of the state of research concerning the formation of auditory images.
The book also articulates a unique and visionary theoretical framework that is bound to inspire a great deal of further research.