The Impact of Laughter in Earwitness Identification Performance

This study examines whether the presence of non-verbal vocalizations such as laughter influences voice identification performance. Ninety-six participants were exposed to a 45-second auditory event presenting verbal and non-verbal information, including laughter. After a 5-minute delay, participants took part in a voice line-up manipulated for laughter (speech only, laughter only, or speech and laughter) and target presence (target present or target absent). Supporting the first hypothesis, participants’ performance was significantly worse in the speech alone condition than in both laughter conditions (laughter alone and laughter with speech). Further, identification performance was best in the laughter only condition. In addition, participants correctly rejected the line-up significantly more often in the speech and laughter condition than in the speech alone or laughter alone conditions. Findings are discussed in terms of their implications for real-life earwitness identification parades.

The ability to recognize unfamiliar voices is critical for the criminal justice system in cases involving earwitness identification testimony. Identifying a once-heard individual by voice is a difficult task (Yarmey, Yarmey, Yarmey, & Parliament, 2001). This may be because people rely more heavily on visual cues than on auditory ones and are therefore poorly equipped to identify voices (Legge, Grossman, & Pieper, 1984). Alternatively, poor performance in earwitness identification tasks may result from earwitnesses being poor at describing voices (Yarmey, 1986, 1991, 1994, 2001; Yarmey & Matthys, 1992). As Sapir (1927) explained, "the essential quality of the voice is an amazingly interesting thing to puzzle over. Unfortunately we have no adequate vocabulary for its endless varieties" (p. 896). Indeed, it has been consistently demonstrated that earwitness identification is prone to error because it depends on many factors, such as speech duration during encoding and familiarity with the language or accent spoken (e.g., Bull, 2001; Bull & Clifford, 1999; Deffenbacher et al., 1989). Despite the recent surge of research into the impact of some of these factors, one issue remains underexplored: how the presence of non-verbal vocalizations affects earwitness identification accuracy.

The Importance of Non-verbal Vocalizations in Earwitness Identification
Earwitness research has focused almost exclusively on verbal vocalization information (i.e., speech) as the sole cue to voice identity. Non-verbal information, whether peripheral sounds (e.g., gunshots) or non-verbal vocalizations such as laughter, has been largely ignored. Although peripheral auditory information (e.g., gunshots) clearly cannot lead to voice recognition, studying it is important for understanding how episodic memory for auditory information works. To date, few studies have attempted to examine voice identification accuracy using absent or distorted verbal information (e.g., person identification based on a grunt or on a voice played backwards).
The sparse literature that does exist on the usefulness of non-verbal cues in earwitness identification contains conflicting findings. On the one hand, Huss and Weaver (1996) found that verbal auditory stimuli were better remembered than non-verbal ones (i.e., gunshots) in an ecological setting. On the other hand, Van Lancker, Kreiman, and Emmorey (1985) demonstrated that voices whose verbal vocalizations were distorted (i.e., played backward) were recognized no less well than when played forward. They concluded that the cues used to recognize voice identity do not follow a universal rule but depend greatly on the individual's voice characteristics and on the listeners themselves. However, this study was performed on familiar voices, which are known to be governed by different cognitive processes than unfamiliar voices (Van Lancker & Kreiman, 1987). More recently, Yarmey (2004) compared identification performance for familiar and unfamiliar voices across different non-verbal vocalizations (i.e., laughter, sigh, cough, moan, grunt, and throat clearing). His findings suggested that some non-verbal vocalizations, such as laughter, led to fewer erroneous decisions than shouting or sighing for both familiar and unfamiliar speakers. Even though Yarmey (2004) offered a good starting point for studying earwitness performance based on non-verbal vocalizations, no direct comparisons were made between non-verbal information and normal speech information (more than one word being uttered). Consequently, that study cannot inform the criminal justice system about the potential benefit of incorporating non-verbal vocalizations in voice identification line-ups. Given the scarce evidence currently available, the value of non-verbal information in voice identification still needs to be established if it is to aid earwitness recognition and improve identification performance.
During the many academic discussions the authors had regarding the difficulty of earwitness identification, allusions to laughter became inevitable. These discussions were probably fuelled by the fact that one of the authors has a laugh so distinctive that it stops her from going incognito in the department's corridors. Indeed, it is not unusual to hear her laugh described as "wicked" or "raucous", for instance, and it appears to be characteristic of its owner. Aside from this anecdotal digression, scientific evidence clearly suggests that laughter is an important matter for scientific inquiry and a powerful tool for exploring the mechanisms of speech production (Provine, 2001). Armony, Chochol, Fecteau, and Belin (2007), using a two-stimulus discrimination task, showed that people remembered emotionally charged vocalizations (positive, such as laughing, or negative, such as crying) better than neutral ones (e.g., yawning). This evidence also supports the idea that auditory emotional expression is likely to reinforce episodic memory. A similar finding has been reported for facial expressions and face recognition, with fearful faces being better remembered than neutral ones (Sergerie, Lepage, & Armony, 2006).

Why Is Laughter Important Auditory Information?
Laughter is one of the most important universal non-verbal vocalizations found in human speech (compared with, for example, grunts, sighs, or even yawns) and occurs in all cultures around the world (Gervais & Wilson, 2005; Ruch & Ekman, 2001; Trouvain, 2001, 2003). From an evolutionary point of view, laughter existed long before speech-like vocal sounds, and it is shared with other species (Ruch & Ekman, 2001). Indeed, other primates produce emotional vocalizations such as laughter (Provine, 2004). Laughter also appears very early in human development: its onset has been observed as early as the fourth month of life (Sroufe & Wunsch, 1972), around the same time as infant vocal babbling but long before the production of first words (MacNeilage & Davis, 2001; Oller & Eilers, 1988). These findings suggest that laughter is not a social construct but rather an innate behaviour (Ruch & Ekman, 2001), though it serves a highly social purpose. Indeed, laughter is far more common in social interaction than in solitary settings (Provine, 2004).
Laughter has different functions and occurs in different situations. It is commonly associated with happiness and humour, but it can also appear in less joyful situations, more as a way to punctuate speech (Provine, 1993;. Two distinct types of laughter have been identified in the literature. Whereas Duchenne laughter refers to laugh bouts in response to humour, non-Duchenne laughter refers to self-generated, emotionless laughter, mostly found in conversational speech (Gervais & Wilson, 2005). This latter type of laughter is common (Devereux & Ginsburg, 2001; Kuiper & Martin, 1998; Truong & van Leeuwen, 2007), occurring on average 5.8 times in a 10-minute conversation (Vettin & Todt, 2004).
Laughter during conversation is not scattered randomly throughout the speech stream but is usually placed strategically at the end of a statement, like a meta-communicative marker, and it serves different purposes (Gervais & Wilson, 2005; Provine, 1993, 2004; Vettin & Todt, 2004). Indeed, the main function of laughter amid the speech stream is referred to as the punctuation effect (Provine, 2004). Bachorowski, Smoski, and Owren (2001) argued that laughter in conversation influences listeners by directing their attention. If so, laughter could have considerable implications for episodic memory and might reinforce memory traces of the previously encoded voice. Laughter is also thought to be used unconsciously by speakers to tone down or change the meaning of the speech content and to promote positive feelings between interlocutors (Vettin & Todt, 2004). This is probably why people are generally unaware of using laughter to punctuate their speech and, when asked to account for it, generally underreport its frequency (Vettin & Todt, 2004). Laughter is also believed to be, at times, a spontaneous response to stress (i.e., nervous laughter) that can signal anxiety to listeners (Keltner & Bonanno, 1997).

Acoustical Features of Laughter
The study of laughter alongside speech in voice identification seems particularly important because it is generally accepted that the acoustic features of laughter differ considerably from those of monotonic speech (e.g., Bickley & Hunnicutt, 1992). In light of the phylogeny of verbal and non-verbal vocalizations, Ruch and Ekman (2001) noted that laughter requires coordination between respiration, phonation, and resonance, but not the articulation that is vital for speech sounds.
Furthermore, intra- and interspeaker variability between laughs strongly supports the view that not all laughs are alike . Grammer and Eibl-Eibesfeldt (1990, cited in Bachorowski et al., 2001) distinguished between voiced and unvoiced laughter. This earlier classification took into account the idea that laughter is not a uniform, stereotyped signal but instead shows considerable acoustic variability that listeners are well equipped to discriminate and produce (cited in . A more recent classification proposed by  identified three types of laughter that each individual is capable of producing on different occasions, namely song-, snort-, and grunt-like laughter. Song-like laughter (voiced laughter such as "haha" laughs) consists of multiple vowel-like sounds with fundamental frequency variation (best described as giggles and chuckles). Unvoiced snort-like laughter is characterized by a salient nasal sound, whereas unvoiced grunt-like laughs are characterized by friction in the laryngeal and oral cavities (see Trouvain, 2003). Voiced and unvoiced laughter are not perceived similarly, the former being consistently evaluated more positively . Moreover, it is believed that the physical properties of laughter offer enough cues for speaker recognition Knox & Mirghafori, 2007). This evidence offers a good starting point for the present study and has major implications for voice identification research. Based on the evidence reviewed above, voiced laughter appears to carry the most acoustic cues for person identification and was therefore used in the present study.
As reviewed above, laughter is common in everyday conversation. Because laughter seems to be such a natural part of speech, it is surprising that it has been so little investigated in voice recognition and earwitness identification studies. This study investigates, first, whether laughter bouts alone contain enough acoustic information to enable accurate person recognition. It also explores whether laughter, when combined with verbal information, conveys enough supplementary information to establish someone's identity. It is expected that earwitness identification performance will be significantly associated with the presence of verbal and/or non-verbal information; specifically, the usefulness of voiced laughter alone ("haha" laughs) will be explored and compared with the speech alone and the speech and laughter conditions (Hypothesis 1).

Target Presence
It has been shown extensively that target absent (TA) line-ups tend to produce fewer correct identifications than target present (TP) line-ups (e.g., Kerstholt, Jansen, van Amelsvoort, & Broeders, 2004, 2006; Philippon, Cherryman, Bull, & Vrij, 2007). For example, Van Wallendael, Surace, Hall Parsons, and Brown (1994) described the effect of target absence in voice identification as alarming: only 1 of the 76 participants in their experiment correctly rejected the line-up that did not contain the perpetrator's voice. Because the literature indicates that it is more difficult to recognize that the culprit is absent from a line-up, it is hypothesized that participants will be more accurate when the line-up contains the perpetrator than when it does not (Hypothesis 2). It would also be interesting to see whether the presence of non-verbal vocalizations, and more specifically laughter as investigated in the present study, increases correct rejection of line-ups not containing the perpetrator.

Hypotheses
Participants' accuracy will differ depending on the type of line-up presented (i.e., a line-up that contains speech with laughter, speech only, or laughter only) (Hypothesis 1).
Participants will perform significantly better in TP conditions compared with TA conditions (Hypothesis 2).

Design
An independent-groups 3 (type of line-up cue: speech alone, speech and laughter, laughter alone) × 2 (line-up type: target present, target absent) design was used. The dependent variable was identification performance. Non-Duchenne, voiced laughter was chosen because it is believed to be specific to each speaker and is known to be a common form of non-verbal behaviour in social situations (Bachorowski et al., 2001; Vettin & Todt, 2004).

Participants
The participants were 96 undergraduate psychology students (43 male and 53 female), aged between 19 and 30 years, all native English speakers, recruited via the participant pool system in exchange for course credits. None had hearing impairments.

Voices
The recorded materials in this study were produced by seven male speakers, aged between 20 and 23, recruited by convenience sampling on the university premises. Screening the voices for distinctiveness is considered difficult when investigating both verbal and non-verbal vocalizations (i.e., laughter): one person's laughter could be seen as unusual, whereas that same person's speaking voice could be the most typical, and vice versa. However, none of the voices had speech impediments or unusual accents, and the researchers assumed that none showed signs of typicality or atypicality that would make it easy to recognize or make it stand out from the rest of the voices (as demonstrated by Mullennix et al., 2011).
To control for the effect of individual voices, two target voices were used in the to-be-remembered event, and the target voice varied across participants (Vanags, Carroll, & Perfect, 2005; Philippon, 2006). The two target voices were chosen because their acting was the most realistic of all the voices. The voices used as target replacements in the TA conditions were selected for their similarity to the target voices (based on the speech part), to help ensure fairness (Hollien, 1990). Six foils were used for the line-ups, and the presentation order was manipulated so that the voices appeared in all of the line-up positions except the first and last (Cook & Wilding, 1997).

Materials
The to-be-remembered events, which exposed participants to the voice, consisted of sound clips (one for each target voice) presenting a one-sided telephone conversation. This preserved the realism of the situation by presenting the event in a context that may occur in crimes involving earwitnesses. The telephone conversation was presented as one that could be overheard in a public place. The speech material consisted of one side of a dialogue that one might have with a partner in crime, including several pauses to make it sound as realistic as possible (see Appendix 1). It was 163 words long and lasted 40 to 45 seconds. Four of the sentences were written humorously in order to induce laughter. The type of laughter studied here was voiced laughter, as it is thought to present the greatest interspeaker variability . The speakers were told where to laugh when acting the script but not how to laugh, as the former was already restrictive enough for the laughter to sound natural.
Speech samples for the line-ups followed the same format as the crime event (i.e., a one-sided telephone conversation) (see Appendix 2). The same speech material was used in the speech only and speech with laughter conditions, except that the laughter was digitally edited out in the speech only condition. Similarly, in the laughter only condition, the conversation (verbatim) was digitally edited out so that only the laughter remained. Although this method is advantageous in terms of similarity between the auditory stimuli (i.e., the same extracts were used in the three conditions), its main disadvantage is that exposure to each foil inherently differs between conditions (laughter only, speech only, and speech with laughter). Exposure to each foil lasted 20 to 25 seconds (84 words), whereas it was reduced to 15 seconds in the laughter only condition. The number of foils presented in the voice line-ups was based on relevant findings from previous research in this area and on the minimum number used in real-life line-ups (e.g., Bull & Clifford, 1984; Clifford, 1983).

Procedure
Participants were tested individually. They were randomly assigned to one of the six experimental conditions (TP speech only, TA speech only, TP speech and laughter, TA speech and laughter, TP laughter only, and TA laughter only). The study took place in the auditory laboratory of the Department of Psychology. All participants were asked to listen to a telephone conversation involving information about a non-threatening crime event. They were specifically instructed to pay close attention to the voice and to details of what was being said. Admittedly, real-life witnesses are seldom prepared in this way, but given the cocktail party phenomenon (Cherry, 1953), listeners would probably pay careful attention to such a conversation because of its content. This is difficult to reproduce in experimental research other than by instructing participants to attend to peripheral details as well, such as the speech content in the present study.
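The random assignment described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a balanced allocation of 16 participants per cell (consistent with the 32 participants per line-up cue condition reported in the preliminary analyses), and the helper name `assign_conditions` is hypothetical.

```python
import itertools
import random
from collections import Counter

# The two manipulated factors of the 3 x 2 independent-groups design.
LINEUP_CUES = ("speech only", "speech and laughter", "laughter only")
TARGET_PRESENCE = ("TP", "TA")


def assign_conditions(n_participants, seed=None):
    """Return a shuffled list of (cue, presence) conditions, one per participant.

    Hypothetical helper: assumes equal cell sizes, so n_participants must be
    a multiple of the number of cells (6).
    """
    cells = list(itertools.product(LINEUP_CUES, TARGET_PRESENCE))
    if n_participants % len(cells) != 0:
        raise ValueError("n_participants must be a multiple of the number of cells")
    per_cell = n_participants // len(cells)
    schedule = cells * per_cell            # every cell repeated equally often
    random.Random(seed).shuffle(schedule)  # random allocation order
    return schedule


schedule = assign_conditions(96, seed=1)
counts = Counter(schedule)
print(len(schedule), sorted(set(counts.values())))  # 96 [16]
```

Shuffling a pre-balanced list (rather than drawing each participant's condition independently) guarantees equal cell sizes, which simple independent randomization would not.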
After listening to the event, participants completed a 5-minute filler task (i.e., a crossword). This short filler task was chosen to distract participants from the original task, as would happen in real life, while bearing in mind the time constraints of the experiment. Immediately after the filler task, participants were told that they would take part in a voice identification task. This short delay between exposure and identification was chosen to make the task as easy as possible and so avoid a floor effect across the different line-up conditions. Because the task was difficult, participants were instructed to write down any details about each voice, along with its number, while listening to the line-up, to facilitate decision-making. Sound materials different from those used during exposure were used for the identification task, to be consistent with real-life practice and thus increase ecological validity; in real life, it is impossible to recreate the original event.
The instructions also emphasized that each voice would be preceded by a number for later identification, that the line-up would be played twice (to mimic a real-life voice identification parade), that they would have to make a decision at the end, and that the target voice might or might not be present, so they did not have to choose a voice.

Preliminary Analyses
Preliminary analyses investigated whether identification performance differed between the two target voices. A chi-square analysis revealed no significant association between identification performance and target voice, χ²(2, n = 96) = 0.480, p = .830: 37% of decisions were accurate for target voice A compared with 33% for target voice B. The lack of association between target voice and accuracy was also confirmed when the conditions were examined separately: χ²(1, n = 32) = 0.237, p = .626 in the speech only condition; χ²(1, n = 32) = 0.533, p = .465 in the speech and laughter condition; and χ²(1, n = 32) = 0.125, p = .723 in the laughter only condition. Therefore, the data for both target voices were collapsed.
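Each p-value above follows from the chi-square statistic and its degrees of freedom. As a minimal sketch using only the Python standard library (the helper `chi2_p_value` is hypothetical), the closed-form upper-tail probabilities for 1 and 2 degrees of freedom can be used to check the per-condition values to within rounding; for other degrees of freedom, a library routine such as SciPy's `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def chi2_p_value(statistic, df):
    """Upper-tail p-value of a chi-square statistic, for df = 1 or df = 2 only.

    Uses closed forms of the chi-square survival function:
      df = 1: P(X > x) = erfc(sqrt(x / 2))
      df = 2: P(X > x) = exp(-x / 2)
    """
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        return math.exp(-statistic / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Per-condition target-voice checks reported above (df = 1, n = 32 each).
for label, stat in [("speech only", 0.237),
                    ("speech and laughter", 0.533),
                    ("laughter only", 0.125)]:
    print(f"{label}: p = {chi2_p_value(stat, 1):.3f}")
```

Applied to the log-linear results reported for the identification analysis (7.406 with 2 df and 6.750 with 1 df), the same helper gives approximately .025 and .009.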

Identification Performance
In TA conditions, participants can either correctly reject the line-up (hit) or incorrectly identify a foil (false alarm). In TP conditions, by contrast, an incorrect decision can be either identifying someone other than the culprit (false alarm) or incorrectly rejecting the line-up (miss). A chi-square analysis revealed no significant association between the type of incorrect decision (i.e., false alarm or miss) and the line-up condition, χ²(2, n = 26) = 0.14, p = .993. Indeed, miss rates were similar across conditions (Table 1). Based on this analysis and on the identification literature from eyewitness and earwitness studies (e.g., Memon & Rose, 2002), the decisions in the TP conditions (i.e., hit, false alarm, and miss) were collapsed into correct and incorrect responses so that TP and TA data could be compared more evenly. Admittedly, this categorization has its own limitation, as it conflates witnesses who falsely identify someone from the line-up with witnesses who are aware that they cannot remember who the perpetrator is. However, all participants in the present study reached a decision on the identification task.
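The scoring scheme described above can be expressed as a small function. This is an illustrative sketch only: the names `Decision`, `outcome`, and `is_correct` are hypothetical, and treating a pick of the TA replacement voice as a false alarm is an assumption.

```python
from enum import Enum


class Decision(Enum):
    TARGET = "picked the target"    # only possible in TP line-ups
    FOIL = "picked a foil"          # assumed to include the TA replacement voice
    REJECT = "rejected the line-up"


def outcome(target_present, decision):
    """Label a decision using the hit / false alarm / miss scheme described above."""
    if decision is Decision.TARGET and not target_present:
        raise ValueError("a TA line-up contains no target to pick")
    if decision is Decision.FOIL:
        return "false alarm"
    if target_present:
        return "hit" if decision is Decision.TARGET else "miss"
    return "hit"  # correct rejection of a TA line-up


def is_correct(target_present, decision):
    """Collapse outcomes into correct vs incorrect, as done for the main analysis."""
    return outcome(target_present, decision) == "hit"


# Hypothetical responses from four participants: (target present?, decision).
responses = [(True, Decision.TARGET), (True, Decision.REJECT),
             (False, Decision.REJECT), (False, Decision.FOIL)]
print([is_correct(tp, d) for tp, d in responses])  # [True, False, True, False]
```

Keeping `outcome` separate from `is_correct` preserves the finer hit / false alarm / miss distinction used for Table 1 while still allowing the collapsed correct/incorrect coding.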
A 3 × 2 × 2 hierarchical log-linear analysis (HILOG, backward elimination procedure) was then performed to examine the effects of target presence (TP/TA) and event (speech only, laughter only, or speech with laughter) on identification performance. Supporting the first hypothesis, there was a significant effect of event, χ²(2, n = 96) = 7.406, p = .025. Surprisingly, participants were more likely to be accurate in both the laughter only condition (53%) and the combined laughter and speech condition (53%) than in the speech only condition (16%). As hypothesized, there was also a significant effect of target presence, χ²(1, n = 96) = 6.750, p = .009: participants in the TP conditions were more accurate (46%) than those in the TA conditions (23%) (Table 2). No other factors contributed to the model.
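Hierarchical log-linear procedures evaluate model terms with likelihood-ratio chi-square statistics (G²). The sketch below is a simplified illustration rather than a reimplementation of the HILOG backward-elimination procedure: it computes G² for the independence model of a single two-way table, such as line-up cue condition by accuracy. The helper name `g_squared` is hypothetical, and the counts are made up for illustration, not the study's data.

```python
import math


def g_squared(table):
    """Likelihood-ratio chi-square G^2 = 2 * sum(O * ln(O / E)) for independence.

    `table` is a list of rows of observed counts. Expected counts E are
    computed from the row and column totals; cells with O == 0 contribute 0.
    Returns (G^2, degrees of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / n
                total += observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2 * total, df


# Made-up counts for illustration only (NOT the study's data):
# rows = line-up cue condition, columns = (correct, incorrect).
toy_table = [[12, 20], [14, 18], [7, 25]]
g2, df = g_squared(toy_table)
print(f"G^2({df}) = {g2:.3f}")
```

In a full hierarchical analysis, a term's contribution is judged by the change in G² when that term is dropped from the model, which in a three-way table need not equal the marginal two-way statistic computed here.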

Discussion
The present study indicated that participants are less likely to identify voices correctly from speech alone than from speech and laughter combined or from laughter alone. This is consistent with the first experimental hypothesis and supports the idea that laughter is an important feature enabling people to discriminate between voices. Although overall performance was similar in the two laughter conditions, laughter with speech resulted in higher proportions of correct decisions in TA line-ups only, whereas laughter only led to more accurate identification in TP line-ups.
The superiority of non-verbal vocalizations may be explained by the idea that participants were unable to discriminate on the acoustic properties of both laughter and speech at once and therefore attended to only one feature, which led to more false alarms in TP line-ups. By contrast, the combination of speech with laughter was more beneficial for correctly rejecting TA line-ups, as the additional acoustic information might have helped participants discriminate between voices.
The current findings are consistent with the existing literature concerning the variability of laughter  and the potential benefit of laughter in voice identification performance (Yarmey, 2004). This indicates that laughter is sufficiently distinctive to each individual and might carry the information necessary for recognizing unfamiliar voices. The fact that participants correctly identified more voices in the laughter only condition might be explained by laughter bouts being more difficult to stereotype than speech, in which superficial interspeaker variability such as accent might play a large role in confirmation bias (Dixon & Mahoney, 2004).
Even though the present study is exploratory and replications are necessary to further establish the impact of laughter on voice identification, these findings have major implications for the criminal justice system and earwitness identification research. It is important to note, however, that the findings need to be replicated under less favourable conditions, such as a longer time delay (as would happen in an applied setting), in order to fully assess the benefit of non-verbal vocalizations in identification decisions. Future studies using a more realistic time delay might further explore whether laughter is remembered better than speech sounds in relation to voice identification. If nothing else, the current findings clearly demonstrate the benefit of examining non-verbal vocalizations in voice identification research. They further suggest that, in real life, it might be beneficial during the interview process to establish whether laughter or any other non-verbal vocalization was uttered during the event and, if so, whether including it in a later voice identification parade might increase the reliability of earwitness performance. Clearly, the scope of this study is limited to real-life cases in which the culprit produced bouts of laughter during the initial encoding. Based on the current findings and those of Read and Craik (1995), it seems vital to recommend to the criminal justice system that stimuli similar to those of the initial exposure, and therefore of encoding, be presented at retrieval, including non-verbal vocalizations. Of course, the use of laughter bouts is not recommended for identification tasks in which laughter was absent during initial exposure, but other types of non-verbal utterances present at the initial exposure to the voice may help. This is a topic that seems worthy of further research.
Even though the benefit of incorporating non-verbal vocalizations seems evident in relation to voice identification performance, it is also important to emphasize that using both verbal and non-verbal vocalizations in earwitness identification is problematic for two reasons. First, a voice identification task that combines verbal and non-verbal vocalizations makes line-up construction more complex. It is common procedure to select foils by comparing their voice profiles with that of the suspect, usually on the basis of verbal information (Yarmey, 1991). However, given that laughter is characteristic of each individual (Knox & Mirghafori, 2007), one can ask whether a line-up that is fair to the suspect based on speech information will also be fair regarding laughter bouts. This is an inherent limitation of the present study, and the authors are aware that it may partly explain the superiority of non-verbal vocalizations on performance, although none of the voices (speaking or laughing) stood out from the others (based on the proportion of times each voice was picked out). Future studies using a more rigorous line-up construction process might be needed to fully investigate the benefit of non-verbal vocalizations on earwitness identification performance. A second problem inherent in research dealing with laughter concerns the type of laughter bouts used. Indeed, as evidenced by , each individual possesses a varied repertoire of laugh bouts. Therefore, similar types of laughter bouts are thought to be necessary during encoding and retrieval. This would require witnesses to be capable of describing laugh bouts fairly accurately, and the person in charge of line-up construction to be able to use the witness's description to identify the type of laughter to include in the identification task. Moreover, there is always some risk that the suspect might disguise his or her distinctive laugh.
Whilst this study concentrated exclusively on voiced laughter, it might be interesting to investigate further whether unvoiced laughter carries a similar level of the acoustic information necessary for speaker identification.
The present study also found that participants were more accurate when the perpetrator was present in the line-up. Although relatively high proportions of incorrect decisions were found across the verbal and non-verbal conditions, misidentification was highest in the speech only condition. This is consistent with the existing literature examining target presence (e.g., Kerstholt et al., 2004; Van Wallendael et al., 1994). Interestingly, similar proportions of correct decisions were found in the TP and TA conditions for speech with laughter line-ups. This could be explained by the idea that the combined features add discriminatory power when voices are compared. This further suggests that the more vocal features are presented, the less likely a voice identification is to be erroneous, especially when the suspect is not the culprit. However, this needs to be investigated further.
As Hollien (2002) contended, the high proportion of misidentifications in TA line-ups might be partly due to earwitnesses' belief that the perpetrator must be in the line-up, especially in real life. The obvious attempt to address this with mock and real earwitnesses has been to instruct them specifically that the "culprit's voice" might or might not be present in the line-up (Hammersley & Read, 1983; Hollien, 1996). This was done in the present study. Given the low proportion of accurate rejections of the line-up, one can ask whether witnesses are simply unable to make such a decision, regardless of the difficulty of the task, or whether the instructions given are not persuasive enough to convince them that the suspect may not be the culprit. In light of this, it seems necessary to investigate the benefit of additional instructions and/or safeguards, such as a two-trial identification task as described by Nolan and Grabe (1996).

Conclusion
The present study confirmed that voice identification is a difficult task, with identification rates below chance level when the line-up contained only verbal vocalizations (i.e., speech only), suggesting that such line-ups are limited and do not offer enough acoustic features for accurate retrieval of the previously encoded voice. Importantly, however, this exploratory study found that earwitnesses identified voices more accurately when the line-up presented the non-verbal vocalization of laughter, even when this was not combined with verbal vocalizations in TP line-ups. Although these specific findings apply only to applied earwitness identification situations in which laughter was present at the encoding stage, they provide a basis for new avenues of research. Ultimately, findings concerning non-verbal utterances in earwitness identification research may provide a clearer picture to inform the criminal justice system on the validity of including this type of information in line-ups.