Facial Expression Recognition for Non-native Speakers in Zoom

Exploring Changes in Communication between Native English Speakers (NES) and Non-native English Speakers (NNES) with the Aid of Facial Expression Recognition Feedback in Group Videoconferencing

HCI Research
UX Research
Videoconferencing Software
Master's Thesis

Duration

Aug 2022 - Aug 2023 (1 year)

My Role

Individual HCI Researcher

Software

Zoom, OpenFace 2.2.0, Delve, Figma

Methods

Quantitative & qualitative research methods

Acknowledgment

Advisor:

Dr. Heera Lee

Committee members:

Dr. Susannah Paletz
Dr. Ge Gao

Background

Videoconferencing usage has increased dramatically since COVID-19.

In multilingual teams, all members showed respect towards using English
(Gao & Fussell, 2017).

Picture source: https://backlinko.com/zoom-users

Literature Review

3 Steps That Cause Confusion in NES-NNES Conversation

STEP 1 NNES were hesitant to ask for clarification

They felt embarrassed to admit confusion in English (Van der Zwaard & Bannink, 2016).

STEP 2 Lack of verbal expression for confusion

NES also avoided comprehension checks to protect NNES' image (Van der Zwaard & Bannink, 2020).

STEP 3 NES misunderstood NNES' body language expressions of confusion

1. Confused expressions varied by different cultures (Barrett et al., 2023; Patterson et al., 2023)
2. NES attributed NNES' body language to the wrong reasons (He et al., 2017)

Is there any way to break the process from Step 1 to Step 2?

Literature Review

Meeting Tools for Addressing NNES' Hesitancy

Transcript / Subtitle

Inaccuracy (Hautasaari & Yamashita, 2014)
Cannot concentrate on both transcript and meeting (Pan et al., 2009; Pan et al., 2010)
NNES' need to repair mutual understanding may go unnoticed (Echenique et al., 2014)

Conversation Agent Giving Speaking Turns to NNES (Face-to-face; Guo & Inoue, 2019)

Even when given chances to talk, NNES may still be hesitant to ask for clarification (Van der Zwaard & Bannink, 2019)

Agent Asking for Clarification (Duan et al., 2021)

Has only a limited question database

Current tools cannot fully support NNES-NES conversation. Can we instead address the first point of STEP 3 in the misunderstanding process, "Confused expressions varied by different cultures"?

Helping NES Understand How NNES of Different Cultures Communicate

Learning how people from different cultures behave and think before communicating with them (Terui & Hishiyama, 2014).

Limitation:

Time-consuming in real-life settings

Limitations remain. Can we then address the second point of STEP 3 in the misunderstanding process, "NES attributed NNES' body language to the wrong reasons"?

A Display Providing Usage of Transcript and Dictionary

Picture: NNES' Awareness Display

A detailed display (Gao et al., 2015), showing:

· The line of the transcript NNES is checking.

· The words NNES is looking up in the dictionary.

Inspiration for my Thesis:

· Showing NES the dynamics of NNES' confusion.

Existing Confusion Reporting Tool

Existing tools asked the audience to:

1. Mark questions on the presentation slides (Glassman et al., 2015)

2. Express confusion by submitting questions (Park & Cho, 2014)

3. Mark confusion at the corresponding video content (Kim et al., 2021)

4. Report the extent of confusion via a scale (Rivera-Pelayo et al., 2013)

5. Press a comprehension-level button, with the result shown on the speaker's Google Glass (Zarraonandia et al., 2019)

Limitation:

These tools require devices (e.g., smartphones), which might interrupt meeting participation (Park & Cho, 2014)

Body-movement Sensor & Biosensor Technology

  • Alert posture measured by chair pressure pads (D’Mello & Graesser, 2010)
  • Electroencephalography (EEG; Benlamine & Frasson, 2021)

Limitations

External devices need to be purchased and set up.

Comparison result:

A facial expression recognition tool can detect NNES' confusion non-intrusively, without external devices.

Objective

I investigated how NES and a facial expression recognition tool identified NNES' confusion during video conferencing group meetings, and how the awareness of NNES’ confusion affected the communication approaches NES adopted in subsequent video conferencing meetings.

Research Questions

RQ 1: To what extent do NES and a facial expression recognition tool recognize NNES' confusion in communication during video conferencing?

RQ 2-1: How does the awareness of NNES’ confusion, as perceived by NES during video conferencing, affect the communication approach of NES when interacting with NNES?

RQ 2-2: How does the awareness of NNES’ confusion, identified by the facial expression recognition tool, affect the communication approach of NES when interacting with NNES?

Participants

Study Procedures

Procedures were adapted from He et al. (2017)

Detect Faces & AU

In my thesis:

I used OpenFace to process the AUs of confused faces of NNES (Baltrušaitis et al., 2018)

Action Units (AU)

  • Represent the facial muscle movements that create facial expressions (Ekman et al., 2002)
  • This could be used to code and identify specific facial expressions

Which AU to use to code Confused Faces?

Cultural differences exist, but an International Core Pattern (ICP) of AUs was found for the expression of confusion (Cordaro et al., 2018)

>> test out whether ICP works on my NNES participants' ethnicities (e.g., South Asian and East Asian)

ICP of Confused Faces

(Ekman et al., 2002)

AU 4  Brow Lowerer

AU 7  Lid Tightener

AU 55 or 56  Head Tilt Left / Right
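
The ICP above could be checked frame by frame along the following lines. This is a minimal sketch, not the thesis's actual pipeline. It assumes OpenFace-style CSV columns (`AU04_c` and `AU07_c` for AU presence, `pose_Rz` for head roll in radians), which should be verified against the OpenFace version in use. The rule that all three cues must co-occur and the 10-degree tilt threshold are illustrative assumptions, and the sample rows are made up.

```python
import csv
import io
import math

# Hypothetical threshold: minimum absolute head roll (in degrees) counted as a tilt.
TILT_THRESHOLD_DEG = 10.0


def is_icp_confused(row, tilt_threshold_deg=TILT_THRESHOLD_DEG):
    """Return True if one frame shows AU 4, AU 7, and a head tilt to either side."""
    brow_lowerer = float(row["AU04_c"]) >= 1.0   # AU 4 present
    lid_tightener = float(row["AU07_c"]) >= 1.0  # AU 7 present
    roll_deg = math.degrees(float(row["pose_Rz"]))
    head_tilt = abs(roll_deg) >= tilt_threshold_deg  # AU 55 or AU 56
    # Assumption: all three cues must co-occur in the same frame.
    return brow_lowerer and lid_tightener and head_tilt


def confused_frames(csv_text):
    """Return the frame numbers that match the ICP rule."""
    # skipinitialspace strips the blank that follows each comma.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return [int(row["frame"]) for row in reader if is_icp_confused(row)]


# Made-up sample rows, only to exercise the rule.
SAMPLE = """frame, AU04_c, AU07_c, pose_Rz
1, 1.0, 1.0, 0.30
2, 1.0, 0.0, 0.30
3, 1.0, 1.0, 0.02
4, 1.0, 1.0, -0.25
"""

print(confused_frames(SAMPLE))  # [1, 4]
```

Frames 2 and 3 are rejected because AU 7 is absent and the head is nearly upright, respectively. A stricter or looser rule (for example, any two of the three cues) would change which frames are flagged.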

Timeline Charts

NC's Charts include annotations by:

  • NNES
  • NC

NE's Charts include annotations by:

  • NNES
  • NE
  • Tool

Data Analysis

NES and Tool's Accuracy Calculation

I calculated accuracy as the true positive rate (TPR) from the confusion matrix (Buolamwini & Gebru, 2018)

Picture: Confusion matrix (Amin & Yan, 2011)
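
As a worked illustration of the TPR calculation, the sketch below computes TPR = TP / (TP + FN). It assumes that NNES' self-reported confused moments serve as the ground truth and that annotations are compared as sets of time-window indices. Both choices, the function name, and the numbers are hypothetical and not taken from the thesis data.

```python
def true_positive_rate(reported, identified):
    """TPR = TP / (TP + FN), over the moments NNES reported as confused."""
    reported = set(reported)
    identified = set(identified)
    if not reported:
        raise ValueError("TPR is undefined when no confusion was reported")
    true_positives = len(reported & identified)   # reported and identified
    false_negatives = len(reported - identified)  # reported but missed
    return true_positives / (true_positives + false_negatives)


# Made-up time-window indices, only to show the arithmetic.
nnes_reported = {2, 5, 9, 12}
tool_identified = {2, 3, 9}
nes_identified = {5, 9, 12, 14}

print(true_positive_rate(nnes_reported, tool_identified))  # 0.5
print(true_positive_rate(nnes_reported, nes_identified))   # 0.75
```

TPR ignores false positives (windows 3 and 14 here), so it measures only how many of the reported confused moments were caught.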

Thematic Analysis of Chart Review Interviews and Post-interviews

I used Braun and Clarke's (2006) thematic analysis method

  1. Create initial codes
  2. Group codes into themes
  3. Produce narratives including themes and quotes

Results & Discussion

MSCEIT Questionnaire Result

NE's Positive-negative bias score was higher than NC's

This means NE tended to read emotions in a positive way.

Accuracy Calculation

Comparison between NES and Tool's Accuracy of Identifying NNES’ confusion

Answering RQ 1 - To what extent do NES and a facial expression recognition tool recognize NNES' confusion in communication during video conferencing?

  • There is no significant pattern in whether NES or the tool has higher accuracy
  • But NES and the tool show different trends
    ◦ Both NES groups show increasing trends
    ◦ The tool's accuracy fluctuated

Thematic Analysis Results

Increased NES’ Accuracy

Answering RQ 2-1 - How does the awareness of NNES’ confusion, as perceived by NES during video conferencing, affect the communication approach of NES when interacting with NNES?

  • NES found out where their recognition was wrong and adjusted their awareness and behavior
    · “(NN4) was… blinking a lot and staring a lot at the screen. So that is 2 indication, like, you're trying to do something, and you kind of getting confused on how to do it.” (NE2)

Discussion

Fluctuated Tool's Accuracy

NNES' Various Definitions of Confusion

  • E.g., some NNES interpreted thinking about what to say as confusion, but others did not

Limited Universality of International Core Patterns (ICP)

  • This study suggests that facial expressions may vary with individual or cultural differences

Thematic Analysis Results & Discussion

Reduced Dependence on Information Provided by the Tool

Answering RQ 2-2 - How does the awareness of NNES’ confusion, identified by the facial expression recognition tool, affect the communication approach of NES when interacting with NNES?

Assumption:

  • NE tended to trust NNES' self-reports when they were shown together with the tool's data

    • If the tool were applied in real-life situations, NES would depend fully on the tool's data

Thematic Analysis Results

Helpfulness of Various Types of Cues

Compared with the tool, NES can judge based on more cues: verbal cues, non-facial body language, and contextual cues, leading to higher accuracy.

Accuracy Calculation

Individual NE and NC's Accuracy Trend

NC's accuracy is always higher than NE's.

MSCEIT Questionnaire & Thematic Analysis Results

Higher Accuracy of NC over NE

MSCEIT Questionnaire Result:

  • NE tended to read emotions in a positive way.

Discussion:

  • NES consider confusion negative, so NE, who tended to read emotions positively, were more likely to identify confused faces as non-confused.

Thematic Analysis Results

NNES Realized Confused Moments in Post-interview

  • NNES might have overlooked their confusion when it happened, but realized it while reviewing timeline charts with annotations by NES or the tool.
  • Their incomplete reports could mislead NES.
  • Therefore, it is important to provide correct confusion information to NES.

Potential Facial Expression Recognition Features

Anonymity

Ask for clarification anonymously.

Use as an Assistive Technology

Develop it to assist people with attention deficits.

Automatic Alert System

Real-time notification of teammates' confusion.

Ethical Concerns of the Tool

Racial Equity

(Buolamwini & Gebru, 2018)

  • No racial biases were found in my study between South Asian and East Asian NNES.
  • Support research on facial expression recognition tools across various races.

Real Human Interaction

  • Preserve natural human interaction when developing technological interventions

Limitations and Future Work

Bias and Validity Limitations

Self-selection Bias

Bias in self-identifying NES or NNES

Participant Bias

Participants could guess the study's purpose from its title

Cultural Bias

Assuming that NNES will experience confusion in English

Small Sample Size

The sample size was too small for the results to be considered valid

Single Coder

Having a single coder for the thematic analysis reduces validity

Future Work