Suggested Citation: "Methods." Lam, M., A. Falcon, and N. Merhill. 2023. Approaches to the Evaluation of Sexual Harassment Prevention and Response Efforts. Washington, DC: The National Academies Press. doi: 10.17226/27267.

Methods

Many different evaluation methods and tools can be used to assess the effectiveness of an intervention, and selecting the appropriate one depends on a variety of factors, including the specific goals and objectives of the intervention, the resources available for the evaluation, and the target audience for the findings. The first step is to clearly define what is being measured and why, and then consider the relevant strengths and weaknesses of evaluation approaches. Because most change is incremental, phased measurement is important for determining the effectiveness of the intervention and whether some elements need to be adapted or discontinued so that the desired goal can be reached (Perry, 2020). To this end, Nation et al. (2003) indicate that ongoing evaluation is necessary to maintain the effectiveness of interventions over time.

Key considerations when selecting an evaluation method or tool include the following:

  • Formulate the evaluation questions that the evaluation is intended to answer. What does your institution want to learn about a particular program? What are the goals of the intervention? Consider making the questions specific, measurable, and bound by a reasonable time frame.8
  • Identify the target audience for both the intervention and the evaluation findings. Some methods are more or less appropriate for use with certain populations of interest, and similarly, the audience for your findings may expect certain data or figures.
  • Consider the availability of resources including time, money, and personnel. For those with a fixed budget, limited resources may inform the evaluation methods used. For those without a fixed budget, the needs and goals outlined in a proposed evaluation plan may inform the resources allocated.
  • Consider the feasibility of implementing the method or tool given the resources and constraints of the evaluation.
  • Contemplate the ethical issues that could arise from implementing particular methods or tools, including confidentiality, consent, and potential harms.

Approaches to program evaluation and frameworks for sexual harassment interventions have been previously described by Perry (2020), Crusto and Hooper (2020), and Soicher and Becker-Blease (2020). This section builds on that work by discussing a subset of potential methods and considerations for use.

Survey Methods

Surveys, which can be used to collect both quantitative and qualitative data, are one of the most commonly used study methods. They may include the more traditional mail-in, paper copy questionnaires; the popular

___________________

8 For example, one question for a prevention program might be “Does this program reduce rates of reported sexual harassment 6 months after implementation?” One question for the addition of more informal reporting mechanisms may be “Does this procedural change result in more requests for supportive measures in the 2 years after launch?”


online self-administered questionnaires; the more expensive interviewer-administered questionnaires; and the multimethod Computer Assisted Telephone Interviewing (CATI). The style of questions included in the survey may also vary broadly from open-ended paragraph responses (e.g., “Please describe your experience with Student Affairs.”) to fixed check-boxes (e.g., “Please select the option that best describes your experience with Student Affairs.”) to interval numerical responses (e.g., “How many times in the past year have you interacted with Student Affairs?”). Similarly, research has shown that survey participants will respond differently to questions describing perpetrating sexual misconduct as opposed to ones that simply ask whether they have perpetrated sexual violence (Edwards et al., 2014; Koss, 1998). Some of the subsequent methods described in this section may be considered part of the survey “umbrella,” but are pulled out specifically to provide more information about their usefulness to the field of sexual harassment evaluation.

Surveys are frequently used to examine the prevalence of sexual harassment, related factors, and institutional climate and culture,9 “but they are not well suited to building a deep, more personal knowledge of a given topic” (Guest et al., 2013, p. 27). The use of surveys in the evaluation of sexual harassment interventions can take numerous forms, including as part of a needs assessment (e.g., paired with text analysis and focus groups) to determine where best to direct and focus programming and interventions, and as a means of assessing efficacy by comparing results between surveys that were administered both pre- and post-intervention (referred to as a pre-post survey). Pre-post surveys are frequently used before and after trainings (such as bystander intervention education) to determine changes in knowledge, attitude, perceived competence, readiness/preparedness, and/or confidence. Surveys that are repeatedly administered over time (e.g., quarterly, annually) provide longitudinal information about changes or trends in a population, and can be one way of continuously evaluating programming. For interventions with long-term goals, such as institutional climate change, longitudinal surveys that can capture trends over the course of several years are more appropriate than simple pre-post surveys.
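As an illustration of the kind of data a pre-post design produces, the sketch below summarizes hypothetical knowledge scores from the same respondents before and after a training; the scores, sample size, and scale are invented for the example.

```python
from statistics import mean, stdev

# Hypothetical knowledge scores (0-100) for the same eight respondents
# measured before and after a bystander-intervention training.
pre = [62, 55, 70, 48, 66, 59, 73, 51]
post = [71, 60, 78, 55, 70, 64, 80, 58]

# Paired differences: positive values indicate improvement.
diffs = [b - a for a, b in zip(pre, post)]

print(f"mean change: {mean(diffs):.1f} points")
print(f"sd of change: {stdev(diffs):.2f}")
print(f"improved: {sum(d > 0 for d in diffs)} of {len(diffs)} respondents")
```

A paired significance test or mixed-effects model would normally accompany such descriptive statistics; the sketch only shows the paired data shape that distinguishes a pre-post design from two independent samples.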

Surveys have the potential to be one of the easier methods to use because validated surveys10 exist, they can be conveniently administered with online tools (e.g., Qualtrics, REDCap, Voxco, or SurveyMonkey), and the data can often be exported directly to aggregate files such as XLSX or CSV11 to be used in analysis. This makes them a good choice for evaluation designs that prioritize conserving time and budget. However, even though surveys have the potential to be less labor intensive than other methods, tailoring them to unique campus needs or investing in rigorous analysis of the data can quickly expand a budget. Survey revisions that change the survey structure, including changes to item or section display logic and to the patterns in which items or sections appear, will greatly increase the time needed to develop and finalize the survey instrument.
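To illustrate why direct CSV export eases analysis, a few lines of standard-library Python can tally a survey export; the column names and responses below are invented stand-ins, not fields from any real platform.

```python
import csv
import io
from collections import Counter

# A tiny stand-in for a survey platform's CSV export.
export = io.StringIO(
    "respondent_id,role,q1_experience\n"
    "1,student,agree\n"
    "2,staff,disagree\n"
    "3,student,agree\n"
    "4,faculty,neutral\n"
)

rows = list(csv.DictReader(export))
by_answer = Counter(row["q1_experience"] for row in rows)
print(by_answer.most_common())  # tallies per response option
```

The same pattern scales to a real export file by replacing the `io.StringIO` object with `open("export.csv")`.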

When opting to design a tailored survey, either inhouse or in collaboration with a consulting firm, it is important to work with subject-matter experts in the development and validation of the survey instrument.12

___________________

9 For discussion of surveys in sexual harassment, see Chapter 2, “Sexual Harassment Research,” in Sexual Harassment of Women (NASEM, 2018); see also the Guidance for Measuring Sexual Harassment Prevalence Using Campus Climate Surveys (Merhill et al., 2021).

10 “Validated surveys” refers to surveys that have been specifically designed and tested to correctly measure what they intend to measure, meaning that institutions do not have to reinvent the wheel and design their own surveys; they may simply select an existing one that has already been tested.

11 XLSX and CSV are file formats used to store data; the former can store complicated data that may include formatting and calculations, while the latter can only store text, but can be opened in a wider variety of programs.

12 For more information on the importance of involving subject-matter experts in survey design, see Chapter 2 of Sexual Harassment of Women (NASEM, 2018).


For example, research has demonstrated that asking behavior-based questions (e.g., “Have you ever been in a situation where a supervisor or coworker attempted to discuss sex?”) is more effective at gathering accurate information about rates of harassment than simply asking whether a respondent has experienced sexual harassment. The latter phrasing is problematic because it relies on legal terms that can carry a negative stigma. Working with experts in subject matter and survey design or using existing, validated surveys (e.g., the “Sexual Experiences Questionnaire, or SEQ,” Fitzgerald et al., 1995; the “Administrator-Researcher Campus Climate Collaborative, or ARC3, Questionnaire,” Swartout et al., 2019) helps ensure that the final survey instrument has content validity and reliability (Wood et al., 2017).

Survey data related to sexual harassment is now a component of compliance with the campus climate survey requirements of the Violence Against Women Act Reauthorization Act of 2022 (S. 3623). This will likely increase the popularity of an already frequently used evaluation tool. Because surveys are used ubiquitously, survey fatigue—disinterest in participating in or completing an assessment—is a common concern (Porter et al., 2004). This is particularly relevant when trying to gather information from people who have experienced sexual harassment, including sexual assault, as thinking about or discussing the subject can often be emotionally and psychologically draining. Taking proactive steps to limit survey fatigue is an important consideration that can maximize survey response rates and data quality (Driver-Linn and Svensen, 2017). These steps may include timing the field period to avoid other competing surveys (e.g., orientation is a densely surveyed time for students and faculty/staff at colleges and universities), as well as keeping surveys as short as possible (Driver-Linn and Svensen, 2017; Porter et al., 2004). To limit the length, survey developers can focus on the most essential measurements and use skip patterns and display logic to avoid asking respondents to answer questions that do not apply to them. Offering desirable incentives (e.g., gift cards) for completion may further counter survey fatigue by providing respondents with extrinsic motivation.13
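Display logic of the kind described above amounts to a predicate over earlier answers: a question is shown only if the rule attached to it evaluates true. The sketch below uses hypothetical question identifiers; real survey platforms implement the same idea through their own rule-builder interfaces.

```python
# Each rule maps a question to a predicate over the answers given so far;
# the question is displayed only if its predicate returns True.
display_rules = {
    "q2_followup": lambda a: a.get("q1_witnessed") == "yes",
    "q3_reporting": lambda a: (
        a.get("q1_witnessed") == "yes"
        and a.get("q2_followup") != "prefer not to say"
    ),
}

def should_display(question, answers):
    """Questions without a rule are always shown."""
    rule = display_rules.get(question)
    return rule(answers) if rule else True

answers = {"q1_witnessed": "no"}
print(should_display("q2_followup", answers))      # False: respondent skips it
print(should_display("q4_demographics", answers))  # True: no rule attached
```

Structuring skip logic this way keeps respondents who answered "no" to a gating question from seeing follow-up items that do not apply to them, which is one concrete lever for limiting survey length and fatigue.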

Interviews

Interviews are one of the most commonly used qualitative methods, and are especially well suited for examining beliefs, attitudes, behaviors, and experiences. They may be structured, semistructured, or unstructured; vary in length; and be used at any point in program evaluation to identify or examine specific areas of interest (Bush et al., 2019). They can also facilitate an understanding of the distinct needs of groups that are underrepresented, which analysts could then incorporate into the framework of other interview methods (e.g., focus groups, discussed later), research instruments, and evaluation plans. A semistructured or unstructured approach, when trained staff use “open-ended (though not necessarily unscripted) questions, which are followed up with probes in response to participants’ answers” (Guest et al., 2013, p. 5), would allow for in-depth and personal exploration of a topic and may help uncover complexity or nuance not captured by other methods.

___________________

13 Survey fatigue can also be considered during data analysis. For example, respondents who consistently select the answer on the far left may not be providing answers that reflect their true thoughts or feelings. Thoughtful survey design, such as planned missing design, can prevent or mitigate such careless responding and increase data validity. For more information on careless responding (also known as straightlining), see Ward, M. K., and A. Meade. 2022. Dealing with careless responding in survey data: Prevention, identification, and recommended best practices. Annual Review of Psychology 74. doi: 10.1146/annurev-psych-040422-045007. For more information on planned missing design, see Little, T. D., and M. Rhemtulla. 2013. Planned missing data designs for development researchers. Child Development Perspectives 7:199–204. https://doi.org/10.1111/cdep.12043.


Because of the nature of interviews as noted above, several important considerations pertain to the use of interviews for evaluating sexual harassment interventions, for example, the social identities of the interviewer and how they relate to those of the participant. A power differential between the participant and the interviewer based on a perceived or existing hierarchical relationship can affect the sense of safety in discussing difficult topics such as sexual harassment (Devotta et al., 2016), and shared community membership, such as racial, ethnic, or gender identity, may increase comfort (Dwyer and Buckle, 2009). For more information on power differentials in higher education related to finances, citizenship, career, race and ethnicity, gender, sexual orientation, family status, and health status, see “Preventing Sexual Harassment and Reducing Harm by Addressing Abuses of Power in Higher Education Institutions” (Kleinman and Thomas, 2023).

For interviews about the topic of sexual harassment in particular, the decision of any participant to disclose information to the interviewer may be affected by the participant’s sense of safety and control. Given the potential for disclosure of reportable or traumatic experiences, it behooves the interviewer to discuss any requirements to report cases of harassment or assault, limits and expectations of confidentiality, and referral to support services in the interview guide, and to have absolute transparency about these obligations with the participant before the start of the interview, during the informed consent process. While participants may choose not to share certain details with interviewers who could be compelled to report those details to a Title IX office, explaining the obligations of the interviewer will allow the respondent to make informed choices about how to disclose sensitive information and reduce the likelihood of harm or undue burden.

Experts in qualitative methods and trauma-informed interviewing should be involved in the design of the interview guide and script to avoid triggering feelings of guilt, shame, or distress in the respondent. Researchers may also consider referencing existing trauma-informed interviewing guides14 to assist in the design of scripts and training of interviewers. Training for interviewers is important to ensure that those involved in the administration of the questionnaire or interview guide understand what topics or phrases to avoid and how to safeguard the well-being of their respondents. This is particularly true for semistructured or unstructured interviews in which lines of questioning are pursued outside of a carefully vetted script. While this freedom can result in unexpected insights, it also requires knowledgeable, considerate, and well-trained interviewers.

Because using interviews requires trained staff to both conduct the interviews and clean and analyze data that is often complicated and subjective, this method can be time-consuming and expensive, particularly if respondents receive compensation for their time. When deciding whether or not to use interviews, the research team may want to consider both the opportunity to obtain novel, complex insights and the burden of implementation.

___________________

14 Some of these guides are Putting Women First: Ethical and Safety Recommendations for Research on Domestic Violence Against Women, World Health Organization, 2001; “A Guide to GPRA Data Collection Using Trauma-informed Interviewing Skills,” Substance Abuse and Mental Health Services Administration, 2015; “The Blueprint for Campus Police: Responding to Sexual Assault,” Busch-Armendariz et al. 2016; “Title IX investigations: The importance of training investigators in evidence-based approaches to interviewing,” Meissner et al. 2019; and “What Will It Take? Promoting Cultural Change to End Sexual Harassment,” UN Women, 2019.


Exit Interviews

Exit interviews are a specific subset of the interview method, performed at points of transition to gather information from students or employees who are leaving the institution. They often follow a predefined structure and are completed with either every individual leaving an institution—as with the Lawrence Berkeley National Laboratory (2022) Exit Survey Follow Up—or only those who opt in to the exit interview process. They can be used to identify and address aspects of hostile environments within a division or department, unsatisfactory supervisors, or problematic policies and practices. Because many organizations already perform exit interviews to gather information about retention and attrition, a researcher may be able to examine the structure and style of an existing exit interview and determine whether there is an opportunity to gather information about sexual harassment without changing the interview's purpose or compromising other important information; for the same reason, this method may be particularly cost-effective.

The considerations related to interviews described above still apply, but several additional considerations are pertinent to sexual harassment exit interviews. Individuals who are leaving the institution may feel more comfortable speaking freely about their experiences compared with those who are maintaining an affiliation with the institution. This is particularly important when an illegal, uncivil, or unethical practice is the driving force behind a student or employee’s departure. Moreover, their reasons for leaving are most salient at the point of departure and not tempered by time or distance from the situation, so more relevant details may be captured than from those who have long since departed their institution. Additionally, if the interview is successfully conducted with all separating employees and/or students, interviewing everyone may surface information that would not be captured by interviewing only a subset.

Importantly, exit interviews are most often conducted in a human resources (HR) environment as opposed to a research environment. As with other kinds of interviews, the information-gathering method, purpose, and context are all important to the experience of the respondent. Interviews performed for research or evaluation purposes are often aggregated to protect confidentiality and are typically performed by someone who is not known to the participant. Conversely, exit interviews are usually performed for the purposes of workplace improvement (not academic research) and are likely to be performed by someone known to the participant, such as a colleague, supervisor, or HR staff member. Additionally, the information gleaned is usually shared with leadership in the division, and therefore individuals may not feel as comfortable divulging all of their experiences. Furthermore, exit interviews conducted by personnel within the division introduce an unavoidable power differential, which may limit disclosure. Although many individuals who are leaving an institution may be willing to discuss their true reasons for departure, early-career individuals and others who rely on references for future work may not feel as free to openly discuss challenges (Hinrichs, 1971, and Yourman, 1965, in Webster and Flint, 2014).

As with all evaluations of sexual harassment-related interventions, an institutional review board (IRB) should be consulted to ensure that participants are not subjected to undue discomfort or risk. When utilizing exit interviews that are conducted through traditional HR means, IRBs may be particularly concerned with the confidentiality of participants, the privacy of their responses, and the power differentials of a work environment.


Focus Groups

Focus groups are a qualitative interview method involving facilitated conversations performed with a small group, often 8 to 12 participants (Stewart and Shamdasani, 1990). Participants, who are recruited based on specific identities, traits, or experiences, respond to a set of questions and engage in conversation on a specific topic. This method allows for dynamic interactions between participants and may uncover information that might not emerge during the course of individual interviews (Jarvis and Barberena, 2007).

Focus groups can be used before program implementation to identify target areas and guide the selection of appropriate interventions, or the design of other methods of data collection. As discussed in Box 4 of the Guidance for Measuring Sexual Harassment Prevalence Using Campus Climate Surveys document (Merhill et al., 2021), Harvard University used focus groups to modify its approach to collecting data on sexual harassment prevalence rates, determining that students were more likely to act on emails sent in the morning compared to midday and were likely to engage with Facebook posts by their peers. Soteria Solutions (2020) also used focus groups before developing its Workplace Culture Survey, which led it to embed the concept of Navigational Identities in the survey for use with the National Oceanic and Atmospheric Administration (NOAA). Navigational Identities include “sexual orientation, gender identity, race, age, caretaker status, veteran status, disability status and political views/affiliation” and “will provide important information on how individuals from different identity groups experience the NOAA workplace environment” (Soteria Solutions, 2020, Current Status section).

Focus groups are also often used following a pilot phase to determine whether the intervention met its objectives and goals, how it was experienced and received, and whether individual elements need to be adapted or discontinued. The University of California, Berkeley (2022) utilized focus groups to reflect on the first two phases of its #WeCARE campaign, an intervention that used physical banners on campus and social media messaging to share information about bystander intervention. These focus groups revealed that men “did not respond to the campaign messaging as positively as people of other genders” (para. 2), including a disbelief in the data that were being shared. This led the team to tailor a third phase of the campaign to men, emphasizing stories of men intervening in potentially harmful scenarios instead of emphasizing the data.

Focus groups are especially useful for identifying the needs and experiences of underrepresented groups, particularly when piloting an intervention for which such a group is the target audience or intended beneficiary. As with interviews generally, the social identity of the facilitator and how it relates to the identities of the participants affect the perception of safety and potentially what is discussed. Holding identity-based focus groups with a facilitator from the same community is one way to create a safer space for sharing. Unlike one-on-one interviews, however, focus groups can generate feedback from many individuals at once, using only one facilitator, one observer,15 and a single time slot. This can be useful when “quick turnaround is critical, and funds are limited” (NSF, 1997).

The benefits of focus groups are best realized when they are facilitated by skilled interviewers, who are trained to allow space for the participants to explore ideas, while probing deeper on meaningful insights

___________________

15 Because focus groups are an opportunity for group observation, it may be useful to have a staff member assigned to the group whose only role is to observe tone of voice, pauses in conversation, signs of physical discomfort or excitement, and other features of the conversation that cannot be captured by a transcript alone.


generated by the group. When successfully facilitating discussions about sexual harassment issues, an interviewer would also be able to discern when an interview with a single respondent may be more appropriate. Depending on how direct and detailed the conversation on harassment is, respondents might be unwilling to discuss their experiences and opinions candidly in a group.

Focus groups cannot provide the same level of confidentiality as one-on-one interviews. Experiences and stories are shared with the group and a facilitator, which may inhibit participants from disclosing challenging information. Clearly addressing expectations about confidentiality and privacy before dialogue begins is of utmost importance to engender trust and maintain the integrity of the session. The design of any focus group on the topic of sexual harassment should also include discussion of reporting requirements, resources for support, and any limits on confidentiality. To this end, involving subject-matter and focus group experts is important when preparing the interview guide and script and creating facilitator training.

Regardless of the flow of conversations during a focus group session, the interviewer’s efforts to support participants discussing content related to sexual harassment could include ending the session with questions that are strength based or future oriented and providing resources for support.

Experience Sampling

Experience sampling, sometimes referred to as the daily diary method or ecological momentary assessment, measures an individual’s ongoing experience by collecting data across multiple time points. There are several approaches to data collection (Sather, 2014). For example, participants may be prompted to respond to an assessment based on a set of predetermined dates or times, such as every other week on Wednesday or daily at noon. Alternatively, they may complete assessments using an event-contingent approach, where they respond when something specific happens. This type of assessment is often used to measure the effect of an intervention on a specific population, such as women’s experiences of sexual harassment before, during, and after a prevention program or training (Conner, 2015). Ecological momentary assessments can be repeated daily, weekly, or monthly, providing real-time information on an intervention or experience.
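An interval-contingent schedule such as "daily at noon" can be generated mechanically. The sketch below, with a hypothetical start date and study length, produces the kind of prompt list a reminder system (app notification, email, or text message) would consume.

```python
from datetime import datetime, timedelta

def daily_prompts(start, days, hour=12):
    """Return one prompt timestamp per day at the given hour,
    beginning on the date of `start`."""
    first = start.replace(hour=hour, minute=0, second=0, microsecond=0)
    return [first + timedelta(days=i) for i in range(days)]

# Hypothetical 15-day daily-diary protocol starting September 4, 2023.
prompts = daily_prompts(datetime(2023, 9, 4), days=15)
print(len(prompts))     # 15 prompts
print(prompts[0].hour)  # each fires at noon
```

An event-contingent design, by contrast, has no fixed schedule: the participant initiates an entry when the event of interest occurs, so the tooling only needs to timestamp submissions.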

Assessments can be quantitative, qualitative, or physiologic, and may be documented during or after an event depending on the design. This type of sampling is most useful as an evaluation method during an intervention itself because it systematically assesses experience in real time and in someone’s natural environment. Unlike many of the other methods, it can provide data close in time to an experience; however, it is typically very resource intensive and difficult to sustain over long periods of time (Shiffman et al., 2008).

The different ways to perform experience sampling vary by cost and sophistication. Data collection can include assessments completed online; via an app; or using quick response, or QR, codes. Assessments can be linked to wireless-enabled wearable technology, such as smartwatches and devices with reminders, which can collect data at predetermined times or remind participants to complete assessments at specific times. Emails, text messages, phone calls, or alarms can also be used to prompt participants to complete paper assessments or document an experience in a journal. Expertise in longitudinal data analysis is necessary when using this approach because multiple time points are being analyzed. Statistical correction for multiple comparisons with repeated measures, as well as the handling of missing data, needs to be carefully considered as part of the analysis plan.
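As one illustration of correcting for multiple comparisons, the Holm-Bonferroni step-down procedure can be applied to the p-values produced by several repeated-measures tests. The p-values below are invented for the sketch; a real analysis plan would specify the tests and correction method in advance.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a reject/fail decision for each p-value under Holm's
    step-down procedure, which controls the family-wise error rate."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    m = len(p_values)
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Four hypothetical time-point comparisons from one repeated-measures design.
print(holm_bonferroni([0.001, 0.04, 0.03, 0.20]))
# -> [True, False, False, False]: only the smallest p-value survives correction
```

Holm's procedure is uniformly more powerful than a plain Bonferroni correction while providing the same family-wise error control, which is why it is a common default for a modest number of planned comparisons.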

The University of Massachusetts Lowell uses experience sampling as part of its Making WAVES program. One of the program goals is to “disrupt subtle bias and microaggressions” (UMass Lowell, 2022a). To accomplish this, the program has a microaggressions blog, bystander intervention education, and a daily bias survey. Faculty members complete a baseline questionnaire followed by daily surveys for 15 consecutive days to capture witnessed and personal microaggressions (UMass Lowell, 2022b). The Sexual Health Initiative to Foster Transformation, or SHIFT, at Columbia University used daily diaries to assess the associations between sexual behavior, mental health, stress, and substance use in undergraduates for 60 days (Hirsch and Khan, 2020). Columbia University is using data from these assessments to identify individual, social, and structural “risk and protective factors affecting sexual health and sexual violence” (Mellins et al., 2017).

Map Marking

Map marking is a form of environmental and situational data collection that identifies associations between actual events or perceptions of safety and the physical and social environments in which they occurred. It uses a combination of spatial analysis and mapping programs, and requires location data and a base map. An expert in map marking (or hotspot mapping, as it is sometimes called) is necessary for analysis and interpretation, most importantly because of the difference between incident-based and perception-based map marking.

In incident-based map marking involving matters of sexual harassment, participants are asked to mark the location of incidents of harassment or assault. While accurate recall may be a concern, the goal of this approach is to identify areas in which incidents are clustered and tailor interventions to those locations. For example, if incidents are repeatedly occurring in a cluster of off-campus housing, a research team may look into the residents of that particular location and tailor a prevention program to them. The team could also work with the community to install more consistent lighting around that area, or offer campus security escorts to and from the buildings. If incidents are clustered within a particular set of office buildings, researchers may consider removing couches from private offices located there. This approach can also be used to evaluate the effectiveness of a particular intervention by monitoring whether reported cases have decreased or have simply shifted to a new location.
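A minimal sketch of the clustering step, assuming incident locations have already been geocoded to campus-map coordinates: incidents are binned into grid cells and cells are ranked by count. The coordinates below are hypothetical, and a real analysis would use GIS tooling and the expert analyst noted above.

```python
from collections import Counter

def hotspot_cells(points, cell_size=100.0):
    """Bin (x, y) incident locations into square cells of side `cell_size`
    and return cells ordered from most to fewest incidents."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return counts.most_common()

# Hypothetical campus-map coordinates (meters) of reported incidents.
incidents = [(120, 340), (130, 355), (125, 349), (610, 90), (615, 95), (980, 400)]
print(hotspot_cells(incidents))  # the densest cell ranks first
```

Re-running the same binning after an intervention makes the monitoring question concrete: a decline in a cell's count suggests fewer reported incidents there, while a rise in a neighboring cell may indicate displacement rather than prevention.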

Most institutions have some degree of existing data on the locations of these incidents because of Clery Act Annual Security Reports (Mahoney et al., 2022). These data are likely limited, as official reports comprise a small proportion of overall incidents, and not all reports include information about location. Additionally, the information available to the public may not be specific enough to determine geographically tailored interventions. However, analyzing existing data is an accessible way of starting conversations about environmental and situational data.


With perception-based map marking, participants are asked to mark locations they think are unsafe or risky or in which they personally felt unsafe or at risk. Perceptions of risk and lack of safety may not translate to actual danger, so the ability of these data to inform approaches or evaluate interventions is limited. Information generated from perception-based data alone may be unable to shed light on the real effect of interventions, and these reports may perpetuate implicit biases about spaces and people. At a middle school in Colorado, students used map marking to identify perceived safe and unsafe places on campus. In response, privacy curtains were improved in the girls’ locker room after students identified feeling unsafe because the curtains failed to cover the doorway and people could potentially see in (Fansler Boudreau, 2020). At Williams College, map marking was used to identify campus social spaces associated with gender-based violence, including parties (Rider-Milkovich and Meredith, 2018). Some ideas generated in response to these data included placing DJs at the center of the party to disrupt the dance floor and establishing sober spaces in party locations. It is unclear whether these interventions resulted in decreased incidents of harassment and assault, or simply addressed perceptions of safety and risk.

Researchers can use map marking to examine increases or decreases in incidents of sexual harassment and/or perceived areas of risk in specific locations on campus, inform responsive actions such as safety measures, and identify additional needs for resources or programming. For example, they could monitor residence halls, advocate for increased lighting on campus, pinpoint locations that need blue light emergency call buttons, and get a clear idea of how the social environment shifts over time. However, contextual information about locations should be considered when thinking about how to use the information gathered. For example, clustering of issues on a particular residence hall floor may be the result of the people currently residing there and therefore may vary year to year, perhaps making a targeted, temporally restricted intervention more effective.

Map marking could also be used to identify possible discrimination based on race, ethnicity, gender identity, sexual orientation, professional hierarchy, or other identities so that targeted interventions could be implemented in key locations. It could also be used to examine increases or decreases in responsible reporting16 by administrators, faculty, and staff when leadership or supervisory roles within their department/office change or they receive training.

As with all of the methods discussed in this paper, the relative cost-effectiveness of map marking depends on the scale of the effort and the level of technology involved. Simple, paper-based map marking may only require a map of the space and push-pins to indicate the site of incidents of harassment or feelings of risk. However, the use of digital tools and software may increase the cost of this method.

Unobtrusive Observation

Unobtrusive observation uses trained observers, hidden cameras, two-way mirrors, or review of written materials to identify behaviors within the context of the environment (Lee, 2019). Trained observers assess and code subtle behaviors without the knowledge of the individuals being observed in order to avoid potential changes in behavior related to being watched (known as the Hawthorne effect). It may occur in natural or experimental settings. Unobtrusive observation identifies behavior at the individual level as well as the contribution of the social and physical environment and interactions between the people involved. It provides real-time data, which can be used to evaluate the effectiveness of interventions. Subject-matter experts and experts in unobtrusive observation could be involved in the development and validation of measurement tools. In addition, it is important for observers to be trained in observation and coding.

___________________

16 Responsible reporting, also known as mandatory reporting, is when “any faculty member or college/university employee designated as a ‘responsible employee’ who learns of sexual harassment on campus must report the incident to the Title IX office, even in cases where the target specifically requests that the information remain confidential” (NASEM, 2018, p. 106).

Classrooms, lecture halls, and conferences are common locations for unobtrusive observation studies. Researchers at the Oregon Health and Science University watched recorded grand round presentations to code for gender bias in speaker introductions (Pincus et al., 2020). Social media posts and text messages are another source of data for unobtrusive observation. A recent study of posts on Twitter examined why those who have experienced sexual violence do not report their experiences, using the hashtag #WhyIDidntReport (Reich et al., 2021).

The ethics of observation as a tool should be discussed carefully with an IRB prior to use, as being unobtrusive means that participants cannot consent to being observed. They can be debriefed after the observation if researchers are able to identify and contact them, but IRBs may not recommend that either occur if privacy is determined to be paramount. Moreover, identification and contact may not be possible in situations where large groups of individuals are involved, such as at conferences or lectures.

While these concerns are most salient to the use of trained observers, hidden cameras, two-way mirrors, or other physical observation techniques, they are still relevant to observations that utilize written materials and online content. Ethical guidelines for internet research are evolving, and researchers planning to use data from social media or other online sources should consult the most up-to-date ethical guidelines for internet research. Current ethicists in this area highlight the fact that some online communities are private and researchers need to consider the privacy and confidentiality of the individuals who are posting (Burles and Bally, 2018). Confidentiality in unobtrusive observation is especially important because the data were not produced for research and are easily searchable. Approaches to protecting confidentiality may involve aggregating common ideas or using composite cases that obscure the source (Burles and Bally, 2018).

Text Mining

Text mining is a broad term that typically encompasses the process of collecting sources for text analysis, and is traditionally used when analyzing large datasets. It can be used to identify key themes or novel areas for examination at any time during program evaluation. Because it relies on large datasets, text mining requires the use of software for collection and analysis. Social media platforms such as Twitter and Facebook are common sources for text mining and have been used to evaluate perceptions of the gender gap in science, technology, engineering, and mathematics, and the #MeToo movement (Modrek and Chakalov, 2019; Reyes-Menendez et al., 2020; Stella, 2020). Important considerations for this method include access to large datasets, data management and storage, data scrubbing, and computing and software requirements.
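The counting stage of text mining can be sketched minimally, assuming the posts have already been collected and scrubbed (real efforts involve platform API access, storage, and dedicated software; the posts and keywords below are hypothetical):

```python
import re
from collections import Counter

def keyword_counts(posts, keywords):
    """Count how often each keyword appears across a corpus of posts
    (case-insensitive, whole-word matches only)."""
    counts = Counter()
    for post in posts:
        for token in re.findall(r"[a-z']+", post.lower()):
            if token in keywords:
                counts[token] += 1
    return counts

posts = [
    "Why I didn't report: I was afraid no one would believe me.",
    "Report systems failed her. #WhyIDidntReport",
]
print(keyword_counts(posts, {"report", "afraid", "believe"}))
```

Note that whole-word tokenization means the hashtag #WhyIDidntReport is not counted toward “report”; deciding how to handle hashtags, stemming, and negation is part of the data-scrubbing step.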

Once the text has been accessed and stored, there are multiple approaches for analyzing it. Text analysis is a broad term that refers to the systematic examination of patterns, content, structure, or meaning in speech or written text, using qualitative or quantitative methods. Although discussed in detail here because of the close relationship between text mining and text analysis, text analysis can be used in conjunction with many of the methods discussed in this paper. For example, it is often used to analyze open-ended responses in surveys and written policies and procedures as part of a needs assessment.

Qualitative analysis of text, such as thematic analysis, identifies themes, patterns, or concepts present in the text, and frequency measures are often used to describe word counts. Text analysis can be performed by hand or using computer-assisted qualitative data analysis software such as NVivo, ATLAS.ti, DICTION, or LIWC (Linguistic Inquiry and Word Count) to automate the process. When software is used, the text is prepared by “scrubbing” or “cleaning” to make it readable by the program. Expertise in qualitative methods, whether or not software is used to perform the analysis, is important.

However, before the analysis is done, it is important to recognize the potential for bias in the coding process. Ensuring diversity within the team that designs the coding protocol helps reduce the likelihood that social and cultural biases are reiterated in the design. Moreover, carefully following the coding protocol (instructions for how to label and organize relevant information that promote consistency across coders) and dictionary (a guide to the meaning of the codes used, which often includes definitions, examples of appropriate content to include, and inappropriate content to exclude) is important for avoiding bias, such as investigator interpretation of text beyond what is stated or written, and for improving interrater reliability (consistency between coders) and intrarater reliability (one coder’s consistency across the span of their work).
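Interrater reliability is commonly quantified with Cohen’s kappa, which corrects raw agreement between two coders for the agreement expected by chance. A minimal sketch, using hypothetical codes applied by two coders to the same six excerpts:

```python
def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders applying the same codebook
    to the same set of text excerpts."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    labels = set(coder_a) | set(coder_b)
    # Chance agreement: product of each coder's marginal label rates.
    expected = sum(
        (coder_a.count(c) / n) * (coder_b.count(c) / n) for c in labels
    )
    return (observed - expected) / (1 - expected)

a = ["hostile", "neutral", "hostile", "neutral", "hostile", "neutral"]
b = ["hostile", "neutral", "hostile", "hostile", "hostile", "neutral"]
print(round(cohens_kappa(a, b), 2))  # 0.67: agreement well above chance
```

Kappa of 1.0 indicates perfect agreement; values near 0 indicate agreement no better than chance, signaling that the protocol or dictionary needs revision.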

Definitions of words vary depending on context, and automated text analysis programs may not always recognize these differences, potentially introducing bias or error. For example, when coding the text “She was patient” and “She was a patient,” automated programs may simply tag the word “patient,” without differentiating between the noun and adjective forms of the word. If a subsequent analysis examines words associated with communal versus agentic traits based on the automated coding, then the noun form of “patient” may be incorrectly included in the analysis. Some programs incorporate natural language processing and can code words based on context and usage. Working with an expert in the particular software is important for understanding and interpreting the results.
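The “patient” example can be made concrete: naive whole-word matching flags both sentences, while a crude article-based heuristic separates the noun from the adjective in these two cases. The heuristic is purely illustrative, an assumption for this sketch rather than how any particular tagging tool works:

```python
def naive_hits(sentence, target="patient"):
    """Flag every whole-word occurrence, ignoring part of speech."""
    return [w for w in sentence.lower().rstrip(".").split() if w == target]

def looks_like_noun(sentence, target="patient"):
    """Crude heuristic: treat the target as a noun if it directly
    follows an article. Real NLP taggers use far richer context."""
    words = sentence.lower().rstrip(".").split()
    return any(
        w == target and i > 0 and words[i - 1] in {"a", "an", "the"}
        for i, w in enumerate(words)
    )

print(naive_hits("She was patient."))    # matched (adjective)
print(naive_hits("She was a patient."))  # matched, too (noun)
print(looks_like_noun("She was patient."))    # False
print(looks_like_noun("She was a patient."))  # True
```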

Site Visits

Site visits are conducted by individuals with a specific set of expertise relevant to the area being evaluated, and are typically vehicles through which other evaluation methods are implemented. For example, observations, focus groups, interviews, and text analysis of relevant policies and procedures could all constitute components of a site visit. However, “site visits are much more than a collection of methods, [and] constitute a methodology in their own right, distinct from other evaluative methodologies” (Lawrenz et al., 2003, p. 342).

Site visits are conducted over a set period of time during which data are gathered based on a predetermined set of criteria and objectives. They can be conducted by a funder across several awardee sites (e.g., the Department of Education can conduct site visits via the Office for Civil Rights to investigate different colleges and universities), or by a central administrative office across several different campuses or physical locations within a university network (e.g., a central office may wish to audit individual campuses to assess consistency across sexual harassment prevention efforts). They can be announced or unannounced. Unannounced site visits may allow evaluators to capture real-time data that are accurate to the lived experiences of the community, while announced visits give institutions time to prepare to address the criteria being evaluated.

In higher education, site visits are commonly associated with accreditation, although it is unclear how frequently Title IX programming is assessed as part of the accreditation process. Some organizations or governing bodies use them to evaluate an institution if the number of sexual harassment complaints reaches a certain threshold, or to provide technical assistance in tandem with the evaluation. For example, the Sexual Assault and Rape Prevention (SARP) Evaluation Project provided all state-funded sexual assault prevention and victim services programs in Michigan with technical assistance and evaluation capacity-building resources in part through site visits (Campbell et al., 2004). The SARP team used these visits to execute a needs assessment unique to each site, and gathered data through 3-hour interviews with organizational directors, program coordinators, and other staff members. Not only did site visits allow SARP staff to conduct these intensive interviews, they also allowed them to establish buy-in and trust with the staff, who were, at times, skeptical of the purpose of the project (Campbell et al., 2004).

Site visits are costly and time-consuming, as they often involve compensating a specialized team for travel, accommodations, and other per diem expenses. These costs also extend to preparing for the site visit, including pre-visit research, planning and scheduling, and ensuring support staff are available while evaluators are present onsite. Required resources may include space for reviewing documents, meetings with relevant personnel, dedicated log-ins, and computer stations. Despite the costs associated with site visits, they provide an opportunity to implement several different types of data collection at once, including observation of the physical space and staff interactions, interviews with staff members who are present, and review of text or other documentation that is only available onsite as opposed to online, such as patient records, posted policies, or educational pamphlets.

As mentioned in the discussion of surveys, the social identities of the team members performing the site visit and how they relate to those of the site employees can affect the willingness of the latter to engage with the site visit. This is particularly salient if a central administrative office is auditing different member institutions within a university system. Not only may the member institutions differ in demographic composition of employees (e.g., a campus located in a historically Black neighborhood may have far more Black employees compared to a central administrative office located in a primarily white neighborhood), but there may be power differentials related to the professional hierarchies between the central office and the member institutions. Shared community membership may increase comfort, as may including employees of the member institutions being audited in the design of the site visit.

Benchmarking

Benchmarking compares processes and performance metrics between an institution and an industry standard or best practice. It is useful when evaluating a program or intervention against a gold standard or common standard among peers and helps researchers identify areas for improvement (ASQ, 2023). When Texas A&M University (TAMU) engaged with a consultant to review its Title IX program, the consultant used peer benchmarking to compare TAMU to 14 other universities (Texas A&M, 2018). For example, while TAMU’s sexual harassment policy did “not provide any concrete examples” of conduct that would violate the policy, nearly 86 percent of its peers included clear examples of such conduct in their policies (Texas A&M, 2018). This form of benchmarking may be particularly useful for assessing anti–sexual harassment measures in higher education institutions, as gold standards may not be available for every form of intervention or practice.
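The peer-comparison arithmetic behind a figure like the 86 percent above is straightforward; the peer institutions and audit data here are hypothetical:

```python
def peer_share(peer_features):
    """Share of peer institutions whose policy includes a given feature."""
    return sum(peer_features.values()) / len(peer_features)

# Hypothetical audit: does each peer's policy give concrete examples
# of conduct that would violate it?
peers = {
    "Peer A": True, "Peer B": True, "Peer C": False,
    "Peer D": True, "Peer E": True, "Peer F": True, "Peer G": True,
}
print(f"{peer_share(peers):.0%} of peers include concrete examples")
# prints "86% of peers include concrete examples"
```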

Benchmarking is a flexible tool that can be used for evaluating policies, practices, programming, and training related to the prevention of or response to sexual harassment. Evaluations may be completed internally against a predetermined standard or assessed by an external reviewer or company. Because comparison is necessary for benchmarking, a baseline, best practice, or standard is identified prior to evaluation, and the evaluator ensures that the standards align with the information being examined. Similarly, because comparison requires information about peers or competitors, the cost-effectiveness of this method depends on the ease with which that information can be gathered. If policies and procedures are readily available on a university website, benchmarking will take less time and fewer resources than if research staff needs to contact administrators at other universities to retrieve materials by email or traditional mail.

The process of comparison may provide insight into the data, but it may also foster interest within the campus community (Driver-Linn and Svensen, 2017), because many institutions are motivated by comparison to peers. Benchmarking should be approached with caution, however, because its connection to evaluation outcomes is indirect. It uses comparison to assess consistency, but consistency can easily be mistaken for success. Benchmarking is most useful when consistency is considered as one part of the overall evaluation of an intervention’s success, not as the primary indicator of it.

Next Chapter: Additional Considerations