Mona Sloane, planning committee co-chair, opened the June 11, 2024, session with planning committee member Nathanael Fast, University of Southern California. The event consisted of a keynote address and three panels.
Gabriella Waters, National Institute of Standards and Technology (NIST), gave the keynote address to open the series of sessions and highlight why it is necessary to identify, measure, and plan for risk. Waters began by categorizing the types of risks posed by artificial intelligence (AI) systems: technical (biases, security vulnerabilities), operational (system failures, goal misalignment), and ethical (privacy, lack of explainability). Waters cautioned that without proper risk mitigation, AI’s powerful capabilities can turn into liabilities. To balance capabilities and potential risk better, Waters stated, there is a need to measure the impacts of AI scientifically. To respond to this need, NIST launched the Assessing Risks and Impacts of AI (ARIA) program, which aims to provide a comprehensive framework for evaluating AI in real-world scenarios through model testing (e.g., “comparing system outputs to expected or known outcomes”), red teaming (a “structured testing effort to find flaws and vulnerabilities in an AI system”), and field testing (e.g., “several thousands of human participants interacting with AI applications in realistic settings”).1
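To make the first of these evaluation modes concrete, the sketch below scores a batch of system outputs against known outcomes. It is a minimal illustration only; the function, data, and reported metric are assumptions for this example and are not part of NIST’s ARIA tooling.

```python
# Minimal sketch of "model testing": comparing system outputs to expected outcomes.
# All names and data here are illustrative, not part of NIST's ARIA program.

from collections import Counter

def evaluate_outputs(system_outputs, expected_outcomes):
    """Score a batch of AI system outputs against known/expected outcomes."""
    assert len(system_outputs) == len(expected_outcomes), "mismatched test set"
    results = Counter()
    for produced, expected in zip(system_outputs, expected_outcomes):
        results["match" if produced == expected else "mismatch"] += 1
    total = sum(results.values())
    return {
        "accuracy": results["match"] / total if total else 0.0,
        "mismatches": results["mismatch"],
        "total_cases": total,
    }

if __name__ == "__main__":
    # Hypothetical test cases: expected labels would normally come from a
    # curated evaluation set with known ground truth.
    expected = ["approve", "deny", "approve", "deny"]
    produced = ["approve", "approve", "approve", "deny"]
    print(evaluate_outputs(produced, expected))
    # e.g., {'accuracy': 0.75, 'mismatches': 1, 'total_cases': 4}
```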
Waters highlighted the importance of creating risk-aware organizational cultures and governance mechanisms to promote ethical AI practices, explainability, and accountability. NIST’s AI Risk Management Framework (RMF) is designed to inform organizations on how they might approach that task.2 For example, policies and procedures for risk management should be established and regularly audited, said Waters. Waters highlighted training and collaboration as focus areas that could be leveraged more heavily to mitigate risk. Moreover, Waters stressed that organizations must train all employees to become AI-literate and utilize cross-disciplinary knowledge, including stakeholder input, to enable a holistic approach to AI development.
___________________
1 National Institute of Standards and Technology (NIST), 2024, “AI Evaluations: Assessing Risks and Impacts of AI,” May 9, https://ai-challenges.nist.gov/uassets/6.
Waters turned next to best practices for effective testing and evaluation. These practices include evaluating system performance in the real world rather than in controlled settings, considering both positive and negative potential societal impacts, and assessing AI systems proactively rather than reactively. Waters also pointed to tools and frameworks, including bias detection software, model auditing tools, and risk assessment frameworks, as starting points for effective AI evaluation.
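As a concrete illustration of the kind of check bias detection software might perform, the sketch below computes a demographic parity gap, the difference in favorable-outcome rates across groups, over a hypothetical sample of model decisions. The function name, metric choice, and data are illustrative assumptions rather than a specific tool Waters cited.

```python
# Illustrative bias check: demographic parity gap across groups.
# A hypothetical stand-in for the kind of metric bias detection software reports.

from collections import defaultdict

def demographic_parity_gap(decisions):
    """decisions: list of (group, outcome) pairs, outcome 1 (favorable) or 0."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, outcome in decisions:
        totals[group] += 1
        favorable[group] += outcome
    rates = {g: favorable[g] / totals[g] for g in totals}
    gap = max(rates.values()) - min(rates.values())
    return rates, gap

if __name__ == "__main__":
    # Hypothetical audit sample of model decisions labeled by demographic group.
    sample = [("A", 1), ("A", 1), ("A", 0), ("B", 1), ("B", 0), ("B", 0)]
    rates, gap = demographic_parity_gap(sample)
    print(rates)                      # e.g., {'A': 0.67, 'B': 0.33}
    print(f"parity gap: {gap:.2f}")   # a large gap flags the system for closer review
```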
Waters concluded by reflecting on all-too-numerous press accounts of AI tools that were publicly deployed without thorough evaluation. By employing diverse development teams, continuous monitoring, and robust impact assessment to detect and mitigate harmful biases in AI systems before they cause widespread harm, Waters said, society can benefit from AI systems that are not only powerful but also equitable and trustworthy.
Fast gave additional background regarding the motivation for this workshop’s first event. Fast noted that the planning committee drew on NIST’s recently published AI RMF to guide the design of each event. Fast suggested that the workshop series offered an opportunity to discuss who builds, informs, and governs AI. Fast pointed to widening participation, that is, broadening the scope of those who can contribute to the design, development, and deployment of AI tools, as the first step in doing so.
Nathanael Fast, panel moderator, opened the panel by asking the participants to define stakeholder engagement and the role it plays in their research. Tawanna Dillahunt, University of Michigan, defined stakeholder engagement as incorporating the voices of many stakeholders. She emphasized the value of stakeholders who hold less power. She also advocated for early stakeholder engagement during the design and development stages.
___________________
2 NIST, 2023, “AI Risk Management Framework,” January 26, https://www.nist.gov/itl/ai-risk-management-framework.
Deep Ganguli, Anthropic, framed stakeholder engagement as a way to balance power dynamics by moving decision making regarding the development and deployment of AI models beyond a small number of highly capitalized private organizations. Such a shift is important, Ganguli argued, because large AI models can impact large swaths of society. Getting multiple stakeholders involved is especially important, stated Ganguli, because the ultimate impacts of large AI models are still so uncertain.
Brent Hecht, Microsoft and Northwestern University, characterized stakeholder engagement as critical to ensuring that new AI tools have broad and equitable positive impacts. AI model creation is inherently a collective activity because many people are typically involved in creating a model’s training data. Stakeholder engagement is, according to Hecht, a way for the people on whom models depend to play an active role in how those models are designed. Public unease with the growing use of AI is, in Hecht’s view, a result of insufficient stakeholder engagement.
In the discussion that followed, panelists described different approaches to understanding and applying community engagement in their own AI design and development work. Dillahunt described her shift from traditional participatory design to community-based participatory design, which fosters the sharing of decision-making power with community stakeholders from the outset rather than asking for community input after key decisions have already been made. Dillahunt observed that early involvement of community members significantly increases the chance to anticipate harms and address them before technologies are developed or deployed.
Ganguli described a process used by Anthropic’s societal impacts research team to assess capabilities and potential harms of new systems built in-house. Anthropic, according to Ganguli, works to incorporate a wider range of perspectives into the development of “constitutional principles” for how its AI models should behave. The team often develops prototypes along with the normative rules, or principles, to which they would want the systems to adhere. The team then develops a constitution that is accessible to the public, letting those outside of the organization weigh in on the initial principles.
Although they found that internally developed principles appear to perform well, Ganguli’s team recognized the importance of external input and began using a tool called Polis3 that allowed for public input. Polis allows members of the public to suggest and deliberate on possible normative rules, and the developers use this input to fashion appropriate new rules to train the AI models further. He expressed the hope that future efforts would employ crowdsourced methods to develop principles and evaluate the resulting models. The team assessed that the use of public-informed principles appears to result in a model with less bias, but Ganguli acknowledged that external validation of this result would be valuable. Following crowdsourcing exercises, stated Ganguli, conclusions are distilled into digestible memos aimed at providing policy makers with credible, clear, and concise information about the perspectives expressed by members of the public and the expected behavior of the models.
___________________
3 According to the tool website (https://pol.is/home), “Polis is a real-time system for gathering, analyzing and understanding what large groups of people think in their own words, enabled by advanced statistics and machine learning.”
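One way the public-input step Ganguli described might work in practice is sketched below: crowd-sourced statements are filtered by the level of agreement they receive before being distilled into candidate principles. This is a toy illustration under assumed thresholds and data, not Polis’s algorithm or Anthropic’s actual procedure.

```python
# Toy illustration of filtering crowd-sourced statements by agreement level
# before distilling them into candidate principles. Thresholds and data are
# hypothetical; this is not Polis's or Anthropic's actual method.

def broadly_supported(votes, min_votes=3, min_agreement=0.7):
    """votes: dict mapping statement -> list of 1 (agree) / 0 (disagree) votes.
    Returns statements with enough votes and a high share of agreement."""
    selected = []
    for statement, ballots in votes.items():
        if len(ballots) >= min_votes and sum(ballots) / len(ballots) >= min_agreement:
            selected.append(statement)
    return selected

if __name__ == "__main__":
    votes = {
        "The model should explain refusals in plain language.": [1, 1, 1, 0, 1],
        "The model should always agree with the user.": [0, 0, 1, 0],
        "The model should avoid demeaning language.": [1, 1, 1, 1],
    }
    for principle in broadly_supported(votes):
        print("candidate principle:", principle)
```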
Fast next asked panelists to comment on key areas where stakeholders should be influencing and informing AI design. Ganguli suggested that stakeholder input is needed at key points throughout the life cycle. He included examples such as creating pretraining data, training, evaluating the model, and evaluating potential impacts after deployment. Ganguli also noted that although models are trained on large amounts of data from the Internet, the data from the Internet do not necessarily represent the plurality of global opinions, voices, or cultures. To address this gap, Ganguli added, Internet data can be systematically augmented. For example, additional samples could be collected to better represent low-resource languages. Ganguli highlighted that reinforcement learning from human feedback would benefit from wider participation because it is typically conducted by crowd workers4 who represent a narrow range of demographic identities.
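The kind of systematic augmentation Ganguli mentioned is often approximated by reweighting how heavily each language is sampled during pretraining. The sketch below applies temperature-scaled sampling so that low-resource languages receive a larger share of training examples than their raw share of the corpus; the corpus counts, temperature value, and function name are hypothetical and do not describe any particular lab’s pipeline.

```python
# Illustrative sketch: upweighting low-resource languages in a pretraining mix.
# The corpus sizes, temperature value, and function names are hypothetical.

def sampling_weights(corpus_sizes, temperature=0.3):
    """Temperature-scaled sampling: flattens raw corpus-size proportions so that
    smaller (low-resource) languages are sampled more often than their raw share."""
    scaled = {lang: size ** temperature for lang, size in corpus_sizes.items()}
    total = sum(scaled.values())
    return {lang: value / total for lang, value in scaled.items()}

if __name__ == "__main__":
    # Hypothetical document counts by language; English dominates the raw crawl.
    corpus_sizes = {"en": 1_000_000_000, "sw": 2_000_000, "qu": 200_000}
    raw_total = sum(corpus_sizes.values())
    for lang, weight in sampling_weights(corpus_sizes).items():
        raw_share = corpus_sizes[lang] / raw_total
        print(f"{lang}: raw share {raw_share:.4f} -> sampling weight {weight:.4f}")
```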
The panelists then turned to a discussion of who needs to be involved in different aspects of AI model development and how best to include them. Dillahunt suggested considering what guardrails need to be put in place to honor the boundaries of workers and users. Hecht highlighted the significant investments made by those who create the data used to train AI models and suggested finding better ways to compensate their efforts—perhaps by sharing the profits.
Last, the panelists discussed how to incentivize the responsible incorporation of broader groups into AI research and the development of AI systems. Dillahunt noted that research funders’ consideration of stakeholder participation influences who researchers engage with and who, as a result, has power over development decisions. Ganguli offered that his team at Anthropic aims to open as many aspects of their research and development process as possible to the public. He suggested, building on Dillahunt’s statement, that new funding efforts such as the National Artificial Intelligence Research Resource could play an important role in opening the door to a wider range of stakeholders. With respect to low-resource language communities, Hecht noted that such communities may have more resources to invest in meeting their needs through AI than is currently appreciated but cautioned that they are concerned about preserving the identity and provenance of their language.
___________________
4 Crowd workers perform tasks via online platforms such as Amazon Mechanical Turk and Polis.
The second panel was moderated by Tamara Kneese, Data & Society. Kneese highlighted the emergence of labor concerns, especially connected to generative AI, as a growing topic area for policy makers. Kneese set the stage for the panel by asking how risks to workers are manifesting across the AI development life cycle and how worker voices can be brought more effectively into both the development and risk assessment processes. Kneese asked panelists to comment on how AI models and the systems that use them affect labor practices and what new forms of risk are emerging with foundation models.
Panelists first discussed the prevalence of AI used for worker monitoring and the consequences of this practice. Krystal Kauffman, Distributed AI Research Institute and Turkopticon, observed an increase in the deployment of AI tools for surveilling and managing data workers. In Kenya, for example, where content moderation jobs have grown considerably, workers experience stress and anxiety that her team’s research attributed to oversurveillance. She explained that surveillance discourages workers from taking breaks when they feel the need to step away from harmful content and contributes to a work environment in which employees are in constant fear of being reprimanded or fired. Kauffman’s research found that these effects have negative impacts on the workers’ mental health.
Veena Dubal, University of California, Irvine, highlighted the shift in the experiences of ride-hailing workers due to increased incorporation of surveillance technologies. Dubal also observed that AI is being used by employers to determine how to individualize compensation to incentivize workers to behave in uniform ways.
Christina Colclough, The Why Not Lab, echoed both observations and emphasized that the impacts of such AI tools are being felt globally. Colclough stressed that employer misuse of AI tools can compromise workers’ autonomy and dignity. She also cautioned that tools developed using nonrepresentative populations are a threat to workforce diversity and thus ultimately to the workforce’s vitality and capacity for innovation.
Tom Kochan, Massachusetts Institute of Technology, illustrated this last concern with a case of call center training that used only older white men as examples for best practices, forcing workers to perform in ways that ignored alternative means of completing tasks. Kochan cautioned that if workers are not given a voice, employers and workers will both miss out on the benefits that can accrue from bottom-up design and development.
Dubal pointed to the decline in union labor representation as well as gaps in the protections accorded to gig workers as obstacles to risk mitigation. Dubal also highlighted the European Union’s recently enacted Artificial Intelligence Act, which enhances the rights of people who work through platforms by
means such as banning firms’ use of emotion analysis for decision making. Dubal noted that this legislation only protects a small subset of workers from misuse of personal data. Too often, according to Dubal, legislators assume that data collection disclosure is sufficient. Dubal argued that few workers have the collective power to use such awareness to induce employers to establish protective boundaries.
Colclough cautioned that risk management frameworks, such as those developed by NIST and the European Union, can only go so far in addressing risks to workers given that some of the risks ultimately stem from shortcomings in basic protections of worker rights. Kauffman stated that national legislation and labor organization efforts can still fall short of protecting global employees of multinational corporations because companies can and do shift work to countries with weaker worker protection. A more positive trend, in Kauffman’s view, is the pushback represented by workers who file lawsuits against big technology companies.
Kneese asked about successful strategies for mitigating the labor-related risk of AI and ways that workers have been successful in documenting harms or seeking redress. Kochan observed that some workers have been successful in addressing potential harms from AI, pointing to recent strikes by U.S. screen actors and writers that resulted in new limits on AI use and on the replacement of workers with AI-generated content. Turning to a discussion of opportunities for better collaboration, Kochan suggested that activities such as co-designing new workplace tools and including worker input on the training needed for effective use of tools can yield better results when new AI tools are considered for the workplace. Kochan underscored the need for legislation affording workers advance notice of when new AI technologies are introduced and enhanced opportunities for input, consultation, and negotiation on how they will be used.
Kneese asked the panelists for successful examples of AI co-design or worker-led design and development. Kochan described a case in which Kaiser Permanente leadership gave its workforce and the unions representing it the opportunity to participate in the co-design of new facilities and associated technology.5 Kauffman said that she has not yet seen a successful instance in data work but was optimistic about the potential for future success. She pointed to a positive step forward in September 2023, when several members of Congress contacted data workers for help drafting a congressional inquiry letter investigating worker treatment at nine major technology companies.
Wrapping up the panel, Kneese inquired about cases in which workers’ concerns were meaningfully engaged outside of a collective bargaining agreement. Dubal and Kochan pointed to instances in which university faculty were able to make decisions collectively about how new technology would be used and restricted on campus, and they noted that university faculty are perhaps unique in their access to the power needed to achieve such outcomes. Pointing to the power of legislation, Dubal referenced new California legislation to address high injury rates for warehouse workers. The law requires that employees be provided advance notice of their quotas so that AI tools cannot be used to base future expectations on their rate of work.6
___________________
5 A. Arora, B. Dyer, and T. Kochan, 2020, “A Case Study of Integrating Technology and Work Systems at Kaiser Permanente’s Health Hubs,” WP07-202, Massachusetts Institute of Technology.
The third panel was moderated by Sheena Erete, University of Maryland, and centered on the tools and systems needed to diversify participation in the design of AI tools to better address risks to those who have traditionally been marginalized. Erete opened the panel by asking how one might define or view such infrastructure.
Ovetta Sampson, Google, described infrastructure as the hardware, tools, platforms, and systems that allow firms to execute corporate missions. Infrastructure, according to Sampson, is where one embeds ethical, moral, responsibility, legal compliance, safety, sociocultural equity, and fairness requirements when creating AI tools. Sampson noted that companies have improved their efforts to develop and capture these principles in policy or requirements documents but still can fall short on integrating them into the actual design or operation of products. Such failures tend to occur when the incentives to follow corporate policy are insufficient or when employees are otherwise not aligned with those policies.
Rayya El Zein, Code for Science & Society, characterized infrastructure as relationships that embody a political vision of the world and a political practice of building it. El Zein reframed the idea of AI risk management as a question of who deserves safety, who will provide that safety, and at what cost.
Aviv Ovadya, newDemocracy Foundation and AI & Democracy Foundation, discussed infrastructure as a means to facilitate decision making that can help accelerate the understanding and addressing of risks in a way in which everyone has a voice. Ovadya noted challenges such as disagreement about which risks matter, what to do about risks, and how to allocate resources. Infrastructure is needed to ensure equity and to support such deliberation.
Gloria Washington, Howard University, juxtaposed infrastructure, often seen as the cloud-based resources involved in the creation of AI, with the communities and individuals that usually do not have a say in how the AI is created. Washington observed that these stakeholders are often brought into the conversation only once something negatively impacts them and that they too often bear the burden of time and financial costs.
___________________
6 Warehouse Quotas, AB-701 (2021).
Tina Park, Partnership on AI, described infrastructure as the policies, practices, norms, culture, and priorities that can foster or obstruct AI design or development activities. Park echoed El Zein’s view that infrastructure reflects the values and the priorities of the people who have the power and resources to build it.
Erete next asked the panelists to describe infrastructure that they have been involved in building or maintaining. Washington discussed an ongoing project to build an African American Vernacular English data consortium to collect, store, and protect audio and other information about African American language and culture. Washington noted initial resource challenges, such as a lack of cloud infrastructure and of funding to pay participants, which constrained not only the project’s development but also the building of trust with participants. Washington pointed to the consortium’s development of fair usage guidelines as critical to ensuring participants’ safety and honoring the trust of their communities.
Park described the Partnership on AI as a forum for conversations across sectors. She stressed the importance of enabling a network of people who are champions of ethical issues to share ideas in a community built on trust. Park argued that giving those who are more technically oriented the political education to understand social equity issues, as well as empowering those with community-based subject matter expertise, introduces mechanisms for fairness.
El Zein reflected on her work inviting technologists to rethink their assumptions regarding how technical work must be performed. El Zein argued that collectively identifying how current organizational culture might hamper desired behavior and building accountability structures to foster changes in conduct would be more efficient and impactful than relying on a compliance checklist.
The panelists discussed the role of scaling infrastructure in the promotion of equity. Panelists contrasted the ability to extend infrastructure from small local networks to larger engagements with the richness of small-scale, person-to-person interactions. Park noted that interpersonal connections may not be scalable in a direct sense but offered that replicable lessons can be learned from person-to-person interactions. Park suggested organizations emphasize lower-tech approaches that can engage groups that might otherwise be excluded and identify concerns that might otherwise not surface. El Zein agreed, suggesting that professionals in the field should prioritize individual connections over considerations of scalability. Ovadya reframed scalability as ensuring that all communities have access to power in a meaningful way.
Last, Erete asked what new infrastructure approaches might play a role in mitigating AI risk in the next 5 to 10 years. Washington stressed the importance of techniques to
allow individuals to build and use their own machine learning (ML) models but preclude downloading the models or their training data. Park suggested building on work by Indigenous scholars on data sovereignty, AI sovereignty, and collective stewardship. Sampson offered that synthetic data and new approaches to embedding trust and safety into systems will both play a role in mitigating AI risk in the future. Park suggested strong whistleblower protection as another type of risk mitigation infrastructure.
Park and El Zein expressed concerns about the inherent risks of certain applications of AI. Park noted the ethical bind that workers encountered when their employers began developing AI-based systems for military use. El Zein echoed Park’s concerns, expressing the view that not all applications are aimed at shared positive outcomes and citing AI’s use for worker mistreatment as well as military applications. Park and El Zein observed that tools that aim for shared positive outcomes will foster meaningful and successful broad engagement. Prioritizing such tools could increase the opportunity to identify harms before they happen, find ways to mitigate and restore justice after harm is done, and lead to the creation of tools that meet the needs of the communities that use them.