Review process

All applications to the prize competition, whether winners, honorable mentions, or others, were reviewed for this toolkit and read in their entirety. Reviewers used a standardized data extraction form to record data on:

  1. The college or university submitting the application.
  2. The name of the intervention. Whenever possible, the toolkit uses the name as listed in the application. If no name was given in the application, the reviewer developed a name to describe the intervention.
  3. The scale of the intervention. “Scale” is defined as the unit that was the target of the intervention. It could be an individual department, a group of departments, an entire school (e.g., the school of medicine) or college (e.g., the college of science), or the entire university.
  4. The target population of the intervention. Whenever possible, data on the specific population (e.g., early-career faculty members, mid-career faculty members, senior leadership) targeted by the intervention were extracted. In cases in which there was not an obvious target population (e.g., policy reform), reviewers listed either “all faculty” or “not available.” When “not available” was listed, an explanation was provided.
  5. The evidence provided in the application that the intervention worked. Initially, evidence was extracted verbatim from the application. Later, these excerpts were edited for clarity and succinctness.
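
The five items recorded on the extraction form can be pictured as a simple record. The Python sketch below is illustrative only: the class name, field names, and sample values are assumptions made here for clarity and are not taken from the reviewers' actual form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionRecord:
    """One row of a hypothetical data extraction form (all field names are assumed)."""
    institution: str             # item 1: college or university submitting the application
    intervention_name: str       # item 2: name as listed, or a reviewer-developed name
    name_from_application: bool  # False when the reviewer had to develop the name
    scale: str                   # item 3: unit targeted, e.g., a department or the university
    target_population: str       # item 4: a specific group, "all faculty", or "not available"
    evidence: str                # item 5: extracted verbatim, later edited for clarity
    target_explanation: Optional[str] = None  # explanation when the target is "not available"

    def __post_init__(self) -> None:
        # The text states that "not available" was always accompanied by an explanation.
        if self.target_population == "not available" and not self.target_explanation:
            raise ValueError('"not available" requires an explanation')


# Placeholder values for illustration; these do not describe a real application.
record = ExtractionRecord(
    institution="Example University",
    intervention_name="Hiring policy reform",
    name_from_application=False,
    scale="entire university",
    target_population="all faculty",
    evidence="Placeholder evidence text.",
)
```

Requiring an explanation whenever the target population is "not available" mirrors the rule stated in item 4; how the real form enforced it, if at all, is not described in the source.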


Intervention: In this toolkit, “intervention” means a discrete activity or closely related set of activities with a defined target population, a method of implementation, and envisioned outputs and outcomes. In cases in which multiple interventions were described as working in synergy but having different target populations or addressing different identified problems (and thus having different envisioned outputs and outcomes), reviewers worked to identify each of the unique interventions.

Evidence: “Evidence,” for the purposes of this toolkit, is broadly defined. Both quantitative and qualitative evidence are considered forms of evidence. The most important form of evidence for the purposes of this toolkit is how an intervention changed something — for example, people’s opinions, attitudes, knowledge, actions, or behaviors or a process at the college or university for hiring, retention, or promotion of faculty members. Participation in an intervention is also considered evidence. Evidence that indicates that an intervention did not work is also considered evidence.

The following caveats should be considered when assessing the evidence presented for the interventions:

  • Evidence is often partial. For example, the application may have reported an impact metric (e.g., hiring of women faculty members) that was the result of multiple interventions. Metrics either attributed to multiple interventions or not attributed to any intervention in the applications are not reported for individual interventions in this toolkit.
  • The lack of a particular kind of evidence does not necessarily indicate that an intervention did not have an effect for a specific indicator or population. For these applications, a lack of evidence usually indicates that the data either were not collected or could not be attributed to an individual intervention.
  • Reviewers did not use any “strength of evidence” metrics to determine inclusion in the toolkit. Thus, the toolkit does not exclude interventions for not having, for example, a randomized control group. The toolkit does report the type of outcome reported and the presence and type of counterfactual evidence used (e.g., “before the intervention,” a comparison group, or a randomized control group) when available. Users are free to reach their own judgments on the usefulness of the evidence provided.

Exclusion criteria

Interventions were excluded from the toolkit if:

  • No evidence was provided for the individual intervention. If an intervention was included in the submission but no evidence was provided specifically linking the individual intervention to an outcome, it was not included in the toolkit. 
  • Only evidence at the process level was provided for the individual intervention. “Process” is defined as the occurrence of the intervention. So, for example, the number of trainings held is a process-level indicator. Reviewers did not consider this to be evidence that the intervention worked.
  • Insufficient information was provided to adequately describe the intervention. If, for example, only a name for an intervention was provided and there was no information on what was done, then the intervention was judged unlikely to be of interest to users of the toolkit.
  • The activity and/or evidence presented did not relate specifically to improving faculty gender diversity.
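
The exclusion criteria above can be sketched as a short screening function. This is a minimal sketch under stated assumptions: the `Candidate` fields stand in for reviewer judgments, the names are hypothetical, and the order in which the checks run is an assumption, because the source does not say whether the criteria were applied in any particular order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical reviewer judgments about one candidate intervention."""
    has_linked_evidence: bool          # evidence tied specifically to this intervention
    evidence_is_process_only: bool     # e.g., only the number of trainings held
    adequately_described: bool         # more than just a name was provided
    relates_to_gender_diversity: bool  # activity and evidence concern faculty gender diversity


def exclusion_reason(c: Candidate) -> Optional[str]:
    """Return the first exclusion reason that applies, or None if the intervention is kept."""
    if not c.has_linked_evidence:
        return "no evidence for the individual intervention"
    if c.evidence_is_process_only:
        return "process-level evidence only"
    if not c.adequately_described:
        return "insufficient description"
    if not c.relates_to_gender_diversity:
        return "not specific to faculty gender diversity"
    return None
```

An intervention is kept only when the function returns `None`. Each judgment itself (for example, whether evidence is process-level only) was made by a human reviewer and is simply taken as input here.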

Please note that the prize competition submission guidelines did not specifically ask institutions to link individual interventions to specific evidence or outcomes. Thus, the exclusion criteria listed above were not used during the judging process of the prize competition. These exclusion criteria were developed specifically for the development of the toolkit to ensure that interventions presented within the toolkit are based on specific evidence. This does not imply that interventions that were included in applications but are not presented in the toolkit did not work or should not be adopted by other institutions. 


For classifications related to the institutions applying to the prize competition, reviewers used the Carnegie Classification® and associated data. To classify each university by faculty size, reviewers used the number of faculty members and created the following categories:

  • Small: having fewer than 1,000 faculty members
  • Medium: having between 1,000 and 1,999 faculty members
  • Large: having 2,000 or more faculty members
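
The size thresholds above translate directly into a small lookup. The function name below is an assumption made for illustration; the cutoffs are the ones listed.

```python
def faculty_size_category(num_faculty: int) -> str:
    """Map a faculty headcount to the toolkit's size category."""
    if num_faculty < 0:
        raise ValueError("number of faculty members cannot be negative")
    if num_faculty < 1000:
        return "Small"    # fewer than 1,000 faculty members
    if num_faculty < 2000:
        return "Medium"   # between 1,000 and 1,999 faculty members
    return "Large"        # 2,000 or more faculty members
```

For example, `faculty_size_category(1500)` returns `"Medium"`, and the boundary values 1,000 and 2,000 fall into "Medium" and "Large" respectively.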

When classifying data from the applications themselves, reviewers used a hybrid approach that combined deductive and inductive coding. The analysis started with premises for likely classifications; for example, for type of intervention, the analysis assumed that “training” and “mentoring” would be included, whereas for target population of the intervention, groups common in the literature were used, such as early-career, mid-career, and senior faculty members. Reviewers then extracted the data from all of the interventions identified, using the language in the applications. When the language closely matched the predefined categories, the data were classified into those categories. When there was not a close match, the results were grouped into similar responses, and new categories were defined.

In the first round of classification, multiple categories were defined. From this, a mock toolkit was developed and circulated among ORWH staff members and other NIH staff members. These staff members were asked which criteria they found most useful when filtering the toolkit and which categories they thought were the most important within each criterion. From this feedback, the final list of criteria and categories was made. Note that a single intervention may be classified into multiple categories within a criterion.

The final search criteria and categories are:

  1. Scale of intervention: the relative extent of the target of the intervention.
    1. “Department (less than school/college)” means a department or multiple departments but not a full school or college (e.g., the biology department or the departments in biological sciences).
    2. “School/college (multiple related departments)” means a distinct formal entity (e.g., the school of medicine or the college of science). This category includes multiple schools (e.g., the schools of medicine, nursing, and pharmacy) if these do not constitute the entire institution.
    3. “University-wide” interventions target all faculty members in the institution.
  2. Type of intervention: the main activities of the intervention.
    1. “Training, workshops, conferences, and networking” means an intervention increased recognition or otherwise attempted to reach faculty members in a manner that does not fall into “mentorship” or “grants” below.
    2. “Mentorship” means guidance was provided to faculty members on a one-on-one (e.g., mentor–mentee) basis. Interventions to improve the mentorship process also fall into this category.
    3. “Grants” means an intervention provided money to faculty members to enable them to undertake or sustain research.
    4. “Reform of processes (e.g., hiring, promotion) or policies” means that an intervention changed the way the institution undertakes previously existing activities but that it did not necessarily directly engage the ultimate intended beneficiaries of the reform. For example, reforming the hiring process targets faculty members and administrators who do the hiring but should ultimately benefit job candidates, and promotion reform usually targets department chairs, deans, or similar senior leaders but should benefit early- and mid-career faculty members.
  3. Target population of the intervention: the intended beneficiaries of the intervention.
    1. “Early-career faculty members” is defined within each application that discusses faculty rank, and the toolkit uses those definitions in these classifications. In general, the term refers to those with a rank of assistant professor or lower.
    2. “Mid-career faculty members” is again defined within each application that discusses faculty rank. The toolkit uses those definitions in these classifications. “Mid-career” is less consistently used than “early-career” but largely refers to faculty members who are associate professors or of a similar rank.
    3. “Other target or no specific target” refers to any intervention that did not specifically target early- or mid-career faculty members. Note that an intervention that targeted all faculty members—including early-career faculty members and mid-career faculty members—would be included here and not be included in the other two categories within this search criterion.
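
The search criteria above can be sketched as a filter over classified interventions. This is only an illustration of the rules as described: the function names and the dictionary layout are assumptions and do not reflect how the toolkit itself is implemented. The sketch covers two points from the text: an intervention may carry several categories within one criterion, and an intervention that does not specifically target early- or mid-career faculty members falls only under “Other target or no specific target.”

```python
from typing import Dict, Set

EARLY = "Early-career faculty members"
MID = "Mid-career faculty members"
OTHER = "Other target or no specific target"

# Hypothetical layout: criterion name -> set of categories assigned to one intervention.
Classification = Dict[str, Set[str]]


def target_categories(specific_targets: Set[str]) -> Set[str]:
    """Assign target-population categories from the groups an intervention specifically targeted."""
    categories = specific_targets & {EARLY, MID}
    # Anything that did not specifically target early- or mid-career faculty is "Other".
    return categories or {OTHER}


def matches(classification: Classification, filters: Dict[str, str]) -> bool:
    """True if the intervention has the selected category under every filtered criterion."""
    return all(
        category in classification.get(criterion, set())
        for criterion, category in filters.items()
    )
```

With these definitions, `target_categories({"all faculty"})` yields only the “Other” category, while an intervention aimed specifically at both early- and mid-career faculty members is returned by a filter on either of those two categories.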

Although “type of outcome measured as evidence” (e.g., “change in outcome measures” and “new policy adopted or substantively changed”) was determined not to be a useful search criterion, it is identified and included in the description of each intervention. 
