NRT Annual Meeting 2018: Evaluation Challenges Continued

Common challenges across NRT sites were evident in attendee conversations during the evaluation and assessment sessions and the plenary sessions of the NRT Annual Meeting 2018. This post discusses three categories of challenges for NRT programs, gleaned from participants’ questions and comments: framing the role of evaluation, selecting the approach for evaluation, and facilitating the dissemination of evaluation products. Although each is discussed as a challenge, each also provides an opportunity to enhance the use of evaluation in NRT programs.

Frame. Simply asking the question “What is the role of evaluation for NRT programs?” opens the door to many responses beyond formative and summative feedback, because the question alone does not capture the full complexity of program evaluation in graduate education. One complexity touched upon at the NRT meeting was the different combinations of people contributing to the evaluation at each site, each of whom shapes the role of evaluation. This complexity makes it tricky to compare the implementation and products of evaluation across sites. Formally framing the role of evaluation within a given site helps to clarify the contributions of the different perspectives and priorities of the people involved. The personnel included, from external and internal evaluators, program coordinators, principal investigators, and faculty to graduate students, determine how evaluation is framed. It is often a challenge for programs to fully frame the role of evaluation because that role is shaped by the unique contributions of individuals and their interactions. Although the combination of people contributing to a program’s evaluation is often fluid or undetermined in the early stages, establishing an initial frame of perspectives and priorities leads to a more nuanced response to the question of what the role of evaluation is.

Select. Selecting the approach for program evaluation is constrained by many factors, including the overarching context of graduate education. The list below, which is not exhaustive, gives common factors that NRT sites consider when selecting an approach to program evaluation. Each generates its own set of questions and concerns about what does or doesn’t work for evaluating graduate programs.

  • Institutional human subjects review – navigating the process and guidelines within an institution.
  • Small sample sizes – drawing evidence from the numbers of participants in graduate programs.
  • Tradeoffs of different methods – considering (for example) the benefits of qualitative versus quantitative methods.
  • Level of faculty and stakeholder involvement – assessing the amount of time and training required for participants to contribute to the evaluation design and the interpretation of evaluation results.
  • Recruiting comparison groups – motivating students or faculty outside the program to participate in the evaluation.
  • Cohort effects – accounting for the differences in the composition of each year of entering students.
  • Hawthorne effect – changing behavior because participants are aware of the evaluation.
  • Burden on students (and faculty) – requiring time and energy to provide evaluation data.

Facilitate. There are many mechanisms for disseminating evaluation findings, and dissemination happens informally and formally at the site level, across sites, and to the broader community. The NRT annual meeting provided an opportunity to share and discuss what is happening at other sites. It was also an opportunity for evaluators to meet and build relationships across sites, provide critical feedback, and create connections to enhance evaluation. Several ideas emerged during the meeting that could further increase the communication of evaluation results and could be integrated directly into the meeting agenda. First, during a conversation at the poster session, attendees proposed adding posters of evaluation findings. These posters would be of interest to everyone because evaluation is one of the common threads across all sites. In another conversation, someone suggested a “rubric fair” as an alternative to a poster session, where sites could share the assessment and evaluation tools they are using. Both ideas offer informal yet structured forums for interaction. Second, one of the key suggestions given to new PIs during their orientation is to plan for dissemination. For evaluation, this means asking: What are the specific evaluation deliverables for the program? Who is responsible for them? When are they due? Following up on the dissemination plan at subsequent annual meetings, in a working session focused on the dissemination of products, would keep the plan current and offer an opportunity for peer feedback. Finally, the meeting offered an opportunity to help build an evaluation community that extends beyond the meeting. Part of this effort is connecting the individuals working on evaluation at different NRT sites to the NRT evaluator website. Ideas circulated about ways to curate and share materials on the website.
An interim step is to solicit posts from the NRT evaluation community for this site, to expand what we know about each other’s interests and work.

Each NRT site must frame the role of evaluation, select an evaluation approach, and facilitate the dissemination of evaluation products. Collectively, the NRT sites can contribute their experience and knowledge to addressing these issues. Further, these shared challenges connect the programs and provide a common forum for discussing evaluation and graduate education in the STEM fields.

Please consider sending in a blog post to


NRT Annual Meeting 2018: Assessment and Evaluation Breakout

The NRT Annual Meeting at the end of September 2018 was an opportunity for NRT sites across the United States to showcase their expanded program knowledge and advances in scientific knowledge. Lessons learned from the assessment and evaluation of NRT programs were presented in two sessions during the two-day meeting. Presentations ranged from descriptions of frameworks for the evaluation process to methods for visualizing evaluation findings.

Each presenter described the information and insight produced by evaluation activities. The range of methods and tools used across sites reflected the priorities and novelty of the programs as well as the expertise of the participants: hierarchical models, mixed methods, interviews, rubrics, competency models, peer evaluation, social network analysis, and mental modeling. Across this range, the evaluation activities produced information to drive NRT programs toward their goals. A brief description of each presentation is below.

Elijah Carter, University of Georgia, described a hierarchical data use model used to navigate the transition between formative and summative evaluation. The formative components of an evaluation are often extensive and data-rich, which makes them resource-intensive. Framing the kinds of information collected in parallel with the development of the program allows programs to understand and prepare for data decision points over time.

Gemma Jiang, Clemson University, discussed the use of social network analysis to understand the impact of program activities on social connections. Social network analysis produced maps that visualized relationships within the program and provided feedback to the program and to students.

Kate McCleary, University of Wisconsin-Madison, discussed the need for an ongoing process of formative evaluation. She reframed formative evaluation as integrated into all stages of a program’s development: from pre-implementation through post-implementation, formative evaluation continues to contribute to program learning.

Glenda Kelly, Duke University, presented her work with trainees and content experts to develop learner-centered rubrics. Students created customized rubrics in which they defined their goals and tracked their progress over time. Through this work, authentic, common student goals were articulated that can be used to guide the instructional priorities of the program.

Rebecca Jordan, Rutgers University, described the use of mental models to track students’ changing transdisciplinary ideas over time. As students learn about use-inspired science, their conceptual models of a problem change. Throughout the program, students generated models of their understanding, providing insight into their developing research skills.

Ingrid Guerra Lopez, Wayne State University, outlined an extensive process to develop and validate common competencies across 12 academic perspectives. The implementation reflected the program participants’ commitment to communication and collaboration across academic boundaries. The process also revealed the parallel purposes of evaluation activities in guiding student, faculty, and program development.

Dawn Culpepper and Colin Phillips, University of Maryland, College Park, described their work together to understand scholarly identity. This study exemplifies the feedback loop and close collaboration among evaluators, faculty, and students undertaken to understand how graduate students learn and how a program can support them.

Cheryl Schwab, University of California, Berkeley, presented the results of an Evaluation Survey distributed to the PIs of the NRT sites. The results started a conversation around common practices and challenges across NRT programs. Those common threads will be further delineated and discussed on the NRT Evaluators website.


Evaluation Relationships and Issues I

During the NRT Annual Meeting on September 27, 2018, I presented the results of an Evaluation Relationships and Issues survey conducted to gather information across NRT programs about their experience with evaluation. I described the common threads of evaluation across NRT programs and the results of the survey, and discussed opportunities to build community. In this post I begin by highlighting the common threads and the relative impact of evaluation issues.

The common threads of evaluation practices across NRT programs stem from the commonalities of who, what, how and when, evident in the logic models produced to guide the programs.


The main WHO are NRT graduate students, non-NRT graduate students, master’s students, faculty, and staff. Within the institution, members of lab groups, institutes, and departments contribute to the program. Many programs also engage different types of partners: partners from industry and government, mentors and advisors, outside providers of workshops and career building, and internal and external evaluators. WHAT is the core of the program: the specified outcomes, usually consisting of content knowledge, research skills, and communication or career-building skills. HOW is the set of elements or activities the program implements to achieve those outcomes (e.g., courses, internships, workshops, mentors). WHEN is the series of training activities that build over time. Each question reveals commonalities between programs that are opportunities for connection.

The survey link was sent via email to the PIs of funded NRT programs with a request for them to complete the survey and to forward the survey link to key members of their NRT program engaged in evaluation.  A total of 59 people filled out the survey.

The survey asked about each respondent’s level of engagement in the evaluation, their connections, and the extent to which various issues affected the evaluation of the program. I start with the latter: the interpretation of responses to the prompt “Rate the extent to which the following issues currently impact your work to evaluate your NRT program.” A total of 20 evaluators, 29 principal investigators, and 10 program coordinators responded to the Likert-scale items. The table lists the issues in the order estimated with an item response rating scale model, from issues that impact the work of evaluation to a “great extent” to issues that impact it “not at all”. I have included the raw counts across the five categories of the Likert scale so you can get a sense of how the issues were ordered.

Impact on Evaluation (response counts, from Great Extent at left to Not at All at right)   Issue
11  17  17  11   2   Engaging time-constrained program participants in all aspects of evaluation.
12  18  11  15   2   Understanding what constitutes success in the program.
15  11  18   8   6   Analyzing small sample sizes.
 7  19  19  10   3   Creating “authentic” performance measures.
12  12  17   9   8   Defining the scope and content of the program.
 9  11  14  16   8   Balancing funding, personnel, and resources to do evaluation work.
 7  11  16  17   6   Supporting the use of evaluation results.
 6   9  20  15   8   Assimilating the evaluation process into the culture of the program.
 5  13  15  11  12   Generating generalizable evaluation findings.
 6  12  14  14  12   Accounting for the variability in student background and integration into the NRT program.
 5  10  18  13  11   Finding comparison groups.
 4  10  18  18   8   Adapting to a changing program.
 5   8  20  16   9   Increasing trainees’ motivation to participate in evaluation.
 4   7  20  22   5   Reconciling the priorities across program stakeholders.
 4  12  12  18  12   Communicating evaluation findings succinctly.
 6   8   9  24  11   Managing attrition and low response rates.
 5   8  15  16  14   Disseminating evaluation findings outside the program.
 2   6  21  15  14   Opportunities to discuss and share evaluation findings.
 1  10  13  20  13   Maintaining engagement of outside stakeholders (e.g., industry partners).
 4   4   6  11  33   Navigating changes in the evaluation team.
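To give a feel for how raw counts like these can be turned into an ordering, here is a minimal sketch that ranks a few of the issues by a simple mean score. It assumes a hypothetical numeric coding of the five categories, 5 for Great Extent down to 1 for Not at All, and the names COUNTS, SCORES, mean_score, and rank_issues are illustrative only. This is not the item response rating scale model used to order the table; it is a rougher proxy, and for other rows its ordering could differ.

```python
# Sketch: rank survey issues by a simple mean Likert score from raw counts.
# Assumption: the five categories are coded 5 ("Great Extent") down to 1 ("Not at All").
# This mean is a rough proxy, NOT the item response rating scale model used in the post.

# Counts per issue, ordered from "Great Extent" to "Not at All" (taken from the table).
COUNTS = {
    "Engaging time-constrained program participants": [11, 17, 17, 11, 2],
    "Understanding what constitutes success": [12, 18, 11, 15, 2],
    "Analyzing small sample sizes": [15, 11, 18, 8, 6],
    "Creating authentic performance measures": [7, 19, 19, 10, 3],
    "Navigating changes in the evaluation team": [4, 4, 6, 11, 33],
}

SCORES = [5, 4, 3, 2, 1]  # hypothetical numeric coding of the five categories


def mean_score(counts):
    """Return the count-weighted mean score for one issue."""
    total = sum(counts)
    return sum(c * s for c, s in zip(counts, SCORES)) / total


def rank_issues(table):
    """Return (issue, mean) pairs sorted from highest to lowest mean impact."""
    scored = [(issue, mean_score(counts)) for issue, counts in table.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for issue, mean in rank_issues(COUNTS):
        print(f"{mean:.2f}  {issue}")
```

For these five rows the mean-score order happens to match the table’s order. A mean treats the categories as equally spaced, which is one reason a rating scale model is the more careful choice for ordering the full list.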

These findings help us as an NRT community target issues that may need extra attention. They also give us a starting point for asking NRT sites how they have addressed the most pressing issues and for sharing insight.

To be continued…


This material is based upon work supported by the National Science Foundation under grant no. DGE-1450053

Learning What is Valuable in Graduate Education at UC Berkeley

On May 1, 2017, the Data Science for the 21st Century (DS421) NRT training program hosted a symposium to celebrate the graduation of the first cohort of trainees. The day’s agenda included discussions of the changing environment on the UC Berkeley campus for data science and interdisciplinary research education and what we have learned from the implementation of the DS421 program. Five themes framed the program evaluation results as “what is valuable in graduate education”: heterogeneity, opportunity, practice, question, and people. Each theme was supported by multiple sources of evaluation data. The themes help to ground the training outcomes and guide the discussion of ways to improve the next iteration of the program elements.

The DS421 program was founded on building students’ 1) knowledge of concepts, empirical methods, and analytic tools from environmental science, social science, and statistics; 2) ability to conduct interdisciplinary research; and 3) skills to communicate results effectively to diverse audiences. The program evaluation activities target the assessment of student outcomes, providing feedback and support for the development and improvement of the program elements. The driving questions of the evaluation activities were:

–How do the DS421 training elements contribute to student and faculty attainment of training outcomes?

–How can we scaffold students’ development of interdisciplinary research skills?

Evaluation data were collected from multiple sources through self-report surveys, observations, interviews, and focus groups. A review of the evaluation data revealed five themes across sources and methods.


The graduate students admitted into the DS421 program come from 8 departments across 5 schools. Students vary in their prior knowledge and experience, and their experiences continue to differ once they are in the program: they navigate varying departmental requirements, courses to take, papers to complete, teaching assistantships to fulfill, and qualifying exam timelines. Some students are admitted to graduate school with an advisor, and others do not select an advisor until their second year. Membership in departmental cohorts, laboratory groups, and research seminars also varies, depending on the student’s school, advisor, and research interests. In short, the students are heterogeneous when they enter the program and continue on divergent paths during the program.

Yet students apply to the DS421 program to acquire a foundation of data science skills and to engage in interdisciplinary dialogue about their research. These common goals tie the DS421 community together. Each individual brings a perspective and experience that is unique and welcome. A student described his/her biggest takeaway from the colloquium as “The opinions of my classmates are pretty diverse!”, highlighting the exposure to new perspectives and approaches to inquiry. The program provides a rich environment for learning that extends students’ knowledge across disciplines.


The graduate students admitted into the DS421 program are given the opportunity to discuss their ideas and to present their research to people outside their discipline. One student said the program allowed him/her to “[get] outside the box” of departmental course requirements. The program provided alternative courses and a new set of peers from different disciplines. Another student wrote, “It has been helpful to see what problems other people in different disciplines are grappling with and what tools they are using.” Each element of the program was designed to give students the opportunity to immerse themselves in research outside their discipline. For the first cohort of the DS421 program, the final project of the Reproducible Data Science course involved working in mixed-discipline groups, bringing together students with different expertise and interests to address one question. Through these opportunities, students are learning about, and building networks into, areas of research that are not traditionally discussed in their departmental programs.


Graduate students don’t explicitly say they are practicing research, but they are: practicing how to think through and present research questions, talk with an audience, collaborate with others, and apply tools. The elements of the DS421 program have practice built in. The colloquium brought students in a cohort together to practice “how to communicate across disciplines.” The communication workshop provided the space for students to try, fail, and succeed at communicating to an audience outside their discipline. From practicing the delivery of a jargon-free message to “practicing answering tough questions before an audience,” students were asked to apply the skills and techniques they learned again and again. In the Impacts, Methods and Solutions course, students practiced proposal writing, posing and writing about different ideas throughout the semester. A student wrote, “It’s been useful to write frequent proposals, and to workshop them in class.” The Reproducible Data Science course was designed as a series of modules to practice “the collaborative workflow process using Github,” with each module adding new tools and different content. Opportunities to practice in a supportive environment allow students to try new skills and test out new ideas.


The DS421 program faculty press students to “expose [them]selves to the types of questions in other disciplines.” The program elements push students beyond exposure into engagement with, and integration of, the ways other people think about problems. One student wrote that he/she was able “to think about a statistical methods question …in a new and interesting way by talking with students who do not go to the same usual methods as economists usually go to.” Another student wrote that the colloquium “… has been a great introduction into the way people from other disciplines think about the same questions I do.” Understanding the nuances of research questions across disciplines reveals to students new ways of framing, and new methods for addressing, their own research questions.


At the symposium, a faculty member stated that the program supported the “…building [of] trust, [by] talking across boundaries.” For students to cross disciplinary boundaries, people from both sides need to be brought together for a common purpose in a supportive environment. The DS421 program does this by striving to balance participants’ interests, experiences, and goals. One of the primary activities students report participating in is discussing topics with peers and mentors. When those peers and mentors cross disciplinary boundaries, the discussions are fundamentally different. Building these relationships increases students’ access to interdisciplinary discussions and, ultimately, to opportunities. The summer research program requires students to work with graduate students and faculty from different disciplines, further extending beyond the boundaries of single disciplines.

Next: How can we use these themes to inform and increase the success of the DS421 program?


Metrics & Rubrics Working Group Update

Kate McCleary (University of Wisconsin-Madison), Daniel Bugler (WestEd), Cindy Char (Char Associates), Stephanie Hall (University of Maryland), Glenda Kelly (Duke University), Mary Losch (University of Northern Iowa), Leticia Oseguera (Penn State University), & David Reider (Education Design).

On Monday, October 10, 2016, members of the metrics and rubrics working group held a teleconference to update each other and get feedback on tools and instruments being created and used to assess shared skills and competencies across project sites. The skills and competencies the group discussed include agency, communication, cross-disciplinarity, entrepreneurship, and student engagement. The purpose of this blog post is to share key updates from our conversation with the NSF-NRT evaluator community that met in May 2016, and to encourage continued collaboration across sites in the development and implementation of evaluation measures.

Agency: Agency is a point of interest for the University of Maryland’s NSF-NRT evaluation. KerryAnn O’Meara and Stephanie Hall see agency as a key characteristic in the development of interdisciplinary researchers. Loosely defining agency as “the strategic actions that students might take towards a goal,” the UMD evaluation team is seeking to understand how graduate students in the language science community develop ownership of their training program. O’Meara and Hall see mentorship and multiple pathways through programs as contributing to agency. They are interested in learning whether other programs are looking at agency and, if so, what is being used to capture trainee agency. Other members of the working group see the development of agency as a component of some of their programs, through decision-making and trainees’ identification of coursework to fulfill their career goals.

Communication: Measuring different components of communication cuts across many of the evaluations being carried out by the working group. A central focus for our working group was how to use rubrics as components of our evaluation plans. Glenda Kelly, internal evaluator with Duke University, shared a rubric for assessing trainee elevator speeches, “Scoring Elevator Speech for Public Audiences.” The rubric was used as part of the Duke NSF-NRT two-week boot camp, one week of which featured training in team science and professional skills. Trainees participated in a science communication training, “Message, Social Media and the Perfect Elevator Speech,” facilitated by faculty from the Duke Initiative for Science and Society. Trainees were given the rubric at the beginning of the workshop and used it as a guide in developing their elevator pitches. A few days later, trainees presented their elevator speeches to the group and used the rubric as a guide in providing informal feedback. The rubric served as a useful tool for trainees, both as a guide to developing their elevator pitches and for providing formative feedback on each other’s presentations.

Cross-Disciplinary Skills: Kate McCleary, evaluator for the University of Wisconsin-Madison, created a cross-disciplinary presentation rubric to be used during a weekly seminar where trainees present their research to peers and core faculty from four disciplines. McCleary used data from individual interviews with faculty and graduate trainees to define cross-disciplinarity within the context of the NSF-NRT project at the University of Wisconsin-Madison, and used the literature to further explore the competencies developed through cross-disciplinary collaboration (Boix Mansilla, 2005; Hughes, Muñoz, & Tanner, 2015). The rubric was also turned into a checklist to provide different options for assessing trainee presentations. The rubric format was based on the AAC&U VALUE Rubrics, which are useful tools for assessing sixteen competencies. Relevant competencies and rubrics for the NSF-NRT grants include creative thinking, critical thinking, ethical reasoning, inquiry and analysis, integrative learning, oral communication, problem solving, teamwork, and written communication.

Entrepreneurship: Leticia Oseguera, evaluator for Penn State University, spent time investigating the literature on entrepreneurship. Working with the NSF-NRT team, she contributed to an annotated bibliography on entrepreneurship, and this competency will be the focus of one professional development program hosted for graduate trainees. A few books that stood out on entrepreneurship include Fundamentals for becoming a successful entrepreneur: From business idea to launch and management by M. Brännback & A. Carsrud (2016) and University startups and spin-offs: Guide for entrepreneurs in academia by M. Stagars (2014). Oseguera shared that the Penn State team has shifted toward investigating cross-disciplinary and interdisciplinary knowledge and finding ways to assess them objectively.

Student Engagement: The NSF-NRT team at Iowa State University is interested in assessing student engagement. Mary Losch, evaluator from the University of Northern Iowa, is looking at existing metrics and measures to assess student engagement. Losch’s current work on student engagement draws on two key articles: “Assessment of student engagement in higher education: A synthesis of literature and assessment tools” by B.J. Mandernach (2015) and “Processes involving perceived instructional support, task value, and engagement in graduate education” by G.C. Marchand & A.P. Gutierrez (2016). In her work with the NSF-NRT team at Iowa State, she is seeking to clarify which aspect of student engagement they want to measure (i.e., behavioral, cognitive, or affective engagement) and to determine where in the program the assessment of student engagement best aligns.

The rubrics and metrics working group plans to continue meeting once a semester. We value the opportunity to share ideas and support one another in the development of meaningful evaluation measures.


Boix Mansilla, V. (2005). Assessing student work at disciplinary crossroads. Change, January/February, 14-21.

Brännback, M. & Carsrud, A. (2016). Fundamentals for becoming a successful entrepreneur: From business idea to launch and management. Old Tappan, NJ: Pearson Education Inc.

Hughes, P.C., Muñoz, J.S., & Tanner, M.N. (Eds.). (2015). Perspectives in interdisciplinary and integrative studies. Lubbock, Texas: Texas Tech University Press.

Mandernach, B.J. (2015). Assessment of student engagement in higher education: A synthesis of literature and assessment tools. International Journal of Learning, Teaching and Educational Research, 12(2), 1-14.

Marchand, G.C. & Gutierrez, A.P. (2016). Processes involving perceived instructional support, task value, and engagement in graduate education. The Journal of Experimental Education,

Stagars, M. (2014). University startups and spin-offs: Guide for entrepreneurs in academia. New York: Apress.

Welcome New NRT Sites and Evaluators

The National Science Foundation (NSF) announced awardees for the second cohort of the NSF Research Traineeship (NRT) program. On October 3rd, Laura Regassa, Tara Smith, and Swatee Naik, NRT Program Directors, convened PIs from 18 funded sites at NSF’s Arlington, VA campus to discuss project commonalities and shared challenges. Invited members of current NRT program sites offered insight into navigating the first year of funding: Michelle Paulsen (Northwestern University) on program logistics and administration; Cheryl Schwab (UC-Berkeley) on program evaluation; Lorenzo Ciannelli (Oregon State University) on resources and models; and Colin Phillips (U of Maryland) on dissemination, sustainability, and scalability of the model. In addition, Earnestine Easter, Program Director, Education Core Research (ECR), discussed NSF’s commitment to funding opportunities that broaden participation in graduate education. Although each NRT site has an evaluation plan written into its proposal, there are common elements and barriers across sites. The NRT PIs had questions on a range of evaluation topics, from navigating the human subjects approval process to integrating evaluation findings over time. We will address these questions by sharing how other sites are defining and tackling these common issues.

What is the difference between internal and external evaluation? How do we benefit from each type of evaluation?

How can we better specify what we want to know about our programs?

How can we use data that is already being collected?

How can we quantitatively measure success? How can we qualitatively measure success? How can we integrate the two kinds of information?

How can we engage faculty in the development, implementation, and use of assessment and evaluation?

How can we engage students in the development, implementation, and use of assessment and evaluation?

How can we use evaluation data to make changes in our program?

How can we share evaluation results and resources across sites?

Does one of these questions resonate with you? If you have comments or would like to submit a blog post addressing an evaluation topic, contact

We look forward to hearing from you!

NSF Building in Ballston, Arlington, Virginia

Evaluation: Biggest Challenges

Attendees at the May 2016 NRT Evaluator workshop were asked what the biggest challenge for evaluation was at their program site. The responses reviewed came from principal investigators, program administrators, graduate students, and evaluators working with NRT programs funded in 2015. Challenges ranged from defining the evaluation focus and selecting methodology and design to managing the utility of evaluation results. The challenges reflect different stages of the evaluation process but are interdependent; cutting across the sites was the persistent challenge of engaging people to participate at each stage of the process.

Focus. A central challenge in evaluating the NRT programs was identifying “what questions we should be asking” and determining “what will constitute success.” NRT sites struggled with how to synthesize and focus the expectations of stakeholders (i.e., students, faculty, departments, NSF) who have varying levels of engagement in the program and roles that change over time. Producing clear and meaningful objectives for the evaluation required investing limited resources and time. A key related challenge was gaining the support of the principal investigator (PI), and of the PI’s fellow faculty, to allocate the resources needed to focus the evaluation and to promote the value of the evaluation process.

Methodology and design. There are different methodological approaches to evaluation, but each faces the same set of challenges due to the nature of graduate research programs: small numbers of students, variability in graduate students’ backgrounds, the burden of participating in evaluation, selecting and recruiting comparison groups, identifying and controlling variables, and designing and implementing authentic assessment. Many sites struggled with how to “move beyond self-report measures” within the constraints of limited budgets. Added to these challenges was a lack of tools to assess the development of the nuanced student learning happening in NRT programs.

Utility. “How will the results be used?” It was challenging for many sites to establish lines of communication and action plans for evaluation. Within programs, the challenge was how to integrate evaluation results, for example, by making time to review the results and revise the evaluation focus. Ensuring the use of evaluation results was reported as challenging because data needs changed over time and the data were often not generalizable. At the same time, many perceived the lack of a clear mandate from NSF for reporting and using evaluation data as a challenge.

Increasing the level of participation and engagement in the evaluation process was a common thread in the challenges reported. This was seen as a problem at all levels of involvement in the program, from students to NSF. Attendees suggested several ways to increase participation: increase buy-in by including students in the development of assessment tools, require explicit agreements (e.g., contracts with partners and faculty), set clear expectations for program participants, and plan ahead for evaluation publications. In sharing our challenges, we open the conversation to ideas for addressing them. By sharing creative approaches to evaluation, we can sharpen the focus of NRT evaluations, strengthen their methodology and design, and enhance the impact of their results.

Thank you to all workshop attendees. Let’s continue the conversation!

Image retrieved from:

Evaluation: Why do we do it?

At the beginning of the May 2016 NRT Evaluator workshop, attendees were asked:

Why do you think we are evaluating NRT programs?

The room was filled with principal investigators, program administrators, graduate students, and evaluators associated with NRT programs across the country. Attendees’ responses were used to assess the group’s understanding of evaluation and to find common ground for collaborative work. Three broad categories emerged from the responses: contributing to models for graduate education, assessing program-level goals, and advancing evaluation practice. Although they simplify the nuances across sites, the categories reflect different program priorities and uses of evaluation work.

The largest number of responses described evaluation as a way to contribute to models of graduate education. Responses included sharing stories to communicate the importance of STEM, measuring the success of program elements, studying factors that influence student progress (e.g., motivation to engage in interdisciplinary collaboration), and replicating successful elements across sites. Evaluation is viewed as a means to identify best practices and build conceptual models that provide guidance for programs. Evidence-based models provide justification for investing resources in NRT programs and can inform future policy.

The second largest number of responses described approaches to assessing program-level goals and program elements, including gathering data for accountability: concrete evidence of individual program success or failure. Evaluation is viewed as an evolving process at the program level, first serving a formative role to support continuous improvement and then a summative role to establish the impact of individual programs. The formative role builds an understanding of the program goals and of ways to quantify impact. Many responses focused on measuring specific aspects of the program to capture its impact on student learning.

Finally, the smallest number of responses addressed ways to advance evaluation practice, such as using the evaluation work at different sites to create and share strong designs for evaluating graduate education programs. Evaluation is viewed as a practice that extends beyond a given site. With shared designs and assessment tools, evaluation findings can more clearly inform other programs or be generalized across programs.

The challenge for NRT evaluators is to continue building on these categories and the conversations started in the workshop, finding ways to share our evaluation work and further define our own questions. Evaluation is driven by program priorities; therefore, building upon the common priorities across sites will increase the impact of our evaluation work. These categories represent concrete pathways for aligning NRT evaluation goals.

Thank you to Marilyn Korhonen, University of Oklahoma, for helping to categorize the attendee responses during the workshop.

NRT Logic Models

The development of a logic model for a program is an iterative and dynamic process. Central to the process is the inclusion of the program’s different stakeholders. The purpose of the model depends on who is involved, the stage of program development, and the needs of the program. The NRT programs involved in the May 2016 Evaluator Workshop shared logic models designed for many different purposes. The frameworks employed varied in complexity, from one to five elements. Common elements of the models included inputs, outputs, and outcomes.

Many programs start by clarifying the underlying goals of the program. In Figure 1, the University of Maryland, College Park, outlines the goals for students, graduate education, and institutional change.


From these overarching goals, the goals for each level of the program can be expanded further. Figure 2 shows Oregon State University’s logic model of the connections at the student level, linking outcomes to specific program elements and participating groups.


Figure 2. Oregon State University NRT Logic Model, by Cynthia Char and Lorenzo Ciannelli, June 2, 2016. Reprinted with permission.

Finally, Figure 3 shows the logic model for the University of Rochester’s NRT program, an implementation logic model that outlines inputs, outputs, and outcomes across time.


Figure 3. University of Rochester NRT Program Logic Model, by Chelsea BaileyShea, May 2016. Reprinted with permission.

Each figure displays a model that targets the current needs of its program. Many resources available online define logic models and offer frameworks for developing them for educational programs. For example, the W.K. Kellogg Foundation and the University of Wisconsin-Extension provide guidance and tools for selecting an approach and creating a logic model. In any form, a logic model can give evaluators a window into the rationale and structure of a program. These representations help to clarify the program and are the foundation for creating evaluation plans.

Articles of Interest

Philip Stark

An Evaluation of Course Evaluations

Laura Regassa (co-author)

Designing and Implementing a Hands-On, Inquiry-Based Molecular Biology Course

Cheryl Schwab (co-author)

Interdisciplinary Laboratory Course Facilitating Knowledge Integration, Mutualistic Teaming, and Original Discovery

John Gargani (author)

What can practitioners learn from theorists’ logic models?

Richard Shavelson (co-author)

Room for Rigor: Designs and Methods in Informal Science Education Evaluation

Ed Hackett (co-author)

The Snowbird Charrette: Integrative Interdisciplinary Collaboration in Environmental Research Design

Maura Borrego (co-author)

Sustained Change: Institutionalizing Interdisciplinary Graduate Education

Lisa Kohne (co-author)

Lessons Learned From an Inter-Institutional Graduate Course on Interdisciplinary Modeling for Water-Related Issues and Changing Climate