CSAP's Centers for the Application of Prevention Technologies

F. Evaluating Outcomes and Impacts

In assessing a program, we want to know whether the program had any effect. Did the program do what it was intended to do? Did the program achieve its stated goals and bring about the desired changes? These kinds of questions are outcome and impact questions. Outcome evaluation and impact evaluation refer to the process of collecting evidence that a program was successful in effecting certain outcomes and impacts. As any good detective will tell you, making sense of the evidence (who did what to whom) is a difficult and imperfect process. The investigator must draw conclusions about a crime from whatever evidence is available after the fact. Program evaluators must also use whatever evidence they have, but they can have some control over what evidence is available. Unlike a crime detective, we are often in the position to plan what evidence will be available.

Note: In your logic model, you have identified both short-term outcomes and long-term impacts. However, the methods for evaluating these are quite similar. Therefore, we address the methods for conducting outcome evaluation below, considering both short-term outcomes and long-term impacts together.

1. Some Common Methods of Outcome/Impact Evaluation

There are many different methods for collecting outcome and impact evaluation data. Below we describe some of the most common evaluation methods, including:

    1. Post-Test Only Data Collection

    (Note: On this page, the term "outcome" includes both short-term outcomes and long-term impacts.)

    Often, program outcomes are measured only after the program is completed. This is understandable since programs must first be developed and operated as planned before they can be assessed. Collecting data only after the program is implemented can't tell you how much participants have changed, because you don't know what their status was before the program. Even so, the data can add to the description of your local program and to the overall picture of drug prevention programs at the state and national levels.

    Outcomes measured only after a program is completed provide you with information about where your participants stand at one point in time. You may learn, for instance, that students in your drug information program have mastered 85 percent of the knowledge about the effects of alcohol and other drugs as measured by a drug information test. In some cases this information can be compared with already existing information about the standard level of drug knowledge among students. (Information about standard rates of behavior or levels of performance is referred to as normative or standardized data. For example, we commonly use published information about the standard level of reading or computational skills of students in our states or the nation to see how well our local educational programs are reaching our goals.)
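    The comparison described above can be sketched in a few lines. This is only an illustration: the test length, the student scores, and the published norm are all hypothetical values chosen to match the 85 percent example, not real data.

```python
from statistics import mean

def percent_mastery(correct_counts, total_items):
    """Average percent of test items answered correctly across participants."""
    return 100 * mean(c / total_items for c in correct_counts)

# Hypothetical post-test results: items correct out of 20 for five students.
post_test_correct = [18, 16, 17, 19, 15]
program_mastery = percent_mastery(post_test_correct, total_items=20)

# Hypothetical published norm (percent) for a similar student population.
norm_mastery = 70.0

# Positive gap: participants score above the norm; negative: below it.
gap = program_mastery - norm_mastery
print(f"Program: {program_mastery:.1f}%  Norm: {norm_mastery:.1f}%  Gap: {gap:+.1f} points")
```

    A gap like this describes where participants stand relative to the norm at one point in time. It does not show change, and it is only meaningful if the norm comes from a population comparable to your participants.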

    The problem with existing data about student drug information knowledge or other drug-related behaviors is that the data often are not an appropriate standard for students in a particular program. The data may be based on a different grade level, region of the country, social class, or some other factor that can make comparisons with your program participants misleading. In fact, the lack of good descriptive data about drug information and use with different groups of students in your community and across the country is one of the most important reasons for local program evaluators to collect and share their outcome findings.

    What do you do with outcomes measured only after the program when there are no appropriate published standard rates of knowledge, attitudes, use, or program outcomes? There are times when the outcome variable—accurate drug knowledge for example—seems relatively unlikely to be influenced by the participants' prior knowledge or current experiences, and the outcome is logically related to the program (e.g., specific knowledge taught and same knowledge tested for outcome). In such a case you might have some confidence that the program had the desired effect on the outcome. Indeed, most classroom teaching and testing operates on a very similar basis. Often, however, we need to have some basis of comparison before concluding that the program brought about a change compared to how things were before the program.

    2. Post-Test Only with a Comparison Group

    One approach to the problem of collecting data only after the program is implemented is to expand your after-program data collection to include other people who didn't participate in the program. This group of non-participants is called a "comparison group."

    You could give your outcome measures or test to another school that is very similar to the school where the program was conducted. This comparison school would have a similar student body in terms of income, race, and neighborhood. The comparison school would not, however, have a similar drug prevention program that could affect your outcomes.

    The most important quality that the program group (in this case, the school that has the program) and the comparison group (in this case, another school that does not have the program) can have is that they are alike in all important ways except that only one group receives the program.

    Using this method of data collection, if your outcome measures show differences in scores between the program and comparison schools, this would suggest that the program was effective. We say "suggest" because it is very difficult to demonstrate that the program and the comparison groups were perfectly comparable before the program. If the two schools were different in important ways before the program began, then these initial differences could account for the after-program differences. However, you can build a stronger case for the similarity of the groups by going back and collecting and comparing already existing information about the two schools from existing records (e.g., average standardized test scores, economic makeup of student body, etc.). Great care must be taken to ensure confidentiality of student records by strictly complying with your school district's policies regarding access and use of this information!
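    As a rough sketch of this reasoning, the snippet below compares after-program scores between two schools and then checks how similar the schools looked on an existing record. All scores and the choice of a reading test as the existing record are hypothetical.

```python
from statistics import mean

def mean_difference(program_scores, comparison_scores):
    """Program group mean minus comparison group mean."""
    return mean(program_scores) - mean(comparison_scores)

# Hypothetical after-program drug knowledge scores (0-100) at each school.
program_post = [82, 75, 90, 68, 85]
comparison_post = [70, 65, 80, 60, 75]
post_gap = mean_difference(program_post, comparison_post)

# Hypothetical existing records (standardized reading scores) collected
# before the program, used to check how similar the schools were.
program_reading = [52, 48, 55, 50, 45]
comparison_reading = [51, 47, 56, 49, 47]
baseline_gap = mean_difference(program_reading, comparison_reading)

print(f"After-program gap: {post_gap} points")
print(f"Gap on existing records: {baseline_gap} points")
```

    A large after-program gap combined with a small gap on existing records is consistent with a program effect, but it only suggests one, since the schools may differ in ways the records do not capture.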

    Collecting after-program outcome information from both this year's program group and a comparison group provides rich descriptive information and can suggest program effects. Also, after-program outcome scores from this year's comparison group can begin to build a good comparison basis for next year's program. In addition, the skills and experience developed in this year's after-program outcome evaluation can encourage you to test next year's participants before the program starts, which offers several advantages.

    Here are some examples of types of comparison data you might be able to use, depending on your program:

    • students from comparable schools
    • students in other classrooms
    • students referred to the program but put on a waiting list
    • community residents from a different but comparable community
    • scores from tests done in previous years

    3. Pre- and Post-Test Data Collection

    The most direct way to know if the prevention program changed program participants' knowledge, attitudes, behavior, or some other outcome is to test program participants before the program and again after the program. Subtracting before-program scores from after-program scores (i.e., after-program scores minus before-program scores) will indicate whether the outcome scores have changed. Students may, for example, have increased in drug knowledge or decreased in accepting attitudes toward drug and alcohol use. We no longer have to assume change. We have gone beyond describing where our program participants stand at one point in time to demonstrating that they have changed in important ways. Because of this benefit, we strongly recommend using a pre-post data collection method rather than post-test only whenever possible.
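    The subtraction described above (after-program score minus before-program score) might look like the following minimal sketch. The scores are hypothetical, and the code assumes each participant's before and after scores can be matched in the same order.

```python
from statistics import mean

def change_scores(before, after):
    """Per-participant change: after-program score minus before-program score."""
    if len(before) != len(after):
        raise ValueError("Each participant needs both a before and an after score")
    return [a - b for b, a in zip(before, after)]

# Hypothetical drug knowledge scores for the same five students, in the same order.
pre_scores = [10, 12, 9, 14, 11]
post_scores = [15, 14, 13, 15, 16]

changes = change_scores(pre_scores, post_scores)
average_change = mean(changes)
improved = sum(1 for c in changes if c > 0)

print(f"Changes: {changes}")
print(f"Average change: {average_change:.1f} points; {improved} of {len(changes)} students improved")
```

    Reporting both the average change and the number of participants who improved helps show whether a gain is broad or driven by a few individuals.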

    However, using a pre-post test strategy for data collection does not ensure that the change you've seen was actually caused by your program. Consider the following.

    The consequences of drug use are so serious and often so dramatic that drug-related incidents are a constant topic of interest in the media. As a consequence of this level of media coverage and of personal experience, drug use has become a very serious concern to citizens and to all levels of government. Many different uncoordinated efforts are being made to solve the drug use problem.

    We can't just assume that our particular drug use prevention program is the only force affecting our program participants' drug-related knowledge, attitudes, behaviors, or other outcomes. We are all exposed to news programs, TV dramas, magazine articles, or sermons that could change how we stand on some outcome measure. For example, suppose an intoxicated high school student driver and his girlfriend die tragically in a car accident. As a result, a teacher introduces new materials intended to prevent alcohol and other drug (AOD) use into the curriculum, or the student government independently begins an anti-drug program. These events and others can all act to change program participants' outcome scores in unanticipated ways. While these events contribute to our common effort to prevent drug use, the combination of these events does make it difficult to say decisively that our particular program was the most important event that brought about the desired change.

    4. Pre- and Post-Test Data Collection with a Comparison Group

    The best way to increase confidence that your particular program led to specific changes is to collect data from a comparison group. Testing both program and comparison groups before and after the program would indicate both how much change had occurred over the course of the program and how comparable both groups were before the program. For example, a local high school conducts an experimental program to change ninth graders' attitudes toward drug use. Another local high school is selected as the comparison group. Both schools have similar student bodies, are in similar neighborhoods, and have similar before-program outcome scores. With reasonably comparable program and comparison groups, there is a good chance that both groups are exposed to similar outside experiences during the program.

    Sometimes these experiences may cause changes in the after-program scores of both groups. Suppose, for instance, that both groups' attitudes toward drug use became more negative, but the program group's scores changed much more dramatically than the comparison group's scores. Later questioning of the ninth graders revealed that, during the program, a large percentage of both groups had viewed the "Cosby Show" special series of three programs that dealt with the dangers of teenage drug use. The program evaluators were able to detect this unanticipated event (the "Cosby Show") and to explain why both groups' outcome scores changed. The evaluators demonstrated good program effects and were able to answer questions about other possible influences.
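    One way to separate the shared outside influence from the program's own contribution is to subtract the comparison group's change from the program group's change, sometimes called a difference-in-differences comparison. The sketch below uses hypothetical attitude scores, where lower numbers mean less accepting attitudes toward drug use, and assumes the two groups were comparable before the program.

```python
from statistics import mean

def average_change(before, after):
    """Group-level change: after-program mean minus before-program mean."""
    return mean(after) - mean(before)

def net_program_change(program_before, program_after,
                       comparison_before, comparison_after):
    """Program group's change minus the comparison group's change."""
    return (average_change(program_before, program_after)
            - average_change(comparison_before, comparison_after))

# Hypothetical scores on an "accepting attitudes toward drug use" scale.
program_before = [30, 28, 32, 30]
program_after = [20, 19, 23, 18]
comparison_before = [31, 29, 30, 30]
comparison_after = [27, 26, 28, 27]

program_change = average_change(program_before, program_after)
comparison_change = average_change(comparison_before, comparison_after)
net_change = net_program_change(program_before, program_after,
                                comparison_before, comparison_after)

print(f"Program group change: {program_change}")
print(f"Comparison group change: {comparison_change}")
print(f"Change beyond the shared influence: {net_change}")
```

    Here both groups move in the same direction, as they might after a widely viewed television series, and the remaining difference is the part of the change that the shared influence does not explain.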

    Careful monitoring of school, community, and media events can also help detect possible other influences or give some assurance that the most reasonable explanation for changes in outcome scores is the prevention program.

    How important is it to be able to attribute cause to your specific program?

    Although professional program evaluators have developed many methods to help them better identify specific causes (e.g., was it this program or something else?), it may not be critical to your evaluation to know this information with confidence. For many programs, if demonstrated changes in knowledge, skills, behaviors, etc. occur within a group of program participants, this is sufficient evidence that the program is working. Further, in terms of preventing substance abuse, it is widely recognized that no one program is going to have a strong overall effect in isolation. Thus, we expect that multiple programs are having multiple influences at any given time. This suggests that if your program keeps good information about how well it is implemented, and the short-term changes that participants experience, this is sufficient evidence for program effectiveness. Isolating one program's effects may be unrealistic.

    What if we can't include before measures or comparison groups?

    Program evaluation, like politics, is the art of the possible. The program that provides outcome evaluation with only after-program testing is providing descriptive information that contributes to the overall drug prevention effort by building the database for future standard rates of behavior. As you add before measures and comparison groups, there is an increase in your ability to say how effective your program was, but most information can make a real contribution. The more important the social problem, the more difficult it is to conduct an evaluation. Highly visible social problems like drug abuse bring about a great many responses. Consequently, it is virtually impossible to conduct the perfect evaluation, free of problems, to determine if any one program was the major factor affecting participants' outcome scores. We do the best we can or we retreat from the problem.

2. Distinctions between Long-Term Impacts and Short-Term Outcomes

Evaluating long-term impacts is usually done in the same manner as for short-term outcomes. Sometimes evidence is gathered only after the program and sometimes it is gathered both before and after the program; sometimes we have comparison groups and sometimes we don't. One important difference between evaluating long-term impacts and short-term outcomes is the amount of change we can reasonably expect any individual program to have on measures of impact.

As stated earlier in this chapter, changes in many outcomes (e.g., drug knowledge, attitudes about drug use, accessibility to drugs, law enforcement, and peer group values) can have some effect on the ultimate criterion of drug use. It is unreasonable to expect any one program, by itself, to independently and dramatically change existing patterns of drug use. It is the combination of many programs and other local, state, and federal responses to the drug problem that will, over time, culminate in significant reductions in drug use. Changes in many short-term outcomes must come before changes in long-term impacts. Good evaluation must first document and evaluate programs' short-term outcome effects. This is not to say that long-term impacts are not important, but we must evaluate them with a longer time perspective and a bigger picture in mind. Don't be discouraged by findings that show little or no program effects on long-term impacts! Serious social problems require the combined efforts of many people and time. Local drug prevention programs and their evaluations are part of the solution.

Evaluating Short-Term Outcomes & Long-Term Impacts

  1. Methods and Data Sources. Any of the possible sources of information described can be used to evaluate program outcomes and impacts.
  2. List of measures of some common short-term outcomes and long-term impacts of substance abuse prevention programs.
  3. Examples of data sources for short-term outcomes based on hypothetical program logic models.
  4. Examples of data sources for long-term impacts based on hypothetical program logic models.

3. Measuring client satisfaction

Although not technically a program "outcome," it is often very important to learn about whether or not persons participating in programs are satisfied with the services they have received. This can be very useful information for making improvements to the program in the future. For more information about how to design a survey of client satisfaction, click here.

U.S. Department of Health and Human Services

Page last updated: 08/17/2006