Discussion Assignment Instructions
DUE: by 11am Tuesday May 17, 2022. NO LATE WORK!
Survey research has become an important component of the public agency data analyst’s toolbox. Explain how survey research has been used to support and analyze policy decisions by Criminal Justice administrators.
You will post a thread presenting your scholarly response on the assigned topic, writing 750–850 words. For each thread, students must support their assertions with at least four (4) scholarly citations in APA format. The original thread must incorporate ideas and several scholarly citations from all of the Learn material for the assigned Week.
Then, you will post replies of 250–300 words each to 3 or more classmates’ threads. Each reply must incorporate at least two (2) scholarly citations in APA format. The reply posts can integrate ideas and citations from the Learn material throughout the course.
Any sources cited must have been published within the last five years. Integrate Biblical principles in your personal thread and in all replies to peers.
Cronk, Brian C. (2018). How to use SPSS®: A step-by-step guide to analysis and interpretation (11th ed.). New York, NY: Routledge. ISBN: 978-0367355692.
Meier, Kenneth J., Brudney, Jeffrey L., & Bohte, John (2015). Applied statistics for public and nonprofit administration (9th ed.). Stamford, CT: Cengage Learning. ISBN: 9781285737232.
Meier, Brudney, & Bohte: CH.2
Using a statistical approach in public and nonprofit administration begins with measurement. Measurement is the assignment of numbers to some phenomenon that we are interested in analyzing. For example, the
effectiveness of army officers is measured by having senior officers rate junior officers on various traits. Educational attainment may be measured by how well a student scores on standardized achievement tests. Good performance by a city bus driver might be measured by the driver’s accident record and by his or her record of running on time. The success of a nonprofit agency’s fund-raising drive might be measured by the amount of money raised. How well a nonprofit agency’s board of directors represents client interests might be measured by the percentage of former or current clients on the board.
Frequently, the phenomenon of interest cannot be measured so precisely but only in terms of categories. For example, public and nonprofit administrators are often interested in characteristics and attitudes of the general populace and of various constituency groups. We can measure such things as the racial and gender composition of the individuals in these groups; their state of residence or their religious preferences; their attitudes toward a particular agency or government in general; their views on space exploration, public spending, or the tax treatment of nonprofit organizations; and so on. Although such variables do not have quantitative measurement scales, it is still possible to measure them in terms of categories—for instance, white versus nonwhite; female versus male; favor tax decrease, favor no change, favor tax increase; and so on. Although these phenomena cannot be measured directly with numerical scales, they are important variables nonetheless. Public and nonprofit administrators need to know how to measure, describe, and analyze such variables statistically.
In many managerial situations, the manager does not consciously think about measurement. Rather, the manager obtains some data and subjects them to analysis. Of course, problems arise with this approach. For example, in Chapter 11 we discuss a program where the Prudeville police department cracks down on prostitution in the city, and arrests by the vice squad increase from 3.4 to 4.0 on average per day. Based on these numbers, the police chief claims a successful program. This example illustrates a common measurement problem. The city council of Prudeville was concerned about the high level of prostitution activity, not the low level of prostitution arrests. Conceivably the number of prostitution arrests could be positively related to the level of prostitution activity (i.e., more prostitution arrests indicate greater prostitution activity). In this situation the police chief’s data may reveal increased prostitution, not decreased prostitution. In fact, the only thing an analyst can say, given the police chief’s data, is that the number of prostitution arrests increased.
In this chapter, we discuss some of the important aspects of measurement, both in theory and in application. The chapter presents the theory of measurement and discusses operational definitions and indicators. Following this discussion, the chapter explores the concept of measurement validity and then turns to increasing reliability and the types of measures, such as subjective indicators, objective indicators, and unobtrusive indicators. Next, the chapter presents levels of measurement: nominal, ordinal, and interval. It follows with a discussion of the implications of selecting a particular level of measurement and concludes by considering performance measurement techniques and benchmarking.
Theory of Measurement
Measurement theory assumes that a concept that interests an analyst cannot be measured directly. Army officer effectiveness, educational achievement, bus driver performance, level of prostitution activity, social capital, program success, and civic engagement are all concepts that cannot be measured directly. Such concepts are measured indirectly through indicators specified by operational definitions. An operational definition is a statement that describes how a concept will be measured. An indicator is a variable, or set of observations, that results from applying the operational definition. Examples of operational definitions include the following:
• Educational attainment for Head Start participants is defined by the achievement scores on the Iowa Tests of Basic Skills.
• Officer effectiveness is defined by subjective evaluations by senior officers using form AJK147/285-Z.
• Program success for the Maxwell rehabilitation program is defined as a recidivism rate of less than 50%.
• A convict is considered a recidivist if, within 1 year of release from jail, the convict is arrested and found guilty.
• Clients’ satisfaction with the service of the Department of Human Resources is measured according to the response categories that clients check on a questionnaire item (high satisfaction, medium satisfaction, and low satisfaction).
• An active volunteer in the Environmental Justice Association is defined as a person who donates his or her time to the association at least 5 hours per week, on average.
• One measure of board director activity of the Nature Society is the number of hours devoted by board members to this organization each month.
• The efficiency of a fund-raising firm is defined as the money raised divided by the costs paid to the firm.
Operational definitions are often not stated explicitly but are implied from the research report, the memo, or the briefing. A manager should always encourage research analysts to state explicitly their operational definitions. Then the manager can focus on these definitions and answer a variety of measurement questions, such as the ones we will discuss later. It is important for public and nonprofit managers to know how the complicated concepts they deal with are measured. Without this knowledge, they will be hard-pressed to understand quantitative analyses or explain them to others.
Reading the preceding operational definitions, you may have been troubled by the lack of complete congruence between the concept and the indicator. For example, assume the city transit system evaluates the job performance of its bus drivers by examining each one’s accident record and on-time rate. A driver may well have a good accident record and be on time in her bus runs and yet be a poor bus driver. Perhaps the on-time record was achieved by not stopping to pick up passengers when the driver was running late. Or perhaps the driver’s bus was continually in the shop because the driver did not see to maintaining the bus properly.
This example suggests that observed indicators may not offer a complete measure of the underlying concepts. Most students of measurement accept the following statement:
Indicator = concept + error
A good indicator of a concept contains very little error; a poor indicator is only remotely related to the underlying concept.
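The indicator = concept + error relation can be illustrated with a small simulation. This is a hypothetical sketch in Python; the concept value, error sizes, and names are invented for illustration only:

```python
import random

random.seed(1)

def observe(concept_value, error_sd):
    """One indicator reading: the true concept value plus random measurement error."""
    return concept_value + random.gauss(0, error_sd)

true_effectiveness = 70.0  # the "true" score, unobservable in practice (hypothetical)
good_indicator = observe(true_effectiveness, error_sd=2)   # small error term
poor_indicator = observe(true_effectiveness, error_sd=20)  # error dominates the reading

print(good_indicator, poor_indicator)
```

The good indicator stays close to the underlying concept value; the poor indicator is only loosely related to it.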
In many cases several indicators are used to measure a single concept. One reason for using multiple indicators is that a concept may have more than one dimension. For example, the effectiveness of a receptionist may be related to the receptionist’s efficiency and the receptionist’s courtesy to people. To measure effectiveness adequately in this instance, we would need at least one indicator of efficiency and one of courtesy. To measure nonprofit financial standing would require several indicators of, for example, funds in reserve, diversity in funding sources, and operating efficiency. The term triangulation is sometimes used to describe how multiple indicators enclose or “hone in” on a concept.
Multiple indicators are also needed when the indicators are only poor or incomplete representations of the underlying concept. For example, a measure of a nonprofit agency’s receptivity to volunteers might include the presence of a volunteer coordinator, procedures in place to welcome new volunteers, and an explicit orientation for new volunteers. The success of a neighborhood revitalization program would require several indicators. The increase in housing values might be one indicator. The decrease in crime, reduction in vandalism, willingness to walk outside at night, and general physical appearance might be other indicators. The start of a neighborhood association or of a day care cooperative could be additional indicators. Each indicator reflects part of the concept of neighborhood revitalization but also reflects numerous other factors, such as economic growth in the entire city, demand for housing, street lighting, and so on. The theory behind multiple indicators in this situation is that the errors in one indicator will cancel out the errors in another indicator. What remains will measure the concept far better than any single indicator could alone. For these reasons, a multiple-indicator strategy to measure important concepts comes highly recommended in public and nonprofit management.
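The error-canceling logic behind multiple indicators can be sketched with a hypothetical simulation; all numbers are invented, and only Python's standard library is used:

```python
import random
from statistics import mean

random.seed(42)

TRUE_LEVEL = 50.0  # hypothetical true level of neighborhood revitalization

def indicator():
    # each indicator reflects the concept plus its own random error
    return TRUE_LEVEL + random.gauss(0, 10)

single = indicator()                              # one noisy indicator
composite = mean(indicator() for _ in range(20))  # errors tend to cancel in the average

print(single, composite)
```

On average, the composite of twenty indicators lands much closer to the true level than any single indicator, which is the statistical rationale for the multiple-indicator strategy.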
A valid indicator accurately measures the concept it is intended to measure. In other words, if the indicator contains very little error, then the indicator is a valid measure of the concept. The measurement validity of an indicator often becomes a managerial problem. For example, many governments administer civil service examinations that are supposed to be valid indicators of on-the-job performance. If minorities or women do not do as well as white males on these examinations, the agency is open to discrimination lawsuits. The agency’s only defense in such a situation is to prove that the civil service examination is a valid indicator of on-the-job performance (not an easy task).
Validity can be either convergent or discriminant. The preceding paragraph discusses convergent validity: Do the indicator and the concept converge? Does the indicator measure the concept in question? Discriminant validity asks whether the indicator allows the concept to be distinguished from other similar, but different, concepts. For example, using achievement scores on standardized tests may lack discriminant validity if the tests have some cultural bias. A good indicator of educational achievement will distinguish that concept from the concept of white, middle-class acculturation. A culture-biased test will indicate only educational achievement that corresponds with the dominant culture. As a result, such an indicator may not be valid.
Social scientists have long grappled with the idea of measurement validity. They have suggested several ways that validity can be established. An indicator has face validity if the manager using the indicator accepts it as a valid indicator of the concept in question. For example, years spent by students in school is accepted as a valid indicator of formal education. An indicator has consensual validity if numerous persons in different situations accept the indicator as a valid indicator of the concept. The recidivism rate, for example, has consensual validity as a good measure of a prison’s ability to reform a criminal. Often, consensual validity is established through finding a published research study in which the indicator has been used, thus suggesting its acceptance by scholars. An indicator has correlational validity if it correlates strongly with other indicators that are accepted as valid. For example, community satisfaction with a nonprofit organization as assessed in a survey might be strongly related to the amount of monetary donations received by the agency or the number of donors. (In Chapter 17 we discuss correlation and how it is measured.) Finally, an indicator has predictive validity if it correctly predicts a specified outcome. For example, if scores on a civil service examination accurately predict on-the-job performance, the exam has predictive validity.
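Correlational validity can be checked with a simple correlation. Here is a hedged sketch in Python using invented figures for agency satisfaction scores and donor counts; the Pearson formula itself is standard:

```python
from statistics import mean, pstdev

# Invented data: survey satisfaction scores (1-5) and donor counts
# for eight hypothetical nonprofit agencies.
satisfaction = [3.1, 4.2, 2.5, 4.8, 3.9, 2.2, 4.5, 3.3]
donors = [120, 210, 95, 260, 180, 80, 240, 140]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

r = pearson(satisfaction, donors)
print(round(r, 2))  # a strong positive correlation supports correlational validity
```

A coefficient near +1 would support treating the survey measure as correlationally valid against the donation-based indicators; a coefficient near zero would not.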
These four types of validity offer ways in which a public or nonprofit manager can accept, or argue, that an indicator is valid. They do not, however, guarantee that the indicator is a particularly effective measure of the concept in question. An indicator may have face validity, consensual validity, correlational validity, and predictive validity and still not be as effective as other measures. Consider the Law School Admission Test (LSAT). The LSAT has face validity (it seems to make sense) and consensual validity (numerous law schools use it to screen applicants). It also has correlational validity (it correlates with undergraduate grades) and predictive validity (it correlates with law school grades). And yet the LSAT is not as strong a predictor of law school performance as is the socioeconomic status of the student’s family.
With all of the tests for validity and all of the different ways that an indicator can be validated, developing valid indicators of concepts remains an art. It requires all of the skills that a public or nonprofit manager has at his or her disposal. To be sure, in some cases, such as finding indicators of lawn-mower efficiency, valid indicators can be easy to derive. By contrast, developing valid indicators of community police effectiveness or of the “health” of the nonprofit community in a city is a very difficult task. Scholars and practitioners continually debate the meaning and measurement of crucial concepts, such as social capital, civic engagement, military preparedness, and board of directors’ leadership.
One approach to finding or developing valid indicators is to review the published literature in a field. Or you may check studies or reports from other jurisdictions or consult with experts in the field. In general, if an indicator is used in the published literature, it has at a minimum both face and consensual validity, and it may meet other validity criteria as well. Before (or while) you create your own indicators of an important concept, it is a good idea to consult the relevant literature. A review of the literature carries additional benefits, such as making you aware of how other researchers have approached related problems or issues and what they have found. Such information can make your analytical task easier. You may also find that the “answer” to your research question, for which you had planned to develop indicators, already exists, thus saving you the time and effort of conducting your own study.
A reliable indicator consistently assigns the same number to some phenomenon that has not, in fact, changed. For example, if a person measures the effectiveness of the police force in a neighborhood twice over a short period of time (short enough so that change is very unlikely) and arrives at the same value, then the indicator is termed reliable. Or, if the rate of volunteering to a volunteer center remains constant from one day to the next, it is probably a reliable indicator. If two different people use an indicator and arrive at the same value, then, again, we say that the indicator is reliable. Another way of defining a reliable indicator is to state that an indicator is a reliable measure if the values obtained by using the indicator are not affected by who is doing the measuring, by where the measuring is taking place, or by any other factors than variation in the concept being measured.
The two major threats to measurement reliability are subjectivity and lack of precision. A subjective measure relies on the judgment of the measurer or of a respondent, for example, in a survey. A general measure that requires the analyst to assess the quality of a neighborhood or the performance of a nonprofit board of directors is a subjective measure. Subjective measures have some inherent unreliability because the final measures must incorporate judgment. Reliability can be improved by rigorous training of the individuals who will do the measuring. The goal of this training is to develop consistency. Another method of increasing reliability is to have several persons assign a value, and then select the consensus value as the measure of the phenomenon in question. Some studies report a measured inter-rater reliability based on the consistency of measurement performed by several raters. Often, judgments about the effectiveness of nonprofit boards of directors are based on the ratings provided by multiple knowledgeable actors—for example, the board chairperson, the chief executive officer of the nonprofit, and nonprofit stakeholders such as funders, donors, and other similar nonprofits in the community.
Reliability can also be improved by eliminating the subjectivity of the analyst. Rather than providing a general assessment of the quality of the neighborhood, the analyst might be asked to answer a series of specific questions. Was there trash in the streets? Did houses have peeling paint? Were dogs running loose? Did the street have potholes? How many potholes? Or consider the performance of the local volunteer center. How many volunteers does it attract? What work do the volunteers perform? What are the results of their efforts for the community? Does the volunteer center recruit any new volunteers or only those already active in the community?
Reliability problems often arise in survey research. For example, suppose that you were asked to respond to survey questions concerning the performance of one of your instructors—or a local political figure, or “bureaucrats,” or the volunteers assisting in your agency—on a day that had been especially frustrating for you. You might well evaluate these subjects more harshly than on a day when all had seemed right with the world. Although nothing about these subjects had changed, extraneous factors could introduce volatility into the ratings, an indication of unreliability. If your views of these subjects had actually changed though, and the survey instrument picked up the (true) changes, the measurement would be considered reliable. (For that reason, reliability is often assessed over a short time interval.) By contrast, a reliable measure, such as agency salaries or number of employees and volunteers, is not affected by such extraneous factors.
Unfortunately, although removing the subjective element from a measure will increase reliability, it may decrease validity. Certain concepts important to public and nonprofit managers—employee effectiveness, citizen satisfaction with services, the impact of a recreation program—are not amenable to a series of objective indicators alone. In such situations a combination of objective and subjective indicators may well be the preferred approach to measurement.
Lack of precision is the second major threat to reliability. To illustrate this problem, we use the example of Mrs. Barbara Kennedy, city manager of Barren, Montana (fictional cities are used throughout the book), who wants to identify the areas of Barren with high unemployment so that she can use the city’s federal job funds in those areas. Kennedy takes an employment survey and measures the unemployment rate in the city. Because her sample is fairly small, neighborhood unemployment rates have a potential error of ±5%. This lack of precision makes the unemployment measure fairly unreliable. For example, neighborhood A might have a real unemployment rate of 5%, but the survey measure indicates 10%. Neighborhood B’s unemployment rate is 13.5%, but the survey measure indicates 10%. Thus, the manager has a problem with measurement imprecision.
One way to improve the precision of these measures is to take larger samples. But in many cases, having a larger sample is insufficient. For example, suppose the city of Barren has a measure of housing quality that terms neighborhood housing stock as “good,” “above average,” “average,” or “dilapidated.” Assume that 50% of the city’s housing falls into the dilapidated category. If the housing evaluation were undertaken to designate target areas for rehabilitation, the measure lacks precision. No city can afford to rehabilitate 50% of its housing stock. Barren needs a more precise measure that can distinguish among houses in the dilapidated category. This need can be met by creating measures that are more sensitive to variations in dilapidated houses (the premise is that some dilapidated houses are more dilapidated than others; for example, “dilapidated” and “uninhabitable”). Improving precision in this instance is more difficult than simply increasing the sample size.
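The effect of sample size on sampling precision follows the standard margin-of-error formula for a proportion, roughly 1.96·√(p(1−p)/n) at 95% confidence. A quick sketch with hypothetical numbers shows why quadrupling the sample only halves the error:

```python
from math import sqrt

def margin_of_error(p, n):
    """Approximate 95% margin of error for a proportion p from a simple random sample of size n."""
    return 1.96 * sqrt(p * (1 - p) / n)

# A 10% unemployment estimate at three sample sizes:
for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.10, n) * 100, 1), "percentage points")
```

Because error shrinks with the square root of n, each halving of the margin requires four times the sample, which is why larger samples alone cannot fix a measure that is categorically too coarse, as in the housing example.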
Unlike validity, the reliability of a measure can be determined objectively. A common method for assessing measurement reliability is to measure the same phenomenon or set of indicators or variables twice over a reasonably short time interval and to correlate the two sets of measures. The correlation coefficient is a measure of the statistical relationship or association between two characteristics or variables (see Chapter 17). In this instance, the higher the correlation between the two measures over time, the higher the reliability. This procedure is known as test-retest reliability.

Another approach to determining reliability is to prepare alternative forms that are designed to be equivalent to measure a given concept, and then to administer both of them at the same time. For example, near the beginning of a survey, a researcher may include a set of five questions to measure attitudes toward government spending or trust in nonprofit fund-raisers, and toward the end of the survey, he or she may present five more questions on the same topic, all parallel in content. The correlation between the responses obtained on the two sets of items is a measure of parallel forms reliability.

Closely related is split-half reliability, in which the researcher divides a set of items intended to measure a given concept into two parts or halves; a common practice is to divide them into the even-numbered questions and the odd-numbered questions. The correlation between the responses obtained on the two halves is a measure of split-half reliability. Cronbach’s alpha, a common measure of reliability, is based on this method.
In all three types of reliability measurement—test-retest, parallel forms, and split-half—the higher the intercorrelations or statistical relationships among the items, the higher the reliability of the indicators.
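A split-half-style statistic such as Cronbach's alpha can be computed directly from item responses. The sketch below uses the standard alpha formula; the survey scores themselves are invented for illustration:

```python
from statistics import pvariance

# Invented responses of six people to a four-item attitude scale (items scored 1-5);
# rows are respondents, columns are items.
scores = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 4, 3],
]

def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

print(round(cronbach_alpha(scores), 2))
```

An alpha near 1 indicates that the items move together and thus measure the concept consistently; values below roughly 0.7 are conventionally read as weak reliability.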
If several individuals are responsible for collecting and coding data, it is also good practice to assess inter-rater reliability. Inter-rater reliability is based on the premise that the application of a measurement scheme should not vary depending on who is doing the measuring (see above). For example, in screening potential applicants for a food and clothing assistance program, a nonprofit community center might use a 10-item checklist for assessing the level of need for each client. To determine whether agency staff are interpreting and applying the checklist consistently, we could ask five employees to screen the same group of 20 clients using the checklist. High inter-rater reliability would exist if all five employees came up with very similarly scored (or even identical) checklists for each client. Alternatively, if the scored checklists for each client turned out to be dramatically different, we would have low inter-rater reliability. Low inter-rater reliability can indicate that confusion exists over how a measurement instrument should be applied and interpreted.
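A rough screen for inter-rater reliability is simply how much raters' scores spread for the same case. A minimal sketch with invented checklist totals (the client labels, number of raters, and scores are all hypothetical):

```python
from statistics import pstdev

# Invented totals from a 10-item need checklist: five staff members
# each screen the same four clients.
ratings = {
    "client_1": [8, 8, 7, 8, 8],
    "client_2": [3, 2, 3, 3, 2],
    "client_3": [9, 9, 9, 10, 9],
    "client_4": [5, 5, 6, 5, 5],
}

# Low spread among raters for each client suggests high inter-rater reliability.
for client, scores in ratings.items():
    print(client, "rater spread:", round(pstdev(scores), 2))
```

Here the raters disagree by well under a point per client, so the checklist appears to be applied consistently; large spreads would instead signal confusion over how to apply the instrument.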
Types of Measures
We have already presented examples of two types of indicators—subjective and objective. The subjective indicator requires some judgment to assign a value, whereas the objective indicator seeks to minimize discretion. Assume that the city manager wants to know the amount of city services delivered to each neighborhood in the city. Objective measures of city services would be acres of city parks, number of tons of trash collected, number of police patrols, and so on. Subjective measures of city services could be obtained by asking citizens whether the levels of various city services were adequate. A subjective measure of nonprofit organizational effectiveness—a difficult concept to assess—might be the reputation of these organizations as assessed by local funding agencies.

A third type of measure, an unobtrusive indicator, is intended to circumvent the so-called Hawthorne effect, in which the act of measuring a phenomenon can alter the behavior being assessed. In the Hawthorne studies, employees who were observed by a research team seemed to change their workplace behavior as a result of being observed. For example, when you call the county services hotline and hear the message that “your call may be monitored and recorded,” you are likely to receive different (better) treatment than if no message were issued. As another example, asking city residents about the quality of police services may sensitize them to police actions. If an individual is asked his or her opinion again, the answer may be biased by earlier sensitizing. A city employment counselor, for example, will likely know that her evaluation is based on the number of individuals who are placed in jobs. She may then focus her efforts on the easiest persons to place, rather than devote time to all clients, to build up a favorable record. Any reactive measure (a measure that affects behavior when it is taken) has some inherent reliability and validity problems.
One way to circumvent this problem is through the use of unobtrusive measures (see Webb et al., 1999). A library, for example, could determine its most useful reference books (which do not circulate) by asking patrons which reference books they use most frequently. Among the problems in this situation is that many people who do not use reference books might answer the question nevertheless. An unobtrusive measure of reference book popularity would be the amount of wear on each book. To gauge public interest in its operations, a nonprofit organization could count the number of Web “hits” on its home page.
Suppose the head of the Alcoholic Beverage Control Board in a state wants to know how much liquor is consumed “by the drink.” Because it is illegal to serve liquor by the drink in many counties in certain states, sending a survey questionnaire to private clubs would yield little (valid) response. An unobtrusive measure would be to count the number of empty liquor bottles found in the trash of private clubs. An unobtrusive measure of the interest of service volunteers in the governance of a nonprofit board would be a simple count of how many volunteers attend open board sessions over a calendar year.
Unobtrusive measures can be used in a variety of situations and can take on as many different forms as the creative manager can devise. They do, however, have some limitations. Unless care is taken in selection, the measures may lack validity. For example, a manager may decide that she can determine the amount of time an office spends in nonproductive socializing by measuring the office’s consumption of coffee and fountain water (this fountain uses bottled water). She assumes that more coffee and more water fountain meetings imply less productivity. In fact, one office might consume more coffee than another because it has older workers (who are more likely to drink coffee) or because the office puts in more overtime and needs more coffee to make it through the night.
Levels of Measurement
In many cases public and nonprofit managers can use actual numbers to measure phenomena: tons of garbage collected in a given town, number of arrests made by the police per week, response times in minutes of a local fire department, number of children attending a daily church after-school program, miles driven in a week by Meals on Wheels volunteers, and so forth. Because this information consists of real numbers, it is possible to perform all types of arithmetic calculations with the data—addition, subtraction, multiplication, and division. As we will learn in Chapter 5 on measures of central tendency, when we have numerical data, we can readily compute average scores, such as the mean or average number of tons of garbage collected per week, the average response time of the fire department, and so forth.
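With interval-level data such as response times, these averages are computed directly. A minimal sketch using hypothetical response times:

```python
from statistics import mean, median

# Hypothetical fire-department response times, in minutes, for ten calls
response_times = [4.2, 5.1, 3.8, 6.0, 4.5, 5.5, 4.9, 3.9, 5.2, 4.4]

print("mean:", round(mean(response_times), 2))    # arithmetic average response time
print("median:", round(median(response_times), 2))
```

Arithmetic of this kind is meaningful only because minutes are a standard interval unit; as the next paragraphs note, the same operations make no sense for categorical data such as race or religion.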
Unfortunately for public and nonprofit administrators, available data are often not measured in nearly as precise a fashion as are these variables. There are several reasons for the lack of precision. In some cases it is a reflection of the state of the art of measurement. For instance, although it may be possible to say that a client is “very satisfied,” “satisfied,” “neutral,” “dissatisfied,” or “very dissatisfied” with a new job training program contracted out to a nonprofit agency, it usually is not possible to state that his or her level of satisfaction is exactly 2.3—or 5 or 9.856 or 1.003. Most measures of attitudes and opinions do not allow this level of exactitude. In other instances, loss of precision results from errors in measurement or, perhaps, from lack of foresight. For example, you may be interested in the number of traffic fatalities in the town of Berrysville over the past few years. As a consequence of incomplete records or spotty reporting in the past, you may not be able to arrive at the exact number of fatalities in each of these years, but you may be quite confident in determining that there have been fewer fatalities this year than last year.
Finally, some variables inherently lack numerical precision: One could classify the citizens of a community according to race (white, African American, Hispanic, or other), gender (male or female), religion (Protestant, Catholic, Jewish, Buddhist, or other), and many other attributes. It would be futile to attempt to calculate the arithmetic average of race or religion, however, and it would be meaningless to say that a citizen is more female than male: A person is classified as either one or the other.
In discussing these different types of variables, social scientists usually refer to the concept of levels of measurement. Social scientists conventionally speak of three levels of measurement. The first or highest (most precise) level is known as the interval level of measurement. The name derives from the fact that the measurement is based on a unit or interval that is accepted as a common standard and that yields identical results in repeated applications. Weight is measured in pounds or grams, height in feet and inches, distance in miles or kilometers, time in seconds or minutes, and so on. The variables discussed at the beginning of this section are all measured at the interval level: tons of garbage, number of arrests, response times in minutes. As a consequence of these standard units, it is possible to state not only that there were more arrests last week than this week but also that there were exactly 18 more arrests. (Some texts discuss a fourth level of measurement—ratio—but for our purposes it is effectively the same as interval measurement.)
The second level of measurement is called ordinal. At this level of measure- ment it is possible to say that one unit or observation (or event or phenomenon) has more or less of a given characteristic than another, but it is not possible to say how much more or less. Generally, we lack an agreed-on standard or metric (