The Supreme Court first used formal statistical methods in Castaneda vs. Partida, a case brought before the court in 1977. In this discrimination case, the plaintiffs argued that Mexican-Americans had been underrepresented in the selection of grand juries in Hidalgo County, TX. A second case, Hazelwood School District vs. United States, soon followed; it centered on the hiring of minority teachers in the Hazelwood School District in Florissant, MO. A third case, Greenholtz vs. Nebraska Penal and Correctional Complex, will be discussed here because the actual data is provided.
Source: Sugrue, T. and Fairley, W. (1983). “A Case of Unexamined Assumptions: The Use and Misuse of the Statistical Analysis of Castaneda / Hazelwood in Discrimination Litigation”. Boston College Law Review, Vol 24, Issue 4, Number 4.
Example 1.4.1: Sugrue and Fairley (1983) present a court case, Greenholtz vs. Nebraska Penal and Correctional Complex, which will be the focus of the analysis presented here. The plaintiffs in this case argued that Native Americans and Mexican-Americans were not paroled at a fair rate compared with other inmates. The data from this case is presented in Table 2 on p. 942 of the article.
Race/Ethnicity | Eligible for Parole | Number Paroled |
White | 590 | 358 |
African-American | 235 | 148 |
Native-American | 59 | 24 |
Mexican-American | 18 | 5 |
According to Sugrue and Fairley (1983), the court’s initial ruling in this case was that the evidence provided did not “make out a case of purposeful racial and ethnic discrimination in Nebraska’s discretionary parole process” because, for each group, the percentage among those eligible for parole differed from the percentage among those who received parole by an amount the court considered insignificant [all less than 2%].
Court’s Initial Ruling: (image) Source: Sugrue and Fairley (1983), p. 943.
To understand the court’s decision, first consider the inmates who were eligible for parole. There were 902 eligible inmates, of whom 59 were Native-Americans. Converting this value to a proportion, we get \(\frac{59}{902} = 6.5\%\). Next, consider the 535 inmates who received parole, of whom 24 were Native-Americans. Converting this to a proportion, we get \(\frac{24}{535} = 4.5\%\). The difference between these two proportions is \((6.5\% - 4.5\%) = 2.0\%\), which is the threshold the court used for an insignificant difference.
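The court's comparison can be reproduced for every group in the table. The following is a minimal sketch of that arithmetic; the dictionary `counts` and the function name `share_gap` are illustrative choices made here, not something taken from the article.

```python
# Counts from Table 2 of Sugrue and Fairley (1983): (eligible, paroled)
counts = {
    "White": (590, 358),
    "African-American": (235, 148),
    "Native-American": (59, 24),
    "Mexican-American": (18, 5),
}

total_eligible = sum(eligible for eligible, _ in counts.values())  # 902
total_paroled = sum(paroled for _, paroled in counts.values())     # 535


def share_gap(group):
    """Percentage-point gap between a group's share of the eligible
    inmates and its share of the paroled inmates (hypothetical helper)."""
    eligible, paroled = counts[group]
    share_of_eligible = 100 * eligible / total_eligible
    share_of_paroled = 100 * paroled / total_paroled
    return share_of_eligible - share_of_paroled


for group in counts:
    print(f"{group:17s} gap = {share_gap(group):6.2f} percentage points")
```

A positive gap means the group makes up a smaller share of the paroled inmates than of the eligible inmates.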
From a statistical perspective, the logic initially used by the court is flawed. For example, consider the fact that Mexican-Americans made up \(\frac{18}{902} = 2.0\%\) of the inmates eligible for parole at the start of the study. The Nebraska Penal and Correctional Complex authorities could deny parole to all Mexican-Americans, yet they would be found innocent of discrimination against Mexican-Americans because the difference in these proportions (\(2.0\% - 0\% = 2.0\%\)) does not exceed 2%, i.e. the arbitrarily chosen value the court initially set to determine significance.
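The flaw can be checked numerically. This sketch applies the 2% rule to the hypothetical scenario described above, in which no Mexican-American inmate is paroled; the variable names are illustrative assumptions.

```python
total_eligible = 902
mexican_american_eligible = 18
threshold = 2.0  # percentage points, the court's cutoff

# Hypothetical scenario: every Mexican-American inmate is denied parole,
# so the group's share of the paroled inmates is 0%.
share_of_eligible = 100 * mexican_american_eligible / total_eligible
share_of_paroled = 0.0

gap = share_of_eligible - share_of_paroled
passes_court_rule = gap <= threshold

print(f"gap = {gap:.2f} percentage points")
print("Within the court's 2% threshold:", passes_court_rule)
```

Because the group is only about 2% of the eligible inmates, no outcome for this group could ever produce a gap larger than the threshold.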
At the beginning of the study, there were 902 people who were eligible for parole in the State of Nebraska. Of these 902, a total of \(535\) received parole; the remaining \(367\) did not.
The proportion of inmates that received parole out of the total is \(59.3\%\). The rate is computed across all races and hence serves as a benchmark for what we’d expect to happen if race/ethnicity were irrelevant to whether or not an individual received parole.
\[\frac{\# Paroled}{Total\space Eligible} = \frac{535}{902} = 59.3\%\]
A simulation model can be used instead of the overly simplistic approach taken by the court, i.e. treating percentage differences of less than 2% as close enough. The simulation model can be used to compare the number that received parole against the number we’d expect to receive parole. The simulation model set up here will be for Native Americans.
First, an individual trial in the context of this example represents one Native American inmate. There were a total of 59 Native Americans eligible for parole, thus our simulation study will consist of 59 trials. The simulation study will be set up under a “no discrimination” or “all things fair” situation. If race/ethnicity is irrelevant, then the proportion of inmates that received parole is about \(\frac{535}{902}=59.3\%\). Under an “all things fair” situation, this is the proportion of Native Americans that should have received parole.
Necessary Information for Simulation | Values |
Number Eligible for Parole | 59 |
The likelihood or chance of parole (under a fair situation) | \(\frac{535}{902} = 59.3\%\) |
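With these two values, a single run of the simulation can be sketched in a few lines. This is only an illustration under the assumption that each of the 59 trials is an independent chance event with probability 535/902; the function name `simulate_paroled` and the seed are hypothetical.

```python
import random


def simulate_paroled(rng, n_eligible=59, p_parole=535 / 902):
    """One simulated outcome: the number of paroles among n_eligible
    independent trials, each succeeding with probability p_parole."""
    return sum(rng.random() < p_parole for _ in range(n_eligible))


rng = random.Random(2024)  # arbitrary seed so the run can be repeated
outcome = simulate_paroled(rng)
print("Simulated number of Native Americans paroled:", outcome)
```

Each call produces one count between 0 and 59, which is one point on the number line described in this example.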
Expected Value:
Under a no discrimination situation, the number of Native Americans that we’d expect to receive parole would be
\[\begin{array} {rcl} Expected \space Value & = & 59 * \big(\frac{535}{902}\big) \\ & = & 59 * \big(0.593\big) \\ & = & 34.99 \\ & \approx & 35. \end{array} \]
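The same arithmetic can be written as a short check. The variable names below are illustrative only.

```python
n_eligible = 59            # Native Americans eligible for parole
overall_rate = 535 / 902   # parole rate across all inmates

expected_paroled = n_eligible * overall_rate

print(f"Overall parole rate: {overall_rate:.3f}")
print(f"Expected number paroled: {expected_paroled:.2f}")
```

Replacing 59 with another group's eligible count gives the expected number for that group under the same fair situation.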
A depiction of the simulation model is provided here. The Number of Native Americans Paroled is of interest and will be plotted on the number line. The number line starts at 0 and goes up to 59, i.e. the number of trials. The simulation will provide a set of outcomes that would be likely under a “no discrimination” or “all things fair” situation.
As mentioned previously, one important goal of a simulation is to gain an understanding or measure of the inherent variation that exists in our simulation model. The representation below has considerably more inherent variation than the representation above. Obtaining a measure of this inherent variation is necessary when determining which values are likely versus unlikely. Furthermore, obtaining a measure of the inherent variation is necessary to determine whether or not the study outcome is an outlier against a no discrimination situation.
After a set of likely outcomes is obtained to represent the situation of no discrimination, the number of Native Americans that were actually paroled can be compared against these values. In particular, interest lies in determining whether or not \(24\) is an outlier on the lower end. The left side of this graphic is of interest here because having too few provides evidence of discrimination against Native Americans.
The following table provides a recap of the various quantities that are relevant for our simulation model.
Quantity | Value | Discussion |
Lowest Possible Value | 0 | There could have been 0 Native Americans paroled |
Largest Possible Value | 59 | There could have been 59 Native Americans paroled |
Label | # Native Americans Paroled | The quantity of interest in our investigation is the number of Native Americans paroled |
Expected Value | 35 | This quantity is the number of Native Americans we’d expect to receive parole under no discrimination |
Side of Interest | Left-side | Goal is to determine whether or not 24 is too few |
Study Outcome | 24 | The number of Native Americans that were actually paroled |
Question: Does 24 Native Americans paroled (out of 59) provide enough statistical evidence to suggest discrimination was taking place against Native Americans during the time of this study?
Goal: Determine whether or not 24 is an outlier against a no discrimination situation.
The simulation app can be used to obtain individual simulation outcomes.
Link to Simulation App: https://wsu-datascience.github.io/binomial_simulation/
Setup: (screenshot of the simulation app setup)
The following outcomes were obtained from the simulation app. The count of Native Americans who were paroled in each repeated simulation is provided. If the simulation is set up correctly, we expect these numbers to be around 35. These values are commonly referred to as statistics. A statistic is simply some type of summary measurement that is obtained from data, and for this simulation the statistic of interest is a count of the number of Native Americans who were paroled.
Simulated Outcome #1 | Simulated Outcome #2 | Simulated Outcome #3 | Simulated Outcome #4 | |
38 | 33 | 27 | 35 | Etc. |
Definition |
A statistic is a summary measurement computed from data. |
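Repeating the simulation and recording the statistic each time can be sketched as follows. This is not the code behind the simulation app; it assumes independent trials with probability 535/902, and the names `repeat_simulation` and `tally` are hypothetical.

```python
import random
from collections import Counter


def repeat_simulation(n_outcomes, n_eligible=59, p_parole=535 / 902, seed=None):
    """Return n_outcomes simulated counts of paroled inmates."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_parole for _ in range(n_eligible))
        for _ in range(n_outcomes)
    ]


outcomes = repeat_simulation(10, seed=1)  # arbitrary seed
tally = Counter(outcomes)

# Text version of a dot plot: one dot per simulated outcome.
for value in range(min(outcomes), max(outcomes) + 1):
    print(f"{value:2d} | {'.' * tally[value]}")
```

Each row shows how many of the 10 repeated simulations produced that count; a different seed gives a different set of 10 outcomes.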
The first plot below shows the 10 repeated outcomes from my simulation. I have included two other plots from other people.
Questions
1. What similarities exist in these three plots?
A graph based on many simulated outcomes is better than a graph based on only a few simulated outcomes. This is especially true when interest lies in determining an appropriate cutoff value that will be used to separate likely from unlikely values. This cutoff value is on the edge of the distribution where few outcomes are present. The graph below includes 100 simulated outcomes. On this graph, the left edge of the graph appears to be around 25/26, and possibly 27.
The following graph includes an additional 100 repeated outcomes from the simulation model. From this graph, it now appears the edge of the graph is at 25/26. The value of 27 is appearing somewhat often relative to the values of 25/26.
The following graph includes an additional 150 repeated outcomes, for a total of 350 repeated outcomes. The graph confirms that the left edge of the graph appears to be around 25. In addition, anomalies such as 33 having considerably more dots than the expected outcome of 35 are reduced when many repeated outcomes are obtained from your simulation model.
Comments
Notice that the outcomes appear to be centered around the expected value of 35 on each graph
The outcomes stay between about 25 and 45 on each graph
With several hundred repeated outcomes, it becomes easier to determine an appropriate value for the left edge of the graph
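The effect of adding more repeated outcomes can be explored with a short sketch that summarizes the center and edges of the simulated outcomes at several sizes. It assumes independent trials with probability 535/902; the function name `simulate_many` and the seed are hypothetical, and the printed values depend on the seed.

```python
import random
from statistics import mean


def simulate_many(n_outcomes, n_eligible=59, p_parole=535 / 902, seed=0):
    """Return n_outcomes simulated counts of paroled inmates."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_parole for _ in range(n_eligible))
        for _ in range(n_outcomes)
    ]


for n_outcomes in (10, 100, 200, 350):
    outcomes = simulate_many(n_outcomes)
    print(
        f"{n_outcomes:4d} outcomes: "
        f"smallest = {min(outcomes)}, "
        f"average = {mean(outcomes):.1f}, "
        f"largest = {max(outcomes)}"
    )
```

The smallest simulated value plays the role of the left edge discussed above, and the average can be compared with the expected value of 35.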
Next, consider the following set of graphs, generically labeled Graph A - Graph D, where the number of simulated outcomes is increased from 10, to 100, to 1,000, to 10,000. Answer the questions regarding this set of graphs below.
Questions
4. What similarities exist in Graph A - Graph D?
Finally, let us consider Graph C: 1000 Outcomes, which will be used to determine whether or not there is enough statistical evidence to suggest discrimination was taking place against Native Americans throughout the time of this study. On Graph C: 1000 Outcomes, it is somewhat difficult to see the individual dots, so I counted them for you for the following values.
Number of dots at 22: 1
Number of dots at 23: 1
Number of dots at 24: 2
Number of dots at 25: 3
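These counts can be turned into a proportion of simulated outcomes that are as small as the study outcome. The sketch below assumes that no dots fall below 22 on Graph C, which the counts above do not state explicitly. For comparison, it also computes the exact probability from a binomial model with 59 trials and probability 535/902; that calculation is a different technique from the simulation and is included only as a cross-check. The names `dots` and `binomial_lower_tail` are hypothetical.

```python
from math import comb

# Dot counts read from Graph C (1000 outcomes).
# Assumption: there are no dots below 22.
dots = {22: 1, 23: 1, 24: 2, 25: 3}
n_outcomes = 1000
study_outcome = 24

at_or_below = sum(count for value, count in dots.items() if value <= study_outcome)
simulated_tail = at_or_below / n_outcomes


def binomial_lower_tail(k, n=59, p=535 / 902):
    """Exact P(X <= k) for a binomial count with n trials and chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


exact_tail = binomial_lower_tail(study_outcome)

print(f"Simulated proportion at or below {study_outcome}: {simulated_tail:.3f}")
print(f"Exact binomial probability at or below {study_outcome}: {exact_tail:.4f}")
```

A small proportion indicates that an outcome of 24 or fewer would rarely occur under a no discrimination situation.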
Questions