Why Statistics is Not Data Science Chris Malone | Tisha Hooks
|
What elements of Task #1 are |
| Statistical in nature | Data Science in nature |
|
0: | > Matching race levels in a way that minimizes the impact on the analysis (i.e., reduces bias)
> Obtain summaries for SQF and Census data to compute a discrepancy measure
| > Retrieve data
> Incorporate precinct information (boundaries via shape files) into census data
> Create necessary variables so that a diversity measure can be computed
|
|
|
|
2: | Understanding the question
Define & decide what to calculate | Understanding the question
Merge, manipulate, and prepare data
Calculations, advanced visualizations |
|
|
3: | Create graphs of potential bias variables, segmented (colored?) by categories.
Find measures of center and spread for these potential bias variables.
Create two-way tables of potential bias variables
| Examining and addressing missing data.
Dealing with scale. How many individuals are unique?
|
|
|
|
5: | Definitions of bias, generalization to other cities/not (time), conditional cross-tabs, plots, consideration of age and reasons for stops as confounders | Aggregation by zip code, file merging (matching algorithm for race) |
|
|
|
7: | comparing the percent of a race in those stopped to the percent of that race in that precinct, taking uncertainty into account | mapping blocks to precincts
mapping different race definitions to each other
merging datasets |
|
|
8: | Descriptive stats, and maybe inference.
Inference/modeling | web scraping? (if needed)
wrangling, merging categories
Map to see the data |
|
|
9: | understand question of interest
understand variables
making linkages across data
summary stats and walk through
analysis to answer question (multiple ways/models preferable) | cleaning data
understand data structures
executing data set linkage
|
|
|
|
11: | Agree on measure of bias.
Align racial definitions.
Determine confounders for which to adjust.
Model development/analysis strategy.
Data Viz.
Talking with police/census for data clarity. | link precincts and neighborhoods.
Align racial definitions.
Merge data.
Obtain data on confounders.
Fitting model chosen.
Data Viz.
Talking with police/census for data clarity. |
|
|
12: | Merging of categories (census)
Merge data sets
Goodness of fit test by precinct
Pseudo-measure of bias based on the maximum test statistic | Merging of categories (census)
Merge data sets |
|
|
13: | chi-square test and computing percentages | data wrangling (consolidating race, merge census and frisk data at neighborhood level) |
|
|
|
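Groups 12 and 13 mention a chi-square goodness-of-fit test by precinct, and Group 12 suggests a pseudo-measure of bias based on the maximum test statistic. The following is a minimal sketch of one way that idea could be computed, not the groups' actual method. It assumes the data have already been merged so that each precinct has stop counts and census shares on the same race categories. The precinct names, group labels, counts, and the function name `chi_square_gof` are all hypothetical and for illustration only.

```python
# Hypothetical, made-up counts for illustration only.
# These are NOT real SQF or Census figures.
precincts = {
    "A": {
        "stops": {"g1": 50, "g2": 30, "g3": 20},
        "census_share": {"g1": 0.5, "g2": 0.3, "g3": 0.2},
    },
    "B": {
        "stops": {"g1": 70, "g2": 20, "g3": 10},
        "census_share": {"g1": 0.4, "g2": 0.4, "g3": 0.2},
    },
}


def chi_square_gof(observed, expected_share):
    """Pearson chi-square goodness-of-fit statistic.

    observed: mapping of group -> number of stops in that group
    expected_share: mapping of group -> census proportion (sums to 1)
    """
    total = sum(observed.values())
    stat = 0.0
    for group, count in observed.items():
        # Expected stops if stops mirrored the census composition.
        expected = total * expected_share[group]
        stat += (count - expected) ** 2 / expected
    return stat


# One statistic per precinct.
stats = {
    name: chi_square_gof(p["stops"], p["census_share"])
    for name, p in precincts.items()
}

# Precinct with the largest discrepancy (the "max test stat" idea).
worst = max(stats, key=stats.get)

for name, value in stats.items():
    print(f"precinct {name}: chi-square = {value:.2f}")
print(f"largest discrepancy: precinct {worst}")
```

With three categories each test has two degrees of freedom. A large statistic only indicates that stops and census composition differ in that precinct; it does not by itself adjust for confounders such as age or reason for stop, which several groups listed as a separate statistical task.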