# Load required packages (hint: you need tidycensus, tidyverse, and knitr)
library(tidycensus)
library(tidyverse)
library(knitr)
# Set your Census API key
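# Sys.getenv() takes the NAME of an environment variable, not the key itself;
# store the key as CENSUS_API_KEY (e.g., in ~/.Renviron) instead of pasting it here
census_api_key(Sys.getenv("CENSUS_API_KEY"))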
# Choose your state for analysis - assign it to a variable called my_state
my_state <- "FL"
Lab 1: Census Data Quality for Policy Decisions
Evaluating Data Reliability for Algorithmic Decision-Making
Assignment Overview
Scenario
You are a data analyst for the Florida Department of Human Services. The department is considering implementing an algorithmic system to identify communities that should receive priority for social service funding and outreach programs. Your supervisor has asked you to evaluate the quality and reliability of available census data to inform this decision.
Drawing on our Week 2 discussion of algorithmic bias, you need to assess not just what the data shows, but how reliable it is and what communities might be affected by data quality issues.
Learning Objectives
- Apply dplyr functions to real census data for policy analysis
- Evaluate data quality using margins of error
- Connect technical analysis to algorithmic decision-making
- Identify potential equity implications of data reliability issues
- Create professional documentation for policy stakeholders
Submission Instructions
Submit by posting your updated portfolio link on Canvas. Your assignment should be accessible at your-portfolio-url/labs/lab_1/
Make sure to update your _quarto.yml navigation to include this assignment under a “Labs” menu.
Part 1: Portfolio Integration
Create this assignment in your portfolio repository under a labs/lab_1/ folder structure. Update your navigation menu to include:
- text: Assignments
  menu:
    - href: labs/lab_1/your_file_name.qmd
      text: "Lab 1: Census Data Exploration"
If the text contains a special character such as a colon, wrap it in double quotes so that Quarto reads it as text.
Setup
State Selection: I have chosen Florida for this analysis because I really enjoy staying here when I travel and would like a better understanding of its demographic characteristics.
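As a minimal sketch of the key setup, assuming the Census API key is stored in an environment variable named CENSUS_API_KEY (the name tidycensus reads; keeping it in ~/.Renviron is an assumption about your machine), the following base R check reports whether a key is visible to the session before any API call is made.

```r
# Look the key up by the NAME of the environment variable, never by the key value.
# CENSUS_API_KEY is assumed to be defined in ~/.Renviron or the shell environment.
api_key <- Sys.getenv("CENSUS_API_KEY")

# nzchar() is FALSE for the empty string that Sys.getenv() returns when unset
key_is_set <- nzchar(api_key)

if (key_is_set) {
  message("Census API key found (", nchar(api_key), " characters).")
} else {
  message("No key found: add CENSUS_API_KEY=<your key> to ~/.Renviron and restart R.")
}
```

Keeping the key out of the script also keeps it out of the published portfolio repository.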
Part 2: County-Level Resource Assessment
2.1 Data Retrieval
Your Task: Use get_acs() to retrieve county-level data for your chosen state.
Requirements:
- Geography: county level
- Variables: median household income (B19013_001) and total population (B01003_001)
- Year: 2022
- Survey: acs5
- Output format: wide
Hint: Remember to give your variables descriptive names using the variables = c(name = "code") syntax.
# Write your get_acs() code here
county_data <- get_acs(
geography = "county",
variables = c(
total_pop = "B01003_001",
median_hh_income = "B19013_001"
),
state = my_state,
year = 2022,
survey = "acs5",
output = "wide"
)
# Clean the county names to remove state name and "County"
# Hint: use mutate() with str_remove()
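# NAME holds the full state name (e.g., "Alachua County, Florida"), not the
# abbreviation stored in my_state, so strip everything after the last comma
county_data <- county_data %>%
  mutate(county_name = NAME %>%
    str_remove(",\\s*[^,]+$") %>%
    str_remove("\\s+County$"))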
# Display the first few rows
head(county_data)
# A tibble: 6 × 7
  GEOID NAME           total_popE total_popM median_hh_incomeE median_hh_incomeM
  <chr> <chr>               <dbl>      <dbl>             <dbl>             <dbl>
1 12001 Alachua Count…     279729         NA             57566              1488
2 12003 Baker County,…      27969         NA             67872              8294
3 12005 Bay County, F…     181055         NA             65999              2086
4 12007 Bradford Coun…      27816         NA             54759              7455
5 12009 Brevard Count…     610723         NA             71308              1222
6 12011 Broward Count…    1940907         NA             70331               780
# ℹ 1 more variable: county_name <chr>
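The name-cleaning step can be checked on a few literal strings before running it on the full table. This is a sketch, assuming NAME values shaped like "Alachua County, Florida" as in the output above; the vector sample_names is made up for illustration and the two patterns are one possible choice, not the only one.

```r
library(stringr)

# Sample NAME values in the "<name> County, <state>" format shown in the output above
sample_names <- c(
  "Alachua County, Florida",
  "Miami-Dade County, Florida",
  "St. Johns County, Florida"
)

# Step 1: drop the state (everything from the last comma to the end)
cleaned <- str_remove(sample_names, ",\\s*[^,]+$")
# Step 2: drop the trailing " County"
cleaned <- str_remove(cleaned, "\\s+County$")

print(cleaned)
```

Removing the state with a generic "after the last comma" pattern avoids depending on whether the state is written as an abbreviation or a full name.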
2.2 Data Quality Assessment
Your Task: Calculate margin of error percentages and create reliability categories.
Requirements:
- Calculate MOE percentage: (margin of error / estimate) * 100
- Create reliability categories:
  - High Confidence: MOE < 5%
  - Moderate Confidence: MOE 5-10%
  - Low Confidence: MOE > 10%
- Create a flag for unreliable estimates (MOE > 10%)
Hint: Use mutate() with case_when() for the categories.
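As a worked illustration of the formula and the category boundaries, the sketch below uses two counties from the output above (Alachua and Baker) plus two hypothetical rows that sit exactly on the 5% and 10% boundaries. The hypothetical rows and the explicit NA branch are assumptions added for illustration.

```r
library(dplyr)

# Alachua and Baker values come from the county output above;
# the two "Hypothetical" rows are invented to test the boundaries
sample_counties <- data.frame(
  county = c("Alachua", "Baker", "Hypothetical A", "Hypothetical B"),
  median_hh_incomeE = c(57566, 67872, 50000, 50000),
  median_hh_incomeM = c(1488, 8294, 2500, 5000)
)

sample_reliability <- sample_counties %>%
  mutate(
    # MOE as a percentage of the estimate
    moe_pct = round(median_hh_incomeM / median_hh_incomeE * 100, 2),
    reliability_category = case_when(
      is.na(moe_pct) ~ NA_character_,        # keep missing estimates out of "Low"
      moe_pct < 5    ~ "High Confidence",
      moe_pct <= 10  ~ "Moderate Confidence", # 5% and 10% both count as Moderate
      TRUE           ~ "Low Confidence"
    ),
    unreliable_flag = moe_pct > 10
  )

print(sample_reliability)
```

Because case_when() evaluates conditions in order, a value of exactly 5 fails the first numeric test and lands in Moderate, and a value of exactly 10 is Moderate rather than Low, which matches the requirement that Low means strictly above 10%.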
# Calculate MOE percentage and reliability categories using mutate()
county_reliability <- county_data %>%
mutate(
moe_pct = round((median_hh_incomeM / median_hh_incomeE) * 100, 2),
reliability_category = case_when(
moe_pct < 5 ~ "High Confidence",
moe_pct <= 10 ~ "Moderate Confidence",
TRUE ~ "Low Confidence"
),
unreliable_flag = moe_pct > 10
)
# Create a summary showing count of counties in each reliability category
reliability_summary <- county_reliability %>%
count(reliability_category, name = "n_counties") %>%
mutate(
pct_counties = round(n_counties / sum(n_counties) * 100, 2)
)
# Hint: use count() and mutate() to add percentages
2.3 High Uncertainty Counties
Your Task: Identify the 5 counties with the highest MOE percentages.
Requirements:
- Sort by MOE percentage (highest first)
- Select the top 5 counties
- Display: county name, median income, margin of error, MOE percentage, reliability category
- Format as a professional table using kable()
Hint: Use arrange(), slice(), and select() functions.
# Create table of top 5 counties by MOE percentage
top5_moe <- county_reliability %>%
arrange(desc(moe_pct)) %>%
slice_head(n = 5) %>%
select(
county_name,
median_hh_incomeE,
median_hh_incomeM,
moe_pct,
reliability_category
)
# Format as table with kable() - include appropriate column names and caption
kable(
top5_moe,
col.names = c(
"County",
"Median HH Income (Estimate)",
"Median HH Income (MOE)",
"MOE (%)",
"Reliability Category"
),
caption = "Top 5 Florida Counties by Median Household Income MOE Percentage (ACS 2022 5-year)"
)
| County | Median HH Income (Estimate) | Median HH Income (MOE) | MOE (%) | Reliability Category |
|---|---|---|---|---|
| Lafayette County, Florida | 57852 | 12861 | 22.23 | Low Confidence |
| Glades County, Florida | 37221 | 7004 | 18.82 | Low Confidence |
| Hardee County, Florida | 44665 | 7499 | 16.79 | Low Confidence |
| Jefferson County, Florida | 51573 | 8150 | 15.80 | Low Confidence |
| Liberty County, Florida | 51723 | 7863 | 15.20 | Low Confidence |
Data Quality Commentary: Algorithms that treat county median income as precise will be least reliable in Lafayette and Glades, where the MOE exceeds 18 percent of the estimate, so small differences from other counties may be nothing more than sampling noise. That can push these counties into the wrong “risk” or “need” tier, which means funding, services, or eligibility could be misallocated. Higher uncertainty typically arises where the survey sample is small or the income distribution is uneven, so the median shifts more from sample to sample.
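To make "differences can be noise" concrete, here is a sketch of the usual check for whether two ACS estimates differ, using values from the table above. It assumes the published MOEs are at the 90 percent confidence level (z = 1.645); the helper name is_significant_diff is hypothetical.

```r
# Returns TRUE when two estimates differ by more than sampling error would explain.
# Assumes both MOEs were published at the 90% confidence level (z = 1.645).
is_significant_diff <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se1 <- moe1 / z_crit                 # convert each MOE to a standard error
  se2 <- moe2 / z_crit
  z <- abs(est1 - est2) / sqrt(se1^2 + se2^2)
  z > z_crit
}

# Lafayette (57852 +/- 12861) vs Liberty (51723 +/- 7863)
lafayette_vs_liberty <- is_significant_diff(57852, 12861, 51723, 7863)

# Lafayette (57852 +/- 12861) vs Glades (37221 +/- 7004)
lafayette_vs_glades <- is_significant_diff(57852, 12861, 37221, 7004)

print(c(liberty = lafayette_vs_liberty, glades = lafayette_vs_glades))
```

Under these assumptions the roughly $6,000 gap between Lafayette and Liberty is not distinguishable from noise, while the roughly $20,000 gap between Lafayette and Glades is, so a ranking that places Lafayette above Liberty rests on sampling error alone.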
Part 3: Neighborhood-Level Analysis
3.1 Focus Area Selection
Your Task: Select 2-3 counties from your reliability analysis for detailed tract-level study.
Strategy: Choose counties that represent different reliability levels (e.g., 1 high confidence, 1 moderate, 1 low confidence) to compare how data quality varies.
# Use filter() to select 2-3 counties from your county_reliability data
# Store the selected counties in a variable called selected_counties
selected_counties <- county_reliability %>%
group_by(reliability_category) %>%
slice_max(moe_pct, n = 1, with_ties = FALSE) %>% # the row that is the least confident in each category
ungroup() %>%
select(GEOID, county_name, median_hh_incomeE, moe_pct, reliability_category)
# Display the selected counties with their key characteristics
# Show: county name, median income, MOE percentage, reliability category
head(selected_counties)
# A tibble: 3 × 5
  GEOID county_name               median_hh_incomeE moe_pct reliability_category
  <chr> <chr>                                 <dbl>   <dbl> <chr>
1 12055 Highlands County, Florida             53679    4.89 High Confidence
2 12067 Lafayette County, Florida             57852   22.2  Low Confidence
3 12051 Hendry County, Florida                49259    9.85 Moderate Confidence
Comment on the output: I picked the county with the highest MOE percentage (the least reliable estimate) in each category, so that the three reliability levels can be compared with each other.
3.2 Tract-Level Demographics
Your Task: Get demographic data for census tracts in your selected counties.
Requirements:
- Geography: tract level
- Variables: white alone (B03002_003), Black/African American (B03002_004), Hispanic/Latino (B03002_012), total population (B03002_001)
- Use the same state and year as before
- Output format: wide
- Challenge: You’ll need county codes, not names. Look at the GEOID patterns in your county data for hints.
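The GEOID pattern mentioned in the challenge can be sketched as follows: a county GEOID is the two-digit state FIPS code followed by the three-digit county FIPS code. The example uses the three GEOIDs from the selected counties output above.

```r
library(stringr)

# GEOIDs of the selected counties: Highlands, Lafayette, Hendry
county_geoids <- c("12055", "12067", "12051")

# Characters 1-2 are the state FIPS code, characters 3-5 the county FIPS code
state_fips  <- str_sub(county_geoids, 1, 2)
county_fips <- str_sub(county_geoids, 3, 5)

print(data.frame(geoid = county_geoids, state_fips, county_fips))
```

The codes stay as character strings so that leading zeros (as in "055") are preserved.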
# Define your race/ethnicity variables with descriptive names
race_vars <- c(
white = "B03002_003", # white alone
black_african = "B03002_004", # Black/African American
hispanic_latino = "B03002_012", # Hispanic/Latino
total = "B03002_001" # total population
)
# Use get_acs() to retrieve tract-level data
# Hint: You may need to specify county codes in the county parameter
# Calculate percentage of each group using mutate()
# Create percentages for white, Black, and Hispanic populations
selected_fips <- selected_counties %>%
mutate(county_fips = str_sub(GEOID, -3, -1)) %>%
pull(county_fips)
FL_race_tract <- get_acs(
geography = "tract",
variables = race_vars,
state = my_state,
county = selected_fips,
year = 2022,
output = "wide"
) %>%
mutate(
white_share = round((whiteE / totalE) * 100, 2),
black_african_share = round((black_africanE / totalE) * 100, 2),
hispanic_latino_share = round((hispanic_latinoE / totalE) * 100, 2)
)
# Add readable tract and county name columns using str_extract() or similar
FL_race_tract <- FL_race_tract %>%
mutate(
    county_name = str_extract(NAME, "[^;]+County") %>%
      str_remove("\\s+County$") %>%
      str_trim(), # the match starts right after ";" and so begins with a space
tract_name = str_extract(NAME, "Census Tract\\s*[^;]+") %>%
str_remove("^Census Tract\\s*")
)
FL_race_tract <- FL_race_tract %>%
select(
county_name, tract_name,
    whiteE, white_share, whiteM,
    black_africanE, black_african_share, black_africanM,
    hispanic_latinoE, hispanic_latino_share, hispanic_latinoM,
    totalE
  )
3.3 Demographic Analysis
Your Task: Analyze the demographic patterns in your selected areas.
# Find the tract with the highest percentage of Hispanic/Latino residents
# Hint: use arrange() and slice() to get the top tract
top_hispanic_tract <- FL_race_tract %>%
filter(!is.na(hispanic_latino_share)) %>%
arrange(desc(hispanic_latino_share)) %>%
slice(1)
kable(
top_hispanic_tract,
caption = "Tract with the Highest Percentage of Hispanic/Latino Residents"
)
| county_name | tract_name | whiteE | white_share | whiteM | black_africanE | black_african_share | black_africanM | hispanic_latinoE | hispanic_latino_share | hispanic_latinoM | totalE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Hendry | 6.02 | 239 | 7.01 | 170 | 0 | 0 | 15 | 3109 | 91.17 | 655 | 3410 |
# Calculate average demographics by county using group_by() and summarize()
# Show: number of tracts, average percentage for each racial/ethnic group
county_avg_demo <- FL_race_tract %>%
group_by(county_name) %>%
summarize(
n_tracts = n(),
avg_white_share = round(mean(white_share, na.rm = TRUE), 2),
avg_black_african_share = round(mean(black_african_share, na.rm = TRUE), 2),
avg_hispanic_latino_share = round(mean(hispanic_latino_share, na.rm = TRUE), 2),
.groups = "drop"
)
# Create a nicely formatted table of your results using kable()
kable(
county_avg_demo,
col.names = c(
"County",
"Number of Tracts",
"Avg White (%)",
"Avg Black/African American (%)",
"Avg Hispanic/Latino (%)"
),
caption = "Average Tract Demographics by County"
)
| County | Number of Tracts | Avg White (%) | Avg Black/African American (%) | Avg Hispanic/Latino (%) |
|---|---|---|---|---|
| Hendry | 10 | 31.12 | 9.31 | 52.59 |
| Highlands | 35 | 63.90 | 11.04 | 20.68 |
| Lafayette | 3 | 70.56 | 15.39 | 12.66 |
Part 4: Comprehensive Data Quality Evaluation
4.1 MOE Analysis for Demographic Variables
Your Task: Examine margins of error for demographic variables to see if some communities have less reliable data.
Requirements:
- Calculate MOE percentages for each demographic variable
- Flag tracts where any demographic variable has MOE > 15%
- Create summary statistics
# Calculate MOE percentages for white, Black, and Hispanic variables
# Hint: use the same formula as before (margin/estimate * 100)
# Create a flag for tracts with high MOE on any demographic variable
# Use logical operators (| for OR) in an ifelse() statement
tract_reliability <- FL_race_tract %>%
mutate(
white_moe_pct = round((whiteM / whiteE) * 100, 2),
black_moe_pct = round((black_africanM / black_africanE) * 100, 2),
hispanic_moe_pct = round((hispanic_latinoM / hispanic_latinoE) * 100, 2)
) %>%
mutate(
has_data_quality_issue = ifelse(
white_moe_pct > 15 | black_moe_pct > 15 | hispanic_moe_pct > 15,
TRUE,
FALSE
)
)
# Create summary statistics showing how many tracts have data quality issues
tract_quality_summary <- tract_reliability %>%
summarise(
n_tracts = n(),
n_issue = sum(has_data_quality_issue, na.rm = TRUE),
pct_issue = round(n_issue / n_tracts * 100, 2)
)
tract_quality_by_county <- tract_reliability %>%
group_by(county_name) %>%
summarise(
n_tracts = n(),
n_issue = sum(has_data_quality_issue, na.rm = TRUE),
pct_issue = round(n_issue / n_tracts * 100, 2),
.groups = "drop"
)
kable(
tract_quality_by_county,
col.names = c("County", "Number of Tracts", "Tracts with low confidence", "Percent with Issues (%)"),
caption = "Tract Data Quality Issues by County (ACS 2022 5-year)"
)
| County | Number of Tracts | Tracts with low confidence | Percent with Issues (%) |
|---|---|---|---|
| Hendry | 10 | 10 | 100 |
| Highlands | 35 | 35 | 100 |
| Lafayette | 3 | 3 | 100 |
4.2 Pattern Analysis
Your Task: Investigate whether data quality problems are randomly distributed or concentrated in certain types of communities.
# Group tracts by whether they have high MOE issues
# Calculate average characteristics for each group:
# - population size, demographic percentages
tract_group_summary <- tract_reliability %>%
group_by(has_data_quality_issue) %>%
summarise(
n_tracts = n(),
avg_total_pop = round(mean(totalE, na.rm = TRUE), 2),
avg_white_share = round(mean(white_share, na.rm = TRUE), 2),
avg_black_share = round(mean(black_african_share, na.rm = TRUE), 2),
avg_hispanic_share = round(mean(hispanic_latino_share, na.rm = TRUE), 2),
.groups = "drop"
)
kable(
tract_group_summary,
col.names = c(
"High MOE Issue?",
"Number of Tracts",
"Avg Total Population",
"Avg White (%)",
"Avg Black/African American (%)",
"Avg Hispanic/Latino (%)"
),
caption = "Average Tract Characteristics by Data Quality Issue Flag (ACS 2022 5-year)"
)
| High MOE Issue? | Number of Tracts | Avg Total Population | Avg White (%) | Avg Black/African American (%) | Avg Hispanic/Latino (%) |
|---|---|---|---|---|---|
| TRUE | 48 | 3132.25 | 57.79 | 10.98 | 26.52 |
# Use group_by() and summarize() to create this comparison
# Create a professional table showing the patterns
Pattern Analysis: All 48 tracts in this study are flagged as having a data quality issue, so there is no unflagged group to compare against, and the flag alone cannot show whether problems concentrate in particular types of communities. What the result does show is that the problems are widespread rather than isolated. A likely reason is that at least one demographic group has a very small count in most tracts, so its MOE is large relative to the estimate (and the ratio is infinite when the estimate is zero). In practice, tract-level comparisons and algorithmic decisions based on these demographic estimates in Florida could be unstable and should be treated as indicative patterns, not precise inputs.
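To illustrate why every tract ends up flagged, the sketch below applies the MOE percentage formula to the one tract whose estimates and margins appear in section 3.3 (Census Tract 6.02 in Hendry County). The data frame is typed in by hand from that table; nothing here queries the API.

```r
# Estimates and MOEs for Census Tract 6.02, Hendry County, copied from section 3.3
tract_602 <- data.frame(
  group    = c("White", "Black/African American", "Hispanic/Latino"),
  estimate = c(239, 0, 3109),
  moe      = c(170, 15, 655)
)

# Same formula as the county analysis: margin / estimate * 100
tract_602$moe_pct  <- round(tract_602$moe / tract_602$estimate * 100, 2)
tract_602$above_15 <- tract_602$moe_pct > 15

print(tract_602)

# The tract is flagged if ANY group exceeds the 15% threshold
any_flag <- any(tract_602$above_15)
print(any_flag)
```

Even the Hispanic/Latino estimate, which covers about 91 percent of this tract's population, has an MOE of roughly 21 percent, and the zero estimate produces an infinite ratio. A 15 percent threshold applied to any of three groups is therefore very hard for a tract to pass.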
Part 5: Policy Recommendations
5.1 Analysis Integration and Professional Summary
Your Task: Write an executive summary that integrates findings from all four analyses.
Executive Summary Requirements:
1. Overall Pattern Identification: What are the systematic patterns across all your analyses?
2. Equity Assessment: Which communities face the greatest risk of algorithmic bias based on your findings?
3. Root Cause Analysis: What underlying factors drive both data quality issues and bias risk?
4. Strategic Recommendations: What should the Department implement to address these systematic issues?
Executive Summary:
Across the county analysis, median household income uncertainty is not uniform. Of Florida's 67 counties, 37 are High Confidence, 16 are Moderate Confidence, and 14 (about 21 percent) have an MOE above 10 percent, which makes threshold-based decisions unstable at the margin for that last group. At the tract level, every tract in the three selected counties has at least one demographic estimate with an MOE above 15 percent, meaning the problem intensifies as the geographic scale gets smaller and as the measure depends on subgroup counts. Taken together, uncertainty concentrates by scale and by variable, and it can overwhelm simple comparisons even when the estimates look reasonable.
Communities most at risk of algorithmic bias are those in counties with the highest income MOE and in tracts where demographic estimates have high MOE, because an algorithm can easily misclassify them into the wrong tier for eligibility, enforcement, or resource allocation. This risk is especially acute for tracts where Hispanic, Black, or other subgroup counts are small, since the demographic inputs become noisy and can flip across thresholds. The result is not random error; it is predictable under-service or over-targeting in the same places where the inputs are least reliable.
The core driver is statistical instability from small denominators and uneven distributions, which makes subgroup estimates highly sensitive to sampling variation. Counties and tracts with more uneven income distributions or more segmented demographic composition can also show larger uncertainty because the median and subgroup counts shift more across samples. Nonresponse and measurement error compound this at fine geographies, and the ACS design means tract-level estimates can carry high relative MOE even when county-level aggregates look acceptable.
The Department should require uncertainty-aware use of ACS data by hard-coding MOE gates: do not allow automated decisions when MOE exceeds a set threshold, and default to manual review or additional data collection for those communities. For operational models, use uncertainty-weighted features, avoid sharp cutoffs on noisy variables, and prefer larger geographies or pooled multi-year estimates when equity impacts are high. Pair ACS data with administrative data where possible, run routine fairness and stability audits that track error rates by county and tract, and publish clear monitoring rules so that outcomes are corrected when data quality is weak, rather than letting the algorithm quietly fail the same communities repeatedly.
5.2 Specific Recommendations
Your Task: Create a decision framework for algorithm implementation.
# Create a summary table using your county reliability data
# Include: county name, median income, MOE percentage, reliability category
recommendations_data <- county_reliability %>%
select(county_name, median_hh_incomeE, moe_pct, reliability_category) %>%
# Add a new column with algorithm recommendations using case_when():
# - High Confidence: "Safe for algorithmic decisions"
# - Moderate Confidence: "Use with caution - monitor outcomes"
# - Low Confidence: "Requires manual review or additional data"
mutate(
algorithm_recommendation = case_when(
reliability_category == "High Confidence" ~ "Safe for algorithmic decisions",
reliability_category == "Moderate Confidence" ~ "Use with caution - monitor outcomes",
TRUE ~ "Requires manual review or additional data"
)
)
# Format as a professional table with kable()
kable(
recommendations_data,
col.names = c("County", "Median HH Income (Estimate)", "MOE (%)", "Reliability Category", "Algorithm Recommendation"),
caption = "Algorithm Readiness Recommendations Based on County Income Data Reliability (ACS 2022 5-year)"
)
| County | Median HH Income (Estimate) | MOE (%) | Reliability Category | Algorithm Recommendation |
|---|---|---|---|---|
| Alachua County, Florida | 57566 | 2.58 | High Confidence | Safe for algorithmic decisions |
| Baker County, Florida | 67872 | 12.22 | Low Confidence | Requires manual review or additional data |
| Bay County, Florida | 65999 | 3.16 | High Confidence | Safe for algorithmic decisions |
| Bradford County, Florida | 54759 | 13.61 | Low Confidence | Requires manual review or additional data |
| Brevard County, Florida | 71308 | 1.71 | High Confidence | Safe for algorithmic decisions |
| Broward County, Florida | 70331 | 1.11 | High Confidence | Safe for algorithmic decisions |
| Calhoun County, Florida | 41526 | 14.78 | Low Confidence | Requires manual review or additional data |
| Charlotte County, Florida | 62164 | 2.72 | High Confidence | Safe for algorithmic decisions |
| Citrus County, Florida | 52569 | 3.80 | High Confidence | Safe for algorithmic decisions |
| Clay County, Florida | 82242 | 3.55 | High Confidence | Safe for algorithmic decisions |
| Collier County, Florida | 82011 | 2.36 | High Confidence | Safe for algorithmic decisions |
| Columbia County, Florida | 53501 | 6.15 | Moderate Confidence | Use with caution - monitor outcomes |
| DeSoto County, Florida | 45000 | 6.54 | Moderate Confidence | Use with caution - monitor outcomes |
| Dixie County, Florida | 45057 | 14.10 | Low Confidence | Requires manual review or additional data |
| Duval County, Florida | 65579 | 1.84 | High Confidence | Safe for algorithmic decisions |
| Escambia County, Florida | 61642 | 2.17 | High Confidence | Safe for algorithmic decisions |
| Flagler County, Florida | 69251 | 4.13 | High Confidence | Safe for algorithmic decisions |
| Franklin County, Florida | 58107 | 7.54 | Moderate Confidence | Use with caution - monitor outcomes |
| Gadsden County, Florida | 45721 | 5.61 | Moderate Confidence | Use with caution - monitor outcomes |
| Gilchrist County, Florida | 56823 | 6.17 | Moderate Confidence | Use with caution - monitor outcomes |
| Glades County, Florida | 37221 | 18.82 | Low Confidence | Requires manual review or additional data |
| Gulf County, Florida | 56250 | 8.18 | Moderate Confidence | Use with caution - monitor outcomes |
| Hamilton County, Florida | 47668 | 12.16 | Low Confidence | Requires manual review or additional data |
| Hardee County, Florida | 44665 | 16.79 | Low Confidence | Requires manual review or additional data |
| Hendry County, Florida | 49259 | 9.85 | Moderate Confidence | Use with caution - monitor outcomes |
| Hernando County, Florida | 59202 | 2.46 | High Confidence | Safe for algorithmic decisions |
| Highlands County, Florida | 53679 | 4.89 | High Confidence | Safe for algorithmic decisions |
| Hillsborough County, Florida | 70612 | 1.29 | High Confidence | Safe for algorithmic decisions |
| Holmes County, Florida | 46063 | 8.40 | Moderate Confidence | Use with caution - monitor outcomes |
| Indian River County, Florida | 67543 | 2.96 | High Confidence | Safe for algorithmic decisions |
| Jackson County, Florida | 46144 | 5.15 | Moderate Confidence | Use with caution - monitor outcomes |
| Jefferson County, Florida | 51573 | 15.80 | Low Confidence | Requires manual review or additional data |
| Lafayette County, Florida | 57852 | 22.23 | Low Confidence | Requires manual review or additional data |
| Lake County, Florida | 66239 | 2.69 | High Confidence | Safe for algorithmic decisions |
| Lee County, Florida | 69368 | 1.20 | High Confidence | Safe for algorithmic decisions |
| Leon County, Florida | 61317 | 2.48 | High Confidence | Safe for algorithmic decisions |
| Levy County, Florida | 49933 | 6.88 | Moderate Confidence | Use with caution - monitor outcomes |
| Liberty County, Florida | 51723 | 15.20 | Low Confidence | Requires manual review or additional data |
| Madison County, Florida | 43386 | 13.67 | Low Confidence | Requires manual review or additional data |
| Manatee County, Florida | 71385 | 2.54 | High Confidence | Safe for algorithmic decisions |
| Marion County, Florida | 55265 | 2.88 | High Confidence | Safe for algorithmic decisions |
| Martin County, Florida | 77894 | 2.66 | High Confidence | Safe for algorithmic decisions |
| Miami-Dade County, Florida | 64215 | 1.12 | High Confidence | Safe for algorithmic decisions |
| Monroe County, Florida | 80111 | 4.36 | High Confidence | Safe for algorithmic decisions |
| Nassau County, Florida | 84085 | 3.31 | High Confidence | Safe for algorithmic decisions |
| Okaloosa County, Florida | 73988 | 3.41 | High Confidence | Safe for algorithmic decisions |
| Okeechobee County, Florida | 50476 | 8.18 | Moderate Confidence | Use with caution - monitor outcomes |
| Orange County, Florida | 72629 | 1.54 | High Confidence | Safe for algorithmic decisions |
| Osceola County, Florida | 64312 | 2.80 | High Confidence | Safe for algorithmic decisions |
| Palm Beach County, Florida | 76066 | 1.44 | High Confidence | Safe for algorithmic decisions |
| Pasco County, Florida | 63187 | 2.19 | High Confidence | Safe for algorithmic decisions |
| Pinellas County, Florida | 66406 | 1.40 | High Confidence | Safe for algorithmic decisions |
| Polk County, Florida | 60901 | 1.65 | High Confidence | Safe for algorithmic decisions |
| Putnam County, Florida | 44852 | 6.30 | Moderate Confidence | Use with caution - monitor outcomes |
| St. Johns County, Florida | 100020 | 3.73 | High Confidence | Safe for algorithmic decisions |
| St. Lucie County, Florida | 66154 | 2.42 | High Confidence | Safe for algorithmic decisions |
| Santa Rosa County, Florida | 84715 | 3.46 | High Confidence | Safe for algorithmic decisions |
| Sarasota County, Florida | 77213 | 2.02 | High Confidence | Safe for algorithmic decisions |
| Seminole County, Florida | 79490 | 2.51 | High Confidence | Safe for algorithmic decisions |
| Sumter County, Florida | 70105 | 5.14 | Moderate Confidence | Use with caution - monitor outcomes |
| Suwannee County, Florida | 49729 | 11.85 | Low Confidence | Requires manual review or additional data |
| Taylor County, Florida | 46239 | 10.33 | Low Confidence | Requires manual review or additional data |
| Union County, Florida | 64043 | 13.57 | Low Confidence | Requires manual review or additional data |
| Volusia County, Florida | 63075 | 2.04 | High Confidence | Safe for algorithmic decisions |
| Wakulla County, Florida | 72035 | 5.56 | Moderate Confidence | Use with caution - monitor outcomes |
| Walton County, Florida | 74832 | 6.99 | Moderate Confidence | Use with caution - monitor outcomes |
| Washington County, Florida | 47536 | 9.21 | Moderate Confidence | Use with caution - monitor outcomes |
Key Recommendations:
Your Task: Use your analysis results to provide specific guidance to the department.
Counties suitable for immediate algorithmic implementation: Counties with High Confidence, MOE below 5 percent. These include Alachua, Bay, Brevard, Broward, Charlotte, Citrus, Clay, Collier, Duval, Escambia, Flagler, Hernando, Highlands, Hillsborough, Indian River, Lake, Lee, Leon, Manatee, Marion, Martin, Miami-Dade, Monroe, Nassau, Okaloosa, Orange, Osceola, Palm Beach, Pasco, Pinellas, Polk, St. Johns, St. Lucie, Santa Rosa, Sarasota, Seminole, Volusia. They are appropriate because the income estimate uncertainty is low enough that ranking, tiering, and threshold rules will be stable for most cases.
Counties requiring additional oversight: Counties with Moderate Confidence, MOE 5 to 10 percent, with monitoring. These include Columbia, DeSoto, Franklin, Gadsden, Gilchrist, Gulf, Hendry, Holmes, Jackson, Levy, Okeechobee, Putnam, Sumter, Wakulla, Walton, Washington. Oversight should focus on cutoff sensitivity: add a review buffer around eligibility thresholds, track how often counties move across tiers when you incorporate MOE, and audit outcomes routinely to catch systematic under or over allocation.
Counties needing alternative approaches: Avoid fully automated decisions for Low Confidence counties, MOE above 10 percent. These include Baker, Bradford, Calhoun, Dixie, Glades, Hamilton, Hardee, Jefferson, Lafayette, Liberty, Madison, Suwannee, Taylor, Union. Use manual review for threshold-based decisions, pool more years or use larger geographies to stabilize estimates, and supplement with administrative data or targeted local data collection if these counties are high priority for services.
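The review buffer around eligibility thresholds can be sketched as a check on whether a county's interval (estimate plus or minus MOE) contains the cutoff. The county values come from the section 2.1 output; the $55,000 cutoff and the action labels are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Estimates and MOEs from the county output in section 2.1
counties <- data.frame(
  county   = c("Alachua", "Baker", "Bradford", "Broward"),
  estimate = c(57566, 67872, 54759, 70331),
  moe      = c(1488, 8294, 7455, 780)
)

income_cutoff <- 55000  # hypothetical eligibility threshold, not a real policy value

buffer_check <- counties %>%
  mutate(
    lower = estimate - moe,
    upper = estimate + moe,
    # The tier is ambiguous when the cutoff falls inside the interval
    straddles_cutoff = lower < income_cutoff & upper > income_cutoff,
    action = if_else(straddles_cutoff, "Manual review", "Automated tier assignment")
  )

print(buffer_check)
```

With this cutoff only Bradford is routed to manual review. Baker is a Low Confidence county but its whole interval sits above $55,000, which suggests that reliability categories and cutoff sensitivity are separate checks and that both may be worth applying.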
Questions for Further Investigation
1. Do Low Confidence counties share measurable traits such as a smaller tax base, higher rurality, or more volatile household income distributions, and can those traits predict MOE risk?
2. How consistent are these reliability categories across time windows, for example ACS 2017–2021 versus 2018–2022, and which counties frequently switch categories?
3. When you move from county to tract, which specific demographic estimates drive the largest MOE, and do those high-MOE tracts cluster within the same counties or in specific community types?
Technical Notes
Data Sources:
- U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates
- Retrieved via tidycensus R package on [date]
Reproducibility:
- All analysis conducted in R version [your version]
- Census API key required for replication
- Complete code and documentation available at: [your portfolio URL]
Methodology Notes: [Describe any decisions you made about data processing, county selection, or analytical choices that might affect reproducibility]
Limitations: [Note any limitations in your analysis - sample size issues, geographic scope, temporal factors, etc.]
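The version and date placeholders above can be read from the R session rather than typed by hand. This is a small base R sketch; note that Sys.Date() gives the date the document is rendered, which matches the retrieval date only if the data are downloaded in the same run.

```r
# R version string for the Reproducibility note
r_version <- R.version.string

# Render date in ISO format; equals the retrieval date only when data are
# fetched during the same render
retrieval_date <- format(Sys.Date(), "%Y-%m-%d")

cat("R version:", r_version, "\n")
cat("Retrieved on:", retrieval_date, "\n")
```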
Submission Checklist
Before submitting your portfolio link on Canvas:
Remember: Submit your portfolio URL on Canvas, not the file itself. Your assignment should be accessible at your-portfolio-url/labs/lab_1/your_file_name.html