import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# Read the CSV file
df = pd.read_csv('WV_Balanced_Scorecard_Data_2025 (1).xlsx - SY25 Balanced Scorecard Results.csv')
# Display the first 5 rows
print(df.head().to_markdown(index=False, numalign="left", stralign="left"))
# Print the column names and their data types
# (df.info() prints its report directly and returns None, so print() adds a trailing "None")
print(df.info())
| Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Academic Indicators | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | Unnamed: 15 | Unnamed: 16 | Unnamed: 17 | Unnamed: 18 | Unnamed: 19 | Unnamed: 20 | Unnamed: 21 | Student Success Indicators | Unnamed: 23 | Unnamed: 24 | Unnamed: 25 |
|:---------------|:--------------|:--------------|:-------------|:--------------------------------|:-----------------|:-------------|:------------------|:---------------|:---------------------------------|:--------------|:---------------------------------|:------------------------------------------------|:----------------------------------|:-------------------------------------------------|:------------------------------|:-------------------------------|:-----------------------|:--------------------------------------|:------------------------|:--------------------------------------|:------------------------|:-----------------------------|:-----------------|:---------------|:---------------------|
| Reporting Year | District Code | District Name | School Code | School Name | School Type Sort | School Type | Title 1 Indicator | Group | Subgroup | Subgroup_Sort | Assessment ELA Performance Value | Assessment ELA Proficiency Meets Annual Target? | Assessment Math Performance Value | Assessment Math Proficiency Meets Annual Target? | Assessment ELA Progress Value | Assessment Math Progress Value | English Learners Value | English Learners Meets Annual Target? | Grad. Rate 4-Year Value | Grad Rate 4-Year Meets Annual Target? | Grad. Rate 5-Year Value | Attendance Value | Discipline Value | On Track Value | Post Secondary Value |
| 2025 | 002 | Barbour | 101 | Kasson Elementary/Middle School | 4 | Middle | Yes | Race/Ethnicity | American Indian or Alaska Native | 1 | Not Reportable | Not Reportable | Not Reportable | Not Reportable | Not Reportable | Not Reportable | nan | nan | nan | nan | nan | Not Reportable | Not Reportable | nan | nan |
| 2025 | 002 | Barbour | 101 | Kasson Elementary/Middle School | 4 | Middle | Yes | Race/Ethnicity | Asian | 2 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| 2025 | 002 | Barbour | 101 | Kasson Elementary/Middle School | 4 | Middle | Yes | Race/Ethnicity | Black or African American | 3 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| 2025 | 002 | Barbour | 101 | Kasson Elementary/Middle School | 4 | Middle | Yes | Race/Ethnicity | Hispanic or Latino | 4 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12209 entries, 0 to 12208
Data columns (total 26 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 12209 non-null object
1 Unnamed: 1 12209 non-null object
2 Unnamed: 2 12209 non-null object
3 Unnamed: 3 12209 non-null object
4 Unnamed: 4 12209 non-null object
5 Unnamed: 5 12209 non-null object
6 Unnamed: 6 12209 non-null object
7 Unnamed: 7 12209 non-null object
8 Unnamed: 8 12209 non-null object
9 Unnamed: 9 12209 non-null object
10 Unnamed: 10 12209 non-null object
11 Academic Indicators 7964 non-null object
12 Unnamed: 12 7964 non-null object
13 Unnamed: 13 7964 non-null object
14 Unnamed: 14 7964 non-null object
15 Unnamed: 15 5809 non-null object
16 Unnamed: 16 5814 non-null object
17 Unnamed: 17 2695 non-null object
18 Unnamed: 18 59 non-null object
19 Unnamed: 19 2172 non-null object
20 Unnamed: 20 848 non-null object
21 Unnamed: 21 2164 non-null object
22 Student Success Indicators 9128 non-null object
23 Unnamed: 23 7129 non-null object
24 Unnamed: 24 2130 non-null object
25 Unnamed: 25 2073 non-null object
dtypes: object(26)
memory usage: 2.4+ MB
None
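The `Unnamed: N` column names above appear because the first line of the file is a banner row of indicator groups, and the real field names sit on the second line. A minimal sketch of the situation, using a tiny in-memory sample rather than the real file (the sample text and the `dtype` argument are assumptions added for illustration, not part of the original session):

```python
import io
import pandas as pd

# Tiny stand-in for the real CSV: a banner row, the real header row, one data row.
sample_csv = (
    ",,,Academic Indicators\n"
    "District Code,District Name,School Name,Assessment Math Performance Value\n"
    "002,Barbour,Kasson Elementary/Middle School,Not Reportable\n"
)

# Default read: the banner row becomes the header, so blank cells turn into "Unnamed: N".
raw = pd.read_csv(io.StringIO(sample_csv))
print(list(raw.columns))

# header=1 uses the second line as the header and skips the banner row.
fixed = pd.read_csv(io.StringIO(sample_csv), header=1)
print(fixed["District Code"].tolist())  # parsed as an integer, leading zeros lost

# Optional: keep codes such as 002 as text by declaring the column as a string.
fixed_str = pd.read_csv(io.StringIO(sample_csv), header=1, dtype={"District Code": str})
print(fixed_str["District Code"].tolist())
```

Reading the code column as a string is only worth doing if the codes are later used as identifiers or join keys; for filtering by district name it makes no difference.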
# Reload the dataset with the correct header
df = pd.read_csv('WV_Balanced_Scorecard_Data_2025 (1).xlsx - SY25 Balanced Scorecard Results.csv', header=1)
# Display the first 5 rows
print(df.head().to_markdown(index=False, numalign="left", stralign="left"))
# Check for Pocahontas district
pocahontas_df = df[df['District Name'] == 'Pocahontas']
print(pocahontas_df.head().to_markdown(index=False, numalign="left", stralign="left"))
# Check unique subgroups to find the total/all students category
print(pocahontas_df['Subgroup'].unique())
| Reporting Year | District Code | District Name | School Code | School Name | School Type Sort | School Type | Title 1 Indicator | Group | Subgroup | Subgroup_Sort | Assessment ELA Performance Value | Assessment ELA Proficiency Meets Annual Target? | Assessment Math Performance Value | Assessment Math Proficiency Meets Annual Target? | Assessment ELA Progress Value | Assessment Math Progress Value | English Learners Value | English Learners Meets Annual Target? | Grad. Rate 4-Year Value | Grad Rate 4-Year Meets Annual Target? | Grad. Rate 5-Year Value | Attendance Value | Discipline Value | On Track Value | Post Secondary Value |
|:-----------------|:----------------|:----------------|:--------------|:--------------------------------|:-------------------|:--------------|:--------------------|:---------------|:---------------------------------|:----------------|:-----------------------------------|:--------------------------------------------------|:------------------------------------|:---------------------------------------------------|:--------------------------------|:---------------------------------|:-------------------------|:----------------------------------------|:--------------------------|:----------------------------------------|:--------------------------|:-------------------|:-------------------|:-----------------|:-----------------------|
| 2025 | 2 | Barbour | 101 | Kasson Elementary/Middle School | 4 | Middle | Yes | Race/Ethnicity | American Indian or Alaska Native | 1 | Not Reportable | Not Reportable | Not Reportable | Not Reportable | Not Reportable | Not Reportable | nan | nan | nan | nan | nan | Not Reportable | Not Reportable | nan | nan |
| 2025 | 2 | Barbour | 101 | Kasson Elementary/Middle School | 4 | Middle | Yes | Race/Ethnicity | Asian | 2 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| 2025 | 2 | Barbour | 101 | Kasson Elementary/Middle School | 4 | Middle | Yes | Race/Ethnicity | Black or African American | 3 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| 2025 | 2 | Barbour | 101 | Kasson Elementary/Middle School | 4 | Middle | Yes | Race/Ethnicity | Hispanic or Latino | 4 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| 2025 | 2 | Barbour | 101 | Kasson Elementary/Middle School | 4 | Middle | Yes | Race/Ethnicity | Multi-Racial | 5 | Not Reportable | Not Reportable | Not Reportable | Not Reportable | Not Reportable | Not Reportable | nan | nan | nan | nan | nan | Not Reportable | Not Reportable | nan | nan |
| Reporting Year | District Code | District Name | School Code | School Name | School Type Sort | School Type | Title 1 Indicator | Group | Subgroup | Subgroup_Sort | Assessment ELA Performance Value | Assessment ELA Proficiency Meets Annual Target? | Assessment Math Performance Value | Assessment Math Proficiency Meets Annual Target? | Assessment ELA Progress Value | Assessment Math Progress Value | English Learners Value | English Learners Meets Annual Target? | Grad. Rate 4-Year Value | Grad Rate 4-Year Meets Annual Target? | Grad. Rate 5-Year Value | Attendance Value | Discipline Value | On Track Value | Post Secondary Value |
|:-----------------|:----------------|:----------------|:--------------|:----------------------------|:-------------------|:--------------|:--------------------|:---------------|:---------------------------------|:----------------|:-----------------------------------|:--------------------------------------------------|:------------------------------------|:---------------------------------------------------|:--------------------------------|:---------------------------------|:-------------------------|:----------------------------------------|:--------------------------|:----------------------------------------|:--------------------------|:-------------------|:-------------------|:-----------------|:-----------------------|
| 2025 | 69 | Pocahontas | 101 | Hillsboro Elementary School | 3 | Elementary | Yes | Race/Ethnicity | American Indian or Alaska Native | 1 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| 2025 | 69 | Pocahontas | 101 | Hillsboro Elementary School | 3 | Elementary | Yes | Race/Ethnicity | Asian | 2 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| 2025 | 69 | Pocahontas | 101 | Hillsboro Elementary School | 3 | Elementary | Yes | Race/Ethnicity | Black or African American | 3 | Not Reportable | Not Reportable | Not Reportable | Not Reportable | Not Reportable | Not Reportable | nan | nan | nan | nan | nan | Not Reportable | Not Reportable | nan | nan |
| 2025 | 69 | Pocahontas | 101 | Hillsboro Elementary School | 3 | Elementary | Yes | Race/Ethnicity | Hispanic or Latino | 4 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| 2025 | 69 | Pocahontas | 101 | Hillsboro Elementary School | 3 | Elementary | Yes | Race/Ethnicity | Multi-Racial | 5 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
['American Indian or Alaska Native' 'Asian' 'Black or African American'
'Hispanic or Latino' 'Multi-Racial' 'Pacific Islander' 'White'
'Economically Disadvantaged' 'Children With Disabilities'
'English Learners' 'Homeless' 'Military Connected' 'Foster Care' 'Totals']
# Filter for the 'Totals' subgroup
pocahontas_totals = pocahontas_df[pocahontas_df['Subgroup'] == 'Totals'].copy()
# Select relevant columns
cols_to_keep = ['School Name', 'Assessment Math Performance Value', 'Assessment Math Progress Value', 'Assessment ELA Performance Value']
pocahontas_data = pocahontas_totals[cols_to_keep].copy()
# Convert columns to numeric, coercing errors
for col in cols_to_keep[1:]:
    pocahontas_data[col] = pd.to_numeric(pocahontas_data[col], errors='coerce')
# Display the cleaned data
print(pocahontas_data.to_markdown(index=False, numalign="left", stralign="left"))
| School Name | Assessment Math Performance Value | Assessment Math Progress Value | Assessment ELA Performance Value |
|:------------------------------------|:------------------------------------|:---------------------------------|:-----------------------------------|
| Hillsboro Elementary School | 0.7625 | 0.65 | 0.7375 |
| Green Bank Elementary-Middle School | 0.5164 | 0.4842 | 0.4951 |
| Marlinton Elementary School | 0.6161 | 0.4035 | 0.6138 |
| Marlinton Middle School | 0.6 | 0.5138 | 0.6581 |
| Pocahontas County High School | 0.4612 | nan | 0.5265 |
| District Record | 0.6555 | 0.4675 | 0.6471 |
| District Record | 0.5573 | 0.5 | 0.5749 |
| District Record | 0.4612 | nan | 0.5265 |
| District Record | 0.5744 | 0.4911 | 0.5902 |
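The `nan` entries in the table come from `pd.to_numeric(..., errors='coerce')`, which turns anything it cannot parse into NaN. That covers both empty cells and text markers such as "Not Reportable". A small sketch on a hand-built Series (the Series is illustrative; only the two numeric strings are taken from the table above):

```python
import pandas as pd

# Mixed column as it arrives from the CSV: numeric strings, a text marker, a missing cell.
raw_values = pd.Series(["0.7625", "Not Reportable", None, "0.4612"])

# errors="coerce" converts unparseable entries to NaN instead of raising an error.
numeric_values = pd.to_numeric(raw_values, errors="coerce")
print(numeric_values)
print(numeric_values.isna().tolist())
```

Because both kinds of gap end up as NaN, the cleaned table alone cannot say whether a value was suppressed or simply absent; the raw column has to be inspected for that.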
import matplotlib.pyplot as plt
# Filter out District Record
school_data = pocahontas_data[pocahontas_data['School Name'] != 'District Record'].copy()
# Drop rows with NaN in Math Performance or Math Progress
# Note: This will likely drop the High School
plot_data = school_data.dropna(subset=['Assessment Math Performance Value', 'Assessment Math Progress Value'])
plt.figure(figsize=(10, 6))
plt.scatter(plot_data['Assessment Math Performance Value'], plot_data['Assessment Math Progress Value'], color='blue', s=100)
# Add labels
for i, row in plot_data.iterrows():
    plt.annotate(row['School Name'], (row['Assessment Math Performance Value'], row['Assessment Math Progress Value']),
                 xytext=(5, 5), textcoords='offset points')
plt.title('Math Performance vs. Progress in Pocahontas County Schools')
plt.xlabel('Assessment Math Performance Value')
plt.ylabel('Assessment Math Progress Value')
plt.grid(True)
# Save the plot
plt.savefig('pocahontas_math_scatter.png')
The scatter plot displays the relationship between Math Performance and Math Progress for schools in Pocahontas County.
The plot includes:
Hillsboro Elementary School
Green Bank Elementary-Middle School
Marlinton Elementary School
Marlinton Middle School
Note: Pocahontas County High School is not included in this specific plot because it does not have a reported Math Progress Value.
Analysis:
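Hillsboro Elementary School has the highest performance (0.7625) and the highest progress (0.65) of the four plotted schools.
Marlinton Elementary (0.6161) and Marlinton Middle (0.60) show similar, moderate performance, but the Middle School's progress is clearly higher (0.5138 vs. 0.4035).
Green Bank has the lowest performance of the four plotted schools (0.5164) and mid-range progress (0.4842).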
If you would like to see a comparison that includes the high school (such as Math vs. ELA Performance), please let me know!
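Relative positions on a scatter plot are easy to misjudge by eye, so ranking the schools on each metric can serve as a quick check. The sketch below re-enters the four plotted rows from the cleaned table by hand; the shortened column names are chosen here for readability.

```python
import pandas as pd

# The four schools that have both Math Performance and Math Progress values.
math_data = pd.DataFrame(
    {
        "Math Performance": [0.7625, 0.5164, 0.6161, 0.6],
        "Math Progress": [0.65, 0.4842, 0.4035, 0.5138],
    },
    index=[
        "Hillsboro Elementary School",
        "Green Bank Elementary-Middle School",
        "Marlinton Elementary School",
        "Marlinton Middle School",
    ],
)

# Rank 1 = highest value in the column; there are no ties in this data.
ranks = math_data.rank(ascending=False).astype(int)
print(ranks)
```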
------------------------------------------
# Filter out District Record
school_data_perf = pocahontas_data[pocahontas_data['School Name'] != 'District Record'].copy()
# Drop rows with NaN in Math Performance or ELA Performance
plot_data_perf = school_data_perf.dropna(subset=['Assessment Math Performance Value', 'Assessment ELA Performance Value'])
plt.figure(figsize=(10, 6))
plt.scatter(plot_data_perf['Assessment Math Performance Value'], plot_data_perf['Assessment ELA Performance Value'], color='green', s=100)
# Add labels
for i, row in plot_data_perf.iterrows():
    plt.annotate(row['School Name'], (row['Assessment Math Performance Value'], row['Assessment ELA Performance Value']),
                 xytext=(5, 5), textcoords='offset points')
plt.title('Math Performance vs. ELA Performance in Pocahontas County Schools')
plt.xlabel('Assessment Math Performance Value')
plt.ylabel('Assessment ELA Performance Value')
plt.grid(True)
# Save the plot
plt.savefig('pocahontas_math_ela_scatter.png')
The scatter plot below compares Math Performance vs. ELA Performance for all schools in Pocahontas County, including the High School.
Analysis:
Hillsboro Elementary School demonstrates the highest performance in both Math and ELA.
Marlinton Middle School (ELA 0.6581) and Marlinton Elementary School (ELA 0.6138) follow Hillsboro in ELA. Marlinton Middle's Math score (0.60) is noticeably lower than its ELA score, while Marlinton Elementary's Math (0.6161) and ELA scores are nearly equal.
Pocahontas County High School and Green Bank Elementary-Middle School appear in the lower-left quadrant, indicating lower performance values in both subjects compared to the other schools in the district.
Let me know if there are any other specific comparisons or districts you would like to explore!
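A "quadrant" only has meaning relative to some dividing lines, which the plot itself does not draw. One possible choice, assumed here purely for illustration, is to split each axis at the unweighted mean of the five schools. The sketch re-enters the values from the cleaned table:

```python
import pandas as pd

perf = pd.DataFrame(
    {
        "Math": [0.7625, 0.5164, 0.6161, 0.6, 0.4612],
        "ELA": [0.7375, 0.4951, 0.6138, 0.6581, 0.5265],
    },
    index=[
        "Hillsboro Elementary School",
        "Green Bank Elementary-Middle School",
        "Marlinton Elementary School",
        "Marlinton Middle School",
        "Pocahontas County High School",
    ],
)

# Assumed cut points: the simple mean across schools (not weighted by enrollment).
math_cut = perf["Math"].mean()
ela_cut = perf["ELA"].mean()


def quadrant(row):
    """Label a school by its position relative to the two cut points."""
    vertical = "upper" if row["ELA"] >= ela_cut else "lower"
    horizontal = "right" if row["Math"] >= math_cut else "left"
    return f"{vertical}-{horizontal}"


quadrants = perf.apply(quadrant, axis=1)
print(quadrants)
```

A different cut (for example the district-level record or a state target) could move schools near the boundary, so the labels should be read together with the chosen reference.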
---------------------------------------------------------------------
import matplotlib.pyplot as plt
import numpy as np
# Use the data from the previous step
# pocahontas_data contains columns: 'School Name', 'Assessment Math Performance Value', 'Assessment Math Progress Value', 'Assessment ELA Performance Value'
# Filter out 'District Record'
bar_data = pocahontas_data[pocahontas_data['School Name'] != 'District Record'].copy()
# Prepare data for plotting
schools = bar_data['School Name']
math_scores = bar_data['Assessment Math Performance Value']
ela_scores = bar_data['Assessment ELA Performance Value']
x = np.arange(len(schools)) # the label locations
width = 0.35 # the width of the bars
fig, ax = plt.subplots(figsize=(12, 7))
rects1 = ax.bar(x - width/2, math_scores, width, label='Math Performance', color='skyblue')
rects2 = ax.bar(x + width/2, ela_scores, width, label='ELA Performance', color='salmon')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Performance Value')
ax.set_title('Math vs. ELA Performance by School in Pocahontas County')
ax.set_xticks(x)
ax.set_xticklabels(schools, rotation=45, ha='right')
ax.legend()
# Add value labels on top of bars
def autolabel(rects):
    """Attach a text label above each bar in *rects*, displaying its height."""
    for rect in rects:
        height = rect.get_height()
        ax.annotate(f'{height:.2f}',
                    xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 3),  # 3 points vertical offset
                    textcoords="offset points",
                    ha='center', va='bottom')
autolabel(rects1)
autolabel(rects2)
plt.tight_layout()
plt.savefig('pocahontas_math_ela_bar.png')
The bar graph compares the Math and ELA Performance values for each school in Pocahontas County.
Observations:
Hillsboro Elementary School shows the highest performance in both Math (0.76) and ELA (0.74).
Marlinton Middle School has a higher ELA performance (0.66) compared to its Math performance (0.60).
Pocahontas County High School has the lowest Math score in the district (0.46); its ELA score (0.53) is higher and is the second lowest, after Green Bank (0.50).
Marlinton Elementary School has very similar scores for both Math (0.62) and ELA (0.61).
This visualization helps to clearly identify which subject areas might need more focus in each school. Let me know if you would like to analyze specific subgroups or other indicators!
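To make the subject comparison concrete, the difference between ELA and Math can be computed per school. This is a small sketch with the values re-entered from the cleaned table; the column name `ELA minus Math` is introduced here for illustration.

```python
import pandas as pd

scores = pd.DataFrame(
    {
        "Math": [0.7625, 0.5164, 0.6161, 0.6, 0.4612],
        "ELA": [0.7375, 0.4951, 0.6138, 0.6581, 0.5265],
    },
    index=[
        "Hillsboro Elementary School",
        "Green Bank Elementary-Middle School",
        "Marlinton Elementary School",
        "Marlinton Middle School",
        "Pocahontas County High School",
    ],
)

# Positive values mean ELA is the stronger subject; negative values mean Math is.
scores["ELA minus Math"] = (scores["ELA"] - scores["Math"]).round(4)
print(scores.sort_values("ELA minus Math"))
```

Sorting by the difference puts schools where Math leads at the top and schools where ELA leads at the bottom.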
------------------------------------------------
# Filter for Pocahontas district
pocahontas_df = df[df['District Name'] == 'Pocahontas']
# Check data availability for subgroups
subgroup_counts = pocahontas_df.groupby('Subgroup')['Assessment Math Performance Value'].apply(lambda x: x[x.apply(lambda v: str(v).replace('.', '').isnumeric())].count())
print("Valid Math Scores per Subgroup:")
print(subgroup_counts)
# Check data availability for other indicators (e.g., Attendance)
attendance_counts = pocahontas_df.groupby('Subgroup')['Attendance Value'].apply(lambda x: x[x.apply(lambda v: str(v).replace('.', '').isnumeric())].count())
print("\nValid Attendance Values per Subgroup:")
print(attendance_counts)
# Check unique indicators (columns)
print("\nColumns:", pocahontas_df.columns)
Valid Math Scores per Subgroup:
Subgroup
American Indian or Alaska Native 0
Asian 0
Black or African American 0
Children With Disabilities 5
Economically Disadvantaged 9
English Learners 0
Foster Care 0
Hispanic or Latino 0
Homeless 1
Military Connected 0
Multi-Racial 0
Pacific Islander 0
Totals 9
White 9
Name: Assessment Math Performance Value, dtype: int64
Valid Attendance Values per Subgroup:
Subgroup
American Indian or Alaska Native 0
Asian 0
Black or African American 0
Children With Disabilities 8
Economically Disadvantaged 9
English Learners 0
Foster Care 1
Hispanic or Latino 0
Homeless 2
Military Connected 0
Multi-Racial 0
Pacific Islander 0
Totals 9
White 9
Name: Attendance Value, dtype: int64
Columns: Index(['Reporting Year', 'District Code', 'District Name', 'School Code',
'School Name', 'School Type Sort', 'School Type', 'Title 1 Indicator',
'Group', 'Subgroup', 'Subgroup_Sort',
'Assessment ELA Performance Value',
'Assessment ELA Proficiency Meets Annual Target?',
'Assessment Math Performance Value',
'Assessment Math Proficiency Meets Annual Target?',
'Assessment ELA Progress Value', 'Assessment Math Progress Value',
'English Learners Value', 'English Learners Meets Annual Target?',
'Grad. Rate 4-Year Value', 'Grad Rate 4-Year Meets Annual Target?',
'Grad. Rate 5-Year Value', 'Attendance Value', 'Discipline Value',
'On Track Value', 'Post Secondary Value'],
dtype='object')
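The string test used above (`str(v).replace('.', '').isnumeric()`) works for this file but would miss negative numbers or scientific notation. An alternative sketch counts whatever `pd.to_numeric` can parse. The sample frame below is made up for illustration and does not reproduce real subgroup values. Note also that the count of 9 for Totals matches the nine rows of the earlier cleaned table, which included four District Record rows alongside the five schools.

```python
import pandas as pd

# Made-up sample in the same shape as the scorecard columns.
sample = pd.DataFrame(
    {
        "Subgroup": ["Totals", "Totals", "Asian", "Homeless", "Homeless"],
        "Assessment Math Performance Value": ["0.76", "0.46", None, "Not Reportable", "0.52"],
    }
)

# A value counts as valid if to_numeric can parse it.
is_valid = pd.to_numeric(
    sample["Assessment Math Performance Value"], errors="coerce"
).notna()

# Summing booleans per subgroup gives the number of valid values.
valid_counts = is_valid.groupby(sample["Subgroup"]).sum()
print(valid_counts)
```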
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Reload or ensure data is ready (using the loaded df)
# Filter for Pocahontas
pocahontas_df = df[df['District Name'] == 'Pocahontas']
# ---- Plot 1: Math Performance - Totals vs. Economically Disadvantaged ----
# Filter data
subgroups_to_compare = ['Totals', 'Economically Disadvantaged']
compare_df = pocahontas_df[pocahontas_df['Subgroup'].isin(subgroups_to_compare)].copy()
compare_df = compare_df[compare_df['School Name'] != 'District Record']
# Pivot data for plotting
# We need columns: School Name, Totals_Math, Econ_Math
pivot_math = compare_df.pivot(index='School Name', columns='Subgroup', values='Assessment Math Performance Value')
# Convert to numeric
pivot_math = pivot_math.apply(pd.to_numeric, errors='coerce')
# Drop rows where all data is NaN (optional, but keeps chart clean)
pivot_math = pivot_math.dropna(how='all')
# Plotting
ax = pivot_math.plot(kind='bar', figsize=(10, 6), color=['skyblue', 'salmon'])
plt.title('Math Performance: All Students vs. Economically Disadvantaged\nPocahontas County Schools')
plt.ylabel('Assessment Math Performance Value')
plt.xlabel('School Name')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Subgroup')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.savefig('pocahontas_math_subgroup_comparison.png')
plt.close()
# ---- Plot 2: Attendance Rate (Other Indicator) ----
# Filter for Totals
attendance_df = pocahontas_df[(pocahontas_df['Subgroup'] == 'Totals') & (pocahontas_df['School Name'] != 'District Record')].copy()
# Convert to numeric
attendance_df['Attendance Value'] = pd.to_numeric(attendance_df['Attendance Value'], errors='coerce')
# Drop NaN
attendance_df = attendance_df.dropna(subset=['Attendance Value'])
# Plotting
plt.figure(figsize=(10, 6))
bars = plt.bar(attendance_df['School Name'], attendance_df['Attendance Value'], color='mediumseagreen')
# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{height:.2f}',
             ha='center', va='bottom')
plt.title('Attendance Rates by School in Pocahontas County (All Students)')
plt.ylabel('Attendance Rate')
plt.xlabel('School Name')
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.savefig('pocahontas_attendance.png')
plt.close()
Here are two additional analyses based on your request.
1. Subgroup Analysis: Math Performance
The bar chart below compares the Math Performance of All Students (Totals) versus Economically Disadvantaged students for each school. This helps identify achievement gaps within the schools.
Observations:
Hillsboro Elementary shows strong performance for both groups, with Economically Disadvantaged students performing very close to the total population.
Marlinton Middle and Marlinton Elementary show a larger gap, where the Economically Disadvantaged subgroup scores lower than the general population.
Pocahontas County High School has the lowest scores for both groups, with a notable gap as well.
2. Other Indicator: Attendance Rates
I also analyzed Attendance Rates as another key indicator of school success.
Observations:
Hillsboro Elementary has the highest attendance rate at roughly 94%.
Marlinton Elementary and Marlinton Middle follow closely with rates around 92-93%.
Pocahontas County High School has a slightly lower attendance rate compared to the elementary and middle schools, at roughly 89%.
Let me know if you would like to dig deeper into other subgroups like "Children with Disabilities" or explore indicators like Discipline or Graduation Rates (where data is available)!
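The achievement gap discussed in the subgroup analysis can also be expressed as a number rather than read off the bars. The sketch below shows the pivot-and-subtract step on placeholder records. The school names and values are invented for illustration and are not taken from the scorecard; the `Gap` column name is also an assumption.

```python
import pandas as pd

# Placeholder long-format records (invented, not real scorecard data).
long_df = pd.DataFrame(
    {
        "School Name": ["School A", "School A", "School B", "School B"],
        "Subgroup": ["Totals", "Economically Disadvantaged"] * 2,
        "Assessment Math Performance Value": ["0.60", "0.50", "0.70", "Not Reportable"],
    }
)

# One row per school, one column per subgroup.
pivot = long_df.pivot(
    index="School Name",
    columns="Subgroup",
    values="Assessment Math Performance Value",
)
pivot = pivot.apply(pd.to_numeric, errors="coerce")

# Positive gap: all students score higher than the subgroup. NaN: gap not computable.
pivot["Gap"] = (pivot["Totals"] - pivot["Economically Disadvantaged"]).round(4)
print(pivot)
```

A NaN gap signals that one side was suppressed or missing, which is different from a gap of zero and should be reported separately.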