
test item analysis

 test item analysis "west virginia" schools

In West Virginia, test item analysis is a critical process used by educators and administrators to evaluate the effectiveness of assessment questions and to gain deep insights into student mastery of the West Virginia College- and Career-Readiness Standards.

This analysis primarily revolves around the West Virginia General Summative Assessment (WVGSA) for grades 3–8 and the SAT School Day for grade 11, facilitated through the state's Centralized Reporting System (CRS).


1. Primary Platforms for Item Analysis

The West Virginia Department of Education (WVDE) provides several tools for analyzing test items at different levels:

The Centralized Reporting System (CRS)

The CRS is the "live" hub for school and district leaders. It allows users to "drill down" from district-level data to individual student responses.

  • Item-Level Data: Teachers can see how a group of students performed on a specific item compared to the state average.

  • Distractor Analysis: In multiple-choice questions, the system identifies which "distractor" (incorrect answer) students chose most often, helping teachers identify specific misconceptions.

  • Performance by Claim/Target: Analysis is grouped by specific content "claims" (e.g., "Concepts and Procedures" in Math) to show where the curriculum may have gaps.

ZoomWV (The Data Dashboard)

While the CRS is for internal use, ZoomWV provides public-facing longitudinal data. It is used to analyze trends across years and demographics, which helps in identifying if certain types of test items are consistently challenging for specific subgroups.


2. Key Metrics Used in WV Item Analysis

When reviewing technical manuals or CRS reports, West Virginia educators typically focus on three primary statistical measures:

| Metric | Definition | Interpretation in WV Schools |
| --- | --- | --- |
| P-Value | The proportion of students who answered the item correctly. | A low p-value (e.g., 0.30) suggests an item was "difficult" or the material hasn't been adequately covered in class. |
| Point Biserial | A correlation that measures how well an item discriminates between high- and low-performing students. | If high-performing students miss a question that low-performing students get right, the item may be flawed or confusing. |
| Omit Rate | The percentage of students who skipped the item. | High omit rates often indicate the question was too complex, the wording was confusing, or students ran out of time. |
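
As a rough illustration of how these metrics are computed (not how the CRS itself calculates them), the short Python sketch below derives a p-value, point biserial, and omit rate from a small made-up response matrix:

```python
# Illustrative only: compute p-value, point biserial, and omit rate
# for dichotomously scored items. None marks an omitted response.
from statistics import mean, pstdev

# responses[student][item]: 1 = correct, 0 = incorrect, None = omitted
responses = [
    [1, 0, 1, None],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
]

num_items = len(responses[0])
totals = [sum(r for r in student if r is not None) for student in responses]

for item in range(num_items):
    answered = [(row[item], total) for row, total in zip(responses, totals)
                if row[item] is not None]
    scores = [s for s, _ in answered]

    p_value = mean(scores)                          # proportion correct
    omit_rate = 1 - len(answered) / len(responses)  # share who skipped

    # Point biserial: correlation between the item score (0/1) and the
    # total score (here the item is left in the total, a common shortcut).
    item_sd, total_sd = pstdev(scores), pstdev([t for _, t in answered])
    if item_sd and total_sd:
        cov = mean(s * t for s, t in answered) - p_value * mean(t for _, t in answered)
        point_biserial = cov / (item_sd * total_sd)
    else:
        point_biserial = float("nan")

    print(f"Item {item + 1}: p={p_value:.2f}, r_pb={point_biserial:.2f}, omit={omit_rate:.0%}")
```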

3. Non-Summative & Formative Analysis

To avoid waiting for end-of-year results, West Virginia utilizes Interim Assessments and Modulars:

  • Interim Comprehensive Assessments (ICAs): Mirror the WVGSA and provide a full "dry run" with the same item analysis tools.

  • Interim Assessment Blocks (IABs): Focused on specific sets of standards (e.g., Fractions). These are often used for "immediate item analysis" where a teacher can review results the next day to adjust a lesson plan.

  • Test Authoring Tool: Allows educators to build custom tests using a bank of items previously vetted for the WVGSA, ensuring the "item difficulty" matches state expectations.


4. Technical Quality & Bias Review

Before an item ever reaches a student's screen in West Virginia, it undergoes a rigorous item analysis review by committees of West Virginia educators. They look for:

  • Depth of Knowledge (DOK) Alignment: Ensuring the item asks students to think at the level required by the standard.

  • Universal Design: Ensuring items are accessible to students with disabilities and English Language Learners.

  • Bias and Sensitivity: Removing any content that might give an advantage to students based on their geographic location (e.g., urban vs. rural) or socioeconomic status.

Note for Administrators: The WVGSA Annual Technical Reports (Volume 1) provide the statewide item analysis data, including IRT (Item Response Theory) fit statistics that validate the reliability of each year's test.


Below is a template for a "Data Meeting" agenda that schools can use to walk teachers through their CRS item analysis reports.

This template is designed for a 45–60 minute collaborative data session. It focuses on moving from the "big picture" of the Centralized Reporting System (CRS) down to the specific instructional changes needed based on item-level performance.


Agenda: CRS Item Analysis Data Meeting

Goal: To identify specific instructional gaps and student misconceptions by analyzing performance on the West Virginia General Summative Assessment (WVGSA) or Interim reports.

I. The "Bird’s Eye View" (10 Minutes)

  • Log-in & Navigation: Ensure all team members are logged into the WV Portal and have accessed the Centralized Reporting System (CRS).

  • Overall Performance: Review the "Dashboard" for your specific grade level/subject.

    • Question: How does our overall proficiency compare to our school/district goals?

    • Question: Which Claims (e.g., Reading, Writing, Math Concepts) are our strongest and weakest areas?

II. Deep Dive: Item-Level Analysis (20 Minutes)

Open the "Item Detail" view for a specific Interim Assessment Block (IAB) or the most recent Summative data available.

  • The "Low-Hanging Fruit": Identify items with a high P-Value (high success rate).

    • Action: Briefly celebrate success. What did we do well in teaching these standards?

  • The "Gap" Items: Identify 3–5 items where the P-Value was significantly lower than expected.

    • Action: Click on the item to view the Target and Standard.

  • Distractor Analysis: For the "Gap" items, look at the distribution of student responses.

    • Question: Did students cluster around one specific incorrect answer (Distractor)?

    • Insight: What does this specific wrong answer tell us about their misconception? (e.g., "They added the denominators instead of finding a common one.")
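
Where raw response choices can be exported, a quick tabulation makes the dominant distractor obvious. The sketch below is purely illustrative; the file name and column layout are invented, not the actual CRS export format:

```python
# Hypothetical sketch: tabulate answer choices for one "gap" item.
# The CSV layout (student_id, item_4_choice) is an assumption for illustration.
from collections import Counter
import csv

KEY = "C"  # correct answer for the item under review

with open("item_4_responses.csv", newline="") as f:
    choices = [row["item_4_choice"] for row in csv.DictReader(f)]

counts = Counter(choices)
total = sum(counts.values())

for option in sorted(counts):
    share = counts[option] / total
    tag = "KEY" if option == KEY else "distractor"
    print(f"Option {option} ({tag}): {share:.0%}")

# A distractor chosen by a large share of students points to one specific
# misconception worth reteaching, rather than a general lack of knowledge.
```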

III. Instructional Implications (20 Minutes)

This is the most critical part of the meeting: translating data into action.

| Finding | Root Cause | Instructional Response |
| --- | --- | --- |
| Example: 60% of students chose Distractor B on Item #4. | Students confused the perimeter formula with area. | Reteach using manipulatives/square tiles next Tuesday; add "Perimeter vs. Area" to daily warm-ups. |
| | | |
| | | |

  • Student Grouping: Based on the individual student reports, which students need Tier 2 intervention on this specific standard?

  • Resource Alignment: Do our current textbooks/materials cover this standard at the required Depth of Knowledge (DOK) level? If not, what supplemental WVDE resources (like the Metadata or Educator Resources) can we use?

IV. Next Steps & Accountability (10 Minutes)

  • The "One Thing": Each teacher commits to one specific instructional change to be implemented in the next 5 days.

  • Follow-up: Set the date for the next "Progress Check" (usually after the next IAB or modular assessment).


Pro-Tip for Administrators: Encourage teachers to use the "View Related Items" feature in the CRS. This allows them to see other questions that test the same standard, ensuring they aren't just "teaching to one question" but are teaching the entire standard.

Would you like me to create a "Reflection Sheet" for teachers to fill out during this meeting to document their findings?

OK. I've created the "CRS Item Analysis Reflection Sheet" note for you. It includes sections for overall observations, specific item analysis (including P-values and misconceptions), and an action plan for instructional changes and interventions.

Google Keep CRS Item Analysis Reflection SheetCRS Item Analysis Reflection Sheet

Teacher: _______________ Grade/Subject: _______________

Date: _______________

I. Overall Observations

Strongest Claim/Target: ______________________________

Areas for Growth: ______________________________

II. Specific Item Analysis

Item 1:

Standard/Target: ______________________________

P-Value: _______________

Common Misconception (Distractor Analysis): ______________________________

Item 2:

Standard/Target: ______________________________

P-Value: _______________

Common Misconception (Distractor Analysis): ______________________________

III. Action Plan

Immediate Instructional Change: ______________________________

Students for Tier 2 Intervention: ______________________________

Resources Needed: ______________________________

  

Technical and Pedagogical Analysis of Assessment Item Methodologies in West Virginia Public Education

Executive Overview of the West Virginia Assessment Landscape

The infrastructure of student assessment within the West Virginia public education system represents a sophisticated convergence of psychometric rigor, legislative mandates, and instructional accountability. Governed by the West Virginia Board of Education (WVBE) under Policy 2340, the state’s assessment system—collectively known as the West Virginia Measures of Academic Progress (WVMAP)—is designed not merely to rank students or schools, but to provide granular, item-level data that informs the pedagogical cycle. This report provides an exhaustive technical analysis of the methodologies employed for test item analysis across the state's diverse assessment portfolio, spanning the West Virginia General Summative Assessment (WVGSA) for grades 3-8, the SAT School Day for grade 11, and the distinct alternate assessment frameworks for special populations.  

The analysis of test items in West Virginia is a bifurcated process. On the technical side, it involves deep statistical scrutiny conducted by vendors such as Cambium Assessment, Inc. (CAI) and the College Board, utilizing Classical Test Theory (CTT) and Item Response Theory (IRT) to ensure validity and reliability. On the instructional side, it involves the dissemination of this data through complex digital ecosystems like the Centralized Reporting System (CRS) and ZoomWV, empowering educators to diagnose learning gaps at the level of specific content standards and targets. The transition from legacy assessments (such as WESTEST2) to the current computer-adaptive models has fundamentally altered the nature of item analysis, shifting the focus from static percentage-correct metrics to complex, multidimensional growth models.

This document examines the full lifecycle of assessment data in West Virginia, from the initial blueprinting and item writing to the post-administration statistical flagging and final classroom application. It further explores the professional development structures, such as WV PEAKS, that are essential for translating these psychometric outputs into actionable educational strategies.

Psychometric Foundations and Statistical Methodologies

The validity of any standardized assessment rests entirely on the quality of its constituent items. In West Virginia, the "item analysis" process is primarily a quality control mechanism used to accept, reject, or revise test questions based on their statistical performance during field testing and operational use. The West Virginia Department of Education (WVDE) Technical Reports outline a rigorous set of statistical thresholds used to evaluate item performance.

Classical Item Analysis: Difficulty and Discrimination

Despite the modern reliance on Item Response Theory (IRT) for scoring, West Virginia continues to utilize Classical Item Analysis as a primary screening tool to flag problematic items. This methodology evaluates items based on two fundamental properties: difficulty (p-value) and discrimination (biserial correlation).

Item Difficulty (p-value): The p-value represents the proportion of examinees who answer an item correctly. It is the most direct measure of an item's difficulty. In the development of the WVGSA, the state aims for a distribution of difficulty that matches the ability distribution of the student population. However, items at the extremes—those that are universally passed or universally failed—provide little information about a student’s relative standing and are thus subject to statistical flagging.

According to the WVGSA Technical Report, specific thresholds are applied to flag items for review:

  • Multiple-Choice (MC) Items: An MC item is flagged if the p-value is less than 0.15 or greater than 0.90. An item with a p-value below 0.15 is considered extremely difficult, potentially indicating that the content is above grade level, the question is ambiguously worded, or the correct answer key is flawed. Conversely, a p-value above 0.90 suggests the item is trivial and fails to distinguish between varying levels of proficiency.  

  • Non-Multiple-Choice Items: For technology-enhanced items (TEIs) or constructed-response tasks, the flagging criteria are slightly adjusted. These items are flagged if the relative mean score is less than 0.10 or greater than 0.95.  

Item Discrimination (Point Biserial Correlation): While difficulty tells us how many students got an item right, discrimination tells us who got it right. The point biserial correlation ($r_{pbis}$) measures the relationship between a student's performance on a specific item and their total score on the test. A positive correlation indicates that students who answered the item correctly tended to have higher total scores, which is the desired outcome for a valid test item.

The West Virginia technical specifications enforce strict quality control on discrimination:

  • Discrimination Threshold: Any item with a point biserial correlation of less than 0.20 for the correct response is flagged. A correlation below this level suggests that the item is not effectively differentiating between high- and low-performing students. A near-zero or negative correlation is a critical failure, implying that low-performing students are as likely (or more likely) to answer correctly than their high-performing peers—a hallmark of a flawed or confusing item.  

Distractor Analysis: The analysis extends beyond the correct answer to the incorrect options, known as distractors. In a well-constructed multiple-choice item, distractors should appeal to students with lower content mastery. If a distractor appeals to high-performing students, it suggests the option is arguably correct or confusingly similar to the key.

  • Flagging Criteria: West Virginia protocols flag any item where a distractor has a point biserial correlation greater than 0. This statistical anomaly indicates that as student ability increases, the likelihood of choosing the incorrect answer also increases, necessitating immediate content review.  
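
Taken together, these classical flags are simple enough to express in a few lines of code. The sketch below restates the thresholds described in this section; it is an illustration, not the vendor's actual review pipeline:

```python
# Illustrative re-statement of the classical flagging rules described above.
# Input values (p-values, biserials) would come from post-administration
# item statistics; this is not the WVDE/CAI review code.
def flag_item(item_type, p_value, key_point_biserial, distractor_biserials=()):
    flags = []

    if item_type == "MC":
        if p_value < 0.15 or p_value > 0.90:
            flags.append("difficulty out of range (p < 0.15 or p > 0.90)")
    else:  # TEI / constructed response: relative mean score
        if p_value < 0.10 or p_value > 0.95:
            flags.append("relative mean out of range (< 0.10 or > 0.95)")

    if key_point_biserial < 0.20:
        flags.append("low discrimination (point biserial < 0.20)")

    if any(r > 0 for r in distractor_biserials):
        flags.append("positively discriminating distractor (biserial > 0)")

    return flags

# Example: an easy MC item whose key barely discriminates
print(flag_item("MC", p_value=0.93, key_point_biserial=0.12,
                distractor_biserials=(-0.05, 0.02, -0.10)))
```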

Item Response Theory (IRT) and Model Fit

While Classical Test Theory provides the initial screening, the backbone of the WVGSA scoring and adaptive algorithm is Item Response Theory (IRT). IRT models the probability of a correct response as a function of the student's ability ($\theta$) and the item's characteristics.

Model Selection: The state utilizes the 3-Parameter Logistic (3PL) model for multiple-choice items, which accounts for item difficulty, discrimination, and the probability of guessing. For polytomous items (items worth more than one point), the Generalized Partial Credit Model (GPCM) is employed.  
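
For readers who want the formula, the 3PL model gives the probability of a correct response in terms of ability (θ) and the item's discrimination (a), difficulty (b), and guessing (c) parameters. A minimal sketch, using the common 1.7 scaling constant (the technical report may use a different scaling):

```python
import math

def p_correct_3pl(theta, a, b, c, D=1.7):
    """3PL item characteristic curve: P(correct | theta).

    a: discrimination, b: difficulty, c: lower asymptote (guessing),
    D: scaling constant (1.7 approximates the normal ogive metric).
    """
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

# An average-ability student (theta = 0) on a moderately hard item
print(round(p_correct_3pl(theta=0.0, a=1.2, b=0.5, c=0.20), 3))
```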

Item Fit Statistics: Item analysis under IRT involves checking "item fit"—how well the empirical data matches the theoretical model. The technical reports generate fit statistics (such as $Q_1$) to identify items that do not behave as predicted.

  • Vertical Linking: Special attention is paid to the "vertical linking set"—the items used to create a common scale across grade levels. Because these items define the growth metric, their statistical quality must be unimpeachable. The WVDE removes linking items if the biserial correlation is less than 0.10 or if the p-value is extremely high or low. This ensures that the scale used to measure student progress from Grade 3 to Grade 8 remains stable and robust.

Differential Item Functioning (DIF)

A critical component of item analysis in West Virginia is the evaluation of fairness through Differential Item Functioning (DIF). DIF analysis determines whether an item functions differently for different subgroups of students (e.g., Male vs. Female, White vs. Black) after controlling for overall ability.

Methodology: West Virginia uses the Mantel-Haenszel (MH) procedure for dichotomous items and standardized mean differences for polytomous items. Items are classified into three categories based on the severity of the DIF:

  • Category A: Negligible DIF.

  • Category B: Moderate DIF.

  • Category C: Significant DIF.

Flagging and Review: Any item classified as "C" (showing significant differential functioning) is automatically flagged. However, the presence of DIF does not automatically result in item removal. It triggers a review by a Fairness Committee or Bias Review Committee. This committee examines the item to determine if the performance disparity is due to "construct-irrelevant variance"—such as cultural bias or linguistic complexity unrelated to the content standard. If the committee determines the item is biased, it is removed from the operational pool.  

For the Science assessment, obtaining sufficient sample sizes for minority subgroups can be challenging within the state's demographics. To address this, the item analysis often pools data from West Virginia with other states participating in the same assessment consortium (managed by Cambium/AIR), ensuring that the DIF analysis is statistically powerful enough to detect subtle biases.  
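
The core of the Mantel-Haenszel procedure can be illustrated with a small sketch. The common odds ratio below follows the standard MH formula; the A/B/C cut points use the widely cited ETS delta conventions and omit the accompanying significance tests, so they should be read as an illustration rather than the WVGSA operational rule:

```python
import math

def mh_dif(strata):
    """strata: one 2x2 table per ability level, each a dict with counts of
    reference/focal group members answering the item right/wrong."""
    num = den = 0.0
    for s in strata:
        n = s["ref_right"] + s["ref_wrong"] + s["foc_right"] + s["foc_wrong"]
        num += s["ref_right"] * s["foc_wrong"] / n
        den += s["ref_wrong"] * s["foc_right"] / n
    alpha = num / den                # MH common odds ratio
    delta = -2.35 * math.log(alpha)  # ETS delta metric

    # Conventional ETS-style classification (illustrative cut points only;
    # operational rules also require statistical significance checks).
    if abs(delta) < 1.0:
        category = "A (negligible)"
    elif abs(delta) < 1.5:
        category = "B (moderate)"
    else:
        category = "C (significant) -> send to bias/fairness review"
    return alpha, delta, category

example = [
    {"ref_right": 40, "ref_wrong": 10, "foc_right": 30, "foc_wrong": 20},
    {"ref_right": 60, "ref_wrong": 5,  "foc_right": 50, "foc_wrong": 15},
]
print(mh_dif(example))
```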

Deep Dive: The General Summative Assessment (WVGSA)

The West Virginia General Summative Assessment (WVGSA) represents the core of the state's accountability framework for Grades 3-8. Administered online, this assessment utilizes a Computer Adaptive Test (CAT) design for English Language Arts (ELA) and Mathematics, and a fixed-form or matrix design for Science. The item analysis for the WVGSA is deeply integrated into its design blueprint.  

Blueprint Architecture and Reporting Categories

Item analysis in the WVGSA is not random; it is structural. The Test Blueprints serve as the governing documents, detailing the cognitive complexity of items and the numerical range of questions required for each standard. These blueprints ensure that the test serves its primary purpose: maximizing coverage of the content standards while providing valid information on student performance.  

The blueprints divide the content into Claims and Targets:

  • Claims: Broad statements about what students should know and be able to do (e.g., "Students can comprehend text").

  • Targets: Specific evidentiary statements that support the claims (e.g., "Target 8: Key Details").

When educators analyze WVGSA data, they are effectively analyzing performance against this hierarchy. The reporting system aggregates item-level data up to the Target and Claim levels, allowing for diagnostic insight. For instance, a teacher can identify that their class is proficient in the Claim of "Reading Literary Text" but deficient in the specific Target of "Analysis of Text Structure".  

Science Assessment: Clusters and Assertions

The WVGSA Science assessment (administered in grades 5 and 8) presents a unique challenge for item analysis due to its adherence to the Next Generation Science Standards (NGSS) and the West Virginia NxGen Science Standards. Unlike traditional tests composed of isolated questions, the science assessment is built around Item Clusters.  

The Cluster Model: A cluster consists of a common stimulus—such as a description of an experiment, a simulation, or a data set—followed by several associated items, referred to as assertions. These assertions are designed to measure multidimensional performance, combining Disciplinary Core Ideas, Science and Engineering Practices, and Crosscutting Concepts.  

Statistical Implications for Science: The item analysis for science must account for the dependency between assertions within a cluster. If the stimulus is confusing, all assertions may show poor performance regardless of the students' scientific knowledge.

  • Cluster Flagging: A cluster is flagged for review if the average p-value across all its assertions is less than 0.30 or greater than 0.85.  

  • Assertion Flagging: Individual assertions are flagged if their biserial correlation falls below 0.05, or if the average biserial for the whole item is less than 0.25.  

This hierarchical analysis ensures that the assessment measures scientific reasoning rather than just reading comprehension of the stimulus. Furthermore, the scoring of these clusters involves sophisticated "ordered-item booklets" (OIB) where assertions are mapped to performance levels based on actual student data.  
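
The cluster- and assertion-level flags described above can be restated as a short, illustrative check (the inputs are invented per-assertion statistics, not WVGSA data):

```python
# Illustrative check of the cluster-level flags described above.
def flag_cluster(assertions):
    """assertions: list of dicts with 'p' (proportion correct) and
    'biserial' for each assertion in one science cluster."""
    flags = []
    avg_p = sum(a["p"] for a in assertions) / len(assertions)
    avg_biserial = sum(a["biserial"] for a in assertions) / len(assertions)

    if avg_p < 0.30 or avg_p > 0.85:
        flags.append(f"cluster average p-value {avg_p:.2f} out of range")
    if avg_biserial < 0.25:
        flags.append(f"cluster average biserial {avg_biserial:.2f} below 0.25")
    for i, a in enumerate(assertions, 1):
        if a["biserial"] < 0.05:
            flags.append(f"assertion {i} biserial {a['biserial']:.2f} below 0.05")
    return flags

print(flag_cluster([{"p": 0.28, "biserial": 0.18},
                    {"p": 0.35, "biserial": 0.03},
                    {"p": 0.22, "biserial": 0.30}]))
```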

The Computer Adaptive Testing (CAT) Environment

The use of CAT for ELA and Mathematics fundamentally changes the nature of item analysis for educators. In a fixed-form test, every student sees the same items, allowing for a direct "Item #5 analysis." In a CAT, the engine selects items based on the student's estimated ability level.

Consequences for Data Analysis:

  1. Item Exposure: A teacher looking at an "Item Analysis Report" in the Cambium system will see data for items that perhaps only 10% or 20% of their students encountered.

  2. Standard Error Minimization: The CAT algorithm selects items to minimize the standard error of measurement at the student's current ability level. This means the items a student sees are theoretically those they have a 50% chance of answering correctly.

  3. Reporting Aggregation: To make this usable, the reporting system aggregates item data by Standard or Target. If different students saw different items measuring "multiplying fractions," the system combines these distinct data points to provide a "mastery" judgment for that standard.  
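
A hypothetical sketch of that roll-up is shown below; the record fields and standard codes are invented for illustration, since different students contribute results from different items:

```python
# Hypothetical sketch: aggregate item results by standard when students
# saw different items on an adaptive test.
from collections import defaultdict

results = [
    {"student": "S1", "standard": "M.5.NF.1", "earned": 1, "possible": 1},
    {"student": "S2", "standard": "M.5.NF.1", "earned": 0, "possible": 1},
    {"student": "S1", "standard": "M.5.NF.2", "earned": 2, "possible": 3},
    {"student": "S3", "standard": "M.5.NF.1", "earned": 1, "possible": 2},
]

by_standard = defaultdict(lambda: {"earned": 0, "possible": 0})
for r in results:
    by_standard[r["standard"]]["earned"] += r["earned"]
    by_standard[r["standard"]]["possible"] += r["possible"]

for standard, t in sorted(by_standard.items()):
    pct = t["earned"] / t["possible"]
    print(f"{standard}: {pct:.0%} of available points earned")
```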

High School Accountability: The SAT School Day

For Grade 11, West Virginia diverges from the CAI-developed WVGSA and utilizes the SAT School Day as the general summative assessment. This strategic decision aligns state accountability with college admissions, but it introduces a distinct set of methodologies and challenges for item analysis.  

College Board Methodologies vs. State Standards

The SAT is a nationally normed assessment developed by the College Board. Its primary validity argument rests on its ability to predict college success, rather than its alignment to specific state instructional units. The "Comprehensive Analysis of Summative Assessments" (CASA) report noted that while the SAT is professionally sound and reliable, its alignment to West Virginia's specific standards (WVCCRS) and its range of item types (lacking the complex TEIs of the WVGSA) are areas of distinction.  

Item Analysis Limitations: Unlike the WVGSA, where the state controls the item bank and technical reporting, the SAT is a proprietary product. The rigorous p-value and biserial checks are conducted internally by the College Board. The detailed technical data for specific operational items are generally not released to the state or public with the same transparency as the WVGSA.  


 

The K-12 Reporting Portal and Educator Data

Despite the proprietary nature of the test construction, the College Board provides West Virginia educators with the K-12 Reporting Portal to facilitate item analysis. This portal has undergone significant updates to improve usability and data access.12

Question Analysis Report:

The most powerful tool for item analysis in this system is the Question Analysis Report.

  • Distractor Analysis: This feature allows educators to see not just the percentage of students who answered correctly, but the distribution of responses across distractors. This is vital for diagnostic purposes. For example, if a significant portion of students selects a specific incorrect option on a linear equation problem, it points to a specific misconception (e.g., sign error) rather than a general lack of knowledge.13

  • Skill Alignment: Items are linked to the SAT’s "Knowledge and Skills" framework (e.g., Heart of Algebra, Command of Evidence). This allows teachers to map performance deficits back to the curriculum.

The Educator Question Bank:

To supplement the operational data, the College Board provides an Educator Question Bank containing thousands of released items. West Virginia teachers can search this bank by domain, difficulty, and skill to create formative assessments that mirror the SAT’s item architecture. This allows for "forward-looking" item analysis—using valid items to diagnose gaps before the summative test occurs.15

The Transition to Digital SAT

West Virginia's implementation of the SAT School Day is transitioning to the Digital SAT format. This shift moves the SAT closer to the WVGSA model, as the Digital SAT is multistage adaptive.16

  • Implications: The move to adaptive testing will likely impact the granularity of item analysis available in the portal, as students will no longer all take the exact same linear form. The reporting will shift toward "Item Pool" performance and skill-level aggregation, mirroring the challenges seen in the WVGSA.16

The Alternate Assessment Framework: DLM and WVASA

For students with significant cognitive disabilities—approximately 1% of the student population—West Virginia employs the West Virginia Alternate Summative Assessment (WVASA), administered via the Dynamic Learning Maps (DLM) system.1 The item analysis methodology here is radically different, abandoning traditional "difficulty" metrics for a learning map model.

Nodes, Linkage Levels, and Essential Elements

The WVASA measures student progress against the Essential Elements (EEs), which are specific statements of knowledge and skills linked to the grade-level standards but reduced in complexity.17

The Learning Map Model:

Instead of a linear scale, DLM utilizes a massive network of "nodes" representing discrete skills. These nodes are organized into Linkage Levels:

  1. Initial Precursor: The most basic access point (often involving engagement or recognition).

  2. Distal Precursor: Early skill acquisition.

  3. Proximal Precursor: Approaching the standard.

  4. Target: The standard itself (Essential Element).

  5. Successor: Extension beyond the standard.

Item Analysis as Pathway Analysis:

When analyzing WVASA data, educators do not look at "Item Difficulty." Instead, they analyze Mini-Maps. These maps show the pathway between nodes. If a student fails a "testlet" (a short group of items), the analysis involves identifying which node in the linkage level was the stumbling block.18 The system adapts by routing the student to a lower or higher linkage level based on their performance.
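
The routing idea can be pictured with a deliberately simplified sketch. The real DLM system uses diagnostic classification models rather than a percent-correct cut, so the threshold below is purely illustrative:

```python
# Purely illustrative routing between linkage levels; not the DLM scoring rule.
LINKAGE_LEVELS = ["Initial Precursor", "Distal Precursor",
                  "Proximal Precursor", "Target", "Successor"]

def next_linkage_level(current, testlet_pct_correct, mastery_cut=0.80):
    i = LINKAGE_LEVELS.index(current)
    if testlet_pct_correct >= mastery_cut:
        return LINKAGE_LEVELS[min(i + 1, len(LINKAGE_LEVELS) - 1)]
    return LINKAGE_LEVELS[max(i - 1, 0)]

print(next_linkage_level("Proximal Precursor", 0.85))  # -> Target
print(next_linkage_level("Proximal Precursor", 0.40))  # -> Distal Precursor
```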

Reporting and Mastery Profiles

The output of this analysis is the Performance Profile, which categorizes student mastery into levels such as "Emerging," "Approaching," "At Target," and "Advanced".17

  • Teacher Utility: The "Learning Profile" report provides the most granular data, showing exactly which linkage levels a student has mastered for each Essential Element. This allows for precise IEP goal setting. For example, if a student has mastered the "Proximal Precursor" for a math standard, the instructional goal becomes the "Target" level skills.19

Data Reporting Infrastructures: WVEIS, ZoomWV, and CRS

The technical analysis of items generates a massive amount of data. West Virginia utilizes a multi-tiered infrastructure to manage, secure, and report this information.

West Virginia Education Information System (WVEIS)

WVEIS is the foundational database for all public education data in the state. It serves as the secure backend where student demographics, enrollment, and raw assessment records are stored. While not a direct "item analysis" tool for teachers, it is the source of truth that feeds all other reporting systems. It ensures that the "subgroup" data (e.g., Low SES, Special Education) used in item analysis reports is accurate and up-to-date.6

ZoomWV: The Public Accountability Interface

ZoomWV is the state's public-facing longitudinal data dashboard. It is designed for transparency and policy analysis rather than instructional item analysis.

  • Granularity Constraints: To comply with FERPA and protect student privacy, ZoomWV displays data in aggregate. It uses "masking" rules to suppress data for small subgroups.

  • Utility: Stakeholders use ZoomWV to analyze broad trends—such as the percentage of students meeting standards in "Mathematics" across different counties or years. It does not provide access to individual test questions or distractor analysis.6

Centralized Reporting System (CRS): The Educator's Toolkit

The Centralized Reporting System (CRS), hosted by Cambium Assessment, is the primary interface for WVGSA item analysis. It provides role-based access (State, District, School, Teacher) to granular data.5

Key Features for Item Analysis:

  1. Item Analysis Reports: These reports allow teachers to view performance on specific items (for fixed-form components) or aggregated item pools. The "Build Item Analysis Report" function generates tables showing the percentage of students earning full, partial, or no credit.

  2. Distractor Analysis Integration: For multiple-choice items, the system displays the distribution of student responses across all options. This feature is critical for identifying widespread misconceptions. If 40% of a class selects Distractor B, the teacher can analyze Distractor B to understand the specific error in logic it represents.14

  3. Trend Reports: These reports track performance over time, allowing educators to distinguish between cohort-specific issues (e.g., "The Class of 2028 is weak in Geometry") and systemic curriculum issues (e.g., "Our 5th graders consistently fail Geometry every year").5

  4. Roster Breakdown: Teachers can filter item performance by subgroups within their class, allowing them to see if, for example, English Learners are disproportionately struggling with items that have high linguistic complexity.5

Formative Integration: Tools for Teachers and Interim Assessments

Recognizing that summative data arrives too late to impact the current school year, the WVDE stresses the use of formative and interim tools to conduct "live" item analysis.

The Interim Assessment Item Portal (IAIP)

The Interim Assessment Item Portal (IAIP), accessible through the Tools for Teachers platform, is perhaps the most powerful item analysis resource available to West Virginia educators.

  • Access to Operational Items: The IAIP allows teachers to view the actual questions used in the Interim Comprehensive Assessments (ICAs) and Interim Assessment Blocks (IABs).

  • Classroom Application: Teachers can display these items to the class (remotely or in-person) and conduct a real-time item analysis. By having students solve the problem and then discussing the "distractor rationale" together, the assessment becomes a learning event rather than just a measurement event.22

  • Security: Unlike the summative test, which is highly secure, the interim items are designed for this type of transparent pedagogical use, although they are still protected from general public release to maintain their utility.22

Connections Playlists

The item analysis loop is closed by Connections Playlists. When students take an Interim Assessment Block (IAB), the reporting system generates a playlist based on their performance.

  • Mechanism: If the item analysis indicates that a group of students performed "Below Standard" on the "Fractions" block, the playlist provides curated instructional resources, lesson plans, and accessibility strategies specifically designed to address those item-level deficits.24 This links the diagnosis (Item Analysis) directly to the cure (Instruction).

Professional Capacity: PEAKS and Teacher Evaluation

The sophisticated data generated by these systems is useless without a workforce capable of interpreting it. West Virginia has institutionalized "Assessment Literacy" through specific professional development and evaluation structures.

WV PEAKS (Providing Educational Assessment Knowledge and Skills)

WV PEAKS is the WVDE’s flagship initiative for assessment literacy.

  • Structure: It operates as a community of practice, utilizing a Microsoft Teams channel (Join Code: qlo0rr4) to connect educators across the state directly with the Office of Assessment.26

  • Content: PEAKS provides training on specific technical aspects of item analysis, such as:

    • Item Writing: Teaching educators how to write items that mirror the WVGSA's cognitive rigor. This "reverse engineering" helps teachers understand the anatomy of a valid test question.

    • Data Interpretation: Workshops on how to navigate the CRS and interpret concepts like "Standard Error" and "Claim-Level Data".27

    • Academic Showdown: PEAKS also supports events like the Academic Showdown, which fosters a culture of academic excellence and engagement with assessment content.28

Teacher Performance Assessment (TPA)

Item analysis is a required competency for teacher licensure in West Virginia. The Teacher Performance Assessment (TPA) includes a specific component—Task 4: Assessment Analysis—that validates this skill.

  • Requirements: Teacher candidates must plan a unit, administer a pre-assessment, teach the unit, and administer a post-assessment. They are then required to perform a detailed statistical analysis of the results.

  • Analysis Artifacts: Candidates must create data tables showing individual and group gains. Crucially, they must write a narrative analyzing why specific learning goals were met or missed, citing evidence from student responses to specific assessment items.

  • Scoring: This task is scored on a four-point rubric. To pass, candidates must demonstrate that they can use data to "evaluate learning" and "derive meaningful and appropriate conclusions".29

This requirement ensures that every new teacher entering the West Virginia system has already demonstrated the ability to perform basic item analysis.
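
A minimal sketch of the kind of pre/post gain table Task 4 asks candidates to build, using invented example scores:

```python
# Illustrative pre/post gain table; the scores are invented example data.
pre  = {"Student A": 45, "Student B": 60, "Student C": 72}
post = {"Student A": 70, "Student B": 68, "Student C": 90}

gains = {name: post[name] - pre[name] for name in pre}

print(f"{'Student':<10}{'Pre':>6}{'Post':>6}{'Gain':>6}")
for name, gain in gains.items():
    print(f"{name:<10}{pre[name]:>6}{post[name]:>6}{gain:>6}")

group_gain = sum(gains.values()) / len(gains)
print(f"Group average gain: {group_gain:.1f} points")
```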

Interpretive Frameworks: Lexile and Quantile Measures

West Virginia has deeply integrated the Lexile (Reading) and Quantile (Math) frameworks from MetaMetrics into its item analysis reporting. These measures provide a developmental scale that transcends specific grade levels.

Integration with Item Analysis:

  • Reporting: WVGSA and SAT reports include Lexile and Quantile measures alongside scale scores.5

  • The Hub: Educators have access to the Lexile & Quantile Hub, which allows them to link these scores to instructional materials.

  • Forecasting: The Quantile framework helps in analyzing "readiness." If a student's Quantile measure is significantly below the measure of a specific math concept (e.g., Quadratic Equations), item analysis of that concept will likely show failure. The framework allows teachers to identify the prerequisite skills (at a lower Quantile) that need to be addressed first.33
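
A tiny sketch of that readiness comparison, with invented Quantile values for illustration:

```python
# Invented Quantile values for illustration only.
student_quantile = 650          # student's measured Quantile
skill_demands = {
    "Quadratic Equations": 1180,
    "Proportional Relationships": 740,
    "Operations with Fractions": 600,
}

for skill, demand in skill_demands.items():
    gap = demand - student_quantile
    status = "ready" if gap <= 0 else f"gap of {gap}Q; teach prerequisites first"
    print(f"{skill}: {status}")
```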

Conclusion and Strategic Implications

The landscape of test item analysis in West Virginia schools is defined by a commitment to data-driven instruction supported by a complex technical infrastructure. The state has moved beyond simple "score reporting" to a model where item-level data is used to diagnose learning needs, evaluate program effectiveness, and guide professional development.

Key Systemic Strengths:

  1. Transparency: The publication of detailed Technical Reports with explicit flagging criteria (p-values, DIF thresholds) ensures that the validity of the assessment is open to scrutiny.

  2. Integration: The linkage between assessment data (CRS), instructional resources (Tools for Teachers), and professional development (PEAKS) creates a coherent ecosystem where item analysis leads to action.

  3. Differentiation: The use of distinct methodologies for different populations (CAT for general population, DLM for special education, SAT for college bound) ensures that item analysis is appropriate for the construct being measured.

Challenges and Future Directions:

The primary challenge remains the "black box" nature of Computer Adaptive Testing, which obscures the direct link between a student and a specific test item in the summative environment. However, the robust implementation of the Interim Assessment system and the Interim Assessment Item Portal provides a viable workaround, offering educators the transparency needed to conduct deep item analysis in a formative context.

As West Virginia continues the transition to the Digital SAT and refines its adaptive algorithms, the role of the educator as a "data analyst" will only grow in importance. The structures currently in place—specifically the TPA and WV PEAKS—are critical in ensuring that the workforce is prepared to meet this challenge, turning rows of data into meaningful improvements in student learning.

Summary of Statistical Thresholds (WVGSA)

| Metric | Threshold for Flagging | Implication |
| --- | --- | --- |
| P-Value (MC) | $< 0.15$ or $> 0.90$ | Item is too difficult (possible error) or too easy (trivial). |
| P-Value (Non-MC) | $< 0.10$ or $> 0.95$ | Adjusted thresholds for TEIs and constructed-response items. |
| Point Biserial | $< 0.20$ | Item fails to discriminate between high/low performers. |
| Distractor Biserial | $> 0$ | High performers are choosing the wrong answer. |
| DIF Classification | Category "C" | Significant differential functioning between subgroups. |
| Linking Item Biserial | $< 0.10$ | Item is unsuitable for maintaining the vertical scale. |
| Cluster P-Value (Science) | $< 0.30$ or $> 0.85$ | The entire stimulus/question set is flawed. |

Data synthesized from West Virginia General Summative Assessment Technical Report.3

Sources

1. Federal/State Required Assessments - Wayne County Schools (www1.wayneschoolswv.org)
2. West Virginia General Summative Assessment 2021–2022 Volume ... (wvde.us)
3. West Virginia Comprehensive Analysis of Summative Assessments (CASA) (assessmentgroup.org)
4. West Virginia General Summative Assessment 2021–2022 Volume 6 Score Interpretation Guide (wvde.us)
5. Education Data | West Virginia Department of Education (wvde.us)
6. West Virginia General Summative Assessment 2021–2022 Volume 5 Test Administration (wvde.us)
7. West Virginia General Summative Assessment 2021–2022 Volume 2, Part 2 (Science) Test Development (wvde.us)
8. Test Blueprints | West Virginia Department of Education (wvde.us)
9. West Virginia General Summative Assessment 2020–2021 Volume 3 Part 2 Setting Achievement Standards for Science (wvde.us)
10. SAT School Day | West Virginia Department of Education (wvde.us)
11. What's New in Fall 2025 – K–12 Reporting Portal Help - SAT Suite of Assessments (satsuite.collegeboard.org)
12. Online Score Reports for Educators - SAT Suite of Assessments - College Board (satsuite.collegeboard.org)
13. About the Item Analysis Report (guides.cambiumast.com)
14. Educator Question Bank – K–12 Reporting Portal Help - SAT Suite of Assessments (satsuite.collegeboard.org)
15. Digital SAT Parent Flyer - West Virginia Department of Education (wvde.us)
16. Assessment Results | Dynamic Learning Maps (dynamiclearningmaps.org)
17. West Virginia | Dynamic Learning Maps (dynamiclearningmaps.org)
18. Guide to DLM Results Educator Toolkit 2019–2020 - Dynamic Learning Maps (dynamiclearningmaps.org)
19. West Virginia - State Longitudinal Data Systems (slds.rhaskell.org)
20. ZoomWV - West Virginia Department of Education (wvde.us)
21. Interim Assessment Item Portal - Tools for Teachers (smartertoolsforteachers.org)
22. Home - Smarter Balanced Interim Assessment Item Portal - Tools for Teachers (interimitems.smartertoolsforteachers.org)
23. Flexible ways of using Interim items for instructional purposes - Delaware Department of Education (education.delaware.gov)
24. Tools for Teachers Resources - CAASPP (caaspp-elpac.org)
25. West Virginia PEAKS (wvde.us)
26. Guide for WV Public School Educators to join WV PEAKS Team - West Virginia Department of Education (wvde.us)
27. Middle & Secondary Programs - West Virginia Department of Education (wvde.us)
28. West Virginia Teacher Performance Assessment (ucwv.edu)
29. West Virginia Teacher Performance Assessment - Marshall University (marshall.edu)
30. West Virginia University at Parkersburg Uniform Course Syllabus (UCS) EDUC 320 Educational Assessment (wvup.edu)
31. Lexile and Quantile Measures | West Virginia Department of Education (wvde.us)
32. About Quantiles - NWEA Connection (connection.nwea.org)
33. West Virginia Lexile and Quantile Measures Report (files-backend.assets.thrillshare.com)
 

 
