SMARTER Balanced Assessment Consortium

From FreedomKentucky


Two national efforts are under way to create uniform tests to assess student performance in all states against the new Common Core State Standards, which have been developed by a partnership of the National Governors’ Association and the Council of Chief State School Officers.

One of the two groups competing to create the main Common Core State Assessments is called the SMARTER Balanced Assessment Consortium.

The States in this consortium include:
















New Hampshire*

New Jersey*

New Mexico

North Carolina

North Dakota*





South Carolina*

South Dakota*




West Virginia


Note: States marked with * are advisory only; the others are governing states.

SMARTER Consortium’s Executive Director Joe Willhoft made an important presentation on his group’s effort to a Brookings Institution conference on October 28, 2010. Comments below are based on papers, PowerPoint presentations, and the audio recording from that conference. All are available online.

In his PowerPoint, Willhoft explains the main SMARTER goal:

“All students leave high school college and career ready.”

That is a good goal, and it is well-aligned with requirements from Kentucky’s Senate Bill 1 from the 2009 Regular Legislative Session.

However, once Kentucky’s two-decade history with reform assessments is considered, SMARTER’s proposals to implement this basic goal appear to have some serious problems. In fact, the proposals imply that the SMARTER consortium has at best limited knowledge of Kentucky’s assessment experience since 1992. As a consequence, the effort could repeat some serious mistakes that the Bluegrass State took almost two decades to figure out.

In the following discussion, key points from the SMARTER presentation to Brookings are listed, each followed by commentary based on Kentucky’s rich history with Progressive testing programs.

Key Features of the SMARTER Proposal

• SMARTER Comment: “Summative assessments using online computer adaptive technologies (including) Incorporate adaptive precision into performance tasks and events”

Unfortunately, this sounds EXACTLY like the kinds of promises Kentuckians heard when the Kentucky Instructional Results Information System (KIRIS) assessment started way back in 1992. Kentuckians heard somewhat similar promises again when the state’s Commonwealth Accountability Testing System (CATS) assessment replaced the discredited KIRIS assessment in 1999. CATS only lasted until 2009, when it was also thrown out by the Kentucky legislature.

The fact that both those reform-oriented assessments ultimately proved unsuccessful creates considerable cause for concern about the SMARTER plan.

Performance Items as Assessment Elements – Issues

Kentucky has a lot of experience with performance items in assessments (Performance Events, Math and Writing Portfolios, and lots of Open-Response Questions). The history isn’t very good.

For example, the “Performance Events” that were used at the start of KIRIS crashed by 1996. They never provided stable information. Worse, in 1996 a poor choice for the Performance Event task resulted in totally unusable results for every middle school in Kentucky.

Other performance items that were dropped for poor performance in Kentucky include “Math Portfolios,” which were launched in 1993 and crashed in 1996.

The state also had an extended period of using “Writing Portfolios” in the accountability program. These were finally removed from the assessment and accountability program in 2009. Math Portfolios interfered with real math instruction, and Writing Portfolios, while a great instructional tool for teachers, proved hopelessly unworkable as an assessment item. They actually wound up interfering with writing instruction, as pointed out in this YouTube video.

Finally, open-response questions have been a continual challenge in our testing program. They also raise validity and reliability issues, and they are expensive and time-consuming to create, administer and score. Furthermore, because they take so long to administer, open-response written questions tend to limit the amount of the curriculum that can be tested with each individual student. You simply cannot use many of these types of questions in assessments without making undesirable tradeoffs: either testing time gets grossly excessive, or you limit the amount of content tested for each student.

In Kentucky, to date, the extensive use of open-response, performance-oriented questions has resulted in incomplete testing of individual students. Neither the KIRIS nor CATS assessments ever provided valid and reliable data for individual students. Senate Bill 1 will not allow that deficiency to continue; so, if Kentucky is to use SMARTER’s assessments, those products must manage some very challenging “test engineering” issues of adequate content coverage versus use of open-response questions versus acceptable testing times in a way Kentucky never discovered.

Based on the presentation made to the Brookings Institution, it looks like SMARTER is a very long way away from making that happen.

SMARTER Proposed Interim Assessments

SMARTER Comment: “Optional interim assessments • Are aligned to and reported on the same scale as the summative assessments • Help identify specific needs of each student, so teachers can provide appropriate, targeted instructional assistance”

Supposedly, these will be non-secure and created and scored by teachers. That would seem to be in line with the concept of formative assessments, but

Costs will also rise sharply for states that currently rely on few open-response questions, as these are expensive as well as time-consuming test items.

Gambling on Technology that Does Not Currently Exist

SMARTER Comment: “Adaptive summative assessments benchmarked to college & career readiness”

The SMARTER presentation also says the summative assessments will include performance tasks and summative computer adaptive assessments to be administered during the last 12 weeks of the school year.

There is a very long road to travel from this exciting concept to a working and practical program. Testing expert Greg Cizek, who also spoke at the Brookings conference, says that we don’t have the technology today. Nevertheless, SMARTER’s proposal promises it will be online and operating in 2015.

If we can really make that work, it will be very valuable.

While not mentioned, it is implied that SMARTER’s assessments will provide comparability across states. However, this requires more than just using the same tests. The National Assessment of Educational Progress (NAEP) has already shown us that, as pointed out in this other Wiki item. States today have very different student demographics, and it isn’t possible to develop an accurate understanding of real performance across states through simplistic comparison of overall average scores. I have written extensively about how this problem impacts interpretation of NAEP, such as here.

For this to really work, SMARTER will have to provide considerable disaggregated data for each state by race, poverty rate, and learning disability status along with some carefully developed guidance on how to FAIRLY use the data to draw valid conclusions, an issue the NAEP has been wrestling with for years.

Testing expert Cizek also pointed out at the conference that there isn’t agreement at present on what college readiness actually looks like and how it can be measured. This requires the test to not only report on achievement, but to also have predictive qualities. Cizek says that is going to be hard to do, although we already have tests in Kentucky from the ACT, Incorporated that are doing this job. Though he didn’t say so, the issue for Cizek may involve the fact that in a number of states, the SAT, not the ACT, is used for college acceptance. Those states might not want to sign on to an ACT-like model. Thus, while Kentucky already has a model test for college readiness, not all states may agree with it, and SMARTER has a way to go to achieve this goal.

Can SMARTER find a way to administer performance items at the end of the year, score them, and get the results back in a timely manner?

For sure, KIRIS and CATS always failed in this area. The results have almost always arrived back at schools well after the next school term was already under way, too late to help inform changes to curriculum before teachers were already back in the “classroom trenches” and too busy to do the job well. Also, this makes the SMARTER tests diagnostic as well as predictive of college work and measures of achievement. That is an awful lot to accomplish with one test.

Concerns about the SMARTER effort

The Bluegrass Institute is concerned that the SMARTER effort seems to be operating in ignorance of the two-decade-long history in Kentucky with performance-type testing. SMARTER has not mentioned specifically how it plans to overcome the issues that Kentucky has encountered in trying to make a remarkably similar assessment program function well, especially the thorny issues surrounding the performance events.

The Prichard Committee for Academic Excellence is also raising concerns about the apparent misunderstanding of the concepts of formative assessments that are found in the SMARTER proposals. They posted a series of blog entries with those concerns here and here.

Those Prichard concerns draw in considerable measure on a paper by Margaret Heritage, created for the Council of Chief State School Officers. Heritage’s paper strongly criticizes the consortia plans for the Common Core State Assessments, saying the proposals are uninformed about the key concepts of formative assessment and may actually interfere with the success of schools with formative approaches.

Heritage writes:

“At a time of unprecedented opportunity, it is regrettable that roles of the teacher and the student in enabling learning are not at the center of current thinking about formative assessment within the proposed next-generation assessment systems. This may well result in a lost opportunity to firmly situate formative assessment in the practices of U.S. teachers.”

She also takes issue with the concept of interim, during-the-year assessments, saying there is no empirical evidence that interim assessments improve student learning.

It won’t be cheap!

The Common Core State Assessment effort isn’t cheap. While costs for the SMARTER effort were not mentioned during the Brookings conference, we do know that the PARCC effort has already received grants of $185.9 million for work it will do over the next four years. Presumably, the SMARTER group has received similar funding.

Additional information

You can hear Willhoft’s spoken comments to the Brookings conference in an audio presentation accessible from a link here.

His comments begin at 1 hour 7 minutes and 15 seconds into the audio recording.