Assessment and Accountability
A Modest Proposal
Heide Spruck Wrigley
Aguirre International,
San Mateo, CA
At times it seems that everything
there is to say about testing and assessment in adult literacy has
been said. By now, practitioners and administrators alike can cite
the shortcomings of standardized tests using multiple choice formats
and are familiar with the inadequacy of grade levels as indicators
of what adult learners know and are able to do.
Yet paper-and-pencil, multiple-choice tests continue to be used
not only as placement instruments but as measures of learner gains
and evidence of program success. Given current reporting requirements,
their use is likely to increase, at least in the near future.
From the perspective of programs, there seem few viable alternatives
that would meet the information needs of funders interested in reliable
data that indicate how a program is doing overall. Portfolio approaches,
for example -- considered the last great hope a few years back --
have not quite matured to the level where they might be used as
a means to report and aggregate learner gains by group (although
they are invaluable as evidence of individual learner progress),
largely because the field has not invested in the development of
benchmarks and rubrics.
Local approaches have remained just that, local approaches, primarily
for two reasons: 1) there has not been enough field testing to establish
the reliability of these measures and 2) there have not been sufficient
efforts to implement alternative assessments across programs. At
this time, it is easy to see how even programs that have been enthusiastic
about developing an assessment system that captures what they consider
worthwhile outcomes are becoming distressed about the prospects
of an alternative system being able to rival the standardized tests
currently in fashion.
All is Not Lost
Yet, the picture is not as dim and grim as it might first appear.
Indeed, it may be premature to give in to cynicism ("it's all a
sham and no one really cares"), paranoia ("next year, all funding
will be tied to the results of standardized tests"), and paralysis
("in the end, no one will care about alternative assessment, so
let's just sit and wait to see what comes down the pike"). Since
a Pollyanna attitude does not appear to be justified either, given
recent legislation, perhaps it is time to take an existentialist
perspective in which we commit ourselves to forging ahead although (and
even because) life in adult literacy does not always make sense.
After all, what else are we going to do to stay sane?
Let's ask then if there is anything positive happening in assessment,
and how we can help shape new directions on the national or state
level, while continuing to strive for sane assessments within and
across local programs.
The Federal Outcome Reporting System
You may have heard that the U.S. Department of Education has mandated
a uniform outcome-based reporting system that requires that all
states send data for all programs funded under Adult Basic Education
(ABE) to the Department of Education in Washington. Assessments
for capturing outcomes must be "valid and reliable." In other words,
they must either take the form of a standardized test (considered
reliable by definition) or use some other means that meets these requirements.
States (and the programs they fund) will be asked to report "learner
gains" in reading, writing, speaking, and listening (and possibly
additional skills related to workforce development) and show that
learners are advancing across levels, such as the Student Performance
Levels (SPL) established for ESL. These are minimal requirements
and individual states can define progress in various ways or even
suggest additional outcomes as evidence of literacy progress and
program success.
To understand the thinking behind the initiative, it is important
to keep in mind that the primary focus is neither curriculum reform,
nor program improvement (although new assessment systems are often
used for these purposes), but rather an accountability measure to
bring adult literacy in line with the requirements of GPRA -- the
Government Performance and Results Act. GPRA requires all federal
agencies to show that they, as well as the agencies and programs
they fund, are achieving results or else risk loss of funding. Since
the focus of GPRA is on the performance of the overall system (made
up of thousands of programs), neither the federal government nor
the states are likely to pay a great deal of attention to the progress
made by any given learner at any given site, although site performance
will be open for review (think standardized testing in K-12). Rather,
funders will want to know how a program is doing overall (that is,
whether it is positively affecting literacy skills), and they expect
to see numbers in aggregate (summarized) form.
While in many ways documenting the kinds of outcomes required
by the new reporting system is "doable" (at least for programs
that have long reported literacy gains for a sample of their students),
two dangers loom as programs try to show gains for all students
(not just a sample) and as results are increasingly tied to funding.
There is a risk that programs will a) be tempted to manipulate assessment
results in their favor and b) succumb to a practice known as "creaming."
Manipulating Assessment Results
Any time success (and subsequent funding) is determined by the
data a program reports, there are concerns about administrators
"fudging the data." For example, programs have long known that the
trick to increasing test scores is to NOT prepare students for the
test, but rather to assess them as soon as they walk in the door.
This keeps baseline scores artificially low and progress is inflated,
since gains are due to increases in test-wiseness, rather than any
real gains in literacy skills. Although this kind of manipulation
is considered unethical, since the resulting data "lack integrity,"
the practice is nevertheless quite commonplace among programs pressured
to demonstrate learner progress in short amounts of time.
(Clearly, this trick only works once for each set of students,
since the effects tend to level off after subsequent administrations
of the test.)
The Dangers of Creaming
It is an unfortunate fact of adult literacy that programs that
help those "hardest to serve" (for example, learners who are both
new to English and new to literacy) have the greatest difficulties
showing gains, not only because their learners need a great deal
of time until progress is evident, but because the kind of progress
they are making is not easily captured by standardized multiple
choice, paper and pencil tests. In addition, programs that serve
these students (often community-based organizations) don't have
the resources to set up testing alternatives appropriate for a low
literacy population.
There is a danger, then, that programs not fully committed to
serving learners who need both special support and extended time
will decide to focus their efforts instead on those students who
most easily advance, since the incremental progress of "slower"
students only makes the program "look bad."
Thinking along those lines, ESL programs, for example, might decide
to focus the curriculum on immigrants with higher levels of education,
rather than serving ESL literacy students. This process of focusing
on participants who are easy to serve is known as "creaming" and
has long been decried as an unintended outcome of programs that
have signed performance-based contracts (where funds are linked
to learner outcomes and program impacts, such as job placement).
So far, not many public debates have taken place around this issue
in adult literacy on the state level, but concerns are sure to arise
as programs realize the difficulties they face in reporting progress
across levels in the time periods envisioned by the reporting system.
So Why Not Ask for An Exemption?
Two solutions to the problem of creaming seem possible: 1) set
aside monies so that programs can develop an alternative assessment
for lower level students or 2) ask that learners who have difficulty
negotiating paper and pencil tests be exempted from testing. In
my view, exemptions, as attractive as they may seem, are not the
best solution in the long run, since we may end up marginalizing
both this group and programs that serve them. As ESL programs in
K-12 have seen, being exempt from accountability requirements is
not the blessing that it might seem. As a rule, if certain types
of learners are excused from testing, they tend to disappear from
the radar screen of administrators and are ignored when program
decisions are being made. Furthermore, it is difficult to ask for
funding for a population for whom no data is available.
I believe that, rather than asking for exemptions for students
who cannot cope with the standardized tests approved by a state,
we are better off advocating for the development of an alternative
assessment framework for this group. There is an additional advantage
to advocating for resources to develop an alternative assessment
for those new to literacy. Once such an assessment is developed
for one group, it is easier to acquire the resources to extend it
to other levels and other populations.
Alternative Testing for Low Literate Students
What might an assessment that measures the incremental changes
that occur at the initial levels of language and literacy development
look like? It is entirely possible to design a framework that allows
learners to demonstrate what they can say and understand in English
despite limited proficiency (in fact the oral interview component
of the BEST test does just that). It is also possible to design
a "can-do" literacy assessment (of the type first suggested by Lytle
and Wolfe) based on the kinds of texts and tasks that those new
to literacy deal with every day. For example, tasks could be designed
that allow learners to select pieces of print that they can recognize
fairly easily, along with those that give them some difficulty and
others that pose a still greater challenge (e.g., McDonald's logos,
sale signs, 50% off promotions, their own street address, a letter
from the INS or the TANF office). After selecting these print pieces,
learners would read the items once together with the friendly teacher/facilitator/assessor
and would then try a few text pieces that they have selected on
their own.
If a program wants to create an
assessment that works double duty (as a basis for program improvement
and for accountability), a further step is necessary: the development
of scales, rubrics, and benchmarks that indicate the expectations
for any given level and to what degree learners are close to acquiring
the kind of knowledge, skills, and strategies that are a core part
of our curriculum.
The assessor rates individual performance on a scale without making
a big deal of it. On the third round, the assessor might select
an item that is slightly more difficult than the previous one, again
encouraging the person to discuss the item and interpret what it
says. Through assessments of this sort, we should be able to tell
to what extent learners can handle a variety of literacy tasks at
varying levels of confidence and proficiency. It would help us to
see evidence of skills worth having, such as: 1) telling an electricity
bill from a phone bill or a notice from the INS from a notice from
school, 2) recognizing certain types of applications (housing, employment,
citizenship), 3) interpreting real life environmental print (reading
stop or danger signs), or 4) writing a note to a repair person,
the landlord, or the worker on the next shift.
Asking learners to select tasks that they can do with confidence
as a starting point for assessment and then moving up from there
is not limited to the domains of practical literacy. For those interested
in basic skills acquisition that focuses on the subtasks of reading,
one-on-one student-initiated assessment can tell us to what extent
learners have developed the kind of "phonemic awareness" that allows
them to select familiar words that start with the same consonant
or identify words that rhyme. Those interested in basic writing
proficiency can ask learners to select an evocative photograph or
some other prompt, discuss it with the facilitator and then write
the response.
Such an assessment-plus-conversation model can also provide baseline
data on literacy practices, documenting the kinds of print tasks that
learners engage in (looking at TV Guide, reading the Bible, or
checking the horoscope or soccer scores in English or in a native-language
newspaper) and recording how these practices change over
time.
Assessment that allows learners to select a simple task and then
branch out is hardly a new concept. In fact, it is the basis for
the kind of "adaptive" assessment that has been used in computer-based
testing. True, this type of assessment requires one-on-one administration,
but as practices in K-12 have shown, after the initial intake assessment
has been completed, teachers can take a few minutes with each
student during class time over the course of three weeks or so to
document what learners can do that they could not do before (trained
facilitators could do short "pull out" sessions as well). As funding
for adult literacy is increasing, the old refrain of "there is no
money to do this" no longer holds true. There are alternatives to
multiple choice tests and we must advocate for their development
and their use if we are serious about documenting progress for all
learners, including those who still struggle with basic literacy.
Building an Assessment Framework that Yields Worthwhile Results
Developing an assessment that captures gains at the lower levels
is only the starting point in a larger effort to build a system
that works. Other efforts are needed, at both the local and the
state levels so that we don't end up with an accountability system
that is driven in large part by what current standardized tests
are able to measure. If we want the quality of adult literacy to
increase, we need an approach that measures to what extent learners
are acquiring the knowledge, skills, and strategies that matter
in the long run. These might include generative skills, such as
gaining meaning from various print sources important to one's life;
communicating one's thoughts and ideas; learning how to learn;
knowing about and using resources effectively; and learning with
and from others (along with the sub skills that help learners become
increasingly more proficient in these areas).
How can this be done? At the local level, a three-pronged approach
might be necessary: 1) finding a way to live with the currently
available standardized tests, selecting the "LOT" -- the least objectionable
test -- and keeping in mind the principle of "first, do no harm"
to students; 2) convincing the state that the data a program has
provided over the years are at least as valid and reliable as standardized
tests such as the TABE and therefore the process should continue;
and 3) working with others to develop an assessment system that reflects
the realities of adult learners' lives and focuses on what participating
programs have deemed to be the core sets of knowledge, skills, and
strategies important enough to teach and test.
Components of an Alternative Assessment System
Profiles and Portfolios
What might be the components of such a system? To start with,
any program concerned about serving different groups of learners
equally well needs to collect demographic information that captures
the kind of learner characteristics and experiences that may have
a bearing on school success. After all, only by having rich descriptive
information can we know what learners want and need to do with English
and literacy (given their current circumstances and their goals
for the future), how much schooling they have had (and how successful
they were), and what the print and communication challenges are
that they face in their everyday lives. Having descriptive information
of this kind is invaluable since it allows us to see which learners
are succeeding in our programs and which are languishing (or leaving)
because their needs are not met.
This information can be collected in the form of profiles that
travel with the student and to which teachers and learners contribute
on an ongoing basis. In addition to background variables such as
age, employment status, years of schooling, country of origin and
languages spoken, these profiles can 1) capture current literacy
practices (who is now speaking to the doctor without a translator;
who has started to pick up a newspaper to check the weather); 2)
chart shifts in learner goals and 3) record changes in life circumstances
(new job, citizenship, economic self-sufficiency) important to stakeholders.
In these profiles, progress can be captured as it occurs (requiring
only a line or two for two or three students per class). Profiles
have the added advantage of encouraging teachers to create opportunities
for learners to discuss what is happening in their lives, so they
can spend some time observing. Profiles of this sort (also known
as "running records") can be connected with portfolios that demonstrate
student progress through writing samples, reading inventories, and
various types of performance tasks. If a standardized test is used,
results can be included in the profile as well, helping to flesh
out the general picture of achievements and struggles.
From Learner Success to Accountability
This must be said: While an approach that combines rich profiles
and individual portfolios will produce important information on
individual students and provide insights into the relative success
of certain learner groups, it does not, in and of itself, yield
the kind of data needed for accountability. After all, we cannot
ship boxes of profile folders to funders to have them realize what
a great job we are doing.
To make profiles work for funders, a further step is needed, one
that yields data in aggregate form so that policymakers can get
a picture of the shape and size of the forest, not just a close-up
of the trees. To measure progress and report to funders who is getting
better at what, profiles need to include the following: a broad
set of language and literacy tasks that are accompanied by rubrics,
scales, and benchmarks for transition.
Rubrics are used to indicate what expectations are for any given
area (face-to-face communication, dealing with print, accessing
resources, etc.) and what evidence of success might look like. The
scales that accompany the rubrics allow us to document where learners
fall on a continuum of proficiency, documenting what they can do
with relative ease, where they succeed with some help, and where
they are struggling.
Since rubrics and scales can be designed for different skill domains
(SCANS skills, communication strategies, navigating systems, civic
involvement, learning how to learn, empowerment, etc.) and for various
contexts (school, family, community), they can easily be matched
to the goals of learners and adapted to the focus of a particular
program. They also allow for the kind of student control in task
selection discussed above.
Once rubrics and scales are in place, meeting accountability requirements
that call for aggregate data becomes relatively easy. Since the
descriptors on a scale can easily be numbered (from 1 for "struggles"
to 6 for "no problem", say), assessment results can be easily compiled,
summarized, analyzed and reported out. If matched with demographic
profiles, they allow a program to see which groups of learners are
being served well by the program and where program changes are in
order because success is lacking.
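The compiling and summarizing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the learner groups, the skill domain, the ratings, and the function name summarize_by_group are all hypothetical, and the only element taken from the text is the numbered scale running from 1 ("struggles") to 6 ("no problem"). Each learner is rated once at intake and once at the end of a teaching cycle, and the results are rolled up by group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings on the 1-6 scale (1 = "struggles", 6 = "no problem"):
# (learner, group from the demographic profile, skill domain, baseline, follow-up)
RATINGS = [
    ("A", "new to literacy", "dealing with print", 1, 2),
    ("B", "new to literacy", "dealing with print", 2, 2),
    ("C", "prior schooling", "dealing with print", 3, 5),
    ("D", "prior schooling", "dealing with print", 4, 5),
]


def summarize_by_group(ratings):
    """Aggregate baseline and follow-up ratings into one summary per learner group."""
    by_group = defaultdict(list)
    for _learner, group, _domain, baseline, follow_up in ratings:
        # Reject ratings that fall outside the 1-6 scale.
        for score in (baseline, follow_up):
            if not 1 <= score <= 6:
                raise ValueError(f"rating {score} is outside the 1-6 scale")
        by_group[group].append((baseline, follow_up))

    summary = {}
    for group, pairs in by_group.items():
        gains = [after - before for before, after in pairs]
        summary[group] = {
            "n": len(pairs),
            "mean_baseline": mean(before for before, _ in pairs),
            "mean_follow_up": mean(after for _, after in pairs),
            "mean_gain": mean(gains),
            # Share of learners who moved up at least one point on the scale.
            "share_advancing": sum(1 for g in gains if g > 0) / len(gains),
        }
    return summary


summary = summarize_by_group(RATINGS)
for group, stats in summary.items():
    print(f"{group}: n={stats['n']}, mean gain={stats['mean_gain']:.2f}, "
          f"advancing={stats['share_advancing']:.0%}")
```

A summary of this kind gives funders the aggregate numbers they ask for while still letting a program compare groups. A real system would also need agreed rubrics and rater training before the numbers could be treated as reliable.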
The beauty is that this kind of approach fulfills the same function
as a standardized test: learners are assessed on a variety of skills
under standard conditions with common instruments on similar tasks
(yet given choices in task selection and afforded multiple opportunities
to shine on tasks that matter in a given context important to learners).
But unlike the standardized tests currently available, profile assessments
do not rely on multiple choice, paper and pencil items.
Rather they give learners the opportunity to demonstrate what
they can do with language and literacy through more open-ended assignments.
Furthermore, profile approaches to assessment can be adapted for
certain learner groups and modified to match the focus of a particular
program (e.g., workplace, family literacy, citizenship). Most importantly,
perhaps, they provide rich information that makes sense to teachers
and learners, information that is useful to programs, not just funders.
Why, then, are we not seeing more of these kinds of assessments?
While extremely worthwhile and high in validity, these types of
assessment carry a significant burden: they require consensus building
on what is worth teaching and learning and a common understanding
of what evidence of success might look like for any given skill
domain. To be successful, profiles and portfolios have to be integrated
into the curriculum and ongoing assessment must either be part of
the day-to-day teaching we do, or time must be set aside at intake
to establish baseline and toward the end of a teaching cycle to
document progress. If that means the end of open-entry/open-exit
as we know it and forces us into shorter instructional cycles that
have a clear teaching/learning focus, so be it. To give such a framework
a chance, a significant amount of teacher orientation, training
and buy-in will be needed.
Clearly, there are not many adult literacy programs that have the
commitment, energy and resources to embark on that endeavor, although
some, like the Arlington Education and Employment Program in Virginia,
are well on their way. But, given sufficient advocacy from local
programs along with a modicum of political will on the part of state
directors and other funders, teams, working groups and consortia
could be set up to develop an assessment framework that, if not
based on profiles, at least includes them. In fact, the National
Institute for Literacy is moving in that direction, developing an
assessment framework that combines the use of alternative assessments
with standardized tests where appropriate in order to capture the
gains that learners make who are part of the "Equipped for the Future"
initiative.
What then is the bottom line, given the current climate of accountability
for accountability's sake? We have several options: we can decide
that cynicism is the only sane response to the current requirements,
live with standardized tests as best as we can, try to lay low,
figuring "this too shall pass," or commit ourselves to fighting
for a saner system for our own sake and that of our students. On
the local level, we must be prepared to work with others to decide
on the focus of our programs and be willing to map out a core set
of knowledge, skills and strategies that matter.
At the federal level, we must push for an accountability system
that is driven not by what the current standardized tests are able
to assess (which is rather limited), but by outcomes that reflect
what sound adult literacy programs should be all about. Furthermore,
if we are asked to show accountability related to outcomes and impacts,
we must be given the resources to document success in meaningful
ways.
Finally, while we may need to play the accountability game for
the time being, we can also work toward a system that measures effectiveness
where it counts: adult learners acquiring the kinds of knowledge,
skills and strategies that are important to them now and that matter
in the long run. If we give up too soon, we will only marginalize
adult literacy further.
Originally published in Adventures in Assessment,
Volume 11 (Winter 1998),
SABES/World Education, Boston, MA, Copyright 1998.
Funding support for the publication of this document
on the Web provided in part by the Ohio State Literacy Resource
Center as part of the LINCS
Assessment Special Collection.