A Writing Rubric to Assess ESL Student Performance
Inaam Mansoor and Suzanne Grant
Performance-based assessments are
popular because they are often program-based and learner-centered;
however, funders tend to question their credibility. We challenged
ourselves to address this concern by finding a way to meet technical
quality standards, such as validity and reliability, while also keeping
in mind how assessment influences learning. We believed that this
approach would facilitate reporting student achievement both fairly and credibly.
Who We Are
The Arlington Education and Employment Program (REEP) is an adult
English as a Second Language (ESL) program administered through
the Arlington Public Schools in Arlington, Virginia. Because of
its proximity to our nation's capital, the area draws large
numbers of immigrants attracted by job opportunities in the service
industry and in the area's many national and international organizations.
Nine levels of ESL instruction are offered, including workplace
literacy and computer-assisted instruction. There are some 6,000
enrollment slots at 8-10 locations throughout Arlington County.
There are 55 trained and experienced ESL teachers, who are supported
by 5 coordinators. In addition, more than 100 volunteers support the program.
In 1995, REEP staff developed a writing rubric. A rubric is a scoring
device that specifies performance expectations and the various levels
at which learners can perform a particular skill. By articulating
what our adult ESL learners could do at various proficiency levels,
we hoped to fine-tune placement of learners into appropriate class
levels and monitor their progress. Our rubric was developed by collecting
writing samples from each class level and analyzing them. We found
that although we had nine instructional levels, our students' writing
fell into six distinct writing
performance levels. The differences in these levels could be articulated
using five characteristics (learning targets) of our learners' writing:
content and vocabulary, organization and development, structure,
mechanics, and voice (see the attached REEP Writing Rubric). As part
of our work with the What Works Literacy Partnership (WWLP), a group
of adult basic education programs from across the country building
their capacity to use data effectively for program improvement and
decision-making (for more on WWLP, see www.wwlp.org), we designed
and implemented a study to determine the effectiveness of using
the REEP Writing Rubric to measure progress. With support from WWLP,
we developed pre- and post-test writing tasks to assess writing gains.
Developing writing tasks that could be used for program-wide testing
of beginning through advanced level students was challenging. To
be fair, the tasks needed to generate a wide variety of responses
and enable students at different levels to demonstrate their abilities
and life experiences. We decided that the performance task of writing
a letter of advice based on their own experiences would meet the
above criteria and be consistent with skills that students were
practicing in class. Moreover, we structured the testing process
to mirror instructional practice by engaging students in warm-up
activities prior to the actual writing test.
Reliability of test data is extremely important in the context
of program-wide assessment, especially when results are
reported to funders.
To maximize the reliability of our results, WWLP researchers provided
extensive guidance on field-testing, test administration procedures,
scoring, performance task development, and rater training. As a result,
we implemented the following:
- Field Testing.
Before administering the pre- and post-tests to hundreds of
students, we conducted field-testing
to answer the following questions:
1. Can we expect measurable progress within the specified test
interval, that is, 120-180 hours of instruction?
2. Can beginning through advanced level students demonstrate
their writing skills in response to our writing tasks?
3. Are the pre- and post-test tasks equivalent, that is, do
they represent the same level of difficulty?
To answer questions 1 and 2, a small group of experienced teachers
administered the pre-test to five students from each class level
at the beginning of an instructional cycle. At the end of the cycle,
the teachers administered the post-test to the same group. Students
reported that they felt able to demonstrate their writing skills
on these tests, and teachers agreed that the tests reflected the
students' writing abilities.
Experienced readers scored the tests, and then a WWLP researcher
analyzed the results. The analysis showed that significant gains
could be measured and that reliable results could be achieved using
the scoring procedures we had implemented. We were ready for large-scale testing.
To answer question 3, the same group of students representing all
class levels was given the pre-test followed by the post-test within
a three-day period. A WWLP researcher analyzed the results and found
no difference between students' pre- and post-test scores, which
demonstrated that the two tasks represented the same level of difficulty.
One of the key elements in achieving equivalence was the use of
the letter genre and parallel warm-up activities for both the pre- and post-tests.
- Test Administration.
Prior to each test administration, testers were trained on the
ground rules for administering the test (for example, time limits
and no dictionaries) and on how to conduct the warm-up activities
developed for the particular writing task. This ensured that all
students completed the pre-writing activities and the test in a
uniform way.
- Scoring Procedures.
Each of the five writing characteristics receives a score between
0 and 6, with 6 being the highest. The total score is determined by
adding the five characteristic scores and dividing by 5. A sample
scoring grid follows, and a brief illustrative sketch of this
arithmetic appears after this list.
- Building scoring consensus.
REEP staff were trained to use the writing rubric to score the
two (pre- and post-) performance tasks. Readers scored a range of
essays; scores for each writing characteristic were charted out
as shown above, and the scoring rationale was discussed.
This enabled the trainers to see how consistently the rubric
was being interpreted, to pinpoint areas of discrepancy, and to
build scoring consensus.
A shortened version of this process was repeated prior to each
scoring session to ensure continued consistency in rubric interpretation
and scoring. Consistency among the readers was tracked to determine
how many tests needed a third reader.
Each test was scored by two readers, and a third reader was
used if the two total scores differed by more than one point. The
second reader did not know how the first reader had scored the
test; in this way, the first reader's score did not influence
the second reader. Similarly, students' class levels were not
indicated on the test paper.
Scoring of the tests occurred in group sessions of no longer
than two hours each; this seemed to be the point at which readers
began to "burn out."
The training and scoring procedures described above resulted
in an inter-rater reliability of 98%. Only 2% of the tests needed
a third reader.
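To make the arithmetic concrete, the short sketch below restates the scoring rules described above in Python. It is illustrative only and not part of REEP's actual procedures or software; the function names, variable names, and sample scores are hypothetical. It averages the five characteristic scores (each 0-6) to produce a reader's total, and flags a test for a third reader when two readers' totals differ by more than one point.

    # A minimal, hypothetical sketch of the scoring rules described above
    # (not REEP's actual software). Five characteristics, each scored 0-6;
    # a reader's total is the average of the five scores; a third reader is
    # brought in when two readers' totals differ by more than one point.

    CHARACTERISTICS = [
        "content and vocabulary",
        "organization and development",
        "structure",
        "mechanics",
        "voice",
    ]

    def total_score(scores):
        """Average one reader's five characteristic scores (each 0-6)."""
        if len(scores) != len(CHARACTERISTICS):
            raise ValueError("expected one score per characteristic")
        if any(not 0 <= s <= 6 for s in scores):
            raise ValueError("each characteristic score must be between 0 and 6")
        return sum(scores) / len(CHARACTERISTICS)

    def needs_third_reader(reader1_scores, reader2_scores):
        """True if the two readers' totals differ by more than one point."""
        return abs(total_score(reader1_scores) - total_score(reader2_scores)) > 1.0

    # Hypothetical example: two readers score the same essay.
    reader1 = [4, 3, 3, 4, 3]   # one score per characteristic, in the order above
    reader2 = [3, 3, 2, 3, 2]
    print(total_score(reader1))                  # 3.4
    print(total_score(reader2))                  # 2.6
    print(needs_third_reader(reader1, reader2))  # False -- totals differ by 0.8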
REEP teachers were involved in every step: developing writing tasks
and warm-up activities, administering tests, developing scoring
procedures, scoring tests, and analyzing data. Through this involvement,
teachers developed a deeper appreciation of testing. They used their
students' test results to inform their instruction so that they
could better meet the needs of their students. Scoring tests written
by beginning to advanced level students gave them a broader picture
of writing levels within the program and informed their decisions
about subsequent class placements.
Teachers shared the writing rubric with their students, giving
them a better sense of how they were being evaluated. Students at
all levels started paying more attention to their writing as a result
of the more formalized writing test. Many began to embrace writing
instruction in the classroom. Learning English now meant more than
learning to "speak" English.
We have all gained a greater understanding of the testing process
and its need to be both fair and credible to all stakeholders. By
participating in the test development process, teachers have developed
skills and knowledge that will enable them to develop performance-based
classroom assessments that meet these criteria as well. These skills
enable us to feel more confident about accepting and reporting gains
derived from performance-based assessments.
A Word to the Wise
Developing and using a performance- based assessment requires tremendous
time and financial commitment as well as access to the expertise
of researchers. This commitment must be weighed against the outcomes,
and in our case, the results for the program were significant and well worth the investment.
We had hoped to demonstrate that a performance-based assessment
could be a superior instrument for measuring learner
gains and thereby gain credibility with funders. Indeed, our work
with WWLP gave us access to researchers who both guided us through
the testing process and provided feedback on quality issues. At
this writing, we are pleased to report that our WWLP researcher
has concluded that "the REEP Writing Rubric is a carefully
designed and validated instrument with sufficiently high reliability."
We were fortunate in having access to the WWLP project and the professional
support it provided. Practitioners need opportunities like this
in the future if performance-based assessments are to become accepted.
Inaam Mansoor, Director
Arlington Education and Employment Program (REEP)
2801 Clarendon Boulevard, #218
Arlington, Virginia 22201
Tel.: (703) 228-4200
Fax: (703) 527-6966
Originally published in Adventures in Assessment,
Volume 14 (Spring 2002), SABES/World Education, Boston, MA.
Copyright © 2002 SABES/World Education.
Funding support for the publication of this document
on the Web provided in part by the Ohio State Literacy Resource
Center as part of the LINCS
Assessment Special Collection.