TD Magazine Article

Stay the Course With Test Design

Map out a plan to ensure your test will accurately measure the desired data.

Wed Aug 01 2018

What can assessments do? They can diagnose performance problems and evaluate the effectiveness of solutions, such as training. For example, a team was having problems meeting deadlines, and the quality of its work was poor. The company administered a teamwork pretest to determine whether examinees knew how to behave in different teamwork situations. The test verified that the team members knew how to be good team members. As a result, the company was able to focus its resources on finding out why the team members were not motivated to exhibit good team member behaviors on the job.

While assessments can do many things, test developers should be clear about exactly what they intend to measure. For example, a large retail chain invested a tremendous amount in developing a test to identify future store managers. The test worked so well that the company decided to use it to identify all its future managers (such as for the warehouse, customer service, and loss prevention). While there is some overlap in the competencies required for these different manager positions, many different skill sets are also required. As a result, when used with other future managers, the test was not measuring what it was intended to measure.

Measuring what is intended

So, how does a developer ensure that a test is measuring what it is supposed to measure—that is, that it's valid? The key is to develop a test plan and follow it throughout the development and delivery process. A test plan defines what the test is and what it is not; it defines the test and guides its development and use.

A test plan may contain more or less detailed information depending on the test's purpose. However, all test plans should be documented in writing and provide specific answers to at least these basic questions:

  • What are the purpose, goals, and desired outcomes?

  • Who will be tested?

  • What item format(s) will be used?

  • How many items will be needed?

  • How much time will be needed or allocated for the test?

  • How will the test be delivered?

  • What is the timeline for development?

  • What resources will be required, including labor?

  • Who will receive the results, and how will they be delivered and explained?

The sample test plan in the adjacent sidebar provides a sense of what this may look like in practice. The first two questions on the list are hopefully easy for the test developer to answer. Documenting the answers should help the organization avoid using the test for a purpose or population it was not designed for, as the retail chain did. However, the questions regarding the time needed, number of items, and item format may not be as easy to answer, especially because the answer to one question affects the answers to the others.

Item format

Let's look at item format first. Typically, item types are distinguished by the degree to which an examinee must supply, develop, perform, or create something. This is often viewed as the amount of constraint placed on the examinee in producing an answer, and it forms a continuum. One way to categorize item types is:

  • multiple choice

  • selection and identification

  • reordering and rearrangement

  • substitution and correction

  • completion

  • construction

  • presentation and portfolio.

On its website, the Society for Industrial and Organizational Psychology provides a summary of the pros and cons of different item types. For example, multiple-choice tests usually require less test time for examinees, but different examinees may misinterpret the questions. On the other hand, essay questions can test complex learning objectives, but they usually take more time to answer and may be limited in how much of the course they cover.

How many items?

It is important to remember that a test is a sample of a person's knowledge, skills, or abilities (KSAs) as they relate to performing a task. Even a driving test is a sample. If the driving test occurs in a small rural area, that doesn't mean the person can drive in a large city; or if the test occurs on a sunny day, that doesn't confirm with confidence that the individual can drive in poor weather conditions.

The best you can do is sample adequately enough to feel comfortable saying that the examinee has the KSAs to perform the task; this is a question of reliability. Unfortunately, most test developers assign the same weight to every area or learning objective they are testing.

For example, a developer might reason, "I want a 100-item test and I have 10 objectives, so I will write 10 items for each one." And even that assumes there is a plan. By identifying the number of items to be written for each content area before writing any items, you will ensure that each area is adequately sampled and that there are enough items to measure it effectively.

When deciding how many items to ask for each area, think about these elements for each content area:

  • Criticality. How important is the material covered on the test? Can the outcomes affect the safety of employees, the public, or clients? Can it affect the company's success, such as the bottom line? If it can, you want to have more items on the test overall or on that particular area of the test.

  • Consequences. What are the consequences of misclassifying someone based on the test results, such as retaking a 30-minute online class or not being eligible for a promotion? The more severe the consequences, the more items are needed. Increasing the number of items increases the reliability of the results.

  • Size. How large is the sample of KSAs that the test is covering? If it is large, you need to have more items.

  • Homogeneity. The greater the similarity of the material covered on the test, the fewer items are needed. For example, if a test covered addition and multiplication, you would need fewer items than if it covered addition, multiplication, subtraction, and division.

  • Resources. This is where practicalities come in. You may want to give a three-hour test with 150 items, but do you have the time and resources to develop a 150-item test, and can you afford to take employees off the job to take the test for three hours?

Calculating the number

There is a simple formula you can use to determine the number of items:

Number of test questions = CC × RSH × DS

Total number of questions to write = Number of test questions × 3

First, rate each content area or learning objective using the following three scales. One person can do the rating, or you can take the average of several people's ratings.

Criticality and consequences scale (CC). The criticality value is a number based on how critical this content area is to job performance and how high the consequences are of passing someone who is not ready. Give an unimportant area a 0 and give an extremely critical area a 4.

Relative size and homogeneity scale (RSH). The size of the content area is relative to the size of the other content areas on the test content outline, or test blueprint as it is sometimes known. For example, if the content area covers a complex process described in 50 paragraphs, then it may be considered a large body of knowledge. In comparison, a four-step process that is outlined in one or two paragraphs would be a small body of knowledge.

Difficulty scale (DS). If a content area involves simple material that is easy to master, routine, or predictable, assign less weight (0.5) to that area. Assign more weight (1.5) to a content area that is complex or unpredictable, or that has multiple interdependent steps or many types of information to evaluate. If you are using the test to determine whether training objectives have been met, consider the length of time participants spend on the course material as one indicator of difficulty, because they will likely spend more time covering a difficult concept.
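The arithmetic behind the formula can be sketched in a few lines of Python. This is only an illustration: the content area names and ratings below are made up, the `ContentArea` class and function names are hypothetical, and the RSH values are assumed because the scale's range is not spelled out above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ContentArea:
    name: str
    cc: float   # criticality and consequences: 0 (unimportant) to 4 (extremely critical)
    rsh: float  # relative size and homogeneity (range assumed for this sketch)
    ds: float   # difficulty weight: 0.5 (simple) to 1.5 (complex)


def average_rating(ratings):
    """Combine several raters' scores for one scale into a single value."""
    return mean(ratings)


def items_on_test(area):
    """Number of test questions = CC x RSH x DS."""
    return area.cc * area.rsh * area.ds


def items_to_write(area):
    """Total number of questions to write = number of test questions x 3."""
    return items_on_test(area) * 3


# Hypothetical ratings, for illustration only.
areas = [
    ContentArea("process", cc=4, rsh=3, ds=1.5),
    ContentArea("terminology", cc=2, rsh=2, ds=0.5),
]

for area in areas:
    print(area.name, items_on_test(area), items_to_write(area))

total_on_test = sum(items_on_test(a) for a in areas)
print("total items on test:", total_on_test)
```

With these made-up ratings, the critical, complex "process" area gets 18 items on the test while the simple "terminology" area gets 2, which is the kind of unequal weighting the formula is meant to produce.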

Time needed

After you have determined what item types you will use, think about resources. Resources include the time required to develop, administer, and score the test and report the results to stakeholders. Table 1 provides general guidelines to use when estimating how much time you will require for examinees to take the test. Remember to add time for examinees to read or receive instructions.

The formula described above can be a great starting place, but it doesn't factor in the reality of limited resources. This is where you need to make hard decisions.

For example, Table 2 indicates that there should be 54.5 items. The workshop is two days and you only want to allocate 60 minutes for the test, with five minutes for directions. As shown in Table 2, 80.5 minutes would be required if you followed the formula. If you assume that you can write three to five items for each visual with a total of four visuals for the process content area, you can reduce the time estimate from 31.5 minutes to 23-25 minutes.

If you reduce the number of application items from 19.5 to 13, you meet the 55-minute time limit. Alternatively, you could decide that it is worth adding 13 minutes to the testing time to include 19 application items. Sometimes it is helpful to return to the original individual scale ratings used in the formula when making decisions on how to allocate resources.

If you develop a test plan and stay the course throughout the development and delivery process, your stakeholders will be much more confident that the test is measuring what it is supposed to measure. More importantly, you will be able to demonstrate that employees have gained the required competencies and that critical decisions can be made based on the results.


Sample Test Plan: Jefferson County Librarian Test

  • What is the purpose of the test? To identify the librarian's skills gaps related to the administration of a library, such as software usage, organizational planning and development, accounting procedures and analysis, customer service, and team-building skills. The purpose is not to identify skills gaps related to being a librarian, such as cataloging materials.

  • Who will be tested? The librarian in charge of each library in Jefferson County.

  • How much time will be needed for the test? One to two hours each for the pretest and post-test.

  • How will the test be delivered? Paper and pencil

  • How many items will be needed? 60

  • What item format(s) will be used? Multiple choice


ISSUE

August 2018 - TD Magazine


Copyright © 2024 ATD
