GUIDELINES FOR THE CONSTRUCTION OF MULTIPLE CHOICE QUESTIONS TESTS


Mohammed O. Al-Rukban


Department of Family & Community Medicine, College of Medicine and King Khalid University Hospital, King Saud University, Riyadh, Saudi Arabia


Multiple Choice Questions (MCQs) are generally recognized as the most widely applicable and useful type of objective test items. They can be used to measure the most important educational outcomes: knowledge, understanding, judgment and problem solving. The objective of this paper is to give guidelines for the construction of MCQ tests. This includes the construction of both the “single best option” type and the “extended matching item” type. Some templates for use in the “single best option” type of questions are recommended.


INTRODUCTION


In recent years, there has been much discussion about what should be taught to medical students and how they should be assessed. In addition, highly publicized instances of the poor performance of medical doctors have fuelled the drive to find a way of ensuring that qualified doctors achieve and maintain appropriate knowledge, skills and attitudes throughout their working lives. 1


Selecting an assessment method for measuring students’ performance remains a daunting task for many medical institutions. 2 Assessment should be educational and formative if it is going to promote appropriate learning. It is important that individuals learn from any assessment process and receive feedback on which to build their knowledge and skills. It is also important for an assessment to have a summative function to demonstrate competence. 1


Assessment may act as a trigger, informing examinees what instructors really regard as important 3 and the value they attach to different forms of knowledge and ways of thinking. In fact, assessment has been identified as possibly the single most potent influence on student learning, narrowing students’ focus to the topics to be tested (i.e. what is to be studied) and shaping their learning approaches (i.e. how it is going to be studied). 4 Students have been found to differ in the quality of their learning when instructed to focus either on factual details or on the assessment of evidence. 5 Furthermore, changes in assessment methods have been found to influence medical students to alter their study activities. 4 As methods of assessment drive learning in medicine and other disciplines, 1 it is important that the assessment tools test the attributes required of students or professionals undergoing revalidation. Staff consequently redesign their methods of assessment to ensure a match between assessment forms and their educational goals. 6


Methods of assessment of medical students and practicing doctors have changed considerably during the last 5 decades. 7 No single method is appropriate, however, for assessing all the skills, knowledge and attitudes needed in medicine, so a combination of assessment techniques will always be required. 8 – 10


When designing assessments of medical competencies, a number of issues need to be addressed: reliability, which refers to the reproducibility or consistency of a test score; validity, which refers to the extent to which a test measures what it purports to measure; 11 ,12 and standard setting, which defines the endpoint of the assessment. 1 Sources of the evidence of validity are related to the content, response process, internal structure, relationship to other variables, and consequences of the assessment scores. 13


Validity requires the selection of appropriate test formats for the competencies to be tested. This invariably requires a composite examination. Reliability, however, requires an adequate sample of the necessary knowledge, skills, and attitudes to be tested.


However, measuring students’ performances is not the sole determinant for choosing an assessment method. Other factors such as cost, suitability, and safety have profound influences on the selection of an assessment method and, most probably, constitute the major reason for inter-institutional variations in the selection of assessment methods as well as in success rates. 14


Examiners need to use a variety of test formats when organizing test papers, each format being selected on account of its strength with regard to validity, reliability, objectivity and feasibility. 15


For as long as there is a need to test knowledge in the assessment of doctors and medical undergraduates, multiple choice questions (MCQs) will always play a role as a component in the assessment of clinical competence. 16


Multiple choice questions were introduced into medical examinations in the 1950s and have been shown to be more reliable in testing knowledge than the traditional essay questions. They represent one of the most important, well-established examination tools and are widely used in assessment at the undergraduate and postgraduate levels of medical examinations. The MCQ is an objective question for which there is prior agreement on what constitutes the correct answer. This widespread use may have led examiners to use the term MCQ as a synonym for an objective question. 15 Since their introduction, there have been many modifications to MCQs, resulting in a variety of formats. 16 Like other methods of assessment, they have their strengths and weaknesses. Scoring of the questions is easy and reliable, and their use permits a wide sampling of students’ knowledge in an examination of reasonable duration. 15 – 19 MCQ-based exams are also reliable because they are time-efficient and a short exam still allows a breadth of sampling of any topic. 19 Well-constructed MCQs can also assess taxonomically higher-order cognitive processing such as interpretation, synthesis and application of knowledge rather than the test of recall of isolated facts. 20 They can test a number of skills in addition to the recall of factual knowledge, and are reliable, discriminatory, reproducible and cost-effective. It is generally agreed that MCQs should not be used as the sole assessment method in summative examinations, but alongside other test forms. They are designed to broaden the range of skills to be tested during all phases of medical education, whether undergraduate, postgraduate or continuing. 21


Though writing the questions requires considerable effort, their high objectivity makes it possible for the results to be released immediately after marking by anyone including a machine. 15 ,18 This facilitates the computerized analysis of the raw data and allows the examining body to compare the performance of either the group or an individual with that of past candidates by the use of discriminator questions. 22 Ease of marking by computer makes MCQs an ideal method for assessing the knowledge of a large number of candidates. 16 ,22


However, a notable concern of many health professionals is that they are frequently faced with the task of constructing tests with little or no experience or training on how to perform this task. Examiners need to spend considerable time and effort to produce satisfactory questions. 15


The objective of this paper is to describe guidelines for the construction of two common MCQ types: the “single best option” type and the “extended matching item” type. Available templates for the “single best option” type will be discussed.


Single Best Option


The first step in writing any exam is to have a blueprint (table of specifications). Blueprinting is the planning of the test against the learning objectives of a course or competencies essential to a specialty. 1 A test blueprint is a guide used for creating a balanced examination and consists of a list of the competencies and topics (with specified weight for each) that should be tested on an examination, as in the example presented in Table 1.


Table 1: Example of a table of specifications (blueprint) based on the context, for an Internal Medicine examination



If there is no blueprint, the examination committee should decide on the system to be tested, brainstorm to produce a list of possible topics/themes for question items (for example, abdominal pain, back pain, chest pain, dizziness, fatigue, fever, etc.), 23 and then select one theme (topic) from the list. When choosing a topic for a question, the focus should be on one important concept, typically a common or a serious and treatable clinical problem from the specialty. After choosing the topic, an appropriate context for the question is chosen. The context defines the clinical situation that will test the topic. This is important because it determines the type of information that should be included in the stem and the response options. Consider the following example: (Topic = Hypertension; Context = Therapy).


The basic MCQ model comprises a stem and a lead-in question followed by a number of answers (options). 19 The option which matches the key in an MCQ is best called “the correct answer” 15 and the other options are the “distracters”.


For writing a single best option type of MCQ, as shown in Appendix 1, it is recommended that the options are written first. 23 A list of possible homogeneous options based on the selected topic and context is then generated. The options should be readily understood and as short as possible. 18 It is best to start with a list of more than five options (although only five options are usually used in the final version). This allows a couple of ‘spares’, which often come in handy. It is important that this list be homogeneous (i.e. all about diagnoses, or therapeutics, lab investigations, complications, etc.) 23 and that one of the options be selected as the key answer to the question.



MCQ Preparation Form


A good distracter should be inferior to the correct answer but should also be plausible to a non-competent candidate. 24 All options should be true and contain facts that are acceptable to varying degrees. The examiner would ask for the most appropriate, most common, least harmful or any other feature which is at the uppermost or lowermost point in a range. It needs to be expressed clearly that only one answer is correct. A candidate’s response is considered correct if his/her selection matches the examiner’s key. 15


When creating a distracter, it helps to predict how an inexperienced examinee might react to the clinical case described in the stem. 24


A question stem is then written with a lead-in statement based on the selected correct option. Well-constructed MCQs should test the application of medical knowledge (context-rich) rather than just the recall of information (context-free). Schuwirth et al. 25 found that context-rich questions lead to thinking processes which represent problem solving ability better than those elicited by context-free questions. The focus should be on problems that would be encountered in clinical practice rather than an assessment of the candidate’s knowledge of trivial facts or obscure problems that are seldom encountered. The types of problems that are commonly encountered in one’s own practice can provide good examples for the development of questions. To make testing both fair and consequentially valid, MCQs should be used strategically to test important content and clinical competence. 19


The clinical case should begin with the presentation of a problem, followed by relevant signs, symptoms, results of diagnostic studies, initial treatment, subsequent findings, etc. In essence, all the information that is necessary for a competent candidate to answer the question should be provided in the stem. For example:


The lead-in question should give clear directions as to what the candidate should do to answer the question. Ambiguity and the use of imprecise terms should be avoided. 16 ,18 There is no place for trick questions in MCQ examinations. Negative stems should be avoided, as should double negatives. Absolute terms such as “always”, “never” and “only” are obviously contentious in an inexact science like medicine and should not be used. 16 , 18


Consider the following examples of lead-in questions:


Example 1: Regarding myocardial infarction.


Example 2: What is the most likely diagnosis?


Note that for Example 1, no task is presented to the candidate. This type of lead-in statement will often lead to an ambiguous or unfocused question. In the second example, the task is clear and will lead to a more focused question. To ensure that the lead-in question is well constructed, the question should be answerable without looking at the response options. As a check, the response options should be covered and an attempt made to answer the question.


Well-constructed MCQs should be written at a level of difficulty appropriate to the level of the candidates. A reason often given for using difficult questions is that they help the examiner to identify the ‘cream’ of the students. However, most tests would function with greater test reliability when questions of medium difficulty are used. 26 An exception, however, would be the assessment of achievement in topic areas that all students are expected to master. Questions used here will be correctly answered by nearly all the candidates and, consequently, will have high difficulty index values. On the other hand, if a few candidates are to be selected for honours, scholarships, etc., it is preferable to have an examination of the appropriately high level of difficulty specifically for that purpose. It is important to bear in mind that the level of learning is the only factor that should determine the ability of a candidate to answer a question correctly. 15
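The relation between easy items and high difficulty index values can be illustrated with a minimal sketch. It assumes the common item-analysis convention in which the difficulty index of an item is the proportion of candidates who answer it correctly, so that values close to 1 indicate an easy item. The function name and the sample responses are hypothetical and serve only as an illustration.

```python
from typing import Sequence


def difficulty_index(responses: Sequence[str], key: str) -> float:
    """Proportion of candidates whose response matches the key (0.0 to 1.0)."""
    if not responses:
        raise ValueError("at least one response is needed")
    correct = sum(1 for response in responses if response == key)
    return correct / len(responses)


# Hypothetical responses of ten candidates to two items keyed "C".
# The first item covers a topic that all candidates are expected to master.
mastery_item = ["C", "C", "C", "C", "C", "C", "C", "C", "C", "B"]
medium_item = ["C", "A", "C", "B", "C", "D", "C", "E", "C", "A"]

print(difficulty_index(mastery_item, "C"))  # nearly all correct: high index
print(difficulty_index(medium_item, "C"))   # half correct: medium difficulty
```

Under this convention, an item aimed at selecting a few candidates for honours would be expected to show a low value.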


The next step is to reduce the list of options to the intended number, which is usually five (including, of course, the correct answer).


Lastly, the option list should be arranged in a logical order to reduce guessing and to avoid putting the correct answer in a habitual location (e.g. using alphabetical order makes it possible to avoid choosing options B or C as key answers more frequently than the others).
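As a small sketch of this last step, the options can be sorted alphabetically so that the position of the key follows from the ordering rather than from the writer's habit. The option list below is a hypothetical one for the Topic = Hypertension, Context = Therapy example, and the option marked as correct is chosen arbitrarily for illustration; it is not a clinical recommendation.

```python
import string
from typing import List, Tuple


def arrange_options(options: List[str], correct: str) -> Tuple[List[str], str]:
    """Sort options alphabetically and return them with the key's letter."""
    if correct not in options:
        raise ValueError("the correct answer must be one of the options")
    if len(options) > len(string.ascii_uppercase):
        raise ValueError("too many options to label with single letters")
    ordered = sorted(options, key=str.casefold)
    key_letter = string.ascii_uppercase[ordered.index(correct)]
    return ordered, key_letter


# Hypothetical homogeneous option list (all therapeutics), in drafting order.
draft = ["Losartan", "Atenolol", "Hydrochlorothiazide", "Amlodipine", "Enalapril"]
ordered, key_letter = arrange_options(draft, correct="Losartan")

for letter, option in zip(string.ascii_uppercase, ordered):
    print(f"{letter}. {option}")
print("Key:", key_letter)
```

Across a whole paper, the key letters produced this way can be tallied to check that no position is over-represented.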


The role of guessing in answering MCQs has been debated extensively and a variety of approaches have been suggested to deal with the candidate who responds to questions without possessing the required level of knowledge. 27 – 29 A number of issues need closer analysis when dealing with this problem. Increasing the number of questions in a test paper will reduce the probability of passing the test by chance. 15
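The effect of test length on passing by chance can be sketched with a binomial model. The sketch assumes a candidate who guesses blindly on every question, five equally attractive options per question, independent questions, no negative marking, and a pass mark of 50%; all of these are simplifying assumptions made here for illustration, not values taken from the cited literature.

```python
from math import comb


def prob_pass_by_guessing(n_questions: int, n_options: int = 5,
                          pass_mark_percent: int = 50) -> float:
    """Probability of reaching the pass mark by blind guessing alone."""
    p = 1.0 / n_options
    # Smallest whole number of correct answers that meets the pass mark.
    needed = (n_questions * pass_mark_percent + 99) // 100
    return sum(
        comb(n_questions, k) * p**k * (1.0 - p) ** (n_questions - k)
        for k in range(needed, n_questions + 1)
    )


for n in (10, 50, 100):
    print(n, prob_pass_by_guessing(n))
```

Under these assumptions the probability falls steeply as questions are added, which is consistent with the point made above.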


Once the MCQs have been written, they should be criticized by as many people as possible and they should be reviewed after their use. 16 ,18 The most common construction error encountered is the use of imprecise terms. Many MCQs used in medical education contain undefined terms. Furthermore, there is a wide range of opinions among the examiners themselves about the meanings of such vague terms. 30 The stem and options should read logically. It is easy to write items that look adequate but do not constitute proper English or do not make sense. 18


When constructing a paper from a bank of MCQs, care should be taken to ensure that there is a balanced spread of questions across the subject matter of the discipline being tested. 16 A fair or defensible MCQ exam should be closely aligned with the syllabus, be combined with practical competence testing, sample broadly from important content, and be free from construction errors. 19


Extended Matching Items (EMIs)


Several analytic approaches have been used to obtain the optimal number of response options for multiple-choice items. 31 – 35 Focus has shifted from the traditional 3–5 branches to larger numbers of branches. This may be 20–30 in the case of extended-matching items (EMIs), or up to 500 for open-ended or ‘uncued’ formats. 36 However, the use of smaller numbers of options (and more items) results in a more efficient use of testing time. 37


Extended-matching items are multiple choice items organized into sets that use one list of options for all items in the set. There is a theme, an option list, a lead-in statement and at least two item stems. A typical set of EMIs begins with an option list of four to 26 options; more than ten options are usually used. The option list is followed by two or more patient-based items requiring the examinee to indicate a clinical decision for each item. The candidate is asked to match one or more options to each item stem.
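The components of an EMI set described above can be sketched as a simple data structure with a check on the stated requirements (a theme, an option list of four to 26 options, a lead-in statement and at least two item stems). The class and field names are hypothetical, and the placeholder content is not a real examination item.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtendedMatchingSet:
    theme: str
    options: List[str]   # one option list shared by all items in the set
    lead_in: str
    stems: List[str]     # patient-based item stems

    def problems(self) -> List[str]:
        """Return a list of structural problems; empty if none are found."""
        issues = []
        if not self.theme.strip():
            issues.append("a theme is required")
        if not self.lead_in.strip():
            issues.append("a lead-in statement is required")
        if not 4 <= len(self.options) <= 26:
            issues.append("the option list should contain 4 to 26 options")
        if len(set(self.options)) != len(self.options):
            issues.append("options should not be duplicated")
        if len(self.stems) < 2:
            issues.append("at least two item stems are required")
        return issues


# Placeholder set used only to exercise the checks.
example = ExtendedMatchingSet(
    theme="Fatigue",
    options=[f"Diagnosis {i}" for i in range(1, 13)],
    lead_in="For each patient, select the most likely diagnosis.",
    stems=["Stem for patient 1", "Stem for patient 2"],
)
print(example.problems())
```

A set with too few options or a single stem would be reported by `problems()` before it reaches the question bank.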


Extended matching items have become popular in such specialties as internal and family medicine because they can be used to test diagnostic ability and clinical judgment. 15 Their use is likely to increase in postgraduate examinations as well as in undergraduate assessment. 21 Computer-based extended matching items have been used for in-course continuous assessment. 38


EMIs are more difficult, more reliable, more discriminating, and capable of reducing the testing time. In addition, they are quicker and easier to write than other test formats. 39 ,40 Over the past 20 years, multiple studies have found that EMI-based tests are more reproducible (reliable) than other multiple-choice question (MCQ) formats aimed at the assessment of medical decision making. 20 ,41 ,42 There is a wealth of evidence that EMIs are the fairest format. 19


Another more recent development is uncued questions where answers are picked from a list of several hundred choices. These have been advocated for use in assessing clinical judgment, 43 but extended matching questions have surprisingly been shown to be as statistically reliable and valid as uncued queries. 20 ,41 ,44


Extended matching questions overcome the problem of cueing by increasing the number of options and are a compromise between free-response questions and MCQs. This offers an objective assessment that is both reliable and easy to mark. 45 – 47


Nevertheless, MCQs have strengths and weaknesses and those responsible for setting MCQ papers may consider investigating the viability and value of including some questions in the extended matching format. Item writers should be encouraged to use the EMI format with a large number of options because of the efficiencies this approach affords in item preparation. 20 ,39


For the construction of EMIs, the following steps are suggested (Appendix 2):