
International Journal of Education and Pedagogy (IJEAP) eISSN: 2682-8464 [Vol. 3 No. 4 December 2021]

Journal website: http://myjms.mohe.gov.my/index.php/ijeap

EVALUATING MULTIPLE CHOICE QUESTIONS FROM ENGINEERING STATISTICS ASSESSMENT

Aishah Mohd Noor1*

1 Faculty of Applied Science and Humanity, Universiti Malaysia Perlis, Perlis, MALAYSIA

*Corresponding author: aishah.mn@unimap.edu.my

Article Information:

Article history:

Received date: 31 October 2021
Revised date: 17 November 2021
Accepted date: 4 December 2021
Published date: 7 December 2021

To cite this document:

Mohd Noor, A. (2021). EVALUATING MULTIPLE CHOICE QUESTIONS FROM ENGINEERING STATISTICS ASSESSMENT. International Journal of Education and Pedagogy, 3(4), 33-46.

Abstract: Item analysis is an essential aspect of test construction and of the continuous improvement of test items. This study evaluates fifteen test items using the difficulty index, discriminant index, distractor efficiency, and point-biserial correlation. For this purpose, data were obtained from a sample of 38 students' test responses. Two sets of questions were randomly assigned to eighteen and twenty students, respectively.

The analysis was performed using Microsoft Excel. The difficulty index was moderate for both sets, with means and standard deviations of 0.47 ± 0.20 and 0.65 ± 0.21, respectively. Ten items (Set 1) and eight items (Set 2) showed a moderate difficulty index. The discriminant index was reasonably good, with a mean of 0.37 ± 0.18 (Set 1), and marginal, with a mean of 0.29 ± 0.18 (Set 2). The correlation between item scores and total scores was significant for six items (Set 1) at the 5% significance level; all six had moderate difficulty and a very good discriminant index. In Set 2, three out of four such items were significantly correlated, with moderate difficulty and a very good discriminant index. Out of 45 distractors, only four (Set 1) were non-functioning, while Set 2 had twelve non-functioning distractors. The evaluation of the test items provides information on their quality. Items with a moderate difficulty index and very good discriminating power can be included in the question bank. Continuous modification of test items based on item analysis eventually improves the overall quality of assessments.

Keywords: assessment, item analysis, difficulty index, discriminant index, distractor efficiency, point-biserial correlation.


1. Introduction

Assessment is an integral part of the teaching and learning process. Planning, developing, and executing assessment is essential to evaluate students' abilities to use and apply content knowledge.

There are many ways for students to demonstrate mastery of content knowledge, either in a formative or a summative assessment. One of the common question types used in formative assessment is the multiple-choice question (MCQ). Regardless of assessment method, effective assessment must be aligned with the content and context (Yang et al., 2019; Kibble, 2017; Parkes & Zimmaro, 2016).

MCQs are popular because they are easy to implement and score. Automated scoring makes the process quicker, and since there is only one correct answer, MCQs eliminate marker bias. However, more time is required to construct MCQs. A good MCQ should assess students' abilities ranging from knowledge recall to higher-level cognitive demands in Bloom's Taxonomy, for example, understanding, applying, and analyzing (Thompson & O'Loughlin, 2015). Well-written MCQs offer a significant advantage over free-response questions and can measure in-depth understanding (Buckles & Siegfried, 2006). Hence, it is vital to construct each item to target a specific cognitive process (Butler, 2018).

Therefore, assessment approaches should be continuously evaluated, including the suitability of MCQs within the particular subject matter to ensure the quality of the constructed questions and the quality of the assessment as a whole.

This study aims to provide an example of an item analysis of statistics MCQ test items. The analysis performed includes the difficulty index, discriminant index, distractor efficiency, and the point-biserial correlation between individual item scores and total scores. Hence, the objectives were:

1. To determine whether an item is easy, moderate, or difficult.

2. To identify whether an item separates students who show mastery from those who do not.

3. To evaluate which item should be reviewed or eliminated.

2. Literature Review

MCQs consist of a question known as the stem, the correct answer known as the key, and the incorrect alternatives known as distractors. An effective test question assesses course learning outcomes that the learner should master. Hence, a test question should reflect students' ability to understand the tested learning outcomes, not confusion or content beyond learner competence. In addition, a test question should reasonably discriminate students who understand the materials from those who do not. Therefore, it is crucial to perform an item analysis for MCQs to evaluate the quality of the items constructed (Mehta & Mokhasi, 2014).



Item analysis involves collecting, summarizing, and using information from students' responses to assess the quality of test items (Ingale et al., 2017; D'Sa & Visbal-Dionaldo, 2017; Burud et al., 2019).

By evaluating the test items, instructors can diagnose whether the items are reliable and valid (Mahjabeen et al., 2017; Kheyami et al., 2018; Hingorjo & Jaleel, 2012). The analysis allows the instructor to evaluate test items based on their difficulty index and discriminating power. The difficulty index tells whether an item is too easy or too difficult. Meanwhile, the discrimination index tells how well an item discriminates between high achievers and low achievers on a test. In addition to the difficulty and discriminant indices, the distractor efficiency of an item helps identify how an item is being answered and allows the instructor to see whether the distractors are functioning the way they should. Therefore, instructors should incorporate item analysis to guide the construction of test items (Baldwin, 1984).

The advantages of item analysis lie not only in assessing the quality of test items but also in evaluating the effectiveness of teaching and learning strategies. The instructor can identify students' weaknesses in specific concepts or applications that require greater clarity and emphasis. The data and results from item analysis are helpful for interpreting the flaws of test items: good items should be retained, while poor items should be revised or removed. Therefore, using item analysis will consequently improve the quality of an assessment and of the teaching and learning process.

2.1 Problem Statement

MCQs are a practical and easy tool to assess a wide range of content knowledge. The development of MCQs across different cognitive levels is ideal for large groups of students in higher education, especially for non-statistics majors who take the course as part of their program. However, little attention has been given to performing item analysis as part of continuous quality improvement (CQI). Usually, the CQI reported at the end of the semester depends solely on students' final scores to measure course learning outcomes. CQI should impart a data-driven approach that informs students' learning outcome achievement and the course effectiveness with respect to the quality of the questions constructed, rather than limiting the report to students' overall performance. Hence, effective MCQs can guide classroom instruction and assessment and eventually empower the teaching and learning process (Brady, 2005; Malau-Aduli et al., 2014).

3. Method

This study was conducted on undergraduate students at Universiti Malaysia Perlis who took the Engineering Statistics course. Two sets of questions were assigned to two groups of students at random. Each set contained 15 items with four options each; therefore, each set had 45 distractors.

One mark was given for a correct answer and zero marks for an incorrect answer. Hence, the highest possible total mark was 15 and the lowest was zero. The time allocated for the test was 30 minutes.

The topic tested was the one-way analysis of variance (One-Way ANOVA). The cognitive levels tested were remembering, understanding, and applying. The item analyses performed were the difficulty index, discrimination index, point-biserial correlation, and distractor effectiveness.


3.1 Materials

The questions were administered using a simple online tool, Google Form, and assigned to each student through Google Classroom. The responses collected through Google Form were stored in Google Sheets and downloaded as a Microsoft Excel file. The item analysis was performed using Microsoft Excel 365.

3.1.1 Samples

The purposive sampling technique was employed to collect data from a sample of 38 mechanical engineering students at Universiti Malaysia Perlis who took the Engineering Statistics course.

3.2 Measurement

3.2.1 Difficulty Index

The difficulty index (DIF) shows the proportion of students who answered MCQs items correctly. A high difficulty index demonstrates a high ability to respond to the corresponding item, and therefore, the item is considered an easy item. Similarly, for a low index value, the item is regarded as a difficult item. DIF can be from 0 to 1 (0% to 100%), and the formula to calculate DIF for each item is given as follows (Mahjabeen et al., 2017):

DIF = p = (C_U + C_L) / N

where
C_U = number of students who answered correctly in the upper group
C_L = number of students who answered correctly in the lower group
N = total number of students in the upper and lower groups

The upper group is defined as the students with higher total scores after sorting the scores from high to low. In contrast, the lower group refers to those with lower scores. The proportion of students placed in the upper (lower) group can be set at 27%, 33%, or 50%. In most applications, the standard percentage is 27% (see e.g., Kelley, 1939); however, it can range between 27% and 50% depending on the sample size.

In this study, the DIF was calculated based on Ebel & Frisbie (1991):

DIF = p = C / N

where
C = number of students who answered correctly
N = total number of students


An item can be considered good if its difficulty index is between 0.3 and 0.7 (Date et al., 2019). Meanwhile, for an item that is too easy or too difficult, the DIF has lower power to discriminate between the upper- and lower-group students. Therefore, the higher the DIF (an easy item) or the lower the DIF (a difficult item), the lower the item's discriminating power.
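Although the study performed these calculations in Microsoft Excel 365, the same computation is easy to reproduce in code. The following is a minimal Python sketch (not part of the original study) that computes the Ebel & Frisbie DIF for every item from a 0/1 scored response matrix; the `scored` array is invented for illustration, and the 0.25/0.75 classification thresholds follow Table 3.

```python
import numpy as np

# Hypothetical scored responses: rows = students, columns = items,
# 1 = correct, 0 = incorrect (the study stored the same 0/1 layout in Excel).
scored = np.array([
    [1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
])

# Difficulty index per Ebel & Frisbie (1991): proportion answering correctly.
dif = scored.mean(axis=0)                      # C / N for every item at once

# Classification thresholds taken from Table 3 (0.25 and 0.75).
labels = ["Difficult" if p < 0.25 else "Easy" if p > 0.75 else "Moderate"
          for p in dif]

for item, (p, label) in enumerate(zip(dif, labels), start=1):
    print(f"Item {item}: DIF = {p:.4f} ({label})")
```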

3.2.2 Discriminant Index

Discriminant index (DI) refers to the ability of an item to distinguish the higher-performing students (upper group) from lower-performing students (lower group). This index is the difference between the proportion of those who correctly respond to the item from the upper and lower groups. The higher the difference, the better the item discriminates between the two groups. DI can be calculated using the formula:

DI = 2(C_U − C_L) / N

where
C_U = number of students who answered correctly in the upper group
C_L = number of students who answered correctly in the lower group
N = total number of students in the upper and lower groups

The DI takes values from negative one to one. The closer the index is to one, the more effectively the item separates the upper group from the lower group. A positive index shows that a higher proportion of the upper group than of the lower group answered the item correctly. Meanwhile, a negative index indicates that more lower-group students than upper-group students responded to the item correctly. In this case, the item needs further revision since it most likely reflects a flaw. The interpretation of the DI may be affected by the range of students' abilities.
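A corresponding sketch for the DI, again an illustrative Python translation rather than the study's Excel workflow, splits the ranked test-takers into the top and bottom 50% (as described later in Section 3.3) and applies the formula above; the small `scored` matrix is hypothetical.

```python
import numpy as np

def discriminant_index(scored: np.ndarray, total: np.ndarray) -> np.ndarray:
    """DI = 2(C_U - C_L) / N using a top-50% / bottom-50% split on total score.

    `scored` is a students-by-items matrix of 0/1 marks and `total` holds the
    corresponding total scores; with an odd sample size the middle test-taker
    is dropped, as described in Section 3.3.
    """
    order = np.argsort(total)[::-1]          # rank test-takers from high to low
    half = len(total) // 2
    upper, lower = order[:half], order[-half:]
    c_u = scored[upper].sum(axis=0)          # correct counts in the upper group
    c_l = scored[lower].sum(axis=0)          # correct counts in the lower group
    return 2 * (c_u - c_l) / (2 * half)      # N = students actually used

# Tiny hypothetical data set (not the study's responses).
scored = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1]])
total = scored.sum(axis=1)
print(discriminant_index(scored, total))
```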

3.2.3 Distractor Analysis

Distractor analysis examines the number of test-takers in the upper and lower groups selecting each option in an MCQ. It helps identify whether a distractor is functioning as intended. A distractor is effective when some test-takers choose it: it is a functional distractor (FD) if selected by more than 5% of the test-takers; otherwise, it is a non-functional distractor (NFD). The range of distractor efficiency (DE) is 0–100%. A good distractor is one selected by more test-takers in the lower group than in the upper group. An unselected distractor signals inadequate performance of that distractor; hence, it must be modified so that it becomes believable.
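As an illustration of the 5% rule, the hedged Python sketch below tallies the options chosen for a single hypothetical item and flags each distractor as functional or non-functional; the response string and answer key are invented for the example.

```python
import pandas as pd

# Hypothetical raw answers for one item (options A-D), with 'B' as the key.
responses = pd.Series(list("BBADBBCBABBDBBBABB"))
key = "B"

share = responses.value_counts() / len(responses)   # fraction choosing each option

for option in ["A", "B", "C", "D"]:
    if option == key:
        continue                                     # the key is not a distractor
    fraction = share.get(option, 0.0)
    status = "functional" if fraction > 0.05 else "non-functional"
    print(f"Distractor {option}: chosen by {fraction:.0%} -> {status}")
```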


3.2.4 Point-Biserial Correlation

Point-biserial correlation determines possible correlations between each test item and the total test score (Ebel & Frisbie, 1991; Kornbrot, 2005). The single test item score is a dichotomous variable (labelled '1' for a correct answer and '0' for an incorrect answer), while the total test score is continuous.

The correlation between the variables can take any value from negative one to one. A high, positive correlation shows that higher-performing test-takers answered the item correctly more often than the lower-performing group. A negative correlation shows that higher-performing test-takers tended to answer the item incorrectly while lower-performing test-takers answered it correctly.

The point-biserial correlation for the i-th item is calculated based on LeBlanc & Cox (2017):

r_i = ((x̄_C − x̄_W) / s) √(p_c q_w)

where
x̄_C = average total score of test-takers who answered the i-th item correctly
x̄_W = average total score of test-takers who answered the i-th item incorrectly
p_c = proportion of students who answered the i-th item correctly
q_w = proportion of students who answered the i-th item incorrectly
s = population standard deviation of the total test scores
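The sketch below, again illustrative Python rather than the study's Excel procedure, applies the formula above to toy data and cross-checks it against scipy.stats.pointbiserialr, which also returns the p-value reported alongside each correlation in Tables 2 and 6. With the population standard deviation, the formula coincides with the Pearson correlation computed by Excel's CORREL.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 0/1 scores on one item and the corresponding total scores.
item = np.array([1, 0, 1, 1, 0, 1, 0, 1])
total = np.array([12, 5, 10, 14, 6, 9, 4, 11])

# Manual calculation following the formula above.
p_c = item.mean()                      # proportion answering correctly
q_w = 1 - p_c                          # proportion answering incorrectly
x_c = total[item == 1].mean()          # mean total score of correct responders
x_w = total[item == 0].mean()          # mean total score of incorrect responders
s = total.std(ddof=0)                  # population standard deviation of totals
r_manual = (x_c - x_w) / s * np.sqrt(p_c * q_w)

# Built-in equivalent; identical to the Pearson correlation between the item
# and total columns, which is what Excel's CORREL returns. The second value
# is the p-value used for the 5% significance tests in Tables 2 and 6.
r_scipy, p_value = stats.pointbiserialr(item, total)

print(f"manual r = {r_manual:.4f}, scipy r = {r_scipy:.4f}, p = {p_value:.4f}")
```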

3.3 Data Analysis

Steps for Data Preparation:

1. The test items were prepared using the Google Form, and one point for each item was assigned so that the form graded each response automatically.

2. From the Google Form responses tab, a new spreadsheet was created to store all responses.

3. The Google Spreadsheet file was downloaded as a Microsoft Excel file.

4. In Microsoft Excel, all test items' responses and total scores were copied and pasted into a new sheet. There were sixteen columns: fifteen columns for the items (item 1, 2, …, 15) and one column for the total scores. Each row referred to a student's responses.

5. For each test item (represented by a column), the correct response (A, B, C, or D) was replaced with '1' and incorrect answers with '0', as sketched below.
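Steps 4 and 5 can also be reproduced programmatically. The Python sketch below uses pandas to recode hypothetical option letters into 0/1 marks against an assumed answer key; the column names and key are illustrative and not taken from the study.

```python
import pandas as pd

# Hypothetical responses as downloaded from Google Sheets (one column per item).
raw = pd.DataFrame({
    "item1": ["A", "B", "A", "C"],
    "item2": ["D", "D", "B", "D"],
})
answer_key = {"item1": "A", "item2": "D"}    # assumed key: one letter per item

# Step 5: mark 1 where the chosen option matches the key, 0 otherwise.
scored = raw.apply(lambda col: (col == answer_key[col.name]).astype(int))

# Step 4's total scores column.
scored["total"] = scored.sum(axis=1)
print(scored)
```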



Steps for Data Analysis: Descriptive Statistics:

1. The descriptive analysis of test scores was performed using Microsoft Excel 365: from the Data Analysis ToolPak, Descriptive Statistics was selected, the total scores column was chosen as the input range, and the Summary Statistics checkbox was ticked.

2. The distribution of the test scores was visualized using a boxplot. From Insert Chart in Microsoft Excel, choose Boxplot.
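For comparison, the same summary statistics can be generated with pandas; the sketch below mirrors the Excel Descriptive Statistics output for an illustrative score vector (not the study's data).

```python
import pandas as pd

# Illustrative total scores (not the study's data).
totals = pd.Series([3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 12, 12])

summary = pd.Series({
    "Total students": totals.count(),
    "Average": totals.mean(),
    "Min": totals.min(),
    "Max": totals.max(),
    "Median": totals.median(),
    "Standard Deviation": totals.std(),   # sample SD, as in Excel's STDEV.S
    "Kurtosis": totals.kurt(),            # excess kurtosis, as in Excel's KURT
    "Skewness": totals.skew(),            # adjusted skewness, as in Excel's SKEW
})
print(summary)

# A boxplot comparable to the Excel chart (requires matplotlib):
# totals.plot.box()
```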

Steps for Item Analysis: Difficulty, Discriminant, Point-Biserial Correlation, Distractor Efficiency.

1. The total test scores were ranked from highest to lowest scores.

2. The test-takers were divided into two groups: the upper group (the top 50%) and the lower group (the bottom 50%). The middle test-taker was excluded in the case of an odd sample size.

3. The total number of correct responses from both upper (CU) and lower group (CL) was computed for each item.

4. The difficulty index and the discriminant index were calculated for each item.

5. The point-biserial correlation for each item was calculated using CORREL (array1, array2) function in Microsoft Excel 365. For example, the point-biserial correlation for item 1 was computed using CORREL (item 1 column, total scores column).

6. Each distractor selected by the upper and lower groups was recorded in a table to determine the distractor efficiency.
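Putting the steps together, the following illustrative Python sketch builds a per-item summary in the style of Table 2 (CU, CL, DIF, DI, point-biserial correlation, and p-value) for a randomly generated 18 × 15 response matrix; all data and names are assumptions for demonstration, not the study's responses.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
scored = rng.integers(0, 2, size=(18, 15))     # hypothetical 18 students x 15 items
total = scored.sum(axis=1)

order = np.argsort(total)[::-1]                # rank test-takers from high to low
half = len(total) // 2
upper, lower = order[:half], order[-half:]     # top and bottom 50%

rows = []
for i in range(scored.shape[1]):
    c_u = scored[upper, i].sum()
    c_l = scored[lower, i].sum()
    dif = scored[:, i].mean()                  # difficulty index, C / N
    di = (c_u - c_l) / half                    # 2(C_U - C_L) / N
    r, p = stats.pointbiserialr(scored[:, i], total)
    rows.append({"Item": i + 1, "CU": c_u, "CL": c_l,
                 "DIF": round(dif, 4), "DI": round(di, 4),
                 "r_pb": round(r, 4), "p-value": round(p, 4)})

print(pd.DataFrame(rows).to_string(index=False))
```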

4. Results and Discussion

4.1 Item Analysis of Set 1

Two sets of questions, each comprising fifteen items, were analyzed and are presented in tables (Table 1 to Table 8) and figures (Figures 1 and 2). The descriptive statistics in Table 1 show an average score of 7.0556 with a standard deviation of 3.1710. The scores ranged between 3 and 12, and at least 50% of students scored above 6.5. The distribution was right-skewed, as shown in Figure 1, with a kurtosis value of -1.6865; the negative value indicates a distribution that is flatter, with thinner tails, than a normal curve. Lastly, from Figure 1, the dispersion of students' total scores in the top 50% (from the median to the maximum value) was higher than in the bottom 50% (from the median to the minimum value).

Table 1: Descriptive Statistics of Set 1 Total Scores

Total students 18

Average 7.0556

Min 3

Max 12

Median 6.5000

Standard Deviation 3.1710

Kurtosis -1.6865

Skewness 0.0426


Figure 1: Boxplot of Total Scores of Eighteen Students

Table 2: Item Analysis of Set 1 MCQs

Item | Total Correct | Upper Correct (CU) | Lower Correct (CL) | DIF | DI | Point-Biserial Correlation | p-value
1 | 15 | 9 | 6 | 0.8333 | 0.3333 | 0.4434 | 0.0653
2 | 14 | 8 | 6 | 0.7778 | 0.2222 | 0.2698 | 0.2789
3 | 10 | 8 | 2 | 0.5556 | 0.6667 | 0.7417 | 0.0004*
4 | 11 | 8 | 3 | 0.6111 | 0.5556 | 0.6430 | 0.0040*
5 | 6 | 5 | 1 | 0.3333 | 0.4444 | 0.3697 | 0.1311
6 | 8 | 5 | 3 | 0.4444 | 0.2222 | 0.3467 | 0.1587
7 | 6 | 4 | 2 | 0.3333 | 0.2222 | 0.3697 | 0.1311
8 | 3 | 2 | 1 | 0.1667 | 0.1111 | 0.1371 | 0.5876
9 | 3 | 2 | 1 | 0.1667 | 0.1111 | 0.2338 | 0.3504
10 | 10 | 7 | 3 | 0.5556 | 0.4444 | 0.5603 | 0.0156*
11 | 13 | 8 | 5 | 0.7222 | 0.3333 | 0.5747 | 0.0126*
12 | 3 | 3 | 0 | 0.1667 | 0.3333 | 0.4273 | 0.0770
13 | 9 | 7 | 2 | 0.5000 | 0.5556 | 0.4867 | 0.0405*
14 | 6 | 6 | 0 | 0.3333 | 0.6667 | 0.7521 | 0.0003*
15 | 10 | 7 | 3 | 0.5556 | 0.4444 | 0.3426 | 0.1640

DIF: Difficulty Index, DI: Discriminant Index, *significant at 5% significance level

Table 2 shows the DIF, DI, point-biserial correlation, and the p-value of the correlation for each item. The means and standard deviations of DIF and DI were 0.47 ± 0.20 and 0.37 ± 0.18, respectively. Therefore, the DIF was generally moderate with a reasonably good DI. All the point-biserial correlations were positive. Hence, the items were reliable, since higher-performing students chose the correct answer more often than the lower-performing group. In addition, six items (items 3, 4, 10, 11, 13, and 14) showed a significant correlation between the item score and the total scores at the 5% significance level.


Table 3: Interpretation and Analysis of Difficulty Index, Discriminant Index and Distractor Efficiency

Parameter | Item statistic | Interpretation | Total
Difficulty Index | < 0.25 | Difficult | 3
Difficulty Index | 0.25 – 0.75 | Moderate | 10
Difficulty Index | Above 0.75 | Easy | 2
Discriminant Index | < 0.20 | Poor items, to be rejected or improved by revision | 2
Discriminant Index | 0.20 to 0.29 | Marginal items, usually needing and being subject to improvement | 3
Discriminant Index | 0.30 to 0.39 | Reasonably good but possibly subject to improvement | 3
Discriminant Index | 0.40 and higher | Very good items | 7
Distractor Efficiency | > 0.05 | Functional distractor | 41
Distractor Efficiency | ≤ 0.05 | Non-functional distractor | 4

Out of fifteen items, ten had an acceptable or moderate difficulty index (Table 3). Three items were considered difficult, and two of them also had a poor DI. Meanwhile, two items were easy (above 0.75). The two poor items (DI less than 0.20) should be rejected or may require major revision. Three items had a DI between 0.30 and 0.39 and hence were reasonably good. Another three items were considered marginal and can be rewritten for improvement. Seven items, or about 47%, were considered very good items (DI of 0.40 and higher). Meanwhile, the distractor analysis shows that out of 45 distractors, 41 were functional and four were non-functional.

Table 4: Cross-tabulation Between Difficulty and Discriminant Level

DIF \ DI | Poor | Marginal | Reasonably good | Very good | Total
Difficult | 2 | – | 1 | – | 3
Moderate | – | 2 | 1 | 7 | 10
Easy | – | 1 | 1 | – | 2
Total | 2 | 3 | 3 | 7 | 15

The cross-tabulation between DIF and DI for each item shows that 67% of the items had a moderate DIF. Likewise, 67% of the items were good items with a DI of at least 0.30, and 80% of the moderate items had a good DI (at least 0.30); hence, these items can be included in the question bank. Three items had a marginal DI, and one of them was easy; this item can be rewritten in terms of its cognitive level. Finally, the two difficult items with poor DI should be removed or reviewed.


4.2 Item Analysis of Set 2

The descriptive statistics in Table 5 show an average score of 9.8 with a standard deviation of 2.6077. The scores ranged between 5 and 14, and at least 50% of students scored above 9.5. The distribution was left-skewed, as shown in Figure 2, with a kurtosis value of -1.0504; the negative kurtosis indicates that the distribution is flatter, with thinner tails, than a normal curve. Moreover, the dispersion of test scores was higher in the top 50% than in the bottom 50%.

Table 5: Descriptive Statistics of Set 2 Total Scores

Total students 20

Average 9.8

Min 5

Max 14

Median 9.5

Standard Deviation 2.6077

Kurtosis -1.0504

Skewness -0.0736

Figure 2: Boxplot of Total Scores of Twenty Students


Table 6: Item Analysis of Set 2 MCQs

Item | Total Correct | Upper Correct (CU) | Lower Correct (CL) | DIF | DI | Point-Biserial Correlation | p-value
1 | 11 | 7 | 4 | 0.5500 | 0.3000 | 0.4429 | 0.0505
2 | 16 | 9 | 7 | 0.8000 | 0.2000 | 0.5016 | 0.0242*
3 | 18 | 10 | 8 | 0.9000 | 0.2000 | 0.4328 | 0.0567
4 | 11 | 9 | 2 | 0.5500 | 0.7000 | 0.5615 | 0.0100*
5 | 17 | 10 | 7 | 0.8500 | 0.3000 | 0.3526 | 0.1273
6 | 14 | 8 | 6 | 0.7000 | 0.2000 | 0.2919 | 0.2117
7 | 9 | 7 | 2 | 0.4500 | 0.5000 | 0.5457 | 0.0128*
8 | 18 | 10 | 8 | 0.9000 | 0.2000 | 0.4328 | 0.0567
9 | 8 | 5 | 3 | 0.4000 | 0.2000 | 0.1847 | 0.4356
10 | 11 | 8 | 3 | 0.5500 | 0.5000 | 0.4033 | 0.0778
11 | 18 | 9 | 9 | 0.9000 | 0.0000 | 0.1049 | 0.6598
12 | 4 | 3 | 1 | 0.2000 | 0.2000 | 0.4328 | 0.0567
13 | 17 | 9 | 8 | 0.8500 | 0.1000 | 0.0771 | 0.7465
14 | 14 | 8 | 6 | 0.7000 | 0.2000 | 0.5495 | 0.0121*
15 | 10 | 8 | 2 | 0.5000 | 0.6000 | 0.5508 | 0.0118*

DIF: Difficulty Index, DI: Discriminant Index, *significant at 5% significance level

The means and standard deviations of DIF and DI were 0.65 ± 0.21 and 0.29 ± 0.18, respectively. Therefore, the DIF was moderate with a marginal DI. All the point-biserial correlations were positive. Hence, the items were reliable, since higher-performing students chose the correct answer more often than the lower-performing group. In addition, five items (items 2, 4, 7, 14, and 15) showed a significant correlation between the item score and the total scores at the 5% significance level.

Table 7: Interpretation and Analysis of Difficulty Index, Discriminant Index and Distractor Efficiency

Parameter | Item statistic | Interpretation | Total
Difficulty Index | < 0.25 | Difficult | 1
Difficulty Index | 0.25 – 0.75 | Moderate | 8
Difficulty Index | Above 0.75 | Easy | 6
Discriminant Index | < 0.20 | Poor items, to be rejected or improved by revision | 2
Discriminant Index | 0.20 to 0.29 | Marginal items, usually needing and being subject to improvement | 7
Discriminant Index | 0.30 to 0.39 | Reasonably good but possibly subject to improvement | 2
Discriminant Index | 0.40 and higher | Very good items | 4
Distractor Efficiency | > 0.05 | Functional distractor | 33
Distractor Efficiency | ≤ 0.05 | Non-functional distractor | 12

Out of fifteen items, six were easy (Table 7), with two of them having a poor DI (Table 8). This result is consistent with the previous analysis, in which items at the extreme end of the difficulty range also had a poor DI. Meanwhile, eight items were considered moderate (Table 7), with four of them having a very good DI (Table 8). One item was easy; however, its distractors were acceptable (marginal DI), and hence this item may be retained.


Out of the seven marginal items (Table 8), the three easy items should be examined; either the distractors' lack of efficacy or the question type needs some improvement. Next, two items were reasonably good, one with a moderate and one with an easy DIF (Table 8). From the cross-tabulation, six items (five moderate and one easy) with a good DI can be included in the question bank.

An easy item does not necessarily need to be removed or reviewed if its distractors are functioning. A combination of easy items (lower-level cognitive), moderate items (mixed lower and higher level), and difficult items (higher-level cognitive) will produce a balanced set of test items. Finally, the distractor analysis shows that out of 45 distractors, 33 were functional and twelve were non-functional.

Table 8: Cross-tabulation Between Difficulty and Discriminant Level

DIF \ DI | Poor | Marginal | Reasonably good | Very good | Total
Difficult | – | 1 | – | – | 1
Moderate | – | 3 | 1 | 4 | 8
Easy | 2 | 3 | 1 | – | 6
Total | 2 | 7 | 2 | 4 | 15

The cross-tabulation between DIF and DI for each item shows that 53% of the items had moderate difficulty. Half of the moderate items had a very good DI, and one moderate item had a reasonably good DI; these items can be included in the question bank. The two easy items with poor DI should be removed or reviewed. The marginal items (47% of the items) were split equally between moderate and easy DIF, with only one considered difficult; therefore, their distractors can be improved.

5. Conclusion

The multiple-choice test is a standard test format used in formative and summative assessments. Well-structured MCQs can examine and evaluate distinct cognitive levels. In addition, MCQ test items allow a wider coverage of content knowledge than open-ended items. In this study, an item analysis of fifteen MCQs from each of two question sets was performed. Overall, ten questions from Set 1 can be included in the question bank: their DI was at least 0.30, and their DIF ranged from easy through moderate to difficult. On the other hand, six questions from Set 2 were acceptable for the question bank. However, the distractor efficiency analysis showed that 27% of the distractors were not functioning.

Therefore, the distractors should be revised.

Item analysis is an essential aspect of test construction. The procedure to perform item analysis is straightforward. It allows for a data-driven approach in deciding on types of questions (factual, conceptual, procedural knowledge) that should be included and the range of cognitive levels required.

Moreover, the test score validity and reliability can be continuously improved through item analysis and regular adjustments. In the long run, instructors can ensure the quality of items and the quality of the test.


6. Acknowledgement

This research received no specific grant from any funding agency.

References

Baldwin, B. A. (1984). The role of difficulty and discrimination in constructing multiple-choice examinations: with guidelines for practical application. Journal of Accounting Education, 2(1), 19-28.

Brady, A. M. (2005). Assessment of learning with multiple-choice questions. Nurse Education in Practice, 5(4), 238-242.

Buckles, S., & Siegfried, J. J. (2006). Using multiple-choice questions to evaluate in-depth learning of economics. The Journal of Economic Education, 37(1), 48-57.

Burud, I., Nagandla, K., & Agarwal, P. (2019). Impact of distractors in item analysis of multiple- choice questions. International Journal of Research in Medical Sciences, 7(4), 1136.

Butler, A. C. (2018). Multiple-choice testing in education: Are the best practices for assessment also good for learning? Journal of Applied Research in Memory and Cognition, 7(3), 323-331.

D'Sa, J. L., & Visbal-Dionaldo, M. L. (2017). Analysis of Multiple Choice Questions: Item Difficulty, Discrimination Index and Distractor Efficiency. International Journal of Nursing Education, 9(3).

Date, A. P., Borkar, A. S., Badwaik, R. T., Siddiqui, R. A., Shende, T. R., & Dashputra, A. V. (2019). Item analysis as tool to validate multiple choice question bank in pharmacology. International Journal of Basic & Clinical Pharmacology, 8(9), 1999-2003.

Ebel, R. L., & Frisbie, D. A. (1991). Essentials of Educational Measurement (5th ed.). Englewood Cliffs, NJ: Prentice-Hall.

Hingorjo, M. R., & Jaleel, F. (2012). Analysis of one-best MCQs: The difficulty index, discrimination index and distractor efficiency. Journal of the Pakistan Medical Association, 62, 142-147.

Ingale, A. S., Giri, P. A., & Doibale, M. K. (2017). Study on item and test analysis of multiple choice questions amongst undergraduate medical students. International Journal of Community Medicine and Public Health, 4(5), 1562-1565.

Kelley, T. L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30(1), 17.

Kheyami, D., Jaradat, A., Al-Shibani, T., & Ali, F. A. (2018). Item analysis of multiple-choice questions at the department of paediatrics, Arabian Gulf University, Manama, Bahrain. Sultan Qaboos University Medical Journal, 18(1), e68.

Kibble, J. D. (2017). Best practices in summative assessment. Advances in Physiology Education, 41(1), 110-119.

Kornbrot, D. (2005). Point biserial correlation. Wiley StatsRef: Statistics Reference Online.

LeBlanc, V., & Cox, M. A. (2017). Interpretation of the point-biserial correlation coefficient in the context of a school examination. Tutorials in Quantitative Methods for Psychology, 13(1), 46- 56.

Mahjabeen, W., Alam, S., Hassan, U., Zafar, T., Butt, R., Konain, S., & Rizvi, M. (2017). Difficulty index, discrimination index and distractor efficiency in multiple choice questions. Annals of PIMS-Shaheed Zulfiqar Ali Bhutto Medical University, 13(4), 310-315.


Malau-Aduli, B. S., Assenheimer, D., Choi-Lundberg, D., & Zimitat, C. (2014). Using computer- based technology to improve feedback to staff and students on MCQ assessments. Innovations in Education and Teaching International, 51(5), 510-522.

Mehta, G., & Mokhasi, V. (2014). Item analysis of multiple-choice questions - an assessment of the assessment tool. International Journal of Health Sciences and Research, 4(7), 197-202.

Parkes, J., & Zimmaro, D. (2016). Learning and assessing with multiple-choice questions in college classrooms. Routledge.

Thompson, A. R., & Husmann, P. R. (2020). Developing Multiple-Choice Questions for Anatomy Examinations. In Teaching Anatomy (pp. 405-416). Springer, Cham.

Yang, B. W., Razo, J., & Persky, A. M. (2019). Using testing as a learning tool. American Journal of Pharmaceutical Education, 83(9).
