Development of a Science Literacy Test for Junior High School Students Based on the PISA 2025 Framework


(1) Department of Physics Education, Syarif Hidayatullah State Islamic University, Indonesia
(2) Department of Physics Education, Syarif Hidayatullah State Islamic University, Indonesia
(3) Department of Physics Education, Syarif Hidayatullah State Islamic University, Indonesia
(4) Department of Physics Education, Syarif Hidayatullah State Islamic University, Indonesia


Copyright (c) 2025 Farah Amara Valio
This study aims to develop a valid, reliable, and practical science literacy test instrument based on the PISA 2025 framework, as a tool for assessing junior high school students' science literacy contextually and in depth. The low level of science literacy in Indonesia, where 53.60% of students fall into the very low category, underscores the urgency of an evaluation instrument that comprehensively represents scientific thinking skills. This low achievement is closely related to students' lack of practice with international assessment-style test instruments such as PISA during learning. The research method was Research and Development (R&D) with the 4-D model (Define, Design, Develop, Disseminate). The instrument was built around the four dimensions of science literacy in the PISA 2025 framework (competency, context, knowledge, and cognitive level) and covers three science topics: electricity, waves, and magnetism. Content validation was carried out by subject-matter, instrument, and language experts, and an empirical trial was conducted to evaluate item quality. The results show that the instrument has high content validity (CVI = 0.93; CVR = 0.8–1.0), with an average item validity of 0.60. Reliability was good: McDonald's omega was 0.79 for the combined essay and complex multiple-choice items, and Cronbach's alpha was 0.68 for the multiple-choice items analyzed separately. Most items were of moderate difficulty (72.73%) and showed good discriminating power (63.63%). In addition, a practicality score of 78.85% indicates that the instrument is easy to use in an educational setting. The instrument is therefore suitable as a science literacy assessment tool aligned with the PISA 2025 framework and supports the development of higher-order thinking skills in national assessments.
Keywords: science literacy, instruments, assessment, PISA.
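As an editorial gloss on the indices reported above, the sketch below gives the standard formulations, assuming the study follows Lawshe's content validity ratio, a mean-based scale-level CVI, and classical test theory item statistics; the full paper should be consulted for the exact variants used.

\[
\mathrm{CVR}=\frac{n_e-\tfrac{N}{2}}{\tfrac{N}{2}},\qquad
\mathrm{CVI}=\frac{1}{k}\sum_{i=1}^{k}\mathrm{CVR}_i
\]

\[
\alpha=\frac{k}{k-1}\left(1-\frac{\sum_{i}\sigma_i^{2}}{\sigma_X^{2}}\right),\qquad
\omega=\frac{\bigl(\sum_{i}\lambda_i\bigr)^{2}}{\bigl(\sum_{i}\lambda_i\bigr)^{2}+\sum_{i}\theta_i}
\]

Here \(n_e\) is the number of experts rating an item essential, \(N\) the total number of experts, \(k\) the number of items, \(\sigma_i^{2}\) the variance of item \(i\), \(\sigma_X^{2}\) the total-score variance, \(\lambda_i\) the item factor loadings, and \(\theta_i\) the item error variances. Under the same assumptions, item difficulty is the proportion of examinees answering correctly, \(P = B/J_s\), and discrimination is the difference in that proportion between upper and lower scoring groups, \(D = P_A - P_B\), with \(0.31 \le P \le 0.70\) conventionally read as moderate difficulty and roughly \(0.41 \le D \le 0.70\) as good discrimination.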

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.