Beyond Final Answers: Explainable AI for Step-Level Formative Feedback in Transformational Geometry

Isbadar Nursit(1), Anies Fuady(2), Ahmad Sufyan Zauri(3), Muneeroh Phadung(4)


(1) Department of Mathematics Education, Universitas Islam Malang, Indonesia
(2) Department of Mathematics Education, Universitas Islam Malang, Indonesia
(3) Department of Mathematics Education, Universitas Islam Malang, Indonesia
(4) Program in Teaching Science, Mathematics, and Computer, Yala Rajabhat University, Thailand

Corresponding author: Isbadar Nursit (1)


© 2025 Isbadar Nursit, Anies Fuady, Ahmad Sufyan Zauri

Providing high-quality feedback on students’ solution steps in transformational geometry is challenging in large university classes. Explainable AI (XAI) offers a potential way to automate step-level assessment while keeping model decisions transparent and educationally meaningful. This study examines whether an XAI-based system can validly and reliably score students’ solution steps in transformational geometry, how faithful and fair its explanations are, and whether step-level XAI feedback improves learning in an authentic course setting. A two-phase quantitative design was used, complemented by a small qualitative component. In Phase 1, XAI-based step scores were compared with expert ratings on items involving reflections, rotations, translations, and compositions of transformations, using a rubric with eight indicators (GT1–GT8); explanation fidelity and subgroup fairness were also evaluated. In Phase 2, a clustered quasi-experiment compared XAI-based feedback with conventional rubric-based feedback in two classes. Brief semi-structured interviews with six students from the XAI class explored how they interpreted and used the feedback. The results show that the XAI system approximated expert step scoring with acceptable agreement, produced explanations whose highlighted features were meaningfully related to predictions, and exhibited no large performance disparities across gender or study programme. In the classroom experiment, the XAI group achieved moderately higher post-test scores than the control group, with gains concentrated on indicators related to parameter specification and composition of transformations. Interview data suggest that students used the XAI interface to locate and revise specific steps while still relying on the lecturer for deeper conceptual clarification. Overall, the findings indicate that, when aligned with a domain-specific rubric, XAI-based step assessment can serve as scalable, task- and process-level formative feedback in transformational geometry, best used in a human-in-the-loop configuration that complements rather than replaces teacher feedback.


Keywords: artificial intelligence, mathematics assessment, quasi-experimental design, transformational geometry.





This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.