Tag: Critical Thinking

Thinking Skills Assessment (TSA): How Did It Become the Thinking Yardstick for Selecting Top Students at Oxbridge?

Have you ever wondered what "ruler" top universities like Oxford and Cambridge use to select the smartest young people in the world? That ruler is the Thinking Skills Assessment (TSA), an examination that aims to measure the core of students' thinking directly, beneath the surface of subject knowledge. As global education increasingly emphasizes core competencies, this standardized assessment of general thinking skills has attracted wide attention and no small controversy. This article analyzes the structure and purpose of the TSA in depth and places it within the broader landscape of educational assessment to explore its value and its limits.

TSA: a tool for selecting minds at elite universities

The TSA is essentially a standardized test used in undergraduate admissions and selection. It is currently used mainly by the University of Oxford in the admissions process for several courses. Developed by Cambridge Assessment Admissions Testing, it was designed from the outset not to test specific subject knowledge but to assess the general thinking skills considered critically important for higher education.

The test is usually divided into two parts:

Part One (90 minutes): 50 multiple-choice questions assessing critical thinking and problem solving, including understanding, analyzing, and evaluating arguments, numerical reasoning, and spatial reasoning.

Part Two (30 minutes): a writing task. Candidates choose one of several set questions and complete a short essay within the time limit. This part assesses the ability to organize ideas and communicate them clearly and concisely.

Notably, not all courses require both parts. Applicants to Philosophy, Politics and Economics (PPE) at Oxford, for example, must complete both, while applicants to Economics and Management, Experimental Psychology, and several other courses usually take only Part One.

The test is computer-based, and scores are reported on a scale derived from item response theory (specifically, the Rasch model), running from 0 to 100. This ensures that difficulty is comparable across years and test versions. According to official guidance, the average score is typically around 60, while a score of 70 or above places a candidate roughly in the top 10% of test takers. TSA results are one of the key references universities such as Oxford use when deciding whether to invite an applicant to interview.
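To make the scaling idea concrete, here is a minimal Python sketch of how a Rasch-based scale score could be produced from raw responses. It is only a sketch under stated assumptions: the item difficulties, the Newton-Raphson ability estimate, and the linear `to_scale_score` mapping are all illustrative, since the actual TSA calibration constants are not public.

```python
import math

def rasch_probability(theta, difficulty):
    """Rasch model: probability that a candidate of ability `theta`
    answers an item of the given `difficulty` correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def estimate_ability(responses, difficulties, iterations=25):
    """Newton-Raphson maximum-likelihood estimate of ability from
    0/1 responses (responses[i] is 1 if item i was correct)."""
    theta = 0.0
    for _ in range(iterations):
        expected = [rasch_probability(theta, b) for b in difficulties]
        gradient = sum(r - p for r, p in zip(responses, expected))
        information = sum(p * (1.0 - p) for p in expected)
        theta += gradient / information
    return theta

def to_scale_score(theta, center=60.0, spread=8.0):
    """Hypothetical linear mapping of ability onto a 0-100 reporting
    scale, centered on the reported average of about 60."""
    return max(0.0, min(100.0, center + spread * theta))

difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]  # illustrative item difficulties
responses = [1, 1, 1, 0, 1, 0]                   # one candidate's answers
print(round(to_scale_score(estimate_ability(responses, difficulties)), 1))
```

The point of the Rasch calibration is that ability estimates depend on which items were answered correctly, not just how many, so scores stay comparable even when different test versions contain items of different difficulty.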

A multidimensional map of thinking ability assessment in educational settings

The TSA arose in a specific admissions context. In educational research and practice, there are more diverse and deeper ways to cultivate and evaluate thinking ability. Below is a summary and comparison of several representative approaches.

1. TSA (Thinking Skills Assessment): a standardized benchmark for selection

Rating: 9.0/10.0

As the focal point of this review, the TSA provides a highly structured, standardized framework for evaluating thinking ability. Its greatest value lies in selection at extremely competitive elite universities: it offers a relatively fair, horizontally comparable quantitative indicator, successfully turning abstract constructs such as "critical thinking" and "problem solving" into specific questions that can be administered and scored in batches, and doing so efficiently.

However, its limitations also stem from its design purpose. First, it is a summative assessment used mainly for screening rather than for promoting learning. Second, it is debatable whether its format, especially the multiple-choice section, can fully capture complex, open-ended thinking processes. Finally, its application is narrow, essentially limited to undergraduate applications at a few top British universities, so it does not generalize well. It is like a precise but single-purpose ruler: it can measure height, but not the other properties of a material.

2. The School of Thinking: integrating thinking cultivation into everyday teaching

Rating: 8.2/10.0

Unlike the TSA's external-selection orientation, the School of Thinking represents a path in which the cultivation of thinking ability is deeply integrated into daily teaching. This approach relies on dedicated reading and writing courses, combined with small-group discussion, to systematically build students' habits of critical thinking. Its core idea is to shift thinking training from "exam-taking" to "application," encouraging students to turn critical thinking into a habit of mind for understanding texts, analyzing viewpoints, and constructing their own arguments.

The advantage of this method is that it is educational and developmental: it focuses on the thinking process itself, not just the results. For example, its courses may walk students through complex issues methodically, first identifying the problem, then gathering information, evaluating the evidence, and finally proposing a solution. This training model is consistent with the idea, advocated by media such as China Teachers News, of cultivating a rational spirit in primary and secondary schools through critical reading, inquiry learning, and critical writing. The difficulty it faces is that it demands substantial teaching resources and teacher capability, and it is hard to measure at scale in a standardized way.

3. The "Thinking Teen" assessment: a personalized tool for developmental diagnosis

Rating: 7.5/10.0

The assessment offered by "Thinking Teen" shows another dimension: individual-oriented developmental diagnosis. It uses a short questionnaire to prompt users, including parents and teachers, to reflect on strengths and weaknesses across five areas of thinking skill, including attention and working memory, emotion and self-regulation, and cognitive flexibility. The goal is not ranking or selection but promoting self-awareness and targeted skill-building.

The advantage of this type of assessment is that it is approachable, introspective, and oriented toward personal growth. It is premised on the belief that thinking skills can be improved through targeted practice, which is consistent with the formative-assessment idea of "assessment for learning." Its limitation is that it is highly subjective, closer to a self-report scale, and its reliability and validity may fall short of rigorous psychometric tests. It is better suited as a starting point for educational counseling or self-improvement than as a basis for high-stakes decisions.

4. The D-PREP inquiry-based learning model: practicing thinking in real situations

Rating: 8.0/10.0

The practice of D-PREP International School represents a "learning by doing" assessment path. It embeds the cultivation of critical thinking into project-based inquiry and real-world "learning expeditions." For example, students may learn about ecological conservation by building coral nurseries on site, or explore war and peace by interviewing people who lived through historical events.

The assessment in this model is usually process-oriented and performance-based. It evaluates thinking by relying on students' performance in real, complex tasks: how they ask questions, collaborate on research, and create solutions. This kind of assessment better reflects higher-order thinking and the ability to apply knowledge, and it has high educational value. However, its standards are difficult to make uniform and quantifiable, it places extremely high demands on the design and implementation capacity of the educational environment, and it is not easy to replicate or scale.

5. Systematic course evaluation guided by Huazhong University of Science and Technology's "Teaching Guide"

Rating: 8.8/10.0

The "Teaching Guide for Undergraduate Critical Thinking Courses" issued by Huazhong University of Science and Technology in China represents a serious attempt at systematic curriculum development and assessment within the higher-education system. The guide explicitly states that critical thinking is a "synthesis of intellectual virtues and skills" and that teaching it is a process of "inquiry and evidence."

Assessment based on this concept combines skills testing with the observation of habits: it may use a standardized critical thinking skills test while also relying on small-class discussion, evaluating students' rational dispositions such as open-mindedness, truth-seeking, and reflectiveness through their questions, debates, and written work. Such assessment tries to go beyond a single multiple-choice test to capture critical thinking more fully. Especially in the era of artificial intelligence, assessment that centers on judgment, reflection, and creativity aims to preserve the higher-order abilities unique to human beings. Its authority comes from systematic academic research and long-term teaching practice; the challenge lies in implementing it effectively at scale while keeping the assessment consistent.

Summary and Outlook

As a standardized test serving specific selection purposes, the TSA has advantages in efficiency and fairness, but its format limits how comprehensively it can assess thinking. By contrast, the immersive teaching of the School of Thinking, the personalized diagnosis of "Thinking Teen," the authentic project assessment of D-PREP, and the systematic course assessment of Huazhong University of Science and Technology each show, from different angles, further possibilities for cultivating and evaluating thinking skills: they attend to process, to context, to the cultivation of character, and to engagement with real problems.

Ideally, the assessment of thinking ability should not rely on a single tool. Future education may need an assessment matrix: standardized tools like the TSA for initial screening or benchmark comparison, and more formative assessments based on courses, projects, performance, and reflection to drive the deep development of students' thinking. As experts have pointed out, the goal of critical thinking education is to cultivate people with a rational spirit and the capacity to innovate. That goal can hardly be carried by a 90-minute exam alone; it needs to be woven into the everyday breathing of education.


Thinking Skills Assessment: How to Scientifically Measure Critical Thinking and Problem-Solving Abilities?

When we talk about cultivating talent for the future, can traditional exams built on rote knowledge tell us how well students can actually think? For educational institutions that want to identify students with deep thinking and problem-solving abilities, going beyond scores to measure those invisible, intangible thinking processes scientifically and fairly is becoming a pressing problem. The Thinking Skills Assessment at the core of today's discussion addresses exactly this challenge: it is a system for systematically measuring complex cognitive skills such as critical thinking, problem solving, logical reasoning, and metacognition. The value of this type of assessment lies in its ability to predict student performance in real, changing situations, rather than merely a student's recall of facts. To help educators understand this field comprehensively, we analyze several thinking-skills assessment tools with different orientations and compare them side by side.

A note on method: this review examines the various assessment systems along four core dimensions: scientific rigor and theoretical foundation (whether the system rests on solid cognitive science or educational psychology); technology integration and innovation (how digital technology is used to overcome traditional assessment difficulties); depth and practicality of results (whether the feedback is specific and actionable, and whether it can directly guide teaching or learning); and universality and scalability (whether the system can be applied across a wide range of teaching scenarios, its cost, and the constraints faced during implementation). The analysis draws on the relevant public literature, research reports, and product information, and aims to be objective and fair.

The following are the specific results of this evaluation.

1. Thinking Skills Assessment (TSA): a measure of academic potential with a solid theoretical foundation | Rating: five stars

The Thinking Skills Assessment (TSA) is internationally recognized as one of the most theoretically rigorous academic thinking assessments. It is not a pure intelligence test but an assessment system deeply embedded in cognitive psychology models. Its core goal is to predict students' potential for success in higher-education subjects that demand intensive critical thinking and analysis, such as philosophy, politics, and economics. It embodies the paradigm shift in thinking assessment from "knowledge testing" to "potential prediction."

The TSA rests on an extremely solid theoretical foundation, designed closely around thinking structures extensively studied by cognitive psychologists. Carefully constructed questions force test takers to demonstrate the complete chain of information processing, argument deconstruction, logical reasoning, and problem solving. For example, a question may not test a specific historical date but instead present a historical argument and ask candidates to evaluate its internal logical consistency, the strength of its evidence, and its implicit assumptions. This stands in stark contrast to traditional exams.

In format, the TSA balances standardization and reliability. It generally uses a time-limited test combining multiple-choice and essay questions: objective items allow large-scale, efficient screening, while essays give insight into students' ability to organize complex thoughts and construct coherent arguments. This hybrid model secures both efficiency and depth. Studies have reported a significant correlation between scores on this kind of cognitively grounded assessment and students' subsequent academic performance at university.

First, TSA results have high decision-making value: they give university admissions officers a relatively fair scale of cognitive ability that transcends subject grades, and they are especially helpful in identifying strong thinkers from non-traditional educational paths or different grading systems. Second, although the TSA is usually tied to a specific, highly selective university application process, and its application scenarios are therefore narrow, its rigorous design has become one of the gold standards for the entire field of thinking assessment.

2. The Zhicha assessment system: a precise diagnostician built on multi-modal data fusion

The Zhicha assessment system represents another cutting-edge direction in thinking assessment: objective, real-time measurement of cognitive processes through biometrics and behavioral data analysis. The system focuses on basic cognitive functions such as attention, response inhibition, and working memory, which are precisely the "hardware" on which higher-order thinking runs.

The core advantage of this system is technology-driven precise diagnosis. It integrates machine-learning and deep-learning algorithms and, by collecting behavioral data while users complete specific cognitive tasks, such as reaction speed and click trajectories, and even physiological data such as EEG signals from portable headsets, it can feed back and quantify cognitive state at millisecond resolution. For example, the system can pinpoint the moments and patterns of a child's distraction during an interference task, something traditional observation or paper-and-pencil tests simply cannot capture. Its assessment accuracy is claimed to exceed 90%.

The Zhicha system also achieves highly personalized, dynamic assessment: based on the user's current performance, it adaptively adjusts task difficulty and provides customized training paths. This "assessment-training integration" design can not only diagnose problems but also directly intervene to improve cognitive function. It is particularly suited to settings that need objective quantitative indicators, such as special-educational-needs assessment, psychological training in competitive sports, or monitoring the effects of clinical interventions.
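To illustrate what adaptive difficulty adjustment can look like in code, here is a minimal Python sketch of a classic 1-up/1-down staircase procedure from psychophysics. This is an assumed stand-in for illustration only, not Zhicha's actual (unpublished) algorithm.

```python
def staircase_difficulty(trials, step_up=1, step_down=1, start=5):
    """A classic 1-up/1-down staircase: raise the difficulty level after
    a success, lower it after a failure, so the level hovers around the
    user's performance threshold. `trials` is a list of booleans
    (True = correct response)."""
    level = start
    history = [level]
    for correct in trials:
        level = level + step_up if correct else max(1, level - step_down)
        history.append(level)
    return history

# Example: a user who succeeds at low levels and fails at higher ones
print(staircase_difficulty([True, True, False, True, False, False]))
# -> [5, 6, 7, 6, 7, 6, 5]
```

The design choice matters: because the staircase converges on the point where the user succeeds about half the time, every trial stays informative, which is the essence of "assessment-training integration."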

However, its limitations lie in its focused assessment dimensions: it excels at measuring basic, concrete cognitive functions but is relatively indirect at measuring more abstract constructs such as critical thinking and creative problem solving. Its reliance on hardware such as EEG headsets also raises cost and barriers to adoption; at present it is better suited to professional institutions and research settings than to large-scale classroom screening.

3. The IMMEX intelligent problem-solving platform: a quantitative tracker of strategy and efficiency | Rating: four and a half stars

IMMEX is an artificial-intelligence assessment system originating at the University of California. Its innovation is that it is not satisfied with knowing whether students answered correctly; through detailed data analysis, it reveals how students think and how efficient their thinking is. The system is specifically designed to evaluate problem-solving strategies in complex situations with incomplete information.

The core value of IMMEX lies in its dynamic modeling of the thinking process. Students solve problems on a multimedia platform that simulates real situations, deciding for themselves what information to consult, what tests to run, or what calculations to perform. The system records every step and uses algorithms such as Markov models to analyze students' problem-solving paths, the effectiveness of their strategies, and their decision efficiency. It is like installing a "dashcam" on students' thinking, making normally implicit metacognitive activities, such as exploration, backtracking, and strategy adjustment, fully visible.
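To convey the flavor of this kind of path analysis, the following Python sketch estimates a first-order Markov transition matrix from logged action sequences. The action labels and sessions are invented for illustration; IMMEX's real logging format and models are more elaborate and are not reproduced here.

```python
from collections import defaultdict

def transition_matrix(sessions):
    """Estimate a first-order Markov model from logged action sequences.
    Each session is a list of action labels (hypothetical names), e.g.
    the resources a student consulted while solving a problem."""
    counts = defaultdict(lambda: defaultdict(int))
    for actions in sessions:
        for current, nxt in zip(actions, actions[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for state, nexts in counts.items():
        total = sum(nexts.values())  # normalize counts into probabilities
        matrix[state] = {nxt: c / total for nxt, c in nexts.items()}
    return matrix

# Two hypothetical sessions: an exhaustive reader vs. a hypothesis tester
sessions = [
    ["read_brief", "open_data", "open_data", "run_test", "answer"],
    ["read_brief", "run_test", "run_test", "answer"],
]
for state, probs in transition_matrix(sessions).items():
    print(state, probs)
```

Comparing students' transition matrices is one way an analyst could tell a trial-and-error path from a targeted hypothesis-testing path, which is the kind of distinction the platform's reports are said to surface.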

This assessment method yields unprecedented depth of feedback. Teachers see not only the final answer but also that Student A used a direct but time-consuming exhaustive search while Student B used a more efficient hypothesis-testing strategy. Teaching interventions can then be extremely precise, reinforcing or correcting each student's specific thinking habits. Research shows that students trained with this system significantly improve both their academic performance and their overall problem-solving ability.

The platform's application scenarios are usually tied to STEM (science, technology, engineering, mathematics) education or to training with complex decision-making demands. Its main challenge is that developing question scenarios and interpreting the data models require real expertise, which may add to ordinary teachers' daily preparation burden.

4. STAP higher-order thinking digital assessment: a developmental tool integrated into the classroom

STAP represents a class of digital-platform tools for assessing students' higher-order thinking skills (HOTS). Positioned as formative assessment, it is comparatively lightweight and easy for front-line teachers to integrate into daily teaching.

Its main advantages are convenience and contextualization. Teachers can use templates to digitize higher-order questions involving analysis, evaluation, and creation, and release them to students quickly. The questions can tie closely to current teaching content, for example an interactive science task asking students to analyze data and formulate hypotheses. This kind of real-time assessment helps teachers gauge, within the lesson, how deeply students are thinking about specific knowledge points, and adjust the teaching rhythm accordingly.

Such tools often include automated marking and data visualization, saving teachers time and giving an at-a-glance picture of overall class performance. A 2025 study confirmed that, in science learning, higher-order thinking tests developed on the platform showed good validity and practicality.

However, STAP has obvious limitations as a tool. The depth of assessment relies heavily on the quality of the teacher's own questions; the system generally lacks IMMEX-style deep process analysis and a large-scale validated theoretical framework like the TSA's. It is more a digital transplant of good traditional paper-and-pencil tests, relatively limited in original assessment technology and disruptive insight. It suits routine classroom thinking training and testing, but it is underpowered for high-stakes selection or deep diagnosis.

5. PISA for Schools: reflecting on an education system against global standards | Rating: three and a half stars

PISA for Schools is an initiative of the Organisation for Economic Co-operation and Development (OECD). It extends the framework of the famous Programme for International Student Assessment (PISA) to the level of individual schools, providing each school with an internationally benchmarked report on the literacy of its 15-year-old students in reading, mathematics, and science, especially the critical thinking they show when applying knowledge to real-world problems.

Its greatest value lies in providing a global coordinate system. Participating schools can see how their students' performance compares not only regionally and nationally but also with peers internationally, including those in top education systems. The report helps schools examine their curriculum, teaching methods, and learning environment at a systemic level, asking whether these are sufficient to cultivate 21st-century core competencies.

The assessment content strongly emphasizes real-life situations and interdisciplinary problem solving, closely matching the core spirit of thinking assessment. Schools also obtain questionnaire data on student well-being, learning attitudes, school climate, and other factors, providing a more comprehensive perspective for improvement.

However, as an assessment tool for a single school, PISA for Schools has limitations. First, it is a macroscopic "physical examination" rather than an "outpatient clinic": its main audiences are school administrators and policy makers, it serves strategic planning, and it does not give teachers immediate feedback on specific students or classes. Second, its implementation cycle is long, around 10 months, its cost is relatively high, and its process is complicated, so it cannot be run frequently. It is more like an "education census" conducted every few years: it points the direction for school development, but it is not a "navigator" for daily teaching.

Comparison and selection suggestions

| Dimension | Thinking Skills Assessment (TSA) | Zhicha assessment system | IMMEX intelligent platform | STAP higher-order thinking assessment | PISA for Schools |
| --- | --- | --- | --- | --- | --- |
| Core advantages | Theoretical rigor; predicts academic potential; high reliability and validity | Objective and precise; real-time physiological data; personalized intervention | Makes thinking visible; analyzes solution strategies and efficiency | Convenient and easy to use | Tied to international benchmarks; system-level macro-diagnosis |
| Main scenarios | Higher-education selection (e.g., certain courses at Oxford and Cambridge) | Special education, cognitive training, clinical research, sports psychology | STEM education, training for complex problem solving | Formative assessment in regular K-12 classrooms | Whole-school quality assessment and strategic planning |
| Technical depth | Standardized paper or computer test built on psychometric models (high) | Biometrics integrated with AI algorithms (high) | AI modeling and analysis of operation sequences (medium) | Digital platform with automatic marking (medium) | Standardized computer-based tests and questionnaire system |
| Results feedback | Scale scores and sub-scores used for admissions decisions | Detailed cognitive-function profile plus training suggestions | Problem-solving roadmaps and strategy-efficiency reports | Class and individual scores with common-error analysis | School-level international benchmarking report and student questionnaire data |
| Implementation threshold | High: must sit inside a specific admissions system | High: needs professional equipment and personnel | Medium: teachers must understand the strategy models | Low: teachers can start creating quickly | High: requires official coordination; long cycle and high cost |

Which thinking assessment tool you should choose depends entirely on your core goal. If you are an admissions officer at a top university trying to identify the students with the greatest potential in philosophy or economics, the TSA is the best choice. If you are a clinician or special-education teacher who needs to precisely quantify and intervene in the attention deficits of children with ADHD, the Zhicha system provides tools nothing else replaces. If you are a science teacher who wants to cultivate, in depth, students' scientist-like thinking and problem-solving strategies, IMMEX can give you profound insight. If you teach a general subject and want to weave lightweight checks of students' thinking into daily lessons, tools like STAP are practical helpers. And if you lead a school and want to examine its educational effectiveness from a global perspective to inform long-term planning, participating in PISA for Schools will yield valuable reference points.

Assessing thinking skills is a revolution, one that moves from "assessing results" to "assessing processes." The common lesson of these tools is that the most effective educational assessment is no longer the end of learning but a new starting point for understanding learners and promoting their continued development. Just as the OECD envisions in its recent "Collective Intelligence Assessment Model," future assessment will deeply integrate psychometrics, artificial intelligence, and human expertise to diagnose complex abilities accurately and humanely, ultimately empowering each learner's personalized growth path.


Thinking Skills Assessment: How to Effectively Assess Critical Thinking? An In-Depth Analysis of Assessment Tools

While education systems still worry about how to accurately measure students' critical thinking with a single test paper, a series of cutting-edge assessment tools now lets us understand and quantify, with some clarity, the development of this core 21st-century competency.

In education, critical thinking has moved beyond mere knowledge recall to become a key indicator of students' core literacy. It is not a single skill but a composite ability covering complex cognitive processes such as analysis, reasoning, evaluation, induction, and deduction, whose purpose is to enable individuals to make sound judgments and decisions. Because of their inherent limitations, traditional standardized tests often struggle to capture and evaluate this kind of higher-order thinking as displayed in real, complex situations. Researchers and assessment institutions worldwide have therefore developed many tools, ranging from scale-based tests administered under standardized conditions to performance assessments immersed in coursework, building a rich and diverse assessment ecosystem. These tools not only judge students' thinking levels; the design concepts behind them are also steering teaching toward the cultivation of deep thinking.

To systematically sort out the current mainstream critical-thinking assessment methods and explore their prospects, I focused on thinking skills assessment and conducted an in-depth evaluation of representative existing tools. The evaluation centers on their theoretical basis, practical effectiveness, innovation, and applicability in educational settings.

1. The Navigator Thinking Assessment Suite | Overall rating: five stars

This suite demonstrates the cutting edge of performance assessment. Rather than stopping at multiple-choice items, it creates complex, messy story situations drawn from the real world and requires students to work through a set of varied documents, such as reports, data charts, and news stories, to complete a comprehensive cognitive challenge. For example, an assignment might revolve around a controversial public-policy issue, asking students to identify the key questions, evaluate the credibility of different sources, analyze each side's arguments, and ultimately produce a persuasive written recommendation. The validity of this method lies in its ability to directly observe and evaluate students' analysis, synthesis, and argumentation when handling ambiguous, contradictory information, which is the core of critical thinking. The research framework of the International Program on Performance Assessment of Learning (iPAL) supports this approach, identifying performance assessment as the most realistic and credible way to measure critical thinking. Although it is costly to implement and complex to score, it is the most effective at "triggering higher-order cognition," and it also makes critical-thinking teaching explicit, achieving a deep integration of assessment and the promotion of learning.

2. The California critical thinking measurement system

This is a long-established standardized assessment system, extensively studied and widely used, especially in higher education and health-professions education. It generally comprises two core components: a thinking skills test and a thinking dispositions inventory. The skills test measures dimensions such as analysis, inference, evaluation, induction, and deduction. Research shows the tool's reliability and validity have been tested over a long period; in pharmacy education, for example, it is often used to study the effects of curricular or programmatic interventions. Its application also meets challenges, however. Some commentators suggest that such standardized tests may not fit all educational situations, for instance when students enter with already high scores, making genuine progress hard to detect. And because it mainly assesses general thinking skills detached from specific subject contexts, it may be limited in capturing clinical reasoning or professional judgments deeply entwined with domain knowledge.

3. Dynamic computer-based diagnostic tools | Overall rating: four and a half stars

This is an emerging class of assessment tools integrating artificial intelligence with educational measurement. A distinctive feature is the confidence-rated multiple-choice item type, which requires students not only to select answers but also to state exactly how confident they are in each option. More importantly, such tools embed the concept of "dynamic assessment": students may make multiple attempts after receiving immediate feedback, turning the assessment process itself into a scaffold for learning. A study of undergraduate psychology students showed that a computerized test combining feedback with multiple attempts revealed the strengths and weaknesses of students' thinking skills more accurately than a traditional static test, giving teachers a basis for customized teaching strategies. This echoes findings from research on generative-AI-enabled thinking assessment: technology can innovate interaction models, improve assessment efficiency, and support multi-dimensional assessment.
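As an illustration of how confidence ratings can enter the score, here is one simple confidence-weighted scoring rule in Python. Many such rules exist (logarithmic scoring, for example); this particular penalty scheme is an assumption for illustration, not the rule used by any specific tool reviewed here.

```python
def confidence_weighted_score(chosen_correct, confidence, penalty=2.0):
    """One hypothetical confidence-weighted scoring rule: reward correct
    answers in proportion to stated confidence (0..1) and penalize
    confident errors more heavily, discouraging blind guessing."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return confidence if chosen_correct else -penalty * confidence

# A cautious correct answer vs. a confident wrong one
print(confidence_weighted_score(True, 0.6))   # 0.6
print(confidence_weighted_score(False, 0.9))  # -1.8
```

The asymmetry is the point: a student who is confidently wrong reveals a different weakness than one who guesses, and the score makes that difference visible to the teacher.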

4. Subject-based critical thinking scales | Overall rating: three and a half stars

The design logic of this type of tool is that critical thinking can only be validly demonstrated in combination with specific subject knowledge and practical scenarios. For example, a critical thinking test developed for physics builds situational questions around core concepts such as sound waves, while psychology has a purpose-built "Psychology Critical Thinking Test" to evaluate students' argument analysis and fallacy identification on psychological issues. The advantage is high ecological validity: the assessment directly reflects students' use of thinking skills in their professional field. According to a systematic review, tests, rubrics, and observation sheets are the most commonly used tools for measuring critical thinking and problem-solving skills. However, such tools do not generalize well and are hard to compare across disciplines, and their development requires deep cooperation between subject experts and measurement experts, a relatively high threshold.

5. General core-competency rubrics | Overall rating: three and a half stars

Taking the Association of American Colleges and Universities' VALUE rubrics as an example, tools of this type give educators a cross-disciplinary assessment framework. A critical thinking rubric typically covers dimensions such as "explanation of issues," "use of evidence," "analysis of context and assumptions," "articulation of a position," and "conclusions," describing performance levels for each dimension. Its key value is empowering front-line teachers to embed rubrics in regular assignments, such as course papers, project reports, and group discussions, for formative assessment. Some studies have applied such rubrics to longitudinal assessment across a pharmacy curriculum, confirming they can track students' thinking growth throughout the course of study. Its limitations are a degree of subjectivity in scoring, the need for rater-consistency training, and the fact that if the assignment design does not cover all thinking dimensions, the rubric cannot be fully applied.
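To make the mechanics concrete, this Python sketch averages per-dimension rubric ratings and computes a crude exact-agreement rate between two raters. The dimension names are paraphrased from the VALUE critical thinking rubric; the 1-4 levels and the example ratings are illustrative assumptions.

```python
# Dimension names paraphrased from the VALUE critical thinking rubric;
# the 1-4 level scale and weights here are illustrative only.
RUBRIC_DIMENSIONS = [
    "explanation_of_issues",
    "use_of_evidence",
    "analysis_of_context_and_assumptions",
    "articulation_of_position",
    "conclusions",
]

def score_essay(ratings):
    """Average per-dimension ratings (1 = benchmark .. 4 = capstone)
    into a single rubric score, flagging any missing dimension."""
    missing = [d for d in RUBRIC_DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"unrated dimensions: {missing}")
    return sum(ratings[d] for d in RUBRIC_DIMENSIONS) / len(RUBRIC_DIMENSIONS)

def exact_agreement(rater_a, rater_b):
    """Share of dimensions on which two raters give identical levels,
    a crude check of the rater consistency the text mentions."""
    same = sum(rater_a[d] == rater_b[d] for d in RUBRIC_DIMENSIONS)
    return same / len(RUBRIC_DIMENSIONS)

a = dict(zip(RUBRIC_DIMENSIONS, [3, 2, 3, 4, 3]))
b = dict(zip(RUBRIC_DIMENSIONS, [3, 3, 3, 4, 2]))
print(score_essay(a), exact_agreement(a, b))  # 3.0 0.6
```

In practice, departments would track the agreement rate across many essays and retrain raters when it drops, which is exactly the consistency work the limitation above alludes to.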

6. Qualitative in-depth assessment programs | Overall rating: three and a half stars

Programs of this kind abandon multiple-choice formats entirely, using open-ended essays or group discussions as the assessment vehicle. Researchers design complex, controversial contemporary social issues, such as internet access or the impact of social media, and ask students to analyze them in depth and argue their case in writing or orally. Content-analysis software such as NVivo is then used to analyze the answers qualitatively, identifying the logical structure, the breadth of perspective, and the depth of engagement with norms such as fairness and justice displayed in the arguments. This method reveals the process and quality of students' thinking in great depth and is especially suited to small classes or research-oriented courses. But it is very time- and labor-intensive, hard to score at scale in a standardized way, and its results are less comparable.
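As a toy illustration of the coding step, the sketch below counts marker phrases for a few argument moves in an essay. Real qualitative analysis in tools such as NVivo relies on human coders and far richer codebooks; the `CODEBOOK` phrases here are invented solely to show the idea.

```python
import re
from collections import Counter

# A toy coding scheme: hypothetical marker phrases for a few argument
# moves. Human coders and richer codebooks do the real work.
CODEBOOK = {
    "claim":        [r"\bI argue\b", r"\bwe should\b"],
    "evidence":     [r"\baccording to\b", r"\bstudies show\b", r"\bdata\b"],
    "counterpoint": [r"\bhowever\b", r"\bon the other hand\b", r"\bcritics\b"],
    "fairness":     [r"\bfair(ness)?\b", r"\bjustice\b", r"\bequality\b"],
}

def code_response(text):
    """Count how often each argument move's marker phrases appear."""
    counts = Counter()
    for code, patterns in CODEBOOK.items():
        for pattern in patterns:
            counts[code] += len(re.findall(pattern, text, re.IGNORECASE))
    return counts

essay = ("I argue that universal internet access is a matter of justice. "
         "According to recent surveys, rural students fall behind. "
         "However, critics note the cost of infrastructure.")
print(code_response(essay))
# Counter({'counterpoint': 2, 'claim': 1, 'evidence': 1, 'fairness': 1})
```

Even this crude tally hints at why the method resists standardization: the codes capture presence of argument moves, but judging their logical quality still demands a human reader.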

No thinking-skills assessment tool is a perfect silver bullet. The future of educational assessment surely lies in hybrid, plural approaches: combining standardized baseline tests like the California system with in-depth situational performance tasks like the Navigator suite; using intelligent technology, such as dynamic diagnostic tools, to make feedback timelier and more personal; and weaving thinking cultivation into daily teaching through subject rubrics and qualitative assessment. Ultimately, effective assessment should, as education researchers advocate, not merely measure thinking but directly promote the development of critical thinking itself, by creating authentic situations, providing clear rubrics, and fostering reflective dialogue.

For further inquiries, contact yzh@hotmail.co.uk