What is HLT Evaluation?

Evaluation is the systematic acquisition and assessment of information to provide useful feedback about some object, where the object may be a program, a policy, a technology, a person, an activity, and so on. The International Standard ISO 9126 sets out a framework for the evaluation of software quality and defines six quality characteristics to be used in the evaluation: functionality, reliability, usability, efficiency, maintainability, and portability.

Evaluation as an activity in HLT development has been widely applied in the U.S. and in Europe. It can provide:
- data collections,
- profiles of user populations,
- classification schemes for HLT systems,
- a set of representative evaluation tasks,
- metrics for effectiveness, efficiency, and satisfaction, etc.
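The three usability metrics named in the last item can be made concrete. The sketch below computes them from hypothetical user-session logs; the log fields, rating scale, and values are illustrative assumptions, not part of any standard data format.

```python
# Illustrative sketch of the three usability metrics named above, computed
# from hypothetical user-session logs (field names and scale are assumptions).

sessions = [
    {"task_completed": True,  "seconds": 40, "satisfaction": 4},  # 1-5 scale
    {"task_completed": True,  "seconds": 55, "satisfaction": 5},
    {"task_completed": False, "seconds": 90, "satisfaction": 2},
]

n = len(sessions)
effectiveness = sum(s["task_completed"] for s in sessions) / n   # task success rate
efficiency = sum(s["seconds"] for s in sessions) / n             # mean time on task
satisfaction = sum(s["satisfaction"] for s in sessions) / n      # mean user rating

print(effectiveness, efficiency, satisfaction)
```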

Goals of Evaluation

Any evaluation has pragmatically chosen goals. In HLT evaluation, these goals can be summarized by the following questions:
- "Which one is better?" Here the goal is to compare different systems for a given application.
- "How good is it?" The evaluation aims to determine to what degree a system possesses the desired qualities.
- "Why is it bad?" The goal is to identify the weaknesses of a system so that it can be improved in further development.
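The first question, comparing systems for a given application, usually means scoring them on the same test set with the same metric. The sketch below is a minimal illustration; the two systems, the five-item test set, and the choice of accuracy as the metric are all assumptions for the example.

```python
# Minimal sketch: answering "which one is better?" by scoring two
# hypothetical HLT systems on the same labelled test set with accuracy.

def accuracy(predictions, gold):
    """Fraction of predictions that match the gold-standard labels."""
    assert len(predictions) == len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Hypothetical outputs of two systems on a five-item test set.
gold = ["POS", "NEG", "POS", "POS", "NEG"]
system_a = ["POS", "NEG", "NEG", "POS", "NEG"]  # 4 of 5 correct
system_b = ["POS", "POS", "NEG", "POS", "NEG"]  # 3 of 5 correct

print(accuracy(system_a, gold))  # 0.8
print(accuracy(system_b, gold))  # 0.6
```

Under this metric and test set, system A would be preferred; with a different metric or test set the comparison could come out differently, which is why the metric must be fixed before the comparison.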

Types of Evaluation

An evaluation can be user-oriented, with metrics that focus on user satisfaction. This kind of methodology, however, is mainly suitable for the evaluation of ready-to-sell products.

Since the HLT Evaluation portal focuses on the evaluation of research systems rather than of commercial products ready for the market, we describe evaluation as a methodology that must both validate the HLT system as a product and produce useful feedback for its further improvement.

There are many different types of evaluation, depending on the object being evaluated and the purpose of the evaluation [1]:

- Research evaluation aims to validate a new idea or to assess the improvement it brings over older methods.
- Usability evaluation aims to measure how usable a system is, typically whether it enables users to achieve a specified goal in an efficient manner.
- Diagnostic evaluation attempts to determine how worthwhile a funding program has been for a given technology.
- Performance evaluation aims to assess the performance and relevance of a technology for solving a well-defined problem.

When the processing performed by a system involves several components associated with different stages, a further distinction should be made between intrinsic evaluation, designed to evaluate each component independently, and extrinsic evaluation, which assesses the overall performance of the system.
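The intrinsic/extrinsic distinction can be illustrated with a toy two-stage pipeline. Everything here is an assumption for the example: the whitespace tokenizer, the keyword classifier, and the tiny data sets stand in for real pipeline components and benchmarks.

```python
# Sketch of intrinsic vs. extrinsic evaluation for a two-stage pipeline
# (tokenizer -> classifier). All components and data are illustrative.

def tokenize(text):
    """Stage 1: a naive whitespace tokenizer."""
    return text.lower().split()

def classify(tokens):
    """Stage 2: a toy sentiment classifier."""
    return "POS" if "good" in tokens else "NEG"

# Intrinsic evaluation: score one component (the tokenizer) in isolation,
# against gold-standard tokens for that component.
gold_tokens = ["a", "good", "system"]
intrinsic_ok = tokenize("A good system") == gold_tokens

# Extrinsic evaluation: score the whole pipeline on the end task.
examples = [("A good system", "POS"), ("A bad system", "NEG")]
extrinsic_acc = sum(classify(tokenize(t)) == y for t, y in examples) / len(examples)

print(intrinsic_ok, extrinsic_acc)
```

A component can score well intrinsically yet contribute little extrinsically (or vice versa), which is why the two kinds of evaluation answer different questions.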

Testing Techniques of Evaluation

There are in general two main testing techniques for system measurement: glass box and black box, which roughly correspond to intrinsic and extrinsic evaluation. In the former approach, the test data is built taking into account the individual components of the system under test. In the latter approach, the test data is chosen, for a given application, only according to the specified relations between input and output, without considering the internal components.
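The black-box approach can be sketched as follows: the test specifies only input/output pairs and never inspects the system's internals. The system under test here is a hypothetical text normalizer, chosen purely for illustration.

```python
# Black-box sketch: the system under test is exercised only through its
# input/output behaviour; the test knows nothing about internal components.

def system_under_test(text):
    # Internals are irrelevant to a black-box test; here, a toy normalizer
    # that collapses whitespace and lowercases the text.
    return " ".join(text.split()).lower()

# The test data specifies only the required input -> output relation.
io_pairs = [
    ("  Hello   World ", "hello world"),
    ("HLT  Evaluation", "hlt evaluation"),
]

passed = all(system_under_test(inp) == out for inp, out in io_pairs)
print(passed)  # True
```

A glass-box test of the same system would instead target its internal steps (e.g. the whitespace-collapsing stage alone), mirroring intrinsic evaluation of a single component.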