This project addresses the problem of learning sequentially and adaptively, under partial information, in an uncertain environment. In this setting, the data are not available beforehand in batch form; the learner collects them sequentially and actively. The process is as follows: at each time t, the learner chooses an action and receives a data point that depends on that action. The learner collects data in order to learn about the system, but also to achieve a goal, characterized by an objective function, that depends on the application. In this project, we aim to solve this problem under general objective functions and with dependency in the data-collection process, exploring variations of the so-called bandit setting, which corresponds to this problem with a specific objective function.
As a motivating example, consider the problem of sequential and active attention detection with an eye tracker. A human user is looking at a screen, and the objective of an automated monitor (the learner) is to identify, using the eye tracker, the zones of the screen to which the user is not paying sufficient attention. To do so, at each time t the monitor may flash a small zone a_t of the screen, e.g. light a pixel (the action), and the eye tracker detects from the user's eye movement whether the user has noticed the flash. Ideally, the monitor should focus on these difficult zones and flash there more often (i.e. choose more often the actions corresponding to zones that are less well identified). Sequential and adaptive learning methods are therefore expected to improve the monitor's performance.
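The interaction protocol in the eye-tracker example can be sketched as a short simulation. This is only an illustration under stated assumptions, not the method the project proposes: the function name `run_monitor`, the per-zone detection probabilities `detect_prob`, and the selection rule (flash the zone whose lower confidence bound on the detection rate is smallest) are all hypothetical choices made here to keep the loop concrete.

```python
import math
import random


def run_monitor(detect_prob, horizon, seed=0):
    """Simulate the flash/observe loop for `horizon` rounds.

    detect_prob[i] is an assumed (hypothetical) probability that the user
    notices a flash in zone i. Returns per-zone flash and detection counts.
    """
    rng = random.Random(seed)
    n_zones = len(detect_prob)
    flashes = [0] * n_zones      # how many times each zone was flashed
    detections = [0] * n_zones   # how many of those flashes were noticed

    for t in range(1, horizon + 1):
        if t <= n_zones:
            # Initialization: flash every zone once.
            zone = t - 1
        else:
            # Illustrative heuristic: pick the zone whose lower confidence
            # bound on the detection rate is smallest, so that zones that
            # look poorly attended (or are still uncertain) get more flashes.
            def lower_bound(i):
                mean = detections[i] / flashes[i]
                return mean - math.sqrt(2.0 * math.log(t) / flashes[i])

            zone = min(range(n_zones), key=lower_bound)

        # The eye tracker reports whether the user noticed the flash.
        observed = rng.random() < detect_prob[zone]
        flashes[zone] += 1
        detections[zone] += int(observed)

    return flashes, detections


if __name__ == "__main__":
    # Zone 2 is assumed to receive much less attention than the others.
    flashes, detections = run_monitor([0.9, 0.9, 0.2, 0.9], horizon=2000)
    print("flashes per zone:   ", flashes)
    print("detections per zone:", detections)
```

The only information the monitor uses is what it has observed so far: the data point at time t depends on the chosen zone a_t, and the choice of a_t depends on the past observations. A different objective function would call for a different selection rule in place of `lower_bound`.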