Research Programme 5 (RP5)
RP5 - Content Organisation and Management
Programme Leaders: Prof. Alan Smeaton and Prof. Barry Smyth
Context and Objectives
Digital libraries provide some level of retrieval and access which operates across different media. Yet there are at least two other dimensions of information access besides the media that digital libraries and other contemporary approaches to information retrieval do not usually address. The first is context, in particular user and data or document contexts. When users seek information, either directly from a repository or through some application that integrates the personalization and recommendation strategies to be developed in RP6, they have a context which includes their current overall work tasks, their background, what they have already seen and done, their data access channel, etc. Capturing and using such context represents a key research challenge in contemporary information retrieval. In a similar manner, when documents, or indeed any kind of data, are collected or aggregated, they also have contexts, which include the application, date, time, place, environment and the circumstances that form the setting for why the data was gathered in the first place. All this can be captured within CLARITY given the centre’s unique position to sense user, document and data contexts, and this can be leveraged for more effective access and retrieval.
The second dimension, which has been largely ignored in research to date, is the user’s search type. This refers to the kind of retrieval that is best suited to the user’s situation at the time. The default retrieval modality is to rank documents or data objects in decreasing similarity to a query. This ignores the novelty value of retrieved items, and it ignores whether a user wishes to retrieve an exhaustive set of results for an exploratory type of search, to retrieve a known item, or to retrieve items similar in some way to a given item. None of these needs is satisfied by a simple similarity ranking. While some research has been carried out on developing some of these retrieval types individually, no work has yet been done on combining them.
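As an illustration of the difference between the default modality and one alternative search type, the sketch below contrasts plain similarity ranking with a greedy novelty-aware ranking in the style of maximal marginal relevance. This is a standard technique chosen for illustration, not a method proposed by the programme; the function names, the `trade_off` parameter and the toy vectors are all assumptions.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_similarity(query, docs):
    """Default modality: document ids in decreasing similarity to the query."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)

def rank_with_novelty(query, docs, trade_off=0.5):
    """Greedy MMR-style ranking (illustrative): reward similarity to the
    query, penalise similarity to items that were already selected."""
    remaining = set(docs)
    selected = []
    while remaining:
        def mmr(d):
            relevance = cosine(query, docs[d])
            redundancy = max((cosine(docs[d], docs[s]) for s in selected),
                             default=0.0)
            return trade_off * relevance - (1 - trade_off) * redundancy
        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy collection: "b" is a near-duplicate of "a".
query = {"cat": 1.0, "dog": 1.0}
docs = {
    "a": {"cat": 1.0},
    "b": {"cat": 1.0, "mat": 0.2},
    "c": {"dog": 1.0, "mat": 0.5},
}
print(rank_by_similarity(query, docs))  # ['a', 'b', 'c']
print(rank_with_novelty(query, docs))   # ['a', 'c', 'b']
```

In this toy case the near-duplicate item drops below the item covering the other half of the query, which is the kind of behaviour a simple similarity ranking cannot express.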
The goal of this research programme is to research and develop approaches to allow information access mechanisms to operate across different media, to incorporate user and document/data contexts as appropriate and to facilitate different types of retrieval for users. This will allow us to exploit the natural synergies which exist across media, across user contexts and even across search types.
Work Packages
WP 5.1: Using User and Data Contexts for Retrieval
Information retrieval works best when the user’s information need can be characterised and modeled appropriately. Access mode, query formulation, matching and presentation of results have different requirements depending on the type of user and his/her information need. This poses difficulties when designing generic information retrieval systems based on only one sensor modality, the user’s query. By broadening out the number of sensors, i.e. introducing user and data/document contexts into retrieval, the precision of the resulting retrieval operation can be significantly improved.
This work package will investigate content-based and context-aware IR as a service for the personalization applications (RP6) as well as for the end user to use directly [1, 2]. These services will use user and/or group profiles as well as the mined user/application contexts of when the data was sensed or gathered. For example, given a care-giver’s information needs based on his/her work tasks, such as nutritional analysis or exercise profiling of a patient, a visual diary of a patient’s activities could be automatically structured to facilitate efficient navigation. In this case, the context of the searcher allows for a more targeted approach to information access, and this approach is equally applicable in a variety of other environments, such as CCTV or applications for mobile platforms.
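One simple way to picture context as an additional "sensor" alongside the query is to blend a query-similarity score with a measure of agreement between the searcher's context and the context in which each item was gathered. The sketch below is a minimal illustration under that assumption; the `Item` class, the attribute-matching rule and the `context_weight` parameter are hypothetical and are not taken from the programme.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    item_id: str
    query_score: float                           # similarity to the query, assumed in [0, 1]
    context: dict = field(default_factory=dict)  # e.g. {"task": "nutrition"}

def context_match(user_context, item_context):
    """Fraction of the user's context attributes that the item shares."""
    if not user_context:
        return 0.0
    shared = sum(1 for k, v in user_context.items()
                 if item_context.get(k) == v)
    return shared / len(user_context)

def rerank(items, user_context, context_weight=0.3):
    """Linear interpolation of query similarity and context agreement."""
    def score(item):
        return ((1 - context_weight) * item.query_score
                + context_weight * context_match(user_context, item.context))
    return sorted(items, key=score, reverse=True)

# Hypothetical visual-diary events and a care-giver doing nutritional analysis.
events = [
    Item("walk", 0.7, {"task": "exercise"}),
    Item("meal", 0.6, {"task": "nutrition"}),
]
care_giver = {"task": "nutrition"}
print([e.item_id for e in rerank(events, care_giver)])       # ['meal', 'walk']
print([e.item_id for e in rerank(events, care_giver, 0.0)])  # ['walk', 'meal']
```

Setting `context_weight` to zero recovers the query-only ranking, which makes the contribution of the context signal easy to isolate in an evaluation.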
WP 5.2: Integrating Retrieval for Different Search Types
Being able to seamlessly integrate retrieval across different query types and media, rather than just bolting together different media search engines, is one of the most challenging aspects of the multimedia field, as identified in [3]. This work package develops techniques for retrieval which span different media, addresses the different search types identified from WP 5.1 along with different user and data contexts, and concentrates on exploiting the synergies which lie across these. For example, the concept of visterms has recently become popular, in particular in work by R. Manmatha and others [4, 5, 6]. In the approach advocated here, visterms correspond to a kind of visual language analogous to words and the principle can be extended to any kind of representation or index features in any kind of collection of data, including the kinds of semantic features identified in RP4.
Visterms, either directly observed or derived as semantic feature descriptions, will have statistical distribution properties among data collections, just as words do in texts or any index feature will have in any data collection. Identifying these allows the problem of information access and information management of objects with different media to be viewed in terms of a unified language of representation, and a unified modeling approach may be used to model the distributions of terms or features, whether index terms in text, visual features in image/video or repeating patterns in sensed data similar to the approach taken in [7]. Once feature and term distributions can be modeled in a unified framework it becomes feasible to develop different retrieval strategies for different retrieval types which will then span the different media and which can incorporate different user and document contexts. Combining all these represents a significant challenge, and will significantly advance the field.
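To make the idea of a unified language of representation concrete, the sketch below treats every object as a bag of features, whether words or visterms, and ranks objects with a smoothed unigram query-likelihood model. This is a standard language-modeling formulation used here only as an illustration, not the framework the programme will develop; the `word:` and `vis:` feature names, the smoothing weight `lam` and the toy collection are assumptions.

```python
from collections import Counter
from math import log

def collection_model(docs):
    """Relative frequency of every feature over the whole collection."""
    counts = Counter()
    for features in docs.values():
        counts.update(features)
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}

def query_log_likelihood(query, features, background, lam=0.5):
    """log P(query | object) under a unigram model, interpolated with the
    collection model (Jelinek-Mercer smoothing)."""
    counts = Counter(features)
    length = len(features)
    score = 0.0
    for f in query:
        p_doc = counts[f] / length if length else 0.0
        p = (1 - lam) * p_doc + lam * background.get(f, 0.0)
        if p == 0.0:
            return float("-inf")  # feature unseen anywhere in the collection
        score += log(p)
    return score

def rank(query, docs, lam=0.5):
    """Object ids in decreasing query likelihood."""
    background = collection_model(docs)
    return sorted(
        docs,
        key=lambda d: query_log_likelihood(query, docs[d], background, lam),
        reverse=True,
    )

# Hypothetical collection mixing text terms and visterm ids in one vocabulary.
docs = {
    "news":  ["word:goal", "word:match", "vis:17", "vis:17"],
    "photo": ["vis:17", "vis:42", "vis:42"],
    "memo":  ["word:budget", "word:match"],
}
print(rank(["word:goal", "vis:17"], docs))  # ['news', 'photo', 'memo']
```

Because words and visterms share one vocabulary, a single query can mix both kinds of feature, and the same distributional model scores text, image and mixed-media objects alike.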
Novelty
This research programme addresses information access with novel contributions in three different areas: access across different media, by modeling the notion of index terms or visterms in a unified way; catering for different types of information need, by introducing algorithmic variations on the conventional rank-by-similarity-to-query model; and incorporating both user and data/document context into the information access programme. The novelty of the proposed work is in combining all three directions into one, which is facilitated by the rich context capture (for users and for data/documents) possible within CLARITY demonstrators.
A recent report from a group of leading researchers in information retrieval, which defined a set of global challenges in information retrieval and language modeling [8], recognised “user and context-sensitive retrieval” as the first of a set of recurring themes throughout all the specific challenges for the field. The overall CLARITY work programme ensures that all the different kinds of data that we work with, whether from users, from sensed or gathered information, or from aggregated and created data, will have a history of where it came from, how it relates to other data and how it is used. The first work package, on Using User and Data Contexts for Retrieval, will use these histories to provide exactly the kinds of user and data/document contexts which should form part of an information access process. The CLARITY demonstrators, for example, in addition to the various overlaps and intersections among the research programmes in CLARITY, will provide a rich set of contexts which can be captured and used for retrieval. The major contribution of this first work package is that we will be able to use these contexts to actually make progress on developing context-aware retrieval.
The second work package will address integrating retrieval across different types of media and across different types of search, something that has been attracting the attention of IR researchers for a long time (see [9]). Our work will concentrate on exploiting synergies which exist across these different media and search types within a single unified framework and use these to improve the effectiveness of information access.
References:
[1] X. Shen, B. Tan, and C. Zhai, “Implicit user modeling for personalized search,” in CIKM ’05: Proceedings of the 14th ACM international conference on Information and knowledge management. New York, NY, USA: ACM Press, 2005, pp. 824–831.
[2] P. J. Brown and G. J. F. Jones, “Context-aware retrieval: Exploring a new environment for information retrieval and information filtering,” Personal Ubiquitous Comput., vol. 5, no. 4, pp. 253–263, 2001.
[3] A. Jaimes, M. Christel, S. Gilles, R. Sarukkai, and W.-Y. Ma, “Multimedia information retrieval: what is it, and why isn’t anyone using it?” in MIR ’05: Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval. New York, NY, USA: ACM Press, 2005, pp. 3–8.
[4] S. Chang, R. Manmatha, and T. Chua, “Combining Text and Audio-Visual Features in Video Indexing,” Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP’05). IEEE International Conference on, vol. 5, 2005.
[5] V. Lavrenko, S. Feng, and R. Manmatha, “Statistical models for automatic video annotation and retrieval,” Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP’04). IEEE International Conference on, vol. 3, 2004.
[6] P. Quelhas, F. Monay, J. Odobez, D. Gatica-Perez, T. Tuytelaars, and L. van Gool, “Modeling Scenes with Local Descriptors and Latent Aspects,” Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, vol. 1, 2005.
[7] C. Meghini, F. Sebastiani, and U. Straccia, “A model of multimedia information retrieval,” J. ACM, vol. 48, no. 5, pp. 909–970, 2001.
[8] J. Allan, J. Aslam, N. Belkin, C. Buckley, J. Callan, B. Croft, S. Dumais, N. Fuhr, D. Harman, D. J. Harper, D. Hiemstra, T. Hofmann, E. Hovy, W. Kraaij, J. Lafferty, V. Lavrenko, D. Lewis, L. Liddy, R. Manmatha, A. McCallum, J. Ponte, J. Prager, D. Radev, P. Resnik, S. Robertson, R. Rosenfeld, S. Roukos, M. Sanderson, R. Schwartz, A. Singhal, A. Smeaton, H. Turtle, E. Voorhees, R. Weischedel, J. Xu, and C. Zhai, “Challenges in information retrieval and language modeling: report of a workshop held at the center for intelligent information retrieval, University of Massachusetts Amherst, September 2002,” SIGIR Forum, vol. 37, no. 1, pp. 31–47, 2003.
[9] M. S. Lew, N. Sebe, C. Djeraba, and R. Jain, “Content-based multimedia information retrieval: State of the art and challenges,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 2, no. 1, pp. 1–19, 2006.
