Basic services for exploiting full text
Beyond searching the descriptive metadata of collections and articles and indexing their full text, we foresee three additional basic services:
- Searching for terms and their variants. This requires identifying, in the source text, the word sequences most likely to be good candidate terms in the scientific domain of the document being analysed. A team combining the expertise of the TALN group at LINA and of INIST aims to detect and tag terms and their variants in the full text of documents from specialised domains. It also maintains a reference resource of scientific terminology for exploiting ISTEX data.
- Searching for named entities. This first requires the ability to detect, normalise, and tag named entities in the full text. A team with all the necessary expertise, made up of the Tours Computing Laboratory and INIST, is responsible for this aspect. Named entities generally include dates, place names (towns, regions, countries), and names of individuals or groups of individuals (teams, laboratories, institutions). Internet addresses of resources or data could be added to this list, along with the names of projects linked to or cited in a publication. In a specialised domain, named entities can be much more subtle: names of stars in astronomy, molecules in chemistry, formulae in mathematics, plants in botany, etc.
- Access to the main fields of bibliographic references. INIST has begun preliminary automatic tagging of these fields in the articles' bibliographic references. Such access will make it possible to build scientific maps of sub-domains and to answer questions such as: Who works with whom? What citation networks exist? Which publication venues are the most prominent? Which venues does a given scientific community prefer? How do these evolve over time?
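As a rough illustration of how tagged reference fields could answer some of these questions, the following Python sketch counts co-author pairs ("Who works with whom?"), cited venues (the most prominent venues), and venue counts per year (evolution over time). It is a minimal sketch, assuming each reference has already been tagged into a dict with `authors`, `source`, and `year` fields; these field names, the helper functions, and the sample references are hypothetical and do not reflect INIST's actual tagging format.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical output of a reference tagger: one dict per bibliographic reference.
# Field names and values are illustrative only, not INIST's actual format.
REFERENCES = [
    {"authors": ["Dupont, A.", "Martin, B."], "source": "J. Chem. Phys.", "year": 2005},
    {"authors": ["Martin, B.", "Lefevre, C."], "source": "Phys. Rev. B", "year": 2007},
    {"authors": ["Dupont, A.", "Martin, B.", "Lefevre, C."], "source": "J. Chem. Phys.", "year": 2010},
]


def coauthor_pairs(refs):
    """Who works with whom: count each unordered pair of co-authors."""
    pairs = Counter()
    for ref in refs:
        # set() drops duplicate names; sorted() makes (A, B) and (B, A) the same key
        for a, b in combinations(sorted(set(ref["authors"])), 2):
            pairs[(a, b)] += 1
    return pairs


def venue_counts(refs):
    """Most prominent venues: how often each source is cited."""
    return Counter(ref["source"] for ref in refs)


def venues_by_year(refs):
    """Evolution over time: venue counts grouped by publication year."""
    by_year = defaultdict(Counter)
    for ref in refs:
        by_year[ref["year"]][ref["source"]] += 1
    return dict(by_year)


if __name__ == "__main__":
    for (a, b), n in coauthor_pairs(REFERENCES).most_common():
        print(f"{a} & {b}: {n}")
    print(venue_counts(REFERENCES).most_common(1))
    print(venues_by_year(REFERENCES))
```

In practice, author names would first need to be normalised (for instance, "Martin, B." and "B. Martin" mapped to one form), which is the kind of named-entity normalisation described for the second service; without it, the same person would appear as several nodes in the co-authorship graph.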