EDyLex enhances the semantic annotation of documents by detecting proper names and neologisms, for both linguistic processing and speech transcription.
New words and new uses are being created constantly. How can an unknown word or a new proper name be detected and classified in a text or in a stream of speech? How can it be assigned a phonetic value, a category, syntactic properties or a place in a semantic network?
To answer these questions, the EDyLex project set out to test, on the content produced by Agence France-Presse (AFP), every opportunity for dynamic lexicon enrichment offered by Natural Language Processing tools. With a daily production of 5,000 wire stories in six languages (English, French, Spanish, German, Portuguese and Arabic), AFP is an ideal testing ground for multimodal and multilingual linguistic analysis solutions capable of dynamically enriching their own language models and lexicons.
The main purpose of the processes and tools developed within the framework of EDyLex is to improve the semi-automatic annotation of documents and transcription of video soundtracks.
Project funded by the French National Research Agency.