Name: | French wordnet |
Format: | Archive file |
Operating system: | Windows, Mac, Android, iOS |
License: | Personal use only |
Size: | 37.61 MBytes |
An upcoming post will be devoted to this resource. Communications of the ACM Vol. This project benefits from partial manual validation of the literals it contains [10]. Publications: only the most recent publications are listed below.
This architecture consists of two levels.
The UDLexicons collection is a multilingual collection of 53 morphological lexicons, covering 38 languages, that follow the guidelines and format of the Universal Dependencies (UD) initiative. These lexicons were created from existing resources using three different approaches, described in Sagot. They are named using the following naming scheme:
Alexina is both a formalism for morphological and syntactic lexicons and a series of tools for developing and exploiting such lexicons.
Over a dozen Alexina lexicons are available, including the Lefff.
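Because the UDLexicons follow UD conventions, the morphological features of each entry are written in the UD FEATS notation: Feature=Value pairs separated by "|", multiple values separated by commas, and "_" for an empty feature set. The following is a minimal Python sketch of reading and writing that field; the entries are invented for illustration, and the exact column layout of the UDLexicons files is not assumed here.

```python
def parse_feats(feats: str) -> dict[str, list[str]]:
    """Parse a UD FEATS field such as 'Gender=Masc|Number=Sing'."""
    if feats in ("", "_"):  # '_' marks an empty feature set in UD
        return {}
    parsed: dict[str, list[str]] = {}
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        parsed[name] = value.split(",")  # multiple values are comma-separated
    return parsed


def format_feats(feats: dict[str, list[str]]) -> str:
    """Serialise back to FEATS, sorting features by name, ignoring case."""
    if not feats:
        return "_"
    return "|".join(
        f"{name}={','.join(values)}"
        for name, values in sorted(feats.items(), key=lambda kv: kv[0].lower())
    )


# Hypothetical lexicon entries: (form, lemma, UPOS, FEATS)
ENTRIES = [
    ("petites", "petit", "ADJ", "Gender=Fem|Number=Plur"),
    ("petit", "petit", "ADJ", "Gender=Masc|Number=Sing"),
    ("qui", "qui", "PRON", "PronType=Int,Rel"),
]

# Select the plural forms by inspecting the parsed features
plural_forms = [form for form, _, _, feats in ENTRIES
                if "Plur" in parse_feats(feats).get("Number", [])]
print(plural_forms)
```

Keeping values as lists makes multi-valued features such as PronType=Int,Rel round-trip without special cases.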
Polysemous literals have been handled with an approach based on word-aligning a parallel corpus in five languages. The extracted multilingual lexicon has been semantically disambiguated thanks to the wordnets of the languages involved. For monosemous words, a bilingual approach was sufficient to build new entries. To achieve this, we extracted bilingual lexicons from Wikipedia and thesauri.
The resulting wordnet has been evaluated against the French wordnet developed during the EuroWordNet project.
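The two strategies can be illustrated with a deliberately simplified Python sketch. It assumes that each wordnet is available as a dictionary from words to sets of synset identifiers shared across languages; the data and the identifiers (s_aardvark, s_bank_finance, s_bank_river) are hypothetical. A French translation of a monosemous English word is attached to its single synset, while a polysemous case is disambiguated by intersecting the candidate synsets of the words it was aligned with in other languages. This intersection heuristic is one plausible reading of "disambiguated thanks to the wordnets of the languages involved", not a description of the exact WOLF procedure.

```python
from collections import defaultdict

# Toy data (hypothetical synset ids, not real WordNet/WOLF identifiers).
EN_SYNSETS = {
    "aardvark": {"s_aardvark"},
    "bank": {"s_bank_river", "s_bank_finance"},
}
CS_SYNSETS = {
    "banka": {"s_bank_finance"},
}


def monosemous_entries(bilingual_pairs, en_synsets):
    """Attach each French translation of a monosemous English word to its
    unique synset; polysemous English words are skipped."""
    wolf = defaultdict(set)
    for en_word, fr_word in bilingual_pairs:
        synsets = en_synsets.get(en_word, set())
        if len(synsets) == 1:  # monosemous: no disambiguation needed
            (synset_id,) = synsets
            wolf[synset_id].add(fr_word)
    return dict(wolf)


def disambiguate_by_intersection(aligned_words, wordnets):
    """Keep only the synsets shared by every word the French literal was
    aligned with. aligned_words maps a language code to the aligned word;
    wordnets maps a language code to a word -> synset-id-set dictionary."""
    candidates = [wordnets[lang].get(word, set())
                  for lang, word in aligned_words.items()]
    return set.intersection(*candidates) if candidates else set()


print(monosemous_entries([("aardvark", "oryctérope"), ("bank", "banque")], EN_SYNSETS))
print(disambiguate_by_intersection({"en": "bank", "cs": "banka"},
                                   {"en": EN_SYNSETS, "cs": CS_SYNSETS}))
```

The more languages a word is aligned with, the smaller the intersection tends to be, which is why a multilingual parallel corpus helps with polysemous words.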
Since then, several efforts have extended WOLF's coverage and reduced its noise. First, a disambiguation technique for translation pairs extracted from freely available resources led to version 0. An approach targeted at nominalisations extracted from parsed corpora led to version 0. In parallel, most verbal synsets of the Basic Concept Set were validated and extended manually.
Finally, we manually filtered a large number of (literal, synset) pairs that were inconsistent with POS information from the Lefff lexicon, which further reduced the noise in the resource. The result of these semi-manual efforts is WOLF version 1. For now, SENSE elements are filled with information on the sources and approaches through which each lexeme was found, rather than with sense numbers. Among these, a tag starting with "ManVal" indicates a manually validated (literal, synset) pair, and a tag starting with "ManAdd" indicates a pair that was added manually.
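These provenance tags can be exploited programmatically, for instance to keep only manually validated pairs. The Python sketch below assumes a DEBVisDic-style layout in which each LITERAL element holds the word followed by a SENSE child, and that the SENSE text is a comma-separated list of tags. Both assumptions, the synset identifier and the tag suffixes (ManVal_example, ManAdd_example, auto_example) are hypothetical, so check them against the actual WOLF file before relying on the code.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt: the element layout is assumed, not copied from WOLF.
SAMPLE = """<WN>
  <SYNSET>
    <ID>example-synset-1</ID>
    <POS>n</POS>
    <SYNONYM>
      <LITERAL>chien<SENSE>ManVal_example</SENSE></LITERAL>
      <LITERAL>toutou<SENSE>ManAdd_example</SENSE></LITERAL>
      <LITERAL>clebs<SENSE>auto_example</SENSE></LITERAL>
    </SYNONYM>
  </SYNSET>
</WN>"""


def provenance(sense_text):
    """Classify a literal from the tags in its SENSE element
    (assumed to be comma-separated)."""
    tags = [t.strip() for t in (sense_text or "").split(",") if t.strip()]
    if any(t.startswith("ManVal") for t in tags):
        return "manually validated"
    if any(t.startswith("ManAdd") for t in tags):
        return "manually added"
    return "other"


def literal_provenance(xml_text):
    """Return (literal, synset id, provenance) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for synset in root.iter("SYNSET"):
        synset_id = synset.findtext("ID")
        for literal in synset.iter("LITERAL"):
            word = (literal.text or "").strip()  # text before the SENSE child
            triples.append((word, synset_id, provenance(literal.findtext("SENSE"))))
    return triples


for triple in literal_provenance(SAMPLE):
    print(triple)
```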
Blog Onyme
SxPipe is a modular and customisable language processing pipeline that performs a series of shallow processing steps on raw corpora. It can be used both as a preliminary step before parsing and for shallow processing purposes.
Developed for French and several other languages, SxPipe includes, among others, several named entity recognition modules, a sentence segmenter and tokeniser, a spelling corrector and multi-word unit detector, as well as an original architecture for detecting context-free patterns, used by several specialised grammars (numbers, impersonal constructions in French, etc.).
One of the principles underlying SxPipe is the preservation of ambiguities. A linear succession of processing steps accumulates information about the input text. However, some steps may lack part of the information needed to make certain choices. In such cases, SxPipe preserves ambiguities whenever possible, delaying the disambiguation decision to a later stage, when more information is available.
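The ambiguity-preservation principle can be illustrated with a small Python sketch. Here a sentence is a directed acyclic graph whose edges are (start, end, token) triples, a hypothetical representation that is not SxPipe's own format. A toy multi-word unit module adds an edge for "pomme de terre" without deleting the single-word edges, so both readings survive for a later stage to choose from.

```python
from collections import defaultdict

# Hypothetical representation: nodes are positions, edges are
# (start, end, token) triples; this is not SxPipe's own DAG format.


def _successors(edges):
    out = defaultdict(list)
    for src, dst, token in edges:
        out[src].append((dst, token))
    return out


def add_multiword_edges(edges, multiword_units):
    """Toy module: for each known multi-word unit (parts joined by '_'),
    add one edge spanning its parts. Original edges are kept, so the
    output DAG still contains both readings."""
    out = _successors(edges)
    result = list(edges)
    for unit in multiword_units:
        parts = unit.split("_")
        for src in list(out):
            frontier = [src]
            for part in parts:
                frontier = [dst for node in frontier
                            for dst, token in out[node] if token == part]
            for dst in frontier:
                if (src, dst, unit) not in result:
                    result.append((src, dst, unit))
    return result


def readings(edges, start, end):
    """Enumerate every token sequence (path) from start to end."""
    out = _successors(edges)

    def walk(node):
        if node == end:
            yield []
            return
        for dst, token in out[node]:
            for rest in walk(dst):
                yield [token] + rest

    return list(walk(start))


tokens = [(0, 1, "pomme"), (1, 2, "de"), (2, 3, "terre"), (3, 4, "cuite")]
dag = add_multiword_edges(tokens, ["pomme_de_terre"])
for reading in readings(dag, 0, 4):
    print(" ".join(reading))
```

The module both accepts a DAG and returns one, which is what lets such steps be chained without forcing an early choice.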
Preserving ambiguities requires that all modules involved be capable not only of producing ambiguous outputs but also of accepting ambiguous inputs (directed acyclic graphs, or DAGs).
MElt is a freely available (LGPL) state-of-the-art sequence labeller that is meant to be trained on both an annotated corpus and an external lexicon.
It was initially developed by Pascal Denis and Benoît Sagot; more recent developments have been carried out by Benoît Sagot.
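The role of the external lexicon can be made concrete with a Python sketch of feature extraction for a feature-based tagger. Besides the word and its context, each token receives the set of tags the lexicon allows for it (its ambiguity class), which also gives information about words unseen in the training corpus. This is an illustration of the general idea under stated assumptions, not MElt's actual feature set; the lexicon, tags and feature names are hypothetical.

```python
# Hypothetical external lexicon: lower-cased word -> set of allowed tags.
LEXICON = {
    "la": {"DET", "PRO"},
    "petite": {"ADJ", "NC"},
    "brise": {"NC", "V"},
}


def ambiguity_class(word, lexicon):
    """Tags the lexicon allows for a word, joined as one string ('UNK' if absent)."""
    tags = sorted(lexicon.get(word.lower(), set()))
    return "|".join(tags) if tags else "UNK"


def token_features(tokens, i, lexicon):
    """Feature strings for token i: local context plus lexicon information."""
    word = tokens[i]
    features = [
        f"w={word.lower()}",
        f"suffix3={word[-3:].lower()}",
        f"prev_w={tokens[i - 1].lower() if i > 0 else '<s>'}",
        f"lex={ambiguity_class(word, lexicon)}",
    ]
    # The lexicon is also informative about the right context,
    # which has not been tagged yet when tagging left to right.
    if i + 1 < len(tokens):
        features.append(f"next_lex={ambiguity_class(tokens[i + 1], lexicon)}")
    return features


print(token_features(["La", "petite", "brise"], 1, LEXICON))
```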
MElt has been trained on various annotated corpora, using for instance Alexina lexicons as a source of lexical information. MElt also includes a normalisation wrapper that helps process noisy text, such as user-generated content retrieved from the web.
This wrapper is only available for French and English. You can retrain MElt on your own data with the MElt-train script, provided the data are in Brown format.
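A minimal Python sketch of producing and reading Brown-format data, assuming the common convention of one sentence per line with whitespace-separated token/TAG items. The sentences and tagset are hypothetical, and the precise variant that MElt-train expects should be checked in its own documentation.

```python
# Hypothetical tagged sentences (token, tag); the tagset is illustrative.
# Tokens are assumed not to contain whitespace.
SENTENCES = [
    [("Le", "DET"), ("chat", "NC"), ("dort", "V"), (".", "PONCT")],
]


def to_brown(sentences):
    """One sentence per line, whitespace-separated 'token/TAG' items."""
    lines = [" ".join(f"{token}/{tag}" for token, tag in sentence)
             for sentence in sentences]
    return "\n".join(lines) + "\n"


def from_brown(text):
    """Parse Brown-style lines back into lists of (token, tag) pairs."""
    sentences = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        pairs = []
        for item in line.split():
            # Split on the last slash so a token such as '1/2' survives.
            token, _, tag = item.rpartition("/")
            pairs.append((token, tag))
        sentences.append(pairs)
    return sentences


brown = to_brown(SENTENCES)
print(brown, end="")
assert from_brown(brown) == SENTENCES
```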
Tools and Resources
Lefff: morphological and syntactic lexicon for French.
Alexina: morphological (and sometimes syntactic) lexicons other than the Lefff.
EtymDB: etymological database extracted from Wiktionary.
SxPipe: shallow language processing chain.
Extensional Lefff (morphology only): the latest release corresponds to Sagot.
Compiling Alexina intensional lexicons into extensional lexicons requires the prior installation of the alexina-tools.
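Roughly, an intensional lexicon lists each lemma once with an inflection class, while an extensional lexicon lists every inflected form. The toy Python sketch below shows that expansion with two invented inflection classes and invented tags; real Alexina morphological descriptions are far richer, and the actual compilation is done by the alexina-tools.

```python
# Hypothetical inflection classes: suffix stripped from the lemma, then the
# (ending, tag) pairs that generate each inflected form.
INFLECTION_CLASSES = {
    "noun-s": ("", [("", "sg"), ("s", "pl")]),
    "verb-er": ("er", [("er", "inf"), ("e", "pres.1sg"),
                       ("es", "pres.2sg"), ("e", "pres.3sg")]),
}


def compile_extensional(intensional_entries):
    """Expand (lemma, inflection class) entries into (form, lemma, tag) triples."""
    extensional = []
    for lemma, class_name in intensional_entries:
        strip, endings = INFLECTION_CLASSES[class_name]
        if not lemma.endswith(strip):
            raise ValueError(f"{lemma!r} does not end with {strip!r}")
        stem = lemma[: len(lemma) - len(strip)]
        for ending, tag in endings:
            extensional.append((stem + ending, lemma, tag))
    return extensional


for row in compile_extensional([("chat", "noun-s"), ("parler", "verb-er")]):
    print(*row, sep="\t")
```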