Portal:Computational linguistics

Welcome to the Wikiversity Center for Computational Linguistics.

Summary

The Center for Computational Linguistics is a Wikiversity content development project where participants create, organize and develop learning resources for Computational Linguistics. This general goal intersects the Schools of Computer Science and Linguistics. It also relates to Translation, Multilingual Studies and other topics.

Specific Goals

This content development project is concerned with learning activities for Computational linguistics. We need learning activities that will help learners:

Become familiar with the basic terminology of the field.
Learn about existing work in this field related to MediaWiki: conjugators, bots, multilingual website approaches such as WiktionaryZ, and so on.
Practice on their own computers with software tools such as the Natural Language Toolkit and Apertium.
Propose, discuss or even develop new applications that can be used with MediaWiki, especially to improve projects such as Wiktionary, language learning methodologies in Wikiversity or language learning books in Wikibooks.

Concepts to learn include: /concepts

Learning materials


Learning materials and learning projects are located in the main Wikiversity namespace. Simply make a link to the name of the lesson (lessons are independent pages in the main namespace) and start writing!

You should also read about the Wikiversity:Learning model. Lessons should center on learning activities for Wikiversity participants. Learning materials and learning projects can be used by multiple projects. Cooperate with other departments that use the same learning resource.

Lessons

Brainstormed list of possible lessons:

Lesson 1: What does Computational Linguistics mean?
Lesson 2: Computational Morphology
- Template-based conjugators: an example from Wiktionary
Lesson 3: The corpus (corpus linguistics)
- What is it? What can it be used for?
Lesson 4: The parser
Lesson 5: OmegaWiki as a corpus or lexicon
Lesson 6: Audio interfaces and the relationship between sound and meaning
Lesson 7: Human/Machine interfaces and linguistics framework
Lesson 8: Language acquisition for youngsters and their machines
Lesson 9: Computational applications for foreign language learning
- Multilingual platforms: user preference selection
...Lesson brainstorm continues...
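Lesson 3 asks what a corpus can be used for. Two of the most basic uses are counting word frequencies and building a keyword-in-context (KWIC) concordance. The following is a minimal sketch using only the Python standard library; the tiny sample corpus, the simplistic tokenizer (lowercased alphabetic runs only) and the function names `tokenize`, `word_frequencies` and `concordance` are illustrative assumptions, not part of any existing package. Real work would use a proper corpus and tokenizer, for example those provided by the Natural Language Toolkit.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(tokens):
    """Count how often each word type occurs in the token list."""
    return Counter(tokens)


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines with up to `width` tokens on each side of every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    # Hypothetical toy corpus, used only to exercise the functions.
    corpus = ("The cat sat on the mat. The dog sat on the log. "
              "A cat and a dog met on the mat.")
    tokens = tokenize(corpus)
    print(word_frequencies(tokens).most_common(3))
    for line in concordance(tokens, "sat"):
        print(line)
```

With a real corpus, the frequency list shows which words dominate a text, and the concordance lines show how a word is actually used in context, which is the raw material for lexicography and for checking grammatical patterns.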

Remember: All actual learning resources should be on pages in the main namespace (page names with no prefix).

Alternative curriculum

First course — Introduction

Introduction
- Including Unix for Poets (how to mangle text)
Lexical analysis
- Morphological analysis
  - Finite state automata and transducers
    - Tour of free-software packages (including at least SFST and lttoolbox)
    - Paradigms and lemma-paradigm pairs
  - Two-level morphology
  - POS tagging
    - HMMs
Syntactic analysis
- Finite state grammars
Semantic analysis
- Word sense disambiguation
Machine translation
- Approaches: direct, transfer-based, example-based and statistical (SMT)
  - Practicals on creating MT systems for a given pair of languages within the RBMT/Transfer paradigm (using Apertium), and in the SMT paradigm (using GIZA++/Moses)
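The syllabus lists hidden Markov models (HMMs) under POS tagging. An HMM tagger chooses the tag sequence that maximizes the product of transition probabilities P(tag | previous tag) and emission probabilities P(word | tag), usually with the Viterbi algorithm. The sketch below assumes a hypothetical three-tag model whose probabilities are invented for illustration; a real tagger would estimate them from a tagged corpus. It works with log probabilities to avoid numeric underflow, and the `UNSEEN` floor for word/tag pairs missing from the table is a simplification, not a real smoothing method.

```python
import math

# Hypothetical toy HMM: every probability below is invented for illustration,
# not estimated from any real corpus.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}            # P(first tag)
TRANS = {                                                 # P(tag | previous tag)
    "DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {                                                  # P(word | tag)
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "cat": 0.4, "bark": 0.2},
    "VERB": {"bark": 0.6, "sleep": 0.4},
}
UNSEEN = 1e-6  # crude floor for words a tag never emits in the toy table


def log_emit(tag, word):
    """Log emission probability, falling back to the UNSEEN floor."""
    return math.log(EMIT[tag].get(word, UNSEEN))


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    # best[i][t]: log-probability of the best path that ends in tag t at position i
    best = [{t: math.log(START[t]) + log_emit(t, words[0]) for t in TAGS}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in TAGS:
            prev, score = max(
                ((p, best[i - 1][p] + math.log(TRANS[p][t])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_emit(t, words[i])
            back[i][t] = prev
    # Trace back from the highest-scoring final tag.
    path = [max(TAGS, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))


print(viterbi(["the", "dog", "bark"]))  # → ['DET', 'NOUN', 'VERB']
print(viterbi(["the", "bark"]))         # → ['DET', 'NOUN']
```

With these toy numbers, "bark" is tagged NOUN in "the bark" but VERB in "the dog bark", because the model strongly prefers a noun after a determiner and a verb after a noun. This context sensitivity is what an HMM adds over simply picking each word's most frequent tag.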

Second course — Probabilistic methods

Activities

Creating templates that automatically generate the conjugations of regular verbs (see the regular verbs conjugated with templates in the Spanish Wiktionary)
Using a corpus: [1]
Work on the Machine Translation and Computer-assisted Translation sections of the Translator's Handbook
Participate locally at OmegaWiki or at OmegaWiki.org
...develop an activity...
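The first activity, like Lesson 2, relies on a simple idea: a regular verb is split into a stem and an infinitive ending, and a fixed table of endings is attached to the stem. The sketch below illustrates that stem-plus-ending idea in Python for the present indicative of regular Spanish -ar, -er and -ir verbs. It is not the Spanish Wiktionary template code; the names `ENDINGS`, `PERSONS` and `conjugate_present` are hypothetical, chosen only for this example.

```python
# Hypothetical sketch of a template-style conjugator (not the Wiktionary template itself).
# Present-indicative endings for regular Spanish verbs, keyed by infinitive ending.
ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}
PERSONS = ["yo", "tú", "él/ella/usted", "nosotros", "vosotros", "ellos/ellas/ustedes"]


def conjugate_present(infinitive):
    """Return (person, form) pairs for a regular -ar/-er/-ir verb."""
    stem, suffix = infinitive[:-2], infinitive[-2:]
    if suffix not in ENDINGS:
        raise ValueError(f"not an -ar/-er/-ir infinitive: {infinitive!r}")
    # Like a wiki template: the stem is the only parameter that varies.
    return [(person, stem + ending) for person, ending in zip(PERSONS, ENDINGS[suffix])]


for person, form in conjugate_present("hablar"):
    print(person, form)
```

Irregular and stem-changing verbs (for example "ser" or "pensar") would produce wrong forms here and would need their own tables or exception lists, which is why the activity restricts itself to regular verbs.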

Readings

Each activity has an associated selection of suggested background reading.

Study guide: ...write me...
Wikipedia article: Computational linguistics
Multilingualism at Meta
...add more...

References

Additional helpful readings include:

Active participants

If you are an active participant here, please re-sign here every six months to a year. You can see past participants here.

Active participants in this Learning Group

--Copyleft 22:29, 22 June 2009 (UTC)
-- ...