Best Open Source Linguistics Software 2026

Linguistics Software

Linguistics Information Analysis Clear Filters

Browse free open source Linguistics software and projects below. Use the toggles on the left to filter open source Linguistics software by OS, license, language, programming language, and project status.

OpManager the network monitoring software used by over 1 million IT admins
Network performance monitoring, uncomplicated.

ManageEngine OpManager is a powerful network monitoring software that provides deep visibility into the performance of your routers, switches, firewalls, load balancers, wireless LAN controllers, servers, VMs, printers, and storage devices. It is an easy-to-use and affordable network monitoring solution that allows you to drill down to the root cause of an issue and eliminate it.

Learn More
Remotely access and manage devices to provide on-demand IT support. View the screen and control a remote computer or mobile device.
Be Efficient Support Remotely

ISL Light is an easy-to-use remote desktop software for security-conscious users. It comes at a great price-performance. ISL Light is a powerful tool that helps IT staff and support technicians solve problems remotely, either through unattended access, remote support or even though screen-sharing on mobile devices. It works cross-platform and offers 256-bit encrypted sessions with all standard remote access features plus some important extras: session recording, live chat, videocall, multi-monitor support, file transfer, reporting and many more. Users can choose between cloud or on-premise service. ISL Online license does not limit the number of users, workstations and clients you support. It's a reliable and highly secure software used in all industry sectors including banks, hospitals, governmental institutions and insurances.

Free Trial
1

iramuteq

IRAMUTEQ : Interface de R pour les Analyses Multidimensionnelles de Textes et de Questionnaires. Logiciel de traitement de données pour des corpus texte ou de type individus/caractères. Permet notamment de réaliser des analyses de type "ALCESTE"

Downloads: 997 This Week

Last Update: 2024-11-03
See Project
2

Wordcorr

Data management for comparative linguistics

Wordcorr automates the tedious and risky process of tabulating and managing the sound correspondences used in working out the historical development of natural languages. Initial support was from NSF.

4 Reviews

Downloads: 15 This Week

Last Update: 2013-01-05
See Project
3

Ghawwas_V4

An open source system for Arabic corpora processing

Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions: a. Frequency list for single word and N-Grams b. Concordance c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient) d. Lexical patterns search e. Two corpora frequency profile comparison based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient f. Accept Windows and UTF-8 character encoding g. Accept TXT, DOC, DOCX, RTF and HTML formats h. Export the processing results in CSV file format

1 Review

Downloads: 20 This Week

Last Update: 2018-12-09
See Project
4

Korean Analyzer Rhino

Parsing Korean words by morpheme and part-of-speech

RHINO parses Korean words by morpheme and part-of-speech. Its dictionaries are based on Korean Modern Tagged Corpus(12 million phrases scale) which was made by Korean government. So it analyses many cases of stems and endings. And the newly developed Dynamic Dictionary Technology can make words to react with their context. That is, a programmed database. For more information see the files in the help folder.

Downloads: 18 This Week

Last Update: 2020-10-11
See Project
Dominate AI Search Results
Generative Al is shaping brand discovery. AthenaHQ ensures your brand leads the conversation.

AthenaHQ is a cutting-edge platform for Generative Engine Optimization (GEO), designed to help brands optimize their visibility and performance across AI-driven search platforms like ChatGPT, Google AI, and more.

Learn More
5

TXM

Unicode XML TEI text analysis platform

TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM : http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerfull CQP full text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, cooccurrency analysis, etc.) based on R packages (http://www.r-project.org). Read the scientific background at the Textométrie project web site http://textometrie.ens-lyon.fr/?lang=en. Read a full description at the TEI Tools wiki http://wiki.tei-c.org/index.php/TXM.

Downloads: 10 This Week

Last Update: 2024-12-09
See Project
6

BioC

We describe a simple XML format to share text documents and annotation

A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented. Project files contain: - simple code to hold/read/write data and perform sample processing. - BioC-formatted corpora - BioC tools that work with BioC corpora BioC goals - simplicity - interoperability - broad use - reuse There should be little investment required to learn to use a format or a software module to process that format. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for textmining.

Downloads: 9 This Week

Last Update: 2016-08-08
See Project
7

Welsh Natural Language Toolkit

WNLT is a suite of open source natural language modules for the Welsh

The project supports the Welsh Language Technology domain with a set of NLP tools that drive innovation and advance the development of sophisticated textual analysis solutions. The WNLT project delivers four core NLP modules; a) Word Segmentation for separating text into words b) Sentence Boundary Disambiguation for finding sentence boundaries c) Part of Speech Tagger for determining the part of speech of each word d) Morphological Analyser for identifying the root form (lemma) of words. The modules are written in JAVA and ‘wrapped’ for execution under the General Architecture for Text Engineering (GATE) framework. The project also includes CYMRIE an adapted version for Welsh of the GATE - ANNIE Named Entity Recognition (NER) application for a range of entities such as Persons, Organisations, Locations, and date and time expressions.

Downloads: 2 This Week

Last Update: 2016-11-29
See Project
8

Aramorpher

Based on the Buckwalter Morphological Analyzer (Version 1.0) for doing Arabic stemming and POS tagging. Includes a rewrite of the original Perl script, with better documentation and more flexible options, and a C++ interface (usable as a library or app).

Downloads: 1 This Week

Last Update: 2016-08-09
See Project
9

Colloquium QDA

A free and open source qualitative ethnographic interview coding tool.

Colloquium QDA is a tool for custom coding and analyzing qualitative ethnographic interviews. To run, make sure you first have JRE 8 or later installed (http://www.oracle.com/technetwork/java/javase/downloads/). Colloquium QDA is an open source cross-platform Java Swing app utilizing an embedded Java DB with Lucene integrated search.

Downloads: 1 This Week

Last Update: 2017-01-23
See Project
Managed File Transfer Software
Products to help you get data where it needs to go—securely and efficiently.

For too many businesses, complex file transfer needs make it difficult to create, manage and support data flows to and from internal and external systems. Progress® MOVEit® empowers enterprises to take control of their file transfer workflows with solutions that help secure, simplify and centralize data exchanges throughout the organization.

Learn More
10

Khawas

An Arabic Corpora Processing Tool

The new version is available at https://sourceforge.net/projects/ghawwasv4/

Downloads: 1 This Week

Last Update: 2014-08-02
See Project
11

ACOPOST - a collection of POS taggers

Part-of-speech tagging is the task of assigning symbols from a particular set to words in a natural language text. ACOPOST implements and extends well-known machine learning techniques and provides a uniform environment for testing.

1 Review

Downloads: 0 This Week

Last Update: 2016-02-26
See Project
12

CRFSharp

CRFSharp is a .NET(C#) implementation of Conditional Random Field

CRFSharp(aka CRF#) is a .NET(C#) implementation of Conditional Random Fields, an machine learning algorithm for learning from labeled sequences of examples. It is widely used in Natural Language Process (NLP) tasks, for example: word breaker, postagging, named entity recognized, query chunking and so on. CRF#'s mainly algorithm is the same as CRF++ written by Taku Kudo. It encodes model parameters by L-BFGS. Moreover, it has many significant improvement than CRF++, such as totally parallel encoding, optimizing memory usage and so on. Currently, when training corpus, compared with CRF++, CRF# can make full use of multi-core CPUs and only uses very low memory, and memory grow is very smoothly and slowly while amount of training corpus, tags increase. with multi-threads process, CRF# is more suitable for large data and tags training than CRF++ now. For example, in machine with 64GB, CRF# encodes model with more than 4.5 hundred million features quickly.

Downloads: 0 This Week

Last Update: 2015-08-03
See Project
13

DCTFinder

Extract title and creation time from web page.

Web pages do not offer reliable metadata concerning their creation date and time. However, getting the document creation time is a necessary step for allowing to apply temporal normalization systems to web pages. DCTFinder is a system that parses a web page and extracts from its content the title and the creation date of this web page. DCTFinder combines heuristic title detection, supervised learning with Conditional Random Fields (CRFs) for document date extraction, and rule-based creation time recognition. DCTFinder is released under CeCILL free software license agreement. The system is described in the following paper (see 'Files' section): Xavier Tannier. "Extracting News Web Page Creation Time with DCTFinder". Proceedings of the 9th Language Resources and Evaluation Conference. Reykjavik, Iceland.

Downloads: 0 This Week

Last Update: 2016-10-21
See Project
14

ELIA(eye-tracking for psycholinguistics)

ELIA(Eyegaze Language Integration Analysis) supports the analysis of eye-tracking data for studies in language processing. ELIA eases early analysis of data to enable iterative development of experiments in response to spoken language.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-24
See Project
15

EyeMap - Eye Movement Data Analyzer

EyeMap is a visualization and analysis tool for text reading eye movement data. It can process Unicode, proportion/non-proportion and spaced/unspaced reading materials, which supports various languages and experiment methods.

1 Review

Downloads: 0 This Week

Last Update: 2013-08-10
See Project
16

FALCON - Text Search Java Project

JSON based text search Java Project

----------------- - What is it? - ----------------- The "Falcon Search" is a JAVA API and tool to search inside the documents. It was originally started to search the content in pdf files under the project "HAWK Search". Searching with this tool is query-based not word-based as in most of the document search tools OR document readers. It also takes care of jumbling of words within query and spelling mistakes. Commonly used techniques in this project are Natural Language Processing, Information Extraction and Question-Answering Architecture. ---------------------- - Latest Version - ---------------------- Details of latest version can be found on project website - http://geekdadaji.com --------------------------- - CONTACT DETAILS - --------------------------- CREATOR : SWAPNIL A JADHAV (saj1919) EMAIL ID : dadajibudhau@gmail.com WEBSITE : http://geekdadaji.com LICENSE : CC BY-NC 4.0

Downloads: 0 This Week

Last Update: 2014-04-18
See Project
17

FREJ

FREJ stands for "Fuzzy Regular Expressions for Java" - it is a command-line tool and library which allow you easily compare strings with patterns disregarding nasty typos and considering several variants (like "Barack Obama", "B.H.Obama" etc.) Project sources are moved to github: https://github.com/RodionGork/FREJ

Downloads: 0 This Week

Last Update: 2015-10-17
See Project
18

HAWK - PDF Text Search Java Project

No more support for this project - TAKE A LOOK AT FALCONSEARCH

No more support for this project - TAKE A LOOK AT FALCONSEARCH "https://sourceforge.net/projects/falcontextsearch/"

Downloads: 0 This Week

Last Update: 2014-04-19
See Project
19

HanNanum - Korean POS Tagger

HanNanum is a Korean Morphological Analyzer and POS Tagger. A plug-in component-based architecture is adapted to the new Java version for flexible use. You can find the work flow for morphological analysis, POS tagging, noun extraction, etc. Contact: kschoi@kaist.ac.kr hjjeong@world.kaist.ac.kr

2 Reviews

Downloads: 0 This Week

Last Update: 2015-08-02
See Project
20

HebMorph

Making Hebrew properly searchable by IR software. Right now, most work is being done in our mailing list (planning), and on our github repository (concept code, see below).

Downloads: 0 This Week

Last Update: 2022-11-16
See Project
21

JoBimText

Linking Language to Knowledge with Distributional Semantics

JobimText is a software solution for automatic text expansion using contextualized distributional similarity. It provides text analysis tools for large corpora and has capabilities to create distributional semantic models (JoBimText models) and multi-word expressions.

Downloads: 0 This Week

Last Update: 2022-08-04
See Project
22

KH Coder

Quantitative Content Analysis or Text Mining

************************************************************ THIS PROJECT IS MOVED. See http://khcoder.net/en for the latest & greatest. You can download this tool from the new home. See you there! ************************************************************

9 Reviews

Downloads: 0 This Week

Last Update: 2018-12-30
See Project
23

LanguageTool

Proofreading Software for 20+ Languages

LanguageTool is an Open Source language/grammar checker. *** THIS REPOSITORY IS OUT OF DATE, see https://github.com/languagetool-org INSTEAD ***

2 Reviews

Downloads: 0 This Week

Last Update: 2014-05-09
See Project
24

Maui Topic Indexer

Maui is a multi-purpose automatic topic indexing algorithm. Given a document, Maui automatically identifies its topics. Depending on the task topics are tags, keywords, keyphrases, vocabulary terms, descriptors or Wikipedia titles.

Downloads: 0 This Week

Last Update: 2014-04-25
See Project
25

RDRPOSTagger

A Rule-based Part-of-Speech and Morphological Tagging Toolkit

RDRPOSTagger is a robust, easy-to-use and language-independent rule-based toolkit for Part-of-Speech (POS) and morphological tagging. RDRPOSTagger obtains fast performance in both learning and tagging process. RDRPOSTagger also achieves a very competitive accuracy in comparison to the state-of-the-art results. RDRPOSTagger now supports pre-trained POS and morphological tagging models for Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese. Additionally, RDRPOSTagger supports the pre-trained Universal POS tagging models for 40 languages. See the full usage of RDRPOSTagger at: http://rdrpostagger.sourceforge.net/

2 Reviews

Downloads: 0 This Week

Last Update: 2017-05-24
See Project