Page 4 | Best Open Source Java Linguistics Software

Java Linguistics Software

Browse free open source Java Linguistics Software and projects below. Use the toggles on the left to filter open source Java Linguistics Software by OS, license, language, programming language, and project status.

1

LANGANA-E

LANGANA-E is an English natural language parser

LANGANA-E is an English natural language parser. It is the main structure that automatic understanding will be built upon. Automatic understanding can be used for answering questions automatically or finding an answer from a reference text archive.

Downloads: 0 This Week

Last Update: 2022-11-02
See Project
2

LanguageTool

Proofreading Software for 20+ Languages

LanguageTool is an Open Source language/grammar checker. *** THIS REPOSITORY IS OUT OF DATE, see https://github.com/languagetool-org INSTEAD ***

2 Reviews

Downloads: 0 This Week

Last Update: 2014-05-09
See Project
3

Large Document Search Engine

A system to perform analysis of large documents for the purpose of cataloging similar documents. Similarity is based upon contextual analysis of these documents done by identifying common words and proper nouns.

Downloads: 0 This Week

Last Update: 2016-11-02
See Project
4

Leseratte

Leseratte is a Java parser for German written language. Currently, it contains a German lexicon (based on the Wiktionary), inflexion rules, a grammar and a parser. (Semantics component planned.) Usable as a Java library, also provides a graphical UI.

Downloads: 0 This Week

Last Update: 2020-10-03
See Project
5

LexSub

A Lexical Substitution Framework

Lexical substitution framework for supervised all-words lexical substitution using delexicalized features. For a runnable (but GPL-licensed) version of LexSub, see LexSub-GPL (sf.net/p/lexsub/lexsub-gpl)

Downloads: 0 This Week

Last Update: 2015-04-01
See Project
6

Live Transcribe Speech Engine

Live Transcribe is an Android application

Live Transcribe Speech Engine provides on-device speech recognition components that power real-time transcription for accessibility and everyday voice-first experiences. Its design prioritizes latency and robustness in noisy, far-field environments, enabling continuous transcription with low delay on mobile hardware. The engine manages audio front-end processing—such as noise suppression and voice activity detection—before feeding audio into compact, accurate acoustic and language models. Partial hypotheses stream as words are recognized, then stabilize with minimal jitter as confidence increases, which is crucial for usability. The code emphasizes efficient use of CPU and neural accelerators to balance battery life with responsiveness. Deployed in accessibility contexts, it aims for dependable behavior across accents, environments, and intermittent connectivity, with graceful degradation when resources are constrained.

Downloads: 0 This Week

Last Update: 2025-10-10
See Project
7

Lojban Glossary Builder

Java program to create a (potentially multilingual) glossary of the unique words in any given Lojban text. Note that this SourceForge page has been superseded by the Bitbucket repository (https://bitbucket.org/pretoriusjf/vlastezba/overview); any further updates will be made there.

Downloads: 0 This Week

Last Update: 2013-05-02
See Project
8

Maui Topic Indexer

Maui is a multi-purpose automatic topic indexing algorithm. Given a document, Maui automatically identifies its topics. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, or Wikipedia titles.

Downloads: 0 This Week

Last Update: 2014-04-25
See Project
9

Mechaglot, Calculate Semantic Similarity

Calculate semantic similarity for any human and human-like languages

WARNING: There are too many false positives! This is an alpha release; expect many things to improve, including the algorithms. PLEASE GO TO BROWSE ALL FILES TO READ A FULL DESCRIPTION. The goal of this project is simple: input two sentences in the same language and obtain a number (from 0 to 1) denoting the similarity between them, according to semantic categories. This project is modeled on my previous project: https://sourceforge.net/projects/semantics/ The difference is that this project does not use any database and accepts any strings as input. Java was the language of choice, due to the availability of modelling tools. This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/. Powered by WEKA, Classifier4J and SimMetrics.

Downloads: 0 This Week

Last Update: 2014-10-07
See Project
10

Metalanguage And Analysis Toolkit

Downloads: 0 This Week

Last Update: 2015-05-09
See Project
11

Mitzuli

The open, easy-to-use and powerful translator app for Android

Mitzuli is an open source translator app for Android featuring a full offline mode, voice input (ASR), camera input (OCR), voice output (TTS), and more!

Downloads: 0 This Week

Last Update: 2015-03-02
See Project
12

Morfologik

ATTENTION! Morfologik is now at GitHub: https://github.com/morfologik/

1 Review

Downloads: 0 This Week

Last Update: 2015-09-10
See Project
13

Multiparse

This project contains implementations of algorithms to integrate the output of different NLP tools (part-of-speech taggers, morphologies, parsers, etc.) in order to obtain more accurate, more robust and more fine-grained linguistic analyses. Note that the code is outdated, but left here for documentation purposes. Its functionality may be reimplemented within the NLP2RDF project (http://code.google.com/p/nlp2rdf).

Downloads: 0 This Week

Last Update: 2013-04-25
See Project
14

Nasira

Nasira is a Java library for reading text files with non-ASCII characters (e.g. documents in German, Swedish, ...). To do so, it automatically determines the character encoding (iso-8859-1, utf-8) used to encode the file, guided by user-provided hints.

Downloads: 0 This Week

Last Update: 2013-04-22
See Project
15

NeurPheus Morphological Analyser

The Neurpheus Morphological Analyser performs morphological analysis, stemming or word form generation tasks using sophisticated classification methods for an analysis of words unseen in a training dictionary.

Downloads: 0 This Week

Last Update: 2013-12-20
See Project
16

OPTIMA cidoc-crm Semantic Annotation

Semantic annotation of archaeology reports with respect to CIDOC-CRM

The semantic annotation system OPTIMA is the result of Andreas Vlachidis's PhD work (supervised by Prof. Douglas Tudhope, University of Glamorgan, UK). OPTIMA performs the NLP tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and SKOS terminological resources (English Heritage Thesauri and Glossaries). The resulting semantic annotations are associated with classes of the (ISO 21127:2006) CIDOC Conceptual Reference Model (CRM) and its archaeological extension, CRM-EH. OPTIMA is also targeted at the detection and recognition of contextual relations between CRM entities. Such relations are modeled with respect to the CRM-EH archaeology extension. The pipeline targets the CIDOC-CRM entities E19.Physical_Object, E53.Place, E49.Time_Appellation and E57.Material, the CRM-EH entities EHE1001.Context_Event, EHE1002.Production_Event and EHE1004.Deposition_Event, and the P45.consists_of material property.

Downloads: 0 This Week

Last Update: 2015-10-11
See Project
17

Ontology Creation

The program creates OWL ontology files that describe relationships between entities. The ontologies are based on definitions found by searching Wikipedia articles for specific lexico-syntactic patterns.

Downloads: 0 This Week

Last Update: 2014-06-26
See Project
18

PRDL Tools

Privacy Rule Definition Language to write Enterprise Privacy Policies

PRDL is one of the core components within the ENDORSE project. The scope of the language is to encompass clauses from data protection legislation and enterprise privacy policies (EPPs) in order to, for example, derive data access decisions automatically from those policies. There have been many initiatives for expressing privacy rules and legal restrictions in a computable way. PRDL attempts to present a collaborative result towards a multi-stakeholder language. The goals were that PRDL should be sufficiently expressive to define EPPs for SMEs, that it should link the wording of the data privacy laws of different European countries, and that it should be represented in natural language and therefore be easy to understand. Additionally, it should be able to express the workflows that have to be conducted within the helping wizards. Finally, it should be automatically or semi-automatically executable by a rule engine.

Downloads: 0 This Week

Last Update: 2012-07-09
See Project
19

PatchCatcher

Software for Patchwriting Detection

PatchCatcher uses suffix arrays to detect common types of patchwriting among scientific papers.

1 Review

Downloads: 0 This Week

Last Update: 2014-05-29
See Project
20

Phonology Charts

A linguistic tool to aid in the study of Linguistics/Phonology, specifically distinctive features of possible language sounds. Comprises both a Visual C++ .NET version and a Java-based web applet version. The C++ version has all but been ab

Downloads: 0 This Week

Last Update: 2015-06-04
See Project
21

Phrasal

Statistical phrase-based machine translation system

Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java. At its core, it provides much the same functionality as the core of Moses. Distinctive features include an easy-to-use API for implementing new decoding model features, the ability to translate using phrases that include gaps (Galley et al. 2010), and conditional extraction of phrase tables and lexical reordering models. Developed by The Natural Language Processing Group at Stanford University, a team of faculty, postdocs, programmers and students who work together on algorithms that allow computers to process and understand human languages. Our work ranges from basic research in computational linguistics to key applications in human language technology, and covers areas such as sentence understanding, automatic question answering, machine translation, syntactic parsing and tagging, and sentiment analysis.

Downloads: 0 This Week

Last Update: 2021-01-19
See Project
22

Porter Stemmer

Java version of Porter's Stemming algorithm

The Stemmer class transforms a word into its root form. The input word is provided through the add() methods. The stem() method returns the stem, as does toString() once stem() has been called. The clear() method wipes the Stemmer buffer and allows a new word to be input. This version extends Martin Porter's original stemming algorithm by allowing capital letters to exist in words. This version should also be pluggable wherever the old algorithm is used, with few accommodations necessary. The code in this version is more readable (in my opinion) than the old version. There is a main at the bottom that shows how to use the Stemmer.

Downloads: 0 This Week

Last Update: 2015-10-07
See Project
23

RDRPOSTagger

A Rule-based Part-of-Speech and Morphological Tagging Toolkit

RDRPOSTagger is a robust, easy-to-use and language-independent rule-based toolkit for Part-of-Speech (POS) and morphological tagging. RDRPOSTagger is fast in both the learning and tagging processes, and achieves very competitive accuracy in comparison to state-of-the-art results. RDRPOSTagger now supports pre-trained POS and morphological tagging models for Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese. Additionally, RDRPOSTagger supports pre-trained Universal POS tagging models for 40 languages. See the full usage of RDRPOSTagger at: http://rdrpostagger.sourceforge.net/

2 Reviews

Downloads: 0 This Week

Last Update: 2017-05-24
See Project
24

Reconcile

Reconcile is an open source research platform for coreference resolution. It combines a large number of open source NLP components and provides extension points for researchers to plug in additional features and techniques.

Downloads: 0 This Week

Last Update: 2013-05-02
See Project
25

Rhyme Analyzer

A lyrical analysis and classification tool focused specifically on rhyming style in rap lyrics. Functions include phonetic transcription, rhyme visualization, and rapper classification.

Downloads: 0 This Week

Last Update: 2013-04-23
See Project