Georgian Morphological Segmentation & Visualization

Situation

Georgian verbs are famously complex, featuring a multi-slot morphological system where a single word form can encode subject, object, and version markers. Manually analyzing these forms is time-consuming, and standard text-based dictionaries fail to illustrate how individual morphemes shift across different grammatical categories.

Task

I was responsible for building a technical bridge between raw linguistic data and human-readable analysis:

ML Segmentation: Develop a model to automatically split verb forms into constituent parts (e.g., ga-v-aket-eb-di).
Interactive Visualization: Build a web-based dashboard to align and compare these segments visually for researchers and students.

Action

I approached this as a two-stage engineering problem:

1. The Segmentation Model (Back-End)

I implemented a Conditional Random Field (CRF) model using sklearn-crfsuite to treat morphological segmentation as a sequence labeling task.

State Logic: To handle Georgian orthography, I used a three-state logic—I (Inside), S (Single), and N (Null)—to account for the “empty slots” common in Kartvelian verb structures.
Accuracy: The model was trained on a dataset of over 13,000 forms, learning to identify root variations and prefix/suffix changes across different screeves.

2. The Morphological Dashboard (Front-End)

I built a reactive interface using Vue.js to display the model’s output in a meaningful way.

Synchronized Alignment: Using CSS Grid, I engineered a master table that vertically aligns every grammatical slot, ensuring that roots and preverbs stay perfectly stacked across an entire paradigm.
Interactive Highlighting: I added a “hover-sync” feature where interacting with a specific morpheme (like a versionizer) highlights its counterparts throughout the chart, making subtle linguistic shifts instantly visible.

Result

High-Precision Analysis: The model achieved ~98% word-level accuracy on complex verb forms.
Visual Clarity: The dashboard successfully isolates stem developments and thematic changes that are traditionally difficult to trace in flat text.
Scalable Infrastructure: This system allows for the rapid processing of massive verb databases, turning raw strings into structured, educational datasets.

Summary

Georgian NLP Segmentation & Visualization

Situation

Task

Action

1. The Segmentation Model (Back-End)

2. The Morphological Dashboard (Front-End)

Result

Georgian NLP Segmentation & Visualization

Situation

Task

Action

1. The Segmentation Model (Back-End)

2. The Morphological Dashboard (Front-End)

Result

Related Work