submitted 4 weeks ago* (last edited 4 weeks ago) by fxomt@lemm.ee to c/dataisbeautiful@mander.xyz

Cross posted from: Latin@lemm.ee

lingua latina pater linguarum dimidum est 😎

I hope it's okay for me to crosspost here.

[-] Hackworth@lemmy.world 5 points 4 weeks ago

I wonder if something like the semantic tokenization method would benefit from using etymological data like this, particularly for a multilingual LLM.

[-] gandalf_der_12te@discuss.tchncs.de 3 points 4 weeks ago* (last edited 4 weeks ago)

i know that my NN internally uses the semantic tokenization method.

i literally often seek the word roots when talking to somebody. it helps me focus.

[-] fxomt@lemm.ee 2 points 4 weeks ago

Interesting paper, thanks for sharing

this post was submitted on 11 Jan 2025
218 points (100.0% liked)
