Are there subtle differences from languages using the Latin alphabet involving letters?

I got this from the video (linked), where he talks about the differences between Mandarin and Japanese, mainly involving 漢字. The characters are present in both languages, but there are actual differences (besides simplified vs. traditional characters) in both printed text and handwriting, kind of like this:

I mean, it's literally the same word in both languages, but they're different if you look closely. No wonder people mistake Japanese for CHINESE! It's infuriating when you know what Kanji actually looks like and how it's pronounced. The issue is that the characters, for the most part, share the same Unicode code points between Japanese and Mandarin.

I'm mainly talking about word processors and how Kanji / Hanzi are encoded on computers. What ends up happening is that they overlap, and which form you see depends on the font used (Japanese has its own font set while Mandarin uses a completely different one). Look at it like this: which language is the "A"-looking letter from (shown below)?

Both use the same font, but have the stroke on top facing different directions indicating they are from different languages. However, if one doesn't pay close attention to the letters within a word, it's easy to get confused about which language it belongs to. That's the kind of crap that happens when discussing Kanji or Hanzi.

For example, the Unicode code point for 軍 is U+8ECD, which is used in both Japanese and Mandarin. It's basically the same reason the letter "Í" (U+00CD) appears in multiple languages: Spanish, Portuguese, Icelandic, Hungarian, and Czech all share U+00CD, so you may get mixed results in any of those languages.
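To make the code point idea concrete, here is a minimal Python sketch using only the standard library. The helper name `describe` is made up for illustration. It just shows that each character maps to one code point no matter which language the surrounding text is in; which glyph shape you actually see is then decided by the font, not by the stored text.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# 軍 is written with escapes here so the example does not depend on
# how this page itself is encoded.
print(describe("\u8ecd"))  # U+8ECD CJK UNIFIED IDEOGRAPH-8ECD
print(describe("\u00cd"))  # U+00CD LATIN CAPITAL LETTER I WITH ACUTE
```

Nothing in either result says "Japanese", "Mandarin", "Spanish", or "Czech". That information has to come from somewhere else, such as the font or a language tag on the document.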

View original on lemmy.zip

Comments (7)

Hapankaali

lemmy.world

Both use the same font, but have the stroke on top facing different directions indicating they are from different languages.

Languages using the Latin alphabet use varying sets of diacritics, often to introduce ways to express sounds that may not have been present in (Vulgar) Latin. A particular diacritic or diacritic-letter combination can generally not be associated with any specific Latin-script-based language, of which there are many hundreds if not thousands (depending on where one draws the line between language and dialect). An interesting example is Vietnamese, a tonal language using the Latin alphabet, which uses a large number of diacritics to express tonality.

𝕱𝖎𝖗𝖊𝖜𝖎𝖙𝖈𝖍

lemmy.world

If the question is whether the Latin languages use letters differently: yes, every language is different?

I can speak for English, Spanish, and French. English is a bastard language with more exceptions than rules, as we all know. Spanish mostly uses accents as a pronunciation guide with some exceptions, whereas French accents change the letter sound more consistently. Both can change the meaning. French uses ç but Spanish doesn't, and Spanish uses ñ but French doesn't. French is much more contextual.

Sp: "Si llego a tiempo" (if I arrive in time) vs "Sí, llegó a tiempo" (yes, he arrived in time). Same general sound, different emphasis.

Fr: «Bon mais sale» (good but dirty) vs «bon maïs salé» (good salted corn). Wildly different sounds and even syllable counts.

HobbitFoot reply

thelemmy.club

I feel like, to add to that, letter combinations also yield wildly different sounds in different languages. For instance, "ll" in English sounds like an "l" while in Spanish it sounds like a "y".

𝕱𝖎𝖗𝖊𝖜𝖎𝖙𝖈𝖍 reply

lemmy.world

In French, «queue» rhymes with euh, «clown» rhymes with spoon, and «comment» sounds like c'mon. The Spanish ñ is closest to the French gn («mignon») and English ny/ni ("canyon", "onion"). Don't get me started on the R sounds.

quediuspayu reply

lemmy.dbzer0.com

...it sounds like a "y"

That is called yeísmo and is characteristic of some dialects; the traditional pronunciation is /ʎ/.

quediuspayu reply

lemmy.dbzer0.com

I must say that Ç is also present in both Catalan and Portuguese.

blackbrook

mander.xyz

Yes, different languages use different sets of characters, most of them overlapping in the case of Latin-derived characters. It would be vastly less convenient if there was a completely different set (with mostly the same characters) for every single language. And what about variants over time and region and dialect? So it is much better to model them as subsets of the same larger character set. Note that À and Á are different characters with different Unicode code points.
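A small Python sketch of that last point, standard library only. The helper name `decompose` is hypothetical. It checks that À and Á are distinct code points and shows that each can also be split into the plain letter A plus a different combining accent.

```python
import unicodedata

a_grave = "\u00c0"  # À
a_acute = "\u00c1"  # Á

def decompose(ch: str) -> list[str]:
    """List the code points of a character after canonical decomposition (NFD)."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

print(a_grave == a_acute)  # False: two separate characters
print(decompose(a_grave))  # ['U+0041', 'U+0300'] (A + combining grave accent)
print(decompose(a_acute))  # ['U+0041', 'U+0301'] (A + combining acute accent)
```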

The human world is messy with lots of inconsistencies and irregularities, particularly from pre-computer times because humans just deal with them and tolerate mistakes. This is a challenge for modelling properly in computer systems. It does not surprise me that there are even bigger challenges around this for Kanji / Hanzi.