I thought most sane, modern languages used Unicode character properties to determine whether a character can appear in a valid identifier. For example, Unicode's 'numeric' characters can't be at the beginning of an identifier, just as you can't write '3var'.
So once your programming language supports Unicode, it automatically supports any language written in a script that Unicode covers.
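To make the rule concrete, here is a small Python check. Python is just one example; its identifier rules come from PEP 3131, which uses Unicode's XID_Start and XID_Continue properties rather than blocks. Decimal digits of any script have category Nd, which may continue an identifier but not start one. The sample strings are arbitrary picks for illustration.

```python
import unicodedata

# Python follows PEP 3131: the first character must be XID_Start (or "_"),
# the rest XID_Continue. Decimal digits from any script (category "Nd")
# are XID_Continue but not XID_Start, so they can't lead an identifier.
samples = {
    "var3": "ASCII letters, then a digit",
    "3var": "ASCII digit first",
    "変数": "Japanese, 'variable'",
    "ਨਾਮ": "Punjabi (Gurmukhi), 'name'",
    "x\u0663": "Arabic-Indic digit three after a letter",
    "\u0663x": "Arabic-Indic digit three first",
}

for text, note in samples.items():
    category = unicodedata.category(text[0])  # category of the leading character
    print(f"{text!r}: starts with {category}, valid identifier: {text.isidentifier()} ({note})")
```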
I think they exclude some Unicode characters from being used in identifiers. At least the last time I tried, it wouldn't let me use an emoji as a variable name.
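A minimal way to see why the emoji attempt fails, again using Python as the example language: emoji are generally category So (other symbol), which has neither identifier property, while letters from any script do. The characters below are just illustrative picks.

```python
import unicodedata

# Letters (categories Ll, Lo, ...) are valid identifier characters;
# emoji are symbols (category "So") and are rejected.
for ch in ["é", "ж", "变", "🐍", "❤"]:
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"category={unicodedata.category(ch)}, identifier={ch.isidentifier()}"
    )

# Actually using an emoji as a variable name fails when the code is compiled.
try:
    compile("🐍 = 1", "<example>", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```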
Unironically awesome. You can debate whether it hurts the ability to contribute to a project, but folks should be allowed to express themselves in the language they choose & not be forced into ASCII or English. Where I live, English & Romance languages are not the norm, & there are few programmers because English is seen as a prerequisite, which is a massive loss for accessibility.
The hotter take: languages like APL, BQN, & Uiua had it right by building on symbols (like we did in math class) for abstract ideas & operations inside the language, while letting you name variables whatever makes sense to you & your audience.
Yeah. Tbh, I always wondered why programming languages weren't translated.
I know CS is all about English, but at least the built-in functions of programming languages could be translated (as could the APIs of projects that care about that).
Like, I can't say I don't like it this way (since I'm a native English speaker), but I still wonder what it would be like if you could translate code.
Variables could cause problems (translating them is more work, and leaving them untranslated makes the code hard to understand). But still: programming languages have no declensions and their syntax is simpler, so the difficulty of implementing this shouldn't even compare to translating "real" languages.
I'm German, and I would not want that. German grammar works in a way that makes programming a lot more awkward. For example, ".forEach" would technically need three different spellings ("für jeden", "für jede", "für jedes") depending on the grammatical gender of the type of element in the collection it's called on. Of course you could just go with neuter and say it refers to the "items" in the collection, but that's just one of many small awkward bits that pile up when you try to translate languages and APIs. I really appreciate how much more straightforward this is in English.
Programs aren’t written by a single team of developers who speak the same language. You’d be calling a library written by a Hungarian, with additions from an Indian developer, inside a framework developed by Germans based on original work by Mexicans.
If nothing forced all of them to use English by only allowing English keywords, they’d name their variables and functions in their local languages and wreak havoc on readability.
[Edit:] Even with all keywords being forced to English, there’s often half-localized code.
I can’t find the source right now, but I strongly believe that Steve McConnell has a section in one of his books where he quotes a function commented in French and asks, “Can you tell the pitfall the author is warning you about? It’s something about a NullPointerException.” McConnell then advises against using local languages even in comments.
Honestly, it wouldn't even be that hard to release fully translated versions of existing programming languages, like Python in Punjabi or Kotlin in Chinese (both of which already support Unicode variable/class/function names). Just have a lookup table that maps each keyword and standard-library name to one in that language; it can literally be an additional translation layer above the compiler/interpreter that converts the code back to the original English version.
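A minimal sketch of that translation layer in Python, assuming a hypothetical Spanish keyword table (KEYWORDS_ES, invented for this example; a real one would need every keyword and builtin). It uses the standard tokenize module so only NAME tokens are swapped, leaving string literals and comments alone. One known gap: a user variable that happens to share a translated keyword's spelling would also be rewritten.

```python
import io
import tokenize

# Hypothetical Spanish-to-Python table, for illustration only.
KEYWORDS_ES = {
    "definir": "def",
    "devolver": "return",
    "si": "if",
    "sino": "else",
    "para": "for",
    "en": "in",
    "rango": "range",
    "imprimir": "print",
}


def translate(source: str, table: dict[str, str]) -> str:
    """Map translated names back to standard Python, leaving strings and comments untouched."""
    lines = source.splitlines(keepends=True)
    edits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only NAME tokens are candidates, so "si" inside a string literal is never rewritten.
        if tok.type == tokenize.NAME and tok.string in table:
            (row, start), (_, end) = tok.start, tok.end
            edits.append((row, start, end, table[tok.string]))
    # Apply right to left so earlier column offsets on the same line stay valid.
    for row, start, end, new in reversed(edits):
        line = lines[row - 1]
        lines[row - 1] = line[:start] + new + line[end:]
    return "".join(lines)


codigo = '''\
definir signo(n):
    si n < 0:
        devolver "negativo"
    sino:
        devolver "no negativo"

para x en rango(-1, 2):
    imprimir(x, signo(x))
'''

python_src = translate(codigo, KEYWORDS_ES)
print(python_src)
exec(python_src)  # runs the translated program with the normal interpreter
```

Working on tokens rather than raw text search is the design choice that matters here: a plain string replace of "en" would also mangle words inside strings and comments.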
It's honestly really surprising that non-English speakers have developed entirely new programming languages in their own languages (unfortunately, none of them see very widespread use, even among speakers of those languages), while simply translating a widely used, industry-standard English programming language doesn't seem to be much of a thing.
If I ever make my own programming language, I'll probably bake multi-language support into the compiler: just supply it with a lookup table of translated terms along with code written in that language.
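A rough sketch of what baking this into a compiler's front end could look like, written in Python for a hypothetical toy language with only "let" and "print". The per-locale tables (KEYWORDS) are invented for illustration: the lexer resolves keywords through the chosen table and emits canonical token kinds, so everything after the lexer is shared by all locales.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str  # canonical kind: "LET", "PRINT", "IDENT", "NUMBER" or "OP"
    text: str  # the text as written in the source


# Hypothetical keyword tables: each locale spells the keywords differently,
# but all of them map onto the same canonical token kinds.
KEYWORDS = {
    "en": {"let": "LET", "print": "PRINT"},
    "es": {"sea": "LET", "imprimir": "PRINT"},
    "de": {"sei": "LET", "drucke": "PRINT"},
}

# Numbers, words (\w is Unicode-aware for str patterns), or single symbols.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\w+)|(\S))")


def lex(source: str, locale: str) -> list[Token]:
    """Tokenize one line, resolving keywords through the locale's table."""
    table = KEYWORDS[locale]
    tokens = []
    for number, word, symbol in TOKEN_RE.findall(source):
        if number:
            tokens.append(Token("NUMBER", number))
        elif word:
            tokens.append(Token(table.get(word, "IDENT"), word))
        else:
            tokens.append(Token("OP", symbol))
    return tokens


def run(source: str, locale: str) -> list[int]:
    """Run a toy program (one statement per line) and return the printed values.

    The evaluator only sees canonical kinds, so it is the same for every locale.
    """
    env: dict[str, int] = {}
    output: list[int] = []

    def value(tok: Token) -> int:
        return int(tok.text) if tok.kind == "NUMBER" else env[tok.text]

    def expr(toks: list[Token]) -> int:
        # Only operands joined by "+" are supported in this sketch.
        total = value(toks[0])
        for i in range(1, len(toks), 2):
            if toks[i].text != "+":
                raise SyntaxError(f"unsupported operator {toks[i].text!r}")
            total += value(toks[i + 1])
        return total

    for line in source.splitlines():
        toks = lex(line, locale)
        if not toks:
            continue
        if toks[0].kind == "LET" and len(toks) >= 4 and toks[2].text == "=":
            env[toks[1].text] = expr(toks[3:])
        elif toks[0].kind == "PRINT":
            output.append(expr(toks[1:]))
        else:
            raise SyntaxError(f"cannot parse {line!r}")
    return output


print(run("let year = 2024\nprint year + 1", "en"))
print(run("sea año = 2024\nimprimir año + 1", "es"))
print(run("sei jahr = 2024\ndrucke jahr + 1", "de"))
```

Adding a language then means adding one more table, with no change to the evaluator.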
Inside strings or comments, or as a source encoding, Unicode support is close to universal now, but for operators & variable names I would say it generally isn’t. Some languages, like OCaml, straight up don't support non-ASCII identifiers; others, like PureScript, only support bicameral scripts; and others, like JavaScript, support Unicode in variable names but don't let you define infix operators and don't use Unicode for any existing operators. Raku is probably the most Unicode-friendly language, & some of the mathier ones like Agda are too.
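Python sits in the same camp as JavaScript here. In the sketch below (the helper name compiles is made up for the example), Unicode names compile fine, but a Unicode multiplication sign does not, because Python's operator set is fixed and ASCII-only and "×" (U+00D7) is not an identifier character either.

```python
def compiles(src: str) -> bool:
    """Hypothetical helper: True if src is syntactically valid Python."""
    try:
        compile(src, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Unicode identifiers are fine...
print(compiles("π = 3.14159\nfläche = π * 2 ** 2"))
print(compiles("def σ(xs):\n    return sum(xs)\n\nprint(σ([1, 2, 3]))"))
# ...but there is no way to use or define a Unicode infix operator such as ×.
print(compiles("a, b = 2, 3\nc = a × b"))
```

The closest you get is a Unicode-named function like σ, which still has to be called in prefix form.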
Depends on the compiler. I'm pretty sure some versions of Borland's compilers shit themselves if you introduce an accent mark at the wrong time, let alone support Unicode.
Because it supports Unicode in variable/class/function names, and Unicode covers the vast majority of writing systems, even scripts of dead languages such as cuneiform and Linear B (I assume so historians can digitize ancient texts?).
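For instance, in Python, letters from historic scripts have category Lo (or Nl), so they are valid identifier characters like any modern letter. The sketch looks up four characters by their official Unicode names and binds a variable named after each one; the specific characters are just examples.

```python
import unicodedata

# Historic-script letters are ordinary "Lo" letters as far as identifiers go.
ancient = [
    "CUNEIFORM SIGN A",
    "LINEAR B SYLLABLE B008 A",
    "GOTHIC LETTER AHSA",
    "EGYPTIAN HIEROGLYPH A001",
]

namespace = {}
for i, name in enumerate(ancient):
    ch = unicodedata.lookup(name)
    print(f"U+{ord(ch):05X} {name}: category={unicodedata.category(ch)}, identifier={ch.isidentifier()}")
    # Bind a variable whose name is that single character.
    exec(f"{ch} = {i}", namespace)
```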