The grumpy serialisation format
#Unicode is one of those little things in life that I can't help but smile about.
Is it perfect? No, of course not. Is it better than the alternative? Yes, so much so that every time I'm confronted with a long list of character encodings I can choose from, I feel a sense of relief when I find #UTF8 among them.
I wouldn't have thought it possible to standardize a single character encoding for everyone, and yet, somehow, there is just such a standard.
I downloaded all my #Facebook posts, at least if #Meta is to be believed. I requested them in #JSON format, hoping that would go more smoothly. The JSON encoding caused some trouble: the strings are valid #UTF8, but JSON apparently assumes #UTF16, so a pseudo-conversion back and forth is needed; help was found on #StackOverflow. At least the timestamps were standard #POSIX.
I don't know how complete the "archive" is, but at least something gets saved when you pack up and leave. #some #atkjuttuja
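For anyone hitting the same JSON problem, here is a minimal Python sketch of one commonly described round-trip fix. It assumes the export writes each UTF-8 byte as its own `\u00XX` escape, so a standard JSON parser returns mojibake. The function name `fix_mojibake` and the sample data are made up for illustration, and this is not necessarily the StackOverflow answer mentioned above.

```python
import json

def fix_mojibake(value):
    """Recursively repair strings whose UTF-8 bytes were read as Latin-1."""
    if isinstance(value, str):
        try:
            # Turn each code point back into one byte, then decode as UTF-8.
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not mojibake of this kind; leave the string untouched.
            return value
    if isinstance(value, list):
        return [fix_mojibake(v) for v in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    return value

# Hypothetical sample: "päivää" with every UTF-8 byte escaped separately.
raw = '{"post": "p\\u00c3\\u00a4iv\\u00c3\\u00a4\\u00c3\\u00a4", "timestamp": 1700000000}'
data = fix_mojibake(json.loads(raw))
print(data["post"])
```

Strings that are already correct fall through the `except` branch unchanged, so the function should be safe to run over a whole export, though that is untested against a real archive.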
Hey everyone. I must admit, I don't believe I have ever seen someone enter #utf8 #unicode characters on a #computer in a natural way. Which seems weird, because a bunch of languages use them.
I wrote a #commonLisp #asdf package that just looks up a list of symbols in a file that has every non-surrogate unicode codepoint in it, and an #emacs #elisp function that just calls the #lisp one.
https://codeberg.org/tfw/unicode-chars
Multilingual people, what can you tell me about doing this at all?
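As a point of comparison, the same lookup-by-name idea can be sketched in Python with the standard `unicodedata` module instead of a data file. This is only an illustration, not a port of the linked package; `find_chars` and its `limit` parameter are hypothetical names.

```python
import unicodedata

def find_chars(fragment, limit=10):
    """Return (character, name) pairs whose Unicode name contains fragment."""
    fragment = fragment.upper()
    matches = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogate code points
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if fragment in name:
            matches.append((chr(cp), name))
            if len(matches) >= limit:
                break
    return matches

# Exact lookup by full name, then a substring search.
print(unicodedata.lookup("LATIN SMALL LETTER C WITH ACUTE"))
for char, name in find_chars("c with acute"):
    print(char, name)
```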
#UTF8 has the symbol ⍼.
It is called Angzarr, a kind of ligature of an L and a lightning bolt.
Nobody really knows why it is included in UTF8, but this document ends with a letter from the AMS president!
https://www.unicode.org/wg2/docs/n2191.pdf
So what does it stand for??
Wrong answers only
#UserAgent based banning of #textmode browsers is sooooo lame.
$ lynx -useragent=
Why does this PHP construct:
normalizer_normalize( $search_string, \Normalizer::FORM_D );
convert ÖÖÖ to OOO, but keep ÅÅÅ as ÅÅÅ ... WTF?!
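A quick way to see what form D does on its own is to inspect the code points. The sketch below uses Python's `unicodedata` rather than PHP's intl extension, so it illustrates the normalization form, not this exact call. NFD only splits a letter into base plus combining mark, so the OOO result presumably comes from a later stripping step that the post doesn't show (an assumption). `strip_marks` is a hypothetical helper.

```python
import unicodedata

# Both precomposed letters decompose under NFD; nothing is removed yet.
for ch in ("\u00d6", "\u00c5"):  # Ö and Å
    decomposed = unicodedata.normalize("NFD", ch)
    print(ch, [f"U+{ord(c):04X}" for c in decomposed])

def strip_marks(text):
    """Decompose to NFD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(strip_marks("\u00d6\u00d6\u00d6 \u00c5\u00c5\u00c5"))
```

If a stripping step only removes some combining marks (say U+0308 but not U+030A), Ö and Å would come out differently, which might be worth checking here.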
Surprising glyph of the day: ₣, the currency symbol of the French franc (!).
Rendered as "a capital F with a double bar, which was proposed by Édouard Balladur in 1988, an Fr ligature, or other variants.
According to Yannis Haralambous in 2004, this symbol was never used"
https://fr.wikipedia.org/wiki/%E2%82%A3
#typo #UTF8
'worst-fit' attacks are the latest iteration of the classic "let's guess what the user wants" idea. This has always led to issues down the line.
It will be really hard to reason about and fix for apps that rely on the affected Windows APIs.
https://blog.orange.tw/posts/2025-01-worstfit-unveiling-hidden-transformers-in-windows-ansi/
If you want a deep dive on the underlying mechanics of these types of attacks, check out my colleague's blog post from a couple of months ago: https://herolab.usd.de/en/the-security-risks-of-overlong-utf-8-encodings/
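To make the overlong part concrete, here is a small Python sketch, not taken from either linked post. It decodes the two-byte overlong form of "/" with a deliberately naive decoder (`naive_decode_2byte`, a hypothetical helper) and then with Python's strict built-in decoder.

```python
def naive_decode_2byte(data):
    """Decode a 2-byte UTF-8 sequence without rejecting overlong forms."""
    lead, cont = data
    return chr(((lead & 0x1F) << 6) | (cont & 0x3F))

# 0xC0 0xAF is an overlong (invalid) encoding of "/" (U+002F).
overlong_slash = b"\xc0\xaf"
print(repr(naive_decode_2byte(overlong_slash)))

# A conforming decoder must refuse the same bytes.
try:
    overlong_slash.decode("utf-8")
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False
print("strict decoder accepts it:", strict_accepts)
```

The gap between the two results is the whole problem: a filter that checks the raw bytes for "/" sees nothing, while a lenient decoder further down the line produces one.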
@fanf it's absolutely fantastic. and if, like me, you want to see for yourself that it actually behaves correctly and in which sense of correct, here's something which might help: https://fosstodon.org/@slink/111019934949852505 #utf8
From 2003, but still very helpful to me: "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ #programming #utf8 #internationalization
wow i think i actually managed to get utf8 working in my VM? i used `setlocale(LC_ALL, ".UTF8")` and then print the UTF32 char 0x0107 ć from my program. the interpreter uses c32rtomb to convert it to UTF8 and then i fwrite that to stdout
but the character doesn't show up in the windows console cause windows is bad. but i figured out a workaround! i put this in my run.bat file:
set run=./bismuth.exe prog.bst
powershell $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = New-Object System.Text.UTF8Encoding; %run%
and it runs the VM through powershell with powershell forced to use UTF8 and then it works somehow, yay! more bizarrely, it doesn't show me the powershell prompt but the regular command prompt, but it works anyway? idk why but i'll take the win!
also i have no idea if any of this is the right way to do things so, yanno, don't take this as a tutorial or anything
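For checking the bytes independently of any console, here is a tiny Python sketch of the same conversion step. It is not the VM's C code; `encode_code_point` and `write_code_point` are hypothetical names, and it only shows which bytes U+0107 should turn into.

```python
import io

def encode_code_point(code_point):
    """Encode one UTF-32 code point as UTF-8 bytes."""
    return chr(code_point).encode("utf-8")

def write_code_point(stream, code_point):
    """Write the UTF-8 bytes for code_point to a binary stream."""
    stream.write(encode_code_point(code_point))

buf = io.BytesIO()
write_code_point(buf, 0x0107)  # ć
print(buf.getvalue().hex())    # c487
```

Passing `sys.stdout.buffer` instead of the `BytesIO` object would send the same two bytes to the terminal. If a hex dump of the VM's output shows `c4 87`, the conversion is fine and only the console is getting in the way.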
TIL UTF8 names in the author-email field of Python packages are not supported by warehouse (PyPI), and warehouse waits for the PEP/standard to be clarified before moving forward.
- GitHub issue: https://github.com/pypi/warehouse/issues/16496
- Discussion: https://discuss.python.org/t/core-metadata-email-fields-unicode/7421