Sigma Female Mar 22, 2024 @ 6:23am
Comment limit of 1000 characters broken?
Trying to post a comment that's 864 characters long and Steam won't let me. I get an error message. No emojis or anything.
nullable Mar 22, 2024 @ 6:33am 
It's hard to say without seeing the content. The count could be broken, but there may be some details you're overlooking.

Also, if you're using a lot of Unicode characters, client-side and server-side validation can sometimes disagree, depending on how each one counts. Seeing the content would help clarify that.
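To illustrate how two validators can disagree, here is a minimal JavaScript sketch that measures the same string three different ways. Which measure, if any, Steam uses on either side is not known from this thread, and the function name `lengths` is hypothetical.

```javascript
// Measure a string three common ways. Client-side and server-side code
// that pick different measures can disagree about the same comment.
function lengths(text) {
  return {
    utf16Units: text.length,                          // what String.prototype.length reports
    codePoints: [...text].length,                     // the string iterator yields whole code points
    utf8Bytes: new TextEncoder().encode(text).length, // size once encoded as UTF-8
  };
}

console.log(lengths('hello')); // all three agree for ASCII: 5, 5, 5
console.log(lengths('здрав')); // 5 characters, but 10 UTF-8 bytes
console.log(lengths('😀'));    // 2 UTF-16 units, 1 code point, 4 UTF-8 bytes
```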
Ettanin Mar 22, 2024 @ 6:49am 
Note that characters outside the ASCII range, such as ä ö ü ß, are encoded as multiple bytes in UTF-8 and therefore count as more than one character toward the limit.
Sigma Female Mar 22, 2024 @ 7:17am
Originally posted by Ettanin:
Note that characters outside the ASCII range, such as ä ö ü ß, are encoded as multiple bytes in UTF-8 and therefore count as more than one character toward the limit.
I wrote the entire thing in Cyrillic letters. Could that be the cause?
Ettanin Mar 22, 2024 @ 7:27am
Originally posted by Mistress Yaoi:
Originally posted by Ettanin:
Note that characters outside the ASCII range, such as ä ö ü ß, are encoded as multiple bytes in UTF-8 and therefore count as more than one character toward the limit.
I wrote the entire thing in Cyrillic letters. Could that be the cause?
Yes, Cyrillic characters are encoded using multiple bytes. The English Wikipedia has a good explanation of how this mapping works for non-ASCII characters.
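The UTF-8 mapping mentioned here can be sketched in a few lines of JavaScript: the number of bytes depends only on each character's code point. The helper names `utf8BytesForCodePoint` and `utf8ByteLength` are hypothetical; `TextEncoder` is used only to cross-check the result.

```javascript
// Bytes needed to encode one Unicode code point in UTF-8.
function utf8BytesForCodePoint(cp) {
  if (cp < 0x80) return 1;    // U+0000..U+007F: ASCII (basic Latin letters, digits, space)
  if (cp < 0x800) return 2;   // U+0080..U+07FF: includes ä ö ü ß and Cyrillic
  if (cp < 0x10000) return 3; // U+0800..U+FFFF: rest of the Basic Multilingual Plane
  return 4;                   // U+10000 and above: e.g. most emoji
}

// Sum the per-code-point sizes over the whole string.
function utf8ByteLength(text) {
  let total = 0;
  for (const ch of text) {
    total += utf8BytesForCodePoint(ch.codePointAt(0));
  }
  return total;
}

for (const s of ['hello', 'äöüß', 'здрав', 'Й ё N']) {
  // Cross-check against the built-in encoder: both numbers should match.
  console.log(s, utf8ByteLength(s), new TextEncoder().encode(s).length);
}
```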
The author of this thread has indicated that this post answers the original topic.
metamec Mar 22, 2024 @ 8:40am 
Basic Latin (ASCII) characters occupy 1 byte in UTF-8.
Cyrillic characters occupy 2 bytes.

> new TextEncoder().encode('Й').length; // Cyrillic Й
< 2
> new TextEncoder().encode('ё').length; // Cyrillic ё
< 2
> new TextEncoder().encode('N').length; // Latin N
< 1
> new TextEncoder().encode('здрав').length;
< 10
> new TextEncoder().encode('hello').length;
< 5

You can verify this in the developer console of your web browser (Ctrl+Shift+I).

TL;DR: The limit appears to count bytes rather than characters, so if your comment is entirely in Cyrillic, your effective limit is 500 characters.
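Assuming, as this post does, that the limit is 1000 bytes of UTF-8 rather than 1000 characters (nothing in this thread confirms how Steam actually counts), a pre-check before posting could look like this sketch. `COMMENT_LIMIT_BYTES`, `utf8Length` and `fitsCommentLimit` are hypothetical names.

```javascript
// ASSUMPTION: 1000 UTF-8 bytes is this thread's hypothesis, not a documented Steam value.
const COMMENT_LIMIT_BYTES = 1000;

function utf8Length(text) {
  return new TextEncoder().encode(text).length;
}

function fitsCommentLimit(text, limit = COMMENT_LIMIT_BYTES) {
  return utf8Length(text) <= limit;
}

const cyr500 = 'ж'.repeat(500); // 500 Cyrillic letters = 1000 bytes
const cyr501 = 'ж'.repeat(501); // 501 Cyrillic letters = 1002 bytes
console.log(fitsCommentLimit(cyr500)); // true
console.log(fitsCommentLimit(cyr501)); // false
// 864 Cyrillic letters with no spaces would need 1728 bytes.
console.log(utf8Length('ж'.repeat(864))); // 1728
```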

Edit: Actually, I just realised that it will be slightly higher than 500 because spaces occupy only 1 byte. It's always the same space character, regardless of script.
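The effect of spaces can be checked with a sketch that trims text to a byte budget without cutting a character in half. `truncateToBytes` is a hypothetical helper, and the 1000-byte budget is the thread's assumption rather than a documented Steam value.

```javascript
// Keep whole code points until adding the next one would exceed maxBytes.
function truncateToBytes(text, maxBytes) {
  const encoder = new TextEncoder();
  let used = 0;
  let out = '';
  for (const ch of text) {
    const size = encoder.encode(ch).length; // 1 byte for a space, 2 for a Cyrillic letter
    if (used + size > maxBytes) break;
    used += size;
    out += ch;
  }
  return out;
}

// 'здрав ' is 6 characters but only 11 bytes, so spaces push the count above 500.
const sample = 'здрав '.repeat(100);
const kept = truncateToBytes(sample, 1000); // assumed 1000-byte budget
console.log(kept.length);                           // 545 characters
console.log(new TextEncoder().encode(kept).length); // 1000 bytes
```

With one space per five-letter word, 545 characters fit instead of 500; longer words bring the figure closer to 500.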
Last edited by metamec; Mar 22, 2024 @ 8:50am

Date Posted: Mar 22, 2024 @ 6:23am
Posts: 5