Smarten up the profanity filter: Base64 detection
We've just run into some trouble with the profanity filter in the Factorio forums. The stupid thing is modifying blueprint strings, rendering them useless in-game. The strings are Base64-encoded, and by sheer dumb luck have ended up including something the filter doesn't like.

Please smarten it up to recognise Base64-encoded text and go to sleep when it hits it.
Showing 1-15 of 23 comments
Originally posted by Roxor128:
Please smarten it up to recognise Base64-encoded text and go to sleep when it hits it.
Or in other words, ignore every word that is a multiple of 4 letters long?
The quoted text doesn't even have any headers, so how is it supposed to figure out it's Base64? You can't actually tell it's Base64-encoded. All Base64 implementations just 'assume' the input stream is binary. There is no way to 'detect' that it is Base64, because the output from the stream isn't readable, has no standard headers, and wouldn't be detected as Base64 either.
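
To illustrate the ambiguity, here is a minimal Python sketch (the helper name `decodes_as_base64` is made up for this example). It only shows that a strict Base64 decoder happily accepts some ordinary words, so "it decodes" is not proof that the text was meant as Base64.

```python
import base64
import binascii


def decodes_as_base64(word):
    """Return True if `word` is accepted by a strict Base64 decoder."""
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        base64.b64decode(word, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


# Words whose length is a multiple of 4 can pass; other lengths fail on padding.
for word in ["Test", "word", "profanity", "blueprint"]:
    print(word, decodes_as_base64(word))
```

The check is about alphabet and length only, which is why a plain "is it Base64?" test cannot tell a blueprint from a four-letter word.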

The forums aren't there to do massive text dumps of 'blueprints'.
Last edited by Satoru; 2 Jun 2017 at 20:06
Originally posted by Satoru:
The quoted text doesn't even have any headers, so how is it supposed to figure out it's Base64? You can't actually tell it's Base64-encoded. All Base64 implementations just 'assume' the input stream is binary. There is no way to 'detect' that it is Base64, because the output from the stream isn't readable, has no standard headers, and wouldn't be detected as Base64 either.

You could use a rough rule like "ten characters without a space not forming a recognised word" and then assume the rest of the block (which is going to be a lot longer than that) is Base64. Wouldn't be perfect, but would probably be close-enough.
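
A rough Python sketch of that kind of rule. It swaps the "not a recognised word" test for a simpler one: any unbroken run of Base64-alphabet characters longer than a threshold is left alone. The threshold `MIN_RUN`, the function names, and the `censor` callback are all assumptions for illustration, not anything Steam actually uses.

```python
import re

# Hypothetical threshold: runs shorter than this are treated as ordinary words.
MIN_RUN = 40

# A long run of Base64-alphabet characters, optionally ending in '=' padding.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % MIN_RUN)


def split_for_filter(text):
    """Yield (chunk, should_filter) pairs; long Base64-looking runs get False."""
    pos = 0
    for match in BASE64_RUN.finditer(text):
        if match.start() > pos:
            yield text[pos:match.start()], True
        yield match.group(0), False
        pos = match.end()
    if pos < len(text):
        yield text[pos:], True


def apply_filter(text, censor):
    """Run `censor` (any str -> str word filter) only on non-Base64-like chunks."""
    return "".join(censor(chunk) if flag else chunk
                   for chunk, flag in split_for_filter(text))
```

With a threshold this high, normal prose in any language should rarely qualify, since it would need 40+ ASCII letters or digits with no space or punctuation. It is still a heuristic and not a real detector.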

The forums aren't there to do massive text dumps of 'blueprints'.

Good luck telling that to the Factorio player base! Every day sees at least one post with a blueprint string in it just on the Steam forums, and many more on the devs' official ones.
Originally posted by Roxor128:
You could use a rough rule like "ten characters without a space not forming a recognised word" and then assume the rest of the block (which is going to be a lot longer than that) is Base64. Wouldn't be perfect, but would probably be close-enough.

...how would you account for non-English words? These are multi-national forums after all.
This is why I suggested Pastebin for this. Just paste the link in the post.

:qr:
Last edited by cSg|mc-Hotsauce; 2 Jun 2017 at 21:44
Just have it skip code tags.
Originally posted by Spawn of Totoro:
Originally posted by Roxor128:
You could use a rough rule like "ten characters without a space not forming a recognised word" and then assume the rest of the block (which is going to be a lot longer than that) is Base64. Wouldn't be perfect, but would probably be close-enough.

...how would you account for non-English words? These are multi-national forums after all.
I never said a "recognised word" had to be an English one, and I did say it wasn't a perfect solution.

EDIT: Removed section found to be wrong.
Last edited by Roxor128; 3 Jun 2017 at 1:33
Originally posted by Roxor128:
I never said a "recognised word" had to be an English one, and I did say it wasn't a perfect solution.

EDIT: Removed section found to be wrong.

Can't really compensate for all the languages out there, though. Currently there are 6909 languages in the world.

http://www.education.rec.ri.cmu.edu/fire/naclo/pages/Ling/Fact/num-languages.html

That would be a huge database that needed to be searched for a single word.

I see it as an unrealistic solution that would cause more harm than good in the forums. People will always find a way to bypass the filter. All one can do is block the direct spelling as much as possible. After that, it is up to the human factor to make a judgement call.
Originally posted by Roxor128:
You could use a rough rule like "ten characters without a space not forming a recognised word" and then assume the rest of the block (which is going to be a lot longer than that) is Base64. Wouldn't be perfect, but would probably be close-enough.

That could be literally ANYTHING. What if I use Unicode? What if I use another language? What if I use "l33t"? What if I just misspell a word? If you do not have a standardized header for detection, the filter isn't going to find it.

It would also mean that you could put in this 'header' and then bypass all the URL/language filters. This would result in massive amounts of abuse to bypass both the language and the URL filters for scams

Good luck telling that to the Factorio player base! Every day sees at least one post with a blueprint string in it just on the Steam forums, and many more on the devs' official ones.

And maybe the community should try to find better ways to share such information

Maybe the devs should implement the workshop which makes sharing such things easy
Last edited by Satoru; 3 Jun 2017 at 6:19
so, no to the pastebin option?

:qr:
Originally posted by Gus the Crocodile:
Just have it skip code tags.

That would bypass all filters

Which means not only would every troll on the forums use it to bypass the language filter but scammers would immediately use it to bypass the URL filters
Originally posted by Satoru:
That would bypass all filters
That's what I said, yes.

Bypassing the language filter is already trivial. Anyone who wants to bypass it will bypass it; one more way to do that isn't the end of the world. We have moderators to handle behaviour that there isn't a magic filter for.

As for URLs, there's zero chance of a well-formed URL randomly appearing in base64 (let alone one that's on Steam's blacklist), meaning there'd be no harm running a link filter over these blocks for security reasons. Just another way in which it's a silly, limiting idea for Steam to treat morality filters and security filters as one big conglomerate.
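
A minimal Python sketch of what keeping the two filters separate could look like. It assumes BBCode-style `[code]...[/code]` tags, and `censor_words` and `censor_links` are hypothetical callbacks standing in for whatever Steam really runs.

```python
import re

# Assumption: BBCode-style [code]...[/code] blocks, matched case-insensitively.
CODE_BLOCK = re.compile(r"(\[code\].*?\[/code\])", re.IGNORECASE | re.DOTALL)


def filter_post(text, censor_words, censor_links):
    """Apply the link filter everywhere, the word filter only outside code blocks."""
    # Splitting on a capturing group alternates: outside, code block, outside, ...
    parts = CODE_BLOCK.split(text)
    out = []
    for i, part in enumerate(parts):
        part = censor_links(part)      # security filter: always applied
        if i % 2 == 0:                 # even indexes are outside [code] blocks
            part = censor_words(part)  # language filter: skipped inside code
        out.append(part)
    return "".join(out)
```

In this sketch a blacklisted link inside a code block still gets caught, while a blueprint string inside one is left byte-for-byte intact by the word filter.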
Originally posted by Satoru:
That could be literally ANYTHING. What if I use Unicode? What if I use another language? What if I use "l33t"? What if I just misspell a word?

You do realise that's an argument against having a profanity filter at all, right?

Besides, Base64 only uses 7-bit ASCII characters, so that already eliminates a large chunk of the world's languages.

If you do not have a standardized header for detection the filter isn't going to find it.

You're in luck. I found out just yesterday that Factorio blueprint strings from v0.15 all start with a zero.
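
A hedged Python sketch of a stricter check built on that leading zero. Beyond what is said here, it assumes the rest of the string is Base64 of zlib-compressed JSON, which is my understanding of the blueprint string format and not something established in this thread. The function name `looks_like_blueprint` is made up.

```python
import base64
import binascii
import json
import zlib


def looks_like_blueprint(token):
    """Return True if `token` is '0' followed by Base64 that inflates to JSON."""
    if len(token) < 2 or not token.startswith("0"):
        return False
    try:
        # Assumption: everything after the version character is strict Base64.
        raw = base64.b64decode(token[1:], validate=True)
        # Assumption: the decoded bytes are a zlib stream containing JSON.
        json.loads(zlib.decompress(raw))
        return True
    except (binascii.Error, ValueError, zlib.error):
        return False
```

If that assumption holds, simply sticking a "0" in front of arbitrary text would not pass, because the remainder also has to decompress and parse.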

Good luck telling that to the Factorio player base! Every day sees at least one post with a blueprint string in it just on the Steam forums, and many more on the devs' official ones.

And maybe the community should try to find better ways to share such information

Blueprint strings are designed to be easily posted in forums and such. That's why they use Base64 encoding. Valve is just sabotaging that by using an overzealous profanity filter.

Maybe the devs should implement the workshop which makes sharing such things easy

Not this again... The devs have said multiple times that they will not be supporting Steam Workshop because it's not portable. They have players who bought the game via other means and using the workshop would lock them out.

Honestly, it'd probably be easier to just make the profanity filter a client-side script the user can set to filter whatever they want and supply it with the current database as a default.
Yes... so because developer X chose to share things that way, Valve should have to waste their dev time to accommodate it, even though those developers specifically chose NOT to use the no-cost assets available to them that would make it a non-issue.

Date posted: 2 Jun 2017 at 19:05
Posts: 23