Bajkę Jul 5, 2020 @ 1:24am
Would it be ok if I scrape the Steam forum?
I just have this personal project of mine to scrape the forum and analyze it in some way or another.

I looked this up and it seems that Steam is friendly toward scrapers, but I don't want to risk it. Is there something I should know before doing so?
Showing 1-13 of 13 comments
MalikQayum Jul 5, 2020 @ 1:53am 
scraping is not advised, and generally frowned upon, however people still do it.
I also have a project that revolves around the group forums which is partially scraping data.
Bajkę Jul 5, 2020 @ 2:15am 
Originally posted by MalikQayum:
scraping is not advised, and generally frowned upon, however people still do it.
I also have a project that revolves around the group forums which is partially scraping data.
Figured as much
Karien Jul 5, 2020 @ 3:37am 
If you really want to do it, rate limit/throttle the speed of the scraper bot. You don't want to basically 'ping' the servers constantly at a high rate. Keeping to rates closer to normal browsing should be alright.
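A minimal sketch of the throttling idea, in Python since that is the language the original poster mentions learning. The names Throttle and fetch_all and the 2-second interval are assumptions made for illustration; they are not a limit published by Valve, and fetch stands in for whatever function actually downloads a page.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, fetch, throttle):
    """Call fetch(url) for each URL, pausing between calls."""
    pages = []
    for url in urls:
        throttle.wait()
        pages.append(fetch(url))
    return pages


# Hypothetical usage: at most one request every 2 seconds.
# pages = fetch_all(list_of_urls, my_download_function, Throttle(2.0))
```

Passing the clock and sleep functions in as parameters is only there so the pacing can be checked without real waiting.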
Last edited by Karien; Jul 5, 2020 @ 3:39am
Bajkę Jul 5, 2020 @ 3:45am 
Originally posted by Karien:
If you really want to do it, rate limit/throttle the speed of the scraper bot. You don't want to basically 'ping' the servers constantly at a high rate. Keeping to rates closer to normal browsing should be alright.
I will keep that in mind, thank you.
MalikQayum Jul 5, 2020 @ 6:46am 
just curious, what kind of data are you interested in, from the forums?
Sazzouu Jul 5, 2020 @ 6:56am 
Originally posted by MalikQayum:
scraping is not advised, and generally frowned upon, however people still do it.
I also have a project that revolves around the group forums which is partially scraping data.

Scraping is absolutely allowed. The only thing that Valve forbids in their service is automation (like market transactions, spamming, etc.)
Bajkę Jul 5, 2020 @ 7:01am 
Originally posted by MalikQayum:
just curious, what kind of data are you interested in, from the forums?
Initially it was mostly titles and locked threads, which I will then go through manually to determine which are trolls and which are not. Honestly, I was thinking of doing this manually since it wouldn't take long, but I just wanted to test what I learned from these few months of learning Python. Then I figured I'd do some more work and analyze the words used in each thread and each locked thread. I am also thinking of including mods' comments, if available, in my "research."

It is just a stupid thing that I am thinking of doing. It is not really useful but I just wanted to analyze some game forums (not the popular ones).
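A rough sketch of the word analysis described above, using only the standard library. The sample titles, the field names title and locked, and the tiny stop-word list are all made up for illustration; real data would come from whatever the scraper collects.

```python
import re
from collections import Counter

# A tiny, deliberately incomplete stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "this", "it"}


def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def word_counts(threads, locked):
    """Count words in the titles of threads whose 'locked' flag matches."""
    counts = Counter()
    for thread in threads:
        if thread["locked"] == locked:
            counts.update(tokenize(thread["title"]))
    return counts


# Made-up sample data; real titles would come from the scraper.
threads = [
    {"title": "This game is trash", "locked": True},
    {"title": "Trash devs, trash game", "locked": True},
    {"title": "How to fix the crash on startup", "locked": False},
]

print(word_counts(threads, locked=True).most_common(3))
print(word_counts(threads, locked=False).most_common(3))
```

Comparing the two counters side by side is one simple way to see which words show up more often in locked threads than in open ones.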
Bajkę Jul 5, 2020 @ 7:03am 
Originally posted by BeatZ:
Originally posted by MalikQayum:
scraping is not advised, and generally frowned upon, however people still do it.
I also have a project that revolves around the group forums which is partially scraping data.

Scraping is absolutely allowed. The only thing that Valve forbids in their service is automation (like market transactions, spamming, etc.)
I actually thought of making a bot to oversee threads in case they get deleted but I guess this would not be allowed.
MalikQayum Jul 5, 2020 @ 7:15am 
Originally posted by BeatZ:
Originally posted by MalikQayum:
scraping is not advised, and generally frowned upon, however people still do it.
I also have a project that revolves around the group forums which is partially scraping data.

Scraping is absolutely allowed. The only thing that Valve forbids in their service is automation (like market transactions, spamming, etc.)
I'll keep this short;
I never said it was not allowed or allowed.
The general consensus is that you do not scrape websites and that it is frowned upon, which is why scraping gets you temp IP bans more often than overstepping rate limits on APIs does.
Now I have repeated myself twice. If need be, I can do it in German as well, if you still don't understand what I wrote.
MalikQayum Jul 5, 2020 @ 7:22am 
Originally posted by 5e rules:
Originally posted by MalikQayum:
just curious, what kind of data are you interested in, from the forums?
Initially it was mostly titles and locked threads, which I will then go through manually to determine which are trolls and which are not. Honestly, I was thinking of doing this manually since it wouldn't take long, but I just wanted to test what I learned from these few months of learning Python. Then I figured I'd do some more work and analyze the words used in each thread and each locked thread. I am also thinking of including mods' comments, if available, in my "research."

It is just a stupid thing that I am thinking of doing. It is not really useful but I just wanted to analyze some game forums (not the popular ones).
interesting but wouldn't it be quite a big task to keep up with a db of topics after a while?
as not every topic is instantly locked / deleted, it could be days, weeks, months or years. you would have to check the tbl for that change.

i did make something similar but again i don't actually update the tbl row or check all the topics for changes.

but i guess it is somewhat manageable if you do it smart, getting 50 topics per page.
over a short duration i can see this being manageable, but once your db tbl hits more than 50k topics you are going to have trouble i guess.

i just skimmed the discussions and just by a glance i can tell from the english forums alone that there are 1 million + discussions.

but ye if you do limit it to the 50k i guess you could get some good data.
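A small sketch of the kind of table described above, using Python's built-in sqlite3 module. The table layout, the column names, and the helper functions are assumptions for illustration, not the schema of anyone's actual project; the 50k cap is the figure from this post.

```python
import sqlite3

MAX_TOPICS = 50_000  # cap suggested in the thread


def open_db(path=":memory:"):
    """Open the database and create the topics table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS topics (
               topic_id TEXT PRIMARY KEY,
               title    TEXT NOT NULL,
               status   TEXT NOT NULL DEFAULT 'open'
           )"""
    )
    return conn


def add_topic(conn, topic_id, title):
    """Insert a topic unless it is already known or the table is full."""
    known = conn.execute(
        "SELECT 1 FROM topics WHERE topic_id = ?", (topic_id,)
    ).fetchone()
    (count,) = conn.execute("SELECT COUNT(*) FROM topics").fetchone()
    if known or count >= MAX_TOPICS:
        return False
    conn.execute(
        "INSERT INTO topics (topic_id, title) VALUES (?, ?)", (topic_id, title)
    )
    conn.commit()
    return True


def update_status(conn, topic_id, new_status):
    """Store new_status; return True only if it differs from the stored one."""
    cur = conn.execute(
        "UPDATE topics SET status = ? WHERE topic_id = ? AND status != ?",
        (new_status, topic_id, new_status),
    )
    conn.commit()
    return cur.rowcount == 1
```

With something like this, each pass over the forum only needs to call update_status for topics already in the table, and a True return marks a topic that was locked or deleted since the last pass.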
Last edited by MalikQayum; Jul 5, 2020 @ 7:29am
Bajkę Jul 5, 2020 @ 7:47am 
Originally posted by MalikQayum:
Originally posted by 5e rules:
Initially it was mostly titles and locked threads, which I will then go through manually to determine which are trolls and which are not. Honestly, I was thinking of doing this manually since it wouldn't take long, but I just wanted to test what I learned from these few months of learning Python. Then I figured I'd do some more work and analyze the words used in each thread and each locked thread. I am also thinking of including mods' comments, if available, in my "research."

It is just a stupid thing that I am thinking of doing. It is not really useful but I just wanted to analyze some game forums (not the popular ones).
interesting but wouldn't it be quite a big task to keep up with a db of topics after a while?
as not every topic is instantly locked / deleted, it could be days, weeks, months or years. you would have to check the tbl for that change.

i did make something similar but again i don't actually update the tbl row or check all the topics for changes.

but i guess it is somewhat manageable if you do it smart, getting 50 topics per page.
over a short duration i can see this being manageable, but once your db tbl hits more than 50k topics you are going to have trouble i guess.

i just skimmed the discussions and just by a glance i can tell from the english forums alone that there are 1 million + discussions.

but ye if you do limit it to the 50k i guess you could get some good data.

I wouldn't dare analyze that much. Plus, how would I tell which thread is a troll and which one is not?
MalikQayum Jul 5, 2020 @ 7:58am 
well you obviously wouldn't have to analyze all 50k topics. that would just be the limit i would set for the max records in my table, after which i would keep iterating over the records until the change happens (locked / deleted). then, once i have an acceptable amount of locked / deleted topics (be it 10, 100, 1000), i would resort to either some sort of keyword search or some basic text analysis for the content.
(since you use python and it has a wide community of contributors, i bet you can find something basic to help you with getting that.)
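The loop described above could look roughly like this. It is only a sketch: the topic dictionaries, the get_status callback (which would wrap whatever code checks a topic on the forum), and the keyword set are all hypothetical.

```python
def recheck(topics, get_status):
    """Re-check every open topic once; return the ones whose status changed."""
    changed = []
    for topic in topics:
        if topic["status"] != "open":
            continue  # already locked or deleted, nothing left to watch
        status = get_status(topic["id"])
        if status != "open":
            topic["status"] = status
            changed.append(topic)
    return changed


def keyword_score(title, keywords):
    """Count how many of the given keywords appear as words in the title."""
    words = set(title.lower().split())
    return sum(1 for k in keywords if k in words)


# Made-up data and a stand-in status check, for illustration only.
topics = [
    {"id": 1, "title": "Refund scam devs", "status": "open"},
    {"id": 2, "title": "Patch notes", "status": "open"},
]
changed = recheck(topics, lambda topic_id: "locked" if topic_id == 1 else "open")
for topic in changed:
    print(topic["title"], keyword_score(topic["title"], {"scam", "refund", "cheater"}))
```

Calling recheck repeatedly (with a delay between passes) until enough topics have changed, and only then scoring them, matches the order of steps in the post.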
Last edited by MalikQayum; Jul 5, 2020 @ 8:02am
Bajkę Jul 5, 2020 @ 8:01am 
Originally posted by MalikQayum:
well you obviously wouldn't have to analyze all 50k topics. that would just be the limit i would set for the max records in my table, after which i would keep iterating over the records until the change happens (locked / deleted). then, once i have an acceptable amount of locked / deleted topics (be it 10, 100, 1000), i would resort to either some sort of keyword search or some basic ai for content analysis.
(since you use python and it has a wide community of contributors, i bet you can find something basic to help you with getting that.)
I am thinking of learning more about AI when I get into uni. Right now, I just want to know the basics. AI is intimidating and takes a lot of time to get right. Anyway, great advice, thanks a lot.

Date Posted: Jul 5, 2020 @ 1:24am
Posts: 13