A critique of the scoring system :: Turing Complete General Discussions


<NO_NAME>

Aug 11, 2023 @ 6:47pm

A critique of the scoring system

Someone needs to say it: the current scoring system is nonsense.
All the values (gate cost, delay, ticks) are simply added together, and that's it. This can more or less work only if all the values are of the same order of magnitude. That is not always the case, and when it isn't, it produces bizarre scores.

For example: I created a solution that uses 1000 gates and has a very small delay of 20. Next, I created an improved version that uses 1100 gates but reduces the delay to an insanely small value of 10. (That is only 10% more gates for 50% less delay, which is insanely good.) The new solution scores 90 points worse than the old one, so the game keeps the old solution. Moreover, someone whose solution has 500 gates and 500 (sic!) delay will rank higher on the leaderboard. I rest my case.

A much better way to calculate the score would be to multiply the values.
Here is a comparison:

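| Gates | Delay | Current scoring | Multiplication-based scoring |
|------:|------:|----------------:|-----------------------------:|
| 1000  | 20    | 1020            | 20000                        |
| 1100  | 10    | 1110            | 11000                        |
| 500   | 500   | 1000            | 250000                       |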
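A minimal Python sketch that recomputes the table above, assuming the current score is exactly the displayed gate score plus the displayed delay score (any internal weighting the game applies is taken as already baked into those numbers). Ranking the three example solutions under each rule shows the two orderings come out exactly reversed.

```python
# Recompute the comparison table: current (additive) score vs. the proposed
# multiplicative score. Lower is better under both rules.
from typing import Callable, List, Tuple

Solution = Tuple[int, int]  # (gate score, delay score)

EXAMPLES: List[Solution] = [(1000, 20), (1100, 10), (500, 500)]


def additive_score(gates: int, delay: int) -> int:
    """Current rule as described in the post: the values are just added."""
    return gates + delay


def multiplicative_score(gates: int, delay: int) -> int:
    """Proposed rule: the values are multiplied."""
    return gates * delay


def rank(solutions: List[Solution], score: Callable[[int, int], int]) -> List[Solution]:
    """Return the solutions ordered best (lowest score) first."""
    return sorted(solutions, key=lambda s: score(*s))


if __name__ == "__main__":
    for gates, delay in EXAMPLES:
        print(gates, delay, additive_score(gates, delay), multiplicative_score(gates, delay))
    print("sum ranking:    ", rank(EXAMPLES, additive_score))
    print("product ranking:", rank(EXAMPLES, multiplicative_score))
```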

The problem is even worse in the programming challenges, where the number of ticks is usually an order of magnitude lower than the other numbers, which makes it effectively irrelevant.

It would also be helpful to have separate leaderboards for each criterion, like in Zachtronics games. Today I made a solution with a very small delay and wanted to check whether anyone had managed an even lower one, but I couldn't, because the top of the leaderboard was dominated by solutions optimized for a smaller number of gates.

Last edited by <NO_NAME>; Aug 13, 2023 @ 3:41am


Showing 1-15 of 17 comments

MegaIng

Aug 11, 2023 @ 7:20pm

The server already tracks each score individually, and if you check these leaderboards you can see them: https://turingcomplete.win

Just multiplying all three numbers has the drawback of creating absurd numbers very quickly.

The gate and delay scores are actually scaled so that they relate to each other; that's why the weight of the basic gates is 1/2. This stops making sense once you get to architectures, but the current system does produce nice tradeoffs in component levels.

For architectures, there is also the time score. gates + delay * ticks would probably make the most sense.
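A sketch comparing the plain sum with the gates + delay * ticks suggestion on two made-up architectures (the numbers are hypothetical, not taken from any leaderboard), assuming the current architecture score is the plain sum gates + delay + ticks described in the opening post.

```python
# Hypothetical comparison of two architecture scoring rules.
# All numbers below are invented for illustration.


def current_arch_score(gates: int, delay: int, ticks: int) -> int:
    """Plain sum of the three values, as described in the opening post."""
    return gates + delay + ticks


def time_weighted_score(gates: int, delay: int, ticks: int) -> int:
    """Suggested rule: gates + delay * ticks, where delay * ticks ~ run time."""
    return gates + delay * ticks


# (gates, delay, ticks): A is cheap but needs many ticks, B is bigger but needs fewer.
ARCH_A = (1500, 40, 300)
ARCH_B = (2000, 40, 100)

if __name__ == "__main__":
    for name, arch in (("A", ARCH_A), ("B", ARCH_B)):
        print(name, current_arch_score(*arch), time_weighted_score(*arch))
```

Under the plain sum, A wins because its 200 extra ticks barely register next to the gate count; once ticks are multiplied by the delay, the run time dominates and B wins.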

But calling the current system nonsense is a bit extreme. Many people have spent quite a bit of time trying to find a clearly better system, but we couldn't come up with anything that stays easy to understand while producing the tradeoffs we wanted to encourage.

<NO_NAME>

Aug 12, 2023 @ 1:45am

Originally posted by MegaIng:
The server already tracks each score individually, and if you check these leaderboards you can see them: https://turingcomplete.win

It doesn't really. It only stores the one result that the game considers the best.
Here is a real-world example. My first attempt at "The product of nibbles" was the standard, boring 4 shifts and 3 adders, which in my case has a gate score of 478 and a delay score of 46. After that I made a solution optimized for speed that has a gate score of 2382 but a delay score of only 12. The game didn't acknowledge my effort, and the new solution cannot be found on the leaderboard.
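Plugging the two "product of nibbles" attempts from this post into both rules (a quick Python check, not the game's code) shows that the faster design loses under the sum and also under a plain product. So under either rule, a single "best overall" slot would keep the first attempt; the faster one could only show up on a delay-specific leaderboard.

```python
# Gate and delay scores quoted in the post for "The product of nibbles".
FIRST_ATTEMPT = (478, 46)   # 4 shifts + 3 adders
SPEED_VERSION = (2382, 12)  # optimized for delay


def sum_score(gates: int, delay: int) -> int:
    """Current rule: gates + delay."""
    return gates + delay


def product_score(gates: int, delay: int) -> int:
    """Proposed rule: gates * delay."""
    return gates * delay


if __name__ == "__main__":
    for label, (g, d) in (("first", FIRST_ATTEMPT), ("speed", SPEED_VERSION)):
        print(label, "sum:", sum_score(g, d), "product:", product_score(g, d))
```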

Originally posted by MegaIng:
Just multiplying all three numbers has the drawback of creating absurd numbers very quickly.

I don't see what is wrong with big numbers. Players don't care about the exact number either way. What matters is which one is bigger, and that can still be compared rather easily. (See the table in my previous post.)
If it is really important to keep the numbers readable, there are ways to do that without changing the scoring system. There is scientific notation, which an average player of this game would probably understand. Or the score could be the logarithm of the product (which to me is not a real change from plain multiplication, because it doesn't affect the relative order of scores).
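A small sketch of the logarithm variant: since log10(gates * delay) = log10(gates) + log10(delay) and the logarithm is strictly increasing, it ranks solutions in exactly the same order as the raw product while producing single-digit numbers instead of tens of thousands. It also means a log-scaled product is just an additive score on a logarithmic scale.

```python
import math
from typing import List, Tuple

Solution = Tuple[int, int]  # (gates, delay)


def product_score(gates: int, delay: int) -> int:
    """Proposed rule: gates * delay."""
    return gates * delay


def log_score(gates: int, delay: int) -> float:
    """log10(gates * delay), computed as a sum of logs to keep numbers small."""
    return math.log10(gates) + math.log10(delay)


# The three example solutions from the opening post.
EXAMPLES: List[Solution] = [(1000, 20), (1100, 10), (500, 500)]

if __name__ == "__main__":
    for g, d in EXAMPLES:
        print(g, d, product_score(g, d), round(log_score(g, d), 2))
```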

Originally posted by MegaIng:
The gate and delay scores are actually scaled so that they relate to each other; that's why the weight of the basic gates is 1/2. This stops making sense once you get to architectures, but the current system does produce nice tradeoffs in component levels.

I would need some examples of those tradeoffs to really say anything about them.

Originally posted by MegaIng:
For architectures, there is also the time score. gates + delay * ticks would probably make the most sense.

I agree that it would be better than the current system, but on the other hand it would still have the same problems that gates + delay has in component levels.

Originally posted by MegaIng:
But calling the current system nonsense is a bit extreme. Many people have spent quite a bit of time trying to find a clearly better system, but we couldn't come up with anything that stays easy to understand while producing the tradeoffs we wanted to encourage.

OK, maybe it isn't total nonsense. I agree that it kind of works in early levels, although this is partly because in those levels the gate count and the delay are both fairly small, which keeps them in the same order of magnitude, and also because those simple circuits tend to get faster when you find a solution with fewer gates, so there is no real tradeoff at all. I can find examples where this scoring system chokes as early as the level "Counter".
I would argue that the multiplication approach would work at least as well in every case.

MegaIng

Aug 12, 2023 @ 4:32am

Originally posted by <NO_NAME>:
Originally posted by MegaIng:
The server already tracks each score individually, and if you check these leaderboards you can see them: https://turingcomplete.win
It doesn't really. ... The game didn't acknowledge my effort and the new solution cannot be found on the leaderboard.

The server really does. https://turingcomplete.win/#multiply;mode=delay There is your score in third place. I know this game better than you.

Originally posted by <NO_NAME>:
I don't know what is wrong with big numbers.

The problem is that it becomes unmanageable to attach any meaning to numbers that large (or, even worse, to their logarithm). But I agree that this is not that important.

Originally posted by <NO_NAME>:
I would need some examples of those tradeoffs to really say anything about them.

See the top scores for the Adding Bytes level in gate, delay and sum. Same for Product of Nibbles.

Originally posted by <NO_NAME>:
I agree that it would be better than the current system, but on the other hand it would still have the same problems that gates + delay has in component levels.

No? Not really? You haven't convinced me that there is a problem. The fact that the best delay solution is not the best sum solution is a benefit.

Originally posted by <NO_NAME>:
I can find examples where this scoring system chokes as early as the level "Counter".

Really? I can't, looking at the top scores: https://turingcomplete.win/#tick_tock;mode=sum

pleegwat

Aug 12, 2023 @ 5:51am

I agree the 'added' score (which is the only one currently visible in the game itself) is suboptimal, but I do not think plain multiplication is an improvement. Rather, I think we need to keep track of two 'best' solutions per level: one based on gate count (= manufacturing cost) and one based on delay * ticks (= execution wall time).
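A minimal sketch of that bookkeeping, using a hypothetical `Solution`/`LevelRecord` pair (neither is the game's or the server's actual data model) and made-up numbers in the usage. Each slot is updated independently, so a cheap solution can never hide a fast one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Solution:
    name: str
    gates: int   # manufacturing cost
    delay: int   # delay per tick
    ticks: int   # ticks needed to finish the level

    @property
    def walltime(self) -> int:
        """Execution wall time, approximated as delay * ticks."""
        return self.delay * self.ticks


class LevelRecord:
    """Hypothetical per-level record that keeps two independent 'best' slots."""

    def __init__(self) -> None:
        self.best_cost: Optional[Solution] = None
        self.best_walltime: Optional[Solution] = None

    def submit(self, s: Solution) -> None:
        # Each slot is updated on its own; ties keep the earlier submission.
        if self.best_cost is None or s.gates < self.best_cost.gates:
            self.best_cost = s
        if self.best_walltime is None or s.walltime < self.best_walltime.walltime:
            self.best_walltime = s


if __name__ == "__main__":
    record = LevelRecord()
    record.submit(Solution("small", gates=400, delay=60, ticks=120))
    record.submit(Solution("fast", gates=1800, delay=20, ticks=90))
    print(record.best_cost.name, record.best_walltime.name)
```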

I also think the costs of certain built-in components (specifically the RAM) make for bad gameplay. If your architecture includes a RAM, that RAM completely dominates the delay score and contributes heavily to the gate score of the architecture. Yet without RAM, it becomes impossible to solve certain levels (like the sorting one) in software.

MegaIng

Aug 12, 2023 @ 7:31am

Originally posted by pleegwat:
I also think the costs of certain built-in components (specifically the RAM) make for bad gameplay. If your architecture includes a RAM, that RAM completely dominates the delay score and contributes heavily to the gate score of the architecture. Yet without RAM, it becomes impossible to solve certain levels (like the sorting one) in software.

That is why the game provides two other RAM components to solve this problem: FastRAM and LatencyRAM. They are supposed to show off real-life tradeoffs. What you are describing is a feature of the scoring system. The baseline RAM is supposed to be the simple one to use when you don't care about score.

SunCat

Aug 12, 2023 @ 1:30pm

consider this:
you have a 1100/10 solution. how many gates can you add to improve it to /8, and still be on top of the leaderboard?
in the current score-sum system, it's 1 gate to be better, and 2 gates to be tied - which means that the game encourages balancing the "cost" of the circuit (gates) and the "speed" of the circuit (delay), making sure neither of them is very large

if we take your score system, we can use 11000/8 = 1375 gates, until we are considered tied - aka we can add up to 275 gates in order to 'improve' this solution
this kind of thing encourages players to add tons of gates, in order to reduce delay by a little bit - and that's not the kind of leaderboard i would want to compete in, and probably no one else does, unless they personally value delay score a lot - in which case delay-specific leaderboard should be better
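SunCat's numbers can be reproduced with a short Python sketch: for an existing solution and a new, lower delay, compute the gate count at which the two solutions tie under each rule. `Fraction` keeps the product case exact even when the division isn't whole.

```python
from fractions import Fraction


def break_even_gates_sum(gates: int, delay: int, new_delay: int) -> int:
    """Gate count at which a solution with new_delay ties under gates + delay."""
    return gates + delay - new_delay


def break_even_gates_product(gates: int, delay: int, new_delay: int) -> Fraction:
    """Gate count at which a solution with new_delay ties under gates * delay."""
    return Fraction(gates * delay, new_delay)


if __name__ == "__main__":
    g, d, new_d = 1100, 10, 8
    print("extra gates to tie, sum:    ", break_even_gates_sum(g, d, new_d) - g)
    print("extra gates to tie, product:", break_even_gates_product(g, d, new_d) - g)
```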

sidenote: 500/500 score is pretty unrealistic, 'cos you need to spend 250 gates just to get that delay number. whatever that circuit is, it's probably suboptimal, 'cos it should be easy to rearrange gates and get a lot lower delay
regular RAM score is also unrealistic in this way, but the reasoning there is to discourage people from building custom RAMs using individual registers =)

Last edited by SunCat; Aug 12, 2023 @ 2:07pm

<NO_NAME>

Aug 13, 2023 @ 2:19am

Originally posted by MegaIng:
Originally posted by <NO_NAME>:
It doesn't really. ... The game didn't acknowledge my effort and the new solution cannot be found on the leaderboard.

The server really does. https://turingcomplete.win/#multiply;mode=delay There is your score in third place. I know this game better than you.

No. That's not me. That's not my username. The gate score is also different from what I wrote.

Originally posted by MegaIng:
Originally posted by <NO_NAME>:
I don't know what is wrong with big numbers.
The problem is that it becomes unmanageable to attach any meaning to numbers that large (or, even worse, to their logarithm). But I agree that this is not that important.

If we calculate 1 / delay, we get something akin to hertz. (We don't know the unit of delay, but 1 / delay will be proportional to hertz.) So, to calculate the cost of a hertz in gates, we compute gates / frequency = gates / (1 / delay) = gates * delay. This result has a meaning: it's the gate cost per hertz (in some arbitrary unit).
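The same argument written as a formula, with G for the gate score, D for the delay score and f for the clock rate the circuit allows (all in the game's arbitrary units):

```latex
f \propto \frac{1}{D}
\qquad\Longrightarrow\qquad
\frac{G}{f} \propto \frac{G}{1/D} = G \cdot D
```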
On the other hand, I cannot find any meaning in adding together two unrelated units of measurement.

Originally posted by MegaIng:
See the top scores for the Adding Bytes level in gate, delay and sum. Same for Product of Nibbles.

The gate and delay leaderboards wouldn't change. The general leaderboard would have a slightly different, and in my opinion much saner, order.
Let's take "Product of Nibbles" as the example, because the order there would change more if we used multiplication, and I'm going to argue that all the changes would be for the better. For example, the 89 gates / 20 delay solution currently ranks lower on the leaderboard than the 87 gates / 22 delay one. A delay of 22 is very hard to reach under 100 gates; only a few people managed it. Imagine how hard it must have been to optimize it even further and get down to 20 while adding only 2 gates (2% more gates for a 9% speed gain). That must have been insanely hard, a great achievement, yet the score stays the same. That is not fair to this person, and it is not good balance either. There are more examples like that on this leaderboard. (You may try to argue that I'm putting delay in a more important position than gates, but no. The delay leaderboard is dominated by people with gate counts in the thousands. Those solutions would never reach the top of the general leaderboard under multiplication; the balancing would still be there, but the scores that are harder to achieve would be promoted, not some arbitrarily chosen ones.)
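A quick numerical check of this example (a Python sketch; the game's tie-breaking between equal scores is not modelled): the sum gives both solutions 109, while the product separates them in favour of 89/20. The percentage changes match the ones quoted in the post.

```python
# Scores quoted in the post for "Product of Nibbles": (gates, delay).
OLDER = (87, 22)
NEWER = (89, 20)


def sum_score(gates: int, delay: int) -> int:
    """Current rule: gates + delay."""
    return gates + delay


def product_score(gates: int, delay: int) -> int:
    """Proposed rule: gates * delay."""
    return gates * delay


def pct_change(old: int, new: int) -> float:
    """Signed change from old to new, in percent."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    print("sum:", sum_score(*OLDER), sum_score(*NEWER))
    print("product:", product_score(*OLDER), product_score(*NEWER))
    print("gates %+.1f%%, delay %+.1f%%" % (pct_change(OLDER[0], NEWER[0]),
                                            pct_change(OLDER[1], NEWER[1])))
```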

Originally posted by MegaIng:
Originally posted by <NO_NAME>:
I can find examples where this scoring system chokes as early as the level "Counter".
Really? I can't, looking at the top scores: https://turingcomplete.win/#tick_tock;mode=sum

If anything this proves my point. (See the previous paragraph of this answer.)

<NO_NAME>

Aug 13, 2023 @ 2:40am

Originally posted by SunCat:
consider this:
you have a 1100/10 solution. how many gates can you add to improve it to /8, and still be on top of the leaderboard?
in the current score-sum system, it's 1 gate to be better, and 2 gates to be tied - which means that the game encourages balancing the "cost" of the circuit (gates) and the "speed" of the circuit (delay), making sure neither of them is very large

if we take your score system, we can use 11000/8 = 1375 gates, until we are considered tied - aka we can add up to 275 gates in order to 'improve' this solution
this kind of thing encourages players to add tons of gates, in order to reduce delay by a little bit - and that's not the kind of leaderboard i would want to compete in, and probably no one else does, unless they personally value delay score a lot - in which case delay-specific leaderboard should be better

Why would 275 gates be considered a lot when someone already has 11000? This is a huge circuit, and adding those gates would barely be noticeable if you look at it. On the other hand, a delay of 10 is tiny. You cannot tell me that decreasing it even more isn't a big deal.
I can tell you that these 275 gates wouldn't be a brute-force solution. Normally, when your circuit is already that optimized for speed, the only way to make it even faster is to duplicate big parts of it and run things in parallel. That would take more than 275 gates. Staying within 275 shows that the creator of the circuit found some really clever optimization. It would be very hard to do something like this. The leaderboard should promote the solutions that require the most effort to achieve, not the ones whose numbers happen to have a lower sum.
(See also my argument about "Product of Nibbles" in the previous post.)

Originally posted by SunCat:
sidenote: 500/500 score is pretty unrealistic, 'cos you need to spend 250 gates just to get that delay number. whatever that circuit is, it's probably suboptimal, 'cos it should be easy to rearrange gates and get a lot lower delay
regular RAM score is also unrealistic in this way, but the reasoning there is to discourage people from building custom RAMs using individual registers =)

Yeah, exactly, 500/500 is ridiculously bad. It's the kind of solution where you just slap RAM on it because you're too lazy to figure out how to do it with registers. But the game still ranks this approach higher than designing a large, complicated circuit to reduce the delay to a really small number.

MegaIng

Aug 13, 2023 @ 2:40am

Originally posted by <NO_NAME>:
No. That's not me. That's not my user name. The gate score is also different from what I wrote.

OK, sorry. I don't know your username (it isn't just NO_NAME), but my point was more that the server actually does track low delay scores. I don't know why it isn't uploading your score in particular, but the server does track the lowest score in each category.

Originally posted by <NO_NAME>:
If we calculate 1 / delay, we get something akin to hertz. (We don't know the unit of delay, but 1 / delay will be proportional to hertz.) So, to calculate the cost of a hertz in gates, we compute gates / frequency = gates / (1 / delay) = gates * delay. This result has a meaning: it's the gate cost per hertz (in some arbitrary unit).

Uhm... What? Just putting random units together doesn't create meaning.

Originally posted by <NO_NAME>:
On the other hand, I cannot find any meaning in adding together two unrelated units of measurement.

Fair, that doesn't really have meaning either.

Originally posted by <NO_NAME>:
The gate and delay leaderboards wouldn't change. The general leaderboard would have a slightly different, and in my opinion much saner, order.
Let's take "Product of Nibbles" as the example, because the order there would change more if we used multiplication, and I'm going to argue that all the changes would be for the better. For example, the 89 gates / 20 delay solution currently ranks lower on the leaderboard than the 87 gates / 22 delay one. A delay of 22 is very hard to reach under 100 gates; only a few people managed it. Imagine how hard it must have been to optimize it even further and get down to 20 while adding only 2 gates (2% more gates for a 9% speed gain). That must have been insanely hard, a great achievement, yet the score stays the same.

Actually, no. Most likely you can make that switch with very little effort by replacing a critical OR or two with switches.

Originally posted by <NO_NAME>:
That is not fair to this person, and it is not good balance either. There are more examples like that on this leaderboard. (You may try to argue that I'm putting delay in a more important position than gates, but no. The delay leaderboard is dominated by people with gate counts in the thousands. Those solutions would never reach the top of the general leaderboard under multiplication; the balancing would still be there, but the scores that are harder to achieve would be promoted, not some arbitrarily chosen ones.)

*the ones you, without knowledge, choose to define as hard.

Originally posted by <NO_NAME>:
Originally posted by MegaIng:
Really? I can't, looking at the top scores: https://turingcomplete.win/#tick_tock;mode=sum
If anything this proves my point. (See the previous paragraph of this answer.)

I'm unsure what you mean here. For Counter, the tradeoff between delay and gates is very small. Unless you arbitrarily define the 8-delay solution as fundamentally better than the 10-delay solution (despite it more than likely being just a switch-for-OR tradeoff), I doubt much will change. Product of Nibbles is a better example for your point.

MegaIng

Aug 13, 2023 @ 2:42am

Originally posted by <NO_NAME>:
Why would 275 gates be considered a lot when someone already has 11000?

The existing solution has 1100, not 11000; you confused yourself by an order of magnitude (not sure if that changes anything for you).


SunCat

Aug 13, 2023 @ 3:03am

Originally posted by <NO_NAME>:
Why would 275 gates be considered a lot when someone already has 11000?

the original score was 1100 gates/10 delay, the 11000 is the multiplied result


<NO_NAME>

Aug 13, 2023 @ 3:38am

Originally posted by MegaIng:
Actually, no. Most likely you can make that switch with very little effort by replacing a critical OR or two with switches.

If other players who clearly put a lot of effort into optimizing their solutions missed this, then it is not obvious and it should be worth some points.

Originally posted by MegaIng:
... For Counter, the tradeoff between delay and gates is very small. Unless you arbitrarily define the 8-delay solution as fundamentally better than the 10-delay solution (despite it more than likely being just a switch-for-OR tradeoff), I doubt much will change. Product of Nibbles is a better example for your point.

You cannot tell me with a straight face that the 56/8 solution isn't better than the 51/10 one. A delay of 10 is already amazing; I cannot get below 14. 8 is insane, and 5 more gates don't change that in the slightest.

You accuse me of being arbitrary but arbitrariness is exactly what I'm fighting against.
The gates + delay score is arbitrary. Gates * delay represents the gate cost per hertz, and it is a logical way to assess the quality of a solution.

Originally posted by MegaIng:
The existing solution has 1100, not 11000; you confused yourself by an order of magnitude (not sure if that changes anything for you).

Originally posted by SunCat:
the original score was 1100 gates/10 delay, the 11000 is the multiplied result

You're right, but no, that doesn't really change anything. I think 275 gates is a fair cost for the insane achievement of decreasing the delay from 10 to 8.
Compare this to the current system, which says that a 1103/8 solution is worse than 1100/10. This is obviously ridiculous.

Last edited by <NO_NAME>; Aug 13, 2023 @ 3:43am


MegaIng

Aug 13, 2023 @ 3:52am

Originally posted by <NO_NAME>:
If other players who clearly put a lot of effort into optimizing their solutions missed this, then it is not obvious and it should be worth some points.

How do you know they missed this? The game only stores your first solution for the sum. So if they found the /22 solution first and then found the /20 solution, the server doesn't update.

Originally posted by <NO_NAME>:
You cannot tell me with a straight face that the 56/8 solution isn't better than the 51/10 one. A delay of 10 is already amazing; I cannot get below 14. 8 is insane, and 5 more gates don't change that in the slightest.

It is literally trivial to beat every level in /8 (once you have switches). That you can't do it says nothing about how hard it is. Sure, getting a good gate score for that is impressive. But so is trading 2 delay for 5 gates.

Originally posted by <NO_NAME>:
You accuse me of being arbitrary but arbitrariness is exactly what I'm fighting against.
The gates + delay score is arbitrary. Gates * delay represents the gate cost per hertz, and it is a logical way to assess the quality of a solution.

They are two mostly arbitrary scoring systems with different tradeoffs. You are arbitrarily valuing percentages more than absolute values.

Originally posted by <NO_NAME>:
You're right, but no, that doesn't really change anything. I think 275 gates is a fair cost for the insane achievement of decreasing the delay from 10 to 8.
Compare this to the current system, which says that a 1103/8 solution is worse than 1100/10. This is obviously ridiculous.

As I said above, literally every level can trivially be beaten in /8. And the order-of-magnitude gap between 1100 gates and 10 delay is so absurd that it can literally never happen in any level (for an actually optimized solution) unless you are brute-forcing. Also, "this is obviously ridiculous" is not an argument. I mostly disagree, to be honest. It's a bit weird, but not ridiculous. If you want to optimize for delay, go for it. And we will probably get a better scoring system at some point (maybe even the exact multiplication system you described here). But calling the current system "nonsense" and "absurd" shows me that you aren't that good at optimizing this game.


<NO_NAME>

Aug 13, 2023 @ 4:49am

Originally posted by MegaIng:
Originally posted by <NO_NAME>:
If other players who clearly put a lot of effort into optimizing their solutions missed this, then it is not obvious and it should be worth some points.
How do you know they missed this? The game only stores your first solution for the sum. So if they found the /22 solution first and then found the /20 solution, the server doesn't update.

I was talking hypothetically about having the same solutions under multiplication scoring. If someone got a better score because of this and others did not, then it deserves points. I agree that other players might modify their solutions if the scoring were different, but that changes nothing in the context of my argument.

Originally posted by MegaIng:
Originally posted by <NO_NAME>:
You cannot tell me with a straight face that the 56/8 solution isn't better than the 51/10 one. A delay of 10 is already amazing; I cannot get below 14. 8 is insane, and 5 more gates don't change that in the slightest.
It is literally trivial to beat every level in /8 (once you have switches). That you can't do it says nothing about how hard it is. Sure, getting a good gate score for that is impressive. But so is trading 2 delay for 5 gates.

Well, obviously I wasn't talking about an 8-delay solution with a very high gate count. It is impressive when the gate cost is relatively low, and my proposed scoring represents exactly that. (It describes the relationship between gates and delay.)

Originally posted by MegaIng:
Originally posted by <NO_NAME>:
You accuse me of being arbitrary but arbitrariness is exactly what I'm fighting against.
The gates + delay score is arbitrary. Gates * delay represents the gate cost per hertz, and it is a logical way to assess the quality of a solution.

They are two mostly arbitrary scoring systems with different tradeoffs. You are arbitrarily valuing percentages more than absolute values.

I'm valuing percentages because they can always be related to each other. Comparing absolute values works only when they are close to each other, which limits its usefulness to mostly the early levels.
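To make the "percentages versus absolute values" point concrete, here is a small sketch (not the game's code) of how each rule decides whether a new solution beats an old one. The sum rule compares absolute changes, while the product rule is the same as checking whether the product of the relative changes is below 1, so multiplying every gate count by the same factor never changes its verdict. The second pair in the usage is hypothetical, chosen only to show the sum's verdict flipping with scale.

```python
from typing import Tuple

Solution = Tuple[int, int]  # (gates, delay)


def sum_prefers_new(old: Solution, new: Solution) -> bool:
    """Absolute rule: the new solution wins if gates added < delay saved."""
    (g0, d0), (g1, d1) = old, new
    return (g1 - g0) + (d1 - d0) < 0


def product_prefers_new(old: Solution, new: Solution) -> bool:
    """Relative rule: equivalent to (g1 / g0) * (d1 / d0) < 1.

    Compared with integers to avoid floating-point rounding.
    """
    (g0, d0), (g1, d1) = old, new
    return g1 * d1 < g0 * d0


def scale_gates(s: Solution, k: int) -> Solution:
    """Multiply the gate count by k, leaving the delay alone."""
    return (s[0] * k, s[1])


if __name__ == "__main__":
    # Counter example quoted in the thread: 51/10 -> 56/8.
    print(sum_prefers_new((51, 10), (56, 8)), product_prefers_new((51, 10), (56, 8)))
    # Hypothetical pair, scaled up: the sum verdict flips, the product one doesn't.
    for k in (1, 10):
        a, b = scale_gates((5, 10), k), scale_gates((6, 8), k)
        print(k, sum_prefers_new(a, b), product_prefers_new(a, b))
```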

Originally posted by MegaIng:
... And the order-of-magnitude gap between 1100 gates and 10 delay is so absurd that it can literally never happen in any level (for an actually optimized solution) unless you are brute-forcing. ...

Of course, the example was exaggerated to show the problem more clearly. I've already shown a real-life case where I think the system does not work (87/22 vs 89/20 in "The product of Nibbles").

Originally posted by MegaIng:
... It's a bit weird, but not ridiculous. If you want to optimize for delay, go for it. ...

Optimizing purely for delay is an entirely different thing, as you can clearly see by looking at the leaderboards of (again) "Product of Nibbles". This discussion is about the general score, the one used by the game and the official leaderboard.

Originally posted by MegaIng:
... But calling the current system "nonsense" and "absurd" shows me that you aren't that good at optimizing this game.

I'm fairly good at it, thank you; please do not lower yourself to personal attacks.
The problem is not that I'm unable to get a good score. The problem is that to get one, I have to do things that would make no sense in the real world. This is supposed to be an educational game, and that kind of thing annoys me.
Besides that, I believe that trying to minimize the cost of a hertz makes for a better challenge, even if we assess it purely in game terms. The current scoring just tells you to stop optimizing at some arbitrary point where the absolute number of gates is too high compared to the delay. Past that point, even if you have ideas for clever optimizations, the game will only punish you for them.


MegaIng

Aug 13, 2023 @ 5:08am

Ok, I think we have both said our pieces and there is nothing more I can add to convince you, and you haven't convinced me either. I don't see any value in continuing this discussion.

