
WDL and centipawn transition everywhere notes

2019-2021. Lc0 had to create its own conversion curve, since by its intrinsic design the WDL outcome is the only chess value used as environment (chess world) feedback into its playing model of chess (evaluation and policy) during training batches.

It also has to comply with the UCI standard, which is a direct transmission of the previously existing engine tournament universe's formats and categories, built around engines with centipawns as the main currency.

Meanwhile, lichess was also using that currency for a completely different purpose (or was it?): to modulate its per-game human error time series, so that centipawn errors would be weighted somehow by winning odds before being turned into subjective feedback to work on, as in "learn from your errors".

Blogs happened, about centipawns becoming an old currency... no, really, NNue happened, and its information loop of using itself (SF) to train new NNues to mimic itself. (That is also where the urban legend that SF was doing RL at all was born, from that information feedback loop. One could definitely try it; I would wonder about convergence (empirically), and also about the dependency between the input training data set and the output target training data set, which is rarely explained clearly, save for the SF12 blog mentioning a moderate-depth search (it was SF classical at the first such loop iteration). SF16 is somehow using "Leela's data"...)

Hence the use of WDL-to-centipawn (or vice versa) fitted curves from various sigmoid families (different algebraic formulations, and perhaps interpretations).

SF has made a repo about it. Lc0 had done similar work, long ago, in the other direction (which is where I started this post).
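A minimal sketch of what one such sigmoid family looks like, assuming a plain logistic form; the slope constant K below is illustrative only, not the constant actually fitted in the SF or Lc0 repos:

```python
import math

# One sigmoid family used for such fits: win probability as a
# logistic function of the centipawn score. K is a made-up slope.
K = 0.004  # hypothetical steepness, in 1/centipawns

def cp_to_win_prob(cp: float) -> float:
    """Map a centipawn score to a [0, 1] winning probability."""
    return 1.0 / (1.0 + math.exp(-K * cp))

def win_prob_to_cp(w: float) -> float:
    """Inverse map (logit): winning probability back to centipawns."""
    return -math.log(1.0 / w - 1.0) / K

print(cp_to_win_prob(100))   # ~0.60 for a one-pawn edge, with this K
print(win_prob_to_cp(0.5))   # 0.0: equal chances sit at zero centipawns
```

Different algebraic formulations of the family mostly amount to different choices of that slope (sometimes made depth- or material-dependent), which is exactly where the interpretation questions creep in.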

But now, new ideas about using WDL statistics instead of centipawns to analyze games are being proposed.

So, I think this is not a fluke or something that will disappear. I might start gathering links to read further, and analyze, in my own from-afar way, the tendencies of the approach...
Why SF keeps its confusing "reinforcement learning" terminology beats me... I know those precious words-that-were-not-code were written once, when NN and ML were new stuff for SF devs... But those words are still there, and they do not only confuse humans, they confuse language machines (AI) when asked about SF NNue training methods. Please spare those machines at least; they might get a bad reputation eventually. Other mystery: Komodo or Dragon's usage of the words "reinforcement learning" and "MCTS". To what extent is that true, and what kind of self-play scheduling is it, compared to Lc0? They can put such words in documentation or website blurb, but that is not open-source code. Lc0 might want to check, but my real interest is in the logic of it. I hope the documentation there is more self-contained.

So here I put stuff related to WDL, chess, and engines. First, that blog that just came out. I want to understand the color code, and how it complements the centipawn game profile. I like this effort, wherever it manages to progress.

A lot of chess insight on lichess (and probably other online services or sites) relies on the notion of SF per-position scoring, and on differences between such scores. Although SF is also globally optimized, loosely, on outcome through its own fishtest pipeline and on tournament Elo across release generations, position difficulty based on SF per-position scores was using centipawn values that flew off the handle from SF12 to SF15.

I suspect that docking moderate-depth further evaluations onto previous input-depth evaluations might have had some effect on how to compare them adequately, a hyperparameter of such combined optimization across different orders of magnitude of depth (the NNue evaluation being fitted on its input to some SF search of unknown depth, itself perhaps another hyperparameter, possibly fishtest-optimized; me guessing from past ML education and other optimization notions).

One thing classical SF had going for it was being clearly dominated by the 1, 3, 3, 5, 9 rule of thumb at whatever input depth from the root was being scored. Now there is this probable ramping from mixing near-leaf evaluations with deeper-leaf evaluations; but even if that is not the reason, the empirical fact that the maximal scores kept drifting to higher values seems to have brought a need for a stable maximal scoring range.

So now we need to review a bunch of past human-targeting analytics gizmos, where centipawns were creeping into all sorts of places without anyone ever flinching (it seems so; hoping I have been wrong all that time).
Using LC0's WDL to Analyze Games
lichess.org/@/jk_182/blog/using-lc0s-wdl-to-analyze-games/spKmwjw5
lichess.org/forum/community-blog-discussions/ublog-spKmwjw5

from the blog content, backrefs:
lczero.org/blog/2020/04/wdl-head/

lczero.org/blog/2020/04/wdl-head/game53.svg

www.chess-journal.com/evaluatingSharpness1.html
www.chess-journal.com/evaluatingSharpness2.html

twitter.com/LeelaChessZero/status/1637763896596463616
@LeelaChessZero
> We are experimenting trying to detect sharpness inside the engine itself too.
> The current formula that we look into is 2 / (log(1/L - 1) + log(1/W - 1)).
6:31 AM · Mar 20, 2023
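Taking that tweeted formula at face value, here is a minimal runnable sketch; the clamping guard and the variable names are mine, not Lc0's:

```python
import math

def lc0_sharpness(w: float, l: float) -> float:
    """Sharpness per the tweeted formula: 2 / (log(1/L - 1) + log(1/W - 1)).
    w, l are the win and loss probabilities from the WDL head, in (0, 1)."""
    # Clamp away from 0 and 1 to keep the logs finite (my guard, not Lc0's).
    eps = 1e-6
    w = min(max(w, eps), 1.0 - eps)
    l = min(max(l, eps), 1.0 - eps)
    return 2.0 / (math.log(1.0 / l - 1.0) + math.log(1.0 / w - 1.0))

# A drawish position (low W and L) scores less sharp than a double-edged one.
print(lc0_sharpness(w=0.10, l=0.10))  # ~0.46: quiet position
print(lc0_sharpness(w=0.45, l=0.45))  # ~5.0:  double-edged position
```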
Also, find the links to the new repositories that SF created for its transition to the WDL point of view, and the new wiki for NNue. Trying to find the courage to see whether they finally gave chess the respect it deserves by clarifying the other half of the training data vector, the target output... more than the one sibylline sentence as in the SF12 blog, never reproduced elsewhere, but now joined by the other sibylline phrase "using Leela's data".

Find a real Komodo Dragon explanation of how an exhaustive-search-based engine manages actual RL training with self-play-produced games, or should I say deep RL, to make room for the re-entrant iterative NNue training based on some SL scheme with its mysterious target data vector (it is still embedded in gensfen; that is what the nodchip repo still provides, and AIs would still relay it in their confusion...).

Also, lichess's own improved win-ratio explanations for this "learn from your mistakes" SF score transformation.

Also: Maia's most massive initial work on such curves, with hopefully the full reproducibility standards of publication, if they did not settle for "the engine works well enough, no need to analyze it further". Figure 11... find its raw data... recreate or find the error bars... It had a more complete model, but there is still a mystery about the definition of position difficulty, using centipawns, and about how to relate it to WDL for the games containing the position...
lichess.org/forum/community-blog-discussions/ublog-spKmwjw5?page=3#28

This is musing quality (i.e. one needs to want to read this kind of stuff).
Also, the blog is about sharpness, which this is not about. (The ideas here are about having some sense of neutral moves or neutral positions; some idea of a move being better than another, not just everything being mistakes. That might allow a better mental disposition. Sounds irrational? Well, learning is an emotional thing. It might be imperceptible at times, but memory and emotion go together; even in mathematics or science, or anything, there is an iota of emotional glue to be found somewhere. Yes, emotions can have amplitude and shades and qualities and aspects, etc. But I am addicted to the eureka after a question, in a never-ending tumbleweed roll across some ambient space; let's say chess is in the scenery.)

I think it would make progress more perceptible; sometimes small steps allow long strides later.
======================================================================================
Did anyone point out the following? (I would say duh! about my own thinking.)

W + D + L = 1 = 0.999999999... = 100% = 99.999999...%. So knowing that the total is 1, plus W and L, we know D; and likewise any 3 of the 4 quantities (1, W, D, L) determine the fourth.

And, hum hum, it took me a while to understand (like today, looking at the 3-color data-viz figures with attention for the first time) that by "more information" the blog might have been referring to having 2 independent dimensions among the 3 numbers W, D, L, rather than just 1 number with centipawns. No need to invoke extra human information embedded in the conversion curves, as I did (especially now that, I hope, only lichess is using that, not SF anymore).
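A minimal sketch of that constraint, plus the usual expected-score collapse E = W + D/2 that throws one of those two independent dimensions away (the function names are mine):

```python
# W + D + L = 1: only two of the three WDL numbers are independent.
def draw_prob(w: float, l: float) -> float:
    """Recover D from W and L using the simplex constraint."""
    return 1.0 - w - l

def expected_score(w: float, l: float) -> float:
    """Collapse WDL to one number, E = W + D/2, losing one dimension."""
    return w + draw_prob(w, l) / 2.0

# A drawish and a double-edged position collapse to the same E = 0.5,
# even though their WDL vectors are very different:
print(expected_score(0.10, 0.10))  # 0.5 with D = 0.8 (quiet)
print(expected_score(0.45, 0.45))  # 0.5 with D = 0.1 (sharp)
```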

Off the handle:
Who was looking for notions of good-enough moves, not just getting stuck on the ceiling like dried noodles, and thinking that was the ultimate pedagogical referential? (Learning from mistakes, but "all my moves are mistakes"... These might be just words, but to exorcise them one has to have done some work to see them as just a mathematical convention, and to have had some successes or progress. Given the usual, apparently common-sense, chess ambitions about rating ladders, I think it might take a lot of time, a lot of negative reinforcement in the proportions, or a compensatory lexicon from a human coach.) Weird words, sorry, but somehow they are precise to me, by being a bit clinical.

Idea: isn't having W, D, and L like that an even more balanced way to measure the quality of a move? Despite the fact that we lose the material rule of thumb we could have used for interpreting the centipawn-looking score. Well, even decoupled, it seems that rule is still around... But my point is that lichess may have been flirting with this when modulating SF's tunnel-vision score into qualitative error categories. So why not use it directly, with no more compromise? Or better (see the sketch after the next paragraph):

It may still take some material-counting decomposition, though (shouldn't SF decompose its own score that way, perhaps at the PV leaf, where the leaf evaluation components can be decomposed into simple eval, NN1, NN2, and the Elo-worth input feature values or activations?). Brainstorming with my crumbs, for people to fully interpret, in their chess study, their need for feedback from SF the gladiator turned detective (without really having passed a rigorous examination for that job, given the thirsty mob demanding its truth, like an FDA approval bypass).
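Back to the "use WDL directly" idea: a minimal sketch of grading a move by its drop in WDL expected score rather than by centipawn loss. The WDL triples here are made up for illustration; this is my sketch, not lichess's actual transformation:

```python
# Hypothetical move-quality measure taken directly from WDL, no centipawns:
# score a move by how much it drops the mover's expected score E = W + D/2.

def expected(wdl):  # (W, D, L) from the mover's point of view
    w, d, l = wdl
    return w + d / 2.0

def move_loss(wdl_best, wdl_played):
    """Expected-score loss caused by the move, in [0, 1] game points."""
    return expected(wdl_best) - expected(wdl_played)

best   = (0.40, 0.50, 0.10)   # engine's WDL after the best move
played = (0.25, 0.55, 0.20)   # engine's WDL after the move actually played
print(move_loss(best, played))  # 0.125 game points lost; comparable across
                                # openings, middlegames, and endgames alike
```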
A doubt about a forum post I made.

WDL-centipawn conversion curves and SF exhaustive search. The point I should have emphasized is that if we think of the PV as a human would, then the depth of the position in relation to WDL odds might not matter so much (IDK, but it seems to me that as a human the position itself is enough to attribute winning odds, assuming a best-play continuation, which is not human, but as a noodle stuck on the ceiling we can agree on it).

But exhaustive search is about weak (getting stronger) leaf evaluations over a large breadth of leaves in a partial subtree from the root.

A set of many leaf evaluations, some of which are at different orders of magnitude of bulk depth.

Those that get the simple eval only (if that is still possible), and those** that receive the NN treatment. These represent different leaf depths of search (the NN has been trained on moderate-depth search, to fit and generalize the SF score itself; not the leaf evaluation, but a partial tree of moderate-depth search beyond the NN's input position). So the problem is not within the one branch that uses such a leaf eval at its tip, but in the set of all types of leaves: those with an early score, and those with the same score value but representing a deeper position. How do they really compare? I made a more abstract version of this question in other threads before, but here I just tried to boil it down to the essence of my dissonance or trouble (it could just be me again).

** Hmm, maybe they all get that NN treatment now, during training too? And parameter tuning, which I view as hyperparameters in the NN outer-loop optimization, while nested inside the whole SF-dev parameter search (grid? I don't care much; I assume it is done well by now, they seem to have had their global optimization, albeit restricted to some parameter subset).

My readings were mostly from around the SF12 period, when I had some energy for it, with occasional revisits to notice little change, until SF16-ish, maybe before; Disservin or the language machines might have changed something, i.e. there is a repo wiki now; see my other thread here about NNue stuff, or I buried it...
OK. I need to figure out why a sharpness definition proposal would also be a sigmoid function of some W odds... (or vice versa; I tend to flip things around when not having my eye on them for real; the mind's eye likes to keep only the relationships, STLT).
www.chess-journal.com/assets/img/sharpFunction.png
en.wikipedia.org/wiki/Sigmoid_function
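One possible answer, from the algebra alone (my observation, not the blog's): log(1/p - 1) is exactly the negative of the inverse sigmoid (the logit), so the tweeted Lc0 formula rewrites as sharpness = -2 / (logit(W) + logit(L)). The sigmoid shows up because its inverse is the natural way to pull a probability back onto an unbounded, centipawn-like axis. A quick numerical check:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):  # inverse sigmoid: logit(sigmoid(x)) == x
    return -math.log(1.0 / p - 1.0)

w, l = 0.30, 0.20
tweeted   = 2.0 / (math.log(1.0 / l - 1.0) + math.log(1.0 / w - 1.0))
rewritten = -2.0 / (logit(w) + logit(l))
print(tweeted, rewritten)  # identical: the formula is logit-based
```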
lczero.org/blog/2023/07/the-lc0-v0.30.0-wdl-rescale/contempt-implementation/
I will set aside for now the choice of the word "contempt" (not a very clinical choice of words, I would say, and likely to lead astray; like the word "chaos" in chaos theory, it might have been a good attention grabber, but it might not be about the crux, which one could see better from one's own trajectory into the matter).

Why I take note of that here is for the following:

> Accompanying this feature, a new score type “WDL_mu” has been added, which follows the newly adopted convention of +1.00 meaning white has a 50% chance to win a game, thus finally making Lc0 and Stockfish evals directly comparable.

And even more the following. I had been wondering about people who are not the best players, who at least had material conversion, a rather objective convention of evaluation, to use sparsely in their mind's eye when planning (they might not know they are planning; let's call it foreseeing, like B. Franklin).

> Notably, this is also approximately the value of (removing) a white center pawn or a black edge pawn from the standard starting position; white edge pawns are evaluated a bit less important, black center pawns a bit more. Furthermore the drawscore adjustment has been simplified into a single parameter which is applied from white’s perspective, where 0.0 will mean regular chess scoring, and -1.0 will give Armageddon scoring.

I would not get distracted by words like "Armageddon"; it might make one smile, but it does not explain much (or does it? leaving room to be wrong, as it is mentally hygienic to do so).
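Here is how I read that drawscore parameter, a sketch of my interpretation of the quoted blurb, not actual Lc0 code: the game value from white's perspective on the usual [-1, +1] scale, with draws re-weighted by the single parameter.

```python
# My reading of the quoted drawscore parameter (an interpretation, not Lc0
# code): white's expected game value on the [-1, +1] scale, draws re-weighted.

def white_expected_value(w, d, l, drawscore=0.0):
    """drawscore =  0.0 -> regular scoring (a draw is worth 0 to white);
    drawscore = -1.0 -> Armageddon scoring (a draw counts as a white loss)."""
    return (+1.0) * w + drawscore * d + (-1.0) * l

wdl = (0.30, 0.50, 0.20)
print(white_expected_value(*wdl))                  #  0.10 regular chess
print(white_expected_value(*wdl, drawscore=-1.0))  # -0.40 Armageddon: the
                                                   # many draws now hurt white
```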

It kind of quiets down my problem with the conversion curve and the depth of the positions being plotted there... (although I could argue that it is a very tight cloud with a high rating limit; it might be in the position itself (the depth), or in the very grouped cluster of points for each dependent-variable point). Taking the initial position as the referential: we would all have some emerging notion of white's first-move advantage, so it might be a good psychological anchor from which to give some sense of what WDL stuff can do for you.

Taking away the fog of depth-at-material-conversion: there was never any certainty in exhaustive tree searches that all positional potential advantage had to be converted into a material one before a mate, which might be considered either a final "material" conversion or a purely positional final restriction of the king. I prefer that last view, since even if the ruleset were about capturing the king, the game ends there... and it is still one of 3 outcomes, W, D, L: an almost trivial ordering (W, L would be the trivial binary ordering).
