Just curious, because I value your opinion.
From the two point-form AI responses I provided, what's your gauge on them and their accuracy, now that you know the process?
Also, what other metrics or methods should I try to apply to make it more accurate?
I'd be guessing here, so keep that in mind...
Based on your explanation of how you're getting data and what seems to be a good understanding of how AI and prompts work, I think your accuracy is probably fairly close to as good as possible. But I think "as good as possible" is still not necessarily accurate just due to lack of data. As you said, likely much more than 75% don't post about their experience with the game, especially if they like it. Even with me, I post reviews for some games and not for others. Sometimes I'll post for a game I like, sometimes for one I don't like, and sometimes for one that I find kind of average. But I don't post for every game I like or every game I don't like. And that is from someone who does write reviews. Most don't.
As you said, games are kind of an interesting situation because people can tend to post more because it is something they can get really hooked on. But even so, I think most still don't post. Of course, those include people who like and people who don't like the game. It is possible that the ratio of likes to dislikes among people who write reviews or post on forums is the same as the ratio among people who do not. As you said, the odds of that are very low, but it is possible. But if it's not the same, which way do they lean? Do more of the people who don't post like the game? Or do more of them dislike it? I'm not sure there is any way to determine that other than to guess. That leaves the reliability of anything that talks about what the average person thinks about the game questionable, whether that's through the use of AI or some other method.
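To put rough numbers on that uncertainty, here is a minimal Python sketch of the worst-case and best-case overall "like" share when you only know what the posters think. The function name and the 10% / 60% inputs are made up for illustration, not real figures for the game.

```python
def like_share_bounds(poster_fraction, poster_like_share):
    """Return (lowest, highest) possible overall like share.

    poster_fraction: share of all players who posted an opinion (0..1).
    poster_like_share: share of those posters who like the game (0..1).
    The silent players are unknown, so the bounds assume they either
    all dislike the game (lowest) or all like it (highest).
    """
    if not (0.0 <= poster_fraction <= 1.0 and 0.0 <= poster_like_share <= 1.0):
        raise ValueError("both inputs must be between 0 and 1")
    known_likes = poster_fraction * poster_like_share
    silent_fraction = 1.0 - poster_fraction
    return known_likes, known_likes + silent_fraction


# Hypothetical example: 10% of players post, and 60% of those like the game.
low, high = like_share_bounds(0.10, 0.60)
print(f"{low:.2f} to {high:.2f}")  # prints "0.06 to 0.96"
```

With only 10% posting, the true share could be anywhere from 6% to 96% unless you are willing to assume something about the silent players, which is the guess I was talking about.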
Of course, you can start to gain some statistical accuracy by increasing the number of data points. If we're talking about, let's say, 10% of players who post messages or reviews about the game, that's a low level of accuracy, but if you could increase that percentage through something like in-game polls that don't require writing anything, you could get a far more accurate result. I'm not saying they should add in-game polls, of course. It's just an example of getting more data to improve accuracy. There is also a general sense of what share of people needs to respond before statistical results are considered reasonably accurate. I believe we're below that, at least for things like opinion of the game. But there may be ways to improve that accuracy.
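As a rough illustration of how more data points tighten a result, here is a small Python sketch using the standard normal-approximation margin of error for a proportion. The big assumption is that the responses are a random sample of players. Self-selected reviews and forum posts are not, so treat these numbers as a best case. The function name and sample sizes are just for illustration.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate margin of error for a measured share (0..1).

    Uses the normal approximation z * sqrt(p * (1 - p) / n).
    z=1.96 corresponds to roughly 95% confidence.
    Assumes a simple random sample, which reviews are not.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(share * (1.0 - share) / sample_size)


# Hypothetical sample sizes, all with a measured 50% like share.
for n in (100, 1_000, 10_000):
    print(n, round(margin_of_error(0.5, n), 4))
```

The margin shrinks with the square root of the sample size, so more responses help, but they don't fix the problem of who chooses to respond.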
One thing I'd bring up since you mentioned limiting your results to those within a certain range of hours... I don't think it is necessarily fair to ignore people new to the game. You'd be ignoring a large percentage of console players. And if someone likes the game because they haven't played older versions and so aren't stuck on "A16 is the best and it's been downhill ever since" nostalgia, their opinion is just as valid. I understand the idea that someone should have time to experience the game to know if they really like it. But I think people have a good feeling for a game in a much shorter time than 500 hours. After all, people normally don't play 500+ hours in most games even if they like the game. This game is of course different because of the strong replayability factor, but that isn't the norm. I think I'd suggest lowering that to at least 100 hours, if not even lower. For me, if I don't like a game, it would be strange to see me play it for more than 5 hours.
I'd actually be curious to see results that look into time played recently. It would be a hard set of data to really quantify, but if someone has played more than 10 hours in the past week, there is a good chance that they like the game, even if they maybe don't like certain things in the game. Basically, they like it more than they don't like it and so they are playing it. If someone isn't putting in that many hours, then maybe they don't like it. If looked at over the course of the past year, you can get a feel for whether any individual player has increased or decreased how much they play. You also get an idea of how much they like the game. Putting in 100+ hours, even over the course of a year, isn't something they are likely to do without liking the game. If you see a large number of players' play time dropping off significantly after 2.0, then that's a good indication that 2.0 wasn't liked by a large part of the players. If you don't see that happen to more than, say, 20% of the players, then it's a good indication that far more players liked 2.0 than disliked it.
Of course, there is the issue of mods. If someone is using overhaul mods, for example, and those mods aren't updated right away, they may stop playing until those get updated, but that isn't necessarily an indication of not liking the game. It can just mean they prefer the overhaul mod. Or someone might keep playing as much after 2.0 once their mod(s) is/are updated and that may not indicate they like the update, but only that their mod(s) make it enjoyable to them, perhaps by changing or removing the changes the update made. So it's not necessarily accurate either.
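Here is a minimal Python sketch of the play-time drop-off idea. Everything in it is an assumption for illustration: the player names, the hours, and the rule that "dropped off" means playing less than half as much after 2.0 as before. It also ignores the mod issue entirely, so it has the same blind spot described above.

```python
def dropoff_share(playtime, drop_ratio=0.5):
    """Share of previously active players whose hours dropped off.

    playtime: dict of player -> (hours_before, hours_after), measured
              over equal-length periods before and after an update.
    drop_ratio: a player counts as dropped off if hours_after is
                below hours_before * drop_ratio (hypothetical cutoff).
    Players with no hours before the update are skipped.
    """
    active = [(before, after) for before, after in playtime.values() if before > 0]
    if not active:
        return 0.0
    dropped = sum(1 for before, after in active if after < before * drop_ratio)
    return dropped / len(active)


# Made-up data: hours in the month before and the month after 2.0.
sample = {
    "player_a": (40, 35),
    "player_b": (30, 5),
    "player_c": (20, 22),
    "player_d": (0, 10),   # new player, skipped
    "player_e": (50, 10),
}
print(dropoff_share(sample))  # prints 0.5
```

In this made-up sample, half of the previously active players dropped off, which would be well past the 20% mark I mentioned.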
What I would say is that by using a variety of different ways of reading data, such as what you used and this example, along with others, you can combine the results from all of them to try to increase the accuracy. One "question" for lack of a better word might result in an idea that 80% like the changes and another might show 70% don't like the changes, and a third might show that 60% like the changes. When you start looking at those all as one, you can get results that are more likely to be accurate than using a single "question". Note I'm not really talking about prompts when I say question here. I'm just talking about a different way of collating the data.
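One simple way to look at several results "as one" is a weighted average. This Python sketch uses the three example numbers above, with "70% don't like" converted to 30% like. The equal weights are an assumption. In practice you might weight each method by how much data it covers or how much you trust it, and a plain average won't remove a bias that all the methods share.

```python
def combine_estimates(estimates):
    """Weighted average of several like-share estimates.

    estimates: list of (like_share, weight) pairs, like_share in 0..1.
    Weights are hypothetical; equal weights give a plain average.
    """
    total_weight = sum(weight for _, weight in estimates)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(share * weight for share, weight in estimates) / total_weight


# The three example "questions": 80% like, 70% dislike (30% like), 60% like.
combined = combine_estimates([(0.80, 1), (0.30, 1), (0.60, 1)])
print(f"{combined:.1%}")  # prints "56.7%"
```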
One other thing to consider is that new data may not be as accurate as older data. Changes upset people. At least some people. Even a good change that a person will end up thinking was a good change later on may be viewed as bad initially. So it isn't unusual to get new opinions that say the game is bad now, yet if you asked the same person in a few months or so, they may say that the game is good. That leads to questionable results from new data. Add in the changes made to 2.0 in both 2.1 and 2.2, plus whatever is coming later to improve on the original 2.0 release, and you have to ask: are the opinions based on the original 2.0, on 2.1, or on 2.2? Someone could have hated 2.0, but liked how things were changed with 2.1 or 2.2. Maybe they updated their posts or reviews, maybe not. I think the number of people who change their reviews is extremely low, even if their opinion changed from like to dislike or dislike to like. But how do you handle that? Do you ignore new data? If you did, then you couldn't get any info about 2.0 since it's too new. Do you wait 3-6 months before trying to get results? You'd be more accurate, but that's a long time to wait. Do you weight the results differently? That also means opinions about 2.x either won't be visible or won't be as visible. I don't have an answer to that.
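For what "weight the results differently" could look like, here is one possible Python sketch, not a recommendation. It gives a review less weight the newer it is, ramping up to full weight once the review is 90 days old. The 90-day ramp, the function name, and the sample reviews are all assumptions for illustration.

```python
def recency_weighted_like_share(reviews, ramp_days=90):
    """Like share where very new reviews count for less.

    reviews: list of (liked, age_days) pairs; liked is True or False.
    ramp_days: hypothetical age at which a review reaches full weight.
    A review's weight is age_days / ramp_days, capped at 1.0.
    Returns None if every review has zero weight.
    """
    weighted_likes = 0.0
    total_weight = 0.0
    for liked, age_days in reviews:
        weight = min(1.0, max(0.0, age_days / ramp_days))
        total_weight += weight
        if liked:
            weighted_likes += weight
    if total_weight == 0:
        return None
    return weighted_likes / total_weight


# Made-up reviews: two older positive ones, two newer negative ones.
reviews = [(True, 180), (False, 9), (False, 45), (True, 90)]
print(round(recency_weighted_like_share(reviews), 3))  # prints 0.769
```

Unweighted, that sample is 50% positive. Weighted, it is about 77% positive, which shows the downside too: the newest opinions, the ones about 2.x, mostly disappear from the result.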
Well, I hope this gave you the answers you were wanting. I want to point out that I'm not a statistician, though I do understand statistics. I'm also not an expert with AI, though I understand many of the pitfalls and restrictions with the use of AI due to a lot of programming and scripting knowledge and a strong grasp of logic that helps with understanding how AI and prompts work. So my views aren't the views of an expert in the field, but they aren't those of a layman either. I think you have a good understanding of the use of AI and your results are probably about as good as is possible with the available data. My initial dismissal of your AI responses was because of the lack of good use of AI over the past couple of months here. Also, even with a conservative look at it, I'd say 95% of people using AI have no real idea of how to properly use prompts to get good results, 4% are like me, where we do understand it but have very limited practical experience with it since it's more for personal use than for work, and 1% are like you, where you have both a good understanding of it and a lot of practical experience with it. And, like I said, that's being conservative. It's probably closer to 99%, 0.9%, and 0.1%. Heh. The point being that I assumed your AI post was what you'd expect from that 95% (or 99%). I was wrong about that, and I apologize for that assumption.