wordfreq/SUNSET
I don’t think anyone has reliable information about post-2021 language usage by humans. The open Web (via OSCAR) was one of wordfreq’s data sources. Now the Web at large is full of slop generated by large language models, written by no one to communicate nothing. Including this slop in the data skews the word frequencies.
Source: wordfreq/SUNSET.md at master - rspeer/wordfreq
What a mess
There are useful things generative AI can do, but it certainly makes me question everything I read online that was written after 2022.