Large Language Models (LLMs) have made substantial progress in the past several months, shattering state-of-the-art benchmarks in many domains. This paper investigates LLMs' behavior with respect to gender stereotypes, a known stumbling block for prior models. We propose a simple paradigm to test the presence of gender bias, building on but differing from WinoBias, a commonly used gender bias dataset that is likely to be included in the training data of current LLMs. We test four recently published LLMs and demonstrate that they express biased assumptions about men and women, specifically those aligned with people's perceptions rather than those grounded in fact. We additionally investigate the explanations the models provide for their choices. Beyond explanations that are explicitly grounded in stereotypes, we find that a significant fraction of explanations are factually inaccurate and likely obscure the true reason behind the models' choices. This highlights a key property of these models: LLMs are trained on imbalanced datasets; as such, even with reinforcement learning with human feedback, they tend to reflect those imbalances back at us. As with other types of societal biases, we suggest that LLMs must be carefully tested to ensure that they treat minoritized individuals and communities equitably.