Consider the Werewolf and the Owlbear. They are both CR3, so they should be equally challenging, right? WRONG.
The owlbear could be considered the baseline CR3 monster. It's got a big bag o' HP, an AC of 13 that a third-level character will hit ~65% of the time, and two +7 attacks that hit level 3 characters at a similar rate (50% vs. chainmail and shield, 75% vs. a squishy caster with decent dexterity), dealing 24 damage per round on average assuming both hit. Party composition barely matters: 4 level 3s with competently distributed ability scores, spells, etc. will take it out (barring a series of shitty rolls).
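For anyone who wants to check those percentages, here's a rough sketch of the d20 math. The +5 attack bonus for a level 3 character (+3 ability, +2 proficiency), the AC 18 for chainmail plus shield, and the AC 13 for the caster are my assumptions, picked because they reproduce the numbers above. Crits are ignored for the damage estimate.

```python
def hit_chance(attack_bonus: int, armor_class: int) -> float:
    """Chance that d20 + attack_bonus meets or beats armor_class.

    A natural 1 always misses and a natural 20 always hits,
    so the needed roll is clamped to the range 2..20.
    """
    needed = armor_class - attack_bonus      # lowest d20 roll that hits
    needed = min(max(needed, 2), 20)
    return (21 - needed) / 20


# Assumed numbers (not stated outright in the post):
PC_ATTACK_BONUS = 5      # +3 ability modifier, +2 proficiency at level 3
TANK_AC = 18             # chainmail (16) + shield (2)
CASTER_AC = 13           # squishy caster with decent dexterity

OWLBEAR_AC = 13
OWLBEAR_ATTACK_BONUS = 7
OWLBEAR_DAMAGE_IF_BOTH_HIT = 24

pc_vs_owlbear = hit_chance(PC_ATTACK_BONUS, OWLBEAR_AC)
owlbear_vs_tank = hit_chance(OWLBEAR_ATTACK_BONUS, TANK_AC)
owlbear_vs_caster = hit_chance(OWLBEAR_ATTACK_BONUS, CASTER_AC)

# Expected damage per round once misses are factored in (crits ignored).
expected_dpr_vs_tank = OWLBEAR_DAMAGE_IF_BOTH_HIT * owlbear_vs_tank
expected_dpr_vs_caster = OWLBEAR_DAMAGE_IF_BOTH_HIT * owlbear_vs_caster

print(f"PC hits owlbear: {pc_vs_owlbear:.0%}")
print(f"Owlbear hits tank: {owlbear_vs_tank:.0%}, caster: {owlbear_vs_caster:.0%}")
print(f"Expected owlbear DPR vs tank: {expected_dpr_vs_tank:.1f}, vs caster: {expected_dpr_vs_caster:.1f}")
```

So the 24 damage is a ceiling. Against the tank, the owlbear's expected output is closer to half of that.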
The werewolf though... on paper, it's easier than an owlbear. They have similar HP, but the werewolf has a smidgen lower AC in hybrid form (12) and much weaker attacks (+4, with 12 average DPR). But this is where the CR system gets swingy depending on the party. If your party is a melee fighter, a ranged rogue, a barbarian and a monk, then the werewolf might as well be CR999999, because it's immune to bludgeoning, piercing, and slashing damage from nonmagical, non-silvered weapons. But if the party is a paladin, a soulknife, a cleric and a sorcerer, then its effective CR will be basically 1.5.
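To put a rough number on that swing, here's a toy sketch that counts how many rounds each party needs to drop the werewolf. Everything in it is hypothetical: the flat 8 damage per round per character, the yes/no "bypasses immunity" flag (a big simplification, especially for the paladin, whose weapon damage would still bounce off), and the 58 HP, which is the werewolf's listed average from its stat block and isn't stated in the post.

```python
import math

WEREWOLF_HP = 58  # listed average HP; an outside number, not from the post


def rounds_to_kill(monster_hp, party):
    """Rounds needed to deal monster_hp damage, counting only damage
    from characters flagged as able to bypass the weapon immunity.

    `party` is a list of (name, damage_per_round, bypasses_immunity).
    Returns math.inf if nobody can hurt the monster at all.
    """
    effective_dpr = sum(dpr for _name, dpr, bypasses in party if bypasses)
    if effective_dpr <= 0:
        return math.inf
    return math.ceil(monster_hp / effective_dpr)


# Hypothetical level 3 parties, 8 damage per round each (made-up figure).
mundane_party = [
    ("melee fighter", 8, False),
    ("ranged rogue", 8, False),
    ("barbarian", 8, False),
    ("monk", 8, False),       # no magical unarmed strikes yet at level 3
]
magic_party = [
    ("paladin", 8, True),     # simplification: assumes smite damage every round
    ("soulknife", 8, True),   # psychic damage
    ("cleric", 8, True),      # damaging cantrips and spells
    ("sorcerer", 8, True),    # damaging cantrips and spells
]

print("Mundane party:", rounds_to_kill(WEREWOLF_HP, mundane_party))
print("Magic party:", rounds_to_kill(WEREWOLF_HP, magic_party))
```

Same monster, same CR, and one party never kills it while the other is done in a couple of rounds. A single number can't capture that.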
This ridiculousness is another reason I was happy to ditch DnD for PF2. Building encounters is sooooo much quicker and easier than trying to wrestle with nonsensical CR. Bane of my existence as a GM for 5e.
This is a tool for DMs to estimate the difficulty of an encounter for their players. On the right you have the statblock of a cat; on the bottom left is the current number of creatures in the encounter (158,750 cats). On the top left is an approximate gauge of how difficult the current encounter (of 158,750 cats) would be, which reads "This Would Kill Tiamat", Tiamat being the 5-headed Queen of all Evil Dragons.