State objects are pretty heavy-weight. On linux-x64, sizeof(State) is 2344 bytes.
Much of this is due to the two 32-bit int arrays _cost and _rule.
The number of rules varies by platform, but seems to hover around 1k. A 16-bit integer seems more than enough to map all rules, so making _rule a uint16_t array saves significant space.
It also seems we can fold the _valid bit vector into _rule and reduce the size even further. This considerably simplifies the common validity checks as well, which more than makes up for having to initialize more memory when setting up the State object.
Together, these two optimizations reduce sizeof(State) to 1736 bytes. Profiling compilations shows a tiny win overall, and when instrumenting some simple compilations I see a sharp decline in Arena::grow events caused by ResourceObj allocation in Matcher::match_tree and Matcher::Label_Root.
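
To make the two layout changes concrete, here is a minimal standalone sketch, not the actual HotSpot code. The names StateBefore, StateAfter and kNumOperands are hypothetical, the operand count of 256 is picked only for illustration, and storing the validity flag in the low bit of each _rule entry is an assumed encoding (the description above only says the bit vector is folded into _rule).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical operand count, for illustration only; the real value is
// generated per platform.
constexpr std::size_t kNumOperands = 256;

// Before: 32-bit rule entries plus a separate validity bit vector.
struct StateBefore {
  std::uint32_t cost[kNumOperands];
  std::uint32_t rule[kNumOperands];
  std::uint32_t valid[(kNumOperands + 31) / 32];

  // Only the small bit vector has to be cleared on setup.
  StateBefore() { std::memset(valid, 0, sizeof(valid)); }

  bool is_valid(std::size_t i) const {
    return (valid[i >> 5] & (1u << (i & 31))) != 0;
  }
  void set(std::size_t i, std::uint32_t c, std::uint32_t r) {
    cost[i] = c;
    rule[i] = r;
    valid[i >> 5] |= (1u << (i & 31));
  }
};

// After: 16-bit rule entries, with validity folded into the low bit
// (assumed encoding: entry = (rule << 1) | 1, and 0 means "not valid").
struct StateAfter {
  std::uint32_t cost[kNumOperands];
  std::uint16_t rule[kNumOperands];

  // The whole rule array must now be cleared, which is more memory to
  // initialize than the bit vector was.
  StateAfter() { std::memset(rule, 0, sizeof(rule)); }

  // The validity check becomes a single load and mask.
  bool is_valid(std::size_t i) const { return (rule[i] & 1u) != 0; }

  void set(std::size_t i, std::uint32_t c, std::uint32_t r) {
    assert(r < (1u << 15));  // rule number must fit in the remaining 15 bits
    cost[i] = c;
    rule[i] = static_cast<std::uint16_t>((r << 1) | 1u);
  }
  std::uint32_t rule_of(std::size_t i) const { return rule[i] >> 1; }
};

static_assert(sizeof(StateAfter) < sizeof(StateBefore),
              "narrower rule entries and no bit vector should shrink the object");
```

With this encoding a rule number has 15 bits available, which still comfortably covers a rule count of around 1k. The sizes in this sketch will not match the 2344 and 1736 figures above, since the real State has a different operand count and additional fields.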