Leveraging Semi-Supervised Concept-based Models with CME
CME relies on a similar observation to the one highlighted in [3], where it was observed that vanilla CNN models often retain a high amount of information pertaining to concepts in their hidden space, which can be used for concept information mining at no additional annotation cost. Importantly, that work considered the scenario where the underlying concepts are unknown, and had to be extracted from a model's hidden space in an unsupervised fashion.
With CME, we make use of the above observation, and consider a scenario where we have knowledge of the underlying concepts, but only a small number of sample annotations for each of these concepts. Similarly to [3], CME relies on a given pre-trained vanilla CNN and the small set of concept annotations in order to extract further concept annotations in a semi-supervised fashion, as shown below:
As shown above, CME extracts the concept representation from a pre-trained model's hidden space in a post-hoc fashion. Further details are given below.
Concept Encoder Training: instead of training concept encoders from scratch on the raw data, as done in the case of CBMs, we set up concept encoder model training in a semi-supervised fashion, using the vanilla CNN's hidden space:
- We begin by pre-specifying a set of layers L from the vanilla CNN to use for concept extraction. This can range from all layers to just the last few, depending on available compute capacity.
- Next, for each concept, we train a separate model on top of the hidden space of each layer in L, predicting that concept's values from the layer's hidden space.
- We then select the model and corresponding layer with the best predictive accuracy as the "best" model and layer for that concept (see the sketch after this list).
- Consequently, when making predictions for a concept i, we first retrieve the hidden space representation of the best layer for that concept, and then pass it through the corresponding predictive model for inference.
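A minimal sketch of this selection loop for a single concept, assuming the hidden activations for each layer in L have already been extracted and flattened into 2-D arrays, and using a scikit-learn logistic regression as the per-layer probe (the probe choice and all names here are illustrative, not taken from the CME codebase):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def fit_best_probe_for_concept(layer_acts_train, layer_acts_val, c_train, c_val):
    """Train one probe per layer in L for a single concept, and keep the
    layer/model pair with the highest validation accuracy.

    layer_acts_train / layer_acts_val: dicts mapping a layer name to a 2-D
    array of flattened hidden activations for the annotated samples.
    c_train / c_val: the (small) set of concept annotations.
    """
    best_layer, best_model, best_acc = None, None, -1.0
    for layer, acts in layer_acts_train.items():
        # Probe: predict the concept's values from this layer's hidden space
        probe = LogisticRegression(max_iter=1000).fit(acts, c_train)
        acc = accuracy_score(c_val, probe.predict(layer_acts_val[layer]))
        if acc > best_acc:
            best_layer, best_model, best_acc = layer, probe, acc
    return best_layer, best_model, best_acc
```

The selected probe can then be used to label the remaining, unannotated samples, which is what makes the overall setup semi-supervised.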
Overall, the concept encoder function can be summarised as follows (assuming there are k concepts in total):

p̂(x) = (g₁(fˡ¹(x)), …, gₖ(fˡᵏ(x)))
- Here, p̂ on the LHS represents the concept encoder function
- The gᵢ terms represent the hidden-space-to-concept models trained on top of the different layer hidden spaces, with i representing the concept index, ranging from 1 to k. In practice, these models can be fairly simple, such as Linear Regressors or Gradient Boosted Classifiers
- The f(x) terms represent the sub-models of the original vanilla CNN, extracting the input's hidden representation at a particular layer
- In both cases above, the lⁱ superscripts specify the "best" layers these two models operate on (a sketch of the composed encoder follows this list)
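Put together, the encoder reduces to a thin wrapper around the selected layers and probes. A sketch under the same assumptions as above, where extract_hidden is a hypothetical helper standing in for the vanilla CNN's sub-models fˡ:

```python
import numpy as np

def concept_encoder(x, extract_hidden, best_layers, best_models):
    """The concept encoder p̂: for each concept i, extract the hidden
    representation at its "best" layer lⁱ (the fˡⁱ(x) term) and pass it
    through the corresponding probe gᵢ.

    extract_hidden: function (x, layer_name) -> 2-D activation array.
    best_layers / best_models: the per-concept layers and probes selected
    during concept encoder training.
    """
    preds = [g.predict(extract_hidden(x, layer))
             for layer, g in zip(best_layers, best_models)]
    return np.stack(preds, axis=1)  # shape: (n_samples, k)
```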
Concept Processor Training: concept processor model training in CME is set up by training models with task labels as outputs and concept encoder predictions as inputs. Importantly, these models operate on a much more compact input representation, and can consequently be represented directly via interpretable models, such as Decision Trees (DTs) or Logistic Regression (LR) models.
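A sketch of this stage, assuming p_train holds the concept encoder's predictions for the training inputs and y_train the corresponding task labels (the max_depth=5 setting is an arbitrary illustrative choice):

```python
from sklearn.tree import DecisionTreeClassifier

def fit_concept_processor(p_train, y_train, max_depth=5):
    """Train the concept processor: an interpretable model mapping the
    concept encoder's outputs (an n_samples x k array of concept
    predictions) to task labels. A shallow decision tree is used here;
    a logistic regression model would serve equally well on such a
    compact input space.
    """
    return DecisionTreeClassifier(max_depth=max_depth).fit(p_train, y_train)
```

Because the processor sees only the k concept dimensions rather than the raw input, its decision logic can be inspected directly, e.g. by printing the fitted tree's split rules.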