Feature Engineering tool
Here is a simple example to get started with Model Matrix. It assumes that the Model Matrix schema is already installed in your PostgreSQL database; if it is not, see the schema installation documentation first.
First, package the CLI distribution. In the root directory, run:
sbt universal:packageBin
The CLI zip distribution will be placed in:
- modelmatrix/modelmatrix-cli/target/universal/model-matrix-cli-0.0.2.zip
Unzip this file to start using Model Matrix
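As a rough sketch, unpacking might look like the following. It assumes the zip sits at the path given above and that the archive unpacks into a top-level directory named after the zip file, which is the usual layout for sbt universal packages but is not confirmed here. `ZIP` and `DIST_DIR` are illustrative variable names.

```shell
# Path as given above; adjust it relative to your working directory.
ZIP="modelmatrix/modelmatrix-cli/target/universal/model-matrix-cli-0.0.2.zip"
# Assumption: the archive contains a top-level directory named after the zip.
DIST_DIR="$(basename "$ZIP" .zip)"

if [ -f "$ZIP" ]; then
  # Unpack into the current directory and switch into the distribution.
  unzip -q "$ZIP" && cd "$DIST_DIR"
else
  echo "Distribution not found at $ZIP; run the packaging step first." >&2
fi
```

If the assumption holds, the `bin/modelmatrix-cli` commands used in the rest of this guide are run from inside that directory.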
If you store your Model Matrix data in a non-default database, see the CLI documentation for how to provide a custom database config.
Run a simple CLI command to check that the schema was installed successfully:
# List available model matrix definitions
bin/modelmatrix-cli definition list
Add a model definition using the example configuration stored in features.conf:
bin/modelmatrix-cli definition add --config ./model.conf
This command returns the model definition id; going forward, I’m assuming that it is 1.
# View the features of the new model definition
bin/modelmatrix-cli definition view features --definition-id 1
# Validate the model definition against the input Hive table
bin/modelmatrix-cli instance validate --definition-id 1 --source hive://mm.clicks_2015_05_05
Create a model instance from the input data. This calculates the categorical and continuous feature transformations based on the shape of the input data:
bin/modelmatrix-cli instance create \
--definition-id 1 \
--source hive://mm.clicks_2015_05_05 \
--name clicks \
--comment "getting started"
This command returns the model instance id; going forward, I’m assuming that it is 123.
You can view which categorical and continuous transformations were computed from the input data:
bin/modelmatrix-cli instance view features --instance-id 123
bin/modelmatrix-cli instance view columns --instance-id 123
Check that the model instance computed in the previous step is compatible with the next day’s input data:
bin/modelmatrix-cli featurize validate \
--instance-id 123 \
--source hive://mm.clicks_2015_05_06
Apply the model instance transformations to the input data and build a “featurized” sparse table in Hive:
bin/modelmatrix-cli featurize sparse \
--instance-id 123 \
--source hive://mm.clicks_2015_05_06 \
--target hive://mm.clicks_sparse_features_2015_05_06 \
--id-column AUCTION_ID
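The two featurize steps above can be wrapped in a small shell script so they run for any day. This is only a sketch under stated assumptions: the helpers `clicks_source`, `sparse_target`, `run` and `featurize_day`, and the variables `MM_CLI` and `DRY_RUN`, are hypothetical names that are not part of Model Matrix. The table naming simply follows the examples in this guide. It also assumes the CLI exits with a non-zero status when validation fails, which is not confirmed here. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical wrapper around the featurize commands shown above.
MM_CLI="${MM_CLI:-bin/modelmatrix-cli}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = run them

# Build Hive URIs following the naming used in this guide (YYYY_MM_DD suffix).
clicks_source() { echo "hive://mm.clicks_$1"; }
sparse_target() { echo "hive://mm.clicks_sparse_features_$1"; }

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# featurize_day INSTANCE_ID DAY
# Validate first; build the sparse table only if validation succeeds
# (assumes a non-zero exit status on validation failure).
featurize_day() {
  run "$MM_CLI" featurize validate \
    --instance-id "$1" \
    --source "$(clicks_source "$2")" &&
  run "$MM_CLI" featurize sparse \
    --instance-id "$1" \
    --source "$(clicks_source "$2")" \
    --target "$(sparse_target "$2")" \
    --id-column AUCTION_ID
}

featurize_day 123 2015_05_06
```

Set `DRY_RUN=0` to execute the commands instead of printing them.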
More documentation on the command line interface is available in the CLI documentation.