Configurations¶
There are two options for configuring the application:
- Environment variables
- YAML (see "config_files/.env_config.yml")
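As a sketch of the YAML option, a minimal config file might look like the following. The field names match the table below; the values are illustrative assumptions, not project defaults:

```yaml
# Illustrative example only -- values are assumptions, not defaults.
STORAGE_DATASOURCE: "es"
STORAGE_DATASINK: "es"
ES_ENDPOINT: "https://elasticsearch.example.com:9200"
MODEL_DIR: "./models/"
INFER_ANOMALY_THRESHOLD: 3.1
INFER_TIME_SPAN: 60
```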
Config Field | Details |
---|---|
FACT_STORE_URL | URL to the metadata API (Fact Store) |
FREQ_NOISE | Amount by which the frequency of a message flagged as a false positive is increased, causing the SOM to lower that message's anomaly score |
STORAGE_DATASOURCE | Storage backend used as the source of log data processed by the log anomaly detector |
STORAGE_DATASINK | Storage backend used as the sink for results produced by the log anomaly detector |
MODEL_DIR | Directory where the physical model files will be stored |
MODEL_FILE | Name of file where models are stored |
W2V_MODEL_PATH | Filename used for the Word2Vec model |
TRAIN_TIME_SPAN | Number of seconds specifying how far to the past to go to load log entries for training. |
TRAIN_MAX_ENTRIES | Maximum number of entries for training loaded from backend storage |
TRAIN_ITERATIONS | Number of training iterations used to train the SOM model |
TRAIN_UPDATE_MODEL | If set to True, a pre-existing model is loaded for re-training. Otherwise, a new model is initialized. |
TRAIN_WINDOW | [HYPER_PARAMETER] A hyper-parameter used by Word2Vec that dictates the number of words behind and in front of the target word considered during training. Users may want to tweak this parameter. |
TRAIN_VECTOR_LENGTH | [HYPER_PARAMETER] A hyper-parameter used by the Word2Vec implementation that dictates the length of the feature vector generated from the log data for further processing by the SOM anomaly detection. The optimal feature length often can't be known a priori, so users may want to tune this parameter based on their specific data set and use case. |
PARALLELISM | Passed through to the SOMPY package. Number of jobs that can be run in parallel |
INFER_ANOMALY_THRESHOLD | Dictates how many standard deviations away from the mean a log entry's anomaly score must be before it is classified as an anomaly. Users who want a stricter system should increase this value. A value of 0 will classify all entries as anomalies. |
INFER_TIME_SPAN | The time in seconds that each inference batch represents. A value of 60 will pull the last 60 seconds of logs into the system for inference. |
INFER_LOOPS | Number of inference steps before retraining. |
INFER_MAX_ENTRIES | Maximum number of log messages read in from backend storage during inference |
LS_INPUT_PATH | Input path that log entries are read from |
LS_OUTPUT_PATH | Output path that results are written to |
W2V_MIN_COUNT | Minimum number of occurrences of a word in the corpus required for it to be included in the encoding |
W2V_ITER | The number of training epochs performed by word2vec |
W2V_COMPUTE_LOSS | If True, computes and stores the loss value, which can be retrieved later |
W2V_SEED | Seed for the random number generator |
W2V_WORKERS | Number of worker threads used to train the model |
SOMPY_TRAIN_ROUGH_LEN | Number of epochs for the initial SOM training |
SOMPY_TRAIN_FINETUNE_LEN | Number of epochs for the SOM fine-tuning training (after the rough training) |
SOMPY_NODE_MAP | Size of the SOM map (e.g. 24x24) |
SOMPY_INIT | The method used for initializing the map, either random or pca |
SQL_CONNECT | Used to connect the Fact Store UI to the database that stores metadata. Note: if you are running in OpenShift, you can deploy MySQL as durable storage |
ES_ENDPOINT | ElasticSearch endpoint URL |
ES_CERT_DIR | Path to a directory where cert and key (es.crt and es.key) are stored for authentication |
ES_USE_SSL | If True, connect using SSL |
ES_TARGET_INDEX | ElasticSearch index name where results will be pushed to |
ES_INPUT_INDEX | ElasticSearch index name where log entries will be pulled from |
ES_QUERY | JSON representing a query passed to ElasticSearch to match the data |
ES_ELAST_ALERT | If set to '0', email alerts from ElastAlert are disabled |
ES_VERSION | Version of ElasticSearch that is running. By default we expect ElasticSearch 5; if you are using a newer version, you can set it here |
KF_BOOTSTRAP_SERVER | Kafka Bootstrap server |
KF_TOPIC | Kafka Topic |
KF_CACERT | Path to a directory where cert and key (kf.crt and kf.key) are stored for authentication |
KF_SECURITY_PROTOCOL | Plain text by default, but can be set to SSL |
KF_AUTO_TIMEOUT | Number of milliseconds after which a timeout exception is thrown. Defaults to 30000. |
LOG_FORMATTER | Custom log formatter for cleaning data and extracting the message |
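To make the INFER_ANOMALY_THRESHOLD semantics concrete, here is a minimal sketch of how a standard-deviation threshold is typically applied to anomaly scores. This is an illustration of the idea, not the project's actual code; the score values are hypothetical:

```python
import statistics

def classify(scores, threshold):
    """Flag scores more than `threshold` standard deviations above the mean."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    cutoff = mean + threshold * std
    return [s > cutoff for s in scores]

scores = [0.1, 0.2, 0.15, 0.12, 0.9]  # hypothetical per-entry anomaly scores
print(classify(scores, 1.5))
```

With these hypothetical scores, only the clear outlier (0.9) exceeds the cutoff at a threshold of 1.5; raising the threshold moves the cutoff further from the mean, so fewer entries are flagged.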
Custom Log Formatter¶
By default we expect that logs coming from ElasticSearch are JSON documents with a 'message' field. We will support other formats in the future. However, during preprocessing we may encounter logs that are not in a standard format; for those cases we provide custom log formatters that clean the log line and extract the message. Below are the values you can set with LOG_FORMATTER to enable this.
strip_prefix
- Removes prefix strings and keeps the last field.
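As an illustration of the strip_prefix idea, a formatter of this kind might split off whitespace-separated prefix fields and keep only the last one. This is a sketch under that assumption, not the project's exact implementation, and the log line is hypothetical:

```python
def strip_prefix(line: str) -> str:
    """Drop leading fields (e.g. timestamp, host) and keep the last field."""
    return line.split(" ")[-1]

raw = "2019-06-01T12:00:00Z host1 connection-refused"  # hypothetical log line
print(strip_prefix(raw))  # -> "connection-refused"
```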