Configurations

You have two options when configuring the application.

  • Environment variables
  • YAML (see “config_files/.env_config.yml”)
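
A minimal sketch of how such a setting could be resolved (this is an illustration, not the project's actual loader; it assumes an environment variable, if set, overrides the YAML file):

    import os
    import yaml  # PyYAML


    def load_setting(name, yaml_path="config_files/.env_config.yml", default=None):
        """Return `name` from the environment if set, otherwise from the YAML file."""
        if name in os.environ:
            return os.environ[name]
        if os.path.exists(yaml_path):
            with open(yaml_path) as f:
                config = yaml.safe_load(f) or {}
            return config.get(name, default)
        return default


    # Example: resolve the fact store URL (the fallback value is a placeholder).
    fact_store_url = load_setting("FACT_STORE_URL", default="http://localhost:8080")
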
Config Field Details
FACT_STORE_URL URL of the metadata API (fact store)
FREQ_NOISE Frequency of a message used to make the SOM lower its anomaly score, so that frequently repeated false-positive messages score lower
STORAGE_DATASOURCE Storage backend used as the source of the data processed by the log anomaly detector
STORAGE_DATASINK Storage backend used as the sink for the data processed by the log anomaly detector
MODEL_DIR Directory where the physical model files will be stored
MODEL_FILE Name of file where models are stored
W2V_MODEL_PATH Filename of the stored Word2Vec model
TRAIN_TIME_SPAN Number of seconds specifying how far into the past to go when loading log entries for training.
TRAIN_MAX_ENTRIES Maximum number of entries for training loaded from backend storage
TRAIN_ITERATIONS Number of training iterations used to train the SOM model
TRAIN_UPDATE_MODEL If set to True, a pre-existing model is loaded for re-training. Otherwise, a new model is initialized.
TRAIN_WINDOW [HYPER_PARAMETER] Hyperparameter used by Word2Vec to dictate the number of words before and after the target word that are considered during training. Users may want to tweak this parameter (see the Word2Vec sketch after this list).
TRAIN_VECTOR_LENGTH [HYPER_PARAMETER] Hyperparameter used by the Word2Vec implementation to dictate the length of the feature vector generated from the log data for further processing by the SOM anomaly detection. The optimal feature length often can't be known a priori, so users may want to tune this parameter for their specific data set and use case.
PARALLELISM Passed through to the SOMPY package; number of jobs that can be run in parallel
INFER_ANOMALY_THRESHOLD Dictates how many standard deviations away from the mean a log entry's anomaly score has to be before it is classified as an anomaly. Increasing this value makes classification stricter; a value of 0 will classify all entries as anomalies (see the threshold sketch after this list).
INFER_TIME_SPAN The time in seconds that each inference batch represents. A value of 60 will pull the last 60 seconds of logs into the system for inference.
INFER_LOOPS Number of inference steps before retraining.
INFER_MAX_ENTRIES Maximum number of log messages read in from backend storage during inference
LS_INPUT_PATH Path that input data is read from
LS_OUTPUT_PATH Path that output is written to
W2V_MIN_COUNT The minimum number of occurrences of a word in the corpus for it to be included in the encoding
W2V_ITER The number of training epochs performed by word2vec
W2V_COMPUTE_LOSS If True, computes and stores loss value which can be retrieved later
W2V_SEED Seed for the random number generator
W2V_WORKERS Number of worker threads used to train the model
SOMPY_TRAIN_ROUGH_LEN Number of epochs for the initial SOM training
SOMPY_TRAIN_FINETUNE_LEN Number of epochs for the SOM fine-tuning training (after the rough training)
SOMPY_NODE_MAP Size of the SOM map (e.g. 24x24)
SOMPY_INIT The method used for initializing the map, either random or pca
SQL_CONNECT Used to connect the fact store UI to the database where metadata is stored. Note: if you are running on OpenShift, you can deploy MySQL as durable storage
ES_ENDPOINT ElasticSearch endpoint URL
ES_CERT_DIR Path to a directory where cert and key (es.crt and es.key) are stored for authentication
ES_USE_SSL If True, connect using SSL
ES_TARGET_INDEX ElasticSearch index name where results will be pushed to
ES_INPUT_INDEX ElasticSearch index name where log entries will be pulled from
ES_QUERY JSON representing a query passed to ElasticSearch to match the data
ES_ELAST_ALERT If set to ‘0’, email alerts from ElastAlert are disabled
ES_VERSION Version of ElasticSearch that is running. By default we expect ElasticSearch 5; if you are using a newer version, set it here
KF_BOOTSTRAP_SERVER Kafka Bootstrap server
KF_TOPIC Kafka Topic
KF_CACERT Path to a directory where cert and key (kf.crt and kf.key) are stored for authentication
KF_SECURITY_PROTOCOL Security protocol used for Kafka; plain text by default, but can be SSL
KF_AUTO_TIMEOUT Number of milliseconds before a timeout exception is thrown (default 30000)
LOG_FORMATTER Custom log formatter used to clean the data and extract the message
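
The W2V_* fields and TRAIN_WINDOW / TRAIN_VECTOR_LENGTH correspond to standard Word2Vec training options. A minimal sketch using the gensim API (gensim 4.x parameter names; the mapping to the fields above is an assumption, and the corpus is a placeholder):

    from gensim.models import Word2Vec

    # Placeholder corpus: tokenized log messages.
    sentences = [
        ["connection", "refused", "by", "host"],
        ["user", "login", "failed"],
        ["connection", "timed", "out"],
    ]

    model = Word2Vec(
        sentences,
        vector_size=25,     # TRAIN_VECTOR_LENGTH: length of each feature vector
        window=5,           # TRAIN_WINDOW: words before/after the target word
        min_count=1,        # W2V_MIN_COUNT: drop words seen fewer times than this
        epochs=10,          # W2V_ITER: number of training epochs
        compute_loss=True,  # W2V_COMPUTE_LOSS: keep the loss for later retrieval
        seed=42,            # W2V_SEED: random number generator seed
        workers=2,          # W2V_WORKERS: worker threads used for training
    )

    print(model.wv["connection"])          # 25-dimensional vector for "connection"
    print(model.get_latest_training_loss())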
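
INFER_ANOMALY_THRESHOLD is described above as a number of standard deviations from the mean anomaly score. A minimal sketch of that kind of cut-off (the scores are made up, and the handling of edge cases such as a threshold of 0 may differ in the actual implementation):

    import statistics

    # Placeholder anomaly scores for one inference batch.
    scores = [0.12, 0.10, 0.11, 0.95, 0.13]

    INFER_ANOMALY_THRESHOLD = 1.5  # standard deviations above the mean

    mean = statistics.mean(scores)
    stdev = statistics.pstdev(scores)
    cutoff = mean + INFER_ANOMALY_THRESHOLD * stdev

    anomalies = [s for s in scores if s > cutoff]
    print(f"cutoff={cutoff:.3f} anomalies={anomalies}")  # flags 0.95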

CUSTOM LOG FORMATTER

By default we expect that your logs coming from ElasticSearch are JSON documents with a ‘message’ field. We will support other formats in the future. However, when preprocessing logs we may run into cases where a log line is not in the standard format; for those cases we provide custom log formatters that clean the log line and extract the message. Below you will find the values you can set with LOG_FORMATTER to enable this.

strip_prefix
  • Removes the prefix fields and keeps the last field of the log line (see the sketch below).
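
A minimal sketch of what a strip-prefix style formatter might do (the separator and the sample log line are assumptions; the project's actual implementation may differ):

    def strip_prefix(raw_line, separator=": "):
        """Illustration only: drop the prefix fields and keep the last field."""
        return raw_line.split(separator)[-1].strip()


    raw = "Jun 01 12:00:01 host-01 kubelet: Failed to pull image"
    print(strip_prefix(raw))  # -> "Failed to pull image"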