Configurations¶
There are two options for configuring the application:
- Environment variables
- YAML (see "config_files/.env_config.yml")
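As a sketch of the YAML option, a minimal config file might look like the following. The field names match the table below; the values are illustrative assumptions, not project defaults:

```yaml
# Illustrative example only -- values are assumptions, not defaults.
STORAGE_DATASOURCE: "es"
STORAGE_DATASINK: "es"
ES_ENDPOINT: "https://elasticsearch.example.com:9200"
MODEL_DIR: "./models/"
INFER_ANOMALY_THRESHOLD: 3.1
INFER_TIME_SPAN: 60
```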
Config Field | Details |
---|---|
FACT_STORE_URL | URL to the metadata API (Fact Store) |
FREQ_NOISE | Amount by which the frequency of a message flagged as a false positive is increased, causing the SOM to lower that message's anomaly score |
STORAGE_DATASOURCE | Storage backend used as the source of log data processed by the log anomaly detector |
STORAGE_DATASINK | Storage backend used as the sink for results produced by the log anomaly detector |
MODEL_DIR | Directory where the physical model files will be stored |
MODEL_FILE | Name of file where models are stored |
W2V_MODEL_PATH | Filename used for the Word2Vec model |
TRAIN_TIME_SPAN | Number of seconds specifying how far to the past to go to load log entries for training. |
TRAIN_MAX_ENTRIES | Maximum number of entries for training loaded from backend storage |
TRAIN_ITERATIONS | Number of training iterations used to train the SOM model |
TRAIN_UPDATE_MODEL | If set to True, a pre-existing model is loaded for re-training. Otherwise, a new model is initialized. |
TRAIN_WINDOW | [HYPER_PARAMETER] A hyper-parameter used by Word2Vec that dictates the number of words behind and in front of the target word considered during training. Users may want to tweak this parameter. |
TRAIN_VECTOR_LENGTH | [HYPER_PARAMETER] A hyper-parameter used by the Word2Vec implementation that dictates the length of the feature vector generated from the log data for further processing by the SOM anomaly detection. The optimal feature length often can't be known a priori, so users may want to tune this parameter based on their specific data set and use case. |
PARALLELISM | Passed through to the SOMPY package. Number of jobs that can be run in parallel |
INFER_ANOMALY_THRESHOLD | Dictates how many standard deviations away from the mean a log entry's anomaly score must be before it is classified as an anomaly. Users who want a stricter system should increase this value. A value of 0 will classify all entries as anomalies. |
INFER_TIME_SPAN | The time in seconds that each inference batch represents. A value of 60 will pull the last 60 seconds of logs into the system for inference. |
INFER_LOOPS | Number of inference steps before retraining. |
INFER_MAX_ENTRIES | Maximum number of log messages read in from backend storage during inference |
LS_INPUT_PATH | Input path that log entries are read from |
LS_OUTPUT_PATH | Output path that results are written to |
W2V_MIN_COUNT | Minimum number of occurrences of a word in the corpus required for it to be included in the encoding |
W2V_ITER | The number of training epochs performed by word2vec |
W2V_COMPUTE_LOSS | If True, computes and stores the loss value, which can be retrieved later |
W2V_SEED | Seed for the random number generator |
W2V_WORKERS | Number of worker threads used to train the model |
SOMPY_TRAIN_ROUGH_LEN | Number of epochs for the initial SOM training |
SOMPY_TRAIN_FINETUNE_LEN | Number of epochs for the SOM fine-tuning training (after the rough training) |
SOMPY_NODE_MAP | Size of the SOM map (e.g. 24x24) |
SOMPY_INIT | The method used for initializing the map, either random or pca |
SQL_CONNECT | Used to connect the Fact Store UI to the database that stores metadata. Note: if you are running in OpenShift, you can deploy MySQL as durable storage |
ES_ENDPOINT | ElasticSearch endpoint URL |
ES_CERT_DIR | Path to a directory where cert and key (es.crt and es.key) are stored for authentication |
ES_USE_SSL | If True, connect using SSL |
ES_TARGET_INDEX | ElasticSearch index name where results will be pushed to |
ES_INPUT_INDEX | ElasticSearch index name where log entries will be pulled from |
ES_QUERY | JSON representing a query passed to ElasticSearch to match the data |
ES_ELAST_ALERT | If set to '0', email alerts from ElastAlert are disabled |
ES_VERSION | Version of ElasticSearch that is running. By default we expect ElasticSearch 5; if you are using a newer version, you can set it here |
KF_BOOTSTRAP_SERVER | Kafka Bootstrap server |
KF_TOPIC | Kafka Topic |
KF_CACERT | Path to a directory where cert and key (kf.crt and kf.key) are stored for authentication |
KF_SECURITY_PROTOCOL | Plain text by default, but can be set to SSL |
KF_AUTO_TIMEOUT | Number of milliseconds after which a timeout exception is thrown. Defaults to 30000. |
LOG_FORMATTER | Custom log formatter for cleaning data and extracting the message |
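To make the INFER_ANOMALY_THRESHOLD semantics concrete, here is a minimal sketch of how a standard-deviation threshold is typically applied to anomaly scores. This is an illustration of the idea, not the project's actual code; the score values are hypothetical:

```python
import statistics

def classify(scores, threshold):
    """Flag scores more than `threshold` standard deviations above the mean."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    cutoff = mean + threshold * std
    return [s > cutoff for s in scores]

scores = [0.1, 0.2, 0.15, 0.12, 0.9]  # hypothetical per-entry anomaly scores
print(classify(scores, 1.5))
```

With these hypothetical scores, only the clear outlier (0.9) exceeds the cutoff at a threshold of 1.5; raising the threshold moves the cutoff further from the mean, so fewer entries are flagged.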
Custom Log Formatter¶
By default we expect that logs coming from ElasticSearch are JSON documents with a 'message' field. We will support other formats in the future. However, during preprocessing we may encounter logs that are not in a standard format; for those cases we provide custom log formatters that clean the log line and extract the message. Below are the values you can set with LOG_FORMATTER to enable this.
strip_prefix
- Removes prefix strings and keeps the last field.
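As an illustration of the strip_prefix idea, a formatter of this kind might split off whitespace-separated prefix fields and keep only the last one. This is a sketch under that assumption, not the project's exact implementation, and the log line is hypothetical:

```python
def strip_prefix(line: str) -> str:
    """Drop leading fields (e.g. timestamp, host) and keep the last field."""
    return line.split(" ")[-1]

raw = "2019-06-01T12:00:00Z host1 connection-refused"  # hypothetical log line
print(strip_prefix(raw))  # -> "connection-refused"
```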