magichour.lib.StringMatch package

Submodules

magichour.lib.StringMatch.StringMatch module

magichour.lib.StringMatch.StringMatch.get_clusters(dataset_iterator, batch_size, skip_count, threshold, MIN_SAMPLES_FOR_SPLIT)
magichour.lib.StringMatch.StringMatch.main()
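
A minimal usage sketch for get_clusters. The return value is assumed to be a collection of Cluster objects, and dataset_iterator is assumed to yield LogLine tuples (both defined in magichour.lib.StringMatch.cluster); the parameter values and the semantics described in the comments are illustrative, not taken from the source:

    from magichour.lib.StringMatch.StringMatch import get_clusters
    from magichour.lib.StringMatch.cluster import LogLine

    # Hypothetical input: an iterator of (timestamp, text) log lines.
    log_lines = iter([
        LogLine(ts=1444418000.0, text="Connection from 10.0.0.1 closed"),
        LogLine(ts=1444418001.0, text="Connection from 10.0.0.2 closed"),
        LogLine(ts=1444418002.0, text="Disk /dev/sda1 is 90% full"),
    ])

    # Assumed to return clusters of similar log lines.
    clusters = get_clusters(
        dataset_iterator=log_lines,
        batch_size=1000,            # lines processed per batch (illustrative)
        skip_count=0,               # leading tokens to ignore when comparing (assumed)
        threshold=0.75,             # similarity threshold for membership (assumed)
        MIN_SAMPLES_FOR_SPLIT=25,   # minimum lines before a leaf may split (assumed)
    )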

magichour.lib.StringMatch.cluster module

class magichour.lib.StringMatch.cluster.Cluster(leaf, index=None, index_val=None)

Bases: object

add_leaf(leaf)
add_to_leaf(line, threshold, skip_count)

Add a line to this cluster.

check_for_match(line_split, threshold, skip_count, verbose=False)

Check whether the input line belongs in this cluster.

get_num_lines()
get_template_line()
print_groups(index, include_text=False)
print_template_lines(indent)
split_leaf(min_items_for_split, skip_count, min_word_pos_entropy, min_percent, verbose=False)

Split any leaves that meet the criteria for a split (more than min_items_for_split lines).
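
A hedged sketch of driving Cluster by hand. It assumes a Cluster is seeded with a Leaf built from a single LogLine, that check_for_match expects the line pre-split into whitespace tokens, and that add_to_leaf accepts a LogLine; none of these assumptions is confirmed by the signatures above:

    from magichour.lib.StringMatch.cluster import Cluster, LogLine
    from magichour.lib.StringMatch.leaf import Leaf

    first = LogLine(ts=0.0, text="user alice logged in")
    cluster = Cluster(Leaf(first))   # seed the cluster with an initial leaf

    candidate = LogLine(ts=1.0, text="user bob logged in")
    # Assumed: check_for_match takes a tokenized line and returns a truthy
    # value when the line is similar enough to join the cluster.
    if cluster.check_for_match(candidate.text.split(), threshold=0.75, skip_count=0):
        cluster.add_to_leaf(candidate, threshold=0.75, skip_count=0)

    print(cluster.get_num_lines())      # lines accumulated so far
    print(cluster.get_template_line())  # representative template for the cluster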

magichour.lib.StringMatch.cluster.LogGroup

alias of LogEntry

class magichour.lib.StringMatch.cluster.LogLine(ts, text)

Bases: tuple

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

__getstate__()

Exclude the OrderedDict from pickling

__repr__()

Return a nicely formatted representation string

text

Alias for field number 1

ts

Alias for field number 0
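
LogLine is a standard namedtuple, so instances can be built positionally or by field name and unpack like plain tuples (the timestamp value below is illustrative):

    from magichour.lib.StringMatch.cluster import LogLine

    line = LogLine(ts=1444418000.0, text="kernel: eth0 link up")
    print(line.ts)       # field number 0
    print(line.text)     # field number 1
    ts, text = line      # unpacks like a plain tuple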

magichour.lib.StringMatch.leaf module

class magichour.lib.StringMatch.leaf.Leaf(log_line, index=None, index_val=None)

Bases: object

add_to_leaf(line, threshold, skip_count)
check_for_match(line_split, threshold, skip_count, verbose=False)
get_num_lines()
get_template_line()
print_groups(index, include_text=False)
print_template_lines(indent)
split_leaf(min_items_for_split, skip_count, min_word_pos_entropy, min_percent, verbose=False)

Split this leaf into multiple leaves

TODO: This function is gross; clean it up!
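
Since Leaf exposes the same matching interface as Cluster, it can be exercised on its own. The sketch below assumes a Leaf wraps a single seed LogLine and that add_to_leaf accepts further LogLine values along with the same threshold and skip_count used for matching; these are assumptions, not documented behavior:

    from magichour.lib.StringMatch.cluster import LogLine
    from magichour.lib.StringMatch.leaf import Leaf

    leaf = Leaf(LogLine(ts=0.0, text="job 42 finished in 13s"))
    leaf.add_to_leaf(LogLine(ts=5.0, text="job 43 finished in 9s"),
                     threshold=0.75, skip_count=0)
    leaf.add_to_leaf(LogLine(ts=9.0, text="job 44 finished in 21s"),
                     threshold=0.75, skip_count=0)

    print(leaf.get_num_lines())      # number of lines accumulated in this leaf
    print(leaf.get_template_line())  # representative template for the leaf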

magichour.lib.StringMatch.utils module

magichour.lib.StringMatch.utils.cosine_sim(l1, l2, skip_count=0)
magichour.lib.StringMatch.utils.get_entropy_of_word_positions(word_counts, num_log_lines)
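
A hedged example of calling cosine_sim, assuming l1 and l2 are whitespace-tokenized log lines and that the first skip_count tokens of each line are ignored when computing similarity; both assumptions are inferred from the parameter names, not from the source:

    from magichour.lib.StringMatch.utils import cosine_sim

    l1 = "ERROR disk sda1 90% full".split()
    l2 = "ERROR disk sdb2 85% full".split()

    # Assumed to return a similarity score in [0, 1].
    score = cosine_sim(l1, l2, skip_count=1)
    print(score)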

Module contents