magichour.lib.StringMatch package

Submodules

magichour.lib.StringMatch.StringMatch module

magichour.lib.StringMatch.StringMatch.get_clusters(dataset_iterator, batch_size, skip_count, threshold, MIN_SAMPLES_FOR_SPLIT)
magichour.lib.StringMatch.StringMatch.main()
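
A minimal usage sketch for get_clusters. The return value is assumed to be a collection of Cluster objects, and dataset_iterator is assumed to yield LogLine tuples (both defined in magichour.lib.StringMatch.cluster); the parameter values and the semantics described in the comments are illustrative, not taken from the source:

    from magichour.lib.StringMatch.StringMatch import get_clusters
    from magichour.lib.StringMatch.cluster import LogLine

    # Hypothetical input: an iterator of (timestamp, text) log lines.
    log_lines = iter([
        LogLine(ts=1444418000.0, text="Connection from 10.0.0.1 closed"),
        LogLine(ts=1444418001.0, text="Connection from 10.0.0.2 closed"),
        LogLine(ts=1444418002.0, text="Disk /dev/sda1 is 90% full"),
    ])

    # Assumed to return clusters of similar log lines.
    clusters = get_clusters(
        dataset_iterator=log_lines,
        batch_size=1000,            # lines processed per batch (illustrative)
        skip_count=0,               # leading tokens to ignore when comparing (assumed)
        threshold=0.75,             # similarity threshold for membership (assumed)
        MIN_SAMPLES_FOR_SPLIT=25,   # minimum lines before a leaf may split (assumed)
    )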

magichour.lib.StringMatch.cluster module

class magichour.lib.StringMatch.cluster.Cluster(leaf, index=None, index_val=None)

Bases: object

add_leaf(leaf)
add_to_leaf(line, threshold, skip_count)

Add a line to this cluster.

check_for_match(line_split, threshold, skip_count, verbose=False)

Check whether the input line belongs in this cluster.

get_num_lines()
get_template_line()
print_groups(index, include_text=False)
print_template_lines(indent)
split_leaf(min_items_for_split, skip_count, min_word_pos_entropy, min_percent, verbose=False)

Split any leaves that meet the criteria for a split (more than min_items_for_split lines).
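
A hedged sketch of driving Cluster by hand. It assumes a Cluster is seeded with a Leaf built from a single LogLine, that check_for_match expects the line pre-split into whitespace tokens, and that add_to_leaf accepts a LogLine; none of these assumptions is confirmed by the signatures above:

    from magichour.lib.StringMatch.cluster import Cluster, LogLine
    from magichour.lib.StringMatch.leaf import Leaf

    first = LogLine(ts=0.0, text="user alice logged in")
    cluster = Cluster(Leaf(first))   # seed the cluster with an initial leaf

    candidate = LogLine(ts=1.0, text="user bob logged in")
    # Assumed: check_for_match takes a tokenized line and returns a truthy
    # value when the line is similar enough to join the cluster.
    if cluster.check_for_match(candidate.text.split(), threshold=0.75, skip_count=0):
        cluster.add_to_leaf(candidate, threshold=0.75, skip_count=0)

    print(cluster.get_num_lines())      # lines accumulated so far
    print(cluster.get_template_line())  # representative template for the cluster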

magichour.lib.StringMatch.cluster.LogGroup

alias of LogEntry

class magichour.lib.StringMatch.cluster.LogLine(ts, text)

Bases: tuple

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

__getstate__()

Exclude the OrderedDict from pickling

__repr__()

Return a nicely formatted representation string

text

Alias for field number 1

ts

Alias for field number 0
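
LogLine is a standard namedtuple, so instances can be built positionally or by field name and unpack like plain tuples (the timestamp value below is illustrative):

    from magichour.lib.StringMatch.cluster import LogLine

    line = LogLine(ts=1444418000.0, text="kernel: eth0 link up")
    print(line.ts)       # field number 0
    print(line.text)     # field number 1
    ts, text = line      # unpacks like a plain tuple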

magichour.lib.StringMatch.leaf module

class magichour.lib.StringMatch.leaf.Leaf(log_line, index=None, index_val=None)

Bases: object

add_to_leaf(line, threshold, skip_count)
check_for_match(line_split, threshold, skip_count, verbose=False)
get_num_lines()
get_template_line()
print_groups(index, include_text=False)
print_template_lines(indent)
split_leaf(min_items_for_split, skip_count, min_word_pos_entropy, min_percent, verbose=False)

Split this leaf into multiple leaves

TODO: This function is gross; clean it up!
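
Since Leaf exposes the same matching interface as Cluster, it can be exercised on its own. The sketch below assumes a Leaf wraps a single seed LogLine and that add_to_leaf accepts further LogLine values along with the same threshold and skip_count used for matching; these are assumptions, not documented behavior:

    from magichour.lib.StringMatch.cluster import LogLine
    from magichour.lib.StringMatch.leaf import Leaf

    leaf = Leaf(LogLine(ts=0.0, text="job 42 finished in 13s"))
    leaf.add_to_leaf(LogLine(ts=5.0, text="job 43 finished in 9s"),
                     threshold=0.75, skip_count=0)
    leaf.add_to_leaf(LogLine(ts=9.0, text="job 44 finished in 21s"),
                     threshold=0.75, skip_count=0)

    print(leaf.get_num_lines())      # number of lines accumulated in this leaf
    print(leaf.get_template_line())  # representative template for the leaf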

magichour.lib.StringMatch.utils module

magichour.lib.StringMatch.utils.cosine_sim(l1, l2, skip_count=0)
magichour.lib.StringMatch.utils.get_entropy_of_word_positions(word_counts, num_log_lines)
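
A hedged example of calling cosine_sim, assuming l1 and l2 are whitespace-tokenized log lines and that the first skip_count tokens of each line are ignored when computing similarity; both assumptions are inferred from the parameter names, not from the source:

    from magichour.lib.StringMatch.utils import cosine_sim

    l1 = "ERROR disk sda1 90% full".split()
    l2 = "ERROR disk sdb2 85% full".split()

    # Assumed to return a similarity score in [0, 1].
    score = cosine_sim(l1, l2, skip_count=1)
    print(score)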

Module contents