magichour.lib.StringMatch package
Submodules
magichour.lib.StringMatch.StringMatch module
magichour.lib.StringMatch.StringMatch.get_clusters(dataset_iterator, batch_size, skip_count, threshold, MIN_SAMPLES_FOR_SPLIT)

magichour.lib.StringMatch.StringMatch.main()
magichour.lib.StringMatch.cluster module

class magichour.lib.StringMatch.cluster.Cluster(leaf, index=None, index_val=None)
    Bases: object

    add_leaf(leaf)

    add_to_leaf(line, threshold, skip_count)
        Add line to this cluster.

    check_for_match(line_split, threshold, skip_count, verbose=False)
        Check whether the input line belongs in the cluster.

    get_num_lines()

    get_template_line()

    print_groups(index, include_text=False)

    print_template_lines(indent)

    split_leaf(min_items_for_split, skip_count, min_word_pos_entropy, min_percent, verbose=False)
        Split any leaves that meet the criteria for a split (> min_items_for_split).

magichour.lib.StringMatch.cluster.LogGroup
    alias of LogEntry

class magichour.lib.StringMatch.cluster.LogLine(ts, text)
    Bases: tuple

    __getnewargs__()
        Return self as a plain tuple. Used by copy and pickle.

    __getstate__()
        Exclude the OrderedDict from pickling.

    __repr__()
        Return a nicely formatted representation string.

    text
        Alias for field number 1.

    ts
        Alias for field number 0.
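
The listing above shows that LogLine is a tuple subclass with the fields ts (index 0) and text (index 1), which matches what collections.namedtuple generates. The following is a minimal stand-in under that assumption, not the package's own definition; the timestamp and message are made-up sample values.

```python
from collections import namedtuple

# Assumption: a two-field namedtuple that mirrors the documented
# LogLine(ts, text) layout. This is a local stand-in, not an import
# from magichour.
LogLine = namedtuple("LogLine", ["ts", "text"])

# Made-up sample values, for illustration only.
line = LogLine(ts=1450000000.0, text="sshd[123]: session opened for user alice")

# Fields are reachable by name or by position.
first_field = line[0]   # same value as line.ts
second_field = line[1]  # same value as line.text

# Being a tuple, it also unpacks directly.
ts, text = line
```

Because the fields are positional, code that expects a plain (timestamp, text) pair can consume such an object unchanged.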

magichour.lib.StringMatch.leaf module

class magichour.lib.StringMatch.leaf.Leaf(log_line, index=None, index_val=None)
    Bases: object

    add_to_leaf(line, threshold, skip_count)

    check_for_match(line_split, threshold, skip_count, verbose=False)

    get_num_lines()

    get_template_line()

    print_groups(index, include_text=False)

    print_template_lines(indent)

    split_leaf(min_items_for_split, skip_count, min_word_pos_entropy, min_percent, verbose=False)
        Split this leaf into multiple leaves.
        TODO: This function is gross; clean it up!

magichour.lib.StringMatch.utils module

magichour.lib.StringMatch.utils.cosine_sim(l1, l2, skip_count=0)
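
The page gives no description for cosine_sim, so the following is only a sketch of what a cosine similarity over two tokenized lines could look like. It assumes that l1 and l2 are lists of word tokens and that skip_count is the number of leading tokens to ignore (for example a timestamp prefix). The name cosine_sim_sketch is hypothetical, and the real function may behave differently.

```python
import math
from collections import Counter


def cosine_sim_sketch(l1, l2, skip_count=0):
    """Cosine similarity between the word-count vectors of two token lists.

    Assumption: the first skip_count tokens of each list are ignored.
    """
    counts1 = Counter(l1[skip_count:])
    counts2 = Counter(l2[skip_count:])

    # Dot product over the words the two lines share.
    dot = sum(counts1[w] * counts2[w] for w in counts1.keys() & counts2.keys())

    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))

    # An empty line has no direction; treat it as dissimilar to everything.
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


a = "12:00:01 connection from host1 closed".split()
b = "12:00:07 connection from host2 closed".split()
similarity = cosine_sim_sketch(a, b, skip_count=1)
```

A score near 1 means the two lines share most of their words, which is the kind of value a threshold parameter, as seen in check_for_match, could be compared against.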

magichour.lib.StringMatch.utils.get_entropy_of_word_positions(word_counts, num_log_lines)
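
No description is given for get_entropy_of_word_positions either. As a rough illustration of the idea its name suggests, the sketch below computes the Shannon entropy of the words seen at one token position. It assumes that word_counts maps each word to the number of lines in which it occupies that position and that num_log_lines is the total number of lines; the name position_entropy_sketch and this single-position interface are hypothetical.

```python
import math


def position_entropy_sketch(word_counts, num_log_lines):
    """Shannon entropy (in bits) of the word distribution at one position.

    Assumption: word_counts is a {word: count} mapping for a single token
    position, and the counts are divided by num_log_lines to get
    probabilities.
    """
    if num_log_lines <= 0:
        return 0.0

    entropy = 0.0
    for count in word_counts.values():
        if count <= 0:
            continue  # a zero count contributes nothing
        p = count / num_log_lines
        entropy += p * math.log2(1.0 / p)
    return entropy


# A position that always holds the same word carries no information,
# while one split evenly between two words carries one bit.
constant_position = position_entropy_sketch({"closed": 4}, 4)
split_position = position_entropy_sketch({"host1": 2, "host2": 2}, 4)
```

Under this reading, low-entropy positions look like fixed template text and high-entropy positions look like variable fields, which is plausibly what a min_word_pos_entropy setting, as seen in split_leaf, is tuned against.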