magichour.lib.LogCluster package

Submodules

magichour.lib.LogCluster.LogCluster module

magichour.lib.LogCluster.LogCluster.collapse_patterns(input_pattern_tuple)

Collapse all related patterns into a single pattern.

Parameters: input_pattern_tuple ((tuple[str], iterable[list[str]])) – Pair of (frequent words, iterable of related patterns) to collapse
Returns: A list of patterns, where each pattern is a list of strings
Return type: list[list[str]]
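A minimal sketch of what "collapsing" can look like, assuming (hypothetically) that each related pattern alternates frequent words with skip tokens of the form "*{n}", and that patterns sharing a frequent-word tuple are merged by recording the minimum and maximum skip seen in each gap. The token format, the "*{min,max}" output notation and the name collapse_patterns_sketch are illustrative assumptions, not the library's actual representation.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical skip-token format "*{n}": n words were skipped between two frequent words.
SKIP_TOKEN = re.compile(r"^\*\{(\d+)\}$")


def collapse_patterns_sketch(
    input_pattern_tuple: Tuple[Tuple[str, ...], Iterable[List[str]]]
) -> List[List[str]]:
    """Merge patterns sharing one frequent-word tuple into a single pattern."""
    frequent_words, patterns = input_pattern_tuple
    n_gaps = max(len(frequent_words) - 1, 0)
    lo: List[Optional[int]] = [None] * n_gaps
    hi: List[Optional[int]] = [None] * n_gaps
    for pattern in patterns:
        # Assumes every pattern carries exactly one skip token per gap.
        gaps = [int(m.group(1)) for m in map(SKIP_TOKEN.match, pattern) if m]
        for i, gap in enumerate(gaps):
            lo[i] = gap if lo[i] is None else min(lo[i], gap)
            hi[i] = gap if hi[i] is None else max(hi[i], gap)
    collapsed: List[str] = []
    for i, word in enumerate(frequent_words):
        collapsed.append(word)
        # Emit a "*{min,max}" wildcard unless the words were always adjacent.
        if i < n_gaps and hi[i]:
            collapsed.append("*{%d,%d}" % (lo[i], hi[i]))
    return [collapsed]
```

For example, collapsing (("user", "logged", "in"), [["user", "*{1}", "logged", "*{0}", "in"], ["user", "*{3}", "logged", "*{0}", "in"]]) yields [["user", "*{1,3}", "logged", "in"]]: the first gap varied between 1 and 3 skipped words, while "logged" and "in" were always adjacent.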
magichour.lib.LogCluster.LogCluster.extract_patterns(line, frequent_words)

Take a single LogLine and output a tuple of its frequent words and those frequent words with skip counts.

Parameters:
  • line (LogLine) – Input log line
  • frequent_words – Frequent words to match against the line
Returns: tuple of (frequent words in order, frequent words with skip counts between frequent words)
Return type: tuple
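A minimal sketch of the extraction step, assuming the log line has already been split into words and that a skip count is written as a hypothetical "*{n}" token between every pair of consecutive frequent words (including n = 0). Words before the first or after the last frequent word are not recorded, matching "skip counts between frequent words". The name extract_patterns_sketch and the token format are illustrative, not the library's API.

```python
from typing import Iterable, List, Set, Tuple


def extract_patterns_sketch(
    words: Iterable[str], frequent_words: Set[str]
) -> Tuple[Tuple[str, ...], List[str]]:
    """Return (frequent words in order, frequent words with skip tokens)."""
    freq: List[str] = []
    with_skips: List[str] = []
    skip = 0
    for word in words:
        if word in frequent_words:
            # Only record gaps *between* frequent words, not leading ones.
            if freq:
                with_skips.append("*{%d}" % skip)
            freq.append(word)
            with_skips.append(word)
            skip = 0
        else:
            skip += 1
    return tuple(freq), with_skips
```

For "user alice logged in from 10.0.0.1" with frequent words {"user", "logged", "in", "from"}, this returns (("user", "logged", "in", "from"), ["user", "*{1}", "logged", "*{0}", "in", "*{0}", "from"]); the trailing address is dropped.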
magichour.lib.LogCluster.LogCluster.log_cluster(sc, log_lines, support)

Run the LogCluster algorithm over a collection of log lines.

Parameters:
  • sc (SparkContext) – Spark context
  • log_lines (RDD of LogLine) – Input log messages as LogLine objects
  • support (int) – Minimum number of occurrences before a pattern can be included
Returns: A list of patterns for the log lines, where each pattern is a list of strings
Return type: list[list[str]]
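A minimal in-memory sketch of the two support-threshold passes at the core of the LogCluster technique, using plain Python lists of message strings instead of an RDD of LogLine objects (an assumption for illustration). Pass 1 finds words that appear in at least `support` lines; pass 2 keys each line by its frequent words in order and keeps keys seen at least `support` times. On a Spark RDD the same passes would typically be built from operations such as flatMap and reduceByKey; this sketch is not the library's implementation, and log_cluster_sketch is a hypothetical name.

```python
from collections import Counter
from typing import List


def log_cluster_sketch(lines: List[str], support: int) -> List[List[str]]:
    """In-memory version of the two LogCluster passes over plain-text lines."""
    # Pass 1: count the number of lines each word appears in.
    word_counts = Counter(w for line in lines for w in set(line.split()))
    frequent = {w for w, c in word_counts.items() if c >= support}

    # Pass 2: key each line by its frequent words, in line order.
    candidates: Counter = Counter()
    for line in lines:
        key = tuple(w for w in line.split() if w in frequent)
        if key:
            candidates[key] += 1

    # Keep only candidates that also meet the support threshold.
    return [list(key) for key, count in candidates.items() if count >= support]
```

With the lines "user alice logged in", "user bob logged in", "disk full on sda" and "user carol logged in" and a support of 2, the only pattern returned is ["user", "logged", "in"]; the disk message has no frequent words and is treated as an outlier.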

magichour.lib.LogCluster.LogCluster.parse_words(log_lines)

Take a single LogLine and split it into words for input to a word count.

Parameters: log_lines (LogLine) – Input log line
Returns: List of (word, count) tuples where the count is always 1
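A minimal sketch of the map and reduce halves of this word count, assuming the LogLine's message text is available as a plain string (LogLine's fields are not documented on this page) and that words are whitespace-separated. The helper names parse_words_sketch and reduce_counts are hypothetical; in PySpark the summing step would usually be done with reduceByKey.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_words_sketch(text: str) -> List[Tuple[str, int]]:
    """Split one log message into (word, 1) pairs, one per whitespace token."""
    return [(word, 1) for word in text.split()]


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the counts per word, the step reduceByKey would do on an RDD."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)
```

Because every emitted count is 1, summing the pairs for "disk full disk" gives {"disk": 2, "full": 1}.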

Module contents