Pattern Structure#

class paspailleur.pattern_structures.pattern_structure.PatternStructure(pattern_type: Type[Pattern] = None)#

A class to process complex datasets where every row (called object) is described by one pattern.

All patterns should be of the same class defined by PatternType attribute.

References#

Ganter, B., & Kuznetsov, S. O. (2001, July). Pattern structures and their projections. In International conference on conceptual structures (pp. 129-142). Berlin, Heidelberg: Springer Berlin Heidelberg.

Attributes#

PatternType: TypeVar

A type variable bound to the Pattern class.

Private Attributes#

_object_irreducibles: Optional[dict[PatternType, fbarray]]

Patterns introduced by objects.

_object_names: Optional[list[str]]

Names of the objects.

_atomic_patterns: Optional[OrderedDict[PatternType, fbarray]]

Smallest nontrivial patterns.

_atomic_patterns_order: Optional[list[fbarray]]

Partial order on atomic patterns meaning that some atomic patterns can be incomparable. _atomic_patterns_order[i][j] == True would mean that i-th atomic pattern is less precise than j-th atomic pattern. _atomic_patterns_order[i][j] == False can mean both that j-th atomic pattern is less precise than i-th atomic pattern and that i-th and j-th atomic patterns can not be compared.

Properties#

premaximal_patterns

Return the premaximal patterns in the structure.

atomic_patterns_order

Return the partial order of atomic patterns by extent inclusion.

n_atomic_patterns

Return the number of atomic patterns in the structure.

atomic_patterns

Return the atomic patterns in the structure.

max_atoms

Return the maximal atomic patterns in the structure.

max_pattern

Return the maximal pattern in the structure.

min_pattern

Return the minimal pattern in the structure.

Basic operations on a context#

PatternStructure.extent(pattern: PatternType, return_bitarray: bool = False) set[str] | frozenbitarray#

Compute the extent of a given pattern.

Parameters:
  • pattern (PatternType) – The pattern for which to compute the extent.

  • return_bitarray (bool, optional) – If True, returns the extent as a bitarray (default is False).

Returns:

extent – The extent of the pattern, either as a set of object names or a bitarray.

Return type:

Union[set[str], fbarray]

Raises:

ValueError – If the data is unknown (i.e., not fitted).

Examples

>>> ps = PatternStructure()
>>> ps.fit(data)  # Assume this fits the structure with object data
>>> p = Pattern("A")
>>> ps.extent(p)
{'obj1', 'obj2'}
PatternStructure.intent(objects: Collection[str] | frozenbitarray) PatternType#

Compute the intent of a given set of objects.

Parameters:

objects (Union[Collection[str], fbarray]) – The objects for which to compute the intent.

Returns:

intent – The intent of the given objects.

Return type:

PatternType

Raises:

ValueError – If the data is unknown (i.e., not fitted).

Examples

>>> ps = PatternStructure()
>>> ps.fit(data)  # Assume this fits the structure with object data
>>> ps.intent(['obj1', 'obj2'])
Pattern('A')

Initialisation of the Pattern Structure#

PatternStructure.fit(object_descriptions: dict[str, PatternType], compute_atomic_patterns: bool = None, min_atom_support: int | float = 0, use_tqdm: bool = False)#

Initialize the PatternStructure with object descriptions.

Parameters:
  • object_descriptions (dict[str, PatternType]) – A dictionary mapping object names to their corresponding patterns.

  • compute_atomic_patterns (bool, optional) – If True, computes atomic patterns. If None, tries to infer whether computation is possible (default is None).

  • min_atom_support (Union[int, float], optional) – Minimum support threshold for an atomic pattern to be retained (default is 0). If a float between 0 and 1, it is interpreted as a proportion of total objects.

  • use_tqdm (bool, optional) – If True, displays a progress bar when computing atomic patterns (default is True).

Examples

>>> ps = PatternStructure()
>>> data = {
    "obj1": Pattern("A"),
    "obj2": Pattern("A & B"),
    "obj3": Pattern("B")
}
>>> ps.fit(data)
PatternStructure.init_atomic_patterns(min_support: int | float = 0, use_tqdm: bool = False)#

Compute the set of all patterns that cannot be obtained by join of other patterns

Parameters:
  • min_support (Union[int, float], optional) – Minimum number or proportion of objects a pattern must describe to be considered atomic (default is 0).

  • use_tqdm (bool, optional) – If True, displays a progress bar during computation (default is False).

Raises:

AssertionError – If the internal atomic pattern ordering check fails.

Examples

>>> ps = PatternStructure()
>>> ps.fit({
    "obj1": Pattern("A"),
    "obj2": Pattern("B"),
    "obj3": Pattern("A & B")
})
>>> ps.init_atomic_patterns(min_support=1)

Properties that are easy to compute#

property PatternStructure.min_pattern: PatternType#

Return the minimal pattern in the structure.

This is the least precise pattern that describes all objects.

Returns:

min – The minimal pattern found in the structure.

Return type:

PatternType

Raises:

ValueError – If the data is unknown (i.e., the structure has not been fitted yet).

Examples

>>> ps = PatternStructure()
>>> ps.fit({"obj1": Pattern("A"), "obj2": Pattern("B")})
>>> ps.min_pattern
Pattern("...")  # minimal pattern across objects
property PatternStructure.max_pattern: PatternType#

Return the maximal pattern in the structure.

This is the most precise pattern that includes all patterns in the structure.

Returns:

max – The maximal pattern found in the structure.

Return type:

PatternType

Raises:

ValueError – If the data is unknown (i.e., the structure has not been fitted yet).

Examples

>>> ps = PatternStructure()
>>> ps.fit({"obj1": Pattern("A"), "obj2": Pattern("B")})
>>> ps.max_pattern
Pattern("A | B")
property PatternStructure.max_atoms: set[PatternType]#

Return the maximal atomic patterns in the structure.

Maximal atoms are those atomic patterns that cannot be subsumed by any other.

Returns:

max_atoms – A set of maximal atomic patterns.

Return type:

set[PatternType]

Examples

>>> ps = PatternStructure()
>>> ps.fit({"obj1": Pattern("A"), "obj2": Pattern("A & B")})
>>> ps.max_atoms
{Pattern("A")}
property PatternStructure.atomic_patterns: OrderedDict[PatternType, set[str]]#

Return the atomic patterns in the structure.

Returns:

atoms – An ordered dictionary mapping atomic patterns to their extents (object names).

Return type:

OrderedDict[PatternType, set[str]]

Examples

>>> ps.atomic_patterns
OrderedDict({Pattern("A"): {"obj1", "obj3"}, Pattern("B"): {"obj2"}})
property PatternStructure.n_atomic_patterns: int#

Return the number of atomic patterns in the structure.

Returns:

count – The number of atomic patterns.

Return type:

int

Examples

>>> ps.n_atomic_patterns
5
property PatternStructure.atomic_patterns_order: dict[PatternType, set[PatternType]]#

Return the partial order of atomic patterns, i.e. for every atomic pattern show all its atomic super-patterns.

Each pattern maps to the set of patterns that strictly subsume it.

Returns:

order – A dictionary representing the ordering of atomic patterns. Keys are atomic patterns, values are sets of greater atomic patterns.

Return type:

dict[PatternType, set[PatternType]]

Examples

>>> ps.atomic_patterns_order
{
    Pattern("A"): {Pattern("A | B")},
    Pattern("B"): set()
}
property PatternStructure.premaximal_patterns: dict[PatternType, set[str]]#

Return the premaximal patterns in the structure.

Premaximal patterns are those just below the maximal pattern in precision.

Returns:

premaximals – A dictionary mapping premaximal patterns to their extents.

Return type:

dict[PatternType, set[str]]

Examples

>>> ps.premaximal_patterns
{
    Pattern("A & B"): {"obj1", "obj3"},
    Pattern("B & C"): {"obj2"}
}

Pattern Iterators#

PatternStructure.iter_atomic_patterns(return_bitarrays: bool = False, kind: Literal['bruteforce', 'ascending', 'ascending controlled'] = 'bruteforce', support_characteristic: Literal['any', 'maximal', 'minimal'] = 'any') Generator[tuple[PatternType, set[str] | frozenbitarray], bool | None, None]#

Iterate over atomic patterns in the pattern structure.

If controlled is False, then the function works as a generator that can be directly put into for-in loop, If controlled is True, then patterns’ navigation can be controleld by pattern_iterator.send() function. After receiving pattern p, send True to the iterator in order to continue iterating over atomic patterns more precise than p, and send False to “early stop” the iteration.

Important: When controlled is set to True, initialise the generator using next(pattern_generator) before generating any atomic patterns.

Parameters:
  • return_bitarrays (bool, optional) – If True, returns extents as bitarrays (default is False).

  • kind (Literal, optional) – Iteration strategy: ‘bruteforce’, ‘ascending’, or ‘ascending controlled’ (default is ‘bruteforce’).

  • support_characteristic (Literal, optional) – Defines what type of atomic patterns to output. If ‘any’ (which is the default), then output all the atomic patterns. If ‘maximal’ then output the most precise atomic patterns w.r.t. their support. (i.e. if atomic pattern is support-maximal, then any more precise atomic pattern would describe fewer objects). If ‘minimal’ then output the least precise atomic patterns w.r.t. their support. (i.e. if atomic pattern is support-minimal, then any less precise atomic pattern would describe more objects).

Yields:

pattern_generator (Generator[tuple[PatternType, Union[set[str], fbarray]], Union[bool, None], None]) – Generator of pairs of atomic patterns and their extents (either as a set of objects’ names or as a bitarray). When controlled parameter is set to True, one can send boolean refine_atomic_pattern parameter to the generator in order to control the patterns’ navigation.

Examples

>>> for atom, extent in ps.iter_atomic_patterns():
    print(atom, extent)
PatternStructure.iter_patterns(kind: Literal['ascending', 'ascending controlled'] = 'ascending', min_support: int | float = 0, depth_first: bool = True, return_objects_as_bitarrays: bool = False) Iterator[tuple[PatternType, bitarray]] | Generator[tuple[PatternType, bitarray], bool, None]#

Iterate through all patterns based on selected criteria.

Parameters:
  • kind (Literal, optional) – Strategy for traversal: ‘ascending’ or ‘ascending controlled’ (default is ‘ascending’).

  • min_support (Union[int, float], optional) – Minimum support required for a pattern to be yielded (default is 0).

  • depth_first (bool, optional) – If True, performs a depth-first traversal (default is True).

  • return_objects_as_bitarrays (bool, optional) – If True, extents are returned as bitarrays; otherwise as sets (default is False).

Yields:

pattern_info (Iterator or Generator) – Yields patterns and their extents.

Examples

>>> for pattern, extent in ps.iter_patterns(min_support=2):
    print(pattern, extent)
PatternStructure.iter_keys(patterns: PatternType | Iterable[PatternType] = None, max_length: int = None, min_support: int | float = 1, use_tqdm: bool = False) Iterator[PatternType] | Iterator[tuple[PatternType, PatternType]]#

Iterate over the keys that describe given patterns.

Keys (also known as minimal generators) of a pattern are the least precise subpatterns that describe the very same objects as the pattern.

Reference#

Buzmakov, A., Dudyrev, E., Kuznetsov, S. O., Makhalova, T., & Napoli, A. (2024). Data complexity: An FCA-based approach. International Journal of Approximate Reasoning, 165, 109084.

param patterns:

A pattern or list of patterns to decompose into atomic keys.

type patterns:

Union[PatternType, Iterable[PatternType]]

param max_length:

Maximum length of key combinations (default is None).

type max_length:

Optional[int], optional

param min_support:

Minimal number of objects described by a key. Only used when patterns is set to None.

type min_support:

int, default = 0

param use_tqdm:

Flag whether to show tqdm progress bar or not.

type use_tqdm:

bool, default = False

Yields:

keys (Union[PatternType, tuple[PatternType, PatternType]]) – Keys of the input patterns or (key, original pattern) pairs.

Examples

>>> for key in ps.iter_keys(Pattern("A | B")):
...    print(key)
PatternStructure.iter_subgroups(goal_objects: set[str] | bitarray, quality_measure: Literal['Accuracy', 'Precision', 'Recall', 'Jaccard', 'F1', 'WRAcc'], quality_threshold: float, kind: Literal['bruteforce'] = 'bruteforce', max_length: int | None = None, min_support: int | float | None = 0, return_objects_as_bitarrays: bool = False, use_tqdm: bool = False, atomic_support_characteristic: Literal['maximal', 'minimal'] = 'maximal') Iterator[SubgroupData]#

Iterate over subgroups that satisfy a quality threshold.

A subgroup is a pattern whose extent is very similar to the set of goal_objects. The similarity is measured by quality_measure and is lower-bounded by qulity_threshold.

References

Atzmueller, M. (2015). Subgroup discovery. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 5(1), 35-49.

Parameters:
  • goal_objects (Union[set[str], bitarray]) – Set or bitarray of target objects.

  • quality_measure (Literal) – Metric to evaluate subgroups (e.g., ‘Accuracy’, ‘F1’, ‘WRAcc’).

  • quality_threshold (float) – Minimum value for the selected quality measure.

  • kind (Literal, optional) – Subgroup mining strategy (currently only ‘bruteforce’ supported).

  • max_length (Optional[int], optional) – Maximum length of subgroups (default is None).

  • min_support (Union[int, float], optional) – The minimal amount of objects that a subgroup should describe. Defaults to 0.

  • return_objects_as_bitarrays (bool, optional) – If True, extents are returned as bitarrays (default is False).

  • use_tqdm (bool, optional) – Flag whether to show tqdm progress bar to visualise the number of passed subgroup candidates. Defaults to False.

  • atomic_support_characteristic (Literal["maximal", "minimal"], optional) – What type of atomic patterns to use to find subgroups. If “maximal” (the default) then prefer more precise atomic patterns (called support-maximal), and if “minimal” then prefer less precise atomic patterns (called support-minimal). For example, say there are two interval patterns “Age >= 20” and “Age > 10” that describe the same objects. Then the former pattern is support-maximal (as it is more precise) and the latter is support-minimal (as it is less precise).

Yields:

subs (Iterator[SubgroupData]) – Named Tuples of (pattern, extent, quality_value, quality_name).

Examples

>>> for p, o, q in ps.iter_subgroups({"obj1", "obj2"}, "Precision", 0.7):
    print(p, o, q)

High-level FCA API#

PatternStructure.mine_concepts(min_support: int | float = 0, min_delta_stability: int | float = 0, algorithm: Literal['CloseByOne object-wise', 'gSofia', 'CbOI'] = None, return_objects_as_bitarrays: bool = False, use_tqdm: bool = False) list[PatternConcept]#

Mine pattern concepts (extent-intent pairs) from the pattern structure.

Parameters:
  • min_support (Union[int, float], optional) – Minimum support required for concepts (default is 0).

  • min_delta_stability (Union[int, float], optional) – Minimum delta stability for concept filtering (default is 0).

  • algorithm (Literal, optional) – Algorithm used for mining: ‘CloseByOne object-wise’, ‘gSofia’, and ‘CbOI’.

  • return_objects_as_bitarrays (bool, optional) – If True, returns extents as bitarrays (default is False).

  • use_tqdm (bool, optional) – If True, displays a progress bar (default is False).

Returns:

A list of named tuples, each containing an extent and its corresponding intent. Extent representation of each concept depends on return_objects_as_bitarrays parameter.

Return type:

concepts

Examples

>>> ps.mine_concepts(min_support=1)
[({"obj1", "obj2"}, Pattern("A"))]
PatternStructure.mine_implications(basis_name: Literal['Canonical', 'Canonical Direct'] = 'Canonical Direct', min_support: int | float = 0, min_delta_stability: int | float = 0, max_key_length: int | None = None, algorithm: Literal['CloseByOne object-wise', 'gSofia', 'Talky-GI'] = None, reduce_conclusions: bool = False, use_tqdm: bool = False) dict[PatternType, PatternType]#

Mine implications from the pattern structure using a selected basis.

Parameters:
  • basis_name (Literal, optional) – Type of basis used for implications (default is “Canonical Direct”).

  • min_support (Union[int, float], optional) – Minimum support for implications (default is 0).

  • min_delta_stability (Union[int, float], optional) – Minimum delta stability (default is 0).

  • max_key_length (Optional[int], optional) – Maximum length of keys (default is None).

  • algorithm (Literal, optional) – Concept mining algorithm to use (default is None).

  • reduce_conclusions (bool, optional) – If True, reduces the size of implication conclusions (default is False).

  • use_tqdm (bool, optional) – If True, displays a progress bar (default is False).

Returns:

implications – A dictionary mapping premises to conclusions in the implication basis.

Return type:

dict[PatternType, PatternType]

Examples

>>> ps.mine_implications(min_support=2)
{Pattern("A"): Pattern("B")}
PatternStructure.mine_implications_from_premises(premises: Iterable[PatternType], pseudo_close_premises: bool = False, reduce_conclusions: bool = False, use_tqdm: bool = False) dict[PatternType, PatternType] | OrderedDict[PatternType, PatternType]#

Construct implications from a set of given premises.

Parameters:
  • premises (Iterable[PatternType]) – The premises to base implications on.

  • pseudo_close_premises (bool, optional) – Whether to pseudo-close the premises (default is False).

  • reduce_conclusions (bool, optional) – If True, reduces conclusions to minimal additions (default is False).

  • use_tqdm (bool, optional) – If True, displays progress bars (default is False).

Returns:

premises – A mapping from premises to conclusions.

Return type:

Union[dict[PatternType, PatternType], OrderedDict[PatternType, PatternType]]

Examples

>>> premises = [Pattern("A")]
>>> ps.mine_implications_from_premises(premises)
{Pattern("A"): Pattern("B")}
PatternStructure.next_patterns(pattern: PatternType, return_extents: bool = False, return_objects_as_bitarrays: bool = False) set[PatternType] | dict[PatternType, set[str]] | dict[PatternType, frozenbitarray]#

Compute the immediate successor patterns of a given pattern.

The immediate successor patternsare the next more precise patterns.

Parameters:
  • pattern (PatternType) – The pattern to compute successors for.

  • return_extents (bool, optional) – If True, returns extents along with patterns (default is False).

  • return_objects_as_bitarrays (bool, optional) – If True, extents are returned as bitarrays instead of sets (default is False).

Returns:

next_patterns – Either a set of successor patterns or a mapping of successors to extents.

Return type:

Union[set[PatternType], dict[PatternType, set[str]], dict[PatternType, fbarray]]

Examples

>>> ps.next_patterns(Pattern("A"))
{Pattern("A & B")}

Measures of Patterns#

PatternStructure.measure_support(pattern: Pattern) int#

Measure the support (number of objects) of a given pattern.

Support is the size of the pattern’s extent, i.e., the number of objects that satisfy the pattern.

Parameters:

pattern (Pattern) – The pattern for which to compute support.

Returns:

support – The number of objects (support count) covered by the pattern.

Return type:

int

Examples

>>> ps.measure_support(Pattern("A"))
3
PatternStructure.measure_frequency(pattern: Pattern) float#

Measure the frequency of a given pattern.

Frequency is the proportion of objects satisfying the pattern.

Parameters:

pattern (Pattern) – The pattern for which to compute frequency.

Returns:

frequency – The frequency of the pattern as a fraction between 0 and 1.

Return type:

float

Examples

>>> ps.measure_frequency(Pattern("A"))
0.6
PatternStructure.measure_delta_stability(pattern: Pattern) int#

Measure the delta stability of a given pattern.

Delta stability is defined as the difference in support between the pattern and its most specific generalization (i.e., next pattern with largest shared extent).

Parameters:

pattern (Pattern) – The pattern for which to compute delta stability.

Returns:

delta_s – The delta stability value.

Return type:

int

Examples

>>> ps.measure_delta_stability(Pattern("A"))
2  # pattern support is 5, next largest overlapping pattern support is 3

Helping Functions#

PatternStructure.verbalise_extent(extent: bitarray | set[str]) set[str]#

Convert an extent to a human-readable format (set of object names).

This function takes an extent represented as a bitarray (internal format) or a set of object names, and returns the corresponding object names in a readable form.

Parameters:

extent (Union[bitarray, set[str]]) – The extent to convert. If a bitarray is provided, each set bit is mapped to its corresponding object name.

Returns:

readable – The human-readable extent as a set of object names.

Return type:

set[str]

Examples

>>> ba = bitarray('1010')
>>> ps._object_names = ["obj1", "obj2", "obj3", "obj4"]
>>> ps.verbalise_extent(ba)
{'obj1', 'obj3'}
>>> ps.verbalise_extent({'obj2', 'obj4'})
{'obj2', 'obj4'}