
abstract

  • Name: cognitivefactory.interactive_clustering.clustering.abstract
  • Description: The abstract class used to define constrained clustering algorithms.
  • Author: Erwan SCHILD
  • Created: 17/03/2021
  • Licence: CeCILL-C License v1.0 (https://cecill.info/licences.fr.html)

AbstractConstrainedClustering

Bases: ABC

Abstract class that is used to define constrained clustering algorithms. The main inherited method is cluster.

References
  • Survey on Constrained Clustering: Lampert, T., T.-B.-H. Dao, B. Lafabregue, N. Serrette, G. Forestier, B. Cremilleux, C. Vrain, and P. Gancarski (2018). Constrained distance based clustering for time-series: a comparative and experimental study. Data Mining and Knowledge Discovery 32(6), 1663–1707.
Source code in src\cognitivefactory\interactive_clustering\clustering\abstract.py
class AbstractConstrainedClustering(ABC):
    """
    Abstract class that is used to define constrained clustering algorithms.
    The main inherited method is `cluster`.

    References:
        - Survey on Constrained Clustering: `Lampert, T., T.-B.-H. Dao, B. Lafabregue, N. Serrette, G. Forestier, B. Cremilleux, C. Vrain, and P. Gancarski (2018). Constrained distance based clustering for time-series: a comparative and experimental study. Data Mining and Knowledge Discovery 32(6), 1663–1707.`
    """

    # ==============================================================================
    # ABSTRACT METHOD - CLUSTER
    # ==============================================================================
    @abstractmethod
    def cluster(
        self,
        constraints_manager: AbstractConstraintsManager,
        vectors: Dict[str, csr_matrix],
        nb_clusters: Optional[int],
        verbose: bool = False,
        **kargs,
    ) -> Dict[str, int]:
        """
        (ABSTRACT METHOD)
        An abstract method that represents the main method used to cluster data.

        Args:
            constraints_manager (AbstractConstraintsManager): A constraints manager over data IDs that will force clustering to respect some conditions during computation.
            vectors (Dict[str, csr_matrix]): The representation of data vectors. The keys of the dictionary represent the data IDs. These keys have to refer to the list of data IDs managed by the `constraints_manager`. The values of the dictionary represent the vector of each data item.
            nb_clusters (Optional[int]): The number of clusters to compute. Can be `None` if this parameter is estimated or if the algorithm doesn't need it.
            verbose (bool, optional): Enable verbose output. Defaults to `False`.
            **kargs (dict): Other parameters that can be used in the clustering.

        Raises:
            ValueError: if `vectors` and `constraints_manager` are incompatible, or if some parameters are incorrectly set.

        Returns:
            Dict[str,int]: A dictionary that contains the predicted cluster for each data ID.
        """

cluster(constraints_manager, vectors, nb_clusters, verbose=False, **kargs) abstractmethod

(ABSTRACT METHOD) An abstract method that represents the main method used to cluster data.

Parameters:

  • constraints_manager (AbstractConstraintsManager, required): A constraints manager over data IDs that will force clustering to respect some conditions during computation.
  • vectors (Dict[str, csr_matrix], required): The representation of data vectors. The keys of the dictionary represent the data IDs. These keys have to refer to the list of data IDs managed by the constraints_manager. The values of the dictionary represent the vector of each data item.
  • nb_clusters (Optional[int], required): The number of clusters to compute. Can be None if this parameter is estimated or if the algorithm doesn't need it.
  • verbose (bool, default False): Enable verbose output.
  • **kargs (dict, default {}): Other parameters that can be used in the clustering.

Raises:

  • ValueError: If vectors and constraints_manager are incompatible, or if some parameters are incorrectly set.

Returns:

  • Dict[str, int]: A dictionary that contains the predicted cluster for each data ID.
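The contract described above (check the inputs, raise ValueError when they are incompatible, return one cluster ID per data ID) can be sketched with a toy subclass. This is a minimal illustration, not the library's code: `AbstractClusteringSketch` is a local mirror of the documented signature, `ToyConstraintsManager` and its `list_of_data_IDs` attribute are hypothetical stand-ins for a real constraints manager, `RoundRobinClustering` is an invented algorithm that ignores both constraints and vector content, and plain Python lists replace `csr_matrix` so the example needs only the standard library.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class ToyConstraintsManager:
    """Hypothetical stand-in: only records which data IDs are managed."""

    def __init__(self, list_of_data_IDs: List[str]) -> None:
        self.list_of_data_IDs = list(list_of_data_IDs)


class AbstractClusteringSketch(ABC):
    """Local mirror of the documented interface (not the library class)."""

    @abstractmethod
    def cluster(
        self,
        constraints_manager: ToyConstraintsManager,
        vectors: Dict[str, Any],
        nb_clusters: Optional[int],
        verbose: bool = False,
        **kargs,
    ) -> Dict[str, int]:
        """Return the predicted cluster for each data ID."""


class RoundRobinClustering(AbstractClusteringSketch):
    """Toy algorithm: ignores vector content and deals data IDs out in turn."""

    def cluster(self, constraints_manager, vectors, nb_clusters, verbose=False, **kargs):
        # Documented contract: incompatible inputs raise ValueError.
        if set(vectors.keys()) != set(constraints_manager.list_of_data_IDs):
            raise ValueError("`vectors` and `constraints_manager` are incompatible.")
        # This toy algorithm needs an explicit, positive number of clusters.
        if nb_clusters is None or nb_clusters < 1:
            raise ValueError("`nb_clusters` must be a positive integer here.")
        if verbose:
            print(f"Clustering {len(vectors)} data into {nb_clusters} clusters.")
        # Assign cluster IDs 0, 1, ..., nb_clusters - 1 in turn over sorted data IDs.
        return {
            data_ID: index % nb_clusters
            for index, data_ID in enumerate(sorted(vectors.keys()))
        }


manager = ToyConstraintsManager(["0", "1", "2"])
toy_vectors = {"0": [1.0, 0.0], "1": [0.0, 1.0], "2": [1.0, 1.0]}
predicted = RoundRobinClustering().cluster(manager, toy_vectors, nb_clusters=2)
print(predicted)  # {'0': 0, '1': 1, '2': 0}
```

Because `cluster` is declared with `@abstractmethod`, instantiating the base class directly raises a TypeError; only subclasses that override `cluster` can be created.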

Source code in src\cognitivefactory\interactive_clustering\clustering\abstract.py
@abstractmethod
def cluster(
    self,
    constraints_manager: AbstractConstraintsManager,
    vectors: Dict[str, csr_matrix],
    nb_clusters: Optional[int],
    verbose: bool = False,
    **kargs,
) -> Dict[str, int]:
    """
    (ABSTRACT METHOD)
    An abstract method that represents the main method used to cluster data.

    Args:
        constraints_manager (AbstractConstraintsManager): A constraints manager over data IDs that will force clustering to respect some conditions during computation.
        vectors (Dict[str, csr_matrix]): The representation of data vectors. The keys of the dictionary represent the data IDs. These keys have to refer to the list of data IDs managed by the `constraints_manager`. The values of the dictionary represent the vector of each data item.
        nb_clusters (Optional[int]): The number of clusters to compute. Can be `None` if this parameter is estimated or if the algorithm doesn't need it.
        verbose (bool, optional): Enable verbose output. Defaults to `False`.
        **kargs (dict): Other parameters that can be used in the clustering.

    Raises:
        ValueError: if `vectors` and `constraints_manager` are incompatible, or if some parameters are incorrectly set.

    Returns:
        Dict[str,int]: A dictionary that contains the predicted cluster for each data ID.
    """

rename_clusters_by_order(clusters)

Rename cluster IDs so that they are numbered from 0 in the order in which clusters first appear when data IDs are sorted.

Parameters:

  • clusters (Dict[str, int], required): The dictionary of clusters.

Returns:

  • Dict[str, int]: The dictionary of clusters, with keys sorted by data ID and cluster IDs renumbered from 0 in order of first appearance.

Source code in src\cognitivefactory\interactive_clustering\clustering\abstract.py
def rename_clusters_by_order(
    clusters: Dict[str, int],
) -> Dict[str, int]:
    """
    Rename cluster IDs so that they are numbered from 0 in the order in which clusters first appear when data IDs are sorted.

    Args:
        clusters (Dict[str, int]): The dictionary of clusters.

    Returns:
        Dict[str, int]: The sorted dictionary of clusters.
    """

    # Get `list_of_data_IDs`.
    list_of_data_IDs = sorted(clusters.keys())

    # Define a map to be able to rename cluster IDs.
    mapping_of_old_ID_to_new_ID: Dict[int, int] = {}
    new_ID: int = 0
    for data_ID in list_of_data_IDs:
        if clusters[data_ID] not in mapping_of_old_ID_to_new_ID:
            mapping_of_old_ID_to_new_ID[clusters[data_ID]] = new_ID
            new_ID += 1

    # Rename cluster IDs.
    new_clusters = {
        data_ID_to_assign: mapping_of_old_ID_to_new_ID[clusters[data_ID_to_assign]]
        for data_ID_to_assign in list_of_data_IDs
    }

    # Return the new ordered clusters
    return new_clusters
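
The effect of the renaming is easiest to see on a small input. The sketch below restates the function in condensed form so that it runs on its own (same behavior as the listing above, written with `setdefault`); the `raw` dictionary is invented data for illustration.

```python
from typing import Dict


def rename_clusters_by_order(clusters: Dict[str, int]) -> Dict[str, int]:
    """Condensed restatement of the documented function, for illustration."""
    mapping: Dict[int, int] = {}
    for data_ID in sorted(clusters):
        # setdefault assigns the next free ID the first time a cluster is seen.
        mapping.setdefault(clusters[data_ID], len(mapping))
    return {data_ID: mapping[clusters[data_ID]] for data_ID in sorted(clusters)}


raw = {"b": 2, "a": 5, "c": 2, "d": 0}
renamed = rename_clusters_by_order(raw)
print(renamed)  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```

Data IDs are strings, so `sorted` orders them lexicographically: "10" comes before "2". Two clusterings that group the data identically but use different cluster labels are mapped to the same output, which makes results directly comparable.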