This tutorial covers “**Python Scipy Fcluster**”, which groups similar observations into one or more clusters, and walks through the steps behind clustering data points under the following topics.

- What is clustering?
- How to create a cluster in Python Scipy
- Python Scipy Cluster T
- How to get the required cluster using the Maxclust
- Python Scipy Cluster Inconsistent
- Python Scipy Fcluster Data

## What is clustering?

Clustering is an unsupervised machine learning task. It is also called cluster analysis.

When using a clustering method, we provide the algorithm with a large amount of unlabeled input data and let it identify whatever groups or collections of data it can.

These collections are known as clusters. A cluster is a collection of data points that are related to one another based on how they relate to the surrounding data points. Pattern discovery and feature engineering are two applications of clustering.

The fundamental idea behind clustering is the division of a set of observations into subgroups or clusters so that observations belonging to the same cluster share some characteristics.


## Python Scipy Fcluster

The method *fcluster()* in the module *scipy.cluster.hierarchy* of Python Scipy creates flat clusters from the hierarchical clustering defined by the provided linkage matrix.

The syntax is given below.

`scipy.cluster.hierarchy.fcluster(Z, t, criterion='inconsistent', depth=2, R=None, monocrit=None)`

Where parameters are:

- **Z(ndarray):** The hierarchical clustering encoded as the linkage matrix returned by the linkage function.
- **t(scalar):** *For criteria ‘inconsistent’, ‘distance’ or ‘monocrit’:* the threshold to apply when forming flat clusters. *For ‘maxclust’ or ‘maxclust_monocrit’ criteria:* the maximum number of clusters requested.
- **criterion(string):** The criterion to be applied while creating flat clusters. It can be any of the following values: `inconsistent`, `distance`, `maxclust`, `monocrit` and `maxclust_monocrit`.
- **depth(int):** The maximum depth at which the inconsistency calculation is made. It has no meaning for the other criteria. 2 is the default.
- **R(ndarray):** The inconsistency matrix to use for the ‘inconsistent’ criterion. If not given, this matrix is computed.
- **monocrit(ndarray):** An array of n-1 elements. monocrit[i] is the statistic used to threshold non-singleton node i. The monocrit vector must be monotonic, meaning that given a node c with index i, monocrit[i] >= monocrit[j] for all node indices j corresponding to nodes below c.

The method *fcluster()* returns `fclusters`, an array of length n in which T[i] is the flat cluster number to which the original observation i belongs.

Let’s take an example by following the below steps:

Import the required libraries or methods using the below python code.

```
from scipy.cluster import hierarchy
from scipy.spatial import distance
```

The input to *fcluster()* is a linkage matrix Z, which is the output of any cluster linkage method such as **scipy.cluster.hierarchy.ward**. Create an array X_ of two-dimensional data points, which represent the start and end points of USA cities, using the below code.

```
X_ = [[0, 0], [0, 2], [2, 0],
[0, 5], [0, 4], [2, 5],
[5, 0], [4, 0], [5, 2],
[5, 5], [4, 5], [5, 4]]
```

Convert the input data *X_* into a condensed distance matrix using the method `pdist()` and pass this matrix to the clustering method *ward()* using the below code.

```
Z_ = hierarchy.ward(distance.pdist(X_))
Z_
```

Each row of the above matrix, which represents a dendrogram, describes one merge step. The first and second elements of a row are the indices of the two clusters that were combined at that step.

The third element is the distance between the two clusters, and the fourth element is the size of the new cluster, that is, the number of original data points it contains.
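To make the row layout of the linkage matrix concrete, the following sketch rebuilds `Z_` from the same data and prints every merge step. The names `n_points`, `left`, `right`, `dist` and `size` are illustrative only and are not part of the SciPy API.

```python
from scipy.cluster import hierarchy
from scipy.spatial import distance

X_ = [[0, 0], [0, 2], [2, 0],
      [0, 5], [0, 4], [2, 5],
      [5, 0], [4, 0], [5, 2],
      [5, 5], [4, 5], [5, 4]]

Z_ = hierarchy.ward(distance.pdist(X_))
n_points = len(X_)

# Each row of Z_ is one merge: [cluster a, cluster b, distance, new cluster size].
# Indices below n_points are original observations; merge k creates cluster n_points + k.
for k, (left, right, dist, size) in enumerate(Z_):
    print(f"merge {k}: {int(left)} + {int(right)} -> {n_points + k}, "
          f"distance={dist:.2f}, size={int(size)}")
```

With n observations the matrix has n-1 rows, and the last row describes the merge that joins all observations into one cluster.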

Now pass the above matrix to method fcluster using the below code.

`hierarchy.fcluster(Z_, t=0.8, criterion='distance')`

Twelve separate clusters are returned because the threshold t is too small to allow any two samples in the data to form a cluster. The next subsection shows how adjusting the threshold (t) changes the clusters that are formed.


## Python Scipy Fcluster T

The dendrogram can be flattened using *scipy.cluster.hierarchy.fcluster*, which assigns each original data point to a single cluster. This assignment is largely determined by a distance threshold (t), which is the maximum inter-cluster distance permitted.

Through this section, we are continuing the same example that we have used in the above subsection **“Python Scipy Fcluster”**.

Run the below code after the above subsection codes to know how the threshold (t) works.

`hierarchy.fcluster(Z_, t=0.6, criterion='distance')`

Run the same code with t=1.0 using the below code.

`hierarchy.fcluster(Z_, t=1.0, criterion='distance')`

Then t=3.1

`hierarchy.fcluster(Z_, t=3.1, criterion='distance')`

At last t=10

`hierarchy.fcluster(Z_, t=10, criterion='distance')`

- 12 separate clusters are returned in the first scenario because the threshold t is too low to allow any two samples in the data to create a cluster.
- In the second scenario, the threshold is high enough to permit the fusion of the points with those that are closest to them. Thus, only 9 clusters are returned in this case.
- Up to 8 data points may be connected in the third scenario, which has a significantly higher threshold; as a result, 4 clusters are returned in this situation.
- Finally, the fourth case’s threshold is high enough to permit the fusion of all data points, resulting in the return of a single cluster.
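The four scenarios can also be compared in one run. The sketch below rebuilds the linkage matrix from the same data and counts the distinct labels returned for each threshold; the dictionary `cluster_counts` is an illustrative helper, not something SciPy provides.

```python
from scipy.cluster import hierarchy
from scipy.spatial import distance

X_ = [[0, 0], [0, 2], [2, 0],
      [0, 5], [0, 4], [2, 5],
      [5, 0], [4, 0], [5, 2],
      [5, 5], [4, 5], [5, 4]]

Z_ = hierarchy.ward(distance.pdist(X_))

# Map each threshold to the number of flat clusters it produces.
cluster_counts = {}
for t in (0.6, 1.0, 3.1, 10):
    labels = hierarchy.fcluster(Z_, t=t, criterion='distance')
    cluster_counts[t] = len(set(labels))
    print(f"t={t}: {cluster_counts[t]} clusters")
```

Counting distinct labels is a quick way to see how raising t lets more merges through before committing to a threshold.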

This is how to use the threshold (t) to form the cluster.


## Python Scipy Cluster Maxclust

The method *fcluster()* accepts a parameter `criterion` that is applied while creating flat clusters. It can be any of the following values.

- *inconsistent:* If a cluster node and all of its descendants have an inconsistent value less than or equal to t, then all of its leaf descendants are considered to be members of the same flat cluster. Every node is given its own cluster if no non-singleton cluster fulfils this requirement.
- *distance:* Creates flat clusters with a maximum cophenetic distance of t between the original observations in each flat cluster.
- *maxclust:* Finds a minimum threshold r such that no more than t flat clusters are formed and the cophenetic distance between any two original observations in the same flat cluster does not exceed r.
- *monocrit:* Creates a flat cluster from a cluster node c with index i when monocrit[j] <= t.
- *maxclust_monocrit:* Forms a flat cluster from a non-singleton cluster node c when monocrit[i] <= r for all cluster indices i below and including c. r is minimized so that t or fewer flat clusters are formed. monocrit must be monotonic.
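As a small illustration of the `monocrit` criterion, the sketch below uses `scipy.cluster.hierarchy.maxdists()` to build a monotonic statistic from the linkage matrix and thresholds on it. The choice of `maxdists` and the threshold 3.1 are assumptions made for this example; with that statistic the result should match the `distance` criterion for the same t.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial import distance

X_ = [[0, 0], [0, 2], [2, 0],
      [0, 5], [0, 4], [2, 5],
      [5, 0], [4, 0], [5, 2],
      [5, 5], [4, 5], [5, 4]]

Z_ = hierarchy.ward(distance.pdist(X_))

# maxdists(Z_)[i] is the largest merge distance at or below non-singleton node i,
# so it is monotonic and can serve as the monocrit vector.
MD = hierarchy.maxdists(Z_)

by_monocrit = hierarchy.fcluster(Z_, t=3.1, criterion='monocrit', monocrit=MD)
by_distance = hierarchy.fcluster(Z_, t=3.1, criterion='distance')

print(by_monocrit)
print(np.array_equal(by_monocrit, by_distance))
```

Any other monotonic per-node statistic could be passed as `monocrit` in the same way.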

Remember from the second subsection of this tutorial that, for the **‘maxclust’ or ‘maxclust_monocrit’ criteria**, the parameter *t* is the maximum number of clusters requested.

Here we will directly use the same code that we have used in the above subsection **“Python Scipy Fcluster”**.

Suppose we need to form 5 clusters. Then the value of *t* will be *5* and *criterion* will be equal to *maxclust*, as shown in the below code.

`hierarchy.fcluster(Z_, t=5, criterion='maxclust')`

From the above output, we got five clusters: **first_cluster = [2, 2], second_cluster = [3], third_cluster = [5, 5, 5], fourth_cluster = [1, 1, 1]** and **fifth_cluster = [4, 4, 4]**.

This is how to use the value *maxclust* for the criterion with the parameter *t* to get the required number of clusters.
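To check how many observations fall into each of the requested clusters, the labels can be tallied with `numpy.unique`. This is a minimal sketch on the same data; the names `cluster_ids` and `sizes` are illustrative.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial import distance

X_ = [[0, 0], [0, 2], [2, 0],
      [0, 5], [0, 4], [2, 5],
      [5, 0], [4, 0], [5, 2],
      [5, 5], [4, 5], [5, 4]]

Z_ = hierarchy.ward(distance.pdist(X_))
labels = hierarchy.fcluster(Z_, t=5, criterion='maxclust')

# Count how many observations carry each flat cluster label.
cluster_ids, sizes = np.unique(labels, return_counts=True)
for cluster_id, size in zip(cluster_ids, sizes):
    print(f"cluster {cluster_id}: {size} observations")
```

Note that *t* is an upper bound with `maxclust`, so the number of clusters returned can be smaller than *t* for other datasets.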

## Python Scipy Cluster Inconsistent

We already know from the above subsection that the method *fcluster()* accepts a parameter `criterion` that is applied while creating flat clusters. One of the values it accepts is *inconsistent*. With this criterion, if a cluster node and all of its descendants have an inconsistent value less than or equal to t, then all of the node’s leaf descendants are members of the same flat cluster. When no non-singleton cluster satisfies this requirement, each node is given its own cluster.

Let’s see an example by following the below steps.

Import the required libraries or methods using the below python code.

```
from scipy.cluster import hierarchy
from scipy.spatial import distance
```

Create an array of data X_, which is the start and end distance points of USA states such as **Alabama (0,0 to 0,2)**, **California (0,2 to 2,0)**, **Florida (2,0 to 0,3)**, **Georgia (0,3 to 0,2)**, **Hawaii (0,2 to 2,5)** and so on for **Indiana**, **Kentucky**, **Montana**, **Nevada**, **New Jersey** and **New York**, using the below code.

```
X_ = [[0, 0], [0, 2], [2, 0],
      [0, 3], [0, 2], [2, 5],
      [3, 0], [4, 0], [5, 2],
      [5, 5], [4, 5], [5, 4]]
```

`Z_ = hierarchy.ward(distance.pdist(X_))`

Now pass the above data to the method *fcluster()* with *criterion* equal to *inconsistent* using the below code.

`hierarchy.fcluster(Z_, t=0.9, criterion='inconsistent')`
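The inconsistency values that this criterion thresholds can be inspected with `scipy.cluster.hierarchy.inconsistent()`. The sketch below, on the same data, computes that matrix explicitly and passes it through the `R` parameter; supplying `R` yourself is optional and is shown here only to make the intermediate values visible.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial import distance

X_ = [[0, 0], [0, 2], [2, 0],
      [0, 3], [0, 2], [2, 5],
      [3, 0], [4, 0], [5, 2],
      [5, 5], [4, 5], [5, 4]]

Z_ = hierarchy.ward(distance.pdist(X_))

# Columns of R: mean and standard deviation of the link heights considered,
# number of links considered, and the inconsistency coefficient.
R = hierarchy.inconsistent(Z_, d=2)

labels_auto = hierarchy.fcluster(Z_, t=0.9, criterion='inconsistent')
labels_with_R = hierarchy.fcluster(Z_, t=0.9, criterion='inconsistent', depth=2, R=R)

print(R[:, 3])          # inconsistency coefficient of every merge
print(labels_with_R)
```

Looking at the last column of `R` helps when choosing *t*, since it shows which merges sit above or below a candidate threshold.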


## Python Scipy Fcluster Data

The method *fclusterdata()* in the module *scipy.cluster.hierarchy* of Python Scipy clusters observation data using a given metric.

It clusters the original observations in the n-by-m data matrix X (n observations in m dimensions). With the default arguments, it uses the Euclidean metric to compute distances between observations, performs hierarchical clustering using the single linkage algorithm, and forms flat clusters using the inconsistency method with t as the cut-off threshold.

The syntax is given below.

`scipy.cluster.hierarchy.fclusterdata(X, t, criterion='inconsistent', metric='euclidean', depth=2, method='single', R=None)`

Where parameters are:

- **X(ndarray (N, M)):** With N observations in M dimensions, the data matrix is N by M.
- **t(scalar):** *For criteria ‘inconsistent’, ‘distance’ or ‘monocrit’:* the threshold to apply when forming flat clusters. *For ‘maxclust’ or ‘maxclust_monocrit’ criteria:* the maximum number of clusters requested.
- **criterion(string):** The criterion to be applied while creating flat clusters. It can be any of the following values: `inconsistent`, `distance`, `maxclust`, `monocrit` and `maxclust_monocrit`.
- **metric(string):** The distance metric used to compute pairwise distances.
- **depth(int):** The maximum depth at which the inconsistency calculation is made. It has no meaning for the other criteria. 2 is the default.
- **method(string):** The linkage method to use (single, complete, average, weighted, median, centroid, ward).
- **R(ndarray):** The inconsistency matrix to use for the ‘inconsistent’ criterion. If not given, this matrix is computed.

The method *fclusterdata()* returns *fclusterdata*, a vector of length n in which T[i] is the flat cluster number to which the original observation i belongs.

Let’s see an example with the same data that we have created in the above subsection **“Python Scipy Cluster Inconsistent”** by following the below steps.

Import the required libraries or methods using the below python code.

`from scipy.cluster import hierarchy`

Create an array of data X_, which is the start and end distance points of USA states such as **Alabama (0,0 to 0,2)**, **California (0,2 to 2,0)**, **Florida (2,0 to 0,3)**, **Georgia (0,3 to 0,2)**, **Hawaii (0,2 to 2,5)** and so on for **Indiana**, **Kentucky**, **Montana**, **Nevada**, **New Jersey** and **New York**, using the below code.

```
X_ = [[0, 0], [0, 2], [2, 0],
      [0, 3], [0, 2], [2, 5],
      [3, 0], [4, 0], [5, 2],
      [5, 5], [4, 5], [5, 4]]
```

Use “scipy.cluster.hierarchy.fclusterdata” to find flat clusters with a user-specified threshold t = 1.0. Because the criterion is left at its default of ‘inconsistent’, t acts as the inconsistency cut-off.

`hierarchy.fclusterdata(X_, t=1.0)`

In the above output, the dataset X_ is grouped into four clusters for the threshold t = 1.0.

The convenience method “fclusterdata()” abstracts all the steps of a typical SciPy hierarchical clustering workflow that we performed in the subsection **“Python Scipy Fcluster”**, namely:

- Using scipy.spatial.distance.pdist, create a condensed matrix from the provided data.
- Use a clustering approach like *ward()*.
- Using *scipy.cluster.hierarchy.fcluster*, find flat clusters with a user-defined distance threshold t.

All the above three steps can be done using the method *fclusterdata()*.
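The three steps can be written out next to the one-line call to see that they agree. The sketch below spells out the defaults listed in the `fclusterdata()` syntax above (Euclidean metric, single linkage, ‘inconsistent’ criterion, depth 2), so the linkage step uses `hierarchy.linkage(..., method='single')` rather than *ward()*. The names `labels_short` and `labels_long` are illustrative.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial import distance

X_ = [[0, 0], [0, 2], [2, 0],
      [0, 3], [0, 2], [2, 5],
      [3, 0], [4, 0], [5, 2],
      [5, 5], [4, 5], [5, 4]]

# One call using the defaults of fclusterdata().
labels_short = hierarchy.fclusterdata(X_, t=1.0)

# The same workflow spelled out with those defaults made explicit.
Y_ = distance.pdist(X_, metric='euclidean')      # step 1: condensed distances
Z_ = hierarchy.linkage(Y_, method='single')      # step 2: hierarchical clustering
labels_long = hierarchy.fcluster(Z_, t=1.0,      # step 3: flat clusters
                                 criterion='inconsistent', depth=2)

print(labels_short)
print(np.array_equal(labels_short, labels_long))
```

To reproduce a Ward-based workflow with the convenience method, pass `method='ward'` and the desired `criterion` to `fclusterdata()` explicitly.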

We have learned how to cluster similar data points using “Python Scipy Fcluster”, and how to get the required number of clusters using the criterion value *maxclust*. Also, we have covered the following topics.

- What is clustering?
- How to create the cluster in Python Scipy
- Python Scipy Cluster T
- How to get the required cluster using the Maxclust
- Python Scipy Cluster Inconsistent
- Python Scipy Fcluster Data


I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy and Scikit-Learn, working for clients in the United States, Canada, the United Kingdom, Australia, New Zealand and other countries.