Posts

Showing posts from November, 2021

HOPKINS STATISTICS IN CLUSTERING

 CLUSTER ASSESSING USING HOPKINS STATISTICS Clustering, in simple terms, means forming collections or groups. In data science, data can be grouped in the same way so that each group gets a solution suited to it. Take, for example, marketing in a fashion store. Decisions on clothing designs need to be taken for different groups such as kids, teens, women, and men. The decision cannot be the same for all groups: for better marketing, the strategy should differ for each group. In such scenarios, we take the help of clustering. Clustering is a data analysis tool that partitions data into groups of similar items. Clustering can have two aims: realistic or constructive. Realistic clustering aims to uncover the real groupings present in the data, whereas constructive clustering aims to partition the data whether or not any real grouping is inherent in it. Example - The above marketing example is a ...

Enemies of Data Scientists ! Part - 1

Hello peeps! Welcome to my new blog! Ever wondered if data scientists have enemies? Well, yes! They do (though the enemies are not people). In fact, when data is given to a data scientist or analyst, the first thing to do is look for these enemies. So what enemies am I talking about exactly? Keep reading. A given dataset may have two major problems (which I am calling enemies): 1. Missing Values 2. Outliers. One cannot move on to analyzing the data without treating these enemies. But what exactly are they? How do we treat them? I answer all these questions in this blog. Missing Values The meaning lies in the name itself. When data is missing for a particular variable, we call it a missing value. Generally, a missing value appears in the data as a blank space, NA, or null. How to find missing values in Python? Figure: Data with missing values In the above figure, it is a very small d...
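As a rough illustration of the "how to find missing values" question, here is a minimal sketch using only the Python standard library. The sample data, the `MISSING_MARKERS` set, and the helper names `is_missing` and `count_missing` are all made up for this example and are not taken from the post; the markers simply mirror the blank, NA, and null cases mentioned above.

```python
import csv
import io

# Hypothetical sample data: the blank cells, "NA" and "null" stand in
# for the kinds of missing values described in the post.
SAMPLE_CSV = """name,age,city
Asha,29,Pune
Ravi,,Delhi
Meera,NA,
John,41,null
"""

# Assumption: these strings (compared case-insensitively) mean "missing".
MISSING_MARKERS = {"", "na", "null"}


def is_missing(value):
    """Return True if a cell is absent or holds one of the missing markers."""
    return value is None or value.strip().lower() in MISSING_MARKERS


def count_missing(rows):
    """Count missing cells per column for a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if is_missing(value) else 0)
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing_counts = count_missing(rows)
print(missing_counts)
```

With pandas, the same per-column count is usually obtained with `DataFrame.isna().sum()`, provided the markers are parsed as missing when the data is loaded.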