Please refer to the Jupyter NBExtensions README page to display the table of contents in a floating window and to expand the body of the contents.

assignment2

Executive Summary

The BLE RSSI (Received Signal Strength Indicator) Dataset for Indoor Localization was chosen for this assignment. The goal of our project is to identify signal measurement patterns across all observations in the dataset, and to investigate whether clustering can group these patterns into an organized structure. We performed the following data preparation steps.

Data Preparation

1. Import Packages

Import all the necessary packages: numpy, pandas, matplotlib, seaborn, sklearn, math, etc.

In [1]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import cluster
from IPython.display import display, HTML
from sklearn.cluster import KMeans
import math
import warnings
from sklearn import metrics
warnings.filterwarnings("ignore")
pd.set_option('display.max_columns', None) 
pd.set_option('display.max_rows', None)

2. Read in Data File

Read in iBeacon_RSSI_Labeled.csv, which is located in the same directory as this Jupyter notebook. Retain all column names from the CSV file.

In [2]:
df_rssi = pd.read_csv("iBeacon_RSSI_Labeled.csv", parse_dates=['date'], index_col=False)

3. Check Dimensions and Data Types

Check the size and data types of the loaded dataset.

In [3]:
print("Shape of df_rssi:")
print(df_rssi.shape)
print("Datatypes of df_rssi:")
print(df_rssi.dtypes)
Shape of df_rssi:
(1420, 15)
Datatypes of df_rssi:
location            object
date        datetime64[ns]
b3001                int64
b3002                int64
b3003                int64
b3004                int64
b3005                int64
b3006                int64
b3007                int64
b3008                int64
b3009                int64
b3010                int64
b3011                int64
b3012                int64
b3013                int64
dtype: object

4. Check missing values

Check whether there are any missing values in the dataset.

In [4]:
df_rssi.isna().sum()
Out[4]:
location    0
date        0
b3001       0
b3002       0
b3003       0
b3004       0
b3005       0
b3006       0
b3007       0
b3008       0
b3009       0
b3010       0
b3011       0
b3012       0
b3013       0
dtype: int64

==> No missing values were found in the dataset.

5. Show Content

Display the first 5 rows of the dataset. In each observation, the location column records the grid cell (see iBeacon_layout.jpg) at which the RSSI readings from the iBeacon(s) in the associated column(s) were captured.

In [5]:
df_rssi.head()
Out[5]:
location date b3001 b3002 b3003 b3004 b3005 b3006 b3007 b3008 b3009 b3010 b3011 b3012 b3013
0 O02 2016-10-18 11:15:21 -200 -200 -200 -200 -200 -78 -200 -200 -200 -200 -200 -200 -200
1 P01 2016-10-18 11:15:19 -200 -200 -200 -200 -200 -78 -200 -200 -200 -200 -200 -200 -200
2 P01 2016-10-18 11:15:17 -200 -200 -200 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200
3 P01 2016-10-18 11:15:15 -200 -200 -200 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200
4 P01 2016-10-18 11:15:13 -200 -200 -200 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200

6. Set variables

n_readings

Store the number of rows in the dataset, i.e. the number of RSSI readings, as n_readings.

In [6]:
n_readings=df_rssi.shape[0]
n_readings
Out[6]:
1420

beacons_col and n_beacons

Store all columns with the prefix 'b3' as beacons_col, and the number of beacons as n_beacons.

In [7]:
beacons_col = df_rssi.columns[df_rssi.columns.str.find('b3', 0, 2)!=-1].tolist()
n_beacons=len(beacons_col)
n_beacons
Out[7]:
13

7. Apply test suite

Run a small test suite to check that every value in the location column lies within the defined location grid 'A01' to 'W18'.

In [8]:
grid_col1 = list(map(str, range(1, 10)))
grid_col1 =['0' + sub for sub in grid_col1]
grid_col2 = list(map(str, range(10, 19)))
grid_col = grid_col1 + grid_col2
In [9]:
grid_all = list()
for c in 'ABCDEFGHIJKLMNOPQRSTUVW':
    grid = [c + sub for sub in grid_col]
    grid_all.extend(grid)
In [10]:
checkAllTrue=True
for i in np.arange(len(df_rssi['location'])):
        check = (df_rssi['location'][i] in grid_all)
        if (check == False):
            print("We found values not in grid labels", i, "with value", df_rssi['location'][i])
            checkAllTrue=False
if(checkAllTrue == True):
    print("All locations are within grid labels")
All locations are within grid labels
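The loop above checks the locations one at a time. As a minimal sketch (the helper name `invalid_grid_labels` is hypothetical, and the grid bounds, rows 'A' to 'W' and columns '01' to '18', are taken from the test description above), the same check can be vectorised with `Series.isin`:

```python
import pandas as pd

def invalid_grid_labels(locations, rows="ABCDEFGHIJKLMNOPQRSTUVW", n_cols=18) -> pd.Series:
    """Return the entries of `locations` that are not valid grid labels (hypothetical helper)."""
    # Build every label from 'A01' to 'W18' once, as a set for fast membership tests
    valid = {f"{r}{c:02d}" for r in rows for c in range(1, n_cols + 1)}
    s = pd.Series(locations)
    # isin() tests all entries at once; ~ keeps only the entries that failed
    return s[~s.isin(valid)]
```

Calling `invalid_grid_labels(df_rssi['location'])` should return an empty Series when every label is within the grid; the index of any returned entry gives the offending row.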

8. Handle Out-of-range readings

For out-of-range readings, the RSSI is recorded as -200. (Refer to https://www.speedcheck.org/wiki/rssi/) RSSI is measured in decibels from 0 (zero) to -120 (minus 120). Define a function checkRssi() that reports every value in the beacon columns that is neither -200 nor between -120 and 0. Run checkRssi().

In [11]:
def checkRssi():
    checkAllTrue = True
    for col in beacons_col:
        for i in np.arange(len(df_rssi[col])):
            value = df_rssi[col][i]
            # valid if it is the out-of-range marker (-200) or lies within [-120, 0)
            check = (value == -200) | ((value >= -120) & (value < 0))
            if not check:
                print("We found values not in RSSI ranges at row", i, "for", col, " where values is:", value)
                checkAllTrue = False
    if checkAllTrue:
        print("We found all values in RSSI are valid")
In [12]:
checkRssi()
We found values not in RSSI ranges at row 203 for b3002  where values is: -198
We found values not in RSSI ranges at row 356 for b3002  where values is: -198
We found values not in RSSI ranges at row 357 for b3002  where values is: -198
We found values not in RSSI ranges at row 411 for b3002  where values is: -198
We found values not in RSSI ranges at row 423 for b3002  where values is: -198
We found values not in RSSI ranges at row 424 for b3002  where values is: -198
We found values not in RSSI ranges at row 513 for b3002  where values is: -198
We found values not in RSSI ranges at row 514 for b3002  where values is: -198
We found values not in RSSI ranges at row 515 for b3002  where values is: -198
We found values not in RSSI ranges at row 518 for b3002  where values is: -198
We found values not in RSSI ranges at row 519 for b3002  where values is: -198
We found values not in RSSI ranges at row 401 for b3012  where values is: -199
We found values not in RSSI ranges at row 402 for b3012  where values is: -199
We found values not in RSSI ranges at row 404 for b3012  where values is: -199
We found values not in RSSI ranges at row 405 for b3012  where values is: -199

All of these values are close to -200, so we impute them as -200.

In [13]:
for col in beacons_col:
    for i in np.arange(len(df_rssi[col])):
        value = df_rssi.loc[i, col]
        check = (value == -200) | ((value >= -120) & (value < 0))
        if not check:
            # write through .loc; chained assignment (df_rssi[col][i] = ...) may not update df_rssi
            df_rssi.loc[i, col] = -200

Run checkRssi() again

In [14]:
checkRssi()
We found all values in RSSI are valid
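The two cells above visit every cell individually. A vectorised alternative, sketched here with a hypothetical helper `impute_out_of_range` that uses the same thresholds (-200 as the out-of-range marker, valid readings in [-120, 0)), builds a boolean mask and lets `DataFrame.where` substitute -200 wherever the mask is False:

```python
import pandas as pd

def impute_out_of_range(df: pd.DataFrame, cols, missing=-200, lo=-120, hi=0) -> pd.DataFrame:
    """Replace any reading that is neither `missing` nor in [lo, hi) with `missing` (hypothetical helper)."""
    vals = df[cols]
    # True where a reading is acceptable as-is
    valid = (vals == missing) | ((vals >= lo) & (vals < hi))
    out = df.copy()  # leave the input frame untouched
    # where() keeps values where `valid` is True and substitutes `missing` elsewhere
    out[cols] = vals.where(valid, missing)
    return out
```

With this sketch, `df_rssi = impute_out_of_range(df_rssi, beacons_col)` would replace the -198 and -199 values found above in one step.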

Data Exploration

Explore each column in the dataset. Because the timestamp of each observation is not needed to identify the locations or beacon coverage, we only inspect the beacon and location columns.

1. Readings of each iBeacon

Check the summary statistics of the readings from each iBeacon

In [15]:
df_rssi.describe(include = np.number).round(2)
Out[15]:
b3001 b3002 b3003 b3004 b3005 b3006 b3007 b3008 b3009 b3010 b3011 b3012 b3013
count 1420.00 1420.00 1420.00 1420.00 1420.00 1420.00 1420.00 1420.00 1420.00 1420.00 1420.00 1420.00 1420.00
mean -197.83 -156.64 -175.53 -164.53 -178.38 -175.06 -195.64 -191.97 -197.15 -197.44 -197.75 -197.24 -196.07
std 16.26 60.23 49.45 56.52 47.18 49.60 22.88 30.73 19.16 17.74 16.85 18.54 22.05
min -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00
25% -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00
50% -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00
75% -200.00 -78.00 -200.00 -80.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00
max -67.00 -59.00 -56.00 -56.00 -60.00 -62.00 -58.00 -56.00 -55.00 -61.00 -59.00 -60.00 -59.00

Since -200 indicates that an iBeacon is out of range (i.e. gives no reading) for the location in an observation, we remove all -200 values from each iBeacon column to obtain the valid readings. After eliminating them, we inspect the summary statistics again and plot the range of valid readings for every iBeacon in the same graph.

In [16]:
frames = []
for col in beacons_col:
    # keep only the valid (non -200) readings of this beacon
    valid_reading_each_beacon = pd.DataFrame(df_rssi.loc[df_rssi[col] != -200, col].reset_index(drop=True))
    print(valid_reading_each_beacon.describe(include=np.number))
    print('=========================================================')
    valid_reading_each_beacon['beacon'] = col
    valid_reading_each_beacon.columns = ['Readings', 'Beacon']
    valid_reading_each_beacon = valid_reading_each_beacon.reindex(columns=['Beacon', 'Readings'])
    frames.append(valid_reading_each_beacon)
# DataFrame.append was removed in pandas 2.0; pd.concat combines the per-beacon frames instead
valid_reading_all_beacon = pd.concat(frames, ignore_index=True, sort=False)
           b3001
count  25.000000
mean  -76.480000
std     4.134408
min   -81.000000
25%   -80.000000
50%   -78.000000
75%   -74.000000
max   -67.000000
=========================================================
            b3002
count  486.000000
mean   -73.308642
std      5.844321
min    -87.000000
25%    -78.000000
50%    -74.000000
75%    -69.000000
max    -59.000000
=========================================================
            b3003
count  280.000000
mean   -75.917857
std      5.794297
min    -88.000000
25%    -80.000000
50%    -78.000000
75%    -74.000000
max    -56.000000
=========================================================
            b3004
count  402.000000
mean   -74.723881
std      5.136625
min    -88.000000
25%    -78.000000
50%    -76.000000
75%    -71.000000
max    -56.000000
=========================================================
            b3005
count  247.000000
mean   -75.696356
std      4.695711
min    -83.000000
25%    -79.000000
50%    -77.000000
75%    -73.000000
max    -60.000000
=========================================================
            b3006
count  287.000000
mean   -76.620209
std      4.019012
min    -87.000000
25%    -79.000000
50%    -77.000000
75%    -75.000000
max    -62.000000
=========================================================
           b3007
count  50.000000
mean  -76.100000
std     6.952462
min   -85.000000
25%   -80.750000
50%   -79.000000
75%   -73.000000
max   -58.000000
=========================================================
           b3008
count  91.000000
mean  -74.703297
std     6.013863
min   -83.000000
25%   -79.000000
50%   -77.000000
75%   -71.500000
max   -56.000000
=========================================================
           b3009
count  31.000000
mean  -69.225806
std     8.849519
min   -82.000000
25%   -77.000000
50%   -72.000000
75%   -59.500000
max   -55.000000
=========================================================
           b3010
count  29.000000
mean  -74.758621
std     6.168209
min   -81.000000
25%   -79.000000
50%   -78.000000
75%   -72.000000
max   -61.000000
=========================================================
           b3011
count  25.000000
mean  -72.120000
std     7.562627
min   -85.000000
25%   -79.000000
50%   -72.000000
75%   -67.000000
max   -59.000000
=========================================================
           b3012
count  31.000000
mean  -73.419355
std     8.106681
min   -82.000000
25%   -81.000000
50%   -77.000000
75%   -66.500000
max   -60.000000
=========================================================
           b3013
count  44.000000
mean  -73.022727
std     7.963547
min   -87.000000
25%   -79.500000
50%   -75.000000
75%   -65.750000
max   -59.000000
=========================================================
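The loop above filters and stacks each beacon column separately. A sketch of an equivalent reshaping, assuming the same -200 marker, uses `DataFrame.melt` to build the long `Beacon`/`Readings` frame in one step (the helper name `valid_readings_long` is hypothetical):

```python
import pandas as pd

def valid_readings_long(df: pd.DataFrame, beacon_cols, missing=-200) -> pd.DataFrame:
    """Reshape beacon columns to long format and drop out-of-range readings (hypothetical helper)."""
    # melt() stacks the beacon columns: one row per (observation, beacon) pair
    long = df.melt(value_vars=beacon_cols, var_name='Beacon', value_name='Readings')
    # discard the out-of-range marker so only valid readings remain
    return long[long['Readings'] != missing].reset_index(drop=True)
```

Under this sketch, `valid_readings_long(df_rssi, beacons_col).groupby('Beacon')['Readings'].describe()` should reproduce the per-beacon summaries printed above as a single table, and the same frame can feed the box plot below.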
In [17]:
plt.figure(figsize=(16,10))
g=sns.boxplot(x="Beacon", y="Readings", data=valid_reading_all_beacon)
g.set_title('BoxPlot of Readings for all Beacons', fontsize = 25)
plt.xticks(rotation=45)
plt.savefig('1. BeaconReadings.png', dpi=300, bbox_inches='tight')

2. Count of unique readings in each location

Let's look at the unique values in the location column from the dataset.

In [18]:
loc_df=np.unique(df_rssi['location'], return_counts = True)
loc_df=pd.DataFrame(loc_df).T
loc_df.columns=["location", "count"]
loc_df
Out[18]:
location count
0 D13 6
1 D14 4
2 D15 14
3 E15 4
4 F08 4
5 G15 4
6 I01 18
7 I02 21
8 I03 19
9 I04 18
10 I05 19
11 I06 27
12 I07 27
13 I08 26
14 I09 8
15 I10 14
16 I15 5
17 J01 16
18 J02 22
19 J03 24
20 J04 32
21 J05 19
22 J06 29
23 J07 27
24 J08 9
25 J10 6
26 J15 8
27 K01 6
28 K02 9
29 K03 23
30 K04 34
31 K05 25
32 K06 22
33 K07 11
34 K08 12
35 L01 6
36 L02 10
37 L03 13
38 L04 20
39 L05 14
40 L06 22
41 L08 3
42 L09 2
43 L15 10
44 M01 10
45 M02 14
46 M03 12
47 M04 19
48 M05 10
49 M06 20
50 N01 12
51 N02 12
52 N03 14
53 N04 10
54 N05 12
55 N06 14
56 N15 12
57 O01 2
58 O02 8
59 O03 13
60 O04 24
61 O05 24
62 O06 15
63 P01 13
64 P02 6
65 P03 12
66 P04 13
67 P05 12
68 P06 8
69 P15 7
70 Q01 6
71 Q02 4
72 Q03 18
73 Q04 18
74 Q05 24
75 Q06 4
76 R01 14
77 R02 18
78 R03 15
79 R04 10
80 R05 16
81 R06 6
82 R15 12
83 S01 23
84 S02 21
85 S03 17
86 S04 18
87 S05 20
88 S06 20
89 S07 10
90 S08 4
91 S15 3
92 T01 4
93 T03 6
94 T04 12
95 T05 10
96 T15 7
97 U01 8
98 U02 10
99 U03 14
100 U04 10
101 U05 8
102 U15 5
103 V15 8
104 W15 17
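Converting the `np.unique` result with `.T` can leave both columns with object dtype. A sketch using `Series.value_counts` instead keeps the counts as integers and, after `sort_index`, lists the locations alphabetically as `np.unique` does (the helper name `location_counts` is hypothetical):

```python
import pandas as pd

def location_counts(locations) -> pd.DataFrame:
    """Number of observations per location, sorted by location label (hypothetical helper)."""
    counts = pd.Series(locations).value_counts().sort_index()
    # name the index 'location' and turn the counts into a 'count' column
    return counts.rename_axis('location').reset_index(name='count')
```

For example, `location_counts(df_rssi['location'])` should give the same 105 rows as the table above.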

Plot the number of observations at each location (every observation contains at least one non -200 RSSI reading).

In [19]:
plt.figure(figsize=(25, 10))
plt.xticks(rotation='vertical')
sns.barplot(x='location', y='count', data=loc_df)
plt.title('Count of non "-200" RSSI readings at each location', fontsize = 25)
plt.savefig('2. non -200.png', dpi=300, bbox_inches='tight')

Store the number of unique locations as n_locations.

In [20]:
n_locations = len(loc_df)
n_locations
Out[20]:
105

With a basic understanding of the dataset, we set the goal of the project: to identify the reading coverage of each iBeacon. The checks above complete the validation of the loaded dataset for this goal, so we can now move on to pairwise comparison.

Pairwise Comparison

1. Location vs iBeacon

Explore the relationship between pairs of attributes. Let's see which iBeacons detect readings at each location in the dataset.

(a) Assign a location code to each unique location, and add a column to the master dataframe `df_rssi` that specifies the location code of each observation (reading).
In [21]:
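# assign codes 0 .. n_locations-1 in the (sorted) row order of loc_df;
# a single column assignment avoids chained assignment such as loc_df["location_code"][i] = ...
loc_df["location_code"] = np.arange(len(loc_df))
loc_dict = loc_df.set_index('location')['location_code'].to_dict()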
In [22]:
df_rssi['location_code']=df_rssi['location'].replace(loc_dict)
  (b) Construct two matrices, `bLoc` and `bLocCnt`, each with dimensions `n_locations` x `n_beacons`. The matrices represent the crosstab relationship between locations and beacons, so we can see the pattern of readings at each location for each beacon. For example, the first row of the dataset has only one non "-200" value, in beacon `b3006` (beacon index 5), and its location is O02 (location code 58), so we set both bLoc[58,5] and bLocCnt[58,5] to 1. If another row at O02 had non "-200" values in both `b3006` and `b3005` (beacon index 4), we would set both bLoc[58,5] and bLoc[58,4] to 1, increment bLocCnt[58,5] to 2 and set bLocCnt[58,4] to 1, and so forth. Thus, we read in each row of the dataset and, for every beacon whose reading is not -200:
        - set the cell in bLoc for the corresponding location and beacon to 1
        - increment the count in the cell of bLocCnt for the corresponding location and beacon
In [23]:
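bLoc = np.zeros((n_locations, n_beacons))
bLocCnt = np.zeros((n_locations, n_beacons))
for i in range(0, n_readings):
    loc_code = df_rssi['location_code'][i]
    for j in range(0, n_beacons):
        # columns 0 and 1 are location and date, so beacon j is at column position j+2;
        # iloc[i, j+2] is explicit positional access (iloc[i][j+2] relies on deprecated fallback)
        if df_rssi.iloc[i, j + 2] != -200:
            bLoc[loc_code, j] = 1        # beacon j has at least one reading at this location
            bLocCnt[loc_code, j] += 1    # number of readings from beacon j at this location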
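The nested loop above fills `bLocCnt` cell by cell. As a sketch under the same assumptions (-200 marks a missing reading, and locations are ordered alphabetically as in `loc_df`), the count matrix can also be obtained from a boolean mask and `groupby`; the helper `beacon_location_counts` is hypothetical:

```python
import pandas as pd

def beacon_location_counts(df: pd.DataFrame, beacon_cols, missing=-200) -> pd.DataFrame:
    """Count non-missing readings per (location, beacon); rows sorted by location (hypothetical helper)."""
    detected = df[beacon_cols].ne(missing)            # True where the beacon has a reading
    counts = detected.groupby(df['location']).sum()   # one row per location, summing the True values
    return counts.astype(int)
```

With `counts = beacon_location_counts(df_rssi, beacons_col)`, `counts.to_numpy()` should match `bLocCnt` and `(counts > 0).astype(int).to_numpy()` should match `bLoc`, since `groupby` sorts the location labels the same way `np.unique` does.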
(c) Summing bLocCnt gives the number of non -200 readings in the entire file.
In [24]:
sum(sum(bLocCnt))
Out[24]:
2028.0
(d) Make a list of beacon names from b3001 to b3013, as specified in the given data file.
In [25]:
beacon_colunm1 = list(map(str, range(1, 10)))
beacon_colunm1 = ['b300' + sub for sub in beacon_colunm1]
beacon_colunm2 = list(map(str, range(10, 14)))
beacon_colunm2 = ['b30' + sub for sub in beacon_colunm2]
beacon_colunm=beacon_colunm1+beacon_colunm2
(e) Plot a heatmap to illustrate the relationship between beacons and locations.
In [26]:
y_axis_labels = loc_df['location']
x_axis_labels = beacon_colunm
plt.figure(figsize=(16, 30))
sns.heatmap(bLocCnt, cmap="Blues", annot=True, fmt='.0f', yticklabels=y_axis_labels, xticklabels=x_axis_labels, linewidths=.5, cbar_kws={"shrink": 0.5})
plt.title("Locations with Presence of Readings from iBeacon", fontsize = 25)
plt.xlabel("Beacons", fontsize = 20)
plt.ylabel("Location", fontsize = 20)
plt.savefig('3.readingsLocBeacon.png', dpi=300, bbox_inches='tight')

The heatmap above shows the co-occurrence between beacons and locations. However, it gives no indication of how the observations group together. For example, an observation detected by the beacon combination b3002 and b3005 can come from location I03, I05 or I06, among others, whereas an observation detected by the combination b3002, b3004 and b3006 comes up only at I03 and K05, not at I05 or I06. We need a way to group the observations and relate each group to beacons and locations, so that we can use these groups as our target and test whether clustering can identify them automatically.

2. Location vs combinations of iBeacons

Let's group the observations and find the relationship of each group with beacons and locations.

  (a) Construct a matrix, `bReadings`, with dimensions `n_readings` x `n_beacons`. The matrix marks which beacons have a `non -200` value in each reading: `1` stands for a `non -200` value and `0` stands for a `-200` value.
In [27]:
bReadings = np.zeros((n_readings, n_beacons))
for i in range(0, n_readings):
    for j in range(0, n_beacons):
        # columns 0 and 1 are location and date, so beacon j is at column position j+2
        if df_rssi.iloc[i, j + 2] != -200:
            bReadings[i, j] = 1
In [28]:
bReadings[0:15,:]
Out[28]:
array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
In [29]:
bReadings.shape
Out[29]:
(1420, 13)
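Selecting the beacon columns by name avoids the hard-coded `j+2` column offset used in the loop. A minimal vectorised sketch (the helper `detection_matrix` is hypothetical):

```python
import numpy as np
import pandas as pd

def detection_matrix(df: pd.DataFrame, beacon_cols, missing=-200) -> np.ndarray:
    """1.0 where a beacon has a reading, 0.0 where it reports `missing` (hypothetical helper)."""
    # compare the whole beacon block at once, then cast the booleans to floats like np.zeros does
    return (df[beacon_cols].to_numpy() != missing).astype(float)
```

Here `detection_matrix(df_rssi, beacons_col)` should equal `bReadings`, including its float dtype.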
  (b) Make a new column `readings_group` in the dataframe `df_rssi`, and initialize its value to `-1` for all observations.
In [30]:
df_rssi['readings_group']=-1
df_rssi.head(5)
Out[30]:
location date b3001 b3002 b3003 b3004 b3005 b3006 b3007 b3008 b3009 b3010 b3011 b3012 b3013 location_code readings_group
0 O02 2016-10-18 11:15:21 -200 -200 -200 -200 -200 -78 -200 -200 -200 -200 -200 -200 -200 58 -1
1 P01 2016-10-18 11:15:19 -200 -200 -200 -200 -200 -78 -200 -200 -200 -200 -200 -200 -200 63 -1
2 P01 2016-10-18 11:15:17 -200 -200 -200 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200 63 -1
3 P01 2016-10-18 11:15:15 -200 -200 -200 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200 63 -1
4 P01 2016-10-18 11:15:13 -200 -200 -200 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200 63 -1
  (c) Use a for loop to read all the observations and check whether any previous row has the same beacon combination of non -200 readings; if so, assign the observation to the same readings group, otherwise start a new group. Reduce the NumPy array `bReadings` to its unique beacon combinations, in order of first appearance, and name it `bReadingsGroup`.
In [31]:
group_cnt= df_rssi.loc[0, "readings_group"]=0
arr=[bReadings[0]]

for i in range(1, n_readings):
    last_row = i-1
    for j in range(0, i):
        if np.array_equal(bReadings[i],bReadings[j]):
            df_rssi.loc[i, "readings_group"] = df_rssi.loc[j, "readings_group"]
            break
        if j == last_row:
            group_cnt+=1
            df_rssi.loc[i, "readings_group"] =group_cnt
            arr.append(bReadings[i])

bReadingsGroup=np.array(arr)
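The grouping loop above compares each row against every earlier row, which is quadratic in the number of readings. A sketch of a linear-time alternative keys a dictionary on the tuple of each row; because Python dictionaries preserve insertion order, group numbers follow the order of first appearance, as in the loop above. The helper `group_by_pattern` is hypothetical:

```python
import numpy as np

def group_by_pattern(binary: np.ndarray):
    """Assign each row a group id by first appearance of its pattern (hypothetical helper).

    Returns (groups, patterns): groups[i] is the id of row i, patterns[g] is the row pattern of group g.
    """
    seen = {}                                   # pattern tuple -> group id
    groups = np.empty(len(binary), dtype=int)
    for i, row in enumerate(binary):
        key = tuple(row)                        # arrays are unhashable; tuples can be dict keys
        if key not in seen:
            seen[key] = len(seen)               # a new pattern gets the next group number
        groups[i] = seen[key]
    patterns = np.array(list(seen.keys()))      # dict order = order of first appearance
    return groups, patterns
```

If the sketch is right, `groups, patterns = group_by_pattern(bReadings)` should give `groups` equal to `df_rssi['readings_group'].to_numpy()` and `patterns` equal to `bReadingsGroup`.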
  (d) Construct a matrix, `bReadingsGroupCnt`, with the same dimensions as `bReadingsGroup`, and use it to count how often each beacon combination occurs across all observations. Also make a list, `locationReadingList`, that records the locations in `df_rssi` at which each `bReadingsGroup` combination occurs.
In [32]:
bReadingsGroupCnt=np.zeros((len(bReadingsGroup), n_beacons))

locationReadingList = [list() for i in range(len(bReadingsGroup))]

for i in range(0, n_readings):
    for j in range(0, len(bReadingsGroup)):
        if np.array_equal(bReadings[i],bReadingsGroup[j]):
            bReadingsGroupCnt[j]=bReadingsGroupCnt[j]+bReadings[i]
            if df_rssi['location'][i] not in locationReadingList[j]:
                locationReadingList[j].append(df_rssi['location'][i])
    (e) Construct a list, `beaconReadingList`, that records the beacon combination for each `bReadingsGroup`.
In [33]:
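beaconReadingList = [list() for i in range(len(bReadingsGroup))]

for i in range(0, len(bReadingsGroup)):
    bArray = np.argwhere(bReadingsGroup[i] != 0).reshape(-1)
    for j in range(0, len(bArray)):
        # note: this naming scheme yields 'b30010'..'b30013' for beacons b3010..b3013,
        # which is why those names appear in Table 1; beacons_col[bArray[j]] gives the original column names
        beacon = 'b300' + str(bArray[j] + 1)
        beaconReadingList[i].append(beacon)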
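Indexing into the actual column list avoids the string concatenation, so beacons b3010 to b3013 keep their original names instead of the `b30010`-style names that appear in Table 1. A sketch (the helper `pattern_to_beacons` is hypothetical):

```python
import numpy as np

def pattern_to_beacons(pattern, beacon_cols):
    """Return the beacon column names whose entry in `pattern` is non-zero (hypothetical helper)."""
    # flatnonzero gives the positions of the non-zero entries of the 0/1 pattern
    return [beacon_cols[k] for k in np.flatnonzero(pattern)]
```

For example, `[pattern_to_beacons(p, beacons_col) for p in bReadingsGroup]` would rebuild `beaconReadingList` with the original column names.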
    (f) Show the length of `bReadingsGroup`, i.e. the number of distinct beacon combinations.
In [34]:
len(bReadingsGroup)
Out[34]:
63
    (g) Summing bReadingsGroupCnt also gives the number of non -200 readings in the entire file, which matches the total from bLocCnt.
In [35]:
sum(sum(bReadingsGroupCnt))
Out[35]:
2028.0
    (h) Define a function listToString that joins a list of strings into one space-separated string.
In [36]:
# Join a list of strings into a single space-separated string
def listToString(s):
    return " ".join(s)
    (i) Construct a dataframe `readingList_df` that shows each readings group, its beacon combination and the locations where it occurs, side by side.
In [37]:
count = 0
readingList_df = pd.DataFrame(columns=["readings_group", "beacon_list", "location_list"])
for i in range(0, len(bReadingsGroup)):
    beaconStr=listToString(beaconReadingList[i])
    locationStr=listToString(locationReadingList[i])
    readingList_df.loc[len(readingList_df)] = [count, beaconStr, locationStr]
    count+=1

pd.options.display.max_colwidth = None 
display(HTML('<b>Table 1: Readings groups by combination of beacon readings and occurrences of locations</b>'))
readingList_df
Table 1: Readings groups by combination of beacon readings and occurrences of locations
Out[37]:
readings_group beacon_list location_list
0 0 b3006 O02 P01 L05 M05 N05 P05 Q05 Q06 P06 O06 N06 M06 L06 K06 J01 K01 L02 M02 N01 N02 O01 I08 K08 K07
1 1 b3003 P01 P02 K03 P03 Q04 P04 O04 M04 L04 L01 M01 M02 N01 Q02
2 2 b3003 b3006 P01 K03 M03 N03 O03 L01 M01 M02 N01 N02 L04 P04
3 3 b3004 R01 R02 S01 S02 T01 U01 U02 O03 P03 Q03 R03 S03 T03 U03 U04 T04 S04 R04 O05 P05 Q05 R05 S05 T05 U05 S06 S07 S08 L08 Q02 Q01
4 4 b3002 U02 J03 K03 L03 N04 M04 L04 K04 J04 I04 I05 J05 K05 L05 K06 J06 J02 I06 I01 I02 I03 J01 K02 J08 L02 P02
5 5 b3003 b3004 U01 O03 P03 Q03 S04 R04 Q04 S03 Q01
6 6 b3002 b3003 J03 K03 L03 Q04 P04 O04 N04 M04 L04 K04 J04 J02 L01 K05 I02 I04 I03
7 7 b3002 b3003 b3005 L03
8 8 b3003 b3005 b3006 M03 N03 O02
9 9 b3003 b3004 b3006 N03 L04
10 10 b3002 b3004 Q04 L05 Q05
11 11 b3003 b3007 Q04
12 12 b3002 b3003 b3007 Q04
13 13 b3002 b3003 b3006 O04 N04 K04 N01 L05 P04
14 14 b3002 b3006 L04 K05 I04 I02 J01 L02
15 15 b3001 I04 F08
16 16 b3002 b3008 J05 J06 I09 I07
17 17 b3002 b3005 J05 M05 K06 J06 I06 J04 I07 I03 I04 I05 I08 J07 L06
18 18 b3002 b3004 b3006 K05 I03
19 19 b3005 M05 M06 L06 K06 J07 I07 J08 I02 O02 I03 I08 J06 K05 K07 K08 L08 L05 N06
20 20 b3004 b3006 M05 O05 P05 P06 O06
21 21 b3004 b3005 N05
22 22 b3004 b3007 R05 S05 S06 R06 S07 S08 S03 R03
23 23 b3007 R05 T05 S06 R06 S07 S08
24 24 b3005 b3006 M06 L06 K06 J07 K07 K08
25 25 b3002 b3005 b3006 M06 K06
26 26 b3002 b3005 b3008 I06 I07 I08 J08 J06 J07
27 27 b3005 b3008 J07 I07 I10 J10 J08 I09 L09 K08
28 28 b3008 b30010 I10
29 29 b3008 I10 J10 I08 J08
30 30 b3009 D15 E15 D14 D13
31 31 b30010 G15 I15 J15
32 32 b30010 b30011 J15 N15
33 33 b30011 L15 N15
34 34 b30011 b30012 L15 N15
35 35 b30010 b30012 b30013 L15
36 36 b3008 b30012 R15
37 37 b30012 R15 P15
38 38 b30012 b30013 R15 T15
39 39 b30013 T15 W15 S15 U15 V15
40 40 b3005 b3007 I07
41 41 b3001 b3008 I08 L09
42 42 b3001 b3005 b3008 I08 I07 I09
43 43 b3001 b3002 I07
44 44 b3001 b3005 b3006 b3008 I08
45 45 b3005 b3006 b3008 I09 K07
46 46 b3006 b3008 b30010 I10
47 47 b3003 b3005 J07
48 48 b3009 b30010 I15 J15
49 49 b30011 b30013 U15
50 50 b3006 b3007 S07
51 51 b3004 b3006 b3007 S07
52 52 b3006 b3008 L08
53 53 b3002 b3004 b3005 b3006 L06
54 54 b3003 b3004 b3005 b3006 L06
55 55 b3002 b3003 b3005 b3006 L05
56 56 b3002 b3003 b3004 L04
57 57 b3004 b3005 b3006 L03
58 58 b3002 b3003 b3004 b3006 K04
59 59 b3004 b3005 b3008 I08
60 60 b3002 b3003 b3005 b3008 I05
61 61 b3002 b3008 b30010 J10
62 62 b3002 b3003 b3004 b3007 Q04
In [38]:
readingList_df.to_csv('a. readingList_df.csv', index=False)
(j) Plot a heatmap to illustrate the relationship between bReadingsGroupCnt and the beacon combinations.
In [39]:
x_axis_labels = beacon_colunm
y_axis_labels = readingList_df['beacon_list']
plt.figure(figsize=(16, 16))
sns.heatmap(bReadingsGroupCnt, cmap="Greens", annot=True, fmt='.0f', yticklabels=y_axis_labels, xticklabels=x_axis_labels, linewidths=.5, cbar_kws={"shrink": 0.5})
plt.title("Readings distributions with Beacons", fontsize = 25)
plt.xlabel("Beacons", fontsize = 20)
plt.ylabel("Beacons Group", fontsize = 20)
plt.savefig('4.readingGroupBeacon.png', dpi=300, bbox_inches='tight')
    (k) Construct a new dataframe `df_rssi_readingList` that counts the observations for each pair of location and readings_group.
In [40]:
df_rssi_readingList=df_rssi.groupby(['location', 'readings_group']).size().reset_index()
df_rssi_readingList.columns=['location', 'readings_group', 'count']
df_rssi_readingList
Out[40]:
location readings_group count
0 D13 30 6
1 D14 30 4
2 D15 30 14
3 E15 30 4
4 F08 15 4
5 G15 31 4
6 I01 4 18
7 I02 4 16
8 I02 6 1
9 I02 14 2
10 I02 19 2
11 I03 4 7
12 I03 6 5
13 I03 17 3
14 I03 18 2
15 I03 19 2
16 I04 4 11
17 I04 6 1
18 I04 14 2
19 I04 15 2
20 I04 17 2
21 I05 4 14
22 I05 17 4
23 I05 60 1
24 I06 4 3
25 I06 17 16
26 I06 26 8
27 I07 16 1
28 I07 17 4
29 I07 19 8
30 I07 26 6
31 I07 27 2
32 I07 40 2
33 I07 42 2
34 I07 43 2
35 I08 0 2
36 I08 17 4
37 I08 19 2
38 I08 26 3
39 I08 29 2
40 I08 41 4
41 I08 42 6
42 I08 44 2
43 I08 59 1
44 I09 16 2
45 I09 27 2
46 I09 42 2
47 I09 45 2
48 I10 27 2
49 I10 28 4
50 I10 29 6
51 I10 46 2
52 I15 31 3
53 I15 48 2
54 J01 0 2
55 J01 4 12
56 J01 14 2
57 J02 4 13
58 J02 6 9
59 J03 4 18
60 J03 6 6
61 J04 4 24
62 J04 6 6
63 J04 17 2
64 J05 4 10
65 J05 16 2
66 J05 17 7
67 J06 4 7
68 J06 16 5
69 J06 17 13
70 J06 19 2
71 J06 26 2
72 J07 17 4
73 J07 19 16
74 J07 24 2
75 J07 26 1
76 J07 27 2
77 J07 47 2
78 J08 4 2
79 J08 19 2
80 J08 26 2
81 J08 27 2
82 J08 29 1
83 J10 27 2
84 J10 29 3
85 J10 61 1
86 J15 31 3
87 J15 32 4
88 J15 48 1
89 K01 0 6
90 K02 4 9
91 K03 1 6
92 K03 2 2
93 K03 4 11
94 K03 6 4
95 K04 4 16
96 K04 6 14
97 K04 13 3
98 K04 58 1
99 K05 4 16
100 K05 6 2
101 K05 14 3
102 K05 18 2
103 K05 19 2
104 K06 0 2
105 K06 4 3
106 K06 17 6
107 K06 19 2
108 K06 24 6
109 K06 25 3
110 K07 0 1
111 K07 19 2
112 K07 24 6
113 K07 45 2
114 K08 0 2
115 K08 19 4
116 K08 24 5
117 K08 27 1
118 L01 1 2
119 L01 2 2
120 L01 6 2
121 L02 0 8
122 L02 4 1
123 L02 14 1
124 L03 4 2
125 L03 6 8
126 L03 7 2
127 L03 57 1
128 L04 1 3
129 L04 2 1
130 L04 4 8
131 L04 6 4
132 L04 9 1
133 L04 14 2
134 L04 56 1
135 L05 0 4
136 L05 4 4
137 L05 10 2
138 L05 13 1
139 L05 19 2
140 L05 55 1
141 L06 0 2
142 L06 17 1
143 L06 19 13
144 L06 24 4
145 L06 53 1
146 L06 54 1
147 L08 3 1
148 L08 19 1
149 L08 52 1
150 L09 27 1
151 L09 41 1
152 L15 33 2
153 L15 34 6
154 L15 35 2
155 M01 1 8
156 M01 2 2
157 M02 0 6
158 M02 1 4
159 M02 2 4
160 M03 2 10
161 M03 8 2
162 M04 1 6
163 M04 4 8
164 M04 6 5
165 M05 0 4
166 M05 17 2
167 M05 19 2
168 M05 20 2
169 M06 0 6
170 M06 19 4
171 M06 24 8
172 M06 25 2
173 N01 0 4
174 N01 1 4
175 N01 2 2
176 N01 13 2
177 N02 0 2
178 N02 2 10
179 N03 2 8
180 N03 8 4
181 N03 9 2
182 N04 4 2
183 N04 6 6
184 N04 13 2
185 N05 0 10
186 N05 21 2
187 N06 0 12
188 N06 19 2
189 N15 32 3
190 N15 33 7
191 N15 34 2
192 O01 0 2
193 O02 0 5
194 O02 8 1
195 O02 19 2
196 O03 2 8
197 O03 3 2
198 O03 5 3
199 O04 1 6
200 O04 6 8
201 O04 13 10
202 O05 3 14
203 O05 20 10
204 O06 0 14
205 O06 20 1
206 P01 0 8
207 P01 1 3
208 P01 2 2
209 P02 1 5
210 P02 4 1
211 P03 1 2
212 P03 3 4
213 P03 5 6
214 P04 1 4
215 P04 2 2
216 P04 6 6
217 P04 13 1
218 P05 0 2
219 P05 3 4
220 P05 20 6
221 P06 0 4
222 P06 20 4
223 P15 37 7
224 Q01 3 2
225 Q01 5 4
226 Q02 1 1
227 Q02 3 3
228 Q03 3 14
229 Q03 5 4
230 Q04 1 4
231 Q04 5 5
232 Q04 6 2
233 Q04 10 2
234 Q04 11 2
235 Q04 12 2
236 Q04 62 1
237 Q05 0 2
238 Q05 3 20
239 Q05 10 2
240 Q06 0 4
241 R01 3 14
242 R02 3 18
243 R03 3 13
244 R03 22 2
245 R04 3 4
246 R04 5 6
247 R05 3 10
248 R05 22 4
249 R05 23 2
250 R06 22 2
251 R06 23 4
252 R15 36 2
253 R15 37 8
254 R15 38 2
255 S01 3 23
256 S02 3 21
257 S03 3 14
258 S03 5 2
259 S03 22 1
260 S04 3 13
261 S04 5 5
262 S05 3 18
263 S05 22 2
264 S06 3 8
265 S06 22 6
266 S06 23 6
267 S07 3 1
268 S07 22 2
269 S07 23 5
270 S07 50 1
271 S07 51 1
272 S08 3 1
273 S08 22 2
274 S08 23 1
275 S15 39 3
276 T01 3 4
277 T03 3 6
278 T04 3 12
279 T05 3 8
280 T05 23 2
281 T15 38 2
282 T15 39 5
283 U01 3 6
284 U01 5 2
285 U02 3 8
286 U02 4 2
287 U03 3 14
288 U04 3 10
289 U05 3 8
290 U15 39 4
291 U15 49 1
292 V15 39 8
293 W15 39 17
    (l) Map the beacon combination list to df_rssi_readingList.
In [41]:
beaconList_dict=readingList_df.set_index('readings_group')['beacon_list'].to_dict()
df_rssi_readingList['beacon_list']=df_rssi_readingList['readings_group'].replace(beaconList_dict)
df_rssi_readingList = df_rssi_readingList.reindex(columns=['location', 'readings_group', 'beacon_list', 'count']).reset_index(drop=True)
df_rssi_readingList.head(15)
Out[41]:
location readings_group beacon_list count
0 D13 30 b3009 6
1 D14 30 b3009 4
2 D15 30 b3009 14
3 E15 30 b3009 4
4 F08 15 b3001 4
5 G15 31 b30010 4
6 I01 4 b3002 18
7 I02 4 b3002 16
8 I02 6 b3002 b3003 1
9 I02 14 b3002 b3006 2
10 I02 19 b3005 2
11 I03 4 b3002 7
12 I03 6 b3002 b3003 5
13 I03 17 b3002 b3005 3
14 I03 18 b3002 b3004 b3006 2
In [42]:
df_rssi_readingList.to_csv('b. df_rssi_readingList.csv', index=False)
    (m) The first row of Table 1 (readings group 0) can be explained by filtering `df_rssi_readingList` on readings_group == 0 and plotting an associated bar chart.
In [43]:
df_rssi_readingList[df_rssi_readingList['readings_group']==0]
Out[43]:
location readings_group beacon_list count
35 I08 0 b3006 2
54 J01 0 b3006 2
89 K01 0 b3006 6
104 K06 0 b3006 2
110 K07 0 b3006 1
114 K08 0 b3006 2
121 L02 0 b3006 8
135 L05 0 b3006 4
141 L06 0 b3006 2
157 M02 0 b3006 6
165 M05 0 b3006 4
169 M06 0 b3006 6
173 N01 0 b3006 4
177 N02 0 b3006 2
185 N05 0 b3006 10
187 N06 0 b3006 12
192 O01 0 b3006 2
193 O02 0 b3006 5
204 O06 0 b3006 14
206 P01 0 b3006 8
218 P05 0 b3006 2
221 P06 0 b3006 4
237 Q05 0 b3006 2
240 Q06 0 b3006 4
In [44]:
plt.figure(figsize=(10, 7))
sns.barplot(x='location', y='count', hue="beacon_list", data=df_rssi_readingList[df_rssi_readingList['readings_group']==0])
plt.title("Location where b3006 can detect signal", fontsize = 20)
plt.savefig('5.b300Loc.png', dpi=300, bbox_inches='tight')
    (n) A location in the bar chart, for example K08, can be explained by filtering `df_rssi_readingList` on location == 'K08' and plotting the result as a bar chart.
In [45]:
df_rssi_readingList[df_rssi_readingList['location']=='K08']
Out[45]:
location readings_group beacon_list count
114 K08 0 b3006 2
115 K08 19 b3005 4
116 K08 24 b3005 b3006 5
117 K08 27 b3005 b3008 1
In [46]:
plt.title("Beacon combinations which detect \nsignal at location K08", fontsize = 18)
g=sns.barplot(x='location', y='count', hue="beacon_list", data=df_rssi_readingList[df_rssi_readingList['location']=='K08'])
g.legend(loc='center left', bbox_to_anchor=(1, 0.5), ncol=1)
plt.savefig('6. kn08.png', dpi=300, bbox_inches='tight')

Data Modeling

Read all the RSSI readings into a NumPy array, `array_rssi`.

In [47]:
array_rssi=df_rssi[beacons_col].values
array_rssi.shape
Out[47]:
(1420, 13)

1. DBSCAN

(a) Construct the k-distance graph to determine the eps parameter for DBSCAN.
In [48]:
from sklearn.neighbors import NearestNeighbors

# Query the training data itself: column 0 of `distances` holds each point's
# zero distance to itself, so column k is the distance to its k-th other neighbour
nbrs=NearestNeighbors().fit(array_rssi)
distances, indices=nbrs.kneighbors(array_rssi, 20)

# Sort each k-distance column in descending order (np.sort returns a copy,
# and [::-1] keeps every observation, including the smallest distance)
kDis=np.sort(distances[:,10])[::-1]
kDis2=np.sort(distances[:,5])[::-1]
kDis3=np.sort(distances[:,1])[::-1]

plt.xlabel('observations')
plt.ylabel('distance')
plt.plot(range(0, len(kDis)), kDis, color='blue', label="10th neighbours")
plt.plot(range(0, len(kDis2)), kDis2, color='orange', label="5th neighbours")
plt.plot(range(0, len(kDis3)), kDis3, color='green', label="1st neighbours")
# "upper right" is a valid matplotlib legend location; "top right" is not
plt.legend(loc="upper right")
plt.title("Determine Eps - k Distance Graph \nfor DBScan Clustering", fontsize = 18)
plt.savefig('7. EPS.png', dpi=300, bbox_inches='tight')

The k-distance graph helps us determine a suitable eps for DBSCAN.
Besides eps, DBSCAN has a second parameter, min_samples, which defaults to 5 (in this assignment we tune eps only); it corresponds roughly to the 5th-neighbour curve in the k-distance graph. (Strictly, scikit-learn counts the point itself towards min_samples, so the exactly matching curve is the 4th-neighbour distance.) The orange line shows an elbow at around 13 for the 5th neighbours, so we fit our original data (an n_readings x n_beacons matrix of RSSI measurements) with DBSCAN using eps=13 and the default min_samples=5.
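A minimal sketch of a k-distance curve tied directly to `min_samples`, assuming scikit-learn's convention that `min_samples` counts the point itself. The helper name `k_distance_curve` is hypothetical; calling `k_distance_curve(array_rssi, min_samples=5)` would give the curve whose elbow suggests eps for the default setting.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def k_distance_curve(data, min_samples=5):
    """Hypothetical helper: sorted k-distance curve matching DBSCAN's min_samples.

    Querying the training data returns each point itself at distance 0, so asking
    for `min_samples` neighbours makes the last column the distance to the
    (min_samples - 1)-th other point. A point is a DBSCAN core point when that
    distance is <= eps, because scikit-learn counts the point itself.
    """
    nbrs = NearestNeighbors(n_neighbors=min_samples).fit(data)
    distances, _ = nbrs.kneighbors(data)
    # Largest distances first, as in the k-distance graph
    return np.sort(distances[:, -1])[::-1]


# Toy 1-D example: the nearest other neighbours are 1, 1, 2 and 3 away
toy = np.array([[0.0], [1.0], [3.0], [6.0]])
print(k_distance_curve(toy, min_samples=2))
```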

   (b) Perform DBSCAN by fitting `array_rssi` with the eps chosen from the k-distance graph.
In [49]:
dbs_13 = cluster.DBSCAN(eps=13)
dbs_fit_13=dbs_13.fit(array_rssi)
  (c) Retrieve the labels from the fitted model and check the number of clusters.
In [50]:
labels_dbs_13=dbs_fit_13.labels_
In [51]:
np.unique(labels_dbs_13, return_counts = True)
Out[51]:
(array([-1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
       dtype=int64),
 array([ 65, 114,  58,  53, 298, 238,  37,  89,   7,   6,  19,  12,   6,
         68,  70,  23,  21,  20,  31,   5,  10,  22,  14,  12,  28,  10,
          9,   8,  15,  37,   5,  10], dtype=int64))
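`np.unique` counts the noise label -1 as one of the 32 labels above. A small sketch that separates the two (the helper `summarise_dbscan_labels` is hypothetical): applied to `labels_dbs_13`, it should report 31 clusters and 65 noise points, in line with Out[51].

```python
import numpy as np


def summarise_dbscan_labels(labels):
    """Return (number of clusters, number of noise points) for DBSCAN labels.

    DBSCAN marks noise with -1, so that label is excluded from the cluster count.
    """
    labels = np.asarray(labels)
    n_noise = int(np.sum(labels == -1))
    n_clusters = len(np.unique(labels[labels != -1]))
    return n_clusters, n_noise


# Toy labels: two clusters (0 and 1) and two noise points
print(summarise_dbscan_labels([-1, 0, 0, 1, 1, 1, -1]))  # → (2, 2)
```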
(d) Construct a dataframe `X` that compares the cluster labels with the target (the readings_group in `df_rssi`, combined with its beacon list for readability).
In [52]:
df_rssi['beacon_list']=df_rssi['readings_group'].replace(beaconList_dict)
In [53]:
X=pd.DataFrame(array_rssi)
X['cluster']=labels_dbs_13
X['target']= df_rssi['readings_group'].astype(str).str.zfill(2) + '_' + df_rssi['beacon_list']
X.head(10)
Out[53]:
0 1 2 3 4 5 6 7 8 9 10 11 12 cluster target
0 -200 -200 -200 -200 -200 -78 -200 -200 -200 -200 -200 -200 -200 0 00_b3006
1 -200 -200 -200 -200 -200 -78 -200 -200 -200 -200 -200 -200 -200 0 00_b3006
2 -200 -200 -200 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200 0 00_b3006
3 -200 -200 -200 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200 0 00_b3006
4 -200 -200 -200 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200 0 00_b3006
5 -200 -200 -82 -200 -200 -200 -200 -200 -200 -200 -200 -200 -200 1 01_b3003
6 -200 -200 -80 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200 2 02_b3003 b3006
7 -200 -200 -86 -200 -200 -200 -200 -200 -200 -200 -200 -200 -200 1 01_b3003
8 -200 -200 -200 -75 -200 -200 -200 -200 -200 -200 -200 -200 -200 3 03_b3004
9 -200 -200 -200 -75 -200 -200 -200 -200 -200 -200 -200 -200 -200 3 03_b3004
  (e) Use the crosstab function to construct a confusion matrix.
In [54]:
cm_dbs13=pd.crosstab(index=X["target"], columns=X["cluster"])
cm_dbs13
Out[54]:
cluster -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
target
00_b3006 0 114 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01_b3003 0 0 58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02_b3003 b3006 0 0 0 53 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03_b3004 0 0 0 0 298 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
04_b3002 0 0 0 0 0 238 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
05_b3003 b3004 0 0 0 0 0 0 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06_b3002 b3003 0 0 0 0 0 0 0 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
07_b3002 b3003 b3005 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
08_b3003 b3005 b3006 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
09_b3003 b3004 b3006 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10_b3002 b3004 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11_b3003 b3007 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12_b3002 b3003 b3007 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13_b3002 b3003 b3006 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14_b3002 b3006 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15_b3001 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16_b3002 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0
17_b3002 b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 68 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18_b3002 b3004 b3006 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19_b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20_b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21_b3004 b3005 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22_b3004 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23_b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24_b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 31 0 0 0 0 0 0 0 0 0 0 0 0 0
25_b3002 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0
26_b3002 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 0 0 0 0 0 0 0 0 0 0
27_b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0
28_b3008 b30010 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29_b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0
30_b3009 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28 0 0 0 0 0 0 0
31_b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0
32_b30010 b30011 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
33_b30011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0
34_b30011 b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0
35_b30010 b30012 b30013 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
36_b3008 b30012 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
37_b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0
38_b30012 b30013 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
39_b30013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 37 0 0
40_b3005 b3007 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
41_b3001 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0
42_b3001 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10
43_b3001 b3002 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
44_b3001 b3005 b3006 b3008 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
45_b3005 b3006 b3008 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
46_b3006 b3008 b30010 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47_b3003 b3005 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
48_b3009 b30010 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
49_b30011 b30013 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
50_b3006 b3007 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
51_b3004 b3006 b3007 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
52_b3006 b3008 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
53_b3002 b3004 b3005 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
54_b3003 b3004 b3005 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
55_b3002 b3003 b3005 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
56_b3002 b3003 b3004 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
57_b3004 b3005 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
58_b3002 b3003 b3004 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
59_b3004 b3005 b3008 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
60_b3002 b3003 b3005 b3008 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
61_b3002 b3008 b30010 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
62_b3002 b3003 b3004 b3007 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
In [55]:
cm_dbs13.to_csv('c1. cm_dbs13.csv', index=True)

The confusion matrix above shows that DBSCAN assigns target group 00_b3006 to cluster 0, 01_b3003 to cluster 1, and so on. However, several target groups, for example 07_b3002 b3003 b3005 and 09_b3003 b3004 b3006, are assigned entirely to cluster -1, the label DBSCAN uses for noise (outlier) points.
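The noise-only and split target groups can also be read off the crosstab programmatically. A minimal sketch, assuming a target-by-cluster crosstab like `cm_dbs13` in which DBSCAN's noise label -1 is a column; the helper `noise_and_split_groups` is hypothetical and is shown here on toy data.

```python
import pandas as pd


def noise_and_split_groups(cm):
    """Classify target groups (rows of a target-by-cluster crosstab).

    Returns (groups labelled entirely as noise, groups spread over 2+ labels).
    Column -1 is DBSCAN's noise label; it counts as a label when checking splits.
    """
    occupied = (cm > 0).sum(axis=1)  # number of labels each group appears in
    if -1 in cm.columns:
        noise_mask = (cm[-1] > 0) & (occupied == 1)
        all_noise = noise_mask[noise_mask].index.tolist()
    else:
        all_noise = []
    split = occupied[occupied > 1].index.tolist()
    return all_noise, split


# Toy example: 'a' is split over clusters 0 and 1, 'b' is all noise,
# 'c' sits in one cluster, 'd' is split between noise and cluster 3
target = pd.Series(["a", "a", "a", "b", "b", "c", "d", "d"], name="target")
labels = pd.Series([0, 0, 1, -1, -1, 2, -1, 3], name="cluster")
cm = pd.crosstab(index=target, columns=labels)
print(noise_and_split_groups(cm))  # → (['b'], ['a', 'd'])
```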

  (f) Run some of the clustering metrics from sklearn to compare the target and the fitted model
In [56]:
label_target=df_rssi['readings_group'].values

# External metrics compare the DBSCAN labels with the target groups; silhouette,
# Calinski-Harabasz and Davies-Bouldin use only the data and the labels.
# All of them treat the noise label -1 as one ordinary cluster.
print("adjusted_mutual_info :", metrics.adjusted_mutual_info_score(label_target, labels_dbs_13)) 
print("adjusted_rand_score :", metrics.adjusted_rand_score(label_target, labels_dbs_13)) 
print("normalized_mutual_info :", metrics.normalized_mutual_info_score(label_target, labels_dbs_13)) 
print("fowlkes_mallows_score :", metrics.fowlkes_mallows_score(label_target, labels_dbs_13)) 
print("silhouette_score :", metrics.silhouette_score(array_rssi, labels_dbs_13)) 
print("calinski_harabasz_score :", metrics.calinski_harabasz_score(array_rssi, labels_dbs_13)) 
print("davies_bouldin_score :", metrics.davies_bouldin_score(array_rssi, labels_dbs_13)) 
print("homogeneity_completeness_v_measure :", metrics.homogeneity_completeness_v_measure(label_target, labels_dbs_13))
adjusted_mutual_info : 0.9702379130248487
adjusted_rand_score : 0.9883213121708294
normalized_mutual_info : 0.9738881841539032
fowlkes_mallows_score : 0.9894827726622666
silhouette_score : 0.8799918151767319
calinski_harabasz_score : 513.4973243462551
davies_bouldin_score : 1.1831817624067935
homogeneity_completeness_v_measure : (0.9491053207986584, 0.9999999999999999, 0.9738881841539034)

There are many evaluation metrics for clustering, each with its own advantages and drawbacks. For a more rounded evaluation, we focus on the following four metrics:

- Adjusted mutual information score: measures the agreement between the two label assignments, ignoring permutations of the label values. It is adjusted for chance and symmetric: swapping the two arguments does not change the score.
- homogeneity: equals 1 when each cluster contains only members of a single class.
- completeness: equals 1 when all members of a given class are assigned to the same cluster.
- v_measure: the harmonic mean of homogeneity and completeness.
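A toy example of how a single noise label affects these scores, using scikit-learn's `homogeneity_completeness_v_measure`. The labels are invented for illustration. Lumping two small classes into one label -1, as DBSCAN does with small target groups, keeps completeness at 1 (every class stays within one label) but pulls homogeneity below 1, the same pattern as the eps = 13 scores above (completeness ≈ 1.0, homogeneity ≈ 0.949).

```python
from sklearn import metrics

# Toy target with three classes of sizes 3, 1 and 1
target = [0, 0, 0, 1, 2]
# Clustering that keeps class 0 together and lumps both small classes into
# one "noise" label -1, similar to what DBSCAN does with small target groups
pred = [0, 0, 0, -1, -1]

h, c, v = metrics.homogeneity_completeness_v_measure(target, pred)
print(f"homogeneity={h:.3f} completeness={c:.3f} v_measure={v:.3f}")
```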
 (g) We use a simple grid search: a for loop fits the model repeatedly with integer eps values from 2 to 49 (`range(2, 50)`), records the homogeneity, completeness, V-measure and AMI score of each model in the dataframe `dbs_metric`, and records the number of distinct labels of each model in `dbs_cluster_num`.
In [57]:
# Grid search over eps: fit DBSCAN for each integer eps in [2, 49] and score it
# against the target readings_group
metric_rows = []
cluster_rows = []
for eps_num in range(2, 50):
    labels_dbs = cluster.DBSCAN(eps=eps_num).fit(array_rssi).labels_
    a = metrics.adjusted_mutual_info_score(label_target, labels_dbs)
    h, c, v = metrics.homogeneity_completeness_v_measure(label_target, labels_dbs)
    metric_rows += [(eps_num, "homogeneity", h), (eps_num, "completeness", c),
                    (eps_num, "v_measure", v), (eps_num, "ami_score", a)]
    # Number of distinct labels; this includes the noise label -1 when present
    cluster_rows.append((eps_num, len(np.unique(labels_dbs))))
# Build the dataframes once at the end (numeric dtypes, no row-by-row appends)
dbs_metric = pd.DataFrame(metric_rows, columns=["eps", "metric", "score"])
dbs_cluster_num = pd.DataFrame(cluster_rows, columns=["eps", "num_cluster"])
  (h) Visualize the `dbs_metric`.
In [58]:
plt.figure(figsize=(10, 7))
sns.lineplot(x='eps', y='score', hue="metric", data=dbs_metric)
plt.title("Performance analysis for DBScan \nClustering with different eps", fontsize = 18)
plt.savefig('8. DBScan perform.png', dpi=300, bbox_inches='tight')
  (i) Visualize the relationship between eps and the number of clusters in this dataset.
In [59]:
plt.plot(dbs_cluster_num['eps'], dbs_cluster_num['num_cluster'])
plt.title("Relationship between eps and num_cluster in DBScan")
plt.xlabel("EPS", fontsize = 15)
plt.ylabel("Number of Cluster", fontsize = 15)
plt.savefig('8a. DBScan cluster.png', dpi=300, bbox_inches='tight')
  (j) Find the best-performing eps, then compare the confusion matrices for a poorly performing eps value (eps=3) and the best-performing one (eps=21).
In [60]:
dbs_metric_sorted=dbs_metric.pivot(index='eps', columns='metric', values='score').reset_index().rename_axis(None, axis=1)
# Sort by the four metrics (best first); exact ties are broken by the smallest eps
dbs_metric_sorted=dbs_metric_sorted.sort_values(by=["completeness", "homogeneity", "v_measure", "ami_score", "eps"], ascending=[False, False, False, False, True]) 
dbs_metric_sorted.head(1)
Out[60]:
eps ami_score completeness homogeneity v_measure
19 21 0.973348 1.0 0.954395 0.976665
In [61]:
dbs_3 = cluster.DBSCAN(eps=3)
dbs_fit_3=dbs_3.fit(array_rssi)
labels_dbs_3=dbs_fit_3.labels_
X=pd.DataFrame(array_rssi)
X['cluster']=labels_dbs_3
X['target']= df_rssi['readings_group'].astype(str).str.zfill(2) + '_' + df_rssi['beacon_list']
cm_dbs3=pd.crosstab(index=X["target"], columns=X["cluster"])
cm_dbs3
Out[61]:
cluster -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
target
00_b3006 0 114 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01_b3003 0 0 58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02_b3003 b3006 9 0 0 21 0 0 0 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03_b3004 0 0 0 0 298 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
04_b3002 0 0 0 0 0 238 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
05_b3003 b3004 5 0 0 0 0 0 0 0 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06_b3002 b3003 9 0 0 0 0 0 73 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
07_b3002 b3003 b3005 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
08_b3003 b3005 b3006 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
09_b3003 b3004 b3006 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10_b3002 b3004 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11_b3003 b3007 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12_b3002 b3003 b3007 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13_b3002 b3003 b3006 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14_b3002 b3006 4 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15_b3001 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16_b3002 b3008 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17_b3002 b3005 13 0 0 0 0 0 0 0 0 0 0 0 55 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18_b3002 b3004 b3006 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19_b3005 0 0 0 0 0 0 0 0 0 0 0 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20_b3004 b3006 2 0 0 0 0 0 0 0 0 0 0 0 0 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21_b3004 b3005 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22_b3004 b3007 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0 0 0
23_b3007 2 0 0 0 0 0 0 0 0 0 0 0 0 0 13 0 0 0 0 0 0 0 0 0 0 0 0 0 5
24_b3005 b3006 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 26 0 0 0 0 0 0 0 0 0 0 0 0
25_b3002 b3005 b3006 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26_b3002 b3005 b3008 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0
27_b3005 b3008 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
28_b3008 b30010 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29_b3008 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0
30_b3009 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 15 0 0 0 0 0 0 0 0 0
31_b30010 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0
32_b30010 b30011 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
33_b30011 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0
34_b30011 b30012 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35_b30010 b30012 b30013 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
36_b3008 b30012 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
37_b30012 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 7 0
38_b30012 b30013 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
39_b30013 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 15 0 0 0 0
40_b3005 b3007 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
41_b3001 b3008 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
42_b3001 b3005 b3008 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0
43_b3001 b3002 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
44_b3001 b3005 b3006 b3008 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
45_b3005 b3006 b3008 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
46_b3006 b3008 b30010 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47_b3003 b3005 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
48_b3009 b30010 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
49_b30011 b30013 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
50_b3006 b3007 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
51_b3004 b3006 b3007 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
52_b3006 b3008 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
53_b3002 b3004 b3005 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
54_b3003 b3004 b3005 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
55_b3002 b3003 b3005 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
56_b3002 b3003 b3004 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
57_b3004 b3005 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
58_b3002 b3003 b3004 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
59_b3004 b3005 b3008 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
60_b3002 b3003 b3005 b3008 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
61_b3002 b3008 b30010 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
62_b3002 b3003 b3004 b3007 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
In [62]:
dbs_21 = cluster.DBSCAN(eps=21)
dbs_fit_21=dbs_21.fit(array_rssi)
labels_dbs_21=dbs_fit_21.labels_
X=pd.DataFrame(array_rssi)
X['cluster']=labels_dbs_21
X['target']= df_rssi['readings_group'].astype(str).str.zfill(2) + '_' + df_rssi['beacon_list']
cm_dbs21=pd.crosstab(index=X["target"], columns=X["cluster"])
cm_dbs21
Out[62]:
cluster -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
target
00_b3006 0 114 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01_b3003 0 0 58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02_b3003 b3006 0 0 0 53 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03_b3004 0 0 0 0 298 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
04_b3002 0 0 0 0 0 238 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
05_b3003 b3004 0 0 0 0 0 0 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06_b3002 b3003 0 0 0 0 0 0 0 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
07_b3002 b3003 b3005 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
08_b3003 b3005 b3006 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
09_b3003 b3004 b3006 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10_b3002 b3004 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11_b3003 b3007 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12_b3002 b3003 b3007 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13_b3002 b3003 b3006 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14_b3002 b3006 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15_b3001 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16_b3002 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17_b3002 b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 68 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18_b3002 b3004 b3006 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19_b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20_b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21_b3004 b3005 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22_b3004 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23_b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24_b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 31 0 0 0 0 0 0 0 0 0 0 0 0 0
25_b3002 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0
26_b3002 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 0 0 0 0 0 0 0 0 0 0 0
27_b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 0
28_b3008 b30010 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29_b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0
30_b3009 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28 0 0 0 0 0 0 0 0
31_b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0
32_b30010 b30011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0
33_b30011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0
34_b30011 b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0
35_b30010 b30012 b30013 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
36_b3008 b30012 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
37_b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0
38_b30012 b30013 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
39_b30013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 37 0 0
40_b3005 b3007 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
41_b3001 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0
42_b3001 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10
43_b3001 b3002 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
44_b3001 b3005 b3006 b3008 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
45_b3005 b3006 b3008 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
46_b3006 b3008 b30010 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47_b3003 b3005 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
48_b3009 b30010 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
49_b30011 b30013 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
50_b3006 b3007 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
51_b3004 b3006 b3007 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
52_b3006 b3008 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
53_b3002 b3004 b3005 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
54_b3003 b3004 b3005 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
55_b3002 b3003 b3005 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
56_b3002 b3003 b3004 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
57_b3004 b3005 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
58_b3002 b3003 b3004 b3006 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
59_b3004 b3005 b3008 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
60_b3002 b3003 b3005 b3008 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
61_b3002 b3008 b30010 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
62_b3002 b3003 b3004 b3007 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
In [63]:
cm_dbs3.to_csv('c2. cm_dbs3.csv', index=True)
cm_dbs21.to_csv('c3. cm_dbs21.csv', index=True)
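The fit, label and crosstab steps are repeated above for eps = 13, 3 and 21. A minimal sketch of a reusable helper, demonstrated on toy data; the name `dbscan_confusion_matrix` is hypothetical and not part of the original notebook. In the notebook it could be called as `labels_dbs_3, cm_dbs3 = dbscan_confusion_matrix(array_rssi, X['target'], eps=3)`.

```python
import numpy as np
import pandas as pd
from sklearn import cluster


def dbscan_confusion_matrix(data, target, eps, min_samples=5):
    """Fit DBSCAN and cross-tabulate target groups (rows) against cluster labels (columns)."""
    labels = cluster.DBSCAN(eps=eps, min_samples=min_samples).fit(data).labels_
    # np.asarray drops any index on `target`, so rows align with `labels` by position
    cm = pd.crosstab(index=pd.Series(np.asarray(target), name="target"),
                     columns=pd.Series(labels, name="cluster"))
    return labels, cm


# Toy data: two well-separated groups of three 1-D readings each
toy = np.array([[0.0], [0.5], [1.0], [10.0], [10.5], [11.0]])
toy_target = ["a", "a", "a", "b", "b", "b"]
toy_labels, toy_cm = dbscan_confusion_matrix(toy, toy_target, eps=1, min_samples=2)
print(toy_cm)
```

Keeping `min_samples` as an explicit argument (defaulting to scikit-learn's 5) makes it easy to extend the grid search to that parameter later.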

We found that, for DBSCAN on this dataset, performance jumps at eps = 5 and flattens after eps = 21 (21 is the smallest eps that achieves the best score on all four metrics). Comparing the confusion matrices of the three models (eps = 3, 13 and 21), we found the following:

Eps = 3: Apart from the noise group, DBSCAN splits the data into 28 clusters, each containing members of a single target group. (Homogeneity would therefore be 1, but because the noise group mixes members of many target groups, the homogeneity score is still low.) 58 of the 63 target groups have at least some members labelled as noise. 17 of these 58 groups are split across two or more labels (for example, in group 06_b3002 b3003, 73 members are assigned to cluster 5, 7 members to cluster 8 and 9 members to noise), while the remaining 41 are labelled entirely as noise (for example, all 19 members of group 13_b3002 b3003 b3006).

Eps = 13: DBSCAN groups the data into 31 clusters, each containing members of a single target group. No target group is split across clusters, and 32 groups are labelled entirely as noise.

Eps = 21: DBSCAN groups the data into 32 clusters, each containing members of a single target group. No target group is split across clusters, and 31 groups are labelled entirely as noise. The only difference from eps = 13 is group 32_b30010 b30011, whose 7 members now form their own cluster instead of being labelled as noise. Every target group still labelled as noise has fewer than 5 members (4 groups with 4 members, 2 groups with 3 members and the rest with only 1 or 2), which is consistent with the default min_samples = 5: a group that small cannot contain a core point on its own.
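A toy illustration of why target groups with fewer than five readings stay as noise under the default min_samples = 5. The readings are invented; -200 stands in for "not detected", as the -200 entries in `X.head(10)` suggest.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical readings from 3 beacons. Group A has 6 readings of beacon 1,
# group B only 3 readings of beacon 2; the groups are far apart in RSSI space.
group_a = [[-75, -200, -200], [-76, -200, -200], [-74, -200, -200],
           [-77, -200, -200], [-73, -200, -200], [-75, -200, -200]]
group_b = [[-200, -80, -200], [-200, -82, -200], [-200, -79, -200]]
readings = np.array(group_a + group_b)

labels = DBSCAN(eps=21, min_samples=5).fit(readings).labels_
# Group A points have 6 neighbours within eps (including themselves), so they
# are core points and form cluster 0; group B points have only 3, so they are noise
print(labels)
```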

Because the number of clusters changes little across these eps values, we plotted the relationship between eps and the number of clusters in step (i). The number of distinct labels stays between 29 and 33; since this count includes the noise label -1, it corresponds to 28 to 32 clusters.

2. K-Means Clustering

  (a) Plot the average distance to the centroids against k to determine a suitable `n_clusters` value for K-Means clustering. The helper function `avgDistToCentroids` fits K-Means for each k and computes the mean Euclidean distance from each reading to its assigned centroid; with 13 beacons, this distance is computed in 13 dimensions.
In [64]:
def avgDistToCentroids(dataArray, k_dist, random_state=None):
    """Return, for every k in k_dist, the average Euclidean distance from each
    reading to its assigned K-Means centroid.

    dataArray has shape (n_readings, n_beacons), so the distance is computed in
    n_beacons dimensions (13 for this dataset) without hard-coding the columns.
    """
    k_disValues = np.zeros(len(k_dist))
    for cur_k_ind, K_rssi in enumerate(k_dist):
        # Fit K-Means with the current number of clusters
        km_rssi = KMeans(n_clusters=K_rssi, random_state=random_state)
        labels_rssi = km_rssi.fit(dataArray).labels_
        # Centroid assigned to each reading, same shape as dataArray
        assigned_centers = km_rssi.cluster_centers_[labels_rssi]
        # Row-wise Euclidean distance to the centroid, averaged over all readings
        dists = np.linalg.norm(dataArray - assigned_centers, axis=1)
        k_disValues[cur_k_ind] = dists.mean()
    return k_disValues
In [65]:
k_dist = range(20,105)
print(k_dist)

k_disValues = avgDistToCentroids(array_rssi, k_dist)
range(20, 105)
In [66]:
plt.plot(k_dist, k_disValues)
plt.xlabel('k values')
plt.ylabel('average distance to the centroids')
plt.title("Determine number of clusters - k values and \ndistance to centroids Graph for K-Means Clustering", fontsize = 18)
plt.savefig('9. avgDistCentroids.png', dpi=300, bbox_inches='tight')

The curve decreases gradually, and there is no distinct elbow after which it flattens. We therefore chose n_clusters=60, a somewhat arbitrary value, to fit our original data (an n_readings x n_beacons matrix of RSSI measurements) with K-Means clustering.
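Since no elbow is visible by eye, one common heuristic (not used in this assignment) is to pick the point of the curve farthest from the straight line joining its first and last points, after scaling both axes to [0, 1]. A minimal sketch with the hypothetical helper `elbow_by_max_chord_distance`; it assumes neither axis is constant. It could be applied as `elbow_by_max_chord_distance(k_dist, k_disValues)`, where a small maximum distance would suggest that the curve has no pronounced elbow. Note that it always returns some k, so it is a sanity check rather than proof of an elbow.

```python
import numpy as np


def elbow_by_max_chord_distance(x, y):
    """Hypothetical helper: return (x at the elbow, its scaled distance to the chord).

    Both axes are min-max scaled to [0, 1]; assumes x and y are not constant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xn = (x - x.min()) / (x.max() - x.min())
    yn = (y - y.min()) / (y.max() - y.min())
    # Unit vector along the chord from the first point to the last point
    start = np.array([xn[0], yn[0]])
    end = np.array([xn[-1], yn[-1]])
    direction = (end - start) / np.linalg.norm(end - start)
    # Perpendicular distance of each point to the chord (2-D cross product)
    offsets = np.column_stack([xn, yn]) - start
    dist = np.abs(offsets[:, 0] * direction[1] - offsets[:, 1] * direction[0])
    best = int(np.argmax(dist))
    return x[best], dist[best]


# Toy curve with a clear elbow at k = 3
k, d = elbow_by_max_chord_distance(range(1, 11), [100, 50, 20, 15, 12, 10, 9, 8, 7, 6])
print(k, round(d, 3))
```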

(b) Perform K-Means clustering by fitting `array_rssi` with n_clusters=60
In [67]:
km_rssi = KMeans(n_clusters=60, random_state=999)
km_fit = km_rssi.fit(array_rssi)
(c) Retrieve the labels from the fitted model and check the number of clusters.
In [68]:
labels_kmean=km_fit.labels_
np.unique(labels_kmean, return_counts = True)
Out[68]:
(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
        34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
        51, 52, 53, 54, 55, 56, 57, 58, 59]),
 array([238,  53, 298,  70,  15,  37, 114,  58,  89,  28,   8,  20,  31,
         68,  23,  23,   7,  12,  37,  12,  19,  15,   6,  21,  10,  10,
         10,   5,   9,   2,   7,   6,   4,   4,   5,   2,   4,   2,   3,
          4,   2,   2,   2,   2,   3,   3,   2,   2,   1,   1,   1,   1,
          1,   1,   2,   1,   1,   1,   1,   1], dtype=int64))
(d) Construct a dataframe `X` that compares the cluster labels with the target (the readings_group in `df_rssi`, combined with its beacon list for readability).
In [69]:
X=pd.DataFrame(array_rssi)
X['cluster']=labels_kmean
X['target']= df_rssi['readings_group'].astype(str).str.zfill(2) + '_' + df_rssi['beacon_list']
X.head(10)
Out[69]:
0 1 2 3 4 5 6 7 8 9 10 11 12 cluster target
0 -200 -200 -200 -200 -200 -78 -200 -200 -200 -200 -200 -200 -200 6 00_b3006
1 -200 -200 -200 -200 -200 -78 -200 -200 -200 -200 -200 -200 -200 6 00_b3006
2 -200 -200 -200 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200 6 00_b3006
3 -200 -200 -200 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200 6 00_b3006
4 -200 -200 -200 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200 6 00_b3006
5 -200 -200 -82 -200 -200 -200 -200 -200 -200 -200 -200 -200 -200 7 01_b3003
6 -200 -200 -80 -200 -200 -77 -200 -200 -200 -200 -200 -200 -200 1 02_b3003 b3006
7 -200 -200 -86 -200 -200 -200 -200 -200 -200 -200 -200 -200 -200 7 01_b3003
8 -200 -200 -200 -75 -200 -200 -200 -200 -200 -200 -200 -200 -200 2 03_b3004
9 -200 -200 -200 -75 -200 -200 -200 -200 -200 -200 -200 -200 -200 2 03_b3004
(e) Use `pd.crosstab` to construct a confusion matrix of target groups (rows) against clusters (columns).
In [70]:
cm_kmean=pd.crosstab(index=X["target"], columns=X["cluster"])
cm_kmean
Out[70]:
cluster 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
target
00_b3006 0 0 0 0 0 0 114 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01_b3003 0 0 0 0 0 0 0 58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02_b3003 b3006 0 53 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03_b3004 0 0 298 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
04_b3002 238 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
05_b3003 b3004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06_b3002 b3003 0 0 0 0 0 0 0 0 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
07_b3002 b3003 b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
08_b3003 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
09_b3003 b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10_b3002 b3004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11_b3003 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12_b3002 b3003 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13_b3002 b3003 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14_b3002 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15_b3001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16_b3002 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17_b3002 b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 68 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18_b3002 b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19_b3005 0 0 0 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20_b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21_b3004 b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22_b3004 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23_b3007 0 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24_b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
25_b3002 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26_b3002 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27_b3005 b3008 0 0 0 0 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
28_b3008 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29_b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30_b3009 0 0 0 0 0 0 0 0 0 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31_b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32_b30010 b30011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
33_b30011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
34_b30011 b30012 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35_b30010 b30012 b30013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
36_b3008 b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
37_b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
38_b30012 b30013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
39_b30013 0 0 0 0 0 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
40_b3005 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0
41_b3001 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
42_b3001 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
43_b3001 b3002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
44_b3001 b3005 b3006 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
45_b3005 b3006 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
46_b3006 b3008 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47_b3003 b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0
48_b3009 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
49_b30011 b30013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
50_b3006 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
51_b3004 b3006 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
52_b3006 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
53_b3002 b3004 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
54_b3003 b3004 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
55_b3002 b3003 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
56_b3002 b3003 b3004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
57_b3004 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
58_b3002 b3003 b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
59_b3004 b3005 b3008 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
60_b3002 b3003 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
61_b3002 b3008 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
62_b3002 b3003 b3004 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

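Unlike DBSCAN, KMeans assigns every reading to a cluster, so there is no outlier (noise) group. The table shows that every target group falls entirely within a single cluster: for example, 00_b3006 is assigned to cluster 6 and 01_b3003 to cluster 7. The reverse does not always hold: a few clusters contain more than one target group (e.g. cluster 4 holds 14 readings of 27_b3005 b3008 and 1 reading of 59_b3004 b3005 b3008). Some merging is unavoidable, because there are 63 target groups but only 60 clusters.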
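The contingency table can be condensed into a single number with cluster purity: for each cluster, take the size of its largest target group, sum these over all clusters and divide by the total number of readings. This is an addition, not part of the original analysis. The sketch assumes a crosstab shaped like `cm_kmean` (targets as rows, clusters as columns) and also lists the clusters that mix target groups; both function names are hypothetical.

```python
import pandas as pd

def cluster_purity(contingency: pd.DataFrame) -> float:
    """Purity from a crosstab with target labels as rows and clusters as columns."""
    # Size of the dominant target group in each cluster (column)
    dominant = contingency.max(axis=0)
    return dominant.sum() / contingency.to_numpy().sum()

def mixed_clusters(contingency: pd.DataFrame) -> list:
    """Clusters whose readings come from more than one target group."""
    return [c for c in contingency.columns if (contingency[c] > 0).sum() > 1]
```

Applied here, `cluster_purity(cm_kmean)` and `mixed_clusters(cm_kmean)` would quantify the merging described above.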

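(f) Run several clustering metrics from sklearn. Adjusted mutual information, adjusted Rand index, normalized mutual information, Fowlkes-Mallows and homogeneity/completeness/V-measure compare the fitted labels with the target; the silhouette, Calinski-Harabasz and Davies-Bouldin scores are internal measures that use only the data and the cluster labels.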
In [71]:
print("adjusted_mutual_info :", metrics.adjusted_mutual_info_score(label_target, labels_kmean))
print("adjusted_rand_score :", metrics.adjusted_rand_score(label_target, labels_kmean))
print("normalized_mutual_info :", metrics.normalized_mutual_info_score(label_target, labels_kmean))
print("fowlkes_mallows_score :", metrics.fowlkes_mallows_score(label_target, labels_kmean))
print("silhouette_score :", metrics.silhouette_score(array_rssi, labels_kmean))
print("calinski_harabasz_score :", metrics.calinski_harabasz_score(array_rssi, labels_kmean))
print("davies_bouldin_score :", metrics.davies_bouldin_score(array_rssi, labels_kmean))
print("homogeneity_completeness_v_measure :", metrics.homogeneity_completeness_v_measure(label_target, labels_kmean))
adjusted_mutual_info : 0.9986451761078916
adjusted_rand_score : 0.9997780173318629
normalized_mutual_info : 0.9988439153787477
fowlkes_mallows_score : 0.9997988470741014
silhouette_score : 0.9326333754127222
calinski_harabasz_score : 6843.723843035605
davies_bouldin_score : 0.25020485861620845
homogeneity_completeness_v_measure : (0.9976905007340793, 1.0000000000000002, 0.9988439153787477)
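In the output above, completeness is 1.0 while homogeneity is just below 1. The toy example below, with made-up labels, shows how each score reacts. Merging target groups into one cluster keeps completeness at 1 but lowers homogeneity; splitting a group across clusters does the opposite. The first case matches the confusion matrix above, where 63 target groups share 60 clusters.

```python
from sklearn import metrics

# Three target groups; the clustering merges groups 1 and 2 into one cluster
labels_true_merge = [0, 0, 1, 1, 2, 2]
labels_pred_merge = [0, 0, 1, 1, 1, 1]
h_m, c_m, v_m = metrics.homogeneity_completeness_v_measure(labels_true_merge, labels_pred_merge)
print(f"merge: homogeneity={h_m:.3f}, completeness={c_m:.3f}, v_measure={v_m:.3f}")

# Two target groups; the clustering splits each group across two clusters
labels_true_split = [0, 0, 1, 1]
labels_pred_split = [0, 1, 2, 3]
h_s, c_s, v_s = metrics.homogeneity_completeness_v_measure(labels_true_split, labels_pred_split)
print(f"split: homogeneity={h_s:.3f}, completeness={c_s:.3f}, v_measure={v_s:.3f}")
```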
(g) As with DBSCAN, we evaluate the model with the same four metrics (homogeneity, completeness, V-measure and adjusted mutual information). A loop fits the model repeatedly with n_clusters from 15 to 79 (`range(15, 80)`), which is a simple parameter sweep rather than a hill-climbing search, and saves each model's scores into the dataframe `kmeans_metric`.
In [72]:
rows = []
for cluster_num in range(15, 80):
    km_rssi = KMeans(n_clusters=cluster_num, random_state=999)
    labels_kmean = km_rssi.fit(array_rssi).labels_
    a = metrics.adjusted_mutual_info_score(label_target, labels_kmean)
    h, c, v = metrics.homogeneity_completeness_v_measure(label_target, labels_kmean)
    # One long-format row per (n_cluster, metric) pair
    rows += [(cluster_num, "homogeneity", h),
             (cluster_num, "completeness", c),
             (cluster_num, "v_measure", v),
             (cluster_num, "ami_score", a)]
kmeans_metric = pd.DataFrame(rows, columns=["n_cluster", "metric", "score"])
(h) Visualize `kmeans_metric`.
In [73]:
g=sns.lineplot(x='n_cluster', y='score', hue="metric", data=kmeans_metric)
g.legend(loc='center left', bbox_to_anchor=(1, 0.5), ncol=1)
plt.title("Performance analysis for Kmean \nClustering with different n_cluster", fontsize = 18)
plt.savefig('10. KMean perform.png', dpi=300, bbox_inches='tight')
(i) Compare the confusion matrix of the best-scoring model (n_clusters=62, the top row of the sorted table below) with those of worse-performing models (n_clusters=30 and 70).
In [74]:
kmeans_metric_sorted=kmeans_metric.pivot(index='n_cluster', columns='metric', values='score').reset_index().rename_axis(None, axis=1)
kmeans_metric_sorted=kmeans_metric_sorted.sort_values(by=["completeness", "homogeneity", "v_measure", "ami_score"], ascending=[False, False, False, False]) 
kmeans_metric_sorted.head(5)
Out[74]:
n_cluster ami_score completeness homogeneity v_measure
47 62 0.999733 1.0 0.999545 0.999773
45 60 0.998645 1.0 0.997691 0.998844
44 59 0.998452 1.0 0.997360 0.998678
43 58 0.998185 1.0 0.996906 0.998450
42 57 0.997763 1.0 0.996185 0.998089
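Cells In [75] to In [77] repeat the same fit, label and crosstab steps for different n_clusters. A small helper could factor this out. The sketch below is hypothetical (the name `kmeans_confusion` is ours) and assumes `array_rssi` is the feature matrix and `target` is the label Series built in those cells.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def kmeans_confusion(data, target, n_clusters, random_state=999):
    """Fit KMeans with `n_clusters` and cross-tabulate target labels against clusters."""
    labels = KMeans(n_clusters=n_clusters, random_state=random_state).fit(data).labels_
    # np.asarray drops any index so targets pair with labels by position
    return pd.crosstab(index=pd.Series(np.asarray(target), name="target"),
                       columns=pd.Series(labels, name="cluster"))
```

With `target = df_rssi['readings_group'].astype(str).str.zfill(2) + '_' + df_rssi['beacon_list']`, the three cells would reduce to calls such as `kmeans_confusion(array_rssi, target, 30)`.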
In [75]:
km_rssi_62 = KMeans(n_clusters=62, random_state=999)
km_fit_62 = km_rssi_62.fit(array_rssi)
labels_kmean_62=km_fit_62.labels_
X=pd.DataFrame(array_rssi)
X['cluster']=labels_kmean_62
X['target']= df_rssi['readings_group'].astype(str).str.zfill(2) + '_' + df_rssi['beacon_list']
cm_kmean_62= pd.crosstab(index=X["target"], columns=X["cluster"])
cm_kmean_62
Out[75]:
cluster 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
target
00_b3006 0 0 114 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01_b3003 0 0 0 0 0 0 0 0 0 0 0 58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02_b3003 b3006 0 0 0 0 0 0 53 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03_b3004 0 298 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
04_b3002 0 0 0 238 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
05_b3003 b3004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06_b3002 b3003 0 0 0 0 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
07_b3002 b3003 b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
08_b3003 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
09_b3003 b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10_b3002 b3004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11_b3003 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12_b3002 b3003 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13_b3002 b3003 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14_b3002 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15_b3001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16_b3002 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17_b3002 b3005 68 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18_b3002 b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19_b3005 0 0 0 0 0 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20_b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21_b3004 b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0
22_b3004 b3007 0 0 0 0 0 0 0 0 0 0 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23_b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24_b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
25_b3002 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26_b3002 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27_b3005 b3008 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
28_b3008 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29_b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30_b3009 0 0 0 0 0 0 0 0 0 0 0 0 0 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31_b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32_b30010 b30011 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
33_b30011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
34_b30011 b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35_b30010 b30012 b30013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
36_b3008 b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
37_b30012 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
38_b30012 b30013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
39_b30013 0 0 0 0 0 0 0 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
40_b3005 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
41_b3001 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
42_b3001 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
43_b3001 b3002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
44_b3001 b3005 b3006 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0
45_b3005 b3006 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
46_b3006 b3008 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47_b3003 b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
48_b3009 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
49_b30011 b30013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
50_b3006 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
51_b3004 b3006 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
52_b3006 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
53_b3002 b3004 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
54_b3003 b3004 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
55_b3002 b3003 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
56_b3002 b3003 b3004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
57_b3004 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
58_b3002 b3003 b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
59_b3004 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
60_b3002 b3003 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
61_b3002 b3008 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
62_b3002 b3003 b3004 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
In [76]:
km_rssi_30 = KMeans(n_clusters=30, random_state=999)
km_fit_30 = km_rssi_30.fit(array_rssi)
labels_kmean_30=km_fit_30.labels_
X=pd.DataFrame(array_rssi)
X['cluster']=labels_kmean_30
X['target']= df_rssi['readings_group'].astype(str).str.zfill(2) + '_' + df_rssi['beacon_list']
cm_kmean_30= pd.crosstab(index=X["target"], columns=X["cluster"])
cm_kmean_30
Out[76]:
cluster 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
target
00_b3006 0 0 0 114 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
01_b3003 0 0 0 0 0 0 0 0 0 0 58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02_b3003 b3006 0 0 0 0 0 0 0 0 53 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03_b3004 0 298 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
04_b3002 0 0 238 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
05_b3003 b3004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 37 0 0 0 0 0 0 0 0 0 0 0 0 0
06_b3002 b3003 0 0 0 0 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
07_b3002 b3003 b3005 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
08_b3003 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0
09_b3003 b3004 b3006 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
10_b3002 b3004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6
11_b3003 b3007 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12_b3002 b3003 b3007 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13_b3002 b3003 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14_b3002 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0
15_b3001 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16_b3002 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0
17_b3002 b3005 68 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18_b3002 b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4
19_b3005 0 0 0 0 0 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20_b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 0 0 0 0 0 0 0
21_b3004 b3005 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22_b3004 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 0 0 0 0 0 0 0 0 0 0
23_b3007 0 0 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24_b3005 b3006 0 0 0 0 0 0 0 0 0 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
25_b3002 b3005 b3006 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26_b3002 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 0 0 0 0 0 0 0 0 0
27_b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0
28_b3008 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0
29_b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0
30_b3009 0 0 0 0 0 0 0 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31_b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32_b30010 b30011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0
33_b30011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0
34_b30011 b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0
35_b30010 b30012 b30013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
36_b3008 b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
37_b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
38_b30012 b30013 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
39_b30013 0 0 0 0 0 0 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
40_b3005 b3007 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
41_b3001 b3008 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
42_b3001 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0
43_b3001 b3002 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
44_b3001 b3005 b3006 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0
45_b3005 b3006 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0
46_b3006 b3008 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0
47_b3003 b3005 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
48_b3009 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
49_b30011 b30013 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
50_b3006 b3007 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
51_b3004 b3006 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
52_b3006 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
53_b3002 b3004 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
54_b3003 b3004 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
55_b3002 b3003 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
56_b3002 b3003 b3004 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
57_b3004 b3005 b3006 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
58_b3002 b3003 b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
59_b3004 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
60_b3002 b3003 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
61_b3002 b3008 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
62_b3002 b3003 b3004 b3007 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
In [77]:
km_rssi_70 = KMeans(n_clusters=70, random_state=999)
km_fit_70 = km_rssi_70.fit(array_rssi)
labels_kmean_70=km_fit_70.labels_
X=pd.DataFrame(array_rssi)
X['cluster']=labels_kmean_70
X['target']= df_rssi['readings_group'].astype(str).str.zfill(2) + '_' + df_rssi['beacon_list']
cm_kmean_70= pd.crosstab(index=X["target"], columns=X["cluster"])
cm_kmean_70
Out[77]:
cluster 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
target
00_b3006 0 0 0 0 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 83 0 0 0 0 0
01_b3003 0 0 0 0 0 0 0 0 0 58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02_b3003 b3006 0 0 53 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03_b3004 186 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 112 0 0 0
04_b3002 0 138 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0
05_b3003 b3004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06_b3002 b3003 0 0 0 0 0 27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 53 0 0 0 0 0 0 0 0 9 0 0
07_b3002 b3003 b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
08_b3003 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
09_b3003 b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10_b3002 b3004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11_b3003 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12_b3002 b3003 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13_b3002 b3003 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14_b3002 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15_b3001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16_b3002 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17_b3002 b3005 0 0 0 0 0 0 0 55 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 0
18_b3002 b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19_b3005 0 0 0 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20_b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21_b3004 b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22_b3004 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23_b3007 0 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24_b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
25_b3002 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26_b3002 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27_b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
28_b3008 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29_b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30_b3009 0 0 0 0 0 0 0 0 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31_b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32_b30010 b30011 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
33_b30011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
34_b30011 b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35_b30010 b30012 b30013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
36_b3008 b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
37_b30012 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
38_b30012 b30013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
39_b30013 0 0 0 0 0 0 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19
40_b3005 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
41_b3001 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
42_b3001 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
43_b3001 b3002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
44_b3001 b3005 b3006 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
45_b3005 b3006 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
46_b3006 b3008 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47_b3003 b3005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
48_b3009 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
49_b30011 b30013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
50_b3006 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
51_b3004 b3006 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
52_b3006 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
53_b3002 b3004 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
54_b3003 b3004 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
55_b3002 b3003 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
56_b3002 b3003 b3004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
57_b3004 b3005 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
58_b3002 b3003 b3004 b3006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
59_b3004 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
60_b3002 b3003 b3005 b3008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
61_b3002 b3008 b30010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
62_b3002 b3003 b3004 b3007 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
In [78]:
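# Save the confusion matrices of the three KMeans models (n_cluster = 62, 70, 30) for later reference
cm_kmean_62.to_csv('d1. cm_kmean_62.csv', index=True)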
cm_kmean_70.to_csv('d2. cm_kmean_70.csv', index=True)
cm_kmean_30.to_csv('d3. cm_kmean_30.csv', index=True)

We found that for KMeans clustering on this dataset, v_measure, ami_score and homogeneity gradually increase until n_cluster reaches 62 (62 is the n_cluster with the best score on all 4 metrics), while completeness stays close to its perfect score of 1.0 up to that point. Once n_cluster exceeds 62, v_measure, ami_score and completeness drop sharply, while homogeneity stays flat. To investigate this further, we compared the confusion matrices of 3 models with different n_cluster values (30, 62 and 70) and made the following findings:

N_cluster = 30: KMeans groups all the data into 30 clusters, and only 7 of them contain a single target group. Every target group is kept within one cluster, except 09_b3003 b3004 b3006, which is split between clusters 8 and 16. On closer inspection, target groups that share a cluster have similar beacon combinations. For example, cluster 4 comprises 5 target groups, namely 06_b3002 b3003, 07_b3002 b3003 b3005, 12_b3002 b3003 b3007, 56_b3002 b3003 b3004 and 62_b3002 b3003 b3004 b3007, all of which contain the combination of b3002 and b3003. Apart from 06_b3002 b3003, which has 89 members, each of these target groups has only 1 or 2 members.

N_cluster = 62: KMeans groups all the data into 62 clusters. Only cluster 48 mixes two target groups, 21_b3004 b3005 (2 members) and 57_b3004 b3005 b3006 (1 member); all other clusters and target groups map one-to-one.

N_cluster = 70: KMeans groups all the data into 70 clusters, and every cluster contains a single target group. However, 6 target groups are split across several clusters; for example, 03_b3004 is split between clusters 0 and 66. This is expected when there are more clusters (70) than target groups (63): the 7 extra clusters can only come from splitting existing target groups.
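The shared-beacon pattern described for n_cluster = 30 can be checked mechanically by parsing the beacon IDs out of each target-group label and intersecting them. This is a sketch, not code from this notebook: it assumes labels follow the `NN_bXXXX bYYYY` format used in the confusion matrices, and `beacons_of` and `shared_beacons` are hypothetical helper names.

```python
def beacons_of(label):
    """Return the set of beacon IDs encoded in a label such as '06_b3002 b3003'."""
    # Drop the numeric prefix before the first underscore, then split on whitespace
    _, beacon_part = label.split("_", 1)
    return set(beacon_part.split())


def shared_beacons(labels):
    """Return the beacons common to every target group in `labels`."""
    sets = [beacons_of(lbl) for lbl in labels]
    return set.intersection(*sets) if sets else set()


# Target groups that KMeans (n_cluster = 30) placed together in cluster 4
cluster_4 = [
    "06_b3002 b3003",
    "07_b3002 b3003 b3005",
    "12_b3002 b3003 b3007",
    "56_b3002 b3003 b3004",
    "62_b3002 b3003 b3004 b3007",
]
print(sorted(shared_beacons(cluster_4)))  # ['b3002', 'b3003']
```

On a confusion matrix `cm` with target groups as rows, the groups in cluster `col` are `cm.index[cm[col] != 0]`, so passing that index to `shared_beacons` would give the beacons they have in common.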

Conclusion

Compare the homogeneity and completeness of all our confusion matrices for DBSCAN and KMeans.

 (a) Define a function to check each column for homogeneity and each row for completeness
In [79]:
def checkConfusionMatrix(cm):
    """Report clusters that mix target groups (homogeneity) and target groups
    split across clusters (completeness).

    cm: confusion matrix with target groups as rows and cluster labels as columns.
    """
    perfect_homogeneity = True
    perfect_completeness = True
    print("Homogeneity Check:")
    homo_group = 0      # clusters that contain exactly one target group
    complete_group = 0  # target groups that fall entirely within one cluster

    # Homogeneity: each column (cluster) should have exactly one non-zero row
    for col in cm.columns:
        homogeneity_cnt = (cm[col] != 0).sum()
        if homogeneity_cnt != 1:
            print("================================================================")
            print("cluster:", col, "contains", homogeneity_cnt, "target group(s)")
            print("target group: ", cm[cm[col] != 0].index.values, "are both grouped into cluster", col)
            perfect_homogeneity = False
        else:
            homo_group += 1
    if perfect_homogeneity:
        print("All cluster only contains one single target group")
    else:
        print("\nHowever,", homo_group, "out of", len(cm.columns), "clusters contain(s) one single target group")

    # Completeness: each row (target group) should have exactly one non-zero column
    print("\nCompleteness Check:")
    for row in cm.index:
        completeness_cnt = (cm.loc[row] != 0).sum()
        target_member = 0
        if completeness_cnt != 1:
            print("================================================================")
            perfect_completeness = False
            print("target:", row, "has been split into", completeness_cnt, "cluster group")
            # Positions of the clusters that hold members of this target group
            for i in np.flatnonzero(cm.loc[row] != 0):
                col = cm.columns[i]
                target_member += cm.loc[row, col]
                print("while cluster", col, "contains", cm.loc[row, col], "members")
            print("there should be ", target_member, "members in target group", row)
        else:
            complete_group += 1

    if perfect_completeness:
        print("All target groups are grouped into same cluster")
    else:
        print("\nHowever,", complete_group, "out of", len(cm.index), "target groups have been grouped into same cluster")
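The same two checks can also be written with vectorized pandas operations that return the counts instead of printing every mismatch. This is a hedged sketch rather than the function used in this notebook; `summarize_confusion_matrix` is a hypothetical name, and it assumes, like `checkConfusionMatrix`, that target groups are rows and clusters (including DBSCAN's -1) are columns.

```python
import pandas as pd


def summarize_confusion_matrix(cm: pd.DataFrame) -> dict:
    """Count homogeneous clusters (columns) and complete target groups (rows)."""
    nonzero = cm != 0
    # A cluster is homogeneous if exactly one target group has members in it
    homogeneous = int((nonzero.sum(axis=0) == 1).sum())
    # A target group is complete if all of its members fall in exactly one cluster
    complete = int((nonzero.sum(axis=1) == 1).sum())
    return {
        "homogeneous_clusters": homogeneous,
        "n_clusters": cm.shape[1],
        "complete_targets": complete,
        "n_targets": cm.shape[0],
    }


# Toy matrix: target 'A' is split across clusters 0 and 1; cluster 2 mixes 'B' and 'C'
toy = pd.DataFrame(
    [[3, 2, 0], [0, 0, 4], [0, 0, 1]],
    index=["A", "B", "C"],
    columns=[0, 1, 2],
)
print(summarize_confusion_matrix(toy))
# {'homogeneous_clusters': 2, 'n_clusters': 3, 'complete_targets': 2, 'n_targets': 3}
```

The counts it returns should agree with the "However, X out of Y" lines printed by `checkConfusionMatrix`.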
        
In [80]:
checkConfusionMatrix(cm_dbs3)
Homogeneity Check:
================================================================
cluster: -1 contains 58 target group(s)
target group:  ['02_b3003 b3006' '05_b3003 b3004' '06_b3002 b3003' '07_b3002 b3003 b3005'
 '08_b3003 b3005 b3006' '09_b3003 b3004 b3006' '10_b3002 b3004'
 '11_b3003 b3007' '12_b3002 b3003 b3007' '13_b3002 b3003 b3006'
 '14_b3002 b3006' '15_b3001' '16_b3002 b3008' '17_b3002 b3005'
 '18_b3002 b3004 b3006' '20_b3004 b3006' '21_b3004 b3005' '22_b3004 b3007'
 '23_b3007' '24_b3005 b3006' '25_b3002 b3005 b3006' '26_b3002 b3005 b3008'
 '27_b3005 b3008' '28_b3008 b30010' '29_b3008' '30_b3009' '31_b30010'
 '32_b30010 b30011' '33_b30011' '34_b30011 b30012'
 '35_b30010 b30012 b30013' '36_b3008 b30012' '37_b30012'
 '38_b30012 b30013' '39_b30013' '40_b3005 b3007' '41_b3001 b3008'
 '42_b3001 b3005 b3008' '43_b3001 b3002' '44_b3001 b3005 b3006 b3008'
 '45_b3005 b3006 b3008' '46_b3006 b3008 b30010' '47_b3003 b3005'
 '48_b3009 b30010' '49_b30011 b30013' '50_b3006 b3007'
 '51_b3004 b3006 b3007' '52_b3006 b3008' '53_b3002 b3004 b3005 b3006'
 '54_b3003 b3004 b3005 b3006' '55_b3002 b3003 b3005 b3006'
 '56_b3002 b3003 b3004' '57_b3004 b3005 b3006'
 '58_b3002 b3003 b3004 b3006' '59_b3004 b3005 b3008'
 '60_b3002 b3003 b3005 b3008' '61_b3002 b3008 b30010'
 '62_b3002 b3003 b3004 b3007'] are both grouped into cluster -1

However, 28 out of 29 clusters contain(s) one single target group

Completeness Check:
================================================================
target: 02_b3003 b3006 has been split into 3 cluster group
while cluster -1 contains 9 members
while cluster 2 contains 21 members
while cluster 6 contains 23 members
there should be  53 members in target group 02_b3003 b3006
================================================================
target: 05_b3003 b3004 has been split into 2 cluster group
while cluster -1 contains 5 members
while cluster 7 contains 32 members
there should be  37 members in target group 05_b3003 b3004
================================================================
target: 06_b3002 b3003 has been split into 3 cluster group
while cluster -1 contains 9 members
while cluster 5 contains 73 members
while cluster 8 contains 7 members
there should be  89 members in target group 06_b3002 b3003
================================================================
target: 14_b3002 b3006 has been split into 2 cluster group
while cluster -1 contains 4 members
while cluster 9 contains 8 members
there should be  12 members in target group 14_b3002 b3006
================================================================
target: 17_b3002 b3005 has been split into 2 cluster group
while cluster -1 contains 13 members
while cluster 11 contains 55 members
there should be  68 members in target group 17_b3002 b3005
================================================================
target: 20_b3004 b3006 has been split into 2 cluster group
while cluster -1 contains 2 members
while cluster 12 contains 21 members
there should be  23 members in target group 20_b3004 b3006
================================================================
target: 22_b3004 b3007 has been split into 2 cluster group
while cluster -1 contains 12 members
while cluster 14 contains 9 members
there should be  21 members in target group 22_b3004 b3007
================================================================
target: 23_b3007 has been split into 3 cluster group
while cluster -1 contains 2 members
while cluster 13 contains 13 members
while cluster 27 contains 5 members
there should be  20 members in target group 23_b3007
================================================================
target: 24_b3005 b3006 has been split into 2 cluster group
while cluster -1 contains 5 members
while cluster 15 contains 26 members
there should be  31 members in target group 24_b3005 b3006
================================================================
target: 26_b3002 b3005 b3008 has been split into 2 cluster group
while cluster -1 contains 13 members
while cluster 16 contains 9 members
there should be  22 members in target group 26_b3002 b3005 b3008
================================================================
target: 29_b3008 has been split into 2 cluster group
while cluster -1 contains 6 members
while cluster 25 contains 6 members
there should be  12 members in target group 29_b3008
================================================================
target: 30_b3009 has been split into 3 cluster group
while cluster -1 contains 3 members
while cluster 17 contains 10 members
while cluster 18 contains 15 members
there should be  28 members in target group 30_b3009
================================================================
target: 31_b30010 has been split into 2 cluster group
while cluster -1 contains 5 members
while cluster 19 contains 5 members
there should be  10 members in target group 31_b30010
================================================================
target: 33_b30011 has been split into 2 cluster group
while cluster -1 contains 4 members
while cluster 20 contains 5 members
there should be  9 members in target group 33_b30011
================================================================
target: 37_b30012 has been split into 3 cluster group
while cluster -1 contains 2 members
while cluster 21 contains 6 members
while cluster 26 contains 7 members
there should be  15 members in target group 37_b30012
================================================================
target: 39_b30013 has been split into 3 cluster group
while cluster -1 contains 5 members
while cluster 22 contains 17 members
while cluster 23 contains 15 members
there should be  37 members in target group 39_b30013
================================================================
target: 42_b3001 b3005 b3008 has been split into 2 cluster group
while cluster -1 contains 3 members
while cluster 24 contains 7 members
there should be  10 members in target group 42_b3001 b3005 b3008

However, 46 out of 63 target groups have been grouped into same cluster
In [81]:
checkConfusionMatrix(cm_dbs13)
Homogeneity Check:
================================================================
cluster: -1 contains 32 target group(s)
target group:  ['07_b3002 b3003 b3005' '09_b3003 b3004 b3006' '11_b3003 b3007'
 '12_b3002 b3003 b3007' '18_b3002 b3004 b3006' '21_b3004 b3005'
 '28_b3008 b30010' '32_b30010 b30011' '35_b30010 b30012 b30013'
 '36_b3008 b30012' '38_b30012 b30013' '40_b3005 b3007' '43_b3001 b3002'
 '44_b3001 b3005 b3006 b3008' '45_b3005 b3006 b3008'
 '46_b3006 b3008 b30010' '47_b3003 b3005' '48_b3009 b30010'
 '49_b30011 b30013' '50_b3006 b3007' '51_b3004 b3006 b3007'
 '52_b3006 b3008' '53_b3002 b3004 b3005 b3006'
 '54_b3003 b3004 b3005 b3006' '55_b3002 b3003 b3005 b3006'
 '56_b3002 b3003 b3004' '57_b3004 b3005 b3006'
 '58_b3002 b3003 b3004 b3006' '59_b3004 b3005 b3008'
 '60_b3002 b3003 b3005 b3008' '61_b3002 b3008 b30010'
 '62_b3002 b3003 b3004 b3007'] are both grouped into cluster -1

However, 31 out of 32 clusters contain(s) one single target group

Completeness Check:
All target groups are grouped into same cluster
In [82]:
checkConfusionMatrix(cm_dbs21)
Homogeneity Check:
================================================================
cluster: -1 contains 31 target group(s)
target group:  ['07_b3002 b3003 b3005' '09_b3003 b3004 b3006' '11_b3003 b3007'
 '12_b3002 b3003 b3007' '18_b3002 b3004 b3006' '21_b3004 b3005'
 '28_b3008 b30010' '35_b30010 b30012 b30013' '36_b3008 b30012'
 '38_b30012 b30013' '40_b3005 b3007' '43_b3001 b3002'
 '44_b3001 b3005 b3006 b3008' '45_b3005 b3006 b3008'
 '46_b3006 b3008 b30010' '47_b3003 b3005' '48_b3009 b30010'
 '49_b30011 b30013' '50_b3006 b3007' '51_b3004 b3006 b3007'
 '52_b3006 b3008' '53_b3002 b3004 b3005 b3006'
 '54_b3003 b3004 b3005 b3006' '55_b3002 b3003 b3005 b3006'
 '56_b3002 b3003 b3004' '57_b3004 b3005 b3006'
 '58_b3002 b3003 b3004 b3006' '59_b3004 b3005 b3008'
 '60_b3002 b3003 b3005 b3008' '61_b3002 b3008 b30010'
 '62_b3002 b3003 b3004 b3007'] are both grouped into cluster -1

However, 32 out of 33 clusters contain(s) one single target group

Completeness Check:
All target groups are grouped into same cluster
In [83]:
checkConfusionMatrix(cm_kmean_30)
Homogeneity Check:
================================================================
cluster: 1 contains 2 target group(s)
target group:  ['03_b3004' '21_b3004 b3005'] are both grouped into cluster 1
================================================================
cluster: 2 contains 2 target group(s)
target group:  ['04_b3002' '43_b3001 b3002'] are both grouped into cluster 2
================================================================
cluster: 4 contains 5 target group(s)
target group:  ['06_b3002 b3003' '07_b3002 b3003 b3005' '12_b3002 b3003 b3007'
 '56_b3002 b3003 b3004' '62_b3002 b3003 b3004 b3007'] are both grouped into cluster 4
================================================================
cluster: 5 contains 3 target group(s)
target group:  ['19_b3005' '40_b3005 b3007' '47_b3003 b3005'] are both grouped into cluster 5
================================================================
cluster: 6 contains 2 target group(s)
target group:  ['39_b30013' '49_b30011 b30013'] are both grouped into cluster 6
================================================================
cluster: 8 contains 2 target group(s)
target group:  ['02_b3003 b3006' '09_b3003 b3004 b3006'] are both grouped into cluster 8
================================================================
cluster: 9 contains 3 target group(s)
target group:  ['24_b3005 b3006' '25_b3002 b3005 b3006' '57_b3004 b3005 b3006'] are both grouped into cluster 9
================================================================
cluster: 10 contains 2 target group(s)
target group:  ['01_b3003' '11_b3003 b3007'] are both grouped into cluster 10
================================================================
cluster: 11 contains 2 target group(s)
target group:  ['15_b3001' '41_b3001 b3008'] are both grouped into cluster 11
================================================================
cluster: 12 contains 2 target group(s)
target group:  ['23_b3007' '50_b3006 b3007'] are both grouped into cluster 12
================================================================
cluster: 13 contains 3 target group(s)
target group:  ['36_b3008 b30012' '37_b30012' '38_b30012 b30013'] are both grouped into cluster 13
================================================================
cluster: 14 contains 3 target group(s)
target group:  ['31_b30010' '35_b30010 b30012 b30013' '48_b3009 b30010'] are both grouped into cluster 14
================================================================
cluster: 15 contains 2 target group(s)
target group:  ['13_b3002 b3003 b3006' '58_b3002 b3003 b3004 b3006'] are both grouped into cluster 15
================================================================
cluster: 16 contains 2 target group(s)
target group:  ['05_b3003 b3004' '09_b3003 b3004 b3006'] are both grouped into cluster 16
================================================================
cluster: 18 contains 4 target group(s)
target group:  ['28_b3008 b30010' '29_b3008' '46_b3006 b3008 b30010' '52_b3006 b3008'] are both grouped into cluster 18
================================================================
cluster: 19 contains 2 target group(s)
target group:  ['22_b3004 b3007' '51_b3004 b3006 b3007'] are both grouped into cluster 19
================================================================
cluster: 20 contains 2 target group(s)
target group:  ['26_b3002 b3005 b3008' '60_b3002 b3003 b3005 b3008'] are both grouped into cluster 20
================================================================
cluster: 21 contains 2 target group(s)
target group:  ['33_b30011' '34_b30011 b30012'] are both grouped into cluster 21
================================================================
cluster: 23 contains 2 target group(s)
target group:  ['16_b3002 b3008' '61_b3002 b3008 b30010'] are both grouped into cluster 23
================================================================
cluster: 24 contains 2 target group(s)
target group:  ['27_b3005 b3008' '59_b3004 b3005 b3008'] are both grouped into cluster 24
================================================================
cluster: 26 contains 2 target group(s)
target group:  ['42_b3001 b3005 b3008' '44_b3001 b3005 b3006 b3008'] are both grouped into cluster 26
================================================================
cluster: 27 contains 3 target group(s)
target group:  ['08_b3003 b3005 b3006' '54_b3003 b3004 b3005 b3006'
 '55_b3002 b3003 b3005 b3006'] are both grouped into cluster 27
================================================================
cluster: 29 contains 3 target group(s)
target group:  ['10_b3002 b3004' '18_b3002 b3004 b3006' '53_b3002 b3004 b3005 b3006'] are both grouped into cluster 29

However, 7 out of 30 clusters contain(s) one single target group

Completeness Check:
================================================================
target: 09_b3003 b3004 b3006 has been split into 2 cluster group
while cluster 8 contains 2 members
while cluster 16 contains 1 members
there should be  3 members in target group 09_b3003 b3004 b3006

However, 62 out of 63 target groups have been grouped into same cluster
In [84]:
checkConfusionMatrix(cm_kmean_62)
Homogeneity Check:
================================================================
cluster: 48 contains 2 target group(s)
target group:  ['21_b3004 b3005' '57_b3004 b3005 b3006'] are both grouped into cluster 48

However, 61 out of 62 clusters contain(s) one single target group

Completeness Check:
All target groups are grouped into same cluster
In [85]:
checkConfusionMatrix(cm_kmean_70)
Homogeneity Check:
All cluster only contains one single target group

Completeness Check:
================================================================
target: 00_b3006 has been split into 2 cluster group
while cluster 4 contains 31 members
while cluster 64 contains 83 members
there should be  114 members in target group 00_b3006
================================================================
target: 03_b3004 has been split into 2 cluster group
while cluster 0 contains 186 members
while cluster 66 contains 112 members
there should be  298 members in target group 03_b3004
================================================================
target: 04_b3002 has been split into 2 cluster group
while cluster 1 contains 138 members
while cluster 65 contains 100 members
there should be  238 members in target group 04_b3002
================================================================
target: 06_b3002 b3003 has been split into 3 cluster group
while cluster 5 contains 27 members
while cluster 58 contains 53 members
while cluster 67 contains 9 members
there should be  89 members in target group 06_b3002 b3003
================================================================
target: 17_b3002 b3005 has been split into 2 cluster group
while cluster 7 contains 55 members
while cluster 68 contains 13 members
there should be  68 members in target group 17_b3002 b3005
================================================================
target: 39_b30013 has been split into 2 cluster group
while cluster 6 contains 18 members
while cluster 69 contains 19 members
there should be  37 members in target group 39_b30013

However, 57 out of 63 target groups have been grouped into same cluster

To summarize our analysis, we have achieved our goal of identifying the signal measurement patterns and using clustering to group all the data into an organized structure.

For this dataset, across all the eps values we tested (ranging from 3 to 50), DBSCAN keeps every cluster other than its noise label to a single target group. DBSCAN assigns all points that do not belong to any cluster to its outlier (noise) label “-1”. If eps is too small, DBSCAN splits target groups across different clusters (in cm_dbs3, 17 of the 63 target groups are split). At the best eps value, DBSCAN no longer splits any target group and simply leaves the low-density target groups in the outlier label. With eps values above the optimum, DBSCAN's results stay essentially the same and do not degrade.
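An eps sweep of the kind described above can be sketched with scikit-learn. The feature matrix `X`, target labels `y`, `min_samples=5` and the synthetic blobs are assumptions for illustration, not this notebook's data or settings; `sweep_dbscan` is a hypothetical helper. Homogeneity is computed on clustered points only, so that the noise label -1 does not count against it, while completeness treats -1 as one more cluster, mirroring the -1 column in the confusion matrices above.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs


def sweep_dbscan(X, y, eps_values, min_samples=5):
    """Fit DBSCAN for each eps and report cluster count, noise share and scores."""
    rows = []
    for eps in eps_values:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        clustered = labels != -1  # points assigned to a real cluster
        n_clusters = len(set(labels[clustered]))
        # Homogeneity over clustered points only; noise is reported separately
        homogeneity = (
            metrics.homogeneity_score(y[clustered], labels[clustered])
            if clustered.any() else np.nan
        )
        rows.append({
            "eps": eps,
            "n_clusters": n_clusters,
            "noise_fraction": float(np.mean(~clustered)),
            "homogeneity_excl_noise": homogeneity,
            # Completeness counts the noise label -1 as one more cluster
            "completeness": metrics.completeness_score(y, labels),
        })
    return pd.DataFrame(rows)


# Synthetic stand-in for the RSSI feature matrix and target groups
X_demo, y_demo = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)
print(sweep_dbscan(X_demo, y_demo, eps_values=[0.1, 0.3, 0.5, 1.0]))
```

For a fixed `min_samples`, a larger eps can only add core points and points reachable from them, so the noise fraction never increases as eps grows.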

For KMeans, across all the n_cluster values we tested (ranging from 30 to 70), if n_cluster is too low, KMeans merges multiple target groups with similar beacon combinations into one cluster, but splits almost no target group (at n_cluster = 30, only 09_b3003 b3004 b3006 is split). When n_cluster is well chosen, KMeans maps the clusters almost 1:1 onto the target groups. Beyond the optimal value, however, KMeans performance drops: it starts splitting target groups across several clusters, even though every cluster still contains a single target group.
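The n_cluster sweep behind this comparison, and the metric curves discussed earlier, can be sketched with scikit-learn's label-based clustering metrics. `X`, `y`, the range of k, `n_init=10` and `random_state=0` are assumptions; the synthetic blobs only demonstrate the mechanics and say nothing about the RSSI data. `sweep_kmeans` is a hypothetical helper.

```python
import pandas as pd
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def sweep_kmeans(X, y, k_values, random_state=0):
    """Fit KMeans for each k and score the labels against the known target groups."""
    rows = []
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        rows.append({
            "n_cluster": k,
            "homogeneity": metrics.homogeneity_score(y, labels),
            "completeness": metrics.completeness_score(y, labels),
            "v_measure": metrics.v_measure_score(y, labels),
            "ami": metrics.adjusted_mutual_info_score(y, labels),
        })
    return pd.DataFrame(rows).set_index("n_cluster")


# Synthetic stand-in with 5 known groups
X_demo, y_demo = make_blobs(n_samples=300, centers=5, cluster_std=0.5, random_state=1)
scores = sweep_kmeans(X_demo, y_demo, k_values=range(2, 9))
print(scores.round(3))
print("best n_cluster by AMI:", scores["ami"].idxmax())
```

v_measure is the harmonic mean of homogeneity and completeness, which is why it falls once extra clusters start splitting groups even while homogeneity stays flat.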

Even though we found an n_cluster value that maps clusters to target groups almost 1:1, I would still recommend DBSCAN for this dataset. It is comparatively insensitive to its input parameter, and the points it leaves undefined are sparse readings of beacon combinations that appear infrequently in the dataset; tuning eps only improves the clustering of the less dense target groups. The ideal n_cluster value for KMeans may only work for this dataset and its particular number of beacon-combination groups, and a small change in n_cluster can produce much worse clustering.
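The claim that DBSCAN's noise label mostly holds infrequent beacon combinations can be checked by tabulating, for each target group, its size and the share of its readings labelled -1. A sketch under assumptions: `y` holds each reading's target-group label and `labels` the DBSCAN output in the same order; `noise_by_group` is a hypothetical helper and the toy data is made up for illustration. If the claim holds, high noise fractions should concentrate in the smallest groups at the top of the table.

```python
import numpy as np
import pandas as pd


def noise_by_group(y, labels):
    """Return each target group's size and the fraction of its readings labelled -1."""
    df = pd.DataFrame({
        "group": np.asarray(y),
        "is_noise": np.asarray(labels) == -1,
    })
    # Named aggregation: group size and mean of the boolean noise flag
    summary = df.groupby("group")["is_noise"].agg(size="size", noise_fraction="mean")
    return summary.sort_values("size")


# Toy example: group C is small and entirely noise, group A is large and mostly clustered
y_demo = ["A"] * 6 + ["B"] * 3 + ["C"] * 2
labels_demo = [0, 0, 0, 0, 0, -1, 1, 1, 1, -1, -1]
print(noise_by_group(y_demo, labels_demo))
```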
