K-Means Clustering chapters: zipcode library retired #3

@enzedonline

The zipcode library used in the exercises was retired a while back, which makes the example tricky to follow.

I managed to get most of it working by switching to zipcodeR and amending the code as follows:

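# zipcodeR stands in for the retired zipcode package
install.packages("zipcodeR")
library(zipcodeR)
# look up every ZIP code in New York state
zipcode <- search_state("NY")
# upper-case city names
zipcode$city2 <- toupper(zipcode$major_city)
# left join: keep every row of ds, matched or not
ds <- merge(ds, zipcode, by.x = "Zip.Code", by.y = "zipcode", all.x = TRUE)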
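
One thing worth checking after the merge above is whether every row actually found a match. zipcodeR stores ZIP codes as five-character strings, so if Zip.Code was read in as a number (an assumption, I have not checked how the chapter loads it), any leading zeros are lost and those rows come back as NA. A minimal sketch using toy stand-in tables with made-up values, base R only:

```r
# Toy stand-ins for ds and the zipcodeR lookup (hypothetical values)
ds <- data.frame(Zip.Code = c(10001, 12207, 501), Customers = c(5, 3, 1))
zipcode <- data.frame(
  zipcode = c("10001", "12207", "00501"),
  major_city = c("New York", "Albany", "Holtsville"),
  stringsAsFactors = FALSE
)

# Pad numeric ZIP codes back to five characters so the keys have the same type
ds$Zip.Code <- sprintf("%05d", as.integer(ds$Zip.Code))

merged <- merge(ds, zipcode, by.x = "Zip.Code", by.y = "zipcode", all.x = TRUE)

# Rows that found no match in the lookup table
unmatched <- merged[is.na(merged$major_city), ]
print(nrow(unmatched))
```

If the unmatched count is not zero on the real data, those are the rows that will end up without coordinates later on.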

Additionally, if you would rather work in kilometres than miles, you can adjust the code as follows:

kilometres <- merge(data.cl, centers, by.x = "clust", by.y = "clust")
# start with an empty vector
kms <- c()
# for each row in the kilometres table, calculate the distance in km from the point to the node centre
# (distVincentyEllipsoid is from the geosphere package and returns metres, hence the / 1000)
for (i in seq_len(nrow(kilometres))) {
  kms.temp <- round(as.numeric(distVincentyEllipsoid(c(kilometres$x.x[i], kilometres$y.x[i]), c(kilometres$x.y[i], kilometres$y.y[i])) / 1000), 0)
  kms <- c(kms, kms.temp)
}
# push the distance data into the kilometres data frame
kilometres$kilometres <- kms

# calculate max distance and total distance for 2 node model
mx.dist2 <- max(kilometres$kilometres, na.rm = TRUE)
tot.kms2 <- sum(kilometres$kilometres, na.rm = TRUE)
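
The loop above can probably be dropped altogether, since the geosphere distance functions accept two-column matrices of points and return one distance per row. A rough sketch, assuming the geosphere package is installed and that x holds longitude and y holds latitude (geosphere expects longitude first); the toy data frame and its values are made up, only the column names follow the code above:

```r
library(geosphere)

# Toy stand-in for the merged table: point (x.x, y.x) and its centre (x.y, y.y)
kilometres <- data.frame(
  x.x = c(-74.0060, -73.7562), y.x = c(40.7128, 42.6526),
  x.y = c(-73.7562, -73.7562), y.y = c(42.6526, 42.6526)
)

# Two-column (longitude, latitude) matrices, one row per point
points  <- as.matrix(kilometres[, c("x.x", "y.x")])
centres <- as.matrix(kilometres[, c("x.y", "y.y")])

# One call for all rows; the result is in metres, so divide by 1000 for km
kilometres$kilometres <- round(distVincentyEllipsoid(points, centres) / 1000, 0)

print(kilometres$kilometres)
```

This should give the same numbers as the loop, without growing a vector one element at a time.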

Something curious (or spurious) is going on with the k-means results: both the total distance and the max distance peaked at 6 nodes, so something looks off with the algorithm there ...

totals <- c(tot.kms1, tot.kms2, tot.kms3, tot.kms4, tot.kms5, tot.kms6, tot.kms7, tot.kms8, tot.kms9, tot.kms10)
max.kms <- c(mx.dist1, mx.dist2, mx.dist3, mx.dist4, mx.dist5, mx.dist6, mx.dist7, mx.dist8, mx.dist9, mx.dist10)
df.analysis <- data.frame(clusters = 1:10, totals, max.kms)

ggplot(df.analysis) + 
  geom_bar(mapping = aes(x = clusters, y = totals/1000), stat = "identity", fill = "black") +
  geom_line(mapping = aes(x = clusters, y = max.kms*10), size = 2, color = "blue") + 
  scale_x_continuous(breaks=scales::breaks_width(1)) +
  scale_y_continuous(name = "Total distance ('000 km)", 
                     sec.axis = sec_axis(~ . / 10, name = "Max distance (km)")) + 
  theme(
    axis.title.y = element_text(color = "black"),
    axis.title.y.right = element_text(color = "blue"))

[image: the chart produced by the code above]
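
One possible explanation for the peak at 6 nodes, offered as a guess rather than a diagnosis: kmeans() starts from randomly chosen centres and can settle on a poor local optimum, and what it minimises is the within-cluster sum of squares on the raw coordinates, not the total or maximum distance in km, so neither of those has to fall steadily as nodes are added. A quick way to test the random-start part is to fix the seed and let kmeans() keep the best of several starts via nstart. A minimal sketch on synthetic points (the data and the choice of 25 starts are assumptions, not taken from the chapter):

```r
set.seed(42)  # make the random starts reproducible

# Synthetic points standing in for the longitude/latitude data (made-up values)
pts <- data.frame(x = runif(200, -79, -72), y = runif(200, 40.5, 45))

# Total within-cluster sum of squares for k = 1..10,
# keeping the best result out of 25 random starts for each k
wss <- sapply(1:10, function(k) {
  kmeans(pts, centers = k, nstart = 25, iter.max = 50)$tot.withinss
})

print(round(wss, 1))
```

If the peak at 6 disappears on the real data once nstart is raised, it was most likely an unlucky start rather than a fault in the algorithm.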
