Clustering non-numeric -- or categorial -- data is surprisingly difficult, but it's explained here by resident data scientist Dr. James McCaffrey of Microsoft Research, who provides all the code you ...
K-means is comparatively simple and works well with large datasets, but it assumes clusters are circular/spherical in shape, so it can only find simple cluster geometries. Data clustering is the ...