In this post, I perform a similar type of clustering using slightly different methods (details forthcoming). I'm not providing m̶u̶c̶h̶ any of the methodological details in this post, but I just wanted to show a pretty plot with nice clusters.
Drugs were embedded using t-SNE (here and here), a method that is remarkably good at taking high-dimensional data (in this case, thousands of different side effects) and projecting it into two dimensions while preserving much of the "nearest neighbor" information that was present in the high-dimensional space.
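
To make the t-SNE step a bit more concrete, here is a minimal sketch using scikit-learn's `TSNE`. It is not the code behind the plot in this post: the drug-by-side-effect matrix is randomly generated stand-in data, and the matrix size, the three planted groups, and the `perplexity` value are all assumptions chosen only so the example runs quickly.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 60 "drugs" x 200 "side effects" (1 = reported).
# The real data would have thousands of side-effect columns.
n_drugs, n_effects = 60, 200
X = (rng.random((n_drugs, n_effects)) < 0.05).astype(float)

# Plant three groups of 20 drugs that each share a block of 30 side effects,
# so there is some neighborhood structure for t-SNE to preserve.
for g in range(3):
    rows = slice(g * 20, (g + 1) * 20)
    cols = slice(g * 30, (g + 1) * 30)
    X[rows, cols] = rng.random((20, 30)) < 0.6

# Embed the 200-dimensional rows into 2-D. Perplexity must be smaller than
# the number of samples; 15 is an arbitrary choice for this toy data.
tsne = TSNE(n_components=2, perplexity=15, init="random", random_state=0)
embedding = tsne.fit_transform(X)

print(embedding.shape)  # one (x, y) pair per drug
```

Each row of `embedding` is the 2-D position of one drug, which is what would get drawn in a scatter plot.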
You can see the 2-D "cluster plot" (after t-SNE transformation) below.
- Hover over the data points to see drug names.
- I colored the data points by running k-means on the 2-D data (very quick and dirty and probably not the best approach).
- Still, you can see pretty good functional clustering. For example:
  - the dark green points on the left edge of the plot are almost all related to cholesterol and lipid pharmacotherapy. If you zoom in, you will also find daptomycin (an anti-bacterial), but it clusters there b/c it is associated with reports of myopathy and rhabdomyolysis (adverse events that are somewhat characteristic of the "statins").
  - red and blue points on the left edge are primarily involved w/ Type 2 diabetes.
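
The coloring step mentioned in the list above (k-means on the 2-D points) can be sketched roughly like this. This is a plain-Python illustration of standard k-means (Lloyd's algorithm), not the implementation used for the plot; the function name `kmeans_2d`, the sample points, the palette, and the choice of `k=3` are all hypothetical.

```python
import random


def kmeans_2d(points, k, n_iter=50, seed=0):
    """Plain k-means on a list of (x, y) tuples. Returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers


# Hypothetical 2-D coordinates, standing in for t-SNE output.
points = [(-9.0, 0.5), (-8.5, 1.0), (-9.2, -0.3),
          (0.1, 7.9), (0.4, 8.3), (-0.2, 8.1),
          (6.0, -5.0), (6.3, -4.6), (5.8, -5.4)]

labels, centers = kmeans_2d(points, k=3)

# Map each cluster label to a plot color.
palette = ["red", "blue", "darkgreen"]
colors = [palette[lab] for lab in labels]
print(colors)
```

Because k-means only sees the 2-D coordinates, the colors reflect proximity in the plot rather than similarity in the original side-effect space, which is part of why this is a quick-and-dirty choice.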