R - plot/ggplot2 - Fill area with too many points
Final implementation (not finished, but heading the right way)
Idea/problem: You have a plot with many overlapping points and want to replace them with a plain area, thereby increasing performance when viewing the plot.
Possible implementation: Calculate a distance matrix between all points and connect all points that lie below a specified distance. Single-linkage hierarchical clustering, cut at that distance, yields exactly these connected groups.
Todo/not finished: This currently works for manually set distances that depend on the size of the printed plot. I stopped here because the outcome didn't meet my aesthetic sense.
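To make the distance-based connection rule described above concrete, here is a minimal base-R sketch on six hand-picked points. The coordinates and the threshold of 0.15 are assumptions chosen for illustration, not values from the post. Note that points 1 and 3 are 0.2 apart, yet end up in the same group because point 2 links them.

```r
# Six points on a line; the gaps between neighbours are 0.1, 0.1, 0.8, 0.1, 1.9
pts <- cbind(x = c(0, 0.1, 0.2, 1.0, 1.1, 3.0), y = rep(0, 6))

# Single linkage merges two clusters as soon as any pair of their points is close
fit <- hclust(dist(pts), method = "single")

# Cutting at h = 0.15 keeps only the links shorter than 0.15 (illustrative value)
groups <- cutree(fit, h = 0.15)

# Points chained together by short links share a group id
print(split(seq_len(nrow(pts)), groups))
```

Because of this chaining, a dense cloud collapses into one group even when its extremes are far apart, which is what makes a single hull per cloud possible.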
Minimal example with intermediate plots
    set.seed(074079089)
    n.points <- 3000
    mat <- matrix(rnorm(n.points * 2, 0, 0.2), nrow = n.points, ncol = 2)
    colnames(mat) <- c("x", "y")

    d.mat <- dist(mat)
    fit.mat <- hclust(d.mat, method = "single")

    lims <- c(-1, 1)
    real.lims <- lims * 1.1  # ggplot expands the limits approximately like this

    # An attempt to estimate the point size; works for default PDFs, e.g. pdf("test.pdf")
    cutsize <- sum(abs(real.lims)) / 100
    groups <- cutree(fit.mat, h = cutsize)  # cut the tree at height cutsize

    # plot(fit.mat)  # display the dendrogram
    # draw the dendrogram with red borders around the clusters
    # rect.hclust(fit.mat, h = cutsize, border = "red")

    library(ggplot2)
    df <- data.frame(mat)
    df$groups <- groups

    plot00 <- ggplot(data = df, aes(x, y, col = factor(groups))) +
      geom_point() +
      guides(col = "none") +
      xlim(lims) + ylim(lims) +
      ggtitle("each color is a group")

    pdf("plot00.pdf")
    print(plot00)
    dev.off()
    # If fewer than 4 points are connected, show them separately
    t.groups <- table(groups)  # how large each group is
    drop.group <- as.numeric(names(t.groups[t.groups < 4]))  # groups with fewer than 4 points
    groups[groups %in% drop.group] <- 0  # are all put into group 0
    df$groups <- groups

    plot01 <- ggplot(data = df, aes(x, y, col = factor(groups))) +
      geom_point() +
      xlim(lims) + ylim(lims) +
      scale_color_hue(l = 10)

    pdf("plot01.pdf")
    print(plot01)
    dev.off()
    find_hull <- function(df_0) {
      return(df_0[chull(df_0$x, df_0$y), ])
    }

    library(plyr)
    single.points.df <- df[df$groups == 0, ]
    connected.points.df <- df[df$groups != 0, ]
    hulls <- ddply(connected.points.df, "groups", find_hull)  # find the hull of each group

    plot02 <- ggplot() +
      geom_point(data = single.points.df, aes(x, y, col = factor(groups))) +
      xlim(lims) + ylim(lims) +
      scale_color_hue(l = 10)

    pdf("plot02.pdf")
    print(plot02)
    dev.off()
    plot03 <- plot02
    for (grp in names(table(hulls$groups))) {
      plot03 <- plot03 +
        geom_polygon(data = hulls[hulls$groups == grp, ], aes(x, y), alpha = 0.4)
    }
    # print(plot03)

    plot01 <- plot01 + theme(legend.position = "none")
    plot03 <- plot03 + theme(legend.position = "none")
    # multiplot(plot01, plot03, cols = 2)

    pdf("plot03.pdf")
    print(plot03)
    dev.off()
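The steps above (cluster, drop small groups, take hulls) could be condensed into one helper. The following is only a sketch under assumptions: the name crowded_hulls, its arguments, and the example values h = 0.03 and min.size = 4 are hypothetical and not part of the original implementation. It uses base R only, so it does not need plyr.

```r
# Hypothetical helper: returns the hull vertices of every single-linkage
# cluster with at least `min.size` points, plus the remaining loose points.
crowded_hulls <- function(df, h, min.size = 4) {
  groups <- cutree(hclust(dist(df[, c("x", "y")]), method = "single"), h = h)
  sizes <- table(groups)
  big <- as.integer(names(sizes)[sizes >= min.size])

  # One block of hull vertices per large cluster, tagged with its group id
  hulls <- lapply(big, function(g) {
    pts <- df[groups == g, c("x", "y")]
    cbind(pts[chull(pts$x, pts$y), ], group = g)
  })

  list(
    hulls  = do.call(rbind, hulls),              # NULL if no cluster is large enough
    single = df[!(groups %in% big), c("x", "y")] # points to draw individually
  )
}

set.seed(1)
df <- data.frame(x = rnorm(500, 0, 0.2), y = rnorm(500, 0, 0.2))
res <- crowded_hulls(df, h = 0.03)

# Compare how many vertices and loose points remain with the original point count
c(hull.vertices = nrow(res$hulls), single.points = nrow(res$single), all = nrow(df))
```

With ggplot2, the result can be drawn in one layer instead of a loop, for example with geom_polygon(data = res$hulls, aes(x, y, group = group), alpha = 0.4) plus geom_point(data = res$single, aes(x, y)).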
Initial question
I have a (maybe odd) question.
In some plots, I have thousands of points in my analysis. To display them, the PC takes quite a bit of time because there are so many points. Many of these points overlap, so I end up with a filled area anyway (which is fine!). To save time and effort when displaying, it would be useful to just fill the area instead of plotting each point on its own.
I know there are possibilities such as heatmaps and so on, but this is not the idea I have in mind. My idea is something like:
    # plot00: ggplot with many, many points and a filled area of points
    plot00 <- plot00 + fill.crowded.areas()
    # with plot(), I sadly have no idea how to manage this
Any ideas? Or is this something nobody would ever do?
    # Example code
    # install.packages("ggplot2")
    library(ggplot2)

    n.points <- 10000
    mat <- matrix(rexp(n.points * 2), nrow = n.points, ncol = 2)
    colnames(mat) <- c("x", "y")
    df <- data.frame(mat)

    plot00 <- ggplot(df, aes(x = x, y = y)) +
      theme_bw() +            # white background, grey strips
      geom_point(shape = 19)  # appearance of the points
    print(plot00)

    # Without ggplot2
    plot(df, pch = 19)
Edit:
Regarding density plots as mentioned by fdetsch (how can I mark the name?): there are already some questions concerning this topic, but this is not exactly what I want. I know my concern is a bit strange, but densities sometimes make a plot busier than necessary.
Links to topics on densities:
You could use a robust estimator to estimate the location of the majority of your points and plot the convex hull of those points as follows:
    set.seed(1337)
    n.points <- 500
    mat <- matrix(rexp(n.points * 2), nrow = n.points, ncol = 2)
    colnames(mat) <- c("x", "y")
    df <- data.frame(mat)

    library(robustbase)
    my_poly <- function(data, a, ...) {
      cov_rob <- covMcd(data, alpha = a)  # robust MCD fit
      df_rob <- data[cov_rob$best, ]      # points of the best subset
      ch <- chull(df_rob$x, df_rob$y)     # convex hull of that subset
      geom_polygon(data = df_rob[ch, ], aes(x, y), ...)
    }

    library(ggplot2)
    ggplot() +
      geom_point(data = df, aes(x, y)) +
      my_poly(df, a = 0.5, fill = 2, alpha = 0.5) +
      my_poly(df, a = 0.7, fill = 3, alpha = 0.5)
This leads to a scatter plot overlaid with two semi-transparent convex hulls, one for each alpha value.
By controlling the alpha value of covMcd you can increase or decrease the size of the area. See ?robustbase::covMcd for details. By the way, MCD stands for Minimum Covariance Determinant. Instead of it you can also use MASS::cov.mve to calculate the minimum volume ellipsoid, where the quantile.used argument of MASS::cov.mve controls how many of the points lie within the ellipsoid.
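As a rough sketch of the MASS::cov.mve alternative mentioned above: the code below fits the minimum volume ellipsoid with default settings and then keeps the points closest to the robust center. Selecting the closest half through robust Mahalanobis distances is an assumption made for this illustration, not something taken from the answer, and the 0.5 cutoff is arbitrary.

```r
library(MASS)  # recommended package that ships with R

set.seed(1337)
n.points <- 500
df <- data.frame(x = rexp(n.points), y = rexp(n.points))

# Robust center and scatter matrix from the minimum volume ellipsoid
fit <- cov.mve(df)

# Squared robust distance of every point from the robust center
d2 <- mahalanobis(df, center = fit$center, cov = fit$cov)

# Assumption: keep the half of the points with the smallest robust distance
inside <- df[d2 <= quantile(d2, 0.5), ]

# Convex hull of the retained points, ordered for polygon drawing
hull <- inside[chull(inside$x, inside$y), ]
nrow(hull)
```

The hull data frame can then be passed to geom_polygon(data = hull, aes(x, y), alpha = 0.5) in the same way as in my_poly.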
For 2+ classes:
    my_poly2 <- function(data, a) {
      cov_rob <- covMcd(data, alpha = a)
      df_rob <- data[cov_rob$best, ]
      ch <- chull(df_rob[, 1], df_rob[, 2])
      df_rob[ch, ]  # hull vertices of the robust subset
    }

    ggplot(faithful, aes(waiting, eruptions, color = eruptions > 3)) +
      geom_point() +
      geom_polygon(data = my_poly2(faithful[faithful$eruptions > 3, ], a = 0.5),
                   aes(waiting, eruptions), fill = 2, alpha = 0.5) +
      geom_polygon(data = my_poly2(faithful[faithful$eruptions < 3, ], a = 0.5),
                   aes(waiting, eruptions), fill = 3, alpha = 0.5)
Or, if you are OK with non-robust ellipsoids, have a look at stat_ellipse.
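A minimal sketch of the stat_ellipse variant on the same faithful data might look like this. The choice of type = "norm", level = 0.9 and the fill mapping are assumptions for illustration; the answer itself only names the function.

```r
library(ggplot2)

p <- ggplot(faithful, aes(waiting, eruptions, color = eruptions > 3)) +
  geom_point() +
  # One filled ellipse per class; type = "norm" assumes bivariate normal data
  stat_ellipse(aes(fill = eruptions > 3), geom = "polygon",
               type = "norm", level = 0.9, alpha = 0.3)
print(p)
```

Unlike the hull-based versions, the ellipse is computed from the ordinary mean and covariance of each class, so outliers can stretch it.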