r - plot/ggplot2 - Fill area with too many points


Final implementation - not finished, but heading the right way

Idea/Problem: You have a plot with many overlapping points and want to replace them with a plain area, thereby increasing performance when viewing the plot.

Possible implementation: Calculate a distance matrix between all points and connect all points that lie closer together than a specified distance.

Todo/Not finished: This currently works for manually set distances that depend on the size of the printed plot. I stopped here because the outcome didn't meet my aesthetic sense.

Minimal example with intermediate plots
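As a minimal sketch of what "connect points below a specified distance" means with single-linkage clustering, here is a toy example in base R. The coordinates and the cut height of 0.06 are made up for illustration.

```r
# Five 2-D points: three close together in a row, two far away from everything.
pts <- rbind(c(0.00, 0.00),
             c(0.05, 0.00),
             c(0.10, 0.00),
             c(1.00, 1.00),
             c(2.00, 2.00))

# Single linkage merges clusters by their closest pair of points, so cutting
# the tree at height h joins every set of points that is linked by a chain of
# steps shorter than h.
fit <- hclust(dist(pts), method = "single")
groups <- cutree(fit, h = 0.06)
print(groups)
```

The first and third point are 0.1 apart, which is more than the cut height, but they still end up in the same group because the second point links them. This chaining is what lets a dense cloud of points collapse into one connected area.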

    set.seed(074079089)
    n.points <- 3000

    mat <- matrix(rnorm(n.points*2, 0, 0.2), nrow=n.points, ncol=2)
    colnames(mat) <- c("x", "y")

    d.mat <- dist(mat)
    fit.mat <- hclust(d.mat, method = "single")
    lims <- c(-1, 1)
    real.lims <- lims*1.1               ## ggplot expands the limits by roughly this much

    # attempt to estimate the point sizes; works for default pdfs
    cutsize <- sum(abs(real.lims))/100
    groups <- cutree(fit.mat, h=cutsize) # cut tree at height cutsize
    # plot(fit.mat) # display dendrogram

    # draw dendrogram with red borders around the clusters
    # rect.hclust(fit.mat, h=cutsize, border="red")

    library(ggplot2)
    df <- data.frame(mat)
    df$groups <- groups
    plot00 <- ggplot(data=df, aes(x, y, col=factor(groups))) +
        geom_point() + guides(col="none") + xlim(lims) + ylim(lims) +
        ggtitle("each color is a group")
    pdf("plot00.pdf")
    print(plot00)
    dev.off()

plot00 - points with group color

    # if fewer than 4 points are connected, show them separately
    t.groups <- table(groups)   # how many points are in each group
    drop.group <- as.numeric(names(t.groups[t.groups<4]))   # groups with fewer than 4 points are put
    groups[groups %in% drop.group] <- 0                     # into group 0
    df$groups <- groups
    plot01 <- ggplot(data=df, aes(x, y, col=factor(groups))) +
        geom_point() + xlim(lims) + ylim(lims) +
        scale_color_hue(l=10)
    pdf("plot01.pdf")
    print(plot01)
    dev.off()

plot01 - all single points in one group

    find_hull <- function(df_0)
    {
        return(df_0[chull(df_0$x, df_0$y), ])
    }

    library(plyr)
    single.points.df <- df[df$groups == 0, ]
    connected.points.df <- df[df$groups != 0, ]
    hulls <- ddply(connected.points.df, "groups", find_hull) # for each group, find the convex hull
    plot02 <- ggplot() +
        geom_point(data=single.points.df, aes(x, y, col=factor(groups))) +
        xlim(lims) + ylim(lims) +
        scale_color_hue(l=10)
    pdf("plot02.pdf")
    print(plot02)
    dev.off()

plot02 - only "single" points (fewer than 4 connected points)

    plot03 <- plot02
    for(grp in names(table(hulls$groups))) {
        plot03 <- plot03 + geom_polygon(data=hulls[hulls$groups==grp, ],
                                        aes(x, y), alpha=0.4)
    }
    # print(plot03)
    plot01 <- plot01 + theme(legend.position="none")
    plot03 <- plot03 + theme(legend.position="none")
    # multiplot(plot01, plot03, cols=2)
    pdf("plot03.pdf")
    print(plot03)
    dev.off()

plot03 - final
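The loop above adds one geom_polygon() layer per group, which can get slow when there are many groups. A possible alternative, sketched here on made-up toy data, is to compute all hulls with base R and draw them in a single layer with the `group` aesthetic. The toy `df` and its `groups` column stand in for the cutree() result and are assumptions for illustration only.

```r
library(ggplot2)

set.seed(1)
# Toy data: two tight blobs; `groups` stands in for the cutree() result.
df <- data.frame(x = c(rnorm(50, -0.5, 0.05), rnorm(50, 0.5, 0.05)),
                 y = c(rnorm(50, -0.5, 0.05), rnorm(50, 0.5, 0.05)),
                 groups = rep(1:2, each = 50))

# Convex hull of every group, using base R instead of plyr::ddply().
# chull() returns the hull vertices in drawing order, and rbind() keeps
# that order within each group.
hull_list <- lapply(split(df, df$groups),
                    function(d) d[chull(d$x, d$y), ])
hulls <- do.call(rbind, hull_list)

# One polygon layer; the `group` aesthetic keeps the hulls separate.
p <- ggplot(hulls, aes(x, y, group = groups)) +
  geom_polygon(alpha = 0.4) +
  xlim(-1, 1) + ylim(-1, 1)
print(p)
```

The single points (group 0) would still be added with a separate geom_point() layer, as in plot02.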

Initial question

I have a (maybe odd) question.

In some plots, I have thousands of points in my analysis. To display them, the PC takes quite a bit of time because there are so many points. Many of these points can overlap, so I end up with a filled area (which is fine!). To save time/effort when displaying, it would be useful to just fill this area instead of plotting each point on its own.

I know there are possibilities such as heatmaps and so on, but that is not the idea I have in mind. My idea is something like:

    # plot00: ggplot with many, many points and a filled area of points
    plot00 <- plot00 + fill.crowded.areas()

    # with plot(), I sadly have no idea how to manage this

Any ideas? Or is this something nobody would ever do?

    # example code
    # install.packages("ggplot2")
    library(ggplot2)

    n.points <- 10000
    mat <- matrix(rexp(n.points*2), nrow=n.points, ncol=2)
    colnames(mat) <- c("x", "y")
    df <- data.frame(mat)
    plot00 <- ggplot(df, aes(x=x, y=y)) +
        theme_bw() +                       # white background, grey strips
        geom_point(shape=19)               # appearance of the points

    print(plot00)

ggplot2

    # without ggplot2
    plot(df, pch=19)

plot

Edit:
I have looked at density plots as mentioned by fdetsch (how can I mark the name?). There are some questions concerning this topic, but they are not exactly what I want. I know my concern is a bit strange, but the densities make a plot busier than necessary.

Links to topics with densities:

Scatterplot with too many points
High density scatter plots

You could use a robust estimator to estimate the location of the majority of your points and plot the convex hull of those points as follows:

    set.seed(1337)
    n.points <- 500
    mat <- matrix(rexp(n.points*2), nrow=n.points, ncol=2)
    colnames(mat) <- c("x", "y")
    df <- data.frame(mat)

    require(robustbase)
    my_poly <- function(data, a, ...){
      cov_rob = covMcd(data, alpha = a)   # robust fit on a share `a` of the points
      df_rob = data[cov_rob$best,]        # keep only the points of the best subset
      ch = chull(df_rob$x, df_rob$y)      # convex hull of that subset
      geom_polygon(data = df_rob[ch,], aes(x,y), ...)
    }

    require(ggplot2)
    ggplot() +
      geom_point(data=df, aes(x,y)) +
      my_poly(df, a = 0.5, fill=2, alpha=0.5) +
      my_poly(df, a = 0.7, fill=3, alpha=0.5)

This leads to:

[Figure: scatter plot of df with the two convex-hull polygons (a = 0.5 and a = 0.7) overlaid]

By controlling the alpha value of covMcd you can increase/decrease the size of the area. See ?robustbase::covMcd for details. Btw.: MCD stands for Minimum Covariance Determinant. Instead of it you can also use MASS::cov.mve to calculate the minimum volume ellipsoid with MASS::cov.mve(..., quantile.used = ...), where quantile.used is the number of points to keep within the ellipsoid (a count, not a percentage).

For 2+ classes:
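A minimal sketch of the MASS::cov.mve variant, under some assumptions: the helper name `mve_hull` and its `frac` argument are hypothetical, and instead of relying on the subset returned by the fit, the sketch keeps the points with the smallest Mahalanobis distance to the robust centre.

```r
library(MASS)  # ships with R; provides cov.mve()

set.seed(1337)
n.points <- 500
df <- data.frame(x = rexp(n.points), y = rexp(n.points))

# Hypothetical helper: convex hull of the `frac` share of points that lie
# closest (in Mahalanobis distance) to the robust MVE centre.
mve_hull <- function(data, frac = 0.5) {
  m <- as.matrix(data)
  n_keep <- floor(frac * nrow(m))
  fit <- cov.mve(m, quantile.used = n_keep)   # quantile.used is a count
  d2 <- mahalanobis(m, fit$center, fit$cov)   # squared robust distances
  core <- data[order(d2)[seq_len(n_keep)], ]  # the n_keep most central points
  core[chull(core$x, core$y), ]               # hull vertices in drawing order
}

hull <- mve_hull(df, frac = 0.5)
print(nrow(hull))
```

The resulting data frame can be passed to geom_polygon(data = hull, aes(x, y), fill = 2, alpha = 0.5) in the same way as the hull from my_poly.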

    my_poly2 <- function(data, a){
      cov_rob = covMcd(data, alpha = a)
      df_rob = data[cov_rob$best,]
      ch = chull(df_rob[,1], df_rob[,2])
      df_rob[ch,]
    }

    ggplot(faithful, aes(waiting, eruptions, color = eruptions > 3)) +
      geom_point() +
      geom_polygon(data = my_poly2(faithful[faithful$eruptions > 3,], a=0.5), aes(waiting, eruptions), fill = 2, alpha = 0.5) +
      geom_polygon(data = my_poly2(faithful[faithful$eruptions < 3,], a=0.5), aes(waiting, eruptions), fill = 3, alpha = 0.5)

[Figure: faithful scatter plot with one convex-hull polygon per class]

Or, if you are OK with non-robust ellipsoids, have a look at stat_ellipse.
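A short sketch of the stat_ellipse route on the same faithful data. The choices of type = "norm", level = 0.95 and the alpha value are illustrative assumptions, not settings from the answer above.

```r
library(ggplot2)

# One filled ellipse per class; the logical colour/fill mapping defines
# the two groups.
p <- ggplot(faithful, aes(waiting, eruptions,
                          color = eruptions > 3, fill = eruptions > 3)) +
  geom_point() +
  stat_ellipse(geom = "polygon", type = "norm", level = 0.95, alpha = 0.3)
print(p)
```

With type = "norm" the ellipse is based on the ordinary (non-robust) mean and covariance, so outliers inflate it more than they would the covMcd hull.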

