Monday, 16 May 2011

Dealing with large R plots

Had a small problem today. I was preparing several plots for a manuscript, and there was a lot of number crunching involved: one plot alone comprised over 1.5 million data points! My format of choice when saving out of R is .pdf, because this vector-based format is scalable to any resolution, searchable, and easily compiled into another document by pdfLaTeX. However, with that much data the pdf ended up over 25 MB, which is a little too large! I could have gone down the low-quality raster route (e.g. a .png) to reduce the file size, but I didn't really want to because of the crappy, pixelated appearance the text would take on.

What I needed was a compromise. Normally I like to get the bulk of a plot done in R and then polish it up in Inkscape (it's so much easier that way), and I adopted a similar approach here. Using the ggplot2 package, I removed all extraneous axis labelling with options including:

qplot() + opts(axis.title.x=theme_blank(), axis.title.y=theme_blank())

I did the same when not using ggplot2, with the standard R graphics commands:

plot(dat, xlab="", ylab="", xaxt="n", yaxt="n")  # suppress axis titles and axes
axis(side=1, labels=FALSE)  # redraw tick marks, but without numbers
axis(side=2, labels=FALSE)
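To make that concrete, here is a self-contained sketch of the same idea on simulated data (the dataset and filename are stand-ins, not the ones from the manuscript). It also shows why the vector route gets heavy: every point is stored individually in the pdf, so the file grows with the data.

```r
# Sketch of the label-suppression approach, saved as a (vector) pdf.
# "dat" stands in for the real, much larger dataset.
set.seed(42)
dat <- data.frame(x = rnorm(50000), y = rnorm(50000))

pdf("plain.pdf")
plot(dat, xlab = "", ylab = "", xaxt = "n", yaxt = "n", pch = ".")
axis(side = 1, labels = FALSE)  # tick marks only, no numbers
axis(side = 2, labels = FALSE)
dev.off()

file.info("plain.pdf")$size  # every point is stored individually, so this grows with n
```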

Then, I exported it as a high quality (600 dpi) png:

png(file="file.png", width=11.7, height=11.7, units="in", res=600)
# ... plotting commands as above ...
dev.off()  # the file is only written once the device is closed
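A runnable version of this export step, with simulated data standing in for the real thing (res is lowered from 600 to 100 here just to keep the example light — use res=600 for print quality):

```r
# End-to-end sketch: open a PNG device, draw the stripped-down plot,
# close the device. The data and filename are made up for illustration.
set.seed(1)
dat <- data.frame(x = rnorm(100000), y = rnorm(100000))

png(file = "file.png", width = 11.7, height = 11.7, units = "in", res = 100)
plot(dat, xlab = "", ylab = "", xaxt = "n", yaxt = "n", pch = ".")
axis(side = 1, labels = FALSE)
axis(side = 2, labels = FALSE)
dev.off()  # closing the device writes file.png
```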

This resulted in a series of plain, unlabelled plots: just the data and tick marks, with no text at all.

Next, I simply imported this png into Inkscape and added the axis labels myself as text. The file was then saved as an .svg master copy, with copies exported as .pdf.

The resulting file was much smaller, at under 1 MB (I could probably have made it smaller still), and retained the all-important vector-graphic text. The plotted data were still embedded as a raster, but at high enough quality that I could live with it.