Skip to main content

R Tip: Use drop = FALSE with data.frames

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

Another R tip. Get in the habit of using drop = FALSE when indexing (using [ , ] on) data.frames.

NewImage

Prince Rupert’s drops (img: Wikimedia Commons)

In R, single column data.frames are often converted to vectors when manipulated. For example:

d <- data.frame(x = seq_len(3))
print(d)
#>   x
#> 1 1
#> 2 2
#> 3 3
# not a data frame!
d[order(-d$x), ]
#> [1] 3 2 1

We were merely trying to re-order the rows and the result was converted to a vector. This happened because the rules for [ , ] change if there is only one result column. This happens even if the there had been only one input column. Another example is: d[,] is also vector in this case.

The issue is: if we are writing re-usable code we are often programming before we know complete contents of a variable or argument. For a data.frame named “g” supplied as an argument: g[vec, ] can be a data.frame or a vector (or even possibly a list). However we do know if g is a data.frame then g[vec, , drop = FALSE] is also a data.frame (assuming vec is a vector of valid row indices or a logical vector, note: NA induces some special cases).

We care as vectors and data.frames have different semantics, so are not fully substitutable in later code.

The fix is to include drop = FALSE as a third argument to [ , ].

# is a data frame.
d[order(-d$x), , drop = FALSE]
#>   x
#> 3 3
#> 2 2
#> 1 1

To pull out a column I suggest using one of the many good extraction notations (all using the fact a data.frame is officially a list of columns):

d[["x"]]
#> [1] 1 2 3

d$x
#> [1] 1 2 3

d[[1]]
#> [1] 1 2 3

My overall advice is: get in the habit of including drop = FALSE when working with [ , ] and data.frames. I say do this even when it is obvious that the result does in fact have more than one column.

For example write “mtcars[, c("mpg", "cyl"), drop = FALSE]” instead of “mtcars[, c("mpg", "cyl")]“. It is clear that for data.frames both forms should work the same (either selecting a data frame with two columns, or throwing an error if we have mentioned a non existent column). But longer drop = FALSE form is safer (go further towards ensuring type stable code) and more importantly documents intent (that you wanted a data.frame result).

One can also try base::subset(), as it has non-dropping defaults.

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...


from R-bloggers http://ift.tt/2EYloz6
via IFTTT

Comments

Popular posts from this blog

Controlling legend appearance in ggplot2 with override.aes

[This article was first published on Very statisticious on Very statisticious , and kindly contributed to R-bloggers ]. (You can report issue about the content on this page here ) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't. In ggplot2 , aesthetics and their scale_*() functions change both the plot appearance and the plot legend appearance simultaneously. The override.aes argument in guide_legend() allows the user to change only the legend appearance without affecting the rest of the plot. This is useful for making the legend more readable or for creating certain types of combined legends. In this post I’ll first introduce override.aes with a basic example and then go through three additional plotting scenarios to how other instances where override.aes comes in handy. Table of Contents R packages Introducing override.aes Adding a guides() layer Using the guide argument in scale_*() Changing multiple aesthetic par...

Using RStudio and LaTeX

(This article was first published on r – Experimental Behaviour , and kindly contributed to R-bloggers) This post will explain how to integrate RStudio and LaTeX, especially the inclusion of well-formatted tables and nice-looking graphs and figures produced in RStudio and imported to LaTeX. To follow along you will need RStudio, MS Excel and LaTeX. Using tikzdevice to insert R Graphs into LaTeX I am a very visual thinker. If I want to understand a concept I usually and subconsciously try to visualise it. Therefore, more my PhD I tried to transport a lot of empirical insights by means of  visualization . These range from histograms, or violin plots to show distributions, over bargraphs including error bars to compare means, to interaction- or conditional effects of regression models. For quite a while it was very tedious to include such graphs in LaTeX documents. I tried several ways, like saving them as pdf and then including them in LaTeX as pdf, or any other file ...