R Tip: Use drop = FALSE with data.frames

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

Another R tip. Get in the habit of using drop = FALSE when indexing (using [ , ] on) data.frames.

[Image: Prince Rupert’s drops (img: Wikimedia Commons)]

In R, single column data.frames are often converted to vectors when manipulated. For example:

d <- data.frame(x = seq_len(3))
print(d)
#>   x
#> 1 1
#> 2 2
#> 3 3
# not a data frame!
d[order(-d$x), ]
#> [1] 3 2 1

We were merely trying to re-order the rows, and the result was converted to a vector. This happened because the rules for [ , ] change when there is only one result column: by default the data.frame structure is dropped. It happens even when there was only one input column to begin with. Another example: d[, ] is also a vector in this case.

The issue is: when writing re-usable code we are often programming before we know the complete contents of a variable or argument. For a data.frame named “g” supplied as an argument, g[vec, ] can be a data.frame or a vector (or possibly even a list). However, we do know that if g is a data.frame then g[vec, , drop = FALSE] is also a data.frame (assuming vec is a vector of valid row indices or a logical vector; note that NA values induce some special cases).
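As a minimal sketch of the re-usable code situation, consider a small helper that returns the rows with the largest values in a chosen column. The function name top_rows and its arguments are hypothetical, introduced only for illustration. Because it indexes with drop = FALSE, the helper returns a data.frame whether the caller supplies one column or several.

```r
# Hypothetical helper (not from the original article): return the n rows
# of data.frame g with the largest values in column col.
top_rows <- function(g, col, n = 2) {
  idx <- order(-g[[col]])[seq_len(min(n, nrow(g)))]
  # drop = FALSE keeps the result a data.frame even for a single column
  g[idx, , drop = FALSE]
}

one_col <- data.frame(x = c(10, 30, 20))
two_col <- data.frame(x = c(10, 30, 20), y = c("a", "b", "c"))

r1 <- top_rows(one_col, "x")
r2 <- top_rows(two_col, "x")

is.data.frame(r1)
#> [1] TRUE
is.data.frame(r2)
#> [1] TRUE

# The same indexing without drop = FALSE loses the data.frame structure
# when there is only one column.
is.data.frame(one_col[order(-one_col$x), ])
#> [1] FALSE
```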

We care because vectors and data.frames have different semantics, so they are not fully substitutable in later code.
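A short illustration of how the semantics differ, using the same single-column d as above (the names as_df and as_vec are only for this sketch). Common follow-up calls such as nrow(), names(), and length() give different answers depending on which type came back.

```r
d <- data.frame(x = seq_len(3))

as_df  <- d[order(-d$x), , drop = FALSE]  # stays a data.frame
as_vec <- d[order(-d$x), ]                # silently becomes a vector

nrow(as_df)
#> [1] 3
nrow(as_vec)     # vectors have no dim attribute
#> NULL

names(as_df)
#> [1] "x"
names(as_vec)    # the column name is gone
#> NULL

length(as_df)    # number of columns
#> [1] 1
length(as_vec)   # number of elements
#> [1] 3
```

Code written for one type (for example a loop over seq_len(nrow(result))) can quietly do the wrong thing on the other.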

The fix is to include drop = FALSE as a third argument to [ , ].

# is a data frame.
d[order(-d$x), , drop = FALSE]
#>   x
#> 3 3
#> 2 2
#> 1 1

To pull out a column I suggest using one of the many good extraction notations (all relying on the fact that a data.frame is officially a list of columns):

d[["x"]]
#> [1] 1 2 3

d$x
#> [1] 1 2 3

d[[1]]
#> [1] 1 2 3

My overall advice is: get in the habit of including drop = FALSE when working with [ , ] and data.frames. I say do this even when it is obvious that the result does in fact have more than one column.

For example, write “mtcars[, c("mpg", "cyl"), drop = FALSE]” instead of “mtcars[, c("mpg", "cyl")]”. For data.frames both forms behave the same here (either selecting a data.frame with two columns, or throwing an error if we have mentioned a non-existent column). But the longer drop = FALSE form is safer (it goes further towards ensuring type-stable code) and, more importantly, documents intent (that you wanted a data.frame result).
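The claim can be checked directly. The sketch below stores the column names in a variable (cols, an illustrative name) to mimic code where the selection is not known in advance: with two columns the two forms agree, and with one column only the drop = FALSE form keeps returning a data.frame.

```r
cols <- c("mpg", "cyl")
a <- mtcars[, cols]
b <- mtcars[, cols, drop = FALSE]
identical(a, b)
#> [1] TRUE

# Same code, but the selection has shrunk to a single column.
cols <- "mpg"
class(mtcars[, cols])
#> [1] "numeric"
class(mtcars[, cols, drop = FALSE])
#> [1] "data.frame"

# Both forms signal an error for a non-existent column.
bad <- tryCatch(
  mtcars[, c("mpg", "no_such_column"), drop = FALSE],
  error = function(e) "error"
)
bad
#> [1] "error"
```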

One can also try base::subset(), as it has non-dropping defaults.
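A brief sketch of the subset() alternative on the single-column d from earlier (s1 and s2 are illustrative names). Both row filtering and column selection return data.frames, since the data.frame method of subset() defaults to drop = FALSE.

```r
d <- data.frame(x = seq_len(3))

s1 <- subset(d, x >= 2)       # filter rows
s2 <- subset(d, select = x)   # select a column

is.data.frame(s1)
#> [1] TRUE
is.data.frame(s2)
#> [1] TRUE
s1$x
#> [1] 2 3
```

Note that subset() evaluates its arguments non-standardly, and its own help page describes it as a convenience function for interactive use, so inside re-usable functions the explicit [ , , drop = FALSE] form may still be the more predictable choice.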



