
Following the Movement of Birds in the United States

(This article was first published on R – NYC Data Science Academy Blog, and kindly contributed to R-bloggers)

The American Birder

For the millions of bird watchers in America, relevant and useful resources are always a welcome sight. Range maps and ecological histories enhance the bird watching experience by adding a layer of conservation awareness and help hobbyists become more acquainted with the birds they observe. As a birder myself, I am always looking for new applications that help me achieve a greater understanding of the birds I observe on a daily basis. Learning more about these fascinating and beautiful animals helps to bring the bigger picture into focus; they have a much larger role in our world than just the momentary glimpse you get when observing them in a park or when walking down the street.

The Data

I chose to use the eBird dataset from the Cornell Lab of Ornithology to construct a Shiny application in R that allows a birder to “zoom out” from an isolated bird observation. eBird uses crowd-sourced data from around the world to track the locations and times of bird observations. Using its online interface, a user can view observations of many different species of birds, explore bird watching hot-spots, and even see a real-time observation submission map. Observations are available from as far back as the year 1900, and the increasing accessibility of technology translates to an ever-increasing avalanche of incoming data. The total data set today consists of over 500 million observations!

The App

Using the Shiny package in R, I built an application that explores 2016 observations of 10 species of birds in the United States. I aggregated the observations by county and produced a graph using ggplot2 to show the frequency of observations throughout the US. The observations can be filtered by month using a slider bar to inspect the distribution of sightings during a particular season. I also added a feature that allows the user to filter the observations by breeding season, which I implemented using estimated breeding season ranges from the Cornell Lab of Ornithology Birds of North America website. A feature that I found really interesting and was excited to add to my application is the “play” button, which shows an animated map that cycles through the months of the year and displays the bird sightings accordingly. This gives the user an important perspective on the movement of different species throughout the US, which can sometimes be lost when looking at separate monthly range maps one at a time.
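The county-level aggregation and month filter described above can be sketched in base R. This is only a minimal illustration, not the app's actual code: the data frame `obs`, its columns (`species`, `county_fips`, `month`), and the helper `count_by_county()` are hypothetical stand-ins for an eBird extract and whatever the app does internally.

```r
# Hypothetical sample of observation records; a real eBird extract has many more columns.
obs <- data.frame(
  species     = c("Short-eared Owl", "Short-eared Owl", "Short-eared Owl",
                  "Barn Swallow", "Barn Swallow"),
  county_fips = c("36061", "36061", "36047", "36061", "06037"),
  month       = c(1, 2, 7, 6, 6),
  stringsAsFactors = FALSE
)

# Count observations per county for one species within a month range
# (the kind of range a two-ended slider bar would supply).
count_by_county <- function(obs, species, month_range) {
  keep <- obs$species == species &
    obs$month >= month_range[1] &
    obs$month <= month_range[2]
  sel <- obs[keep, , drop = FALSE]
  if (nrow(sel) == 0) {
    return(data.frame(county_fips = character(0), n = integer(0)))
  }
  # One row per county, with the number of matching observations in column n.
  as.data.frame(table(county_fips = sel$county_fips),
                responseName = "n", stringsAsFactors = FALSE)
}

# Owl observations from January through March.
winter <- count_by_county(obs, "Short-eared Owl", c(1, 3))
print(winter)
```

In a Shiny app, `month_range` would typically come from a `sliderInput()`. That function has an `animate` argument that adds a play button, which is one way an animation like the one described could be driven; the post does not say how it was actually implemented.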

 

Map that shows number of species observations in a specific month range

 

In addition to visualizing the movement of birds throughout the US, I wanted to add more functionality for bird watchers. I thought an interesting question to ask would be: at what time are birds most often seen? To answer this question, I added a histogram that shows how often a given species was observed at each time of day. What I found was that 8:00 AM was by far the most frequent time that an eBird user submitted an observation. My conclusion is that the data is biased: many more people are actively bird watching around 8:00 AM, so the number of observations spikes around that time. This does not necessarily mean that a species is more likely to be seen at 8:00 AM; it means that there are more people actively looking. However, I did find a different trend for the only owl species (the Short-eared Owl) that I included in my list of species. This species showed a maximum sighting frequency at around 5:00 PM, which agrees with the fact that owls become more active around dusk. This leads me to believe that this feature is mainly useful for determining very basic activity levels for certain species.
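The time-of-day tally behind such a histogram can be sketched as follows. The data frame `times` and the helpers `hourly_counts()` and `peak_hour()` are hypothetical, and the sample values are made up purely to mirror the pattern described above (a morning peak for most birds, a late-afternoon peak for the owl).

```r
# Hypothetical observation times (24-hour "HH:MM" strings) for two species.
times <- data.frame(
  species = c(rep("Barn Swallow", 5), rep("Short-eared Owl", 4)),
  time    = c("07:45", "08:10", "08:30", "08:55", "12:15",
              "17:05", "17:40", "06:30", "17:20"),
  stringsAsFactors = FALSE
)

# Extract the hour of day as an integer from 0 to 23.
times$hour <- as.integer(substr(times$time, 1, 2))

# Tabulate observations per hour for one species, keeping empty hours as 0.
hourly_counts <- function(times, species) {
  hrs <- times$hour[times$species == species]
  table(factor(hrs, levels = 0:23))
}

# Hour with the most observations (the earliest such hour if there is a tie).
peak_hour <- function(times, species) {
  counts <- hourly_counts(times, species)
  as.integer(names(counts)[which.max(counts)])
}

swallow_peak <- peak_hour(times, "Barn Swallow")
owl_peak     <- peak_hour(times, "Short-eared Owl")
print(c(swallow = swallow_peak, owl = owl_peak))
```

Passing the result of `hourly_counts()` to `barplot()` would draw a simple version of the histogram. Note that these are raw counts, so they carry the same observer bias discussed above.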

 

Histogram that shows the frequency of species observations by time of day

 

When viewing the range map, I found that regional movement of bird sightings was apparent, but it was easy to overlook state-level observation trends. I elected to add a bar graph that breaks down observations by month for every state where there were sightings. This feature allows the user to see a clear pattern in sighting frequency over the course of a year. The graph makes it easy to tell whether the bird is a year-round resident or only present for certain seasons, which is important for understanding the seasonal distribution of a species. Although I have not implemented this functionality yet, a daily breakdown of sightings could yield important information on bird migration stopover sites (i.e., where birds temporarily stop to refuel during migration).
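A state-by-month breakdown of this kind can be sketched in base R as below. The data frame `state_obs` is invented for illustration, and the rule used to label a species "year-round" (at least one sighting in all 12 months) is my own simplifying assumption, not a definition taken from the app.

```r
# Hypothetical monthly sighting totals for one species in two states.
state_obs <- data.frame(
  state = c(rep("NY", 12), rep("FL", 4)),
  month = c(1:12, 11, 12, 1, 2),
  n     = c(rep(5, 12), 3, 8, 9, 4),
  stringsAsFactors = FALSE
)

# Make month a factor with all 12 levels so months without sightings appear as 0.
state_obs$month <- factor(state_obs$month, levels = 1:12)

# Rows are states, columns are months 1 to 12.
monthly <- xtabs(n ~ state + month, data = state_obs)

# Number of months with at least one sighting, per state.
months_present <- rowSums(monthly > 0)

# Assumed rule: present in all 12 months means year-round, otherwise seasonal.
status <- ifelse(months_present == 12, "year-round", "seasonal")
print(status)
```

Calling `barplot(monthly["FL", ])` would draw the monthly bar graph for a single state.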

 

Graph that shows the frequency of species observations by month for a specific state

 

The final section of my application allows the user to inspect the data behind the graphics. This can be useful if the user wishes to extract a specific value of observations at a certain time or in a certain location. The data table has a search function that allows the user to filter the data by county, state, or time of day.
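The kind of search described here can be approximated with a plain substring filter. This is a rough sketch rather than the app's table code: the data frame `tbl`, its columns, and the helper `search_rows()` are all hypothetical.

```r
# Hypothetical aggregated rows like those shown in the data table.
tbl <- data.frame(
  county = c("Kings", "Queens", "Kings"),
  state  = c("NY", "NY", "CA"),
  time   = c("08:00", "17:30", "08:15"),
  n      = c(14, 3, 7),
  stringsAsFactors = FALSE
)

# Keep rows where the query appears (case-insensitively) in any searched column.
search_rows <- function(tbl, query, cols = c("county", "state", "time")) {
  hit <- rep(FALSE, nrow(tbl))
  for (col in cols) {
    # fixed = TRUE treats the query as literal text, not a regular expression.
    hit <- hit | grepl(tolower(query), tolower(tbl[[col]]), fixed = TRUE)
  }
  tbl[hit, , drop = FALSE]
}

kings_rows   <- search_rows(tbl, "kings")
morning_rows <- search_rows(tbl, "08:")
print(kings_rows)
```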

 

Data tables that allow users to inspect specific values in the data set

 

Going Forward

Although the application is functional, there are several potential areas for improvement that I would like to address in the future.

  1. First (and I believe most important), the data for some species is quite large, so graphics can take several seconds to render. This makes the play function of the map difficult to use effectively in some cases. I believe that these cases would benefit tremendously from further optimization of the code, from graphics packages with quicker rendering capabilities, or ideally from a combination of both.
  2. Second, I would like to allow the user to inspect a much larger list of species and range of years. Due to the size of the data, storage is a significant issue which may not be avoidable without establishing a dedicated server to host the data.
  3. Third, the observations by county are currently displayed using a log scale. I decided to use a log scale over the raw number of observations because many areas have observations of 1-100 and were vastly overshadowed by areas that had observations in the thousands. These areas with lower observations can still show significant trends, and I wanted to make sure they were not ignored. Still, this system does not address the issue of there being a bias in sighting frequency strictly due to larger numbers of available birders. Areas with larger populations will produce higher numbers of sightings simply due to the fact that there are more people actively looking for birds (similar to the issue I have with the time histogram). I would like to implement a system that standardizes county sightings by county population which would give a more normalized representation of sighting frequency.
  4. Finally, my current system uses counties as groups for aggregating observations by location. This can limit location detail. For example, many counties in the western United States are very large, which diminishes the granularity of the map. I have seen other bird range mapping tools (such as the eBird map) that instead use rectangular areas defined by longitude and latitude and do not rely on human-designated borders. Implementing this method could increase the accuracy of the map’s representation of sighting hot-spots.
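The last two ideas in the list above (log scaling with population normalization, and grid cells in place of counties) can be sketched as follows. Everything here is illustrative: the county figures are made up, `log10(x + 1)` is one possible log transform and not necessarily the one used in the app, the per-10,000-residents rate is just one way to normalize, and `grid_cell()` with its 0.5-degree cell size is a hypothetical helper.

```r
# Hypothetical county totals and populations (not real figures).
counties <- data.frame(
  county     = c("A", "B", "C"),
  sightings  = c(12, 85, 4300),
  population = c(8000, 40000, 2500000),
  stringsAsFactors = FALSE
)

# Log scale keeps low-count counties visible next to high-count ones.
# Adding 1 (an assumption) avoids log10(0) for counties without sightings.
counties$log_sightings <- log10(counties$sightings + 1)

# Sightings per 10,000 residents: one possible population normalization.
counties$per_10k <- counties$sightings / counties$population * 10000

# Snap coordinates to the south-west corner of a fixed-size grid cell (in degrees).
grid_cell <- function(lat, lon, size = 0.5) {
  data.frame(
    cell_lat = floor(lat / size) * size,
    cell_lon = floor(lon / size) * size
  )
}

# Hypothetical observation coordinates: two close together, one far away.
pts <- data.frame(lat = c(40.71, 40.78, 34.05),
                  lon = c(-73.99, -73.97, -118.24))
cells <- grid_cell(pts$lat, pts$lon)

# Count observations per grid cell instead of per county.
cell_counts <- aggregate(n ~ cell_lat + cell_lon,
                         data = cbind(cells, n = 1), FUN = sum)
print(counties)
print(cell_counts)
```

In this made-up data, county C has by far the most raw sightings, yet county B has the highest rate per resident, which is the kind of difference a population-normalized map would surface.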

Hey, Thanks!

Thank you for taking the time to read about my project! As someone who is passionate about ecology and animal behavior, I found building this application to be very rewarding and insightful. I am always trying to think of new ways to marry technology and environmental biology, and data science is an incredibly powerful tool that I can use to ask and answer questions in a field that I find fascinating. Feel free to check out my application; I welcome any feedback!

