Introduction

The Black Saturday bush fires took place in Victoria in 2009 and devastated the country, claiming many lives, including an estimated one million animals (NMA, 2022). Months of hot and dry conditions combined with strong winds from Central Australia created the perfect conditions for these catastrophic fires to thrive (NMA, 2022).

We have decided to look specifically at the Lyrebird family (Menuridae), as they are a prominent native Australian group, and used data extracted from the Atlas of Living Australia to study how the distribution and abundance of these animals changed as a result of the fires. We are aiming to do this by comparing data from before and after the event and presenting the information on spatial graphs and as a generalised linear model.

Past studies have found that Lyrebirds forage for food that is sparsely found, therefore, requiring them to occupy large habitats (Lill, 1995). As a result, they are quite vulnerable to the harm of any habitat fragmentation or destruction such as is caused by bushfires (Lill, 1995). Arthur et al. (2012) and Nungent et al. (2014) have each seen a decrease in Lyrebirds in fire-affected areas immediately after a fire, with Robinson et al. (2016) finding they moved to areas that were less affected by fires such as gullies and rainforests. Conversely, Doty et al. (2014) have found that Lyrebirds favour post-fire environments as they find them ideal for foraging.

Our report aims to see how Lyrebirds reacted after the Black Saturday fires specifically and will hopefully act as an exemplar for users of the Atlas of Living Australia who desire to transform other data sets into digestible information that can be applied to real-world problems.

Method

To answer this question, we used a multifaceted analysis of the data, performing both statistical and spatial analysis, utilising Rstudio and GIS programs.

Data Extraction

Extracting Data From Atlas of Living Australia

Lyrebirds where chosen as the specific species as they were a species that had sufficient data available for the research time period, three months before and after the Black Saturday fires occurred, as well as being present in the areas affected by the fires.

The positional data of the lyrebirds used in the analysis was extracted from the Atlas of Living Australia database. This was done by filtering the database to display records of lyrebirds located in Victoria from the beginning of the year 2000 to the end of 2009. This narrowed down the number of records from over 94 000 to around 3 000 and saved time in cleaning up the data to just the records in the time and place that was desired.

Map of filtered lyrebird records

The data was then extracted as a CSV file that contained each individual record, the date it was recorded on and the latitude and longitude of the record.

The data was then cleaned, firstly by converting the dates to a Year-Month-Day format.

touch temp
sed -E 's?(T13:00:00Z|T14:00:00Z)??g' ALA_LyreBird.csv > temp
cat temp > correct_ALA_LyreBirdless.csv
rm temp

This puts the the record dates into a format that can be read by R functions. From here, the records that occurred within our time frame, 2008/11/01 to 2009/06/30, was extracted, removing any other records that occurred outside of this time frame.

start <- which(cleaned$eventDate == "2008-11-01")[1]
finish <- which(cleaned$eventDate == "2009-06-30")[1]
cleaned <- cleaned[start:finish, ]
write.csv(cleaned, "ALA_LyreBird.csv", row.names = FALSE)

We use python to create a new file from the cleaned file above and the new file will contain the date and the counts of lyrebirds accumulated within that week.

import pandas

def getdata(data1):
    all_date = {}
    ret = {}
    date_range = pd.date_range(start="2008-11-01", end='2009-06-30')
    for i in date_range:
        capture = str(i).split(' ')[0]
        all_date.update({capture: 0})
    counter = 0
    
    for i in data1.index:
        var =  data1['eventDate'][i]
        counter += data1['individualCount'][i]
        if var in all_date:
            num = all_date[var]
            all_date.update({var:  num + counter})
            counter = 0
    start = 0
    count = 0
    for i in all_date:
        start += 1
        count += all_date[i]
        if start == 7:
            ret.update({i: count})
            count = 0
            start = 0
    return ret
  
final_data = pd.read_csv("correct_ALA_LyreBirdless.csv")
everything = getdata(final_data)
write_file = open('countsperweek.csv', 'a')
read_file = open('countsperweek.csv', 'r')
if len(read_file.readlines()) == 0:
    write_file.write('Date,num_birds\n')
else:
    write_file.close()
    read_file.close()
    quit()
for i in everything:
    string = i + ',' + str(everything[i]) + '\n'
    write_file.write(string)
write_file.close()
read_file.close()

Additional environmental variables that were considered to have an impact on Lyrebird abundance and distribution was also added to each record’s profile. These variables included rainfall, minimum temperature, tree density and land cover type.

Combining Additional Environmental Variable Data Using ArcGIS Pro

As many of these variables were available as shapefiles or rasters, we found the easiest way to connect the variable data to each record was by using a spatial program, ArcGIS Pro. The occurrence data was uploaded as a csv file and then converted to a XY point feature class using the XY Table to Point tool. This plots the occurrence data onto the map, visualising the occurrence data. However, this can only be done if the occurrence data has the longitude and latitude of each occurrence recorded.

The shapefiles and rasters of the environmental variables, land cover type and tree density, were gathered from the Victorian Government Data Directory and then uploaded to ArcGIS. These files required some processing as, while some were already downloadable as raster files (.tif), many of them could only be downloaded as shapefiles (.shp), which meant they needed to be converted into raster files. This was done using the Feature to Raster tool.

The data of each raster was then exported at each XY point and joined by each occurrence record using the Extract Multi Values to Points tool. This resulted in the extraction of each raster’s values at the location of each occurrence record, meaning each lyrebird record also contained information on the land cover type and tree density. The XY point feature file was then exported as an excel file using the Table to Excel tool.

Rainfall and Temperature Data Extraction

The monthly rainfall and temperature variables needed for the generalised linear model were obtained from the Bureau of Meteorology website. They have collected weather station records that have historic data on the monthly rainfall as well as the monthly maximum and minimum temperatures that occurred at each of these stations.

To find these variables for each lyrebird sighting, the coordinates of the sighting were input into the website and the closest weather station with data available for the month that the sighting occurred was selected. This data was presented in a table containing the values for each month over a period of years. The value for the month that the record was made in was input into the spreadsheet for the generalised linear model to analyse. This process was repeated for each sighting.

BOM page to find closest weather station and how the data is displayed

Introducing variables is important as it takes into consideration the interaction between Lyrebird distribution and abundance and factors other than fire. Additionally, it accounts for the relationship between the fire and variables such as rainfall and temperature.

Analysis

The analysis we performed involved produced two generalised linear models (GLM), for both lyrebird abundance and distribution, and a spatial model, for lyrebird distribution. The difference between the distribution glm and the spatial model is that the glm is used to predict potential explanations for distribution where as the spatial model only looks at where they were before the fire and after the fire and does not try and explain the reasoning for any movement.

Creating a Generalised Linear Model for Lyrebird Abundance

Organising the Data

A GLM was used as it was a much more flexible model that allowed for the use of multiple variables to explain the response variables, i.e. distribution or abundance of lyrebirds.

The packages required are tidyverse, emmeans and sjPlot.

We then performed some precursory analysis, visualising the data and getting a basic understanding of what the data may look like.

Lyrebirds <- read.csv("Data/ALA_LyreBird.csv")
summary(Lyrebirds)
##   eventDate              year          month             day       
##  Length:208         Min.   :2008   Min.   : 1.000   Min.   : 1.00  
##  Class :character   1st Qu.:2008   1st Qu.: 4.000   1st Qu.: 8.00  
##  Mode  :character   Median :2009   Median : 5.000   Median :20.50  
##                     Mean   :2009   Mean   : 6.433   Mean   :17.31  
##                     3rd Qu.:2009   3rd Qu.:11.000   3rd Qu.:26.00  
##                     Max.   :2009   Max.   :12.000   Max.   :31.00  
##  decimalLatitude  decimalLongitude    Rainfall     MeanMaxTemperature
##  Min.   :-38.43   Min.   :145.2    Min.   :  2.8   Min.   : 5.00     
##  1st Qu.:-37.74   1st Qu.:146.4    1st Qu.: 22.4   1st Qu.:15.75     
##  Median :-37.64   Median :147.7    Median : 57.8   Median :20.60     
##  Mean   :-37.55   Mean   :147.5    Mean   : 73.8   Mean   :19.74     
##  3rd Qu.:-37.49   3rd Qu.:148.3    3rd Qu.: 98.6   3rd Qu.:22.52     
##  Max.   :-36.12   Max.   :149.9    Max.   :267.8   Max.   :30.20     
##  MeanMinTemperature IndividualCount  TreeDensity        Landcover_type    
##  Min.   :-0.100     Min.   : 1.000   Length:208         Length:208        
##  1st Qu.: 5.600     1st Qu.: 1.000   Class :character   Class :character  
##  Median : 9.300     Median : 1.000   Mode  :character   Mode  :character  
##  Mean   : 9.082     Mean   : 1.279                                        
##  3rd Qu.:12.100     3rd Qu.: 1.000                                        
##  Max.   :15.500     Max.   :10.000                                        
##    FireSev         
##  Length:208        
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
LyrebirdPoints <- read.csv("Data/countsperweek.csv")
num_days <- seq(-16, nrow(LyrebirdPoints) - 1 - 16)
num_birds_col <- LyrebirdPoints$num_birds
variable <- data.frame(num_birds_col, num_days)
colors <- c(rep("#f8766d", 16), rep("#ffe15b", 5), rep("#00bfc4", 13))
barplot(num_birds_col~ num_days , data=variable, xlab='Time since fire (weeks)', ylab='Number of sightings per week', col = colors)
legend("top", legend = c("Before Event", "During Event", "After Event"), fill = c("#f8766d", "#ffe15b", "#00bfc4"))

ggplot(Lyrebirds, aes(IndividualCount)) + 
  geom_histogram(bins = 10L, fill = "orange") + labs(x = "Individual Count", y = "Frequency")

Lyrebirds$LOG_IndividualCount <- log10(Lyrebirds$IndividualCount)

ggplot(Lyrebirds, aes(LOG_IndividualCount)) + 
  geom_histogram(bins = 10L, fill = "orange") + labs(x = "Individual Count", y = "Frequency")

We then summerised the mean daily temperature and rainfall using a median function, as you cannot summarise the mean of a mean. The records were then grouped by months.

Lyrebirds_per_month <- Lyrebirds %>%  # Specify data frame
  group_by(month) %>%                 # Specify group indicator
  summarise_at(vars(IndividualCount, Rainfall, MeanMaxTemperature, MeanMinTemperature),   # Specify columns
               list(Count_per_Month = median))                                            # Specify function

We then summerised the individual counts to mean number of lyrebird counts per month.

Lyrebirds_per_month$IndividualCount_Count_per_Month <- Lyrebirds %>% # Specify data frame
  group_by(month) %>%                                                # Specify group indicator
  summarise_at(vars(IndividualCount),                                # Specify column
               list(mean = mean))                                    # Specify function



# Here we remove the extra median column from the first dplyr tibble we made
Lyrebirds_per_month$IndividualCount_Count_per_Month <- Lyrebirds_per_month$IndividualCount_Count_per_Month$mean 

A copy of the months column was then created to sort the records into before, during or after the fire.

Lyrebirds_per_month$Fire_pres_abs <- Lyrebirds_per_month$month

Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 11] <- c("Before")
Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 12] <- c("Before")
Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 1] <- c("Before")

Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 2] <- c("Before")
Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 3] <- c("After")

Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 4] <- c("After")
Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 5] <- c("After")
Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 6] <- c("After")
Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 7] <- c("After")

As seen when we visualised our individual counts of lyrebirds, the distribution was not a normal distribution and so we decided to perform a logarithmic transformation to fix this problem.

Lyrebirds_per_month$LOG_IndividualCount <- log10(Lyrebirds_per_month$IndividualCount_Count_per_Month)

Abundance GLM

Once we had cleaned up and organised our data to how we needed it, we began looking for interactions between our variables. We started with all numerical variables: * Mean monthly minimum temperature * Mean monthly maximum temperature * Mean monthly rainfall

cor(Lyrebirds_per_month$MeanMinTemperature_Count_per_Month, Lyrebirds_per_month$Rainfall_Count_per_Month)
## [1] 0.330224
cor(Lyrebirds_per_month$MeanMaxTemperature_Count_per_Month, Lyrebirds_per_month$Rainfall_Count_per_Month)
## [1] 0.1877524
cor(Lyrebirds_per_month$MeanMinTemperature_Count_per_Month, Lyrebirds_per_month$MeanMaxTemperature_Count_per_Month)
## [1] 0.9418245

We ran GLMs for every biologically relevant interaction and found the interactions which were statistically significant, p<0.05.

Lyrebird.glm <- glm(LOG_IndividualCount ~ Rainfall_Count_per_Month * Fire_pres_abs, data = Lyrebirds_per_month)
Lyrebird.glm.emm.s <- emmeans(Lyrebird.glm, specs = pairwise ~ Rainfall_Count_per_Month * Fire_pres_abs)
##                                                                                           contrast
## 1 Rainfall_Count_per_Month65.8222222222222 After - Rainfall_Count_per_Month65.8222222222222 Before
##    estimate         SE df   lower.CL  upper.CL  t.ratio    p.value
## 1 0.1660709 0.04383495  5 0.05338959 0.2787522 3.788551 0.01277738

From this we could see that the most significant interaction occurred when rainfall was high in areas that had been affected by the fires. These areas saw an increased abundance of lyrebirds where as areas that recieved less rain was found to have less abundance of lyrebirds. We then plotted the prediction that our model shows.

plot_model(Lyrebird.glm, type = "pred", terms = c("Rainfall_Count_per_Month", "Fire_pres_abs"), title = "Predicted Lyrebird abundance before and after Black Saturday fires", axis.title = c("Rainfall count per month", "Log Individual count"), legend.title = "Fire presence")

Creating a Generalised Linear Model for Lyrebird Distribution

The packages you will need for this are geodata, raster, maptools, rgdal, rgeos, sf, tidyverse, ozmaps and dismo.

Because this GLM was for distribution and so required the use of rasters. When creating this guide, processing the Land Cover raster file proved to be too much for the computer we wrote this up on, so we have commented out the lines that involve the Land Cover raster so you can still see where it should be. When Land Cover was run in the analysis, it should be noted that it did not produce any significant interactions.

Soil <- raster("Rasters/Soil_TypeVIC.tif")
TreeDensity <- raster("Rasters/TreeDensity.tif")
#LandCover <- raster("Rasters/vic_landCOVER_TS_2015_19.tif")
precipitationBefore <- stack("Rasters/wc2.1_2.5m_prec_2008-11.tif",
                     "Rasters/wc2.1_2.5m_prec_2008-12.tif",
                      "Rasters/wc2.1_2.5m_prec_2009-01.tif",
                      "Rasters/wc2.1_2.5m_prec_2009-02.tif"
                      )
MaxTempBefore <- stack("Rasters/wc2.1_2.5m_tmax_2008-11.tif",
                 "Rasters/wc2.1_2.5m_tmax_2008-12.tif",
                 "Rasters/wc2.1_2.5m_tmax_2009-01.tif",
                 "Rasters/wc2.1_2.5m_tmax_2009-02.tif")

We needed to ensure all the rasters where projected into the same datum, WGS84.

projection(Soil) <- "+proj=longlat +datum=WGS84 +no_defs"
projection(precipitationBefore) <- CRS("+proj=longlat +datum=WGS84 +no_defs")
projection(TreeDensity)  <- CRS("+proj=longlat +datum=WGS84 +no_defs")
#projection(LandCover) <- CRS("+proj=longlat +datum=WGS84 +no_defs")
projection(MaxTempBefore) <- CRS("+proj=longlat +datum=WGS84 +no_defs")

We then made sure that the rasters were clipped to the extent of Victoria as this helps reduce processing time and ensures the model doesn’t try and include information from outside Victoria.

lga_sf <- ozmap_data("abs_ste")
vic_shape <- lga_sf[2, ]
vic_shape <- st_transform(vic_shape, crs = "+proj=longlat +datum=WGS84 +no_defs")
vic <- extent(140.8894, 149.9763, -39.16078, -33.92641)
Soil <- setExtent(Soil, vic_shape, keepres = F)
TreeDensity <- setExtent(TreeDensity, vic_shape, keepres = F)
#LandCover <- setExtent(LandCover, vic_shape, keepres = F)
precipitationBefore <- raster::projectRaster(precipitationBefore, TreeDensity)
MaxTempBefore <- raster::projectRaster(MaxTempBefore, TreeDensity)
Soil <- raster::projectRaster(Soil, TreeDensity)
#LandCover <- raster::projectRaster(LandCover, TreeDensity)

The biostack requires individual rasters and currently our rainfall and max temperature rasters had been collated into two stacks. We then needed to pull them apart.

Prec_11 <- raster(precipitationBefore, layer = 1)
Prec_12 <- raster(precipitationBefore, layer = 2)
Prec_1 <- raster(precipitationBefore, layer = 3)
Prec_2 <- raster(precipitationBefore, layer = 4)

MaxTemp_11 <- raster(MaxTempBefore, layer = 1)
MaxTemp_12 <- raster(MaxTempBefore, layer = 2)
MaxTemp_1 <- raster(MaxTempBefore, layer = 3)
MaxTemp_2 <- raster(MaxTempBefore, layer = 4)

We could now create our biostack which would create a map showing the data of each raster. The rainfall and max temperature are taken from before the fire occurrec.

#This is what the line would look like including Land Cover, but for this purpose we will run it without.
#BioStack <- stack(Soil, TreeDensity, LandCover, Prec_11, Prec_12, Prec_1, Prec_2, MaxTemp_11, MaxTemp_12, MaxTemp_1, MaxTemp_2)

BioStack <- stack(Soil, TreeDensity, Prec_11, Prec_12, Prec_1, Prec_2, MaxTemp_11, MaxTemp_12, MaxTemp_1, MaxTemp_2)
plot(BioStack)

We then needed to read in the occurrence records locations as coordinates in the same projection as the rasters.

From here, we then created our background data for the GLM. This creates pseudo absence data which the GLM will use to compare the presence data to when calculating its outputs.

BgLyre <- randomPoints(BioStack, 500) #random points from all vic
PresenceLyre <- raster::extract(BioStack, LyrebirdCoor) #All recorded lyrebird presence and its environmental variables
AbsenceLyre <- raster::extract(BioStack, BgLyre) # Use random points from BgLyre and associated environmental variables as absence

PresenceBgLyre <- c(rep(1, nrow(PresenceLyre)), rep(0, nrow(AbsenceLyre))) #combine absence and absence data
LyresdmData <- data.frame(cbind(PresenceBgLyre, rbind(PresenceLyre, AbsenceLyre))) #create species distribution model

We then ran GLMs for every variable to find the best interaction. These were using the variables from before the fire occurred. We found that tmperature and rainfall had significant interactions.

#This is what the line would look like including Land Cover, but for this purpose we will run it without.
#LyresdmData = na.omit(LyresdmData)
#LyrebirdBg.glm <- glm(PresenceBgLyre~ Soil_TypeVIC + TreeDensity + wc2.1_2.5m_prec_11+ wc2.1_2.5m_prec_12+ wc2.1_2.5m_prec_01 + wc2.1_2.5m_tmax_11 + wc2.1_2.5m_tmax_12 + wc2.1_2.5m_tmax_01 + VIC_LANDCOVER_TS_2015_19, data = LyresdmData)
#step(LyrebirdBg.glm, test="F")
#LyrebirdBg.glm <- glm(PresenceBgLyre ~ Soil_TypeVIC + wc2.1_2.5m_prec_11 + wc2.1_2.5m_prec_12 + wc2.1_2.5m_tmax_11 + wc2.1_2.5m_tmax_01, data = LyresdmData)

LyresdmData = na.omit(LyresdmData)
LyrebirdBg.glm <- glm(PresenceBgLyre~ Soil_TypeVIC + TreeDensity + wc2.1_2.5m_prec_11+ wc2.1_2.5m_prec_12+ wc2.1_2.5m_prec_01 + wc2.1_2.5m_tmax_11 + wc2.1_2.5m_tmax_12 + wc2.1_2.5m_tmax_01, data = LyresdmData)
#step(LyrebirdBg.glm, test="F") This displays the GLM through steps
LyrebirdBg.glm <- glm(PresenceBgLyre ~ Soil_TypeVIC + wc2.1_2.5m_prec_11 + wc2.1_2.5m_prec_12 + wc2.1_2.5m_tmax_11 + wc2.1_2.5m_tmax_01, data = LyresdmData)
summary(LyrebirdBg.glm)
## 
## Call:
## glm(formula = PresenceBgLyre ~ Soil_TypeVIC + wc2.1_2.5m_prec_11 + 
##     wc2.1_2.5m_prec_12 + wc2.1_2.5m_tmax_11 + wc2.1_2.5m_tmax_01, 
##     data = LyresdmData)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.77412  -0.23450  -0.03946   0.20408   0.80165  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -1.167e+00  4.974e-01  -2.347 0.019625 *  
## Soil_TypeVIC        9.198e-05  3.394e-05   2.710 0.007133 ** 
## wc2.1_2.5m_prec_11  3.658e-03  7.212e-04   5.071 7.07e-07 ***
## wc2.1_2.5m_prec_12  4.630e-03  1.374e-03   3.369 0.000858 ***
## wc2.1_2.5m_tmax_11  1.557e-01  2.946e-02   5.283 2.50e-07 ***
## wc2.1_2.5m_tmax_01 -9.613e-02  2.678e-02  -3.590 0.000388 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.1366113)
## 
##     Null deviance: 55.932  on 294  degrees of freedom
## Residual deviance: 39.481  on 289  degrees of freedom
## AIC: 257.88
## 
## Number of Fisher Scoring iterations: 2

We were also able to create a map predicting the probability of the locations of lyrebirds before the fire.

#Make rasters into geom objects
#--- convert to data.frame ---#
LyrePredict <- raster::predict(BioStack, LyrebirdBg.glm, ext=vic)
projection(LyrePredict) <- CRS("+proj=longlat +datum=WGS84 +no_defs")
LyrePredict_df <-
  as.data.frame(LyrePredict, xy = TRUE) %>%
  #--- remove cells with NA for any of the layers ---#
  na.omit() 


# make dataframe of presence points
LyrePoints_df <-
  as.data.frame(LyrebirdCoor@coords, xy = TRUE)

projection(LyrebirdCoor) <- CRS("+proj=longlat +datum=WGS84 +no_defs")
#LyrePoints_df <- ~decimalLongitude + decimalLatitude


# make victoria dataframe projection
lga_sf <- ozmap_data("abs_ste")
vic_shape <- lga_sf[2, ]
vic_shape <- st_transform(vic_shape, crs = "+proj=longlat +datum=WGS84 +no_defs")

# plot both predicted sdm background data and actual records:
pl <- ggplot(data = st_transform(vic_shape, 4326)) + 
  geom_sf(data = st_transform(vic_shape, 4326), fill = NA)+
  #coord_sf(crs = 4326) +
  geom_tile(aes(x = x, y = y, fill = layer), data = LyrePredict_df)+
  scale_fill_viridis_c() +
  theme_bw() +
  geom_point(data = LyrePoints_df, aes(x = decimalLongitude, y = decimalLatitude, color = "Lyrebirds")) + 
  labs(x = "Longitude", y = "Latitude", color = "Before fire recorded presence", fill = "Predicted presence likelihood")
pl

So we then repeated this whole process for the variables after the fire and found the same outcome as before the fire.

#We obtained this GLM by using step().
LyrebirdBgAfter.glm = glm(PresenceBgLyreAfter ~ Soil_TypeVIC + wc2.1_2.5m_prec_03 + wc2.1_2.5m_prec_04 + wc2.1_2.5m_prec_06 + wc2.1_2.5m_tmax_04 + wc2.1_2.5m_tmax_05, data = LyresdmDataAfter)
summary(LyrebirdBgAfter.glm)
## 
## Call:
## glm(formula = PresenceBgLyreAfter ~ Soil_TypeVIC + wc2.1_2.5m_prec_03 + 
##     wc2.1_2.5m_prec_04 + wc2.1_2.5m_prec_06 + wc2.1_2.5m_tmax_04 + 
##     wc2.1_2.5m_tmax_05, data = LyresdmDataAfter)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.90741  -0.20718  -0.04562   0.19972   0.84035  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -0.3382930  0.3805021  -0.889 0.374726    
## Soil_TypeVIC        0.0001431  0.0000310   4.616 5.95e-06 ***
## wc2.1_2.5m_prec_03 -0.0024963  0.0017761  -1.405 0.160990    
## wc2.1_2.5m_prec_04  0.0127038  0.0028442   4.467 1.15e-05 ***
## wc2.1_2.5m_prec_06 -0.0090905  0.0023679  -3.839 0.000153 ***
## wc2.1_2.5m_tmax_04 -0.1481875  0.0437961  -3.384 0.000817 ***
## wc2.1_2.5m_tmax_05  0.2001613  0.0418700   4.781 2.82e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.1315326)
## 
##     Null deviance: 60.319  on 287  degrees of freedom
## Residual deviance: 36.961  on 281  degrees of freedom
## AIC: 242.01
## 
## Number of Fisher Scoring iterations: 2

We also created a map predicting the distribution of lyrebirds after the fire.

Spatial Analysis of Species Occurrence Before and After a Major Bushfire event using ArcGIS Pro 3.0

Preparing the Data for ArcGIS Pro

The most important columns required in ArcGIS are the Longitude, Latitude, EventDate, IndividualCount and a UID (unique identifier).

IndividualCount for Lyrebirds had some values. To combat this, we will convert all Null values to 1, assuming that each record would possess at least 1 individual Lyrebird.

EventDate had to be formatted to (dd/mm/yyyy) to enable Time in ArcGIS and also make it easier to select for attributes according to the date.

We created the UID column in excel and each row was numbered in ascending order (1, 2, 3, 4, 5…). All tables used in ArcGIS require a UID for each row (cannot have Null values) if not you will get an error.

Lastly, delete all columns (other than those specified above) which are inapplicable to the study to make the data easier to navigate in ArcGIS. Save the excel file as a csv.(Macintosh).

ArcGIS Pro

When working in ArcGIS Pro a good first step is to set the Geoprocessing environment settings (e.g. processing extent, map projection and workspace) to match the study area of interest and data being used.

In this case we used WGS1984 (Central Meridian: Greenwich), however GDA1994 is also a good option. We used WGS1984 due to the scale of the extent of our data (Over all of the state of Victoria), and also because of its use in previous research for the same study area.

Creating before and after fire point pattern maps

Use the XY Table to Point Tool to create a point feature class layer from the Lyrebird data table.

The Black Saturday Fires occurred between 7th February 2009 and 14th March 2009. We retrieved data in 3 month periods exactly 3 months before and 3 months after these dates (e.g. 07/11/2008 - 07/02/2009 and 14/03/2009 - 14/06/2009).

To create two separate layers containing these dates, we used the Select Layer By Attribute geoprocessing tool. Using the EventDate column from the table we selected for each time period specified above.

Next use the Make Layer from Selected Feature button to create each before and after layer file and export them to your Geodatabase to ensure they are saved. Go to Add data and display your new before and after layers. You can adjust the colours and sizes of the data points in the Symbology tab. Add a layout to your project. This is where you can create a map to export with a title, legend, north arrow and scale.

In Figures 1 & 2 are maps we created using ALA Lyrebird incidence records

Elevating your XY Data Maps

In this section we are going to take our XY Before and After Lyrebird maps to the next level by incorporating a Black Saturday Fire Severity file and data on Bioregions through Victoria.

Fire Shapefile
Although we have our Lyrebird data points from before and after the fire. The maps are missing a vital piece of information - where the fire occurred!

We downloaded a Black Saturday Fire shapefile from the Victorian Government Datashare website. This particular file also had information on the intensity of the fire.

Add the fire shapefile to your map, you may get a warning that the projection of the file does not match the map. You can use the Data Management tool - Project to change the file’s projection to WGS1984 (or the coordinate system of your choosing). Next we can use graduated colours to better reflect the intensity of the fire by specifying Symbology by quantity.

Bioregion Data
Next we are going to add even more information to our maps but this time we are going to display it through each XY point through joining layers. Another piece of information we thought would be good to look at was which Victorian Bioregions each Lyrebird was recorded in. Bioregions refer to natural environments with certain defining characteristics.

We downloaded this bioregion data in the form of a shapefile from the Victorian Government Datashare website.

Add the Bioregion shapefile to your map. Symbolise the file by unique values to distinguish the different bioregion types.

Now, we want to connect the bioregion data spatially to our XY Lyrebird points to figure out which bioregion they were recorded in.

To do this we used the Add Spatial Join tool. We want our Target features to be the Lyrebird points, and the Join features to be the bioregion shapefile. We used the Intersect match option with a search radius of 5 kilometres (we chose this because many GPS devices have been found to have recording inaccuracies within 5km).

If you check the Attribute Table for your lyrebird data points there should be matched columns with bioregion data. To display this intersected bioregion data change the symbology of your lyrebird data to unique values, and select for Bioregion in the Field parameter as we have done before.

The following figures 3 & 4 show our mapped Lyrebird points, incorporating our fire and bioregion data.

Spatial Analysis

The XY points for our Lyrebird records can also be referred to as a “point pattern”. Keeping in mind our data type, the following analyses will be specifically for point patterns.

Testing for CSR with Average Nearest Neighbours

It is common practice in spatial analyses to first test for complete spatial randomness (CSR). Essentially it involves testing for whether there is a pattern in the lyrebird distribution (dispersed or clustered) or whether it has a random distribution.

This is a very important starting point, as it will help inform what scales we should be looking at our data and whether there is anything interesting in our point pattern.

To do so we will consider the following hypotheses:

  • Null Hypothesis: The Lyrebird point pattern will follow a distribution of complete spatial randomness (CSR).

  • Alternative Hypothesis: The Lyrebird point pattern will not follow a distribution of complete spatial randomness (CSR).

To test our hypotheses we will use an Average Nearest Neighbours Test. The Average Nearest Neighbour (ANN) test measures the centre of each point to the centre of its neighbouring points, and then averages these distances. It then compares the mean neighbourhood distance to the distances you could expect in a random distribution to tell us whether the point pattern is random, clustered or dispersed.

Input your Lyrebird data into the Average Nearest Neighbour (ANN) tool. We used the Euclidean Distance Method, left Area as the default and checked to generate a report. We have summarised our results from our ANN reports for both before and after Lyrebird points below:

Results: The results of the Average Nearest Neighbours test show that Lyrebird distribution is significantly spatially clustered for both before and after the fire.

We also reject the null, and accept the alternative hypothesis that the Lyrebird point pattern does not follow CSR.

Besides testing for CSR the ANN tool has provided us with another useful piece of information: the mean distance at which clustering is occurring. This will be useful in the following analyses where we will analyse our point pattern at smaller scales.

Testing for Clusters and Outliers with Anselin Local Moran’s I

Background

Now that we know that Lyrebird records are clustered, our next step will help us identify exactly where these clusters are. Moran’s I statistic is commonly used in the field of ecology when analysing species distributions spatially. Anselin Local Moran’s I is a more detailed version of this statistic which can analyse point patterns at smaller scales, and point out particular clusters.

The Clusters and Outliers Analysis tool in ArcGIS uses this statistic to identify clusters and outliers in the point pattern. Essentially this tool will consider the counts of each record as well as the neighbourhood of other records (or how close other records are) and compare these values to if the points followed a random distribution.

  • Clusters can be either hotspots or cold spots:
  • Hotspot: high values surrounded by high values
  • Coldspot: low values surrounded by low values

Outliers come in the form of: * Low values surrounded by high values * High values surrounded by low values

In which “values” refers to the Moran’s I index (I).

Running the Analysis

To start, input the Lyrebird feature data into the Cluster and Outlier (Anselin’s Moran’s I) Analysis tool and assign individual count to the Input field.

We used the fixed distance band (to have more control in defining the neighbourhood) for conceptualisation of spatial relationships, Euclidean as the Distance Method, Row Standardisation (to try and standardise the data for any sampling biases) and a distance band of 3000 (we chose this distance based on the mean neighbourhood distances from the ANN analysis).

We also applied the False Discovery Rate (FDR) Correction (which gives us a confidence interval of 95%) with 999 Permutations (this gives us the pseudo p-value used to determine whether the clustering is significant).

The Output

The following Figures 10 & 11 display the significant clusters and outliers which were identified using the Cluster and Outlier Analysis tool before and after the fire.

Cluster Visualisation with Kernel Density Estimation

Background

The last analysis we conducted used the Kernel Density tool. This spatial analyst tool is particularly good for visualising clustering or intensity of the Lyrebird records in space. The tool fits a smooth curve (or kernel) to each data point. The curve is highest at the centroid of the point, but diminishes as far as the search radius is defined. The output raster calculates the density for each cell from the overlap of all of these curves. Areas of high intensity will reflect areas with the most Lyrebirds!

Running the Analysis

Input the Lyrebird point data into the Kernel Density tool.

For the parameters we assigned the individual count to the Population field (as we are interested in the density of lyrebird counts), output cell size of 1300, search radius of 20,000 square metres, Geodesic Method (this considers the curve of the earth).

Lastly, you can vary the symbology of the kernel density raster. We recommend making No Data or values = 0 transparent so that you can still get a sense for location on the map.

The Output

Figures 7 and 8 display the Kernel density rasters on their own.

Figures 9 and 10 show the Kernel Density rasters with a different symbology and basemap, relative to the black Saturday fires.

Final Thoughts

With this guide under your belt, you should be able to perform some in depth analysis of ALA data when undertaking your next research task. All of this was written to be reproducible but do not feel the need to stick to what has been written here. This is only a guide and much more can be taken from the data that ALA has.