The Black Saturday bush fires took place in Victoria in 2009 and devastated the country, claiming many lives, including an estimated one million animals (NMA, 2022). Months of hot and dry conditions combined with strong winds from Central Australia created the perfect conditions for these catastrophic fires to thrive (NMA, 2022).
We have decided to look specifically at the Lyrebird family (Menuridae), as they are a prominent native Australian group, and used data extracted from the Atlas of Living Australia to study how the distribution and abundance of these animals changed as a result of the fires. We are aiming to do this by comparing data from before and after the event and presenting the information on spatial graphs and as a generalised linear model.
Past studies have found that Lyrebirds forage for food that is sparsely found, therefore, requiring them to occupy large habitats (Lill, 1995). As a result, they are quite vulnerable to the harm of any habitat fragmentation or destruction such as is caused by bushfires (Lill, 1995). Arthur et al. (2012) and Nungent et al. (2014) have each seen a decrease in Lyrebirds in fire-affected areas immediately after a fire, with Robinson et al. (2016) finding they moved to areas that were less affected by fires such as gullies and rainforests. Conversely, Doty et al. (2014) have found that Lyrebirds favour post-fire environments as they find them ideal for foraging.
Our report aims to see how Lyrebirds reacted after the Black Saturday fires specifically and will hopefully act as an exemplar for users of the Atlas of Living Australia who desire to transform other data sets into digestible information that can be applied to real-world problems.
To answer this question, we used a multifaceted analysis of the data, performing both statistical and spatial analysis, utilising Rstudio and GIS programs.
Lyrebirds where chosen as the specific species as they were a species that had sufficient data available for the research time period, three months before and after the Black Saturday fires occurred, as well as being present in the areas affected by the fires.
The positional data of the lyrebirds used in the analysis was extracted from the Atlas of Living Australia database. This was done by filtering the database to display records of lyrebirds located in Victoria from the beginning of the year 2000 to the end of 2009. This narrowed down the number of records from over 94 000 to around 3 000 and saved time in cleaning up the data to just the records in the time and place that was desired.
The data was then extracted as a CSV file that contained each individual record, the date it was recorded on and the latitude and longitude of the record.
The data was then cleaned, firstly by converting the dates to a Year-Month-Day format.
touch temp
sed -E 's?(T13:00:00Z|T14:00:00Z)??g' ALA_LyreBird.csv > temp
cat temp > correct_ALA_LyreBirdless.csv
rm temp
This puts the the record dates into a format that can be read by R functions. From here, the records that occurred within our time frame, 2008/11/01 to 2009/06/30, was extracted, removing any other records that occurred outside of this time frame.
<- which(cleaned$eventDate == "2008-11-01")[1]
start <- which(cleaned$eventDate == "2009-06-30")[1]
finish <- cleaned[start:finish, ]
cleaned write.csv(cleaned, "ALA_LyreBird.csv", row.names = FALSE)
We use python to create a new file from the cleaned file above and the new file will contain the date and the counts of lyrebirds accumulated within that week.
import pandas
getdata(data1):
def = {}
all_date = {}
ret = pd.date_range(start="2008-11-01", end='2009-06-30')
date_range for i in date_range:
= str(i).split(' ')[0]
capture all_date.update({capture: 0})
= 0
counter
for i in data1.index:
= data1['eventDate'][i]
var += data1['individualCount'][i]
counter if var in all_date:
= all_date[var]
num all_date.update({var: num + counter})
= 0
counter = 0
start = 0
count for i in all_date:
+= 1
start += all_date[i]
count if start == 7:
ret.update({i: count})
= 0
count = 0
start
return ret
= pd.read_csv("correct_ALA_LyreBirdless.csv")
final_data = getdata(final_data)
everything = open('countsperweek.csv', 'a')
write_file = open('countsperweek.csv', 'r')
read_file if len(read_file.readlines()) == 0:
write_file.write('Date,num_birds\n')
else:
write_file.close()
read_file.close()
quit()
for i in everything:
= i + ',' + str(everything[i]) + '\n'
string write_file.write(string)
write_file.close()
read_file.close()
Additional environmental variables that were considered to have an impact on Lyrebird abundance and distribution was also added to each record’s profile. These variables included rainfall, minimum temperature, tree density and land cover type.
As many of these variables were available as shapefiles or rasters, we found the easiest way to connect the variable data to each record was by using a spatial program, ArcGIS Pro. The occurrence data was uploaded as a csv file and then converted to a XY point feature class using the XY Table to Point tool. This plots the occurrence data onto the map, visualising the occurrence data. However, this can only be done if the occurrence data has the longitude and latitude of each occurrence recorded.
The shapefiles and rasters of the environmental variables, land cover type and tree density, were gathered from the Victorian Government Data Directory and then uploaded to ArcGIS. These files required some processing as, while some were already downloadable as raster files (.tif), many of them could only be downloaded as shapefiles (.shp), which meant they needed to be converted into raster files. This was done using the Feature to Raster tool.
The data of each raster was then exported at each XY point and joined by each occurrence record using the Extract Multi Values to Points tool. This resulted in the extraction of each raster’s values at the location of each occurrence record, meaning each lyrebird record also contained information on the land cover type and tree density. The XY point feature file was then exported as an excel file using the Table to Excel tool.
The monthly rainfall and temperature variables needed for the generalised linear model were obtained from the Bureau of Meteorology website. They have collected weather station records that have historic data on the monthly rainfall as well as the monthly maximum and minimum temperatures that occurred at each of these stations.
To find these variables for each lyrebird sighting, the coordinates of the sighting were input into the website and the closest weather station with data available for the month that the sighting occurred was selected. This data was presented in a table containing the values for each month over a period of years. The value for the month that the record was made in was input into the spreadsheet for the generalised linear model to analyse. This process was repeated for each sighting.
Introducing variables is important as it takes into consideration the interaction between Lyrebird distribution and abundance and factors other than fire. Additionally, it accounts for the relationship between the fire and variables such as rainfall and temperature.
The analysis we performed involved produced two generalised linear models (GLM), for both lyrebird abundance and distribution, and a spatial model, for lyrebird distribution. The difference between the distribution glm and the spatial model is that the glm is used to predict potential explanations for distribution where as the spatial model only looks at where they were before the fire and after the fire and does not try and explain the reasoning for any movement.
A GLM was used as it was a much more flexible model that allowed for the use of multiple variables to explain the response variables, i.e. distribution or abundance of lyrebirds.
The packages required are tidyverse
,
emmeans
and sjPlot.
We then performed some precursory analysis, visualising the data and getting a basic understanding of what the data may look like.
<- read.csv("Data/ALA_LyreBird.csv")
Lyrebirds summary(Lyrebirds)
## eventDate year month day
## Length:208 Min. :2008 Min. : 1.000 Min. : 1.00
## Class :character 1st Qu.:2008 1st Qu.: 4.000 1st Qu.: 8.00
## Mode :character Median :2009 Median : 5.000 Median :20.50
## Mean :2009 Mean : 6.433 Mean :17.31
## 3rd Qu.:2009 3rd Qu.:11.000 3rd Qu.:26.00
## Max. :2009 Max. :12.000 Max. :31.00
## decimalLatitude decimalLongitude Rainfall MeanMaxTemperature
## Min. :-38.43 Min. :145.2 Min. : 2.8 Min. : 5.00
## 1st Qu.:-37.74 1st Qu.:146.4 1st Qu.: 22.4 1st Qu.:15.75
## Median :-37.64 Median :147.7 Median : 57.8 Median :20.60
## Mean :-37.55 Mean :147.5 Mean : 73.8 Mean :19.74
## 3rd Qu.:-37.49 3rd Qu.:148.3 3rd Qu.: 98.6 3rd Qu.:22.52
## Max. :-36.12 Max. :149.9 Max. :267.8 Max. :30.20
## MeanMinTemperature IndividualCount TreeDensity Landcover_type
## Min. :-0.100 Min. : 1.000 Length:208 Length:208
## 1st Qu.: 5.600 1st Qu.: 1.000 Class :character Class :character
## Median : 9.300 Median : 1.000 Mode :character Mode :character
## Mean : 9.082 Mean : 1.279
## 3rd Qu.:12.100 3rd Qu.: 1.000
## Max. :15.500 Max. :10.000
## FireSev
## Length:208
## Class :character
## Mode :character
##
##
##
<- read.csv("Data/countsperweek.csv")
LyrebirdPoints <- seq(-16, nrow(LyrebirdPoints) - 1 - 16)
num_days <- LyrebirdPoints$num_birds
num_birds_col <- data.frame(num_birds_col, num_days)
variable <- c(rep("#f8766d", 16), rep("#ffe15b", 5), rep("#00bfc4", 13))
colors barplot(num_birds_col~ num_days , data=variable, xlab='Time since fire (weeks)', ylab='Number of sightings per week', col = colors)
legend("top", legend = c("Before Event", "During Event", "After Event"), fill = c("#f8766d", "#ffe15b", "#00bfc4"))
ggplot(Lyrebirds, aes(IndividualCount)) +
geom_histogram(bins = 10L, fill = "orange") + labs(x = "Individual Count", y = "Frequency")
$LOG_IndividualCount <- log10(Lyrebirds$IndividualCount)
Lyrebirds
ggplot(Lyrebirds, aes(LOG_IndividualCount)) +
geom_histogram(bins = 10L, fill = "orange") + labs(x = "Individual Count", y = "Frequency")
We then summerised the mean daily temperature and rainfall using a median function, as you cannot summarise the mean of a mean. The records were then grouped by months.
<- Lyrebirds %>% # Specify data frame
Lyrebirds_per_month group_by(month) %>% # Specify group indicator
summarise_at(vars(IndividualCount, Rainfall, MeanMaxTemperature, MeanMinTemperature), # Specify columns
list(Count_per_Month = median)) # Specify function
We then summerised the individual counts to mean number of lyrebird counts per month.
$IndividualCount_Count_per_Month <- Lyrebirds %>% # Specify data frame
Lyrebirds_per_monthgroup_by(month) %>% # Specify group indicator
summarise_at(vars(IndividualCount), # Specify column
list(mean = mean)) # Specify function
# Here we remove the extra median column from the first dplyr tibble we made
$IndividualCount_Count_per_Month <- Lyrebirds_per_month$IndividualCount_Count_per_Month$mean Lyrebirds_per_month
A copy of the months column was then created to sort the records into before, during or after the fire.
$Fire_pres_abs <- Lyrebirds_per_month$month
Lyrebirds_per_month
$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 11] <- c("Before")
Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 12] <- c("Before")
Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 1] <- c("Before")
Lyrebirds_per_month
$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 2] <- c("Before")
Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 3] <- c("After")
Lyrebirds_per_month
$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 4] <- c("After")
Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 5] <- c("After")
Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 6] <- c("After")
Lyrebirds_per_month$Fire_pres_abs[Lyrebirds_per_month$Fire_pres_abs == 7] <- c("After") Lyrebirds_per_month
As seen when we visualised our individual counts of lyrebirds, the distribution was not a normal distribution and so we decided to perform a logarithmic transformation to fix this problem.
$LOG_IndividualCount <- log10(Lyrebirds_per_month$IndividualCount_Count_per_Month) Lyrebirds_per_month
Once we had cleaned up and organised our data to how we needed it, we began looking for interactions between our variables. We started with all numerical variables: * Mean monthly minimum temperature * Mean monthly maximum temperature * Mean monthly rainfall
cor(Lyrebirds_per_month$MeanMinTemperature_Count_per_Month, Lyrebirds_per_month$Rainfall_Count_per_Month)
## [1] 0.330224
cor(Lyrebirds_per_month$MeanMaxTemperature_Count_per_Month, Lyrebirds_per_month$Rainfall_Count_per_Month)
## [1] 0.1877524
cor(Lyrebirds_per_month$MeanMinTemperature_Count_per_Month, Lyrebirds_per_month$MeanMaxTemperature_Count_per_Month)
## [1] 0.9418245
We ran GLMs for every biologically relevant interaction and found the interactions which were statistically significant, p<0.05.
<- glm(LOG_IndividualCount ~ Rainfall_Count_per_Month * Fire_pres_abs, data = Lyrebirds_per_month)
Lyrebird.glm <- emmeans(Lyrebird.glm, specs = pairwise ~ Rainfall_Count_per_Month * Fire_pres_abs) Lyrebird.glm.emm.s
## contrast
## 1 Rainfall_Count_per_Month65.8222222222222 After - Rainfall_Count_per_Month65.8222222222222 Before
## estimate SE df lower.CL upper.CL t.ratio p.value
## 1 0.1660709 0.04383495 5 0.05338959 0.2787522 3.788551 0.01277738
From this we could see that the most significant interaction occurred when rainfall was high in areas that had been affected by the fires. These areas saw an increased abundance of lyrebirds where as areas that recieved less rain was found to have less abundance of lyrebirds. We then plotted the prediction that our model shows.
plot_model(Lyrebird.glm, type = "pred", terms = c("Rainfall_Count_per_Month", "Fire_pres_abs"), title = "Predicted Lyrebird abundance before and after Black Saturday fires", axis.title = c("Rainfall count per month", "Log Individual count"), legend.title = "Fire presence")
The packages you will need for this are geodata
,
raster
, maptools
, rgdal
,
rgeos
, sf
, tidyverse
,
ozmaps
and dismo
.
Because this GLM was for distribution and so required the use of rasters. When creating this guide, processing the Land Cover raster file proved to be too much for the computer we wrote this up on, so we have commented out the lines that involve the Land Cover raster so you can still see where it should be. When Land Cover was run in the analysis, it should be noted that it did not produce any significant interactions.
<- raster("Rasters/Soil_TypeVIC.tif")
Soil <- raster("Rasters/TreeDensity.tif")
TreeDensity #LandCover <- raster("Rasters/vic_landCOVER_TS_2015_19.tif")
<- stack("Rasters/wc2.1_2.5m_prec_2008-11.tif",
precipitationBefore "Rasters/wc2.1_2.5m_prec_2008-12.tif",
"Rasters/wc2.1_2.5m_prec_2009-01.tif",
"Rasters/wc2.1_2.5m_prec_2009-02.tif"
)<- stack("Rasters/wc2.1_2.5m_tmax_2008-11.tif",
MaxTempBefore "Rasters/wc2.1_2.5m_tmax_2008-12.tif",
"Rasters/wc2.1_2.5m_tmax_2009-01.tif",
"Rasters/wc2.1_2.5m_tmax_2009-02.tif")
We needed to ensure all the rasters where projected into the same datum, WGS84.
projection(Soil) <- "+proj=longlat +datum=WGS84 +no_defs"
projection(precipitationBefore) <- CRS("+proj=longlat +datum=WGS84 +no_defs")
projection(TreeDensity) <- CRS("+proj=longlat +datum=WGS84 +no_defs")
#projection(LandCover) <- CRS("+proj=longlat +datum=WGS84 +no_defs")
projection(MaxTempBefore) <- CRS("+proj=longlat +datum=WGS84 +no_defs")
We then made sure that the rasters were clipped to the extent of Victoria as this helps reduce processing time and ensures the model doesn’t try and include information from outside Victoria.
<- ozmap_data("abs_ste")
lga_sf <- lga_sf[2, ]
vic_shape <- st_transform(vic_shape, crs = "+proj=longlat +datum=WGS84 +no_defs")
vic_shape <- extent(140.8894, 149.9763, -39.16078, -33.92641)
vic <- setExtent(Soil, vic_shape, keepres = F)
Soil <- setExtent(TreeDensity, vic_shape, keepres = F)
TreeDensity #LandCover <- setExtent(LandCover, vic_shape, keepres = F)
<- raster::projectRaster(precipitationBefore, TreeDensity)
precipitationBefore <- raster::projectRaster(MaxTempBefore, TreeDensity)
MaxTempBefore <- raster::projectRaster(Soil, TreeDensity)
Soil #LandCover <- raster::projectRaster(LandCover, TreeDensity)
The biostack requires individual rasters and currently our rainfall and max temperature rasters had been collated into two stacks. We then needed to pull them apart.
<- raster(precipitationBefore, layer = 1)
Prec_11 <- raster(precipitationBefore, layer = 2)
Prec_12 <- raster(precipitationBefore, layer = 3)
Prec_1 <- raster(precipitationBefore, layer = 4)
Prec_2
<- raster(MaxTempBefore, layer = 1)
MaxTemp_11 <- raster(MaxTempBefore, layer = 2)
MaxTemp_12 <- raster(MaxTempBefore, layer = 3)
MaxTemp_1 <- raster(MaxTempBefore, layer = 4) MaxTemp_2
We could now create our biostack which would create a map showing the data of each raster. The rainfall and max temperature are taken from before the fire occurrec.
#This is what the line would look like including Land Cover, but for this purpose we will run it without.
#BioStack <- stack(Soil, TreeDensity, LandCover, Prec_11, Prec_12, Prec_1, Prec_2, MaxTemp_11, MaxTemp_12, MaxTemp_1, MaxTemp_2)
<- stack(Soil, TreeDensity, Prec_11, Prec_12, Prec_1, Prec_2, MaxTemp_11, MaxTemp_12, MaxTemp_1, MaxTemp_2)
BioStack plot(BioStack)
We then needed to read in the occurrence records locations as coordinates in the same projection as the rasters.
From here, we then created our background data for the GLM. This creates pseudo absence data which the GLM will use to compare the presence data to when calculating its outputs.
<- randomPoints(BioStack, 500) #random points from all vic
BgLyre <- raster::extract(BioStack, LyrebirdCoor) #All recorded lyrebird presence and its environmental variables
PresenceLyre <- raster::extract(BioStack, BgLyre) # Use random points from BgLyre and associated environmental variables as absence
AbsenceLyre
<- c(rep(1, nrow(PresenceLyre)), rep(0, nrow(AbsenceLyre))) #combine absence and absence data
PresenceBgLyre <- data.frame(cbind(PresenceBgLyre, rbind(PresenceLyre, AbsenceLyre))) #create species distribution model LyresdmData
We then ran GLMs for every variable to find the best interaction. These were using the variables from before the fire occurred. We found that tmperature and rainfall had significant interactions.
#This is what the line would look like including Land Cover, but for this purpose we will run it without.
#LyresdmData = na.omit(LyresdmData)
#LyrebirdBg.glm <- glm(PresenceBgLyre~ Soil_TypeVIC + TreeDensity + wc2.1_2.5m_prec_11+ wc2.1_2.5m_prec_12+ wc2.1_2.5m_prec_01 + wc2.1_2.5m_tmax_11 + wc2.1_2.5m_tmax_12 + wc2.1_2.5m_tmax_01 + VIC_LANDCOVER_TS_2015_19, data = LyresdmData)
#step(LyrebirdBg.glm, test="F")
#LyrebirdBg.glm <- glm(PresenceBgLyre ~ Soil_TypeVIC + wc2.1_2.5m_prec_11 + wc2.1_2.5m_prec_12 + wc2.1_2.5m_tmax_11 + wc2.1_2.5m_tmax_01, data = LyresdmData)
= na.omit(LyresdmData)
LyresdmData <- glm(PresenceBgLyre~ Soil_TypeVIC + TreeDensity + wc2.1_2.5m_prec_11+ wc2.1_2.5m_prec_12+ wc2.1_2.5m_prec_01 + wc2.1_2.5m_tmax_11 + wc2.1_2.5m_tmax_12 + wc2.1_2.5m_tmax_01, data = LyresdmData)
LyrebirdBg.glm #step(LyrebirdBg.glm, test="F") This displays the GLM through steps
<- glm(PresenceBgLyre ~ Soil_TypeVIC + wc2.1_2.5m_prec_11 + wc2.1_2.5m_prec_12 + wc2.1_2.5m_tmax_11 + wc2.1_2.5m_tmax_01, data = LyresdmData)
LyrebirdBg.glm summary(LyrebirdBg.glm)
##
## Call:
## glm(formula = PresenceBgLyre ~ Soil_TypeVIC + wc2.1_2.5m_prec_11 +
## wc2.1_2.5m_prec_12 + wc2.1_2.5m_tmax_11 + wc2.1_2.5m_tmax_01,
## data = LyresdmData)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.77412 -0.23450 -0.03946 0.20408 0.80165
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.167e+00 4.974e-01 -2.347 0.019625 *
## Soil_TypeVIC 9.198e-05 3.394e-05 2.710 0.007133 **
## wc2.1_2.5m_prec_11 3.658e-03 7.212e-04 5.071 7.07e-07 ***
## wc2.1_2.5m_prec_12 4.630e-03 1.374e-03 3.369 0.000858 ***
## wc2.1_2.5m_tmax_11 1.557e-01 2.946e-02 5.283 2.50e-07 ***
## wc2.1_2.5m_tmax_01 -9.613e-02 2.678e-02 -3.590 0.000388 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1366113)
##
## Null deviance: 55.932 on 294 degrees of freedom
## Residual deviance: 39.481 on 289 degrees of freedom
## AIC: 257.88
##
## Number of Fisher Scoring iterations: 2
We were also able to create a map predicting the probability of the locations of lyrebirds before the fire.
#Make rasters into geom objects
#--- convert to data.frame ---#
<- raster::predict(BioStack, LyrebirdBg.glm, ext=vic)
LyrePredict projection(LyrePredict) <- CRS("+proj=longlat +datum=WGS84 +no_defs")
<-
LyrePredict_df as.data.frame(LyrePredict, xy = TRUE) %>%
#--- remove cells with NA for any of the layers ---#
na.omit()
# make dataframe of presence points
<-
LyrePoints_df as.data.frame(LyrebirdCoor@coords, xy = TRUE)
projection(LyrebirdCoor) <- CRS("+proj=longlat +datum=WGS84 +no_defs")
#LyrePoints_df <- ~decimalLongitude + decimalLatitude
# make victoria dataframe projection
<- ozmap_data("abs_ste")
lga_sf <- lga_sf[2, ]
vic_shape <- st_transform(vic_shape, crs = "+proj=longlat +datum=WGS84 +no_defs")
vic_shape
# plot both predicted sdm background data and actual records:
<- ggplot(data = st_transform(vic_shape, 4326)) +
pl geom_sf(data = st_transform(vic_shape, 4326), fill = NA)+
#coord_sf(crs = 4326) +
geom_tile(aes(x = x, y = y, fill = layer), data = LyrePredict_df)+
scale_fill_viridis_c() +
theme_bw() +
geom_point(data = LyrePoints_df, aes(x = decimalLongitude, y = decimalLatitude, color = "Lyrebirds")) +
labs(x = "Longitude", y = "Latitude", color = "Before fire recorded presence", fill = "Predicted presence likelihood")
pl
So we then repeated this whole process for the variables after the fire and found the same outcome as before the fire.
#We obtained this GLM by using step().
= glm(PresenceBgLyreAfter ~ Soil_TypeVIC + wc2.1_2.5m_prec_03 + wc2.1_2.5m_prec_04 + wc2.1_2.5m_prec_06 + wc2.1_2.5m_tmax_04 + wc2.1_2.5m_tmax_05, data = LyresdmDataAfter)
LyrebirdBgAfter.glm summary(LyrebirdBgAfter.glm)
##
## Call:
## glm(formula = PresenceBgLyreAfter ~ Soil_TypeVIC + wc2.1_2.5m_prec_03 +
## wc2.1_2.5m_prec_04 + wc2.1_2.5m_prec_06 + wc2.1_2.5m_tmax_04 +
## wc2.1_2.5m_tmax_05, data = LyresdmDataAfter)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.90741 -0.20718 -0.04562 0.19972 0.84035
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.3382930 0.3805021 -0.889 0.374726
## Soil_TypeVIC 0.0001431 0.0000310 4.616 5.95e-06 ***
## wc2.1_2.5m_prec_03 -0.0024963 0.0017761 -1.405 0.160990
## wc2.1_2.5m_prec_04 0.0127038 0.0028442 4.467 1.15e-05 ***
## wc2.1_2.5m_prec_06 -0.0090905 0.0023679 -3.839 0.000153 ***
## wc2.1_2.5m_tmax_04 -0.1481875 0.0437961 -3.384 0.000817 ***
## wc2.1_2.5m_tmax_05 0.2001613 0.0418700 4.781 2.82e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1315326)
##
## Null deviance: 60.319 on 287 degrees of freedom
## Residual deviance: 36.961 on 281 degrees of freedom
## AIC: 242.01
##
## Number of Fisher Scoring iterations: 2
We also created a map predicting the distribution of lyrebirds after the fire.
The most important columns required in ArcGIS are the Longitude, Latitude, EventDate, IndividualCount and a UID (unique identifier).
IndividualCount for Lyrebirds had some
EventDate had to be formatted to (dd/mm/yyyy) to enable Time in ArcGIS and also make it easier to select for attributes according to the date.
We created the UID column in excel and each row was numbered in ascending order (1, 2, 3, 4, 5…). All tables used in ArcGIS require a UID for each row (cannot have Null values) if not you will get an error.
Lastly, delete all columns (other than those specified above) which are inapplicable to the study to make the data easier to navigate in ArcGIS. Save the excel file as a csv.(Macintosh).
When working in ArcGIS Pro a good first step is to set the Geoprocessing environment settings (e.g. processing extent, map projection and workspace) to match the study area of interest and data being used.
In this case we used WGS1984 (Central Meridian: Greenwich), however GDA1994 is also a good option. We used WGS1984 due to the scale of the extent of our data (Over all of the state of Victoria), and also because of its use in previous research for the same study area.
Use the XY Table to Point Tool to create a point feature class layer from the Lyrebird data table.
The Black Saturday Fires occurred between 7th February 2009 and 14th March 2009. We retrieved data in 3 month periods exactly 3 months before and 3 months after these dates (e.g. 07/11/2008 - 07/02/2009 and 14/03/2009 - 14/06/2009).
To create two separate layers containing these dates, we used the Select Layer By Attribute geoprocessing tool. Using the EventDate column from the table we selected for each time period specified above.
Next use the Make Layer from Selected Feature button to create each before and after layer file and export them to your Geodatabase to ensure they are saved. Go to Add data and display your new before and after layers. You can adjust the colours and sizes of the data points in the Symbology tab. Add a layout to your project. This is where you can create a map to export with a title, legend, north arrow and scale.
In this section we are going to take our XY Before and After Lyrebird maps to the next level by incorporating a Black Saturday Fire Severity file and data on Bioregions through Victoria.
Fire Shapefile
Although we have our Lyrebird data points from before and after the
fire. The maps are missing a vital piece of information - where the fire
occurred!
We downloaded a Black Saturday Fire shapefile from the Victorian Government Datashare website. This particular file also had information on the intensity of the fire.
Add the fire shapefile to your map, you may get a warning that the projection of the file does not match the map. You can use the Data Management tool - Project to change the file’s projection to WGS1984 (or the coordinate system of your choosing). Next we can use graduated colours to better reflect the intensity of the fire by specifying Symbology by quantity.
Bioregion Data
Next we are going to add even more information to our maps but this time
we are going to display it through each XY point through joining layers.
Another piece of information we thought would be good to look at was
which Victorian Bioregions each Lyrebird was recorded in. Bioregions
refer to natural environments with certain defining characteristics.
We downloaded this bioregion data in the form of a shapefile from the Victorian Government Datashare website.
Add the Bioregion shapefile to your map. Symbolise the file by unique values to distinguish the different bioregion types.
Now, we want to connect the bioregion data spatially to our XY Lyrebird points to figure out which bioregion they were recorded in.
To do this we used the Add Spatial Join tool. We want our Target features to be the Lyrebird points, and the Join features to be the bioregion shapefile. We used the Intersect match option with a search radius of 5 kilometres (we chose this because many GPS devices have been found to have recording inaccuracies within 5km).
If you check the Attribute Table for your lyrebird data points there should be matched columns with bioregion data. To display this intersected bioregion data change the symbology of your lyrebird data to unique values, and select for Bioregion in the Field parameter as we have done before.
The XY points for our Lyrebird records can also be referred to as a “point pattern”. Keeping in mind our data type, the following analyses will be specifically for point patterns.
Testing for CSR with Average Nearest Neighbours
It is common practice in spatial analyses to first test for complete spatial randomness (CSR). Essentially it involves testing for whether there is a pattern in the lyrebird distribution (dispersed or clustered) or whether it has a random distribution.
This is a very important starting point, as it will help inform what scales we should be looking at our data and whether there is anything interesting in our point pattern.
To do so we will consider the following hypotheses:
Null Hypothesis: The Lyrebird point pattern will follow a distribution of complete spatial randomness (CSR).
Alternative Hypothesis: The Lyrebird point pattern will not follow a distribution of complete spatial randomness (CSR).
To test our hypotheses we will use an Average Nearest Neighbours Test. The Average Nearest Neighbour (ANN) test measures the centre of each point to the centre of its neighbouring points, and then averages these distances. It then compares the mean neighbourhood distance to the distances you could expect in a random distribution to tell us whether the point pattern is random, clustered or dispersed.
Input your Lyrebird data into the Average Nearest Neighbour (ANN) tool. We used the Euclidean Distance Method, left Area as the default and checked to generate a report. We have summarised our results from our ANN reports for both before and after Lyrebird points below:
Results: The results of the Average Nearest Neighbours test show that Lyrebird distribution is significantly spatially clustered for both before and after the fire.
We also reject the null, and accept the alternative hypothesis that the Lyrebird point pattern does not follow CSR.
Besides testing for CSR the ANN tool has provided us with another useful piece of information: the mean distance at which clustering is occurring. This will be useful in the following analyses where we will analyse our point pattern at smaller scales.
Testing for Clusters and Outliers with Anselin Local Moran’s I
Now that we know that Lyrebird records are clustered, our next step will help us identify exactly where these clusters are. Moran’s I statistic is commonly used in the field of ecology when analysing species distributions spatially. Anselin Local Moran’s I is a more detailed version of this statistic which can analyse point patterns at smaller scales, and point out particular clusters.
The Clusters and Outliers Analysis tool in ArcGIS uses this statistic to identify clusters and outliers in the point pattern. Essentially this tool will consider the counts of each record as well as the neighbourhood of other records (or how close other records are) and compare these values to if the points followed a random distribution.
Outliers come in the form of: * Low values surrounded by high values * High values surrounded by low values
In which “values” refers to the Moran’s I index (I).
To start, input the Lyrebird feature data into the Cluster and Outlier (Anselin’s Moran’s I) Analysis tool and assign individual count to the Input field.
We used the fixed distance band (to have more control in defining the neighbourhood) for conceptualisation of spatial relationships, Euclidean as the Distance Method, Row Standardisation (to try and standardise the data for any sampling biases) and a distance band of 3000 (we chose this distance based on the mean neighbourhood distances from the ANN analysis).
We also applied the False Discovery Rate (FDR) Correction (which gives us a confidence interval of 95%) with 999 Permutations (this gives us the pseudo p-value used to determine whether the clustering is significant).
Cluster Visualisation with Kernel Density Estimation
The last analysis we conducted used the Kernel Density tool. This spatial analyst tool is particularly good for visualising clustering or intensity of the Lyrebird records in space. The tool fits a smooth curve (or kernel) to each data point. The curve is highest at the centroid of the point, but diminishes as far as the search radius is defined. The output raster calculates the density for each cell from the overlap of all of these curves. Areas of high intensity will reflect areas with the most Lyrebirds!
Input the Lyrebird point data into the Kernel Density tool.
For the parameters we assigned the individual count to the Population field (as we are interested in the density of lyrebird counts), output cell size of 1300, search radius of 20,000 square metres, Geodesic Method (this considers the curve of the earth).
Lastly, you can vary the symbology of the kernel density raster. We recommend making No Data or values = 0 transparent so that you can still get a sense for location on the map.
With this guide under your belt, you should be able to perform some in depth analysis of ALA data when undertaking your next research task. All of this was written to be reproducible but do not feel the need to stick to what has been written here. This is only a guide and much more can be taken from the data that ALA has.