In a fast-paced society, we often accept facts simply because the people before us passed them down, and we label these facts "common sense". Yet much of our progress has come from revisiting such established ideas: seen from a different viewpoint, they can be deconstructed, given further meaning, or revised entirely. My investigation aims to shed light on the lack of awareness of password security in a technologically advancing era by producing a website that gives insights into a given password through numbers and visualisations. This report underpins that website: it presents a step-by-step analysis of factors we commonly understand, to determine what makes a password commonly used and how significant those factors are relative to each other. The analysis is performed on five data sets of the most commonly used passwords, each ten times larger than the last, ranging from the top hundred to the top million. I examine three variables in this analysis: length, popularity of string, and complexity.
I sourced all this data from https://github.com/danielmiessler/SecLists/tree/master/Passwords/Common-Credentials.
With the commonly used password data obtained, I began to derive the other information, i.e. the variables under investigation. I chose Python to build the data because of its simplicity and the abundance of readily available modules. Length is an easy variable to compute by applying the built-in function len(). Data for the other two variables, however, were more complex to collect.
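For instance, computing the length variable is a one-line comprehension over the password list (the entries below are illustrative, not the actual data):

```python
# Hypothetical sample of common passwords; the real data comes from SecLists
passwords = ["123456", "password", "qwerty"]

# Length of each password via the built-in len()
lengths = [len(p) for p in passwords]
print(lengths)  # → [6, 8, 6]
```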
To test the complexity of a password, I matched a set of regular expression patterns commonly found in password lists against the top million most commonly used passwords. This was done in Python: a dictionary counts the passwords matching each regular expression, and the result is printed in a readable form with the pprint module. After iterating over different candidate patterns, I collected the ten most common ones. Roughly 90% of the top million passwords fall into these ten patterns; the remaining 10% consist of many different combinations of patterns, recorded under the "others" key in the dictionary.
import re
import sys
import pprint

if len(sys.argv) != 2:
    sys.exit("Too few arguments")

# Common regular expression patterns for passwords
regex_dict = {
    '^[a-z]+$': 0,                          # Lowercase letters only
    '^[a-z]+[0-9]+$': 0,                    # Lowercase letters followed by numbers
    '^[0-9]+$': 0,                          # Numbers only
    '^[a-z]+[0-9]+[a-z]+$': 0,              # Lowercase letters, then numbers, then lowercase letters
    '^[0-9]+[a-z]+$': 0,                    # Numbers followed by lowercase letters
    '^[A-Z]+[a-z]+[0-9]+$': 0,              # Uppercase, then lowercase letters, then numbers
    '^[A-Z]+[a-z]+$': 0,                    # Uppercase then lowercase letters
    '^[A-Z]+$': 0,                          # Uppercase letters only
    '^[a-z]+[0-9]+[a-z]+[0-9]+$': 0,        # Alternating lowercase letters and numbers
    '^[a-z]+[0-9]+[a-z]+[0-9]+[a-z]+$': 0,  # Alternating lowercase letters and numbers
    'others': 0
}

with open(sys.argv[1], "r") as passwords:
    all_passwords = [i[:-1] for i in passwords.readlines()]

for password in all_passwords:
    found = False
    for regex in regex_dict:
        if re.fullmatch(regex, password):
            regex_dict[regex] += 1
            found = True
            break
    if not found:
        regex_dict['others'] += 1

pprint.pprint(sorted(regex_dict.items(), key=lambda x: x[1], reverse=True))
## [('^[a-z]+$', 337118),
##  ('^[a-z]+[0-9]+$', 252584),
##  ('^[0-9]+$', 165206),
##  ('others', 97797),
##  ('^[a-z]+[0-9]+[a-z]+$', 38421),
##  ('^[0-9]+[a-z]+$', 33045),
##  ('^[A-Z]+[a-z]+[0-9]+$', 21378),
##  ('^[A-Z]+[a-z]+$', 17147),
##  ('^[A-Z]+$', 16053),
##  ('^[a-z]+[0-9]+[a-z]+[0-9]+$', 11161),
##  ('^[a-z]+[0-9]+[a-z]+[0-9]+[a-z]+$', 10088)]
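As a sanity check on the 90% figure, the counts printed above can be totalled (a minimal sketch using the numbers from the output):

```python
# Counts copied from the pprint output above (top-10 patterns vs "others")
top10 = [337118, 252584, 165206, 38421, 33045, 21378,
         17147, 16053, 11161, 10088]
others = 97797

coverage = sum(top10) / (sum(top10) + others)
print(f"{coverage:.1%} of the top million passwords match a top-10 pattern")
```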
In this experiment, I assumed that the popularity of a phrase is proportional to its number of Google search results. Under this assumption, I fetched search result data for each password with the requests module and extracted a cleaner result with the BeautifulSoup module. A problem emerged, however: Google does not tolerate heavy web scraping, and I risked repercussions, namely being blocked. I handled this by conducting stratified random sampling with four groups on each data set. Each group represents a different range of passwords by how commonly they are used, and each group contained 8 randomly chosen passwords. Every chosen password was recorded in a list, and afterwards in a file, to ensure that no sample is chosen twice. I chose 8 passwords per group, 32 random passwords in total, because a sample of roughly 10 times the regression degrees of freedom, here 4 − 1 = 3 (the model has an intercept plus 3 variables), is recommended for analysis.
Furthermore, as an extra precaution, I scraped with a random time interval between items and a rotating random HTTP header. The script writes a new CSV-formatted file containing the data on each password's variables.
I repeated the sampling three times for each data set to improve reliability.
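The sampling design described above can be sketched independently of the scraping code (the helper below and its names are illustrative, not part of the original script):

```python
import random

def stratified_sample(n_passwords, n_groups=4, per_group=8, already_used=()):
    """Draw per_group distinct indices from each of n_groups equal-width
    popularity strata, skipping indices sampled on a previous run."""
    seen = set(already_used)
    step = n_passwords // n_groups
    sample = []
    for g in range(n_groups):
        lo, hi = g * step, (g + 1) * step - 1
        picked = 0
        while picked < per_group:
            idx = random.randint(lo, hi)
            if idx not in seen:
                seen.add(idx)
                sample.append(idx)
                picked += 1
    return sample

sample = stratified_sample(10 ** 6)
print(len(sample))  # 32 indices, 8 per quartile
```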
import requests
import re
import sys
import random
import time
import math
import subprocess
from bs4 import BeautifulSoup

# Research Question:
# Does complexity, length and popularity of string affect how commonly the password is used?

# Picks a random number that hasn't been picked from a group
def random_sampling(group, seen, used_index):
    in_group = random.randint(group[0], group[1])
    while in_group in seen or in_group in used_index:
        in_group = random.randint(group[0], group[1])
    return in_group

# Check if the given password is considered complex
def is_complex(regex_dict, password):
    for regex in regex_dict:
        if re.fullmatch(regex, password):
            return False
    return True

# Get the index ranges of the four groups
def get_range(size):
    num_passwords = 10 ** int(size)
    ret = [(0, num_passwords // 4),
           (num_passwords // 4, num_passwords // 2),
           (num_passwords // 2, 3 * num_passwords // 4),
           (3 * num_passwords // 4, num_passwords)]
    return ret

# Checks if the number of arguments is valid
# Argument 1 is the password file
# Argument 2 is the destination file
# Argument 3 is the number of passwords in the password file (as a power of 10)
if len(sys.argv) != 4:
    sys.exit("Too few arguments")

# List of common regex patterns
regex_dict = [
    '^[a-z]+$',                          # Lowercase letters only
    '^[a-z]+[0-9]+$',                    # Lowercase letters followed by numbers
    '^[0-9]+$',                          # Numbers only
    '^[a-z]+[0-9]+[a-z]+$',              # Lowercase letters, then numbers, then lowercase letters
    '^[0-9]+[a-z]+$',                    # Numbers followed by lowercase letters
    '^[A-Z]+[a-z]+[0-9]+$',              # Uppercase, then lowercase letters, then numbers
    '^[A-Z]+[a-z]+$',                    # Uppercase then lowercase letters
    '^[A-Z]+$',                          # Uppercase letters only
    '^[a-z]+[0-9]+[a-z]+[0-9]+$',        # Alternating lowercase letters and numbers
    '^[a-z]+[0-9]+[a-z]+[0-9]+[a-z]+$',  # Alternating lowercase letters and numbers
]

# Conducting a stratified random sampling with four groups, each group
# representing a different extent of password popularity
all_ranges = get_range(sys.argv[3])
range_1 = all_ranges[0]
range_2 = all_ranges[1]
range_3 = all_ranges[2]
range_4 = all_ranges[3]

# Already used numbers
seen = []

# The hypothesised linear model is y = b_0 + b_1*(popularity) + b_2*(length) + b_3*(complexity)
SS_reg_degrees_of_freedom = 4 - 1
no_groups = 4
data_set_length = 10 * SS_reg_degrees_of_freedom

# File with already used samples
sampled = f"sampled_{sys.argv[3]}"

# Pool of HTTP headers to rotate through between requests
headers = [
    {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'},
    {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246'},
    {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'},
    {'User-Agent': 'Mozilla/5.0 (iPhone12,1; U; CPU iPhone OS 13_0 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Mobile/15E148 Safari/602.1'},
    {'User-Agent': 'Mozilla/5.0 (Linux; Android 10; SM-G996U Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36'}
]

with open(sys.argv[1], "r") as lines, open(sys.argv[2], "a") as new, open(sampled, 'r') as already:
    new.write("Password,popularity,length,complexity,group\n")
    passwords = lines.readlines()
    used = already.readlines()
    used_index = [passwords.index(i) - 1 + 2 for i in used]
    for i in range(0, math.ceil(data_set_length / 4)):
        curr_samples = []
        for j in range(0, no_groups):
            sample_no = random_sampling(all_ranges[j], seen, used_index)
            seen.append(sample_no)
            curr_samples.append(sample_no)
        for random_sample in curr_samples:
            password = passwords[random_sample].replace("\n", "")
            link = f"https://www.google.com/search?q={password}"
            # Pick a random header from the pool for this request
            header = headers[random.randint(0, len(headers) - 1)]
            soup = BeautifulSoup(requests.get(link, headers=header).content, 'html.parser')
            string = str(soup.findAll("div", {"id": "result-stats"})[0])
            results = re.findall("About.*results", string)[0]
            count = (password + "," + results.split(" ")[1].replace(",", "") + ","
                     + str(len(password)) + "," + str(is_complex(regex_dict, password)) + ","
                     + str(passwords.index(password + '\n') + 1) + '\n')
            print(count)
            new.write(count)
            time.sleep(random.uniform(5, 10))

# Two equivalent ways to append the used passwords to the sampled file
option_1 = f"sed -n '2,$p' {sys.argv[2]} | sed -E 's?,.*??g' >> {sampled}"
option_2 = f"tail -n +2 {sys.argv[2]} | sed -E 's?,.*??g' >> {sampled}"
choice = [option_1, option_2][random.randint(0, 1)]
subprocess.run(choice, shell=True)
First, I read each CSV file into RStudio.
twos1 <- read.csv("popularity_2_1.txt", header=T)
twos2 <- read.csv("popularity_2_2.txt", header=T)
twos3 <- read.csv("popularity_2_3.txt", header=T)
threes1 <- read.csv("popularity_3_1.txt", header=T)
threes2 <- read.csv("popularity_3_2.txt", header=T)
threes3 <- read.csv("popularity_3_3.txt", header=T)
fours1 <- read.csv("popularity_4_1.txt", header=T)
fours2 <- read.csv("popularity_4_2.txt", header=T)
fours3 <- read.csv("popularity_4_3.txt", header=T)
fives1 <- read.csv("popularity_5_1.txt", header=T)
fives2 <- read.csv("popularity_5_2.txt", header=T)
fives3 <- read.csv("popularity_5_3.txt", header=T)
sixs1 <- read.csv("popularity_6_1.txt", header=T)
sixs2 <- read.csv("popularity_6_2.txt", header=T)
sixs3 <- read.csv("popularity_6_3.txt", header=T)
I observed that the popularity values in all data sets are very large and differ vastly in magnitude from one another. The same can be seen in the variable "group", although the problem there is less influential.
I mitigated this by applying a log transform to the variables "popularity" and "group".
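The effect of the transform can be illustrated on values of very different magnitude (made-up counts, not the sampled data):

```python
import math

# Raw search-result counts spanning several orders of magnitude
popularity = [1_200, 45_000_000, 3_900_000_000]

# After a natural log transform the values sit in a comparable range
logged = [math.log(x) for x in popularity]
print([round(v, 1) for v in logged])  # → [7.1, 17.6, 22.1]
```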
After careful examination of the complexity data, I found that most if not all entries had "non-complex" password patterns; the few "complex" passwords were outliers and were omitted. I therefore dropped the variable complexity from the analysis, and I conclude from this that most if not all commonly used passwords follow a "non-complex" pattern.
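To illustrate the classification, a password counts as "complex" only when it matches none of the ten common patterns. A small sketch of the is_complex() check, with hypothetical inputs:

```python
import re

# The ten common patterns from the earlier analysis
common_patterns = [
    '^[a-z]+$', '^[a-z]+[0-9]+$', '^[0-9]+$', '^[a-z]+[0-9]+[a-z]+$',
    '^[0-9]+[a-z]+$', '^[A-Z]+[a-z]+[0-9]+$', '^[A-Z]+[a-z]+$', '^[A-Z]+$',
    '^[a-z]+[0-9]+[a-z]+[0-9]+$', '^[a-z]+[0-9]+[a-z]+[0-9]+[a-z]+$',
]

def is_complex(patterns, password):
    # "Complex" means the password matches none of the common patterns
    return not any(re.fullmatch(p, password) for p in patterns)

print(is_complex(common_patterns, "password123"))  # False: matches '^[a-z]+[0-9]+$'
print(is_complex(common_patterns, "P@ssw0rd!"))    # True: matches no common pattern
```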
I first checked whether the data were normally distributed.
qqnorm(log(sixs1$group))
qqline(log(sixs1$group))
shapiro.test(log(sixs1$group))
##
## Shapiro-Wilk normality test
##
## data: log(sixs1$group)
## W = 0.8687, p-value = 0.001084
The Shapiro-Wilk test gives p = 0.001 < 0.05, so normality of log(group) is questionable at this stage; I kept this in mind during the model diagnostics below. I then created models for each repeated sampling of the complete data set.
Firstly, I checked whether the linear model assumptions are permissible.
I assumed that the errors, and hence the responses, are uncorrelated, since a simple random sample was taken within each stratum.
Then, using the residuals vs fitted plot, I found a slight negative gradient in the red trend line. This could mean that the mean response is not a linear combination of the predictors.
Furthermore, the residuals vs fitted plot indicates heteroscedasticity, with a diamond-shaped distribution of residuals. From this, I deduced that the error variances, and hence the response variances, are non-constant. This violated assumption is also visible in the scale-location plot under a similar analysis.
Finally, I checked the assumption that the errors and responses are normally distributed. The normal Q-Q plot shows a slight deviation from a straight line, mainly at the tails, and thus a violation of the normality assumption.
model6_1 <- lm(log(group) ~ log(popularity) + length, data=data.frame(sixs1))
plot(model6_1, 1)
plot(model6_1, 2)
plot(model6_1, 3)
To address the invalid model assumptions, I checked the model for outliers and influential points, using the function influence.measures() to identify them.
The results showed that several points were flagged as outliers or influential points, with observations such as 4, 9, 11 and 28 among the most influential.
I then revisited the assumptions with these points removed.
Firstly, the residuals vs fitted plot now shows a red trend line that is more horizontal than in the previous model, so it is justifiable that the mean response is a linear combination of the predictors.
Furthermore, the residuals vs fitted plot looks more homoscedastic, with no clear shape in the spread of residuals, so I can now assume constant error variance. The scale-location plot supports this in a similar way.
Finally, the normal Q-Q plot shows most residuals following a straight line, so the assumption that the errors and responses are normally distributed holds.
I also tested for interactions between the variables.
influence.measures(model6_1)
## Influence measures of
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(sixs1)) :
##
## dfb.1_ dfb.lg.. dfb.lngt dffit cov.r cook.d hat inf
## 1 -0.18956 0.357720 0.06567 -0.46099 0.991 6.84e-02 0.0937
## 2 0.00874 -0.016643 -0.00287 0.02179 1.220 1.64e-04 0.0896
## 3 0.00787 0.018905 -0.00401 0.06804 1.137 1.59e-03 0.0357
## 4 -0.18660 -0.021515 0.25189 0.32177 1.370 3.53e-02 0.2199 *
## 5 0.10053 -0.095424 -0.12862 -0.25405 1.001 2.12e-02 0.0434
## 6 0.01724 0.037309 -0.02326 0.07868 1.188 2.13e-03 0.0730
## 7 -0.00220 -0.118286 0.06168 0.21490 1.181 1.57e-02 0.1001
## 8 -0.04368 0.300412 -0.00905 0.39451 1.156 5.19e-02 0.1347
## 9 -0.51223 0.091673 0.48448 -0.75341 0.498 1.47e-01 0.0575 *
## 10 0.00348 0.004585 -0.00662 -0.01232 1.309 5.24e-05 0.1513
## 11 0.14051 -0.144236 -0.13455 -0.17635 1.390 1.07e-02 0.2108 *
## 12 0.09319 0.144716 -0.11592 0.34418 1.009 3.87e-02 0.0688
## 13 -0.15960 -0.145946 0.18100 -0.44800 0.877 6.26e-02 0.0633
## 14 0.00138 -0.000294 -0.00129 0.00198 1.179 1.35e-06 0.0579
## 15 0.01490 -0.053058 0.01964 0.13662 1.113 6.35e-03 0.0439
## 16 0.01273 0.074948 -0.00416 0.21339 1.021 1.51e-02 0.0375
## 17 0.02467 0.006238 -0.06106 -0.18042 1.056 1.09e-02 0.0373
## 18 0.22102 -0.189847 -0.18317 0.24735 1.237 2.08e-02 0.1378
## 19 0.29909 -0.241565 -0.25052 0.33446 1.154 3.75e-02 0.1183
## 20 0.25919 -0.286940 -0.17993 0.36443 1.051 4.37e-02 0.0866
## 21 -0.34465 0.129430 0.31431 -0.44827 0.868 6.26e-02 0.0618
## 22 0.01184 0.049389 -0.02007 0.08858 1.199 2.70e-03 0.0821
## 23 -0.01238 0.013685 0.01391 0.02505 1.172 2.16e-04 0.0532
## 24 0.14551 -0.274601 -0.05041 0.35387 1.080 4.14e-02 0.0937
## 25 0.08368 -0.044667 -0.14166 -0.34261 0.853 3.66e-02 0.0379
## 26 0.00856 -0.017223 -0.00635 -0.01943 1.313 1.30e-04 0.1538 *
## 27 0.06929 0.028486 -0.07258 0.14857 1.137 7.53e-03 0.0588
## 28 -0.15266 0.068593 0.17257 0.18494 1.473 1.18e-02 0.2539 *
## 29 -0.09230 0.180685 0.02550 -0.24704 1.122 2.06e-02 0.0802
## 30 0.05130 -0.013198 -0.04783 0.07173 1.170 1.77e-03 0.0586
## 31 0.16979 -0.009770 -0.17773 0.23299 1.189 1.84e-02 0.1086
## 32 -0.12822 0.163025 0.12296 0.20824 1.177 1.48e-02 0.0963
model6_1 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(sixs1))
influence.measures(model6_1)
## Influence measures of
## lm(formula = log(group) ~ log(popularity) + length + log(popularity):length, data = data.frame(sixs1)) :
##
## dfb.1_ dfb.lg.. dfb.lngt dfb.l... dffit cov.r cook.d hat inf
## 1 -0.13500 8.94e-02 0.05592 -2.15e-02 -0.45372 0.961 4.97e-02 0.0940
## 2 0.00768 -5.28e-03 -0.00323 1.52e-03 0.02560 1.270 1.70e-04 0.0899
## 3 0.02461 -2.01e-02 -0.02240 2.46e-02 0.07979 1.177 1.64e-03 0.0394
## 4 -0.19939 1.62e-01 0.22322 -1.68e-01 0.25302 1.877 1.65e-02 0.3929 *
## 5 -0.05480 1.28e-01 0.04660 -1.48e-01 -0.27774 1.035 1.92e-02 0.0605
## 6 -0.01504 3.19e-02 0.01401 -2.81e-02 0.05534 1.277 7.93e-04 0.0983
## 7 -0.06429 6.06e-02 0.09681 -8.17e-02 0.19999 1.259 1.03e-02 0.1202
## 8 -0.24134 3.26e-01 0.22096 -2.82e-01 0.43832 1.367 4.86e-02 0.2298
## 9 -0.31415 5.61e-03 0.27921 1.23e-02 -0.74940 0.379 1.09e-01 0.0575 *
## 10 0.12857 -1.20e-01 -0.15300 1.33e-01 -0.19657 1.579 9.98e-03 0.2784 *
## 11 -0.00106 3.65e-03 0.00141 -4.50e-03 -0.00656 1.921 1.12e-05 0.3980 *
## 12 -0.07797 1.95e-01 0.07318 -1.72e-01 0.36227 1.045 3.24e-02 0.0888
## 13 0.05807 -2.41e-01 -0.05642 2.15e-01 -0.52149 0.790 6.28e-02 0.0762
## 14 0.00108 -9.38e-05 -0.00096 2.52e-05 0.00240 1.228 1.49e-06 0.0579
## 15 0.04920 -6.06e-02 -0.02772 5.05e-02 0.15550 1.135 6.16e-03 0.0491
## 16 0.05799 -4.80e-02 -0.05401 6.43e-02 0.22956 1.007 1.31e-02 0.0407
## 17 -0.05166 8.51e-02 0.03472 -8.56e-02 -0.18707 1.095 8.84e-03 0.0472
## 18 0.37338 -2.77e-01 -0.33728 2.30e-01 0.41430 1.306 4.34e-02 0.1994
## 19 0.43053 -3.07e-01 -0.38866 2.54e-01 0.48423 1.162 5.82e-02 0.1634
## 20 0.34299 -2.64e-01 -0.28542 2.06e-01 0.45593 1.016 5.07e-02 0.1089
## 21 -0.26519 9.02e-02 0.23741 -6.69e-02 -0.44216 0.821 4.58e-02 0.0632
## 22 -0.02108 3.74e-02 0.01948 -3.27e-02 0.05913 1.307 9.06e-04 0.1184
## 23 0.00922 -2.65e-02 -0.00915 3.25e-02 0.06002 1.242 9.33e-04 0.0751
## 24 0.10551 -6.99e-02 -0.04371 1.68e-02 0.35460 1.071 3.12e-02 0.0940
## 25 -0.09328 1.75e-01 0.06816 -1.86e-01 -0.37579 0.836 3.33e-02 0.0503
## 26 0.00973 -7.09e-03 -0.00738 2.19e-03 -0.02876 1.367 2.14e-04 0.1547
## 27 0.00618 4.76e-02 -0.00453 -4.34e-02 0.14093 1.186 5.10e-03 0.0649
## 28 -0.08501 -3.25e-02 0.09196 5.05e-02 0.24183 1.537 1.51e-02 0.2655 *
## 29 -0.07359 5.53e-02 0.03221 -2.17e-02 -0.24028 1.143 1.46e-02 0.0809
## 30 0.03579 -5.93e-03 -0.03191 3.38e-03 0.07308 1.213 1.38e-03 0.0587
## 31 -0.00367 1.13e-01 0.00793 -1.17e-01 0.22064 1.308 1.25e-02 0.1510
## 32 0.02225 -1.19e-01 -0.03648 1.62e-01 0.30868 1.218 2.41e-02 0.1329
model6_1 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(sixs1)[-c(4,9,10,11,28),])
summary(model6_1)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length + log(popularity):length,
## data = data.frame(sixs1)[-c(4, 9, 10, 11, 28), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5025 -0.4002 0.1047 0.5520 0.8509
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.095498 2.280821 5.303 2.21e-05 ***
## log(popularity) 0.077736 0.184867 0.420 0.678
## length 0.086844 0.299463 0.290 0.774
## log(popularity):length -0.008444 0.025840 -0.327 0.747
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7549 on 23 degrees of freedom
## Multiple R-squared: 0.02911, Adjusted R-squared: -0.09753
## F-statistic: 0.2299 on 3 and 23 DF, p-value: 0.8746
model6_1 <- lm(log(group) ~ log(popularity) + length, data=data.frame(sixs1)[-c(4,9,11,28),])
plot(model6_1, 1)
plot(model6_1, 2)
plot(model6_1, 3)
summary(model6_1)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(sixs1)[-c(4,
## 9, 11, 28), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5047 -0.2949 0.1793 0.5544 0.8505
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.54099 1.19847 10.464 1.27e-10 ***
## log(popularity) 0.01767 0.02587 0.683 0.501
## length 0.03017 0.14449 0.209 0.836
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7281 on 25 degrees of freedom
## Multiple R-squared: 0.02025, Adjusted R-squared: -0.05813
## F-statistic: 0.2584 on 2 and 25 DF, p-value: 0.7743
I repeated the same analysis for the remaining repeated samples.
model6_2 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(sixs2))
influence.measures(model6_2)
## Influence measures of
## lm(formula = log(group) ~ log(popularity) + length + log(popularity):length, data = data.frame(sixs2)) :
##
## dfb.1_ dfb.lg.. dfb.lngt dfb.l... dffit cov.r cook.d hat inf
## 1 -0.06464 0.05325 0.04841 -0.05947 -0.1868 1.046 8.74e-03 0.0357
## 2 -0.00335 -0.00881 -0.00288 0.02356 0.0833 1.215 1.79e-03 0.0625
## 3 0.01992 -0.07299 -0.01692 0.08426 0.1304 1.202 4.37e-03 0.0695
## 4 0.04224 -0.14014 -0.03275 0.15891 0.2500 1.086 1.57e-02 0.0654
## 5 -0.48448 -0.02837 0.35277 0.10330 -0.8779 0.221 1.30e-01 0.0508 *
## 6 -0.04327 0.09575 0.04049 -0.08699 0.1107 1.785 3.17e-03 0.3541 *
## 7 -0.06276 0.14486 0.04990 -0.11187 0.2445 1.185 1.52e-02 0.0989
## 8 -0.15956 0.01328 0.06442 0.12534 0.5923 1.053 8.51e-02 0.1587
## 9 -0.02884 0.26444 0.05841 -0.33633 -0.4971 0.981 5.97e-02 0.1111
## 10 -0.01662 -0.01191 0.01179 0.01277 -0.0501 1.207 6.51e-04 0.0484
## 11 0.05496 -0.00300 -0.04020 -0.00648 0.0900 1.197 2.09e-03 0.0538
## 12 0.10059 -0.03510 -0.01420 -0.02475 0.2814 1.170 2.00e-02 0.1047
## 13 -0.08140 0.16557 0.04079 -0.16514 -0.3102 0.943 2.34e-02 0.0510
## 14 5.61324 -3.24869 -6.53001 3.14991 -6.9880 6.907 1.08e+01 0.9120 *
## 15 0.09786 -0.04849 -0.05831 0.01604 0.1487 1.283 5.70e-03 0.1202
## 16 0.08079 -0.01967 -0.00958 -0.03139 0.2309 1.248 1.36e-02 0.1236
## 17 -0.15532 0.02469 0.11405 0.00463 -0.2321 1.084 1.35e-02 0.0589
## 18 -0.01179 0.04012 0.00936 -0.04571 -0.0716 1.226 1.32e-03 0.0664
## 19 0.03074 -0.07829 -0.01877 0.08365 0.1413 1.167 5.11e-03 0.0562
## 20 -0.03904 -0.41314 -0.08565 0.58577 0.8914 1.134 1.90e-01 0.2604
## 21 0.04399 0.02515 -0.00433 -0.10090 -0.3826 0.980 3.57e-02 0.0783
## 22 -0.01096 0.04108 0.00951 -0.04761 -0.0734 1.231 1.39e-03 0.0704
## 23 0.18962 -0.08513 -0.14079 0.04072 0.2329 1.224 1.38e-02 0.1128
## 24 -0.24332 0.69180 0.23848 -0.64281 0.8335 1.167 1.68e-01 0.2566
## 25 -0.02013 -0.06157 0.01295 0.05522 -0.1586 1.146 6.41e-03 0.0539
## 26 -0.00258 -0.05466 -0.00183 0.05529 -0.0834 1.332 1.80e-03 0.1374
## 27 0.03136 -0.02964 -0.00842 0.01753 0.0863 1.204 1.92e-03 0.0567
## 28 0.10637 -0.02589 -0.01261 -0.04133 0.3040 1.199 2.34e-02 0.1236
## 29 -0.17598 0.01074 0.12873 0.01977 -0.2866 0.994 2.02e-02 0.0541
## 30 0.01348 -0.01377 -0.01134 0.01854 0.0580 1.188 8.68e-04 0.0386
## 31 0.03874 -0.02343 -0.02506 0.01611 0.0651 1.193 1.10e-03 0.0439
## 32 0.11297 -0.03485 -0.01497 -0.03397 0.3184 1.157 2.55e-02 0.1114
model6_2 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(sixs2)[-c(5, 6, 14),])
summary(model6_2)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length + log(popularity):length,
## data = data.frame(sixs2)[-c(5, 6, 14), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2100 -0.3268 0.1861 0.3124 1.1490
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.09429 2.26640 4.013 0.00048 ***
## log(popularity) 0.25316 0.15596 1.623 0.11708
## length 0.55470 0.30136 1.841 0.07757 .
## log(popularity):length -0.03698 0.02115 -1.749 0.09262 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6397 on 25 degrees of freedom
## Multiple R-squared: 0.1609, Adjusted R-squared: 0.06023
## F-statistic: 1.598 on 3 and 25 DF, p-value: 0.2149
model6_2 <- lm(log(group) ~ log(popularity) + length, data=data.frame(sixs2)[-c(5, 14),])
plot(model6_2, 1)
plot(model6_2, 2)
plot(model6_2, 3)
summary(model6_2)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(sixs2)[-c(5,
## 14), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1971 -0.3647 0.1647 0.4467 1.0144
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.71001 1.00178 12.687 6.87e-13 ***
## log(popularity) -0.01635 0.01928 -0.848 0.404
## length 0.06784 0.12617 0.538 0.595
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6531 on 27 degrees of freedom
## Multiple R-squared: 0.05738, Adjusted R-squared: -0.01244
## F-statistic: 0.8218 on 2 and 27 DF, p-value: 0.4503
model6_3 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(sixs3))
influence.measures(model6_3)
## Influence measures of
## lm(formula = log(group) ~ log(popularity) + length + log(popularity):length, data = data.frame(sixs3)) :
##
## dfb.1_ dfb.lg.. dfb.lngt dfb.l... dffit cov.r cook.d hat inf
## 1 -0.01564 0.05515 1.61e-02 -0.05695 0.08464 1.693 1.86e-03 0.3184 *
## 2 -0.00461 0.00588 3.97e-03 -0.00801 -0.02339 1.202 1.42e-04 0.0398
## 3 0.07245 -0.01920 -6.38e-02 0.02361 0.20836 1.034 1.08e-02 0.0396
## 4 0.12572 0.03616 -1.12e-01 -0.04739 0.37148 0.942 3.34e-02 0.0671
## 5 0.04755 0.50311 -2.75e-03 -0.70071 -1.29256 0.884 3.73e-01 0.2783 *
## 6 0.01140 -0.01364 -9.53e-03 0.01745 0.04917 1.192 6.26e-04 0.0387
## 7 0.00682 -0.01533 -7.95e-03 0.02926 0.10121 1.176 2.64e-03 0.0467
## 8 -0.11127 0.04273 6.47e-02 0.06680 0.50501 0.939 6.11e-02 0.1027
## 9 -0.22551 0.20200 1.67e-01 -0.16744 -0.41197 0.737 3.89e-02 0.0453
## 10 0.00771 -0.01003 -6.70e-03 0.01391 0.04102 1.198 4.36e-04 0.0401
## 11 0.04071 -0.04412 -3.26e-02 0.05030 0.13312 1.123 4.52e-03 0.0373
## 12 -0.03937 0.03283 5.27e-02 -0.04102 0.07626 1.449 1.51e-03 0.2045 *
## 13 -0.07395 0.07976 5.91e-02 -0.09039 -0.23849 0.972 1.40e-02 0.0372
## 14 -0.03910 0.07674 3.48e-02 -0.06709 0.11626 1.309 3.49e-03 0.1282
## 15 0.17044 -0.09435 -1.51e-01 0.06910 0.21718 1.206 1.20e-02 0.0995
## 16 -0.04436 0.02747 8.63e-02 -0.05523 0.20686 1.223 1.09e-02 0.1043
## 17 -0.44640 0.30602 3.77e-01 -0.23890 -0.59079 0.630 7.64e-02 0.0657
## 18 -0.00940 0.01129 7.87e-03 -0.01449 -0.04093 1.196 4.34e-04 0.0388
## 19 0.15826 -0.11633 -1.33e-01 0.08860 0.19545 1.205 9.77e-03 0.0920
## 20 0.23958 -0.18922 -1.70e-01 0.11126 0.37366 1.242 3.53e-02 0.1622
## 21 -0.59196 0.46264 4.95e-01 -0.34509 -0.70457 0.926 1.17e-01 0.1528
## 22 -0.00579 -0.02115 -1.19e-03 0.03941 0.09037 1.351 2.11e-03 0.1497
## 23 0.02605 -0.03758 -2.38e-02 0.05671 0.17509 1.089 7.75e-03 0.0418
## 24 -0.01214 -0.01787 6.39e-06 0.06641 0.28237 1.018 1.97e-02 0.0579
## 25 0.54189 -1.11478 -5.13e-01 1.06443 -1.47921 0.668 4.59e-01 0.2562 *
## 26 -0.00837 0.00460 5.30e-03 0.00137 0.02511 1.438 1.63e-04 0.1963 *
## 27 0.04607 -0.00637 -5.81e-02 0.00370 -0.11531 1.299 3.43e-03 0.1221
## 28 0.25669 -0.20164 -1.82e-01 0.11636 0.40136 1.267 4.07e-02 0.1801
## 29 -0.11692 0.03240 1.04e-01 -0.01734 -0.20311 1.153 1.05e-02 0.0723
## 30 0.00163 -0.00440 -1.45e-03 0.00391 -0.00768 1.287 1.53e-05 0.1016
## 31 0.01695 -0.01580 -1.97e-03 0.00348 0.06990 1.249 1.26e-03 0.0813
## 32 0.16393 -0.14369 -1.90e-01 0.15586 -0.21441 2.890 1.19e-02 0.6016 *
model6_3 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(sixs3)[-c(1, 5, 12, 25, 26, 32),])
summary(model6_3)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length + log(popularity):length,
## data = data.frame(sixs3)[-c(1, 5, 12, 25, 26, 32), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.39124 -0.35541 0.08272 0.52945 0.78274
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.49879 2.18473 4.806 8.45e-05 ***
## log(popularity) 0.10832 0.19842 0.546 0.591
## length 0.31845 0.26511 1.201 0.242
## log(popularity):length -0.01356 0.02481 -0.546 0.590
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6683 on 22 degrees of freedom
## Multiple R-squared: 0.1268, Adjusted R-squared: 0.007683
## F-statistic: 1.065 on 3 and 22 DF, p-value: 0.3843
model6_3 <- lm(log(group) ~ log(popularity) + length, data=data.frame(sixs3)[-c(1, 25, 26, 32),])
plot(model6_3, 1)
plot(model6_3, 2)
plot(model6_3, 3)
summary(model6_3)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(sixs3)[-c(1,
## 25, 26, 32), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4987 -0.2217 0.1582 0.5012 0.8611
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.40465 0.86899 14.275 1.6e-13 ***
## log(popularity) -0.01470 0.02657 -0.553 0.585
## length 0.09467 0.09357 1.012 0.321
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6779 on 25 degrees of freedom
## Multiple R-squared: 0.06658, Adjusted R-squared: -0.008093
## F-statistic: 0.8916 on 2 and 25 DF, p-value: 0.4226
From the p-values of the models above, none of the variables appear statistically significant. I hypothesised that this is due to the randomness of passwords below a certain level of commonness, where popularity fluctuates over large ranges. To test this hypothesis, I decided to repeat the experiment on a smaller range of the most common passwords.
I took samples from the top 1000 passwords, increasing tenfold up to the top 10000, and repeated each sampling three times. These were the results I gathered.
model3_1 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(threes1))
influence.measures(model3_1)
## Influence measures of
## lm(formula = log(group) ~ log(popularity) + length + log(popularity):length, data = data.frame(threes1)) :
##
## dfb.1_ dfb.lg.. dfb.lngt dfb.l... dffit cov.r cook.d hat inf
## 1 0.01299 -0.03985 -0.010848 0.036023 -0.21170 1.049 1.12e-02 0.0437
## 2 -0.04008 0.02294 0.036034 -0.017476 -0.12268 1.155 3.86e-03 0.0446
## 3 -0.05071 0.07323 0.052594 -0.078497 0.11932 1.616 3.69e-03 0.2879 *
## 4 0.01292 -0.08614 -0.034099 0.147982 0.49337 0.830 5.69e-02 0.0767
## 5 0.85330 -0.90238 -0.756567 0.759714 -1.17265 0.519 2.80e-01 0.1562 *
## 6 0.01359 -0.02557 -0.013869 0.032424 0.08859 1.221 2.03e-03 0.0678
## 7 -2.72203 2.65145 3.390396 -3.192435 4.47827 9.871 4.85e+00 0.9114 *
## 8 0.07644 -0.05139 -0.068530 0.040480 0.18040 1.105 8.24e-03 0.0478
## 9 0.07992 -0.01741 -0.036583 -0.061194 -0.40283 1.124 4.04e-02 0.1260
## 10 0.00720 -0.00217 -0.001168 -0.010691 -0.10142 1.166 2.64e-03 0.0419
## 11 -0.11572 0.11022 0.081726 -0.072451 -0.19177 1.587 9.50e-03 0.2812 *
## 12 -0.03592 0.10849 0.048645 -0.136705 0.33188 1.386 2.81e-02 0.2138
## 13 -0.76628 0.74082 0.544877 -0.500552 -1.27750 0.725 3.52e-01 0.2318 *
## 14 -0.01373 0.00167 0.012509 -0.000190 -0.08807 1.177 2.00e-03 0.0423
## 15 -0.08993 0.10103 0.079583 -0.085614 0.15588 1.218 6.24e-03 0.0858
## 16 0.09346 -0.06411 -0.083764 0.050686 0.21198 1.069 1.13e-02 0.0486
## 17 -0.03036 0.04928 0.028329 -0.057578 -0.14803 1.184 5.62e-03 0.0663
## 18 0.00171 -0.00112 -0.001530 0.000881 0.00419 1.214 4.55e-06 0.0472
## 19 0.06118 -0.08169 -0.051129 0.082676 0.19225 1.149 9.40e-03 0.0669
## 20 -0.19410 0.22930 0.171473 -0.195322 0.40757 0.897 3.97e-02 0.0678
## 21 0.16142 -0.13791 -0.121218 0.071475 -0.35080 1.042 3.04e-02 0.0844
## 22 -0.01971 0.00105 0.015496 0.007183 -0.08995 1.267 2.09e-03 0.0974
## 23 0.05160 -0.03365 -0.046289 0.026351 0.12894 1.155 4.26e-03 0.0469
## 24 0.13265 -0.11327 -0.118311 0.092805 0.17105 1.237 7.51e-03 0.1006
## 25 -0.01803 0.03558 0.035016 -0.078334 -0.41392 0.670 3.84e-02 0.0387
## 26 -0.00975 -0.00821 0.000968 0.027042 0.11630 1.250 3.49e-03 0.0927
## 27 -0.00515 0.01524 0.004312 -0.013759 0.07969 1.185 1.64e-03 0.0438
## 28 0.08014 -0.08233 -0.074938 0.083831 0.18742 1.065 8.83e-03 0.0398
## 29 0.01060 -0.00244 -0.004896 -0.007876 -0.05263 1.322 7.18e-04 0.1273
## 30 -0.00278 -0.00800 -0.001642 0.018437 0.07065 1.254 1.29e-03 0.0849
## 31 -0.01611 0.01437 0.014351 -0.011849 -0.01860 1.429 8.96e-05 0.1906 *
## 32 -0.05572 -0.02701 0.012262 0.116852 0.53387 0.880 6.73e-02 0.0970
model3_1 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(threes1)[-c(3, 5, 7, 11, 13, 31),])
influence.measures(model3_1)
## Influence measures of
## lm(formula = log(group) ~ log(popularity) + length + log(popularity):length, data = data.frame(threes1)[-c(3, 5, 7, 11, 13, 31), ]) :
##
## dfb.1_ dfb.lg.. dfb.lngt dfb.l... dffit cov.r cook.d hat inf
## 1 0.05126 -0.064016 -0.04172 0.052229 -0.3330 0.941 0.026874 0.0614
## 2 -0.10539 0.096923 0.09583 -0.088468 -0.2296 1.144 0.013375 0.0725
## 4 -0.15212 0.132475 0.17120 -0.147087 0.5554 0.815 0.071372 0.1001
## 6 0.04257 -0.040515 -0.04906 0.046521 -0.0868 1.457 0.001969 0.1787
## 8 0.13517 -0.127190 -0.12208 0.115036 0.2393 1.190 0.014596 0.0909
## 9 -0.20549 0.219902 0.24499 -0.262436 -0.4953 1.348 0.061859 0.2322
## 10 -0.00683 0.007764 0.01183 -0.014166 -0.1389 1.171 0.004960 0.0460
## 12 -0.12459 0.136877 0.12176 -0.133725 0.2464 1.987 0.015828 0.4039 *
## 14 -0.03311 0.026736 0.03118 -0.025768 -0.1568 1.184 0.006317 0.0574
## 15 -0.20518 0.210295 0.18031 -0.184008 0.3058 1.565 0.024191 0.2670 *
## 16 0.17189 -0.162219 -0.15510 0.146544 0.2958 1.140 0.022047 0.0952
## 17 0.42586 -0.409925 -0.49237 0.472574 -0.8040 1.061 0.153590 0.2313
## 18 -0.02492 0.023376 0.02253 -0.021169 -0.0454 1.314 0.000540 0.0870
## 19 0.04510 -0.044052 -0.05236 0.051042 -0.0783 1.998 0.001606 0.3983 *
## 20 -0.48681 0.503593 0.42645 -0.439110 0.7950 0.868 0.145227 0.1772
## 21 0.03560 -0.031594 0.02597 -0.033446 -0.5285 1.143 0.068827 0.1743
## 22 -0.06428 0.053548 0.06038 -0.050065 -0.2001 1.351 0.010373 0.1469
## 23 0.08264 -0.077407 -0.07473 0.070136 0.1526 1.257 0.006024 0.0856
## 24 0.26915 -0.262956 -0.24026 0.234343 0.3486 1.857 0.031530 0.3756 *
## 25 -0.04898 0.051986 0.04839 -0.056992 -0.5521 0.394 0.059743 0.0414 *
## 26 0.01397 -0.019021 -0.01802 0.024180 0.1298 1.303 0.004380 0.1011
## 27 -0.01124 0.013934 0.00917 -0.011395 0.0706 1.265 0.001302 0.0616
## 28 0.03211 -0.032107 -0.01224 0.013505 0.2020 1.179 0.010434 0.0730
## 29 0.05010 -0.053536 -0.05970 0.063863 0.1193 1.569 0.003721 0.2388 *
## 30 -0.00230 0.000755 0.00217 -0.000282 0.0392 1.321 0.000401 0.0909
## 32 0.12426 -0.150435 -0.15399 0.185878 0.6923 0.696 0.106225 0.1117
summary(model3_1)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length + log(popularity):length,
## data = data.frame(threes1)[-c(3, 5, 7, 11, 13, 31), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.33891 -0.36390 0.09031 0.27560 1.00699
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.81624 11.70029 0.070 0.945
## log(popularity) 0.33852 0.58025 0.583 0.566
## length 1.21549 1.65559 0.734 0.471
## log(popularity):length -0.07218 0.08225 -0.878 0.390
##
## Residual standard error: 0.5811 on 22 degrees of freedom
## Multiple R-squared: 0.2885, Adjusted R-squared: 0.1914
## F-statistic: 2.973 on 3 and 22 DF, p-value: 0.05384
model3_1 <- lm(log(group) ~ log(popularity) + length , data=data.frame(threes1)[-c(7, 11, 13),])
model3_1 <- lm(log(group) ~ log(popularity) , data=data.frame(threes1)[-c(7, 11, 13),])
plot(model3_1, 1)
plot(model3_1, 2)
plot(model3_1, 3)
summary(model3_1)
##
## Call:
## lm(formula = log(group) ~ log(popularity), data = data.frame(threes1)[-c(7,
## 11, 13), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.39171 -0.32352 -0.08946 0.54202 1.17640
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.35254 1.35683 6.893 2.09e-07 ***
## log(popularity) -0.16552 0.06754 -2.451 0.021 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6481 on 27 degrees of freedom
## Multiple R-squared: 0.182, Adjusted R-squared: 0.1517
## F-statistic: 6.006 on 1 and 27 DF, p-value: 0.02101
model3_2 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(threes2))
influence.measures(model3_2)
## Influence measures of
## lm(formula = log(group) ~ log(popularity) + length + log(popularity):length, data = data.frame(threes2)) :
##
## dfb.1_ dfb.lg.. dfb.lngt dfb.l... dffit cov.r cook.d hat inf
## 1 -0.070836 0.064284 0.045962 -4.09e-02 -0.20422 1.323 1.07e-02 0.1547
## 2 -0.034964 0.026359 0.025896 -2.02e-02 -0.12906 1.140 4.26e-03 0.0413
## 3 0.001212 0.005535 -0.004751 6.55e-04 0.09997 1.172 2.57e-03 0.0440
## 4 -0.003705 0.012501 -0.003057 -2.20e-03 0.13808 1.154 4.88e-03 0.0500
## 5 2.702502 -2.379805 -2.786002 2.45e+00 -3.52847 2.119 2.73e+00 0.7188 *
## 6 0.000005 -0.002045 0.001216 1.68e-05 -0.03073 1.208 2.45e-04 0.0456
## 7 0.084254 -0.074214 -0.056012 4.87e-02 0.24515 1.187 1.53e-02 0.0998
## 8 0.009740 -0.021464 -0.010961 2.77e-02 0.22386 1.030 1.25e-02 0.0435
## 9 -0.252242 0.224431 0.166345 -1.46e-01 -0.73013 0.736 1.20e-01 0.1138
## 10 0.420753 -0.502592 -0.405481 4.78e-01 -0.72168 1.601 1.31e-01 0.3666 *
## 11 -0.002231 0.006014 -0.000937 -1.32e-03 0.06044 1.208 9.45e-04 0.0520
## 12 0.048490 -0.035903 -0.036304 2.80e-02 0.18529 1.068 8.64e-03 0.0400
## 13 -0.030819 0.011087 0.030084 -1.77e-02 -0.26906 0.910 1.75e-02 0.0361
## 14 -0.800962 0.737852 0.855754 -7.86e-01 1.05845 1.023 2.61e-01 0.2665
## 15 0.002393 -0.003152 -0.002185 2.82e-03 -0.00649 1.317 1.09e-05 0.1221
## 16 0.230383 -0.211501 -0.148035 1.33e-01 0.66530 1.097 1.07e-01 0.1923
## 17 -0.739141 0.880279 0.804659 -9.62e-01 -1.59434 0.365 4.69e-01 0.1883 *
## 18 0.021176 -0.017987 -0.014475 1.22e-02 0.06375 1.233 1.05e-03 0.0694
## 19 0.002960 0.002580 -0.005067 1.67e-03 0.07974 1.180 1.64e-03 0.0412
## 20 -0.060287 0.077884 0.055089 -6.99e-02 0.15356 1.295 6.08e-03 0.1277
## 21 -0.033982 0.024497 0.025839 -1.96e-02 -0.13662 1.123 4.76e-03 0.0385
## 22 0.043220 -0.059491 -0.039395 5.31e-02 -0.13469 1.276 4.68e-03 0.1127
## 23 0.112486 -0.133067 -0.122390 1.45e-01 0.23596 1.389 1.43e-02 0.1952
## 24 0.071215 -0.080607 -0.082609 9.60e-02 0.21425 1.151 1.17e-02 0.0749
## 25 -0.144525 0.157686 0.167837 -1.87e-01 -0.34627 1.129 3.00e-02 0.1103
## 26 0.001559 -0.002799 -0.000185 9.10e-04 -0.02133 1.231 1.18e-04 0.0613
## 27 0.065594 -0.108514 -0.073688 1.22e-01 0.39732 1.094 3.92e-02 0.1144
## 28 0.129243 -0.113502 -0.086122 7.47e-02 0.37678 1.056 3.51e-02 0.0963
## 29 -0.236545 0.216049 0.152657 -1.37e-01 -0.68233 1.022 1.12e-01 0.1739
## 30 -0.000203 -0.000358 0.000455 -1.14e-04 -0.00818 1.208 1.73e-05 0.0425
## 31 0.055537 -0.063632 -0.064397 7.60e-02 0.17957 1.166 8.23e-03 0.0694
## 32 0.092233 -0.101580 -0.107080 1.20e-01 0.23444 1.188 1.40e-02 0.0969
model3_2 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(threes2)[-c(5, 10, 17),])
summary(model3_2)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length + log(popularity):length,
## data = data.frame(threes2)[-c(5, 10, 17), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4643 -0.2156 0.1235 0.3567 1.1888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.49130 7.39029 0.202 0.842
## log(popularity) 0.23606 0.37969 0.622 0.540
## length 0.59891 1.20509 0.497 0.624
## log(popularity):length -0.02927 0.06183 -0.473 0.640
##
## Residual standard error: 0.6653 on 25 degrees of freedom
## Multiple R-squared: 0.12, Adjusted R-squared: 0.01438
## F-statistic: 1.136 on 3 and 25 DF, p-value: 0.3536
model3_2 <- lm(log(group) ~ log(popularity) + length , data=data.frame(threes2)[-c(5, 17),])
plot(model3_2, 1)
plot(model3_2, 2)
plot(model3_2, 3)
summary(model3_2)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(threes2)[-c(5,
## 17), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4875 -0.2673 0.1466 0.3390 1.1476
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.74276 1.01800 4.659 7.62e-05 ***
## log(popularity) 0.05297 0.03103 1.707 0.0993 .
## length 0.07605 0.13553 0.561 0.5793
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6471 on 27 degrees of freedom
## Multiple R-squared: 0.1063, Adjusted R-squared: 0.04013
## F-statistic: 1.606 on 2 and 27 DF, p-value: 0.2192
model3_3 <- lm(log(group) ~ log(popularity) + length , data=data.frame(threes3))
influence.measures(model3_3)
## Influence measures of
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(threes3)) :
##
## dfb.1_ dfb.lg.. dfb.lngt dffit cov.r cook.d hat inf
## 1 -0.015268 -0.042230 0.05038 -0.13368 1.127 6.10e-03 0.0496
## 2 -0.000238 0.003975 -0.00255 0.00863 1.178 2.57e-05 0.0567
## 3 -0.047057 0.050119 0.04550 0.13023 1.106 5.77e-03 0.0391
## 4 -0.042727 -0.054662 0.14702 0.23970 1.121 1.94e-02 0.0780
## 5 -0.414801 0.404393 0.25319 -0.45926 1.268 7.07e-02 0.1989
## 6 -0.003365 0.002050 0.00294 -0.00479 1.175 7.91e-06 0.0547
## 7 0.076380 -0.072704 -0.04789 0.08485 1.330 2.48e-03 0.1681 *
## 8 0.014565 -0.026343 0.02186 0.13604 1.089 6.27e-03 0.0346
## 9 0.008966 -0.044566 0.02028 -0.08284 1.173 2.36e-03 0.0629
## 10 0.002878 -0.000674 -0.00329 0.00599 1.164 1.24e-05 0.0453
## 11 -0.241044 0.209795 0.20676 0.28132 1.278 2.69e-02 0.1665
## 12 0.138410 -0.104948 -0.10605 0.17101 1.150 9.96e-03 0.0717
## 13 0.443637 0.170083 -1.04890 -1.56631 0.133 4.06e-01 0.0748 *
## 14 -0.003578 -0.001867 0.00604 -0.01315 1.164 5.97e-05 0.0456
## 15 -0.053448 -0.231056 0.33796 0.52484 1.727 9.36e-02 0.3859 *
## 16 0.031979 0.042482 -0.07248 0.17525 1.093 1.04e-02 0.0475
## 17 0.123556 -0.418648 0.13898 -0.69561 0.643 1.36e-01 0.0701 *
## 18 -0.007904 -0.007558 0.01580 -0.03669 1.162 4.64e-04 0.0467
## 19 0.037571 0.005777 -0.05342 0.10809 1.133 4.00e-03 0.0448
## 20 -0.011378 0.003925 0.03819 0.15822 1.064 8.42e-03 0.0333
## 21 0.329816 -0.230765 -0.35055 -0.44383 1.031 6.40e-02 0.1013
## 22 -0.036847 0.034659 0.02340 -0.04103 1.315 5.81e-04 0.1559 *
## 23 -0.143003 0.092615 0.16093 0.20296 1.174 1.40e-02 0.0934
## 24 -0.135364 0.155521 0.09323 0.22814 1.090 1.75e-02 0.0622
## 25 -0.006627 -0.048298 0.04342 -0.12695 1.136 5.51e-03 0.0522
## 26 -0.011012 0.019864 -0.00291 -0.02421 1.436 2.02e-04 0.2262 *
## 27 -0.116620 0.063772 0.14537 0.18428 1.166 1.16e-02 0.0840
## 28 -0.083430 0.092993 0.06695 0.17475 1.090 1.03e-02 0.0465
## 29 0.049858 -0.057343 -0.03414 -0.08338 1.173 2.39e-03 0.0632
## 30 0.007792 -0.021568 0.00529 -0.03300 1.203 3.76e-04 0.0780
## 31 -0.037454 0.039074 0.03893 0.11536 1.113 4.54e-03 0.0376
## 32 0.249965 -0.031176 -0.35597 0.39836 1.352 5.37e-02 0.2248 *
model3_3 <- lm(log(group) ~ log(popularity) + length , data=data.frame(threes3)[-c(7, 13, 15, 17, 22, 26, 32),])
plot(model3_3, 1)
plot(model3_3, 2)
plot(model3_3, 3)
summary(model3_3)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(threes3)[-c(7,
## 13, 15, 17, 22, 26, 32), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1776 -0.2460 0.1170 0.5519 0.9354
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.11773 1.87508 3.263 0.00356 **
## log(popularity) -0.05995 0.07116 -0.842 0.40861
## length 0.17165 0.18810 0.913 0.37139
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7388 on 22 degrees of freedom
## Multiple R-squared: 0.06365, Adjusted R-squared: -0.02148
## F-statistic: 0.7477 on 2 and 22 DF, p-value: 0.4851
model4_1 <- lm(log(group) ~ log(popularity) + length, data=data.frame(fours1))
#model4_1 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(fours1))
influence.measures(model4_1)
## Influence measures of
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(fours1)) :
##
## dfb.1_ dfb.lg.. dfb.lngt dffit cov.r cook.d hat inf
## 1 0.169899 -0.199526 -0.131129 -0.25035 1.1459 2.12e-02 0.0920
## 2 0.016684 -0.018683 -0.014946 -0.03186 1.1685 3.50e-04 0.0511
## 3 0.121748 -0.091779 -0.102736 0.14877 1.1660 7.57e-03 0.0746
## 4 -0.042957 0.038966 0.059255 0.18076 1.0477 1.09e-02 0.0352
## 5 0.056276 -0.064902 -0.046134 -0.09142 1.1770 2.87e-03 0.0676
## 6 -0.025083 0.150737 -0.088890 -0.28165 1.1674 2.68e-02 0.1105
## 7 0.000990 -0.002086 0.000143 0.00269 1.3860 2.49e-06 0.1984 *
## 8 0.083619 0.020634 -0.111177 0.24402 1.0254 1.97e-02 0.0466
## 9 -1.368365 0.855499 1.240133 -1.86595 0.0437 4.01e-01 0.0585 *
## 10 0.027726 -0.000479 -0.037237 0.04939 1.2557 8.42e-04 0.1170
## 11 -0.021105 0.043555 -0.002249 -0.05543 1.3978 1.06e-03 0.2062 *
## 12 0.068373 0.017991 -0.091450 0.20180 1.0672 1.37e-02 0.0466
## 13 0.056791 -0.059997 -0.059054 -0.14426 1.0990 7.06e-03 0.0406
## 14 -0.010560 0.010422 0.012648 0.03482 1.1496 4.18e-04 0.0368
## 15 -0.227698 0.199982 0.210883 0.26108 1.3314 2.33e-02 0.1909 *
## 16 0.083950 0.020716 -0.111616 0.24498 1.0244 1.98e-02 0.0466
## 17 -0.059481 -0.015977 0.079714 -0.17621 1.0895 1.05e-02 0.0466
## 18 0.235207 -0.086599 -0.273596 0.29631 1.4809 3.01e-02 0.2686 *
## 19 -0.110803 0.075562 0.121571 0.14607 1.2218 7.32e-03 0.1077
## 20 -0.018597 -0.060953 0.084550 0.17841 1.1785 1.09e-02 0.0893
## 21 0.198629 -0.236460 -0.146045 -0.27394 1.2138 2.55e-02 0.1316
## 22 0.015096 -0.000249 -0.018142 0.03617 1.1615 4.51e-04 0.0462
## 23 -0.023287 -0.010554 0.048579 0.07964 1.1986 2.18e-03 0.0803
## 24 0.012512 -0.082495 0.050704 0.15670 1.2207 8.42e-03 0.1091
## 25 0.003312 -0.071313 0.030611 -0.14885 1.1391 7.56e-03 0.0600
## 26 0.000688 0.000462 -0.001057 0.00261 1.1667 2.35e-06 0.0477
## 27 0.049602 -0.023244 -0.048723 0.07873 1.1558 2.13e-03 0.0507
## 28 -0.109846 0.278177 -0.002146 0.39070 1.0510 5.01e-02 0.0938
## 29 0.092069 -0.103893 -0.080679 -0.16869 1.1140 9.65e-03 0.0540
## 30 0.000672 -0.000730 -0.000653 -0.00150 1.1621 7.77e-07 0.0440
## 31 0.159665 -0.110761 -0.149579 0.16937 1.3798 9.86e-03 0.2043 *
## 32 0.049066 -0.133340 0.033175 0.19598 1.2716 1.32e-02 0.1469
#model4_1 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(fours1)[-c(7, 9, 10, 11, 18, 31),])
model4_1 <- lm(log(group) ~ log(popularity) + length , data=data.frame(fours1)[-c(7, 9, 11, 15, 18, 31),])
#model4_1 <- lm(log(group) ~ log(popularity) , data=data.frame(fours1)[-c(7, 9, 11, 32),])
plot(model4_1, 1)
plot(model4_1, 2)
plot(model4_1, 3)
summary(model4_1)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(fours1)[-c(7,
## 9, 11, 15, 18, 31), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.01454 -0.32255 -0.04268 0.35813 1.13821
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.21071 1.81831 6.715 7.52e-07 ***
## log(popularity) -0.13120 0.04612 -2.845 0.00918 **
## length -0.22804 0.17521 -1.301 0.20598
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5936 on 23 degrees of freedom
## Multiple R-squared: 0.2683, Adjusted R-squared: 0.2047
## F-statistic: 4.218 on 2 and 23 DF, p-value: 0.02752
model4_2 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(fours2))
#model4_2 <- lm(log(group) ~ log(popularity) , data=data.frame(fours2))
influence.measures(model4_2)
## Influence measures of
## lm(formula = log(group) ~ log(popularity) + length + log(popularity):length, data = data.frame(fours2)) :
##
## dfb.1_ dfb.lg.. dfb.lngt dfb.l... dffit cov.r cook.d hat inf
## 1 0.040969 -0.005175 -0.01619 -0.038964 -0.27556 1.326 1.94e-02 0.1733
## 2 -0.079727 0.090364 0.07605 -0.084053 0.12707 1.353 4.17e-03 0.1562
## 3 -0.104472 0.037489 0.07189 0.026710 0.45112 1.408 5.16e-02 0.2493
## 4 0.002733 -0.001577 0.00568 -0.007496 0.06045 1.345 9.47e-04 0.1427
## 5 0.134177 -0.133124 -0.11712 0.104855 -0.21478 1.290 1.18e-02 0.1401
## 6 -0.001902 -0.006810 0.00163 0.007385 -0.06313 1.239 1.03e-03 0.0735
## 7 0.092712 -0.095875 -0.08770 0.094071 0.16141 1.139 6.64e-03 0.0524
## 8 0.114133 -0.116043 -0.10620 0.110419 0.18279 1.133 8.49e-03 0.0576
## 9 0.040644 -0.020782 -0.01961 -0.016469 -0.23966 1.151 1.46e-02 0.0836
## 10 -0.003491 0.004793 -0.02251 0.020624 -0.20960 1.211 1.12e-02 0.0994
## 11 -0.000108 -0.000679 0.00385 -0.002653 0.03368 1.260 2.94e-04 0.0834
## 12 -0.040293 0.001778 0.03410 0.025153 0.33029 1.032 2.70e-02 0.0758
## 13 -1.760039 1.639642 1.66755 -1.506208 -2.05049 0.066 5.16e-01 0.1226 *
## 14 -0.002617 0.003698 0.00335 -0.005350 -0.01639 1.217 6.96e-05 0.0504
## 15 0.016039 -0.025606 -0.02317 0.040772 0.13851 1.160 4.91e-03 0.0524
## 16 -0.002238 -0.002312 0.01513 -0.008188 0.12835 1.215 4.24e-03 0.0758
## 17 -0.195063 0.180721 0.18604 -0.168156 -0.23829 1.180 1.44e-02 0.0947
## 18 -0.005631 0.005239 0.00534 -0.004827 -0.00663 1.306 1.14e-05 0.1143
## 19 -0.111226 0.140797 0.10641 -0.132689 0.26842 1.172 1.82e-02 0.1012
## 20 0.023720 -0.012974 -0.03370 0.031492 0.25363 0.982 1.58e-02 0.0429
## 21 0.044892 -0.047412 -0.03565 0.031559 -0.13798 1.183 4.88e-03 0.0624
## 22 -0.273157 0.325643 0.27477 -0.327146 0.44295 1.574 5.01e-02 0.3109 *
## 23 0.597159 -0.614806 -0.70660 0.724293 -0.98789 2.197 2.45e-01 0.5349 *
## 24 0.237023 -0.215881 -0.23063 0.208888 0.34637 0.943 2.91e-02 0.0606
## 25 0.197550 -0.284745 -0.20784 0.302470 -0.57779 1.180 8.24e-02 0.1959
## 26 0.018092 -0.014321 -0.01713 0.012597 0.02969 1.287 2.29e-04 0.1020
## 27 -0.006735 -0.000838 0.01193 -0.000308 0.09793 1.209 2.47e-03 0.0635
## 28 0.003888 -0.002122 0.00741 -0.010182 0.08040 1.354 1.67e-03 0.1502
## 29 0.163116 -0.112773 -0.10474 0.013672 -0.59152 0.888 8.24e-02 0.1138
## 30 -0.003926 0.005548 0.00503 -0.008026 -0.02459 1.216 1.57e-04 0.0504
## 31 0.151505 -0.139616 -0.14542 0.131527 0.19508 1.176 9.71e-03 0.0787
## 32 0.440753 -0.344943 -0.40991 0.286378 0.67746 1.219 1.13e-01 0.2350
model4_2 <- lm(log(group) ~ log(popularity) + length + log(popularity):length, data=data.frame(fours2)[-c(13, 22, 23),])
summary(model4_2)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length + log(popularity):length,
## data = data.frame(fours2)[-c(13, 22, 23), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.02370 -0.31781 -0.06917 0.33670 0.78984
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.39057 4.10701 3.504 0.00175 **
## log(popularity) -0.29770 0.20155 -1.477 0.15215
## length -0.55581 0.53765 -1.034 0.31114
## log(popularity):length 0.02637 0.02682 0.983 0.33485
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.511 on 25 degrees of freedom
## Multiple R-squared: 0.3843, Adjusted R-squared: 0.3104
## F-statistic: 5.202 on 3 and 25 DF, p-value: 0.006262
#model4_2 <- lm(log(group) ~ log(popularity) + length , data=data.frame(fours2)[-c(3, 13, 23),])
model4_2 <- lm(log(group) ~ log(popularity) , data=data.frame(fours2)[-c(4, 13, 23, 28),])
model4_2 <- lm(log(group) ~ log(popularity), data=fours2)
plot(model4_2, 1)
plot(model4_2, 2)
plot(model4_2, 3)
summary(model4_2)
##
## Call:
## lm(formula = log(group) ~ log(popularity), data = fours2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.68600 -0.35288 0.09768 0.46365 0.90897
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.56633 0.53851 17.764 <2e-16 ***
## log(popularity) -0.06903 0.02924 -2.361 0.0249 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7043 on 30 degrees of freedom
## Multiple R-squared: 0.1567, Adjusted R-squared: 0.1286
## F-statistic: 5.573 on 1 and 30 DF, p-value: 0.02493
model4_3 <- lm(log(group) ~ log(popularity) + length , data=data.frame(fours3))
influence.measures(model4_3)
## Influence measures of
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(fours3)) :
##
## dfb.1_ dfb.lg.. dfb.lngt dffit cov.r cook.d hat inf
## 1 -4.93e-02 -0.10138 0.139112 -0.31372 1.058 3.26e-02 0.0751
## 2 1.74e-04 -0.00212 0.004934 0.01911 1.152 1.26e-04 0.0364
## 3 -1.29e-01 0.22352 0.047287 0.31546 1.041 3.28e-02 0.0701
## 4 8.69e-03 0.06832 -0.035229 0.25362 0.973 2.10e-02 0.0376
## 5 1.85e-01 -0.31994 -0.068278 -0.45006 0.910 6.38e-02 0.0706
## 6 -1.53e-01 0.14932 0.100848 -0.17802 1.228 1.08e-02 0.1178
## 7 1.50e-02 -0.01450 -0.009969 0.01763 1.245 1.07e-04 0.1076
## 8 -7.29e-02 0.06108 0.108110 0.25524 0.973 2.12e-02 0.0381
## 9 -1.97e-01 0.16829 0.138797 -0.26015 1.057 2.25e-02 0.0600
## 10 -9.34e-02 0.15952 0.035415 0.22101 1.122 1.65e-02 0.0729
## 11 1.05e-01 -0.03094 -0.133300 0.15816 1.268 8.59e-03 0.1382
## 12 -2.57e-02 -0.06547 0.135344 0.26643 1.095 2.38e-02 0.0749
## 13 1.52e-01 -0.23726 -0.066567 -0.29675 1.128 2.95e-02 0.0968
## 14 -8.98e-02 0.00258 0.132606 -0.17256 1.255 1.02e-02 0.1329
## 15 -3.51e-01 0.21673 0.423410 0.46700 1.211 7.26e-02 0.1760
## 16 6.71e-02 -0.12445 0.017901 0.17428 1.272 1.04e-02 0.1435
## 17 7.82e-01 -0.69676 -0.739930 -0.92712 0.851 2.56e-01 0.1616
## 18 -2.87e-01 0.24183 0.289264 0.35676 1.168 4.26e-02 0.1302
## 19 4.40e-02 -0.03824 -0.030814 0.05718 1.180 1.13e-03 0.0632
## 20 7.02e-02 -0.02942 -0.061298 0.16860 1.063 9.55e-03 0.0360
## 21 3.23e-03 -0.13393 0.048094 -0.42604 0.739 5.40e-02 0.0387
## 22 2.07e-03 0.01536 -0.023585 -0.05040 1.201 8.76e-04 0.0776
## 23 -2.92e-02 0.05858 -0.013369 -0.08754 1.266 2.64e-03 0.1274
## 24 7.73e-02 -0.13220 0.006536 0.17223 1.320 1.02e-02 0.1713 *
## 25 1.70e-02 -0.09305 0.040689 -0.16467 1.201 9.28e-03 0.0988
## 26 3.54e-03 -0.01180 0.000873 -0.02648 1.160 2.42e-04 0.0436
## 27 -1.77e-01 0.13237 0.199523 0.24574 1.163 2.04e-02 0.0991
## 28 1.93e-01 -0.11059 -0.203958 0.22541 1.313 1.74e-02 0.1750 *
## 29 4.98e-02 0.44286 -0.585264 -1.13890 0.699 3.62e-01 0.1627 *
## 30 6.21e-05 0.00081 -0.000375 0.00286 1.155 2.83e-06 0.0380
## 31 7.01e-02 -0.03156 -0.076157 0.10030 1.185 3.46e-03 0.0746
## 32 -9.69e-02 0.20307 0.022215 0.34250 0.949 3.77e-02 0.0538
model4_3 <- lm(log(group) ~ log(popularity) + length , data=data.frame(fours3)[-c(24, 28, 29),])
plot(model4_3, 1)
plot(model4_3, 2)
plot(model4_3, 3)
summary(model4_3)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(fours3)[-c(24,
## 28, 29), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.68883 -0.45131 0.02802 0.53000 1.18339
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.44456 1.38983 7.515 5.6e-08 ***
## log(popularity) -0.11249 0.04491 -2.505 0.0188 *
## length -0.03061 0.13004 -0.235 0.8158
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7881 on 26 degrees of freedom
## Multiple R-squared: 0.2154, Adjusted R-squared: 0.1551
## F-statistic: 3.57 on 2 and 26 DF, p-value: 0.04267
model5_1 <- lm(log(group) ~ log(popularity) + length , data=data.frame(fives1))
influence.measures(model5_1)
## Influence measures of
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(fives1)) :
##
## dfb.1_ dfb.lg.. dfb.lngt dffit cov.r cook.d hat inf
## 1 0.018410 -0.04137 0.00141 -0.06294 1.185 1.37e-03 0.0680
## 2 -0.001673 0.00087 0.00144 -0.00284 1.160 2.79e-06 0.0426
## 3 0.075686 -0.06424 -0.05168 0.08933 1.196 2.75e-03 0.0800
## 4 -0.048887 0.02124 0.09585 0.17401 1.089 1.02e-02 0.0458
## 5 0.086206 -0.07158 -0.10271 -0.16100 1.124 8.81e-03 0.0563
## 6 -0.107831 0.09205 0.07334 -0.12675 1.187 5.51e-03 0.0818
## 7 -0.260002 0.26294 0.21831 0.33892 1.140 3.84e-02 0.1134
## 8 -0.534262 0.19621 0.77343 0.81522 1.859 2.23e-01 0.4513 *
## 9 2.587712 -3.19476 -1.44613 -3.42733 0.116 1.74e+00 0.2403 *
## 10 -0.074724 0.05525 0.05546 -0.09815 1.158 3.31e-03 0.0566
## 11 0.044701 -0.02947 -0.03512 0.06437 1.158 1.43e-03 0.0489
## 12 0.096065 -0.07962 -0.06663 0.11533 1.178 4.56e-03 0.0738
## 13 -0.319377 0.48399 -0.04473 -0.69148 0.750 1.40e-01 0.0885
## 14 -0.000589 0.00717 -0.01228 -0.03124 1.164 3.37e-04 0.0477
## 15 0.077595 -0.05588 -0.05840 0.10413 1.151 3.72e-03 0.0543
## 16 -0.256076 0.32844 0.15823 0.40974 1.017 5.46e-02 0.0881
## 17 0.023272 0.00269 -0.07050 -0.14237 1.112 6.89e-03 0.0452
## 18 -0.051708 0.04463 0.03490 -0.06033 1.210 1.25e-03 0.0854
## 19 0.050355 -0.03880 -0.03652 0.06398 1.175 1.41e-03 0.0611
## 20 -0.098579 0.31736 -0.10365 0.47923 1.055 7.47e-02 0.1186
## 21 -0.005729 0.04549 -0.07233 -0.18740 1.084 1.18e-02 0.0480
## 22 -0.026857 0.02744 0.00797 -0.05183 1.155 9.25e-04 0.0438
## 23 0.009273 -0.00745 -0.00656 0.01139 1.191 4.48e-05 0.0676
## 24 0.071862 -0.02049 -0.07106 0.15863 1.083 8.50e-03 0.0393
## 25 -0.008157 -0.00368 0.01541 -0.02221 1.260 1.70e-04 0.1186
## 26 -0.090552 0.11722 0.05728 0.15241 1.169 7.94e-03 0.0771
## 27 0.011764 0.04901 -0.04007 0.16062 1.091 8.73e-03 0.0426
## 28 -0.022380 -0.00904 0.08034 0.16746 1.093 9.48e-03 0.0453
## 29 -0.220761 -0.05949 0.38305 -0.53032 1.010 9.02e-02 0.1168
## 30 -0.021412 0.07229 -0.01477 0.13491 1.136 6.21e-03 0.0542
## 31 -0.108115 0.21234 0.00835 0.29461 1.088 2.89e-02 0.0804
## 32 -0.014308 0.05929 -0.04441 -0.10190 1.627 3.58e-03 0.3187 *
model5_1 <- lm(log(group) ~ log(popularity) + length , data=data.frame(fives1)[-c(8, 9, 32),])
plot(model5_1, 1)
plot(model5_1, 2)
plot(model5_1, 3)
summary(model5_1)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(fives1)[-c(8,
## 9, 32), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9984 -0.2824 0.1741 0.5370 0.8765
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.99095 1.36201 7.335 8.64e-08 ***
## log(popularity) 0.02759 0.05669 0.487 0.631
## length 0.03621 0.13609 0.266 0.792
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7915 on 26 degrees of freedom
## Multiple R-squared: 0.00961, Adjusted R-squared: -0.06657
## F-statistic: 0.1261 on 2 and 26 DF, p-value: 0.882
I decided to omit the second sample drawn from the top 100,000 passwords because that data set contained too many observations that deviated strongly from the rest, i.e. too many influential points for the model to be meaningful after their removal.
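Throughout this analysis, the rows flagged by influence.measures() were read off the printed output and removed by hand. As a hedged sketch (the helper flag_influential() and the use of the built-in cars data set are my own illustration, not part of the original analysis), the same flagged indices can be collected programmatically:

```r
# Sketch: collect the observations that influence.measures() flags as
# influential -- the rows marked with '*' in the printed output.
# 'fit' can be any fitted lm object, e.g. model5_1.
flag_influential <- function(fit) {
  im <- influence.measures(fit)
  # im$is.inf is a logical matrix: one row per observation, one column
  # per diagnostic (dfbetas, dffit, cov.r, cook.d, hat). A row is
  # flagged if any diagnostic exceeds its cutoff.
  which(apply(im$is.inf, 1, any))
}

# Illustration on a built-in data set:
fit <- lm(dist ~ speed, data = cars)
flag_influential(fit)  # indices of the influential rows
```

The returned indices can then be passed directly to the subsetting step, e.g. data.frame(fives1)[-flag_influential(fit), ], instead of typing the row numbers manually.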
model5_3 <- lm(log(group) ~ log(popularity) + length , data=data.frame(fives3))
influence.measures(model5_3)
## Influence measures of
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(fives3)) :
##
## dfb.1_ dfb.lg.. dfb.lngt dffit cov.r cook.d hat inf
## 1 -0.106312 0.07344 0.080008 -0.17730 1.073 1.06e-02 0.0413
## 2 -0.128422 0.12349 0.088745 -0.15040 1.216 7.76e-03 0.1050
## 3 -0.116055 0.02532 0.199156 0.26500 1.168 2.37e-02 0.1066
## 4 -0.042589 -0.02692 0.132654 0.26055 1.053 2.26e-02 0.0587
## 5 -0.020388 0.01600 0.022023 0.02476 1.357 2.12e-04 0.1814 *
## 6 -0.134699 0.01845 0.178525 -0.24048 1.181 1.96e-02 0.1063
## 7 -0.084044 0.13940 0.044766 0.18897 1.146 1.21e-02 0.0751
## 8 0.137197 -0.05386 -0.158761 0.19177 1.218 1.26e-02 0.1147
## 9 0.155487 -0.13531 -0.153050 -0.17597 1.484 1.07e-02 0.2585 *
## 10 -0.039020 0.03611 0.027286 -0.04728 1.207 7.71e-04 0.0821
## 11 0.039210 -0.01150 -0.047933 0.06038 1.244 1.26e-03 0.1096
## 12 -0.035515 -0.03607 0.129260 0.26446 1.051 2.32e-02 0.0591
## 13 0.788312 -0.90424 -0.653303 -1.10937 0.464 3.06e-01 0.1020 *
## 14 -0.035590 0.07914 -0.019022 -0.11888 1.362 4.87e-03 0.1898 *
## 15 -0.012824 -0.05137 0.099131 0.22954 1.086 1.77e-02 0.0611
## 16 0.106970 -0.03433 -0.128830 0.16038 1.222 8.81e-03 0.1107
## 17 -0.083995 -0.02530 0.082144 -0.38966 0.752 4.55e-02 0.0344
## 18 0.012449 -0.02801 0.010310 0.05306 1.204 9.71e-04 0.0804
## 19 0.028455 -0.03678 0.004267 0.10541 1.123 3.80e-03 0.0390
## 20 0.125425 -0.10528 -0.090155 0.16877 1.118 9.67e-03 0.0560
## 21 -0.109206 -0.01644 0.103059 -0.45423 0.653 5.90e-02 0.0343 *
## 22 0.000395 0.00129 -0.000653 0.00566 1.153 1.10e-05 0.0361
## 23 0.035228 -0.10398 0.063004 0.23505 1.111 1.86e-02 0.0722
## 24 -0.333615 0.38295 0.274626 0.46506 1.029 7.01e-02 0.1063
## 25 -0.124040 0.25142 -0.064922 -0.43583 0.990 6.12e-02 0.0870
## 26 -0.029773 0.06333 -0.019663 -0.11458 1.194 4.51e-03 0.0836
## 27 0.013606 0.06379 -0.061231 0.15098 1.254 7.83e-03 0.1286
## 28 -0.183034 0.38069 0.025560 0.49811 1.101 8.12e-02 0.1395
## 29 0.132813 -0.19510 -0.076467 -0.23041 1.213 1.81e-02 0.1209
## 30 0.004558 -0.00348 -0.003355 0.00684 1.165 1.61e-05 0.0461
## 31 0.080434 -0.06905 -0.057468 0.10552 1.160 3.82e-03 0.0599
## 32 0.076559 0.08019 -0.161127 0.30263 1.162 3.08e-02 0.1136
model5_3 <- lm(log(group) ~ log(popularity) + length , data=data.frame(fives3)[-c(5, 9, 13, 14, 21),])
plot(model5_3, 1)
plot(model5_3, 2)
plot(model5_3, 3)
summary(model5_3)
##
## Call:
## lm(formula = log(group) ~ log(popularity) + length, data = data.frame(fives3)[-c(5,
## 9, 13, 14, 21), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9538 -0.2141 0.1816 0.5117 0.6349
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.59930 1.29983 8.154 2.25e-08 ***
## log(popularity) 0.01937 0.04389 0.441 0.663
## length -0.02079 0.12083 -0.172 0.865
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.694 on 24 degrees of freedom
## Multiple R-squared: 0.02537, Adjusted R-squared: -0.05585
## F-statistic: 0.3123 on 2 and 24 DF, p-value: 0.7347
From the results above, I concluded that sampling from the top 10,000 passwords produces the most meaningful results, as those models yield p-values that best reflect the significance of the respective variables.
To pick the best of the models fitted on samples from the top 10,000 passwords, I compared their predictive performance using the PRESS statistic (predicted residual error sum of squares), where a smaller value indicates better out-of-sample prediction.
pr1 <- sum((residuals(model4_1)/(1-hatvalues(model4_1)))^2)
pr2 <- sum((residuals(model4_2)/(1-hatvalues(model4_2)))^2)
pr3 <- sum((residuals(model4_3)/(1-hatvalues(model4_3)))^2)
pr <- c(pr1, pr2, pr3)
pr
## [1] 10.20644 16.31431 19.71651
Thus, the first model (model4_1) has the smallest PRESS statistic and is therefore the best predictive model of the three.
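For reference, the quantity computed above is the standard leave-one-out form of the PRESS statistic:

```latex
\mathrm{PRESS} = \sum_{i=1}^{n} \hat{e}_{(i)}^{2}
              = \sum_{i=1}^{n} \left( \frac{e_i}{1 - h_{ii}} \right)^{2}
```

where $e_i$ is the $i$-th residual and $h_{ii}$ the $i$-th leverage (hat value), so each term estimates the prediction error at observation $i$ when that observation is left out of the fit. This is exactly what `residuals(fit)/(1-hatvalues(fit))` computes.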
library(leaps)
## Warning: package 'leaps' was built under R version 4.2.2
sum <- summary(regsubsets(log(group) ~ ., data=fours2))
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : 3 linear dependencies found
sum
## Subset selection object
## Call: regsubsets.formula(log(group) ~ ., data = fours2)
## 34 Variables (and intercept)
## Forced in Forced out
## Password05061986 FALSE FALSE
## Password10203040 FALSE FALSE
## Password1205 FALSE FALSE
## Password19101987 FALSE FALSE
## Password45M2DO5BS FALSE FALSE
## Passwordalice FALSE FALSE
## Passwordalissa FALSE FALSE
## Passwordambers FALSE FALSE
## Passwordbarcelon FALSE FALSE
## Passwordbattle FALSE FALSE
## Passwordbecky FALSE FALSE
## Passwordblond FALSE FALSE
## Passwordbooger FALSE FALSE
## Passworddanzig FALSE FALSE
## Passworddmitriy FALSE FALSE
## Passworddoggies FALSE FALSE
## Passworddunlop FALSE FALSE
## Passwordeverest FALSE FALSE
## Passwordflowers FALSE FALSE
## Passwordfrancesco FALSE FALSE
## Passwordilikepie FALSE FALSE
## Passwordjohanna FALSE FALSE
## Passwordpacific FALSE FALSE
## Passwordporsche1 FALSE FALSE
## Passwordputter FALSE FALSE
## Passwordrachelle FALSE FALSE
## Passwordrose FALSE FALSE
## Passwordsearch FALSE FALSE
## Passwordtoaster FALSE FALSE
## Passwordwill FALSE FALSE
## Passwordwoman FALSE FALSE
## popularity FALSE FALSE
## length FALSE FALSE
## complexityTrue FALSE FALSE
## 1 subsets of each size up to 8
## Selection Algorithm: exhaustive
## Password05061986 Password10203040 Password1205 Password19101987
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## Password45M2DO5BS Passwordalice Passwordalissa Passwordambers
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## Passwordbarcelon Passwordbattle Passwordbecky Passwordblond
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " "*"
## 7 ( 1 ) " " " " " " "*"
## 8 ( 1 ) " " " " " " "*"
## Passwordbooger Passworddanzig Passworddmitriy Passworddoggies
## 1 ( 1 ) "*" " " " " " "
## 2 ( 1 ) "*" " " " " " "
## 3 ( 1 ) "*" " " " " " "
## 4 ( 1 ) "*" " " " " " "
## 5 ( 1 ) "*" " " " " " "
## 6 ( 1 ) "*" " " " " " "
## 7 ( 1 ) "*" " " " " " "
## 8 ( 1 ) "*" " " " " " "
## Passworddunlop Passwordeverest Passwordflowers Passwordfrancesco
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " "*" " "
## 3 ( 1 ) " " " " "*" " "
## 4 ( 1 ) " " " " "*" " "
## 5 ( 1 ) " " " " "*" " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## Passwordilikepie Passwordjohanna Passwordpacific Passwordporsche1
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " "*" " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " "*" " "
## Passwordputter Passwordrachelle Passwordrose Passwordsearch
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " "*" " "
## 4 ( 1 ) " " " " "*" " "
## 5 ( 1 ) " " " " "*" " "
## 6 ( 1 ) " " " " " " "*"
## 7 ( 1 ) "*" " " " " "*"
## 8 ( 1 ) "*" " " " " "*"
## Passwordtoaster Passwordwill Passwordwoman popularity length
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " "*" " "
## 5 ( 1 ) " " " " " " "*" " "
## 6 ( 1 ) " " "*" "*" "*" " "
## 7 ( 1 ) " " "*" "*" "*" " "
## 8 ( 1 ) " " "*" "*" "*" " "
## complexityTrue
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
#cbind(sum$which,AdjR2 = sum$adjr2, Cp=sum$cp, Rss = sum$rss)
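`regsubsets` above performs an exhaustive best-subset search, retaining the best model of each size. An illustrative Python sketch of the same idea, using residual sum of squares as the criterion on a small hypothetical design matrix (names and data are mine, not from the report):

```python
import itertools
import numpy as np

def best_subsets(X, y, names, max_size=3):
    """Exhaustive best-subset search: for each subset size k, keep the
    predictor combination with the smallest residual sum of squares."""
    n = len(y)
    best = {}
    for k in range(1, max_size + 1):
        for cols in itertools.combinations(range(X.shape[1]), k):
            M = np.column_stack([np.ones(n), X[:, cols]])   # intercept + subset
            beta, *_ = np.linalg.lstsq(M, y, rcond=None)
            rss = float(np.sum((y - M @ beta) ** 2))
            if k not in best or rss < best[k][1]:
                best[k] = ([names[c] for c in cols], rss)
    return best

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = 3 * X[:, 2] + rng.normal(scale=0.1, size=60)   # only x2 truly matters
best_subsets(X, y, ["x0", "x1", "x2", "x3"])
```

As in the `regsubsets` output, RSS can only decrease as subset size grows, which is why a penalised criterion (adjusted R², Cp) is needed to choose between sizes.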
Our final model is:
model4_2
##
## Call:
## lm(formula = log(group) ~ log(popularity), data = fours2)
##
## Coefficients:
## (Intercept) log(popularity)
## 9.56633 -0.06903
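The fitted equation log(group) = 9.56633 − 0.06903·log(popularity) can be applied directly for prediction by back-transforming with exp. A small sketch (the function name is mine; the coefficients are those reported above):

```python
import math

# Coefficients reported for the final model above
INTERCEPT, SLOPE = 9.56633, -0.06903

def predict_group(popularity):
    """Predicted group on the original scale: exponentiate the fitted
    value of log(group) at the given popularity."""
    return math.exp(INTERCEPT + SLOPE * math.log(popularity))
```

The negative slope means predicted group shrinks slowly as popularity grows, consistent with the log-log fit.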
k <- -log(1/9)/(2*10^5)
logist <- function(x, is_long, is_complex){
  how_common = 10/(1 + exp(-k*(x - 750000)))
  if (is_long == TRUE && how_common >= 9) {
    how_common = 10
  } else if (is_long == TRUE) {
    how_common = how_common + 1
  } else if (is_long == FALSE && how_common <= 1) {
    how_common = 0
  } else if (is_long == FALSE) {
    how_common = how_common - 1
  }
  if (is_complex == TRUE && how_common >= 9) {
    how_common = 10
  } else if (is_complex == TRUE) {
    how_common = how_common + 1
  } else if (is_complex == FALSE && how_common <= 1) {
    how_common = 0
  } else if (is_complex == FALSE) {
    how_common = how_common - 1
  }
  print(how_common)
}
logist(1000000, TRUE, TRUE)
## [1] 10
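The intended scoring rule — a logistic curve on the popularity rank, then a clamped one-point adjustment for each boolean flag — can be written compactly. A Python sketch (the function name is mine; the constant and centre point 750000 follow the definition above):

```python
import math

K = math.log(9) / (2 * 10**5)   # the k defined above: -log(1/9)/(2*10^5)

def how_common_score(rank, is_long, is_complex):
    """Map a popularity rank to a 0-10 commonness score via a logistic
    curve centred at rank 750000, then nudge the score up or down one
    point per flag, clamping the result to the interval [0, 10]."""
    score = 10 / (1 + math.exp(-K * (rank - 750000)))
    for flag in (is_long, is_complex):
        score = min(score + 1, 10) if flag else max(score - 1, 0)
    return score
```

The min/max clamps are equivalent to the chained if/else branches in the R version: a score already at 9 or above is pushed to 10, and one at 1 or below is pushed to 0.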
I found that using the top ten regular-expression patterns to define complexity classified most, if not all, passwords as non-complex. Because this neither supported my hypothesis nor reflected the true nature of passwords, I redefined complexity by narrowing the pattern set from the top ten to the top five and then to the top three. With the narrower definition, passwords ranked 1000th or lower became increasingly complex, and the corresponding p-value changed from non-significant to significant. Furthermore, as passwords grew in complexity, they also grew in length; this interaction between length and complexity was confirmed by a p-value below 0.05. For these trials I did not need stratified random sampling, since I had sourced the data myself using Python and could therefore use the entire data set.
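The pattern-based complexity test described above can be sketched in Python: a password counts as complex only if it matches none of the common structural patterns. The patterns below are hypothetical stand-ins, not the actual top patterns mined from the data:

```python
import re

# Hypothetical stand-ins for the most common structure patterns;
# the real list was derived from the top-million password file.
COMMON_PATTERNS = [
    r"[a-z]+",        # all lowercase letters
    r"\d+",           # all digits
    r"[a-z]+\d+",     # lowercase letters followed by digits
]

def is_complex(password, patterns=COMMON_PATTERNS):
    """A password counts as complex only if it matches none of the
    common structural patterns (full-string matches)."""
    return not any(re.fullmatch(p, password) for p in patterns)
```

Shrinking the pattern list, as in the trials above, makes fewer structures count as "common" and therefore classifies more passwords as complex.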
Below is the result for the top 10,000 most commonly used passwords, looking only at the variables complexity and length.
lol <- read.csv("var", header=T)
model <- lm(log(Group) ~ Complexity + Length + Complexity:Length, data=lol)
summary(model)
##
## Call:
## lm(formula = log(Group) ~ Complexity + Length + Complexity:Length,
## data = lol)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.3427 -0.3751 0.3174 0.6801 1.2627
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.552384 0.050318 150.093 < 2e-16 ***
## ComplexityTrue 1.453229 0.255164 5.695 1.27e-08 ***
## Length 0.098790 0.007442 13.275 < 2e-16 ***
## ComplexityTrue:Length -0.188661 0.033491 -5.633 1.82e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9884 on 9995 degrees of freedom
## Multiple R-squared: 0.01853, Adjusted R-squared: 0.01824
## F-statistic: 62.9 on 3 and 9995 DF, p-value: < 2.2e-16
In fact, I realized that the stratified random sampling I initially performed with four groups did not reflect the different ranges of password popularity: the strata I chose were too broad. I decided to increase both the number of strata and the number of observations in my data set.
In my attempt to choose strata that reflected the true results, I took 100 strata instead of 4. The results below use the data set of the 10,000 most common passwords, with the sampling repeated three times for accuracy.
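The 100-stratum scheme amounts to splitting the popularity-ranked list into equal-width strata and drawing at random from each, so every popularity range is represented. A sketch (function name and sizes are mine):

```python
import random

def stratified_sample(ranked_passwords, n_strata=100, seed=0):
    """Split a popularity-ranked list into equal-width strata and draw
    one password at random from each stratum, so every popularity
    range contributes to the sample."""
    rng = random.Random(seed)
    size = len(ranked_passwords) // n_strata
    return [rng.choice(ranked_passwords[i * size:(i + 1) * size])
            for i in range(n_strata)]

sample = stratified_sample([f"pw{i}" for i in range(10000)])
len(sample)   # one draw per stratum
```

With 100 strata over 10,000 ranked passwords, each stratum spans only 100 consecutive ranks, which is what makes the sample sensitive to how popularity varies across the list.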
lol <- read.csv("var1", header=T)
hist(log(lol$popularity))
hist(lol$length)
shapiro.test(log(lol$popularity))
##
## Shapiro-Wilk normality test
##
## data: log(lol$popularity)
## W = 0.93797, p-value = 0.0001453
shapiro.test(lol$length)
##
## Shapiro-Wilk normality test
##
## data: lol$length
## W = 0.92336, p-value = 2.13e-05
model <- lm(log(group) ~ log(popularity), data=lol)
summary(model)
##
## Call:
## lm(formula = log(group) ~ log(popularity), data = lol)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.1101 -0.3261 0.3069 0.6350 1.1811
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.43405 0.57753 16.335 <2e-16 ***
## log(popularity) -0.07176 0.03222 -2.227 0.0282 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.189 on 98 degrees of freedom
## Multiple R-squared: 0.04818, Adjusted R-squared: 0.03847
## F-statistic: 4.961 on 1 and 98 DF, p-value: 0.02821
plot(model, 1)
plot(model, 2)
plot(model, 3)
lol1 <- read.csv("var2", header=T)
hist(log(lol1$popularity))
hist(lol1$length)
shapiro.test(log(lol1$popularity))
##
## Shapiro-Wilk normality test
##
## data: log(lol1$popularity)
## W = 0.95287, p-value = 0.001288
shapiro.test(lol1$length)
##
## Shapiro-Wilk normality test
##
## data: lol1$length
## W = 0.92187, p-value = 1.771e-05
model1 <- lm(log(group) ~ log(popularity), data=lol1)
summary(model1)
##
## Call:
## lm(formula = log(group) ~ log(popularity), data = lol1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9857 -0.3001 0.1806 0.5797 1.4101
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.70000 0.50695 19.134 < 2e-16 ***
## log(popularity) -0.08568 0.02833 -3.025 0.00318 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.025 on 98 degrees of freedom
## Multiple R-squared: 0.0854, Adjusted R-squared: 0.07606
## F-statistic: 9.15 on 1 and 98 DF, p-value: 0.003176
plot(model1, 1)
plot(model1, 2)
plot(model1, 3)
lol2 <- read.csv("var3", header=T)
hist(log(lol2$popularity))
hist(lol2$length)
shapiro.test(log(lol2$popularity))
##
## Shapiro-Wilk normality test
##
## data: log(lol2$popularity)
## W = 0.93408, p-value = 8.542e-05
shapiro.test(lol2$length)
##
## Shapiro-Wilk normality test
##
## data: lol2$length
## W = 0.90435, p-value = 2.277e-06
model2 <- lm(log(group) ~ log(popularity), data=lol2)
summary(model2)
##
## Call:
## lm(formula = log(group) ~ log(popularity), data = lol2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9957 -0.2875 0.2341 0.5700 1.2469
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.60840 0.43038 22.33 <2e-16 ***
## log(popularity) -0.07850 0.02371 -3.31 0.0013 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9295 on 98 degrees of freedom
## Multiple R-squared: 0.1006, Adjusted R-squared: 0.0914
## F-statistic: 10.96 on 1 and 98 DF, p-value: 0.001304
plot(model2, 1)
plot(model2, 2)
plot(model2, 3)
influence.measures(model2)
## Influence measures of
## lm(formula = log(group) ~ log(popularity), data = lol2) :
##
## dfb.1_ dfb.lg.. dffit cov.r cook.d hat inf
## 1 0.281241 -0.395049 -0.624609 0.678 1.59e-01 0.0167 *
## 2 -0.539344 0.463762 -0.612860 0.775 1.63e-01 0.0234 *
## 3 0.053051 -0.121295 -0.326138 0.864 4.91e-02 0.0116 *
## 4 0.051313 -0.103949 -0.254559 0.928 3.10e-02 0.0120 *
## 5 0.085889 -0.132091 -0.239266 0.957 2.78e-02 0.0144
## 6 0.084058 -0.127766 -0.227659 0.965 2.53e-02 0.0146
## 7 0.107682 -0.145123 -0.214172 0.990 2.26e-02 0.0185
## 8 0.017859 -0.055082 -0.175220 0.976 1.51e-02 0.0111
## 9 0.098277 -0.129105 -0.182253 1.008 1.65e-02 0.0201
## 10 0.102478 -0.129855 -0.171861 1.019 1.47e-02 0.0233
## 11 -0.093294 0.060938 -0.167926 0.983 1.39e-02 0.0115
## 12 -0.267194 0.234250 -0.294291 0.986 4.24e-02 0.0273
## 13 -0.012756 -0.012622 -0.116813 1.003 6.80e-03 0.0101
## 14 0.017933 -0.038883 -0.100580 1.015 5.07e-03 0.0118
## 15 0.017565 -0.037413 -0.095454 1.017 4.57e-03 0.0118
## 16 -0.001403 -0.018194 -0.090601 1.015 4.11e-03 0.0104
## 17 0.025642 -0.040772 -0.077241 1.026 3.00e-03 0.0139
## 18 0.030224 -0.043828 -0.072862 1.030 2.67e-03 0.0157
## 19 0.034934 -0.042367 -0.051790 1.051 1.35e-03 0.0302
## 20 0.018206 -0.028813 -0.054251 1.031 1.48e-03 0.0139
## 21 0.021116 -0.025981 -0.032600 1.049 5.37e-04 0.0274
## 22 -0.013241 0.000249 -0.060185 1.023 1.82e-03 0.0100
## 23 0.005784 -0.015261 -0.044890 1.029 1.02e-03 0.0113
## 24 0.009422 -0.012068 -0.016285 1.044 1.34e-04 0.0222
## 25 0.004536 -0.012169 -0.036128 1.030 6.59e-04 0.0113
## 26 0.007596 -0.011513 -0.020434 1.035 2.11e-04 0.0147
## 27 0.004939 -0.007008 -0.011264 1.037 6.41e-05 0.0163
## 28 0.000277 -0.000372 -0.000547 1.040 1.51e-07 0.0186
## 29 -0.039226 0.029347 -0.057073 1.030 1.64e-03 0.0136
## 30 0.002159 -0.004356 -0.010633 1.033 5.71e-05 0.0120
## 31 -0.000282 -0.002777 -0.014138 1.031 1.01e-04 0.0104
## 32 -0.013693 0.017740 0.024430 1.042 3.01e-04 0.0212
## 33 -0.016900 0.022013 0.030613 1.041 4.73e-04 0.0207
## 34 -0.059520 0.070811 0.083655 1.054 3.53e-03 0.0353
## 35 -0.027422 0.034721 0.045886 1.043 1.06e-03 0.0234
## 36 -0.002685 0.000787 -0.008909 1.031 4.01e-05 0.0101
## 37 -0.004593 0.002601 -0.009854 1.032 4.90e-05 0.0107
## 38 -0.089641 0.080420 -0.095489 1.052 4.59e-03 0.0344
## 39 -0.060640 0.073834 0.090903 1.046 4.16e-03 0.0294
## 40 -0.039962 0.034174 -0.045827 1.042 1.06e-03 0.0225
## 41 -0.063807 0.077877 0.096295 1.044 4.67e-03 0.0289
## 42 0.002218 0.001001 0.014829 1.031 1.11e-04 0.0100
## 43 -0.051852 0.065177 0.085012 1.040 3.64e-03 0.0243
## 44 -0.012210 0.020338 0.040840 1.032 8.41e-04 0.0133
## 45 -0.002985 0.009924 0.032591 1.030 5.36e-04 0.0110
## 46 -0.013454 0.022857 0.046976 1.031 1.11e-03 0.0131
## 47 0.004647 0.000681 0.024609 1.030 3.06e-04 0.0100
## 48 -0.001350 0.001090 -0.001713 1.038 1.48e-06 0.0168
## 49 -0.124181 0.116083 -0.126464 1.085 8.06e-03 0.0635 *
## 50 -0.004284 0.013824 0.044847 1.028 1.01e-03 0.0110
## 51 0.011221 -0.005839 0.026219 1.030 3.47e-04 0.0105
## 52 -0.092255 0.112664 0.139459 1.037 9.76e-03 0.0288
## 53 0.012083 -0.005112 0.033233 1.029 5.57e-04 0.0102
## 54 -0.072120 0.091483 0.121306 1.032 7.39e-03 0.0232
## 55 -0.018695 0.032605 0.069034 1.026 2.40e-03 0.0129
## 56 -0.058397 0.076870 0.108903 1.029 5.96e-03 0.0199
## 57 -0.034443 0.050924 0.087168 1.026 3.82e-03 0.0152
## 58 -0.006635 0.019655 0.061373 1.025 1.90e-03 0.0111
## 59 -0.027500 0.043796 0.083144 1.025 3.47e-03 0.0138
## 60 -0.031796 0.049030 0.089137 1.024 3.99e-03 0.0143
## 61 -0.004290 0.018152 0.064798 1.024 2.11e-03 0.0109
## 62 0.027465 -0.022826 0.033103 1.039 5.53e-04 0.0191
## 63 0.001702 0.012051 0.063519 1.023 2.03e-03 0.0104
## 64 -0.052424 0.072867 0.113252 1.023 6.43e-03 0.0171
## 65 -0.000868 0.000792 -0.000904 1.067 4.13e-07 0.0431 *
## 66 0.013629 -0.000193 0.062235 1.023 1.95e-03 0.0100
## 67 0.022005 -0.009573 0.059386 1.024 1.78e-03 0.0103
## 68 -0.000862 0.017072 0.075158 1.020 2.84e-03 0.0105
## 69 0.031087 -0.019775 0.058016 1.026 1.70e-03 0.0113
## 70 0.037093 -0.032674 0.040561 1.049 8.31e-04 0.0285
## 71 0.029977 -0.017270 0.063130 1.024 2.01e-03 0.0108
## 72 -0.002921 0.020636 0.082400 1.018 3.41e-03 0.0107
## 73 0.021149 -0.006019 0.070967 1.021 2.53e-03 0.0101
## 74 -0.000799 0.018911 0.083957 1.018 3.54e-03 0.0105
## 75 0.003567 -0.003321 0.003645 1.085 6.71e-06 0.0589 *
## 76 0.010558 -0.009824 0.010794 1.084 5.89e-05 0.0583 *
## 77 0.041746 -0.037805 0.043932 1.061 9.74e-04 0.0385
## 78 -0.015004 0.014145 -0.015184 1.104 1.16e-04 0.0757 *
## 79 0.057850 -0.050380 0.064388 1.044 2.09e-03 0.0258
## 80 0.043469 -0.039805 0.045152 1.068 1.03e-03 0.0449 *
## 81 0.054801 -0.042693 0.074237 1.028 2.77e-03 0.0149
## 82 0.063646 -0.053753 0.074568 1.037 2.80e-03 0.0208
## 83 0.046576 -0.031897 0.078254 1.022 3.08e-03 0.0120
## 84 -0.020871 0.044241 0.112445 1.011 6.32e-03 0.0118
## 85 0.069769 -0.061477 0.076253 1.046 2.93e-03 0.0286
## 86 0.070833 -0.061696 0.078817 1.043 3.13e-03 0.0258
## 87 -0.036834 0.062241 0.127124 1.009 8.06e-03 0.0132
## 88 0.072733 -0.061478 0.085093 1.035 3.65e-03 0.0209
## 89 0.039738 -0.021735 0.088448 1.016 3.92e-03 0.0106
## 90 0.060290 -0.055931 0.061804 1.079 1.93e-03 0.0552 *
## 91 -0.103175 0.135097 0.189613 1.006 1.78e-02 0.0203
## 92 -0.055555 0.083960 0.148389 1.005 1.10e-02 0.0147
## 93 0.084615 -0.075297 0.091154 1.048 4.19e-03 0.0315
## 94 -0.084795 0.115961 0.175432 1.004 1.53e-02 0.0178
## 95 -0.107298 0.140313 0.196483 1.003 1.91e-02 0.0204
## 96 0.088341 -0.076397 0.099439 1.038 4.97e-03 0.0244
## 97 0.000300 0.024300 0.113877 1.006 6.47e-03 0.0105
## 98 0.093785 -0.082182 0.103373 1.041 5.38e-03 0.0272
## 99 0.096530 -0.085686 0.104362 1.046 5.48e-03 0.0307
## 100 0.070872 -0.053347 0.102034 1.019 5.22e-03 0.0138
h_min <- 3 * 2/100
h_min
## [1] 0.06
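The cutoff of 0.06 appears to follow the common 3p/n rule of thumb for flagging high-leverage points, with p = 2 coefficients (intercept and slope) and n = 100 observations — that reading is my assumption. Flagging then works like:

```python
def high_leverage(hat_values, p, n):
    """Indices of observations whose leverage h_ii exceeds the 3p/n
    rule-of-thumb cutoff (assumed interpretation of h_min above)."""
    cutoff = 3 * p / n
    return [i for i, h in enumerate(hat_values) if h > cutoff]

# A few hat values taken from the influence.measures output above
hats = [0.0167, 0.0234, 0.0635, 0.0757, 0.0449]
high_leverage(hats, p=2, n=100)
```

Against the 0.06 cutoff, observations such as 49 (hat 0.0635) and 78 (hat 0.0757) in the table above would be flagged for closer inspection.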
From these results, I observed that the outcome is more consistent between repeated trials and more reflective of the true values. However, the normal Q-Q plots for all three models show the points following an exponential curve rather than a straight line. Our results may therefore be unreliable, as we cannot maintain the model assumption that the errors (and hence the responses y) are normally distributed.
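A normal Q-Q plot compares the sorted residuals against standard-normal quantiles; systematic bending away from a straight line, as seen above, is evidence against normality. A sketch of how those plotted points are computed (function name and plotting positions are my own choices):

```python
import numpy as np
from statistics import NormalDist

def qq_points(residuals):
    """Return (theoretical quantile, sample quantile) pairs for a
    normal Q-Q plot. Points bending away from a straight line
    indicate non-normal residuals."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = len(r)
    probs = (np.arange(1, n + 1) - 0.5) / n             # plotting positions
    theo = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theo, r
```

For truly normal residuals the points lie close to a line, so the exponential shape in our three models is what undermines the normality assumption here.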