I feel a little more like me… June 14, 2009
Posted by Alexander Mendez in Uncategorized. Tags: awake, data mining, dns, download, end of quarter, excel, graphs, hulu, images, life, opendns, plots, stats, time
I had a drink or two last night while watching Coco on the Tonight Show. Yay for Hulu and time shifting. I am still quite annoyed with TW trying to get more money out of the whole cable / internet TV situation. I wish the US had separate ISPs and TV providers. I understand that the majority of Americans probably don't care, but it is annoying how much TW (and other internet providers) are trying to screw content providers on the web. The American people have footed the bill for providing a huge infrastructure to the telecoms, and they still want more. Ugh.
I am feeling better today. The cough is still kicking, but I feel more like me. This morning I was playing around with the data that I have been collecting from OpenDNS. I used to log into the OpenDNS dashboard and download a month of daily data, but I would forget, which left a few days missing between months, and having only daily data washes out a lot of the nice hourly information. A few weeks back I decided to remedy this by setting up node02 to download each day's hourly info.
I first needed to get past the password login. Since I wanted to use wget, I found a little program that transcodes my Firefox sqlite cookies into the txt cookies that wget can handle. The Python code on the website is commented in English, which makes it understandable. OpenDNS stores a month of data, so I first pythoned it up to download all of that. I just used os.system to call wget. There are probably better ways, but this was quick. I had not learned about datetime yet, so I just concatenated a string for the date.
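#!/usr/bin/python
# Bulk download of the daily CSV files that OpenDNS still has on file.
import os

CookiesFile = 'cookies.txt'  # wget-readable cookies exported from Firefox

# range() excludes its end value, so this covers months 1-3 and days 1-31.
for Month in range(1, 4):
    for Day in range(1, 32):
        # The date is a plain string, so impossible dates (02-30) are requested too.
        Date = '2009-%02d-%02d' % (Month, Day)
        os.system("wget --load-cookies " + CookiesFile + " https://www.opendns.com/dashboard/stats/all/totalrequests/" + Date + ".csv")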
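Going back to the cookie step for a moment: the program used for the conversion is not reproduced here, so the following is only a minimal sketch of what such a converter might look like. It assumes the Firefox profile stores cookies in a `moz_cookies` table with `host`, `path`, `isSecure`, `expiry`, `name` and `value` columns, and it writes the tab-separated Netscape cookies.txt layout that `wget --load-cookies` reads. The function name `firefox_cookies_to_txt` is made up for this example.

```python
import sqlite3


def firefox_cookies_to_txt(sqlite_path, txt_path):
    """Copy cookies from a Firefox cookies.sqlite file into a Netscape cookies.txt.

    Assumes a moz_cookies table with the columns selected below.
    Returns the number of cookies written.
    """
    conn = sqlite3.connect(sqlite_path)
    try:
        rows = conn.execute(
            "SELECT host, path, isSecure, expiry, name, value FROM moz_cookies"
        ).fetchall()
    finally:
        conn.close()

    with open(txt_path, "w") as out:
        out.write("# Netscape HTTP Cookie File\n")
        for host, path, is_secure, expiry, name, value in rows:
            fields = [
                host,
                # Second field: whether the cookie applies to subdomains too.
                "TRUE" if host.startswith(".") else "FALSE",
                path,
                "TRUE" if is_secure else "FALSE",
                str(expiry),
                name,
                value,
            ]
            out.write("\t".join(fields) + "\n")
    return len(rows)
```

Firefox may keep the database locked while it is running, so it is probably safer to point this at a copy of cookies.sqlite than at the live file.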
After this, I decided to set up a daily download of the data. Since the previous day’s data would be complete, I download yesterday’s data. I had learned about the datetime awesomeness, so I used it. I also wanted to be able to download a specific day in the past (given as a number of days back), so I added some argv checking. I hard-coded the full paths to where everything gets downloaded because of the cron user that runs the script. I also have it keeping a log just for the hell of it.
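#!/usr/bin/python
import sys
import os
import datetime

# Optional argument: how many days back to fetch (default 1 = yesterday).
if len(sys.argv) > 1:
    num = int(sys.argv[1])
else:
    num = 1
print(num)

# Full paths, because the script is run by the cron user.
CookiesFile = '/home/ajmendez/dns/cookies.txt'
OutputFolder = '/home/ajmendez/dns/Automatic'

Today = datetime.date.today()
Yesterday = Today - datetime.timedelta(days=num)  # str() gives YYYY-MM-DD

# -c resumes partial files, -q keeps wget quiet, -P sets the download folder.
os.system("wget -c -q -P " + OutputFolder + " --load-cookies " + CookiesFile + " https://www.opendns.com/dashboard/stats/all/totalrequests/" + str(Yesterday) + ".csv")
# Append the fetched date to a simple log.
os.system("echo " + str(Yesterday) + " >> " + OutputFolder + "/DNS.log")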
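With datetime available, the earlier bulk download could also step through real calendar dates instead of gluing month and day numbers together. The sketch below is one possible way to do that, not the script that was actually run. The URL prefix is the one used above, while `daily_urls` and the example date range are made up for illustration.

```python
import datetime

BASE_URL = "https://www.opendns.com/dashboard/stats/all/totalrequests/"


def daily_urls(start, end):
    """Return one CSV URL per calendar day from start to end, inclusive."""
    urls = []
    day = start
    while day <= end:
        # date.isoformat() gives YYYY-MM-DD, the same form str(date) produces.
        urls.append(BASE_URL + day.isoformat() + ".csv")
        day += datetime.timedelta(days=1)
    return urls


# Example range that crosses a month boundary (2009 is not a leap year).
urls = daily_urls(datetime.date(2009, 2, 27), datetime.date(2009, 3, 2))
for url in urls:
    print(url)
```

Because the loop walks actual dates, it never asks for days like February 30, and month lengths take care of themselves.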
So this has been running, and I have about a month and a half of data. Last night, while cocoing it up with a Beam and coke, I copied it over to kobalt and started messing with it in Excel. Since my mouse died from lack of nourishment (no e- to keep it fed :\ ) I went to bed, but this morning I got around to doing some sorting and graphing.
I started out by sorting by hour. I had to do a nice little date-to-text conversion, since dates are actually stored as large numbers of some sort, probably a count from some starting day. It looks like the number of days since 1900. Meh. After that I did some SUMIF-ing to get per-hour and per-day plots. As you can see, I get up around 8am and am usually on campus until around 7-8pm, when I get home. That leads to a bedtime around 12 to 1am. Sleep, and then repeat.
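The "days since 1900" guess matches Excel's 1900 date system, where the whole part of the number counts days and the fractional part is the time of day. A minimal sketch of the conversion in Python, assuming the workbook uses that date system (some Mac workbooks use a 1904-based one instead), could look like this. The epoch of 1899-12-30 is the usual offset that lines modern dates up with Excel's numbering, and `excel_serial_to_datetime` is a name made up here.

```python
import datetime

# Day zero for Excel's 1900 date system, valid for dates from March 1900 on.
EXCEL_EPOCH = datetime.datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial date (days, with fractional time) to a datetime."""
    return EXCEL_EPOCH + datetime.timedelta(days=serial)


stamp = excel_serial_to_datetime(39978.5)
print(stamp)       # 2009-06-14 12:00:00
print(stamp.hour)  # 12
```

Once the value is a datetime, the hour and weekday needed for the per-hour and per-day sums come straight from `stamp.hour` and `stamp.weekday()`.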
I am not sure why, but in the daily data I seem to be more likely to be on the internet on Monday than any other day. Friday is a close second… I decided to take a look and see why, so I did a SUMIFS to sum by day and by hour, giving me:
This gave me an interesting look at the data. As you can see (probably by clicking on the image), during the week I am up at the same time, ~8am, each day. Monday is a bit odd: for this past month I would usually head home and work on my cosmology paper, leading to that spike in DNS requests from searching for papers and the like. It would decay since I needed to head to the tutorial center to teach. It would increase again afterwards, since I would be burnt out and just fade to sleep with reddit and other stupidness. Friday has an interesting hump. Because I did not have to be on campus until noon for office hours, I would usually just play it relaxed until then. More recently, since I have been working in the library with the other physics grad students, I would push to get out of the house earlier, and I think that is why there is a small decay before the huge dropoff. I am pretty sure that the huge hump on Saturday is mainly due to this last Saturday. If I look at the amount for each weekend as a function of time, it seems that the largest contribution comes from the Saturday before finals week, when I was finishing up my cosmology paper. Interesting. It also seems that Saturdays and Sundays run later on average than weekdays. Plotting this:
We can see that I seem to get up later on weekends, and that I am at home more on weekends than on weekdays. I am also awake later on weekends than on weekdays. I really wish I had a whole quarter's worth of data, as well as the other quarter, to see how my sleep pattern has changed.
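The day-by-hour SUMIFS behind these plots has a short equivalent in Python. The sketch below is only an illustration under assumptions: the layout of the OpenDNS CSV is not shown in this post, so it takes already-parsed (datetime, request count) pairs, and the sample numbers are invented rather than taken from the real data.

```python
import datetime
from collections import defaultdict

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def sum_by_weekday_and_hour(records):
    """Total the request counts for each (weekday name, hour) pair."""
    totals = defaultdict(int)
    for stamp, count in records:
        # date.weekday() numbers Monday as 0, matching DAY_NAMES.
        totals[(DAY_NAMES[stamp.weekday()], stamp.hour)] += count
    return dict(totals)


# Invented sample rows, one per hourly measurement.
records = [
    (datetime.datetime(2009, 6, 8, 8), 120),   # a Monday
    (datetime.datetime(2009, 6, 8, 9), 80),
    (datetime.datetime(2009, 6, 15, 8), 100),  # the following Monday
    (datetime.datetime(2009, 6, 13, 11), 50),  # a Saturday
]
totals = sum_by_weekday_and_hour(records)
print(totals[("Monday", 8)])
```

Dividing each total by the number of times that weekday appears in the data would give averages, which keeps a single busy Saturday from dominating the comparison.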
This data might be thrown off because of my apartment mate, or anything else happening at home, but I doubt that would change much. I also have my NetLimiter data, which I think will be much more interesting. It will show when I download episodes, watch Hulu and the like, but the smaller PDFs and random reddit pages will be washed out.


