🦋 Filtering out page reads
You get a lot of stuff in your web server log file that does not have to do with actual human reads of your site. I wrote a script that I think shows all the human page views in an Apache log file. It relies on the fact that browsers fetch CSS stylesheets, while robots generally don't. (It will miss humans using Lynx; it could easily be tweaked to handle that well enough. Also, I have seen Yahoo fetching CSS files; you can fix that by putting "Slurp" in the list of patterns you're not interested in.)
grep "blog.css" $logfile | // get all reads
of blog.css
awk '{print $1;}' | // extract ip address
sort | uniq | // only show each ip once
grep -f - $logfile | // now pass that list
of ip's back to grep
grep " 200 " | // only show successful reads
egrep -v (any files you're not interested in)
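Putting it all together, here is the pipeline as a standalone script. This is just a sketch: the log path and the egrep exclusion pattern are made-up examples, so substitute your own log location and whatever files and robots you want to drop.

#!/bin/sh
# Sketch of the pipeline above as a runnable script.
# The log path and the egrep pattern are hypothetical; edit them for your site.
logfile=/var/log/apache2/access.log

grep "blog.css" "$logfile" |   # get all reads of blog.css
awk '{print $1;}' |            # extract the IP address
sort | uniq |                  # only show each IP once
grep -f - "$logfile" |         # pass the list of IPs back to grep
grep " 200 " |                 # only show successful reads
egrep -v "blog\.css|favicon\.ico|Slurp"   # drop the css reads themselves, favicon hits, and Yahoo's robot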
I believe you could also use "favicon.ico" instead of your css file, but this is less reliable: I don't know how often browsers request favicon for sites they have already visited. Or you could use the filename of a graphic included on one of your pages and hosted on your site; I think that would work reasonably well.
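Either way, only the first stage of the pipeline changes; for instance (header.jpg is a made-up filename standing in for whatever graphic your pages actually load):

grep "favicon.ico" $logfile |   # or: grep "header.jpg" $logfile
awk '{print $1;}' |
sort | uniq |
grep -f - $logfile |
grep " 200 " |
egrep -v "(any files you're not interested in)"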
posted evening of Tuesday, November 20th, 2007