Yesterday, during another boring phone call, I googled for “fun python packages” and bumped into this nice article: “20 Python libraries you can’t live without”. While I already knew many of the packages mentioned there, one caught my interest: Scrapy. Scrapy seems to be an elegant way not only to parse web pages but also to crawl them, mainly those which have some sort of ‘Next’ or ‘Older posts’ button you wanna click through, e.g. to retrieve all pages from a blog.

I installed Scrapy and ran into an import error; as mentioned in the FAQ and elsewhere, I had to install pypiwin32 manually:

pip install pypiwin32

Based on the example on the home page, I wrote a little script to retrieve titles and URLs from my German blog “Axel Unterwegs”. After figuring out how to override the init and close methods of my spider class, I enhanced it to write those titles and URLs into a table-of-contents type HTML file.

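import scrapy

# Only this part of the header markup survived in this post.
header = """
<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>
"""
# The footer markup did not survive in this post, so it is left empty here.
footer = """
"""

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['']  # blog URL omitted in this post

    def __init__(self, *a, **kw):
        super(BlogSpider, self).__init__(*a, **kw)
        # Python 3: open as UTF-8 text so titles and URLs need no manual encoding
        self.file = open('blogspider.html', 'w', encoding='utf-8')
        self.file.write(header)  # assumed: the header is written once at the start

    def parse(self, response):
        for title in response.css(''):  # CSS selector for post titles omitted in this post
            t = title.css('a ::text').extract_first()
            url = title.css('a ::attr(href)').extract_first()
            self.file.write("<a target=\"_NEW_\" href=\"%s\">%s</a>\n<br/>" % (url, t))
            yield {'title': t, 'url': url}

        for next_page in response.css(''):  # CSS selector for the 'Older posts' link omitted
            yield response.follow(next_page, self.parse)

    def closed(self, reason):
        # Scrapy calls closed() when the spider finishes. The original method body
        # is missing from this post; writing the footer and closing the file is assumed.
        self.file.write(footer)
        self.file.close()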

Thus, here is the TOC of my German blog.
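
One thing a script like this glosses over: titles or URLs containing characters such as & or double quotes end up unescaped in the HTML file. Here is a minimal sketch of how a single TOC line could be built with escaping, using `html.escape` from the Python 3 standard library. The helper name `toc_line` and the example URL are made up for illustration and are not part of the original script.

```python
from html import escape


def toc_line(title, url):
    """Return one table-of-contents entry: an HTML link followed by a line break."""
    # escape() replaces &, <, > and (by default) quotes, so neither the title
    # nor the URL can break out of the surrounding markup.
    return '<a target="_NEW_" href="%s">%s</a>\n<br/>' % (escape(url), escape(title))


# Hypothetical title and URL, only to show the escaping at work.
print(toc_line("Berge & Seen", "https://example.com/?p=1&lang=de"))
```

Inside the spider, the `self.file.write(...)` call could then simply write the result of `toc_line(t, url)`.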

I tried to get the same done with my English blog here on WordPress but have been struggling so far. One challenge is that the modern UI of WordPress no longer has any ‘Older posts’ type of button; new postings are loaded as soon as you scroll down. Also, the parsing doesn’t seem to work for now, but maybe I’ll figure it out some time later.




A better Tumblr share confirmation

Typically when you post something to Tumblr you get a confirmation like this:

A while ago I thought I would at least like to see a link to my blog as well, so that I can head there right away to look at my new posting. So I wrote a little Greasemonkey script, “Gimme link to my Tumblr blog”, to achieve this:

Today I got the idea that after sharing something, Tumblr should take me straight to my dashboard. There I can immediately see my own posting (which would be the best sort of confirmation I can get), and if I have a bit of spare time I can explore what others have posted in the meantime.

A while ago I wrote another Greasemonkey script, “URL Redirects”, to handle that sort of thing. After adding a new rule that redirects from the share confirmation page to the dashboard, I easily changed how Tumblr confirms my sharing.

2011 in review

The stats helper monkeys prepared a 2011 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 20,000 times in 2011. If it were a concert at Sydney Opera House, it would take about 7 sold-out performances for that many people to see it.

Click here to see the complete report.

My favorites for week 16, 2011

Something to laugh: my favorite comic strip of the week, about Easter

Happy Easter Weekend !

Something to enjoy: my favorite photo on Flickr under a Creative Commons license, about snakes

"4" Night Snake" by jbviper1.

Jerry B. aka jbviper1 has a nice collection of snake photos in his photo stream. Look at that beautiful Night Snake. Well, beautiful in some sense; I wouldn’t want to run into it actually, since I am a bit scared of snakes.

Something to talk about: my favorite quote of the week, about writing

It is not a bad idea to get in the habit of writing down one’s thoughts. It saves one having to bother anyone else with them.

… unless you write them down in your blog :D

Embedding YouTube videos now works on WordPress

To my surprise – I must have missed any announcement about this new feature – embedding YouTube videos now works on WordPress. Sweet ! Embedding a YouTube video usually takes some code like this …


<iframe class="youtube-player"
        title="YouTube video player"
        height="390"
        src=""
        frameborder="0"
        width="640"
        allowfullscreen
        type="text/html">
</iframe>

… and iframes haven’t been allowed in WordPress, as I learned through this discussion; they probably still are not.

Nevertheless, when I paste YouTube embed HTML code into a blog posting, it is now converted to something like this …

… and works nicely, as you can see for example here.

2010 in review

The stats helper monkeys at mulled over how this blog did in 2010, and here’s a high level summary of its overall blog health:

Healthy blog!

The Blog-Health-o-Meter™ reads Fresher than ever.

Crunchy numbers


A helper monkey made this abstract painting, inspired by your stats.

A Boeing 747-400 passenger jet can hold 416 passengers. This blog was viewed about 13,000 times in 2010. That’s about 31 full 747s.


In 2010, there were 54 new posts, growing the total archive of this blog to 166 posts.

The busiest day of the year was February 2nd with 79 views. The most popular post that day was Planing by the hour in MS Project.

Where did they come from?

The top referring sites in 2010 were,,,, and

Some visitors came searching, mostly for not a code reference at, not a code reference, perl not a code reference, alexander supertramp, and ms project hours.

Attractions in 2010

These are the posts and pages that got the most views in 2010.


Planing by the hour in MS Project June 2007


Perl Error “Not a CODE reference …” March 2009


So you wanna print out your blog ? May 2009


Windows XP under Windows Vista – with VirtualBox 2.0 September 2008


My third experience with OpenOffice V3 Macros March 2009

So you wanna print out your blog ?

Or you need a table of contents ? Or just part of your blog, or a table of contents for a particular series of blog posts you have been writing ? And you want that table of contents or printout sorted by date ascending ( oldest entry first ) instead of descending, which is the default for all blog engines ?

Blog engines like Blogger or WordPress make it very convenient to blog and provide lots of useful functions. But when it comes to creating a printout of your blog (or, let’s say, a pdf file) or presenting blog articles in chronological order instead of the reverse chronological order that is the default with all blog engines (latest entry on top), you have to resort to workarounds or manual work. Printing usually gets messy due to the framework the blog provider has put around your content: the left and right sidebars appear somewhere in your printout and get in your way.

Alternate hosting sites:

I have developed a tool, currently hosted on HelioHost, that lets you import an export file from either your WordPress or Blogger blog to easily get a table of contents, a blog printout or a pdf file.

In my WordPress blog here I have a series of blog posts about my trip through the Northwest of the USA in 2006. Creating a chronological table of contents or pdf version of all these articles is now a piece of cake with the help of my tool “Axel’s Blogs Export XML Parser”. Here it is.

How to use it ?

First of all, an export file has to be created from a WordPress or Blogger blog. In WordPress this can be done by going to the dashboard, then selecting Tools – Export. An XML file will be produced and stored on your computer.

Now head to my tool and first click on “Browse…” to specify the location of this XML file on your computer, then click on “Process” to upload the file. The file will be kept only temporarily on my server; it will be deleted after 24 hours at the latest. A list of all blog entry titles will be shown in the order in which they appear in your blog.

Now you can re-sort or filter as you wish:

  • Check on “Sort by date ?” to sort blog articles by date.
  • Check on “Sort by title ?” to sort blog articles by title.
  • Check on “Sort descending ?” to apply either sort in descending order.
  • Specify a search term in “Search Title:” to filter your blog posts by post title.
  • Alternatively or in addition select one or more tags in the listbox below. Use the radio button “any” or “all” to decide whether any of the tags selected or all need to be assigned to the blog posts you want to filter out.
  • Once you have made your choices click on “Process” again.
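
The sorting and filtering steps above can be sketched in a few lines of Python. This is not the tool's actual code, just an illustration under assumptions: it expects a WordPress-style export (an RSS file with `item` elements that carry `title`, `link`, `pubDate` and `category` children), treats every `category` element as a tag, and assumes every item has a `pubDate`. The function names `load_posts` and `select_posts` and the sample data are made up for this example; a Blogger export uses a different XML layout and is not covered.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Tiny made-up export, newest post first as blog engines usually list them.
SAMPLE_EXPORT = """<rss version="2.0">
  <channel>
    <title>Sample blog</title>
    <item>
      <title>Unrelated note</title>
      <link>https://example.com/note</link>
      <pubDate>Wed, 06 Sep 2006 09:00:00 +0000</pubDate>
      <category>software</category>
    </item>
    <item>
      <title>Day 2: Portland</title>
      <link>https://example.com/day-2</link>
      <pubDate>Tue, 05 Sep 2006 18:00:00 +0000</pubDate>
      <category>USA</category>
      <category>travel</category>
    </item>
    <item>
      <title>Day 1: Seattle</title>
      <link>https://example.com/day-1</link>
      <pubDate>Mon, 04 Sep 2006 18:00:00 +0000</pubDate>
      <category>USA</category>
    </item>
  </channel>
</rss>
"""


def load_posts(xml_text):
    """Parse the export and return one dict per blog post."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # pubDate uses the e-mail style date format, e.g. "Mon, 04 Sep 2006 ...".
            "date": parsedate_to_datetime(item.findtext("pubDate")),
            "tags": {c.text for c in item.findall("category") if c.text},
        })
    return posts


def select_posts(posts, sort_by="date", descending=False,
                 title_search="", tags=(), match="any"):
    """Filter by title text and tags, then sort by 'date' or 'title'."""
    wanted = set(tags)

    def keep(post):
        if title_search.lower() not in post["title"].lower():
            return False
        if not wanted:
            return True
        if match == "all":
            return wanted <= post["tags"]   # every selected tag is present
        return bool(wanted & post["tags"])  # at least one selected tag is present

    return sorted((p for p in posts if keep(p)),
                  key=lambda p: p[sort_by], reverse=descending)


posts = load_posts(SAMPLE_EXPORT)
for post in select_posts(posts, tags=["USA"]):
    print(post["date"].date(), post["title"], post["link"])
```

With the sample data, selecting the tag “USA” lists the two trip posts oldest first, which is the chronological table of contents described above.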

If you are satisfied with your selection, you now have a list of titles with underlying links to these blog posts in front of you in the right frame of the tool. If you view the source code of this frame ( in Firefox do a right-click, then select This Frame –> View Frame Source ) you get the HTML code for your table of contents.

To produce the full output including all the posts’ content, check the “Full Content ?” check box and click “Process” again. A full printout of your blog posts will appear in the right frame. You can now either print it or use a tool like FreePDF to create a pdf file of these articles.