How to tag mp3 files

I have a collection of mp3 files which I have named in the form "ARTIST – TITLE.mp3" and wanted to get them tagged properly.
My first plan was to write a Python script for this, so I tried two Python libraries: pytaglib and eyeD3. pytaglib didn’t install: on Windows it needs a Visual Studio C++ compiler, which I don’t have currently. pytaglib was also the reason why I tried Ubuntu, which confronted me with lots of other problems and in the end didn’t buy me anything, since pytaglib ran into compile issues there as well.
eyeD3 installed but apparently cannot handle modern mp3 tag formats.
I also tried MusicBrainz, recommended in the article "How to tag all your audio files in the fastest possible way", but its user interface is weird and it didn’t get my files tagged. And I tried the Linux id3tag command mentioned in the same article, again without success; it looks like it does not support the latest tag formats either.
Then I bumped into Mp3tag for Windows. Brilliant. It made it a piece of cake to tag my mp3 files through a function ‘filename to tag’ where you can specify a pattern for the filenames you have been using, %Artist% – %Title%.mp3 in my case, and a few clicks later all my files were tagged properly.
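For the curious, here is roughly what such a ‘filename to tag’ pattern does, as a small Python sketch. This is not how Mp3tag works internally; the function name parse_filename and the regular expression are my own assumptions, and the sketch only extracts artist and title from the file name without writing any tags.

```python
import re
from pathlib import Path

# Assumed naming scheme: "ARTIST – TITLE.mp3". The separator may be an
# en dash or a plain hyphen, with whitespace on both sides.
FILENAME_PATTERN = re.compile(r"^(?P<artist>.+?)\s+[–-]\s+(?P<title>.+)$")


def parse_filename(filename):
    """Return (artist, title) for a matching file name, or None."""
    stem = Path(filename).stem  # drop the ".mp3" extension
    match = FILENAME_PATTERN.match(stem)
    if match is None:
        return None
    return match.group("artist").strip(), match.group("title").strip()


print(parse_filename("Queen – We Will Rock You.mp3"))
```

Files that do not follow the naming scheme come back as None, so they can be listed and fixed by hand.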
I right away donated 5 bucks to the author of this freeware tool.

IPython and lxml

I have been playing a bit with ipython and lxml these days.

IPython is a powerful and interactive shell for Python. It supports browser based notebooks with support for code, text ( actually html markup ), mathematical expressions, inline plots and other rich media. Nice intro here:

Another nice demo of what you can do with ipython is the pandas demo video here.

Several additional packages need to be installed first to really be able to use all these features, like pandas, matplotlib or numpy. It is a good idea to install the entire scipy stack, as described here.

I did the installation first on my Windows ThinkPad and later on a Linux Mint box.

This takes some work to get thru, like bumping into missing dependencies and installing those first, or trying several installation methods in case of problems. Sometimes it is better to take a compiled binary, sometimes to use pip install, and sometimes to fetch a source code package and go from there.

I finally succeeded on both my machines. The next step was to figure out how to run an ipython notebook server, because using ipython notebooks in a browser is the most efficient and fun way to work with ipython. For Windows there are useful instructions here; on my Linux Mint machine it worked very differently, and I finally found working instructions here.

After that I developed my first notebook using lxml, called GetTableFromWikipedia, which basically goes out to a Wikipedia page ( in my case the one about Chemical Elements ), fetches a table from there ( in my case table # 10 with a list of chemical elements ) using lxml and xpath code, and converts it to csv.
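The idea of the notebook can be sketched in a few lines. Note that this is not the notebook's actual code: to keep it runnable without extra installs it uses the standard library's xml.etree.ElementTree instead of lxml, and it works on a tiny made-up page instead of a downloaded Wikipedia article. Real-world pages are rarely well-formed XML, which is where lxml.html comes in. The function name table_to_csv and the zero-based table index are my assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A tiny, well-formed stand-in for a downloaded page.
PAGE = """<html><body>
<table><tr><th>ignored</th></tr></table>
<table>
  <tr><th>Symbol</th><th>Name</th></tr>
  <tr><td>H</td><td>Hydrogen</td></tr>
  <tr><td>He</td><td>Helium</td></tr>
</table>
</body></html>"""


def table_to_csv(page_source, table_index):
    """Convert the table at zero-based table_index to CSV text."""
    root = ET.fromstring(page_source)
    table = root.findall(".//table")[table_index]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table.findall(".//tr"):
        # itertext() flattens nested markup such as links inside a cell
        cells = ["".join(cell.itertext()).strip()
                 for cell in row if cell.tag in ("th", "td")]
        writer.writerow(cells)
    return buffer.getvalue()


print(table_to_csv(PAGE, 1))
```

Picking the table by position is fragile, since the number changes whenever someone adds a table to the article, but it matches the "table # N" approach described above.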

The nice thing about ipython is that you can write code into cells and then just re-run those cells to see results immediately in the browser. This makes it very efficient and convenient to develop code by simply trying things out, or by doing a lot of “prototyping”, which sounds more professional.

Having an ipython notebook server running locally on your machine is certainly a must for developing a notebook. But how do you share notebooks with others ? I found http://nbviewer.ipython.org, which lets you share notebooks with the public. You have to store your notebook somewhere in the cloud and pass the URL to nbviewer. I uploaded my notebook to one of my Dropbox folders and here we go: have a look ! Unfortunately it is not possible to actually run the notebook with nbviewer ( nbviewer basically converts a notebook to html ).

My notebook of course works with other tables too, like the List of rivers longer than 1000 km, published in this wikipedia article as table # 5.

How Quizroom works …

Now, as promised, a few insights into how Quizroom works.
As I already explained: Quizroom auto-generates questions based on facts I have stored in its database, so there is no need to set up pre-defined questions and answers.
It is designed in a way that it allows me to keep an arbitrary number of fact tables in my database with an arbitrary number of facts. For example I have one table called facts_countries containing a list of countries with their population and rank by population ( guess who is number 1 by the way ).
The key table in the Quizroom database is the table called questions which contains the question templates, assigned to categories. For the category "Geography" for instance there is one question template which looks like this:

Question = "Which of these countries has the highest population ?"
Answer = "Country"
Criteria = "max(Population)"
Table = "facts_countries"
Category = "Geography"
Ref = "http://en.wikipedia.org/wiki/List_of_countries_by_population"

There are multiple questions in category "Geography", so the first thing Quizroom does is pick one at random. Let’s assume it has picked the one shown above. This tells Quizroom to go to the facts_countries table and pick four records randomly from there. From those four records one is picked as the "right" answer depending on the criteria, here the one with the highest population. The question is displayed together with the four possible answers. That’s basically it.
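To make the selection step concrete, here is a minimal sketch using an in-memory SQLite database. It is not Quizroom's actual code: the function name make_question is hypothetical, the country names and population numbers are made-up placeholders, and the criteria max(Population) is hard-coded rather than read from the questions table.

```python
import sqlite3

# In-memory stand-in for the facts_countries table. Names and numbers
# are made-up placeholders, not real population figures.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts_countries (Country TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO facts_countries VALUES (?, ?)",
    [("Country A", 500), ("Country B", 1200), ("Country C", 300),
     ("Country D", 900), ("Country E", 700), ("Country F", 100)],
)


def make_question(conn):
    """Pick four random records and mark the one with max(Population)."""
    rows = conn.execute(
        "SELECT Country, Population FROM facts_countries "
        "ORDER BY RANDOM() LIMIT 4"
    ).fetchall()
    right = max(rows, key=lambda row: row[1])
    options = [country for country, _ in rows]
    return options, right[0]


options, answer = make_question(conn)
print(options, "->", answer)
```

Because the right answer is computed from the four records drawn, the same fact table yields a different question every time.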
There is a column Ref with the URL from where I have taken the data. You might have noticed that after you have answered a question a "Reference" link is shown at the bottom of the screen, so you actually can go there and verify the source of the facts used for that particular question.
The challenge now is to feed the Quizroom database with interesting facts, stored in a structured way. Wikipedia of course is a good source and some articles have a lot of tables, which makes it easy to some extent to derive such structured data. I actually wrote a little Greasemonkey script to transform HTML tables into CSV files that are easy to import into a database.
Nevertheless, even HTML tables are hard to digest for a structured database in many cases. If for example you look at the table of countries in this Wikipedia article, you notice that footnotes have been added for several countries. This gets in the way of transforming such a table into a structured format and requires extra data cleaning effort.
My little Greasemonkey script is just a start; maybe a more powerful browser extension is needed to assist in fetching unstructured data and transforming it into useful structured data. Many facts come in the form of lists with special rules, something that script does not support yet.
So much for now. If you, dear reader, know of any source on the internet with interesting facts organized in a structured way, please let me know; maybe these facts could become the fuel for more interesting questions in Quizroom.

Welcome to Quizroom !

My first little Python based web project "Quizroom" went live on Frihost. My original plan was to implement it using MySQL, but technical limitations on Frihost so far forced me to re-write my server code and thus for now use SQLite as a backend.

Quizroom is a little quiz game asking you multiple-choice questions in the categories I have set up so far. Currently these are:
* Geography
* Movies
* Science
Quizroom auto-generates questions based on facts I have stored in its database, so there is no need to set up pre-defined questions and answers. In a later blog posting I will explain a bit more about how it works.

For now, here is the quick user’s guide:

When you start the game you first select one of the categories or "all" if you wanna play them all. Then you click on the upper center field to get started and the first question is displayed together with 4 possible answers. A timer starts running, as you see a progress bar advancing from the left to the right at the bottom of the browser window.
The sooner you answer right, the better !
The time you get is 10 seconds plus 1 second for every 30 words you have to read ( question + all answers ).
If you answer right you gain as many points as there were seconds left before you would have run into a timeout.
If you answer wrong you lose one energy point per 100 points you have scored so far, which basically means: the higher your score, the more energy you lose when answering wrong.
A right answer gets you 1 energy point, up to a maximum of 20. You start with 10.
If you run out of energy the game is over for you. At this time – if you made it into the high score list – you have the opportunity to leave your name ( or Frihost user name or whatever ) in the high score list. You can also click the closing "x" at the top right corner of that dialog box if you do not want to show up on the high score list.
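The rules above boil down to a few lines of arithmetic. The following is a sketch and not the actual game code; the function names are hypothetical, and the rounding is my assumption: incomplete blocks of 30 words and incomplete blocks of 100 points are simply dropped.

```python
START_ENERGY = 10
MAX_ENERGY = 20


def time_limit(question, answers):
    """10 seconds plus 1 second per 30 words (question + all answers)."""
    words = len(question.split()) + sum(len(a.split()) for a in answers)
    # assumption: an incomplete block of 30 words earns no extra second
    return 10 + words // 30


def apply_answer(score, energy, correct, seconds_left):
    """Return the updated (score, energy) after one answer."""
    if correct:
        # points equal the seconds left; energy grows by 1, capped at 20
        return score + seconds_left, min(energy + 1, MAX_ENERGY)
    # assumption: integer division, so below 100 points a wrong
    # answer costs no energy
    return score, energy - score // 100


score, energy = 0, START_ENERGY
score, energy = apply_answer(score, energy, True, 7)
print(score, energy)
```

Under these assumptions the game ends as soon as energy drops to zero or below.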
A push button in the right column of the screen lets you re-start the game. While the game is running you can also end it at any time thru another push button in the right column of the browser window.
That’s basically it. Give it a try and have fun playing. Click HERE to get started.

Changing the color of selected rows in a PyGridTableBase grid

First of all I want to express what a great wxPython extension wx.grid is. I am developing a little browser for large log files and when I first tried to get this done thru native list controls I realized how slow those are when it comes to listing large amounts of data. Thanks to wx.grid, which basically uses a very sophisticated form of a virtual list control, my little browser has now advanced to a really useful tool.

Documentation of everything around wxPython is available but not in a form I would call comprehensive. As with many other libraries software developers are using these days, google.com is your friend and you need to google for solutions, search thru code repositories ( like Nullege for Python ) or user groups ( like the wxpython-users group on Google ).  Stackoverflow of course is another great source of answers.

One simple thing I could not get done for a couple of hours: changing the color used for selected rows. Like wxPython list controls, wx.grid uses a dark blue for this, which looks odd on my Windows 7 machine.

After gazing again at the MegaTable sample code I finally found the solution: your own row selection ( or call it highlighting ) color can be implemented by changing the Draw method of a font renderer that you plug into the grid.

Here is my version of the Draw method of my MegaFontRenderer using a light grey as a row selection color rather than the odd wx.BLUE ( my changes in yellow ):

    def Draw(self, grid, attr, dc, rect, row, col, isSelected):
        # Here we draw text in a grid cell using various fonts
        # and colors.  We have to set the clipping region on
        # the grid's DC, otherwise the text will spill over
        # to the next cell
        dc.SetClippingRect(rect)

        # clear the background
        dc.SetBackgroundMode(wx.SOLID)
        
        HIGHLIGHT_COLOR = (240,240,240)
        
        if isSelected:
            dc.SetBrush(wx.Brush(HIGHLIGHT_COLOR, wx.SOLID))
            dc.SetPen(wx.Pen(HIGHLIGHT_COLOR, 1, wx.SOLID))
        else:
            dc.SetBrush(wx.Brush(wx.WHITE, wx.SOLID))
            dc.SetPen(wx.Pen(wx.WHITE, 1, wx.SOLID))
        dc.DrawRectangleRect(rect)

        text = self.table.GetValue(row, col)
        dc.SetBackgroundMode(wx.SOLID)

        # change the text background based on whether the grid is selected
        # or not
        if isSelected:
            dc.SetBrush(self.selectedBrush)                
            dc.SetTextBackground(HIGHLIGHT_COLOR)
        else:
            dc.SetBrush(self.normalBrush)
            if self.background_color_index:
                idx = self.table.GetValue(row, int(self.background_color_index)-1)
                dc.SetTextBackground(COLORS[int(idx)-1])
            else:            
                dc.SetTextBackground("white")

        dc.SetTextForeground(self.color)
        dc.SetFont(self.font)
        dc.DrawText(text, rect.x+1, rect.y+1)

        # Okay, now for the advanced class 🙂
        # Let's add three dots "..."
        # to indicate that there is more text to be read
        # when the text is larger than the grid cell

        width, height = dc.GetTextExtent(text)
        
        if width > grid.GetColSize(col) and not self.colSize:
            # width, height = dc.GetTextExtent("...")
            # x = rect.x+1 + rect.width-2 - width
            # dc.DrawRectangle(x, rect.y+1, width+1, height)
            # dc.DrawText("...", x, rect.y+1)
            grid.SetColSize(col, width)

        dc.DestroyClippingRegion()

Problem to connect to MySQL on Fedora 17

I am running Fedora 17 in a VirtualBox and after installing the httpd web server and mysql-server I encountered the following problem – not initially, but later on after a restart of my VM: my web application ( written in Python and using mysql.connector ) wasn’t able to connect to MySQL anymore. In the error_log of my web server I saw

mysql.connector.errors.InterfaceError, referer …

at the bottom and at the beginning of the chain of errors:

cannot open Packages database in /var/lib/rpm, referer: …

Google offered 975.000 search results about the latter error message, but after reading a couple of confusing articles about it I gave up trying to find the answer there.

I decided to run a little test Python script via console and look: it worked ! Thus apparently my Python application was just unable to connect to MySQL within my web server’s environment.

I decided to stop my httpd and start it again, making sure to do so as the root user. And look again: problem solved ( for now ), my web application worked !

Python: Slicing

As I mentioned here: A sequence is a special data structure in Python where data elements can be referenced thru numbers, an index so to speak.

That wasn’t the entire truth. Data elements in a sequence can also be referenced thru ranges, a pair of start and end index so to speak, separated by a colon. This returns another string when used on a string, or another list when used on a list. Neither the start nor the end index has to be specified: 0 is the default for the start index, and when no end index is specified the sequence is returned up to the very end.

Negative indexing can be used to index a sequence from the right ( or from the end ). Thus -1 addresses the last element in a sequence.

Those ranges always specify the first element to retrieve and the last element not to retrieve, thus [0:2] would tell Python to get the first and second element of the sequence and stop at the third element ( the one with index 2; remember: indexing starts with 0 ). [0:-1] would tell Python to get all the elements from a sequence except the last one.

Here are a few examples done with a string:

>>> str = "We will we will rock you !"
>>> str[0]
'W'
>>> str[-1]
'!'
>>> str[0:2]
'We'
>>> str[-5:]
'you !'
>>> str[-5:-1]
'you '
>>> str[3:-5]
'will we will rock '
>>> str[:-2]
'We will we will rock you'

>>> str[:]
'We will we will rock you !'