Python Expression Evaluator Version 2

I have extended my Python Expression Evaluator (alternate link) a bit; the main new feature is support for regular expressions. A regular expression is typed between slashes and needs some input text to be applied to, which is why a second input field has been added to the user interface.

Regular expressions are very well supported in Python (as in Perl), so not much additional code is needed to support them:

    ...
        if expr[0] == "/":
            m = re.match("\/(.*)\/",expr)
            if m:
                expr2 = m.group(1)
                m = re.match(expr2,input)
                if m:
                    for i in range(len(input)):
                        if i in range(m.start(0),m.end(0)):
                            char = "<strong>%s</strong>" % input[i]
                        else: char = "%s" % input[i]
                        for j in range(1,len(m.groups())+1):
                            if i in range(m.start(j),m.end(j)):
                                char = "<span class='highlighted'>%s</span>" % char
                        output += char

        else:
    ...

The Python code above first checks whether the expression entered is a regular expression (that is, whether it starts with a forward slash). The first call to re.match(…) extracts the regular expression itself from between the two slashes (re is Python's regular expression module, which has to be imported at the beginning of the program); the second call actually applies that regular expression to the input text. m is the match object returned by re.match, and it offers some useful methods:

  • m.group(0) contains the matching part of the input
  • m.start(0) contains the starting position of the matching part of the input
  • m.end(0) contains the ending position of the matching part of the input (the index just past its last character)
  • m.groups() is a tuple of the groups defined in the regular expression to extract parts of the input; groups are defined by round brackets within the regular expression
  • m.group(n) with n > 0 is content of group n
  • m.start(n) with n > 0 is the starting position of group n
  • m.end(n) with n > 0 is the ending position of group n
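
To make the list above concrete, here is a minimal sketch in Python 3. The pattern and the input string are made up for illustration and are not taken from the evaluator itself.

```python
import re

# Two groups, each capturing one word.
m = re.match(r"(\w+) (\w+)", "Hello World!")

overall = (m.group(0), m.start(0), m.end(0))
groups = m.groups()
second = (m.group(2), m.start(2), m.end(2))

print(overall)  # ('Hello World', 0, 11)
print(groups)   # ('Hello', 'World')
print(second)   # ('World', 6, 11)
```

Note that the trailing "!" is not part of the match, and that the end positions point just past the last matched character.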

These methods help me do what this code is for: highlighting the characters of my input string that are part of the overall match or of any of the groups. Characters in the overall match are shown in bold; characters contained in a group get a yellow background. Thus my for loop

    for i in range(len(input)):

iterates over the input character by character. Then I check whether that character is contained in the overall match, in order to put HTML “strong” tags around it. Then I check for each group returned …

    for j in range(1,len(m.groups())+1):

… whether the character is contained in that group, and give it a yellow background in that case. To do this I use a class called “highlighted” defined in the CSS file for this little application:

    .highlighted { background: yellow; }
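
The highlighting loop described above can be tried on its own. The following is only a sketch in Python 3, not the evaluator's actual code: the function name highlight is made up, and the call to html.escape is an addition so that characters like "<" in the input do not break the generated HTML.

```python
import html
import re


def highlight(pattern, text):
    """Return text as HTML: the overall match in bold, groups highlighted."""
    m = re.match(pattern, text)
    if m is None:
        return html.escape(text)
    pieces = []
    for i, ch in enumerate(text):
        piece = html.escape(ch)
        # Is this character part of the overall match?
        if m.start(0) <= i < m.end(0):
            piece = "<strong>%s</strong>" % piece
        for j in range(1, len(m.groups()) + 1):
            # A group that did not take part in the match reports -1 for
            # both positions, so this comparison is simply false for it.
            if m.start(j) <= i < m.end(j):
                piece = "<span class='highlighted'>%s</span>" % piece
        pieces.append(piece)
    return "".join(pieces)


print(highlight(r"(\d+)-", "12-ab"))
```

For this input the digits end up both bold and highlighted, the dash only bold, and "ab" is left untouched.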

And here it is (or here): version 2 of my Python Expression Evaluator, now supporting regular expressions.

A Python Expression Evaluator

To get going with my first little Python-based web application, I came up with the Python Expression Evaluator. What does it do? The name says it all: it evaluates Python expressions, which the user enters into a form and sends to the server, where this little 25-liner does its work and returns the result plus all the HTML code to render it nicely on the user’s screen:

It allows the user to type in any kind of Python expression, for example:

  • 80 / 4, or any other kind of basic calculation, so we can use it as a calculator
  • len("Hello World!"), to compute the length of a given string
  • "Hello World".count("o"), to find out how often a particular character shows up in a given string
  • … and many more (ideas?)

Here is the code:

    #!/usr/bin/python

    import cgitb; cgitb.enable()

    import cgi
    form = cgi.FieldStorage()

    expr = form.getvalue('expr', None)

    if expr != None:
        try: output = expr + " => " + str(eval(expr))
        except Exception,e: output = "<font color=\"red\">%s => %s</font>" % (expr,e)
    else: output = ""

    print '''Content-type: text/html

    <html>
      <head>
        <title>Python Expression Evaluator</title>
      </head>
      <body>
        <h1>Python Expression Evaluator</h1>
        <div>%s</div>
        <br>
        <form action='py_eval.py'>
        Expression <input type='text' name='expr' />
        <input type='submit' />
        </form>
      </body>
    </html>
    ''' % output

Let’s decipher what it does:

  1. Let the script know we are using Python code
  2. Import cgitb module and enable CGI Tracebacks to nicely show error messages on the screen rather than in the web server log file. Not really needed here since my little script basically catches all sorts of exceptions, as we will see in a minute. Thanks to Magnus Lie Hetland and his great book "Beginning Python: From Novice to Professional, Second Edition" for this tip !
  3. Import cgi module, mainly used to retrieve values sent to the server
  4. Instantiate the cgi FieldStorage to retrieve values sent to the server
  5. Evaluate the expression sent to the server. With the help of exception handling all possible exceptions are handled and translated to a message ( variable output ) sent back to the user
  6. Generate the HTML for the user frontend and insert the output message; either the output from the eval() or the exception message.
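
The script above is written for Python 2 (the "except Exception,e" form and the print statement do not parse in Python 3). As a rough sketch only, the evaluation step from point 5 might look like this in Python 3. The function name evaluate is made up for illustration, and the HTML escaping is an addition that the original script does not have.

```python
import html


def evaluate(expr):
    """Return the HTML fragment to show for expr (None means: nothing sent)."""
    if expr is None:
        return ""
    try:
        # eval() runs whatever the user typed, exactly as in the original.
        return html.escape("%s => %s" % (expr, eval(expr)))
    except Exception as e:
        # Any exception becomes a red message instead of a traceback.
        return '<font color="red">%s</font>' % html.escape("%s => %s" % (expr, e))


print(evaluate("80 / 4"))
print(evaluate("10/0"))
```

Since "/" is true division in Python 3, the first call reports 20.0 rather than 20.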


Try the expression "10/0" to see how Python’s exception handling catches that error. If I removed my own exception handling from the code and changed it from

    if expr != None:
        try: output = expr + " => " + str(eval(expr))
        except Exception,e: output = "<font color=\"red\">%s => %s</font>" % (expr,e)
    else: output = ""

to just

    output = expr + " => " + str(eval(expr))

then the cgitb module would kick in and transform the unhandled exception into a message shown in the browser, like here for example:

Nice, so far.

How could we enhance this little tool? Here are my ideas; any more to come?

  1. Support multi-line Python code
  2. Support regular expressions
  3. Support expression storage & retrieval (partially works through your browser; try pressing the Down key while the entry field has focus)
  4. Support a more dynamic user interface
  5. … ?

Running a sub process from a Python script

Python is such a powerful language that “Python scripts” actually deserve to be called “Python programs” or “Python applications”, but nevertheless, let’s go with this title for today’s blog posting: “Running a sub process from a Python script”.

Currently I am writing test scripts for one of our storage products, and so I need to invoke commands from my Python script, capture the output and look at the return code.

There are various ways to do this in Python. One way is popen, which runs an external command in a sub process and returns its output as a file-like stream, so that it can be read and analyzed further (see this easy example). However, there are many popens available for Python, from the standard popen provided by the os module to several variations in other modules like popen2 and subprocess, called popen, popen2, popen3 and popen4. A bit confusing for a Python newbie, especially since I had come up with some specific requirements:

  1. I wanted to capture stderr as well (the file stream for error messages) and actually treat it like the standard output stream, so that in case of an error the error messages are displayed where the output is usually displayed. And depending on my future test scenarios I will probably need to analyze the error message stream as well.
  2. I need to capture the return code from the command executed in the sub process.

After working on this for 2 to 3 hours and exploring all the possibilities I got to the point where I thought I either can have 1 or 2, but not both. Until I figured out this solution based on the popen2 module:

    import popen2
    # ...
    cmd = "ping bla"            # No, I am not using foo here ;-)
    f = popen2.Popen4(cmd)
    while True:
        try: line = f.fromchild.readline()
        except IOError: break
        if not line: break
        # ...
    rc = f.poll()
    f.fromchild.close()

Popen4 from the popen2 module by default combines stderr and stdout into one stream, so there is no need to handle both streams. This implementation actually seemed to work nicely, except … I got a deprecation warning about the popen2 module when importing it, saying basically that this module will go away in the future and will be replaced by the subprocess module.

Thus I had to continue my research and find a working solution based on the subprocess module. Here it is:

    import re
    import subprocess
    # ...
    cmd = "ping blabla"                 # Still not using foo, but never mind ;-)
    f = subprocess.Popen(cmd, shell=True, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    # Note: reading stderr to the end before touching stdout can block if the
    # command writes a lot to stdout; it is fine for short outputs like ping's.
    while True:
        line = f.stderr.readline()
        if not line: break
        m = re.match("shell-init",line)        # Ignore shell-init errors
        if not m:
            pass  # ...
    while True:
       line = f.stdout.readline()
       if not line: break
       # ...
    rc = f.wait()                       # wait() instead of poll(): poll() returns None while the process is still running
    f.stderr.close()
    f.stdout.close()

Here I get two message streams, one for stdout and one for stderr, to be handled independently. One hurdle is that I always get strange “shell-init” errors in my stderr stream, whether or not the command itself finishes successfully. The other observation: the return code obtained with this implementation differs from the one I get from the first implementation. If the command is okay (in my case, pinging an existing network address) I get 0 in both cases. If I ping some non-existing network address (like “foo”, “bla” or “blabla”) I get a return code of 512 with the first code snippet and a return code of 2 with the second. The two values are most likely the same result in different encodings: popen2 reports the raw wait status, which carries the exit code in its high byte, and 512 is 2 << 8, whereas subprocess reports the exit code itself.
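
One way to meet both requirements at once with the subprocess module might be to redirect stderr into the stdout pipe, which is roughly what Popen4 did. The following is a sketch under assumptions, not the script used for the tests: the helper name run is made up, and the child command is a hypothetical stand-in for ping that writes to both streams and exits with code 2.

```python
import subprocess
import sys


def run(cmd):
    """Run cmd (a list of arguments); return (return code, output lines)."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into the stdout pipe
        universal_newlines=True,    # read text instead of bytes
    )
    output, _ = proc.communicate()  # reads until EOF, then waits for exit
    return proc.returncode, output.splitlines()


# Hypothetical stand-in for "ping blabla".
child = [
    sys.executable, "-c",
    "import sys; print('to stdout'); sys.stderr.write('to stderr\\n'); sys.exit(2)",
]
rc, lines = run(child)
print(rc, lines)
```

Because the command is passed as a list without shell=True, no shell is started at all, which might also sidestep the shell-init messages, though that is untested here. The order of the two lines in the merged stream depends on buffering in the child, so it should not be relied upon.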

Even though I now have a solution that works quite well, I guess there is still more for me to explore.

File permissions may be fouled up on web server …

Sometimes it happens to me that after I have changed a CGI script and FTPed it over to my web server the script won’t run because it has lost its original permission settings, especially it has lost its “executable for all” file permission. The problem is: I can’t define “sometimes” more precisely. Sometimes I have to change the file permission after FTP has finished transferring the file, sometimes not. I can’t spot a pattern nor discover a fix for this. Somehow I got used to this problem and fixing file permissions became a default activity after I have transferred a file over to my web server. I even stopped wondering whether I am the only one having that problem and possibly overlooked some basic thing to avoid this, or whether this is a more common problem.

Today I have been reading this in chapter 15 of the book “Beginning Python: From Novice to Professional, Second Edition” by  Magnus Lie Hetland:

Tip: Sometimes, if you edit a script in Windows and it’s stored on a UNIX disk server (you may be accessing it through Samba or FTP, for example), the file permissions may be fouled up after you’ve made a change to your script. So if your script won’t run, make sure that the permissions are still correct.

It always feels good if you discover that you are not alone with a weird problem you have. Apparently this really seems to be a more common hiccup happening somewhere between Windows and Linux systems. Good to know.

Python IndentationError

Python is the first programming language I have encountered where code indentation is more than a cosmetic issue or a means to make code more readable. In Python, code indentation actually replaces the “DO-END”s or curly brackets used in other languages to define code blocks, for example after an if-statement.

What looks like an elegant way to type less code introduces a new type of error I have never seen in any other programming language: the IndentationError, like this one: IndentationError: unexpected indent

Python expects the lines of a code block to be indented consistently, and the usual convention is four space characters per level. Usually when using a text editor I (and probably most programmers) perform indentation conveniently by using the TAB key. However, a tab character is not the same as four space characters, and mixing the two within a block is what gets you into trouble.

When writing my first little Python program using my favorite code editor Notepad++ I was safe as long as I did not use more than one line of code in a code block. As soon as I started to use larger code blocks I suddenly bumped into this error and had to learn that lesson: a tab character is not equivalent to four space characters. Unless you tell your editor to convert a tab character to four space characters, as can be done in Notepad++ for the Python language, under Preferences –> Language Menu/Tab Settings:

After checking on “Replace by space” I seem to have that problem under control. I just wonder why this hasn’t been a default setting.
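
The effect can be reproduced without an editor. The sketch below (Python 3, with a made-up helper name) compiles three small snippets: one indented with spaces only, one with tabs only, and one that mixes both inside the same block. Strictly speaking the interpreter checks for consistency rather than for a width of four, so only the mixed snippet is rejected.

```python
def indentation_error(source):
    """Return the error text if source has an indentation problem, else None."""
    try:
        compile(source, "<demo>", "exec")
    except IndentationError as e:   # TabError is a subclass of IndentationError
        return "%s: %s" % (type(e).__name__, e.msg)
    return None


spaces_only = "if True:\n    x = 1\n    y = 2\n"
tabs_only = "if True:\n\tx = 1\n\ty = 2\n"
tab_and_spaces = "if True:\n    x = 1\n\ty = 2\n"

print(indentation_error(spaces_only))
print(indentation_error(tabs_only))
print(indentation_error(tab_and_spaces))
```

Letting the editor replace tabs by spaces, as described above, makes the third case impossible to type by accident.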

How to keep track of changes in a MS Excel spreadsheet

Did you know that you can keep track of changes made in a MS Excel spreadsheet, thus basically create an audit trail of changes made to a particular range in a particular worksheet ?

Here is how to do it. I have created a little sample spreadsheet listing some products with a price in a worksheet called “Products”. In a change history I want to see what product price was changed to what value when.

In order to do this I first create an extra worksheet named “ChangeHistory”. I define three columns: “Product”, “Price”, “Timestamp”.

Now I write some Visual Basic code for the change event of my “Products” worksheet. The easiest way to invoke the code editor properly is to do a right-click on the tab of my worksheet “Products”, then select “View Code”. In the appearing code editor I select “Worksheet” in the left drop down and “Change” in the right drop down. This takes me to a sub routine called “Worksheet_Change” into which I type in the following code:

   1: Dim AuditRecord As Range
   2: ' This is our change history ...
   3: Set AuditRecord = Worksheets("ChangeHistory").Range("A1:B65000")
   4: r = 0
   5: ' Now find the end of the Change History to start appending to ...
   6: Do
   7:    r = r + 1
   8: Loop Until IsEmpty(AuditRecord.Cells(r, 1))
   9: ' For each cell modified ...
  10: For Each c In Target
  11:   Value = c.Value
  12:   Row = c.Row
  13:   ' ... update Change History with value and time stamp of modification
  14:   AuditRecord.Cells(r, 1) = Worksheets("Products").Cells(Row, 1)
  15:   AuditRecord.Cells(r, 2) = Value
  16:   AuditRecord.Cells(r, 3).NumberFormat = "dd mm yyyy hh:mm:ss"
  17:   AuditRecord.Cells(r, 3).Value = Now
  18:   r = r + 1
  19: Next

Note:

  • Target is the range of changed cells as an input parameter to this sub routine,
  • Line 14 needs to be modified for a different worksheet name; here I grab the value from column 1 of my changed range as a label ( here: product ) of the item changed.

And here is how it works. Suppose we have the following initial list of products:

Now we make the following changes:

  1. We copy the price for product B to C and D
  2. We change price for product G to $ 4.100.

Thus we end up with this list:

If we check out our change history it reflects nicely what has been changed to what new value when:

Beware of the g flag when using regular expressions in javascript …

The following experience with regular expressions in JavaScript cost me a few hours of my life, so it is probably worth sharing. This is basically about the “g” flag (described for instance in this tutorial here): the “g” flag is the “global search flag” and is supposed to search for a pattern throughout the entire string given. However, it apparently creates the situation that you can use a regular expression only once in your code. Look at this code sample:

    regex = /Hello/g;
    text = "Hello World!";
    document.getElementById("my_output").innerHTML = "<p>Testing string \"" + text + "\" with regex <strong>" + regex + "</strong> : " + regex.test(text) + "</p>";
    document.getElementById("my_output").innerHTML += "<p>Testing string \"" + text + "\" with regex <strong>" + regex + "</strong> : " + regex.test(text) + "</p><hr>";

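What would you expect as the output of the last two statements? “true” and “true”, since the test should be positive and the two statements are exactly the same? That was my expectation as well. Nevertheless, the answer is wrong: the output is “true” in the first case but “false” in the second. The regular expression seems to work only once. The reason is that a regular expression object with the “g” flag is stateful: it remembers in its lastIndex property where the previous match ended, and the next call to test() starts searching from there. The first call finds “Hello” and sets lastIndex to 5; the second call finds nothing in the rest of the string, returns false and resets lastIndex to 0, so a third call would return true again.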

You can test it out with my little sample script here ( just click on the “Test…” button ).

When omitting the “g” flag you get a “true” in both cases.

How to overcome a major Ajax limitation ….

Call it a limitation, call it a security measure: it is usually one and the same, a security measure from one side but an annoying limitation from the other. Wouldn’t it be nice if you could just step into your house without having to search for your keys? Wouldn’t it be nice if you could just open up your e-mail or enter any other application without having to remember any password? Wouldn’t it be nice if you could just insert your credit card into a teller machine and get your money spit out without the extra step of recalling your PIN code and typing it in on that sticky keyboard?

Well, that’s not how it works. The world out there is evil and not all people are good guys, that’s why we need security, also in the area of information technology.

Ajax, the powerful technique to dynamically add content to your web page, has security measures, aka limitations, as well: you cannot pull data from a different server behind the scenes, only from your own. As Steven Holzner wrote in chapter 3, “Creating Ajax Applications”, of his book “Ajax: A Beginner’s Guide”:

However, here’s one thing to note: if the URL you connect to, such as http://www.starpowder.com/data.php, and the Ajax-enabled page (ajax.html here) that’s attempting to download that URL are on different servers, you’re going to have a security problem. If your Ajax-enabled page attempts to download data behind the scenes from a different server, your browser is going to suspect that something underhanded is going on, and will ask permission from the user, via a dialog box, before proceeding.

I noticed that when using Ajax through jQuery (doing a $.get or $.post call), accessing data from a different server does not work at all; I do not even get a dialog displayed by my browser. This might be related to some security settings in my browser (Firefox in this case) or to the fact that I use jQuery to do the Ajax request. When using Firebug to debug my request I see that it turns red and shows a 200 return code. 200 would normally mean everything is OK, but the red color indicates that it is not. Anyhow, I don’t get any data from this request.

To overcome this limitation some server side programming is needed to actually let some code on your server pull data from a different server and then send it to your browser side application. I have written a very simple server using Perl:

    #! c:\perl\bin\perl.exe
    # #!/usr/bin/perl

    use LWP::Simple;

    printf "Content-type: text/html\n\n";

    foreach $a (@ARGV) {
                my $html = get($a) or die $!;
                print $html;
    }

If my jQuery $.get call now calls this Perl script and passes it the URL of the page I actually want to access, everything works fine. Steven Holzner has published some PHP code in chapter 4 of his book to do the very same thing.
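
For comparison, the same idea can be sketched in Python 3. This is an illustration under assumptions rather than a port of the Perl script: the function names are made up, and the allow-list is an addition, since a script that fetches any URL it is handed can be used by anybody as a relay. The host name is simply the one from the book quote above.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical allow-list; the Perl script above fetches any URL it is given.
ALLOWED_HOSTS = {"www.starpowder.com"}


def is_allowed(url):
    """Accept only http(s) URLs that point at an allow-listed host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS


def proxy(url):
    """Return the complete CGI response (headers plus body) for url."""
    if not is_allowed(url):
        return "Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\nURL not allowed\n"
    with urlopen(url, timeout=10) as response:
        body = response.read().decode("utf-8", errors="replace")
    return "Content-Type: text/html\r\n\r\n" + body


print(is_allowed("http://www.starpowder.com/data.php"))
print(is_allowed("file:///etc/passwd"))
```

A CGI script built on this would print the string returned by proxy() for the URL it received from the $.get call.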

<wbr> tags irritate HTML::TreeBuilder

This has cost me a few hours of my life. I have been using the perl module HTML::TreeBuilder and experienced a very weird behavior: it simply retrieved incomplete HTML code when using the look_down function in some cases ( I am using it to retrieve blog articles from a blog ) and it looked like this happened in cases when the retrieved HTML code contained nested tables, something like …

    <table>
    <tbody>
    <tr>
    <td>
    ....
        <table>
        <tbody>
        <td>
        ....
        </td>
        </tbody>
        </table>
    ...
    </td>
    </tr>
    </tbody>
    </table>

After a few hours of investigation and trying this and that I noticed this suspicious <wbr> tag in my HTML code and learned it is used to indicate to the browser that it might insert a word break if it wishes.

I decided to get rid of it and changed my Perl code as follows:

    my $page = get($url) or die $!;
    # Need to get rid of <wbr> tags; they confuse HTML::TreeBuilder and cause incomplete HTML code to be retrieved,
    # especially in case of nested tables.
    $page =~ s/<wbr>//g;
    my $tree = HTML::TreeBuilder->new_from_content($page);

And bingo – my problem is gone !

Dojo, Ajax and JSON

In the book “Learning Dojo” by Peter Svensson, Ajax is covered in chapter 4, “Dojo Ajax Feature”, and a common way to send an asynchronous request to a server is this one:

    var x;
    dojo.xhr("Get",
                {
                    url:"/yourserver/server.php",
                    handleAs: "text",
                     load: function(data, ioArgs)
                     {
                        x = data;
                     }
                  });
    console.log("x == "+x);   // runs before the asynchronous load callback, so x is still undefined here

Details about the XMLHTTP request object (abbreviated XHR) can be found here. Besides “Get”, of course “Post”, “Put” and “Delete” can be used as well. The difference between “Get” and “Post” is, as far as I understood from this book, that “Post” sends parameters to the server in the body of the request, where they are less visible, while with “Get” the parameters are added to the URL, where they are visible and of course easier to tamper with.

Anyway, my focus in this blog posting is on the “handleAs” attribute of the dojo.xhr call. The terms “XMLHTTP” or “Ajax” suggest – since the “x” in “Ajax” stands for XML as well – that data is exchanged between client and server in XML format. Which is one possible way to do it, but not the only one. The example above shows a second way: exchanging simple (unformatted) text.

A third option is using JSON (and there are some variants like “json-comment-optional” or “json-comment-filtered”, see here), and I have read comments from many people saying that parsing JSON data is faster than parsing XML. Since JSON stands for JavaScript Object Notation, it is easy to imagine that this might be the right thing to do when using JavaScript.

I spent an hour or so figuring out how this works, since even though it is mentioned in the book I couldn’t find any example. The key is to specify

    handleAs: "json"

as a parameter to the dojo.xhr call. That is all there is to do on the client side, apart from dealing with the returned JavaScript objects later on. More interesting is what to do on the server side, when using PHP for instance.

What I did in my trial was to set up an array in PHP to be returned to the client. Part of the server code looks like this; the key to success was to use the json_encode function on the array returned:

    // ...
    // Defining the array ...
    $ret_array = array();
    // Populating the array ...
    $ret_array["id"] = 101;
    $ret_array["text"] = "Hello World!";
    // Returning the array to the client ...
    echo json_encode($ret_array);
    // ...

And here is what the client gets. The following client side code  …

    dojo.xhr("Get",
                {
                    url:"/yourserver/server.php",
                    handleAs: "json",
                     load: function(data, ioArgs)
                     {
                        console.log("Server returned: "+data);
                        console.log(data.id);
                        console.log(data.text);
                        console.dir(data);
                     }
                  });

… produces this output in the console window:

    Server returned: [object Object]
    101
    Hello World!
    id        101
    text      "Hello World!"
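
To see what actually travels over the wire between the two code snippets above, the round trip can be sketched in Python 3 with the standard json module. This only illustrates the format; it is neither the PHP server code nor Dojo itself. The compact separators are chosen to mimic the spacing-free text that json_encode should produce for this array.

```python
import json

ret = {"id": 101, "text": "Hello World!"}

# Server side: turn the structure into JSON text.
payload = json.dumps(ret, separators=(",", ":"))
print(payload)  # {"id":101,"text":"Hello World!"}

# Client side: what handleAs: "json" amounts to, parsing the text back
# into an object whose fields can be accessed by name.
data = json.loads(payload)
print(data["id"], data["text"])  # 101 Hello World!
```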
