source code bean

13 May, 2009

An update on the mono memory issue

Posted by: Peter In: ASP.NET| Linux| Mono

The workaround I tested a few weeks ago did not really solve my problem; it actually made it worse. The auto-restart caused mono to hang and not restart at all, so my site stopped responding every six hours. I quickly had to disable it, which left me with the original memory consumption problem. Various sources told me that the memory issues would be fixed in the recently released Mono 2.4. I also found a bug report regarding AutoRestart, which should also have been fixed in 2.4. So I decided to give Mono 2.4 a try.
The problem was that there were no packages for Ubuntu 8.10, so I had to download the sources and build my own deb packages. I found this great blog post that describes the process of building and installing Mono 2.4.

So far it seems to be working, but it is too early to say if the memory consumption issues are resolved.

I just had to rant about this. I really like WordPress, but I think they missed one really important “usability feature”. Take a look at this picture:

[Screenshot: the WordPress “Edit Post” page for Source Code Bean]

Why is the text editor so damn small!? Shouldn’t it be the main part of the “Add new post” page? It only covers a fraction of the screen area. Thank god for Firefox extensions like Resizeable Textarea.

31 Mar, 2009

mod-mono-server2 memory consumption problems

Posted by: Peter In: ASP.NET| Linux| Mono

Lately the mod-mono-server2 process running on the server that hosts the video upload web service (I blogged about this in my last post) has been consuming a lot of memory. Once it even consumed all of the server's memory (1.5 GB), and the Linux OOM killer killed the mod-mono-server2 process. At this point Apache failed to restart it. If you are interested you can see the kernel log here and the Apache log here. The Mono version I am using is mono-apache-server2 1.9.1-2 (Ubuntu 8.10 Intrepid).

I googled the issue and found some information on the mod_mono page:

Under high load, mono process consumes a lot of memory, website stops responding
These symptoms have been reported, but their underlying causes are not known. Set the MonoAutoRestartMode, MonoAutoRestartRequests, MonoMaxActiveRequests, and MonoMaxWaitingRequests directives as described earlier to limit the lifetime of the mono process and to restrict the concurrency happening in the server.

The above describes my issue pretty well, except for the “under high load” part. The video transcoding service is still in beta and is only used by one customer so far. It serves around 500 requests a day, which is not a lot, so I wouldn’t expect this kind of behavior. My guess is that every time a file is uploaded using the UploadFile webmethod additional memory is allocated, but not properly released/reused by mono.

As a workaround I will have the mod-mono-server2 process restart every six hours. This can be done by adding the following lines to /etc/apache2/mods-enabled/mod_mono.conf. Update: This did not work; it caused my site to hang entirely every six hours.

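    # Restart the mono backend on a timer.
    # In Time mode the interval is read from MonoAutoRestartTime, format DD[:HH[:MM[:SS]]].
    MonoAutoRestartMode Time
    MonoAutoRestartTime 00:06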

Hopefully this will limit the memory usage for now. I would appreciate feedback from anyone who has experienced similar problems with Mono.

A few months ago my employer asked me if it would be possible to create a web service for encoding videos. I had been playing around with Amazon’s web services for a while, and it seemed like the perfect foundation for building this.

I decided to build the backend in Python and use ffmpeg for encoding movies. I looked into building the web service frontend in Python as well, but the SOAP libraries I could find for Python did not seem very mature or maintained. Instead I started thinking about building it in ASP.NET (I had previous experiences from building web services in ASP.NET). After some research and testing with Apache and Mono (I wanted to use Linux VMs only) I decided to develop the frontend in ASP.NET but host it on Apache.

To make the service scalable I decided to break it down into several parts and use message passing between the different parts. The parts I broke it down into are:

  • Web service frontend – what the user calls to encode a movie. Implemented in ASP.NET, hosted on Apache/Mono on Linux.
  • Encode Worker – A python process managing the encoding of videos.
  • Encode Master – Manages number of running Encode Workers. Implemented in Python.

When a movie is uploaded to the web service frontend, it is placed in the encode queue. The Encode Workers periodically check whether there is anything in the queue and, if there is, encode it. The Encode Master manages the number of running Encode Workers based on the current length of the encode queue. If the queue grows too long, we just fire up a few new VMs running the worker.
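
The queue-driven split described above can be sketched in a few lines of Python. This is only an illustration and not the code running in the service: it uses the standard library's in-memory queue instead of an Amazon queue, and the names workers_needed, drain, jobs_per_worker and max_workers are made up for the example.

```python
import math
import queue


def workers_needed(queue_length, jobs_per_worker=5, max_workers=10):
    """Master side: decide how many workers to run for a given queue length.

    jobs_per_worker and max_workers are hypothetical tuning knobs.
    """
    wanted = math.ceil(queue_length / jobs_per_worker)
    return min(max_workers, max(1, wanted))


def drain(encode_queue, encode):
    """Worker side: take jobs off the queue until it is empty."""
    done = []
    while True:
        try:
            job = encode_queue.get_nowait()
        except queue.Empty:
            break
        done.append(encode(job))
    return done


encode_queue = queue.Queue()
for name in ("a.avi", "b.mov", "c.mpg"):
    encode_queue.put(name)

# The master looks at the queue length, the worker empties the queue.
wanted = workers_needed(encode_queue.qsize())
results = drain(encode_queue, lambda job: job.rsplit(".", 1)[0] + ".flv")
```

A real worker would call something like drain() on a timer and hand each job to ffmpeg rather than to a lambda.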

This is a schematic view of how the service has been implemented and how the different components are related to the Amazon services:

[Figure: Video Encoding Service]

Right now we are implementing the solution for our first customer, pretty exciting I must say! In an upcoming post I will discuss the different libraries for Python and ASP.NET I used for communicating with the Amazon Web Services.

02 Mar, 2009

Parsing AJAX web pages using PyKHTML

Posted by: Peter In: Linux| Python

I needed to parse data from a series of web pages. Usually I would have used CURL to download each page and then regular expressions to extract the data I was interested in. But the page I was going to parse used AJAX to reload part of the page (when you clicked ‘next page’) and did not provide a unique URL for that page, which made my regular method pretty useless (since following the AJAX calls requires a JavaScript interpreter).

This is something I consider to be a general problem with AJAX pages: JavaScript cannot change the URL shown in the browser without reloading the page, so it is not possible to provide unique URLs for AJAX calls (e.g. the next page link). Some sites, such as Facebook, have solved this quite cleverly by modifying the part of the URL after the hash (#) sign, which can be changed from JavaScript, to give users unique links to pages. I have been planning to use the same technique on the site I am currently working on, but I have not had time to implement it yet.
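
To make the hash trick a bit more concrete, here is a small Python sketch of how page state can be stored in, and read back from, the part of a URL after the hash sign. The helper names with_page_state and page_state and the example.com URL are invented for the illustration; on a real site the reading and writing would be done by JavaScript in the browser.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def with_page_state(url, **state):
    """Return url with the given state encoded after the hash sign."""
    parts = urlsplit(url)
    return parts._replace(fragment=urlencode(state)).geturl()


def page_state(url):
    """Read the state back from the fragment of a URL."""
    fragment = urlsplit(url).fragment
    return {key: values[0] for key, values in parse_qs(fragment).items()}


link = with_page_state("http://example.com/who", page=2)
state = page_state(link)
```

The browser does not send the fragment to the server, so the script on the page has to read it and load the matching content itself.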

A solution that seemed a bit more challenging than using CURL and regular expressions, but which would be able to handle AJAX, was to program an existing web browser to visit the page and fetch the data from the browser itself. I had programmed for KHTML before, so I looked into whether this was possible and found that Paul Giannaros had solved most of my problems. He had created PyKHTML:

PyKHTML is a Python module for writing website scrapers/spiders. Whereas traditional methods focus on writing the code to parse HTML/forms themselves, PyKHTML uses the excellent KHTML engine to do all the trudge work. It therefore handles web pages very well (even the severely crufty ones) and is pretty darn fast (implemented in C++). As a bonus, the module handles JavaScript and cookies transparently. Hurrah!

As an example for this post I decided to parse a digg article to find out who digged it. To understand the code you should know that KHTML uses an event-driven programming model. This is my test program:

    import sys
    sys.path.append("..")  # look for the pykhtml module in the parent directory
    import pykhtml

    # Setting debugWithGUI to True will give us the KHTML browser in a window.
    pykhtml.debugWithGUI = False

    def processPage(browser, currentPage):
        # Check if the next button is loaded; if not, try again in half a second
        result = list(browser.document.getElementsByClass("nextprev", "a"))
        if len(result) < 1:
            print("Next button not loaded")
            pykhtml.timer(0.5, pykhtml.partial(processPage, browser, currentPage))
            return

        # Get the next page number from the button's onclick attribute
        nextprev = result[0]
        nextPage = int(nextprev['onclick'].split(",")[1])

        # Wait for the AJAX page reload to complete
        if currentPage == nextPage:
            pykhtml.timer(0.5, pykhtml.partial(processPage, browser, currentPage))
            return

        # Stop when the next page number is lower than the current one
        elif nextPage < currentPage:
            pykhtml.stopEventLoop()
            return

        # Get users on current page
        userListClass = list(browser.document.getElementsByClass("user-list", "ul"))
        userList = list(userListClass[0].getElementsByTagName("li"))

        for user in userList:
            userName = list(user.getElementsByTagName("img"))[0].attributes['alt']
            print("%s %s" % (currentPage, userName))

        # Go to next page
        nextprev.click()
        processPage(browser, nextPage)

    def main():
        url = "http://digg.com/linux_unix/Linux_tips_every_geek_should_know/who"
        browser = pykhtml.Browser()
        # The lambda lets us pass the starting page number to the callback
        browser.load(url, lambda b: processPage(b, 1))
        pykhtml.startEventLoop()

    if __name__ == "__main__":
        main()

As you can see, the first thing we do is create a PyKHTML browser and load the URL. load() takes two arguments: the URL to be loaded and the function to call when KHTML has loaded it. To be able to pass this function an extra argument, I wrap the call in a lambda function.
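
The callback trick is not specific to PyKHTML. The sketch below shows the same idea with a made-up, synchronous load() stand-in (the real one is asynchronous and driven by the event loop), once with a lambda and once with functools.partial from the standard library. The names load, process_page and the example.com URL exist only for this example.

```python
from functools import partial


def load(url, on_loaded):
    """Hypothetical stand-in for a loader that calls on_loaded(browser)."""
    browser = {"url": url}
    return on_loaded(browser)


def process_page(browser, current_page):
    """Callback that needs one more argument than the loader supplies."""
    return (browser["url"], current_page)


# Both forms bind current_page up front, so load() can call them with
# just the browser argument.
via_lambda = load("http://example.com", lambda b: process_page(b, 1))
via_partial = load("http://example.com", partial(process_page, current_page=1))
```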

So when the page has been loaded, processPage() is called. First we check whether the page has finished loading; if not, we wait a little and check again. Once it has, it is time to access the KHTML DOM data. PyKHTML provides quite a few nifty functions for accessing the KHTML DOM, such as:

  • getElementsByClass()
  • getElementsByTagName()
  • getElementById()

With these functions we easily get access to the data we are interested in. To go to the next page, we get the element with the ‘nextprev’ class and simply ‘click’ it by calling nextprev.click(). Then we make a recursive call to processPage() to process the next page. When the program terminates, it will have listed all the people who digged the article.

    23 Feb, 2009

    Friendly URLs and the Zend Router

    Posted by: Peter In: PHP| Zend Framework

    Creating custom friendly URLs using the Zend Framework is really simple. The default routing setup for Zend is ‘:module/:controller/:action/*’ (the * will match any var/value pairs), which is fine for most setups. However, on some pages the var/value pairs might not look very good. For example, this URL is not very readable:

    /popular/index/type/images/page/1/sortOrder/alltime (controller/action/var/value/var/value/var/value)

    We would prefer something like this:
    /popular/images/1/alltime

    Luckily Zend provides a very flexible router that we can configure as we want. To start with, we create a new config file called routes.ini and add these lines:

        routes.popular.route = popular/:type/:page/:sortOrder
        routes.popular.defaults.controller = popular
        routes.popular.defaults.action = index
        routes.popular.defaults.type = images
        routes.popular.defaults.sortOrder = alltime
        routes.popular.defaults.page = 1
        routes.popular.reqs.type = \w+
        routes.popular.reqs.page = \d+
        routes.popular.reqs.sortOrder = \w+

    routes.popular.route tells us what to match. routes.popular.defaults.* sets default values for the variables (if none are given in the url). routes.popular.reqs.* sets requirements on the variables, for example that page must be a number.
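
To show how the route, defaults and requirements work together, here is a small Python sketch that matches a path against the configuration above. It is a simplified illustration written for this post, not how the Zend router is implemented, and the function name match is made up.

```python
import re

ROUTE = "popular/:type/:page/:sortOrder"
DEFAULTS = {"controller": "popular", "action": "index",
            "type": "images", "page": "1", "sortOrder": "alltime"}
REQS = {"type": r"\w+", "page": r"\d+", "sortOrder": r"\w+"}


def match(path):
    """Return the route parameters for path, or None if it does not match."""
    route_parts = ROUTE.split("/")
    path_parts = [part for part in path.strip("/").split("/") if part]
    if len(path_parts) > len(route_parts):
        return None

    params = dict(DEFAULTS)
    for index, route_part in enumerate(route_parts):
        value = path_parts[index] if index < len(path_parts) else None
        if route_part.startswith(":"):
            name = route_part[1:]
            if value is None:
                # Variable left out of the URL: fall back to its default.
                if name not in DEFAULTS:
                    return None
                continue
            # Variable given: it must satisfy its requirement.
            if not re.fullmatch(REQS.get(name, r"[^/]+"), value):
                return None
            params[name] = value
        elif value != route_part:
            # Static parts such as "popular" must match exactly.
            return None
    return params


full = match("/popular/videos/2/week")
short = match("/popular")
bad_page = match("/popular/images/abc")
```

Here full picks up all three values from the URL, short falls back to the defaults, and bad_page is None because ‘abc’ does not satisfy the \d+ requirement on page.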

    The last thing we need to do is add a few lines to our bootstrap.php:

        $config = new Zend_Config_Ini(APPLICATION_PATH . '/config/routes.ini');
        $router = $frontController->getRouter();
        $router->addConfig($config, 'routes');




    Zend Framework ships with several View Helpers, such as the url View Helper. Anywhere in a view script one can call $this->url() to generate a URL to a certain page. For example:

        $this->url(array(
          'module' => 'moduleName',
          'action' => 'actionName',
          'additionalParam' => 'value'))

    I wanted to create my own helper, for generating paths to thumbnails, and it turned out to be really simple. Since I wanted to be able to access the helper from any module, I created the view helper class in /library/My/View/Helper/Thumbnail.php. The definition of the class looks like this:

        class My_View_Helper_Thumbnail extends Zend_View_Helper_Abstract
        {
          // Note: $type is accepted but not used in this lookup.
          public function thumbnail($id, $type, $width, $height)
          {
            $thumbModel = new My_Model_Thumbnails();
            return $thumbModel
              ->getThumbnail($id, $width, $height);
          }
        }

    The getThumbnail() function checks whether a thumbnail of size $width x $height already exists for the image with id $id. If one exists, the path to it is returned; otherwise the thumbnail is generated on the fly.
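
The check-then-generate logic can be sketched in Python as follows. This is not the PHP model from my project, which is not shown here. The file naming scheme, the cache directory and the fake_generate function are assumptions made up for the example.

```python
import os
import tempfile


def get_thumbnail(cache_dir, image_id, width, height, generate):
    """Return the path of a cached thumbnail, generating it on a miss."""
    # Hypothetical naming scheme: <id>_<width>x<height>.jpg
    path = os.path.join(cache_dir, "%s_%dx%d.jpg" % (image_id, width, height))
    if not os.path.exists(path):
        with open(path, "wb") as handle:
            handle.write(generate(image_id, width, height))
    return path


calls = []


def fake_generate(image_id, width, height):
    """Stand-in for real image resizing; records each call."""
    calls.append((image_id, width, height))
    return b"thumbnail bytes"


cache_dir = tempfile.mkdtemp()
first = get_thumbnail(cache_dir, 42, 100, 75, fake_generate)
second = get_thumbnail(cache_dir, 42, 100, 75, fake_generate)
```

The second call returns the same path without calling the generator again, which is the whole point of keeping the thumbnails around.
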
    The last piece of magic before we can use our new view helper is to add one line to our bootstrap.php. After Zend MVC has been started, the path to our new view helper must be added to the layout's view in order for the view to be able to find it.

        $layout = Zend_Layout::startMvc(APPLICATION_PATH . '/layouts/scripts');
        $layout->getView()->addHelperPath('My/View/Helper', 'My_View_Helper');

    When this is done it is possible to call $this->thumbnail() in all view scripts in the project.

    13 Feb, 2009

    Welcome to Source Code Bean!

    Posted by: Peter In: General

    So I thought I should introduce myself to my readers. My name is Peter Moberg and I am a Software Engineer from Sweden. I received my master's degree in computer science from Chalmers University of Technology in the spring of 2007. After that I was offered the chance to move to the Bay Area in California (hello San Francisco!) to work for one of the bigger software companies for a year. At this company I mostly did Python scripting and some low-level programming in C.

    Now I am back in Sweden working as a .NET developer for a start-up consulting company in Stockholm. We are mainly concentrating on .NET development but also, to some extent, PHP and Python.

    In this blog I intend to post observations I make in my work and also some tutorial-like examples that I think might be of interest to others in the same field.


      About

      Welcome to Source Code Bean! Here you will find tips and tricks on programming languages, server-side topics, and anything else that causes trouble in web development.