<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>source code bean &#187; Python</title>
	<atom:link href="http://sourcecodebean.com/archives/category/python/feed" rel="self" type="application/rss+xml" />
	<link>http://sourcecodebean.com</link>
	<description>thoughts and ideas from a .net developer</description>
	<lastBuildDate>Sat, 26 May 2012 09:45:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Playing with WSGI in Python (part 1)</title>
		<link>http://sourcecodebean.com/archives/playing-with-wsgi-in-python-part-1/280</link>
		<comments>http://sourcecodebean.com/archives/playing-with-wsgi-in-python-part-1/280#comments</comments>
		<pubDate>Mon, 10 Aug 2009 09:00:13 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://sourcecodebean.com/?p=280</guid>
		<description><![CDATA[Lately I have been playing around with WSGI in Python. WSGI is an interface between the web server and the web application; it is meant to simplify writing your own web framework in Python. My intention is not to write a fully fledged web framework, but rather to play around with some ideas I have [...]]]></description>
			<content:encoded><![CDATA[<p>Lately I have been playing around with WSGI in Python. WSGI is an interface between the web server and the web application, and it is meant to simplify writing your own web framework in Python. My intention is not to write a fully fledged web framework, but rather to play around with some ideas I have and try them out.</p>
<p><span id="more-280"></span><br />
In WSGI, an application is just a callable object (remember that in Python, functions are objects too!) that takes two parameters: environ, a dictionary describing the request, and start_response, a callback used to begin the response. A very simple application could look like this:</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">def</span> application<span class="br0">&#40;</span>environ, start_response<span class="br0">&#41;</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;start_response<span class="br0">&#40;</span><span class="st0">'200 OK'</span>, <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'Content-Type'</span>, <span class="st0">'text/html'</span><span class="br0">&#41;</span><span class="br0">&#93;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp;<span class="kw1">return</span> <span class="br0">&#91;</span><span class="st0">b'Hello world!'</span><span class="br0">&#93;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
</ol>
</div>
<p>Not very exciting, but it shows the very basics of WSGI. start_response is a callable, supplied by the server, that begins the response: this is where you pass the status line and the response headers. Finally, the application returns an iterable containing the response body (usually a list of byte strings, or a list containing a single string that is the entire body). As you can see, WSGI lets code pass around a web request in a fairly formal way.</p>
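<p>To see the protocol from the server's side, here is a minimal sketch (Python 3, standard library only) that drives the application above by hand: it builds an environ with <code>wsgiref.util.setup_testing_defaults()</code>, passes in its own <code>start_response</code> that simply records the status and headers, and joins the returned iterable into the body. The helper name <code>call_app</code> is hypothetical, and a real server does considerably more (streaming, error handling, sending the headers along with the first body chunk).</p>

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'Hello world!']

def call_app(app):
    """Call a WSGI app the way a (very simplified) server would."""
    environ = {}
    setup_testing_defaults(environ)  # adds REQUEST_METHOD, SERVER_NAME, wsgi.* keys, ...
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the application asked for; a real server would send it to the client.
        captured['status'] = status
        captured['headers'] = headers
        return lambda data: None  # the legacy write() callable, unused here

    result = app(environ, start_response)
    try:
        body = b''.join(result)
    finally:
        if hasattr(result, 'close'):
            result.close()  # WSGI requires calling close() if the iterable has one
    return captured['status'], captured['headers'], body

status, headers, body = call_app(application)
print(status, headers, body)
```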
<p>One way to run this application is to install Apache and configure it with mod_wsgi, but that is probably overkill just to test it. Instead, I recommend installing Python Paste (pythonpaste.org), which is a kind of framework for web frameworks. It includes a lot of reusable functionality, but more importantly right now, it includes a simple web server that can serve WSGI applications. In Ubuntu 9.04 (the operating system I am currently running), Paste can be installed using apt:</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1">sudo apt-get install python-paste</div>
</li>
</ol>
</div>
<p>To run the application, add the following lines to the end of your source file:</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">if</span> __name__ == <span class="st0">'__main__'</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">from</span> paste <span class="kw1">import</span> httpserver</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; httpserver.<span class="me1">serve</span><span class="br0">&#40;</span>application, host=<span class="st0">'127.0.0.1'</span>, port=<span class="nu0">8000</span><span class="br0">&#41;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
</ol>
</div>
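<p>If Paste is not available, the standard library's <code>wsgiref.simple_server</code> module provides a basic reference server that can stand in for it. The sketch below (Python 3, and a swap for Paste rather than what this post uses) serves a single request on a free port and fetches it with <code>urllib</code>, so it can run unattended; the <code>QuietHandler</code> class is a hypothetical addition that only silences the access log.</p>

```python
import threading
import urllib.request
from wsgiref.simple_server import make_server, WSGIRequestHandler

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'Hello world!']

class QuietHandler(WSGIRequestHandler):
    """Request handler that suppresses the per-request access log line."""
    def log_message(self, format, *args):
        pass

# Port 0 asks the OS for any free port; use 8000 to match the Paste example.
httpd = make_server('127.0.0.1', 0, application, handler_class=QuietHandler)
port = httpd.server_port

# Serve exactly one request in the background, then fetch it as a client would.
server_thread = threading.Thread(target=httpd.handle_request)
server_thread.start()
with urllib.request.urlopen('http://127.0.0.1:%d/' % port) as response:
    body = response.read()
server_thread.join()
httpd.server_close()
print(body)
```

<p>For an interactive session, bind to port 8000 instead of 0 and call <code>httpd.serve_forever()</code> in place of the single <code>handle_request()</code>.</p>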
<p>Assuming the source file is called appserver.py, you can now run the application:</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">$ python appserver.py
</div>
</li>
<li class="li1">
<div class="de1">serving on http://127.0.0.1:8000
</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
</ol>
</div>
<p>Your web application is now up and running! In the next part, I will introduce a slightly more exciting application.</p>
]]></content:encoded>
			<wfw:commentRss>http://sourcecodebean.com/archives/playing-with-wsgi-in-python-part-1/280/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building a distributed web service using Amazon Web Services</title>
		<link>http://sourcecodebean.com/archives/building-a-distributed-web-service-using-amazon-web-services/93</link>
		<comments>http://sourcecodebean.com/archives/building-a-distributed-web-service-using-amazon-web-services/93#comments</comments>
		<pubDate>Sun, 22 Mar 2009 18:18:08 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[ASP.NET]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Mono]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://sourcecodebean.com/?p=93</guid>
		<description><![CDATA[A few months ago my employer asked me if it would be possible to create a web service for encoding videos. I had been playing around with Amazon&#8217;s web services for a while, and it seemed like the perfect foundation for building this. I decided to build the backend in Python and use ffmpeg for [...]]]></description>
			<content:encoded><![CDATA[<p>A few months ago my employer asked me if it would be possible to create a web service for encoding videos. I had been playing around with Amazon&#8217;s web services for a while, and it seemed like the perfect foundation for building this.</p>
<p>I decided to build the backend in Python and use ffmpeg for encoding the videos. I looked into building the web service frontend in Python as well, but the SOAP libraries I could find for Python did not seem very mature or well maintained. Instead, I started thinking about building it in ASP.NET, since I had previous experience building web services with it. After some research and testing with Apache and Mono (I wanted to use Linux VMs only), I decided to develop the frontend in ASP.NET but host it on Apache with Mono.</p>
<p><span id="more-93"></span><br />
To make the service scalable, I broke it down into several parts that communicate through message passing. The parts are:</p>
<ul>
<li>Web service frontend – What the user calls to encode a movie. Implemented in ASP.NET, hosted on Apache/Mono on Linux.</li>
<li>Encode Worker – A Python process that manages the encoding of videos.</li>
<li>Encode Master – Manages the number of running Encode Workers. Implemented in Python.</li>
</ul>
<p>When a movie is uploaded to the web service frontend, it is placed in the encode queue. The Encode Workers periodically check the queue and encode any movie they find there. The Encode Master manages the number of running Encode Workers based on the current length of the encode queue: if the queue grows too long, we just fire up a few new VMs running the worker.</p>
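<p>The polling workers and the queue-length-based scaling rule described above can be sketched roughly as follows. This is not the service's actual code: a <code>queue.Queue</code> stands in for the Amazon queue, threads stand in for worker VMs, <code>encode()</code> is a placeholder for running ffmpeg, and the scaling parameters in <code>desired_workers()</code> are made up for illustration.</p>

```python
import math
import queue
import threading

def encode(job):
    # Placeholder for the real work (running ffmpeg on the uploaded movie).
    return 'encoded:' + job

def encode_worker(q, results, stop):
    """Encode Worker: poll the queue and encode whatever is found until told to stop."""
    while not stop.is_set():
        try:
            job = q.get(timeout=0.1)  # wait briefly, then re-check the stop flag
        except queue.Empty:
            continue
        results.append(encode(job))
        q.task_done()

def desired_workers(queue_length, jobs_per_worker=5, min_workers=1, max_workers=20):
    """Encode Master's scaling rule (made-up parameters): one worker per N queued jobs."""
    wanted = math.ceil(queue_length / jobs_per_worker)
    return max(min_workers, min(max_workers, wanted))

# Usage: the frontend enqueues uploads, the master decides how many workers to run.
encode_queue = queue.Queue()
for name in ['a.avi', 'b.avi', 'c.avi']:
    encode_queue.put(name)

results, stop = [], threading.Event()
workers = [threading.Thread(target=encode_worker, args=(encode_queue, results, stop))
           for _ in range(desired_workers(encode_queue.qsize()))]
for w in workers:
    w.start()
encode_queue.join()  # block until every queued movie has been processed
stop.set()
for w in workers:
    w.join()
print(sorted(results))
```

<p>In the real service the master would start or stop VMs rather than threads, but the decision itself only needs the current queue length.</p>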
<p>This is a schematic view of how the service has been implemented and how the different components relate to the Amazon services:</p>
<p><img class="size-large wp-image-92" title="Video Encoding Service" src="http://sourcecodebean.com/wp-content/uploads/2009/03/dqcvideo-1024x535.png" alt="Video Encoding Service" width="650" /></p>
<p>Right now we are implementing the solution for our first customer, which is pretty exciting, I must say! In an upcoming post I will discuss the Python and ASP.NET libraries I used for communicating with Amazon Web Services.</p>
]]></content:encoded>
			<wfw:commentRss>http://sourcecodebean.com/archives/building-a-distributed-web-service-using-amazon-web-services/93/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Parsing AJAX web pages using PyKHTML</title>
		<link>http://sourcecodebean.com/archives/parsing-ajax-web-pages-using-pykhtml/42</link>
		<comments>http://sourcecodebean.com/archives/parsing-ajax-web-pages-using-pykhtml/42#comments</comments>
		<pubDate>Mon, 02 Mar 2009 22:28:39 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://sourcecodebean.com/?p=42</guid>
		<description><![CDATA[I needed to parse data from a series of web pages. Usually I would have used cURL to download the page and then used regular expressions to extract the data I was interested in. But the page I was going to parse was using AJAX to reload part of the page (when you clicked the [...]]]></description>
			<content:encoded><![CDATA[<p>I needed to parse data from a series of web pages, usually i would have used CURL to download the page and then used regular expressions to extract the data i was interested in. But the page i was going to parse was using AJAX to reload part of the page (when you clicked the &#8216;next page&#8217;) and did not provide a unique url to that page, which made my regular method pretty useless (since to use AJAX we need a Java Script Interpreter).</p>
<p>A solution that seemed a bit more challenging than cURL and regular expressions, but that would be able to handle AJAX, was to program an existing web browser to visit the page and fetch the data from the browser itself. I had programmed against KHTML before, so I looked into whether this was possible and found that Paul Giannaros had already solved most of my problems. He had created PyKHTML<span id="more-42"></span>:</p>
<blockquote><p>
PyKHTML is a Python module for writing website scrapers/spiders. Whereas traditional methods focus on writing the code to parse HTML/forms themselves, PyKHTML uses the excellent KHTML engine to do all the trudge work. It therefore handles web pages very well (even the severely crufty ones) and is pretty darn fast (implemented in C++). As a bonus, the module handles JavaScript and cookies transparently. Hurrah!
</p></blockquote>
<p>As an example for this post, I decided to parse a Digg story page to find out who dugg it. To understand the code, you should know that KHTML uses an event-driven programming model. This is my test program:</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">import</span> <span class="kw3">sys</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw3">sys</span>.<span class="me1">path</span>.<span class="me1">append</span><span class="br0">&#40;</span><span class="st0">&quot;..&quot;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">import</span> pykhtml</div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1"># Setting debugWithGUI to True shows the KHTML browser in a window.</span></div>
</li>
<li class="li1">
<div class="de1">pykhtml.<span class="me1">debugWithGUI</span> = <span class="kw2">False</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">def</span> processPage<span class="br0">&#40;</span>browser, currentPage<span class="br0">&#41;</span>:</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="co1"># Check if the next button is loaded</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; result = <span class="kw2">list</span><span class="br0">&#40;</span>browser.<span class="me1">document</span>.<span class="me1">getElementsByClass</span><span class="br0">&#40;</span><span class="st0">&quot;nextprev&quot;</span>, <span class="st0">&quot;a&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> result:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;Next button not loaded&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; pykhtml.<span class="me1">timer</span><span class="br0">&#40;</span><span class="nu0">0.5</span>, pykhtml.<span class="me1">partial</span><span class="br0">&#40;</span>processPage, browser, currentPage<span class="br0">&#41;</span><span class="br0">&#41;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1"># Get next page</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; nextprev = result<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; nextPage = <span class="kw2">int</span><span class="br0">&#40;</span>nextprev<span class="br0">&#91;</span><span class="st0">'onclick'</span><span class="br0">&#93;</span>.<span class="me1">split</span><span class="br0">&#40;</span><span class="st0">&quot;,&quot;</span><span class="br0">&#41;</span><span class="br0">&#91;</span><span class="nu0">1</span><span class="br0">&#93;</span><span class="br0">&#41;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1"># Wait for ajax page reload to complete</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">if</span> currentPage == nextPage:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; pykhtml.<span class="me1">timer</span><span class="br0">&#40;</span><span class="nu0">0.5</span>, pykhtml.<span class="me1">partial</span><span class="br0">&#40;</span>processPage, browser, currentPage<span class="br0">&#41;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">elif</span> nextPage &lt; currentPage:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; pykhtml.<span class="me1">stopEventLoop</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> </div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="co1"># Get users on current page</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; userListClass = <span class="kw2">list</span><span class="br0">&#40;</span>browser.<span class="me1">document</span>.<span class="me1">getElementsByClass</span><span class="br0">&#40;</span><span class="st0">&quot;user-list&quot;</span>, <span class="st0">&quot;ul&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; userList = &nbsp;<span class="kw2">list</span><span class="br0">&#40;</span>userListClass<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span>.<span class="me1">getElementsByTagName</span><span class="br0">&#40;</span><span class="st0">&quot;li&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">for</span> <span class="kw3">user</span> <span class="kw1">in</span> userList:</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; userName = <span class="kw2">list</span><span class="br0">&#40;</span><span class="kw3">user</span>.<span class="me1">getElementsByTagName</span><span class="br0">&#40;</span><span class="st0">&quot;img&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span>.<span class="me1">attributes</span><span class="br0">&#91;</span><span class="st0">'alt'</span><span class="br0">&#93;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> currentPage, userName</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1"># Go to next page</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; nextprev.<span class="me1">click</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; processPage<span class="br0">&#40;</span>browser, nextPage<span class="br0">&#41;</span> &nbsp; &nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">def</span> main<span class="br0">&#40;</span><span class="br0">&#41;</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; url = <span class="st0">&quot;http://digg.com/linux_unix/Linux_tips_every_geek_should_know/who&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; browser = pykhtml.<span class="me1">Browser</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; browser.<span class="me1">load</span><span class="br0">&#40;</span>url, <span class="kw1">lambda</span> b: processPage<span class="br0">&#40;</span>b, <span class="nu0">1</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; pykhtml.<span class="me1">startEventLoop</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">return</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">if</span> __name__ == <span class="st0">&quot;__main__&quot;</span>:</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; main<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
</ol>
</div>
<p>As you can see, the first thing we do is create a PyKHTML browser and load the URL. <code>load()</code> takes two arguments: the URL to load and a callback function that is executed when KHTML has finished loading the page. Because <code>processPage()</code> also needs the current page number, I wrap the call in a lambda function.</p>
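<p>The lambda is just a way of binding an extra argument to a callback. A sketch with a hypothetical <code>load()</code> function standing in for <code>browser.load()</code> (it calls back immediately instead of after a real page load) shows the lambda next to the equivalent <code>functools.partial</code> from the standard library; the <code>pykhtml.partial</code> used for the timer callbacks appears to play the same role.</p>

```python
from functools import partial

def load(url, on_loaded):
    """Hypothetical stand-in for browser.load(): invokes the callback with a 'browser'."""
    fake_browser = {'url': url}
    on_loaded(fake_browser)

calls = []

def process_page(browser, current_page):
    calls.append((browser['url'], current_page))

# Two equivalent ways to bind the extra current_page argument to the callback:
load('http://example.com/a', lambda b: process_page(b, 1))
load('http://example.com/b', partial(process_page, current_page=1))
print(calls)
```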
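<p>When the page has been loaded, <code>processPage()</code> is called. First we check whether the &#8216;next&#8217; button has been loaded yet, and then whether the AJAX reload has completed (judged by whether the page number in the button&#8217;s onclick handler still equals the current page). If either check fails, we use <code>pykhtml.timer()</code> to schedule another call to <code>processPage()</code> and return. If the button points to an earlier page, we have reached the last page and stop the event loop. Once the page is ready, it is time to access the KHTML DOM. PyKHTML provides quite a few nifty functions for accessing the DOM of KHTML, such as:</p>
<ul>
<li><code>getElementsByClass()</code></li>
<li><code>getElementsByTagName()</code></li>
<li><code>getElementById()</code></li>
</ul>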
<p>Using these functions, we can easily get at the data we are interested in. To go to the next page, we take the element with the &#8216;nextprev&#8217; class and simply &#8216;click&#8217; it by calling <code>nextprev.click()</code>. Then we make a recursive call to <code>processPage()</code> to process the next page. When the program terminates, it will have listed everyone who dugg the article.</p>
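<p>The wait-and-retry flow of <code>processPage()</code> can be reproduced without KDE using the standard library's <code>sched</code> module as the event loop. In this sketch, <code>FakeAjaxBrowser</code> is a hypothetical stand-in for the browser: clicking &#8216;next&#8217; only changes the displayed page after a short delay, as if an AJAX request had to finish, and <code>process_page()</code> reschedules itself until the new page shows up. When nothing is left to schedule, <code>scheduler.run()</code> returns, which plays the role of <code>pykhtml.stopEventLoop()</code>.</p>

```python
import sched
import time

scheduler = sched.scheduler(time.monotonic, time.sleep)

class FakeAjaxBrowser:
    """Hypothetical stand-in for a browser showing a paged AJAX list.

    Clicking 'next' does not change the page immediately; the new page
    appears a little later, as if an AJAX request had to complete first.
    """
    def __init__(self, pages, delay=0.01):
        self.pages = pages    # list of lists of user names, one per page
        self.shown_page = 1   # 1-based page currently displayed
        self.delay = delay

    def users(self):
        return self.pages[self.shown_page - 1]

    def has_next(self):
        return self.shown_page < len(self.pages)

    def click_next(self):
        target = self.shown_page + 1
        # Simulate the asynchronous reload finishing after `delay` seconds.
        scheduler.enter(self.delay, 0, setattr, (self, 'shown_page', target))

found = []

def process_page(browser, current_page):
    # Wait (by rescheduling ourselves) until the AJAX reload has completed.
    if browser.shown_page != current_page:
        scheduler.enter(0.005, 0, process_page, (browser, current_page))
        return
    for user in browser.users():
        found.append((current_page, user))
    if browser.has_next():
        browser.click_next()
        process_page(browser, current_page + 1)
    # Otherwise nothing more is scheduled, so scheduler.run() returns.

browser = FakeAjaxBrowser([['alice', 'bob'], ['carol'], ['dave', 'erin']])
process_page(browser, 1)
scheduler.run()  # the event loop: runs timed callbacks until none are left
print(found)
```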
]]></content:encoded>
			<wfw:commentRss>http://sourcecodebean.com/archives/parsing-ajax-web-pages-using-pykhtml/42/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

<!-- Performance optimized by W3 Total Cache. Learn more: http://www.w3-edge.com/wordpress-plugins/

Minified using disk: basic
Page Caching using disk: enhanced

Served from: sourcecodebean.com @ 2012-05-30 06:10:41 -->
