tag:quandyfactory.com,2010-3-1:/201031
2010-3-1T12:00:00Z
Quandy Factory Newsfeed - All
Quandy Factory is the personal website of Ryan McGreal in Hamilton, Ontario, Canada.
http://quandyfactory.com/projects/47/east_side_mario_napolitana_sauce
2010-03-05T12:00:00Z
East Side Mario Napolitana Sauce
<h3>Ingredients</h3>
<p>Chopped Tomatoes <br />
Tomato Puree <br />
Onions <br />
Soybean Oil <br />
Garlic <br />
Salt <br />
Extra Virgin Olive Oil <br />
Basil <br />
Black Pepper</p>
<ul>
<li>May Contain Citric Acid</li>
</ul>
Ryan McGreal
2
http://quandyfactory.com/blog/46/mpaa_and_piracy
2010-02-26T12:00:00Z
MPAA and Piracy
<p>When I was a kid, I had a <em>Fat Albert</em> book in which one of the Cosby Kids (for details of who did what I'm going on memory, but I think it was Bucky) gets upset about something and runs away. </p>
<p>The other kids all search the projects for him, but evening is coming on and it's getting harder to see.</p>
<p>At one point, Weird Harold is walking from streetlamp to streetlamp, peering into the pool of light under each lamp. Rudy walks up and asks Harold what he's doing. Harold says he's looking for Bucky.</p>
<p><em>But why are you looking under the streetlamps?</em> asks Rudy.</p>
<p>Harold answers, <em>That's the only place I can see.</em></p>
<p>Which brings us to:</p>
<p><img src="http://www.geek.com/wp-content/uploads/2010/02/piratedvd.jpg" alt="What pirates get vs. what paying customers get" /></p>
<p>(<a href="http://www.geek.com/wp-content/uploads/2010/02/piratedvd.jpg">source</a>)</p>
Ryan McGreal
2
http://quandyfactory.com/essays/45/lost_opportunities_can_tell_us_much
2010-02-22T12:00:00Z
Lost opportunities can tell us much
<p><em>Published on February 19, 2010 in the <a href="http://thespec.com/article/724403">Hamilton Spectator</a></em></p>
<p>Recent quotes from business owners on the rapid transit choices facing Hamilton add up to a dispiriting - but perhaps inevitable - lack of imagination and understanding.</p>
<p>At issue are the three options in the Metrolinx Business Case Analysis for the east-west route:</p>
<ul>
<li><p>Full Light Rail Transit (LRT) from McMaster University to Eastgate Square</p></li>
<li><p>Bus Rapid Transit (BRT) from McMaster to Eastgate</p></li>
<li><p>Phased LRT from McMaster to Ottawa Street, with BRT from Ottawa to Eastgate until LRT is built in 2030. [*]</p></li>
</ul>
<p>Hamilton needs transformative change. Incrementing the status quo has not served us well - particularly the city's poorest neighbourhoods.</p>
<p>LRT doesn't just carry more people more quickly. It attracts hundreds of millions of dollars in new investment and draws many more people to live and work in the area, frequent its businesses, generate demand for new businesses, and interact creatively and productively.</p>
<p>Higher-order urban activity raises infrastructure productivity and boosts the rate of creative economic output. When cities intensify, energy and infrastructure costs grow more slowly than population, but the rate of innovation grows more quickly.</p>
<p>That adds up to more of the net economic growth and employment opportunities that Hamilton desperately needs.</p>
<p>Unfortunately, the business owners quoted lately seem to think LRT is merely about moving people around. I wish someone would organize a fact-finding trip with these business owners to the King-Spadina area in Toronto.</p>
<p>Once a manufacturing centre, King-Spadina was a mess of empty factories and warehouses in the 1990s, when a group of visionary urbanists came together to develop a new plan for the district.</p>
<p>Through a combination of planning rules that encourage mixed-use investment and an anchoring streetcar line, King-Spadina has experienced an impressive influx of new condominium developments, offices and entertainment businesses created through both adaptive reuse and new construction.</p>
<p>When the plan was unveiled, skeptics scoffed. "How will people get there?" they sneered. The answer, of course, is that people moved there in droves.</p>
<p>In just a 45-hectare area, King-Spadina attracted $55.6 million in new investment between 2000 and 2007, creating 700 new jobs and 230,000 square feet of property.</p>
<p>The population has quadrupled since 1996, and the biggest cohort has been young professionals looking for an urban lifestyle close to employment and social amenities.</p>
<p>One would be forgiven for assuming business owners would be head-over-heels about such an opportunity coming to Hamilton.</p>
<p>Surprisingly, business owners may not be the best people to talk to when determining whether and how to transform a neighbourhood. When the economic system is failing most people, does it really make sense to base planning decisions around the few who manage to survive?</p>
<p>The risk of owning a business and the fact that the current system works for them make such business owners inherently risk-averse. Transformation may well bring huge benefits - particularly for property owners who will enjoy the windfall of rising property values - but it also means that the rules for success change.</p>
<p>The business strategies that work in a poor, failing economic and social environment might not transfer to a booming, thriving environment.</p>
<p>As a result, business owners feel they have a lot to lose.</p>
<p>Compound that with the fact that many owners don't actually seem to understand that LRT is qualitatively different from buses, and it's a recipe for fear and doubt.</p>
<p>But for every business owner worried that LRT might adversely affect their business, how many potential businesses locate elsewhere or simply never start up at all? How many potential customers never materialize because they chose to live elsewhere?</p>
<p>In the case of a transformative initiative like LRT, there's a real danger that survivorship bias will lead us horribly astray.</p>
<p>Survivorship bias is the error of studying only entities that survived some kind of selection process, and ignoring those entities that did not survive.</p>
<p>Business writer Jason Cohen shares the anecdote of British engineers during the Second World War trying to determine how best to armour planes undertaking aerial raids in Germany.</p>
<p>Studying the patterns of bullet holes on planes that returned from aerial raids, they noticed that the holes were mainly on the wings and tail, with few near the cockpits or fuel tanks.</p>
<p>Survivorship bias means concluding that the planes need more armour on the wings and tail because that's where the returning planes have the most bullet holes.</p>
<p>Of course, the planes that were hit in the cockpits and fuel lines were not available for study.</p>
<p>The business owners along the B-Line are survivors of the economic battle downtown Hamilton has suffered over the past half-century.</p>
<p>How many businesses were "shot down" because of Hamilton's low-quality transit, low population densities and pedestrian-repellent one-way streets?</p>
<p>Who will speak for those failed businesses and lost opportunities? How can we incorporate them into our public discourse so the survivorship bias of existing businesses doesn't lead us astray?</p>
<hr />
<p>* Note: under the phased approach, the route east from Ottawa Street would be served by the current BRT-like B-Line service, not by BRT on dedicated lanes.</p>
Ryan McGreal
2
http://quandyfactory.com/projects/44/hamilton_spectator_pdf_downloader
2010-02-15T12:00:00Z
Hamilton Spectator PDF Downloader
<style type="text/css">
div.formdiv { text-align: center; clear: both; width: 12em; }
.formdiv span { float: left; display: block; width: 5em; text-align: left; }
span.label { text-align: right; margin-right: 5px; }
</style>
<form>
<div class="formdiv">
<span class="label">Year: </span>
<span> <input id="spec_year" name="spec_year" value="2010"></span>
</div>
<div class="formdiv">
<span class="label">Month: </span>
<span><input id="spec_month" name="spec_month" value="01"></span>
</div>
<div class="formdiv">
<span class="label">Day: </span>
<span> <input id="spec_day" name="spec_day" value="25"></span>
</div>
<div class="formdiv">
<span class="label">Page: </span>
<span> <input id="spec_page" name="spec_page" value="A13"></span>
</div>
<div class="formdiv">
<input type="button" name="spec_submit" value="Get PDF Page" onclick="spec_get_pdf(); return false;">
</div>
</form>
<script type="text/javascript">
function spec_get_pdf() {
var year = document.getElementById('spec_year').value;
var month = document.getElementById('spec_month').value;
var day = document.getElementById('spec_day').value;
year = set_length(year,4);
month = set_length(month,2);
day = set_length(day,2);
var page = document.getElementById('spec_page').value;
page = page.toUpperCase();
var specstring = 'http://www.hamiltonspectator.com/pdfs/';
var link = specstring+year+month+day+'/'+page+'.pdf';
location.href = link;
}
function set_length(val, len) {
var zeroes = '';
var diff = len - val.length;
if (diff > 0) {
for (var i=0;i<diff;i++) {
zeroes += '0';
}
}
return zeroes+val;
}
</script>
Ryan McGreal
2
http://quandyfactory.com/blog/43/review:_mark_pilgrim's_dive_into_python_3
2010-02-08T12:00:00Z
Review: Mark Pilgrim's Dive Into Python 3
<h3>TL;DR Summary</h3>
<p>If you're an experienced programmer, want to learn Python 3, and don't have a lot of time to waste, skip this review and just go straight to Mark Pilgrim's <em><a href="http://diveintopython3.org">Dive Into Python 3</a></em>.</p>
<p>DIP3 starts with practical, working code, takes it apart piece by piece, puts it back together, and leaves you with a solid understanding of the concepts and their applications.</p>
<p>DIP3 is <em>opinionated</em> but well-informed. It hammers home the important stuff and skips the blather. It's also very well written: terse, witty, and expressive (like Python itself).</p>
<h3>Introduction</h3>
<p>Mark Pilgrim's <em>Dive Into Python 3</em> came out late last year, but I didn't receive a review copy until late January - complete with autograph and a friendly note from the author apologizing for the delay. (Disclosure: I felt warm and fuzzy after receiving the personal note. Also, the review copy I received was free.)</p>
<p>If you're familiar with <a href="http://diveintomark.org">Mark Pilgrim</a>, you'll know that he's an opinionated writer. DIP3 is no exception; this is Python 3 the way Mark Pilgrim wants you to understand it. </p>
<p>The good news is that Pilgrim is a <em>reliable narrator</em>. That is, he really knows his stuff; and his opinions, while strong, are deeply knowledgeable, <a href="http://diveintomark.org/archives/2009/10/19/the-point">ethically and philosophically consistent</a>, and shared in a spirit of cooperation and stewardship.</p>
<h3>Purpose and Relation to Dive Into Python</h3>
<p>Comparisons to Pilgrim's original Python book, <em><a href="http://diveintopython.org/">Dive Into Python</a></em> (also published by Apress), are inevitable. The good news is that this edition improves on the original in most of the ways that matter. The book design is more elegant and stylish, with better use of white space and contrast, cleaner fonts (with one notable exception, about which more below), and clearer layout.</p>
<p>The book is written primarily for experienced programmers coming to Python for the first time, but Pilgrim recognizes that a lot of its readers will have a background in Python 2. </p>
<p>Since Python 3 breaks compatibility with the previous trajectory of versions, the book contains specially formatted bullet notes at points where version 3 breaks with 2.x; for instance, the merging of <code>int</code> and <code>long</code> datatypes, or the fact that the <code>/</code> division operator now triggers floating point division by default, rather than integer division.</p>
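<p>A minimal sketch of those two differences, as my own illustration rather than code from the book (the variable names are made up for the example):</p>

```python
# In Python 3, "/" always performs true division; "//" is floor division.
quotient = 7 / 2
floored = 7 // 2

# There is no separate long type: int grows to whatever size is needed.
big = 2 ** 100

print(quotient, floored, type(big).__name__)
```

<p>In Python 2, the first expression would have quietly produced an integer, which is exactly the kind of surprise the bullet notes are there to flag.</p>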
<p>By focusing mainly on Python 3 itself and then highlighting the diffs from Python 2, Pilgrim generally gets the best of both worlds. </p>
<p>Still, there are a few ways in which the book isn't entirely sure whether it wants to be a resource for programmers learning Python for the first time, or for Python 2 programmers who want to upgrade. It would be more helpful for experienced Python 2 programmers if Pilgrim noted the diffs consistently rather than sporadically.</p>
<p>Another significant change from DIP is the way it uses programs to introduce the language. In the original book, the introductory program was an 11-line script to build an ODBC connection string from a dictionary. It demonstrated function declaration, datatypes, everything-is-an-object, code blocks (and significant whitespace), and the <code>if __name__ == '__main__'</code> trick.</p>
<p>The next chapter covered dictionaries, lists, tuples, declaring and returning variables, string formatting (old style, with <code>% ()</code> notation), and basic list comprehensions. The program was dissected exhaustively over two chapters and concluded with an explicit acknowledgment:</p>
<blockquote>
<p>The <code>odbchelper.py</code> program and its output should now make perfect sense. </p>
</blockquote>
<p>In DIP3, by contrast, the introductory program spans four chapters - and there's never a discrete <em>aha</em> moment where you realize that you understand the program fully. </p>
<p>Still, overall the book is a commendable accomplishment: a concise, accessible, and above all <em>fun</em> introduction to a language also known for concision, accessibility and fun.</p>
<h3>Why Python 3?</h3>
<p>Python 3 has not yet enjoyed wide adoption among programmers - existing Python programmers or otherwise - and part of the reason is that the language is caught in the chicken-egg problem whereby the lack of ported libraries makes it less appealing for coders, while the lack of existing coders makes it less compelling for developers to port their libraries.</p>
<p>My hunch is that <em>Dive Into Python 3</em> will carry us closer to the tipping point in which coders and library supporters start to upgrade <em>en masse</em>. </p>
<p>For one thing, Python 2 programmers will come away from this book with a much clearer understanding of how many problems and pain points Python 3 solves. In addition, version 3 sweetens the deal with powerful new data structures that already make me envious when I go back to work on projects written in 2.x.</p>
<h3>Diving In</h3>
<p>If you've read programming books before, you'll know that the general format is to start with lots of theoretical background and history, then introduce the syntax, data types and common code blocks; and finally, after a few chapters, start putting together a program.</p>
<p>That was my original experience trying to learn Python a few years ago by reading through <em><a href="http://www.greenteapress.com/thinkpython/thinkCSpy/">How to Think Like a Computer Scientist: Learning with Python</a></em> by Allen Downey, Jeff Elkner and Chris Meyers. </p>
<p>It wasn't until I read Mark Pilgrim's original <em>Dive Into Python</em> that I really started to get an internalized sense of the language. He raised 'learn by doing' to the level of an art form, and it clicked with me in an immediate and sustained way.</p>
<p>It becomes clear that <em>Dive Into Python 3</em> follows the same aggressively practical, hands-on approach as soon as you read the opening line of the introduction:</p>
<blockquote>
<p>Welcome to Python 3. Let's dive in.</p>
</blockquote>
<p>The introduction covers installing Python 3, and it's a significant improvement on the equivalent chapter in the original DIP. If you run Windows, Mac OSX or Debian/Ubuntu, the book takes you step by step through the installation process. (If you run a more exotic operating system, you can probably figure out how to install Python all by yourself.)</p>
<h3>Basics: Functions, Datatypes and Comprehensions</h3>
<p><strong>Chapter 1</strong> covers function declaration and arguments, doc strings, <code>sys.path</code>, objects (and the oft-repeated fact that everything, in Python, is an object), denoting code blocks through indentation (i.e. significant whitespace), exceptions, variable declarations, and the <code>if __name__ == '__main__'</code> trick. </p>
<p>These are the absolute basics, and it's noteworthy that Pilgrim includes an introduction to exceptions. He's an opinionated programmer, and he wants you to understand that exception handling <em>is</em> fundamental to good Python code, not an exotic extra to be mentioned in passing once you're already proficient.</p>
<p><strong>Chapter 2</strong> dives into Python's datatypes: booleans and boolean contexts, numbers, type coercion, operations, fractions, lists, tuples (immutable lists), dictionaries, sets (unordered collections of unique values; the set type itself predates Python 3, but the literal syntax, which resembles dictionary syntax, is new), and the special <code>None</code> type.</p>
<p><strong>Chapter 3</strong> introduces comprehensions, a delightful set of syntactic sweeteners that allow you to map and filter an iterable collection using a terse one-liner:</p>
<pre><code>newlist = [func(item) for item in oldlist if test(item)]
</code></pre>
<p>New in version 3, Python adds dictionary and set comprehensions to the list comprehensions it already supported.</p>
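<p>For comparison, here is a short sketch of all three comprehension forms side by side. The word list is invented for the example and does not come from the book:</p>

```python
words = ['spam', 'eggs', 'ham', 'spam']

# Dictionary comprehension: map each word to its length.
lengths = {word: len(word) for word in words}

# Set comprehension: collect the distinct lengths.
unique_lengths = {len(word) for word in words}

# List comprehension with a filter, as in the one-liner above.
long_words = [word.upper() for word in words if len(word) > 3]

print(lengths, unique_lengths, long_words)
```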
<p>Aside: this is one of the places where Pilgrim <em>doesn't</em> specifically mention the difference between Python 2 and 3. I found myself going back to the Python 2 documentation for a sanity check just to make sure these structures weren't there and I just somehow missed them.</p>
<h3>Bytes vs. Characters</h3>
<p><strong>Chapter 4</strong> breaks form. Instead of diving straight into code, Pilgrim opens with three pages of exposition - the only such indulgence in the book, and with good reason. In possibly the most beautiful and simultaneously despairing chapter in the book - indeed, it achieves a level of pathos befitting a novel, let alone a technical manual - Pilgrim recounts the tragedy of text on an international data network. And it <em>is</em> a tragedy. </p>
<p>In what might be the most significant break from version 2, Python 3 consistently and explicitly distinguishes <strong>strings</strong>, which are sequences of Unicode characters, from <strong>bytes</strong>, which are sequences of raw eight-bit values.</p>
<p>Because text on disk or on the wire is a stream of bytes, it must be <strong>decoded</strong> using a character encoding that maps the bytes to specific characters.</p>
<p>In Python before version 3, the default encoding was <a href="http://en.wikipedia.org/wiki/ASCII">ASCII</a>, the American Standard Code for Information Interchange, a seven-bit encoding that handles all conventional English characters - the lowercase and uppercase letters, numerals, and punctuation symbols - plus various control characters including tabs, spaces, newlines and so on.</p>
<p>Of course, not everyone speaks or writes English, and many other languages use additional characters (like the French <em>e-accent-aigu</em> é or the German <em>eszett</em> ß) or even collections of characters with little or no overlap to English (like Japanese Kanji). For these languages, ASCII is wholly inadequate, and different languages have independently developed various encoding systems that often map the same character to different bytes.</p>
<p>The potential for chaos is huge, and indeed a number of incompatible encodings have already caused plenty of grief for people trying to process text on computers. </p>
<p>We've all seen web pages with jumbles of nonsense characters where you would expect, say, quotation marks to go. This is caused by a mismatch between the encoding used to produce the text (say, <a href="http://en.wikipedia.org/wiki/Windows-1252">ANSI CP-1252</a> in Microsoft Word) and the encoding used to render the text later (say, <a href="http://en.wikipedia.org/wiki/ISO/IEC_8859-1">ISO/IEC 8859-1</a>).</p>
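<p>The mismatch is easy to reproduce. This sketch uses a different pair of encodings than the ones named above (UTF-8 on the way out, CP-1252 on the way in) because the damage is easier to see; the sample word is my own:</p>

```python
text = 'caf\u00e9'                      # "café", with e-accent-aigu

raw = text.encode('utf-8')              # the accented letter becomes two bytes
garbled = raw.decode('cp1252')          # wrong codec: each byte read as its own character

# Reversing the mistake recovers the original text.
restored = garbled.encode('cp1252').decode('utf-8')

print(repr(raw), garbled, restored)
```

<p>Nothing raises an exception along the way, which is why this kind of corruption tends to go unnoticed until a reader sees it.</p>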
<p>Pilgrim sets the scene eloquently and then, with a seemingly innocuous "Enter Unicode", leads us on a harrowing emotional tennis match, swinging back and forth between optimistic proposed solutions - let's agree to put every single character into one big encoding! - and new problems - each character now takes up 4 bytes! - to follow-up solutions - use agreed-upon subsets! - to still more problems.</p>
<blockquote>
<p>Now cry a lot because everything you thought you knew about strings is wrong, and there ain't no such thing as plain text.</p>
</blockquote>
<p>Pilgrim doesn't offer simple solutions to the problems he raises, because they don't <em>have</em> simple solutions. Character encoding may be the Great Granddaddy of more-or-less insoluble technical problems. They can't be solved, as such, but if you understand the problems well enough they can be addressed more or less safely.</p>
<p>But as Pilgrim points out with regard to the necessary shortcuts that programs take in mapping out solutions:</p>
<blockquote>
<p>[It's] a good assumption right up until the moment that it's not.</p>
</blockquote>
<p>String processing is further complicated by the matter of big-endian vs. little-endian byte ordering - which is only a problem when people try to share files from different computers, "perhaps on a worldwide web of some sort".</p>
<p>With all this in mind, you can't help but conclude, despairingly: <em>Character encoding is hard!</em></p>
<p>But rather than just giving up and going shopping, Pilgrim leads you through the maelstrom and delivers you - shaken but intact - on the mostly-safe harbour of UTF-8. Not that UTF-8 isn't also beset with traps and gotchas.</p>
<p>Chapter 4 demonstrates formatting strings (using Python's powerful new string formatting syntax, first introduced in version 2.6), common string methods, string slicing (a string, after all, is a sequence of characters), the difference between strings and bytes (lots of gotchas exposed here), and encoding of source code (Python 2 files were ASCII by default, whereas Python 3 files are UTF-8 by default).</p>
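<p>A small sketch of the string/bytes distinction and the <code>format()</code> syntax, assuming a made-up sample word rather than anything from the book:</p>

```python
word = 'na\u00efve'                     # five characters, one of them non-ASCII
encoded = word.encode('utf-8')          # a bytes object, not a string

# Slicing a string yields characters; indexing bytes yields integers.
prefix = word[0:2]
first_byte = encoded[0]

message = '{0} is {1} characters but {2} bytes'.format(
    word, len(word), len(encoded))

print(message, prefix, first_byte)
```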
<h3>Digression on the Book's Text Formatting</h3>
<p>This is as good a place as any to mention a formatting bug in this edition: the peppering of the text with artifacts - hollow vertically aligned rectangles - in place of special characters like em-dashes. For example:</p>
<blockquote>
<p>The first line imports the <code>humansize</code> program as a module▯a chunk of code that you use interactively or from a larger Python program.</p>
</blockquote>
<p>Given the attention Pilgrim gives to character encoding (I still remember the <a href="http://www.reddit.com/r/Python/comments/8mc40/dive_into_python_3_everything_you_thought_you/">reddit thread</a> on which he posted an early draft of that section for review and feedback - some of it unbearably pedantic), it seems bizarre that his own book would be bitten by an encoding gotcha! </p>
<p>I contacted Pilgrim to ask if he knew what happened but did not receive a response in time for publication. It may be related to the fact that he had to <a href="http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition">convert the book from HTML to MS Word</a> before submitting it to the publisher.</p>
<h3>Regular Expressions</h3>
<p><strong>Chapter 5</strong> delves into regular expressions, a powerful and pragmatic DSL for solving real-world data extraction problems. Yet in what is turning into a common theme, regex is complicated and contains some gnarly gotchas and edge cases.</p>
<p>Pilgrim takes us through real-world use scenarios in much the same way that real workaday programmers approach them: by iterating through a sequence of progressive changes and enhancements, until we're satisfied that the solution is <a href="http://en.wikipedia.org/wiki/Principle_of_good_enough">Good Enough</a>.</p>
<p>One of the nicest touches of Python's support for regex is <em>verbose mode</em>, which ignores whitespace and allows inline comments so that the regex still makes sense when you have to look at it again in six months. Unless you're parsing text on a daily basis and know regex inside and out, this <em>will</em> be helpful to you.</p>
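<p>Here is roughly what verbose mode looks like in practice. The pattern is a simplified phone-number matcher of my own devising, not the one Pilgrim builds up in the chapter:</p>

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows "#" comments.
phone_pattern = re.compile(r'''
    (\d{3})     # area code
    \D*         # optional separator of any kind
    (\d{3})     # first three digits
    \D*         # optional separator
    (\d{4})     # last four digits
    ''', re.VERBOSE)

match = phone_pattern.search('Call 905-555-0123 today')
parts = match.groups()

print(parts)
```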
<h3>Iterating Through Iteration</h3>
<p><strong>Chapter 6</strong> tackles closures and generators. While Python is not as purely functional as a Lisp or even Ruby, it still provides some powerful functional tools. Pilgrim takes a thorny problem - matching pluralization rules for a variety of English nouns - and combines regular expressions with generators (functions that yield their values lazily and iteratively) and meta-functions (functions that take functions as arguments and return functions).</p>
<p>Pilgrim combines these tools to build generators that act as closures and yield functions based on regex pattern matching. It's heady stuff, but by the time he's finished taking it apart and putting it back together, you can't help but understand it. Along the way, we learn about files and file-like objects, and the powerful <code>with</code> context (about which more below).</p>
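<p>To give a flavour of the approach, here is a condensed sketch in the same spirit. The function names and the rule table are my own reconstruction and should not be read as the book's exact code:</p>

```python
import re

def build_rule(pattern, search, replace):
    """Return a (matches, apply) pair of closures over the three strings."""
    def matches(word):
        return re.search(pattern, word)
    def apply(word):
        return re.sub(search, replace, word)
    return matches, apply

# Hypothetical rule table: (pattern to test, text to replace, replacement).
PATTERNS = (
    ('[sxz]$', '$', 'es'),
    ('[^aeioudgkprt]h$', '$', 'es'),
    ('[^aeiou]y$', 'y$', 'ies'),
    ('$', '$', 's'),
)

def rules():
    # A generator: each pair of functions is built only when it is asked for.
    for pattern, search, replace in PATTERNS:
        yield build_rule(pattern, search, replace)

def plural(noun):
    for matches, apply in rules():
        if matches(noun):
            return apply(noun)

print(plural('fax'), plural('coach'), plural('vacancy'), plural('dog'))
```

<p>Because <code>rules()</code> is lazy, a noun that matches the first pattern never causes the later rules to be built at all.</p>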
<p><strong>Chapter 7</strong> dives deeper, tackling the same problem using classes and iterators. Everything in Python is an object, with properties and methods. In this chapter, Pilgrim explains how you can define your own classes and instantiate them as objects - in this case, an iterator class.</p>
<p>He covers Python's special reserved word <code>pass</code>, which does nothing but is necessary for those occasions when Python syntactically requires something - what Pilgrim calls "a Python reserved word that just means, 'move along, nothing to see here'."</p>
<p>He covers Python's special <code>__init__()</code> method, distinguishing it semantically from the C++ constructor method; the special <code>self</code> argument in Python classes; instantiating classes; instance variables; passing in parameters and calling methods.</p>
<p>Then he shows how to create an iterator class and revisits the pluralization code from the previous chapter to produce the same results in a more terse, abstract form.</p>
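<p>The iterator protocol itself is small enough to show in a few lines. This sketch uses a hypothetical <code>Countdown</code> class of my own, not the class from the chapter:</p>

```python
class Countdown:
    """Iterator that counts down from start to 1 (illustrative only)."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Called once when iteration begins; resets the state.
        self.current = self.start
        return self

    def __next__(self):
        # Called for each item; StopIteration signals the end.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

values = list(Countdown(3))
print(values)
```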
<p>Aside: when I first discovered Python, I was delighted to discover that it appeared to be lists all the way down. Pilgrim argues by way of contrast that it's actually "iterators all the way down."</p>
<p><strong>Chapter 8</strong> indulges in a bit of whimsy by way of introducing the <code>itertools</code> library in the context of an extremely terse (just 14 lines!) Alphametics solver. Pilgrim dives a little deeper into regular expressions, introducing the <code>findall()</code> method and highlighting another pattern matching gotcha (it doesn't return overlapping matches). He also shows how you can use the new <code>set</code> datatype to extract the unique items in a list.</p>
<p>We also learn how to make assertions and catch <code>AssertionError</code>s with the terse, one-line syntax you may have come to expect from Python by now.</p>
<p>Speaking of which, Pilgrim revisits the generator paradigm with a one-line generator expression. If you prefer not to iterate, you can pass the generator right into a <code>tuple()</code>, <code>list()</code> or <code>set()</code> function.</p>
<p>An important paradigm of functional programming is the use of <em>lazy evaluation</em>, in which the values in an iterable object are calculated on the fly as needed rather than all at once.</p>
<p>Python comes equipped with the powerful <code>itertools</code> module, which includes functions for generating permutations, products, combinations, chains, groups, and more.</p>
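<p>A brief sketch of generator expressions and <code>itertools</code> working together. The inputs are invented for the example:</p>

```python
import itertools

# All ordered pairs drawn from three letters.
pairs = list(itertools.permutations('abc', 2))

# A generator expression over an infinite counter; nothing is computed
# until islice() asks for the first three values.
squares = (n * n for n in itertools.count(1))
first_three = tuple(itertools.islice(squares, 3))

# A generator passed straight into set() to keep only the unique items.
unique_letters = set(letter for letter in 'alphametics')

print(len(pairs), first_three, sorted(unique_letters))
```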
<p>Along the same lines, Pilgrim shows how to treat a file as an iterable list (most things in Python are iterable), with all the methods and functions available to manipulate lists.</p>
<p>The chapter closes with a vintage Pilgrim dive when he iterates through the chain of problems / solutions / new problems related to Python's <code>eval()</code> function. <code>eval()</code> is powerful - and <em>dangerous</em>, since it can run arbitrary code that does anything Python can do (<code>subprocess.getoutput('rm-rf')</code> anyone?).</p>
<blockquote>
<p>Say it with me: "<code>eval()</code> is evil!"</p>
</blockquote>
<p>It turns out that there is no truly reliable, safe way to expose <code>eval()</code> to untrusted third parties.</p>
<h3>Unit Testing</h3>
<p><strong>Chapter 9</strong> introduces unit testing. If you unit test, you'll appreciate the value of a powerful, simple testing framework; if you don't unit test, you may well close this paragraph a convert.</p>
<p>Python, of course, comes with a <code>unittest</code> module that lets you subclass simple, robust test suites to prove, empirically, that your code does what it's supposed to do (at least on a micro level).</p>
<p>Using <code>unittest</code> and the code from Chapter 5 to generate Roman Numerals, Pilgrim takes you through testing, debugging and (in <strong>Chapter 10</strong>), safe refactoring. </p>
<p>His philosophy is simple and elegant:</p>
<ol>
<li>Write tests.</li>
<li>Write code.</li>
<li>Test code.</li>
<li>Debug code.</li>
<li>When all the tests pass, stop coding.</li>
</ol>
<p>Over the course of two chapters, you'd be hard pressed not to come away persuaded that Pilgrim's onto something here.</p>
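<p>To make that concrete, here is a minimal sketch of a test case in the <code>unittest</code> style. The <code>to_roman()</code> function is a hypothetical stand-in that I wrote for this example; the book develops its own version step by step:</p>

```python
import unittest

NUMERALS = ((1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
            (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
            (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I'))

def to_roman(n):
    """Convert an integer in the range 1..3999 to a Roman numeral."""
    if not 0 < n < 4000:
        raise ValueError('number out of range (must be 1..3999)')
    result = ''
    for value, numeral in NUMERALS:
        while n >= value:
            result += numeral
            n -= value
    return result

class ToRomanTest(unittest.TestCase):
    def test_known_values(self):
        self.assertEqual(to_roman(4), 'IV')
        self.assertEqual(to_roman(1994), 'MCMXCIV')

    def test_out_of_range(self):
        self.assertRaises(ValueError, to_roman, 0)
        self.assertRaises(ValueError, to_roman, 4000)

# Run the suite programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToRomanTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

<p>In a real project you would more likely end the file with <code>unittest.main()</code> and let the test runner report for you.</p>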
<h3>File I/O</h3>
<p><strong>Chapter 11</strong> dives deep into files. I/O is central to computing, and Pilgrim has touched on files and file-like objects several times so far. In this chapter, files are the main attraction.</p>
<p>In Python 3, the <code>open()</code> function takes an <code>encoding</code> argument so that Python knows how to decode the sequence of bytes that is a file:</p>
<blockquote>
<p>Bytes are bytes; characters are an abstraction. A string is a sequence of Unicode characters. But a file on disk is not a sequence of Unicode characters; a file on disk is a sequence of bytes. So if you read a "text file" from disk, how does Python convert that sequence of bytes into a sequence of characters?</p>
</blockquote>
<p>If you don't specify an encoding, Python 3 uses your system's default encoding (on Windows, for example, that is typically CP-1252). That might be the file's encoding, but it might not - assume at your peril! This is especially true given that Python is cross-platform. Code written on, say, a Linux machine that makes assumptions about file encoding might suddenly fail on a Windows machine.</p>
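<p>The peril is easy to demonstrate. This sketch writes a scratch file to a temporary directory (the file name and contents are arbitrary) and then reads it back with the right encoding and with the wrong one:</p>

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')

# Always say which encoding you mean, for writing as well as reading.
with open(path, 'w', encoding='utf-8') as f:
    f.write('caf\u00e9\n')

with open(path, encoding='utf-8') as f:
    right = f.read()

# Same bytes, wrong assumption: no error, just quietly wrong text.
with open(path, encoding='cp1252') as f:
    wrong = f.read()

print(repr(right), repr(wrong))
```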
<p>Pilgrim then dives into opened files as stream objects, with useful properties and methods, including <code>readline()</code>, <code>seek()</code> and <code>tell()</code>. Stream objects also behave like iterators, yielding a line at a time.</p>
<p>In addition to reading files, Python can write files and append new lines to files. But not all files are text files. Python can also open, read, write and append to binary files, byte by byte.</p>
<p>Further, Pilgrim explains how to use the <code>io</code> library to create and manipulate file-like objects that may not correspond to actual files on disk but still possess file-like properties and methods.</p>
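<p>As a quick sketch of the idea, an in-memory <code>io.StringIO</code> object can stand in for a text file. The sample text is my own:</p>

```python
import io

# A file-like object backed by a string in memory, not by a file on disk.
stream = io.StringIO('first line\nsecond line\n')

first = stream.readline()       # reads up to and including the newline

stream.seek(0)                  # rewind to the beginning
lines = [line.rstrip('\n') for line in stream]   # iterate line by line

print(repr(first), lines)
```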
<p>Of course, Python can also handle compressed files via the <code>gzip</code> module. (The standard Python library is phenomenal in its breadth and usefulness.)</p>
<p>Python is thoroughly cross-platform, but it allows you to use and redirect the stdin, stdout and stderr pipes indigenous to *nix systems. Pilgrim shows you how to hook into these pipes.</p>
<p>Pilgrim advocates using the <code>with</code> context as the safest way to handle files and file-like objects. He notes that Python 3.1 supports multiple nested <code>with</code> contexts (if you have Python 3 installed, take a few minutes to upgrade to 3.1).</p>
<h3>Serialization via XML, Pickle and JSON</h3>
<p><strong>Chapter 12</strong> wrestles XML. Yes, XML is a special flavour of hell, but you <em>will</em> have to deal with it from time to time, and Python has easy, powerful tools to help you. He sketches out a crash course on XML, concluding, "Now you know just enough XML to be dangerous."</p>
<p>The standard library offers <code>xml.etree.ElementTree</code> to parse, walk, search, and iterate (it really <em>is</em> iterators all the way down) through, create, and modify your XML object.</p>
<p>Pilgrim pauses to draw our attention to the <code>lxml</code> module, a third-party library that replicates the ElementTree API but extends it with more powerful search methods.</p>
<p>With matching APIs, your code can prefer the more powerful third party library but fall back on the standard library if the former is not installed.</p>
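<p>The fallback pattern looks roughly like this. The sample document is invented, and the sketch sticks to calls that both libraries provide.</p>

```python
try:
    from lxml import etree                   # third-party, used if installed
except ImportError:
    import xml.etree.ElementTree as etree    # standard library fallback

document = (
    "<feed>"
    "<entry><title>First</title></entry>"
    "<entry><title>Second</title></entry>"
    "</feed>"
)
root = etree.fromstring(document)

# iter() walks the tree and yields every matching element in document order.
titles = [title.text for title in root.iter("title")]
```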
<p>Pilgrim also shows how you can dig into an XML object with bad syntax and recover its data using <code>lxml</code>.</p>
<p>Of course, XML is evil (albeit an often necessary evil), and <strong>Chapter 13</strong> introduces other methods of serializing Python objects. </p>
<p>Sometimes you want to store more than just strings in your data persistence routine, and Python has (wait for it) powerful tools for serializing even complex objects and datatypes - like the standard <code>pickle</code> module, which has <code>dump()</code> and <code>load()</code> methods for storing objects, dicts, lists, and so on.</p>
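<p>A small sketch of a <code>pickle</code> round trip, using an in-memory binary stream in place of a file. The dictionary contents are made up.</p>

```python
import io
import pickle

entry = {"title": "Dive Into Python 3", "chapters": [12, 13], "published": True}

# pickle writes bytes, so the target must be a binary stream.
buffer = io.BytesIO()
pickle.dump(entry, buffer)

buffer.seek(0)
restored = pickle.load(buffer)   # an equal but separate copy of the original
```

<p>One caution worth keeping in mind: unpickling can execute arbitrary code, so only load pickles from sources you trust.</p>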
<p>But <code>pickle</code> is only good for Python programs, and data often has to pass between different applications written in different languages. Fortunately, Python comes equipped to handle <a href="http://www.json.org/">JSON</a>, or JavaScript Object Notation, a data format developed by Douglas Crockford and inspired by javascript object literal syntax (in fact, JSON <em>is</em> valid javascript).</p>
<p>JSON can handle objects, lists and dicts, and the standard <code>json</code> library can handle JSON. You can also extend JSON with custom serializers for unsupported data types, like tuples.</p>
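<p>Here is a sketch of custom serialization using the <code>default</code> and <code>object_hook</code> parameters of the <code>json</code> module. In this example <code>bytes</code> stands in for an unsupported type (the standard encoder writes tuples out as JSON arrays, so they come back as lists). The <code>to_json</code> and <code>from_json</code> helpers and the <code>__class__</code>/<code>__value__</code> marker keys are names chosen for illustration.</p>

```python
import json

def to_json(value):
    # Called by json.dumps only for objects it cannot serialize natively.
    if isinstance(value, bytes):
        return {"__class__": "bytes", "__value__": list(value)}
    raise TypeError(repr(value) + " is not JSON serializable")

def from_json(obj):
    # Called for every decoded JSON object (dict); rebuilds marked values.
    if obj.get("__class__") == "bytes":
        return bytes(obj["__value__"])
    return obj

entry = {"title": "DIP3", "tags": ("python", "review"), "blob": b"\x01\x02"}
encoded = json.dumps(entry, default=to_json)
decoded = json.loads(encoded, object_hook=from_json)
```

<p>After the round trip the bytes value is restored, while the tuple has become a list.</p>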
<p>Pilgrim really knows how to hammer home the gotchas. In his umpteenth reminder to define the encoding of an opened file:</p>
<blockquote>
<p>You'll forget! I forget sometimes! And everything will work right up until the moment that it fails, and then it will fail most spectacularly.</p>
</blockquote>
<h3>Take a REST</h3>
<p><strong>Chapter 14</strong> takes the reader through creating RESTful, HTTP-based web services. (Pilgrim wisely decided not to update the chapter on SOAP web services from the original DIP.) </p>
<p>Python does HTTP via the <code>urllib.request</code> and <code>http.client</code> modules (standard library; <code>http.client</code> was called <code>httplib</code> in Python 2), but Pilgrim recommends the third-party <code>httplib2</code> module instead. It's more complete and clean than the former two. In particular, it supports HTTP caching, last-modified testing, ETags, compression, and 30x redirects.</p>
<p>Pilgrim notes:</p>
<blockquote>
<p><code>urllib</code> speaks HTTP like I speak Spanish - enough to get by in a jam but not enough to hold a conversation. HTTP is a conversation. It's time to upgrade to a library that speaks HTTP fluently.</p>
</blockquote>
<p>Pilgrim shows how to fetch data from a web resource without being rude or inefficient, by accepting compressed data, caching files locally, remembering permanent redirects, and so on.</p>
<p>He also demonstrates RESTful interactions with HTTP GET, POST, PUT and DELETE requests to a web API (in his example, <a href="http://identi.ca">identi.ca</a>, an open source, twitter-like microblogging service).</p>
<p>Important note: <code>httplib2</code> returns bytes, not strings. Yes, the character demon rears its head again. You need to specify an encoding.</p>
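<p>In practice that means an explicit decode step. The sketch below uses a hard-coded bytes value as a stand-in for a response body, so no network access or third-party library is needed; the JSON payload is invented.</p>

```python
import json

# A stand-in for the body of an HTTP response, which arrives as bytes.
content = b'{"city": "Montr\xc3\xa9al"}'

# Decode explicitly; the right encoding normally comes from the
# response's Content-Type header rather than from a guess.
text = content.decode("utf-8")
data = json.loads(text)
```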
<h3>Porting and Packaging</h3>
<p>I've mentioned that DIP3 doesn't always explicitly highlight the differences between Python 2 and 3. <strong>Chapter 15</strong> comes as something of a remedy.</p>
<p>The bytes/characters dichotomy is paramount throughout this book, since it's a huge gotcha that <em>will</em> burn you sooner or later. (It's one of the reasons the Python development team decided to break compatibility to create Python 3 in the first place.) </p>
<p>It's not surprising that Pilgrim ported into Python a library (from Mozilla) that guesses the character encoding of byte sequences. In this chapter, Pilgrim walks the reader through the sometimes painful exercise of porting the library, called <code>chardet</code>, from Python 2 to Python 3.</p>
<p>While the book has noted differences between the versions along the way, it has not served as a systematic guide to the changes. As Chapter 15 shows, there are some important but subtle differences under the hood that will break your code and may be infuriating to track down.</p>
<p>Python includes the <code>2to3.py</code> tool, which automates the automatable aspects of porting, and highlights those issues it couldn't handle.</p>
<p>Of course, every chapter does double- and triple-duty, and this chapter also introduces the structure and internal arrangement of multi-file libraries.</p>
<p>Speaking of which, <strong>Chapter 16</strong> covers packaging Python libraries for distribution. If nothing else, Pilgrim gets an <em>A+++++ Would Definitely Read Again</em> for demystifying the <code>distutils</code> library and unlocking the PyPI online catalogue. </p>
<p>I've always found the documentation on packaging to be arcane and inaccessible, and this chapter is a much-appreciated remedy.</p>
<h3>Conclusion</h3>
<p>DIP3 is not suitable as an introduction to programming, since Pilgrim assumes you bring a background in programming concepts to the table; but it's perfect for busy programmers looking to gain mastery in Python 3.</p>
<p>Over the course of a witty, compelling narration, the book repeatedly hammers both Python 3's most important strengths (terse syntax, rich datatypes, everything-is-an-object, abundant iteration, powerful libraries) and most dangerous pitfalls (bytes vs. characters).</p>
<p>Pilgrim wastes no more time than is strictly necessary on exposition. We program to solve problems, and DIP3 keeps problem solving - practical, real-world problems - in the driver's seat throughout.</p>
<p>Notwithstanding a few minor quibbles, I can heartily recommend this book to anyone who wants to tackle Python 3. Since reading it, I find myself thinking about my own modest Python applications - sooner or later I'm going to have to cut the cord and make the jump to Python 3. </p>
<p>After reading <em>Dive Into Python 3</em>, that point seems closer than ever.</p>
<h3>Reference</h3>
<p>Mark Pilgrim, <em>Dive Into Python 3</em>, Apress, 2009. </p>
<p>Text copyright © 2009 by Mark Pilgrim. Licenced under the <a href="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution Share-Alike</a> licence. </p>
<p><em>DIP3</em> is <a href="http://diveintopython3.org/">available for download</a> in HTML or PDF; or you can clone the document repository:</p>
<pre><code>you@localhost:~$ hg clone http://hg.diveintopython3.org/ diveintopython3
</code></pre>
Ryan McGreal
2
http://quandyfactory.com/blog/41/top_10_programming_lessons_in_10_years
2010-01-25T12:00:00Z
Top 10 Programming Lessons in 10 Years
<p>After reading <a href="http://www.dcs-media.com/Archive/20-20-top-20-programming-lessons-ive-learned-in-20-years-FI">Top 20 Programming Lessons I've Learned in 20 Years</a> by Jonathan Danylko, a recent article <a href="http://news.ycombinator.com/item?id=1049890">featured on Hacker News</a>, I was inspired to come up with my own list of programming lessons.</p>
<p>These lessons are pretty basic, but they have served me well over the years. When I find myself getting lost in a project, it helps to go back to the basics and get some perspective. </p>
<p>I have learned most of these lessons the hard way, i.e. by not following them and getting burned. In some cases, I have been burned more than once. (What can I say? I'm a slow learner.) </p>
<p>On the other hand, following every lesson consistently always leads to positive outcomes. Many of these lessons seem like additional work, but my experience has always been that they save time and aggravation over the life of the project. A little investment of effort up front can prevent a lot of catch-up work down the road.</p>
<p>So here's my top 10 (plus one bonus) list of programming lessons I've learned over the last ten years.</p>
<h3>Break it down.</h3>
<p>A whole project - even an ostensibly simple one - is overwhelming to contemplate and leads to defensive <a href="/blog/1/productivity_and_procrastination">procrastination</a>. Always take the time at the beginning to break the project into manageable components, even if it seems like a waste of time. You'll be glad you did when a week or two goes by, the project scope has expanded, and you're starting to lose sight of the big picture. </p>
<p>With a list of components, you don't have to worry about the big picture - you just work through the steps until complete. Of course, this means you must be sure to review and revise the list of steps as your requirements change.</p>
<h3>Your requirements will change.</h3>
<p>Accept it. Embrace it. Understand that the final product will be better if you're willing to change your mind when the facts change. </p>
<p>It's generally impossible to know, in advance, exactly what the end product is going to look like. The harder you try to detail the end product in absolute, unchanging terms, the more wrong you will end up being.</p>
<h3>Obey Gall's Law.</h3>
<p>When coding a project, take the shortest path to <a href="http://en.wikipedia.org/wiki/Gall%27s_law" title="A complex system that works is invariably found to have evolved from a simple system that worked.">a simple system that works</a>, even if it's only a fractional subset of the total project requirements. Then incrementally add functionality, testing as you go, until the project is completed.</p>
<h3>Document as you go.</h3>
<p>This includes both code comments and external documentation for application and/or API users. Every time you add or change a feature, update the documentation to reflect the change. Just make it a core part of your workflow. </p>
<p>If you wait until you've finished coding to start documenting, it will end up looking forced, rushed, and incomplete. Ideally, include your documentation right inside your version control.</p>
<h3>Use version control.</h3>
<p>Disk space is cheap and abundant, so commit early and commit often. Make sure your commit comments reflect what you're capturing in each snapshot. </p>
<h3>Maintain separate development and production environments.</h3>
<p>Never you mind that 'quick fix' that will just take a moment to type out. Put it in development, test it, commit the change, and then push to production. Do this every time, and you won't end up looking like an asshole when something goes wrong.</p>
<h3>Backup and restore.</h3>
<p>This should be a no-brainer, so don't be one of those people who has to <a href="http://superuser.com/questions/82036/recovering-a-lost-website-with-no-backup" title="Coding Horror, indeed!">learn the hard way</a>. You need a proper, reliable, redundancy-tested offsite backup, and <em>you need to test regularly</em> that you can restore from your backups. </p>
<p>No, a RAID array is not a backup solution. Nor is blindly trusting your hosting provider to do their job.</p>
<h3>Leave your code in a working state at the end of every day.</h3>
<p>Don't walk out on code that won't run. It's surprisingly demoralizing to come to work the next day knowing that a broken build is waiting for you. If you have to roll back or comment out a half-finished code block, do it. </p>
<p>In addition, it's helpful to draft up a quick todo list so you can quickly pick up where you left off the next day. (Thanks to <a href="http://news.ycombinator.com/item?id=1050652" title="Write out your mental cache at the end of the day.">bsaunder</a> on Hacker News for this suggestion.)</p>
<h3>If you can't figure out a problem, walk away.</h3>
<p>Cognitive science tells us that our brains protect us from the stress of not knowing how to solve a problem by ... hiding key information about the problem from consciousness. Thanks, brain. </p>
<p>For small to medium-sized problems, I find a good brisk walk is enough to break the logjam in my mind and see through to a solution. For big problems, I may have to pull out the big guns: a good night's sleep. </p>
<p>I'm not being silly or even exaggerating: <em>sleep on it</em> is an essential tool of my problem solving strategy, a tool on which I rely regularly, and which has yet to fail me.</p>
<h3>Fix bugs first.</h3>
<p>Don't add new features while any identified bugs are still outstanding. When part of your system doesn't work, the last thing you want to do is introduce additional complexity. </p>
<p>When troubleshooting bugs, try to be scientific about it. Don't just randomly make changes and hope the bug will go away - this is a recipe for introducing more bugs. Make sure you come to understand what's causing the bug so that you can be confident your fix really is a fix. (Thanks to <a href="http://news.ycombinator.com/item?id=1050722" title="Understand what was causing the bug once it's fixed.">bendtheblock</a> on Hacker News for this suggestion.)</p>
<h3>Communicate, communicate, communicate.</h3>
<p>When you're developing a project for someone, keep the channels open at all times. Clearly communicate your timelines, progress and difficulties and provide regular status reports. </p>
<p>If you realize something is going to take longer than you expected, resist the urge to go into hiding! Instead, notify them as early as possible. Likewise, if the client comes back to you with changes that impact your time lines, be very clear in communicating what those impacts will be. </p>
<p>Finally, though it may seem like a hassle, make yourself available to answer questions and provide demonstrations. If you followed #3 and #8, you will always have <em>something</em> to show.</p>
Ryan McGreal
2
http://quandyfactory.com/projects/40/pytoc
2010-01-19T12:00:00Z
PyToc
<h3>Introduction</h3>
<p>PyToc generates a table of contents for an HTML document based on headings, with anchor links from the TOC to specific headings.</p>
<h3>Download</h3>
<p>You can download the latest version of PyToc from its github repository:</p>
<ul>
<li><a href="http://github.com/quandyfactory/PyToc">http://github.com/quandyfactory/PyToc</a></li>
</ul>
<h3>Requirements</h3>
<ul>
<li>Python 2.5 or 2.6</li>
<li><a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a> 3.0.8</li>
</ul>
<h3>Using PyToc</h3>
<p>You can see the code in action on my website's <a href="http://quandyfactory.com/about/">About</a> page.</p>
<p>It's pretty simple to use. <a href="http://github.com/quandyfactory/PyToc">Download</a> pytoc.py and save it somewhere in your PATH. </p>
<p>Here's a demonstration:</p>
<pre><code>>>> import urllib
>>> import pytoc
>>> url = 'http://quandyfactory.com/projects/40/pytoc'
>>> page = urllib.urlopen(url)
>>> html = page.read()
>>> toc = pytoc.Toc(html_in=html)
>>> toc.make_toc()
True
>>> toc.html_toc # returns an HTML table of contents
>>> toc.html_out # returns the html with anchors and numbering in headings
>>> toc.toc_list # returns a list of tuples in the form (section number, title)
</code></pre>
<h4>Input Properties</h4>
<p>The following are input properties you enter to generate the table of contents.</p>
<ul>
<li><p>html_in - The HTML document for which you want to generate a table of contents.</p>
<p>This is the only necessary property to assign. The rest have default values that may meet your needs.</p></li>
<li><p>levels - A list of numbers corresponding to the heading levels you want to include in your TOC.</p>
<p>E.g. [3, 4] would include <code><h3></code> and <code><h4></code> headings.</p>
<p>Default is [3, 4].</p></li>
<li><p>id - The base id of the HTML table of contents to be generated.</p>
<p>Default is "toc".</p></li>
<li><p>title - The title of the generated table of contents.</p>
<p>Default is "Contents".</p></li>
</ul>
<h4>Methods</h4>
<ul>
<li><p>make_toc() - this generates the table of contents and populates the output properties. </p>
<p>Returns True when complete.</p></li>
</ul>
<h4>Output Properties</h4>
<p>After calling the make_toc() method, the following output properties are populated with values.</p>
<ul>
<li><p>html_out - The same as html_in except with the TOC anchors and numbering included in the headings.</p></li>
<li><p>html_toc - The generated HTML table of contents.</p></li>
<li><p>toc_list - A list of tuples containing the anchors and headings, in case you would rather roll your own HTML table of contents.</p></li>
</ul>
<p>That's it, really.</p>
<h3>History</h3>
<p>This library started out as one-off code to generate a table of contents for a long document that I'm converting from MS Word format over to HTML. </p>
<p>The existing Word document has had many contributors and editors over the years and the format is a shambles. The table of contents is a mess and the headings are all over the place. (Thanks, WYSIWYG.) </p>
<p>I converted the whole thing into <a href="http://daringfireball.net/projects/markdown/" title="John Gruber's Markdown">Markdown</a>, a simple plain-text formatting syntax that converts to clean, structural HTML. </p>
<p>I still wanted the final document to have a table of contents, but I didn't want to have to go through the bother of maintaining the thing - especially if I ever wanted to add a new section in the middle, which would require a re-numbering of all the subsequent sections and subsections.</p>
<p>I whipped up a simple parser that walked the document and dynamically generated a table of contents, plus anchors in the document so the section headings listed in the contents could link straight down to the sections themselves.</p>
<h3>Need for Flexibility</h3>
<p>It worked, but the code was brittle. It required the HTML formatting to be very strict, e.g. the following would work:</p>
<pre><code><h3>Some subheading title</h3>
</code></pre>
<p>but the following would not work:</p>
<pre><code><h3>
Some subheading title
</h3>
</code></pre>
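<p>To illustrate that kind of brittleness, here is a hypothetical strict pattern (not the original PyToc code) that handles the first form but misses the second:</p>

```python
import re

# A strict pattern: the whole heading must sit on one line.
STRICT_HEADING = re.compile(r"<h3>(.+)</h3>")

one_line = "<h3>Some subheading title</h3>"
multi_line = "<h3>\nSome subheading title\n</h3>"

strict_hits = STRICT_HEADING.findall(one_line)       # finds the title
strict_misses = STRICT_HEADING.findall(multi_line)   # finds nothing: '.' does not match newlines
```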
<p>Documents generated using <a href="http://code.google.com/p/python-markdown2/">python-markdown2</a> would work pretty consistently, but documents generated using, say, <a href="http://www.freewisdom.org/projects/python-markdown/">python-markdown</a> would not, since the latter produces messier HTML output. </p>
<p>Of course, with documents produced using other means, all bets were off.</p>
<p>Anyway, I decided that if this was going to be at all useful as a general-purpose tool, it needed to be more flexible and forgiving. Of course, if you're programming in python, parsing HTML and want to be flexible and forgiving, there's no better tool than the mighty <a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a> parsing library.</p>
<p>So I re-wrote it using BeautifulSoup.</p>
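<p>The sketch below shows the general idea of tolerant heading extraction. It is not PyToc's implementation: it uses the standard library's <code>html.parser</code> as a stand-in for BeautifulSoup so that it runs without extra installs (and on Python 3), and the <code>HeadingCollector</code> class is a name invented for the example.</p>

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collects the text of selected headings, however they are laid out."""

    def __init__(self, levels=(3, 4)):
        super().__init__()
        self.tags = {"h%d" % level for level in levels}
        self.current = None    # heading tag currently open, if any
        self.parts = []        # text fragments of the open heading
        self.headings = []     # (tag, title) pairs found so far

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.current = tag
            self.parts = []

    def handle_data(self, data):
        if self.current:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            # Collapse line breaks and runs of whitespace inside the heading.
            title = " ".join("".join(self.parts).split())
            self.headings.append((tag, title))
            self.current = None

collector = HeadingCollector()
collector.feed("<h3>\nSome subheading title\n</h3><p>Body</p><h4>Detail</h4>")
collector.close()
headings = collector.headings
```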
Ryan McGreal
2
http://quandyfactory.com/site/13/about
2010-01-11T12:00:00Z
About
<h3>Introduction</h3>
<p>My name is Ryan McGreal, and I live in Hamilton, Ontario, Canada with my family. I work as a web programmer, writer, editor and troublemaker, though it's the programming that pays the bills. </p>
<h4>Raise the Hammer</h4>
<p>My principal activity that <em>doesn't</em> pay the bills is my role as editor of <a href="http://raisethehammer.org">Raise the Hammer</a>, an online magazine dedicated to sustainable urban revitalization in Hamilton. I also work as the city editor at <a href="http://hmag.ca">H Magazine</a>.</p>
<h4>Hamilton Light Rail</h4>
<p>I am also a proud founding member of <a href="http://hamiltonlightrail.com">Hamilton Light Rail</a>, a community group dedicated to bringing light rail transit to Hamilton.</p>
<h4>Published Essays</h4>
<p>I have written several essays on urban issues that have been published in the <em>Hamilton Spectator</em> and elsewhere over the past five years.</p>
<h4>Contact</h4>
<p>You can reach me via email at <a href="mailto:ryan@quandyfactory.com">ryan@quandyfactory.com</a>.</p>
<h3>This Site</h3>
<p>This is my personal website, repository of essays and projects, and playground for new ideas. </p>
<p><em>Quandy</em> is a portmanteau of "Quick and Dirty", which can be a useful method of approaching problems. "Quick and dirty" has the benefit of being, well, quick, as well as flexible for those cases when initial requirements end up changing (i.e. just about every nontrivial project). </p>
<p>It suggests an iterative approach, on the reasoning that it's easier to build something simple and then make it better than it is to try and spring a fully-formed application from your forehead. </p>
<p>As John Gall famously stated: </p>
<blockquote>
<p>A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work.</p>
</blockquote>
<h3>Eye Quandy</h3>
<h4>Quandy Logo</h4>
<p>The red Quandy logo in the top left corner is courtesy of <strong>Trevor Shaw</strong>, a great local graphics designer and the creative director of <a href="http://www.getjuice.ca/">Juice Creative</a>.</p>
<h4>Footer Image</h4>
<p>The awesome cityscape panorama in the footer was taken by the talented photographer and amateur urbanist <strong>Aaron Segaert</strong>, and is used with permission.</p>
<h3>Interests</h3>
<p>In recent years I have been particularly interested in: the nature of city economies and urban development; the role of public participation and community engagement in creating and sustaining a healthy society; and ways to increase the openness, transparency and responsiveness of organizational governance and policy making.</p>
<h4>Conceptual Overlap</h4>
<p>I admit that my ideas about openness in government and policy making reflect my experience using and developing software: an open, information-sharing approach with peer review produces better results than a closed, proprietary approach based on blind trust.</p>
<h4>Jack of All Trades</h4>
<p>My interests take me all over the place, figuratively, from land use patterns and transportation modes to the global energy situation, geopolitics, social policy, economics and political economy, democratic structures and traditions, broad-based community organizing, local politics and current affairs, architecture, city life, ecology, sustainability, cognitive psychology, and more.</p>
<p>I don't claim expertise in any of these areas, but I am committed to studying the experts and following empirical best practices in these domains. </p>
<h4>Benefit from Shared Expertise</h4>
<p>The great thing about living in an open, knowledge-based culture is that you can benefit from the expertise of others. Once you establish the credibility of expertise, you can use it as a kind of knowledge <a href="http://en.wikipedia.org/wiki/Application_programming_interface">API</a> that allows you to take advantage of the expertise without necessarily knowing everything about the internals.</p>
<p>If not for this ability for non-experts to access expertise, there would be no way for the benefits of that expertise to disseminate into the broader society and inform our policy decisions.</p>
<h3>Programming</h3>
<p>I enjoy programming and have benefited immensely from the vast, rich ecosystem of free and open source software available to programmers today (see "Technical Notes", below).</p>
<h4>Great Time for OK Coders</h4>
<p>I know enough about great programmers and their remarkable contributions to understand that I am not a great programmer. Nevertheless, the rich ecosystem of programming languages, libraries, frameworks and tools means even a duffer like me can be creative and productive - and that's a <a href="http://c2.com/cgi-bin/wiki?GoodThing">Good Thing</a>.</p>
<h4>Productive Modern Languages</h4>
<p>One of the great things about modern programming languages is how highly expressive they are. You can create working code very efficiently, with a minimum of boilerplate. </p>
<p>That means it's easy to develop simple tools that do exactly what you want them to do and no more - and to do them quickly.</p>
<h4>Shared Open Source Software</h4>
<p>Recently I have begun releasing a few such handy tools under a free software / open source licence. You can find my shared resources hosted on <a href="http://github.com/quandyfactory">GitHub</a>. </p>
<p>The code isn't beautiful, but I'm <a href="http://code.google.com/events/io/sessions/MythGeniusProgrammer.html">no genius</a>.</p>
<h3>Website Development</h3>
<p>I do a bit of freelance web application development. Feel free to contact me via email at <a href="mailto:ryan@quandyfactory.com">ryan@quandyfactory.com</a> to inquire about services and rates.</p>
<h4>Outsourced Graphic Design</h4>
<p>I am not a graphic designer, and my own website design tends to the very minimal. However, I do have a good working relationship with a talented graphic designer who can design the layout and colour scheme to reflect your organization.</p>
<p>I am also happy to work with a design that you provide.</p>
<h3>Technical Notes</h3>
<p>There's no particularly good reason why I didn't simply use WordPress or Drupal or some other off-the-shelf blogging software for this site; except that I enjoy building things (also, PHP makes the baby Jesus cry). </p>
<p>Anyway, it's not like I built the site from scratch.</p>
<ul>
<li><p>It runs on an <a href="http://www.apache.org/">Apache</a> web server on a <a href="http://www.linux.org/">Linux</a> machine hosted by the awesome admins at <a href="http://webfaction.com?affiliate=hammertime">WebFaction</a>. </p></li>
<li><p>It is written in the <a href="http://python.org">Python programming language</a> and uses the lightweight <a href="http://webpy.org">web.py</a> application development framework. </p></li>
<li><p>Web.py talks to Apache via the <a href="http://code.google.com/p/modwsgi/">mod_wsgi</a> server module, which implements the standard <a href="http://www.wsgi.org/wsgi/">Web Server Gateway Interface (WSGI)</a> specification for Python applications to communicate with web servers.</p></li>
<li><p>The site stores its documents in a <a href="http://dev.mysql.com/">MySQL</a> database, to which it connects via the ingenious <a href="http://www.sqlalchemy.org/">SQLAlchemy</a> database toolkit and object-relational mapper (ORM).</p></li>
<li><p>Documents are saved in <a href="http://daringfireball.net/projects/markdown/">Markdown</a> syntax and converted to HTML for display using the <a href="http://code.google.com/p/python-markdown2/">python-markdown2</a> library (which is itself a re-implementation of the original <a href="http://www.freewisdom.org/projects/python-markdown/">python-markdown</a> library).</p></li>
<li><p>It also uses <a href="/projects/5/quandy">Quandy</a>, a library of handy classes and functions that I use frequently in writing web code. </p></li>
</ul>
<p>In other words, I'm sitting here on the shoulders of giants - and the view is grand!</p>
Ryan McGreal
2
http://quandyfactory.com/blog/39/the_virtue_of_forgiving_html_parsers
2010-01-08T12:00:00Z
The Virtue of Forgiving HTML Parsers
<p>Today Hacker News <a href="http://news.ycombinator.com/item?id=1039353" title="View-Source Is Good? Discuss. on Hacker News">featured</a> an essay by Alex Russell of the Dojo javascript toolkit in which he mused on <a href="http://alex.dojotoolkit.org/2010/01/view-source-is-good-discuss/" title="View-Source Is Good? Discuss.">the virtues of view-source</a> as they apply to the internet. </p>
<p>Since browsers render HTML, javascript and CSS from plain text, it's possible not only to see the finished product - a rendered web page - but also its underlying source code. </p>
<p>Every browser I've ever seen includes an option to <em>view source</em> - to see the underlying HTML markup, javascript code and style sheet formatting that the browser uses to render a web page.</p>
<p>Russell makes the important point that it was the ability to <em>view source</em> - especially in the early days of the internet - that made web content easily accessible to anyone who took the time to read the code and to experiment with it to discover how it works.</p>
<blockquote>
<p>View-source provides a powerful catalyst to creating a culture of shared learning and learning-by-doing, which in turn helps formulate a mental model of the relationship between input and output faster. Web developers get started by taking some code, pasting it into a file, loading it in a browser and switching between editor and browser between even the most minor changes. </p>
<p>This is a stark contrast with other types of development, notably those that impose a compilation step on development, in which the process of seeing what was done requires an intermediate action. </p>
<p>In other words, immediacy of output helps build an understanding of how the system will behave, and <code>ctrl-r</code> becomes a seductive and productive way for developers to accelerate their learning in the copy-paste-tweak loop. </p>
<p>The only required equipment is a text editor and a web browser, tools that are free and work together instantly. That is to say, there's no waiting between when you save the file to disk and when you can view the results. It's just a <code>ctrl-r</code> away. [paragraph breaks added]</p>
</blockquote>
<p>On reading this, a parallel observation occurred to me: <em>The power of view-source is multiplied when generous parsers forgive errors and render anyway.</em> </p>
<p>The value of a <em>big, broad</em> internet is far greater than the value of a <em>clean, pure</em> internet.</p>
<p>Most programmers will agree that programs or data streams with malformed syntax should fail, and fail fast. The worst thing that can happen with a software application - particularly a complex one - is for errors to pass silently while malformed data flows into data storage, only to produce gibberish - or worse, subtly wrong results - on export.</p>
<p>Passing a string into a function that expects an integer should throw an exception; passing a set of four values into a method that expects three parameters should raise a red flag; and so on.</p>
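<p>A minimal sketch of that fail-fast behaviour in Python. The <code>add_days</code> function is a made-up example.</p>

```python
def add_days(count, extra):
    # Reject anything that is not an integer instead of guessing at intent.
    if not isinstance(count, int) or not isinstance(extra, int):
        raise TypeError("add_days() expects two integers")
    return count + extra

try:
    add_days("3", 4)            # a string where an integer is expected
except TypeError as error:
    message = str(error)

try:
    add_days(1, 2, 3, 4)        # four values for a two-parameter function
except TypeError:
    too_many_rejected = True
```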
<p>By contrast, the HTML parsers in every browser are extremely forgiving. If an HTML parser encounters an opening tag for an element that is missing a corresponding closing tag, it just accepts the missing close tag as implied and renders as if it was there. </p>
<p>Likewise, if the parser encounters a malformed or non-standard element, it will either try to render it somehow by guessing at the code's intention (naturally, different parsers approach such matters differently) or will just ignore it completely and move onto the next line.</p>
<p>As a result, even badly malformed HTML can still produce a readable web page.</p>
<p>Purists bristle at this, insisting that it makes the web a worse place by forcing browser makers to code for errors, encouraging sloppy coding practices, causing the same content to be rendered differently on different browsers, and so on.</p>
<p>They're missing the point. <em>The virtue of forgiving parsers is that they vastly increase the pool of people able and willing to create web content.</em></p>
<p>If you're already a programmer, HTML syntax is easy enough to understand and produce in valid form.</p>
<p>However, most people who create web content aren't programmers. That was particularly true during the early days of the internet, when it grew exponentially and established the virtuous cycle of positive network externalities that ultimately dragged more professional developers onto that platform.</p>
<p>Rather, they were amateur enthusiasts exploring a new technological domain. Thanks to HTML, view-source and forgiving parsers, the number of people who could create web pages was vastly higher than the number of people who could write computer programs.</p>
<p>It was the rapid democratization of HTML made possible by view-source and forgiving parsers that accounts for much of its success as a language - and for the success of the internet as a platform.</p>
<p>When an HTML parser finds code so bad that it can't render it, the parser just skips it and moves to the next line, in the manner of VB's <code>on error resume next</code> (programmers are welcome to cringe here). Contrast the stricture of XML parsers, which are obliged to fail on encountering malformed code and produce no output at all.</p>
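<p>The contrast can be sketched with Python's standard library. The tolerant <code>html.parser</code> here is only a stand-in for a browser's HTML parser, which does far more recovery work, and the malformed sample markup is invented.</p>

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Unclosed <p> and <b> tags: sloppy HTML, and not well-formed XML.
malformed = "<p>First paragraph<p>Second <b>bold paragraph"

class TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

# The tolerant parser reports whatever text it can find.
collector = TextCollector()
collector.feed(malformed)
collector.close()
recovered = "".join(collector.chunks)

# The XML parser refuses the same input outright.
try:
    ET.fromstring(malformed)
    xml_accepted = True
except ET.ParseError:
    xml_accepted = False
```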
<p>Since HTML rendering in response to an HTTP GET request is essentially <a href="http://en.wikipedia.org/wiki/Idempotence" title="Idempotence on Wikipedia">idempotent</a>, there's no real harm in continuing to parse code after encountering an error - but the positive network effects from this are huge.</p>
<p>If the ability to produce well-formed, valid markup had remained an arcane skill possessed only by programmers, HTML would never have experienced the early growth that transformed it into the reigning standard of a huge and growing public network.</p>
<p>One more thing: HTML provided a gently sloping pathway into programming for many people who might otherwise never have managed to overcome the steep barriers to entry of, say, C, with its verbose syntax and elaborate requirements. (Compiler? What's a compiler? And why does it hate me?)</p>
<p>Many programmers today (myself included) found their way into programming from a start in HTML, after bumping into the limitations of static content, exploring event handling in JavaScript, and then making the jump over to PHP or classic ASP and thence to SQL - and ultimately over to more modern languages with more robust, structured software design principles baked into them than the spaghetti code that powered a lot of early dynamic websites.</p>
<p>Again, there are some people who consider this to be a terrible thing - a watering down of an industry that <em>ought</em> to have high barriers to entry. Setting aside my own interest in the matter, I disagree. Getting more computing and networking capability directly into the hands of more people can only increase the rate of technical innovation by deploying more expressive power more widely.</p>
Ryan McGreal
2
http://quandyfactory.com/projects/32/stack_trace_for_hnshpy_on_proxy
2010-01-07T12:00:00Z
Stack Trace for hnsh.py on proxy
<h3>The Stack Trace</h3>
<p>Here's the original stack trace (Python 2.6 on a Windows XP machine behind a proxy server) for <a href="http://scottjackson.org/software/hnsh/">Hacker News Shell</a>.</p>
<pre><code>C:\Python26>python hnsh.py
Traceback (most recent call last):
  File "hnsh.py", line 558, in <module>
    hnsh = HackerNewsShell()
  File "hnsh.py", line 314, in __init__
    self.stories = self.h.getLatestStories(self.alreadyReadList)
  File "hnsh.py", line 194, in getLatestStories
    source = self.getSource("http://news.ycombinator.com")
  File "hnsh.py", line 35, in getSource
    f = urllib2.urlopen(url)
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python26\lib\urllib2.py", line 389, in open
    response = self._open(req, data)
  File "C:\Python26\lib\urllib2.py", line 407, in _open
    '_open', req)
  File "C:\Python26\lib\urllib2.py", line 367, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 1140, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\Python26\lib\urllib2.py", line 1115, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 11001] getaddrinfo failed>
</code></pre>
<h3>Update 1: Proxies With Urllib2</h3>
<p>After prompting the user for their proxy server, you'll probably want to do something like this:</p>
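<pre><code>import urllib2

# Example: proxies is a dict mapping each protocol to its proxy URL
proxies = { 'http': 'http://proxy.domain.com:80' }
proxy_support = urllib2.ProxyHandler(proxies)
opener = urllib2.build_opener(proxy_support)
# Install the opener globally so that urlopen() goes through the proxy
urllib2.install_opener(opener)
# url is the address being fetched, as in getSource()
f = urllib2.urlopen(url)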
</code></pre>
<p>This is untested, but it should work more or less as written.</p>
<h3>Update 2: If No Proxy</h3>
<p>Note also that you can pass an empty <code>proxies</code> dict in the above code if the internet connection is direct:</p>
<pre><code>proxies = {}
</code></pre>
<p>This way, you can use the same code to connect to the URL whether or not the user is behind a proxy server.</p>
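<p>The code above targets Python 2's <code>urllib2</code>. For readers on Python 3, the same idea can be sketched with <code>urllib.request</code>, which provides the equivalent <code>ProxyHandler</code> and <code>build_opener</code>. The <code>make_opener</code> helper is a hypothetical name for this example, not part of hnsh.py, and the proxy address is the same placeholder used above.</p>

```python
import urllib.request


def make_opener(proxies):
    """Build an opener that uses the given proxies dict.

    An empty dict means "connect directly": because it is passed
    explicitly, ProxyHandler does not fall back to system proxy settings.
    """
    proxy_support = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(proxy_support)


# Placeholder proxy address, as in the example above.
behind_proxy = make_opener({"http": "http://proxy.domain.com:80"})
direct = make_opener({})

# Either opener is then used the same way (not run here, it needs a network):
# response = behind_proxy.open("http://news.ycombinator.com")
```

<p>Returning the opener instead of installing it globally keeps the choice of proxy local to the caller; <code>urllib.request.install_opener(opener)</code> is still available if the global behaviour is wanted.</p>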
<h3>Update 3: hnsh.py on GitHub</h3>
<p>Scott Jackson, author of hnsh.py, moved the code into git and posted the repository on GitHub:</p>
<ul>
<li><a href="http://github.com/scottjacksonx/hnsh">http://github.com/scottjacksonx/hnsh</a></li>
</ul>
<p>Check this out from the <a href="http://github.com/scottjacksonx/hnsh/blob/master/hnsh.py">source code</a>:</p>
<blockquote>
<p>Special thanks to Ryan McGreal (http://github.com/quandyfactory) for the code that makes hnsh work from behind a proxy.</p>
</blockquote>
<p><em>flushes with pride</em></p>
Ryan McGreal
2