tag:quandyfactory.com,2010-7-27:/2010727 2010-7-27T12:00:00Z Quandy Factory Newsfeed - Blog Quandy Factory is the personal website of Ryan McGreal in Hamilton, Ontario, Canada. http://quandyfactory.com/blog/69/how_to_write_a_blog_post 2010-07-26T12:00:00Z How to Write a Blog Post <p>The most basic guideline for writing a blog is this: <strong>Provide value for your readers.</strong> Valuable writing is personal, informative, instructional, revelatory, entertaining, and engaging. The object of your blog is not to promote your organization but to serve the interests of your readers. </p> <p>Don't think of it as an alternative channel for press releases or other marketing activities; people can see a sales pitch coming a mile away. Instead, think of it as a way to build relationships with readers by sharing your expertise in a way that helps them and encourages them to share your articles more widely and come back for more.</p> <h3>Topics</h3> <p>There's no predicting which topics will generate the most interest, but a variety of topics and approaches will give your blog the broadest set of opportunities to engage readers. </p> <p>If you stick with it, blogging becomes a habit that integrates itself into your workflow and actually helps you by engaging readers to discuss, expand, scrutinize and clarify your own thoughts.</p> <ul> <li><p><strong>Relax.</strong> Think of blogging as a conversation, not an assignment. A blog entry is closer in tone to personal correspondence than to a term paper. Think about subjects, issues or events that you would enjoy discussing with your peers.</p></li> <li><p><strong>Write for yourself.</strong> Write about things that interest or challenge you. 
The single most-read article I've ever written is a technical guide on <a href="/blog/65/designing_a_restful_web_application">designing web applications</a>, which I wrote to clarify the underlying principles so I would understand them more fully.</p></li> <li><p><strong>Try to surprise.</strong> Most nonfiction writing is goal-oriented and persuasive, but the term "essay" comes from the French verb <em>essayer</em>, which means "to try". An <em>exploratory</em> essay is a formalized attempt to understand an issue rather than defend a conclusion. Exploratory essays often lead in <a href="http://www.paulgraham.com/essay.html">surprising</a> directions, which makes for a compelling read.</p></li> <li><p><strong>Solve a problem.</strong> Your day is filled with problems and challenges that are similar to those faced by other people. Share some of the innovative ways you have solved particular problems and invite readers to share theirs. Another popular blog entry I've written was a summary of lessons I've learned trying to deal with <a href="/blog/54/shared_awareness_a_better_way_to_manage_comment_trolls">comment trolls</a>.</p></li> <li><p><strong>Find and explore a niche.</strong> The internet is a big enough medium that even seemingly narrow subjects, if well-developed, can attract lively reader communities. This also provides opportunities to transfer ideas and practices from one niche to another.</p></li> <li><p><strong>Explain a statistic.</strong> Modern citizens are bombarded with statistics delivered out of the blue and out of context. You can humanize statistics by explaining what they mean in concrete terms and by sharing anecdotes that sketch the human faces behind the numbers.</p></li> <li><p><strong>Share your experience.</strong> Your value to your organization goes beyond a narrow job description and encompasses all the experiences, values and connections you bring with you to work. 
Tap into those so your writing reflects your broader competence.</p></li> <li><p><strong>Entertain.</strong> Good blog writing is like good conversation: refreshing, invigorating, and entertaining as well as informative. Feel free to have a bit of fun.</p></li> <li><p><strong>Revisit older posts.</strong> Over time, with new information and greater experience, our opinions often change and evolve. It's worth going back to older posts periodically to reassess them - what is still true, and what has changed. Obviously you won't be able to do this right away.</p></li> </ul> <h3>Writing and Tone</h3> <ul> <li><p><strong>Write in first person.</strong> You are writing as a <em>person</em> as well as on behalf of an organization. Allow your writing to reflect your personality and your personal voice. Don't be afraid to refer to yourself as "I" if you're writing about something you've done.</p></li> <li><p><strong>Use the active voice.</strong> Your blog is written <em>by</em> people, <em>about</em> people and <em>for</em> people. Don't drop the people out of your writing by relegating them to the tail-end of your sentences. Figure out who is the active agent in a sentence and move that person or entity forward into the subject.</p> <p><em>Passive</em>: "Ideas were gathered from community residents on how to use the vacant land." <br /> <em>Active</em>: "Community residents shared their ideas on how to use the vacant land."</p></li> <li><p><strong>Have an opinion.</strong> Readers know you are a human and expect you to have opinions. Good opinions are <em>reasonable</em> and <em>defensible</em> but <em>encourage discussion</em>. </p></li> <li><p><strong>Add value.</strong> Don't just re-post a link to something else. 
At the very least, add your own commentary that expands, clarifies, corrects or otherwise enhances the linked content.</p></li> <li><p><strong>Vary the length of your posts.</strong> Sometimes a short piece with a pithy observation or bit of commentary can stand on its own. Other issues deserve a more in-depth treatment. Both are welcome and keep the blog interesting.</p></li> <li><p><strong>Leave some room.</strong> Your article doesn't need to be exhaustive or anticipate every possible critique. A well-written article covers an important aspect of an issue but leaves room for readers to add to the conversation by expanding, clarifying, and challenging what you've written. </p></li> </ul> <h3>Editing</h3> <ul> <li><p><strong>Always check spelling and grammar.</strong> Typos and other flagrant errors look sloppy. Aim for the sweet spot that feels like conversational English - the terrain that lies between the strictures of formal writing and the morass of split infinitives and grocers' apostrophes.</p></li> <li><p><strong>Edit for clarity and brevity.</strong> Stephen King recently shared the simplest advice he ever received for revising: <em>Draft 2 equals draft 1 minus 10 percent.</em> He also advises, "Kill your darlings" - by which he means sentences that serve only to display their own cleverness.</p></li> <li><p><strong>Remove noise words.</strong> We all habitually pepper our writing with such tics as "to be sure", "in order to", "bottom line", "liaise", "going forward", "in the loop", "two cents", "for what it's worth", "in my [humble] opinion", "ducks in a row", "outside the box", "at the end of the day", "vis-a-vis", and a profusion of adverbs (e.g. "<em>very</em> unique") that communicate nothing but hyperbole. Get rid of them.</p></li> <li><p><strong>Define jargon.</strong> Some business-speak is a necessary evil. 
If you have to use industry-specific terminology, write the terms and acronyms out in full and define them for readers.</p></li> </ul> <h3>Presentation</h3> <ul> <li><p><strong>Write a punchy title.</strong> Clear, descriptive, humorous titles catch the reader's attention and help search engines determine the article's content. Warning: don't go overboard. A hyperbolic title that oversells the content will leave readers feeling betrayed.</p></li> <li><p><strong>Keep paragraphs short.</strong> Big walls of text are hard to scan on a computer screen. Keep paragraphs short - between one and three sentences - and discrete in terms of their contents.</p></li> <li><p><strong>Include subtitles.</strong> Subtitles within an article, particularly a longer one, break up the text into manageable chunks.</p></li> <li><p><strong>Use appropriate layout techniques.</strong> If you want to draw attention to a key word or phrase, make the text <strong>bold</strong>. If you have a list of items, use lists (like the lists in this guide). Use bullet-point lists for your items unless they need to be in a specific order, in which case use numbered lists. </p></li> <li><p><strong>Summarize.</strong> After writing your article, write a one- or two-sentence summary and post the summary at the top, just beneath the title. 
Combined with a punchy title, a clear summary will help readers decide whether your article is worth reading.</p></li> </ul> Ryan McGreal 2 http://quandyfactory.com/blog/65/designing_a_restful_web_application 2010-06-25T12:00:00Z Designing a RESTful Web Application <h3>Introduction</h3> <p>I'm working on a couple of projects that involve building a web service, and I decided early on that because of our business constraints - having to communicate with a variety of systems of varying levels of sophistication - it made sense to keep the web service as simple and accessible as possible.</p> <p>That pointed me toward designing a RESTful web service that transmits data in a simple format over straight HTTP. After all, just about any programming language imaginable can make an HTTP request. I also decided to go with JSON for the data format, in part because I've been <a href="/blog/50/couchdb_working_notes">experimenting lately with CouchDB</a> and appreciate both the simplicity and flexibility of JSON and the fact that you can find a JSON parser for any language.</p> <p>This blog entry is my attempt to get all the concepts of RESTful web service design straight. There's a good chance that some of this information is wrong, so if you notice something, please <a href="mailto:ryan@quandyfactory.com">let me know about it</a>. I'll investigate your argument and update the essay as applicable.</p> <p>With that in mind, here we go.</p> <h3>Representational State Transfer</h3> <p>Representational State Transfer, or REST, is an architectural style for designing networked software systems based around clients and servers. 
In a RESTful system, a client makes a <strong>request</strong> for a <strong>resource</strong> on a server, and the server issues a <strong>response</strong> that includes a representation of the resource.</p> <p>Roy Fielding, one of the principal authors of the Hypertext Transfer Protocol (HTTP) specification (more on HTTP below), formalized the concept in his 2000 doctoral dissertation. It is not surprising, then, that REST and HTTP mesh very smoothly.</p> <p>A RESTful client-server system is <strong>stateless</strong>, meaning each request to the server contains all the information the server needs to process it; and <strong>cacheable</strong>, in that the server can specify whether and for how long resource representations can be cached, either locally on the client or on intermediate servers between the client and the server.</p> <h3>Hypertext Transfer Protocol</h3> <p>Hypertext Transfer Protocol (HTTP) is a stateless protocol based on a client requesting a resource across a network and the server providing a response. As such, an HTTP transaction entails a <strong>request</strong> and a <strong>response</strong>. The request goes from the client to the server, and the response goes from the server back to the client.</p> <h4>HTTP Requests</h4> <p>An HTTP request has three parts:</p> <ol> <li><p>The <strong>request line</strong>, which includes the HTTP method (or "verb"), the URI, and the HTTP version. For example:
E.g.</p> <p>GET /article/1/ HTTP/1.1</p></li> <li><p>One or more optional <a href="http://en.wikipedia.org/wiki/List_of_HTTP_headers">HTTP headers</a>, which are key/value pairs that characterize the data being requested and/or provided.</p></li> <li><p>An optional <strong>message body</strong>, which is data being sent from the client to the server as part of the request.</p></li> </ol> <h4>HTTP Responses</h4> <p>An HTTP response also has three parts:</p> <ol> <li><p>The <a href="http://en.wikipedia.org/wiki/List_of_HTTP_status_codes">HTTP status code</a>, indicating the status of the requested URI, e.g.</p> <p>HTTP/1.1 200 OK</p></li> <li><p>One or more optional <strong>HTTP headers</strong>, which are key/value pairs that characterize the data being provided.</p></li> <li><p>An optional <strong>message body</strong>, which is the data being returned to the client in response to the request.</p></li> </ol> <h3>Resources</h3> <p>HTTP deals in <strong>resources</strong>. Each URI points to a resource on the server. Think of URIs as nouns, not verbs, with one URI for each resource. For example, if you want to add the ability to create an article, it might be tempting to create a URI called <code>/create_article</code>. This is wrong, because it conflates the object (the resource) and the action (creation). </p> <p>Instead, it makes more sense to have a resource called <code>/article</code> and a <strong>method</strong> that lets you create articles.</p> <h3>HTTP Methods</h3> <p>HTTP defines several methods, or "verbs", to execute on a resource: <code>HEAD</code>, <code>GET</code>, <code>POST</code>, <code>PUT</code>, <code>DELETE</code>, <code>TRACE</code>, <code>OPTIONS</code>, <code>CONNECT</code>, and <code>PATCH</code>. However, the following four are most commonly used in web services:</p> <h4>GET Method</h4> <p>To retrieve a resource, issue an HTTP <strong>GET</strong> request. 
GET requests should be <em>safe</em>, meaning they have no side effects: they do not produce any change in the resource. GET requests normally do not include a message body, but GET responses usually do.</p> <h4>POST Method</h4> <p>To submit data to be processed, issue an HTTP <strong>POST</strong> request. POST requests require a message body, i.e. the data to be processed. </p> <p>For example, if there is a resource called <code>/article</code> and you want to add a new article, issue a POST request to <code>/article</code> with the content. The server typically creates a new <em>subsidiary</em> URI under <code>/article</code> - for example, <code>/article/9001</code> - and assigns that URI to the content you sent with your POST request.</p> <p>Important note: POST requests are <em>not</em> idempotent: issuing the same POST request several times will create several resources, each with its own unique identifier.</p> <h4>PUT Method</h4> <p>To create or replace the resource at a known URI, issue an HTTP <strong>PUT</strong> request. For example, if there is a URI <code>/article/9001</code> and you want to replace the content served at that URI, issue a PUT request to that URI with the new content.</p> <p>Important note: PUT requests <em>are</em> idempotent, i.e. issuing 1 or 5 or 50 identical PUT requests will have the same effect on the resource. <strong>PUT</strong> requests must include a message body (the resource to be placed at the URI). </p> <h4>DELETE Method</h4> <p>To remove a resource (and its accompanying URI), issue an HTTP <strong>DELETE</strong> request. DELETE requests should be idempotent, i.e. issuing 1 or 5 or 50 identical DELETE requests will delete exactly one resource. DELETE requests do not require a message body.</p> <h4>Idempotence</h4> <p>This funny-looking word is crucial to designing an effective web service. A request is <strong>idempotent</strong> if issuing it more than once does not change the resource state beyond issuing it just once. 
Read that again if you have to.</p> <p>For example, a DELETE request is idempotent if the first request deletes a resource at a URI, and the second request does nothing because the resource at that URI is already deleted. </p> <p>For another example, a PUT request is idempotent if the first request creates a resource at a URI, and the second, identical request simply replaces it with the same content, leaving the resource exactly as the first request left it.</p> <p>A request is <em>not</em> idempotent if issuing it more than once <em>does</em> change the resource. For example, a POST request to add a comment to a document is not idempotent if issuing the POST request twice adds the comment twice (so that the document contains two identical comments). </p> <h4>PUT vs. POST</h4> <p>My original understanding of HTTP methods was that you would use PUT to create a resource and POST to update it. This seems in keeping with common sense, but it breaks down when you apply the all-important filter of idempotence.</p> <p>An interesting discussion on Stack Overflow <a href="http://stackoverflow.com/questions/630453/put-vs-post-in-rest">tackles this issue</a>, but what convinced me to change my mental model was this observation:</p> <blockquote> <p>POST creates a child resource, so POST to <code>/items</code> creates a resources that lives under the <code>/items</code> resource. Eg. <code>/items/1</code>.</p> <p>PUT is for creating or updating something with a known URL. </p> </blockquote> <p>This mismatch between the intuitive view (PUT creates, POST updates) and the idempotence rules is a significant source of confusion about how to design a RESTful system well, and deserves more attention.</p> <p>Furthermore, it's tempting to assume that the HTTP verbs line up precisely with the CRUD (create, read, update, delete) operations of SQL (<code>INSERT</code>, <code>SELECT</code>, <code>UPDATE</code>, <code>DELETE</code>), but while they're superficially similar, they're not identical. Treating them as such leads to this kind of gotcha. 
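</p> <p>To see why the mapping matters, here is a minimal Python sketch of a hypothetical in-memory article store. Its <code>post</code>, <code>put</code> and <code>delete</code> methods mimic what a server might do for <code>POST /article</code>, <code>PUT /article/{id}</code> and <code>DELETE /article/{id}</code>. The class, method names and id scheme are assumptions made for illustration. There is no HTTP server or database here, just the state changes that idempotence is about.</p>

```python
from __future__ import annotations

import itertools

# Hypothetical in-memory store; it only models the state changes that
# POST, PUT and DELETE requests would cause on the server.


class ArticleStore:
    def __init__(self) -> None:
        self.articles: dict[int, str] = {}
        self._next_id = itertools.count(1)  # server-assigned ids for POST

    def post(self, content: str) -> str:
        """POST /article: always creates a new child resource with a new id."""
        article_id = next(self._next_id)
        self.articles[article_id] = content
        return f"/article/{article_id}"

    def put(self, article_id: int, content: str) -> str:
        """PUT /article/{id}: creates or replaces the resource at a known URI."""
        self.articles[article_id] = content
        return f"/article/{article_id}"

    def delete(self, article_id: int) -> bool:
        """DELETE /article/{id}: removes it if present; repeats change nothing."""
        return self.articles.pop(article_id, None) is not None


store = ArticleStore()

# POST is not idempotent: two identical requests leave two resources.
store.post("Hello")
store.post("Hello")
print(len(store.articles))  # 2

# PUT is idempotent: a repeat leaves the same state as a single request.
store.put(9001, "Updated")
snapshot = dict(store.articles)
store.put(9001, "Updated")
print(store.articles == snapshot)  # True

# DELETE is idempotent: the second call finds nothing left to remove.
print(store.delete(9001), store.delete(9001))  # True False
```

<p>The second DELETE would typically produce a different response (for example, <code>404 Not Found</code>). However, it leaves the resource state exactly as the first one did, and that state is what idempotence is about.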
 </p> <p>It's important to keep the logic of HTTP methods separate from the logic of SQL queries, and to write mapping logic between the two domains that ensures the server's data processing produces responses that satisfy the requirements of each HTTP method (particularly with respect to idempotence).</p> <h3>Conceiving the Web Service: A Resource/Method Table</h3> <p>At a conceptual level, a RESTful web service API is a matrix of resources and methods that exposes the functionality of the service to third-party applications. Below is an example of what that matrix might look like. </p> <p>Again, note well that actions are not mapped to URIs. A resource is an <em>object</em>, a <em>noun</em>, and the action inheres in the HTTP verb, not in the URI. As a result, the same resource URI can serve different responses (corresponding with different actions) depending on the HTTP verb.</p> <table> <caption>REST Resource/Action Matrix</caption> <thead> <tr> <th colspan="4">Request</th> <th rowspan="2">Server Action</th> <th rowspan="2">Response</th> <th rowspan="2">Idempotent</th> </tr> <tr> <th>Resource</th> <th>Parameters</th> <th>Method</th> <th>Data</th> </tr> </thead> <tbody> <tr> <td>/article</td> <td></td> <td>POST</td> <td>article details</td> <td>creates a new article</td> <td>returns confirmation and id</td> <td class="red">No</td> </tr> <tr> <td>/article</td> <td>/id</td> <td>GET</td> <td></td> <td>gets article details</td> <td>returns article details</td> <td class="green">Yes</td> </tr> <tr> <td>/article</td> <td>/id</td> <td>PUT</td> <td>new article details</td> <td>updates article details</td> <td>returns confirmation and updated article details</td> <td class="green">Yes</td> </tr> <tr> <td>/article</td> <td>/id</td> <td>DELETE</td> <td></td> <td>deletes an article</td> <td>returns confirmation of deleted article</td> <td class="green">Yes</td> </tr> </tbody> </table> <p>This is the RESTful way to organize a web service: URIs are 
objects and HTTP verbs are actions performed on those objects.</p> <h3>HTTP Response Data Formats</h3> <p>In a sense, every web server is already a basic web service, accepting GET and POST requests to particular resources, performing actions and serving data in response.</p> <p>A conventional web server delivers its data in HTML format, with related JavaScript (<code>text/javascript</code>), CSS (<code>text/css</code>) and image files. HTML is an excellent format for marking up textual data for human use, but it has very limited expressive power for structuring data beyond simple documents.</p> <p>The most common formats used to transmit structured data across HTTP are <strong>XML</strong> and <strong>JSON</strong>, with an honourable mention for <strong>YAML</strong>.</p> <h4>XML</h4> <p><a href="http://www.w3.org/XML/">XML</a>, or eXtensible Markup Language, is a tag-based markup syntax, derived from SGML, for structuring text-based data. Rather than a fixed vocabulary, XML is a foundation for creating domain-specific markup languages that define particular data structures. </p> <p>For example, <a href="http://en.wikipedia.org/wiki/RSS">RSS</a> and <a href="http://en.wikipedia.org/wiki/Atom_%28standard%29">Atom</a> are XML language standards that structure documents published on websites so that the documents can be 'syndicated' to feed readers and third-party sites for display.</p> <p>Likewise, the default underlying structure of Microsoft Office documents since Office 2007 is an XML language called <a href="http://en.wikipedia.org/wiki/Office_Open_XML">OOXML</a>.</p> <p>An XML language can define its elements, attributes and allowable nesting rules in a schema. The classic schema format is the <a href="http://en.wikipedia.org/wiki/Document_Type_Definition">Document Type Definition</a>, or DTD; more expressive alternatives such as XML Schema (XSD) and RELAX NG can also specify data types. 
</p> <p>A given XML document specifies which DTD should define its structure with a <a href="http://en.wikipedia.org/wiki/Document_Type_Declaration">Document Type Declaration</a>, or DOCTYPE.</p> <p>A single XML document can also combine elements from several vocabularies by using namespaces, which keep identically named elements from different vocabularies distinct.</p> <p>Here is a sample XML file containing contact information about a person, taken from <a href="http://en.wikipedia.org/wiki/JSON">Wikipedia</a>.</p> <pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;Person&gt;
  &lt;firstName&gt;John&lt;/firstName&gt;
  &lt;lastName&gt;Smith&lt;/lastName&gt;
  &lt;age&gt;25&lt;/age&gt;
  &lt;address&gt;
    &lt;streetAddress&gt;21 2nd Street&lt;/streetAddress&gt;
    &lt;city&gt;New York&lt;/city&gt;
    &lt;state&gt;NY&lt;/state&gt;
    &lt;postalCode&gt;10021&lt;/postalCode&gt;
  &lt;/address&gt;
  &lt;phoneNumber type="home"&gt;212 555-1234&lt;/phoneNumber&gt;
  &lt;phoneNumber type="fax"&gt;646 555-4567&lt;/phoneNumber&gt;
&lt;/Person&gt;
</code></pre> <p>XML must be <strong>well-formed</strong>, and it is often also required to be <strong>valid</strong>:</p> <ul> <li><p><strong>Well-formed</strong> - elements are properly nested, tags are properly closed and match case, special characters are escaped, and Unicode characters are correctly encoded; and </p></li> <li><p><strong>Valid</strong> - its elements and attributes match the rules defined in its DTD or schema.</p></li> </ul> <p>An XML language called XSLT can transform XML documents into other markup languages, e.g. HTML.</p> <p>XML is in wide use in applications that transfer structured data over the internet.</p> <h4>JSON</h4> <p><a href="http://json.org/">JSON</a>, or "JavaScript Object Notation", is a lightweight data format introduced in 2001 by Douglas Crockford.</p> <p>Pronounced "Jason", JSON is based on JavaScript object literal notation, a syntax for creating objects in JavaScript by literally describing their properties and methods. 
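</p> <p>As a quick illustration of how directly this notation maps onto a language's native types, here is a small Python sketch using the standard-library <code>json</code> module. The sample record is made up for this example. It also shows where JSON is stricter than a JavaScript object literal: JSON carries data only, so a function value is rejected as invalid.</p>

```python
import json

# A made-up record that uses each JSON value type.
text = '{"name": "John", "age": 25, "married": false, "kids": [], "spouse": null}'

person = json.loads(text)       # JSON object -> Python dict
print(type(person).__name__)    # dict
print(person["married"], person["spouse"])  # False None

# Serializing the dict again reproduces the original text here, because
# json.dumps uses ", " and ": " separators and dicts keep insertion order.
print(json.dumps(person) == text)  # True

# A JavaScript object literal may contain methods; JSON may not.
try:
    json.loads('{"greet": function () { return "hi"; }}')
except json.JSONDecodeError as err:
    print("not valid JSON:", err.msg)
```

<p>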
Here is the JSON equivalent of the XML sample in the previous section.</p> <pre><code>{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
</code></pre> <p>While the XML above contained 367 characters (not including indentation), the equivalent JSON contains only 272 characters - only three-quarters as large.</p> <p>JSON supports <strong>arrays</strong> (ordered lists of values) and <strong>objects</strong> (unordered collections of key/value pairs, much like dictionaries) with arbitrary nesting, and its values can be of the following types: <strong>number</strong>, <strong>string</strong>, <strong>boolean</strong>, <strong>array</strong>, <strong>object</strong> and <strong>null</strong>. </p> <p>Like XML, JSON is in wide use in applications that transfer structured data over the internet.</p> <p>Its major advantage over XML is that the syntax is much simpler and less verbose, which makes it lighter across networks as well as more human-readable.</p> <p>Mature JSON parsers are available for a wide range of programming languages in addition to JavaScript. Python, for example, <a href="http://docs.python.org/library/json.html">includes a json parser</a> as part of its standard library (as of version 2.6; earlier versions can use the third-party <a href="http://pypi.python.org/pypi/simplejson/">simplejson</a> library). </p> <p>A decent JSON parser converts data back and forth between the programming language's native data types and their JSON equivalents. That way, an application can receive a JSON object, convert it into a native object, process it natively, and then convert the final result back to JSON to be dispatched elsewhere.</p> <h4>YAML</h4> <p><a href="http://www.yaml.org/">YAML</a>, pronounced to rhyme with "camel", is a recursive acronym meaning "YAML Ain't Markup Language". 
Whereas JSON is a more minimal data format than XML, YAML's block style takes minimalism further, dispensing with quotation marks, brackets and curly braces in favour of significant indentation and line breaks.</p> <p>The YAML equivalent of the XML and JSON contact examples above would be:</p> <pre><code>firstName: John
lastName: Smith
age: 25
address:
  streetAddress: 21 2nd Street
  city: New York
  state: NY
  postalCode: 10021
phoneNumber:
  - type: home
    number: 212 555-1234
  - type: fax
    number: 646 555-4567
</code></pre> <p>That works out to just 239 characters - including the significant white space.</p> <p>One caveat: an unquoted <code>10021</code> is read as a number rather than a string, unlike the quoted value in the JSON version; write <code>postalCode: "10021"</code> if the distinction matters.</p> <hr /> <p><em>Special thanks to <a href="http://socialtech.ca/ade">Adrian Duyzer</a> for reading a draft of this essay and setting me straight on the respective roles of PUT and POST.</em></p> Ryan McGreal 2 http://quandyfactory.com/blog/59/ubuntu_1004_first_thoughts 2010-05-01T12:00:00Z Ubuntu 10.04 First Thoughts <p>Last night I upgraded my Acer Aspire One from Ubuntu 9.10 to 10.04 (I use the regular edition, not the netbook edition). 10.04 is a Long-Term Support (LTS) release, meaning Canonical promises to support the desktop edition for three years and the server edition (which comes without a GUI layer) for five years.</p> <p>It took all night to download the upgrade (watching the <em>time remaining</em> on the download status dialog brought me back to the halcyon days of Windows 95 file transfers), but the upgrade itself was painless.</p> <p>So far, here's what I think:</p> <h3>Boot Time</h3> <p>This is the most immediate, obvious improvement in 10.04. True to their commitment, the Ubuntu developers have made big strides in shaving as much as they can off startup and shutdown times.</p> <p>Startup is visibly faster than 9.10, which didn't really improve much on 9.04. From the power button through the kernel selection to the login screen runs a cool 35-40 seconds. 
</p> <p>Add another six or seven seconds after logging in to bring up the desktop, and another two or three seconds to auto-connect to the local wifi network.</p> <p>One interesting note: 8.10 and 9.04 displayed long lists of command-line messages during bootup, while 9.10 replaced them with a splash screen. Now I get command-line messages again, albeit only about eight lines and for only a second or two.</p> <h3>Shutdown</h3> <p>Shutdown is even nicer: six seconds flat. </p> <p>I have only two small gripes with the shutdown process:</p> <ol> <li><p>The indicator applet on the panel splits the user menu and shutdown menu into two separate dropdowns, but they still sit right beside each other as if they were a single item. It will take a little getting used to.</p></li> <li><p>The default behaviour for shutdown is to pop up a confirmation dialog, but, as in 9.10, you can't right-click on it to access preferences and turn off the dialog. The easiest way to fix this is to open up a terminal and start <code>gconf-editor</code>. (If you don't have it installed, fire up the Ubuntu Software Centre and install Configuration Editor.) Expand "Apps" in the tree menu on the left and select "indicator-session", then check "suppress_logout_restart_shutdown" on the right side to turn off shutdown confirmations.</p></li> </ol> <h3>Appearance</h3> <p>It's not brown. Really, that's the main thing. Beyond that, if you've upgraded after already customizing the appearance of 9.10, most of your settings remain intact (with one notable exception, below).</p> <p>I tried out the new default Ambiance theme and didn't like it. It's just a little too purple and dreary for me. But as before, you can customize the hell out of your window styles via System -> Preferences -> Appearance.</p> <h3>Window Controls</h3> <p>In fairness, the window titlebar button positions are less bad than they were when Canonical first proposed moving them to the left side of the window chrome. 
In the first iteration, the buttons were in the order <code>[-][_][X]</code>, but in the release version they're in the order <code>[X][_][-]</code> with the close button on the very left. </p> <p>Yet I just don't see a benefit to moving them to the left side that justifies the aggravation of having to unlearn a habit that goes back nearly 20 years. After all, I originally chose Ubuntu in part because it seemed to be the easiest transition from a background in Windows.</p> <p>Luckily there's an easy way to fix it. Open a terminal and run <code>gconf-editor</code>. In the tree menu on the left, click on Apps -> metacity -> general. Double-click on the "button_layout" setting on the right side and replace the text with <code>menu:minimize,maximize,close</code> to restore the previous behaviour.</p> <p>A word of warning: if you go into System -> Preferences -> Appearance and change the theme, the window buttons will reset to the left side again.</p> <h3>Wifi</h3> <p>Getting wifi working on an Aspire One under Ubuntu 8.10 and 9.04 was a bit painful. The <em>only</em> thing that worked for me was to download madwifi and compile it from source. Every time a major system update was installed, I had to reinstall madwifi (<a href="https://help.ubuntu.com/community/CheckInstall">checkinstall</a> didn't work for me).</p> <p>This changed with 9.10, in which wifi Just Worked, and it still works in 10.04. In fact, it seems to connect considerably faster now - and the little connection icon in the NetworkManager applet looks prettier than the previous one. (With the icon in 9.10, I always had to double-check whether I was looking at the wifi or battery status.)</p> <h3>Ubuntu One</h3> <p>I decided to try out Ubuntu One, Canonical's cloud storage offering. Roughly equivalent to established services like Dropbox, Ubuntu One offers 2 GB of cloud storage for free with the option to buy 50 GB for $10 per month. 
As with Dropbox, you can share individual files and folders publicly.</p> <p>Note: you access Ubuntu One through your Me menu, not through System -> Administration or System -> Preferences.</p> <p>Setting up an account is easy enough, though the web interface felt slow and unstable. Actual syncing seems pretty smooth, though I haven't yet had a chance to try modifying files using different systems at the same time. More to come as I play with this.</p> <p>The syncing options seem fairly impressive. You can even install a Firefox addon to sync your bookmarks.</p> <p>Ubuntu One also comes with a music store provided by 7Digital. The selection is passable if nowhere near exhaustive, and the prices are not bad. I'd rather see songs in the $0.25-$0.50 range - for me, that's the break-even point against the opportunity cost of downloading free music - but we're moving in the right direction.</p> <p>The music store also sells MP3s, which will grate on Free Software purists but will serve the rest of us just fine.</p> <p>The store integrates closely with the Rhythmbox music player, which suffers most of the aggravations of a full-featured music player but at least can boast that it's not nearly as bloated or clumsy to use as iTunes.</p> <p>If you're an iPod user who is sick of iTunes and/or itching to run Linux instead of Mac or Windows, you can actually use Rhythmbox to load and sync your iPod.</p> <h3>Firefox</h3> <p>There was a hullabaloo when Canonical announced that they were switching the default search engine on Ubuntu Firefox from Google to Yahoo, but they switched back before the final release. Hey, money talks when you're giving your product away for free.</p> <p>As for Firefox itself, Ubuntu 10.04 ships with version 3.6.3. The lag between Firefox releases and the version available through Ubuntu's package manager was a fairly long-standing aggravation, which led to workarounds like Ubuntuzilla to get newer versions working. 
</p> <p>I hope this is a sign that Canonical is committed to staying on top of new Firefox releases.</p> <h3>Problems</h3> <p>I only had two real software failures.</p> <p>Brasero flat-out stopped working, returning an "unknown error" on any attempt to burn a CD or DVD. It looks like a lot of people are having this problem. Rather than fight my way to a fix, I just started using Gnomebaker instead.</p> <p>Gtk-Gnutella also stopped working. The version in the Lucid repository is two points behind the current version and throws an "ancient version detected" alert; it also fails to connect to the filesharing network. </p> <p>My attempts to compile the newest version from source (I tried both checkinstall and a straight compilation) produced an application that would crash seconds after opening. I finally gave up and switched to Frostwire.</p> <p>Otherwise, everything continued to work as expected.</p> <h3>Summary</h3> <p>There's certainly a lot more to explore before I've gotten to the bottom of this new edition, but so far I'm impressed. It's fast, solid and stable to use, the ratio of Just Works to Needs Hacking is approaching 1:0, the upgrade retained most of my previous settings without breaking anything, and several gripes I previously had with Ubuntu have been resolved. 
</p> <p>Aside from a few annoyances that are fairly easy to work around, I have no major complaints.</p> Ryan McGreal 2 http://quandyfactory.com/blog/54/shared_awareness:_a_better_way_to_manage_comment_trolls 2010-04-15T12:00:00Z Shared Awareness: A Better Way to Manage Comment Trolls <h3>Trolling</h3> <p>As everyone who has spent more than five minutes on the internet knows, in the absence of personal accountability, some people inevitably turn into assholes.</p> <p class="image"> <img src="/static/images/internet_dickwad_theory.gif" alt="John Gabriel's Greater Internet Dickwad Theory" title="John Gabriel's Greater Internet Dickwad Theory"><br> <a href="http://www.pennyarcademerch.com/pat070381.html"> John Gabriel's Greater Internet Dickwad Theory </a> </p> <p>Not many, mind you, but even one determined troll on an internet forum can be enough to bring the entire affair crashing down.</p> <p>The term "troll" originally comes from the fishing method of dangling a shiny lure out the back of a boat while puttering along just quickly enough to grab the interest of a fish. The fish that can't resist the lure is snared on the hook and eaten.</p> <p>(Of course, the term also connotes ugly, hairy brutes who lurk under bridges waiting to snatch unwary passers-by.)</p> <p>Trolls posting anonymously on message boards absolutely <em>love</em> to provoke outrage. The troll's objective is not to change anyone's mind; rather, it's to write in such a way as to elicit a sequence of increasingly angry, frustrated replies that drag the discussion further and further off-topic until the original thread is in shambles.</p> <p>Crude trolls subsist on the meager attention given to their vulgarity; but sophisticated trolls can keep a debate going for days by dancing on the fine line between feigned reasonableness and deliberate obtuseness. 
They are by far the more damaging to online discourse.</p> <p>What makes trolls disruptive is not the trollish comments themselves, but the chain of outraged replies they manage to elicit. </p> <p class="image"> <img src="/static/images/xkcd_duty_calls.png" alt="Duty Calls" title="What do you want me to do? LEAVE? Then they'll keep being wrong!"><br> <a href="http://xkcd.com/386/"> Duty Calls </a> </p> <p>Successfully managing trolls requires understanding why this happens and breaking the chain of replies.</p> <h3>Shared Awareness</h3> <p>Trolls attract replies by exploiting the general absence of <strong>shared awareness</strong> on most mediated communications platforms. Shared awareness is a special state of meta-knowledge among collaborators that allows them to act cooperatively on the knowledge.</p> <p>If I know something and you know something but I don't know that you know it and you don't know that I know it, we can't build on that knowledge even though we both know the same thing.</p> <p>Shared awareness is not reached until: I know something, you know something, I know that you know, you know that I know, I know that you know that I know, and you know that I know that you know. (Whew!)</p> <p>Once we all know that we all know the thing, we can then act on it. Without that shared awareness, our ability to put the information to use is sharply curtailed.</p> <p>Now, sometimes the absence of shared awareness is a blessing. There are many potentially socially awkward situations - say, someone in a room full of people passes wind - in which the absence of shared awareness of who farted means everyone can take a position of <a href="http://en.wikipedia.org/wiki/Plausible_deniability">plausible deniability</a>. </p> <p>Not only does the culprit avoid embarrassment, but also the other parties don't have to deal with the discomfort of having to address the fact that one of them farted. 
After all, most people don't like confrontations.</p> <h3>Someone is Wrong on the Internet!</h3> <p>In a conventional online comment system, several participants may each know something, but they do not necessarily have shared awareness about it (that is, everyone knowing that everyone knows).</p> <p>When a troll posts a comment, for example, other participants may decide independently that the comment is inappropriate and/or wrong and/or offensive, but they have no way of knowing whether anyone else feels the same way.</p> <p>A conscientious forum-goer could just ignore the troll and hope for the best, but <em>silence is affirmation</em>, and a comment left to stand unchallenged starts to look like it represents the prevailing view of the forum participants.</p> <p>Hence, the conscientious forum-goer feels there is no choice but to reply to the troll, if only to assure others that at least someone disagrees with it. </p> <p>Now the fish is hooked, and a skillful troll can keep that fish on the line for a long time. </p> <h3>Collateral Damage</h3> <p>In the meantime, other fish get scared away by the thrashing. </p> <p>Pointless, unending debates between trolls and earnest opponents do two things:</p> <ol> <li>They disrupt and crowd out legitimate discussion; and</li> <li>They deter other potential commenters from getting involved, lest they be trolled as well.</li> </ol> <p>Eventually, the community starts to dissipate altogether. At this point, the troll - really, a parasite (if you'll forgive the switched metaphor) - abandons the dying host and sets off in search of a fresh victim.</p> <p>The important thing to remember is that the damage takes place not only because the troll posts a comment <em>but also because an earnest commenter decides to reply</em>. </p> <p>In turn, earnest commenters get into it with trolls because in the absence of shared awareness, they have no other way to know whether the community as a whole accepts or rejects the troll's arguments. 
</p> <p>Worst of all, the very dialogue with the troll, undertaken for the purpose of invalidating the troll's position, actually deters others from sharing their own positions and clearing up the matter.</p> <h3>Three Points of Attack</h3> <p>The three necessary conditions for a troll to disrupt a forum are:</p> <ol> <li>An absence of shared awareness;</li> <li>A troll; and</li> <li>A well-meaning opponent to debate with the troll.</li> </ol> <p>It's safe to generalize that any online forum will have both trolls and well-meaning opponents willing to debate with them. Since most online forums also lack shared awareness, all three necessary conditions are likely to be present and disruption is likely to follow.</p> <p>Since all three conditions are <em>necessary</em> for disruption to take place, we have three points of attack in trying to prevent it:</p> <ol> <li>Establish shared awareness;</li> <li>Block the trolls; or</li> <li>Block the well-meaning replies.</li> </ol> <p>Of the three options, #1 is the most promising. If you can establish a shared awareness that the troll is inappropriate - i.e. everyone knows that everyone knows that the troll is not representative - then well-meaning opponents no longer feel obliged to reply. </p> <p>In turn, since the objective of a troll is to provoke replies, a troll who can't get anyone to take the bait will eventually give up and move on to another forum.</p> <p>So how do we go about establishing shared awareness on a text-based communications forum?</p> <h3>Clear Understanding of Rules</h3> <p>The first step to shared awareness is for everyone on the forum to have a clear understanding of the rules and a clear set of expectations about how to behave and how others should behave. It's not enough just to say of trolling that <a href="http://en.wikipedia.org/wiki/I_know_it_when_I_see_it">we know it when we see it</a>.</p> <p>The rules should be simple, straightforward, fair, and agreeable to the members. 
Ideally, the members should participate in determining what the rules will be. That way, they have a sense of investment and ownership in the policies that govern their actions on the forum.</p> <p>However, the rules should probably include some variation on the following:</p> <ul> <li>Maintain a civil and polite tone, even when disagreeing with someone.</li> <li>Argue from evidence and cite your sources.</li> <li>Do not use rude or insulting language.</li> <li>Do not post comments that are needlessly inflammatory, seek to provoke an emotional reaction from others, are attempts to disrupt and derail the discussion, or abuse evidence and reasoning to defend an unjustifiable conclusion.</li> <li>Do not post comments that are off-topic.</li> </ul> <p>The important thing is that the participants feel invested in community standards that define them as civil, reasonable and responsible. Again, most people - even on the internet - like to think of themselves in these terms and actually value the personal exchanges they get to have on socially functional online forums.</p> <p>A pseudonymous screen name might be less personal than a real name, but it still inheres in a real personal identity, and people can still form relationships while communicating on the internet - <em>if</em> their community is successful in establishing and maintaining a shared awareness of their values.</p> <h3>Communicate Without Posting a Reply</h3> <p>By themselves, codes of conduct are only useful insofar as people see them being applied fairly and consistently. 
</p> <p>For inappropriate comments, there needs to be a mechanism for members of the community to <em>flag them as inappropriate</em> without actually posting replies; and that flag must be <em>visible to everyone</em> so that an individual looking at the comment can see how many other people have already flagged it.</p> <p>If you have used <a href="http://reddit.com">reddit</a>, <a href="http://digg.com">Digg</a>, or <a href="http://news.ycombinator.com">Hacker News</a>, you may have already experienced such a mechanism at work. On these three link-sharing sites, registered users can upvote or downvote both submitted articles and user comments. </p> <p>While each site works a little bit differently, a given article's or comment's score reflects some combination of the number of upvotes, the number of downvotes, the 'reputations' of the users who assigned upvotes and downvotes, and the amount of time that has passed since the article or comment was posted. </p> <p>Some sites introduce an additional constraint: registered users do not gain the ability to downvote comments until they have first attained a minimum threshold of reputation, based on how other users have upvoted and downvoted <em>their</em> articles and comments.</p> <p>Hacker News also starts fading the text colour of comments with net negative scores. The benefit of this is that it sends a strong visual message about how the community feels about the comment.</p> <h3>Remove the Incentive</h3> <p>So what happens after you implement this form of community moderation? Earnest, well-meaning commenters can see for themselves that the community has rejected a troll and hence no longer feel obliged to set the record straight in a reply. 
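</p> <p>To make the voting-and-fading mechanism described above a little more concrete, here's a hypothetical Python sketch. It is <em>not</em> how reddit, Digg or Hacker News actually compute scores (their formulas also weigh reputation and age); the class, function and parameter names are made up. It only shows net votes driving a fade that stops at a visible floor, so a rejected comment is de-emphasized without disappearing.</p> <pre><code>from dataclasses import dataclass


@dataclass
class Comment:
    """A hypothetical comment that collects votes from community members."""
    text: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def score(self):
        # Net score only; real sites also weigh voter reputation and age.
        return self.upvotes - self.downvotes


def opacity_percent(comment, step=10, floor=30):
    """Fade net-negative comments in proportion to their score.

    Comments scoring zero or more stay at 100%; each net downvote removes
    `step` percent, but the text never fades below `floor` percent, so
    readers can still see (and judge) what the community rejected.
    """
    return max(floor, min(100, 100 + step * comment.score))


for c in [Comment("Fair point", 5, 1),
          Comment("Mild troll", 1, 3),
          Comment("Determined troll", 0, 12)]:
    print(c.text, c.score, opacity_percent(c))
</code></pre> <p>This prints <code>100</code>, <code>80</code> and <code>30</code> percent for the three comments: the determined troll bottoms out at the floor instead of vanishing, which keeps the moderation visible rather than censoring.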
</p> <p>Eventually, the troll gets bored at not being able to elicit any more replies and moves on.</p> <p>This is the key to successful community moderation: <strong>establishing shared awareness removes the incentive to reply to trolls, which in turn removes the incentive to troll</strong>.</p> <p>To borrow a phrase from <a href="http://en.wikiquote.org/wiki/John_Gilmore">John Gilmore</a>, the other forum members learn to treat the trolls as damage and route around them. Eventually the trolls give up.</p> <p>It might take a while to establish shared awareness, but the forum admins must be patient and remain steadfast in the face of what is likely to be frenetic opposition from the trolls before they finally quit. </p> <p>Have no doubt: the trolls will immediately understand what is happening, and will fight to undermine it with every rhetorical tool at their disposal, accusing you of censorship, tyranny, megalomania and anything else they can think of. They will fling a litany of abuse at you in the hope that some of it will stick.</p> <h3>Free and Voluntary</h3> <p>The best way to fend off these attacks is to ensure that the moderation is free, voluntary, and non-censoring. The following guidelines seem to work well:</p> <ol> <li><p>Regularly reinforce community standards by having the policy displayed prominently on the site and reminding users of their important role in maintaining those standards.</p></li> <li><p>Troll comments can visually represent the negative judgments of their peers - for example by fading in colour or shrinking in size in proportion to the negativity of their score - but they should not disappear completely. (Note: it's reasonable to make an exception for spam and just delete it.)</p></li> <li><p>Moderation ought to apply directly to comments, not to the users making them. 
People respond to incentives, and some trolls actually stop trolling and start participating responsibly when their trolls are voted down but their fair posts are voted up.</p></li> <li><p>Similarly, blacklisting trollish users generally doesn't work. As long as it's easy to create <a href="http://en.wikipedia.org/wiki/Sockpuppet_%28Internet%29">sockpuppet accounts</a> with throwaway email addresses, forums behave like badly applied wallpaper: when you push a bubble down, it just pops up somewhere else.</p></li> </ol> <h3>Censorship</h3> <p>Finally, a few words on censorship as an alternative method of blocking trolls:</p> <p>Censorship is a slippery slope. Deciding which comments to delete is ultimately a judgment call, and people who are emotionally invested in the success of a forum are notoriously poor at remaining objective and applying standards consistently. </p> <p>There's a risk that legitimate comments will get the axe. Even worse, there's a risk that an atmosphere of censorship will deter people from participating - the very problem that moderation was introduced to solve!</p> <p>Finally, censorship gives the trolls strong rhetorical ammunition to accuse you of heavy-handedness and elicit sympathy from other participants. The perceived lack of transparency over which comments are acceptable puts everyone into an oppositional stance, and the trolls will be watching carefully for examples of hypocrisy.</p> <p>Both philosophically and practically, censorship is at least as harmful as its targets. 
It depends naively on the consistent objectivity and benevolence of the moderators, introduces a <a href="http://en.wikipedia.org/wiki/Chilling_effect_%28term%29">chilling effect</a> on participants who may hold legitimate beliefs that break with the majority, and arms trolls with a credible claim that their rights are being violated.</p> Ryan McGreal 2 http://quandyfactory.com/blog/50/couchdb_working_notes 2010-04-09T12:00:00Z CouchDB Working Notes <p>Note: this article is very much a work in progress. It functions mainly as a place where I can document what I learn about CouchDB as I play around with it and get a progressively better sense of what it does and how it works.</p> <h3>Summary</h3> <p>CouchDB is a non-relational database server built in Erlang that stores its data as JSON and communicates RESTfully over HTTP.</p> <h3>Install CouchDB</h3> <p>First, you need to install CouchDB. If you're using a civilized operating system with a package manager, this won't be difficult.</p> <pre><code>ryan@home ~$ sudo apt-get install couchdb </code></pre> <p>The package manager will install CouchDB and all its dependencies. The installation script will finish with this hopeful line:</p> <pre><code> * Starting database server CouchDB </code></pre> <p>That's it. Couch is up and running.</p> <h3>RESTful API</h3> <p>The first thing to know about CouchDB is that you connect to it over HTTP using the standard HTTP "verbs" like GET, PUT, POST and DELETE - the very same protocol that browsers use to connect to web servers. </p> <p>This is important to understand: you don't need database drivers or clients or anything else to interact with CouchDB. You just need to be able to send an HTTP request to the CouchDB server and receive the response.</p> <p>You can prove this by opening your browser and navigating to <a href="http://localhost:5984">http://localhost:5984</a>. 
It should return something like this:</p> <pre><code>{"couchdb": "Welcome", "version": "0.10.0"} </code></pre> <p>So far, so good. But what are you looking at?</p> <h3>JSON</h3> <p>If you've built any Ajax web apps in the past few years, the output from the CouchDB web server should be familiar to you. It's <a href="http://json.org/">JSON</a>, or JavaScript Object Notation, the lightweight data format introduced by Douglas Crockford. </p> <p>(BTW if you write JavaScript but haven't read Crockford's book <cite><a href="http://www.amazon.ca/exec/obidos/ASIN/0596517742/wrrrldwideweb">JavaScript: The Good Parts</a></cite>, I highly recommend it. Chances are you're Doing It Wrong today.)</p> <p>JSON is based on JavaScript object literal notation, a method of creating objects in JavaScript by literally describing their properties and methods:</p> <pre><code>var myObject = {
    name: "Ryan McGreal",
    age: 36,
    emails: ["ryan@quandyfactory.com", "editor@raisethehammer.org"],
    doSomething: function() {
        alert("This is a useful method!");
    }
};
</code></pre> <p>JSON itself is a stricter subset of this notation: keys must be double-quoted strings, and values can't be functions like <code>doSomething</code>. It supports arrays and dictionaries (objects) with nesting and a handful of data types (number, string, boolean, array, object and null). Its structure can also be described and validated with a schema, via the separate JSON Schema draft specification. </p> <p>A major advantage over XML is that the syntax is much simpler and less verbose, which makes it lighter across networks as well as more human-readable.</p> <p>While it's possible to evaluate JSON as JavaScript using <code>eval()</code>, it's not recommended (for what I hope are obvious reasons). It's much better to use a JSON parser. </p> <p>The good news is that there are already JSON parsers for a wide range of programming languages in addition to JavaScript. Python, for example, <a href="http://docs.python.org/library/json.html">includes the <code>json</code> module</a> as part of the standard library (as of version 2.6). 
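</p> <p>As a quick sketch of that module in use (Python 3 shown), here's the round trip between JSON text and Python objects, starting from a welcome response like the one CouchDB returned above. The <code>doc</code> dictionary is made-up sample data.</p> <pre><code>import json

# A welcome response like the one returned by CouchDB above
raw = '{"couchdb": "Welcome", "version": "0.10.0"}'

info = json.loads(raw)       # JSON text to a Python dict
print(info["version"])       # 0.10.0

# json.dumps goes the other way: Python objects to JSON text.
doc = {"title": "My First CouchDB Document", "tags": ["blog", "couchdb"]}
text = json.dumps(doc, sort_keys=True)
print(text)   # {"tags": ["blog", "couchdb"], "title": "My First CouchDB Document"}

# Unlike eval(), json.loads only accepts data, never code.
try:
    json.loads('{"doSomething": function() {}}')
except ValueError as err:    # json.JSONDecodeError subclasses ValueError
    print("rejected:", err.__class__.__name__)
</code></pre> <p>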
There are also libraries for PHP, Ruby, Perl, Java, Scala, Erlang, Common Lisp, and Clojure, among others.</p> <h3>Create a CouchDB Database</h3> <p>Given that you connect to CouchDB over HTTP, it should not surprise you to learn that you execute actions against CouchDB using the HTTP request method 'verbs': GET, PUT, POST and DELETE. </p> <p>As a result, any programming language that can communicate over HTTP can connect to a CouchDB server.</p> <p>Let's create a new database. For simplicity's sake I'll connect to the CouchDB server on the command line using <a href="http://curl.haxx.se/">curl</a>. The <code>-X</code> command-line option specifies which HTTP request method to use.</p> <pre><code>ryan@home ~$ curl -X PUT http://localhost:5984/mydb </code></pre> <p>CouchDB returns the following response (in JSON format, of course):</p> <pre><code>{"ok": true} </code></pre> <p>You now have a CouchDB database called <code>mydb</code>. Prove it by running a GET request against the URL for your database:</p> <pre><code>ryan@home ~$ curl -X GET http://localhost:5984/mydb </code></pre> <p>Check it out: your database has a URL!</p> <p>CouchDB will return a summary of the database properties (I've added whitespace to the CouchDB JSON responses to aid readability):</p> <pre><code>{
    "db_name": "mydb",
    "doc_count": 0,
    "doc_del_count": 0,
    "update_seq": 0,
    "purge_seq": 0,
    "compact_running": false,
    "disk_size": 79,
    "instance_start_time": "1276950495049303",
    "disk_format_version": 4
}
</code></pre> <p>Notice the <code>doc_count</code> property of 0. We should fix that by adding a document. 
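</p> <p>Because it's all plain HTTP, the same two requests can be made from any language with an HTTP library. Here's a minimal Python 3 sketch using only the standard library's <code>urllib.request</code>. It assumes CouchDB is listening on its default <code>localhost:5984</code>, and the helper names <code>couch_request</code> and <code>send</code> are my own, not part of any CouchDB client library.</p> <pre><code>import json
import urllib.request

# Assumption: CouchDB is running locally on its default port.
COUCH_URL = "http://localhost:5984"


def couch_request(method, path, doc=None):
    """Build (but don't send) a request such as PUT /mydb or GET /mydb.

    If a document is given, it is serialized as JSON and sent with a
    Content-Type of application/json, just like the curl examples.
    """
    data = None
    headers = {}
    if doc is not None:
        data = json.dumps(doc).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(COUCH_URL + path, data=data,
                                  headers=headers, method=method)


def send(request):
    """Send a request and decode CouchDB's JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Equivalent to: curl -X PUT http://localhost:5984/mydb
create_db = couch_request("PUT", "/mydb")

# Equivalent to: curl -X GET http://localhost:5984/mydb
get_info = couch_request("GET", "/mydb")

# With a server running, you would then do something like:
#     send(create_db)               # {'ok': True}
#     send(get_info)["doc_count"]   # 0
</code></pre> <p>Building the request separately from sending it makes the HTTP verb and URL easy to inspect. Note that <code>urlopen</code> raises <code>urllib.error.HTTPError</code> when CouchDB answers with an error status, for example if the database already exists.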
</p> <h3>Add a Document</h3> <p>You add documents to a database using the HTTP POST method (the same method HTML forms use to send form data to the server).</p> <pre><code>ryan@home ~$ curl -X POST http://localhost:5984/mydb \
    --header 'Content-Type: application/json' \
    --data '{"title": "My First CouchDB Document", "author": "Ryan McGreal",
             "date_posted": "2010-04-09 11:38:26", "section": "blog",
             "content": "This is my first-ever CouchDB document. Niiice."}'
</code></pre> <p>CouchDB returns a response like the following:</p> <pre><code>{
    "ok": true,
    "id": "bf58234234aed2389dcb23423",
    "rev": "1-4ac239847fea987bd2340"
}
</code></pre> <p>Let's take a look at what just happened. </p> <p>We executed an HTTP POST request with a Content-Type header of "application/json" (since we're sending the data in JSON format) and a JSON object for the posted data. The JSON object includes the following keys and values:</p> <ul> <li><strong>title</strong>: My First CouchDB Document</li> <li><strong>author</strong>: Ryan McGreal</li> <li><strong>date_posted</strong>: 2010-04-09 11:38:26</li> <li><strong>section</strong>: blog</li> <li><strong>content</strong>: This is my first-ever CouchDB document. Niiice. </li> </ul> <p>CouchDB accepted the JSON object, parsed it to ensure that it's valid JSON, and then created a document in the mydb database with the content you provided. </p> <p>It also assigned the document an ID and a revision ID, and included both in the response so that you can access the document later. </p> <p>The revision ID matters too: CouchDB uses it to detect conflicting updates, so changing a document later means supplying its current revision. (It isn't a substitute for real version control, though; old revisions are discarded when the database is compacted.) 
CouchDB also serves revision numbers as HTTP ETags, so you can take advantage of caching.</p> <h3>View a Document</h3> <p>Let's take a look at the document we just created.</p> <pre><code>ryan@home ~$ curl -X GET http://localhost:5984/mydb/bf58234234aed2389dcb23423 </code></pre> <p>Your document has a URL, too!</p> <p>CouchDB returns the document, including the ID and revision it assigned, which are stored in the reserved <code>_id</code> and <code>_rev</code> fields:</p> <pre><code>{
    "_id": "bf58234234aed2389dcb23423",
    "_rev": "1-4ac239847fea987bd2340",
    "title": "My First CouchDB Document",
    "author": "Ryan McGreal",
    "date_posted": "2010-04-09 11:38:26",
    "section": "blog",
    "content": "This is my first-ever CouchDB document. Niiice."
}
</code></pre> <p>There it is. You can access its properties and values just as you would with a recordset. </p> <p>The difference is that the CouchDB data is formatted in JSON and can be nested - i.e. an object's property could be an array or another object with its own keys and values - rather than forced into a flat table.</p> <h3>Add Another Document</h3> <p>Let's try adding a document with nested data:</p> <pre><code>ryan@home ~$ curl -X POST http://localhost:5984/mydb \
    --header 'Content-Type: application/json' \
    --data '{"title": "Some Handy Links", "author": "Ryan McGreal",
             "links": ["http://quandyfactory.com", "http://raisethehammer.org",
                       "http://hamilton350.com"]}'
</code></pre> <p>CouchDB responds:</p> <pre><code>{
    "ok": true,
    "id": "e6ed2389dcb26100caeg3423",
    "rev": "1-7c21a0fd39fea987bd2340"
}
</code></pre> <p>Now we have two documents in our database with different sets of fields - you're not forced to map your data into a standardized set of properties. 
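</p> <p>To see what that flexibility saves you, here's a small Python sketch (no CouchDB required) that takes the two documents above as plain dicts and works out which columns a single flat table would need, and which of them each row would leave empty. It's only an illustration of the comparison, not anything CouchDB itself does.</p> <pre><code># The two documents posted above, as Python dicts (IDs and revisions omitted)
first = {
    "title": "My First CouchDB Document",
    "author": "Ryan McGreal",
    "date_posted": "2010-04-09 11:38:26",
    "section": "blog",
    "content": "This is my first-ever CouchDB document. Niiice.",
}
second = {
    "title": "Some Handy Links",
    "author": "Ryan McGreal",
    "links": ["http://quandyfactory.com", "http://raisethehammer.org",
              "http://hamilton350.com"],
}

# A flat table would need a column for every field used by any document...
columns = sorted(set(first) | set(second))

# ...and each row would carry None (NULL) for the fields it doesn't use.
rows = [{col: doc.get(col) for col in columns} for doc in (first, second)]

for row in rows:
    print([col for col, value in row.items() if value is None])
# ['links']
# ['content', 'date_posted', 'section']
</code></pre> <p>The sketch glosses over one more wrinkle: in a normalized relational schema, a multi-valued field like <code>links</code> would usually need a table of its own.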
</p> <p>If we tried to do this in a traditional RDBMS, our documents table would have to have a <code>links</code> column with a Null value for the first document, as well as a <code>content</code> column with a Null value for the second document.</p> <p>With CouchDB, you can give your documents arbitrary fields and values depending on what's applicable for each document.</p> <h3>Update a Document</h3> <p>[Coming Soon]</p> <h3>Data Views</h3> <p>[Coming Soon]</p> <h3>Misc.</h3> <p>Let's create another database.</p> <pre><code>ryan@home ~$ curl -X PUT http://localhost:5984/mydb </code></pre> <p>Wait a minute; didn't we just create a database called <code>mydb</code>?</p> <p>CouchDB returns this response:</p> <pre><code>{
    "error": "file_exists",
    "reason": "The database could not be created, the file already exists."
}
</code></pre> <p>Whew, dodged a bullet there. CouchDB doesn't let you accidentally overwrite an existing database.</p> <p>At any time you can get an array of all the databases on your server with the following GET request:</p> <pre><code>ryan@home ~$ curl -X GET http://localhost:5984/_all_dbs
["mydb"]
</code></pre> <p>There's plenty more, but this should be enough to get you started thinking about how you might use CouchDB in projects whose data models don't necessarily lend themselves to strict relational structures.</p> <h3>Utils Web Interface</h3> <p>One last thing: CouchDB also provides a web interface with some handy utilities for navigating around the server and its databases. </p> <p>Just load <a href="http://localhost:5984/_utils">http://localhost:5984/_utils</a> in your browser to see it.</p> Ryan McGreal 2 http://quandyfactory.com/blog/46/mpaa_and_piracy 2010-02-26T12:00:00Z MPAA and Piracy <p>When I was a kid, I had a <em>Fat Albert</em> book in which one of the Cosby Kids (for details of who did what I'm going on memory, but I think it was Bucky) gets upset about something and runs away. 
</p> <p>The other kids all search the projects for him, but evening is coming on and it's getting harder to see.</p> <p>At one point, Weird Harold is walking from streetlamp to streetlamp, peering into the pool of light under each lamp. Rudy walks up and asks Harold what he's doing. Harold says he's looking for Bucky.</p> <p><em>But why are you looking under the streetlamps?</em> asks Rudy.</p> <p>Harold answers, <em>That's the only place I can see.</em></p> <p>Which brings us to:</p> <p><img src="http://www.geek.com/wp-content/uploads/2010/02/piratedvd.jpg" alt="What pirates get vs. what paying customers get" /></p> <p>(<a href="http://www.geek.com/wp-content/uploads/2010/02/piratedvd.jpg">source</a>)</p> Ryan McGreal 2 http://quandyfactory.com/blog/43/review:_mark_pilgrim's_dive_into_python_3 2010-02-08T12:00:00Z Review: Mark Pilgrim's Dive Into Python 3 <h3>TL;DR Summary</h3> <p>If you're an experienced programmer, want to learn Python 3, and don't have a lot of time to waste, skip this review and just go straight to Mark Pilgrim's <em><a href="http://diveintopython3.org">Dive Into Python 3</a></em>.</p> <p>DIP3 starts with practical, working code, takes it apart piece by piece, puts it back together, and leaves you with a solid understanding of the concepts and their applications.</p> <p>DIP3 is <em>opinionated</em> but well-informed. It hammers home the important stuff and skips the blather. It's also very well written: terse, witty, and expressive (like Python itself).</p> <h3>Introduction</h3> <p>Mark Pilgrim's <em>Dive Into Python 3</em> came out late last year, but I didn't receive a review copy until late January - complete with autograph and a friendly note from the author apologizing for the delay. (Disclosure: I felt warm and fuzzy after receiving the personal note. Also, the review copy I received was free.)</p> <p>If you're familiar with <a href="http://diveintomark.org">Mark Pilgrim</a>, you'll know that he's an opinionated writer. 
DIP3 is no exception; this is Python 3 the way Mark Pilgrim wants you to understand it. </p> <p>The good news is that Pilgrim is a <em>reliable narrator</em>. That is, he really knows his stuff; and his opinions, while strong, are deeply informed, <a href="http://diveintomark.org/archives/2009/10/19/the-point">ethically and philosophically consistent</a>, and shared in a spirit of cooperation and stewardship.</p> <h3>Purpose and Relation to Dive Into Python</h3> <p>Comparisons to Pilgrim's original Python book, <em><a href="http://diveintopython.org/">Dive Into Python</a></em> (also published by Apress), are inevitable. Happily, this edition improves on the original in most of the ways that matter. The book design is more elegant and stylish, with better use of white space and contrast, cleaner fonts (with one notable exception, about which more below), and clearer layout.</p> <p>The book is written primarily for experienced programmers coming to Python for the first time, but Pilgrim recognizes that a lot of its readers will have a background in Python 2. </p> <p>Since Python 3 breaks backward compatibility with the 2.x series, the book contains specially formatted bullet notes at points where version 3 breaks with 2.x; for instance, the merging of the <code>int</code> and <code>long</code> datatypes, or the fact that the <code>/</code> division operator now performs true (floating-point) division even on two integers, rather than integer division.</p> <p>By focusing mainly on Python 3 itself and then highlighting the diffs from Python 2, Pilgrim generally gets the best of both worlds. </p> <p>Still, there are a few ways in which the book isn't entirely sure whether it wants to be a resource for programmers learning Python for the first time, or for Python 2 programmers who want to upgrade. 
It would be more helpful for experienced Python 2 programmers if Pilgrim noted the diffs consistently rather than sporadically.</p> <p>Another significant change from DIP is the way it uses programs to introduce the language. In the original book, the introductory program was an 11-line script to build an ODBC connection string from a dictionary. It demonstrated function declaration, datatypes, everything-is-an-object, code blocks (and significant whitespace), and the <code>if __name__ == '__main__'</code> trick.</p> <p>The next chapter covered dictionaries, lists, tuples, declaring and returning variables, string formatting (old style, with <code>% ()</code> notation), and basic list comprehensions. The program was dissected exhaustively over those two chapters, which concluded with an explicit acknowledgment:</p> <blockquote> <p>The <code>odbchelper.py</code> program and its output should now make perfect sense. </p> </blockquote> <p>In DIP3, by contrast, the introductory program spans four chapters - and there's never a discrete <em>aha</em> moment where you realize that you understand the program fully. </p> <p>Still, overall the book is a commendable accomplishment: a concise, accessible, and above all <em>fun</em> introduction to a language also known for concision, accessibility and fun.</p> <h3>Why Python 3?</h3> <p>Python 3 has not yet enjoyed wide adoption among programmers - existing Python programmers or otherwise - and part of the reason is that the language is caught in a chicken-and-egg problem: the lack of ported libraries makes it less appealing for coders, while the lack of Python 3 coders makes it less compelling for developers to port their libraries.</p> <p>My hunch is that <em>Dive Into Python 3</em> will carry us closer to the tipping point at which coders and library maintainers start to upgrade <em>en masse</em>. 
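</p> <p>For readers weighing that upgrade, here's a short Python 3 sketch of a few of the 2.x-to-3 differences this review mentions: the merged integer type, true division, and the new dictionary and set comprehensions. The variable names and sample words are arbitrary.</p> <pre><code># One integer type: Python 3 merges the old int and long into int,
# which grows as large as it needs to.
big = 2 ** 100
print(type(big).__name__)       # int

# "/" is now true division, even between two ints; "//" floors.
print(7 / 2)                    # 3.5
print(7 // 2)                   # 3

# Dictionary and set comprehensions (not available in Python 2.6).
words = ["spam", "eggs", "ham", "spam"]
lengths = {word: len(word) for word in words}
shouting = {word.upper() for word in words}
print(lengths["ham"])           # 3
print(sorted(shouting))         # ['EGGS', 'HAM', 'SPAM']
</code></pre> <p>Under Python 2.6, by comparison, <code>7 / 2</code> evaluates to <code>3</code>, <code>2 ** 100</code> is a separate <code>long</code>, and both comprehensions are syntax errors.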
</p> <p>For one thing, Python 2 programmers will come away from this book with a much clearer understanding of how many problems and pain points Python 3 solves. In addition, version 3 sweetens the deal with powerful new data structures that already make me envious when I go back to work on projects written in 2.x.</p> <h3>Diving In</h3> <p>If you've read programming books before, you'll know that the general format is to start with lots of theoretical background and history, then introduce the syntax, data types and common code blocks; and finally, after a few chapters, start putting together a program.</p> <p>That was my original experience trying to learn Python a few years ago by reading through <em><a href="http://www.greenteapress.com/thinkpython/thinkCSpy/">How to Think Like a Computer Scientist: Learning with Python</a></em> by Allen Downey, Jeff Elkner and Chris Meyers. </p> <p>It wasn't until I read Mark Pilgrim's original <em>Dive Into Python</em> that I really started to get an internalized sense of the language. He raised 'learn by doing' to the level of an art form, and it clicked with me in an immediate and sustained way.</p> <p>It becomes clear that <em>Dive Into Python 3</em> follows the same aggressively practical, hands-on approach as soon as you read the opening line of the introduction:</p> <blockquote> <p>Welcome to Python 3. Let's dive in.</p> </blockquote> <p>The introduction covers installing Python 3, and it's a significant improvement on the equivalent chapter in the original DIP. If you run Windows, Mac OSX or Debian/Ubuntu, the book takes you step by step through the installation process. 
(If you run a more exotic operating system, you can probably figure out how to install Python all by yourself.)</p> <h3>Basics: Functions, Datatypes and Comprehensions</h3> <p><strong>Chapter 1</strong> covers function declaration and arguments, docstrings, <code>sys.path</code>, objects (and the oft-repeated fact that everything, in Python, is an object), denoting code blocks through indentation (i.e. significant whitespace), exceptions, variable declarations, and the <code>if __name__ == '__main__'</code> trick. </p> <p>These are the absolute basics, and it's noteworthy that Pilgrim includes an introduction to exceptions. He's an opinionated programmer, and he wants you to understand that exception handling <em>is</em> fundamental to good Python code, not an exotic extra to be mentioned in passing once you're already proficient.</p> <p><strong>Chapter 2</strong> dives into Python's datatypes: booleans and boolean contexts, numbers, type coercion, operations, fractions, lists, tuples (immutable lists), dictionaries, sets (unordered collections of unique values, which Python 3 lets you write with a literal syntax similar to dictionaries), and the special <code>None</code> type.</p> <p><strong>Chapter 3</strong> introduces comprehensions, a delightful set of syntactic sweeteners that allow you to map and filter an iterable collection using a terse one-liner:</p> <pre><code>oldlist = [1, 2, 3, 4, 5, 6]
newlist = [item * item for item in oldlist if item % 2 == 1]  # squares of the odd items: [1, 9, 25]
</code></pre> <p>Python 3 adds dictionary and set comprehensions to the list comprehensions it already supported.</p> <p>Aside: this is one of the places where Pilgrim <em>doesn't</em> specifically mention the difference between Python 2 and 3. I found myself going back to the Python 2 documentation for a sanity check, just to make sure these structures weren't already there and I hadn't simply missed them.</p> <h3>Bytes vs. Characters</h3> <p><strong>Chapter 4</strong> breaks form. 
Instead of diving straight into code, Pilgrim opens with three pages of exposition - the only such indulgence in the book, and with good reason. In possibly the most beautiful and simultaneously despairing chapter in the book (it achieves a level of pathos befitting a novel, never mind a technical manual), Pilgrim recounts the tragedy of text on an international data network. And it <em>is</em> a tragedy. </p> <p>In what might be the most significant break from version 2, Python 3 consistently and explicitly distinguishes <strong>strings</strong>, which are sequences of Unicode characters, from <strong>bytes</strong>, which are just bytes.</p> <p>Because files and network connections deliver bytes, not characters, those bytes must be <strong>decoded</strong> using a character encoding that maps bytes to specific characters (and strings must be <strong>encoded</strong> back into bytes on the way out).</p> <p>Before Python 3, the default encoding was <a href="http://en.wikipedia.org/wiki/ASCII">ASCII</a>, the American Standard Code for Information Interchange, a seven-bit encoding that handles all conventional English characters - the lowercase and uppercase letters, numerals, and punctuation symbols - plus the space character and various control characters such as tabs and newlines.</p> <p>Of course, not everyone speaks or writes English, and many other languages use additional characters (like the French <em>e-accent-aigu</em> &#233; or the German <em>eszett</em> &szlig;) or even collections of characters with little or no overlap with English (like Japanese Kanji). For these languages, ASCII is wholly inadequate, and different language communities have independently developed various encoding systems that often map the same character to different bytes.</p> <p>The potential for chaos is huge, and indeed a number of incompatible encodings have already caused plenty of grief for people trying to process text on computers. </p> <p>We've all seen web pages with jumbles of nonsense characters where you would expect, say, quotation marks to go. 
This is caused by a mismatch between the encoding used to produce the text (say, <a href="http://en.wikipedia.org/wiki/Windows-1252">ANSI CP-1252</a> in Microsoft Word) and the encoding used to render the text later (say, <a href="http://en.wikipedia.org/wiki/ISO/IEC_8859-1">ISO/IEC 8859-1</a>).</p> <p>Pilgrim sets the scene eloquently and then, with a seemingly innocuous "Enter Unicode", leads us on a harrowing emotional tennis match, swinging back and forth from optimistic proposed solutions - let's agree to put every single character into one big encoding! - to new problems - each character now takes up 4 bytes! - to follow-up solutions - use agreed-upon subsets! - to still more problems.</p> <blockquote> <p>Now cry a lot because everything you thought you knew about strings is wrong, and there ain't no such thing as plain text.</p> </blockquote> <p>Pilgrim doesn't offer simple solutions to the problems he raises, because they don't <em>have</em> simple solutions. Character encoding may be the Great Granddaddy of more-or-less insoluble technical problems. Its problems can't be solved, as such, but if you understand them well enough they can be addressed more or less safely.</p> <p>But as Pilgrim points out with regard to the necessary shortcuts that programs take in mapping out solutions:</p> <blockquote> <p>[It's] a good assumption right up until the moment that it's not.</p> </blockquote> <p>String processing is further complicated by the matter of big-endian vs. little-endian byte ordering - which is only a problem when people try to share files from different computers, "perhaps on a worldwide web of some sort".</p> <p>With all this in mind, you can't help but conclude, despairingly: <em>Character encoding is hard!</em></p> <p>But rather than just giving up and going shopping, Pilgrim leads you through the maelstrom and delivers you - shaken but intact - into the mostly-safe harbour of UTF-8. 
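</p>
<p>A few lines in the Python 3 shell make the distinction concrete. This is a minimal sketch, not an example taken from the book, and CP-1252 is just one plausible choice of "wrong" encoding. Encoding a <code>str</code> produces <code>bytes</code>. Decoding with the matching codec round-trips cleanly. Decoding UTF-8 bytes with a mismatched codec produces the kind of garbled text described above, and it does so without raising any error.</p>

```python
# A str is a sequence of Unicode characters; bytes are just bytes.
text = 'caf\u00e9'            # "cafe" with an e-acute: 4 characters
data = text.encode('utf-8')   # the e-acute becomes two bytes in UTF-8

print(len(text), len(data))   # 4 5
print(data)                   # b'caf\xc3\xa9'

# Decoding with the codec that produced the bytes round-trips cleanly.
assert data.decode('utf-8') == text

# Decoding with the wrong codec raises no error: both bytes are valid
# CP-1252, so you silently get mojibake (an A-tilde and a copyright sign).
garbled = data.decode('cp1252')
assert garbled == 'caf\u00c3\u00a9'
print(len(garbled))           # 5
```

<p>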
Not that UTF-8 isn't also beset with traps and gotchas.</p> <p>Chapter 4 demonstrates formatting strings (using Python's powerful new string formatting syntax, first introduced in version 2.6), common string methods, string slicing (a string, after all, is a sequence of characters), the difference between strings and bytes (lots of gotchas exposed here), and source code encoding (Python 2 source files were ASCII by default, whereas Python 3 source files are UTF-8 by default).</p> <h3>Digression on the Book's Text Formatting</h3> <p>This is as good a place as any to mention a formatting bug in this edition: the text is peppered with artifacts - hollow, vertically aligned rectangles - in place of special characters like em-dashes. For example:</p> <blockquote> <p>The first line imports the <code>humansize</code> program as a module&#9647;a chunk of code that you use interactively or from a larger Python program.</p> </blockquote> <p>Given the attention Pilgrim gives to character encoding (I still remember the <a href="http://www.reddit.com/r/Python/comments/8mc40/dive_into_python_3_everything_you_thought_you/">reddit thread</a> on which he posted an early draft of that section for review and feedback - some of it unbearably pedantic), it seems bizarre that his own book would be bitten by an encoding gotcha! </p> <p>I contacted Pilgrim to ask if he knew what happened but did not receive a response in time for publication. It may be related to the fact that he had to <a href="http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition">convert the book from HTML to MS Word</a> before submitting it to the publisher.</p> <h3>Regular Expressions</h3> <p><strong>Chapter 5</strong> delves into regular expressions, a powerful and pragmatic DSL for solving real-world data extraction problems. 
Yet in what is turning into a common theme, regex is complicated and contains some gnarly gotchas and edge cases.</p> <p>Pilgrim takes us through real-world use scenarios in much the same way that real workaday programmers approach them: by iterating through a sequence of progressive changes and enhancements, until we're satisfied that the solution is <a href="http://en.wikipedia.org/wiki/Principle_of_good_enough">Good Enough</a>.</p> <p>One of the nicest touches of Python's support for regex is <em>verbose mode</em>, which ignores whitespace and allows inline comments so that the regex still makes sense when you have to look at it again in six months. Unless you're parsing text on a daily basis and know regex inside and out, this <em>will</em> be helpful to you.</p> <h3>Iterating Through Iteration</h3> <p><strong>Chapter 6</strong> tackles closures and generators. While Python is not as purely functional as a Lisp or even Ruby, it still provides some powerful functional tools. Pilgrim takes a thorny problem - matching pluralization rules for a variety of English nouns - and combines regular expressions with generators (functions that yield their values lazily and iteratively) and meta-functions (functions that take functions as arguments and return functions).</p> <p>Pilgrim combines these tools to build generators that act as closures and yield functions based on regex pattern matching. It's heady stuff, but by the time he's finished taking it apart and putting it back together, you can't help but understand it. Along the way, we learn about files and file-like objects, and the powerful <code>with</code> context (about which more below).</p> <p><strong>Chapter 7</strong> dives deeper, tackling the same problem using classes and iterators. Everything in Python is an object, with properties and methods. 
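</p>
<p>The kind of iterator class Chapter 7 builds can be sketched in a few lines. What follows is a simplified, hypothetical illustration rather than Pilgrim's code. The class name <code>PluralRules</code>, the <code>plural()</code> helper and the four-rule table are assumptions made for this example, and the rules are nowhere near a complete account of English plurals. Each call to <code>__next__()</code> builds a small closure from one regex rule. <code>__iter__()</code> rewinds the position so the same object can be looped over again.</p>

```python
import re


class PluralRules:
    """Hypothetical iterator that yields one rule function per regex rule."""

    def __init__(self, rules):
        self.rules = rules   # instance variable: (pattern, search, replace) triples
        self.index = 0

    def __iter__(self):
        self.index = 0       # rewind, so every for loop starts at the first rule
        return self

    def __next__(self):
        if self.index == len(self.rules):
            raise StopIteration
        pattern, search, replace = self.rules[self.index]
        self.index += 1

        def apply_rule(noun):
            # A closure: it remembers this rule's pattern, search and replace.
            if re.search(pattern, noun):
                return re.sub(search, replace, noun)
            return None

        return apply_rule


RULES = PluralRules([
    ('[sxz]$', '$', 'es'),
    ('[^aeioudgkprt]h$', '$', 'es'),
    ('[^aeiou]y$', 'y$', 'ies'),
    ('$', '$', 's'),         # fallback: just add an s
])


def plural(noun):
    for apply_rule in RULES:
        result = apply_rule(noun)
        if result is not None:
            return result


print(plural('coach'), plural('vacancy'), plural('day'))  # coaches vacancies days
```

<p>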
In this chapter, Pilgrim explains how you can define your own classes and instantiate them as objects - in this case, an iterator class.</p> <p>He covers Python's special reserved word <code>pass</code>, which does nothing but is necessary for those occasions when Python syntactically requires a statement - what Pilgrim calls "a Python reserved word that just means, 'move along, nothing to see here'."</p> <p>He covers Python's special <code>__init__()</code> method, distinguishing it semantically from a C++ constructor; the special <code>self</code> argument in Python classes; instantiating classes; instance variables; passing in parameters and calling methods.</p> <p>Then he shows how to create an iterator class and revisits the pluralization code from the previous chapter to produce the same results in a terser, more abstract form.</p> <p>Aside: when I first encountered Python, I was delighted to discover that it appeared to be lists all the way down. Pilgrim argues by way of contrast that it's actually "iterators all the way down."</p> <p><strong>Chapter 8</strong> indulges in a bit of whimsy by way of introducing the <code>itertools</code> library in the context of an extremely terse (just 14 lines!) Alphametics solver. Pilgrim dives a little deeper into regular expressions, introducing the <code>findall()</code> method and highlighting another pattern-matching gotcha (<code>findall()</code> doesn't return overlapping matches). He also shows how you can use the <code>set</code> datatype to extract the unique items in a list.</p> <p>We also learn how to make assertions with the terse, one-line <code>assert</code> statement you may have come to expect from Python by now; a failed assertion raises an <code>AssertionError</code>.</p> <p>Speaking of which, Pilgrim revisits the generator paradigm with a one-line generator expression. 
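</p>
<p>Here is a small sketch in the spirit of the Alphametics chapter. The puzzle string and variable names are assumptions for illustration, not the book's code. <code>re.findall()</code> pulls out the words and never returns overlapping matches. A <code>set</code> keeps only the unique letters. A generator expression produces its values lazily, one at a time, as the loop asks for them.</p>

```python
import re

puzzle = 'SEND + MORE == MONEY'

# findall() returns every non-overlapping match, in order.
words = re.findall('[A-Z]+', puzzle)
print(words)                          # ['SEND', 'MORE', 'MONEY']

# Non-overlapping: 'aaaa' contains three 'aa' substrings, but only two come back.
print(re.findall('aa', 'aaaa'))       # ['aa', 'aa']

# A set keeps one copy of each letter (its iteration order is arbitrary).
unique_letters = set(''.join(words))
print(len(unique_letters))            # 8

# A generator expression computes nothing until the loop asks for a value.
codes = (ord(letter) for letter in sorted(unique_letters))
for code in codes:
    print(code, end=' ')              # 68 69 77 78 79 82 83 89
print()
```

<p>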
If you prefer not to iterate, you can pass the generator right into a <code>tuple()</code>, <code>list()</code> or <code>set()</code> function.</p> <p>An important paradigm of functional programming is the use of <em>lazy evaluation</em>, in which the values in an iterable object are calculated on the fly as needed rather than all at once.</p> <p>Python comes equipped with the powerful <code>itertools</code> module, which includes functions for generating permutations, products, combinations, chains, groups, and more.</p> <p>Along the same lines, Pilgrim shows how to treat a file as an iterable (most things in Python are iterable), so you can pull its lines into a list and manipulate them with all the usual list tools.</p> <p>The chapter closes with a vintage Pilgrim dive when he iterates through the chain of problems / solutions / new problems related to Python's <code>eval()</code> function. <code>eval()</code> is powerful - and <em>dangerous</em>, since it can run arbitrary code that does anything Python can do (<code>subprocess.getoutput('rm -rf')</code> anyone?).</p> <blockquote> <p>Say it with me: "<code>eval()</code> is evil!"</p> </blockquote> <p>It turns out that there is no truly reliable, safe way to expose <code>eval()</code> to untrusted third parties.</p> <h3>Unit Testing</h3> <p><strong>Chapter 9</strong> introduces unit testing. If you unit test, you'll appreciate the value of a powerful, simple testing framework; if you don't unit test, you may well finish the chapter a convert.</p> <p>Python, of course, comes with a <code>unittest</code> module that lets you build simple, robust test suites by subclassing <code>unittest.TestCase</code>, to prove, empirically, that your code does what it's supposed to do (at least on a micro level).</p> <p>Using <code>unittest</code> and the code from Chapter 5 to generate Roman numerals, Pilgrim takes you through testing, debugging and (in <strong>Chapter 10</strong>) safe refactoring. 
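</p>
<p>For readers who haven't used <code>unittest</code>, here is a minimal sketch of the pattern. The <code>to_roman()</code> function is a deliberately simplified stand-in written for this example, not the book's implementation. The test class subclasses <code>unittest.TestCase</code>. It checks a handful of known values and checks that out-of-range input raises <code>ValueError</code>.</p>

```python
import unittest

# Hypothetical, simplified converter: just enough to have something to test.
NUMERALS = ((1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
            (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
            (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I'))


def to_roman(n):
    if n not in range(1, 4000):
        raise ValueError('number out of range (must be 1..3999)')
    result = ''
    for value, numeral in NUMERALS:
        count, n = divmod(n, value)   # how many times this numeral fits
        result += numeral * count
    return result


class ToRomanTests(unittest.TestCase):
    known_values = ((1, 'I'), (4, 'IV'), (9, 'IX'), (58, 'LVIII'),
                    (1994, 'MCMXCIV'), (3999, 'MMMCMXCIX'))

    def test_known_values(self):
        for number, numeral in self.known_values:
            self.assertEqual(to_roman(number), numeral)

    def test_out_of_range(self):
        self.assertRaises(ValueError, to_roman, 0)
        self.assertRaises(ValueError, to_roman, 4000)
```

<p>Saved as, say, <code>test_roman.py</code> (a hypothetical file name), the tests run with <code>python -m unittest -v test_roman</code>. In the test-first spirit of the book, you would write <code>ToRomanTests</code> before <code>to_roman()</code> and keep coding until it passes.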
</p> <p>His philosophy is simple and elegant:</p> <ol> <li>Write tests.</li> <li>Write code.</li> <li>Test code.</li> <li>Debug code.</li> <li>When all the tests pass, stop coding.</li> </ol> <p>By the end of the two chapters, you'd be hard pressed not to come away persuaded that Pilgrim's onto something here.</p> <h3>File I/O</h3> <p><strong>Chapter 11</strong> dives deep into files. I/O is central to computing, and Pilgrim has touched on files and file-like objects several times so far. In this chapter, files are the main attraction.</p> <p>In Python 3, the <code>open()</code> function takes an <code>encoding</code> argument so that Python knows how to decode the bytes in a file into characters:</p> <blockquote> <p>Bytes are bytes; characters are an abstraction. A string is a sequence of Unicode characters. But a file on disk is not a sequence of Unicode characters; a file on disk is a sequence of bytes. So if you read a "text file" from disk, how does Python convert that sequence of bytes into a sequence of characters?</p> </blockquote> <p>If you don't specify an encoding, Python 3 uses your system's default encoding (for example, CP-1252 on many Windows systems). That might be the file's encoding, but it might not - assume at your peril! This is especially true given that Python is cross-platform: code written on, say, a Linux machine that makes assumptions about file encoding might suddenly fail on a Windows machine.</p> <p>Pilgrim then dives into opened files as stream objects, with useful properties and methods, including <code>readline()</code>, <code>seek()</code> and <code>tell()</code>. Stream objects also behave like iterators, yielding a line at a time.</p> <p>In addition to reading files, Python can write files and append new lines to files. But not all files are text files. 
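</p>
<p>Here is a minimal sketch of the text-file round trip described above, using only the standard library. The file name <code>notes.txt</code> is arbitrary, and the file lives in a temporary directory so the example cleans up after itself. The sketch writes and appends with an explicit <code>encoding</code> and reads back with <code>readline()</code>, <code>tell()</code> and <code>seek()</code>. It then reopens the same file in binary mode to show the bytes underneath the characters.</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'notes.txt')

    # Always name the encoding; don't rely on the platform default.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('caf\u00e9\n')
    with open(path, 'a', encoding='utf-8') as f:   # 'a' appends to the end
        f.write('second line\n')

    with open(path, encoding='utf-8') as f:        # read mode is the default
        first = f.readline()
        position = f.tell()     # an opaque position cookie in text mode
        second = f.readline()
        f.seek(position)        # jump back to the saved position
        again = f.readline()

    # Binary mode: no decoding at all, just the raw bytes on disk.
    with open(path, 'rb') as f:
        raw = f.read()

print(len(first), repr(second), again == second)   # 5 'second line\n' True
print(raw)  # b'caf\xc3\xa9\nsecond line\n' (Windows stores \r\n line endings)
```

<p>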
Python can also open, read, write and append to binary files, byte by byte.</p> <p>Further, Pilgrim explains how to use the <code>io</code> library to create and manipulate file-like objects that may not correspond to actual files on disk but still possess file-like properties and methods.</p> <p>Of course, Python can also handle compressed files via the <code>gzip</code> module. (The standard Python library is phenomenal in its breadth and usefulness.)</p> <p>Python is thoroughly cross-platform, but it still lets you use and redirect the stdin, stdout and stderr streams native to *nix systems, and Pilgrim shows you how to hook into them.</p> <p>Pilgrim advocates using the <code>with</code> statement as the safest way to handle files and file-like objects. He notes that Python 3.1 lets a single <code>with</code> statement manage multiple context managers at once (if you have Python 3.0 installed, take a few minutes to upgrade to 3.1).</p> <h3>Serialization via XML, Pickle and JSON</h3> <p><strong>Chapter 12</strong> wrestles XML. Yes, XML is a special flavour of hell, but you <em>will</em> have to deal with it from time to time, and Python has easy, powerful tools to help you. 
Pilgrim sketches out a crash course on XML, concluding, "Now you know just enough XML to be dangerous."</p> <p>The standard library offers <code>xml.etree.ElementTree</code> to parse, walk, search, iterate through (it really <em>is</em> iterators all the way down), create, and modify XML documents.</p> <p>Pilgrim pauses to draw our attention to the <code>lxml</code> module, a third-party library that replicates the ElementTree API but extends it with more powerful search methods.</p> <p>With matching APIs, your code can prefer the more powerful third-party library but fall back on the standard library if the former is not installed.</p> <p>Pilgrim also shows how you can dig into a malformed XML document and recover its data using <code>lxml</code>.</p> <p>Of course, XML is evil (albeit an often necessary evil), and <strong>Chapter 13</strong> introduces other methods of serializing Python objects. </p> <p>Sometimes you want to store more than just strings in your data persistence routine, and Python has (wait for it) powerful tools for serializing even complex objects and datatypes - like the standard <code>pickle</code> module, whose <code>dump()</code> and <code>load()</code> functions store and restore objects, dicts, lists, and so on.</p> <p>But <code>pickle</code> is only good for Python programs, and data often has to pass between different applications written in different languages. Fortunately, Python comes equipped to handle <a href="http://www.json.org/">JSON</a>, or JavaScript Object Notation, a data format developed by Douglas Crockford and inspired by javascript object literal syntax (in fact, JSON <em>is</em> valid javascript).</p> <p>JSON can represent Python dicts, lists, strings, numbers and so on, and the standard <code>json</code> library handles the translation. You can also extend it with custom serializers for data types JSON doesn't support, like <code>bytes</code>.</p> <p>Pilgrim really knows how to hammer home the gotchas. 
In his umpteenth reminder to define the encoding of an opened file:</p> <blockquote> <p>You'll forget! I forget sometimes! And everything will work right up until the moment that it fails, and then it will fail most spectacularly.</p> </blockquote> <h3>Take a REST</h3> <p><strong>Chapter 14</strong> takes the reader through working with RESTful, HTTP-based web services. (Pilgrim wisely decided not to update the chapter on SOAP web services from the original DIP.) </p> <p>Python's standard library does HTTP via the <code>http.client</code> and <code>urllib.request</code> modules, but Pilgrim recommends the third-party <code>httplib2</code> module instead. It's more complete and cleaner than either. In particular, it supports HTTP caching, last-modified checking, ETags, compression, and 30x redirects.</p> <p>Pilgrim notes:</p> <blockquote> <p><code>urllib</code> speaks HTTP like I speak Spanish - enough to get by in a jam but not enough to hold a conversation. HTTP is a conversation. It's time to upgrade to a library that speaks HTTP fluently.</p> </blockquote> <p>Pilgrim shows how to fetch data from web resources without being rude or inefficient, by accepting compressed data, caching files locally, remembering permanent redirects, and so on.</p> <p>He also demonstrates RESTful interactions with HTTP GET, POST and DELETE requests to a web API (in his example, <a href="http://identi.ca">identi.ca</a>, an open source, twitter-like microblogging service).</p> <p>Important note: <code>httplib2</code> returns bytes, not strings. Yes, the character demon rears its head again. You need to decode them with the right encoding.</p> <h3>Porting and Packaging</h3> <p>I've mentioned that DIP3 doesn't always explicitly highlight the differences between Python 2 and 3. <strong>Chapter 15</strong> comes as something of a remedy.</p> <p>The bytes/characters dichotomy is paramount throughout this book, since it's a huge gotcha that <em>will</em> burn you sooner or later. 
(It's one of the reasons the Python development team decided to break compatibility to create Python 3 in the first place.) </p> <p>It's not surprising, then, that Pilgrim ported into Python a library (from Mozilla) that guesses the character encoding of byte sequences. In this chapter, Pilgrim walks the reader through the sometimes painful exercise of porting the library, called <code>chardet</code>, from Python 2 to Python 3.</p> <p>While the book has noted differences between the versions along the way, it has not served as a systematic guide to the changes. As Chapter 15 shows, there are some important but subtle differences under the hood that will break your code and may be infuriating to track down.</p> <p>Python includes the <code>2to3.py</code> tool, which automates the automatable aspects of porting and highlights the issues it can't handle.</p> <p>Of course, every chapter does double and triple duty, and this chapter also introduces the structure and internal arrangement of multi-file libraries.</p> <p>Speaking of which, <strong>Chapter 16</strong> covers packaging Python libraries for distribution. If nothing else, Pilgrim gets an <em>A+++++ Would Definitely Read Again</em> for demystifying the <code>distutils</code> library and unlocking the PyPI online catalogue. </p> <p>I've always found the documentation on packaging to be arcane and inaccessible, and this chapter is a much-appreciated remedy.</p> <h3>Conclusion</h3> <p>DIP3 is not suitable as an introduction to programming, since Pilgrim assumes you bring a background in programming concepts to the table; but it's perfect for busy programmers looking to gain mastery in Python 3.</p> <p>Over the course of a witty, compelling narration, the book repeatedly hammers home both Python 3's most important strengths (terse syntax, rich datatypes, everything-is-an-object, abundant iteration, powerful libraries) and its most dangerous pitfalls (bytes vs. 
characters).</p> <p>Pilgrim wastes no more time than is strictly necessary on exposition. We program to solve problems, and DIP3 keeps problem solving - practical, real-world problems - in the driver's seat throughout.</p> <p>Notwithstanding a few minor quibbles, I can heartily recommend this book to anyone who wants to tackle Python 3. Since reading it, I find myself thinking about my own modest Python applications - sooner or later I'm going to have to cut the cord and make the jump to Python 3. </p> <p>After reading <em>Dive Into Python 3</em>, that point seems closer than ever.</p> <h3>Reference</h3> <p>Mark Pilgrim, <em>Dive Into Python 3</em>, Apress, 2009. </p> <p>Text copyright &copy; 2009 by Mark Pilgrim. Licenced under the <a href="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution Share-Alike</a> licence. </p> <p><em>DIP3</em> is <a href="http://diveintopython3.org/">available for download</a> in HTML or PDF; or you can clone the document repository:</p> <pre><code>you@localhost:~$ hg clone http://hg.diveintopython3.org/ diveintopython3 </code></pre> Ryan McGreal 2 http://quandyfactory.com/blog/41/top_10_programming_lessons_in_10_years 2010-01-25T12:00:00Z Top 10 Programming Lessons in 10 Years <p>After reading <a href="http://www.dcs-media.com/Archive/20-20-top-20-programming-lessons-ive-learned-in-20-years-FI">Top 20 Programming Lessons I've Learned in 20 Years</a> by Jonathan Danylko, a recent article <a href="http://news.ycombinator.com/item?id=1049890">featured on Hacker News</a>, I was inspired to come up with my own list of programming lessons.</p> <p>These lessons are pretty basic, but they have served me well over the years. When I find myself getting lost in a project, it helps to go back to the basics and get some perspective. </p> <p>I have learned most of these lessons the hard way, i.e. by not following them and getting burned. In some cases, I have been burned more than once. (What can I say? I'm a slow learner.) 
</p> <p>On the other hand, following every lesson consistently always leads to positive outcomes. Many of these lessons seem like additional work, but my experience has always been that they save time and aggravation over the life of the project. A little investment of effort up front can prevent a lot of catch-up work down the road.</p> <p>So here's my top 10 (plus one bonus) list of programming lessons I've learned over the last ten years.</p> <h3>Break it down.</h3> <p>A whole project - even an ostensibly simple one - is overwhelming to contemplate and leads to defensive <a href="/blog/1/productivity_and_procrastination">procrastination</a>. Always take the time at the beginning to break the project into manageable components, even if it seems like a waste of time. You'll be glad you did when a week or two goes by, the project scope has expanded, and you're starting to lose sight of the big picture. </p> <p>With a list of components, you don't have to worry about the big picture - you just work through the steps until you're done. Of course, this means you must be sure to review and revise the list of steps as your requirements change.</p> <h3>Your requirements will change.</h3> <p>Accept it. Embrace it. Understand that the final product will be better if you're willing to change your mind when the facts change. </p> <p>It's generally impossible to know, in advance, exactly what the end product is going to look like. The harder you try to detail the end product in absolute, unchanging terms, the more wrong you will end up being.</p> <h3>Obey Gall's Law.</h3> <p>When coding a project, take the shortest path to <a href="http://en.wikipedia.org/wiki/Gall%27s_law" title="A complex system that works is invariably found to have evolved from a simple system that worked.">a simple system that works</a>, even if it's only a fractional subset of the total project requirements. 
Then incrementally add functionality, testing as you go, until the project is completed.</p> <h3>Document as you go.</h3> <p>This includes both code comments and external documentation for application and/or API users. Every time you add or change a feature, update the documentation to reflect the change. Just make it a core part of your workflow. </p> <p>If you wait until you've finished coding to start documenting, it will end up looking forced, rushed, and incomplete. Ideally, keep your documentation in version control right alongside your code.</p> <h3>Use version control.</h3> <p>Disk space is cheap and abundant, so commit early and commit often. Make sure your commit messages reflect what you're capturing in each snapshot. </p> <h3>Maintain separate development and production environments.</h3> <p>Never you mind that 'quick fix' that will just take a moment to type out. Put it in development, test it, commit the change, and then push to production. Do this every time, and you won't end up looking like an asshole when something goes wrong.</p> <h3>Backup and restore.</h3> <p>This should be a no-brainer, so don't be one of those people who has to <a href="http://superuser.com/questions/82036/recovering-a-lost-website-with-no-backup" title="Coding Horror, indeed!">learn the hard way</a>. You need a proper, reliable, redundancy-tested offsite backup, and <em>you need to test regularly</em> that you can restore from your backups. </p> <p>No, a RAID array is not a backup solution. Nor is blindly trusting your hosting provider to do their job.</p> <h3>Leave your code in a working state at the end of every day.</h3> <p>Don't walk out on code that won't run. It's surprisingly demoralizing to come to work the next day knowing that a broken build is waiting for you. If you have to roll back or comment out a half-finished code block, do it. </p> <p>In addition, it's helpful to draft a quick todo list so you can pick up where you left off the next day. 
(Thanks to <a href="http://news.ycombinator.com/item?id=1050652" title="Write out your mental cache at the end of the day.">bsaunder</a> on Hacker News for this suggestion.)</p> <h3>If you can't figure out a problem, walk away.</h3> <p>Cognitive science tells us that our brains protect us from the stress of not knowing how to solve a problem by ... hiding key information about the problem from consciousness. Thanks, brain. </p> <p>For small to medium-sized problems, I find a good brisk walk is enough to break the logjam in my mind and see through to a solution. For big problems, I may have to pull out the big guns: a good night's sleep. </p> <p>I'm not being silly or even exaggerating: <em>sleep on it</em> is an essential tool of my problem solving strategy, a tool on which I rely regularly, and which has yet to fail me.</p> <h3>Fix bugs first.</h3> <p>Don't add new features while any identified bugs are still outstanding. When part of your system doesn't work, the last thing you want to do is introduce additional complexity. </p> <p>When troubleshooting bugs, try to be scientific about it. Don't just randomly make changes and hope the bug will go away - this is a recipe for introducing more bugs. Make sure you come to understand what's causing the bug so that you can be confident your fix really is a fix. (Thanks to <a href="http://news.ycombinator.com/item?id=1050722" title="Understand what was causing the bug once it's fixed.">bendtheblock</a> on Hacker News for this suggestion.)</p> <h3>Communicate, communicate, communicate.</h3> <p>When you're developing a project for someone, keep the channels open at all times. Clearly communicate your timelines, progress and difficulties and provide regular status reports. </p> <p>If you realize something is going to take longer than you expected, resist the urge to go into hiding! Instead, notify them as early as possible. 
Likewise, if the client comes back to you with changes that impact your timelines, be very clear in communicating what those impacts will be. </p> <p>Finally, though it may seem like a hassle, make yourself available to answer questions and provide demonstrations. If you've been obeying Gall's Law and leaving your code in a working state at the end of every day, you will always have <em>something</em> to show.</p> Ryan McGreal 2 http://quandyfactory.com/blog/39/the_virtue_of_forgiving_html_parsers 2010-01-08T12:00:00Z The Virtue of Forgiving HTML Parsers <p>Today Hacker News <a href="http://news.ycombinator.com/item?id=1039353" title="View-Source Is Good? Discuss. on Hacker News">featured</a> an essay by Alex Russell of the Dojo javascript toolkit in which he mused on <a href="http://alex.dojotoolkit.org/2010/01/view-source-is-good-discuss/" title="View-Source Is Good? Discuss.">the virtues of view-source</a> as they apply to the internet. </p> <p>Since browsers render HTML, javascript and CSS from plain text, it's possible not only to see the finished product - a rendered web page - but also its underlying source code. </p> <p>Every browser I've ever seen includes an option to <em>view source</em> - to see the underlying HTML markup, javascript code and style sheet formatting that the browser uses to render a web page.</p> <p>Russell makes the important point that it was the ability to <em>view source</em> - especially in the early days of the internet - that made web content easily accessible to anyone who took the time to read the code and to experiment with it to discover how it works.</p> <blockquote> <p>View-source provides a powerful catalyst to creating a culture of shared learning and learning-by-doing, which in turn helps formulate a mental model of the relationship between input and output faster. Web developers get started by taking some code, pasting it into a file, loading it in a browser and switching between editor and browser between even the most minor changes. 
</p> <p>This is a stark contrast with other types of development, notably those that impose a compilation step on development, in which the process of seeing what what done requires an intermediate action. </p> <p>In other words, immediacy of output helps build an understanding of how the system will behave, and <code>ctrl-r</code> becomes a seductive and productive way for developers to accelerate their learning in the copy-paste-tweak loop. </p> <p>The only required equipment is a text editor and a web browser, tools that are free and work together instantly. That is to say, there's no waiting between when you save the file to disk and when you can view the results. It's just a <code>ctrl-r</code> away. [paragraph breaks added]</p> </blockquote> <p>On reading this, a parallel observation occurred to me: <em>The power of view-source is multiplied when generous parsers forgive errors and render anyway.</em> </p> <p>The value of a <em>big, broad</em> internet is far greater than the value of a <em>clean, pure</em> internet.</p> <p>Most programmers will agree that programs or data streams with malformed syntax should fail, and fail fast. The worst thing that can happen with a software application - particularly a complex one - is for errors to pass silently while malformed data flows into data storage, only to produce gibberish - or worse, subtly wrong results - on export.</p> <p>Passing a string into a function that expects an integer should throw an exception; passing a set of four values into a method that expects three parameters should raise a red flag; and so on.</p> <p>By contrast, the HTML parsers in every browser are extremely forgiving. If an HTML parser encounters an opening tag for an element that is missing a corresponding closing tag, it just accepts the missing close tag as implied and renders as if it were there. 
</p> <p>Likewise, if the parser encounters a malformed or non-standard element, it will either try to render it somehow by guessing at the code's intention (naturally, different parsers approach such matters differently) or just ignore it completely and move on.</p> <p>As a result, even badly malformed HTML can still produce a readable web page.</p> <p>Purists bristle at this, insisting that it makes the web a worse place by forcing browser makers to code for errors, encouraging sloppy coding practices, causing the same content to be rendered differently on different browsers, and so on.</p> <p>They're missing the point. <em>The virtue of forgiving parsers is that they vastly increase the pool of people able and willing to create web content.</em></p> <p>If you're already a programmer, HTML syntax is easy enough to understand and produce in valid form.</p> <p>However, most people who create web content aren't programmers, and that was especially true in the early days of the internet, when it grew exponentially and established the virtuous cycle of positive network externalities that ultimately dragged more professional developers onto the platform.</p> <p>Rather, they were amateur enthusiasts exploring a new technological domain. Thanks to HTML, view-source and forgiving parsers, the number of people who could create web pages was vastly higher than the number of people who could write computer programs.</p> <p>It is the rapid democratization of HTML, made possible by view-source and forgiving parsers, that accounts for much of its success as a language - and of the success of the internet as a platform.</p> <p>When an HTML parser finds code so bad that it can't render it, the parser just skips it and carries on, in the manner of VB's <code>on error resume next</code> (programmers are welcome to cringe here). 
Contrast this with the strictness of XML parsers, which are obliged to treat malformed markup as a fatal error and produce no output at all.</p> <p>Since rendering HTML in response to an HTTP GET request is essentially <a href="http://en.wikipedia.org/wiki/Idempotence" title="Idempotence on Wikipedia">idempotent</a> (loading a page, even a broken one, doesn't change anything on the server), there's no real harm in continuing to parse code after encountering an error - but the positive network effects from doing so are huge.</p> <p>If only programmers had possessed the arcane ability to produce well-formed, valid markup, HTML would never have experienced the early growth that transformed it into the reigning standard of a huge and growing public network.</p> <p>One more thing: HTML provided a gently sloping pathway into programming for many people who might otherwise never have managed to overcome the steep barriers to entry of, say, C, with its verbose syntax and elaborate requirements. (Compiler? What's a compiler? And why does it hate me?)</p> <p>Many programmers today (myself included) found their way into programming by starting with HTML, bumping into the limitations of static content, exploring event handling in javascript, and then making the jump to PHP or classic ASP, thence to SQL, and ultimately to more modern languages with robust, structured software design principles baked in, a far cry from the spaghetti code that powered a lot of early dynamic websites.</p> <p>Again, some people consider this a terrible thing - a watering down of an industry that <em>ought</em> to have high barriers to entry. Setting aside my own interest in the matter, I disagree. 
Getting more computing and networking capability directly into the hands of more people can only increase the rate of technical innovation by deploying more expressive power more widely.</p> Ryan McGreal 2 http://quandyfactory.com/blog/36/newsfeed_bug:_fixed 2009-12-20T12:00:00Z Newsfeed Bug: Fixed <p>I just noticed that the <a href="http://quandyfactory.com/feeds/all">main newsfeed</a> for this site has a major bug: links back to the articles don't, er, link back to the articles. I'll try to fix this tonight.</p> <p><strong>Update</strong> - This is now fixed.</p> Ryan McGreal 2