Smart Disorganized Individuals: 07/01/2013

Wednesday, July 31, 2013

The Future Of Programming

Bret Victor has another classic talk up :

Bret Victor - The Future of Programming from Bret Victor on Vimeo.

Watch it. The conceit is entertaining, from his clothes to the overheads.

However, despite the brilliance of the presentation, I think he might be wrong. And the fact that it's taken 40 years for these promising ideas NOT to take off, may suggest there are some flaws in the ideas themselves.

Coding > Direct Manipulation

Like most visually-oriented people Bret gives great importance to pictures. If I remember correctly, something like 33% of the human brain is visual cortex and specialized in handling our particular 2D + depth way of seeing. So it's hardly surprising that we imagine that this kind of data is important or that we continually look for ways of pressing that part of the brain into service for more abstract data-processing work.

However, most data we want to handle isn't of this convenient 2D or 2.5D form. You can tell this because our text-books are full of different kinds of data-structure, from arrays, lists and queues, to matrices of 2, 3 and higher dimensions, to trees, graphs and relational databases. If most data was 2D, then tables and 2D matrices would be the only data-structures programmers would ever use, and we'd have long swapped our programming languages for spreadsheets.

Higher dimensional and complex data-structures can only be visualized in 2, 2.5 or even 3 dimensions by some kind of projection function. And, Bret, to his credit has invented some ingenious new projections for getting more exotic topologies and dynamics down to 2D. But even so, only a tiny proportion of our actual data-storage requirements are ever likely to be projectable into a visual space.

Once you accept that, then the call for a shift from coding to direct manipulation of data-structures starts to look a lot shakier. Right now, people are using spreadsheets ... in situations which lend themselves to it. Most of the cases where they're still writing programs are cases where such a projection either doesn't exist or hasn't been discovered (in more than 30 years since the invention of the spreadsheet).

Procedures > Goals / Constraints

It seems like it must be so much easier to simply tell the computer what you want rather than how to do it. But how true is that?

It's certainly shorter. But we have a couple of reasons for thinking that it might not be easier.

1) We've had the languages for 40 years. And anyone who's tried to write Prolog knows that it's bloody difficult to formulate your algorithms in such a form. Now that might be because we just don't train and practice enough. But it might be genuinely difficult.

The theoretical / mathematical end of computer science is always trying to sell higher-level abstractions which tend in the direction of declarative / constraint oriented programming, and relatively few people really get it. So I'm not sure how much this is an oversight by the programmer community vs. a genuine difficulty in the necessary thinking.

2) One thing that is certain : programming is very much about breaking complex tasks down into smaller and simpler tasks. The problem with declarative programming is that it doesn't decompose so easily. It's much harder to find part solutions and compose them when declaring a bunch of constraints.

And if we're faced with a trade-off between the virtue of terseness and the virtue of decomposability, it's quite possible that decomposibility trumps terseness.

There may be an interesting line of research here : can we find tools / representations that help in making declarative programs easier to partially specify? Notations that help us "build-up" declarations incrementally?

3) I have a long-standing scepticism from my days working with genetic algorithms that might well generalize to this theme too. With a GA you hope to get a "free lunch". Instead of specifying the design of the solution you want (say in n-bits), you hope you can specify a much shorter fitness function (m-bits) and have the computer find the solution for you.

The problem is that there are many other solutions that the computer can find, that fit the m-bit fitness function but aren't actually (you realize, retrospectively) the n-bit solution that you really want. Slowly you start building up your fitness function, adding more and more constraints to ensure the GA solves it the right rather than wrong way. Soon you find the complexity of your fitness function is approaching the complexity of a hand-rolled solution.

Might the same principle hold here? Declarative programming assumes we can abstract away from how the computer does what it does, but quite often we actually DO need to control that. Either for performance, for fine-tuning the user's experience, for robustness etc.

Anyone with any relational database experience will tell you that writing SQL queries is a tiny fraction of the skills needed for professional database development. Everything else is scaling, sharding, data-mining, Big Data, protecting against failure etc. etc. We used to think that such fine grained control was a temporary embarrassment. OK for systems programmers squeezing the most out of limited memory and processor resources. But once the computers became fast enough we could forget about memory management (give it to the garbage collector) or loop speed (look at that wonderful parallelism). Now we're in the future we discover that caring about the material resources of computation is always the crucial art. One resource constraint becomes cheap or fast enough to ignore, but your applications almost immediately grow to the size that you hit a different limit and need to start worrying again.

Professional software developers NEVER really manage to ignore the materiality of their computation, and so will never really be able to give up fine-grained control to a purely declarative language.

(SQL is really a great example of this. It's the most successful "tell the computer what you want not how you want it done" language in computing history. And yet there's still a lot of tuning of the materiality required, either by db-admins or more recently witnessed by the NoSQL movement, returning to more controllable hierarchical databases, mainly to improve their control.)

Text Dump > Spatial Relations

I already pointed out the problems of assuming everything conveniently maps onto human vision.

I'm as fascinated by visual and gestural ideas for programming as the next geek. But I'm pretty convinced that symbols and language are way, way, way more flexible and powerful representation schemes than diagrams will ever be. Symbols are not limited to two and a half dimensions. Symbols can describe infinite series and trees of infinite depth and breadth. Yadda yadda yadda.

Of course we can do better than the tools we have now. (Our programs could be outlines, wiki-like hypertexts, sometime spreadsheets, network diagrams etc. Or mixes of all of these, as and when appropriate.) But to abandon the underlying infrastructure of symbols, I think is highly unlikely.

Sequential > Parallel

This one's fascinating in that it's the one that seems most plausible. So it's also disturbing to think that it has a history as old as the other (failed) ideas here. If anything, Victor makes me pessimistic about a parallel future by putting it in the company of these other three ideas.

Of course, I'll reserve full judgement on this. I have my Parallella "supercomputer" on order (courtesy of KickStarter). I've dabbled a bit in Erlang. I'm intrigued by Occam-π. And I may even have a go at Go.

And, you know what? In the spirit of humility, and not knowing what I'm doing, I'm going to forget everything I just wrote. I'll keep watching Bret's astounding videos; and trying to get my head around Elm-lang's implementation of FRP. And dreaming of ways that programming will be better in the future.

And seeking to get to that better future as quickly as possible.

Tuesday, July 30, 2013

Cthulhu

My software is more or less like Cthulhu. Normally dead and at the bottom of the sea, but occasionally stirring and throwing out a languid tentacle to drive men's minds insane. (Or at least perturb a couple of more recklessly adventurous users.)

However there's been a bit more bubbling agitation down in R'lyeh recently. The latest weird dream returning to trouble the world is GeekWeaver, the outline based templating language I wrote several years ago.

GeekWeaver was basically driven by two things : my interest in the OPML Editor outliner, and a need I had to create flat HTML file documentation. While the idea was strong, after the basic draft was released, it languished.

Partly because I shifted from Windows to Linux where the OPML Editor just wasn't such a pleasurable experience. Partly because GW's strength is really in having a templating language when you don't have a web server; but I moved on to doing a lot of web-server based projects where that wasn't an issue. And partly, it got led astray - spiralling way out of control - by my desire to recreate the more sophisticated aspects of Lisp, with all kinds of closures, macros, recursion etc.

I ended up assuming that the whole enterprise had got horribly crufty and complicated and was an evolutionary dead end.

But suddenly it's 2013, I went to have quick look at GeekWeaver, and I really think it's worth taking seriously again.

Here are the three reasons why GeekWeaver is very much back in 2013 :

Fargo

Most obviously, Dave Winer has also been doing a refresh of his whole outlining vision with the excellent browser-based Fargo editor. Fargo is an up-to-date, no-comprise, easy to use online OPML Editor. But particularly important, it uses Dropbox to sync. outlines with your local file-system. That makes it practical to install GeekWeaver on your machine and compile outlines that you work on in Fargo.

I typically create a working directory on my machine with a symbolic link to the OPML file which is in the Fargo subdirectory in Dropbox and the fact that the editor is remote is hardly noticable (maybe a couple of seconds lag between finishing an edit and being able to compile it).

GitHub

What did we do before GitHub? Faffed, that's what. I tried to put GeekWeaver into a Python Egg or something, but it was complicated and full of confusing layers of directory. And you need a certain understanding of Python arcana to handle it right. In contrast, everyone uses Git and GitHub these days. Installing and playing on your machine is easier. Updates are more visible.

GeekWeaver is now on GitHub. And as you can see from the quickstart guide on that page, you can be up and running by copying and pasting 4 instructions to your Linux terminal. (Should work on Mac too.) Getting into editing outlines with Fargo (or the OPML Editor still works fine) is a bit more complicated, but not that hard. (See above.)

Markdown

Originally GeekWeaver was conceived as using the same UseMod derived wiki-markup that I used in SdiDesk (and now Project ThoughtStorms for Smallest Federated Wiki). Then part of the Lisp purism got to me and I decided that such things should be implementable in the language, not hardwired, and so started removing them.

The result was, while GeekWeaver was always better than hand-crafting HTML, it was still, basically hand-crafting HTML, and maybe a lot less convenient that using your favourite editor with built-in snippets or auto-complete.

In 2013 I accepted the inevitable. Markdown is one of the dominant wiki-like markup languages. There's a handy Python library for it which is a single, install away. And Winer's Fargo / Trex ecosystem already uses it.

So in the last couple of days I managed to incorporate a &&markdown mode into GeekWeaver pretty easily. There are a couple of issues to resolve, mainly because of collisions between Markdown and other bits of GeekWeaver markup, but I'm now willing to change GeekWeaver to make Markdown work. It's obvious that even in its half-working state, Markdown is a big win that makes it a lot easier to write a bigger chunks of text in GeekWeaver. And, given that generating static documentation was GeekWeaver's original and most-common use-case, that's crucial.

Where Next?

Simplification. I'm cleaning out the cruft, throwing out the convoluted and buggy attempts to make higer-order blocks and lexical closures. (For the meantime.)

Throwing out some of my own idiosyncratic markup to simplify HTML forms, PHP and javascript. Instead GW is going to refocus on being a great tool for adding user-defined re-usable abstractions to a) Markdown and b) any other text file.

In recent years I've done other libraries for code-generation. For example, Gates of Dawn is Python for generating synthesizers as PureData files. (BTW : I cleaned up that code-base a bit, recently, too.)

Could you generate synths from GeekWeaver? Sure you could. It doesn't really help though, but I've learned some interesting patterns from Gates of Dawn, that may find their way into GW.

Code Generation has an ambiguous reputation. It can be useful and can be more trouble than it's worth. But if you're inclined to think using outlining AND you believe in code-gen then GeekWeaver is aiming to become the perfect tool for you.

Saturday, July 27, 2013

Quora Scraping As A Service

If you missed it, I launched a Fiverr Gig.

Quick explanation.

Wednesday, July 24, 2013

Google's New Email Tabs

There's a lot of discussion going on around them. Eg. on Quora.

I started writing a comment on a comment where Tim Bushell asks :

why shouldn't they be "red", "green", "blue"?

ie. user-defined or neutral.

But then felt it would be better here :

Probably because Google have a database of thousands of email addresses and patterns that they've classified into these categories of "social", "promotion" etc., and with this move they're basically giving you, the customer, the benefit of that classification scheme.

They assume that if you just want to program your own categories and sort accordingly you're already doing it via filters.

What probably didn't occur to Google was that the world is full of people who WANT to be able to define their own categories and filters but never realized that GMail (like every email client in the 20+ years) already HAS this feature.

What's happening is that just by showing people tabbed email, they've suddenly woken everyone up to the fact that your email client can be programmed to filter emails. (Who knew?)

What happens now is going to be interesting.

If Google know how to listen, they'll take advantage of it, add the ability to define your own tabs, integrate it seemlessly with the existing filter architecture of GMail (maybe improve the UI of that a bit, eg. drag / dropping between tabs) and get to bask in the adoration of having "reinvented email".

If not, they'll keep the two systems separate (ie. filter-definition hidden away where most people can't find or understand it) and not only will the opportunity be squandered, but many people will continue to hate the tabs.

Monday, July 22, 2013

Modules In Time : Synthesizing GitHub and Skyrim

Thanks to Bill Seitz I picked up on a Giles Bowkett post I'd missed a couple of months ago which compares the loosely coupled asynchronous style of development that companies like GitHub both promote and live, with the intensely coupled synchronous raids that occur in online game-worlds.

Bowkett seems confused by the apparent contradictions between the two. And yet obviously impressed by the success of both. He wants to know what the correct synthesis is.

That really shouldn't be so hard to imagine. The intense coupling is what happens in pair-programming, for example. Or the hackday or sprint. Its focus is on creating a single minimum product or adding a single feature / story to it.

The right synthesis, to my way of thinking, is intense / tight / adrenalin fuelled / synchronous coupling over short periods, where certain combinations of talents (or even just two-pairs of eyes) are necessary. And loose / asynchronous coupling everywhere else. Without trying to squash everyone's work into some kind of larger structure which looks neat but doesn't actually serve a purpose.

The future of work is highly bursty!

It shouldn't surprise us, because modularity is one of the oldest ideas in software : tight-cohesion within modules of closely related activities. Loose and flexible coupling between modules. It's just that with these work-patterns we're talking about modules in time. But the principle is the same. The sprint is the module focused on a single story. The wider web of loosely asynchronous forks and merges is the coupling between modules.

GrabQuora on GitHub

A couple of tweaks to the Quora grabbing script. Makes it worth upgrading to a full GitHub project.

Quora Scraper

I love Quora. It's a great site and community. But I started getting a bit concerned how much writing I was doing there which was (potentially) disappearing inside their garden and not part of the body of thinking I'm building up on ThoughtStorms (or even my blogs).

Fortunately, I discovered Quora has an RSS feed of my answers, so I can save them to my local machine. (At some point I'll think about how to integrate them into ThoughtStorms; should I just make a page for each one?)

Anyway here's the script (powered by a lot of Python batteries.)

And this turns the files back into a handy HTML page.

Tuesday, July 16, 2013

GeekWeaver

OK ... not shouting much yet, but what with the relaunch of OPML and Fargo, there's a bit of a GeekWeaver refresh going on.

I put the code on GitHub and starting to clean it up, making it suitable for use ...

Watch this space ...

Saturday, July 06, 2013

Restraining Bolts

Today I'm being driven crazy trying to print out FiloFax pages on an HP printer.

Although I've created a PDF file of the right size, I have the right size piece of paper, and I've set up the paper-size in the print-driver, the printer is refusing to print because it detects a "paper size mismatch".

A quick look through HP's site reveals a world of pain created by this size-checking-sensor which can't be over-ridden. People are justifiably pissed off.

What's striking is that this is a problem that didn't exist previously. There are many accounts in this forum of people who, on their older printers, happily used incorrect page-size settings in the driver, with odd-sized paper, and just got their job done.

HP by trying to add "smartness" to their product have made it less usable. This is such a common anti-pattern, engineers should be taught it in school : the more smart-constraints you add to a product, the more likely you are going to disempower and piss-off the edge-cases and non-standard users.

Recently I wrote a Quora answer which I brought to ThoughtStorms : MachineGatekeepers . I worried for people who didn't know how to navigate technological problems in a world where we're encaged by technology.

But I have an even greater worry. The road to hell is paved with "helpful constraints" added by idiots. And we're all suffering as technologies which, with a pinch of know-how or intuition we could bend to our will, become iron cages. It's no good knowing how to google the FAQ or engage with tech. support when HP support is effectively non-existent.

The most disturbing thought here is that BigTech knows this, and increasingly takes away our freedom with one hand and sells it back to us on the other. If enough people complain that their HP won't print on FiloFax pages what's the most likely result? That HP release a fix to disable the page-size-sensor? Or that they'll just release a new printer model which also handles FiloFax paper but is otherwise equally restricted?

Friday, July 05, 2013

Fargo For LinkBlogging

I'm a couple of days into LinkBlogging using Fargo, (at Yelling At Strangers From The Sky) and I have to say, I'm getting into the swing and it's great.

If you keep the outline open in a tab, it's about as fast and convenient to post to Fargo as posting a link to Plus or Twitter. (Which is where traditional blogs like WordPress / Blogger often fall short). In fact, G+ is now getting bloated that it can take 10 seconds just to open the "paste a new message" box. It's a lot faster than that.

It would be nice if it could automatically include a picture or chunk of text from the original page the way FB / G+ do, that's turned out to be a compelling experience for me, but it's a nice not must-have.

A question, is there any kind of API for the outline inside the page which a bookmarklet could engage with? (Is that even possible given the browser security model?)

Thursday, July 04, 2013

Fargo and Google

Couple of quick notes :

1) I'm too dependent on Google. Unlike the case of Facebook, I can't just cancel my account. Google is too deeply entwined with my life. But I am taking steps to disengage if not 100% at least a significant chunk.

2) I'm playing around a bit more with Dave Winer's Fargo outliner. And it is shaping up to be excellent, both as an outliner and expression of Winer's philosophy. (No surprises.)

So, to combine the two, I'm documenting my Google-leaving thoughts in a public outline. Check it out.

Update : I've also been wondering about having a linkblog, somewhere I can quickly throw links rather than G+ (which is inside the Google Walled River). Maybe Fargo will help there too.