Monday, November 25, 2013
How GitHub (no longer) Works
Monday, November 04, 2013
Aaaargh!
Friday, November 01, 2013
Programming Language Features for Large Scale Software
The de facto thinking on this is that the language should make it easy to compartmentalize programming into well-segregated components (modules / frameworks) and offer some kind of "contract" idea which can be checked at compile-time.
That's the thinking behind not only Java, but Modula-2, Ada, Eiffel etc.
Personally, I suspect that, in the long run, we may move away from this thinking. The largest-scale software almost certainly runs on multiple computers. It won't be written in a single language, or written or compiled at one time. It won't even be owned or executed by a single organization.
Instead, the largest software will be like, say, Facebook : written and deployed on clouds and clusters, upgraded while running, with supplementary services continually added.
The web is the largest software environment of all. And at the heart of the web is HTML. HTML is a great language for large-scale computing. It scales to billions of pages running in hundreds of millions of browsers. Its secret is NOT rigour. Or contracts. It's fault-tolerance. You can write really bad HTML and browsers will still make a valiant effort to render it. Increasingly, web-pages collaborate (one page will embed services from multiple servers via AJAX etc.) And even these can fail without bringing down the page as a whole.
Much of the architecture of the modern web is built of queues and caches. Almost certainly we'll see very high-level cloud-automation / configuration / scripting / data-flow languages to orchestrate these queues and caches. And HADOOP-like map-reduce. I believe we'll see the same kind of fault-tolerance that we expect in HTML appearing in those languages.
Erlang is a language designed for orchestrating many independent processes in a critical environment. It has a standard pattern for handling many kinds of faults. The process that encounters a problem just kills itself. And sooner or later a supervisor process restarts it and it picks up from there. (Other processes start to pass messages to it.)
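A toy sketch of that supervision pattern, in Python rather than Erlang (the worker, the restart limit and all the names here are invented for illustration):

```python
def supervise(start_worker, max_restarts=5):
    # "Let it crash" in miniature: the worker makes no attempt to handle
    # its own faults, it just raises; the supervisor's only job is to
    # restart it from a clean state, escalating if it keeps dying.
    restarts = 0
    while True:
        try:
            return start_worker()      # runs until normal completion
        except Exception:
            restarts += 1              # worker "killed itself"; bring it back
            if restarts > max_restarts:
                raise                  # give up: escalate the fault upwards

# A worker that hits transient trouble twice before succeeding.
attempts = {"n": 0}

def flaky_worker():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient fault")
    return "done"

result = supervise(flaky_worker)   # restarted twice, then completes normally
```

The point is the division of labour: the worker stays simple because all fault-handling policy lives in the supervisor.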
I'm pretty sure we'll see more of this pattern. Nodes or entire virtual machines that are quick to kill themselves at the first sign of trouble, and supervisors that bring them back. Or dynamically re-orchestrate the dataflow around trouble-spots.
Many languages are experimenting with Functional Reactive Programming : a higher-level abstraction that makes it easy to set up implicit data-flows and event-driven processing. We'll see more languages that approach complex processing by allowing the declaration of data-flow networks, and which simplify exception / error handling in those flows with things like Haskell's "Maybe Monad".
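As a toy illustration of what "declaring a data-flow" means (a minimal sketch of my own, not the API of any real FRP library):

```python
class Cell:
    # A cell holds a value and notifies dependent cells when it changes.
    def __init__(self, value):
        self.value = value
        self.listeners = []

    def set(self, value):
        self.value = value
        for notify in self.listeners:
            notify(value)

def lift(f, source):
    # Declare a derived cell: out = f(source), kept up to date implicitly
    # whenever the source changes, rather than by explicit calls.
    out = Cell(f(source.value))
    source.listeners.append(lambda v: out.set(f(v)))
    return out

price = Cell(10)
doubled = lift(lambda p: p * 2, price)
price.set(21)   # doubled.value updates to 42 without any explicit call
```

You declare the relationship once; propagation is the language's (or library's) problem from then on.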
Update : Another thing I'm reminded of. Jaron Lanier used to have this idea of "Phenotropic Programming" (in his essay "Why Gordian Software Has Convinced Me to Believe in the Reality of Cats and Apples"), which is a bit far out, but I think it's plausible that fault-tolerant web APIs, and the rest of the things I'm describing here, may move us closer.
Wednesday, October 30, 2013
OWL Broken
Monads in Python (Again)
Dustin Getz provides one of the best "Monads for idiot Python programmers" explanations I've seen.
Excellent! I think I almost do understand this one.
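For reference, here's about the smallest Python sketch of the Maybe idea I can manage (illustrative names of my own, not Dustin's code): failure becomes a value, and a chain of binds short-circuits instead of raising.

```python
class Maybe:
    # Either a wrapped value (ok=True) or Nothing (ok=False).
    def __init__(self, value, ok=True):
        self.value = value
        self.ok = ok

    def bind(self, f):
        # Apply the next step, unless a previous step already failed.
        return f(self.value) if self.ok else self

NOTHING = Maybe(None, ok=False)

def safe_div(x, y):
    # A step that can fail: division, returning Nothing instead of raising.
    return Maybe(x / y) if y != 0 else NOTHING

chain = (Maybe(10)
         .bind(lambda v: safe_div(v, 2))
         .bind(lambda v: safe_div(v, 0))    # fails here...
         .bind(lambda v: safe_div(v, 5)))   # ...so this never runs
```

No try/except anywhere in the pipeline; the plumbing of "did the previous step work?" is hidden inside bind().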
Monday, October 21, 2013
Pissed With Ubuntu
Seriously, it was just a simple upgrade. How hard should that have been?
Instead it crashed in the middle. When I rebooted, it told me my disk was broken. I found some instructions online to fix the problem. These ran for a while, fixing some packages, before telling me my package manager was too broken and it was aborting.
End result : an Ubuntu that boots into low-res mode without wifi :-(
Bleah!
Thursday, October 17, 2013
OWL Fix
There's a big fix for OWL today. There were some mysterious times when pages that I thought I was changing were getting reverted. I thought originally that this was a glitch from me btsync-ing between my laptop and tablet. Or maybe my attempts at doing background synchronization between the browser localStorage and the server were failing.
Nothing seemed to completely eliminate this intermittent problem. But today I realized it was much simpler. I was basically using web.py's "static" file serving to pull the OPML files off the server into Concord. But "static" is meant for static files (doh!). The browser was caching them. (Maybe because of some header web.py was putting out.) Anyway, I just changed the server to read the files into memory and spit their contents out, just like any other dynamic web-page, and the problem looks like it's gone away.
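The shape of the fix, as a hypothetical sketch rather than the actual OWL / web.py code (the names and header choices here are mine):

```python
def serve_opml(name, read_file):
    # Dynamic serving: read the file fresh on every request and send
    # explicit anti-caching headers, instead of handing the file to a
    # "static" handler whose headers invite the browser to cache it.
    body = read_file(name)
    headers = {
        "Content-Type": "text/x-opml",
        "Cache-Control": "no-store",   # never reuse a cached copy
        "Pragma": "no-cache",          # belt-and-braces for older browsers
    }
    return headers, body
```

The read_file parameter just stands in for however the server actually loads the OPML off disk.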
I'll keep an eye out, but I think that was it.
Wednesday, October 09, 2013
Blame the Tools for Thought
This is, in my opinion, the strongest argument for seeing Unix and basic coding skills as fundamental required literacy today. As prostheses for memory and identity, computers are too useful not to use, but if you don't know how to craft your own code which gives you a UX which matches the way you think, you're doomed to matching the way you think to the available tools, and even the best available tools basically suck. Interaction design is not only incredibly hard to do well, it's also incredibly idiosyncratic.
Wednesday, September 25, 2013
OWL Server
Saturday, September 21, 2013
Why Don't Browsers Let Web-Apps Write To The Local File System?
I mean, I know why. It's a security thing.
But why couldn't a browser have an API for scripts to read / write the file system, and a security feature where the web-app has to ask and be given permission by the user before it runs? (Just as Android apps have to tell you what permissions they need before you install them.) Couldn't the browsers successfully police this?
Surely if the browser manufacturers were to offer this capability, they'd more or less kill native Windows / Macintosh application development overnight and become the default platform for desktop computers. (So maybe Microsoft don't have the incentive, but Google and Firefox do.)
Friday, September 20, 2013
Hack Your Life With A Private Wiki Notebook
Introducing OWL
A fucking power tool, that's what!
It's just a rough draft at the moment, a mashup of Concord and ideas from SdiDesk. But I think you can see it's compelling ...
Tuesday, September 17, 2013
Dave open-sources his Outliner
Saturday, August 17, 2013
GeekWeaver : Fixed Variable Substitution in Markdown
Tuesday, August 06, 2013
Xiki
I laughed when I first saw it, said "it's like Emacs". Seems Emacs is involved somehow. Also reminds me of Enso.
Sunday, August 04, 2013
QuoraGrabber is Dead!
Really, a separate script / project just to back-up Quora is overkill. Now I have a more general script for backing up from any RSS feed. (Which I'll be able to use to ensure I have copies of what I write here and on Composing.)
I also made it a bit saner at keeping the useful HTML markup (ie. links etc.)
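The core of such a feed-backup script might look something like this (a hypothetical stdlib-only sketch, not the code that's actually on GitHub):

```python
import xml.etree.ElementTree as ET

def items_from_rss(xml_text):
    # Walk each <item> in an RSS feed, keeping the description's
    # HTML markup (links etc.) intact rather than flattening it to text.
    for item in ET.fromstring(xml_text).iter("item"):
        yield {
            "title": item.findtext("title", ""),
            "link": item.findtext("link", ""),
            "html": item.findtext("description", ""),
        }
```

Because it only assumes standard RSS elements, the same loop backs up any feed: Quora, Blogger, Composing, whatever.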
It's on GitHub.
Saturday, August 03, 2013
Modularity At Fine Granularity
"The prevailing wisdom says that you should keep your functions small and concise, refactoring and extracting functions as necessary. But this hurts the locality of expectations that I have been thinking about. Consider:

function updateUserStatus(user) {
    if (user.status == "active") {
        $("<li />")
            .appendTo($("#userlist"))
            .text(user.name)
            .attr("id", "user-" + user.id);
    } else {
        $("#user-" + user.id).remove();
    }
}

Code like this is generally considered to be terrible – there’s logic for users and their status, mixed in with a bunch of very specific UI-related code. (Which is all tied to a DOM state that is defined somewhere else entirely — but I digress.) So a typical refactoring would be:

function updateUserStatus(user) {
    if (user.status == "active") {
        displayUserInList(user);
    } else {
        removeUserFromList(user);
    }
}

With the obvious definition of displayUserInList() and removeUserFromList(). But the first approach had certain invariants that the second does not. Assuming you don’t mess with the UI/DOM directly, and assuming that updateUserStatus() is called when it needs to be called, the user will be in the list or not based strictly on the value of user.status. After refactoring there are functions that could be called in other contexts (e.g., displayUserInList()). You can look at the code and see that particular things happen when updateUserStatus() is called, but it’s not as easy to determine what is going to happen when you inspect the code from the bottom up. For instance, suppose you want to understand why things end up in #userlist — you search for #userlist but you now get two functions instead of one, and to understand the logic you have to trace that back to the calling function, and you have to wonder if now or in the future anyone else will call those functions.

The advantage of the first function is that blocks of code are strict. You execute from the top to the bottom, with clear control structures. When GOTO existed you couldn’t reason so well, but we’ve gotten rid of that! (Of course there are still other exceptions.) It’s not entirely clear what intention drives the refactoring (besides adherence to conventional standards of code beauty), but it’s probably more about code organization than about making the control flow more flexible. Extracting those functions means that you now have the power to make the UI inconsistent with the model, and that hardly seems like a feature. And I have to wonder: are some of these basic patterns of “good” code there because we have poor tools for code organization? We express too many things with functions and methods and classes (and perhaps modules) because that’s all we have. But those are full of unintended semantic meaning. Anyone have examples of languages that have found novel ways of keeping code organized?"

So, it's a great question on modularity, at a granularity where we tend not to have much explicit thinking : down at the level of individual functions (compared to all the patterns we have for classes etc.) My immediate comment is that if Ian refactored his code like this :
function updateUserStatus(user) {
    var id = "user-" + user.id;
    if (user.status == "active") {
        addToList("#userlist", user.name, id);
    } else {
        removeFromList("#userlist", id);
    }
}
it would solve most of the problems. In this version we aren't fussily creating extra functions for tiny fragments of functionality which are only relevant to narrow situations (ie. users, userlists). Now the new functions are more generic and widely applicable. They're doing enough that it's worth the overhead of creating them. They're still usefully hiding the bit of complexity we DON'T want to think about here - the actual jQuery / HTML details of how lists are constructed - but they're leaving the important details - WHICH list we're updating and what parts of a user we show - in this locality rather than allowing it to become diffuse across the program.
Of course, we can't prevent another bit of code updating the list itself somewhere. (That's more a quirk of the HTML environment where the DOM is global. In many analogous cases we could prevent most of the code having unauthorized access to a list simply by making it private within a class.)
Thursday, August 01, 2013
Alan Kay
Wednesday, July 31, 2013
The Future Of Programming
Bret Victor - The Future of Programming from Bret Victor on Vimeo.
Watch it. The conceit is entertaining, from his clothes to the overheads.
However, despite the brilliance of the presentation, I think he might be wrong. And the fact that it's taken 40 years for these promising ideas NOT to take off, may suggest there are some flaws in the ideas themselves.
Coding > Direct Manipulation
Like most visually-oriented people Bret gives great importance to pictures. If I remember correctly, something like 33% of the human brain is visual cortex and specialized in handling our particular 2D + depth way of seeing. So it's hardly surprising that we imagine that this kind of data is important or that we continually look for ways of pressing that part of the brain into service for more abstract data-processing work.
However, most data we want to handle isn't of this convenient 2D or 2.5D form. You can tell this because our text-books are full of different kinds of data-structure, from arrays, lists and queues, to matrices of 2, 3 and higher dimensions, to trees, graphs and relational databases. If most data was 2D, then tables and 2D matrices would be the only data-structures programmers would ever use, and we'd have long swapped our programming languages for spreadsheets.
Higher dimensional and complex data-structures can only be visualized in 2, 2.5 or even 3 dimensions by some kind of projection function. And Bret, to his credit, has invented some ingenious new projections for getting more exotic topologies and dynamics down to 2D. But even so, only a tiny proportion of our actual data-storage requirements are ever likely to be projectable into a visual space.
Once you accept that, then the call for a shift from coding to direct manipulation of data-structures starts to look a lot shakier. Right now, people are using spreadsheets ... in situations which lend themselves to it. Most of the cases where they're still writing programs are cases where such a projection either doesn't exist or hasn't been discovered (in more than 30 years since the invention of the spreadsheet).
Procedures > Goals / Constraints
It seems like it must be so much easier to simply tell the computer what you want rather than how to do it. But how true is that?
It's certainly shorter. But we have a couple of reasons for thinking that it might not be easier.
1) We've had the languages for 40 years. And anyone who's tried to write Prolog knows that it's bloody difficult to formulate your algorithms in such a form. Now that might be because we just don't train and practice enough. But it might be genuinely difficult.
The theoretical / mathematical end of computer science is always trying to sell higher-level abstractions which tend in the direction of declarative / constraint oriented programming, and relatively few people really get it. So I'm not sure how much this is an oversight by the programmer community vs. a genuine difficulty in the necessary thinking.
2) One thing that is certain : programming is very much about breaking complex tasks down into smaller and simpler tasks. The problem with declarative programming is that it doesn't decompose so easily. It's much harder to find part solutions and compose them when declaring a bunch of constraints.
And if we're faced with a trade-off between the virtue of terseness and the virtue of decomposability, it's quite possible that decomposability trumps terseness.
There may be an interesting line of research here : can we find tools / representations that help in making declarative programs easier to partially specify? Notations that help us "build-up" declarations incrementally?
3) I have a long-standing scepticism from my days working with genetic algorithms that might well generalize to this theme too. With a GA you hope to get a "free lunch". Instead of specifying the design of the solution you want (say in n-bits), you hope you can specify a much shorter fitness function (m-bits) and have the computer find the solution for you.
The problem is that there are many other solutions that the computer can find, that fit the m-bit fitness function but aren't actually (you realize, retrospectively) the n-bit solution that you really want. Slowly you start building up your fitness function, adding more and more constraints to ensure the GA solves it the right rather than wrong way. Soon you find the complexity of your fitness function is approaching the complexity of a hand-rolled solution.
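Here's a toy version of that failure mode, with a simple hill-climber standing in for the GA (everything in it is invented for illustration). The fitness function only specifies half of what we "really want" (an all-ones genome), so the search is free to return genomes that satisfy the spec while ignoring the intent:

```python
import random

def evolve(fitness, n_bits=8, steps=2000, seed=0):
    # Crude stand-in for a GA: flip one random bit at a time and
    # keep the change whenever fitness doesn't get worse.
    rnd = random.Random(seed)
    genome = [0] * n_bits
    for _ in range(steps):
        i = rnd.randrange(n_bits)
        candidate = genome[:]
        candidate[i] ^= 1
        if fitness(candidate) >= fitness(genome):
            genome = candidate
    return genome

def sloppy_fitness(genome):
    # The "m-bit" spec: only checks the first half of the genome,
    # saying nothing about the rest of the "n-bit" solution we want.
    return sum(genome[:4])

found = evolve(sloppy_fitness)
# found[:4] is all ones, but the back half has drifted to arbitrary
# values the fitness function never constrained.
```

Fixing this means adding constraint after constraint to sloppy_fitness until it approaches the complexity of just writing the answer down.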
Might the same principle hold here? Declarative programming assumes we can abstract away from how the computer does what it does, but quite often we actually DO need to control that. Either for performance, for fine-tuning the user's experience, for robustness etc.
Anyone with any relational database experience will tell you that writing SQL queries is a tiny fraction of the skills needed for professional database development. Everything else is scaling, sharding, data-mining, Big Data, protecting against failure etc. etc. We used to think that such fine grained control was a temporary embarrassment. OK for systems programmers squeezing the most out of limited memory and processor resources. But once the computers became fast enough we could forget about memory management (give it to the garbage collector) or loop speed (look at that wonderful parallelism). Now we're in the future we discover that caring about the material resources of computation is always the crucial art. One resource constraint becomes cheap or fast enough to ignore, but your applications almost immediately grow to the size that you hit a different limit and need to start worrying again.
Professional software developers NEVER really manage to ignore the materiality of their computation, and so will never really be able to give up fine-grained control to a purely declarative language.
(SQL is really a great example of this. It's the most successful "tell the computer what you want, not how you want it done" language in computing history. And yet there's still a lot of tuning of the materiality required, whether by db-admins or, more recently, by the NoSQL movement returning to more controllable hierarchical databases.)
Text Dump > Spatial Relations
I already pointed out the problems of assuming everything conveniently maps onto human vision.
I'm as fascinated by visual and gestural ideas for programming as the next geek. But I'm pretty convinced that symbols and language are way, way, way more flexible and powerful representation schemes than diagrams will ever be. Symbols are not limited to two and a half dimensions. Symbols can describe infinite series and trees of infinite depth and breadth. Yadda yadda yadda.
Of course we can do better than the tools we have now. (Our programs could be outlines, wiki-like hypertexts, sometime spreadsheets, network diagrams etc. Or mixes of all of these, as and when appropriate.) But to abandon the underlying infrastructure of symbols, I think is highly unlikely.
Sequential > Parallel
This one's fascinating in that it's the one that seems most plausible. So it's also disturbing to think that it has a history as old as the other (failed) ideas here. If anything, Victor makes me pessimistic about a parallel future by putting it in the company of these other three ideas.
Of course, I'll reserve full judgement on this. I have my Parallella "supercomputer" on order (courtesy of KickStarter). I've dabbled a bit in Erlang. I'm intrigued by Occam-π. And I may even have a go at Go.
And, you know what? In the spirit of humility, and not knowing what I'm doing, I'm going to forget everything I just wrote. I'll keep watching Bret's astounding videos; and trying to get my head around Elm-lang's implementation of FRP. And dreaming of ways that programming will be better in the future.
And seeking to get to that better future as quickly as possible.
Tuesday, July 30, 2013
Cthulhu
However there's been a bit more bubbling agitation down in R'lyeh recently. The latest weird dream returning to trouble the world is GeekWeaver, the outline based templating language I wrote several years ago.
GeekWeaver was basically driven by two things : my interest in the OPML Editor outliner, and a need I had to create flat HTML file documentation. While the idea was strong, after the basic draft was released, it languished.
Partly because I shifted from Windows to Linux where the OPML Editor just wasn't such a pleasurable experience. Partly because GW's strength is really in having a templating language when you don't have a web server; but I moved on to doing a lot of web-server based projects where that wasn't an issue. And partly, it got led astray - spiralling way out of control - by my desire to recreate the more sophisticated aspects of Lisp, with all kinds of closures, macros, recursion etc.
I ended up assuming that the whole enterprise had got horribly crufty and complicated and was an evolutionary dead end.
But suddenly it's 2013, I went to have a quick look at GeekWeaver, and I really think it's worth taking seriously again.
Here are the three reasons why GeekWeaver is very much back in 2013 :
Fargo
Most obviously, Dave Winer has also been doing a refresh of his whole outlining vision with the excellent browser-based Fargo editor. Fargo is an up-to-date, no-compromise, easy to use online OPML Editor. But particularly important, it uses Dropbox to sync outlines with your local file-system. That makes it practical to install GeekWeaver on your machine and compile outlines that you work on in Fargo.
I typically create a working directory on my machine with a symbolic link to the OPML file in the Fargo subdirectory of Dropbox, and the fact that the editor is remote is hardly noticeable (maybe a couple of seconds lag between finishing an edit and being able to compile it).
GitHub
What did we do before GitHub? Faffed, that's what. I tried to put GeekWeaver into a Python Egg or something, but it was complicated and full of confusing layers of directories. And you need a certain understanding of Python arcana to handle it right. In contrast, everyone uses Git and GitHub these days. Installing and playing on your machine is easier. Updates are more visible.
GeekWeaver is now on GitHub. And as you can see from the quickstart guide on that page, you can be up and running by copying and pasting 4 instructions to your Linux terminal. (Should work on Mac too.) Getting into editing outlines with Fargo (or the OPML Editor still works fine) is a bit more complicated, but not that hard. (See above.)
Markdown
Originally GeekWeaver was conceived as using the same UseMod derived wiki-markup that I used in SdiDesk (and now Project ThoughtStorms for Smallest Federated Wiki). Then part of the Lisp purism got to me and I decided that such things should be implementable in the language, not hardwired, and so started removing them.
The result was, while GeekWeaver was always better than hand-crafting HTML, it was still basically hand-crafting HTML, and maybe a lot less convenient than using your favourite editor with built-in snippets or auto-complete.
In 2013 I accepted the inevitable. Markdown is one of the dominant wiki-like markup languages. There's a handy Python library for it which is a single install away. And Winer's Fargo / Trex ecosystem already uses it.
So in the last couple of days I managed to incorporate a &&markdown mode into GeekWeaver pretty easily. There are a couple of issues to resolve, mainly because of collisions between Markdown and other bits of GeekWeaver markup, but I'm now willing to change GeekWeaver to make Markdown work. It's obvious that even in its half-working state, Markdown is a big win that makes it a lot easier to write bigger chunks of text in GeekWeaver. And, given that generating static documentation was GeekWeaver's original and most-common use-case, that's crucial.
Where Next?
Simplification. I'm cleaning out the cruft, throwing out the convoluted and buggy attempts to make higher-order blocks and lexical closures. (For the meantime.)
Throwing out some of my own idiosyncratic markup to simplify HTML forms, PHP and javascript. Instead GW is going to refocus on being a great tool for adding user-defined re-usable abstractions to a) Markdown and b) any other text file.
In recent years I've done other libraries for code-generation. For example, Gates of Dawn is Python for generating synthesizers as PureData files. (BTW : I cleaned up that code-base a bit, recently, too.)
Could you generate synths from GeekWeaver? Sure you could. It doesn't really help though, but I've learned some interesting patterns from Gates of Dawn, that may find their way into GW.
Code Generation has an ambiguous reputation. It can be useful and can be more trouble than it's worth. But if you're inclined to think using outlining AND you believe in code-gen then GeekWeaver is aiming to become the perfect tool for you.
