Smart Disorganized Individuals

Sunday, December 08, 2013

Leaving Blogger

Part #232 of the ongoing evolution of my online life. I am migrating Smart Disorganized from Blogger to a WP blog on one of my own domains. In this case the new URL is going to be : http://sdi.thoughtstorms.info/. That's Smart Disorganized Individuals at ThoughtStorms. Makes sense because ThoughtStorms is the domain name for my wiki, the Project ThoughtStorms activity around the SFW. And now for OWL, the Outliner with Wiki Linking.
It's all part of a greater, if very slowly executed, plan.

Monday, November 25, 2013

How GitHub (no longer) Works

Very interesting talk from GitHub's Zach Holman on how the company's decentralized culture is evolving as it grows.

Monday, November 04, 2013

Aaaargh!

I have to write a fucking custom Tuple class in my Java program just to have a function that returns a pair of values?

Friday, November 01, 2013

Programming Language Features for Large Scale Software

My Quora Answer to the question : What characteristics of a programming language makes it capable of building very large-scale software?

The de facto thinking on this is that the language should make it easy to compartmentalize programming into well segregated components (modules / frameworks) and offer some kind of "contract" idea which can be checked at compile-time.

That's the thinking behind, not only Java, but Modula 2, Ada, Eiffel etc.

Personally, I suspect that, in the long run, we may move away from this thinking. The largest-scale software almost certainly runs on multiple computers. Won't be written in a single language, or written or compiled at one time. Won't even be owned or executed by a single organization.

Instead, the largest software will be like, say, Facebook. Written, deployed on clouds and clusters, upgraded while running, with supplementary services being continually added.

The web is the largest software environment of all. And at the heart of the web is HTML. HTML is a great language for large-scale computing. It scales to billions of pages running in hundreds of millions of browsers. Its secret is NOT rigour. Or contracts. It's fault-tolerance. You can write really bad HTML and browsers will still make a valiant effort to render it. Increasingly, web-pages collaborate (one page will embed services from multiple servers via AJAX etc.) And even these can fail without bringing down the page as a whole.
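To make the fault-tolerance point concrete, here is a minimal sketch of a page assembled from independent components, where one failing component degrades to a placeholder instead of taking the whole page down. The `renderPage` function and the component names are hypothetical, invented for illustration; real browsers and AJAX calls are of course far more involved.

```javascript
// Hypothetical page made of independent components; all names are illustrative.
function renderPage(components) {
  return components
    .map(({ name, render }) => {
      try {
        return render();
      } catch (err) {
        // One broken component degrades to a placeholder; the rest still render.
        return "[" + name + " unavailable]";
      }
    })
    .join("\n");
}

const page = renderPage([
  { name: "header", render: () => "Welcome" },
  { name: "feed", render: () => { throw new Error("service down"); } },
  { name: "footer", render: () => "Contact us" },
]);
```

The design choice is that the failure boundary sits around each component, not around the page.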

Much of the architecture of the modern web is built of queues and caches. Almost certainly we'll see very high-level cloud-automation / configuration / scripting / data-flow languages to orchestrate these queues and caches. And Hadoop-like map-reduce. I believe we'll see the same kind of fault-tolerance that we expect in HTML appearing in those languages.

Erlang is a language designed for orchestrating many independent processes in a critical environment. It has a standard pattern for handling many kinds of faults. The process that encounters a problem just kills itself. And sooner or later a supervisor process restarts it and it picks up from there. (Other processes start to pass messages to it.)

I'm pretty sure we'll see more of this pattern. Nodes or entire virtual machines that are quick to kill themselves at the first sign of trouble, and supervisors that bring them back. Or dynamically re-orchestrate the dataflow around trouble-spots.
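A rough sketch of that crash-and-restart pattern, written in JavaScript rather than Erlang. This is not Erlang's OTP supervisor API; `supervise`, `makeCounter` and the restart limit are assumptions made up for the example. The worker throws at the first sign of trouble and the supervisor replaces it with a fresh one.

```javascript
// Hypothetical sketch of "let it crash" supervision; not Erlang's OTP API.
function supervise(makeWorker, maxRestarts) {
  let worker = makeWorker();
  let restarts = 0;
  return {
    // Deliver a message; if the worker throws, discard it and start a fresh one.
    send(message) {
      try {
        return worker(message);
      } catch (err) {
        if (restarts >= maxRestarts) throw err; // give up and escalate the fault
        restarts += 1;
        worker = makeWorker(); // restart with clean state
        return undefined; // the failed message is dropped, not retried
      }
    },
    restartCount() {
      return restarts;
    },
  };
}

// A worker that keeps a running total and "kills itself" on bad input.
function makeCounter() {
  let total = 0;
  return function (n) {
    if (typeof n !== "number") throw new Error("bad message: " + n);
    total += n;
    return total;
  };
}

const counter = supervise(makeCounter, 3);
const first = counter.send(2); // running total is 2
const second = counter.send(3); // running total is 5
const crashed = counter.send("oops"); // worker crashes and is restarted
const afterRestart = counter.send(4); // fresh worker, so the old total is gone
```

Note that the restarted worker loses its state. In this sketch that is deliberate: the worker never tries to repair itself, it just starts again clean.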

Many languages are experimenting with Functional Reactive Programming : a higher-level abstraction that makes it easy to set up implicit data-flows and event-driven processing. We'll see more languages that approach complex processing by allowing the declaration of data-flow networks, and which simplify exception / error handling in those flows with things like Haskell's "Maybe Monad".
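For readers who haven't met the "Maybe" idea, a minimal sketch in JavaScript. This is not Haskell's actual type; `just`, `nothing`, `chain` and the two example steps are names assumed for illustration. Each step in the flow either produces a value or produces nothing, and later steps are skipped automatically once nothing has appeared.

```javascript
// Minimal Maybe sketch (assumed names: just, nothing, chain).
const nothing = { isNothing: true };
const just = (value) => ({ isNothing: false, value });

// Apply a step that itself returns a Maybe; skip it if we already have nothing.
const chain = (maybe, step) => (maybe.isNothing ? nothing : step(maybe.value));

// Two steps that can each fail without throwing.
const parseNumber = (text) => {
  const n = Number(text);
  return text.trim() === "" || Number.isNaN(n) ? nothing : just(n);
};
const reciprocal = (n) => (n === 0 ? nothing : just(1 / n));

// The flow itself contains no error-handling code at all.
const pipeline = (text) => chain(chain(just(text), parseNumber), reciprocal);

const ok = pipeline("4"); // a value: 0.25
const badInput = pipeline("x"); // nothing: parsing failed
const divZero = pipeline("0"); // nothing: reciprocal refused
```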

Update : Another thing I'm reminded of. Jaron Lanier used to have this idea of "Phenotropic Programming" (see "Why Gordian Software Has Convinced Me to Believe in the Reality of Cats and Apples"), which is a bit far out, but I think it's plausible that fault-tolerant web APIs, and the rest of the things I'm describing here, may move us closer.

Wednesday, October 30, 2013

OWL Broken

Doh! Actually OWL is very broken. Will be posting fix shortly. Will keep you informed.

Monads in Python (Again)

Dustin Getz provides one of the best "Monads for idiot Python programmers" explanations I've seen.

Excellent! I think I almost do understand this one.

Monday, October 21, 2013

Pissed With Ubuntu

Seriously, it was just a simple upgrade. How hard should that have been?

Instead it crashed in the middle. When I rebooted it told me my disk was broken. I found some instructions to fix the problem online. I ran these for a while, and they fixed some packages before telling me my package manager was too broken and aborting.

End result. An Ubuntu that boots into low-res mode without wifi :-(

Bleah!

Thursday, October 17, 2013

OWL Fix

There's a big fix for OWL today. There were some mysterious times when pages that I thought I was changing were getting reverted. I thought originally that this was a glitch from me btsync-ing between my laptop and tablet. Or maybe my attempts at doing background synchronization between the browser localStorage and the server were failing.

Nothing seemed to completely eliminate this intermittent problem. But today I realized it was much simpler. I was basically using web.py's "static" file serving to pull the OPML files off the server into Concord. But "static" is meant for static files (doh!). The browser was caching them. (Maybe because of some header web.py was putting out.) Anyway, I just changed the server to read the files into memory and spit their contents out, just like any other dynamic web-page, and the problem looks like it's gone away.

I'll keep an eye out, but I think that was it.
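The shape of the fix can be sketched roughly as follows. This is not the OWL server (which is Python on web.py) but a Node.js illustration. The `opmlResponse` function, the file name and the explicit `Cache-Control: no-store` header are assumptions added for the example. The idea is simply to read the file fresh on every request and tell the browser not to reuse an old copy.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical handler: read the file on every request and forbid caching.
function opmlResponse(filePath) {
  const body = fs.readFileSync(filePath, "utf8");
  return {
    status: 200,
    headers: {
      "Content-Type": "text/xml; charset=utf-8",
      "Cache-Control": "no-store", // ask the browser not to keep a copy
    },
    body,
  };
}

// Usage: edit the file between two requests; the second response sees the edit.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "owl-"));
const demoFile = path.join(demoDir, "notes.opml");
fs.writeFileSync(demoFile, '<opml><body><outline text="v1"/></body></opml>');
const before = opmlResponse(demoFile);
fs.writeFileSync(demoFile, '<opml><body><outline text="v2"/></body></opml>');
const after = opmlResponse(demoFile);
```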

Wednesday, October 09, 2013

Blame the Tools for Thought

Giles Bowkett :

This is, in my opinion, the strongest argument for seeing Unix and basic coding skills as fundamental required literacy today. As prostheses for memory and identity, computers are too useful not to use, but if you don't know how to craft your own code which gives you a UX which matches the way you think, you're doomed to matching the way you think to the available tools, and even the best available tools basically suck. Interaction design is not only incredibly hard to do well, it's also incredibly idiosyncratic.

Wednesday, September 25, 2013

OWL Server

OWL now has a simple Python server that saves OPML files to your local machine.

More here.

Saturday, September 21, 2013

Why Don't Browsers Let Web-Apps Write To The Local File System?

My Quora question :

I mean, I know why. It's a security thing.

But why couldn't a browser have an API for scripts to read / write the file system and a security feature where the web-app has to ask and be given permission by the user before it runs? (Just as Android apps have to tell you what permissions they need before you install them.) Couldn't the browsers successfully police this?

Surely if the browser manufacturers were to offer this capability, they'd more or less kill native Windows / Macintosh application development overnight and become the default platform for desktop computers. (So maybe Microsoft don't have the incentive, but Google and Firefox do.)

Friday, September 20, 2013

Hack Your Life With A Private Wiki Notebook

Bill Seitz is writing a book on organizing your life with wiki.

Looking forward to it.

Introducing OWL

I love outlining. I love wiki. What do you get when you create a mutant cross-breed of the two?

A fucking power tool, that's what!

It's just a rough draft, at the moment, a rough mashup of Concord and ideas from SdiDesk. But I think you can see it's compelling ...

Tuesday, September 17, 2013

Dave open-sources his Outliner

Concord is Fargo open-sourced.

Dave is, of course, awesome.

Saturday, August 17, 2013

GeekWeaver : Fixed Variable Substitution in Markdown

GeekWeaver : Fixed a bug that prevented variables being evaluated in Markdown mode.

Tuesday, August 06, 2013

Xiki

This looks very interesting :

I laughed when I first saw it, said "it's like Emacs". Seems Emacs is involved somehow. Also reminds me of Enso.

Sunday, August 04, 2013

QuoraGrabber is Dead!

Long live RSS Backup!
Really, a separate script / project just to back-up Quora is overkill. Now I have a more general script for backing up from any RSS feed. (Which I'll be able to use to ensure I have copies of what I write here and on Composing.)
I also made it a bit saner at keeping the useful HTML markup (ie. links etc.)
It's on GitHub.

Saturday, August 03, 2013

Modularity At Fine Granularity

Ian Bicking has a fascinating question. I'm just going to quote the whole thing because it's so good and important :

The prevailing wisdom says that you should keep your functions small and concise, refactoring and extracting functions as necessary. But this hurts the locality of expectations that I have been thinking about. Consider:
function updateUserStatus(user) {
  if (user.status == "active") {
    $("<li />").appendTo($("#userlist"))
      .text(user.name)
      .attr("id", "user-" + user.id);
  } else {
    $("#user-" + user.id).remove();
  }
}
Code like this is generally considered to be terrible – there’s logic for users and their status, mixed in with a bunch of very specific UI-related code. (Which is all tied to a DOM state that is defined somewhere else entirely — but I digress.) So a typical refactoring would be:
function updateUserStatus(user) {
  if (user.status == "active") {
    displayUserInList(user);
  } else {
    removeUserFromList(user);
  }
}
With the obvious definition of displayUserInList() and removeUserFromList(). But the first approach had certain invariants that the second does not. Assuming you don’t mess with the UI/DOM directly, and assuming that updateUserStatus() is called when it needs to be called, the user will be in the list or not based strictly on the value of user.status.

After refactoring there are functions that could be called in other contexts (e.g., displayUserInList()). You can look at the code and see that particular things happen when updateUserStatus() is called, but it’s not as easy to determine what is going to happen when you inspect the code from the bottom up. For instance, you want to understand why things end up in the #userlist element: you search for #userlist but you now get two functions instead of one, and to understand the logic you have to trace that back to the calling function, and you have to wonder if now or in the future anyone else will call those functions.

The advantage of the first function is that blocks of code are strict. You execute from the top to the bottom, with clear control structures. When GOTO existed you couldn’t reason so well, but we’ve gotten rid of that! (Of course there are still other exceptions.)

It’s not entirely clear what intention drives the refactoring (besides adherence to conventional standards of code beauty), but it’s probably more about code organization than about making the control flow more flexible. Extracting those functions means that you now have the power to make the UI inconsistent with the model, and that hardly seems like a feature.

And I have to wonder: are some of these basic patterns of “good” code there because we have poor tools for code organization? We express too many things with functions and methods and classes (and perhaps modules) because that’s all we have. But those are full of unintended semantic meaning.

Anyone have examples of languages that have found novel ways of keeping code organized?

So, it's a great question about modularity at a level where we tend not to do much explicit thinking : down at the smaller granularity of individual functions (compared to all the patterns for classes etc.). My immediate comment is that if Ian refactored his code like this :


function updateUserStatus(user) {
    var id = "user-"+user.id;
    if (user.status == "active") {
        addToList("#userlist",user.name,id);
    } else {
        removeFromList("#userlist",id);
    }
}

it would solve most of the problems. In this version we aren't fussily creating extra functions for tiny fragments of functionality which are only relevant to narrow situations (ie. users, userlists). Now the new functions are more generic and widely applicable. They're doing enough that it's worth the overhead of creating them. They're still usefully hiding the bit of complexity we DON'T want to think about here - the actual jQuery / HTML details of how lists are constructed - but they're leaving the important details - WHICH list we're updating and what parts of a user we show - in this locality rather than allowing it to become diffuse across the program. Of course, we can't prevent another bit of code updating the list itself somewhere. (That's more a quirk of the HTML environment where the DOM is global. In many analogous cases we could prevent most of the code having unauthorized access to a list simply by making it private within a class.)
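For completeness, here is one way the two generic helpers might look. To keep the sketch runnable outside a browser, a plain `Map` stands in for the DOM. That stand-in and the bodies of `addToList` and `removeFromList` are assumptions for illustration, not the jQuery code you would actually ship.

```javascript
// Stand-in for the DOM: selector -> Map of item id -> text. This is an
// assumption so the sketch runs without a browser or jQuery.
const lists = new Map([["#userlist", new Map()]]);

// Generic helpers: they know how lists work, but nothing about users.
function addToList(selector, text, id) {
  lists.get(selector).set(id, text);
}

function removeFromList(selector, id) {
  lists.get(selector).delete(id);
}

// The caller keeps the interesting decisions local: which list, which fields.
function updateUserStatus(user) {
  const id = "user-" + user.id;
  if (user.status == "active") {
    addToList("#userlist", user.name, id);
  } else {
    removeFromList("#userlist", id);
  }
}

updateUserStatus({ id: 7, name: "Ada", status: "active" });
const shownWhileActive = lists.get("#userlist").get("user-7");
updateUserStatus({ id: 7, name: "Ada", status: "away" });
const shownAfterLeaving = lists.get("#userlist").has("user-7");
```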

Thursday, August 01, 2013

Alan Kay

This is a great presentation by Alan Kay, the spirit lurking behind Bret Victor.

Wednesday, July 31, 2013

The Future Of Programming

Bret Victor has another classic talk up :

Bret Victor - The Future of Programming from Bret Victor on Vimeo.

Watch it. The conceit is entertaining, from his clothes to the overheads.

However, despite the brilliance of the presentation, I think he might be wrong. And the fact that it's taken 40 years for these promising ideas NOT to take off, may suggest there are some flaws in the ideas themselves.

Coding > Direct Manipulation

Like most visually-oriented people, Bret gives great importance to pictures. If I remember correctly, something like 33% of the human brain is visual cortex and specialized in handling our particular 2D + depth way of seeing. So it's hardly surprising that we imagine that this kind of data is important or that we continually look for ways of pressing that part of the brain into service for more abstract data-processing work.

However, most data we want to handle isn't of this convenient 2D or 2.5D form. You can tell this because our text-books are full of different kinds of data-structure, from arrays, lists and queues, to matrices of 2, 3 and higher dimensions, to trees, graphs and relational databases. If most data was 2D, then tables and 2D matrices would be the only data-structures programmers would ever use, and we'd have long swapped our programming languages for spreadsheets.

Higher dimensional and complex data-structures can only be visualized in 2, 2.5 or even 3 dimensions by some kind of projection function. And Bret, to his credit, has invented some ingenious new projections for getting more exotic topologies and dynamics down to 2D. But even so, only a tiny proportion of our actual data-storage requirements are ever likely to be projectable into a visual space.

Once you accept that, then the call for a shift from coding to direct manipulation of data-structures starts to look a lot shakier. Right now, people are using spreadsheets ... in situations which lend themselves to it. Most of the cases where they're still writing programs are cases where such a projection either doesn't exist or hasn't been discovered (in more than 30 years since the invention of the spreadsheet).

Procedures > Goals / Constraints

It seems like it must be so much easier to simply tell the computer what you want rather than how to do it. But how true is that?

It's certainly shorter. But we have a couple of reasons for thinking that it might not be easier.

1) We've had the languages for 40 years. And anyone who's tried to write Prolog knows that it's bloody difficult to formulate your algorithms in such a form. Now that might be because we just don't train and practice enough. But it might be genuinely difficult.

The theoretical / mathematical end of computer science is always trying to sell higher-level abstractions which tend in the direction of declarative / constraint oriented programming, and relatively few people really get it. So I'm not sure how much this is an oversight by the programmer community vs. a genuine difficulty in the necessary thinking.

2) One thing that is certain : programming is very much about breaking complex tasks down into smaller and simpler tasks. The problem with declarative programming is that it doesn't decompose so easily. It's much harder to find part solutions and compose them when declaring a bunch of constraints.

And if we're faced with a trade-off between the virtue of terseness and the virtue of decomposability, it's quite possible that decomposability trumps terseness.

There may be an interesting line of research here : can we find tools / representations that help in making declarative programs easier to partially specify? Notations that help us "build-up" declarations incrementally?

3) I have a long-standing scepticism from my days working with genetic algorithms that might well generalize to this theme too. With a GA you hope to get a "free lunch". Instead of specifying the design of the solution you want (say in n-bits), you hope you can specify a much shorter fitness function (m-bits) and have the computer find the solution for you.

The problem is that there are many other solutions that the computer can find, that fit the m-bit fitness function but aren't actually (you realize, retrospectively) the n-bit solution that you really want. Slowly you start building up your fitness function, adding more and more constraints to ensure the GA solves it the right rather than wrong way. Soon you find the complexity of your fitness function is approaching the complexity of a hand-rolled solution.
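A toy illustration of that creep, using sorting instead of a GA. The example and all its names (`isSorted`, `cheat`, `strictSpec`) are invented for illustration. The short specification "the output is in ascending order" is satisfied by a useless solution, and the tightened specification already needs to do a sort of its own.

```javascript
// The short "what I want" spec: the output must be in ascending order.
const isSorted = (xs) => xs.every((x, i) => i === 0 || xs[i - 1] <= x);

// A degenerate "solution" that satisfies the short spec without doing the job.
const cheat = (xs) => [];

// Tightened spec: the output must also contain exactly the input's elements.
const sameElements = (a, b) => {
  const sa = [...a].sort((x, y) => x - y);
  const sb = [...b].sort((x, y) => x - y);
  return sa.length === sb.length && sa.every((x, i) => x === sb[i]);
};
const strictSpec = (input, output) => isSorted(output) && sameElements(input, output);

const input = [3, 1, 2];
const honest = [...input].sort((x, y) => x - y);

const looseAcceptsCheat = isSorted(cheat(input)); // the loose spec is fooled
const strictAcceptsCheat = strictSpec(input, cheat(input)); // the strict one is not
const strictAcceptsHonest = strictSpec(input, honest);
```

By the time the check is strict enough to reject the cheat, it is about as complicated as the thing it was meant to specify.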

Might the same principle hold here? Declarative programming assumes we can abstract away from how the computer does what it does, but quite often we actually DO need to control that. Either for performance, for fine-tuning the user's experience, for robustness etc.

Anyone with any relational database experience will tell you that writing SQL queries is a tiny fraction of the skills needed for professional database development. Everything else is scaling, sharding, data-mining, Big Data, protecting against failure etc. etc. We used to think that such fine grained control was a temporary embarrassment. OK for systems programmers squeezing the most out of limited memory and processor resources. But once the computers became fast enough we could forget about memory management (give it to the garbage collector) or loop speed (look at that wonderful parallelism). Now we're in the future we discover that caring about the material resources of computation is always the crucial art. One resource constraint becomes cheap or fast enough to ignore, but your applications almost immediately grow to the size that you hit a different limit and need to start worrying again.

Professional software developers NEVER really manage to ignore the materiality of their computation, and so will never really be able to give up fine-grained control to a purely declarative language.

(SQL is really a great example of this. It's the most successful "tell the computer what you want not how you want it done" language in computing history. And yet there's still a lot of tuning of the materiality required, either by db-admins or more recently witnessed by the NoSQL movement, returning to more controllable hierarchical databases, mainly to improve their control.)

Text Dump > Spatial Relations

I already pointed out the problems of assuming everything conveniently maps onto human vision.

I'm as fascinated by visual and gestural ideas for programming as the next geek. But I'm pretty convinced that symbols and language are way, way, way more flexible and powerful representation schemes than diagrams will ever be. Symbols are not limited to two and a half dimensions. Symbols can describe infinite series and trees of infinite depth and breadth. Yadda yadda yadda.

Of course we can do better than the tools we have now. (Our programs could be outlines, wiki-like hypertexts, sometimes spreadsheets, network diagrams etc. Or mixes of all of these, as and when appropriate.) But to abandon the underlying infrastructure of symbols, I think is highly unlikely.

Sequential > Parallel

This one's fascinating in that it's the one that seems most plausible. So it's also disturbing to think that it has a history as old as the other (failed) ideas here. If anything, Victor makes me pessimistic about a parallel future by putting it in the company of these other three ideas.

Of course, I'll reserve full judgement on this. I have my Parallella "supercomputer" on order (courtesy of KickStarter). I've dabbled a bit in Erlang. I'm intrigued by Occam-π. And I may even have a go at Go.

And, you know what? In the spirit of humility, and not knowing what I'm doing, I'm going to forget everything I just wrote. I'll keep watching Bret's astounding videos; and trying to get my head around Elm-lang's implementation of FRP. And dreaming of ways that programming will be better in the future.

And seeking to get to that better future as quickly as possible.