hero banner

Lessons learnt running a public API from #dddnorth

October 19th, 2014

Yesterday I gave a talk at #DDDNorth (a free community-led conference in the “Developer! Developer! Developer!” series) about running public-facing RESTful APIs. Many thanks for all the kind feedback I’ve had about the talk on social media – thrilled that so many people enjoyed it. It was a variant on previous talks I’ve given at a couple of usergroups on the topic – so here are the updated slides.

Google presentation link

Deferred execution in C# – fun with Funcs

October 16th, 2014

I want to talk a little about deferred execution in C#.

I use deferred execution a lot in my code – frequently using it to configure libraries and build for extensibility – but I’ve met lots of people that understand the concept of deferred execution (“this happens later”) but have never really brushed up against it in C#.

Deferred execution is where code is declared but not immediately run – instead being invoked later

Deferred execution is formally described by MSDN as meaning “the evaluation of an expression is delayed until its realized value is actually required”.

It’s common in JavaScript to supply a method as a callback which is later invoked:

image
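The original post showed this as an image; here’s a minimal sketch of the kind of callback the example describes (the `doSomething` name and its body are illustrative, not from the original):

```javascript
// doSomething accepts a callback and invokes it later,
// rather than running it at the point it is passed in.
function doSomething(callback) {
    var result = "done";
    callback(result); // the deferred code executes here
}

// The function expression below is declared at the call site,
// but only executes when doSomething invokes it.
doSomething(function (result) {
    console.log("Callback received: " + result);
});
```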

In the above example, we’re declaring a new function while calling doSomething(), which is later executed as a callback by the doSomething() method, rather than being executed when we call it. Because JavaScript is single-threaded and event-driven, it makes extensive use of callbacks in just about every part of the language.

By contrast, deferred execution is less obvious in C#, even though keywords and types that leverage deferred execution have been available for years. There are two common ways that deferred execution is implemented in C#:

The Yield Keyword

The yield keyword was introduced in C#2 as some sweet syntactic sugar to help people implement iterators and enumerators without boilerplate code. Using the yield keyword generates a state machine at compile time and actually does a surprising amount. There’s a really great (and ancient) post by Raymond Chen about how yield is implemented – but the short version is, you can “yield return” in methods that return an IEnumerable<T> and the compiler will generate a class with a whole bunch of goto statements in it. It’s an elegant compiler trick that a lot of you will have used, even if you didn’t realise it at the time.
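A minimal sketch of yield return in action (the class and method names are illustrative):

```csharp
using System;
using System.Collections.Generic;

public static class Numbers
{
    // The method body does not execute when UpTo is called - the
    // compiler generates a state machine, and each step of the
    // enumeration advances it to the next "yield return".
    public static IEnumerable<int> UpTo(int max)
    {
        for (var i = 1; i <= max; i++)
        {
            Console.WriteLine("Yielding " + i); // runs lazily, during enumeration
            yield return i;
        }
    }
}
```

Nothing is printed until something actually enumerates the result – calling `Numbers.UpTo(3)` on its own does no work at all.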

The Action and Func Types

Now we get to the fun stuff. Action and Func were also introduced in C#2, but became much more common from C#3 onwards, when lambdas were introduced to the language. They look like this:

image
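The screenshot is missing from this copy of the post; a sketch of typical Action and Func declarations looks something like this (the names are illustrative):

```csharp
using System;

public static class DelegateExamples
{
    public static int Demo()
    {
        // An Action encapsulates a method with no return value.
        Action<string> log = message => Console.WriteLine(message);

        // A Func's final type parameter is its return type.
        Func<int, int, int> add = (a, b) => a + b;

        // Neither lambda body runs until it is invoked:
        log("Hello from an Action");
        return add(2, 3);
    }
}
```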

Delegates and lambdas in C# are collectively called “anonymous methods”, and (sometimes) act as closures. What this means is that when an anonymous method is declared, it can capture “outer variables” so that you can use them later. There’s a good response by Eric Lippert explaining the exact semantics of anonymous methods in a StackOverflow post here, and numerous examples around the web. This is interesting because you can use a closure to capture some context in one place in the application, and invoke it somewhere else.

There are lots of fun use cases for this, and I want to highlight a couple.

Templating methods

Sometimes referred to as the “hole in the middle” pattern – there are plenty of scenarios where you have a block of repetitive, near-identical code with four or five tiny variations. This is one of the most frequent sources of copy-paste code, and a great refactor for lots of older codebases. You could create an abstract base class and a whole bunch of inheritance to solve this problem – or you could do something simpler, where this

image 

can trivially become this

image
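The before-and-after screenshots are missing from this copy; the refactored version can be sketched roughly like this (the connection/report names are illustrative, assuming the "before" was several methods each repeating the open/close boilerplate):

```csharp
using System;

public class Reports
{
    // The shared boilerplate lives in one templating method, and the
    // varying body is passed in as an Action - the "hole in the middle".
    private void WithConnection(Action<string> body)
    {
        Console.WriteLine("Open connection");  // shared setup
        body("connection");                    // the hole - deferred, varying code
        Console.WriteLine("Close connection"); // shared teardown
    }

    public void SaveOrder()
    {
        WithConnection(conn => Console.WriteLine("saving order using " + conn));
    }

    public void SaveCustomer()
    {
        WithConnection(conn => Console.WriteLine("saving customer using " + conn));
    }
}
```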

Entirely removing the repetition from the codebase without having to build a bunch of abstract cruft. The actual bodies of the methods are entirely different, but with some creative use of deferred execution, we can make sure we only include the boilerplate code once. There are more compelling examples though – consider this pattern if you’re boilerplating HTTP requests, or doing repetitive serialization.

Method Interception

If you’re authoring libraries and want to give your users a way to “meddle with the default behaviour”, providing them optional Funcs is a really neat way to do it without compromising your design. Imagine a library call that by design swallows exceptions – while that’s not a great idea, we’ll run with it. Users might rightfully want to know when this happens, so you can leverage optional parameters and Action callbacks to give them a hook without compromising your library call.

image
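The screenshot is missing here; a sketch of the optional-callback idea looks something like this (the `Library.DoWork` name is illustrative):

```csharp
using System;

public static class Library
{
    // By design this call swallows exceptions, but an optional
    // Action<Exception> gives callers a hook into that behaviour
    // without changing the method's contract for existing users.
    public static void DoWork(Action work, Action<Exception> onError = null)
    {
        try
        {
            work();
        }
        catch (Exception ex)
        {
            // Only invoked if the caller supplied a hook; otherwise
            // the exception is swallowed exactly as before.
            if (onError != null) { onError(ex); }
        }
    }
}
```

Existing callers are unaffected, while callers who care can pass `ex => log.Error(ex)` as the second argument.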

This is a very simplistic example – there are whole frameworks built on the concept of wiring up large collections of Funcs and Actions that are chained together. The Nancy web framework’s Before and After hooks are nothing more than a chain of Funcs that get executed in sequence. In fact, the whole of the OWIN work-in-progress spec for the next generation of .NET webservers revolves around the use of a single “AppFunc”.

Mutator Actions and Funcs that leverage current context

I use Funcs for configuration in just about every library that I own. They’re a great way to allow people to configure the future behaviour of a library, in a repeatable way. In the following example I’m going to create a Factory class that will store an Action that it’ll use to modify the type it creates each time create is called.

image
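The factory screenshot is missing from this copy; a minimal sketch of the idea described above (names are illustrative):

```csharp
using System;
using System.Collections.Generic;

public class Factory<T> where T : new()
{
    private readonly List<Action<T>> _mutators = new List<Action<T>>();

    // Called at configuration time - the Action is stored, not run.
    public void ConfigureWith(Action<T> mutator)
    {
        _mutators.Add(mutator);
    }

    // Each stored Action is evaluated afresh on every Create call,
    // so context-sensitive configuration re-runs per instance.
    public T Create()
    {
        var instance = new T();
        foreach (var mutate in _mutators)
        {
            mutate(instance);
        }
        return instance;
    }
}

public class Widget
{
    public string Owner { get; set; }
}
```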

This is especially useful if you want to do something like “get a value out of the current request” or some other thing that changes each time the factory is called – if you design your Actions and Funcs right, passing in the “current executing context” you can declaratively define behaviours at start-up that evaluate differently on each execution.

I’ve commonly used these kinds of configuration overrides to do things like “fetch the current tenant from context” or “get the ISession from a session scoped IoC container” – the configuration model of ReallySimpleEventing is a small example of using deferred execution and Actions to override the default behaviour when the library encounters an unhandled exception. A default “Throw all” implementation is provided, with the ability to override by configuration.

Working around problems with classes you can’t control the creation of

I recently had to dip into some WCF code – and one of the less desirable parts of working with WCF is that it creates your services for you. What this means is that if you want to use some kind of DI container across your codebase based around constructor injection, you’re out of luck. It’s annoying, and it can ruin your testing day, but with a little bit of legwork you can use Funcs to plug that hole.

Given a class that looks like this that’s created by some framework magic

image

You can make a publicly settable static Func that’ll act as a proxy to your container and bind it up at bootstrapping time. That’s a bit of a mouthful, so let me illustrate it.

image
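The screenshot is missing from this copy; the shape of the workaround is roughly this (`MyWcfService` and `MyDependency` are illustrative names):

```csharp
using System;

public class MyDependency
{
    public string Greet() { return "hello"; }
}

// Imagine this class is created by framework magic (WCF), so we
// can't constructor-inject into it.
public class MyWcfService
{
    // A publicly settable static Func acts as a proxy to the container.
    // The default keeps the class usable; bootstrapping code overrides it.
    public static Func<MyDependency> ResolveDependency = () => new MyDependency();

    public string Handle()
    {
        var dependency = ResolveDependency(); // resolved per call, not per construction
        return dependency.Greet();
    }
}
```

At bootstrapping time you’d point it at your container – something like `MyWcfService.ResolveDependency = () => container.Resolve<MyDependency>();` – and in tests you can swap in a fake.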

In the above example, you can wire up a public static Func<MyDependency> to your container in your bootstrapping code. This means that even if you don’t control the lifecycle of your class, you have a hook to call back to the container to grab a current valid instance of your dependency, without relying on deep static class usages or service locators. It’s preferable because you can override this behaviour in test classes, giving you a way to test this previously hard-to-test code. This is especially useful if you want to excise references to HttpContext or some other framework-provided static from your code.

Dictionaries of methods for control flow

Here’s a fun example. Let’s say you write a command line app that takes one of five parameters, and you want to execute a different method based on the parameter passed. Simple enough, let’s write an if statement!

image

A little naff and repetitive, but it’ll do. You could perhaps refactor this to a switch statement.

image

This looks a little tighter, but it’s still quite verbose – even for such a small example. With the help of a dictionary and some Actions, you can convert this to a registry of methods – one that you could even modify at runtime.

image
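The screenshot is missing from this copy; the dictionary-of-methods version looks something like this (the command names are illustrative):

```csharp
using System;
using System.Collections.Generic;

public static class CommandRouter
{
    // A registry of methods keyed by parameter - because the entries
    // are just Actions, the registry can be modified at runtime.
    private static readonly Dictionary<string, Action> Commands =
        new Dictionary<string, Action>
        {
            { "import", () => Console.WriteLine("Importing...") },
            { "export", () => Console.WriteLine("Exporting...") },
            { "purge",  () => Console.WriteLine("Purging...") }
        };

    public static void Run(string parameter)
    {
        Action command;
        if (Commands.TryGetValue(parameter, out command))
        {
            command(); // deferred execution - the Action runs here
        }
        else
        {
            Console.WriteLine("Unknown command: " + parameter);
        }
    }
}
```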

This is a visibly concise way of expressing the same control flow in a way that’s mutable.

This is just a taste of some of the things you can do when you leverage deferred execution, anonymous methods and the Action and Func types in .NET.  There are plenty of open source codebases that make use of these kinds of patterns in code, so do dig deeper!

How to make it harder for people to steal your users’ accounts (and data!)

September 3rd, 2014

The internet has been a-buzz with the recent high-profile theft of a lot of salacious celebrity photos. As soon as it transpired that iCloud was allegedly the source of these leaks, many – myself included – feared the worst: some kind of zero-day exploit in Apple’s cloud storage solution.

As it turns out, the vulnerability wasn’t really anything new, just a well-coordinated and lengthy exercise in social engineering and password theft. It appears the data that was stolen and subsequently leaked online had been collected and traded over several months, obtained by gaming the password recovery and signup processes of iCloud and, presumably, several other online sites.

So how does it work?

It’s actually fairly simple – the hackers recover the passwords for the accounts of their targets, and then restore any stored files or device images. This is mostly achieved by identifying the valid email address of the target, and then gaming the password recovery systems.

The hackers first need to verify that the email they’ve obtained is a valid login credential – they do this by brute-forcing signup and password recovery pages on the target website, paying close attention to the login failure messages returned. If a failure message indicates that the login credential is correct but the password is wrong, the hacker then has 50% of the information required to break into that account.

They’ll then use a social media driven attack on password recovery questions based on publicly available (or privately sourced) data on the target, attempting to guess the recovery answers. If the site they’re targeting isn’t particularly security conscious, as a last ditch they’ll attempt to script a brute force attack to crack into an account. This can be very obvious, so it’s not a preferred attack vector.

Any data stolen can subsequently be used to break into other digital accounts owned by the target – people often use the same passwords everywhere, and if hackers break into an email account, they can simply password-reset everything. Stolen contacts and addresses are especially valuable when important people are targeted, since they can reveal contact information for other potential targets.

Avoid being part of the problem

Here are a few things that you can change in your software to make it less likely that your accounts are hijacked, or that your site is used to exploit somebody else:

Return generic “username or password invalid” responses to authentication failures

Your UX department will probably hate you for this, but it’s a very simple compromise – never tell a user which part of their login credentials failed. Simply returning “Invalid Password” instead of “Invalid Credentials” verifies to a potential hacker that the email address they attempted to use is already registered.

Use usernames for authentication instead of email addresses

Using email addresses during authentication allows a potential hacker to exploit your signup process to verify a valid account. If your site returns a message like “Email already registered” you are confirming the existence of a user account that’s open to exploitation. You can potentially mitigate against this by returning a more generic “Email not allowed” but often even that is enough of a positive signal to a potential hacker.

Ensure authentication attempts are rate limited

This is really anti-brute-forcing 101 – don’t let people repeatedly submit a login form without slowing them down after a few failed attempts. Once you’re in the region of 5 or so failed attempts, it’s perfectly acceptable to start making users complete CAPTCHAs – Google offers reCAPTCHA for free – use it. This is a UX compromise, but so long as you only flag suspicious traffic you’re not going to inconvenience many real users. Make sure that your CAPTCHA implementation doesn’t rely on cookies or other client-side state – people writing tools to brute-force logins will simply not respect them.

Ensure API authentication attempts are rate limited

Similar to using CAPTCHAs in your website – make sure any API authentication methods are similarly rate limited. The simplest implementation is to start scaling the time a login attempt takes based on the number of failures. For 1-10 attempts – allow the user to login as fast as they can, but after that, start artificially slowing your response times after every failed request from a given IP, client or key – preventing brute forcing your APIs.
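The scaling-delay idea can be sketched like this – a hypothetical in-memory throttle keyed by client, purely to illustrate the shape (a real implementation would need expiry and shared storage across servers):

```csharp
using System.Collections.Generic;

public class LoginThrottle
{
    // Failure counts per client key (IP, API key, etc).
    private readonly Dictionary<string, int> _failures = new Dictionary<string, int>();

    // The artificial delay (in milliseconds) to apply before
    // responding to this client's next authentication attempt.
    public int DelayFor(string clientKey)
    {
        int failures;
        _failures.TryGetValue(clientKey, out failures);

        // The first 10 attempts are as fast as the server can manage;
        // after that, every further failure adds another second.
        return failures <= 10 ? 0 : (failures - 10) * 1000;
    }

    public void RecordFailure(string clientKey)
    {
        int failures;
        _failures.TryGetValue(clientKey, out failures);
        _failures[clientKey] = failures + 1;
    }
}
```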

Ditch all those “password recovery” questions

There aren’t many good ways to deal with password recovery – but the obvious one is to use SMS verification based on real phone numbers. Allowing people to recover accounts based on personal details that are easy to Google is crazy. If the answers to your password recovery questions are available on your users’ public Facebook profiles (date of birth, mother’s name, place of birth, first school, etc.) then you’re just letting your users fall into the pit of failure by accepting them as security questions. Prefer verification by phone or SMS – there are great services out there that make this easy, so prefer them over perpetuating the “password recovery” problem. Alternatives include accepting “recovery email addresses”, as Gmail does, and verifying a small 0–99p charge made to any registered credit card.

If your users refuse to give you contact information to recover their account, don’t, under any circumstances, hand the account over.

Don’t enforce restrictive password requirements

Enforcing silly “0-9 characters, alphanumeric and at least one special character” style password requirements absolutely reduces the attack space for a password hack and opens your site up to simplistic brute-force attacks. Encourage users to set long passwords or jumbled passphrases. We know passwords aren’t great, but they’re what we have – so the longer the better.

Don’t store passwords in either clear text or a reversible hash

Whilst not directly related, please remember that any kind of clear text or reversible password hash is a terrible idea – if you’re compromised, there’s a good chance that your users’ passwords will be stolen and used to attack more important online properties they own. Play nice, use something like bcrypt.

In the vast majority of cases of “hacking a user account”, data made publicly available is used to exploit a third party site – so let’s all play nicely and protect our users together, lest the data stolen be something even more valuable than humiliating or salacious photos.

Inverting the FN keys on a Microsoft Wedge Keyboard

August 11th, 2014

Just picked up one of the pretty nice Microsoft Wedge Bluetooth keyboards, and was reasonably incensed when I discovered there was no FN-Lock anywhere on the thing – it had been hardware-fixed to have my precious function keys used as media player keys. Not very Visual Studio friendly!

Apparently, no official toggle exists, but here’s an AutoHotKey script that works:

Media_Play_Pause::F1
Volume_Mute::F2
Volume_Down::F3
Volume_Up::F4
<+#F21::F5
<!<#F21::F6
<^<#F21::F7
<#F21::F8
PrintScreen::F9
Home::F10
End::F11
PgUp::F12
F1::Media_Play_Pause
F2::Volume_Mute
F3::Volume_Down
F4::Volume_Up
F5::<+#F21
F6::<!<#F21
F7::<^<#F21
F8::<#F21
F9::PrintScreen
F10::Home
F11::End
F12::PgUp

Stolen from http://stackoverflow.com/questions/14996902/microsoft-wedge-mobile-keyboard – which took me an age to find in the Googles, so hopefully this’ll help someone.

A summer in the wilderness; on not working; volunteering; taking clients

August 6th, 2014

There’s been a little bit of radio silence here this summer – though it’s probably been my busiest one in memory.

This summer in technology for me has been unusual. Burn-out is real, regardless of project type, and I’m always conscious of avoiding it after some particularly nasty experiences in the past – so I thought I’d make summer this year a little different.

At the end of May I finished up consulting for the awesome tech team over at JUST EAT to take a bit of a break. The team there are brilliant and on a big push for community engagement and participation in open source software. This was one of the core initiatives I was working with them on – to help them grow and scale with a first-class software development culture central to the way they work. It was a fun and interesting challenge, but as I approached 7-8 months on site, it started to feel to me like they’d really got it nailed – they didn’t need me. I love it when that happens – it’s probably the best bit of the consultancy gig – when the guys on the floor have the skills to take everything forward without you.

Around about the same time as I started feeling like “I’d given what I have”, a friend of mine was coming up to a sabbatical at his job, and was scheming to drive across America. Really, when an opportunity like that presents itself, you’d be a fool not to – so through June, I spent a month on the road – hotel to motel to resort – from LA, down to NOLA, then back up to New York. It’s hard to describe a month-long experience of an entire continent and do it sufficient justice – so let me just say “I’d recommend you try it if you ever get the chance”.

I got home from the States on June 18th – almost two months ago now – and decided NOT to push myself back into prolonged client work straight away, but to take some time out and see if I could do something more useful with my time for a few months. So for the last couple of months, I’ve been trying to give a little back, and I want to tell you about a few of the highlights, and hopefully encourage people to get themselves out of the zone a little more often.

In these last 8 weeks:

  • I’ve contributed to three different open source projects
  • Worked on some free extensions to ReallySimpleEventing to use Azure message bus as a backplane
  • Spent time porting ASP.NET MVC’s HtmlHelpers to #NancyFx
  • Started volunteering some IT consulting to an amazing charity called StreetDoctors (check them out!) via an awesome project called SocialCoder
  • Helped hook up sponsorship for the first OpenSpaceBeers in years, a favourite event of mine
  • Spent a week mentoring children aged 11-15 learning to code at the incredible Young Rewired State Festival of Code – meeting some brilliant and highly capable future technologists in the process
  • Been involved in advising a handful of early stage start-ups (no links yet! ;) )
  • Cut some code on my own little start-up.
  • Worked with the one client I can’t avoid – my mother!
  • Been to just about every London dev meet I could physically make
  • Went to a load of music festivals

To put it bluntly, my little holiday of “less work” turned into a more eclectic, fun, exciting and interesting kind of experience than I expected “quietly working on my own start-up idea” would when I decided to take time out.

Why am I writing this?

It’s easy to see this type of time out as unproductive or unfocused. It’s not. It’s enriching. When you’re really deep in the pit, toiling at the day-job, it’s so very easy to lose perspective, and to be oblivious to other kinds of work or experiences you could be having that are both personally enriching and intellectually stimulating. I’ve written a tonne of code, had lots of new experiences, and got involved with helping people that I wouldn’t normally interact with.

I spent a lot of time explaining that in technology, there can be a better way – so what I want to say here, is that there are plenty of stimulating experiences away from working for “Mega Corp” – and I want to encourage you to take them and get involved.

I’ll probably be taking on some more time-intensive clients in October, but up until then, if you’d like to work with me on something small, or see how we can make your technology teams work better in London – do give me a shout – right now I’m flexible and can likely fit you in – there are outlines of some of the services I provide on my company website.

Forget about that day-job for a while – especially if you have the ability to.

Continuously deploying Azure services using AppVeyor and GitHub

May 13th, 2014

I put a talk together for the UK Azure user group’s first lightning talks event (hosted by the kind folks at JUST EAT) this evening.

Here’s the slide deck, with an almost-live-coded video where I’ll walk you through putting together an automated deployment solution for Cloud Services with AppVeyor – a cloud-hosted CI platform for .NET.

If you’re interested in hearing more about this or similar topics, either as a user-group slot or professionally, get in touch.

Exploring QueueBackgroundWorkItem in ASP.NET and Framework 4.5.2

May 9th, 2014

One of the more frequently asked questions in ASP.NET web development is “can I spin up my own thread and do some work from my request?” – and for a long time, the default answer was always “that’s a terrible idea, you shouldn’t do it”. The framework designers said don’t do it, and people that did it ended up running into fairly terrible problems – mostly because you don’t *really* control application lifecycle in ASP.NET – IIS is within its rights to recycle at any point.

For the last couple of years, WebBackgrounder has been the sanctioned-but-with-caveats way to run background tasks inside of your web app – Phil Haack, the author of WebBackgrounder, has a blog post outlining the perils of background tasks inside of ASP.NET – and explains his motivations behind publishing the package in the first place.

But… it’s a reasonable request, isn’t it?

Truthfully, the motivation behind adding this in a point release of the framework is likely that it’s a scenario that comes up for a lot of people.

Want to do any fire-and-forget work? Before this, your users were waiting for you to finish before they got a response to their request.

Async and await make it easier for the server to manage the context switching, but they don’t get the response back to the user any quicker. Some people started firing off Task<T>s to do the hard work, but these tasks are sensitive to app domain recycles – while they’re probably good enough, the behaviour isn’t guaranteed.

Enter HostingEnvironment.QueueBackgroundWorkItem

As part of the release notes for .NET Framework 4.5.2 there was a single bullet point:

New HostingEnvironment.QueueBackgroundWorkItem method that lets you schedule small background work items. ASP.NET tracks these items and prevents IIS from abruptly terminating the worker process until all background work items have completed. These will enable ASP.NET applications to reliably schedule Async work items.

The takeaway from this is reliably. If you use this new HostingEnvironment queue in an ASP.NET app, any background tasks that can complete execution within the 30 second graceful shutdown period are guaranteed to execute safely.

You can use this functionality trivially in your MVC apps using the following snippets:

image
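The snippet shown as an image in the original post is along these lines – note this requires System.Web and targeting framework 4.5.2, so it only runs inside an ASP.NET host (the controller and its action are illustrative):

```csharp
using System.Threading.Tasks;
using System.Web.Hosting;
using System.Web.Mvc;

public class EmailController : Controller
{
    public ActionResult Send()
    {
        // Overload taking an Action<CancellationToken> - the work is
        // queued and the response returns to the user immediately.
        HostingEnvironment.QueueBackgroundWorkItem(token =>
        {
            // fire-and-forget work, e.g. sending an email
        });

        // Overload taking a Func<CancellationToken, Task> for async work.
        // ASP.NET tracks the item and delays worker process shutdown
        // (within the graceful shutdown window) until it completes.
        HostingEnvironment.QueueBackgroundWorkItem(async token =>
        {
            await Task.Delay(1000, token);
        });

        return new EmptyResult();
    }
}
```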

Just remember – if your tasks take longer than 30 seconds, and the app pool recycles, then all bets are off again and your tasks will be terminated for failing to complete within the graceful shutdown window.

In order to use these new features, you’re going to need to target framework 4.5.2 in your projects. You’ll need to do an in-place upgrade of the framework on all your web servers and build servers, and you’ll need to be mindful of some other changes (the change in behaviour for ViewState MAC will probably be of concern if you also host WebForms pages).

You can get the bytes to install the pre-reqs here:

QueueBackgroundWorkItem is a static method on the HostingEnvironment class – so if you want to write any tests that assert that your background tasks are being queued, you’re going to have to wrap and inject it.

This is a nice little feature in the latest update to the framework that should let developers remove some of the more flaky code that crops up in web apps for common tasks like sending emails, updating search indexes and other asynchronous jobs.

Slidedeck + Video – Profiling applications with dotTrace

May 2nd, 2014

Performance profiling appears to have a high barrier to entry if you’ve never done it before. As part of some work for a client, I’ve been helping teach people how to profile code using JetBrains dotTrace. This slide deck is part of a larger session to help developers understand the numbers they’re seeing so they can find performance bottlenecks in their code.

The embedded YouTube video is available here: https://www.youtube.com/watch?v=VrFeayu9LBk

JetBrains publish a wide variety of videos covering the use of their profiler, so I urge you to check them out if you’re interested in deep diving.

Writing technical stuff that people want to read

April 23rd, 2014

As a development community we’re better together. We learn more, we build more awesome things and we’re happier. Blogs and Q&A sites like Stack Overflow highlight the growing trend of self-taught programmers learning from the internet, rather than from schools or universities. This isn’t new – programmers have been self-learning for decades – but it used to be from books rather than things written by their peers in blogs and forums.

There are plenty of people who are interested in writing about their experiences – but putting yourself out there and actually doing it can be intimidating. It’s easy to think that you don’t have the skills to articulate your opinion or that you can’t write.

The barriers are real – so let’s talk about them…

 

I’m scared of writing!

Writing is like any skill – it takes practice and discipline, and you’re probably not going to be amazing at it straight away. For the vast majority of people, being scared of writing is actually just the fear of sounding stupid – perhaps you don’t think your opinion on something counts, or that there are other, smarter people talking about the same sort of things that you’re working on.

The truth is that you will only ever get better at anything by doing it. If a single person benefits from reading about your experiences, you’ve changed the community for the better. There will always be somebody smarter or more knowledgeable than you. If they disagree with what you say and respond to what you write, you learn how to be better – if they don’t, you help others learn.

You will get better – and the best way to get better is to write regularly. If you’re committed to improving, writing every day will help a lot, but practically, once a week is probably enough. When writing about tech, there are lots of small things worth sharing – so write about those, keep it short, and practise.

 

But I can’t write! How do I get started?

There are some easy tricks to get started – remember high school essay-writing? It’s a lot like that. People aren’t going to be interested in what you have to say by default, so you need to hook them and sell them with your title and first paragraph. This is more important in technical writing than other pieces because your reader is often looking for something specific, and you have to give them the impression that you’re dealing with the topics they’re looking for.

When I started high school I had a teacher who described writing essays as

“Tell them what you’re going to tell them.
 Tell them.
 Tell them what you told them.”

…this stuck with me, and it’s a useful structure to follow if you feel like you don’t know how to get started. You open with the hook, follow with the detail, and then close with the key learnings. It’s helpful to outline these sections of your piece as bullet points and then expand them into prose – rather than “just writing”, just like an essay plan in school.

Once you’ve nailed the core of your topic, read it twice and edit it down, cutting out complex or inaccessible language, explaining any jargon and removing filler words. This’ll make sure that what you’ve written is tight and to the point.

 

Telling stories

The purpose of most technical writing is to educate, but good writing informs and entertains. You need to learn how to strike this balance to make what you’re saying easy to follow and engaging. If you look at some of the most popular technical writers of the last decade, many of their most popular pieces are entertainment, with education adding substance. Jeff Atwood and Joel Spolsky both tell stories while talking about technical things. Stories help the writers establish their voice – making it more likely that readers will return and read subsequent pieces based on the “way they tell them”, regardless of topic.

With those authors, the stories that surround their technical posts are “framing devices” that help readers relate to the topic, sometimes taking the form of allegories to hook the reader. They’re used as introductions and foreshadow the technical parts of their articles, giving them context and motivation.

Contextualising with stories makes the concepts in their writing relatable to people who haven’t been exposed to the exact scenario they’re describing, but may have experienced something similar – helping the reader understand how they can apply or better understand the topic.

Framing your writing with stories helps people connect with what you’re writing, and its prevalence in the best technical writing helps offset some of the dryness that plagues most technical documentation.

 

But this is technology, it’s inherently dry and complex!

Just because technology is detailed and tightly articulated, it doesn’t mean that writing about it has to be dry. You’re not writing xmldoc/javadoc documentation, you’re writing content that people need to comprehend – and there’s talent in distilling complex topics and dealing with them plainly. It’s easy for developers to fall back to just pasting the code into a blog post and expecting someone to read it – but people are unlikely to read through it.

Technology is complex, though, and if you’re writing things that deep-dive into complex topics, you’ll likely be writing for a specific and skilled audience. Don’t make the mistake in thinking that because your audience is proficient, they won’t appreciate well-articulated guidance.

Jon Skeet’s blog is a great example of writing that edges towards “posting pages of code” while still engaging the reader with narrative. Jon posts snippets of code, building up to larger complete examples – it’s literally “textbook style”, where small pieces of information, narrated and explained, are eventually combined together to form a whole example. This is the correct way to help people work through a detailed code sample, only hitting readers with “the big wall of text” at the end, when they’re equipped to understand it.

Anyone who already understands all the text above it will happily scroll past the explanation looking for the GitHub link – letting you serve both audiences.

 

Layout and formatting

Layout and formatting are disciplines to learn, and can be easier to pick up than the softer narrative skills. Formatting and layout help keep your writing scannable, which is especially important when you’re writing for the web, where people tend to scan content before committing to reading it.

There are a few tricks you can use to help people read your work:

  • Splitting your piece into headed sections with subheadings helps readers “get the gist”
  • Lists can be useful to guide a reader through specific advice (meta!)
  • Images or illustrations help break up longer pieces and can help grab attention
  • Highlighting important phrases with bold or italics helps people skimread
  • When writing for the web, use links to good further resources where applicable

If you’re trying to build a relationship with an audience, chunking longer posts into a series can encourage repeat readers – it’s a good way to keep momentum going when you’re just starting out, but beware of starting something you don’t intend to finish, as it’ll only antagonise your audience. If you’re going to “do a series”, cut up a longer finished piece rather than writing the parts piecemeal.

In general, highlight important content, and break up large amounts of text with paragraphs, images, and sub-headings. These splits should be informed by the outline of the piece that you started with and should feel natural – you’ll get better at them over time.

 

Things to avoid

There are a handful of things that’ll put people off reading very quickly:

“The big wall of text” – characterised by a complete lack of formatting or flow. Pieces written without attention to layout put people off because they can’t be scanned easily. The longer they get without attention to form, the less likely people are to read them.

“The business domain guy” – people are looking for lessons and flavour from your experiences; they don’t want to learn your entire business domain to understand the concepts you’re trying to explain. As a guideline, if you wouldn’t care to know it about somebody else’s business, your readers won’t want to know it about yours. If you’re using examples, it’s better to genericise them than to place the burden of understanding on the reader from the start – just explain briefly how the example relates to your specific problem domain.

“The nerd-rager” – as a simple guide, don’t loudly complain about anything you don’t intend to suggest a solution to – no good comes of angry people on the internet, so make sure you’re offering productive, constructive criticism rather than raging.

“The rambler” – normally the result of not re-reading and sub-editing the piece when you’re done. Resist the urge to publish what you’ve written without cutting it down, because there will always be something you can cut.

“The giant code sample” – GitHub or BitBucket (oh okay, or SourceForge) are the places for enormous, stand-alone code samples – link to them and resist the urge to post huge code samples without narration.

Steer clear of those archetypes, and you’ll be safe.

 

Wow, it’s *just* that easy?

There’s a lot of detail here, but the most important thing to remember is to avoid analysis paralysis while hunting for the next big thing to write about – just share what you know or what you’re learning, and get started. You won’t be the best writer overnight, and you won’t stumble into your voice or a great audience without practice.

If you’re blogging, you’ll want to make sure you share what you write – at least on Twitter and Facebook – and perhaps with the more contentious audiences on Hacker News and reddit. You’ll get feedback, maybe sometimes negative, but don’t let it discourage you.

As a technical community, we need people to share what they know, and the absolute worst thing that will happen is that nobody will read what you publish – so don’t worry, everyone starts somewhere.

 

Slides:

Doing Open Source Right

March 31st, 2014

A brief history of free* and open source software…

The rise of free and open source software is reasonably well documented, with significant projects built and released from the 1970s onwards – Emacs among the earliest – followed by the first version of the GPL in 1989. The momentum gathered, pushed on by the popularity of Linux, Perl, Apache and the LAMP stack in the mid-to-late 1990s.

Free or open source software now drives a huge portion of the web, and during the late 2000s and early 2010s the popularisation of source code sharing sites like SourceForge, GitHub and BitBucket – along with the realisation that billion-dollar businesses could be built and operated on open source software – pushed more software towards being “open by default”. What was once perceived as a risk (“giving my property away for free”) started to gain traction in private business, and even traditionally open-source-hostile organisations such as Microsoft started taking pull requests and publishing their source code.

This context is important – even if you’re not the kind of person who previously would’ve ended up writing and publishing your own software, there’s an increasing chance that you now will because the organisation you work for decides open source is worth investing in.

But I’m scared, I don’t understand why we’re doing this?

As a developer, there are lots of passive benefits to “coding in the open” – it’s a great way to learn, it’s a great way to contribute back, and as an individual, it’s the only real opportunity you have to legitimately “take your work with you” from job to job. Open source software can become your professional portfolio, and as it does, getting it right is important.

As an organisation, the motivations behind adopting open source software are obvious – “hey, free software!” – but the benefit in publishing your own open source is a little less obvious. There’s a moral aspect to it – if you’re building your business on open source software, it’s perhaps the “right” thing to do to give back to the community you’re benefiting from. Giving back isn’t going to make you money – it’ll probably cost you some – but there are good reasons why publishing or contributing back to open source projects is a rational thing for your business to do.

Open source is a great way to attract talent – hiring excellent people is hard, and developers who enjoy contributing to open source software will be drawn to businesses willing to pay them to do just that. Their enthusiasm is infectious and will make your teams better. It’s a solid publicity tool to raise the profile of your organisation in the tech community, and a good way to enhance confidence in your business amongst technical people – if they can see your code, and it’s good, you’ll win supporters. And if you get external contributions back to your open source projects, that’s a welcome bonus.

If that all sounds intimidating, it’s ok – the fear of continual evaluation and scrutiny is human, especially when you consider we’re an industry of professional amateurs learning much of what we do as we go to keep up with the pace of change in the tech industry. As you increase your contributions, you get more familiar with the kind of feedback cycles open source gifts you with, and hopefully it all becomes a lot less intimidating – everyone is in it together. Reading lots of code makes you a better developer, and contributing back makes you better still. You’ll learn from experts, and maybe teach somebody else along the way.

So let’s run an open source project!

Like everything else about building great software, open sourcing software requires discipline and effort. But there are some real-world, practical tips for making your open source software successful. Remember that open source is a commitment – you can’t trivially “un-open” your software. Once it’s done, it’s done.

Don’t surprise potential users or contributors

Follow a predictable repository layout. There are some strong language-neutral conventions for open source project topology that people have grown to expect. People know to look for familiar signposts in plain text or markdown: README, LICENSE and CONTRIBUTING files are essential, and the guidance in them should always be accurate.

The README serves as the top level overview and getting started guide for your project. Compilation instructions, quick-start examples and links to any deeper documentation are essential. The CONTRIBUTING guide should give potential contributors useful information and your LICENSE file will likely be standard and people will expect to be able to see it.
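As a sketch, that conventional top-level layout might look like this – the file and directory names follow the common convention described above, but the contents here are placeholders, not real project files:

```shell
# Sketch of a conventional open source repository layout.
# Names follow convention; contents are illustrative placeholders.
mkdir -p example-project/src example-project/tests
cd example-project
printf '# example-project\n\nHow to build, test, and get started.\n' > README.md
printf 'MIT License (full licence text goes here)\n' > LICENSE
printf '# Contributing\n\nHow to run the tests and submit changes.\n' > CONTRIBUTING.md
ls -1
```

Anyone landing on the repository can then orient themselves from the root listing alone, without reading any documentation first.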

Make sure building and testing is easy

Whatever your language or platform, stick to the established conventions of that ecosystem.

People will give up on your project if building and testing is difficult or requires a lot of manual configuration. Practically all mainstream programming languages have mature package management solutions, so use them. If you’re doing Ruby, make sure you’ve got a working Rakefile; if you’re in .NET, I’d expect “F5” to build and run your project. Keeping that barrier to entry low is essential.

Where possible, leverage cloud continuous integration services to provide confidence in the current build of your software – it’ll help potential contributors know if they’re dealing with “works on my machine” problems.

Guiding contributors

The CONTRIBUTING file is your contract with potential contributors. Make sure it guides them towards running your test suite, explains how you’d prefer pull requests or code submissions to be delivered, outlines your coding conventions, and highlights key contributors.

Obviously this is a two-way street: to encourage high quality contributions, you have to keep your end of the deal. It’s common to require a failing test and a fix for any contribution – this’ll make your life easier, but if you don’t publish a decent test suite or set of unit tests, you can’t realistically expect it. A lack of tests will dissuade contributions – would you change some code without knowing what the impact could be?
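A CONTRIBUTING file covering those points might look something like this sketch – the specific rules (and the `build-and-run-tests` script name) are illustrative, not prescriptive:

```shell
# Write an illustrative CONTRIBUTING.md covering the points above:
# running the tests, discussing changes first, and what a PR should contain.
cat > CONTRIBUTING.md <<'EOF'
# Contributing

- Run the test suite before submitting: ./build-and-run-tests
- Open an issue to discuss large changes before writing code.
- Pull requests should include a failing test and the fix for it.
- Follow the existing coding conventions in the codebase.
EOF
cat CONTRIBUTING.md
```

Keeping it this short matters – a contributor should be able to read the whole contract in under a minute.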

Be responsive and communicative

You might not want all the contributions that come your way, and it’s perfectly fine as a project owner to say no to a change that isn’t relevant to the software – so make it clear what kind of changes you’re interested in. The simplest way to do that is to guide users to create an issue in your issue tracker before they start working on a code submission. This stops people spending time and effort on code that you later reject, preventing animosity between you and potential contributors.

It’s also useful to create a roadmap of issues in your issue tracker, flagging simple changes that may be suitable for first time contributors if you’re looking to encourage submissions. This is a great way to gain confidence in submissions and a clear way to communicate the direction of development.

Finally, it’s important simply to respond. Respond to issues and pull requests in a timely manner, keep a few canned Twitter searches or alerts for people struggling with your software, and if need be, use free tools like Google Groups to encourage searchable discussions that might help others later, rather than private email conversations.

Don’t be afraid of criticism

By publishing your code, you’re welcoming comments and feedback – it’s not always going to be positive, but you should do your best to steer it towards being constructive. It’s worth remembering that if your code made somebody’s job easier, or life better, it was worth publishing, even if it’s not the best code you’ve ever written.

Selecting a reasonable license

If you’re releasing source code, you must license it, even if you just want people to be able to “do whatever they want” with it. Choose a License offers excellent overviews of the most popular open source licenses, but the really short version is this:

  • If you care about users of your code contributing back and enforcing “software freedom”, choose a “copyleft” license, probably the latest version of the GPL. If you’re publishing a software library, you probably want the LGPL.
  • If you just want to put the code out in the open, and not have anyone try and use it against you in a court when they destroy their business with it, go with the MIT license.
  • If you’re worried about contributors submitting patented code and were considering the MIT license, you should probably go with the Apache license.

These are the most popular licenses, and they’ll probably cover what you’re trying to do. It’s worth noting that the GPL is a viral license, requiring software that includes GPL’d code to also be released under the GPL – it’s central to the philosophy of the FSF and the free software movement, but can be a barrier to adoption in for-profit organisations who don’t want to open source their own software.

Things to avoid

There are a few anti-patterns when it comes to sharing source code.

Using an open source repository as a “squashed, single commit mirror” defeats much of the purpose. Compressing your commits into single “Version 1.2”, “Version 1.3” commits hides the evolution of the software from people who might have a genuine interest in the changelog. It leads people to believe the software is “open source in name only”, and it’s hostile towards contributions.

Avoid pushing broken builds to the HEAD of your repository – if need be, maintain a development and a master branch, with only good, clean, releasable code going into master. This is just good practice anyway, but when people you don’t know could be building on your codebase, a broken build becomes worse than just ruining a colleague’s day.

A quick recipe for success

We’ve talked about a broad range of topics here that will help you run an open source project responsibly, and why you’d want to do that – but let’s nail down a specific pattern for running your first open source project using GitHub.

  • Sign up for a GitHub personal or company account (free for open source)
  • Select a license (Apache or GPL are sane defaults)
  • Publish your code in a Git repository on GitHub
  • Publish tests with your code
  • Use GitHub issues to construct a roadmap of future features
  • Tag some future features as “trivial” and suitable for new contributors
  • Include a contributing.md file that asks for a test and fix in a pull request
  • Discourage people sending pull requests of refactors or rewrites without prior discussion
  • Include obvious scripts in the root of your repository called things like “build-and-run-tests” to give people the confidence to contribute
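The “build-and-run-tests” script from that recipe can be as simple as this sketch – the echoed steps stand in for whatever restore, build, and test commands your ecosystem actually uses:

```shell
#!/bin/sh
# Sketch of a build-and-run-tests script: the single command a new
# contributor (or a CI service) runs to verify the project.
# Each echo stands in for a real command in your ecosystem.
set -e                        # fail fast on the first error

echo "Restoring packages..."  # e.g. bundle install / nuget restore
echo "Building..."            # e.g. rake build / msbuild
echo "Running tests..."       # e.g. rake test / nunit-console
status="all tests passed"
echo "$status"
```

Because it is one command with no arguments, it doubles as the entry point for a cloud CI service, so the build badge and the contributor experience stay in sync.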
 
*Footnote: Free Software vs. Open Source Software

There has long been contention between the concepts of “free software” and “open source software” – all free software qualifies as open source, but not all open source software is “free as in freedom” – though I’m going to avoid the distinction here. If you’re interested in the discussion, this summary on Wikipedia is a good place to start, along with GNU’s article on “Why open source misses the point”.

If you’re not familiar with the distinction, the “free” in “free software” means free as in “liberated, independent, without restrictions”, though many mistake it to mean “costs no money”. This is often explained as “free as in speech, not free as in beer”, which I’ve never found an especially informative one-liner.