Archive for the ‘Programming’ Category

Cutting CODE!–A livestream show for programmers

Sunday, February 1st, 2015

I’ve spent some time recently thinking about and discussing the idea of live-streaming coding sessions. It started with conversations with my brother about how there isn’t really a Twitch TV for programming, and how, if there were, I’d be really into it.

In a classic case of “The Simpsons Already Did It”, a week after I floated the idea of starting a streamed pair-programming show with Rob Cooper, Scott Hanselman posted “Reality TV for Developers – Where is Twitch.tv for Programmers?”. At about the same time, a new subreddit called /r/WatchPeopleCode/ grew out of /r/programming. The timing seemed a little too good to be true, so last Sunday I did a stealth trial run and mutely live-coded two hours of hacking on a random library I’ve been spiking. It was fairly dull stuff, but about 80 people came and went over those two hours.

That’s enough of an audience for now. I love writing software, and I love pairing, so earlier today I got together with Chris Bird and we streamed our first “live-on-air code kata”. It clocks in at about two hours, and was fun to put together. Time allowing, I’m going to aim to put one or two of these together a week, ideally sticking to a ping-pong pairing format with conversation.

Here’s the YouTube recording of the pilot “Cutting CODE!” stream, where we build an image to ASCII art converter in two hours, entirely driven by tests.

JustGiving’s love affair with Nancy – A case study

Monday, October 7th, 2013

Between late 2011 and early 2012, JustGiving re-evaluated our approach to internationalisation. We shifted focus from having several systems sharing the same brand to consolidating around our first and biggest system, based out of London. As we started to consolidate our technology, we began maturing our software to support both internationalisation and multi-tenancy, having decided that “one platform to rule them all” was a more appropriate design choice than endlessly porting features between different regional installations.

As part of this process, we had to evaluate the software and frameworks we used while building new international functionality. One of the cornerstones was taking our existing, fairly simplistic payment processing facilities and enhancing them to support multiple currencies and multiple operating accounts in different regions, through different payment service providers. We knew that we couldn’t rely on our sole existing payment provider (well, and PayPal) if we were to accept and settle transactions in more than twenty currencies, and we realised that sticking with one provider was both risky from a business continuity perspective and would end up costing us over the odds as we scaled out.

At the time, our payment processing facility consisted of a Windows service and an MSMQ queue installed on each of our web nodes. Users would make donations in their browser, the data would be encrypted and stored in the database, and a message to start processing would be placed on a queue. And… well, then we’d run some SQL and verify that everything had worked. When we were processing UK transactions straight through, with no complexity, this was just about sufficient. We had debug logs, which were OK, and we could run a query or two to verify the number of transactions processed over a time window. The implementation was similarly straightforward: a bunch of threads running .NET 2.0-style MSMQ listeners that would block until a message was received, then call our payment service provider and persist the result. Our payment service was simple, but it was also simplistic.
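To make the shape of that original design concrete, here is a minimal Python sketch (the real service was .NET; Python is used only for illustration). It assumes an in-process `queue.Queue` standing in for MSMQ; `charge` is a hypothetical stand-in for the payment service provider call, and appending to a list stands in for persisting the result.

```python
import queue
import threading


def charge(donation):
    """Hypothetical stand-in for the payment service provider call."""
    return {"donation_id": donation["id"], "status": "authorised"}


def listener(messages, results, lock):
    """One worker thread: block on the queue, charge, persist, repeat."""
    while True:
        donation = messages.get()  # blocks until a message is received
        if donation is None:       # sentinel used here to stop the worker
            return
        outcome = charge(donation)
        with lock:
            results.append(outcome)  # stands in for persisting the result


def run(donations, workers=4):
    """Start the listener threads, feed them donations, then shut them down."""
    messages = queue.Queue()
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=listener, args=(messages, results, lock))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for donation in donations:
        messages.put(donation)
    for _ in threads:
        messages.put(None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    return results
```

Nothing in this loop reports what it is doing while it runs, which is exactly the visibility gap that becomes a problem once routing, retries and multiple providers enter the picture.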

Then we sat down to think about what we’d want from an international payment service. We wanted rules to route payments between more than one payment provider, we wanted semi-automatic retrying and resilience, and we wanted to support new types of payment, such as pledges, that required different kinds of processing. But most importantly, with this increased complexity, we needed the kind of visibility we’d never really had with our invisible services before.
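A minimal sketch of rule-based routing with retries and failover, in Python for illustration. This is not JustGiving’s rules engine: the `Rule` shape (one ordered provider list per currency), the `ProviderError` exception and the provider names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


class ProviderError(Exception):
    """Hypothetical error a provider raises on a transient failure."""


@dataclass
class Rule:
    currency: str
    providers: List[str]  # provider names, in order of preference


def route(payment, rules, providers, attempts_per_provider=2):
    """Charge via the first provider that succeeds, retrying each a few times."""
    rule = next((r for r in rules if r.currency == payment["currency"]), None)
    if rule is None:
        raise ValueError(f"no routing rule for {payment['currency']}")
    errors = []
    for name in rule.providers:
        for _ in range(attempts_per_provider):
            try:
                return name, providers[name](payment)
            except ProviderError as exc:
                errors.append(f"{name}: {exc}")  # keep a trail for the dashboard
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Keeping the preference order in data rather than code is what makes it possible to change routing, or take a provider out of rotation, without redeploying.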

Just as we were starting to wrangle with the fact that we needed to completely rework lots of our payment infrastructure, a framework called Nancy (or #NancyFx) started making ripples in the open source .NET community. There were a couple of frameworks at the time claiming to be .NET implementations of Sinatra (the popular Ruby web framework), and we evaluated both Nancy and a competing framework called Nina. At the time, Nina was “feature complete” (and minimalist in its feature set), while Nancy was still under very active development. There appeared to be considerable hustle behind Nancy, though, with an obvious roadmap and support, or planned support, for popular IoC containers, view engines and other useful web stuff. This middle ground between being an ultra-lightweight framework and supporting things that our development teams used and understood was compelling, especially when coupled with Nancy’s permissive hosting model: you could use it in IIS, you could host it in WCF, or it could host itself. We spiked up a quick sample app in an afternoon and immediately saw how we could iteratively introduce Nancy into our payment services as part of their rework, to give us some of the visibility we needed.

We started our “gentle” introduction of Nancy (with caution) by using its self-hosting assembly and running it side by side with our existing Windows service implementation. We hooked up Ninject and started using Nancy to produce a simple status page hosted from inside our service. As the iterations progressed and we reworked the internals of our payment services, we made more extensive use of Nancy, maturing it from a read-only status page into a fully featured dashboard and configuration portal.

As we extended our new payment processing agent to embed a rules engine, we used the dashboard to display the rules that were in play. As we added multiple payment service providers, we provided a UI for our devops guys to enable and disable each one, and as we made our error handling more robust, we provided an interactive retry queue right from the payment processing agent itself.

We started to use singleton objects, shared by both our payment transacting code and the Nancy modules, to record real-time statistics and surface them in graphs from within the application. This gave our devops guys the visibility and confidence they needed when introducing new rules, making changes, and monitoring the performance of payment providers. Payment service providers are notoriously flaky, and the statistics we were able to gather (average request times, specific errors, and interactive graphs on the dashboard home page) on several occasions allowed us to be the first client to respond to an outage, notably before the payment service providers themselves even knew about it.
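The shared-singleton idea can be sketched like this in Python (illustrative only; the class name, fields and the module-level `STATS` instance are assumptions, not the original code). Payment workers call `record` after each provider call and a dashboard handler calls `snapshot`; a lock guards both because they run on different threads.

```python
import threading
from collections import defaultdict


class ProviderStats:
    """In-memory, thread-safe statistics shared by workers and the dashboard."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = defaultdict(int)
        self._total_ms = defaultdict(float)
        self._errors = defaultdict(lambda: defaultdict(int))

    def record(self, provider, elapsed_ms, error=None):
        """Called by payment code after every request to a provider."""
        with self._lock:
            self._count[provider] += 1
            self._total_ms[provider] += elapsed_ms
            if error is not None:
                self._errors[provider][error] += 1

    def snapshot(self):
        """Called by the dashboard: a consistent copy of the current figures."""
        with self._lock:
            return {
                provider: {
                    "requests": n,
                    "avg_ms": self._total_ms[provider] / n,
                    "errors": dict(self._errors[provider]),
                }
                for provider, n in self._count.items()
            }


STATS = ProviderStats()  # module-level singleton shared by both sides
```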

Nancy was perfect as an enabler, while keeping out of the way of our regular development process. It provided us with low-friction infrastructure, support for technology we knew and used from ASP.NET MVC, and played along with our other open source components (Ninject, NHibernate, AutoMapper). The introduction was painless thanks to its myriad hosting options, and it supported a test-first workflow from the start. It just worked, and it worked well enough that we trusted it with millions of pounds’ worth of transactions each day.


We went on to use Nancy in several other high-profile internal projects on the back of our experience with it in the most sensitive area of our system: it made its way into payment settlement and reconciliation, PCI Level 1 compliance code, and deployment tools. If you’re building test-driven modern web apps, APIs or dashboards, I wouldn’t hesitate to recommend it as a solid technology choice.

Building software that’s easy to monitor and administer

Monday, September 23rd, 2013

When you’re building software that’s central to what your company does, it’s very important that you build software that’s easy to support.

This might sound like obvious advice, but at the end of the day, if something goes wrong, you’re likely to be the person called to support the live system, so you should make sure that the system you build exposes the information you’re going to need to troubleshoot it. When times are good, it’s also important for you to be able to see the status of a system that’s currently in-flight. People will ask, so you may as well arm yourself with the information you need.

Here’s some simple, practical advice for making systems monitoring-friendly.

Make sure you have logging

Again, seemingly obvious advice, but make sure you’re logging important information. Logging is a solved problem across many languages and frameworks, so don’t reinvent it. Use a Log4X library (log4net, log4j, or whatever your platform offers), and make sure the logs roll over and are easily available to everyone. There are some great services out there that support syslog searching and indexing; check out papertrailapp.com for my personal favourite.
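As an illustration in Python, whose standard `logging` module plays the Log4X role there, the sketch below builds a logger whose file rolls over once it reaches a size limit. The logger name, format and limits are assumptions chosen for the example.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(path, name="payments", max_bytes=10 * 1024 * 1024, backups=5):
    """Return a logger that writes to `path` and rolls it when it gets too big."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

If you use a hosted syslog service, `logging.handlers.SysLogHandler` can ship the same records to a remote syslog endpoint alongside the local file.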

Track and aggregate errors and exceptions

Understanding what constitutes a “normal” number of errors in your application is very important. Websites generate errors under traffic for plenty of reasons (web crawlers requesting invalid URIs, poor or malicious user input), but a single error in a payment processing system is often critical. Spend time understanding the error profile of your application, and fix the bugs that cause the “expected” errors so that “real” errors don’t get mistaken for noise. There are plenty of services out there to help you track and fold errors; I particularly like raygun.io for .NET and JavaScript projects (though their support is much wider). Watch general trends in errors over time, along with newly introduced errors, to understand how to respond to errors in your software after launch.
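A minimal Python sketch of folding errors into groups, assuming a fingerprint made of the exception type plus the file and line that raised it (hosted services like raygun.io use their own, richer grouping). `mark_baseline` records the current “expected” profile so that genuinely new error groups stand out; all names here are hypothetical.

```python
import traceback
from collections import Counter


class ErrorAggregator:
    """Counts exceptions by fingerprint and flags groups not seen before."""

    def __init__(self):
        self.counts = Counter()
        self.baseline = set()

    @staticmethod
    def fingerprint(exc):
        """Exception type plus the innermost frame that raised it."""
        frames = traceback.extract_tb(exc.__traceback__)
        where = f"{frames[-1].filename}:{frames[-1].lineno}" if frames else "unknown"
        return f"{type(exc).__name__}@{where}"

    def record(self, exc):
        key = self.fingerprint(exc)
        self.counts[key] += 1
        return key

    def mark_baseline(self):
        """Treat every group seen so far as part of the 'expected' profile."""
        self.baseline = set(self.counts)

    def new_errors(self):
        """Groups that appeared after the baseline was taken."""
        return [key for key in self.counts if key not in self.baseline]
```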

Windows software? Use the event log!

Log files are great, but some solid event log messages and a few custom performance counters in your application will make that special sysadmin in your life very happy. There are plenty of tools that can monitor these logs for messages, status codes and spikes in performance counters (including Microsoft’s own SCOM, along with lots of popular third-party tools).

Building system services? Don’t hide them!

System services are common, and just as easily forgotten. If you’re writing “invisible” software, it’s important to force it into the limelight so people don’t forget it’s there, and especially so they notice if it’s not running. As good practice, I always recommend running monitoring dashboards from inside the system service to make sure people know it’s there. I’m a big fan of embedding a web server in every system service that would otherwise be invisible, and using it to serve monitoring dashboards with the kind of statistics you’d need during troubleshooting. Your application knows what’s important to it, so measure stats in real time and expose them over HTTP: everyone knows how to use a browser, and the presence of the status page is a great way to monitor availability. If you want to do a great job, use graphing libraries and expose the data as JSON for other systems to query. Consider surfacing things like “average time to process a request”, “number of failures since launch”, “throughput” and other metrics that’ll help you when investigating live issues. If you’re working in C# / .NET, I highly recommend using NancyFx as an embedded web server in your system services.
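The recommendation above is NancyFx for .NET; as a language-neutral illustration, here is the same pattern using Python’s built-in `http.server`. The `/status` path, the JSON payload and the `STATS` dictionary that the service’s worker code would update are all assumptions. The server runs on a daemon thread, so it lives and dies with the service.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

STARTED = time.time()
STATS = {"processed": 0, "failures": 0}  # updated by the service's workers
STATS_LOCK = threading.Lock()


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/status":
            self.send_error(404)
            return
        with STATS_LOCK:
            body = dict(STATS)
        body["uptime_seconds"] = round(time.time() - STARTED, 1)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep request noise out of the service's own logs


def start_status_server(port=8080):
    """Serve /status on a background thread; bound to localhost in this sketch."""
    server = ThreadingHTTPServer(("127.0.0.1", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The same JSON endpoint doubles as the availability check: if the request fails, the service is probably down.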

Building APIs? Measure performance and use response headers to surface information

The performance of your APIs will help the apps that depend on them flourish or fail, and there’s nothing more frustrating than a poor feedback cycle as an API developer. You should measure, in memory, in real time and per node, the number of requests you’re serving, the average requests per second per method, the rate of errors, and the overall percentage of calls that fail. You should return the time taken on the server as a response header (something like “X-ServerTime”) to help callers debug any odd latency issues they encounter, and you should offer this information through the API itself, either via a reporting or status API call, or through a web dashboard. When I was working at JustGiving, we put a lot of effort into the end developer experience, publicly serving both the API docs and per-node statistics, and it saved us weeks of debugging and back-and-forth. You can check out an example of what we did here: JustGiving single node stats page. Not only did it help us diagnose problems, it also helped people coding against our APIs verify error behaviour when they experienced it.
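A minimal sketch of per-method statistics plus an `X-ServerTime` header, written as Python WSGI middleware for illustration. The header value format (milliseconds with an `ms` suffix), keying methods by HTTP verb and path, and counting only 5xx responses as errors are all assumptions.

```python
import threading
import time
from collections import defaultdict


class ApiStats:
    """Per-node, in-memory request and error counts, keyed by API method."""

    def __init__(self):
        self.lock = threading.Lock()
        self.requests = defaultdict(int)
        self.errors = defaultdict(int)
        self.started = time.monotonic()

    def record(self, method, failed):
        with self.lock:
            self.requests[method] += 1
            if failed:
                self.errors[method] += 1

    def summary(self):
        with self.lock:
            total = sum(self.requests.values())
            failed = sum(self.errors.values())
            elapsed = time.monotonic() - self.started
            return {
                "requests": dict(self.requests),
                "errors": dict(self.errors),
                "requests_per_second": total / elapsed if elapsed > 0 else 0.0,
                "error_percentage": 100.0 * failed / total if total else 0.0,
            }


def server_time_middleware(app, stats):
    """Wrap a WSGI app: time each call, record stats, add an X-ServerTime header."""
    def wrapped(environ, start_response):
        start = time.perf_counter()
        method = f"{environ['REQUEST_METHOD']} {environ.get('PATH_INFO', '/')}"

        def timed_start_response(status, headers, exc_info=None):
            # Headers go out before the body, so this times work up to this point.
            elapsed_ms = (time.perf_counter() - start) * 1000
            headers = list(headers) + [("X-ServerTime", f"{elapsed_ms:.1f}ms")]
            stats.record(method, failed=int(status.split()[0]) >= 500)
            return start_response(status, headers, exc_info)

        return app(environ, timed_start_response)
    return wrapped
```

`ApiStats.summary()` is the kind of data a status API call or dashboard could return as JSON.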

Whenever you’re building anything, remember that you, or someone you work with, is going to be the person that has to fix it if it fails. So be nice to that person.