Humility in Tech

There have been a lot of amazing attempts recently to help the people of Ukraine, who are suffering on a horrific scale right now. It makes sense that when people want to help, they will draw on their skills and knowledge as best they can. As someone who likes to read into (and generally agrees with) the ideas of effective altruism, I do think this is a positive thing to explore. If you have highly sought-after commercial skills that attract high salaries and general support and praise from the wider community, the desire to use those skills can be particularly strong.

Where we can run into an issue, though, is when the desire to help using our specific set of skills leads us to commit the cardinal sin of thinking that helping is straightforward, easy and something we can do on our own. No surprise here, but I am particularly thinking about those working in technology, specifically software and its related disciplines.

I’m going to pick on this area not just because I know it best but also because I think it is particularly susceptible to what I’m talking about. If you are a software developer or a data scientist, then countless media articles have told you that you are working in the coolest area of endeavour and of the economy. You’ve been told constantly that your field is ‘eating the world’ and that your skills are in such high demand and low supply that companies will pay you very well to do your thing for them. I do not want to say that none of this is true: technology is one of the most important and dynamic fields out there and the jobs do indeed pay well. The thing we need to do as technologists (and supporters of tech) though, especially when faced with a horrendous humanitarian crisis that asks challenging questions of us like “how can I make a difference?” and “am I brave enough to help?”, is remain humble. What I mean by this, very explicitly, is that we should recognize that while we may have an interesting and cool skill set, this does not make us uniquely positioned to help in super special ways all the time. It also does not give us a license to ignore true, deep subject matter expertise.

A brilliant, and slightly terrifying, example has been making waves recently. You may have seen it on the news as well: a teenage Harvard student spun up a website to match people wanting to offer their homes to Ukrainian refugees with refugees looking for shelter. On the face of it, this is a very noble and effective use of this person’s technical skill set. However, in the spirit of being humble, we need to dig a little deeper and check our assumptions about how technology can help without the guidance of subject matter expertise. The big problem here is that people with subject matter expertise, who had worked in this area before, quickly picked up on the fact that this website was a security nightmare (one security expert called this site a potential 'Craigslist for paedophiles'). There was no validation of users and no real background checks in place for those offering accommodation, so it was very evidently going to be a soft target for human traffickers and other bad actors. Is the person who built the site a bad guy for trying this? I think it’s very likely his intentions were utterly benevolent, but a bit of humility might have led him down a different path. For example, if he had contacted some effective charities working to help refugees (or even just looked into the topic a bit), it would have become pretty clear that a far better alternative would have been to signpost people to these charities on his site, or even help channel donations to them somehow. This is because these charities are effectively organizational embodiments of subject matter expertise, with very knowledgeable people, networks and processes that work together on the front lines to solve these problems. You can’t replicate that with a website, not without working in partnership with these people. A bit of humility would have gone a long way here.

In fact, our next episode of AI Right will cover this, where we discuss the amazing work that the Code Your Future team are doing right now to help refugees and disadvantaged people in a way that is safe and in line with the direction of charities and subject matter experts. This episode was really inspirational to record and will be humbling in a totally different sense.

Note: the developer in question has since tried to incorporate a lot of feedback on this issue into his solution.

Interfaces and Contracts

I've been thinking a lot about interfaces and contracts recently, specifically when it comes to data solutions. I think this is such an important topic, and seeing it forgotten about is an issue I keep coming up against every day.

All I mean here is that if you are building a piece of software and it is not a self-contained monolith (which is generally not a great idea - maybe I'll cover that in another post) then you are likely building components that have to talk to each other somehow. Unless you're doing it by magic, in which case, wrong blog!

So if your components have to talk to each other, what language are they speaking? Could be anything from a JSON body in a REST API call, to just using an agreed-upon schema or a dump of data on a schedule, to subscribing to a topic in a pub/sub mechanism ... The key thing is that even asking this question puts you ahead of the game, as people can leave this too late or not really appreciate its importance.

On top of the language you're speaking, you also have to, stretching the metaphor here, agree on the etiquette you will use in speaking to one another. This to me is represented by the contract between the components. For example, I may be using a REST API call to transfer data, but I have to do this in a specific way, usually defined in the documentation of the system I am calling. This should specify the key-value pairs I can provide, including which are mandatory and which are optional, and clearly state the form of the return I will get in different scenarios. Having this contract in place means that new users of the API can build resilient mechanisms for calling the API and processing the response. You have to do something similar no matter the transfer mechanism, but you really have to do it. When you don't do this, or more likely when you leave it too late in a project, you run the risk of building something that breaks easily, not building on time or not successfully building anything at all.
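To make this a bit more concrete, here is a minimal sketch in Python of what checking a request against a contract could look like. Everything specific in it is made up for illustration: the field names (customer_id, amount, currency), the helper names and the shape of the response are assumptions, not taken from any real API. A real contract would come from the documentation of the system you are calling.

```python
from typing import Any, Dict, List

# Hypothetical contract for a request body: field name -> expected type.
MANDATORY_FIELDS = {"customer_id": str, "amount": float}
OPTIONAL_FIELDS = {"currency": str}


def validate_request(body: Dict[str, Any]) -> List[str]:
    """Return a list of contract violations; an empty list means the body is valid."""
    errors = []
    # Mandatory fields must be present and of the agreed type.
    for name, expected in MANDATORY_FIELDS.items():
        if name not in body:
            errors.append(f"missing mandatory field: {name}")
        elif not isinstance(body[name], expected):
            errors.append(f"field {name} should be {expected.__name__}")
    # Optional fields may be absent, but must have the agreed type if present.
    for name, expected in OPTIONAL_FIELDS.items():
        if name in body and not isinstance(body[name], expected):
            errors.append(f"field {name} should be {expected.__name__}")
    # Anything outside the contract is rejected rather than silently ignored.
    allowed = MANDATORY_FIELDS.keys() | OPTIONAL_FIELDS.keys()
    for name in body:
        if name not in allowed:
            errors.append(f"unexpected field: {name}")
    return errors


def handle_request(body: Dict[str, Any]) -> Dict[str, Any]:
    """Always answer in one of two agreed shapes, so callers know what to expect."""
    errors = validate_request(body)
    if errors:
        return {"status": "error", "errors": errors}
    return {"status": "ok", "data": {"customer_id": body["customer_id"]}}
```

The point is not this particular code, but that both the inputs (mandatory versus optional fields) and the outputs (what a success and a failure look like) are written down and checked in one place, so a caller can build against them with confidence.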

So, do me a favour. When you are building your data solution, make sure to consider the interfaces and contracts in the solution. If you don't, I'll be annoyed and I'll think you've wasted someone's time.

Machine Learning Engineering with Python - available for pre-order!

Machine Learning Engineering with Python: Manage the production life cycle of machine learning models using standard processes and designs, by Andrew McMahon

After many months of writing, developing, designing and researching after work and at weekends, my book is now available for pre-order on Amazon!

I'm really excited as I've always wanted to write books and I had set myself a goal of writing at least one book by the time I was 30. So as long as the book is fully available for order in October as planned I would say challenge completed (my birthday is at the end of November).

Why did I write this book?

I have been working for a few years in a variety of roles in data science and machine learning (ML), and my career has now gone strongly in the direction of focussing on productionization, a totally made-up word that we all now use to mean 'taking ML proof-of-concepts and making them into working software solutions'. I've found this to be the hardest problem in industrial data science and machine learning, or at least the hardest problem that comes up often enough to justify focussing on it.

So given this focus, and how much I know teams can struggle to understand some of the things they need to go from a cool idea or draft model through to a working solution, I decided to write this book. It's by no means perfect, but I hope it's a good collection of some of the ideas, tools and techniques that I think are most important when it comes to ML engineering.

What's in it?

The book consists of 8 chapters:

  1. Introduction to ML Engineering
  2. The Machine Learning Development Process
  3. From Model to Model Factory
  4. Packaging Up
  5. Deployment Patterns and Tools
  6. Scaling Up
  7. Building an Example ML Microservice
  8. Building an Extract Transform Machine Learning Use Case

The book kicks off with more strategic and process-directed thinking. This is where I talk about what I think ML engineering means and what is so different about building ML solutions vs traditional programming.

We then move onto learning about how to create models again and again by building training services and then how to monitor models for important changes like concept or data drift. I then discuss strategies for triggering retraining of your models and how this all ties together.

Moving on, there's more of an emphasis on some important foundational pieces, like how to create good Python packages that wrap your ML functionality and what architecture patterns you can build against.

The next piece of the book is a deep dive on a specific topic: mechanisms for scaling up your solution, with a particular focus on Apache Spark and serverless infrastructure in the cloud.

Finally, the book finishes with 2 chapters on worked examples that bring together a lot of what's been discussed earlier in the book, with a particular focus on how to make the relevant choices to be successful when executing a real-life ML engineering project.

What's next?

First of all, I'm just super excited this is real. As I mentioned at the top of the article this has been a dream of mine for a long time and I think the topics are important ones to discuss. Hopefully the book can help people in data science, software development, machine learning and analytics roles be successful. That would make me happy too.

In terms of what's next I'm thinking that it would also be beneficial to expand the topics of the book into an online course. That way, people who would like the material structured in a slightly different way can also get the benefit. It would also allow me to expand on some of the material a bit more, giving a more conversational flavour to the material. I like the sound of that, hopefully you do too!

All in all, I just hope people enjoy the book and get benefit from it. I benefited immensely from technical books in this space when I was starting out (and I still do) so I'm really glad I can make my own small contribution to that body of learning.