Boss n' Data Podcast Appearance - 28th Oct 2022
Book Review: Comet for Data Science
Another fun book review!
Shifa Ansari and Packt were kind enough to send me a copy of Angelica Lo Duca's 'Comet for Data Science' recently. I've been making my way through it and deep diving into the bits I'm particularly keen on. I think Comet is a great tool and a nice piece of the MLOps stack.
- Lots of detail, really helpful in reading through the examples
- Has some nice touches like the treatment of feature engineering steps in pipelines as their own 'models' you should version and track (I'm stealing this!).
- Chapters 1-5 are really great for classic data science workflows, including covering how Comet can help with those presentations on model performance. Chapter 5 is titled 'Building a narrative in Comet', which is really helpful.
- Chapters 6 and 7 have some really good sections on DevOps, MLOps and how you can use Comet with CI/CD processes in GitLab (I am also stealing this!). Also an introduction to Kubernetes which was great to see. To be honest these were my favourite chapters, lots of great stuff in here. Very good chapters for ML and MLOps engineers.
- Chapters 8-11 have nice worked-through examples to bring things to life, including examples with NLP and deep learning models.
Some other points
The way the chapters are split out (as I described above) does mean that if you are not a pure data scientist, you may not get much from the first parts of the book. Not a bad thing, just something to be aware of. Taken together, I thought it was an extremely useful book to have on the shelf.
Original post here:
MLOps Live write up of episode
Neptune.AI did a cool write up of our episode from a while back, which you can see below.
Driven By Data Podcast Appearance - 10th May 2022
Humility in Tech
There have been a lot of amazing attempts recently to help the people of Ukraine, who are obviously suffering on a horrific scale right now. And it makes sense that when people want to help they will draw on their skills and knowledge as best they can. As someone who likes to read into (and generally agrees with) the ideas of effective altruism, I do think this is a positive thing to explore. If you have highly sought-after commercial skills that often attract high salaries and general support and praise from the wider community, the desire to use those skills can be particularly strong.
Where we can run into an issue, though, is when the desire to help using our specific set of skills leads us to commit the cardinal sin of thinking that helping is straightforward, easy and something we can do on our own. No surprise here, but I am particularly thinking about those working in technology, specifically software and its related disciplines.
I’m going to pick on this area not just because I know it best but also because I think it is particularly susceptible to what I’m talking about. If you are a software developer or a data scientist, then you have been told by countless media articles that you are working in the coolest areas of endeavour and the economy. You’ve been told constantly how your field is ‘eating the world’ and how your skills are in such high demand and low supply that companies will pay you very well to do your thing for them. I do not want to say that none of this is true: technology is one of the most important and dynamic fields out there and the jobs do indeed pay well. The thing we need to do as technologists (and supporters of tech) though, especially when faced with a horrendous humanitarian crisis that asks challenging questions of us like “how can I make a difference?” and “am I brave enough to help?”, is remain humble. What I mean by this, very explicitly, is that we should recognize that while we may have an interesting and cool skill set, this does not make us uniquely positioned to help in super special ways all the time. It also does not give us a license to ignore true, deep subject matter expertise.
A brilliant, and slightly terrifying, example has been making waves recently. You may have seen it on the news as well: essentially, a teenage Harvard student spun up a website to help match those wanting to offer their homes to Ukrainian refugees with refugees looking for shelter. On the face of it, this is a very noble and effective use of this person’s technical skill set. However, in the spirit of being humble, we need to dig a little deeper and check our assumptions about how technology can help without the guidance of subject matter expertise. The big problem here is that people with subject matter expertise, who had worked in this area before, quickly picked up on the fact that this website was a security nightmare (see here, where one security expert called this site a potential 'Craigslist for paedophiles'). There was no validation of users, there were no real background checks in place for those offering accommodation, and it was very evidently going to be a soft target for human traffickers and other bad actors. Is the person who built the site a bad guy for trying this? I think it’s very likely his intentions were utterly benevolent, but a bit of humility might have led him down a different path. For example, if he had contacted some effective charities working to help refugees (or even just looked into the topic a bit) it would have become pretty clear that a far better alternative would have been to signpost people to these charities on his site, or even help channel donations to them somehow. This is because these charities are effectively organizational embodiments of subject matter expertise, with very knowledgeable people, networks and processes that work together on the front lines to solve these problems. You can’t replicate that with a website, not without working in partnership with these people. A bit of humility would have gone a long way here.
In fact, our next episode of AI Right will cover this, where we discuss the amazing work that the Code Your Future team are doing right now to help refugees and disadvantaged people in a way that is safe and in line with the direction of charities and subject matter experts. This episode was really inspirational to record and will be humbling in a totally different sense.
Note: the developer in question has since tried to incorporate a lot of feedback on this issue into his solution.
Nice feedback on the book!
I've been really delighted to see such positive feedback on the book. When I got the thing published I was just glad it was going to be out there in the world and was going to act as a good go-to for my usual thought processes on building ML solutions. I didn't ever anticipate it would get such good reviews and endorsements!
Some of my favourites are given below:
"ML Ops is one of the hottest topics in analytics and data science at the moment. This book does a great job of providing a useful overview of the subject and really practical examples of how to do it in anger. If you are a data scientist or want to be one you should buy this book and read it before your next interview or big team meeting." -- Zachary Anderson, Chief Data and Analytics Officer, NatWest Group
"If you want to know how to take ML from applying the theory to actually doing it for real this book provides an excellent roadmap. For industry practitioners, this is going to become a required text and for academics like myself, this provides a detailed overview of the extra bits that we need to get students to consider in our courses. I cannot recommend this book highly enough!" -- Gordon Morrison, Professor of Data Science and Head of Computing Department at Glasgow Caledonian University
If these reviews make you feel like you might want to check out the book, then please do at the following sites!
Data Science and Technology Meetup Jan 2022 - Video Live!
The talk I gave at the Scotland Data Science and Technology Meetup is now up and available to view on YouTube. Previews of the MBN page for the talk and for the LinkedIn post about the talk going up are given below (really just so that I can play around with iframes :-D).
Interfaces and Contracts
I've been thinking a lot about interfaces and contracts recently, specifically when it comes to data solutions. I think this is such an important topic, and people forgetting about it is an issue I keep coming up against every day.
All I mean here is that if you are building a piece of software and it is not a self-contained monolith (which is generally not a great idea - maybe I'll cover that in another post) then you are likely building components that have to talk to each other somehow. Unless you're doing it by magic, in which case, wrong blog!
So if your components have to talk to each other, what language are they speaking? Could be anything from a JSON body in a REST API call, to just using an agreed upon schema or a dump of data on a schedule, to subscribing to a topic in a pub/sub mechanism ... The key thing is that even asking this question puts you ahead of the game, as people can leave this too late or not really appreciate its importance.
On top of the language you're speaking, you also have to, stretching the metaphor here, agree the etiquette you will use in speaking to one another. This to me is represented by the contract between the components. For example, I may be using a REST API call to transfer data, but I have to do this in a specific way, usually defined in the documentation of the system I am calling. This should specify the key-value pairs I can provide, including which are mandatory and which are optional, and clearly state the form of the return I will get in different scenarios. Having this contract in place means that new users of the API can build resilient mechanisms for calling the API and processing the response. You have to do something similar no matter the transfer mechanism, but you really have to do it. When you don't do this, or more likely when you leave it too late in a project, you are running the risk of building something that breaks easily, not building on time or not even successfully building anything.
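To make the REST example a bit more concrete, here is a minimal Python sketch of what writing a contract down could look like. Everything in it is made up for illustration: the field names (store_id, horizon_days, include_intervals), the check_request and build_response functions and the shape of the response are my assumptions, not any real API. The point is only that the mandatory keys, the optional keys and the form of the return are stated in one place that both sides can read.

```python
from dataclasses import dataclass

# Hypothetical contract for an imaginary forecasting endpoint: the keys a
# request must carry, the keys it may carry, and the type of each value.
MANDATORY_FIELDS = {"store_id": str, "horizon_days": int}
OPTIONAL_FIELDS = {"include_intervals": bool}


@dataclass(frozen=True)
class ContractResult:
    ok: bool
    errors: tuple


def check_request(body: dict) -> ContractResult:
    """Check a request body against the contract and collect every violation."""
    allowed = {**MANDATORY_FIELDS, **OPTIONAL_FIELDS}
    errors = []
    # Mandatory keys must be present.
    for key in MANDATORY_FIELDS:
        if key not in body:
            errors.append(f"missing mandatory field: {key}")
    # Every supplied key must be known and carry the agreed type.
    for key, value in body.items():
        if key not in allowed:
            errors.append(f"unexpected field: {key}")
        elif type(value) is not allowed[key]:
            # Exact type match, so that True is not accepted as an int.
            errors.append(f"wrong type for {key}: expected {allowed[key].__name__}")
    return ContractResult(ok=not errors, errors=tuple(errors))


def build_response(result: ContractResult) -> dict:
    """The agreed form of the return, for both the success and failure cases."""
    if result.ok:
        return {"status": "ok"}
    return {"status": "error", "errors": list(result.errors)}
```

A caller who knows this contract can rely on always getting a "status" key back, and on an "errors" list whenever the status is "error", which is exactly the kind of thing that lets them build a resilient client. In a real project you would more likely reach for a schema tool than hand-rolled checks, but the agreement itself is the important part.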
So, do me a favour. When you are building your data solution, make sure to consider the interfaces and contracts in the solution. If you don't, I'll be annoyed and I'll think you've wasted someone's time.