Four Pitfalls of Agile Work in Data Science Teams (And How You Can Avoid Them)
- By Natalia Grinberg
- Agile, Data Science, Project management
Background
The Agile Manifesto was written in 2001 by 17 software engineers who wanted to give the typical software development projects of their day an overhaul: instead of detailed planning for tasks years in the future, unrealistic timelines and a plethora of unnecessary documentation, they advocated for delivering software that works and meets the customer’s needs in a timely fashion.
Over two decades later, agile methodologies like Scrum have become the de-facto standard not only in software development but also in adjacent fields like, for instance, data science.
Anyone who has practiced agile methods in both software and data science will have noticed that their effects on data science products and projects can differ quite a bit from those on software teams. There are a number of common pitfalls that can make it difficult to deliver the desired value to stakeholders and clients and can cause plenty of frustration for everyone involved in the meantime. In this blog post we will explore four common patterns and some ways to keep them from raining on your agile data science parade, regardless of your role in the agile team.
Big tasks are hard to break down and estimate
Due to their explorative nature, the early phases of data science projects are particularly difficult to estimate!
The early phases of any data science endeavor are especially exploratory in nature. Whether it is the actual data exploration phase or ridding the datasets of outliers and other impurities: more often than not you just cannot tell how long it will take. In the same vein, it can be equally challenging to break down a big task into smaller increments, since you just cannot anticipate what you will end up dealing with. This is typically less of an issue in software development, and agile frameworks do not offer straightforward remedies for it. One way to deal with these issues is to let go of some “comme il faut” prescriptions on how tickets are to be written and estimated, regardless of how useful and well-intentioned they can be in software development teams.
DO:
Give it your best shot at making the stories and tasks as granular as you can and let “good enough” be good enough. Define crystal clear acceptance criteria for each increment and trust whoever will work on the task to figure out their way to reach the stated goal. If it helps to have the data scientists define subtasks as they progress with their task, in order to increase transparency and structure, go for it.
DON'T:
Let a task become a never-ending story (pun only semi-intended) without a clear outcome in sight. Just because the path to a goal is not completely charted does not mean that time and resources cannot be managed. If the goal is clearly specified, effective and frequent communication within the team will show whether meaningful progress is being made. Timeboxing certain tasks (e.g. “let’s give this idea a try for two days maximum to see if it works”) is also a helpful way to be prudent about precious development time while not neglecting experimentation.
Dependencies on other teams slow down progress
Data science teams are especially prone to depending on other teams to deliver necessary inputs or feedback. The data might be managed by a dedicated data warehouse (DWH) team, the cloud infrastructure might be in the hands of internal IT, and business stakeholders might be scattered all over the place. Maybe you work with a client that struggles to provide information in a timely fashion. The impediments that result from these circumstances are a staple complaint in agile retrospectives. Aside from re-designing the entire organization to serve the needs of the data science unit, what could be done to remedy this pain point?
Data science teams are particularly susceptible to being dependent on other teams
DO:
Clarify within the team what you need from other people or departments on a regular basis and propose a low-effort way for the other party to provide these items. Effective communication with the right people can absolutely make or break a data science project or data product development. This holds equally true for client projects and any initiatives within a company.
DON'T:
Host pity parties in dailies and retrospectives without taking ownership of the success of the endeavor and proposing ways to make it happen despite organizational deficits. This is where empowered agile coaches can make a big difference to the team’s impact on their company or client.
Communication and business objectives are misaligned with the data science team
Data science teams (much like software development teams) at times speak quite a different language than their stakeholders or clients. It is not uncommon to measure the contribution of a product owner by how effectively they are able to bridge this gap. Nevertheless, everyone in an agile team can and should contribute to alignment with business objectives and to efficient communication with stakeholders and clients.
Each member of the data science team should know his/her contribution to the company's goals
DO:
Be very clear on what your stakeholders or clients are trying to achieve and how the data science team contributes to those goals. Make sure that the reviews and other meetings enable the stakeholders to understand what progress is being made and what benefit is being provided to the company or the client. If your stakeholders would benefit from a crash course on data science terminology and concepts, do not hesitate to offer such training to them because it is bound to improve communication going forward. In the end, the data science team benefits the most from their stakeholders’ participation and suggestions.
DON'T:
Invite stakeholders not familiar with the details of data science to the review and proceed to discuss highly technical concepts because they happen to be interesting to a hypothetical data science audience. The reviews are a valuable occasion to elicit stakeholder feedback and help to improve the outcomes for the business. Technical discussions ought to take place in a more appropriate setting if the stakeholders are not equipped to meaningfully contribute to them.
Adopting an agile mindset proves challenging
Data science as a discipline has been booming for just over a decade, and specialized courses have become increasingly common over the years. Nonetheless, the majority of data scientists are still recruited from STEM, economics and other quantitative fields taught in higher education. Such an academic background brings with it a set of values and attitudes that, for example, emphasize thoroughness and individual performance over experimentation and teamwork. Getting comfortable with the values and expectations of agile work can be demanding and counterintuitive to many individuals. It certainly takes time and a healthy dose of courage.
It takes courage to embrace the agile way of working
DO:
Be very clear on the team’s expectations regarding values and cultural norms. Give constructive feedback and discuss openly what is working and what is not.
DON'T:
Fall prey to stereotypes about certain professions or academic backgrounds. People learn and adapt to their environment more than we sometimes think is possible. The culture of the group and the incentive structures people find themselves in shape how they interact with their environments. Thus it is an exercise in leadership to shape said culture and incentives to produce the desired results.
Conclusion
Even though the agile methodology was not necessarily designed with data science in mind, with a few tweaks it can be a natural fit. Communication and a growth mindset are every bit as important to the success of a data science product or project as the code and the math involved.
This list is far from exhaustive but attempts to shed some light on very common patterns that baffle plenty of product owners, agile coaches and data scientists. With artificial intelligence gaining relevance these days, new phenomena are bound to emerge and might warrant a sequel to this article in the future.
Natalia Grinberg
Lead Data Scientist