Building well-balanced data science teams and keeping a relentless focus on creating data-based products for customers are the critical elements of any successful data science programme.
Sam Taylor, head of data science at Trainline, gives this advice to peers while reflecting on the work of the team he has built over the past four years. He also places great emphasis on having a core data science team made up of people from different academic disciplines.
“Our main goal has been to use Trainline’s data to build great data products for our customers,” he says. “We spend a lot of time working with designers who draw on user research and listen to our customers, understanding what they need.”
One example of those data products is SplitSave, which uses split ticketing to make it easier for passengers to find the cheapest fare for a given journey. Trainline claims split ticketing could save UK train travellers up to £340m in 2020, compared with the cost of direct train ticket searches made via the Trainline app in October and November 2019. Or at least it would have done, before the Covid-19 coronavirus crisis reduced train travel to the bare minimum compatible with public health.
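Split ticketing means covering one journey with several tickets when their combined price is lower than the through fare. The sketch below illustrates the general idea only and is not Trainline's SplitSave implementation, which the company does not describe here. It assumes a fixed route, and the station names, fares and the function name `cheapest_split` are all hypothetical.

```python
from typing import Dict, List, Tuple

Leg = Tuple[str, str]


def cheapest_split(stations: List[str], fares: Dict[Leg, float]) -> Tuple[float, List[Leg]]:
    """Find the cheapest set of tickets covering the route from the first
    station to the last, allowing splits at any intermediate station.

    `stations` is the ordered list of calling points (hypothetical data).
    `fares` maps (origin, destination) to a ticket price; missing pairs
    mean no ticket is sold for that leg.
    """
    n = len(stations)
    best = [float("inf")] * n   # best[j]: cheapest known cost to reach station j
    prev = [-1] * n             # prev[j]: where the last ticket into j starts
    best[0] = 0.0

    # Dynamic programme over the route: try every earlier station as the
    # start of the final ticket into station j.
    for j in range(1, n):
        for i in range(j):
            fare = fares.get((stations[i], stations[j]))
            if fare is not None and best[i] + fare < best[j]:
                best[j] = best[i] + fare
                prev[j] = i

    if best[-1] == float("inf"):
        raise ValueError("no combination of tickets covers the route")

    # Walk back through prev to recover the tickets in travel order.
    tickets: List[Leg] = []
    j = n - 1
    while j > 0:
        i = prev[j]
        tickets.append((stations[i], stations[j]))
        j = i
    tickets.reverse()
    return best[-1], tickets


# Made-up fares: the through ticket costs more than the two legs combined.
route = ["A", "B", "C"]
prices = {("A", "C"): 50.0, ("A", "B"): 20.0, ("B", "C"): 15.0}
total, tickets = cheapest_split(route, prices)
```

With these invented prices, the two-ticket split beats the through fare. A real system would also have to handle ticket types, time restrictions and alternative routes, all of which this sketch ignores.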
The company has around 50 people in its data team, including a smaller group of hard-core data scientists who hold PhDs in physics, mathematics, bioinformatics, computer science and similar disciplines.
Taylor himself studied computer science at Aston University, followed by a master’s in machine learning, data mining and high-performance computing at the University of Bristol. His career includes a stint at BlackBerry, where, he says: “I realised that I’ve always had a real curiosity about how things are working. At BlackBerry, we were trying to figure out how the 3G protocol stack works and trying to predict causes of calls dropping, and data science really took my fancy from there.”
“I think that the key thing in any data science and data team is to be solving really challenging problems,” he says. “And you need people with a variety of backgrounds for that, both academically and culturally. It’s very hard to solve complex problems if you are always going in from the same angle. For example, people in physics will tackle a problem very differently to people from bioinformatics and vice versa. So, you do need that variety.”
Data science is also, he says, a team sport, and so questing for “unicorn data scientists who can do everything” does not make sense.
Nor is he a fan of the isolated data science labs approach that may work for the bigger technology and media companies, such as Google or Netflix.
“I have seen these bigger companies have research labs,” says Taylor. “At a company of Trainline’s size, we see the value that comes out of data science for our customers, in terms of ease of use, saving money, as well as the booking and travel experience. What we’ve built and how we work works very well for Trainline and probably would for a lot of other companies in our [ecommerce] space.
“I think the key thing, in this regard, is to really understand our customers because we work closely with the teams that have listened to them – that is to say, the product and user research teams. And we are very close to the engineering teams to help us to build and iterate on our data products so we can deliver ever increasing value to customers.”
When recruiting, Taylor says the main thing he looks for is curiosity, specifically “are they curious about understanding why things work?” And that is demonstrated by the questions they ask. “A lot of data science is about asking the right questions, and then diving into the data and figuring out the answer to those questions,” he says.
Domain knowledge, or at least an initial interest, is also important, he says. “The other good sign is that they want to know why things are happening in the remit of the Trainline, and how trains work.”
The data science team itself, he emphasises, is first and foremost focused on solving problems for the company’s customers. “There is nothing better than releasing a feature then seeing a commuter using that,” he says. But he is still leading a team that is fascinated by data problems, and one way of keeping its members interested is the space given to “journal clubs”, where they discuss the latest papers in the data science field.
“There is a cross-pollination of people reading each other’s work and really understanding and progressing. That’s one of the quirks: data scientists want to understand the new techniques and what other people are doing. So, there’s an element of how you can allow that while also delivering great value. You have to find that balance between learning, which is an essential part of data science, and delivering for the business.”
Python is the team’s main language. They use Spark for big data processing and Kafka for real-time data streaming, says Taylor. “We use AWS heavily. All our infrastructure is in the cloud, and so we rely on a lot of the support that they give us.”
It is also important, he says, to engage with the broader data science community, which in the first instance means the community in London, though the company also has offices in Edinburgh and Paris.
“There is a community of people wanting to share what they’re working on and their ideas and how they approach problems,” he says. “That is super engaging. We often host meetups at Trainline showcasing what we have worked on or how we approach problems and also invite others to present.”
The current public health crisis due to Covid-19 has put a dent in train travel in the UK. Once this is over, people will probably flock back to the trains. “Ultimately, we want to help move people to more sustainable forms of travel,” says Taylor. “So, travelling on the train, rather than by air or by car. So, whatever I can do to make train travel better and cheaper and easier is something that I’m really looking forward to in the future.”