26 août 2021
Posted by Kateryna Semenova, Android Developer Relations Engineer
Duolingo’s app began to experience growing pains due to scalability issues in their Android software architecture. They were able to solve these performance problems and regain developer productivity, by refactoring to a Model-View-ViewModel architecture and using Android Jetpack’s Dagger and Hilt for dependency injection. To learn more about how this impacted their business, read the accompanying article here.
Duolingo is the world’s most popular language learning app, with over ten million daily learners, because they’ve managed to make something people found daunting feel easy and fun. This continued success relies on a constant stream of innovations and updates — and a smooth-running app that can deliver all of them. To Duolingo, a single unresponsive app in a device anywhere in the world could mean a learner potentially discouraged. This commits them to app excellence, particularly on the Android devices used by sixty percent of their learners, including their CEO, who keeps track of the app from an entry-level phone. And so, when Duolingo's Android development team registered an increase in “App not Responding” errors, dropped frames — and even received a hand-written complaint — they took action immediately.
Their situation wasn’t that uncommon. Apps that lack scalable architecture and clear best practices often perform well at the beginning but show signs of technical debt as they grow. Duolingo’s Android codebase was designed to allow them to add and release new features rapidly, but the lack of an agreed-upon architecture was manifesting in increasingly frequent performance regressions. It was starting to suffer from unreliable frame rates, visually inconsistent or broken interactions, and a growing assortment of new bugs. These regressions not only inconvenienced learners but also cost the team substantial development effort to diagnose and repair. Duolingo’s Android development team realized that if they wanted to keep shipping new features while providing the target level of user experience, a new approach to their codebase was needed.
Discovery
First, they had to get to the bottom of what exactly was going on. A deep dive into the numbers uncovered that, as they added new functionality, the app’s rendering performance was regressing 5-10% every month. In fact, one particularly unwieldy release had increased crashes by 10%, slowed frame renders by 25%, and saw lessons starting 70% slower on entry-level devices.
Further analysis of their code led them to the conclusion that most of the app’s issues could be traced back to a single bottleneck: a global state object called DuoState, which was responsible for maintaining state across different features of the app. A number of popular features (like experience points and daily streak tracking) were using it to access vital information. Centralizing their data in this way had once enabled the team to iterate rapidly. They simply added properties to DuoState whenever a new feature needed to share information across the app. But now the unoptimized and frequent access to the object was causing increasing performance regressions.
DuoState was so tightly coupled to the entire codebase that even small changes could impact the rest of the app. The team feared that a minor new feature could have the unforeseen side effect of triggering many internal updates to the app, causing the entire release to be too slow for many devices. These performance regressions became more frequent as the app grew, and the team onboarded new engineers to keep up with the accelerating product roadmap. In 2020, as they added more developers, they were starting to see significant regressions cropping up as often as every 90 days. Upon closer inspection, the likelihood of a regression in a given release was proportional to the number of changes it implemented. At this rate, these regressions would completely derail the product roadmap within a few years.
This outdated architecture had become a bottleneck for both the performance of the app and the velocity of the team. After much internal debate, they stopped development of new features, including some closely tied to their bottom line. For two full months, Duolingo’s development team went all-in on refactoring their Android app in an effort they called the “Android Reboot”.
One of the team’s first key takeaways was that their code lacked clear boundaries. The DuoState object was readily available at any point in the code, inviting developers to access it frequently in inefficient ways. They needed to create a greater separation of concerns within the codebase. They decided to tease apart each feature into its own, clearly-defined module, using the Model-View-ViewModel architectural pattern. MVVM allowed them to remove calls to the monolithic DuoState object, letting many modules work in separate threads.
The team’s familiarity with MVVM, and Google’s support for it, made it an obvious choice. It allowed them to clearly document what logic should go into what files (including views, view models and repositories). This helped make their feature architecture more consistent. With a clear path to follow, the team quickly began refactoring their monolithic code into sets of classes with clear boundaries and responsibilities.
Along with MVVM, the team used Dagger and Hilt (also included in Android Jetpack) to implement repository patterns to replace DuoState. Dagger generates clear readable code that provides verbose error logging designed to help developers understand exactly what their code is doing, eliminating dead stack traces to reflected properties; and Hilt reduces the amount of boilerplate code needed to write for this.
This new architecture allowed the team to split DuoState into smaller objects. This immediately reduced unnecessary coupling between domains. For example, the code responsible for tracking a user’s progress could now access their experience points but not the number of times they’ve logged in during a month. These new architecture guidelines meant that while no single thing was too difficult to change, it took coordination and planning to change it across the app. Implementing the new architecture across the code base drove significant performance gains in aggregate.
MVVM architecture facilitates a separation of concerns between the domain data, the interface the learners see, and the logic for how these two realms interact. It gave Duolingo’s developers a more deliberate way to control how the app responds to internal state updates. They could now develop more complex user experiences without the risk of triggering regressions, or affecting the underlying business rules.
In the past, inconsistent application of development patterns made different parts of the codebase harder to understand and maintain. Without consensus, each developer implemented code as they saw fit.
MVVM, Dagger, and Hilt, provided the team with a more detailed understanding of how new features should be implemented. Following these best practices made the code easier and more predictable. Developers could now assist in debugging features that they hadn’t originally worked on. And new developers could be onboarded more efficiently; as long as they understood the architecture, they could contribute meaningfully right away. This new clarity significantly boosted the team’s development velocity.
Crucially, the new architecture also revealed that certain animation features in the app were underperforming on entry-level devices. Accordingly, the other core focus of the Android Reboot was the reduction of jank, dropped frames, and "App Not Responding” (ANR) errors. The team used repository patterns to help streamline the sharing of data between threads. These patterns ensured that they could more efficiently use device resources with multiple threaded modules. Moving work off the main thread improved responsiveness, overall frame rate, and led to smoother animations on entry-level devices. Performance on flagship devices improved as well.
In the six months working with the new architecture, Duolingo’s Android team has continued to ship new features without recording significant performance regressions. The days where they had to halt feature production to hunt and fix bugs are safely behind them.
The app’s daily ANR rate dropped 41%. The percentage of time that the app’s frame rate fell below target decreased by 28%. And importantly, users experienced a 40% increase in speed when scrolling through lessons, the leaderboard, and stories in the app.
The reboot allowed Duolingo to consistently provide their trademark fun, effective, and delightful language learning experience on a much wider range of Android devices.
Duolingo’s dedication to their mission made them the world's top app in the language learning space. Their commitment to app excellence — creating cutting edge educational experiences without compromising accessibility — is what kept them there.
If you’re interested in getting your team on board for your own Android Reboot, check out our condensed case study for product owners and executives linked here.