Taming Android One Device at a Time – This is How We Do It

The Apkudo For Developers (hereafter, AFD) backend is the brains behind your app tests. Made up of hundreds of devices networked together, this distributed system allows you to analyze your app’s performance and compatibility across the industry’s most comprehensive portfolio of Android devices. This post goes over some of what makes it tick, as well as some of what ticks us off—the problems we’ve run into while setting it up.

Let’s Get Started

We’ll start things off with a brief overview. The AFD backend consists primarily of a set of servers running custom software that allows us to manage, and, more pertinently, automatically run your app on, a huge variety of Android devices.

In total, the backend comprises:

  • 1 job server
  • 1 update server
  • 27 runner servers
  • 512 registered devices, about half of which are ready for active use (more on this later)
  • 1 database server

The job, update, and runner servers form the main part of the system, and we’ll explain them in more detail in a bit. The database server stores information about the devices, such as their location on our device racks and whether they’re currently activated. The devices themselves consist of a wide range of Android phones and tablets from both US and overseas markets.

In addition to our management software, the system also uses several tools and facilities provided by the Android project itself: the Android Debug Bridge (ADB), which is built into the Android operating system and allows a device to be controlled and monitored from a host computer; logcat, which offers similar functionality to the syslog facilities available on *nix systems, and monkey, which automatically generates input events for an Android device.
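
To make that concrete, here is roughly what those tools look like when driven from a host machine. This is a minimal sketch using Python's subprocess module; the serial, package name, and event count are placeholders, not values from our actual configuration.

    import subprocess

    SERIAL = "0123456789ABCDEF"   # placeholder device serial, as reported by `adb devices`
    PACKAGE = "com.example.app"   # placeholder package name

    def adb(*args):
        """Run an adb command against one device and return its output."""
        cmd = ["adb", "-s", SERIAL] + list(args)
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    # logcat: dump the device's current log buffer, much like reading syslog
    log_dump = adb("logcat", "-d")

    # monkey: fire 500 pseudo-random input events at a single app
    monkey_output = adb("shell", "monkey", "-p", PACKAGE, "-v", "500")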

Making it All Work

We like to keep our devices as close to stock as possible; it is our policy not to root them unless some feature is absolutely required and isn't available in the stock image. This ensures that the results we give you reflect the typical user experience. The setup process for a device therefore consists mainly of recording asset information, ensuring that ADB is set up, and installing a device management app that lets us control connectivity, screen settings, and other options on the device. Once this basic setup is complete, it's time to plug the device into the AFD backend.
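
As an illustration of that intake step, the sketch below reads a few identifying properties from a freshly connected device over ADB. It is a simplified stand-in rather than our actual provisioning code; the properties are standard Android system properties, but the serial is a placeholder.

    import subprocess

    def getprop(serial, prop):
        """Read one Android system property from a device over ADB."""
        out = subprocess.run(
            ["adb", "-s", serial, "shell", "getprop", prop],
            capture_output=True, text=True,
        )
        return out.stdout.strip()

    def record_asset_info(serial):
        """Collect the basics we'd store against a device's rack location."""
        return {
            "serial": serial,
            "model": getprop(serial, "ro.product.model"),
            "manufacturer": getprop(serial, "ro.product.manufacturer"),
            "android_version": getprop(serial, "ro.build.version.release"),
        }

    print(record_asset_info("0123456789ABCDEF"))  # placeholder serial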

Each rack consists of 6 shelves, each holding up to 12 devices.

As well as the devices themselves, the backend comprises several types of servers working together. Runner servers are each in charge of managing ADB sessions for a small set of devices. This is necessary due to physical and practical limitations that restrict the number of devices that can be plugged into a single host to about twelve. Runners are controlled by the job server, which manages and tracks the execution states of test runs across the entire set of devices. The job server is, in turn, controlled by the update server, which retrieves new test run requests from the frontend and passes them on. The servers may also be controlled manually by system admins, but in practice they generally operate autonomously.
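
To give a feel for the runner's role, here is a hedged sketch of how a runner might enumerate the devices attached to it. The twelve-device cap mirrors the physical limit mentioned above; the parsing is of standard `adb devices` output rather than anything specific to our software.

    import subprocess

    MAX_DEVICES_PER_RUNNER = 12  # practical per-host limit, per the text above

    def attached_devices():
        """Return serials of devices currently visible to this runner's ADB."""
        out = subprocess.run(["adb", "devices"], capture_output=True, text=True)
        serials = []
        for line in out.stdout.splitlines()[1:]:          # skip "List of devices attached"
            parts = line.split()
            if len(parts) == 2 and parts[1] == "device":  # ignore "offline"/"unauthorized"
                serials.append(parts[0])
        return serials[:MAX_DEVICES_PER_RUNNER]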

Apkudo Backend Diagram

Due to the inherent bottlenecks presented by the devices (only one app can be tested on a device at a time), the hardware requirements of the system are rather modest. The runner servers are a fleet of mid-2011 Mac minis, which are well suited to the task due to their compactness and availability. Software-wise, they run a minimal installation of Debian Squeeze due to memory and Bluetooth conflict issues in the OS X version that shipped with them. The job, update, and database servers are more “traditional” Xeon boxes with more RAM and processing power at their disposal, also running Squeeze. The servers communicate with each other, and are controlled, using custom Python programs we developed specifically for the AFD service (more on this to come in a later post).

Once a device is hooked up to the system, it is ready to start running apps. Exercising an app consists of installing it on a device, running monkey against it, collecting performance data in the form of monkey and logcat logs, uninstalling the app, and returning the logs to the user along with a pass/fail analysis of the test run when available.
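
Stripped of error handling and the job/runner plumbing, a single exercise pass might look something like the sketch below. The timeout, event count, and paths are illustrative assumptions, not the values our production runners use.

    import subprocess

    def run_adb(serial, *args, timeout=300):
        cmd = ["adb", "-s", serial] + list(args)
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)

    def exercise_app(serial, apk_path, package, events=1500):
        """Install, monkey-test, and uninstall one app, returning its logs."""
        run_adb(serial, "install", "-r", apk_path)   # install (replace if already present)
        run_adb(serial, "logcat", "-c")              # clear the log buffer first

        # Drive the app with pseudo-random input events, confined to its package.
        monkey = run_adb(serial, "shell", "monkey", "-p", package, "-v", str(events))

        logcat = run_adb(serial, "logcat", "-d")     # dump everything logged during the run
        run_adb(serial, "uninstall", package)        # leave the device clean

        return {"monkey_log": monkey.stdout, "logcat_log": logcat.stdout}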

Keeping devices charged requires a bit of juice, which is supplied by the powered USB hubs through which they are connected to the runner servers. The USB specification allows a device to draw up to 500mA while connected to a host, which is enough to keep most phones topped off.

Biting the Hand That Defragments You

Our goal at Apkudo is to reduce fragmentation in the Android market. Somewhat ironically, we often find ourselves disadvantaged by this same fragmentation, which frequently throws a wrench into our analytics works. We’re ultimately at the mercy of device manufacturers with regard to a number of components necessary to communicate with devices, many of which we are not able to alter if they exhibit problems.

Here are a few of the types of problems that we encounter fairly often:

  • The Android operating system itself is the biggest wildcard here, or at least the hardest to deal with. OEMs may modify the Android code any way they see fit before burning it to a device, and, since we don’t make any modifications to the stock image, we must work around any issues we encounter in these customized OSes. This can make it difficult, for example, to handle crashes that leave a device responsive to ADB but unable to perform any actions requested of it. In extreme cases, an issue with a customized OS image is cause to “fail” a device (addressed below); in less severe cases, intermittently flaky software is the main reason for variations in device count and for installation failures during test runs.
  • Freezes, restarts, and force-quits can leave a device in an inconsistent state, with settings changed and apps still installed, which can fill up a device’s storage and cause subsequent installation failures.
  • monkey can sometimes break out of its “cage” (the command-line restriction that confines it to a specified app; see the sketch after this list) and wreak havoc on the OS at large, in some cases going so far as to disable ADB on the device. Other times, devices may turn themselves off entirely, due either to inactivity timers or dying batteries. This is another reason why devices may sometimes be unavailable, since we cannot change monkey or disable timers without modifying the operating system, and dead batteries take some time to charge.
  • Hardware issues can also cause problems. Swollen batteries constitute a big subset of these, and usually coincide with OS freezes (although which issue causes which is still under investigation). Unfortunately, hardware issues usually require individual physical intervention and so take longer than software issues to resolve.
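
The sketch below shows the kind of defensive invocation the list above alludes to: monkey is pointed at a single package and told to stop on errors, and the device's ADB state is checked before and after the run. The monkey flags are standard options; everything else is an illustrative assumption rather than our exact recovery logic.

    import subprocess

    def adb(serial, *args):
        return subprocess.run(["adb", "-s", serial] + list(args),
                              capture_output=True, text=True)

    def device_state(serial):
        """Return the ADB connection state ('device' when healthy); empty/error otherwise."""
        return adb(serial, "get-state").stdout.strip()

    def caged_monkey(serial, package, events=1500, seed=42):
        # -p confines monkey to one package; the other flags stop it from
        # ploughing on after the app dies, which limits collateral damage.
        return adb(serial, "shell", "monkey",
                   "-p", package,
                   "-s", str(seed),
                   "--kill-process-after-error",
                   "--monitor-native-crashes",
                   "-v", str(events))

    serial = "0123456789ABCDEF"  # placeholder serial
    if device_state(serial) == "device":
        caged_monkey(serial, "com.example.app")
        print("state after run:", device_state(serial))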

Some devices have turned out to be incompatible with, or too unstable to function in, our testing system. These “failed” devices include many tablets, which often cannot charge while connected to a host for data transfer. Other failed devices have versions of monkey that do not implement the options needed to sufficiently control their execution, or have trouble connecting over ADB even when reset to factory condition.

Monitoring to the Rescue!

While our software is quite adept at running apps on devices, there is a limit to what we can do programmatically once a device starts malfunctioning. This rings especially true when the malfunction involves ADB and cuts off communication to the device.

For situations like these, we have a set of tools that allow us to monitor the state of our devices remotely, particularly their ADB connectivity. For devices that are connected over ADB but otherwise uncooperative, we can also review the results of recent test runs to spot devices that consistently turn up “error” results. And we’re constantly on the lookout for techniques that would let us better determine a device’s internal state and whether it can be fixed automatically.

Monitoring is pointless without fixing, however. Most of the software problems we encounter are alleviated by running periodic tests and “cleaning cycles,” which ensure that the devices are running and communicating properly and are free of user apps. If that doesn’t fix a device, a factory reset is usually in order; most devices behave themselves afterward, at least for a while.
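
At its simplest, a cleaning cycle might look like the sketch below: confirm the device answers over ADB, then uninstall any third-party packages left behind by earlier runs. The keep-list and package names are assumptions for illustration; `pm list packages -3` is the standard way to list user-installed apps.

    import subprocess

    # Apps we install ourselves (e.g. the device management app) and never remove.
    # The package name here is a placeholder, not our real one.
    KEEP = {"com.example.devicemanager"}

    def adb(serial, *args):
        return subprocess.run(["adb", "-s", serial] + list(args),
                              capture_output=True, text=True)

    def clean_device(serial):
        """Remove third-party apps left over from earlier test runs."""
        if adb(serial, "get-state").stdout.strip() != "device":
            return False                                   # needs manual attention
        listing = adb(serial, "shell", "pm", "list", "packages", "-3").stdout
        for line in listing.splitlines():
            package = line.replace("package:", "").strip()
            if package and package not in KEEP:
                adb(serial, "uninstall", package)
        return True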

Hardware problems are more difficult and, obviously, can’t be fixed automatically. In these cases our software can “deactivate” a device and ignore it until it is fixed and reactivated.

Failed devices are not discarded. They instead live in a sort of “limbo” state from which they may be activated if a solution is found to the problems they exhibit.

In Conclusion

Wrangling Android devices is tough business, as it turns out. As in any development project, the typical use case is quickly overshadowed by myriad edge cases that tend to cause mayhem whenever they surface.

The AFD backend was designed from the ground up to handle devices that are at times uncooperative, to say the least, and we strive to make as many devices available to developers as we can. To that end, our system is flexible, modular, and includes a variety of monitoring tools that let us know when a device is misbehaving so that we can get it back online ASAP.

If you have any further questions, feel free to ask in the comments.

Happy hacking,

Joe Tuzo
Software Engineer

Join the discussion

  • Mark says:

    Great to know what goes on under the covers of a great testing tool.

    Any chance you will be allowing more flexibility in the tests that can be run? For example my app is large enough that I seem to fail to install on many of your devices – so allowing the install timeout to be longer would probably help a lot. Additionally just 1500 events passed via the Monkey is nothing – it only ever gets to exercise a tiny fraction of my app – perhaps you could allow “off peak” users to specify an increased number of events to be dispatched?

    Finally you’ve put a lot of effort in to the UI for displaying results, and for casual usage that’s great, but I really want to be able to download all the results and look at them offline (or rather for my own scripts to process them offline!) Any chance you could add a “download a zip file of all results”?

    Keep up the good work

    • apkudo says:

      Thanks for your comment, Mark!

      That’s really great feedback. We’d be interested to know how many events you’d find most beneficial so we can work out how much time to allot to each individual test run. It’s something we’ve been debating internally, so let us know what you think so we can test the system.

      We sure have put a lot into the UI. Our frontend dev has done some really innovative stuff with D3 and HTML5 Canvas (working on a blog all about it). We’ve heard from a few Devs that the ability to download results is a feature they’d like to see. We’re looking to implement that really soon.

      Appreciate your feedback!

  • L says:

    I want to be able to:
    1) use some custom testing scripts (like monkeyrunner)
    2) take screenshots of my program
    3) select devices to run my program on
    4) filter logcat messages

