Wednesday, 1 August 2018

“P is for Power”—Android engineers talk battery life improvements in Android P

This interview is all about battery life.

With the last version of the Android P Developer Preview released, we're quickly heading toward the final build of another major Android version. And for Android P—aka version 9.0—battery life is a major focus. The Adaptive Battery feature will dole out background access to only the apps you use, a new auto brightness scheme has been devised, and the Android team has made changes to how background work runs on the CPU. All together, battery life should be batter (err, better) than ever.

To get a bit more detail about how all this works, we sat down with a pair of Android engineers: Benjamin Poiesz, group product manager for the Android Framework, and Tim Murray, a senior staff software engineer for Android. And over the course of our second fireside Android chat, we learned a bit more about Android P overall and some specific things about how Google goes about diagnosing and tracking battery life across the range of the OS' install base.

What follows is a transcript with some of the interview lightly edited for clarity. We also included some topical background comments in italics.

The right CPU core for the right job

First up on the docket is talk about CPU core affinity. Multi-core CPUs are all over the place nowadays, and while on a desktop you would get a CPU with many cores that are all exactly the same, on a mobile phone you usually get cores that come in varying "sizes" meant for different workloads. In a typical eight-core ARM design, you get a chip with a "big.LITTLE" architecture. That's four "big" cores that are fast and power hungry and four "little" cores that are slower but easier on energy usage. Having processes run on a big or little core can greatly affect how much power they use and how quickly they run. Assigning a process to a certain CPU or core is called setting the CPU affinity.
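To make "setting the CPU affinity" concrete, here is a minimal sketch using the affinity calls that Python exposes on Linux (`os.sched_setaffinity` and `os.sched_getaffinity`). It is not how Android does it internally. The helper name `pin_current_process` is made up for illustration, and the calls are not available on every platform, so the sketch checks for them first.

```python
import os

def pin_current_process(cpus):
    """Restrict the calling process to the given CPU indices (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity control is not exposed on this platform")
    os.sched_setaffinity(0, cpus)   # pid 0 means "the calling process"
    return os.sched_getaffinity(0)  # the kernel's view after the change

if hasattr(os, "sched_getaffinity"):
    original = os.sched_getaffinity(0)  # set of CPUs we may currently run on
    one_cpu = {min(original)}           # pick a single CPU we are allowed to use
    print("before:", sorted(original))
    print("pinned:", sorted(pin_current_process(one_cpu)))
    pin_current_process(original)       # restore the original mask
```

On a big.LITTLE phone, the same idea lets the system keep unimportant work on the little cores, no matter how busy that work looks to the scheduler.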

Android P is changing how CPU core affinity works for background processes, which should save a decent amount of battery power. This was given a throwaway line during the Google I/O keynote, but I think this is the first time it has been talked about in detail. 

Tim Murray: We've actually been doing core affinity work for big.LITTLE platforms since 2015. So, this is actually what got me working on performance, initially. Back in March 2015 or so, I actually read your article on the HTC One whatever-it-was, the first phone with Snapdragon 810.

The HTC One M9.

Ron Amadeo

Murray is talking about the HTC One M9 review, featuring the Snapdragon 810's infamous heat problems. 

Murray: I read it, and I was kind of looking around at some performance stuff in general on Android, too, but I knew we were using Snapdragon 810 for the Nexus 6P that year. I read the article, and you said like, "Hey, it runs really hot." And I got to think, "I wonder if we could do better." So, I started thinking about that and working on that with people on our kernel team and on the framework team. What we came up with was a way to control the affinity of services and specific processes from ActivityManager.

In Android, an "Activity" is a single screen of an app, like say, your email inbox, so the system-level service "ActivityManager" does what it says on the tin, it manages activities (and background services)—opening and closing them as requested or as needed for memory usage. 

Murray: From activity manager service, we track what's important to the user. Think of activity manager service as kind of a "macro-scheduler," I would say. While the kernel scheduler makes decisions on the millisecond or microsecond level, activity manager service tracks these kinds of macro-interactions, like, what services are running? What app is currently in the foreground? What can the user actually perceive? For example, if you're running navigation and listening to music and your screen is off, we know that even though the screen is off and you're not interacting with your phone, you care about navigation performance. You care about music performance. You probably don't care about much else at that point.

So, we started enforcing affinity controls using the knowledge in activity manager service. We started off really simple, so background services and cached apps could only run on little cores. Foreground services could use some big cores, but not all of them, and the app you're currently interacting with can use any core. When we tried this, it kind of blew us away. It was a double-digit percentage increase in performance per watt on every test we tried. So, basically, informing the kernel scheduler, kind of constraining the kernel scheduler about what's important to the user, enables it to make much better decisions that result in much better power and performance trade-offs.

So, we've been doing that for a long time now, and even on Pixel 1, which is still a big.LITTLE CPU, but it's much less big.LITTLE than other CPUs. It's still of benefit there, so we've used it on everything.

What we did in P was we looked at what was running when the screen was off on the big cores because big cores draw significantly more power than little cores. What we found was there were a lot of things running that were related to the system. So, they were kind of system services that were running. We looked at how many of these were performance critical and it turns out, not many—when the screen is off at least. If they are performance critical when the screen is off, it ends up being bound as a normal foreground service or something else. There's some other chain that informs activity manager that this process is important.

In P, what we did was, when you turn the screen off, these kinds of system services get moved to a more restricted CPU stack. So, rather than being able to use all of the little cores and some big cores, we just restrict them to only using little cores, and it saves some battery. It makes your battery more predictable because if there is ever a case where a system service is going to use some amount of power when the screen is off, now that power draw is reduced dramatically, just because the big cores are so much bigger than the little cores and draw so much more power as a result.
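The policy Murray describes can be sketched as a small lookup. This is an illustration only, not Android's actual code: the eight-core layout, the group names, and the choice of which big cores count as "some big cores" are all assumptions.

```python
# Assumed layout for illustration: cores 0-3 are "little", cores 4-7 are "big".
LITTLE_CORES = frozenset({0, 1, 2, 3})
BIG_CORES = frozenset({4, 5, 6, 7})
SOME_BIG_CORES = frozenset({4, 5})  # hypothetical subset open to foreground work

def allowed_cores(group, screen_on=True):
    """Return the cores a process group may run on under the described policy."""
    if group == "top_app":
        # The app the user is interacting with can use any core.
        return LITTLE_CORES | BIG_CORES
    if group in ("background", "cached"):
        # Background services and cached apps stay on the little cores.
        return LITTLE_CORES
    if group == "foreground_service":
        # Foreground services get the little cores plus some big cores.
        return LITTLE_CORES | SOME_BIG_CORES
    if group == "system_service":
        # The Android P change: with the screen off, system services lose
        # access to the big cores.
        return LITTLE_CORES | SOME_BIG_CORES if screen_on else LITTLE_CORES
    raise ValueError(f"unknown process group: {group}")

print(sorted(allowed_cores("system_service", screen_on=True)))
print(sorted(allowed_cores("system_service", screen_on=False)))
```

The point is that the decision is driven by what the user can perceive, which is information the kernel scheduler does not have on its own.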

Ars: You said you started with core affinity on Google devices. I always thought of CPU scheduling as something that was up to the individual device manufacturers. Do you think something like this will survive the customization that happens on third-party stuff?

Murray: Yes, we actually do see this used on third-party devices. It's just part of normal Android, so you can build an Android image with support for CPU sets, and then the right stuff just happens. The OEM doesn't have to do anything except set up the CPU sets for their particular processor. It's not some big, invasive thing we had to do. It's a pretty straightforward tweak inside how we manage scheduling from user space.

Ars: So before, did the system just never bother with moving the background tasks around to other cores? It was just a free-for-all?

Murray: I wouldn't say free-for-all (laughing). Prior to 2015, we hadn't looked at a big.LITTLE SoC, in depth, as part of Nexus. OEMs had their own approaches to deal with this, but most of the time that was focused on the kernel scheduler and making decisions purely in the kernel scheduler to try to get the same effects. All we really did was make that explicit and make it easier for whatever kernel scheduler was used, whether it was one of the HMP variations or EAS or whatever. We make it easier for the schedulers to make the right decision, kind of reduce the complexity of the kernel scheduler because you have all this information from the higher levels of the system that you can use instead.

Benjamin Poiesz: To expand on this, if you like, when you have a scheduler seeing like, "Hey, there's a bunch of work." It sees lots of work needs to happen. It doesn't understand is that important work or is that not important work, and the more things Activity Manager can teach the lower subsystems about, "Hey, the user's actively engaged with something," or, "This needs to happen, but please do it as efficiently as you can," then smarter decisions are happening under the hood. That was one of the key things.

The takeaway was: when the screen's off, there's probably not much that needs to happen right away. You could infer, like, "Well, maybe there just isn't that much work being generated, so then CPUs will stay low." But sometimes for different reasons, the subsystems, it'll set up alarms, set up jobs, try to do processing. And there wasn't really a way to articulate, "I have a bunch of stuff to do, but do it whenever." This gives us a better way to do that that's much more implicit, as opposed to asking all of our engineers, "Please flag what's important or not." That's gonna be hard. This makes it much more implicit.

Ars: So it's like a lot of the JobScheduler stuff but for CPU core affinity?

Poiesz: Yeah.

JobScheduler started in Android 5.0 as a new traffic cop for background tasks. It promoted a "lazy" mentality for apps when you were on battery power. Apps want to wake up the phone to do something, but that uses power. So for non-critical tasks, one of JobScheduler's advances was promoting the idea of just deferring a task until the phone was plugged in and charging, where it had virtually unlimited power. These core affinity changes are applying the same line of thinking to core affinity: if you aren't important, you don't need to be sucking up system resources. In this case, apps get sent to the slower, lower power cores if they aren't important.

Poiesz: It also has nice carry-on effects for things like JobScheduler because it's implicit. If the screen is off and an app files 100 jobs, those things now will not get out of control. Because, the CPU, normally, in the past, in a very simple role, would be like, "Oh, my God, there is so much work for me to do. I better try to get it all done," and this makes it so that doesn't happen.

Murray: For example, if some background task wakes up and wants to run CPU work for one second, in a world without CPU sets or a sufficiently smart scheduler in the kernel, that would probably end up on the big core. The big core would probably end up running at maximum frequency, which draws a bunch of power. There's no indication of whether that work is actually important, whether it's actually latency-sensitive. And so we want to save the big cores for work that is latency-sensitive, when you're on battery, because we know that using the big core will consume more power. But if something isn't latency-sensitive, just move it to the little core and save a bunch of power. It's not going to run so slow in comparison that you will end up burning more power. The performance difference is not as big as the power difference.
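Murray's last point is just energy = power × time. A quick worked example shows why the slower core can still win. The wattages and run times below are invented round numbers for illustration, not measurements of any real chip.

```python
def energy_joules(power_watts, seconds):
    """Energy consumed by running at a constant power for a given time."""
    return power_watts * seconds

# Hypothetical numbers: the little core takes 3x as long to finish the same
# work but draws 8x less power while doing it.
big_core_j = energy_joules(power_watts=2.0, seconds=1.0)
little_core_j = energy_joules(power_watts=0.25, seconds=3.0)

print(f"big core:    {big_core_j} J")
print(f"little core: {little_core_j} J")
```

As long as the power gap is wider than the speed gap, work that is not latency-sensitive costs less energy on the little core, even though it runs longer.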

Poiesz:  But another [example]—just in this kind of space that had happened under the hood, and involved apps, but it wasn't like a classic thing that needs developers to do work—is vDSO optimizations. The idea was allowing some of the function calls that apps would make to be able to call into the kernel but stay in user space.

vDSO is a "virtual dynamic shared object," a way to export kernel space routines to user space. This is beneficial since making a kernel service request, aka a syscall, incurs a performance hit.

Murray: So, things to read the clock, usually. Things that would be syscalls are now no longer syscalls.

Poiesz: So, that had been done for 64 [bit CPUs], we hadn't really done it for 32 yet, and this is just one of the things, like "yeah, good optimization, it should do this." When we did it, we weren't necessarily going like, "Oh yeah, this is gonna be a biggie." But we did it, we did some experiments on it, and suddenly we saw measurable, sizable results coming out of it. We're kind of scratching our heads going like, "But, why?"

We ended up looking into it, and we found situations where some apps are aggressively calling, and I think Tim's example time is the right one. They were aggressively calling time functions; it was thousands of times a second that these calls were happening... and you could be like, "Wow, I'm not even mad. It's just amazing," but it took this relatively inexpensive thing that we made even cheaper, and, suddenly, because it was amplified by thousands per second, you could see this very impactful change.
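The amplification effect is easy to estimate: per-call cost times calls per second. The sketch below measures the average cost of a clock read from Python and scales it up. The measured number includes interpreter overhead, so treat it as an upper bound on the underlying call, and the helper names are made up for illustration.

```python
import time

def average_call_cost_ns(fn, calls=200_000):
    """Average wall-clock cost of one call to fn, in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(calls):
        fn()
    return (time.perf_counter_ns() - start) / calls

def cpu_share(cost_ns, calls_per_second):
    """Fraction of one CPU-second spent making the calls."""
    return cost_ns * calls_per_second / 1e9

cost = average_call_cost_ns(time.monotonic)  # a clock read
for rate in (1_000, 10_000, 100_000):
    print(f"{rate:>7} calls/s -> {cpu_share(cost, rate):.4%} of one core")
```

A call that costs almost nothing in isolation becomes a measurable slice of CPU time once an app makes it thousands of times a second, which is why shaving the per-call cost showed up in Google's results.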

So, one, the quick thing is like, "Yeah, let's make this optimization, easy-peasy," but then it also triggers follow-on work. You start thinking about, "Hey, why are apps doing this? What's the strategy to deal with that classic problem where these low-level system calls are getting invoked?" It's not practical to log every system call. That would be absurd, right? But how do we find some way to find these much more easily and aggressively throughout the OS because, you know, you find one, it's probably not the only one.

The Android P release schedule.

We do experiments for this. It's a really nice piece of infrastructure we've been building out on Android for a few releases. That enables us to do these experiments in the beta populations and on Googlers to see, "Hey, when you make this change, what do you see happen in terms of power or performance in a lot of our key APIs?"

To me, it's more interesting when you do the experiment and you're like, "What?" Right? Because that's where you're gonna learn something interesting, and sometimes we get a nice one like this. We've also had the opposite, where we're like, "What? I thought that was gonna be a big one, and it ended up being very little." But we get both, and every single time we learn something new about the system because it's very complicated from top to nuts to bolts, right? It's almost like its own ecosystem at this point. So, it's always interesting to see what happens.

Several times in this interview we're going to hear about Google's experiments on the beta population, which is something I wasn't aware of until now. The "infrastructure" Poiesz talks about is most likely the A/B testing available through Google's Firebase developer console. Through Firebase, app developers (in this case, Google) can remotely swap bits of code in and out of an app for testing, all without having to update the app through the Play Store. For users this is invisible, and you'd never know a test was happening unless you noticed whatever was changed. I haven't heard of A/B testing happening at an OS level in production, but apparently people on the beta builds get to be Google's guinea pigs. So a future note for people running beta versions of Android: don't take battery life or performance issues too seriously, you might just be being experimented on.

Better battery bugs with lots and lots of data

All the data from this setup check box does actually go somewhere.

Next we're going to learn about Google's great Usage and Diagnostic Data Dashboard. When you first set up an Android device, one of the slew of check boxes you're presented with is an option to "Send usage and diagnostic data" to Google. (After setup you can find this in Settings -> Google -> (top-right menu button) -> Usage & Diagnostics.) 

As the text says, this check box enables Google to automatically collect "diagnostic, device, and app usage data." All of this data does actually go somewhere, and that "somewhere" is a giant data dashboard that Google uses—among other things—to diagnose, experiment, and gain insight into battery life across the Android install base.

It's on by default, so I'd guess the majority of the two-billion-strong Android user base is funneling usage data into this dashboard. I'd imagine it's a treasure trove of data. 

Ars: I've been loving the Android P battery life. It seems like it's a lot better.

Poiesz: Good (laughing), that is very good to hear.

I don't think we're done. It's continuing and it's been an evolving process, but there was a lot of internal jokes of "P is for Power." It wasn't the official thing for P, but it was convenient timing. Tim's changes on background cores and the larger work of how we get the framework and the kernel to work better is a big piece of it.

We also did a huge amount of bug fixing, which is maybe not the most exciting space, but I don't think we can emphasize enough how much bugs can really cause a lot of power to disappear. Whether it's a crash or whether it's just something spinning in ways that it shouldn't. We fixed stuff across the board in terms of how JobScheduler functions, in terms of how Doze even works. Like, it's been around for a while, but we noticed some inconsistencies in behavior, and so we went through that and cleaned up stuff, and a lot of it is a continually rising tide.

Variability's the biggest thing that we tried to tackle. "Percentage improvements" tends to be how people think about battery, but one of the big things we worry about a lot is making sure you have consistency as a user. Like, if you have one bad battery day a week or a month, even if every other day is good, that one day is gonna drive you nuts, and it makes you question whether or not you can depend on the phone. And so that's been a really big area, and the stuff that we talk about in background cores helps for that, things like Adaptive Battery really help for that. So, those have been some of the big ones.

Murray: I think one of the interesting things to take away from battery is that, first of all, it's incredibly complicated. I've been doing performance now for years. Power is relatively new as a focus area that I've started working on. Power is way harder because there are so many things that impact it.

The other takeaway is that we don't just have variability for a given user. There is variability across a population. I don't know if we can share like a dashboard graph, but when we look at battery life that users see, any number of events will change battery life across a population. For example, if you look at battery life around Christmas, the battery life that people see around Christmas is amazing because nobody's using their phone, nobody's sending email, and nobody's doing any of these things. This makes studying battery life incredibly difficult because it's very hard to separate the impact of whatever change you may have made from the broader trends going on in a population.

And so this is where the experiment framework that Ben was talking about comes in, which lets us try to get a better sense of the impact that changes are having, so we can understand battery life and try to improve it.
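Here is a toy version of why an experiment helps where a before-and-after comparison fails. All of the battery-life figures are synthetic, made up purely to illustrate the idea. The control and treatment groups are measured over the same period, so a population-wide swing like a holiday hits both and cancels out of the difference.

```python
from statistics import mean

def experiment_effect(control, treatment):
    """Difference in average battery life between the two groups (hours)."""
    return mean(treatment) - mean(control)

# Synthetic battery-life samples, in hours. "treatment" devices have the change.
normal_week = {"control": [20.0, 22.0, 21.0], "treatment": [21.0, 23.0, 22.0]}
holiday_week = {"control": [26.0, 28.0, 27.0], "treatment": [27.0, 29.0, 28.0]}

# Comparing across weeks mostly measures the holiday, not the change.
holiday_shift = mean(holiday_week["control"]) - mean(normal_week["control"])
print("shift from the holiday alone:", holiday_shift)

# Comparing the groups within the same week isolates the change itself.
print("effect in a normal week: ", experiment_effect(**normal_week))
print("effect in a holiday week:", experiment_effect(**holiday_week))
```

In this made-up data the holiday moves everyone by six hours, while the change under test is worth one hour in either week. Only the within-week comparison recovers that.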

Poiesz: And it helps normalize structures. Christmas was a great example because Christmas Day is also like, you know, people unbox phones. We got a common case that happens, people call each other. So, you see screen on power takes a hit on that day because you do video calls. Video calls are one of the more expensive things you can do. But with your screen off, right, when the screen's not being used, very little is happening because everybody's quiet, especially for that week on either side of Christmas. Things are subtle.

But so many holidays show up and are not necessarily, obviously, a US-centric holiday. Something that is in a different region has an event and, it's like, "Yeah, we don't work in this country," and suddenly you can see these shifts in percentage points, and, "Oh, no." You don't want to react and panic, but also it makes it really hard to figure out, "Well, what's going on? Is something okay? Is something problematic, and what are the right trade offs to make?"

Murray: I would bet money that you could see trends caused by World Cup.

Certain events allow Android engineers to see extreme peaks and valleys in terms of usage, World Cup probably included (though maybe not Pele at the 1966 WC).

Art Rickerby/The LIFE Picture Collection/Getty Images

Ars: What is this dashboard? Is this just data from Googlers, or is this a back-end Android thing?

Poiesz: When you set up your Android device, there's a thing for Android metrics collection. It's health metrics.

This gives us a level of visibility into what's going on, and it can help us understand when we're doing a change or when we're seeing a regression to give us an idea of what's happening, and so that's something that we use to make sure that new OSes and new features are continually getting better, and it's a good enabler of what's going on.

Ars: Wow so like, two billion devices all get fed into this dashboard, and you guys can see all the stats?

Murray: I don't know what the number is of how many devices it is, but, yeah, it's a lot.

I took a guess with "two billion devices" because that's the number of active Android devices that are out there. In reality, you can opt out of the data collection, so it has to be somewhat less than two billion. It's on by default, though, so I doubt that many people opt out.  

Poiesz: And so that ends up being a decent tool, and that dashboard we're mentioning is very, very, very high-level to make sure that no one's seeing things they shouldn't. But the idea is that it helps us see what's going on to understand these issues. And the experiment stuff, what we run on the beta, you probably experienced. I think, maybe we emailed about issues with your screen brightness, they were running experiments in a beta population, just due to tuning and trying different models.

We have the infrastructure to see like, now that we have enough people who are using the beta, you have a better signal-to-noise ratio of what's going on. We can then make sure the things that we want to have working are working, are causing positive trends in the directions that we want, so that the end user gets their official OTA. They have something that's well-tuned, and we know the things that we want to work correctly, and so we're trying to use that beta population, that preview as a way to make sure, as we're going along the path, that things are working in the way that we want them to be working.

More talk of battery experiments, which help explain the wide range of Android P battery stories you'll hear out there. Mine has been great on the Pixel 2 XL, but on say, the /r/AndroidPreviews subreddit, you'll find that battery reports are all over the place. Android P also marks the first time an Android beta has been available on non-Google devices, further diversifying the range of devices running this beta code.

It's only battery savings if you wait for power

Any time you're not using your phone, the processor tries to go into a deep sleep mode, because that saves the most power. When apps need to do background work, they can keep the phone processor awake with a "wake lock." One of the ways JobScheduler and Doze mode work is to clamp down on wake locks that apps are holding. Wake locks held in the system are something Google can track, and they're something advanced users can track with various apps. One of the points Poiesz and Murray wanted to make is that, once you get past one wake lock, more wake locks don't really matter.

Poiesz: The other space we try to understand battery is wake ups and wake locks.

Traditionally, you would say, "Oh, yeah, wake locks are bad. Wake locks keep the CPU up. Just drive wake locks to zero, and that's good. Any wake lock should try to be minimized." But it ends up being much more nuanced because the incremental cost of wake locks is very low—it's arguably almost nothing.

If you have no wake lock and you do a wake lock, that means now the CPU is being raised into a higher state and kept there, but if you had more wake locks on top of that first one, it doesn't incrementally make it much worse. And so it makes it really difficult, because, say, if you just start counting wake locks, is that really helping you lower the number of wake locks you might see? Oh, we reduced wake locks to 20 percent—let's make up a number—and then, did that also correspond in a power savings of 20 percent? Probably not, because it's much more nuanced about when the wake lock is being held, or even our subsystems of the OS being really clever and saying, "Hey, I'm going to opportunistically hold wake locks," meaning, "If something's happening for other reasons, I'll also do my work then." That's a really smart strategy even though your wake lock count won't change. The 'when' changed, but the 'what' didn't.
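One way to see why counting wake locks is misleading: what keeps the CPU out of suspend is the time during which at least one lock is held, which is the union of the lock intervals rather than their count. The sketch below is a simplified model with made-up timestamps in seconds, not how Android accounts for power.

```python
def awake_seconds(wake_locks):
    """Total time during which at least one wake lock is held.

    wake_locks is a list of (start, end) pairs in seconds.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(wake_locks):
        if current_end is None or start > current_end:
            # A gap: close the previous awake stretch and begin a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the current stretch: extend it if this lock ends later.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

# Stacking more locks on top of the first one adds no awake time.
one_lock = [(0, 60)]
five_overlapping = [(0, 60), (5, 30), (10, 50), (20, 60), (40, 55)]
print(awake_seconds(one_lock), awake_seconds(five_overlapping))

# Same count, different "when": piggybacking on an existing lock halves it.
separate = [(0, 30), (100, 130)]
opportunistic = [(0, 30), (0, 30)]
print(awake_seconds(separate), awake_seconds(opportunistic))
```

Five overlapping locks keep the device awake exactly as long as one, while two locks can cost 60 seconds or 30 depending only on when they are taken.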

Ars: Right, so what matters is sleep time, not wake locks, I guess.

Poiesz: Right, not necessarily. It's having that understanding of when you held the wake lock, what happened is now the critical thing to understand, not necessarily that it held the wake lock. And that level of indirection makes these problems really complicated.

The new ambient display in Android P.

Ars: OK, so what's an example of an opportunistic wake lock?

Murray: Think of it as not necessarily an opportunistic wake lock, but let's say you have a background job running. A background job needs to wake up and talk to the network, and it's gonna wake up for five seconds. Meanwhile, you have some other process running, which went to sleep, but it's running a lot of background CPU activity. When that first job wakes up, it's now holding a wake lock for five seconds. It's gonna keep the CPU out of suspend for five seconds. Now, that other process is able to run again. So, that will consume some amount of power, but it might look like, "Oh, this wake lock seemed expensive in some way. We saw battery drain while this wake lock was held." It doesn't mean that the thing holding the wake lock is doing anything wrong. It just meant that the system was online, and something happened somewhere, something was using power.

Poiesz: And it works well when it happens, so let's work through this as a debugging scenario. You go, "Oh, OK, well that wake lock has that first thing held. Get rid of that wake lock, and you'll save power." Well, that work that Tim was describing is still gonna happen somewhere. It's just going to happen somewhere else now, and so your net change may actually be nothing even though you removed that wake lock. And so it gets pretty interesting.

It's also the same, similar thing with wake ups. It's like an app triggers an alarm. When the alarm triggers, if the OS was already up for other reasons, that alarm's pretty cheap. If, when that alarm triggered, your device was in a relatively deep state of idleness, that's pretty expensive. So, it's that level of indirection in all these metrics makes it actually pretty interesting to figure out then. Like, how do you actually figure out attribution of, "Well, why did X, Y, and Z happen, and if X, Y, and Z didn't happen, what would've happened instead, and would that result in an improvement?" It's just peanut butter. You're just moving it around. You're not necessarily reducing the amount of the peanut butter, you're just spreading it more thinly. Because the phones are unplugged for so long, it doesn't matter much to your power life as a user if something happens at 9am, 10am, or 11am. It still happens.

Murray: Really, there are two ways to improve battery if you can't change the hardware. You either run the same amount of work more cheaply, somehow, which we do by things like reducing wake ups and things like that. Or, you just run less work. That's it. Those are the only two things you can do.

Poiesz: And deferring until power-on is one of the biggest things. Actually, this is an interesting point. It's not the thing we talked about with Adaptive Battery. When the screen turned on before, we used to really commonly run jobs. We'd say, "Oh, the screen turned on. Let's go run some background work," right? And JobScheduler would stand up and do a wide amount of processing. And one of the things that changed with battery was we'd say, "Well, sure, my screen's on. Let's go run work for the apps you're using or that we think you're going to use." Everything else now gets deferred. Every time you defer one of those apps that would've run work at that screen-on event, if it's deferred all the way throughout the day until you're on power, then that's a savings. It still happened, but it happened when you were plugged in, as opposed to when you weren't.

Before, JobScheduler would maybe look at it and say, "Well, if the screen is on, very few things compete with the power drain of the screen. So, what's a bit more work?" So, those do add up into more and more savings, but they only add up to the savings if you're able to defer all the way until you're on power. If you're really just deferring it to the next screen-off event, it's peanut butter.
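The "peanut butter" argument can be sketched as simple accounting. The model below is deliberately crude and entirely hypothetical: the job names and energy costs are invented, every job is assumed to cost the same energy whenever it runs, and work done while charging is counted as free to the battery.

```python
def battery_cost(jobs, schedule):
    """Energy (joules) drawn from the battery for a given schedule.

    jobs maps job name -> energy cost; schedule maps job name -> when it runs
    ("screen_on", "screen_off", or "charging").
    """
    return sum(cost for name, cost in jobs.items() if schedule[name] != "charging")

# Hypothetical jobs and energy costs.
jobs = {"sync_photos": 5.0, "refresh_feed": 2.0, "mail_app_in_use": 1.0}

run_at_screen_on = dict.fromkeys(jobs, "screen_on")
deferred_to_screen_off = {**run_at_screen_on,
                          "sync_photos": "screen_off",
                          "refresh_feed": "screen_off"}
deferred_to_charging = {**run_at_screen_on,
                        "sync_photos": "charging",
                        "refresh_feed": "charging"}

for label, schedule in [("run at screen-on", run_at_screen_on),
                        ("defer to screen-off", deferred_to_screen_off),
                        ("defer to charging", deferred_to_charging)]:
    print(f"{label:>20}: {battery_cost(jobs, schedule)} J from battery")
```

Moving work to the next screen-off event leaves the total unchanged. Only the schedule that pushes the deferrable jobs all the way to the charger reduces what comes out of the battery.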

You may look at what happened and be like, "Yes, we nailed it. It's delayed. Everything's good," but you still delayed it to a state when you were still unplugged. That's the really tricky part, and I think Tim's summary is a really good one of, like, that's all that really matters in a particular device. Hardware does get better, and that's something that we always love is the hardware continually getting better and better and better, especially on the small cores. And that helps because it just helps lower the floor, and the lower you can make the floor, the better off you are.

Ars: You really feel like the hardware gets better all the time, in terms of power usage? Because it seems like battery life doesn't change that much in terms of full device usage because they just keep getting faster.

Murray: So, let's see: we had A53 on Snapdragon 808 and 810 and 5X and 6P. We had A53 in Pixel 2. No, it's the same core, but it draws a whole lot less power in Pixel 2 because in 5X and 6P was what, 20nm? The process nodes get a lot better. FinFET helped a ton. The power consumption on the small cores at low frequencies just gets better and better and better.

Ars: OK, yeah, you mean just at the SoC, sure.

Poiesz: You're right. Different parts of the hardware, some are getting better, some aren't. Some are neutral, and some are arguably getting worse. It depends on what you're looking at, like air quotes, "worse." The thing is, like, more RAM on device. That's definitely been a trend, adding more RAM to hardware. RAM is kind of a parasitic drain because it's always on.

Shout out to devices like the OnePlus 6 with a whopping (and totally unnecessary) 8GB of RAM. 

Poiesz: So, if you have more modules running simultaneously, that's more power in a continual draw state. So certain parts of the hardware are getting better from a power perspective, while certain other parts are getting better in terms of performance and capability, and those may offset the power gains.

In general, I think hardware is taking us in a positive direction, but it's always more nuanced than, "Is it up or is it down? Which thing is it?" Like, AMOLED screens have been another interesting space where it enables a bunch of use cases that weren't possible before because of the nature of AMOLED. Black pixels cost effectively nothing, and then that opens up a lot of cool features being possible.

Murray: One of the things that's kind of weird about phones and mobile SoCs as compared to, say, desktop machines is the big power draws. On a desktop, the GPU is gonna be 300ish watts, the CPU is another 100ish. Then you've got everything else, and it's pretty minor. It doesn't work that way in mobile. It's much more balanced between various components. There's no one thing that is like, "Here is 80 percent of your power. It goes to this one component." It doesn't work that way, and it's also workload-dependent. The units that cost a lot when you are taking a picture aren't the units that cost a lot when you're just streaming music or something. So, there are a lot more things to understand about power consumption than what's running on the CPU. It's also, what's your cell service? Is the modem drawing extra power because you're in really bad cell reception? Things like that.

Poiesz: Yeah, the music one is always a great one because many streaming services now offer caching, so you can pre-download stuff. You're still playing, you're still causing audio output, and the CPU is still running to process it, but the modem doesn't have to turn on periodically to pull down content and pre-fetch your audio. If you have it pre-downloaded, it's cached, and stuff like that actually makes a pretty dramatic difference in your power profile.

But these things are all super nuanced. What use case is happening, what's happening in the app, and what choices the user made all determine your power profile. I guess one of the biggest challenges, in the long run... if you imagine a world where the OS is about as efficient as it can be, what's really remaining is what apps the user installs, and how the user behaves with those applications will now determine your power. Right? If you play VR games on your phone, your battery life profile's gonna seem pretty different.

How do you help users understand that? You don't want it to be blaming, but you want people to also understand there's scarcity in this device. It only has so much room for batteries, and if you want to be pushing it pretty hard, these are the physical realities. How do you help people understand that? On the whole, we don't want people to worry. Like, use your phone as you want to use your phone, and have a great day. But on the extremes, how you're interacting and playing with your apps and services, and which ones you use, ends up being a bigger and bigger factor. As more and more of the OS and the hardware becomes more efficient, it largely becomes user-behavior dictated.

The home screen of the revamped YouTube Music app, running on an iPad. Credit: Jeff Dunn

Ars: Yeah, I never thought about the music thing. That's cool. I need to download all my Google Music stuff now and save battery.

Murray: Just to add a little bit more color to how weird this is, we had an issue a couple years back where, when you were listening to music, the phone could either hold a wake lock and keep the CPUs online, or it could power down and rely on some other hardware component to play music and just wake up every 10 or 15 seconds or so to refill that buffer. It actually used more power, at the time, to turn the CPU off than to just hold a wake lock because there were some other crazy things happening in the kernel. And I think we got it pretty close to parity, but the point is that until you actually explore the space and see what's actually running for any particular workload, it's very hard to say what is ideal for power.
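Murray's wake lock example can be sketched as a toy energy model. This is only an illustration: the function names and every number below are made-up assumptions, not measurements from any phone or anything from the Android source. The point is just that a fixed cost per wake-up can outweigh the savings from sleeping.

```python
def wake_lock_energy_mj(duration_s, awake_power_mw):
    """Energy in millijoules if the CPU stays awake for the whole playback.

    Milliwatts multiplied by seconds gives millijoules.
    """
    return awake_power_mw * duration_s


def offload_energy_mj(duration_s, sleep_power_mw, refill_interval_s, wake_cost_mj):
    """Energy if the CPU sleeps and wakes periodically to refill an audio buffer.

    wake_cost_mj is a hypothetical fixed energy cost paid on every wake-up.
    """
    if refill_interval_s <= 0:
        raise ValueError("refill_interval_s must be positive")
    wakeups = duration_s / refill_interval_s
    return sleep_power_mw * duration_s + wakeups * wake_cost_mj


# Ten minutes of playback with hypothetical numbers.
held = wake_lock_energy_mj(600, awake_power_mw=50)
expensive_wakes = offload_energy_mj(600, sleep_power_mw=10,
                                    refill_interval_s=10, wake_cost_mj=600)
cheap_wakes = offload_energy_mj(600, sleep_power_mw=10,
                                refill_interval_s=10, wake_cost_mj=200)
print(held, expensive_wakes, cheap_wakes)
```

With these invented numbers, sleeping between refills loses to the wake lock when each wake-up is expensive and wins once the per-wake cost comes down, which is why the answer has to be measured per workload rather than assumed.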

Poiesz: Another good example of that, one we've been looking at, is when to go into deep doze, like a full doze. That's primarily for if you put your phone down and haven't touched it. Like, my phone's been hanging out here most of the time. There's an idea, like you could naively say, "Well, to save more power, go into doze immediately, right? Screen off, go into doze, save power. I'll take my promotion please, thank you."

But in reality, the real trick isn't going into doze, it's coming out, because when you're coming out, you're gonna reset your connections. You're gonna wake up some of the apps. All the things that had been deferred while you were in full doze now wake up, and they freshen up because it's arguably been a while. That costs a fixed amount of time.

The metaphor for this can be, "Is it better to stop your car and restart it, or better to let it idle if you're in traffic?" It's very similar, and so the question then becomes, "Well, what are the right number and the right conditions for when to go into this full doze? Because there's a cost coming out to get everything up and running again and fresh, when maybe it would've been better just to hang and idle." So, that's why experiments are a way for us to try and get at this stuff, because it's all gonna depend on how the user is using the phone and your ability to predict what's gonna happen. What's the right number? Five minutes, ten minutes, an hour? They're all gonna end up changing the power profile.
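The stop-or-idle tradeoff Poiesz describes has a simple break-even point if you assume, hypothetically, that leaving doze costs a fixed amount of energy and that both states draw constant power. The sketch below uses invented names and numbers and is not how Android actually decides when to doze.

```python
def doze_break_even_s(idle_power_mw, doze_power_mw, exit_cost_mj):
    """Idle time in seconds after which dozing saves energy overall.

    Dozing saves (idle_power_mw - doze_power_mw) millijoules per second,
    but leaving doze costs a one-time exit_cost_mj (hypothetical).
    """
    if doze_power_mw >= idle_power_mw:
        raise ValueError("doze must draw less power than idling")
    return exit_cost_mj / (idle_power_mw - doze_power_mw)


def should_doze(predicted_idle_s, idle_power_mw, doze_power_mw, exit_cost_mj):
    """True if the predicted idle period is long enough to repay the exit cost."""
    threshold = doze_break_even_s(idle_power_mw, doze_power_mw, exit_cost_mj)
    return predicted_idle_s > threshold


# Hypothetical numbers: 30 mW idling, 5 mW dozing, 7500 mJ to come back out.
print(doze_break_even_s(30, 5, 7500))   # break-even in seconds
print(should_doze(600, 30, 5, 7500))    # phone predicted idle for 10 minutes
print(should_doze(120, 30, 5, 7500))    # phone predicted idle for 2 minutes
```

With these made-up values the break-even is 300 seconds, so a predicted ten-minute idle period is worth dozing through and a two-minute one is not. The hard part, as the interview notes, is the prediction itself.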

Ars: OK, and are these prediction attempts why I ended up with the AI-powered adaptive battery?

Poiesz: So, adaptive battery was one of those ones where we're trying to get better and better predictions about what we think you're going to be using, so then those things can be allowed to run more, and the things that we don't think will be used, we're deferring them. Before, the JobScheduler and AlarmManager didn't really have that as a concept. We mainly let the developer define, purely, "Here's my constraints, and here's how aggressive I want to be," and, you know, big surprise, developers have their own goals. And that's fine, and I get that. It makes perfect sense. It's a lot to ask a developer to say, "Hey, if you don't think the user's gonna use you, be quiet." Like, I don't know how you're gonna ever say that to someone legitimately, right, to "please limit yourself." Usually, developers actually want the opposite. When a user disengages, you want to reacquire.

The OS, though, is a good steward. It's in an independent position to say, "OK, I see what the user is engaging with. They want this, they don't want that." In certain areas you're trying to understand the user's behavior, like app usage and whether we should run a job or not. That's where, I think, some of these machine learning techniques will be really powerful. It'll bear out over time.
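The idea of letting likely-to-be-used apps run freely while deferring the rest can be sketched as a mapping from a predicted usage probability to a maximum deferral window. Everything here is a hypothetical illustration: the tier thresholds, the deferral times, the class and function names, and the package names are all invented, and the real prediction model is not described in the interview.

```python
from dataclasses import dataclass


@dataclass
class AppPrediction:
    package: str
    use_probability: float  # predicted chance the user opens the app soon (0 to 1)


# Hypothetical tiers: (minimum probability, maximum job deferral in minutes),
# ordered from most likely to least likely to be used.
DEFERRAL_TIERS = [(0.5, 0), (0.1, 120), (0.0, 1440)]


def max_deferral_minutes(use_probability):
    """How long background work may be put off for an app with this prediction."""
    if not 0.0 <= use_probability <= 1.0:
        raise ValueError("use_probability must be between 0 and 1")
    for threshold, deferral in DEFERRAL_TIERS:
        if use_probability >= threshold:
            return deferral
    raise AssertionError("unreachable: the last tier has a threshold of 0.0")


def plan_background_work(predictions):
    """Map each package to its maximum deferral window in minutes."""
    return {p.package: max_deferral_minutes(p.use_probability) for p in predictions}


plan = plan_background_work([
    AppPrediction("com.example.chat", 0.9),
    AppPrediction("com.example.news", 0.3),
    AppPrediction("com.example.game", 0.02),
])
print(plan)
```

The design choice worth noting is that the OS, not the app, owns the table: the developer still declares what work exists, but how long it can wait depends on what the user is predicted to do.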

Ars: So are we going to see any major advances from the switch to Skia?

Murray: It's a nice backend improvement, but it's not really noticeable... I mean, I guess it's noticeable to me who looks at frame time benchmarks, but it's not really noticeable to developers.

Poiesz: That's generally true of most of the things that we work on. It's pretty rare to find one singular change that you'll hang your hat on and say, "Oh, everyone's gonna see this. Everyone's gonna notice it." Because of all the nuanced behaviors, usually with these changes you're tackling, like, one vertical of it. There are so many other facets that can affect what's going on, so it's kind of a, "We have to do all the things." The long tail is long, but the long tail adds up to a very big percentage, so it's really good to go after all these things. I think Burke said this, and I've kind of keyed on it as a term, "No single thing that you would do is sufficient to solve the problem, but all these things are necessary." We have to go after everything because that's the nature of the beast. Every small little piece, they will all add up.

Thanks to Benjamin Poiesz and Tim Murray for their time.
