The iPhone's Last Stand?

Apple fans would, for years and years, sneer at Microsoft’s penchant for talking about products that may or may not ship, deriding them as vaporware. After Apple’s bungled 2024 launch of Apple Intelligence and new Siri, however, vaporware is fair game, and just in time for this Article.

Project Solara

Last week, at its annual Build developer conference, Microsoft put forth a vision for a new ecosystem of hardware devices under the banner of Project Solara:

The concept, which isn’t entirely clear from that video but was more fully explained on stage, is that in the future you will be surrounded by an ecosystem of devices, none of which stands alone; each is more like a portal for interacting with your agents, which live in the cloud. In other words, as I wrote in February, Thin Is In:

This is even clearer when you consider the next big wave of AI: agents. The point of an agent is not to use the computer for you; it’s to accomplish a specific task. Everything between the request and the result, at least in theory, should be invisible to the user. This is the concept of a thin client taken to the absolute extreme: it’s not just that you don’t need any local compute to get an answer from a chatbot; you don’t need any local compute to accomplish real work. The AI on the server does it all.

I made the case in that Article that server-side inference would dominate AI workloads, thanks in particular to increasingly high memory demands for agents. What I found intriguing about Microsoft’s vaporware, however, is that it showcased a use case wherein this thin client approach was compelling for reasons beyond KV cache.

Specifically, for most of tech history computing has been indistinguishable from interacting; that’s why we place so much value on new input methods, as they often set off new paradigm shifts. By the same token, the problem with wearables as the paradigm beyond the iPhone is that interacting with them generally sucks. Sure, you can imagine a future where voice interaction is completely seamless or where a device can “see” what you see, but anything longer than a few seconds is much less convenient than simply swiping on your phone. Agents, however, compute on your behalf, without any interaction necessary: a few seconds is all you need to get work done for hours — at least in theory.

Siri AI

Apple, a company that can actually make devices, was under heavy scrutiny going into yesterday’s WWDC keynote for a different concern: can the company make AI? And, if your standards are the state of the art in AI circa June 2024, when Apple took their first crack at answering the question, they did quite well. The company’s pre-recorded keynote took great pains to show actual demos — spinning indicators and all — and they worked! Here was the first one of what Apple is calling “Siri AI”:

What’s fascinating about this specific demo is that it also showed just how far behind Apple is. New head of Siri Mike Rockwell successfully used Siri to set a reminder to enter a lottery for concert tickets, demonstrating context awareness and the ability to interact with the Reminders app through Apple’s App Intents framework; what would have been state of the art would have been asking Siri to enter the lottery on his behalf when the time came. In other words, to act outside of the interaction paradigm that has traditionally defined computing, and which Apple has dominated.

At the same time, the fact that Apple is behind the state of the art might not matter that much given Apple’s market and opportunity in that market. To start with the former, Apple is targeting consumers, for whom traditional chatbot functionality is probably sufficient for the vast majority of their AI needs. Siri will be able to give you recipes, tips on do-it-yourself projects, or generate images. Moreover, the fact that Siri will have access to your iPhone gives it all of the same advantages that made me optimistic about Apple Intelligence in the first place. From an Update after that initial June 2024 launch:

The key part here is the “understanding personal context” bit: Apple Intelligence will know more about you than any other AI, because your phone knows more about you than any other device (and knows what you are looking at whenever you invoke Apple Intelligence); this, by extension, explains why the infrastructure and privacy parts are so important.

What this means is that Apple Intelligence is by-and-large focused on specific use cases where that knowledge is useful; that means the problem space that Apple Intelligence is trying to solve is constrained and grounded — both figuratively and literally — in areas where it is much less likely that the AI screws up. In other words, Apple is addressing a space that is very useful, that only they can address, and which also happens to be “safe” in terms of reputation risk. Honestly, it almost seems unfair — or, to put it another way, it speaks to what a massive advantage there is for a trusted platform. Apple gets to solve real problems in meaningful ways with low risk, and that’s exactly what they are doing.

Apple actually made this version of Siri much more capable in terms of accessing world knowledge and image generation, which should make the experience much more seamless, but the real differentiation will clearly be that access to your personal information. You can ask Siri about something you received in messages (or was it email, or a voicemail?) and it will actually find what you’re looking for; it can also “see” what you are looking at on your screen, and act on the information. And, to the extent that third-party apps offer up their data to the Spotlight semantic index, and make actions available via App Intents, Siri can actually operate across different services in a way other AIs cannot, at least without making massive sacrifices in security on a local Mac or PC.

The Consumer Market

These capabilities are genuinely useful, and there’s a good chance they’re enough, at least for now, and that’s because there is another aspect of the consumer market that is worth considering — beyond the fact that billions of consumers already have iPhones. Specifically, consumers don’t want to work, and don’t really care about being productive.

This reality about the consumer market is a lesson that Silicon Valley has to re-learn every decade or so. Consider Dropbox, whose founder, Drew Houston, is in the process of stepping down. Dropbox was a category-defining product that had a viral hook — if someone signed up with your referral code, you got more storage — and grew extremely fast amongst consumers; the company then spent too long trying to actually build a business in the consumer space, before finally realizing that the only way to make money with what was ultimately a productivity product was by selling to enterprise.

The reason is obvious when you think about it: enterprises are paying for their employees’ time, so of course they are willing to pay for tools that make those employees more productive; consumers, on the other hand, are mostly looking to waste time, which is why attention-harvesting advertising is the only software business model that works at scale for consumer services. The fact that Silicon Valley forgets this is downstream from Silicon Valley being a bubble; normal people aren’t looking for agents to buy them tickets to a concert.

Still, the bubble was strong enough to convince OpenAI to make the exact same mistake Dropbox did: the company somehow convinced itself that it could make enough money selling subscriptions to consumers; Anthropic, meanwhile, realized that it was enterprises who were willing to pay for AI’s massive productivity benefits, even as OpenAI failed to capitalize on its consumer market penetration by refusing to build an advertising product.

This is a long-winded way of saying that I don’t think that Apple’s agentic shortcomings are a big deal, at least for now. Agents help you do work and be more productive, and consumers don’t want to work or care about being productive. What they do want to do is watch short-form video, and an iPhone is simply much better at that than any other device ever will be; in that context, Siri being good enough is enough, and it appears that Apple crossed that bar.

The iPhone’s Centrality

There are actually a lot of interesting technical details about how Apple rebuilt Siri, including expanding Private Cloud Compute to include Nvidia chips running in Google data centers, as well as a 20 billion parameter on-device mixture-of-experts model that selects the expert on a per-query basis (as opposed to on a per-token basis) so that it can run in an iPhone’s limited memory.
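To make the per-query versus per-token distinction concrete, here is a minimal sketch in Python. It is not Apple's implementation: the expert count, the sizes in gigabytes, the router scores, and every function name are hypothetical, and real mixture-of-experts models typically route at each layer and often pick more than one expert per token. The point is only that per-token routing can touch many experts over the course of one query, while per-query routing commits to a single expert, so only that expert's weights need to sit in memory.

```python
# Hypothetical sizes, chosen only for illustration.
SHARED_GB = 1.0   # layers that are always resident
EXPERT_GB = 2.0   # weights for a single expert


def best_expert(scores):
    """Return the index of the highest-scoring expert."""
    return max(range(len(scores)), key=scores.__getitem__)


def route_per_token(token_scores):
    """Per-token routing: every token picks its own expert."""
    return [best_expert(scores) for scores in token_scores]


def route_per_query(token_scores):
    """Per-query routing: sum the scores over the query, pick one expert."""
    num_experts = len(token_scores[0])
    totals = [0.0] * num_experts
    for scores in token_scores:
        for expert, score in enumerate(scores):
            totals[expert] += score
    return best_expert(totals)


def resident_memory_gb(experts_used):
    """Memory needed to hold the shared layers plus each distinct expert used."""
    return SHARED_GB + len(set(experts_used)) * EXPERT_GB


# Made-up router scores: one row per token, one column per expert.
token_scores = [
    [0.1, 0.7, 0.2],  # token 0 prefers expert 1
    [0.6, 0.3, 0.1],  # token 1 prefers expert 0
    [0.2, 0.5, 0.3],  # token 2 prefers expert 1
    [0.1, 0.3, 0.6],  # token 3 prefers expert 2
]

per_token = route_per_token(token_scores)   # [1, 0, 1, 2]
per_query = route_per_query(token_scores)   # 1

print(resident_memory_gb(per_token))    # 7.0: three distinct experts resident
print(resident_memory_gb([per_query]))  # 3.0: one expert resident
```

In this toy example the per-token router touches all three experts across four tokens, while the per-query router commits to expert 1 for the whole request; the trade-off is that tokens 1 and 3 no longer get their individually preferred expert.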

The key strategic takeaway of these implementation details, however, is the centrality of the iPhone. Microsoft’s Project Solara obviously makes sense for Microsoft given the fact that the company missed out on mobile, but it also fits with the infrastructure of AI, which is in the cloud, and increasingly about compute happening without a human in the loop. Apple, in contrast, is heavily incentivized to preserve the iPhone’s importance, and by extension, to focus on use cases organized around human interaction.

However, it’s too simplistic to reduce these approaches to a cynical analysis of incentives; both make sense in their own right. What makes me intrigued about Project Solara is the fact that Microsoft is positioning it as purely an enterprise play, which is important because an enterprise has context about the work being done, making it more viable to build long-running agents — which the enterprise is willing to pay for. That context would be far more difficult to build for consumers, given the need to tie together a huge number of services to get a coherent set of data over which to operate. Indeed, the only entities that can probably pull that off are Google and Apple via Android and iOS, respectively — and Google is always going to be focused more on its cloud services as the point of integration instead of the device.

That leaves Apple as the only company truly (dare I say it?) thinking differently. And yes, the iPhone as the true core of Siri (which will work across your devices, but get its differentiated context first-and-foremost from your iPhone) just so happens to perfectly align with Apple’s business model and desire to not spend billions in capex, but that doesn’t mean it’s the wrong approach. You’ll still be able to access, on your phone, everything that other companies are building with all of that capex; you’ll just have to use an app. If you need to find something personal, or work across apps, Siri will be the only one that can pull it off, as long as it’s not vaporware (and it appears the second time is the charm).