whateverthing.com

How to Talk to Apples

A Few Words on Apple's Secret Killer Feature

I've managed to go a number of years in the software industry without having any repetitive strain issues. That all seems to have changed this month, so I've been looking for ways to ease the burden on my wrists and arms.

Enter: macOS voice control

This built-in accessibility tool is free to use and runs locally* on-device, without sending my audio to a remote server. (* requires Apple silicon)

When I first noticed this feature, many years ago, I completely ignored it. At that time, it required remote server voice analysis in order to operate. So if you're wondering what the Secret Killer Feature actually is here, it's the fact that this is now on-device functionality, at least for Apple silicon devices.

It's not perfect, but I'm getting used to its quirks and foibles.

How it works

To begin with, it's important to note that voice control is completely separate from Siri. You don't have to say something like "Hey Siri" to activate it; when enabled, it is always listening.

It has two modes: dictation mode and command mode. Command mode is always enabled, unless you tell voice control to "go to sleep". Dictation mode is activated by saying "dictation mode". It can be deactivated by saying "command mode".
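For the programmers in the audience, the mode switching described above can be pictured as a tiny state machine. This is only a toy sketch of my mental model, not how Apple implements it. The names `Mode`, `next_mode`, and `replay` are made up for illustration, and the "wake up" phrase and the idea that waking returns you to command mode are assumptions on my part.

```python
from enum import Enum


class Mode(Enum):
    COMMAND = "command"
    DICTATION = "dictation"
    SLEEPING = "sleeping"


def next_mode(mode: Mode, phrase: str) -> Mode:
    """Return the mode after hearing `phrase` (toy model, not Apple's logic)."""
    spoken = phrase.strip().lower()
    if mode is Mode.SLEEPING:
        # Assumption: while asleep, only a wake phrase is honoured,
        # and waking lands back in command mode.
        return Mode.COMMAND if spoken == "wake up" else Mode.SLEEPING
    if spoken == "go to sleep":
        return Mode.SLEEPING
    if spoken == "dictation mode":
        return Mode.DICTATION
    if spoken == "command mode":
        return Mode.COMMAND
    # Any other phrase leaves the mode unchanged.
    return mode


def replay(phrases, start=Mode.COMMAND):
    """Feed a sequence of spoken phrases through the state machine."""
    mode = start
    for phrase in phrases:
        mode = next_mode(mode, phrase)
    return mode
```

Calling `replay(["dictation mode", "command mode"])` ends up back in `Mode.COMMAND`, which matches the round trip described above.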

Many commands are supported, and you can see the list of commands by saying "show commands" or "show me what to say".

I've been using voice control on macOS and iOS for about two weeks now. I am by no means an expert, but I am getting by. In fact, this blog post was written almost entirely in dictation mode. So far, it is going well. I have only had to make a handful of corrections.

For example, because I work in the photography business, a word that I have said frequently these past few weeks is "prince". I mean, "prints". Or, "SmugMug", which sometimes comes out as "smoke bear" for unknown reasons. Recently, I tried to say "about", and voice control thought I said "up boat".

One benefit of these silly mistakes is that it gives me a chance to learn the various commands to select and manipulate text, and move the cursor.

In order to get the word "prints" identified accurately above, I tried to add it to the special list of vocabulary words. Even then, it often still wants to type "Prince" instead of "prints". So… I'm not sure exactly what the vocabulary list is supposed to do.

iOS versus macOS

Actually, I should point out that I am dictating this post specifically on an iOS device. Navigating around an iOS device with voice control is fairly easy. This is mostly thanks to "overlays", which are available in the form of names, numbers, and grids. I can interact with the device almost as though I was using it normally. Except, of course, more slowly.

Names

Saying "show names" causes little captions to pop up beside interactive elements, showing a name by which I can interact with that element. In Safari, for example, I can say "tap refresh" to refresh the current page. Or "tap show bookmarks" to display my bookmarks, followed by "tap whateverthing" to go to this blog. It is quite a natural way to navigate the system.

Numbers

Saying "show numbers" shows little number captions beside interactive elements. By saying the number, I can activate that element.

The Grid

For more granular control I can say "show grid". This divides the screen into columns and rows. I can say the number of one of the resulting squares, and it will zoom in and show a smaller grid for that square. Then, I can say "click five", or similar, to simulate pressing it.
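The zooming grid is essentially repeated subdivision: each spoken number narrows the active rectangle to one cell, and the final tap lands in the middle of whatever is left. Here is a rough sketch of that arithmetic. The function name `grid_target`, the 3x3 default, and the left-to-right, top-to-bottom numbering are all assumptions for illustration; the real overlay's dimensions depend on the device and settings.

```python
def grid_target(width, height, picks, cols=3, rows=3):
    """Return the centre (x, y) of the cell reached by successive picks.

    Cells are numbered from 1, left to right and then top to bottom
    (an assumed numbering; the real overlay may differ).
    """
    left, top = 0.0, 0.0
    w, h = float(width), float(height)
    for pick in picks:
        if not 1 <= pick <= cols * rows:
            raise ValueError(f"cell {pick} is not in a {cols}x{rows} grid")
        # Convert the 1-based cell number into a zero-based row and column.
        row, col = divmod(pick - 1, cols)
        # Shrink the active rectangle to the chosen cell.
        w, h = w / cols, h / rows
        left, top = left + col * w, top + row * h
    # The simulated tap goes to the centre of the final rectangle.
    return (left + w / 2, top + h / 2)
```

On a hypothetical 900 by 900 screen, picking cell 1 and then cell 9 narrows things down to a 100-pixel square whose centre is at (250, 250). Each extra number makes the target smaller by a factor of the grid size, which is why two or three numbers are usually enough.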

macOS limitations

One thing I've noticed is that the level of control is quite different between iOS and macOS. On macOS, the "show names" feature does not seem to exist. Additionally, many of the text selection features seen on iOS are missing.

Also on macOS, I am seeing a greater number of applications that do not have accessibility integrated into them. Fewer applications have support for the "show numbers" feature - making their interfaces less accessible.

This basically leaves me with grid overlay interaction and mouse cursor pixel manipulation ("move cursor 500 pixels left", "click") as my main options for interacting with macOS. Not great. But, admittedly, not terrible, for being free.

Quirks and foibles

There are some bugs.

Sometimes, on macOS, having voice control enabled can cause weird behaviours in menus. If you click to pull down a menu, it might display for half a second and then vanish. This seems somehow related to the "show numbers" feature. I suspect that the way it scans interfaces to find interactive elements is triggering some kind of menu close event. Saying "hide numbers" can help stop this from happening.

Inaccessible Apps

I mentioned that some apps don't seem particularly accessible. When in "show numbers" mode, this manifests as numbers being visible on the window elements (minimize, maximize, close, etc.), but no numbers being visible within the application window itself.

Inscrutable Scrolling

Also, the nature of some GUI interfaces means that multiple scrollable panes are visible at the same time. This can confuse the "scroll down" command, resulting in both panes scrolling at the same time - or neither pane scrolling at all.

Some apps don't even support the "scroll down" command, so for those apps on iOS you have to use "swipe up". Reversing the axis can be a tiny bit confusing. I haven't found a corresponding fix on macOS.

Liberal Listening

You have to be careful not to have voice control enabled when watching movies or YouTube clips. Or in Zoom calls. Or when people walk past while chatting. Or if your house has ghosts. Or if your employer ordered you to Return-To-Office in an open-plan office. Voice control is very eager to turn all of these inputs into commands and dictated text.

Luckily, I'm basically a hermit, so this doesn't affect me very often. But if you're not careful, NoHo Hank might type messages to your coworkers. I've seen it happen.

CPU/Battery Usage

Voice control can be intense on CPU and battery usage, even if you use the "go to sleep" command to stop it from listening to every single thing. From what I've seen, it can use up to 30% of the CPU when idle.

Turning It Off

On iOS, you can easily turn it off by saying "turn off voice control". And more importantly, you can use Siri to reactivate it. On desktop macOS, you can turn it off just as easily, but if you don't have Siri enabled (and I don't, because being unable to change the wake word means that it can't differentiate commands meant for my iPhone), it can be clunky to turn it back on.

Don't make Solomon Epstein's mistake: if you can get by with just using the "go to sleep" command, you'll be better able to recover from problems like the classic "bone-crushing high-G burn" situation.

Final thoughts

Overall, it has been a fun experiment. I imagine that superior tooling and a greater need would allow someone to become very adept at navigating interfaces. But inaccessible applications will slow people down quite a bit, so make sure to keep accessibility features in mind when developing your applications.

This has even helped me get more writing done. Unfortunately, evidence suggests it will not help very much with programming, particularly in PHP. (Oh, neat – it recognized the name PHP. However, I had to try five separate times for it to recognize the word "neat". 🙄)

Anyway, that's all for now. If you've got a Mac or an iPhone, give voice control a try. And let me know if you would like to see some video demos of how I am using it.

Published: September 9, 2023

Categories: Reviews

Tags: opinion, utilities, fun, mobile