Apple’s ‘Gesture control of multimedia editing applications’ patent

Today Apple was awarded an interesting patent. It was filed in 2004, so developments over the intervening years may have superseded some of its concepts. However, it shows how gestures can be used in situations without multitouch input.

Here’s a long edited excerpt from Apple’s patent:

The first part explains why using gestures is a good idea:

The use of gestures to control a multimedia editing application provides a more efficient and easier to use interface paradigm over conventional keyboard and iconic interfaces.

First, since the gestures can replace individual icons, less screen space is required for displaying icons, and thereby more screen space is available to display the multimedia object itself. Indeed, the entire screen can be devoted to displaying the multimedia object (e.g., a full screen video), and yet the user can still control the application through the gestures.

Second, because the user effects the gestures with the existing pointing device, there is no need for the user to move one of his hands back and forth between the pointing device and keyboard as may be required with keystroke combinations. Rather, the user can fluidly input gestures with the pointing device in coordination with directly manipulating elements of the multimedia object by clicking and dragging. Nor is the user required to move the cursor to a particular portion of the screen in order to input the gestures, as is required with iconic input.

Third, the gestures provide a more intuitive connection between the form of the gesture and the associated function, as the shape of the gesture may be related to the meaning of the function.

Fourth, whereas there is a relatively limited number of available keystroke combinations–since many keystroke combinations may already be assigned to the operating system, for example–there is a much larger set of available gestures that can be defined, and thus the user can control more of the application through gestures than through keystrokes.


FIG. 1: The user interface includes three primary regions, the canvas (102), the timing panel (106), and the file browser (110). The canvas is used to display the objects as they are being manipulated and created by the user, and may generally be characterized as a graphics window. In the example of FIG. 1, there are shown three multimedia objects (104), Square A, Circle C, and Star B, which will be referred to as such throughout this disclosure when necessary to reference a particular object. In the preferred embodiment, the multimedia application is a non-linear video editing application, and allows the creation of multimedia presentations, including videos. Accordingly, the objects displayed in the canvas at any given time represent the “current” time of the multimedia presentation.

FIG. 2 illustrates the nomenclature that will be used to describe a gesture. A gesture has a starting point (204) and an ending point (206), and is preferably a single continuous stroke between these points. The gesture can be input by the user via any type of pointing device, including a mouse, pen, fingertip, trackball, joystick and the like.


FIG. 3a shows the first of the transport gestures, the play forward gesture and the play reverse gesture.

FIGS. 3b-3d illustrate the effect of the play forward gesture, when input into the timing panel.

For key combinations, the following modifier keys may be used in conjunction with many of the gestures that are described herein: the SHIFT key, the COMMAND key, and the ALT key. 

In the context of the play forward or play reverse commands, these modifier keys have the following effects: 

SHIFT: Play forwards/reverse from start or end of multimedia presentation. 

COMMAND: Play the currently selected object from its In and Out points, according to the direction of play of the gesture. 

ALT: Play the multimedia presentation in a continuous loop.
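To make the modifier-key behavior concrete, here is a minimal sketch of how an application might turn a play forward/reverse gesture plus modifier keys into a playback request. Everything here is hypothetical: the `Modifier` flags, the `PlaybackRequest` type, the `selection` tuple, the assumption that an unmodified gesture plays from the current time marker, and the choice to let ALT combine with the other modifiers are illustrations, not details from the patent.

```python
from dataclasses import dataclass
from enum import Flag


class Modifier(Flag):
    """Modifier keys the patent lists for use alongside gestures."""
    NONE = 0
    SHIFT = 1
    COMMAND = 2
    ALT = 4


@dataclass
class PlaybackRequest:
    direction: int      # +1 = forward, -1 = reverse
    start: float        # time (seconds) where playback begins
    end: float          # time (seconds) where playback stops
    loop: bool = False  # True = repeat continuously


def play_command(direction, modifiers, current_time, project_length,
                 selection=None):
    """Build a playback request for a play forward/reverse gesture.

    `selection` is a hypothetical (in_point, out_point) tuple for the
    currently selected object, or None if nothing is selected.
    """
    if modifiers & Modifier.COMMAND and selection is not None:
        # COMMAND: play the selected object between its In and Out points,
        # in the direction the gesture was drawn.
        start, end = selection if direction > 0 else (selection[1], selection[0])
    elif modifiers & Modifier.SHIFT:
        # SHIFT: play from the start (forward) or end (reverse) of the presentation.
        start, end = (0.0, project_length) if direction > 0 else (project_length, 0.0)
    else:
        # Assumption: with no modifier, play from the current time marker.
        start = current_time
        end = project_length if direction > 0 else 0.0
    # ALT: loop continuously (combinable with the other modifiers in this sketch).
    return PlaybackRequest(direction, start, end,
                           loop=bool(modifiers & Modifier.ALT))
```

For example, `play_command(1, Modifier.SHIFT | Modifier.ALT, 5.0, 60.0)` asks for a looping play of the whole presentation from the beginning. Using a `Flag` enum keeps the modifiers combinable with `|`, mirroring how several keys can be held at once.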


FIG. 4 illustrates the pause play gesture. This gesture is used to pause the current playback, whether it is of a currently selected object, set of objects, a preview, audio playback, or the entire multimedia presentation. Again, this gesture’s geometric form, a downward stroked vertical line, is a mnemonic of the directionality of the function, as the notion of stopping or pausing something in motion can be imagined as a straight downward motion to hold the object in place. 
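Since a gesture is described as a single stroke from a starting point to an ending point, recognizing a simple shape like the pause gesture can be sketched with a little geometry. The following is one possible approach, not the patent's recognizer: it checks that the stroke is long enough, points downward within an angle tolerance, and stays close to the straight line between its endpoints. The function name and all thresholds are hypothetical, and it assumes screen coordinates where y increases downward.

```python
import math


def is_pause_gesture(points, min_length=40.0, max_angle_deg=15.0,
                     max_deviation=10.0):
    """Return True if a stroke looks like a downward vertical line.

    `points` is a list of (x, y) screen coordinates; the first point is the
    gesture's starting point and the last is its ending point. Assumes y
    increases downward. All thresholds are hypothetical.
    """
    if len(points) < 2:
        return False
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length < min_length or dy <= 0:
        return False  # too short, or not moving downward
    # Angle between the start-to-end vector and straight down.
    angle = math.degrees(math.atan2(abs(dx), dy))
    if angle > max_angle_deg:
        return False
    # Each intermediate point must lie near the start-end line, so curved
    # or bent strokes are rejected (perpendicular distance via cross product).
    for px, py in points[1:-1]:
        deviation = abs(dx * (py - y0) - dy * (px - x0)) / length
        if deviation > max_deviation:
            return False
    return True
```

A real recognizer would need to distinguish many shapes (semicircles, "N" and "Z" forms, and so on), which is why template-matching approaches are common; this sketch only shows the idea for the simplest case.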

Referring now to FIG. 5a, there is shown the frame forward gesture and the frame reverse gesture. These gestures are associated with the functions of a single frame advance in the forward or reverse directions respectively.

FIGS. 5b-5d illustrate the functionality of the frame forward gesture.



FIG. 6a shows the go to start of play range gesture and the go to end of play range gesture.


Referring to FIG. 7a there is shown a further pair of navigation gestures, the go to head gesture and go to tail gesture. These gestures are also visual mnemonic gestures, as they represent a directional movement to the beginning or ending of an object.


Referring to FIG. 8, there is shown a further pair of navigation gestures, the go to project start gesture 800 and the go to project end gesture 802.

Referring to FIG. 9a, the first pair of editing gestures is the group gesture and the ungroup gesture. 


In FIG. 10a, there is shown the file browser 110 at some level of a directory structure, so that the user sees at least some of the files in this directory. Three of the files, Files A, B, and E, have been selected by the user, as indicated by the selection boxes around each file. FIG. 10b illustrates the multimedia editing application receiving the group gesture from the user. FIG. 10c illustrates the result of the grouping function as applied by the multimedia editing application, showing that the application has caused the underlying file system to create a new folder, into which Files A, B, and E have been moved. The user can now perform standard file operations on the folder, such as rename it, move it, or delete it. Thus, in the file browser, the group gesture takes on the semantics of grouping a set of files into a common entity, which corresponds to the mental model of forming a folder containing the objects.
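The file-browser behavior of the group gesture boils down to two file-system operations: create a folder, then move the selected files into it. A minimal sketch using Python's standard library follows; the `group_files` name, the default "New Folder" name, and the numbering used to avoid name clashes are assumptions, since the patent does not say how the new folder is named.

```python
import os
import shutil


def group_files(directory, selected, folder_name="New Folder"):
    """Move the selected files in `directory` into a newly created folder.

    Returns the path of the new folder. The default folder name and the
    clash-avoidance scheme are hypothetical.
    """
    target = os.path.join(directory, folder_name)
    n = 2
    while os.path.exists(target):
        # Hypothetical clash handling: "New Folder 2", "New Folder 3", ...
        target = os.path.join(directory, f"{folder_name} {n}")
        n += 1
    os.mkdir(target)
    for name in selected:
        shutil.move(os.path.join(directory, name), os.path.join(target, name))
    return target
```

With Files A, B, C and E in a directory, `group_files(path, ["A", "B", "E"])` leaves only File C and the new folder at the top level, matching FIG. 10c.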


FIG. 11a illustrates the next pair of editing gestures, the set local in gesture and the set local out gesture. The set local in gesture establishes an In point for a currently selected multimedia object at the current time marker (and hence the current frame). The set local out gesture sets the Out point for the currently selected multimedia object at the current time marker.


The set local in gesture and set local out gesture are also context sensitive. The user can input these gestures while viewing a preview of a file from the file browser, or when a particular file is selected in the browser. If the user inputs the set local in gesture when viewing or selecting a file, then the object represented by the file is inserted into the multimedia presentation and its In point is set at the current time. FIGS. 12a-12c illustrate this function. In FIG. 12a there is shown the user interface of the multimedia editing application as the user reviews a preview of a file, File F, selected in the browser. The preview shows object F in the preview window. In FIG. 12b, the user has input the set local in gesture into the timing panel while viewing the preview of this object.

[Figure: patent 8,448,083, FIG. 12c]

FIG. 12c illustrates the results of this gesture, whereby the multimedia editing application has instantiated object F into the canvas (shown in front of the other objects, and behind the preview window), and has likewise added an object bar for object F into the last track of the timing panel. The In point for this object is set to the current time marker, and its extent lasts for the duration of the object. The user can now further adjust the In and Out points of this object as desired.

[Figure: patent 8,448,083, FIGS. 13a-13d]
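The context sensitivity of the set local in gesture can be sketched as a small dispatch: if a browser file is being previewed or selected, insert it into the presentation starting at the current time; otherwise, move the In point of the selected object bar. The data model below (`ObjectBar`, `Timeline`, one object bar per track, the `previewed` tuple) is a hypothetical simplification, and ignoring a gesture that would put the In point at or after the Out point is an added assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectBar:
    name: str
    in_point: float   # seconds
    out_point: float  # seconds


@dataclass
class Timeline:
    current_time: float = 0.0
    # Simplification: each track holds a single object bar.
    tracks: list = field(default_factory=list)


def set_local_in(timeline, selected_bar=None, previewed=None):
    """Handle the set local in gesture in a context-sensitive way.

    `selected_bar`: an ObjectBar already in the presentation, or None.
    `previewed`: hypothetical (name, duration) of a file previewed or
    selected in the file browser, or None.
    Returns the affected ObjectBar, or None if nothing changed.
    """
    t = timeline.current_time
    if previewed is not None:
        # Browser context: insert the object into the last track, with its
        # In point at the current time and its extent equal to its duration.
        name, duration = previewed
        bar = ObjectBar(name, t, t + duration)
        timeline.tracks.append(bar)
        return bar
    if selected_bar is not None and t < selected_bar.out_point:
        # Timing-panel context: set the selected object's In point.
        selected_bar.in_point = t
        return selected_bar
    return None
```

The same gesture therefore does two different things depending on where the user's attention is, which is the point the patent makes about context-sensitive gestures.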

The next pair of editing gestures are for changing the order (layer) of a selected object or set of objects. These are the up one level gesture 1300 and the down one level gesture 1302, as illustrated in FIG. 13a.


FIGS. 14a-14d illustrate the next pair of editing gestures, the set global marker gesture and the set local marker gesture.

[Figure: patent 8,448,083, FIGS. 14e-14g]

In FIG. 14e, the user has selected the object bar of object B, as indicated by the dotted outline of this object. Notice that the current time marker is currently within the extent of this object. In FIG. 14f, the user has input the set local marker gesture. In FIG. 14g, the multimedia editing application has responded by inserting a local time marker within the object bar for object B at the current time marker.

[Figure: patent 8,448,083, FIGS. 15a-15e]
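The set local marker behavior in FIGS. 14e-14g can be sketched as: add a marker at the current time to the selected object bar. The patent's example only shows the case where the current time falls inside the object's extent, so rejecting the gesture otherwise is an assumption here, as are the `ObjectBar` fields and the choice to store markers as absolute, sorted, de-duplicated times.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectBar:
    name: str
    in_point: float   # seconds
    out_point: float  # seconds
    local_markers: list = field(default_factory=list)  # absolute times


def set_local_marker(bar, current_time):
    """Insert a local marker into `bar` at the current time marker.

    Returns True if the marker is now present on the bar, False if the
    current time lies outside the object's extent (an assumption).
    """
    if not (bar.in_point <= current_time <= bar.out_point):
        return False
    if current_time not in bar.local_markers:
        bar.local_markers.append(current_time)
        bar.local_markers.sort()  # keep markers in time order
    return True
```

A global marker would differ only in where it is stored: on the presentation's timeline rather than on one object's bar.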

FIGS. 15a-15e illustrate the set play range start gesture and the set play range end gesture.


The next set of gestures comprises gestures associated with functions to control and manage the view of the various windows and components of the multimedia editing application user interface.

The first pair of these gestures is the zoom in gesture and the zoom out gesture, illustrated in FIG. 16a.

[Figure: patent 8,448,083, FIGS. 17a-17e]

FIGS. 17a-e illustrate the functionality of these gestures in the file browser.

[Figure: patent 8,448,083, FIGS. 18a-18c]

The zoom in gesture and zoom out gesture likewise operate in the timing panel, to change the resolution of the timescale for the timing panel.


In FIG. 19a, there is shown Star B in the canvas, and the user has input the zoom in gesture by drawing the gesture around Star B. FIG. 19b shows that, in response, the multimedia editing application zooms in on that object and resizes it so that Star B fills the window.
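Zooming so that an object "fills the window" is a small piece of arithmetic: pick a scale from the ratio of window size to object size, then offset the view so the object's center lands at the window's center. The sketch below assumes a view transform where a canvas point p is drawn at `p * scale + offset`, and that the object's bounds (for example, those of the shape the gesture was drawn around) are known. Taking the smaller of the two ratios, so the whole object stays visible with its aspect ratio preserved, is our interpretation of "fills the window".

```python
def zoom_to_object(window_w, window_h, obj_x, obj_y, obj_w, obj_h):
    """Return (scale, offset_x, offset_y) so the object fills the window.

    (obj_x, obj_y) is the object's top-left corner in canvas coordinates
    and (obj_w, obj_h) its size. A canvas point p maps to window
    coordinates as p * scale + offset (an assumed view model).
    """
    # The smaller ratio fits the whole object while preserving aspect ratio.
    scale = min(window_w / obj_w, window_h / obj_h)
    # Shift so the scaled object's center sits at the window's center.
    offset_x = window_w / 2 - (obj_x + obj_w / 2) * scale
    offset_y = window_h / 2 - (obj_y + obj_h / 2) * scale
    return scale, offset_x, offset_y
```

For an 800×600 window and a 200×100 object at (100, 100), this returns `(4.0, -400.0, -300.0)`: the object spans the full window width and is centered vertically.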

A related view management gesture is the interactive zoom gesture, illustrated in FIG. 20. This gesture places the multimedia editing application in an interactive zoom mode, whereby the user can dynamically and continuously zoom in and out. In this mode, the user can input a first gesture to variably zoom in, and a second gesture to variably zoom out. The particular gestures that control the zoom function can vary. In one embodiment, to control the zoom factor, the user drags the input device to the left or right.
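One way to implement "drag left or right to control the zoom factor" is an exponential mapping from horizontal drag distance to scale, so equal drag distances always produce equal relative zoom changes. This is a sketch under assumptions: the function name, the choice that dragging right zooms in, the `pixels_per_doubling` sensitivity, and the clamping limits are all hypothetical, since the patent only says the user drags left or right.

```python
def interactive_zoom(start_scale, drag_dx, pixels_per_doubling=200.0,
                     min_scale=0.05, max_scale=32.0):
    """Map a horizontal drag to a continuous zoom factor.

    `start_scale` is the scale when the drag began (1.0 == 100%) and
    `drag_dx` the horizontal distance dragged so far, in pixels.
    Dragging right (positive) zooms in, dragging left zooms out; every
    `pixels_per_doubling` pixels doubles or halves the scale.
    """
    scale = start_scale * 2.0 ** (drag_dx / pixels_per_doubling)
    # Clamp to a sensible range so the view never collapses or explodes.
    return max(min_scale, min(max_scale, scale))
```

A linear mapping would feel sluggish when zoomed far in and twitchy when zoomed far out; the exponential form avoids that, which is why it is a common choice for continuous zoom.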

The next view navigation gesture is the pan gesture, as illustrated in FIG. 21. The geometric form of this gesture, an upper semicircle with an ending stroke drawn across the diameter of the semicircle and extending beyond the perimeter of the semicircle, connotes a “mitt,” used to hold and move an object. This gesture places the current window pane in a pan (or scroll) mode, allowing the user to interactively pan the content of the window pane by holding down a selection button on the input device and dragging the current window in any desired direction. The multimedia editing application can change the cursor to reflect the pan mode, for example using a “hand” cursor.

A further view navigation gesture is the home view gesture, as illustrated in FIG. 22a. Generally, the home view gesture resets the view of the window in which the gesture is received to a native view, which includes rescaling the resolution of the window to a native resolution (typically 100%), and resetting the offset of the window to a native offset position (e.g., a center of the window; a first element of a list; a top, left corner; a beginning time point; or other user designated native offset). In this regard, the home view gesture is another context sensitive gesture. When input into the canvas, the multimedia editing application responds by resetting the canvas view to its native resolution (e.g., 100% resolution) and resetting to a native view by centering the view at the (X,Y) center of the window. FIGS. 22b-c illustrate this functionality.
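The home view reset can be sketched as a function of the window that received the gesture: every context returns to 100% scale, but the native offset differs. In this sketch the canvas centers its content in the window, while any other pane (for example a timing panel starting at its beginning time point, or a file list starting at its first element) resets to offset zero. The `View` type, the context strings, and the `p * scale + offset` view model are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class View:
    scale: float     # 1.0 == 100%
    offset_x: float  # window position of the content's origin
    offset_y: float


def home_view(context, window_w, window_h, content_w, content_h):
    """Reset a pane to its native view (hypothetical context names).

    A content point p is drawn at p * scale + offset.
    """
    scale = 1.0  # native resolution: 100%
    if context == "canvas":
        # Canvas: center the content in the window at 100%.
        return View(scale,
                    (window_w - content_w * scale) / 2,
                    (window_h - content_h * scale) / 2)
    # Other panes: native offset is the top-left / beginning position.
    return View(scale, 0.0, 0.0)
```

For a 1920×1080 frame in an 800×600 canvas, `home_view("canvas", 800, 600, 1920, 1080)` yields `View(1.0, -560.0, -240.0)`, so the frame's center sits at the window's center.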


The next view management gesture is the fit to window gesture, as illustrated in FIG. 23a. 



The next view management gestures are for opening and closing various ones of the window panes of the user interface. FIG. 24a illustrates the open/close bottom panel gesture and open/close top panel gesture. FIG. 25a illustrates the open/close left panel gesture and the open/close right panel gesture. Each of these gestures is a visual mnemonic of the motion of a hand moving to open or close a window panel. The forms of these gestures can be described, for purposes of explanation only, as “N” or “Z” shaped.


FIGS. 25b-c illustrate the functionality of the open/close left panel gesture 2500.


The following gestures are associated with general functionality of the multimedia editing application. FIG. 26 illustrates the delete gesture. 

FIG. 27 illustrates the select mode gesture. This gesture causes the multimedia editing application to change from whatever is the current mode of operation (e.g., pan mode, interactive zoom mode, etc.) to the select mode in which the user can select objects, menu items, controls, toolbars and the like, for editing and direct manipulation.

FIG. 28 illustrates the show/hide “heads up” display gesture. This gesture turns on and off a display area used to display a confirmation message that a particular gesture has been successfully input.

FIG. 29 again illustrates the basic form of the user interface, including the canvas, timing panel, and file browser. When the user inputs any gesture in any of the window panels, the multimedia editing application can be configured to display a confirmation message, such as the name of the gesture, or other visually indicative confirmation information, such as a green light, an “OK” symbol, or the like.

Already in Apple Motion

As well as defining some interesting gestures, the patent shows gestures that Apple has since used in Apple Motion. Sadly, Motion’s gestures only work with Wacom tablets.

[Image: Motion documentation on gestures]

Hopefully future versions of Motion and Final Cut Pro will add gestures to multitouch input devices.

