summaryrefslogtreecommitdiffstats
path: root/Documentation/userspace-api/media/v4l/dev-encoder.rst
blob: fb44f20924de9bdf5a7a512e5467788117433510 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
.. This file is dual-licensed: you can use it either under the terms
.. of the GPL 2.0 or the GFDL 1.1+ license, at your option. Note that this
.. dual licensing only applies to this file, and not this project as a
.. whole.
..
.. a) This file is free software; you can redistribute it and/or
..    modify it under the terms of the GNU General Public License as
..    published by the Free Software Foundation version 2 of
..    the License.
..
..    This file is distributed in the hope that it will be useful,
..    but WITHOUT ANY WARRANTY; without even the implied warranty of
..    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
..    GNU General Public License for more details.
..
.. Or, alternatively,
..
.. b) Permission is granted to copy, distribute and/or modify this
..    document under the terms of the GNU Free Documentation License,
..    Version 1.1 or any later version published by the Free Software
..    Foundation, with no Invariant Sections, no Front-Cover Texts
..    and no Back-Cover Texts. A copy of the license is included at
..    Documentation/userspace-api/media/fdl-appendix.rst.
..
.. TODO: replace it to GPL-2.0 OR GFDL-1.1-or-later WITH no-invariant-sections

.. _encoder:

*************************************************
Memory-to-Memory Stateful Video Encoder Interface
*************************************************

A stateful video encoder takes raw video frames in display order and encodes
them into a bytestream. It generates complete chunks of the bytestream, including
all metadata, headers, etc. The resulting bytestream does not require any
further post-processing by the client.

Performing software stream processing, header generation etc. in the driver
in order to support this interface is strongly discouraged. In case such
operations are needed, use of the Stateless Video Encoder Interface (in
development) is strongly advised.

Conventions and Notations Used in This Document
===============================================

1. The general V4L2 API rules apply if not specified in this document
   otherwise.

2. The meaning of words "must", "may", "should", etc. is as per `RFC
   2119 <https://tools.ietf.org/html/rfc2119>`_.

3. All steps not marked "optional" are required.

4. :c:func:`VIDIOC_G_EXT_CTRLS` and :c:func:`VIDIOC_S_EXT_CTRLS` may be used
   interchangeably with :c:func:`VIDIOC_G_CTRL` and :c:func:`VIDIOC_S_CTRL`,
   unless specified otherwise.

5. Single-planar API (see :ref:`planar-apis`) and applicable structures may be
   used interchangeably with multi-planar API, unless specified otherwise,
   depending on encoder capabilities and following the general V4L2 guidelines.

6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
   [0..2]: i = 0, 1, 2.

7. Given an ``OUTPUT`` buffer A, then A' represents a buffer on the ``CAPTURE``
   queue containing data that resulted from processing buffer A.

Glossary
========

Refer to :ref:`decoder-glossary`.

State Machine
=============

.. kernel-render:: DOT
   :alt: DOT digraph of encoder state machine
   :caption: Encoder State Machine

   digraph encoder_state_machine {
       node [shape = doublecircle, label="Encoding"] Encoding;

       node [shape = circle, label="Initialization"] Initialization;
       node [shape = circle, label="Stopped"] Stopped;
       node [shape = circle, label="Drain"] Drain;
       node [shape = circle, label="Reset"] Reset;

       node [shape = point]; qi
       qi -> Initialization [ label = "open()" ];

       Initialization -> Encoding [ label = "Both queues streaming" ];

       Encoding -> Drain [ label = "V4L2_ENC_CMD_STOP" ];
       Encoding -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];
       Encoding -> Stopped [ label = "VIDIOC_STREAMOFF(OUTPUT)" ];
       Encoding -> Encoding;

       Drain -> Stopped [ label = "All CAPTURE\nbuffers dequeued\nor\nVIDIOC_STREAMOFF(OUTPUT)" ];
       Drain -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];

       Reset -> Encoding [ label = "VIDIOC_STREAMON(CAPTURE)" ];
       Reset -> Initialization [ label = "VIDIOC_REQBUFS(OUTPUT, 0)" ];

       Stopped -> Encoding [ label = "V4L2_ENC_CMD_START\nor\nVIDIOC_STREAMON(OUTPUT)" ];
       Stopped -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];
   }

Querying Capabilities
=====================

1. To enumerate the set of coded formats supported by the encoder, the
   client may call :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.

   * The full set of supported formats will be returned, regardless of the
     format set on ``OUTPUT``.

2. To enumerate the set of supported raw formats, the client may call
   :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.

   * Only the formats supported for the format currently active on ``CAPTURE``
     will be returned.

   * In order to enumerate raw formats supported by a given coded format,
     the client must first set that coded format on ``CAPTURE`` and then
     enumerate the formats on ``OUTPUT``.

3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
   resolutions for a given format, passing the desired pixel format in
   :c:type:`v4l2_frmsizeenum` ``pixel_format``.

   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a coded pixel
     format will include all possible coded resolutions supported by the
     encoder for the given coded pixel format.

   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a raw pixel format
     will include all possible frame buffer resolutions supported by the
     encoder for the given raw pixel format and coded format currently set on
     ``CAPTURE``.

4. The client may use :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` to detect supported
   frame intervals for a given format and resolution, passing the desired pixel
   format in :c:type:`v4l2_frmsizeenum` ``pixel_format`` and the resolution
   in :c:type:`v4l2_frmsizeenum` ``width`` and :c:type:`v4l2_frmsizeenum`
   ``height``.

   * Values returned by :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` for a coded pixel
     format and coded resolution will include all possible frame intervals
     supported by the encoder for the given coded pixel format and resolution.

   * Values returned by :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` for a raw pixel
     format and resolution will include all possible frame intervals supported
     by the encoder for the given raw pixel format and resolution and for the
     coded format, coded resolution and coded frame interval currently set on
     ``CAPTURE``.

   * Support for :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` is optional. If it is
     not implemented, then there are no special restrictions other than the
     limits of the codec itself.

5. Supported profiles and levels for the coded format currently set on
   ``CAPTURE``, if applicable, may be queried using their respective controls
   via :c:func:`VIDIOC_QUERYCTRL`.

6. Any additional encoder capabilities may be discovered by querying
   their respective controls.

Initialization
==============

1. Set the coded format on the ``CAPTURE`` queue via :c:func:`VIDIOC_S_FMT`.

   * **Required fields:**

     ``type``
         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``.

     ``pixelformat``
         the coded format to be produced.

     ``sizeimage``
         desired size of ``CAPTURE`` buffers; the encoder may adjust it to
         match hardware requirements.

     ``width``, ``height``
         ignored (read-only).

     other fields
         follow standard semantics.

   * **Return fields:**

     ``sizeimage``
         adjusted size of ``CAPTURE`` buffers.

     ``width``, ``height``
         the coded size selected by the encoder based on current state, e.g.
         ``OUTPUT`` format, selection rectangles, etc. (read-only).

   .. important::

      Changing the ``CAPTURE`` format may change the currently set ``OUTPUT``
      format. How the new ``OUTPUT`` format is determined is up to the encoder
      and the client must ensure it matches its needs afterwards.

2. **Optional.** Enumerate supported ``OUTPUT`` formats (raw formats for
   source) for the selected coded format via :c:func:`VIDIOC_ENUM_FMT`.

   * **Required fields:**

     ``type``
         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``.

     other fields
         follow standard semantics.

   * **Return fields:**

     ``pixelformat``
         raw format supported for the coded format currently selected on
         the ``CAPTURE`` queue.

     other fields
         follow standard semantics.

3. Set the raw source format on the ``OUTPUT`` queue via
   :c:func:`VIDIOC_S_FMT`.

   * **Required fields:**

     ``type``
         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``.

     ``pixelformat``
         raw format of the source.

     ``width``, ``height``
         source resolution.

     other fields
         follow standard semantics.

   * **Return fields:**

     ``width``, ``height``
         may be adjusted to match encoder minimums, maximums and alignment
         requirements, as required by the currently selected formats, as
         reported by :c:func:`VIDIOC_ENUM_FRAMESIZES`.

     other fields
         follow standard semantics.

   * Setting the ``OUTPUT`` format will reset the selection rectangles to their
     default values, based on the new resolution, as described in the next
     step.

4. Set the raw frame interval on the ``OUTPUT`` queue via
   :c:func:`VIDIOC_S_PARM`. This also sets the coded frame interval on the
   ``CAPTURE`` queue to the same value.

   * ** Required fields:**

     ``type``
	 a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``.

     ``parm.output``
	 set all fields except ``parm.output.timeperframe`` to 0.

     ``parm.output.timeperframe``
	 the desired frame interval; the encoder may adjust it to
	 match hardware requirements.

   * **Return fields:**

     ``parm.output.timeperframe``
	 the adjusted frame interval.

   .. important::

      Changing the ``OUTPUT`` frame interval *also* sets the framerate that
      the encoder uses to encode the video. So setting the frame interval
      to 1/24 (or 24 frames per second) will produce a coded video stream
      that can be played back at that speed. The frame interval for the
      ``OUTPUT`` queue is just a hint, the application may provide raw
      frames at a different rate. It can be used by the driver to help
      schedule multiple encoders running in parallel.

      In the next step the ``CAPTURE`` frame interval can optionally be
      changed to a different value. This is useful for off-line encoding
      were the coded frame interval can be different from the rate at
      which raw frames are supplied.

   .. important::

      ``timeperframe`` deals with *frames*, not fields. So for interlaced
      formats this is the time per two fields, since a frame consists of
      a top and a bottom field.

   .. note::

      It is due to historical reasons that changing the ``OUTPUT`` frame
      interval also changes the coded frame interval on the ``CAPTURE``
      queue. Ideally these would be independent settings, but that would
      break the existing API.

5. **Optional** Set the coded frame interval on the ``CAPTURE`` queue via
   :c:func:`VIDIOC_S_PARM`. This is only necessary if the coded frame
   interval is different from the raw frame interval, which is typically
   the case for off-line encoding. Support for this feature is signalled
   by the :ref:`V4L2_FMT_FLAG_ENC_CAP_FRAME_INTERVAL <fmtdesc-flags>` format flag.

   * ** Required fields:**

     ``type``
	 a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``.

     ``parm.capture``
	 set all fields except ``parm.capture.timeperframe`` to 0.

     ``parm.capture.timeperframe``
	 the desired coded frame interval; the encoder may adjust it to
	 match hardware requirements.

   * **Return fields:**

     ``parm.capture.timeperframe``
	 the adjusted frame interval.

   .. important::

      Changing the ``CAPTURE`` frame interval sets the framerate for the
      coded video. It does *not* set the rate at which buffers arrive on the
      ``CAPTURE`` queue, that depends on how fast the encoder is and how
      fast raw frames are queued on the ``OUTPUT`` queue.

   .. important::

      ``timeperframe`` deals with *frames*, not fields. So for interlaced
      formats this is the time per two fields, since a frame consists of
      a top and a bottom field.

   .. note::

      Not all drivers support this functionality, in that case just set
      the desired coded frame interval for the ``OUTPUT`` queue.

      However, drivers that can schedule multiple encoders based on the
      ``OUTPUT`` frame interval must support this optional feature.

6. **Optional.** Set the visible resolution for the stream metadata via
   :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue if it is desired
   to be different than the full OUTPUT resolution.

   * **Required fields:**

     ``type``
         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``.

     ``target``
         set to ``V4L2_SEL_TGT_CROP``.

     ``r.left``, ``r.top``, ``r.width``, ``r.height``
         visible rectangle; this must fit within the `V4L2_SEL_TGT_CROP_BOUNDS`
         rectangle and may be subject to adjustment to match codec and
         hardware constraints.

   * **Return fields:**

     ``r.left``, ``r.top``, ``r.width``, ``r.height``
         visible rectangle adjusted by the encoder.

   * The following selection targets are supported on ``OUTPUT``:

     ``V4L2_SEL_TGT_CROP_BOUNDS``
         equal to the full source frame, matching the active ``OUTPUT``
         format.

     ``V4L2_SEL_TGT_CROP_DEFAULT``
         equal to ``V4L2_SEL_TGT_CROP_BOUNDS``.

     ``V4L2_SEL_TGT_CROP``
         rectangle within the source buffer to be encoded into the
         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``.

         .. note::

            A common use case for this selection target is encoding a source
            video with a resolution that is not a multiple of a macroblock,
            e.g.  the common 1920x1080 resolution may require the source
            buffers to be aligned to 1920x1088 for codecs with 16x16 macroblock
            size. To avoid encoding the padding, the client needs to explicitly
            configure this selection target to 1920x1080.

   .. warning::

      The encoder may adjust the crop/compose rectangles to the nearest
      supported ones to meet codec and hardware requirements. The client needs
      to check the adjusted rectangle returned by :c:func:`VIDIOC_S_SELECTION`.

7. Allocate buffers for both ``OUTPUT`` and ``CAPTURE`` via
   :c:func:`VIDIOC_REQBUFS`. This may be performed in any order.

   * **Required fields:**

     ``count``
         requested number of buffers to allocate; greater than zero.

     ``type``
         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` or
         ``CAPTURE``.

     other fields
         follow standard semantics.

   * **Return fields:**

     ``count``
          actual number of buffers allocated.

   .. warning::

      The actual number of allocated buffers may differ from the ``count``
      given. The client must check the updated value of ``count`` after the
      call returns.

   .. note::

      To allocate more than the minimum number of OUTPUT buffers (for pipeline
      depth), the client may query the ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
      control to get the minimum number of buffers required, and pass the
      obtained value plus the number of additional buffers needed in the
      ``count`` field to :c:func:`VIDIOC_REQBUFS`.

   Alternatively, :c:func:`VIDIOC_CREATE_BUFS` can be used to have more
   control over buffer allocation.

   * **Required fields:**

     ``count``
         requested number of buffers to allocate; greater than zero.

     ``type``
         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``.

     other fields
         follow standard semantics.

   * **Return fields:**

     ``count``
         adjusted to the number of allocated buffers.

8. Begin streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
   :c:func:`VIDIOC_STREAMON`. This may be performed in any order. The actual
   encoding process starts when both queues start streaming.

.. note::

   If the client stops the ``CAPTURE`` queue during the encode process and then
   restarts it again, the encoder will begin generating a stream independent
   from the stream generated before the stop. The exact constraints depend
   on the coded format, but may include the following implications:

   * encoded frames produced after the restart must not reference any
     frames produced before the stop, e.g. no long term references for
     H.264/HEVC,

   * any headers that must be included in a standalone stream must be
     produced again, e.g. SPS and PPS for H.264/HEVC.

Encoding
========

This state is reached after the `Initialization` sequence finishes
successfully.  In this state, the client queues and dequeues buffers to both
queues via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following the
standard semantics.

The content of encoded ``CAPTURE`` buffers depends on the active coded pixel
format and may be affected by codec-specific extended controls, as stated
in the documentation of each format.

Both queues operate independently, following standard behavior of V4L2 buffer
queues and memory-to-memory devices. In addition, the order of encoded frames
dequeued from the ``CAPTURE`` queue may differ from the order of queuing raw
frames to the ``OUTPUT`` queue, due to properties of the selected coded format,
e.g. frame reordering.

The client must not assume any direct relationship between ``CAPTURE`` and
``OUTPUT`` buffers and any specific timing of buffers becoming
available to dequeue. Specifically:

* a buffer queued to ``OUTPUT`` may result in more than one buffer produced on
  ``CAPTURE`` (for example, if returning an encoded frame allowed the encoder
  to return a frame that preceded it in display, but succeeded it in the decode
  order; however, there may be other reasons for this as well),

* a buffer queued to ``OUTPUT`` may result in a buffer being produced on
  ``CAPTURE`` later into encode process, and/or after processing further
  ``OUTPUT`` buffers, or be returned out of order, e.g. if display
  reordering is used,

* buffers may become available on the ``CAPTURE`` queue without additional
  buffers queued to ``OUTPUT`` (e.g. during drain or ``EOS``), because of the
  ``OUTPUT`` buffers queued in the past whose encoding results are only
  available at later time, due to specifics of the encoding process,

* buffers queued to ``OUTPUT`` may not become available to dequeue instantly
  after being encoded into a corresponding ``CAPTURE`` buffer, e.g. if the
  encoder needs to use the frame as a reference for encoding further frames.

.. note::

   To allow matching encoded ``CAPTURE`` buffers with ``OUTPUT`` buffers they
   originated from, the client can set the ``timestamp`` field of the
   :c:type:`v4l2_buffer` struct when queuing an ``OUTPUT`` buffer. The
   ``CAPTURE`` buffer(s), which resulted from encoding that ``OUTPUT`` buffer
   will have their ``timestamp`` field set to the same value when dequeued.

   In addition to the straightforward case of one ``OUTPUT`` buffer producing
   one ``CAPTURE`` buffer, the following cases are defined:

   * one ``OUTPUT`` buffer generates multiple ``CAPTURE`` buffers: the same
     ``OUTPUT`` timestamp will be copied to multiple ``CAPTURE`` buffers,

   * the encoding order differs from the presentation order (i.e. the
     ``CAPTURE`` buffers are out-of-order compared to the ``OUTPUT`` buffers):
     ``CAPTURE`` timestamps will not retain the order of ``OUTPUT`` timestamps.

.. note::

   To let the client distinguish between frame types (keyframes, intermediate
   frames; the exact list of types depends on the coded format), the
   ``CAPTURE`` buffers will have corresponding flag bits set in their
   :c:type:`v4l2_buffer` struct when dequeued. See the documentation of
   :c:type:`v4l2_buffer` and each coded pixel format for exact list of flags
   and their meanings.

Should an encoding error occur, it will be reported to the client with the level
of details depending on the encoder capabilities. Specifically:

* the ``CAPTURE`` buffer (if any) that contains the results of the failed encode
  operation will be returned with the ``V4L2_BUF_FLAG_ERROR`` flag set,

* if the encoder is able to precisely report the ``OUTPUT`` buffer(s) that triggered
  the error, such buffer(s) will be returned with the ``V4L2_BUF_FLAG_ERROR`` flag
  set.

.. note::

   If a ``CAPTURE`` buffer is too small then it is just returned with the
   ``V4L2_BUF_FLAG_ERROR`` flag set. More work is needed to detect that this
   error occurred because the buffer was too small, and to provide support to
   free existing buffers that were too small.

In case of a fatal failure that does not allow the encoding to con