We run compute on the cloud and have no real-time requirements. I was asked by the Chair, “How much complexity increase is acceptable?”
I was not prepared for the question, so did some quick math in my mind estimating an upper bound and said “At the worst case 100X.”
The room of about a hundred video standardization experts burst out laughing. I looked at the Chair perplexed, and he says,
“Don’t worry they are happy that they can try-out new things. People typically say 3X.” We were all immersed in the video codec space and yet my views surprised them and vice versa.
1) They have a fairly limited catalog (contrast with the constant ingestion of new content by Youtube, for example)
2) The cheapest way for them to ensure they have sufficient capacity to handle peak loads leaves them with a lot of extra compute during non-peak times. That excess compute is essentially free.
Since they host on AWS, wouldn't they just scale down during non-peak times and save money?
EDIT: After all, they did implement VP8/9 ;)
Re #2: This is a fairly standard batch-vs-online compute mix tradeoff faced by large enterprises.
I agree with your overall point though, that at some point the extra cost for encode isn't worth it.
Their goal is to have video of acceptable quality at 250 kbit/s, which makes Netflix an option for a lot more people worldwide.
Source: I work at a company that distributes a lot of high-quality video :)
Especially when Netflix already has Open Connect appliances in many ISP data centers. Bandwidth cost savings are much less than most estimate.
However, I do understand her point. She should ask: "If I give you 100x complexity headroom, can you get me an 80% reduction in bitrate compared to HEVC?"
This is already the case for AVC/HEVC on mobile devices where storage and power considerations overwhelm the possible quality and coding efficiency advances that could be available from a highly-intensive, highly-tunable CPU-based encode.
In comparison, if you look at digital cinema, video is dumped as raw, unencoded pixels onto high-speed, high-power SSDs without any compression; this way, the original image can be manipulated losslessly before going through an intensive encode process for an optimal quality/space balance for end-user media delivery.
What cameras shoot uncompressed 8K HDR video? That doesn’t even seem possible given current SSD bus speeds. Note that many high-quality lossless formats are actually still compressed (they subtract current frame data from the previous frame and then apply LZ4 for example).
12-bit bayer RGB matrix 8k @24 fps: 7680 * 4320 * 12bit * 24/s = 1.2 GB/s uncompressed. Currently SSDs go up to 3 GB/s. And you could have a RAID 0 array made out of multiple SSDs.
So it is possible. Entirely another matter whether it makes sense.
One hour of video at this bitrate requires 4.3 TB.
There's just one color component per pixel, thus 12-16 bits per pixel is enough.
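For anyone who wants to check the numbers above, here is a minimal sketch of the arithmetic. It assumes exactly the figures quoted in the thread (7680x4320 photosites, one 12-bit sample each, 24 fps) and decimal units (1 GB = 10^9 bytes); the function name is made up for illustration.

```python
# Raw sensor data rate for the Bayer-pattern 8K example above.
WIDTH, HEIGHT = 7680, 4320      # 8K UHD frame size in photosites
BITS_PER_SAMPLE = 12            # one colour sample per photosite (Bayer mosaic)
FPS = 24                        # frames per second


def raw_bytes_per_second(width, height, bits_per_sample, fps):
    """Uncompressed data rate in bytes per second."""
    return width * height * bits_per_sample * fps / 8


rate = raw_bytes_per_second(WIDTH, HEIGHT, BITS_PER_SAMPLE, FPS)
per_hour = rate * 3600

print(f"{rate / 1e9:.2f} GB/s")          # decimal gigabytes per second
print(f"{per_hour / 1e12:.2f} TB/hour")  # decimal terabytes per hour
```

Going to 16 bits per sample scales both figures by 16/12, so the single-SSD headroom shrinks but does not disappear.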
Use the right tool for the job.
This is why pretty much everyone has standardized on h264, and now on h265, as the video codec of choice, despite using very different encoder software or hardware to realize hugely differing trade-offs on the encoding side of things.
If your Dutch isn't up to this let me summarise: A German family took a boat trip from Kiel to Oslo. Their 12yo son watched a few movies (or clips, no idea) while on board. Total data consumption was 470 MB. They got a bill for €12.000,- which comes down to just over €25 per MB.
Bandwidth can be very expensive as well. Of course this is an extreme case but any byte shaved off an encoded stream can end up saving some people a lot of money.
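A quick sanity check of the per-megabyte figure, using only the two numbers from the story. The 10% saving at the end is a made-up illustration, not something from the article.

```python
# Per-megabyte cost implied by the roaming bill described above.
bill_eur = 12_000      # total bill in euros
data_mb = 470          # data consumed in megabytes

cost_per_mb = bill_eur / data_mb
print(f"EUR {cost_per_mb:.2f} per MB")

# Hypothetical: value of a 10% smaller stream at that tariff
# (the 10% figure is an assumption for illustration only).
saving_fraction = 0.10
saved_eur = bill_eur * saving_fraction
print(f"EUR {saved_eur:.0f} saved")
```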
Agreed that, if all they can do is shave a "few kB" off a video, it's probably not worth the investment. But what if they can buy a 5–10% bandwidth improvement that actually looks better on the device they're targeting?
The really interesting question is how much decoder complexity increase is acceptable.
It, of course, only works if you render widely replicated material. Even an average Youtube video is likely not like that, to say nothing of the encoding done in a consumer-grade camera / phone.
Just brainstorming here, but encoding local photos/videos might be an acceptable task for a cellphone to do when it is fully charged and plugged in.
In Chrome, Firefox and Opera you're getting 720p max.
To get 1080p you need to be watching in Internet Explorer, Safari or on a Chromebook.
To get 4k you need to be watching in Edge and have a 7xxx or 8xxx Intel CPU and HDCP 2.2 enabled display.
What's the point? Even with theoretically perfect protocol level DRM, the consumer eventually has to be able to see/hear the protected content. If the frames of the video are displayed on screen, and the audio played through the speakers, the output can be recorded and preserved, period.
Do the people in charge of making these decisions not realize that whatever convoluted DRM technologies they pay to be developed and implemented will always be defeated by a $30 camcorder from the 90s?
The tendency for humans to implement elaborate but ineffective security theaters in an attempt to convince people they're protected from fundamentally unprotectable threats is as old as society itself.
As a consequence, Chrome can't play Netflix videos at full quality, anybody who flies in the US has to remove their shoes, and my child has to wear a clear backpack to school. I wish we'd stop playing these silly games of pretend which degrade the quality of life of everyone.
> What's the point?
It's the same reason you put a lock on the door to your house. You know the lock is easily bypassed by tools. The windows are easily broken. The door can probably be easily forced open. But you still lock the door when you leave in the morning.
Locks are about keeping casual thieves honest. DRM is about keeping casual pirates as honest customers. It's about making it just difficult enough to copy that most people will consider it not worth the bother. It's about saying, "You must be this determined to break the law."
I don't think that the analogy holds up. The deterrence only applies to uploaders and not consumers, because the processes for removing DRM and distributing videos are independent. It just takes one person to make a video available to the entire world. It does seem like torrents are available for most of the Netflix catalogue, so I'm skeptical that DRM is useful for popular shows.
That all said, camcorders don't get you anywhere: the goal isn't ripping the video, it's ripping the high quality video. DRM can theoretically defend against that, but you'd need to control the whole stack, incl. hardware, incl. the monitor and speakers.
Even then, you could tear the monitor apart and grab the LVDS signals going to the panel.
Someone above linked a help page that says that for 4K Netflix you need Edge and Intel Kaby Lake or newer. Do you think that was free for Microsoft or Intel, or did some good deal sweeten that?
Not sure if you mean causal or casual. Casual is going to piratebay and downloading.
I would agree that DRM and other anti-consumer things (unskippable things on DVDs, adverts accusing you of pirating the DVD you've actually bought, etc.) do cause piracy, though.
1. Marketing and psychology: Viewers want to believe they are viewing the original, not a degraded copy.
2. Unfaithful copy: Analog output and analog input introduce errors. LCDs use a variety of tricks to improve resolution such as spatial and temporal dithering. Also you can't use a normal camera to record a monitor because of aliasing (of pixels and non-genlocked frame rates).
3. Encoding noise. The encoding of the original is based on the higher quality original, and carefully optimised for the least visual artifacts. Any re-encoding also has to deal with noise introduced by the copying process, and with the noise introduced by the original encoding. This noise noticeably reduces the quality of a copy.
- There is no perfect security. There is a notion of raising the expense of piracy to a level that it effectively does not matter.
- IIRC, for instance, rooted Android loses support for... Widevine? So you can't really use Netflix on a rooted device where you could easily steal frames from the video buffer. Yeah, you can rig up a nice camera system and record analog off the display. Nothing they can do about that. They also may insert watermarks to let them know who recorded it.
I actually haven't even taken it out of the box yet. But it just feels good to know their DRM is pointless.
Obviously, BitTorrent was the big reason in the past, but now the reality is that there is a lot of competition in the video space - you aren't just competing with films and tv shows, but youtube videos, twitch streams, etc...
Something something nyquist something
Because actually, it can be.
Although 8k is overkill, 4k will be enough, and 1440p nearly ok on your old 1024x768 monitor. Typically video encoding does some subsampling on some color components. If you play 4k content on a FHD screen, the quality can be better because you will have no subsampling on your FHD screen, compared to mere FHD encoding (in most cases).
True, but the video is already subsampled. That's how it was able to be uploaded at 1080p at all, since the source video is 8k. So 8k vs 1080p shouldn't make any difference on monitors less than M-by-1080 resolution.
So video codecs most of the time work with some subsampled chroma components. So your encoded 1080p might be able to render after decoding only e.g. 540 lines of those components, while with the 4k stream it might be: 2160/2 => back to 1080.
Edit: but to be clear, I'm not advocating for people to choose 2x stream and start watching 4k on FHD screens in general, that would be insane. Chroma subsampling is used because the eye is less sensitive to those colors.
> So your encoded 1080p might be able to render after decoding only e.g. 540 lines of those components, while with the 4k stream it might be: 2160/2 => back to 1080.
I'm not sure that's accurate -- whatever downscaling process was used to convert from 8k to 1080p on Google's servers is probably the same process to convert from 8k to 1080p in the youtube player, isn't it? At least perceptually.
I would agree that if they convert from 8k (compressed) to 4k (compressed), then 4k to 1080p (compressed), then that would introduce perceptible differences. But in general reencoding video multiple times is fail, so that would be a bug in the encoding process server side. They should be going from the source material directly to 1080p, which would give the encoder a chance to employ precisely the situation you mention.
Either way, you should totes email me or shoot me a keybase message. It's not every day that I find someone to debate human perceptual differences caused by esoteric encoding minutiae.
Although your 4:2:0 subsampled 1080p video only has 540x960 pixels with chroma information, the decoder should be doing chroma upsampling, and unless it's a super simple algorithm it should be doing basic edge detection and fixing the blurry edges chroma subsampling is known to cause. I posit that even with training, without getting very very close to your screen you wouldn't be able to tell if the source material was subsampled 4:2:0, 4:2:2, or 4:4:4.
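To make the plane sizes in this sub-thread concrete, here is a small sketch that computes chroma plane dimensions for the three common subsampling schemes. The helper name is hypothetical, and it ignores odd frame sizes and chroma siting.

```python
def chroma_plane_size(width, height, scheme):
    """Return (width, height) of each chroma plane for a given scheme."""
    horizontal_div, vertical_div = {
        "4:4:4": (1, 1),   # no subsampling
        "4:2:2": (2, 1),   # half horizontal chroma resolution
        "4:2:0": (2, 2),   # half horizontal and half vertical resolution
    }[scheme]
    return width // horizontal_div, height // vertical_div


for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    cw, ch = chroma_plane_size(w, h, "4:2:0")
    print(f"{name}: luma {w}x{h}, chroma {cw}x{ch}")
```

The 4:2:0 chroma planes of a 4K stream come out at 1920x1080, which is the "back to 1080" point made earlier: the chroma resolution of the 4K stream matches the full pixel grid of an FHD screen.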
The truth is that generally people DO subjectively prefer high resolution source material that has been downscaled. Downscaling can clean up aliasing and soften over-sharp edges.
People who watch anime sometimes upscale video beyond their screen size with a neuron-based algorithm, and then downscale to their screen size, in order to achieve subjectively better image quality. This is even considering that almost all 1080p anime is produced in 720p and then upscaled in post-processing!
A 4k or 8k stream coming into your computer at 10+ Mbps and being downsampled to 1080p can very well contain more information, even after downsampling, than a lower quality 1080p stream coming into your computer at 4 Mbps.
In addition, YouTube generally encodes 4k at like 5-6x the bitrate of their 1080p encodes (codec for codec), rather than merely 3-4x higher which would be closer to the same quality per pixel.
So yeah, YouTube's 4k is better on a 1080p screen than their 1080p stream.
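A rough way to see the per-pixel argument is to divide bitrate by pixels per second. The sketch below uses illustrative numbers only: the 4 Mbps 1080p figure from the comment above, a 5.5x multiplier as the midpoint of "5-6x", and an assumed 30 fps. Bits per pixel is a crude proxy for quality, so treat this as a back-of-the-envelope check rather than a measurement.

```python
def bits_per_pixel(bitrate_bps, width, height, fps):
    """Average number of encoded bits spent per pixel per frame."""
    return bitrate_bps / (width * height * fps)


FPS = 30                       # assumed frame rate
BITRATE_1080P = 4_000_000      # illustrative 1080p bitrate (4 Mbps)
BITRATE_4K = 5.5 * BITRATE_1080P   # midpoint of the "5-6x" claim

bpp_1080 = bits_per_pixel(BITRATE_1080P, 1920, 1080, FPS)
bpp_4k = bits_per_pixel(BITRATE_4K, 3840, 2160, FPS)

# 4K has 4x the pixels, so a 5.5x bitrate means 5.5 / 4 times the bits per pixel.
print(f"1080p: {bpp_1080:.4f} bits/pixel")
print(f"4K:    {bpp_4k:.4f} bits/pixel")
print(f"ratio: {bpp_4k / bpp_1080:.3f}")
```

With a multiplier of exactly 4x the ratio would be 1.0, which is the "same bits per pixel" baseline to compare against.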
Assuming you have 20/20 vision, you won't be able to tell the difference between 4K and higher unless your screen fills more than about 40 degrees, in which case you are losing detail at the edges.
An 8K monitor on your desk may make sense -- if you're say 3' away from it and it's say 60", you'll start noticing a difference between 4k and 8k, however you will be focused on one area of the screen, rather than the entire screen.
Even with 4K, for most people watching television/films the main (only) benefits are HDR and increased temporal resolution (60p vs 60i/30p)
All the 8K stuff I've seen comes with 22.2 sound to help direct your vision to the area of the screen wanted. It certainly has applications in room sized screens where there are multiple events going on, and you can choose what to focus on (sport for example).
If you were to buy a 32" 8K screen - say the UP3218K, about 28" wide, to get the benefit of going above 4K you would need to be sat within about 30 inches. At 30" you would have the screen filling about 50 degrees of vision. Even an immersive film should only be 40 degrees.
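The angles quoted here can be reproduced with basic trigonometry. This is a minimal sketch assuming a flat screen viewed head-on from its centre line and a 16:9 aspect ratio for the 60" example; the function names are made up for illustration.

```python
import math


def horizontal_fov_degrees(screen_width, viewing_distance):
    """Horizontal angle subtended by a flat screen viewed head-on."""
    return math.degrees(2 * math.atan((screen_width / 2) / viewing_distance))


def width_from_diagonal(diagonal, aspect_w=16, aspect_h=9):
    """Screen width for a given diagonal and aspect ratio."""
    return diagonal * aspect_w / math.hypot(aspect_w, aspect_h)


def distance_for_fov(screen_width, fov_degrees):
    """Viewing distance at which the screen fills the given horizontal angle."""
    return (screen_width / 2) / math.tan(math.radians(fov_degrees / 2))


# 28"-wide screen viewed from 30": roughly 50 degrees.
fov_desktop = horizontal_fov_degrees(28, 30)
print(f"{fov_desktop:.1f} degrees")

# 60" diagonal (assumed 16:9) viewed from 3 feet.
fov_big = horizontal_fov_degrees(width_from_diagonal(60), 36)
print(f"{fov_big:.1f} degrees")

# How far back the 28"-wide screen must be to fill only 40 degrees.
print(f"{distance_for_fov(28, 40):.1f} inches")
```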
In particular, is it due to DRM requirements, or pure performance? I suspect it's the former.
For example, my desktop with an i7-2600k (that's a Sandy Bridge CPU from 2011) has zero issues playing 4K60 VP9 footage on YouTube in Chrome with CPU decoding, yet on Netflix with the same Chrome I'm arbitrarily restricted to 720p H.264 video.
See, in Widevine there are a number of “levels”, the highest being when it can decode, decrypt and push to the frame buffer all in a secure zone. This cannot be achieved (at the moment, or at least at the time of my research into the matter) with Widevine on desktop, so in such a setup Widevine will only decrypt content up to 720p.
When running on Android and ARM this is possible and you can get 1080p, which is why cheap Android-based TV sticks (even the old Amazon Fire TV sticks) supported 1080p but your gaming rig and Chrome could not.
I don’t work for Widevine, Google, Netflix or anyone else for that matter. Just a nerd with too much time on my hands, so I looked into this stuff. Any corrections welcomed :-D
> As far as I understand it there are 3 security levels to widevine Level1 being the highest and 3 being the lowest.
> Level 1 is where the decrypt and decode are all done within a trusted execution environment (As far as I understand it Google work with chipset vendors such as broadcom, qualcomm, etc to implement this) and then sent directly to the screen.
> Level 2 is where widevine decrypts the content within the TEE and passes the content back to the application for decoding which could then be decoded with hardware or software.
> Level 3 (I believe) is where widevine decrypts and decodes the content within the lib itself (it can use a hardware cryptographic engine but the rpi doesn't have one).
> Android/ChromeOS support either Level1 or Level3 depending on the hardware and Chrome on desktops only seems to support Level 3. Kodi is using the browser implementation (at least when kodi is not running on Android) of widevine which seems to only support Level 3 (So decrypt & decode in software) and therefore can not support hardware decoding. But that doesn't mean that hardware decoding of widevine protected content can not be supported on any mobile SoC. Sorry if I gave that impression.
> When a license for media is requested the security level it will be decrypted/decoded with is also sent and the returned license will restrict the widevine CDM to that security level.
> I believe NetFlix only support Level 1 and Level 3, which is why for a while the max resolution you could get watching NetFlix on chrome in a desktop browser was 720p as I believe that was the max resolution NetFlix offered at Level 3 and we had to use Edge/IE(iirc) to watch at 1080p as it used a different DRM system (PlayReady) and why atm Desktop 4k Netflix is only currently supported on Edge using (iirc) Intel gen7+ processors and NVidia Pascal GPUs (I don't know if AMD support PlayReady 3.0 on their GPUs as I don't have one so not really had the desire to investigate, I'm guessing that current Ryzen CPUs do not as they currently don't have integrated GPUs).
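To summarise the description above in code form, here is a toy model of how a service might cap resolution by security level. This is not Widevine's API: every name below is hypothetical, and the resolution caps are just the ones reported in this thread (720p at Level 3 in desktop Chrome, 1080p at Level 1 on Android), not an official policy.

```python
from enum import IntEnum


class SecurityLevel(IntEnum):
    """Toy stand-in for the three levels described in the quoted comment."""
    L1 = 1  # decrypt and decode inside the trusted execution environment
    L2 = 2  # decrypt inside the TEE, decode outside it
    L3 = 3  # decrypt and decode in software


# Hypothetical per-service policy: max video height per level.
# No entry for L2, mirroring "only supports Level 1 and Level 3".
MAX_HEIGHT_BY_LEVEL = {SecurityLevel.L1: 1080, SecurityLevel.L3: 720}


def pick_stream(available_heights, level, policy=MAX_HEIGHT_BY_LEVEL):
    """Return the tallest stream the policy allows, or None if none is allowed."""
    cap = policy.get(level)
    if cap is None:
        return None  # the service issues no license for this level
    allowed = [h for h in available_heights if h <= cap]
    return max(allowed) if allowed else None


streams = [480, 720, 1080]
print(pick_stream(streams, SecurityLevel.L3))
print(pick_stream(streams, SecurityLevel.L1))
```

The point of the model is only that the cap is a licensing decision tied to the reported security level, which is why raw decode performance (as in the i7-2600k example) does not change the outcome.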
But even using Edge is not a silver bullet for all content, as some seems to be limited to that low bitrate 480p on all browsers, even if higher quality is available on a TV app.
You can actually get some content in 1080p in Chrome and Firefox with a browser extension. It is somewhat unreliable however and some videos still get capped at 720p.
Chrome extension (with explanation of how it works): https://github.com/truedread/netflix-1080p
Firefox extension (unfortunately doesn't seem to work at the moment): https://github.com/vladikoff/netflix-1080p-firefox
A WEB-DL is different from a (1080i) HDTV capture is different from a Blu Ray rip.
Netflix are optimising for bandwidth over quality - hell, the audio still seems to be 96 kbit AAC.
On youtube, you can right click > stats for nerds for some similar info.
I tend to find the quality of HD Netflix streams pretty great over my PS4. Certainly never a cause to download a torrent instead.
I believe fast.com is supposed to test against actual Netflix video delivery servers, just to detect this kind of ISP fuckery.
EDIT: Looking at some stuff, it seems like Netflix might "trust" a first-party browser to select the highest-quality stream that it has hardware video decode support for. In comparison, it sounds like there are extensions that enable 1080p in Chrome by pushing it into the list of playlist options, but it can cause a serious performance hit by decoding on the CPU.
Can anyone with more industry knowledge chime in here? To me, that sounds a lot like the kind of group that created the patent- and royalty-encumbered stuff the AOM was created to avoid.
In the ITU-T VCEG and ISO/IEC MPEG standardization world, the Joint Video Experts Team (JVET) was formed in October 2017 to develop a new video standard that has capabilities beyond HEVC. The recently-concluded Call for Proposals attracted an impressive number of 32 institutions from industry and academia, with a combined 22 submissions. The new standard, which will be called Versatile Video Coding (VVC), is expected to be finalized by October 2020.
This annoys me quite a bit, because it then lists the so-called large-scale video encoding from Facebook, Twitter, and YouTube, as if video encoding were only done by OTT providers, and as if video encoding on the iPhone (consumers), TV broadcast (the good old content distributors), and livestreams of events from sports to the Olympics didn't matter. It is a very Silicon Valley mentality, and it shows in AOM / AV1. After all, they are creating their own codec for their own use, while other industry codec organisations have to take care of many "edge" use cases.
>So how will we get to deliver HD quality Stranger Things at 100 kbps for the mobile user in rural Philippines? How will we stream a perfectly crisp 4K-HDR-WCG episode of Chef’s Table without requiring a 25 Mbps broadband connection?
It is interesting that this 100 kbps bitrate and rural Philippines come up, because this is the exact same quote that Amazon's video specialist Ben Waggoner mentioned on Doom9.
Shouldn't we be a little more realistic with the bitrate? We have 20 years of experience and research, and yet we still don't have a single audio codec (two dimensions) that performs better than MP3 at 128 kbps at half the bitrate. Opus only manages to slightly edge it out at 96 kbps, and that is with selected samples. There is only so far we can go; 100 kbps is barely enough for audio. And we have Massive MIMO and 5G, both of which will bring an immense capacity increase to current networks. There is so much in the pipeline to further increase efficiency and capacity and to lower latency, cost, and power. It is a little hard to think about designing for 100 kbps video.
Currently YouTube streams 1080p AVC at ~2.2 Mbps, which seems to be fine with most people already, especially at computer, tablet or smartphone screen sizes. HEVC can probably do similar quality at 1.5 Mbps. VVC should be aiming at below 1 Mbps. Netflix is doing 15 Mbps 4K streaming with HEVC (and people are complaining about quality already; I have no idea why, as I don't watch Netflix). VVC should really be aiming at better quality at 8 Mbps. We should aim at a specific bitrate and resolution, with a real-world encoder as anchor, and a specific quality to achieve.
I am glad that Silicon Valley is driving it since they have users all over the world, not just fiber-enabled users. If and when other people have a better aim, vision and understanding of the problem, I think you should find a good collaborative atmosphere.
In the meantime, let's not design for 100m users when we should be thinking about 2-3b users.
For example, one solution for low-bandwidth environments is to edge-cache it (farther toward the real edge than "local ISP") and then spread it locally via peer-to-peer short-distance radio (wifi, bluetooth, local cellular). This is not a 100x data compression challenge; it's a local delivery infrastructure challenge.
I did mention YouTube is doing 1080p at 2.2 Mbps. We could be doing 1.2 Mbps with HEVC today already. Those are HD-capable bandwidths. And 720p is also HD.
>In the meantime, let's not design for 100m users when we should be thinking about 2-3b users.
There are roughly 5 billion phone users worldwide, 3.3 billion of them with smartphones. 1.3 billion users are on LTE, and most of the other 2.3 billion have LTE-capable smartphones but are not on an LTE plan, or their country is on its way to LTE. There are still 600M users in China who do not have a smartphone but could have LTE should they choose to. India is skipping 3G and moving on to 4G, with 300M smartphone users and more being added as we speak. We are expecting to have 2B+ users on 4G or 5G by 2020, with the majority of the other 1.4B users choosing not to be on LTE rather than it being inaccessible. And in the developing part of the world, there is little incentive to keep the 3G equipment and spectrum.
I pointed out that we have lots of network improvements in the pipeline. Purely in terms of technical achievement, Massive MIMO is arguably the biggest invention since wireless communication itself. We will see a huge increase in capacity, available to everyone and cheaply. It isn't about first world or third world. The US used to have the worst telecom services, even compared to some third world countries, but the smartphone has changed that because users are willing to pay to get better service. As users move up the ladder, the impact is much more profound in developing countries. As pointed out by Benedict Evans, there are a lot of places where charging the phone costs more than the data plan.
I am not saying we shouldn't care about the rest. But a new video codec focusing on 100 kbps is actually spending energy on the few rather than the many. By the time this video codec is done with research and implementation, and hardware is available to mobile users, we are talking about a cycle of 3-4 years at best, or more likely 6+ years. Network bandwidth will have improved even more by then.
Another point worth mentioning is the post about Netflix's 270 kbps encoding. x264 and x265 were never great at encoding at low bitrates. RMVB / RV10, or RV9 EHQ from RealMedia (God I am old...), has always had better perceived quality at those bitrates. It isn't that we need a new codec aiming at low bitrates; it is that the current encoders are hardly optimised for those bitrates. There is a lot that could be done with decoder filters and video pre-cleaning to achieve better perceived quality at those bitrates, and RMVB used to do a very good job at it. The new RV11, which is based on HEVC, may have applied those tools as well. (I have not tested RV11, so I am not sure.)
It could be streams containing information about smaller sub-blocks, etc.
If you take a Fourier transform, the more coefficients you include, the more faithful the reproduction is.
Splitting the quality level into multiple independent streams could have multiple advantages:
- better use of Multicast, as viewers for different quality settings get the same data, so you use approximately the bandwidth for one full quality video instead of full quality + medium + low...
- save on storage space for the same reason
- save on transcoding time, as only one pass is needed
- better suited for distributed storage/transmission, on a platform like PeerTube, where every p2p client could contribute back instead of clustering by quality.
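The "more coefficients, more faithful" idea can be sketched in a few lines. The toy below uses a 1-D DCT (a close cousin of the Fourier transform) on a synthetic signal and splits the coefficients into three layers that could be sent as separate streams. The layer boundaries and the test signal are arbitrary assumptions, and this is an illustration of the principle only, not how any real codec partitions its bitstream.

```python
import math


def _scale(k, n):
    """Orthonormal DCT scaling factor for coefficient index k."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct(signal):
    """Orthonormal DCT-II of a list of floats."""
    n = len(signal)
    return [
        _scale(k, n) * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                           for i, x in enumerate(signal))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Synthetic 32-sample "scanline" (arbitrary test signal).
signal = [math.sin(i / 3) + 0.3 * math.sin(i * 1.7) for i in range(32)]
coeffs = dct(signal)

# Hypothetical layering: base layer plus two enhancement layers.
layers = [(0, 4), (4, 12), (12, 32)]

received = [0.0] * len(coeffs)   # coefficients the client has so far
errors = []
for start, stop in layers:
    received[start:stop] = coeffs[start:stop]   # one more layer arrives
    errors.append(rms_error(signal, idct(received)))

for (start, stop), err in zip(layers, errors):
    print(f"coefficients 0..{stop - 1}: RMS error {err:.6f}")
```

Each client decodes whatever layers it has received, so low, medium and full quality share the same base data, which is the multicast and storage argument made in the list above.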
I am not working in this field, but I know a fair bit about compression, and this seems a no-brainer to me. Is it already done? Where? Or did I overlook some issues?
This seems to be quite a general technology problem that looks like it applies to car engines, batteries, computer memory and any other highly optimized mature technology. I wonder if there's some way to change incentives so others get a chance. It happens in evolution in nature too and it's sometimes solved by mass extinctions. Hopefully that's not the only way.
I think Daala will likely live on in AV2. But let's wait for AV1 to go out first before they work on that, as the promised bitstream freeze still hasn't happened.