As one of our senior server-side developers on the Red5 Pro team, I get to play around with our code to figure out what other cool stuff we can build with our server. I was helping a Red5 Pro user look at packets inside some FLV files when I realized the technique is not as well documented as the method for reading live packets. Having experimented with packets many times in the past, I decided to write up a tutorial combining the two methods. This is the first post in a three-part series covering some advanced server-side programming, which involves dynamically injecting content into live streams. Let's start by asking why you would even want to do this. Here are a few use cases:

1. Scheduling Prerecorded Events

  • Using a prerecorded file, you may want to rebroadcast it at a scheduled time. You can keep the stream open with a placeholder video or pre-roll until the scheduled time arrives.

2. Stream Switching

  • Show one person's stream while they are talking and switch to another person's stream when that person begins talking, as Google Hangouts does.

3. Inserting Interstitial Commercials

  • You may have a need to insert clips into a live stream. In this series of blog posts, I'm going to show just how to do that.

Let's take number 3 and look at how we can build an experience that inserts commercials (short video clips) at a set interval into a live stream. I hope this example will be useful for learning how to build any of the above use cases and more. This first post shows how to examine the video packets of the prerecorded video (the commercial) to determine where the key frames are so that we can cleanly insert it into the live stream. The next post in the series will delve into examining audio packets of both live and prerecorded streams to illustrate smoothly inserting new audio in place of the live audio. The final one will bring it all together in a live streaming example complete with commercial interruptions every ten seconds.

Examining Stream Packets

Let's get started by examining the stream packets. Since Red5 Pro mobile clients stream h264 video, I focus on that codec exclusively in this example.

Red5 Pro gives you the ability to work with the individual packets that make up a media stream. There are two different methods to do this. One is used for live streams, and the other for video on demand (VOD). The live stream method uses the IStreamListener interface, which we will get to later. The VOD method shown in this post uses the IMessageInput interface. The live interface pushes packets to our consumer. The VOD interface requires us to pull each packet. We also have to look up the file and ask the server to set up the file reader, which parses the packets from the FLV container.

Code:

Let's start with creating a private method in our MultithreadedApplicationAdapter class to read the packets. We will need to collect information about the file before inserting it into a live stream.

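private void readVideoPackets(){
        log.info("Reading video frames now.");
        IMessage msg;       // the packet most recently pulled from the file
        int lastTime = 0;   // timestamp of the previous video packet, in milliseconds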

Use the app scope to look up the FLV in the streams folder.

Resource resource = scope.getResource("super_commercial.flv");

Next we want to look up the provider service, which is responsible for managing FLVs and MP4s. Once we have the provider, we request that the desired file (super_commercial.flv) be set up with a parser for us. The provider service is registered as a Spring bean. We have a reference to the Spring context through the application adapter scope.

IProviderService providerService = (IProviderService) scope.getContext().getBean(IProviderService.BEAN_NAME);
IMessageInput msgIn = providerService.getVODProviderInput(scope, resource.getFilename());

Assuming the resource/file is there :), we begin reading the packets which wrap units of the elementary A/V streams. We set up a while-loop and pull messages until we receive a null which signifies the end of the file. Each packet we are interested in would be wrapped in an RTMPMessage. The payload of the RTMPMessage is one of VideoData, AudioData, or Notify, which is metadata or a cue-point.

while ((msg=msgIn.pullMessage()) != null) {
    //we have a packet. It better be an rtmp message!
    if (msg instanceof RTMPMessage) {
        // Cast the reference to the api object.
        RTMPMessage lMsg = (RTMPMessage) msg;
        // is it VideoData, AudioData, or Notify (metadata, net stream send)?

This time we are looking at video data. An AVC video tag starts with two header bytes, followed by a three-byte composition time offset. The offset is a signed value in milliseconds that is added to the tag timestamp to get the presentation time.

        if (lMsg.getBody() instanceof VideoData) {
            VideoData video = (VideoData) lMsg.getBody();
            IoBuffer buffer = video.getData();
            int tagh = video.getData().get(); //frame type (high nibble) and codec id (low nibble)
            byte key=video.getData().get(); //AVC packet type: 0 = config, 1 = NAL units
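
As a standalone illustration of how those two header bytes break down, here is a minimal sketch that uses plain java.nio.ByteBuffer rather than the server's IoBuffer. The class FlvVideoHeader and its method names are hypothetical and are not part of the Red5 Pro API. In the FLV format, the high nibble of the first byte is the frame type (1 for a key frame, 2 for an inter frame) and the low nibble is the codec id (7 for AVC), which is where the values 0x17 and 0x27 come from.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of the Red5 Pro API.
final class FlvVideoHeader {
    final int frameType;     // 1 = key frame, 2 = inter frame
    final int codecId;       // 7 = AVC (H.264)
    final int avcPacketType; // only meaningful for AVC: 0 = decoder configuration, 1 = NAL units

    private FlvVideoHeader(int frameType, int codecId, int avcPacketType) {
        this.frameType = frameType;
        this.codecId = codecId;
        this.avcPacketType = avcPacketType;
    }

    // Reads the first two bytes of a video tag body and advances the buffer by two.
    static FlvVideoHeader read(ByteBuffer data) {
        int first = data.get() & 0xFF;
        int packetType = data.get() & 0xFF;
        return new FlvVideoHeader(first >> 4, first & 0x0F, packetType);
    }

    // Equivalent to the check tagh == 0x17 && key == 1.
    boolean isAvcKeyFrame() {
        return codecId == 7 && frameType == 1 && avcPacketType == 1;
    }

    // An AVC inter frame carrying NAL units (first byte 0x27, packet type 1).
    boolean isAvcInterFrame() {
        return codecId == 7 && frameType == 2 && avcPacketType == 1;
    }

    // The decoder configuration packet (packet type 0).
    boolean isAvcConfig() {
        return codecId == 7 && avcPacketType == 0;
    }
}
```

Splitting the byte into its two nibbles makes it easier to extend the check to other frame types or codecs than comparing against whole-byte constants.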

A note about timestamps. AVC video is allowed to output consecutive packets which have the same timestamp. In fact, it can output frames which require reordering. Some writers of the FLV format do not allow this and as a result, they increment the timestamp by one or two milliseconds. If we were interested in sending only a key frame, we might have to send packets until the timestamp incremented beyond .001 or .002 seconds.

            log.info("tag       time -> {}",video.getTimestamp());
            if(lastTime+2>=video.getTimestamp() && lastTime!=video.getTimestamp()){
                log.warn("Multi slice video present");
            }
            lastTime =video.getTimestamp();

Reading the two header bytes, we can determine first, the type of frame and type of codec, and second, if it is our codec, what the payload contains. There are three types of payloads we are looking for: the codec critical data, the key frame data, and the coded slice data. The critical data packet is typically sent prior to a key frame packet or series of key packets. The first block is for the key frame packet.

            //we only process H264
            if(tagh==0x17 && key==1){
                //composition time offset (3 bytes, milliseconds), usually zero.
                video.getData().get();
                video.getData().get();
                video.getData().get();
                log.info("key packet");
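
The three bytes skipped above are usually zero, but streams that reorder frames carry a nonzero value there. If you need it, the following minimal sketch shows one way to read it as a signed 24-bit big-endian integer. It uses plain java.nio.ByteBuffer, and the class CompositionTime and its method names are hypothetical, not part of the Red5 Pro API.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of the Red5 Pro API.
final class CompositionTime {

    // Reads a signed 24-bit big-endian integer and advances the buffer by three bytes.
    static int readSigned24(ByteBuffer data) {
        int value = ((data.get() & 0xFF) << 16)
                  | ((data.get() & 0xFF) << 8)
                  |  (data.get() & 0xFF);
        // Shift up and back down arithmetically to sign-extend from 24 to 32 bits.
        return (value << 8) >> 8;
    }

    // Presentation time is the tag timestamp plus the offset, both in milliseconds.
    static long presentationTimeMillis(int tagTimestamp, int offset) {
        return (long) tagTimestamp + offset;
    }
}
```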

Once we have read past the two-byte header and the three-byte timestamp offset, we reach the AVCc payload. H264 video comes in two flavors. One is the Annex B byte stream, in which packets are separated by start-code markers. Ours is AVCc, in which each packet is prefixed with its size. The number of bytes used for the size is stored in the critical data section. Let's assume we parsed it and found it to be 4 bytes long in big-endian format. We will read the size and copy the payloads until we reach the end of the packet. We will double check that the h264 unit type is 5, which identifies a key frame (IDR) slice.

                //the actual idr nalus AVCc format
                while(video.getData().remaining()>3){
                    int size = ((video.getData().get()&0xFF) << 24) 
                    		| ((video.getData().get()&0xFF) << 16) 
                    		| ((video.getData().get()&0xFF) << 8) 
                    		| ((video.getData().get()&0xFF) ) ;
                    log.info("idr size {}", size);
                    byte[] load = new byte[size];
                    video.getData().get(load);
                    log.info(" confirm nalu type 5 == {}",load[0] & 0x1F); 
                }   
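
The loop above trusts the size field and assumes a 4-byte length prefix. As a more defensive variation, here is a minimal standalone sketch that takes the prefix length as a parameter and checks each size against the bytes remaining before copying. It uses plain java.nio.ByteBuffer, and the class AvccReader and its method names are hypothetical, not part of the Red5 Pro API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Red5 Pro API.
final class AvccReader {

    // Splits an AVCc payload into NAL units. lengthSize is the number of bytes
    // in each big-endian size prefix (1, 2, or 4), as stated in the critical data.
    static List<byte[]> splitNalUnits(ByteBuffer payload, int lengthSize) {
        List<byte[]> units = new ArrayList<>();
        while (payload.remaining() >= lengthSize) {
            int size = 0;
            for (int i = 0; i < lengthSize; i++) {
                size = (size << 8) | (payload.get() & 0xFF);
            }
            // Guard against corrupt data before allocating or copying.
            if (size < 0 || size > payload.remaining()) {
                throw new IllegalArgumentException(
                        "NAL unit size " + size + " exceeds remaining " + payload.remaining());
            }
            byte[] unit = new byte[size];
            payload.get(unit);
            units.add(unit);
        }
        return units;
    }

    // The NAL unit type is the low five bits of the first byte; -1 for an empty unit.
    static int nalUnitType(byte[] unit) {
        return unit.length == 0 ? -1 : unit[0] & 0x1F;
    }
}
```

A key frame packet often carries more than one unit, so it is worth checking the type of every unit returned rather than only the first.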

This next block is processing coded slice data. This is non-key frame data and usually covers a small section of screen where motion has occurred. We will read past the extended timestamp, and then begin reading the size and copying the payloads. We're not as concerned about verifying the h264 content type.

            }else if(tagh==0x27){
                //this is a non key frame packet.
                log.info("interframe packet");
                //composition time offset (3 bytes, milliseconds), usually zero.
                video.getData().get();
                video.getData().get();
                video.getData().get();
                //the actual slice nalus AVCc format
                while(video.getData().remaining()>3){
                    int size = ((video.getData().get()&0xFF) << 24) 
                    		| ((video.getData().get()&0xFF) << 16) 
                    		| ((video.getData().get()&0xFF) << 8) 
                    		| ((video.getData().get()&0xFF) ) ;
                    log.info("slice size {}", size);
                    byte[] load = new byte[size];
                    video.getData().get(load);
                    log.info("nalu type {}",load[0] & 0x1F); 
                }

The final block is where we would parse the critical data and determine the H264 level, profile, and decoder compatibility settings. It would also show us how many bytes make up the size prepending the payloads. It can also show resolution, framerate, and orientation, but not everything is 'guaranteed' to be flagged and present. We will cover parsing the critical data in another post.

            }else if(tagh==0x17&& key==0){
                log.info("config");
                //we will cover parsing the sps and pps in another post.
                //it is the same structure in the iso mpeg4 video format data box.
            }
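
Full parsing of the SPS and PPS is left for that later post, but the fixed bytes at the start of the critical data are simple to read. The following minimal sketch pulls out the profile, compatibility flags, level, and the size-prefix length from the first five bytes of the configuration record, which begins right after the two header bytes and the three-byte timestamp offset. It uses plain java.nio.ByteBuffer, and the class AvcConfigSummary is hypothetical, not part of the Red5 Pro API.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of the Red5 Pro API.
final class AvcConfigSummary {
    final int profile;        // for example 66 = Baseline, 77 = Main, 100 = High
    final int compatibility;  // profile compatibility flags
    final int level;          // level times ten, for example 30 = level 3.0
    final int nalLengthSize;  // bytes in each size prefix: 1, 2, or 4

    private AvcConfigSummary(int profile, int compatibility, int level, int nalLengthSize) {
        this.profile = profile;
        this.compatibility = compatibility;
        this.level = level;
        this.nalLengthSize = nalLengthSize;
    }

    // Reads the first five bytes of the configuration record and advances the buffer by five.
    static AvcConfigSummary read(ByteBuffer config) {
        int version = config.get() & 0xFF;
        if (version != 1) {
            throw new IllegalArgumentException("Unexpected configuration version " + version);
        }
        int profile = config.get() & 0xFF;
        int compatibility = config.get() & 0xFF;
        int level = config.get() & 0xFF;
        // The low two bits hold the prefix length minus one; the upper six bits are reserved.
        int nalLengthSize = (config.get() & 0x03) + 1;
        return new AvcConfigSummary(profile, compatibility, level, nalLengthSize);
    }
}
```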

And finally, we need to rewind the video data object for the next processor in line, be it a subscriber, or another processor. In the case of VOD, it is not really required but I have a habit of rewinding any buffered data packets when finished.

            if(video!=null)
                video.getData().rewind();
        }
    }    
} // end of the while loop
} // end of readVideoPackets()
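
To see why the rewind matters, here is a minimal sketch using a plain java.nio.ByteBuffer, on the assumption that the server's IoBuffer behaves the same way for get() and rewind(). The class RewindDemo is hypothetical and exists only for this illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical demo for illustration only; not part of the Red5 Pro API.
final class RewindDemo {

    // Reads the two header bytes the way an inspector would, then rewinds so the
    // next consumer of the same buffer starts again from the first byte.
    static int inspectAndRewind(ByteBuffer data) {
        int first = data.get() & 0xFF; // position moves to 1
        data.get();                    // position moves to 2
        data.rewind();                 // position returns to 0, limit unchanged
        return first;
    }
}
```

Without the rewind, the next reader would start at position 2 and misinterpret the payload as if it began after the header.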

That is all there is to it. In the next post, I'll show how to process the audio packets.

-Andy Shaules
