Flink Pre-defined Timestamp Extractors / Watermark Emitters

Original post
05/13 14:22

https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/event_timestamp_extractors.html

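According to the official documentation, Flink provides pre-defined timestamp extractors and watermark emitters, described as follows: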

Flink provides abstractions that allow the programmer to assign their own timestamps and emit their own watermarks.

More specifically, one can do so by implementing one of the AssignerWithPeriodicWatermarks and AssignerWithPunctuatedWatermarks interfaces, depending on the use case.

In a nutshell, the first will emit watermarks periodically, while the second does so based on some property of the incoming records, e.g. whenever a special element is encountered in the stream.

Introduction to the AssignerWithPeriodicWatermarks interface:

Source path: flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\functions\AssignerWithPeriodicWatermarks.java

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.flink.streaming.api.functions;

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.streaming.api.watermark.Watermark;

import javax.annotation.Nullable;

/**
 * The {@code AssignerWithPeriodicWatermarks} assigns event time timestamps to elements,
 * and generates low watermarks that signal event time progress within the stream.
 * These timestamps and watermarks are used by functions and operators that operate
 * on event time, for example event time windows.
 *
 * <p>Use this class to generate watermarks in a periodical interval.
 * At most every {@code i} milliseconds (configured via
 * {@link ExecutionConfig#getAutoWatermarkInterval()}), the system will call the
 * {@link #getCurrentWatermark()} method to probe for the next watermark value.
 * The system will generate a new watermark, if the probed value is non-null
 * and has a timestamp larger than that of the previous watermark (to preserve
 * the contract of ascending watermarks).
 *
 * <p>The system may call the {@link #getCurrentWatermark()} method less often than every
 * {@code i} milliseconds, if no new elements arrived since the last call to the
 * method.
 *
 * <p>Timestamps and watermarks are defined as {@code longs} that represent the
 * milliseconds since the Epoch (midnight, January 1, 1970 UTC).
 * A watermark with a certain value {@code t} indicates that no elements with event
 * timestamps {@code x}, where {@code x} is lower or equal to {@code t}, will occur any more.
 *
 * @param <T> The type of the elements to which this assigner assigns timestamps.
 *
 * @see org.apache.flink.streaming.api.watermark.Watermark
 */
public interface AssignerWithPeriodicWatermarks<T> extends TimestampAssigner<T> {

    /**
     * Returns the current watermark. This method is periodically called by the
     * system to retrieve the current watermark. The method may return {@code null} to
     * indicate that no new Watermark is available.
     *
     * <p>The returned watermark will be emitted only if it is non-null and its timestamp
     * is larger than that of the previously emitted watermark (to preserve the contract of
     * ascending watermarks). If the current watermark is still
     * identical to the previous one, no progress in event time has happened since
     * the previous call to this method. If a null value is returned, or the timestamp
     * of the returned watermark is smaller than that of the last emitted one, then no
     * new watermark will be generated.
     *
     * <p>The interval in which this method is called and Watermarks are generated
     * depends on {@link ExecutionConfig#getAutoWatermarkInterval()}.
     *
     * @see org.apache.flink.streaming.api.watermark.Watermark
     * @see ExecutionConfig#getAutoWatermarkInterval()
     *
     * @return {@code Null}, if no watermark should be emitted, or the next watermark to emit.
     */
    @Nullable
    Watermark getCurrentWatermark();
}
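
The contract described in the Javadoc above can be illustrated without any Flink classes. The following is a minimal plain-Java sketch, assuming a bounded out-of-orderness strategy in which the watermark trails the highest timestamp seen so far by a fixed delay. BoundedLatenessTracker, onElement, currentWatermark and probe are hypothetical names, not Flink API; probe stands in for the system's periodic call together with its "non-null and strictly larger" check.

```java
/** Plain-Java stand-in for a periodic assigner; all names here are hypothetical. */
final class BoundedLatenessTracker {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampSeen = Long.MIN_VALUE;
    private long lastEmitted = Long.MIN_VALUE;

    BoundedLatenessTracker(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Mirrors extractTimestamp: remember the highest timestamp seen so far. */
    long onElement(long elementTimestamp) {
        maxTimestampSeen = Math.max(maxTimestampSeen, elementTimestamp);
        return elementTimestamp;
    }

    /** Mirrors getCurrentWatermark: null means there is nothing to report yet. */
    Long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return null;
        }
        return maxTimestampSeen - maxOutOfOrdernessMs;
    }

    /** Mirrors the system side: emit only a non-null, strictly larger watermark. */
    Long probe() {
        Long candidate = currentWatermark();
        if (candidate == null || candidate <= lastEmitted) {
            return null;
        }
        lastEmitted = candidate;
        return candidate;
    }
}
```

With a bound of 3000 ms, an element at 10000 leads the next probe to emit 7000. A later element at 8000 does not raise the maximum, so the following probe emits nothing.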

Introduction to the AssignerWithPunctuatedWatermarks interface:

Source path: flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\functions\AssignerWithPunctuatedWatermarks.java

package org.apache.flink.streaming.api.functions;

import org.apache.flink.streaming.api.watermark.Watermark;

import javax.annotation.Nullable;

/**
 * The {@code AssignerWithPunctuatedWatermarks} assigns event time timestamps to elements,
 * and generates low watermarks that signal event time progress within the stream.
 * These timestamps and watermarks are used by functions and operators that operate
 * on event time, for example event time windows.
 *
 * <p>Use this class if certain special elements act as markers that signify event time
 * progress, and when you want to emit watermarks specifically at certain events.
 * The system will generate a new watermark, if the probed value is non-null
 * and has a timestamp larger than that of the previous watermark (to preserve
 * the contract of ascending watermarks).
 *
 * <p>For use cases that should periodically emit watermarks based on element timestamps,
 * use the {@link AssignerWithPeriodicWatermarks} instead.
 *
 * <p>The following example illustrates how to use this timestamp extractor and watermark
 * generator. It assumes elements carry a timestamp that describes when they were created,
 * and that some elements carry a flag, marking them as the end of a sequence such that no
 * elements with smaller timestamps can come anymore.
 *
 * <pre>{@code
 * public class WatermarkOnFlagAssigner implements AssignerWithPunctuatedWatermarks<MyElement> {
 *
 *     public long extractTimestamp(MyElement element, long previousElementTimestamp) {
 *         return element.getSequenceTimestamp();
 *     }
 *
 *     public Watermark checkAndGetNextWatermark(MyElement lastElement, long extractedTimestamp) {
 *         return lastElement.isEndOfSequence() ? new Watermark(extractedTimestamp) : null;
 *     }
 * }
 * }</pre>
 *
 * <p>Timestamps and watermarks are defined as {@code longs} that represent the
 * milliseconds since the Epoch (midnight, January 1, 1970 UTC).
 * A watermark with a certain value {@code t} indicates that no elements with event
 * timestamps {@code x}, where {@code x} is lower or equal to {@code t}, will occur any more.
 *
 * @param <T> The type of the elements to which this assigner assigns timestamps.
 *
 * @see org.apache.flink.streaming.api.watermark.Watermark
 */
public interface AssignerWithPunctuatedWatermarks<T> extends TimestampAssigner<T> {

    /**
     * Asks this implementation if it wants to emit a watermark. This method is called right after
     * the {@link #extractTimestamp(Object, long)} method.
     *
     * <p>The returned watermark will be emitted only if it is non-null and its timestamp
     * is larger than that of the previously emitted watermark (to preserve the contract of
     * ascending watermarks). If a null value is returned, or the timestamp of the returned
     * watermark is smaller than that of the last emitted one, then no new watermark will
     * be generated.
     *
     * <p>For an example how to use this method, see the documentation of
     * {@link AssignerWithPunctuatedWatermarks this class}.
     *
     * @return {@code Null}, if no watermark should be emitted, or the next watermark to emit.
     */
    @Nullable
    Watermark checkAndGetNextWatermark(T lastElement, long extractedTimestamp);
}
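
The call order described above (extractTimestamp first, then checkAndGetNextWatermark, then the ascending-watermark filter) can be traced in a small plain-Java simulation. This is only a sketch under assumptions: FlaggedElement is a hypothetical type modelled on MyElement from the Javadoc example, and PunctuatedSimulation stands in for the runtime; neither is part of Flink.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical element type, modelled on MyElement from the Javadoc example. */
final class FlaggedElement {
    final long sequenceTimestamp;
    final boolean endOfSequence;

    FlaggedElement(long sequenceTimestamp, boolean endOfSequence) {
        this.sequenceTimestamp = sequenceTimestamp;
        this.endOfSequence = endOfSequence;
    }
}

/** Plain-Java simulation (no Flink classes) of the punctuated call sequence. */
final class PunctuatedSimulation {

    /** Stand-in for extractTimestamp. */
    static long extractTimestamp(FlaggedElement element) {
        return element.sequenceTimestamp;
    }

    /** Stand-in for checkAndGetNextWatermark; null means no watermark for this element. */
    static Long checkAndGetNextWatermark(FlaggedElement lastElement, long extractedTimestamp) {
        return lastElement.endOfSequence ? Long.valueOf(extractedTimestamp) : null;
    }

    /** Stand-in for the runtime: call both per element, keep only strictly ascending watermarks. */
    static List<Long> emittedWatermarks(List<FlaggedElement> elements) {
        List<Long> emitted = new ArrayList<>();
        long lastEmitted = Long.MIN_VALUE;
        for (FlaggedElement element : elements) {
            long timestamp = extractTimestamp(element);
            Long candidate = checkAndGetNextWatermark(element, timestamp);
            if (candidate != null && candidate > lastEmitted) {
                lastEmitted = candidate;
                emitted.add(candidate);
            }
        }
        return emitted;
    }
}
```

For the elements (1000, no flag), (3000, flag), (2000, flag), (5000, flag), the simulation emits the watermarks 3000 and 5000. The flagged element at 2000 is ignored because its watermark would be smaller than the one already emitted.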

 

Demos of the two interfaces:

For a demo of the AssignerWithPeriodicWatermarks interface, see https://www.cnblogs.com/felixzh/p/9687214.html

A demo of the AssignerWithPunctuatedWatermarks interface follows:

package org.apache.flink.streaming.examples.wordcount;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.util.Collector;

import javax.annotation.Nullable;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Properties;


public class wcNew {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(35000);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "127.0.0.1:9092");
        props.setProperty("group.id", "flink-group-debug");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        FlinkKafkaConsumer010<String> consumer =
            new FlinkKafkaConsumer010<>(args[0], new SimpleStringSchema(), props);
        consumer.setStartFromEarliest();
        consumer.assignTimestampsAndWatermarks(new MessageWaterEmitter());

        DataStream<Tuple3<String, Integer, String>> keyedStream = env
            .addSource(consumer)
            .flatMap(new MessageSplitter())
            .keyBy(0)
            .timeWindow(Time.seconds(10))
            .reduce(new ReduceFunction<Tuple3<String, Integer, String>>() {
                @Override
                public Tuple3<String, Integer, String> reduce(Tuple3<String, Integer, String> t0, Tuple3<String, Integer, String> t1) throws Exception {
                    String time0 = t0.getField(2);
                    String time1 = t1.getField(2);
                    Integer count0 = t0.getField(1);
                    Integer count1 = t1.getField(1);
                    return new Tuple3<>(t0.getField(0), count0 + count1, time0 +"|"+ time1);
                }
            });

        keyedStream.writeAsText(args[1], FileSystem.WriteMode.OVERWRITE);
        keyedStream.print();
        env.execute("Flink-Kafka num count");
    }

    private static class MessageWaterEmitter implements AssignerWithPunctuatedWatermarks<String> {

        // HH is the 24-hour field; the 12-hour field hh would parse hour 12 as midnight.
        private final SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd-HHmmss");

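        /*
         * Called first: extracts the event-time timestamp from the element.
         * @param element  the record line, formatted as "key,count,time"
         * @param previousElementTimestamp  the timestamp previously attached to the element,
         *        or a negative value if none has been assigned yet (it is not the current time)
         */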
        @Override
        public long extractTimestamp(String element, long previousElementTimestamp) {
            if (element != null && element.contains(",")) {
                String[] parts = element.split(",");
                if (parts.length == 3) {
                    try {
                        return sdf.parse(parts[2]).getTime();
                    } catch (ParseException e) {
                        e.printStackTrace();
                    }
                }
            }
            return 0L;
        }

        /*
         * Called next: extractedTimestamp is the value returned by extractTimestamp.
         */
        @Nullable
        @Override
        public Watermark checkAndGetNextWatermark(String lastElement, long extractedTimestamp) {
            if (lastElement != null && lastElement.contains(",")) {
                String[] parts = lastElement.split(",");
                if(parts.length==3) {
                    try {
                        return new Watermark(sdf.parse(parts[2]).getTime());
                    } catch (ParseException e) {
                        e.printStackTrace();
                    }
                }

            }
            return null;
        }
    }
    private static class MessageSplitter implements FlatMapFunction<String, Tuple3<String, Integer, String>> {

        @Override
        public void flatMap(String s, Collector<Tuple3<String, Integer, String>> collector) throws Exception {
            if (s != null && s.contains(",")) {
                String[] strings = s.split(",");
                if(strings.length==3) {
                    collector.collect(new Tuple3<>(strings[0], Integer.parseInt(strings[1]), strings[2]));
                }
            }
        }
    }
}
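
One detail worth checking in a demo like this is the hour letter in the SimpleDateFormat pattern: HH is the 24-hour field (0-23), while hh is the 12-hour field (1-12) and needs an AM/PM marker to be unambiguous. The following sketch compares the two on the demo's time format. HourPatternCheck and parseUtc are hypothetical helper names, and UTC with Locale.ROOT is an assumption made only to keep the result independent of the machine's settings.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper for comparing date patterns on the demo's time format. */
final class HourPatternCheck {

    /** Parses text with the given pattern in UTC and returns epoch milliseconds. */
    static long parseUtc(String pattern, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return format.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("cannot parse: " + text, e);
        }
    }
}
```

For 20180504-113411 both patterns give the same timestamp, so the sample records used here are unaffected. For 20180504-123411 the hh pattern reads hour 12 as midnight, and the result is 12 hours (43,200,000 ms) earlier than with HH, which would place the record in the wrong window.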

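After packaging the program as a jar, upload it to the server where Flink is installed and run the following command in the console: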

flink run -c org.apache.flink.streaming.examples.wordcount.wcNew flink-kafka.jar topic_test_numcount /tmp/numcount.txt

Then send the following records to the topic from the console:

eee,1,20180504-113411
eee,2,20180504-113415
eee,2,20180504-113412
eee,2,20180504-113419
eee,1,20180504-113421

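Run tail -f numcount.txt to watch the output written to numcount.txt. When the last record is entered, its timestamp (11:34:21) moves the watermark past the end of the 10-second window covering 11:34:10 to 11:34:20, so the program outputs the aggregated result of the first four records: (eee,7,20180504-113411|20180504-113415|20180504-113412|20180504-113419)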
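
The result above can be reproduced by hand with a plain-Java walkthrough of the windowing logic. This is a simplified sketch, not Flink code, and it rests on several assumptions: all records share one key, windows are epoch-aligned 10-second tumbling windows with no allowed lateness, times are parsed as UTC with a 24-hour pattern, and a watermark equal to each record's timestamp follows that record. TumblingWindowWalkthrough and replay are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;
import java.util.TreeMap;

/** Plain-Java walkthrough (no Flink classes); assumes every record has the same key. */
final class TumblingWindowWalkthrough {
    static final long WINDOW_MS = 10_000L;

    /** Running aggregate for one window, like the demo's reduce function. */
    private static final class Acc {
        String key;
        int count;
        String times;
    }

    /** Start of the epoch-aligned window that contains a non-negative timestamp. */
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_MS);
    }

    /** Replays "key,count,time" lines and returns the results of the windows that fired. */
    static List<String> replay(List<String> lines) {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd-HHmmss", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        TreeMap<Long, Acc> open = new TreeMap<>(); // open windows, ordered by start time
        List<String> fired = new ArrayList<>();
        long watermark = Long.MIN_VALUE;

        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length != 3) {
                continue; // same filter as the demo's MessageSplitter
            }
            long ts;
            try {
                ts = format.parse(parts[2]).getTime();
            } catch (ParseException e) {
                continue; // this sketch simply skips unparseable records
            }
            long start = windowStart(ts);
            // A record is dropped as late only if its window has already fired.
            if (start + WINDOW_MS - 1 > watermark) {
                Acc acc = open.get(start);
                if (acc == null) {
                    acc = new Acc();
                    acc.key = parts[0];
                    acc.count = Integer.parseInt(parts[1]);
                    acc.times = parts[2];
                    open.put(start, acc);
                } else {
                    acc.count += Integer.parseInt(parts[1]);
                    acc.times = acc.times + "|" + parts[2];
                }
            }
            // The watermark follows each record but never moves backwards.
            watermark = Math.max(watermark, ts);
            // Fire every window whose last millisecond is covered by the watermark.
            Iterator<Map.Entry<Long, Acc>> it = open.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<Long, Acc> entry = it.next();
                if (entry.getKey() + WINDOW_MS - 1 > watermark) {
                    break;
                }
                Acc acc = entry.getValue();
                fired.add("(" + acc.key + "," + acc.count + "," + acc.times + ")");
                it.remove();
            }
        }
        return fired;
    }
}
```

Replaying the five sample records returns a single fired window, (eee,7,20180504-113411|20180504-113415|20180504-113412|20180504-113419). The record at 11:34:12 arrives after the watermark has reached 11:34:15 but is still counted, because its window has not fired yet. The fifth record opens the next window, which stays open until a later record advances the watermark to 11:34:30 or beyond.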
