Ufuk Celebi

Hadoop Summit Dublin
April 13, 2016
Unified Stream & Batch Processing with Apache Flink
What is Apache Flink?
Apache Flink is an open source stream processing framework.
• Event Time Handling
• State & Fault Tolerance
• Low Latency
• High Throughput
Developed at the Apache Software Foundation.
Recent History
• April ‘14: Project Incubation (v0.5, v0.6, v0.7)
• December ‘14: Top Level Project (v0.8, v0.10)
• March ‘16: Release 1.0
Flink Stack
• Libraries
• DataStream API (Stream Processing) | DataSet API (Batch Processing)
• Runtime: Distributed Streaming Data Flow
Streaming and batch as first-class citizens.
Counting
Seemingly simple application: count visitors, ad impressions, etc.
But it generalizes well to other problems.
Batch Processing
All Input -> Batch Job -> All Output
(e.g. Hadoop, Spark, Flink)
Batch Processing
DataSet<ColorEvent> counts = env
    .readFile("MM-dd.csv")
    .groupBy("color")
    .count();
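The semantics of this batch job can be sketched in plain Java for readers without a Flink setup. This is not Flink's DataSet API: the `ColorEvent` class, the `BatchColorCount` helper, and the `time,color` line format are hypothetical stand-ins. The point is that the job consumes the whole (finite) file before it emits any count.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the batch job's semantics, not Flink's DataSet API.
final class BatchColorCount {

    // Hypothetical event type: only the field the job groups by.
    static final class ColorEvent {
        final String color;

        ColorEvent(String color) {
            this.color = color;
        }
    }

    // Parses one line of an assumed "time,color" CSV format.
    static ColorEvent parse(String line) {
        String[] fields = line.split(",");
        return new ColorEvent(fields[1].trim());
    }

    // groupBy("color") + count over a finite input: every line is read
    // before the result map is returned.
    static Map<String, Long> countByColor(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            counts.merge(parse(line).color, 1L, Long::sum);
        }
        return counts;
    }
}
```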
Continuous Counting
Continuous ingestion -> periodic files -> periodic batch jobs
[Figure: timeline in which Job 1, Job 2 and Job 3 each process one hour (1h) of ingested data]
Many Moving Parts
Batch Job (1h) -> Serving Layer
• Data loading into HDFS (e.g. Flume)
• Periodic job scheduler (e.g. Oozie)
• Batch processor (e.g. Hadoop, Spark, Flink)
High Latency
Latency from event to serving layer is usually in the range of hours.
Batch Job (1h) -> Serving Layer, scheduled every X hours
Implicit Treatment of Time
Time is treated outside of your application.
[Figure: a sequence of hourly (1h) batch jobs, each feeding the serving layer]
Implicit Treatment of Time
DataSet<ColorEvent> counts = env
    .readFile("MM-dd.csv")
    .groupBy("color")
    .count();
• Time is implicit in the input file.
• Files are finite streams: data is continuously produced, while batch jobs are periodically executed and feed the serving layer.
Streaming over Batch
Streaming
Until now, stream processors were less mature than their batch counterparts. This led to:
• in-house solutions,
• abuse of batch processors,
• Lambda architectures.
This is no longer needed with new-generation stream processors like Flink.
Streaming All the Way
Message Queue -> Streaming Job -> Serving Layer
• Message Queue (e.g. Apache Kafka): durability and replay
• Stream Processor (e.g. Apache Flink): consistent processing
Building Blocks of Flink
• Explicit Handling of Time
• State & Fault Tolerance
• Performance
Windowing
Aggregates on streams are scoped by windows.
• Time-driven, e.g. last X minutes
• Data-driven, e.g. last X records
Tumbling Windows (No Overlap)
e.g. “Count over the last 5 minutes”, “Average over the last 100 records”
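A minimal plain-Java sketch of both tumbling variants, assuming millisecond timestamps; `TumblingWindows` and its methods are illustrative names, not Flink classes. A time-driven window is identified by its start, obtained by rounding the timestamp down to a multiple of the window size. A data-driven window groups consecutive runs of N elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TumblingWindows {

    // Start of the tumbling time window containing the timestamp.
    // Windows are [start, start + size) and never overlap.
    static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    // Time-driven: number of events per window, keyed by window start.
    static Map<Long, Long> countPerTimeWindow(List<Long> timestamps, long sizeMillis) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long ts : timestamps) {
            counts.merge(windowStart(ts, sizeMillis), 1L, Long::sum);
        }
        return counts;
    }

    // Data-driven: average of each consecutive block of `size` values.
    // A trailing, incomplete block is not emitted (its window is still open).
    static List<Double> averagePerCountWindow(List<Double> values, int size) {
        List<Double> averages = new ArrayList<>();
        for (int from = 0; from + size <= values.size(); from += size) {
            double sum = 0;
            for (int i = from; i < from + size; i++) {
                sum += values.get(i);
            }
            averages.add(sum / size);
        }
        return averages;
    }
}
```

With tumbling windows, “the last 5 minutes” means the current fixed 5-minute bucket (e.g. 10:00 to 10:05), not a window that always ends at the latest event.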
Sliding Windows (with Overlap)
e.g. “Count over the last 5 minutes, updated each minute”, “Average over the last 100 elements, updated every 10 elements”
Explicit Handling of Time
DataStream<ColorEvent> counts = env
    .addSource(new KafkaConsumer(…))
    .keyBy("color")
    .timeWindow(Time.minutes(60))
    .apply(new CountPerWindow());
Time is explicit in your program.
Session Windows
Sessions close after a period of inactivity.
e.g. “Count activity from login until time-out or logout.”
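Session windows have no fixed length; they are bounded by gaps of inactivity. The hypothetical `SessionWindows` helper below sorts timestamps and starts a new session whenever the time since the previous event exceeds the gap. Sorting the full input is a batch simplification: a stream processor instead merges per-event windows incrementally, and its boundary handling (whether a pause of exactly `gap` splits a session) may differ from this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SessionWindows {

    // A session: first and last event timestamp plus number of events.
    static final class Session {
        final long start;
        long end;
        long count;

        Session(long start) {
            this.start = start;
            this.end = start;
            this.count = 1;
        }
    }

    // Groups timestamps into sessions. A new session starts when the
    // inactivity since the previous event is longer than `gap`.
    static List<Session> sessionize(List<Long> timestamps, long gap) {
        List<Long> sorted = new ArrayList<>(timestamps);
        Collections.sort(sorted);
        List<Session> sessions = new ArrayList<>();
        Session current = null;
        for (long ts : sorted) {
            if (current != null && ts - current.end <= gap) {
                current.end = ts;          // activity extends the open session
                current.count++;
            } else {
                current = new Session(ts); // inactivity closed the previous one
                sessions.add(current);
            }
        }
        return sessions;
    }
}
```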
Session Windows
DataStream<ColorEvent> counts = env
    .addSource(new KafkaConsumer(…))
    .keyBy("color")
    .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
    .apply(new CountPerWindow());
Notions of Time
• Event Time: time when the event happened (e.g. 12:23 am).
• Processing Time: time measured by the system clock (e.g. 1:37 pm).
[Figure: Star Wars episodes ordered by processing time (release years 1977, 1980, 1983, 1999, 2002, 2005, 2015: Episodes IV, V, VI, I, II, III, VII) versus event time (story order)]
Out of Order Events
[Figure: a 1st and a 2nd burst of events, grouped by event time windows versus processing time windows]
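The effect in the figure can be reproduced with a small plain-Java sketch, assuming each hypothetical `Event` carries both its event timestamp and its arrival time at the processor. The same events land in different one-minute windows depending on which time is used; only the event-time result is independent of delays and arrival order.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class EventVsProcessingTime {

    // Hypothetical event: when it happened and when the processor saw it.
    static final class Event {
        final long eventTime;
        final long arrivalTime;

        Event(long eventTime, long arrivalTime) {
            this.eventTime = eventTime;
            this.arrivalTime = arrivalTime;
        }
    }

    static long windowStart(long t, long size) {
        return t - Math.floorMod(t, size);
    }

    // Counts per tumbling window, using either the event's own timestamp
    // (event time) or the time it reached the processor (processing time).
    static Map<Long, Long> countPerWindow(List<Event> events, long size, boolean useEventTime) {
        Map<Long, Long> counts = new TreeMap<>();
        for (Event e : events) {
            long t = useEventTime ? e.eventTime : e.arrivalTime;
            counts.merge(windowStart(t, size), 1L, Long::sum);
        }
        return counts;
    }
}
```

The sketch sees all events up front. A real event-time job must also decide when a window is complete even though late events may still arrive; Flink tracks this progress with watermarks.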
Notions of Time
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<ColorEvent> counts = env
    ...
    .timeWindow(Time.minutes(60))
    .apply(new CountPerWindow());
Explicit Handling of Time
1. Expressive windowing
2. Accurate results for out-of-order data
3. Deterministic results
Building Blocks of Flink: State & Fault Tolerance
Stateful Streaming
• Stateless stream processing: each operator (Op) handles every event independently
• Stateful stream processing: operators (Op) maintain State across events
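The difference between the two kinds of operator can be sketched in plain Java: `normalize` is stateless, while `process` reads and updates a per-key count that survives from one element to the next. `StatefulCounter` is an illustrative class, not Flink's keyed-state API; in Flink such state would be managed and snapshotted by the system rather than kept in a private map.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class StatefulCounter {

    // State kept across elements: running count per key.
    private final Map<String, Long> countsByKey = new HashMap<>();

    // Stateless operator: the output depends only on the current element.
    static String normalize(String color) {
        return color.trim().toLowerCase(Locale.ROOT);
    }

    // Stateful operator: the output depends on this element and all earlier ones.
    long process(String color) {
        return countsByKey.merge(normalize(color), 1L, Long::sum);
    }

    // Copy of the current state, e.g. for a snapshot.
    Map<String, Long> snapshot() {
        return new HashMap<>(countsByKey);
    }
}
```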
Processing Semantics
• At-least-once: may over-count after failure
• Exactly-once: correct counts after failures
• End-to-end exactly-once: correct counts in external system (e.g. DB, file system) after failure
Processing Semantics
• Flink guarantees exactly-once (can be configured for at-least-once if desired)
• End-to-end exactly-once with specific sources and sinks (e.g. Kafka -> Flink -> HDFS)
• Internally, Flink periodically takes consistent snapshots of the state without ever stopping computation
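Why a consistent snapshot matters can be shown with a simplified model, which is not Flink's actual checkpointing mechanism: a counting job reads a replayable log, periodically records the pair (source offset, count), and after a failure rewinds the source to the recorded offset. If the count is rolled back together with the offset, every record is counted once; if the count keeps its pre-failure value, the records after the snapshot are counted twice. The class, method, and parameters are hypothetical.

```java
final class RecoverySemantics {

    // Simplified counting job over a replayable log of `numRecords` records.
    // A snapshot of (offset, count) is taken every `snapshotInterval` records.
    // The job fails after processing the records before offset `failAt`;
    // the source is then rewound to the last snapshot offset.
    //   restoreState == true : count is rolled back together with the offset
    //   restoreState == false: count keeps its pre-failure value
    static long countWithFailure(int numRecords, int snapshotInterval, int failAt, boolean restoreState) {
        long count = 0;
        int snapshotOffset = 0;  // next offset to read after recovery
        long snapshotCount = 0;  // state consistent with snapshotOffset
        for (int offset = 0; offset < failAt; offset++) {
            count++;
            if ((offset + 1) % snapshotInterval == 0) {
                snapshotOffset = offset + 1;
                snapshotCount = count;
            }
        }
        // Recovery: optionally restore state, then replay from the snapshot offset.
        if (restoreState) {
            count = snapshotCount;
        }
        for (int offset = snapshotOffset; offset < numRecords; offset++) {
            count++;
        }
        return count;
    }
}
```

For example, with 10 records, a snapshot every 4 records, and a failure after 7 records, the consistent recovery yields 10 while the inconsistent one yields 13.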
Building Blocks of Flink: Performance
Yahoo! Benchmark
• Storm 0.10, Spark Streaming 1.5, and Flink 0.10 benchmark by the Storm team at Yahoo!
• Focus on measuring end-to-end latency at low throughputs (~200k events/sec)
• First benchmark modelled after a real application
https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
Yahoo! Benchmark
• Count ad impressions grouped by campaign
• Compute aggregates over the last 10 seconds
• Make aggregates available for queries in Redis
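The benchmark's core computation, counting impressions per campaign in 10-second windows, can be sketched in plain Java as a keyed tumbling window. The `Impression` type and `CampaignWindowCounts` helper are hypothetical, and the Redis serving layer is not modeled: the aggregates are simply returned as a map.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CampaignWindowCounts {

    static final long WINDOW_MILLIS = 10_000L; // "last 10 seconds"

    // Hypothetical ad impression: campaign id plus event timestamp.
    static final class Impression {
        final String campaignId;
        final long timestampMillis;

        Impression(String campaignId, long timestampMillis) {
            this.campaignId = campaignId;
            this.timestampMillis = timestampMillis;
        }
    }

    // campaign -> (window start -> impression count). In the benchmark these
    // aggregates are written to Redis; here they are only returned.
    static Map<String, Map<Long, Long>> count(List<Impression> impressions) {
        Map<String, Map<Long, Long>> result = new TreeMap<>();
        for (Impression imp : impressions) {
            long windowStart = imp.timestampMillis - Math.floorMod(imp.timestampMillis, WINDOW_MILLIS);
            result.computeIfAbsent(imp.campaignId, k -> new TreeMap<>())
                  .merge(windowStart, 1L, Long::sum);
        }
        return result;
    }
}
```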
Latency (Lower is Better)
[Figure: 99th percentile latency (sec) versus throughput (1000 events/sec) for Storm 0.10, Flink 0.10, and Spark Streaming 1.5]
• Storm and Flink at low latencies
• Spark latency increases with throughput
Extending the Benchmark
• Great starting point, but the benchmark stops at low write throughput and its programs are not fault-tolerant
• Extend the benchmark to high volumes and Flink’s built-in fault-tolerant state
http://data-artisans.com/extending-the-yahoo-streaming-benchmark/

Use Flink’s internal state
Throughput (Higher is Better)
[Figure: maximum throughput (events/sec) for Storm w/ Kafka, Flink w/ Kafka, and Flink w/o Kafka]
• Flink w/ Kafka: limited by bandwidth between Kafka and Flink cluster
Summary
• Stream processing is gaining momentum and is the right paradigm for continuous data applications
• Choice of framework is crucial: even seemingly simple applications become complex at scale and in production
• Flink offers a unique combination of efficiency, consistency, and event time handling
Libraries
• Libraries: Complex Event Processing (CEP), ML, Graphs
• DataStream API (Stream Processing) | DataSet API (Batch Processing)
• Runtime: Distributed Streaming Data Flow
Complex Event Processing (CEP)
Pattern<MonitoringEvent, ?> warningPattern =
    Pattern.<MonitoringEvent>begin("First Event")
        .subtype(TemperatureEvent.class)
        .where(evt -> evt.getTemperature() >= THRESHOLD)
    .next("Second Event")
        .subtype(TemperatureEvent.class)
        .where(evt -> evt.getTemperature() >= THRESHOLD)
    .within(Time.seconds(10));
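A plain-Java sketch of what this pattern matches, assuming a list of events already in order: two directly consecutive (`next`) temperature events, both at or above the threshold, less than 10 seconds apart. The classes mirror the names in the pattern but are hypothetical stand-ins; this linear scan is not how Flink CEP evaluates patterns, and the exact boundary semantics of `within` may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class TemperatureWarnings {

    // Hypothetical base type and subtype mirroring the pattern above.
    static class MonitoringEvent {
        final long timestampMillis;

        MonitoringEvent(long timestampMillis) {
            this.timestampMillis = timestampMillis;
        }
    }

    static final class TemperatureEvent extends MonitoringEvent {
        final double temperature;

        TemperatureEvent(long timestampMillis, double temperature) {
            super(timestampMillis);
            this.temperature = temperature;
        }
    }

    // subtype(TemperatureEvent.class) + where(temperature >= threshold)
    static boolean isHot(MonitoringEvent e, double threshold) {
        return e instanceof TemperatureEvent && ((TemperatureEvent) e).temperature >= threshold;
    }

    // Indices of each "First Event" whose directly following event ("next")
    // is also hot and occurs less than `withinMillis` later.
    static List<Integer> findWarnings(List<MonitoringEvent> events, double threshold, long withinMillis) {
        List<Integer> matches = new ArrayList<>();
        for (int i = 0; i + 1 < events.size(); i++) {
            MonitoringEvent first = events.get(i);
            MonitoringEvent second = events.get(i + 1);
            if (isHot(first, threshold) && isHot(second, threshold)
                    && second.timestampMillis - first.timestampMillis < withinMillis) {
                matches.add(i);
            }
        }
        return matches;
    }
}
```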
Upcoming Features
• SQL: ongoing work in collaboration with Apache Calcite
• Dynamic Scaling: adapt resources to stream volume, scale up for historical stream processing
• Queryable State: query the state inside the stream processor
SQL
SELECT STREAM * FROM Orders WHERE units > 3;

  rowtime | productId | orderId | units
----------+-----------+---------+-------
 10:17:00 |        30 |       5 |     4
 10:18:07 |        30 |       8 |    20
 11:02:00 |        10 |       9 |     6
 11:09:30 |        40 |      11 |    12
 11:24:11 |        10 |      12 |     4
        … |         … |       … |     …
Dynamic Scaling
Key-value states have to be redistributed when rescaling a Flink job. Distributing the key-value states coherently with the job’s new partitioning leads to a consistent state.
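One way to redistribute key-value state coherently is to hash keys into a fixed number of key groups and assign contiguous ranges of key groups to parallel instances; later Flink releases use a scheme along these lines. The sketch below is a simplified illustration with hypothetical names: it uses `hashCode()` directly where Flink applies an additional hash, and an assumed maximum parallelism of 128. Because a key's group never changes, rescaling only moves whole groups between instances.

```java
import java.util.HashMap;
import java.util.Map;

final class KeyGroupRescaling {

    // Fixed number of key groups (assumed value); also the largest
    // parallelism at which every instance still owns at least one group.
    static final int MAX_PARALLELISM = 128;

    // A key always maps to the same key group, independent of parallelism.
    static int keyGroupFor(Object key) {
        return Math.floorMod(key.hashCode(), MAX_PARALLELISM);
    }

    // Contiguous ranges of key groups are assigned to parallel instances.
    static int instanceFor(int keyGroup, int parallelism) {
        return keyGroup * parallelism / MAX_PARALLELISM;
    }

    // Redistributes per-key state for a new parallelism: whole key groups
    // move between instances, so each key's state stays together.
    static Map<Integer, Map<String, Long>> redistribute(Map<String, Long> state, int newParallelism) {
        Map<Integer, Map<String, Long>> byInstance = new HashMap<>();
        for (Map.Entry<String, Long> entry : state.entrySet()) {
            int instance = instanceFor(keyGroupFor(entry.getKey()), newParallelism);
            byInstance.computeIfAbsent(instance, i -> new HashMap<>())
                      .put(entry.getKey(), entry.getValue());
        }
        return byInstance;
    }
}
```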
Queryable State
Query Flink directly
Join the Community
• Read: http://flink.apache.org/blog, http://data-artisans.com/blog
• Follow: @ApacheFlink, @dataArtisans
• Subscribe: (news | user | dev)@flink.apache.org

More Related Content

PDF
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
PDF
Apache Flink internals
Kostas Tzoumas
 
ODP
Stream processing using Kafka
Knoldus Inc.
 
PPTX
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Dremio Corporation
 
PDF
Building an open data platform with apache iceberg
Alluxio, Inc.
 
PDF
Introduction to Kafka Streams
Guozhang Wang
 
PDF
Introduction to Apache Flink - Fast and reliable big data processing
Till Rohrmann
 
PPTX
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
Flink Forward
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
Apache Flink internals
Kostas Tzoumas
 
Stream processing using Kafka
Knoldus Inc.
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Dremio Corporation
 
Building an open data platform with apache iceberg
Alluxio, Inc.
 
Introduction to Kafka Streams
Guozhang Wang
 
Introduction to Apache Flink - Fast and reliable big data processing
Till Rohrmann
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
Flink Forward
 

What's hot (20)

PPTX
Apache Flink and what it is used for
Aljoscha Krettek
 
PPTX
Airflow - a data flow engine
Walter Liu
 
PDF
ksqlDB - Stream Processing simplified!
Guido Schmutz
 
PPTX
Autoscaling Flink with Reactive Mode
Flink Forward
 
PDF
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
HostedbyConfluent
 
PDF
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Databricks
 
PDF
Introduction to Apache Flink
datamantra
 
PDF
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
 
PPTX
Apache Flink in the Cloud-Native Era
Flink Forward
 
PPTX
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
PDF
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
PDF
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
PDF
Incremental View Maintenance with Coral, DBT, and Iceberg
Walaa Eldin Moustafa
 
PPTX
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Jean-Paul Azar
 
PDF
Apache Flink 101 - the rise of stream processing and beyond
Bowen Li
 
PPTX
kafka
Amikam Snir
 
PPTX
Spark architecture
GauravBiswas9
 
PDF
Performance Tuning RocksDB for Kafka Streams’ State Stores
confluent
 
PDF
Apache Airflow
Sumit Maheshwari
 
PDF
Introduction To Flink
Knoldus Inc.
 
Apache Flink and what it is used for
Aljoscha Krettek
 
Airflow - a data flow engine
Walter Liu
 
ksqlDB - Stream Processing simplified!
Guido Schmutz
 
Autoscaling Flink with Reactive Mode
Flink Forward
 
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
HostedbyConfluent
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Databricks
 
Introduction to Apache Flink
datamantra
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
 
Apache Flink in the Cloud-Native Era
Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
Incremental View Maintenance with Coral, DBT, and Iceberg
Walaa Eldin Moustafa
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Jean-Paul Azar
 
Apache Flink 101 - the rise of stream processing and beyond
Bowen Li
 
Spark architecture
GauravBiswas9
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
confluent
 
Apache Airflow
Sumit Maheshwari
 
Introduction To Flink
Knoldus Inc.
 
Ad

Viewers also liked (20)

PPTX
Click-Through Example for Flink’s KafkaConsumer Checkpointing
Robert Metzger
 
PPTX
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
Stephan Ewen
 
PPTX
Flink vs. Spark
Slim Baltagi
 
PPTX
Real-time Stream Processing with Apache Flink
DataWorks Summit
 
PPT
Running Spark in Production
DataWorks Summit/Hadoop Summit
 
PPTX
Apache Flink at Strata San Jose 2016
Kostas Tzoumas
 
PPT
Step-by-Step Introduction to Apache Flink
Slim Baltagi
 
PDF
Stream Meets Batch for Smarter Analytics- Impetus White Paper
Impetus Technologies
 
PPTX
Apache Flink @ NYC Flink Meetup
Stephan Ewen
 
PPTX
Data Analysis With Apache Flink
DataWorks Summit
 
PPTX
Flink Batch Processing and Iterations
Sameer Wadkar
 
PPTX
Apache Flink - Hadoop MapReduce Compatibility
Fabian Hueske
 
PDF
Apache Flink Deep Dive
Vasia Kalavri
 
PPTX
Is your Enterprise Data lake Metadata Driven AND Secure?
DataWorks Summit/Hadoop Summit
 
PPTX
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
Robert Metzger
 
PDF
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Julian Hyde
 
PDF
Streaming Analytics & CEP - Two sides of the same coin?
Till Rohrmann
 
PDF
FlinkML: Large Scale Machine Learning with Apache Flink
Theodoros Vasiloudis
 
PDF
Batch and Stream Graph Processing with Apache Flink
Vasia Kalavri
 
PDF
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Till Rohrmann
 
Click-Through Example for Flink’s KafkaConsumer Checkpointing
Robert Metzger
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
Stephan Ewen
 
Flink vs. Spark
Slim Baltagi
 
Real-time Stream Processing with Apache Flink
DataWorks Summit
 
Running Spark in Production
DataWorks Summit/Hadoop Summit
 
Apache Flink at Strata San Jose 2016
Kostas Tzoumas
 
Step-by-Step Introduction to Apache Flink
Slim Baltagi
 
Stream Meets Batch for Smarter Analytics- Impetus White Paper
Impetus Technologies
 
Apache Flink @ NYC Flink Meetup
Stephan Ewen
 
Data Analysis With Apache Flink
DataWorks Summit
 
Flink Batch Processing and Iterations
Sameer Wadkar
 
Apache Flink - Hadoop MapReduce Compatibility
Fabian Hueske
 
Apache Flink Deep Dive
Vasia Kalavri
 
Is your Enterprise Data lake Metadata Driven AND Secure?
DataWorks Summit/Hadoop Summit
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
Robert Metzger
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Julian Hyde
 
Streaming Analytics & CEP - Two sides of the same coin?
Till Rohrmann
 
FlinkML: Large Scale Machine Learning with Apache Flink
Theodoros Vasiloudis
 
Batch and Stream Graph Processing with Apache Flink
Vasia Kalavri
 
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Till Rohrmann
 
Ad

Similar to Unified Stream and Batch Processing with Apache Flink (20)

PDF
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
ucelebi
 
PDF
Stream Processing with Apache Flink
C4Media
 
PDF
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Apache Flink Taiwan User Group
 
PPTX
QCon London - Stream Processing with Apache Flink
Robert Metzger
 
PPTX
GOTO Night Amsterdam - Stream processing with Apache Flink
Robert Metzger
 
PPTX
Data Stream Processing with Apache Flink
Fabian Hueske
 
PPTX
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Robert Metzger
 
PDF
Apache Flink @ Tel Aviv / Herzliya Meetup
Robert Metzger
 
PPTX
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
PPTX
Apache Flink(tm) - A Next-Generation Stream Processor
Aljoscha Krettek
 
PPTX
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
Ververica
 
PPTX
Counting Elements in Streams
Jamie Grier
 
PDF
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
confluent
 
PDF
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Evention
 
PDF
K. Tzoumas & S. Ewen – Flink Forward Keynote
Flink Forward
 
PPTX
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Robert Metzger
 
PPTX
Chicago Flink Meetup: Flink's streaming architecture
Robert Metzger
 
PDF
Unlocking the Power of Apache Flink: An Introduction in 4 Acts
HostedbyConfluent
 
PPTX
Continuous Processing with Apache Flink - Strata London 2016
Stephan Ewen
 
PDF
Introduction to Stateful Stream Processing with Apache Flink.
Konstantinos Kloudas
 
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
ucelebi
 
Stream Processing with Apache Flink
C4Media
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Apache Flink Taiwan User Group
 
QCon London - Stream Processing with Apache Flink
Robert Metzger
 
GOTO Night Amsterdam - Stream processing with Apache Flink
Robert Metzger
 
Data Stream Processing with Apache Flink
Fabian Hueske
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Robert Metzger
 
Apache Flink @ Tel Aviv / Herzliya Meetup
Robert Metzger
 
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
Apache Flink(tm) - A Next-Generation Stream Processor
Aljoscha Krettek
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
Ververica
 
Counting Elements in Streams
Jamie Grier
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
confluent
 
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Evention
 
K. Tzoumas & S. Ewen – Flink Forward Keynote
Flink Forward
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Robert Metzger
 
Chicago Flink Meetup: Flink's streaming architecture
Robert Metzger
 
Unlocking the Power of Apache Flink: An Introduction in 4 Acts
HostedbyConfluent
 
Continuous Processing with Apache Flink - Strata London 2016
Stephan Ewen
 
Introduction to Stateful Stream Processing with Apache Flink.
Konstantinos Kloudas
 

More from DataWorks Summit/Hadoop Summit (20)

PPT
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
 
PPT
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
PDF
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
PDF
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
PDF
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
 
PDF
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
 
PDF
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
 
PDF
Data Science Crash Course
DataWorks Summit/Hadoop Summit
 
PDF
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
 
PDF
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
 
PPTX
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
 
PPTX
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
PDF
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
PPTX
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
PPTX
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
 
PPTX
HBase in Practice
DataWorks Summit/Hadoop Summit
 
PPTX
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
PDF
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
 
PPTX
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 
PPTX
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
 
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
 
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
 
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
 
Data Science Crash Course
DataWorks Summit/Hadoop Summit
 
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
 
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
 
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
 
HBase in Practice
DataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
 

Recently uploaded (20)

PDF
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
PDF
Software Development Company | KodekX
KodekX
 
PDF
Chapter 2 Digital Image Fundamentals.pdf
Getnet Tigabie Askale -(GM)
 
PDF
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
PDF
Advances in Ultra High Voltage (UHV) Transmission and Distribution Systems.pdf
Nabajyoti Banik
 
PDF
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
PDF
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
PDF
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
PDF
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
PDF
agentic-ai-and-the-future-of-autonomous-systems.pdf
siddharthnetsavvies
 
PDF
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
PDF
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
PPTX
Comunidade Salesforce São Paulo - Desmistificando o Omnistudio (Vlocity)
Francisco Vieira Júnior
 
PDF
Enable Enterprise-Ready Security on IBM i Systems.pdf
Precisely
 
PDF
CIFDAQ's Token Spotlight: SKY - A Forgotten Giant's Comeback?
CIFDAQ
 
PDF
A Day in the Life of Location Data - Turning Where into How.pdf
Precisely
 
PDF
Doc9.....................................
SofiaCollazos
 
PDF
Make GenAI investments go further with the Dell AI Factory - Infographic
Principled Technologies
 
PDF
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
PPTX
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
Software Development Company | KodekX
KodekX
 
Chapter 2 Digital Image Fundamentals.pdf
Getnet Tigabie Askale -(GM)
 
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
Advances in Ultra High Voltage (UHV) Transmission and Distribution Systems.pdf
Nabajyoti Banik
 
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
agentic-ai-and-the-future-of-autonomous-systems.pdf
siddharthnetsavvies
 
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
Comunidade Salesforce São Paulo - Desmistificando o Omnistudio (Vlocity)
Francisco Vieira Júnior
 
Enable Enterprise-Ready Security on IBM i Systems.pdf
Precisely
 
CIFDAQ's Token Spotlight: SKY - A Forgotten Giant's Comeback?
CIFDAQ
 
A Day in the Life of Location Data - Turning Where into How.pdf
Precisely
 
Doc9.....................................
SofiaCollazos
 
Make GenAI investments go further with the Dell AI Factory - Infographic
Principled Technologies
 
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 

Unified Stream and Batch Processing with Apache Flink