Analytics Engine

System Architecture Diagram

Hadoop Ecosystem

Spark System

๋ฐ์ดํ„ฐ์ฒ˜๋ฆฌ ํŒจํ‚ค์ง€ ๋น„๊ต

Feature | Storm | Spark | Samza
Delivery Semantics | At least once; exactly once with Trident | Exactly once, except in some failure scenarios | At least once
State Management | Stateless; roll your own or use Trident | Stateful; writes state to storage (HDFS) | Stateful; embedded key-value store
Latency | Sub-second | Seconds, depending on batch size | Sub-second
Language Support | Any JVM language, Ruby, Python, JavaScript, Perl | Scala, Java, Python, R | Scala, Java (JVM languages only)
Processing Model | One-at-a-time | Micro-batch / batch | One-at-a-time
Backpressure | Yes | Yes | No
Stream Source | Spouts | Receivers | Consumers
Stream Primitive | Tuple | DStream | Message
Stream Computation | Bolts | Transformations, window operations | Tasks

Conclusion

Spark has somewhat higher latency, but it provides exactly-once delivery and components for a wide range of workloads (e.g. Spark SQL, MLlib, GraphX).
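Spark's latency gap follows from its processing model. A one-at-a-time engine handles each event on arrival. A micro-batch engine holds events until the batch interval closes. The toy model below is a sketch under stated assumptions: times are in milliseconds, processing time is constant, and there is no queueing. The function names are hypothetical, and this is not how Spark's scheduler is implemented.

```python
# Toy latency model: one-at-a-time vs. micro-batch processing.
# Assumptions (hypothetical): times are in milliseconds, processing time is
# constant per event/batch, and there is no queueing between events.

def one_at_a_time_latency(arrivals_ms, proc_ms):
    """Each event is processed as soon as it arrives."""
    return [proc_ms for _ in arrivals_ms]


def micro_batch_latency(arrivals_ms, batch_interval_ms, proc_ms):
    """Events wait until their batch interval closes, then the whole
    batch is processed together."""
    latencies = []
    for t in arrivals_ms:
        batch_end = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(batch_end - t + proc_ms)
    return latencies


if __name__ == "__main__":
    arrivals = [0, 250, 900, 1100]
    print(one_at_a_time_latency(arrivals, 50))      # [50, 50, 50, 50]
    print(micro_batch_latency(arrivals, 1000, 50))  # [1050, 800, 150, 950]
```

In this model the worst-case latency is one batch interval plus the processing time. This is why the table lists Spark's latency as "depending on batch size".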

Storm has the advantage in latency, but exactly-once delivery requires Trident, and state management likewise has to be added through Trident or built in-house.
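Trident makes state updates exactly-once by storing the batch's transaction id (txid) next to each stored value. It skips an update when the stored txid matches the incoming batch. The sketch below is a simplified model of that idea. It assumes, as Trident's transactional spouts guarantee, that a replayed batch carries the same txid and exactly the same tuples. The class names are hypothetical.

```python
# Simplified model of Trident-style transactional state (hypothetical names).
# Assumption: a replayed batch always carries the same txid and exactly the
# same tuples, and batches are applied in txid order.

class AtLeastOnceCounter:
    """Naive counter: a replayed batch is simply counted again."""
    def __init__(self):
        self.counts = {}

    def apply_batch(self, txid, keys):
        for k in keys:
            self.counts[k] = self.counts.get(k, 0) + 1


class TransactionalCounter:
    """Stores the txid of the last applied batch next to each value."""
    def __init__(self):
        self.store = {}  # key -> (count, txid of last batch applied)

    def apply_batch(self, txid, keys):
        partial = {}
        for k in keys:  # aggregate within the batch first
            partial[k] = partial.get(k, 0) + 1
        for k, n in partial.items():
            count, last_txid = self.store.get(k, (0, None))
            if last_txid == txid:
                continue  # this batch was already applied to k: it is a replay
            self.store[k] = (count + n, txid)

    def count(self, key):
        return self.store.get(key, (0, None))[0]


if __name__ == "__main__":
    batches = [(1, ["a", "b", "a"]), (2, ["b"]), (2, ["b"])]  # batch 2 replayed
    naive, txn = AtLeastOnceCounter(), TransactionalCounter()
    for txid, keys in batches:
        naive.apply_batch(txid, keys)
        txn.apply_batch(txid, keys)
    print(naive.counts)                    # {'a': 2, 'b': 3}
    print(txn.count("a"), txn.count("b"))  # 2 2
```

The naive counter over-counts the replayed batch. The transactional counter applies it only once, which is the property plain Storm (at least once) lacks.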

Samza also has the advantage in latency, but it is still at a pre-1.0 (minor) release and provides only at-least-once delivery.
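Samza gets its at-least-once guarantee from checkpointing input offsets. After a failure, a task resumes from the last checkpoint, so any message processed after that checkpoint is processed again. The toy model below illustrates this. `run_task` and `checkpoint_every` are hypothetical names, not Samza's API or configuration.

```python
# Toy model of at-least-once delivery via offset checkpointing (hypothetical
# names; not Samza's API). A task checkpoints "next offset to read" every
# `checkpoint_every` messages; after a crash it restarts from the checkpoint.

def run_task(messages, checkpoint_every, start_offset=0, crash_after=None):
    """Returns (messages processed in this run, offset to restart from)."""
    processed = []
    checkpoint = start_offset
    for offset in range(start_offset, len(messages)):
        processed.append(messages[offset])  # side effect happens first
        if crash_after is not None and len(processed) == crash_after:
            return processed, checkpoint    # crash before the next checkpoint
        if (offset + 1 - start_offset) % checkpoint_every == 0:
            checkpoint = offset + 1
    return processed, len(messages)


if __name__ == "__main__":
    msgs = ["m0", "m1", "m2", "m3", "m4"]
    first, restart_at = run_task(msgs, checkpoint_every=2, crash_after=3)
    second, _ = run_task(msgs, checkpoint_every=2, start_offset=restart_at)
    print(first + second)  # ['m0', 'm1', 'm2', 'm2', 'm3', 'm4']
```

No message is lost, but `m2` is processed twice. Downstream writes therefore need to be idempotent or deduplicated when duplicates matter.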

Search Package Comparison

Feature | Solr | Elasticsearch
Community & Developers | Apache Software Foundation and community support | A single commercial entity and its employees
Node Discovery | Apache ZooKeeper, mature and battle-tested in a large number of projects | Zen, built into Elasticsearch itself; requires dedicated master nodes to be split-brain proof
Shard Placement | Static in nature; migrating shards requires manual work, although starting with Solr 7 the Autoscaling API allows some dynamic actions | Dynamic; shards can be moved on demand depending on the cluster state
Caches | Global, invalidated with each segment change | Per segment, better for dynamically changing data
Analytics Engine | Facets and powerful streaming aggregations | Sophisticated and highly flexible aggregations
Optimized Query Execution | Currently none | Faster range queries depending on the context
Search Speed | Best for static data, because of caches and the uninverted reader | Very good for rapidly changing data, because of per-segment caches
Analysis Engine Performance | Great for static data, with exact calculations | Exactness of results depends on data placement
Full Text Search Features | Lucene-based language analysis, multiple suggesters, spell checkers, rich highlighting support | Lucene-based language analysis, a single suggest API implementation, highlighting rescoring
DevOps Friendliness | Not fully there yet, but coming | Very good APIs
Non-flat Data Handling | Nested documents and parent-child support | Natural support through nested and object types, allowing virtually unlimited nesting, plus parent-child support
Query DSL | JSON (limited), XML (limited), or URL parameters | JSON
Index/Collection Leader Control | Leader placement control and leader rebalancing to even out load across nodes | Not possible
Join | Join query parser ({!join}) and block join for nested documents | Parent-child joins (has_child / has_parent queries)
Machine Learning | Built in, on top of streaming aggregations, focused on logistic regression; plus a learning-to-rank contrib module | Commercial feature, focused on anomalies, outliers, and time-series data
Ecosystem | Modest: Banana, Zeppelin, with community support | Rich: Kibana, Grafana, with large-entity support and a big user base
RDBMS Ingestion | DataImportHandler | Logstash
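The Caches and Search Speed rows come down to how caching interacts with Lucene's immutable segments. The simplified model below assumes that segments are only ever added, with no merges or deletes. It also ignores Solr's cache autowarming. The class names are hypothetical. It counts how many segment scans each caching strategy performs after a new segment arrives.

```python
# Simplified model of global vs. per-segment query caching (hypothetical
# classes). Segments are immutable and only ever added, as in Lucene.
# Real Solr caches can also be autowarmed on commit; that is ignored here.

class Index:
    def __init__(self):
        self.segments = []  # list of (segment_id, tuple_of_docs)

    def add_segment(self, docs):
        self.segments.append((len(self.segments), tuple(docs)))


class GlobalCache:
    """One cache for the whole index; any new segment invalidates it."""
    def __init__(self, index):
        self.index = index
        self.cache = {}
        self.generation = len(index.segments)
        self.segment_scans = 0

    def search(self, term):
        if len(self.index.segments) != self.generation:
            self.cache.clear()  # segment set changed: drop everything
            self.generation = len(self.index.segments)
        if term not in self.cache:
            hits = []
            for _, docs in self.index.segments:
                self.segment_scans += 1
                hits.extend(d for d in docs if term in d.split())
            self.cache[term] = hits
        return self.cache[term]


class PerSegmentCache:
    """Results cached per (segment, term); only new segments are scanned."""
    def __init__(self, index):
        self.index = index
        self.cache = {}
        self.segment_scans = 0

    def search(self, term):
        hits = []
        for seg_id, docs in self.index.segments:
            key = (seg_id, term)
            if key not in self.cache:
                self.segment_scans += 1
                self.cache[key] = [d for d in docs if term in d.split()]
            hits.extend(self.cache[key])
        return hits


if __name__ == "__main__":
    idx = Index()
    for docs in (["x a", "b"], ["x c"], ["d"]):
        idx.add_segment(docs)
    g, p = GlobalCache(idx), PerSegmentCache(idx)
    g.search("x")
    p.search("x")
    idx.add_segment(["x e"])                  # a new segment arrives
    print(g.search("x") == p.search("x"))     # True
    print(g.segment_scans, p.segment_scans)   # 7 4
```

Both strategies return the same hits. The global cache, however, rescans every segment after each change, which is why it favors static data.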

๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ ํŒจํ‚ค์ง€ ๋น„๊ต

Feature | Flume | Fluentd | Logstash | Sqoop
Language | Java | Ruby | Ruby & Java | Java

Input
Flume: Avro Source, Exec Source, HTTP Source, JMS Source, Kafka Source, NetCat TCP Source, NetCat UDP Source, Scribe Source, Sequence Generator Source, Spooling Directory Source, Stress Source, Syslog Source, Taildir Source, Thrift Source, Twitter 1% firehose Source
Fluentd: Dummy Input, Exec Input, Forward Input, HTTP Input, Monitor Agent Input, Syslog Input, Tail Input, TCP Input, UDP Input, Windows Eventlog Input
Logstash: Beats Input, Elasticsearch Input, Exec Input, File Input, HTTP Input, IRC Input, JDBC Input, JMS Input, Kafka Input, Log4J Input, Pipe Input, RabbitMQ Input, Redis Input, S3 Input, Stdin Input, STOMP Input, Syslog Input, TCP Input, UDP Input, XMPP Input
Sqoop: FTP Connector, HBase Connector, HDFS Connector, JDBC Connector, Kafka Connector, Kite Connector

Output
Flume: Avro Sink, Elasticsearch Sink, File Roll Sink, HBase Sink, HDFS Sink, Hive Sink, HTTP Sink, IRC Sink, Kafka Sink, Kite Dataset Sink, Logger Sink, MorphlineSolr Sink, Null Sink, Thrift Sink
Fluentd: Copy Output, Elasticsearch Output, Exec Filter Output, Exec Output, File Output, Forward Output, Mongo Output, Mongo Replset Output, Null Output, Relabel Output, Rewrite Tag Filter Output, Round Robin Output, S3 Output, Stdout Output, WebHDFS Output
Logstash: CSV Output, Elasticsearch Output, Email Output, Exec Output, File Output, HTTP Output, InfluxDB Output, IRC Output, Kafka Output, MongoDB Output, Nagios Output, OpenTSDB Output, Pipe Output, RabbitMQ Output, Redis Output, S3 Output, Solr HTTP Output, Stdout Output, STOMP Output, Syslog Output, TCP Output, UDP Output, WebHDFS Output, XMPP Output
Sqoop: Accumulo Connector, FTP Connector, HBase Connector, HDFS Connector, Hive Connector, JDBC Connector, Kafka Connector, Kite Connector

Buffer
Flume: Memory Channel, JDBC Channel, Kafka Channel, File Channel, Spillable Memory Channel, Pseudo Transaction Channel
Fluentd: Memory Buffer, File Buffer
Logstash: Memory Queue, Persistent Queue

Configuration
Flume: Single agent, Multi-Agent Flow, Consolidation, Multiplexing
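Flume's buffers (channels) sit between sources and sinks. The Multiplexing configuration routes events to channels by header value. In Flume's own configuration this is the multiplexing channel selector (`selector.type = multiplexing`, `selector.header`, `selector.mapping.<value>`, `selector.default`). The sketch below models that routing in plain Python. The classes are hypothetical, and raising an error on a full channel is a simplification of how Flume handles channel capacity.

```python
# Simplified model of a Flume-style source -> channel flow with a
# multiplexing channel selector (hypothetical classes, not Flume's API).
from collections import deque


class MemoryChannel:
    """Bounded in-memory buffer between a source and a sink."""
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        if len(self.events) >= self.capacity:
            # Simplification: fail loudly; a real source would retry later.
            raise OverflowError(f"channel {self.name} is full")
        self.events.append(event)

    def take(self):
        return self.events.popleft() if self.events else None


class MultiplexingSelector:
    """Picks target channels from the value of one event header."""
    def __init__(self, header, mapping, default):
        self.header = header
        self.mapping = mapping  # header value -> list of channels
        self.default = default  # channels for unmapped values

    def select(self, event):
        value = event["headers"].get(self.header)
        return self.mapping.get(value, self.default)


def emit(events, selector):
    """Source side: put every event into each channel the selector picks."""
    for event in events:
        for channel in selector.select(event):
            channel.put(event)


if __name__ == "__main__":
    ca, other = MemoryChannel("ca", 10), MemoryChannel("other", 10)
    selector = MultiplexingSelector(
        "state", {"CA": [ca], "NY": [ca, other]}, default=[other])
    emit([{"headers": {"state": s}, "body": s} for s in ("CA", "NY", "TX")],
         selector)
    print([e["body"] for e in ca.events])     # ['CA', 'NY']
    print([e["body"] for e in other.events])  # ['NY', 'TX']
```

An event can be fanned out to several channels (`NY`), while unmapped values fall through to the default channel (`TX`).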

Messaging Package Comparison

Feature | Kafka | RabbitMQ
Clients | C/C++, Python, Go (aka golang), Erlang, .NET, Clojure, Ruby, Node.js, Proxy (HTTP REST, etc.), Perl, stdin/stdout, PHP, Rust, Alternative Java, Storm, Scala DSL, Swift | Clojure, Erlang, Haskell, Perl, Scala, Java, Python, Ruby, PHP, Swift, .NET (C#), Objective-C, JavaScript, Go, Elixir
Default messaging model | Topic-based (publish-subscribe); queue semantics by configuring a consumer group | Queue-based (produce-consume); publish-subscribe via exchanges (e.g. topic or fanout) or via MQTT
Client behavior | Client pull | Server push
Supported protocols | Kafka's own binary protocol over TCP | AMQP, MQTT, STOMP
Storage | File | Memory / file
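The table describes Kafka as topic-based and pull-driven, with queue semantics obtained through consumer groups. Within a group, each partition is read by exactly one member, so each message is handled once per group. Separate groups each keep their own offsets and each see every message. The toy model below is hypothetical and is not the Kafka client API. It uses `zlib.crc32` as a stand-in for Kafka's murmur2-based default partitioner and a simple round-robin partition assignment.

```python
# Toy model of Kafka-style consumer groups (hypothetical classes, not the
# Kafka client API). Partitioning uses zlib.crc32 as a stand-in for Kafka's
# murmur2-based default partitioner; assignment is simple round-robin.
import zlib


class Topic:
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self.committed = {}  # (group, partition) -> next offset to read

    def produce(self, key, value):
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)  # append-only log per partition

    def poll(self, group, members):
        """Consumers pull new messages; each partition goes to exactly one
        member of the group, and the group's offsets are then committed."""
        delivered = {m: [] for m in members}
        for p, log in enumerate(self.partitions):
            owner = members[p % len(members)]
            start = self.committed.get((group, p), 0)
            delivered[owner].extend(log[start:])
            self.committed[(group, p)] = len(log)
        return delivered


if __name__ == "__main__":
    topic = Topic(num_partitions=2)
    for i in range(4):
        topic.produce(f"k{i}", f"v{i}")
    workers = topic.poll("workers", ["w1", "w2"])  # queue semantics
    audit = topic.poll("audit", ["a1"])            # a second, independent group
    print(sorted(workers["w1"] + workers["w2"]))   # ['v0', 'v1', 'v2', 'v3']
    print(sorted(audit["a1"]))                     # ['v0', 'v1', 'v2', 'v3']
```

The two workers split the messages between them, as in a queue. The `audit` group independently receives all four messages, as in publish-subscribe.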

Conclusion

Kafka has low I/O and protocol overhead, which makes it well suited to streaming data transfer.

RabbitMQ supports standard protocols. When data arrives in a queue, the broker pushes it to clients as an event, which makes RabbitMQ well suited to event processing.
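RabbitMQ's server push means that the arrival of a message is what triggers delivery to a subscribed consumer. A prefetch limit, which RabbitMQ sets through `basic.qos`, caps how many unacknowledged messages a consumer can hold. The sketch below models that behavior. The `PushQueue` class is hypothetical and is not an AMQP client API.

```python
# Toy model of broker push with a prefetch limit, in the spirit of
# RabbitMQ's server-push delivery and basic.qos prefetch (hypothetical
# class; not an AMQP client API).
from collections import deque


class PushQueue:
    def __init__(self, prefetch):
        self.prefetch = prefetch  # max unacknowledged deliveries in flight
        self.ready = deque()
        self.unacked = set()
        self.consumer = None
        self.next_tag = 1

    def subscribe(self, callback):
        self.consumer = callback
        self._dispatch()

    def publish(self, body):
        self.ready.append(body)
        self._dispatch()  # arrival itself triggers delivery: the "push"

    def ack(self, delivery_tag):
        self.unacked.discard(delivery_tag)
        self._dispatch()  # a freed prefetch slot lets the next message go

    def _dispatch(self):
        while (self.consumer is not None and self.ready
               and len(self.unacked) < self.prefetch):
            tag = self.next_tag
            self.next_tag += 1
            self.unacked.add(tag)
            self.consumer(tag, self.ready.popleft())


if __name__ == "__main__":
    received = []
    q = PushQueue(prefetch=2)
    q.subscribe(lambda tag, body: received.append((tag, body)))
    for body in ("m1", "m2", "m3"):
        q.publish(body)
    print(received)      # [(1, 'm1'), (2, 'm2')]; m3 is held back
    q.ack(1)
    print(received[-1])  # (3, 'm3')
```

In a pull model the consumer decides when to fetch. Here the broker decides when to deliver, and the prefetch limit is what keeps a slow consumer from being flooded.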