Analytics Engine
System Architecture Diagram
Data Processing Package Comparison
Feature | Storm | Spark | Samza |
---|---|---|---|
Delivery Semantics | At least once (exactly once with Trident) | Exactly once (except in some failure scenarios) | At least once |
State Management | Stateless (roll your own or use Trident) | Stateful (writes state to storage, HDFS) | Stateful (embedded key-value store) |
Latency | Sub-second | Seconds (depending on batch size) | Sub-second |
Language Support | Any JVM language, Ruby, Python, JavaScript, Perl | Scala, Java, Python, R | Scala, Java (JVM languages only) |
Processing Model | One-at-a-time | Micro-batch / batch | One-at-a-time |
Backpressure | Yes | Yes | No |
Stream Source | Spouts | Receivers | Consumers |
Stream Primitive | Tuple | DStream | Message |
Stream Computation | Bolts | Transformations, window operations | Tasks |
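The Processing Model row accounts for much of the latency difference in the table: a one-at-a-time engine handles each event on arrival, while a micro-batch engine waits until a batch is complete. A minimal Python sketch of the two models follows. It assumes count-based batches for simplicity (Spark Streaming cuts batches by time interval), and the function names are hypothetical, not part of any of the three frameworks.

```python
from typing import Callable, Iterable, List


def process_one_at_a_time(events: Iterable[int],
                          handle: Callable[[int], int]) -> List[int]:
    """Handle every event individually, as soon as it is seen."""
    return [handle(event) for event in events]


def process_micro_batch(events: Iterable[int],
                        batch_size: int,
                        handle_batch: Callable[[List[int]], int]) -> List[int]:
    """Buffer events into batches and handle each batch as one unit.

    Assumption: batches are cut by event count, not by time interval.
    """
    results: List[int] = []
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            results.append(handle_batch(batch))
            batch = []
    if batch:  # flush the final, partially filled batch
        results.append(handle_batch(batch))
    return results


events = [1, 2, 3, 4, 5]
per_event = process_one_at_a_time(events, lambda e: e * 10)
per_batch = process_micro_batch(events, batch_size=2, handle_batch=sum)
print(per_event)  # [10, 20, 30, 40, 50]
print(per_batch)  # [3, 7, 5]
```

In the micro-batch case no result is produced for the first event until the second one arrives, which is the source of the "seconds, depending on batch size" latency.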
Conclusion
Spark has somewhat higher latency, but it provides exactly-once delivery and offers components for a variety of workloads.
Storm has the advantage in latency, but it provides exactly-once delivery only when Trident is used, and state management must likewise be added through Trident or developed in-house.
Samza also has the advantage in latency, but it is still at a minor-version release and provides only at-least-once delivery.
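To make the difference between the two delivery guarantees concrete, here is a small Python sketch. It simulates an at-least-once transport that re-sends a message after a lost acknowledgement, then compares a naive consumer with one that deduplicates by message ID to get exactly-once results. All names are hypothetical, and the deduplication step is one common way to build exactly-once behavior on top of at-least-once delivery, not a description of how Trident or Spark implement it.

```python
from typing import List, Set, Tuple

Message = Tuple[str, int]  # (message_id, amount)


def deliver_at_least_once(messages: List[Message],
                          redelivered_ids: Set[str]) -> List[Message]:
    """Simulate a transport that re-sends some messages after a retry."""
    delivered: List[Message] = []
    for message in messages:
        delivered.append(message)
        if message[0] in redelivered_ids:
            delivered.append(message)  # duplicate caused by the retry
    return delivered


def naive_total(delivered: List[Message]) -> int:
    """Sum every delivery, so duplicates are counted twice."""
    return sum(amount for _, amount in delivered)


def deduplicated_total(delivered: List[Message]) -> int:
    """Sum each message ID only once, ignoring redelivered copies."""
    seen: Set[str] = set()
    total = 0
    for message_id, amount in delivered:
        if message_id in seen:
            continue
        seen.add(message_id)
        total += amount
    return total


messages = [("m1", 10), ("m2", 20), ("m3", 30)]
delivered = deliver_at_least_once(messages, redelivered_ids={"m2"})
print(naive_total(delivered))         # 80, because m2 is counted twice
print(deduplicated_total(delivered))  # 60, the correct total
```

With at-least-once delivery alone, the application has to carry this kind of deduplication or idempotency logic itself.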
Search Package Comparison
Feature | Solr | Elasticsearch |
---|---|---|
Community & Developers | Apache Software Foundation and community support | Single commercial entity and its employees |
Node Discovery | Apache Zookeeper, mature and battle-tested in a large number of projects | Zen, built into Elasticsearch itself, requires dedicated master nodes to be split brain proof |
Shard Placement | Static in nature, requires manual work to migrate shards; starting from Solr 7, the Autoscaling API allows for some dynamic actions | Dynamic, shards can be moved on demand depending on the cluster state |
Caches | Global, invalidated with each segment change | Per segment, better for dynamically changing data |
Analytics Engine | Facets and powerful streaming aggregations | Sophisticated and highly flexible aggregations |
Optimized Query Execution | Currently none | Faster range queries depending on the context |
Search Speed | Best for static data, because of caches and uninverted reader | Very good for rapidly changing data, because of per-segment caches |
Analysis Engine Performance | Great for static data with exact calculations | Exactness of the results depends on data placement |
Full Text Search Features | Language analysis based on Lucene, multiple suggesters, spell checkers, rich highlighting support | Language analysis based on Lucene, single suggest API implementation, highlighting rescoring |
DevOps Friendliness | Not fully there yet, but coming | Very good APIs |
Non-flat Data Handling | Nested documents and parent-child support | Natural support with nested and object types allowing for virtually endless nesting and parent-child support |
QueryDSL | JSON (limited), XML (limited) or URL parameters | JSON |
Index/Collection Leader Control | Leader placement control and leader rebalancing possibility to even the load on the nodes | Not possible |
Join | Currently none | Parent_type/Children_type |
Machine Learning | Built-in, on top of streaming aggregations focused on logistic regression, plus a learning-to-rank contrib module | Commercial feature, focused on anomalies and outliers and time-series data |
Ecosystem | Modest: Banana, Zeppelin with community support | Rich: Kibana, Grafana, with large entities support and big user base |
RDBMS Ingestion | DataImportHandler | Logstash |
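The Analysis Engine Performance row says that the exactness of Elasticsearch results depends on data placement. The following Python sketch is a simplified model of why that can happen with a distributed "top terms" aggregation: each shard returns only its local top entries, and the coordinator merges them. The data, function names, and `shard_size` parameter here are illustrative assumptions, not the Elasticsearch API.

```python
from collections import Counter
from typing import List, Tuple


def top_k(counts: Counter, k: int) -> List[Tuple[str, int]]:
    """Return the k highest counts, breaking ties alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]


def exact_top_terms(shards: List[Counter], k: int) -> List[Tuple[str, int]]:
    """Merge the full counts from every shard, then take the top k."""
    total: Counter = Counter()
    for shard in shards:
        total.update(shard)
    return top_k(total, k)


def merged_shard_top_terms(shards: List[Counter], k: int,
                           shard_size: int) -> List[Tuple[str, int]]:
    """Each shard reports only its local top `shard_size` terms."""
    merged: Counter = Counter()
    for shard in shards:
        for term, count in top_k(shard, shard_size):
            merged[term] += count
    return top_k(merged, k)


# Term "c" is third on each shard but first overall.
shard_a = Counter({"a": 10, "b": 9, "c": 8})
shard_b = Counter({"d": 10, "e": 9, "c": 8})

print(exact_top_terms([shard_a, shard_b], k=1))                       # [('c', 16)]
print(merged_shard_top_terms([shard_a, shard_b], k=1, shard_size=2))  # [('a', 10)]
```

If all documents containing "c" were on one shard, the merged result would match the exact one, which is the sense in which exactness depends on placement.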
Data Collection Package Comparison
Feature | Flume | Fluentd | Logstash | Sqoop |
---|---|---|---|---|
Language | Java | Ruby | Ruby & Java | Java |
Input | Avro Source, Exec Source, HTTP Source, JMS Source, Kafka Source, NetCat TCP Source, NetCat UDP Source, Scribe Source, Sequence Generator Source, Spooling Directory Source, Stress Source, Syslog Source, Taildir Source, Thrift Source, Twitter 1% firehose Source | Dummy Input, Exec Input, Forward Input, HTTP Input, Monitor Agent Input, Syslog Input, Tail Input, TCP Input, UDP Input, Windows Eventlog Input | Beats Input, Elasticsearch Input, Exec Input, File Input, HTTP Input, IRC Input, JDBC Input, JMS Input, Kafka Input, Log4J Input, Pipe Input, RabbitMQ Input, Redis Input, S3 Input, Stdin Input, STOMP Input, Syslog Input, TCP Input, UDP Input, XMPP Input | FTP Connector, HBase Connector, HDFS Connector, JDBC Connector, Kafka Connector, Kite Connector |
Output | Avro Sink, Elasticsearch Sink, File Roll Sink, HBase Sink, HDFS Sink, Hive Sink, HTTP Sink, IRC Sink, Kafka Sink, Kite Dataset Sink, Logger Sink, MorphlineSolr Sink, Null Sink, Thrift Sink | Copy Output, Elasticsearch Output, Exec Filter Output, Exec Output, File Output, Forward Output, Mongo Output, Mongo Replset Output, Null Output, Relabel Output, Rewrite Tag Filter Output, Round Robin Output, S3 Output, Stdout Output, WebHDFS Output | CSV Output, Elasticsearch Output, Email Output, Exec Output, File Output, HTTP Output, InfluxDB Output, IRC Output, Kafka Output, MongoDB Output, Nagios Output, OpenTSDB Output, Pipe Output, RabbitMQ Output, Redis Output, S3 Output, Solr HTTP Output, Stdout Output, STOMP Output, Syslog Output, TCP Output, UDP Output, WebHDFS Output, XMPP Output | Accumulo Connector, FTP Connector, HBase Connector, HDFS Connector, Hive Connector, JDBC Connector, Kafka Connector, Kite Connector |
Buffer | Memory Channel, JDBC Channel, Kafka Channel, File Channel, Spillable Memory Channel, Pseudo Transaction Channel | Memory Buffer, File Buffer | Memory Queue, Persistent Queue | |
Configuration | Single, Multi-Agent Flow, Consolidation, Multiplexing | | | |
Message Transport Package Comparison
Feature | Kafka | RabbitMQ |
---|---|---|
Clients | C / C++, Python, Go (AKA golang), Erlang, .NET, Clojure, Ruby, Node.js, Proxy (HTTP REST, etc.), Perl, Stdin / Stdout, PHP, Rust, Alternative Java, Storm, Scala DSL, Swift | Clojure, Erlang, Haskell, Perl, Scala, Java, Python, Ruby, PHP, Swift, .NET (C#), Objective-C, JS, Go, Elixir |
Default messaging model | Topic (publish-subscribe); queue behavior is available by configuring a consumer group | Queue (produce-consume); topic behavior is available through MQTT |
Client behavior | Client pull | Server push |
Supported protocols | TCP | AMQP, MQTT, STOMP |
Storage | File | Memory / file |
Conclusion
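Kafka has low overhead in both its I/O model and its protocol, which makes it well suited to streaming-style data transfer.
RabbitMQ provides standard protocols and is well suited to event processing, where an event is raised as soon as data is placed on a queue and the data is pushed to the client.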
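The client-pull and server-push behaviors compared above can be sketched in a few lines of Python. The two classes below are toy in-memory stand-ins with hypothetical names. They only model who initiates the transfer and are not the Kafka or RabbitMQ client APIs.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class PullLog:
    """Append-only log; the consumer tracks its own offset and fetches when ready."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> None:
        self._records.append(record)

    def fetch(self, offset: int, max_records: int) -> List[str]:
        # The consumer decides when to call this and how much to read.
        return self._records[offset:offset + max_records]


class PushQueue:
    """Queue that hands each message to the subscribed consumer as soon as it can."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self._consumer: Optional[Callable[[str], None]] = None

    def subscribe(self, consumer: Callable[[str], None]) -> None:
        self._consumer = consumer
        self._drain()  # deliver anything that queued up before subscribing

    def publish(self, message: str) -> None:
        self._pending.append(message)
        self._drain()  # the broker side initiates delivery

    def _drain(self) -> None:
        while self._consumer is not None and self._pending:
            self._consumer(self._pending.popleft())


# Pull: the consumer polls in batches and advances its own offset.
log = PullLog()
for record in ["a", "b", "c"]:
    log.append(record)
pulled: List[str] = []
offset = 0
while True:
    batch = log.fetch(offset, max_records=2)
    if not batch:
        break
    pulled.extend(batch)
    offset += len(batch)

# Push: messages arrive through a callback without polling.
queue = PushQueue()
pushed: List[str] = []
queue.publish("x")            # held until a consumer subscribes
queue.subscribe(pushed.append)
queue.publish("y")            # delivered immediately

print(pulled, offset)  # ['a', 'b', 'c'] 3
print(pushed)          # ['x', 'y']
```

The pull model lets a slow consumer read at its own pace from stored data, while the push model delivers each message with no polling delay, which matches the streaming versus event-processing split in the conclusion.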