Analytics Engine
System Architecture Diagram


Data Processing Package Comparison
| Feature | Storm | Spark | Samza |
|---|---|---|---|
| Delivery Semantics | At least once; exactly once with Trident | Exactly once, except in some failure scenarios | At least once |
| State Management | Stateless; roll your own or use Trident | Stateful; writes state to storage (HDFS) | Stateful; embedded key-value store |
| Latency | Sub-second | Seconds, depending on batch size | Sub-second |
| Language Support | Any JVM language, Ruby, Python, JavaScript, Perl | Scala, Java, Python, R | Scala, Java (JVM languages only) |
| Processing Model | One-at-a-time | Micro-batch / batch | One-at-a-time |
| Backpressure | Yes | Yes | No |
| Stream Source | Spouts | Receivers | Consumers |
| Stream Primitive | Tuple | DStream | Message |
| Stream Computation | Bolts | Transformations, window operations | Tasks |
Conclusion
Spark has somewhat higher latency, but it provides exactly-once delivery and offers components for a variety of workloads.
Storm has the advantage in latency, but it provides exactly-once delivery only when Trident is applied, and state management must also be handled through Trident or developed in-house.
Samza also has the advantage in latency, but it is still at a minor version and provides only at-least-once delivery.
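
The practical difference between at-least-once and exactly-once delivery can be illustrated with a small simulation. The sketch below is not how Storm, Trident, Spark, or Samza implement their guarantees; it only assumes a source that replays a message after a missed acknowledgement, and shows that a counter over-counts unless the consumer deduplicates by message id. All function names and the sample data are hypothetical.

```python
from collections import defaultdict


def deliver_at_least_once(messages, redeliver_ids):
    """Yield each (msg_id, word) once, plus one duplicate for ids in redeliver_ids.

    This mimics a source that replays a message after a missed acknowledgement.
    """
    for msg_id, word in messages:
        yield msg_id, word
        if msg_id in redeliver_ids:
            yield msg_id, word  # replayed delivery


def count_naive(stream):
    """Count words, trusting every delivery (over-counts on replays)."""
    counts = defaultdict(int)
    for _msg_id, word in stream:
        counts[word] += 1
    return dict(counts)


def count_deduplicated(stream):
    """Count words, skipping message ids that were already applied."""
    counts = defaultdict(int)
    seen = set()
    for msg_id, word in stream:
        if msg_id in seen:
            continue  # duplicate delivery, state already updated
        seen.add(msg_id)
        counts[word] += 1
    return dict(counts)


messages = [(1, "a"), (2, "b"), (3, "a")]
naive = count_naive(deliver_at_least_once(messages, {3}))
deduped = count_deduplicated(deliver_at_least_once(messages, {3}))
print(naive)    # {'a': 3, 'b': 1}
print(deduped)  # {'a': 2, 'b': 1}
```

In a real deployment the `seen` set would have to live in durable state, which is one reason state management and delivery semantics appear together in the comparison above.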
Search Package Comparison
| Feature | Solr | Elasticsearch |
|---|---|---|
| Community & Developers | Apache Software Foundation and community support | Single commercial entity and its employees |
| Node Discovery | Apache ZooKeeper, mature and battle-tested in a large number of projects | Zen, built into Elasticsearch itself, requires dedicated master nodes to be split-brain proof |
| Shard Placement | Static in nature, requires manual work to migrate shards; starting from Solr 7, the Autoscaling API allows for some dynamic actions | Dynamic, shards can be moved on demand depending on the cluster state |
| Caches | Global, invalidated with each segment change | Per segment, better for dynamically changing data |
| Analytics Engine | Facets and powerful streaming aggregations | Sophisticated and highly flexible aggregations |
| Optimized Query Execution | Currently none | Faster range queries depending on the context |
| Search Speed | Best for static data, because of caches and uninverted reader | Very good for rapidly changing data, because of per-segment caches |
| Analysis Engine Performance | Great for static data with exact calculations | Exactness of the results depends on data placement |
| Full Text Search Features | Language analysis based on Lucene, multiple suggesters, spell checkers, rich highlighting support | Language analysis based on Lucene, single suggest API implementation, highlighting rescoring |
| DevOps Friendliness | Not fully there yet, but coming | Very good APIs |
| Non-flat Data Handling | Nested documents and parent-child support | Natural support with nested and object types allowing for virtually endless nesting and parent-child support |
| QueryDSL | JSON (limited), XML (limited) or URL parameters | JSON |
| Index/Collection Leader Control | Leader placement control and leader rebalancing to even out the load on the nodes | Not possible |
| Join | Currently none | Parent_type/Children_type |
| Machine Learning | Built-in: on top of streaming aggregations, focused on logistic regression, plus a learning-to-rank contrib module | Commercial feature, focused on anomalies, outliers, and time-series data |
| Ecosystem | Modest: Banana, Zeppelin, with community support | Rich: Kibana, Grafana, with support from large entities and a big user base |
| RDBMS Ingestion | DataImportHandler | Logstash |
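
To make the QueryDSL row concrete, the sketch below builds the same simple term search in both styles: Solr's `/select` handler with URL parameters, and an Elasticsearch `_search` request body in JSON. It only constructs the request pieces and sends nothing. The host, the collection name `docs`, and the field `title` are assumptions chosen for illustration.

```python
import json
from urllib.parse import urlencode


def solr_select_url(base_url, collection, field, term, rows=5):
    """Build a Solr /select URL that expresses the query as URL parameters."""
    params = {"q": f"{field}:{term}", "rows": rows, "wt": "json"}
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


def elasticsearch_search_body(field, term, size=5):
    """Build the JSON body for an Elasticsearch _search request."""
    return json.dumps({"query": {"match": {field: term}}, "size": size})


# Hypothetical host, collection, and field names.
solr_url = solr_select_url("http://localhost:8983", "docs", "title", "storm")
es_body = elasticsearch_search_body("title", "storm")
print(solr_url)
print(es_body)
```

The Solr query travels entirely in the URL, while the Elasticsearch query is a nested JSON document, which is what allows more deeply structured queries and aggregations to be composed.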
Data Collection Package Comparison
| Feature | Flume | Fluentd | Logstash | Sqoop |
|---|---|---|---|---|
| Language | Java | Ruby | Ruby & Java | Java |
| Input | Avro Source | Dummy Input | Beats Input | FTP Connector |
| | Exec Source | Exec Input | Elasticsearch Input | HBase Connector |
| | HTTP Source | Forward Input | Exec Input | HDFS Connector |
| | JMS Source | HTTP Input | File Input | JDBC Connector |
| | Kafka Source | Monitor Agent Input | HTTP Input | Kafka Connector |
| | NetCat TCP Source | Syslog Input | IRC Input | Kite Connector |
| | NetCat UDP Source | Tail Input | JDBC Input | |
| | Scribe Source | TCP Input | JMS Input | |
| | Sequence Generator Source | UDP Input | Kafka Input | |
| | Spooling Directory Source | Windows Eventlog Input | Log4J Input | |
| | Stress Source | | Pipe Input | |
| | Syslog Source | | RabbitMQ Input | |
| | Taildir Source | | Redis Input | |
| | Thrift Source | | S3 Input | |
| | Twitter 1% firehose Source | | Stdin Input | |
| | | | STOMP Input | |
| | | | Syslog Input | |
| | | | TCP Input | |
| | | | UDP Input | |
| | | | XMPP Input | |
| Output | Avro Sink | Copy Output | CSV Output | Accumulo Connector |
| | Elasticsearch Sink | Elasticsearch Output | Elasticsearch Output | FTP Connector |
| | File Roll Sink | Exec Filter Output | Email Output | HBase Connector |
| | HBase Sink | Exec Output | Exec Output | HDFS Connector |
| | HDFS Sink | File Output | File Output | Hive Connector |
| | Hive Sink | Forward Output | HTTP Output | JDBC Connector |
| | HTTP Sink | Mongo Output | InfluxDB Output | Kafka Connector |
| | IRC Sink | Mongo Replset Output | IRC Output | Kite Connector |
| | Kafka Sink | Null Output | Kafka Output | |
| | Kite Dataset Sink | Relabel Output | MongoDB Output | |
| | Logger Sink | Rewrite Tag Filter Output | Nagios Output | |
| | MorphlineSolr Sink | Round Robin Output | OpenTSDB Output | |
| | Null Sink | S3 Output | Pipe Output | |
| | Thrift Sink | Stdout Output | RabbitMQ Output | |
| | | WebHDFS Output | Redis Output | |
| | | | S3 Output | |
| | | | Solr HTTP Output | |
| | | | Stdout Output | |
| | | | STOMP Output | |
| | | | Syslog Output | |
| | | | TCP Output | |
| | | | UDP Output | |
| | | | WebHDFS Output | |
| | | | XMPP Output | |
| Buffer | Memory Channel | Memory Buffer | Memory Queue | |
| | JDBC Channel | File Buffer | Persistent Queue | |
| | Kafka Channel | | | |
| | File Channel | | | |
| | Spillable Memory Channel | | | |
| | Pseudo Transaction Channel | | | |
| Configuration | Single | | | |
| | Multi-Agent Flow | | | |
| | Consolidation | | | |
| | Multiplexing | | | |
Message Transport Package Comparison
| Feature | Kafka | RabbitMQ |
|---|---|---|
| Clients | C / C++ | Clojure |
| | Python | Erlang |
| | Go (AKA golang) | Haskell |
| | Erlang | Perl |
| | .NET | Scala |
| | Clojure | Java |
| | Ruby | Python |
| | Node.js | Ruby |
| | Proxy (HTTP REST, etc.) | PHP |
| | Perl | Swift |
| | Stdin / Stdout | .NET (C#) |
| | PHP | Objective-C |
| | Rust | JavaScript |
| | Alternative Java | Go |
| | Storm | Elixir |
| | Scala DSL | |
| | Swift | |
| Default messaging model | Topic (publish-subscribe) | Queue (produce-consume) |
| | Queue semantics available by configuring consumer groups | Topic semantics available via MQTT |
| Client behavior | Client pull | Server push |
| Supported protocols | TCP | AMQP |
| | | MQTT |
| | | STOMP |
| Storage | File | Memory / file |
Conclusion
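Kafka has low overhead in both its I/O model and its protocol, which makes it well suited to streaming-style data transfer.
RabbitMQ provides standard protocols, and when data is loaded into a queue it raises an event that delivers the data to the client, which makes it well suited to event processing.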
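
The client pull and server push behaviors compared above can be sketched with two small in-memory classes. This is a simplified illustration only: it uses neither the Kafka nor the RabbitMQ client libraries, and the class names `PullLog` and `PushQueue` are hypothetical. It assumes a pull-style log that keeps records and lets each consumer track its own offset, and a push-style queue that hands each message to a subscribed callback as soon as it arrives.

```python
from collections import deque


class PullLog:
    """Append-only log; a consumer tracks its own offset and polls (pull style)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def poll(self, offset, max_records=10):
        """Return records starting at offset; the log itself is not modified."""
        return self._records[offset:offset + max_records]


class PushQueue:
    """Queue that hands each message to a registered callback (push style)."""

    def __init__(self):
        self._pending = deque()
        self._callback = None

    def subscribe(self, callback):
        self._callback = callback
        self._drain()

    def publish(self, message):
        self._pending.append(message)
        self._drain()

    def _drain(self):
        # Simplification: a message leaves the queue once it has been pushed.
        while self._callback is not None and self._pending:
            self._callback(self._pending.popleft())


log = PullLog()
for record in ("r1", "r2", "r3"):
    log.append(record)
offset = 0
pulled = log.poll(offset, max_records=2)  # the consumer decides when and how much
offset += len(pulled)
replayed = log.poll(0)                    # records remain available for re-reading

queue = PushQueue()
pushed = []
queue.subscribe(pushed.append)
for message in ("m1", "m2"):
    queue.publish(message)                # delivered as soon as data arrives
```

In the pull model the consumer controls its own read rate and can re-read from an earlier offset, which fits streaming workloads; in the push model delivery is triggered by arrival, which fits event handling.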