
Commit eda64c0

Flume TwitterAgent.conf and demo02
1 parent 0471782 commit eda64c0

File tree

2 files changed: +245 −0 lines changed
Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creating a Twitter App\n",
"\n",
"## Go to the address below and create an App: https://apps.twitter.com/\n",
"\n",
"Create a login and password, then sign in\n",
"\n",
"Create a new App by clicking Create New App\n",
"\n",
"Fill in the application details: name, description, website, etc.\n",
"\n",
"**In the \"Keys and Tokens\" menu, generate the App keys to use in the Flume configuration and substitute them into the twitterAgent.conf file below.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Configuring the agent, source, channel, and sink\n",
"\n",
"**Agent**: A single agent named *TwitterAgent*\n",
"\n",
"**Source**: Twitter\n",
"\n",
"**Channel**: Memory\n",
"\n",
"**Sink**: Writes the data to HDFS"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"#Agent component names\n",
"TwitterAgent.sources = Twitter\n",
"TwitterAgent.channels = MemChannel\n",
"TwitterAgent.sinks = HDFS\n",
"\n",
"#Source configuration\n",
"TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource\n",
"TwitterAgent.sources.Twitter.consumerKey = 6BWSmQX6AUfKhcNsMeav9zhi2\n",
"TwitterAgent.sources.Twitter.consumerSecret = DcYHM3EFR5oJR7VEq8cBVtjTPQxftI9PMrST71P7oXW0BlGiZv\n",
"TwitterAgent.sources.Twitter.accessToken = 1046705580-MTYNfMbLL6XSyQQgeL3Sah9RejwDRK5caBO9GRZ\n",
"TwitterAgent.sources.Twitter.accessTokenSecret = 2FRFyHQEdAIFVCFgTxyxCvl4zqoNtTEMZuwJHCfhXW2jk\n",
"TwitterAgent.sources.Twitter.keywords = #hadoop, #flume, #bigdata\n",
"\n",
"#Sink configuration\n",
"TwitterAgent.sinks.HDFS.type = hdfs\n",
"TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/matheus\n",
"TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream\n",
"TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text\n",
"\n",
"#Number of events written to file before it is flushed to HDFS\n",
"TwitterAgent.sinks.HDFS.hdfs.batchSize = 50\n",
"\n",
"#File size to trigger a roll, in bytes (0 = never roll based on file size)\n",
"TwitterAgent.sinks.HDFS.hdfs.rollSize = 0\n",
"\n",
"#Number of events written to file before it is rolled (0 = never roll based on number of events)\n",
"TwitterAgent.sinks.HDFS.hdfs.rollCount = 50\n",
"\n",
"#Number of seconds to wait before rolling the current file (0 = never roll based on time interval)\n",
"TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0\n",
"\n",
"#Channel configuration\n",
"TwitterAgent.channels.MemChannel.type = memory\n",
"#The maximum number of events stored in the channel\n",
"TwitterAgent.channels.MemChannel.capacity = 100\n",
"#The maximum number of events the channel will take from a source or give to a sink per transaction\n",
"TwitterAgent.channels.MemChannel.transactionCapacity = 100\n",
"\n",
"#Wiring source, sink, and channel together\n",
"TwitterAgent.sources.Twitter.channels = MemChannel\n",
"TwitterAgent.sinks.HDFS.channel = MemChannel\n",
"\n",
"\n",
"#https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json\n",
"#https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object"
]
}
],
"source": [
"!cat /home/jovyan/labs/lab6-flume/twitterAgent.conf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Note: \n",
"With the source type described on the Flume site - **TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource** - the output file is written with unreadable characters. We therefore use **TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource**, which requires copying flume-sources-1.0-SNAPSHOT.jar into Flume's lib folder for it to work correctly."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"!cp resources/flume-sources-1.0-SNAPSHOT.jar ~/resources/local/flume-${FLUME_VERSION}/lib"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Open a terminal and start the FlumeAgent\n",
"\n",
"``` bash\n",
"flume-ng agent --conf conf --conf-file labs/lab6-flume/twitterAgent.conf --name TwitterAgent -Dflume.root.logger=INFO,console\n",
"``` "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Trending Topics\n",
"Building a ranking of words that contain #"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"No configs found; falling back on auto-configuration\n",
"No configs specified for hadoop runner\n",
"Looking for hadoop binary in /home/jovyan/resources/local/hadoop-2.9.2/bin...\n",
"Found hadoop binary: /home/jovyan/resources/local/hadoop-2.9.2/bin/hadoop\n",
"Using Hadoop version 2.9.2\n",
"Creating temp directory /tmp/mrjob-ex-3.jovyan.20190913.175842.269968\n",
"uploading working dir files to hdfs:///user/jovyan/tmp/mrjob/mrjob-ex-3.jovyan.20190913.175842.269968/files/wd...\n",
"Copying other local files to hdfs:///user/jovyan/tmp/mrjob/mrjob-ex-3.jovyan.20190913.175842.269968/files/\n",
"Running step 1 of 2...\n",
"  packageJobJar: [/tmp/hadoop-unjar4483334449690778934/] [] /tmp/streamjob4300404582507966699.jar tmpDir=null\n",
"  Connecting to ResourceManager at /0.0.0.0:8032\n",
"  Connecting to ResourceManager at /0.0.0.0:8032\n",
"  Total input files to process : 4\n",
"  Cleaning up the staging area /tmp/hadoop-yarn/staging/jovyan/.staging/job_1568395552187_0004\n",
"  Error Launching job : Not a file: hdfs://localhost:9000/user/matheus/output/output8\n",
"  Streaming Command Failed!\n",
"Attempting to fetch counters from logs...\n",
"Can't fetch history log; missing job ID\n",
"No counters found\n",
"Scanning logs for probable cause of failure...\n",
"Can't fetch history log; missing job ID\n",
"Can't fetch task logs; missing application ID\n",
"Step 1 of 2 failed: Command '['/home/jovyan/resources/local/hadoop-2.9.2/bin/hadoop', 'jar', '/home/jovyan/resources/local/hadoop-2.9.2/share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar', '-files', 'hdfs:///user/jovyan/tmp/mrjob/mrjob-ex-3.jovyan.20190913.175842.269968/files/wd/mrjob-ex-3.py#mrjob-ex-3.py,hdfs:///user/jovyan/tmp/mrjob/mrjob-ex-3.jovyan.20190913.175842.269968/files/wd/mrjob.zip#mrjob.zip,hdfs:///user/jovyan/tmp/mrjob/mrjob-ex-3.jovyan.20190913.175842.269968/files/wd/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/matheus/*', '-output', 'hdfs:///user/jovyan/tmp/mrjob/mrjob-ex-3.jovyan.20190913.175842.269968/step-output/0000', '-mapper', '/bin/sh -ex setup-wrapper.sh python3 mrjob-ex-3.py --step-num=0 --mapper', '-reducer', '/bin/sh -ex setup-wrapper.sh python3 mrjob-ex-3.py --step-num=0 --reducer']' returned non-zero exit status 1280.\n"
]
}
],
"source": [
"!python resources/mrjob-ex-3.py -r hadoop --hadoop-streaming-jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar hdfs:///user/matheus/*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
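The Trending Topics cell above runs `resources/mrjob-ex-3.py`, which is not included in this commit. As a rough sketch of the logic such a job would implement (the actual script presumably uses the mrjob library; the `text` field follows the tweet-JSON docs linked in the config, and the input format assumes one tweet JSON object per line, as the Flume HDFS sink writes them), here is a self-contained hashtag ranking in plain Python:

```python
import json
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#\w+")

def extract_hashtags(line):
    """Yield lowercased hashtags from one tweet JSON record (one per line)."""
    try:
        tweet = json.loads(line)
    except ValueError:
        return  # skip malformed or truncated records
    for tag in HASHTAG_RE.findall(tweet.get("text", "")):
        yield tag.lower()

def rank_hashtags(lines):
    """Count hashtags across all records; return (tag, count) pairs, most frequent first."""
    counts = Counter(tag for line in lines for tag in extract_hashtags(line))
    return counts.most_common()

# Example with two fake tweet records:
records = [
    '{"text": "Learning #hadoop and #flume"}',
    '{"text": "#hadoop streaming works"}',
]
print(rank_hashtags(records))  # [('#hadoop', 2), ('#flume', 1)]
```

In an mrjob version the same split would map onto a two-step job: a mapper/reducer pair that counts each hashtag, followed by a second step that sorts by count.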

labs/lab6-flume/twitterAgent.conf

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
#Agent component names
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

#Source configuration
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken>
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
TwitterAgent.sources.Twitter.keywords = #hadoop, #flume, #bigdata

#Sink configuration
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/matheus
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

#Number of events written to file before it is flushed to HDFS
TwitterAgent.sinks.HDFS.hdfs.batchSize = 50

#File size to trigger a roll, in bytes (0 = never roll based on file size)
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0

#Number of events written to file before it is rolled (0 = never roll based on number of events)
TwitterAgent.sinks.HDFS.hdfs.rollCount = 50

#Number of seconds to wait before rolling the current file (0 = never roll based on time interval)
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0

#Channel configuration
TwitterAgent.channels.MemChannel.type = memory
#The maximum number of events stored in the channel
TwitterAgent.channels.MemChannel.capacity = 100
#The maximum number of events the channel will take from a source or give to a sink per transaction
TwitterAgent.channels.MemChannel.transactionCapacity = 100

#Wiring source, sink, and channel together
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel


#https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json
#https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object
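One note on the channel choice above: a memory channel with capacity = 100 loses buffered events if the agent dies, and can fill up under bursty tweet traffic. If durability matters, Flume's file channel is a drop-in alternative; a hedged sketch (the checkpoint and data directory paths are assumptions for this environment, and the channel name is kept as MemChannel so the source/sink wiring above still applies):

```
#Hypothetical alternative: a durable file channel instead of the memory channel
TwitterAgent.channels.MemChannel.type = file
TwitterAgent.channels.MemChannel.checkpointDir = /home/jovyan/flume/checkpoint
TwitterAgent.channels.MemChannel.dataDirs = /home/jovyan/flume/data
```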
