on 04-21-2014 3:37 PM
I am trying to read data from Hadoop (Hortonworks). Is there any documentation or guide about this subject?
I can see File/Hadoop CSV, XML, and JSON adapters available in SP4, but I could not find how to connect to Hadoop.
Thanks
See the ESP adapters guide. That should give you the info you need (I hope).
Jeff, thanks for the quick response :)
I have read the File/Hadoop CSV Input Adapter properties, but I could not understand them completely.
Actually I am not very familiar with Hadoop, but I will be testing the Hadoop adapter at a customer site, so I am researching how to configure the integration.
On the screen below (File/Hadoop CSV Input Adapter) there are only Directory and File fields; I was expecting fields such as IP address and user for Hadoop.
Hello,
It looks like they did not document the Hadoop-specific information for the managed adapters; they only put it into the documentation for the unmanaged adapters. I have logged a documentation bug for this:
762524 - Hadoop support not documented on managed adapters
See if the Hadoop-specific information about the 'Dir' parameter for the unmanaged adapters gives you what you need to proceed:
File/Hadoop CSV Output Adapter Configuration
Thanks,
Neal
Hello,
The hdfs URL you show would seem correct. Were there any clues in the "esp_server.log" for your project? See the following section on how to find the "esp_server.log" file:
Thanks,
Neal
Hi Neal,
Thank you for your suggestion, but it seems there were no issues according to the log. All it says is:
2014-04-29 16:09:28.205 | 9368 | container | [SP-4-108062] (187.932) sp(2036) GatewayClient::GatewayClient(9368:141) host:[<host>] has initiated a connection.
2014-04-29 16:09:29.130 | 9368 | container | [SP-4-108008] (188.857) sp(2036) GatewayClient(9368:141)::execute() Client closed/dropped connection.
2014-04-29 16:09:29.142 | 9368 | container | [SP-4-108001] (188.869) sp(2036) GatewayClient(9368:141)::~GatewayClient() destroyed (auto).
When I set the CSV/Hadoop adapter URL to some local folder, ESP writes the output file just fine.
Best regards,
Michael
Hi Michael,
Based on what I see in the docs for File/Hadoop CSV Input Adapter Configuration, maybe try eliminating the user and password from the URI you are using.
From:
hdfs://user:password@host:9000/path
To:
hdfs://host:9000/path
Thanks,
Alice
Here's what the docs say:
To use Hadoop system files, use an HDFS folder URI instead of a local file system folder. For example, hdfs://<hdfsserver>:9000/<foldername>/<subfoldername>/<leaffoldername>.
To use Hadoop, download the binaries for Hadoop version 1.2.1 from http://hadoop.apache.org. Copy the hadoop-core.jar file (for example, for version 1.2.1, hadoop-core-1.2.1.jar) to %ESP_HOME%\adapters\framework\libj. Ensure you use a stable version rather than a beta.
Use a forward slash for both UNIX and Windows paths.
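The credential-stripping Alice suggests is mechanical; here is a hypothetical helper (not part of ESP or Hadoop, just an illustration using `java.net.URI`) that rebuilds an hdfs:// URI without the user-info part:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HdfsUriCleaner {
    // Rebuilds the given URI with the user:password portion removed,
    // yielding the credential-free form the adapter docs show.
    public static String stripCredentials(String uri) {
        URI u = URI.create(uri);
        try {
            // Passing null for the user-info component drops it.
            return new URI(u.getScheme(), null, u.getHost(), u.getPort(),
                           u.getPath(), u.getQuery(), u.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stripCredentials("hdfs://user:password@host:9000/path"));
        // prints hdfs://host:9000/path
    }
}
```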
Hi Alice,
I actually tried this before, but it did not work for me. Using the Java API I could confirm that authentication really is enabled and unauthorized connections are refused, so there is no way for ESP to connect to Hadoop without magically guessing the user name.
Best regards,
Michael
Hello,
I should have had you look at a different log. In ESP there are three logs: esp_server.log, frameworkadapter.log, and stdstreams.log.
In my testing I could see some errors in the frameworkadapter.log. However, exceptions that are thrown to stderr sometimes only show up in stdstreams.log.
I've been doing some testing and I finally have it working. The first thing I should mention is that Hadoop itself does not appear to accept the user name/password in the URL:
% hadoop fs -put test.txt hdfs://hadoopUser:Password1@servername.acme.com:9000/user/hadoop/data/test.txt
put: Permission denied: user=nstack, access=WRITE, inode="/user/hadoop/data":hadoop:supergroup:drwxr-xr-x
But there is a not-very-well-known environment variable that you can set:
% setenv HADOOP_USER_NAME hadoopUser
% hadoop fs -put test.txt hdfs://servername.acme.com:9000/user/hadoop/data/test.txt
This is essentially equivalent to setting it in your Java program:
System.setProperty("HADOOP_USER_NAME", "hadoopUser");
You can edit $ESP_HOME/adapters/framework/bin/start.sh and set the Hadoop user name there:
HADOOP_USER_NAME=hadoopUser;export HADOOP_USER_NAME
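For what it's worth, the precedence involved can be sketched in plain Java. This is not Hadoop's actual code, just a hypothetical model of the lookup order its UserGroupInformation appears to use (environment variable first, then the Java system property, then the OS login user), which also explains why the earlier `put` ran as `user=nstack`:

```java
public class HadoopUserLookup {
    // envValue and sysPropValue are passed in so the logic stays testable;
    // in Hadoop itself these would come from System.getenv("HADOOP_USER_NAME")
    // and System.getProperty("HADOOP_USER_NAME").
    public static String resolveUser(String envValue, String sysPropValue,
                                     String osLoginUser) {
        if (envValue != null && !envValue.isEmpty()) return envValue;
        if (sysPropValue != null && !sysPropValue.isEmpty()) return sysPropValue;
        return osLoginUser; // fall back to the local OS account
    }

    public static void main(String[] args) {
        // Nothing set: the OS user wins, hence "user=nstack" above.
        System.out.println(resolveUser(null, null, "nstack"));
        // System property set (the System.setProperty approach).
        System.out.println(resolveUser(null, "hadoopUser", "nstack"));
        // Environment variable set (the setenv / start.sh approach).
        System.out.println(resolveUser("hadoopUser", null, "nstack"));
    }
}
```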
For version 2.2.0, copy these files over to %ESP_HOME%\adapters\framework\libj:
hadoop-common-2.2.0.jar
hadoop-auth-2.2.0.jar
hadoop-hdfs-2.2.0.jar
guava-11.0.2.jar
protobuf-java-2.5.0.jar
commons-cli-1.2.jar (required for SP04; SP08 already includes it)
NOTE: There is a problem with ESP 5.1 SP04: when running the Hadoop output adapter inside a project (managed mode), I could not stop it. This was with the Hadoop 2.2.0 JAR files; I did not test Hadoop 1.2.1 because it cannot communicate with Hadoop 2.2.0.
This was not a problem with the upcoming release of ESP 5.1 SP08, nor was it a problem with SP04 when running the adapter in unmanaged mode (started and stopped manually from outside the project).
So if you wish to use ESP 5.1 SP04, run the adapter in unmanaged mode:
1) Edit $ESP_HOME/adapters/framework/bin/start.sh and set the Hadoop user name there:
HADOOP_USER_NAME=hadoop;export HADOOP_USER_NAME
2) Make a copy of the file output (and/or input) adapter:
cp -Rf $ESP_HOME/adapters/framework/instances/file_csv_output /tmp
cd /tmp/file_csv_output
3) Edit the copy's adapter_config.xml file and change it so that it can run in unmanaged mode:
a) Uncomment these lines and change the "StreamName" element to the stream in the project that the adapter should subscribe to. Leave "ProjectName" as is:
<ProjectName>EspProject2</ProjectName>
<StreamName>MyStream</StreamName>
b) Change the "Dir" and "File" elements belonging to "FileOutputTransporterParameters":
<Dir>hdfs://servername.acme.com:9000/user/hadoop/data</Dir>
<File>neal_test.csv</File>
c) Change the "Uri" element for the project "EspProject2" to point to your ESP project:
<Uri>esp://esp_server_name.acme.com:51011/default/hadoop_test</Uri>
d) Change the "User" and "Password" so the adapter can connect to the ESP project:
<User>espadm</User>
<Password encrypted="false">Password1</Password>
e) Start the adapter:
./start_adapter.sh
f) Stop the adapter:
./stop_adapter.sh
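Putting steps a) through d) together, the edited portions of adapter_config.xml might look like the sketch below. The surrounding element nesting is abbreviated (the wrapping `<Adapter>` element here is only a placeholder; the real file contains many more elements), and all values are the example values used above:

```xml
<Adapter>
  <!-- a) subscribe to the project stream -->
  <ProjectName>EspProject2</ProjectName>
  <StreamName>MyStream</StreamName>
  <!-- b) point the file transporter at HDFS -->
  <FileOutputTransporterParameters>
    <Dir>hdfs://servername.acme.com:9000/user/hadoop/data</Dir>
    <File>neal_test.csv</File>
  </FileOutputTransporterParameters>
  <!-- c) the ESP project to connect to -->
  <Uri>esp://esp_server_name.acme.com:51011/default/hadoop_test</Uri>
  <!-- d) ESP credentials for the adapter -->
  <User>espadm</User>
  <Password encrypted="false">Password1</Password>
</Adapter>
```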
Thanks,
Neal
Hi,
I checked frameworkadapter.log and I found this error:
04-07-2016 12:23:15.338 INFO [main] (Framework.main) start C:\SAP_ESP_5.1\ESP-5_1/adapters/framework/instances/file_csv_output/adapter_config.xml
04-07-2016 12:23:15.366 ERROR [main] (Shell.getWinUtilsPath) Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:356)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:371)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:364)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:260)
at com.sybase.esp.adapter.framework.internal.AdapterController.<clinit>(AdapterController.java:74)
at com.sybase.esp.adapter.framework.internal.Adapter.<init>(Adapter.java:60)
at com.sybase.esp.adapter.framework.Framework.main(Framework.java:50)
I have installed Hadoop on a different server, not on localhost.
Do you know how to fix this?
Thank you,
Jan
Hi Jan,
This seems to be an issue with Hadoop and Windows. See
http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows
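In short, Hadoop's Shell class builds the winutils path by prefixing the hadoop.home.dir system property (or the HADOOP_HOME environment variable), so when neither is set the prefix is the literal string "null", which is exactly what the exception shows. A minimal sketch of the failing concatenation (the helper name is hypothetical):

```java
public class WinUtilsCheck {
    // Mirrors the string concatenation that produced
    // "null\bin\winutils.exe" in the stack trace.
    public static String winUtilsPath(String hadoopHome) {
        return hadoopHome + "\\bin\\winutils.exe";
    }

    public static void main(String[] args) {
        // hadoop.home.dir is unset here, so valueOf(null) yields "null",
        // reproducing the path from Jan's error message.
        String home = System.getProperty("hadoop.home.dir");
        System.out.println(winUtilsPath(String.valueOf(home)));
    }
}
```

The usual client-side workaround is to set HADOOP_HOME (or hadoop.home.dir) to a directory that contains bin\winutils.exe before any Hadoop class loads; note that winutils.exe is not shipped in the Apache tarball and has to be obtained or built separately for your Hadoop version.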
Thanks,
Alice