
Hadoop Adapter Usage

Former Member

I am trying to read data from Hadoop (Hortonworks). Is there any documentation or guide about this subject?

I can see File/Hadoop CSV, XML, and JSON adapters available in SP4, but I could not find how to connect to Hadoop.

Thanks

Accepted Solutions (0)

Answers (1)

JWootton
Advisor

See the ESP adapters guide.  That should give you the info you need (I hope).

Former Member

Jeff, thanks for the quick response. :)

I have read the File/Hadoop CSV Input Adapter properties, but I could not completely understand them.

Actually, I am not very familiar with Hadoop, but I will be testing the Hadoop adapter at a customer site, so I am researching how to configure the integration.

On the screen below (File/Hadoop CSV Input Adapter) there are only Directory and File fields, but I was expecting fields such as the Hadoop server IP, user, and so on.

Former Member

Hello,

It looks like they did not document the Hadoop-specific information for the managed adapters. They only put this information into the documentation for the unmanaged adapters. I have logged a documentation bug for this:

   762524 - Hadoop support not documented on managed adapters

See if the Hadoop-specific information about the 'Dir' parameter for the unmanaged adapters gives you the information you need to proceed:

   File/Hadoop CSV Output Adapter Configuration

Thanks,

  Neal

michael_jess
Participant

Hi Jeff,

The docs do not seem to cover Hadoop authentication. I tried hdfs://user:password@host:9000/path, but that does not seem to do the trick. Is there an option I missed?

Thanks,

Michael

Former Member


Hello,

The hdfs URL you show would seem correct.  Were there any clues in the "esp_server.log" for your project?  See the following section on how to find the "esp_server.log" file:

   SyBooks Online

Thanks,

  Neal

michael_jess
Participant

Hi Neal,

Thank you for your suggestion, but it seems there were no issues according to the log. All it says is:

2014-04-29 16:09:28.205 | 9368 | container | [SP-4-108062] (187.932) sp(2036) GatewayClient::GatewayClient(9368:141) host:[<host>] has initiated a connection.

2014-04-29 16:09:29.130 | 9368 | container | [SP-4-108008] (188.857) sp(2036) GatewayClient(9368:141)::execute() Client closed/dropped connection.

2014-04-29 16:09:29.142 | 9368 | container | [SP-4-108001] (188.869) sp(2036) GatewayClient(9368:141)::~GatewayClient() destroyed (auto).

When I set the CSV/Hadoop adapter URL to some local folder, ESP writes the output file just fine.

Best regards,

Michael

former_member217348
Participant

Hi Michael,

Based on what I see in the docs for File/Hadoop CSV Input Adapter Configuration, maybe try eliminating the user and password from the URI you are using.

From:

hdfs://user:password@host:9000/path

To:

hdfs://host:9000/path

Thanks,

Alice


Here's what the docs say:

To use Hadoop system files, use an HDFS folder URI instead of a local file system folder. For example, hdfs://<hdfsserver>:9000/<foldername>/<subfoldername>/<leaffoldername>.

To use Hadoop, download the binaries for Hadoop version 1.2.1 from http://hadoop.apache.org. Copy the hadoop-core.jar file (for example, for version 1.2.1, hadoop-core-1.2.1.jar) to %ESP_HOME%\adapters\framework\libj. Ensure you use a stable version rather than a beta.

Use a forward slash for both UNIX and Windows paths.

michael_jess
Participant

Hi Alice,

I actually tried this before, but it did not work for me. Using the Java API, I could confirm that authentication is really enabled and unauthorized connections are refused, so there is no way for ESP to connect to Hadoop without magically guessing the user name.
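
A quick check along these lines (NameNode host, port, and path are placeholders) shows the kind of test I mean; with HDFS permissions enforced, the listing fails with an AccessControlException for an unauthorized user:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAuthCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder NameNode host/port; adjust for your cluster.
        FileSystem fs = FileSystem.get(
            URI.create("hdfs://servername.acme.com:9000/"), new Configuration());
        // With HDFS permissions enforced, this throws an AccessControlException
        // when the calling user is not authorized on the directory.
        for (FileStatus s : fs.listStatus(new Path("/user/hadoop/data"))) {
            System.out.println(s.getPath());
        }
        fs.close();
    }
}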

Best regards,

Michael

Former Member

Hello,

I should have had you look at a different log.  In ESP there are three logs:

  1. stdstreams.log - This log contains licensing information/errors.
  2. esp_server.log - This log contains information/errors about the project and certain adapters.
  3. frameworkadapter.log - This log contains information/errors about certain adapters (including the Hadoop adapter).

In my testing I could see some errors in the "frameworkadapter.log".  However, sometimes exceptions that are thrown to "stderr" only show up in stdstreams.log.


I've been doing some testing and I finally have it working. The first thing I should mention is that Hadoop itself doesn't appear to accept the user name/password in the URL:


% hadoop fs -put test.txt hdfs://hadoopUser:Password1@servername.acme.com:9000/user/hadoop/data/test.txt
put: Permission denied: user=nstack, access=WRITE, inode="/user/hadoop/data":hadoop:supergroup:drwxr-xr-x


But there is a not-very-well-known environment variable that you can set:

% setenv HADOOP_USER_NAME hadoopUser
% hadoop fs -put test.txt hdfs://servername.acme.com:9000/user/hadoop/data/test.txt

This is essentially equivalent to setting it in your Java program:
System.setProperty("HADOOP_USER_NAME", "hadoopUser");
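
As a minimal standalone sketch (host and user name are placeholders), that would look something like the following; note the property must be set before the first FileSystem call, because the Hadoop client caches the login user:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HadoopUserNameExample {
    public static void main(String[] args) throws Exception {
        // Placeholder user name; set before any FileSystem/UserGroupInformation
        // call, since the Hadoop client caches the login user once resolved.
        System.setProperty("HADOOP_USER_NAME", "hadoopUser");
        FileSystem fs = FileSystem.get(
            URI.create("hdfs://servername.acme.com:9000/"), new Configuration());
        // The home directory reflects the effective user, e.g. /user/hadoopUser.
        System.out.println(fs.getHomeDirectory());
        fs.close();
    }
}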

You can edit $ESP_HOME/adapters/framework/bin/start.sh and set the Hadoop user name there:
HADOOP_USER_NAME=hadoopUser;export HADOOP_USER_NAME

For version 2.2.0, copy these files over to %ESP_HOME%\adapters\framework\libj:
hadoop-common-2.2.0.jar
hadoop-auth-2.2.0.jar
hadoop-hdfs-2.2.0.jar
guava-11.0.2.jar
protobuf-java-2.5.0.jar
commons-cli-1.2.jar (SP04 requires it; SP08 already includes it)

NOTE: There is a problem with ESP 5.1 SP04. When running the Hadoop Output Adapter inside a project (managed mode), I could not stop it. This was with the Hadoop 2.2.0 JAR files; I did not test Hadoop 1.2.1 because it cannot communicate with Hadoop 2.2.0.

This was not a problem with the upcoming release of ESP 5.1 SP08.  Nor was it a problem with SP04 when running the adapter in unmanaged mode (started and stopped manually from outside the project).

So if you wish to use ESP 5.1 SP04, run the adapter in unmanaged mode:

1) Edit $ESP_HOME/adapters/framework/bin/start.sh and set the Hadoop user name there:
HADOOP_USER_NAME=hadoop;export HADOOP_USER_NAME

2) Make a copy of the file output (and/or input) adapter:
cp -Rf $ESP_HOME/adapters/framework/instances/file_csv_output /tmp
cd /tmp/file_csv_output

3) Edit the copy's adapter_config.xml file and change it so that it can run in unmanaged mode:
a) Uncomment these lines and change the "StreamName" element to the stream in the project that the adapter should subscribe to.  Leave "ProjectName" as is:
  <ProjectName>EspProject2</ProjectName>
  <StreamName>MyStream</StreamName>
b) Change the "Dir" and "File" elements belonging to "FileOutputTransporterParameters":
  <Dir>hdfs://servername.acme.com:9000/user/hadoop/data</Dir>
  <File>neal_test.csv</File>
c) Change the "Uri" element for the project "EspProject2" to point to your ESP project:
  <Uri>esp://esp_server_name.acme.com:51011/default/hadoop_test</Uri>
d) Change the "User" and "Password" so the adapter can connect to the ESP project:
  <User>espadm</User>
  <Password encrypted="false">Password1</Password>
e) Start the adapter:
  ./start_adapter.sh
f) Stop the adapter:
  ./stop_adapter.sh

Thanks,
  Neal

Former Member

Hi,

I checked frameworkadapter.log and I found this error:

04-07-2016 12:23:15.338 INFO [main] (Framework.main) start C:\SAP_ESP_5.1\ESP-5_1/adapters/framework/instances/file_csv_output/adapter_config.xml

04-07-2016 12:23:15.366 ERROR [main] (Shell.getWinUtilsPath) Failed to locate the winutils binary in the hadoop binary path

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:356)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:371)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:364)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:260)
    at com.sybase.esp.adapter.framework.internal.AdapterController.<clinit>(AdapterController.java:74)
    at com.sybase.esp.adapter.framework.internal.Adapter.<init>(Adapter.java:60)
    at com.sybase.esp.adapter.framework.Framework.main(Framework.java:50)

I have installed Hadoop on a different server, not on localhost.

Do you know how to fix this?

Thank you,

Jan

Former Member

Hi Alice,

Thank you for your reply. I read that, but isn't that a problem on the adapter side? I don't have any Hadoop installation on my computer. Or should I put the files into the Hadoop configuration on the server?

I'm a little bit confused, because I get this error before the adapter even tries to connect.

Thank you,

Jan