Former Member

h2. Introduction

All bloggers on SDN face a common problem: formatting the weblog content before posting it. Formatting mostly means removing the unnecessary tags so that only the tags allowed in SDN weblogs remain, which is a painful task. To avoid it, I came up with a small solution that I would like to share with my fellow bloggers through this blog.

You can also find an equivalent ABAP program by Brian McKellar (https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.sdn.businesscard.sdnbusinesscard?u=s1...) in his weblog The 1-2-3 Steps To Producing a Weblog.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

/*
 * @author: Felix Jeyareuben, Cognizant Technology Solutions
 *
 * Requires Java 8 or later (generics, try-with-resources, String.codePoints).
 */
public class FormatSDNWeblog {

    /*
     * Ref: SDN: Weblogs and Formatting! By Craig Cmehil
     */
    private static final Set<String> VALID_TAGS = new HashSet<>(Arrays.asList(
            "p", "b", "i", "em", "strong", "code", "tt", "br", "a", "sub", "sup",
            "ul", "ol", "li", "pre", "img", "blockquote", "small", "div", "hr",
            "h2", "h3", "h4", "h5", "table", "tr", "td", "th", "center", "textarea"));

    // No End Tags: empty elements that must not get a closing tag
    private static final Set<String> NO_END_TAGS = new HashSet<>(Arrays.asList("img", "br", "hr"));

    // Text inside these elements (page title, Word's CSS, scripts) is never copied
    private static final Set<String> SKIP_TEXT_TAGS =
            new HashSet<>(Arrays.asList("head", "title", "style", "script"));

    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("Usage: java FormatSDNWeblog <html-file>");
            System.exit(-1);
        }
        File inputFile = new File(args[0]);
        // The output goes next to the input as "sdn_<name>", even if a path was given
        File outputFile = new File(inputFile.getAbsoluteFile().getParentFile(),
                "sdn_" + inputFile.getName());

        try (Writer out = new OutputStreamWriter(new FileOutputStream(outputFile),
                StandardCharsets.UTF_8)) {
            DOMParser parser = new DOMParser();
            parser.parse(inputFile.toURI().toString());
            // A recursive function which does the stripping of unnecessary tags
            sdnParser(parser.getDocument(), "", true, out);
        } catch (IOException | SAXException e) {
            // FileNotFoundException is an IOException, so it is reported here too
            System.out.println(e.getMessage());
            return;
        }
        System.out.println("Filtered html successfully converted into SDN Weblog Content as "
                + outputFile + "!");
    }

    public static void sdnParser(Node node, String indent, boolean emitText, Writer out)
            throws IOException {
        switch (node.getNodeType()) {
        case Node.ELEMENT_NODE: {
            String name = node.getNodeName(); // NekoHTML reports element names in upper case
            String lower = name.toLowerCase();

            // To remove unnecessary empty paragraphs such as <p>&nbsp;</p>
            if (lower.equals("p") && isEmptyParagraph(node))
                return;

            // To check if the current TAG is a valid one
            boolean valid = VALID_TAGS.contains(lower);
            if (valid) {
                out.write("\n" + indent + "<" + name);
                // Iterating through the attributes of the current node
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; attrs != null && i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    String attrName = attr.getNodeName().toLowerCase();
                    // Removing the 'class' and 'style' attributes which might be
                    // found in the valid allowed TAGS
                    if (attrName.startsWith("class") || attrName.startsWith("style"))
                        continue;
                    out.write(" " + attr.getNodeName() + "=\""
                            + escape(attr.getNodeValue(), true) + "\"");
                }
                out.write(">");
            }

            // Recursive call to its child nodes: invalid tags are dropped but
            // their text is kept, except inside SKIP_TEXT_TAGS
            boolean childText = emitText && !SKIP_TEXT_TAGS.contains(lower);
            for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling())
                sdnParser(child, indent + "     ", childText, out);

            // Ending the tag
            if (valid && !NO_END_TAGS.contains(lower))
                out.write("\n" + indent + "</" + name + ">");
            break;
        }
        case Node.TEXT_NODE:
            if (emitText)
                out.write(escape(node.getNodeValue().trim(), false));
            break;
        case Node.DOCUMENT_NODE:
            // The root document: just walk its children
            for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling())
                sdnParser(child, indent, emitText, out);
            break;
        default:
            // Comments, DOCTYPE and other node types are dropped
            break;
        }
    }

    // A <p> is empty when it has only text children made of whitespace or
    // &nbsp; (char 160), which is how Word pads blank lines
    private static boolean isEmptyParagraph(Node p) {
        for (Node c = p.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() != Node.TEXT_NODE
                    || !c.getNodeValue().replace('\u00A0', ' ').trim().isEmpty())
                return false;
        }
        return true;
    }

    // Re-escapes markup characters (the DOM holds unescaped text) and writes
    // non-ASCII characters such as &nbsp; as numeric character references
    private static String escape(String s, boolean inAttribute) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp == '&') sb.append("&amp;");
            else if (cp == '<') sb.append("&lt;");
            else if (cp == '>') sb.append("&gt;");
            else if (cp == '"' && inAttribute) sb.append("&quot;");
            else if (cp > 127) sb.append("&#").append(cp).append(';');
            else sb.appendCodePoint(cp);
        });
        return sb.toString();
    }
}
```

The list of valid tags in the code above contains all the tags allowed in SDN weblogs, as documented in Craig Cmehil's weblog SDN: Weblogs and Formatting!
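To see the filtering rule in isolation, here is a minimal sketch that applies the same idea as the program above (keep whitelisted tags without their class/style attributes, drop every other tag but keep its text, skip paragraphs that hold only &nbsp;) to a small XHTML string. It is an illustration, not part of the original program: the class WhitelistDemo and its methods are hypothetical, it uses only a subset of the SDN tags, and it relies on the JDK's built-in XML parser, so it assumes well-formed input. Real Word output is not well-formed, which is why the program above uses NekoHTML instead.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class WhitelistDemo {
    // Hypothetical subset of the SDN whitelist, for illustration only
    static final Set<String> VALID = new HashSet<>(Arrays.asList("p", "b", "i", "a", "br", "img"));
    static final Set<String> NO_END = new HashSet<>(Arrays.asList("br", "img"));

    // Parses well-formed XHTML with the JDK parser and returns the filtered markup
    static String filter(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), out);
        return out.toString();
    }

    // Copies whitelisted elements (without class/style), unwraps all others,
    // and skips paragraphs holding nothing but whitespace or &nbsp;
    static void walk(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(escape(node.getNodeValue()));
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) return; // comments, PIs, ...
        String name = node.getNodeName().toLowerCase();
        if (name.equals("p") && isBlank(node)) return;
        boolean keep = VALID.contains(name);
        if (keep) {
            out.append('<').append(name);
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                String an = a.getNodeName().toLowerCase();
                if (an.equals("class") || an.equals("style")) continue;
                out.append(' ').append(a.getNodeName()).append("=\"")
                   .append(escape(a.getNodeValue()).replace("\"", "&quot;")).append('"');
            }
            out.append('>');
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) walk(c, out);
        if (keep && !NO_END.contains(name)) out.append("</").append(name).append('>');
    }

    // True when every child is a text node made of whitespace or U+00A0
    static boolean isBlank(Node p) {
        for (Node c = p.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() != Node.TEXT_NODE) return false;
            if (!c.getNodeValue().replace('\u00A0', ' ').trim().isEmpty()) return false;
        }
        return true;
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws Exception {
        String word = "<html><body><p class=\"MsoNormal\">Hello <span style=\"color:red\">SDN</span></p>"
                + "<p>\u00A0</p><p><b>Bye</b></p></body></html>";
        System.out.println(filter(word));
    }
}
```

Running main prints `<p>Hello SDN</p><p><b>Bye</b></p>`: the Word-style class and style attributes and the span tag disappear, the span's text survives, and the &nbsp; paragraph is dropped.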

h2. Step-by-Step Demo

After completing the blog in Microsoft Word, choose Save As.

Select Web Page, Filtered as the file type.

Click Yes to confirm saving it as HTML.

When we open the saved HTML in Notepad, we can see many unnecessary tags and attributes.

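Execute the following commands (the `;` classpath separator is for Windows; use `:` on Unix-like systems):

```
javac -classpath nekohtml.jar;xercesImpl.jar;xmlParserAPIs.jar FormatSDNWeblog.java
java -classpath nekohtml.jar;xercesImpl.jar;xmlParserAPIs.jar;. FormatSDNWeblog weblog.htm
```

Make sure javac.exe and java.exe are on your PATH and that the three JAR files are in the current directory.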

The generated output file contains only the tags allowed by SDN:

!https://weblogs.sdn.sap.com/weblogs/images/41443/sdnimage006.jpg|height=361|alt=image|width=531|src=...!
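As a quick sanity check on the generated file, a small standalone checker can scan it for any tag name outside the whitelist. This is a hypothetical helper (the name CheckSDNWeblog is mine, not part of the program above); it uses a simple regular expression rather than a real HTML parser and assumes the file is UTF-8 encoded. It would be run as `java CheckSDNWeblog sdn_weblog.htm`.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CheckSDNWeblog {
    // The tags allowed in SDN weblogs, as listed in the program above
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
            "p", "b", "i", "em", "strong", "code", "tt", "br", "a", "sub", "sup",
            "ul", "ol", "li", "pre", "img", "blockquote", "small", "div", "hr",
            "h2", "h3", "h4", "h5", "table", "tr", "td", "th", "center", "textarea"));

    // Captures the name in an opening or closing tag, e.g. "<SPAN" or "</font"
    private static final Pattern TAG = Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)");

    // Returns the lower-cased, sorted tag names that SDN would not accept
    static Set<String> disallowedTags(String html) {
        Set<String> bad = new TreeSet<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(1).toLowerCase();
            if (!ALLOWED.contains(name)) bad.add(name);
        }
        return bad;
    }

    public static void main(String[] args) throws Exception {
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Set<String> bad = disallowedTags(html);
        System.out.println(bad.isEmpty() ? "OK: only allowed tags" : "Disallowed tags: " + bad);
    }
}
```

Because it matches text rather than parsing, tag-like text inside an HTML comment would also be reported; it is a quick check, not a validator.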
