h2. Introduction
Every SDN blogger faces the same problem: the Weblog content has to be formatted before it can be posted to SDN. Formatting mostly means removing the unnecessary tags so that only the tags allowed in SDN Weblogs remain, which is a painful task when done by hand. To avoid it, I came up with a small solution, which I would like to share with my fellow bloggers through this blog.
You can also find an equivalent ABAP program by Brian McKellar (https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.sdn.businesscard.sdnbusinesscard?u=s1...) in his weblog,
The 1-2-3 Steps To Producing a Weblog.

My solution is the following Java program. It parses the HTML file saved from Word with the NekoHTML parser and writes back only the tags and attributes allowed in SDN Weblogs.

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Vector;
import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

/*
 * Strips an HTML file saved from Word down to the tags allowed in SDN
 * Weblogs and writes the result to a new file prefixed with "sdn_".
 */
public class FormatSDNWeblog {
    // Tags allowed in SDN Weblogs
    static Vector<String> vTags = null;
    // Allowed tags that must not get an end tag
    static Vector<String> noEndTags = null;
    // True while the visited text lies inside an allowed tag
    static boolean _flag = false;
    static FileOutputStream fos = null;
    static String outputFile;

    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("Usage: java FormatSDNWeblog <html-file>");
            System.exit(-1);
        }
        try {
            String inputFile = args[0];
            // Write sdn_<name> next to the input file, which also works
            // when the input is given with a directory path
            File in = new File(inputFile);
            outputFile = new File(in.getParentFile(), "sdn_" + in.getName()).getPath();
            String[] validTags = { "p", "b", "i", "em", "strong", "code", "tt",
                    "br", "a", "sub", "sup", "ul", "ol", "li", "pre", "img",
                    "blockquote", "small", "div", "hr", "h2", "h3", "h4", "h5",
                    "table", "tr", "td", "th", "center", "textarea" };
            // Tags without an end tag
            noEndTags = new Vector<String>();
            noEndTags.add("img");
            noEndTags.add("br");
            noEndTags.add("hr");
            vTags = new Vector<String>();
            for (int i = 0; i < validTags.length; i++)
                vTags.add(validTags[i]);

            DOMParser parser = new DOMParser();
            // Parse first, so that no empty output file is left behind
            // when the input cannot be read or parsed
            parser.parse(inputFile);
            fos = new FileOutputStream(new File(outputFile));
            try {
                // A recursive function which strips the unnecessary tags
                SDNParser(parser.getDocument(), "");
            } finally {
                fos.close();
            }
            System.out.println("Filtered html successfully converted into SDN Weblog Content as "
                    + outputFile + "!");
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        } catch (SAXException e) {
            System.out.println(e.getMessage());
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }

    // Recursively writes the allowed tags (without their class and style
    // attributes) and the text inside them; everything else is dropped
    public static void SDNParser(Node node, String indent) throws IOException {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String _node = node.getNodeName();
            String tag = _node.toLowerCase();
            // Skip empty paragraphs such as Word's <p>&nbsp;</p>
            // (character 160 is the non-breaking space)
            if (tag.equals("p")) {
                Node ch = node.getFirstChild();
                if (ch != null && ch.getNodeType() == Node.TEXT_NODE
                        && ch.getNodeValue().length() > 0
                        && ch.getNodeValue().charAt(0) == '\u00A0')
                    return;
            }
            boolean valid = vTags.contains(tag);
            // Remember the state of the enclosing element, so that a
            // disallowed tag (e.g. <span>) does not hide the text around it
            boolean outerFlag = _flag;
            if (valid) {
                _flag = true;
                fos.write(("\n" + indent + "<" + _node).getBytes());
                // Copy the attributes, except 'class' and 'style'
                NamedNodeMap a = node.getAttributes();
                for (int i = 0; a != null && i < a.getLength(); i++) {
                    String attr = a.item(i).getNodeName();
                    if (attr.equalsIgnoreCase("class")
                            || attr.equalsIgnoreCase("style"))
                        continue;
                    fos.write((" " + attr + "=\"" + escape(a.item(i).getNodeValue())
                            + "\"").getBytes());
                }
                fos.write(">".getBytes());
            }
            // Recursive call for each child node
            for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling())
                SDNParser(child, indent + " ");
            _flag = outerFlag;
            // Write the end tag, unless the tag has none
            if (valid && !noEndTags.contains(tag))
                fos.write(("\n" + indent + "</" + _node + ">").getBytes());
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            // Text is written only inside an allowed tag; runs of whitespace
            // are collapsed rather than trimmed, so the words on either side
            // of a dropped tag stay separated
            if (_flag)
                fos.write(escape(node.getNodeValue().replaceAll("\\s+", " ")).getBytes());
        } else {
            // The document root and other non-element nodes: only their
            // children are visited, so comments are dropped
            for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling())
                SDNParser(child, indent + " ");
        }
    }

    // Escapes the characters that would otherwise be read as markup
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
The validTags array in the above code contains all the tags allowed in SDN Weblogs (Ref: SDN: Weblogs and Formatting! by Craig Cmehil).
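The filtering rule itself can be tried out without NekoHTML. The following is a minimal sketch, not part of the original program, under these assumptions: the input is well-formed XHTML (it uses the JDK's built-in XML parser, which would reject Word's HTML), the whitelist is a shortened illustrative one, and the class name WhitelistSketch is hypothetical. Unlike the program above, it treats a <p> as empty only when all of its text is whitespace or non-breaking spaces, instead of looking at the first character only.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch: whitelist-based tag filtering over a DOM tree.
// Assumes well-formed XHTML input (JDK XML parser, not NekoHTML).
public class WhitelistSketch {
    // Shortened, illustrative whitelist (not the full SDN list)
    static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("p", "b", "i", "a", "br", "img"));
    // Allowed tags that never get an end tag
    static final Set<String> NO_END_TAG =
            new HashSet<>(Arrays.asList("br", "img", "hr"));

    public static String filter(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XHTML", e);
        }
    }

    private static void walk(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(escape(node.getNodeValue()));
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE)
            return; // comments, processing instructions, ...
        String name = node.getNodeName().toLowerCase();
        // Drop paragraphs that contain only whitespace or &#160;
        if (name.equals("p")
                && node.getTextContent().replace('\u00A0', ' ').trim().isEmpty())
            return;
        boolean keep = ALLOWED.contains(name);
        if (keep) {
            out.append('<').append(name);
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                String attr = attrs.item(i).getNodeName();
                if (attr.equalsIgnoreCase("class") || attr.equalsIgnoreCase("style"))
                    continue; // Word formatting attributes
                out.append(' ').append(attr).append("=\"")
                   .append(escape(attrs.item(i).getNodeValue())).append('"');
            }
            out.append('>');
        }
        // Children of a dropped tag are still visited, so its text survives
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling())
            walk(c, out);
        if (keep && !NO_END_TAG.contains(name))
            out.append("</").append(name).append('>');
    }

    // Escapes the characters that would otherwise be read as markup
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(filter("<div><p class=\"MsoNormal\" style=\"margin:0\">"
                + "Hello <span lang=\"EN-US\">SDN</span> <b>world</b></p><p>&#160;</p></div>"));
    }
}
```

Running main prints <p>Hello SDN <b>world</b></p>: the class and style attributes are gone, the disallowed <span> is dropped but its text is kept, and the empty paragraph disappears.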
h2. Step-by-Step Demo
1. After completing the blog in Word, choose Save As.
2. Select Web Page, Filtered as the file type.
3. Click Yes to save it as HTML.
4. When you open the saved HTML file in Notepad, you will see many unnecessary tags and attributes.
5. Execute the following commands:
javac -classpath nekohtml.jar;xercesImpl.jar;xmlParserAPIs.jar FormatSDNWeblog.java
java -classpath nekohtml.jar;xercesImpl.jar;xmlParserAPIs.jar;. FormatSDNWeblog weblog.htm
Make sure that javac.exe and java.exe are in your PATH. The ; classpath separator shown here is for Windows; on Unix-like systems use : instead.
6. The generated output file (sdn_weblog.htm in this example) contains only the tags allowed by SDN:
!https://weblogs.sdn.sap.com/weblogs/images/41443/sdnimage006.jpg|height=361|alt=image|width=531|src=...!
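Before pasting the generated file into SDN, you may want to double-check it. The following is a rough sketch of a hypothetical helper, CheckSDNTags, which is not part of the original program: it scans the file with a regular expression and lists every tag name that is not in the same whitelist as validTags. Being regex-based, it is only an approximation; for example, it would also flag tag-like text inside HTML comments.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports tags that SDN Weblogs do not allow
public class CheckSDNTags {
    // Same list as validTags in FormatSDNWeblog
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
            "p", "b", "i", "em", "strong", "code", "tt", "br", "a", "sub",
            "sup", "ul", "ol", "li", "pre", "img", "blockquote", "small",
            "div", "hr", "h2", "h3", "h4", "h5", "table", "tr", "td", "th",
            "center", "textarea"));

    // Captures the name of an opening or closing tag, e.g. <P ...> or </td>
    private static final Pattern TAG =
            Pattern.compile("<\\s*/?\\s*([A-Za-z][A-Za-z0-9]*)");

    // Returns the disallowed tag names found, lower-cased and sorted
    public static Set<String> disallowedTags(String html) {
        Set<String> bad = new TreeSet<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(1).toLowerCase();
            if (!ALLOWED.contains(name))
                bad.add(name);
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps every byte to a char, so any input decodes;
        // tag names are plain ASCII either way
        String html = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.ISO_8859_1);
        Set<String> bad = disallowedTags(html);
        System.out.println(bad.isEmpty() ? "Only allowed tags found."
                : "Disallowed tags: " + bad);
    }
}
```

It would be run as java CheckSDNTags sdn_weblog.htm after the conversion step.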