Removing Script Tags in .NET: Sample Code

Date: June 9, 2015

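Rich-text editors are now common on web pages, because people need to insert images, video, styles, and other HTML content, which makes pages more informative. This also creates trouble for developers: submitted HTML will inevitably contain unsafe or invalid markup, such as script tags or unknown tags. Without help, checking user-submitted HTML for safety and standards compliance means writing a lot of code.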

Method 1:

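Today I want to recommend a component that analyzes submitted HTML, identifies the offending parts, and removes them, and that is fairly simple to configure. It is called SafeHelper. It detects and removes every tag other than those allowed in its configuration file. Using it is straightforward: you only need to call a static method.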

Step 1: create an XML file named "wuxiu.HtmlAnalyserConfig.xml" in the root directory of the website and fill in the following:

<?xml version="1.0" encoding="utf-8" ?>
<HtmlAnylyser >
  <AllowTags>
    <div attrs="class|style"/>
    <ul attrs="class"/>
    <li/>
    <table attrs="class|cellpadding|cellspacing|border|width"/>
    <tr attrs="class"/>
    <th attrs="class"/>
    <td attrs="class"/>
    <span attrs="style|class"/>
    <object attrs="classid|codebase|width|height"/>
    <param attrs="name|value"/>
    <embed attrs="src|width|height|quality|pluginspage|type|wmode"/>
    <a attrs="href|target|title"/>
    <h1 attrs="class"/>
    <h2 attrs="class"/>
    <h3 attrs="class"/>
    <h4 attrs="class"/>
    <h5 attrs="class"/>
    <h6 attrs="class"/>
    <strong attrs="class"/>
    <b attrs="class"/>
    <i attrs="class"/>
    <em attrs="class"/>
    <u attrs="class"/>
    <hr attrs="class"/>
    <br attrs="class"/>
    <img attrs="class|src|width|height|alt"/>
    <p attrs="class"/>
    <ol attrs="class"/>
    <dl attrs="class"/>
    <dt attrs="class"/>
    <dd attrs="class"/>
  </AllowTags>
</HtmlAnylyser>
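
To make the shape of this whitelist concrete, here is a minimal sketch of how such a file could be read into a lookup table. It is not SafeHelper's own loader: the class name AllowTagsReader is hypothetical, and treating a missing attrs attribute (as on li) as "no attributes allowed" is an assumption about the format.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper, not part of SafeHelper: reads an AllowTags
// whitelist into a map of tag name -> set of allowed attribute names.
public static class AllowTagsReader
{
    public static Dictionary<string, HashSet<string>> Load(string xml)
    {
        var result = new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);
        XElement allowTags = XDocument.Parse(xml).Root.Element("AllowTags");
        if (allowTags == null) return result;

        foreach (XElement tag in allowTags.Elements())
        {
            // attrs is a '|'-separated list. Assumption: when it is missing,
            // no attributes are allowed on that tag.
            string attrs = (string)tag.Attribute("attrs") ?? "";
            result[tag.Name.LocalName] = new HashSet<string>(
                attrs.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries),
                StringComparer.OrdinalIgnoreCase);
        }
        return result;
    }
}
```

With the configuration above, the map would contain an entry for div with class and style, an entry for li with an empty set, and no entry for script, which is why script counts as an unknown tag.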

Step 2: add a reference to the DLL. SafeHelper website: http://www.wuxiu.org/downloads.html

Step 3: call the following code to remove unknown tags from the HTML (that is, every tag not defined in wuxiu.HtmlAnalyserConfig.xml):

string html = "<script>alert('yes');</script><p>content</p>";
html = wuxiu.SafeHelper.HtmlSafer.HtmlSaferAnalyser.ToSafeHtml(html);
Response.Write(html);

Or list all unknown tags:

string html = "<script>alert('yes');</script><p>myhtmlcontent</p>";
string[] dangers = wuxiu.SafeHelper.HtmlSafer.HtmlSaferAnalyser.ValidHtml(html, false);
foreach (string danger_tag in dangers)
{
    Response.Write(danger_tag+"<br/>");
}
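
To illustrate the kind of result such a check reports, here is a rough, self-contained sketch that lists tag names missing from a whitelist. It is not how SafeHelper works internally: TagCheckSketch and FindDisallowedTags are hypothetical names, and a regex scan like this is only an approximation of real HTML parsing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch, not SafeHelper's implementation.
public static class TagCheckSketch
{
    // Captures the name of each opening or closing tag,
    // e.g. "script" in "<script>" and in "</script>".
    private static readonly Regex TagName =
        new Regex(@"<\s*/?\s*([a-zA-Z][a-zA-Z0-9]*)");

    public static List<string> FindDisallowedTags(string html, ISet<string> allowed)
    {
        var found = new List<string>();
        foreach (Match m in TagName.Matches(html))
        {
            string name = m.Groups[1].Value.ToLowerInvariant();
            // Report each disallowed tag name once, in order of first appearance.
            if (!allowed.Contains(name) && !found.Contains(name))
                found.Add(name);
        }
        return found;
    }
}
```

Called with the sample string above and a whitelist containing only p, this sketch returns a list with the single entry script.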

Remove all HTML tags:

string text=wuxiu.SafeHelper.HtmlSafer.HtmlSaferAnalyser.ClearHtmlTags("<p>hello world</p>");

Method 2: use regular expressions to match dangerous tags such as script:

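// Requires: using System.Text.RegularExpressions;
public static string StripHTML(string strHtml)
{
    // Patterns are applied in order; aryRep[i] is the replacement for aryReg[i].
    string[] aryReg =
    {
        // Whole script blocks; [\s\S] also matches newlines, so multi-line scripts are removed.
        @"<script[^>]*?>[\s\S]*?</script>",
        // Any remaining opening or closing tag, with or without attributes.
        @"<(\/\s*)?!?((\w+:)?\w+)(\w+(\s*=?\s*(([""'])(\\[""'tbnr]|[^\7])*?\7|\w+)|.{0})|\s)*?(\/\s*)?>",
        // A line break followed by further whitespace.
        @"([\r\n])[\s]+",
        // Common character entities.
        @"&(quot|#34);", @"&(amp|#38);", @"&(lt|#60);", @"&(gt|#62);",
        @"&(nbsp|#160);", @"&(iexcl|#161);", @"&(cent|#162);", @"&(pound|#163);",
        @"&(copy|#169);",
        // Any other numeric entity, then comment remnants.
        @"&#(\d+);", @"-->", @"<!--.*\n"
    };

    string[] aryRep =
    {
        "", "", "",
        "\"", "&", "<", ">",
        "   ", "\xa1", //chr(161),
        "\xa2", //chr(162),
        "\xa3", //chr(163),
        "\xa9", //chr(169),
        "", "\r\n", ""
    };

    string strOutput = strHtml;
    for (int i = 0; i < aryReg.Length; i++)
    {
        Regex regex = new Regex(aryReg[i], RegexOptions.IgnoreCase);
        strOutput = regex.Replace(strOutput, aryRep[i]);
    }
    // String.Replace returns a new string, so the result has to be assigned back.
    strOutput = strOutput.Replace("<", "");
    strOutput = strOutput.Replace(">", "");
    strOutput = strOutput.Replace("\r\n", "");
    return strOutput;
}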
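
As a variation on the regex approach, the sketch below uses RegexOptions.Singleline for script blocks and System.Net.WebUtility.HtmlDecode in place of a hand-written entity table. The class and method names (StripSketch, ToPlainText) are hypothetical, the patterns are simplified, and regex-based stripping remains a best-effort technique rather than a complete HTML sanitizer.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical sketch: reduces an HTML fragment to plain text.
public static class StripSketch
{
    public static string ToPlainText(string html)
    {
        // Singleline lets '.' match newlines, so multi-line script blocks are removed whole.
        string s = Regex.Replace(html, @"<script\b[^>]*>.*?</script\s*>", "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // Remove comments, then any remaining tags.
        s = Regex.Replace(s, @"<!--.*?-->", "", RegexOptions.Singleline);
        s = Regex.Replace(s, @"<[^>]+>", "");
        // Decode entities last, so a decoded '<' or '>' is never re-read as markup above.
        s = WebUtility.HtmlDecode(s);
        return s.Trim();
    }
}
```

Because entities are decoded at the end, the returned text can contain literal < and > characters. If it is written back into a page, it should be encoded again first, for example with WebUtility.HtmlEncode.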
