Replacing text in an HTML string with Cheerio

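Replacing text inside an HTML string is trickier than it looks. First, the replacement must not break the HTML tags or the document structure. Second, it should only touch the text that actually needs it and leave content such as pre and code alone, especially when those elements contain nested tags. For this kind of problem it is best to use a library that works on the HTML/DOM/XML tree and to be careful with regular expressions. I once wrote a ProcessWire plugin that replaced keywords and added internal links. It used php-dom to parse the document, but it still had a bug with text wrapped in styled <span> elements nested inside <pre> tags: keywords in the syntax-highlighted code were still picked up and turned into links. That was a long time ago and I never got around to fixing it.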
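To make the risk concrete, here is a minimal sketch of what a plain regular-expression replacement does to markup. The sample string and the /tag/fastify URL are made up for illustration; they are not taken from the plugin mentioned above.

```javascript
// Hypothetical sample markup: the keyword appears in an attribute value,
// in a link target, in existing link text, and in plain text.
const sample = '<p><img alt="Fastify logo" src="logo.png"> Learn <a href="/Fastify">Fastify</a>, then use Fastify.</p>';

// Naive approach: wrap every occurrence, no matter where it sits.
const naive = sample.replace(/Fastify/g, '<a href="/tag/fastify">$&</a>');

// The alt attribute and the href now contain markup, and the existing
// link ends up with another <a> nested inside it.
console.log(naive);
```

Only the last occurrence, the one in plain text, should have been linked. Telling these cases apart requires knowing where each piece of text sits in the tree, which is what a DOM library provides.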

Below is a JavaScript version that I implemented with cheerio. To exclude more HTML tags or classes, just add them to filter.

import * as cheerio from 'cheerio';

const html = `<div><img alt="使用符合gdpr和ccpa法规的Cookie声明FastifyFastify" class="transition duration-500 group-hover:scale-105 h-64 object-cover object-top w-full" height="667" src="https://zlapi.zhuli.eu/assets/b90187d0-1bec-43b5-a642-c080f8ee47f7?width=320&amp;height=213&amp;fit=cover" width="1000" loading="lazy" sizes="(min-width: 66em) 33vw,
      (min-width: 44em) 50vw,
      100vw" srcset="https://zlapi.zhuli.eu/assets/b90187d0-1bec-43b5-a642-c080f8ee47f7?width=320&amp;height=213&amp;fit=cover 300w, https://zlapi.zhuli.eu/assets/b90187d0-1bec-43b5-a642-c080f8ee47f7?width=480&amp;height=320&amp;fit=cover 600w, https://zlapi.zhuli.eu/assets/b90187d0-1bec-43b5-a642-c080f8ee47f7?width=1000&amp;height=667&amp;fit=cover 1200w"><p>使用符合GDPR(通用数据保护条例)和CCPA(加利福尼亚消费者隐私法)hello world<a href="https://towait.com/Fastify" class="Fastify">Fastify</a>法规的Cookie声明是为了保护用户的隐私权和数据保护。</p><p><a href="https://gdpr-info.eu/" rel="noopener" target="_blank">GDPR</a>是欧盟制定的法规,旨在保护个人数据的隐私和安全。根据GDPR,Fastify网站需要明确告知用户哪些个人数据将被收集,为什么收集,以及如何使用这些数据。Cookie声明是一种透明的方式,Hello world允许网站告知用户它们使用的Cookie类型和目的,并为用户提供选择是否同意使用这些Cookie。这确保了用户具有知情权和控制权,可以自主决定是否接受Cookie的使用。</p><p>同样,<a href="https://oag.ca.gov/privacy/ccpa" rel="noopener" target="_blank">CCPA</a>是美国加利福尼亚州的法规,旨在保护消费者的隐私权。根据CCPA,网站必须告知用户其收集和共享的个人数据类型,hello world并为用户提供选择是否允许这些数据的销售。Cookie声明是满足CCPA要求的一种方式,使用户能够了解他们的数据如何被使用,并控制他们的个人信息。</p>
      <h1>什么是Fastify?</h1>
      <p>因此,使用符合GDPR和CCPA法规的Cookie声明可以帮助网站遵守相关的隐私法规,并增强用户对其隐私权的保护。这些声明提供了透明度和选择权,<code>test</code>使用户能够更好地掌握自己的个人数据,<code>Fastify,fastify</code>并决定是否愿意分享这些数据。</p><pre class="sh hljs bash" data-pbcklang="sh" data-pbcktabsize="4">  fastify.register(require(<span class="hljs-string">'@fastify/swagger'</span>), {
    swagger: {
      info: {
        title: <span class="hljs-string">'Test swagger'</span>,
        description: <span class="hljs-string">'Testing the Fastify swagger API'</span>,
        version: <span class="hljs-string">'0.1.0'</span>
      },
      externalDocs: {
        url: <span class="hljs-string">'https://swagger.io'</span>,
        description: <span class="hljs-string">'Find more info here'</span>
      },
      host: <span class="hljs-string">'127.0.0.1:3000'</span>,
      schemes: [<span class="hljs-string">'http'</span>, <span class="hljs-string">'https'</span>],
      consumes: [<span class="hljs-string">'application/json'</span>],
      produces: [<span class="hljs-string">'application/json'</span>],
      tags: [
        { name: <span class="hljs-string">'user'</span>, description: <span class="hljs-string">'User related end-points'</span> },
        { name: <span class="hljs-string">'code'</span>, description: <span class="hljs-string">'Code related end-points'</span> }
      ],
      definitions: {
        User: {
          <span class="hljs-built_in">type</span>: <span class="hljs-string">'object'</span>,
          required: [<span class="hljs-string">'id'</span>, <span class="hljs-string">'email'</span>],
          properties: {
            id: { <span class="hljs-built_in">type</span>: <span class="hljs-string">'string'</span>, format: <span class="hljs-string">'uuid'</span> },
            firstName: { <span class="hljs-built_in">type</span>: <span class="hljs-string">'string'</span> },
            lastName: { <span class="hljs-built_in">type</span>: <span class="hljs-string">'string'</span> },
            email: {<span class="hljs-built_in">type</span>: <span class="hljs-string">'string'</span>, format: <span class="hljs-string">'email'</span> }
          }
        }
      },
      securityDefinitions: {
        apiKey: {
          <span class="hljs-built_in">type</span>: <span class="hljs-string">'apiKey'</span>,
          name: <span class="hljs-string">'apiKey'</span>,
          <span class="hljs-keyword">in</span>: <span class="hljs-string">'header'</span>
        }
      }
    }
  })


  fastify.register(require(<span class="hljs-string">'@fastify/swagger-ui'</span>), {
    routePrefix: <span class="hljs-string">'/docs'</span>,
    uiConfig: {
      docExpansion: <span class="hljs-string">'full'</span>,
      deepLinking: <span class="hljs-literal">false</span>
    },
    uiHooks: {
      onRequest: <span class="hljs-keyword">function</span> (request, reply, next) { next() },
      preHandler: <span class="hljs-keyword">function</span> (request, reply, next) { next() }
    },
    staticCSP: <span class="hljs-literal">true</span>,
    transformStaticCSP: (header) =&gt; header,
    transformSpecification: (swaggerObject, request, reply) =&gt; { <span class="hljs-built_in">return</span> swaggerObject },
    transformSpecificationClone: <span class="hljs-literal">true</span>
  })
</pre></div>`;


const filter = {
	'tags' : ["pre", "code", "a", "h1", "h2", "h3", "embed", "caption", "gallery", "playlist", "audio", "video", "blockquote"],
	'classes' : ["hljs-"]
}

// Load the HTML string with cheerio
const $ = cheerio.load(html);

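// Escape text before it is re-inserted as HTML, so that a decoded "<" or "&"
// in a text node is not parsed as markup
const escapeHtml = (s) => s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

// Build the pattern once; escape regex metacharacters in the keyword
const find = 'hello world';
const findReg = new RegExp(escapeHtml(find).replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'gi');

// Walk the child nodes of every element and replace matches in text nodes
$('*').contents().each(function () {
	// Only text nodes (nodeType 3) are candidates
	if (this.nodeType !== 3) return;

	// Check every ancestor, not just the direct parent, so that text nested
	// deeper inside an excluded tag or class is skipped as well
	let ignore = false;
	for (let el = this.parent; el && !ignore; el = el.parent) {
		const cls = (el.attribs && el.attribs.class) || '';
		// filter.classes entries are matched as substrings, e.g. "hljs-" matches "hljs-string"
		ignore = filter.tags.includes(el.name) || filter.classes.some((name) => cls.includes(name));
	}
	if (ignore) return;

	const text = escapeHtml($(this).text());
	const replacedText = text.replace(findReg, `<a href="#" class="className">$&</a>`);
	// Only touch the node when something actually matched
	if (replacedText !== text) $(this).replaceWith(replacedText);
});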

// Get the HTML string after replacement
const allHtml = $('body').html();
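
The listing above links a single fixed keyword. For a keyword and internal-link feature like the one described at the top, several keywords usually have to be handled in one pass. The following is only a sketch of how such a pattern could be built: the keyword list and the names escapeRegExp, buildKeywordRegex, keywordReg and linked are my own assumptions, and the snippet works on a plain string rather than on cheerio text nodes.

```javascript
// Hypothetical keyword list; in practice it might come from a CMS or a config file.
const keywords = ['Fastify', 'C++', 'hello world', 'hello'];

// Escape characters that have a special meaning inside a regular expression,
// so that a keyword such as "C++" is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one case-insensitive pattern. Longer keywords are listed first so that
// "hello world" is preferred over its prefix "hello".
function buildKeywordRegex(words) {
  const alternatives = [...words]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp);
  return new RegExp(alternatives.join('|'), 'gi');
}

const keywordReg = buildKeywordRegex(keywords);
const linked = 'Say hello world in C++ or fastify.'.replace(
  keywordReg,
  '<a href="#" class="className">$&</a>'
);
console.log(linked);
```

A pattern built this way could take the place of the single-keyword regular expression inside the text-node loop, assuming every keyword should get the same link markup.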
