Implementing HTML string replacement with Cheerio
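Replacing text inside an HTML string is surprisingly tricky. First, the replacement must not break HTML tags or the document structure. Second, it should only touch the text where a replacement is actually wanted, leaving the contents of elements such as pre and code alone, especially any tags nested inside them. For this kind of problem it is best to rely on a library that operates on HTML/DOM/XML and to be cautious with regular expressions. I once wrote a plugin for ProcessWire that replaced keywords with internal links. At the time I used the php-dom library to parse the document before processing it, but it still had a bug with content wrapped in styling <span> elements nested inside <pre> tags: in syntax-highlighted code, the keywords were still picked out and wrapped in links. That was a long time ago and I never got around to updating it.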
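To make the risk concrete, here is a minimal sketch of what a naive regular-expression replacement does to a small fragment. The sample markup and the `/Fastify` link target are made up for illustration and are not taken from the plugin mentioned above.

```javascript
// A tiny made-up fragment: the keyword appears in an attribute,
// in normal paragraph text, and inside highlighted code.
const html =
  '<img alt="Fastify logo">' +
  '<p>Fastify is fast.</p>' +
  '<pre><span class="hljs-string">\'Fastify\'</span></pre>';

// Naive approach: replace every occurrence, with no knowledge of the markup.
const naive = html.replace(/Fastify/g, '<a href="/Fastify">$&</a>');

// Count how many links were inserted and check whether the alt attribute was hit.
const linksInserted = (naive.match(/<a href/g) || []).length;
const attributeCorrupted = naive.includes('alt="<a href');

console.log(linksInserted);      // 3: attribute, paragraph, and code
console.log(attributeCorrupted); // true: the <img> tag is now broken
```

Only the paragraph occurrence should have been linked. The regular expression cannot tell an attribute value or a code sample from ordinary text, which is why working on parsed text nodes is the safer route.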
Below is the JavaScript version I implemented with cheerio. To exclude more HTML tags or classes, just add them to filter.
import * as cheerio from 'cheerio'; const html = `<div><img alt="使用符合gdpr和ccpa法规的Cookie声明FastifyFastify" class="transition duration-500 group-hover:scale-105 h-64 object-cover object-top w-full" height="667" src="https://zlapi.zhuli.eu/assets/b90187d0-1bec-43b5-a642-c080f8ee47f7?width=320&height=213&fit=cover" width="1000" loading="lazy" sizes="(min-width: 66em) 33vw, (min-width: 44em) 50vw, 100vw" srcset="https://zlapi.zhuli.eu/assets/b90187d0-1bec-43b5-a642-c080f8ee47f7?width=320&height=213&fit=cover 300w, https://zlapi.zhuli.eu/assets/b90187d0-1bec-43b5-a642-c080f8ee47f7?width=480&height=320&fit=cover 600w, https://zlapi.zhuli.eu/assets/b90187d0-1bec-43b5-a642-c080f8ee47f7?width=1000&height=667&fit=cover 1200w"><p>使用符合GDPR(通用数据保护条例)和CCPA(加利福尼亚消费者隐私法)hello world<a href="https://towait.com/Fastify" class="Fastify">Fastify</a>法规的Cookie声明是为了保护用户的隐私权和数据保护。</p><p><a href="https://gdpr-info.eu/" rel="noopener" target="_blank">GDPR</a>是欧盟制定的法规,旨在保护个人数据的隐私和安全。根据GDPR,Fastify网站需要明确告知用户哪些个人数据将被收集,为什么收集,以及如何使用这些数据。Cookie声明是一种透明的方式,Hello world允许网站告知用户它们使用的Cookie类型和目的,并为用户提供选择是否同意使用这些Cookie。这确保了用户具有知情权和控制权,可以自主决定是否接受Cookie的使用。</p><p>同样,<a href="https://oag.ca.gov/privacy/ccpa" rel="noopener" target="_blank">CCPA</a>是美国加利福尼亚州的法规,旨在保护消费者的隐私权。根据CCPA,网站必须告知用户其收集和共享的个人数据类型,hello world并为用户提供选择是否允许这些数据的销售。Cookie声明是满足CCPA要求的一种方式,使用户能够了解他们的数据如何被使用,并控制他们的个人信息。</p> <h1>什么是Fastify?</h1> <p>因此,使用符合GDPR和CCPA法规的Cookie声明可以帮助网站遵守相关的隐私法规,并增强用户对其隐私权的保护。这些声明提供了透明度和选择权,<code>test</code>使用户能够更好地掌握自己的个人数据,<code>Fastify,fastify</code>并决定是否愿意分享这些数据。</p><pre class="sh hljs bash" data-pbcklang="sh" data-pbcktabsize="4"> fastify.register(require(<span class="hljs-string">'@fastify/swagger'</span>), { swagger: { info: { title: <span class="hljs-string">'Test swagger'</span>, description: <span class="hljs-string">'Testing the Fastify swagger API'</span>, version: <span class="hljs-string">'0.1.0'</span> }, externalDocs: { url: <span class="hljs-string">'https://swagger.io'</span>, description: <span 
class="hljs-string">'Find more info here'</span> }, host: <span class="hljs-string">'127.0.0.1:3000'</span>, schemes: [<span class="hljs-string">'http'</span>, <span class="hljs-string">'https'</span>], consumes: [<span class="hljs-string">'application/json'</span>], produces: [<span class="hljs-string">'application/json'</span>], tags: [ { name: <span class="hljs-string">'user'</span>, description: <span class="hljs-string">'User related end-points'</span> }, { name: <span class="hljs-string">'code'</span>, description: <span class="hljs-string">'Code related end-points'</span> } ], definitions: { User: { <span class="hljs-built_in">type</span>: <span class="hljs-string">'object'</span>, required: [<span class="hljs-string">'id'</span>, <span class="hljs-string">'email'</span>], properties: { id: { <span class="hljs-built_in">type</span>: <span class="hljs-string">'string'</span>, format: <span class="hljs-string">'uuid'</span> }, firstName: { <span class="hljs-built_in">type</span>: <span class="hljs-string">'string'</span> }, lastName: { <span class="hljs-built_in">type</span>: <span class="hljs-string">'string'</span> }, email: {<span class="hljs-built_in">type</span>: <span class="hljs-string">'string'</span>, format: <span class="hljs-string">'email'</span> } } } }, securityDefinitions: { apiKey: { <span class="hljs-built_in">type</span>: <span class="hljs-string">'apiKey'</span>, name: <span class="hljs-string">'apiKey'</span>, <span class="hljs-keyword">in</span>: <span class="hljs-string">'header'</span> } } } }) fastify.register(require(<span class="hljs-string">'@fastify/swagger-ui'</span>), { routePrefix: <span class="hljs-string">'/docs'</span>, uiConfig: { docExpansion: <span class="hljs-string">'full'</span>, deepLinking: <span class="hljs-literal">false</span> }, uiHooks: { onRequest: <span class="hljs-keyword">function</span> (request, reply, next) { next() }, preHandler: <span class="hljs-keyword">function</span> (request, reply, next) { next() } 
}, staticCSP: <span class="hljs-literal">true</span>, transformStaticCSP: (header) => header, transformSpecification: (swaggerObject, request, reply) => { <span class="hljs-built_in">return</span> swaggerObject }, transformSpecificationClone: <span class="hljs-literal">true</span> }) </pre></div>`;

// Tags and class-name fragments whose text must be left untouched
const filter = {
  tags: ["pre", "code", "a", "h1", "h2", "h3", "embed", "caption", "gallery", "playlist", "audio", "video", "blockquote"],
  classes: ["hljs-"]
};

// Text to find (case-insensitive, all occurrences)
const find = 'hello world';
const findReg = new RegExp(find, "gi");

// Load the HTML string with cheerio
const $ = cheerio.load(html);

// Walk all child nodes and replace matching text in text nodes
$('*').contents().each(function () {
  // Only text nodes (nodeType 3) are candidates
  if (this.nodeType !== 3) return;

  // Skip text whose parent is an excluded tag
  if (filter.tags.includes(this.parent.name)) return;

  // Skip text whose parent has a class containing an excluded fragment
  const parentClass = this.parent.attribs.class || '';
  if (filter.classes.some((name) => parentClass.includes(name))) return;

  const text = $(this).text();
  const replacedText = text.replace(findReg, `<a href="#" class="className">$&</a>`);
  $(this).replaceWith(replacedText);
});

// The HTML string after replacement
const allHtml = $('body').html();