having fun with code

Regular expressions in Freemarker

At work we have a custom CMS (SWITCH) that uses Freemarker to build template-based pages. I had never needed regular expressions in a template until today, when I had to figure out how to strip all HTML tags from a text block. Well, nothing like a good RegEx for that :)

Stripping all the HTML tags from a block and trimming the result to 100 characters:

  <p class="bio">
    ${Text?replace("</?[^>]+(>|$)", "", "r")?substring(0,100)}…
  </p>

Regular expressions in Freemarker differ a little from those in JavaScript, so be aware of that :) Freemarker uses Java's regular expression syntax (java.util.regex, available since Java 1.4).
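Since Freemarker hands the pattern to Java's regex engine, the same expression can be tried out in plain Java before it goes into a template. Below is a minimal sketch of that idea, not the CMS code: the class name `StripTagsSketch` and the helper names `stripTags` and `preview` are made up for illustration. The length check is an addition of mine, because in Java a bare `substring(0, 100)` throws on input shorter than 100 characters, and as far as I know the Freemarker built-in also stops with an error in that case.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, used only to try the pattern outside a template.
class StripTagsSketch {
    // Same pattern as in the template: an opening or closing tag,
    // or an unterminated tag that runs to the end of the input.
    private static final Pattern TAG = Pattern.compile("</?[^>]+(>|$)");

    // Removes everything the pattern matches.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    // Strips tags, then cuts the text to at most maxChars characters.
    // Short input is returned whole instead of raising an exception.
    static String preview(String html, int maxChars) {
        String text = stripTags(html);
        if (text.length() <= maxChars) {
            return text;
        }
        return text.substring(0, maxChars) + "\u2026";
    }

    public static void main(String[] args) {
        System.out.println(preview("<p>Hello <b>world</b></p>", 100));
        System.out.println(preview("<p>Hello <b>world</b></p>", 5));
    }
}
```

One thing to keep in mind: `[^>]+` also swallows line breaks and further `<` characters, so a stray `<` in ordinary text (as in "a < b") removes everything up to the next `>`.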

Enjoy!

9 Comments to Regular expressions in Freemarker

  1. Jf
    August 13, 2009 at 05:46 | Permalink

    Hey thanks man :)

  2. grg
    October 24, 2009 at 06:06 | Permalink

    Many thx! I’ve used it for Alfresco to send properly formatted mails… It does the trick!

    U Rule !

  3. March 31, 2011 at 02:08 | Permalink

    Thanks Eneko for your solution! But it is incomplete in a number of ways:

    ** You do not catch “foo bar<" or "foo bar </", as you require (using '+') one character before the document's end. If you replace '+' with '*', this is fixed; on the other hand, it won't hurt if the pattern then matches occurrences of "” and “”.

    ** In some cases, such as with compact HTML, there are no spaces when there should be, e.g. “… paragraphHeading …” turns into “… paragraphHeading …”.

    ** You do not process entities at the document’s end. Nor do you replace them in the case that the result should not be HTML. (I will not cover that replacement here, I think Freemarker has some built-in for that.) Turning “ ” into a space is probably also a good idea.

    My solution (assuming the result will be treated as HTML as it may contain entities):


    ${Text?replace("]*(>|$)|&(nbsp;?|#?[0-9A-Za-z]*$)", " ", "r")?replace("\\s+", " ", "r")?substring(0,100)}…

    Caveat: Stuff like “embedded” turns into “em bed ded” – but IMHO this case is rarer than compact HTML. A more clever pattern might be able to deal with this.

    PS: A preview for these comments would be awesome.

  4. March 31, 2011 at 02:11 | Permalink

    Above code again, this time with proper HTML escaping (hope it will be fine this time):

    ${Text?replace("</?[^>]*(>|$)|&(nbsp;?|#?[0-9A-Za-z]*$)", " ", "r")?replace("\\s+", " ", "r")?substring(0,100)}...

  5. March 31, 2011 at 02:15 | Permalink

    And the above example in the second point was: “paragraph</p><h2>Heading”

  6. March 31, 2011 at 02:16 | Permalink

    ‘nother one: It should read: ‘Turning “&nbsp;” into a space …’

  7. March 31, 2011 at 02:17 | Permalink

    Last example: Stuff like “em<b>bed</b>ded” …

  8. March 31, 2011 at 02:25 | Permalink

    Yeah: … it won’t hurt if the pattern then matches occurrences of “<>” and “</>”.

  9. March 31, 2011 at 02:30 | Permalink

    I just repeat my post again with all HTML properly escaped ;-) – have fun:

    Thanks Eneko for your solution! But it is incomplete in a number of ways:

    ** You do not catch “foo bar<” or “foo bar </”, as you require (using ‘+’) one character before the document’s end. If you replace ‘+’ with ‘*’, this is fixed; on the other hand, it won’t hurt if the pattern then matches occurrences of “<>” and “</>”.

    ** In some cases, such as with compact HTML, there are no spaces when there should be, e.g. “… paragraph</p><h2>Heading …” turns into “… paragraphHeading …”.

    ** You do not process entities at the document’s end. Nor do you replace them in the case that the result should not be HTML. (I will not cover that replacement here, I think Freemarker has some built-in for that.) Turning “&nbsp;” into a space is probably also a good idea.

    My solution (assuming the result will be treated as HTML as it may contain entities):


    ${Text?replace("</?[^>]*(>|$)|&(nbsp;?|#?[0-9A-Za-z]*$)", " ", "r")?replace("\\s+", " ", "r")?substring(0,100)}…

    Caveat: Stuff like “em<span>bed</span>ded” turns into “em bed ded” – but IMHO this case is rarer than compact HTML. A more clever pattern might be able to deal with this.
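The pattern from the last comment can be checked the same way in plain Java. This is only a sketch under the assumption that the two chained `?replace(..., "r")` calls behave like two `replaceAll` calls; the class name `CompactHtmlSketch` and the method name `toText` are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class mirroring the two chained replaces from the comment.
class CompactHtmlSketch {
    // Tags (possibly empty or unterminated), "&nbsp;" anywhere,
    // and an entity that is cut off at the very end of the input.
    private static final Pattern TAG_OR_ENTITY =
            Pattern.compile("</?[^>]*(>|$)|&(nbsp;?|#?[0-9A-Za-z]*$)");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String toText(String html) {
        // Step 1: replace every match with a space so adjacent blocks stay separated.
        String spaced = TAG_OR_ENTITY.matcher(html).replaceAll(" ");
        // Step 2: collapse the runs of whitespace that step 1 may have created.
        return WHITESPACE.matcher(spaced).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(toText("paragraph</p><h2>Heading")); // compact HTML
        System.out.println(toText("em<span>bed</span>ded"));    // the caveat case
        System.out.println(toText("Tom &amp; Jerry"));          // complete entity is kept
    }
}
```

Complete entities in the middle of the text, such as `&amp;`, pass through untouched, which fits the stated assumption that the result is still treated as HTML.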

About the blog

This is a blog about development, focused mainly on Javascript but also other languages like python, shell scripts and more.

About the author

Eneko Alonso is a software engineer and UI developer with more than eight years of experience in software and web development. He lives in San Luis Obispo, California and works at LEVEL Studios.
