At work we have our custom CMS (SWITCH), which uses Freemarker to create template-based pages. I had never needed this before, but today I had to figure out how to strip all HTML tags from a text block. Well, nothing like a good RegEx for that :)
Stripping all the HTML tags from an HTML block and trimming the result to 100 characters:
<p class="bio">
${Text?replace("</?[^>]+(>|$)", "", "r")?substring(0,100)}…
</p>
RegEx in Freemarker differs a little from RegEx in JavaScript, so be aware of that :) Freemarker uses Java 1.4 RegEx syntax.
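Since the pattern is plain Java RegEx, it can be tried outside the CMS first. Below is a minimal sketch in Java, assuming the same pattern and the same 100-character limit as in the template; the class name `StripTagsSketch` and the helper `stripAndTrim` are made up for illustration and are not part of Freemarker or SWITCH.

```java
import java.util.regex.Pattern;

final class StripTagsSketch {
    // Same pattern as in the template: "<", an optional "/", one or more
    // characters that are not ">", then either ">" or the end of the input.
    private static final Pattern TAG = Pattern.compile("</?[^>]+(>|$)");

    // Hypothetical helper: removes the tags, then keeps at most maxLength characters.
    static String stripAndTrim(String html, int maxLength) {
        String plain = TAG.matcher(html).replaceAll("");
        // Java's substring(0, 100) throws on a string shorter than 100
        // characters, so the end index is clamped to the actual length.
        return plain.substring(0, Math.min(maxLength, plain.length()));
    }

    public static void main(String[] args) {
        // prints "Hello world"
        System.out.println(stripAndTrim("<p class=\"bio\">Hello <b>world</b></p>", 100));
    }
}
```

The length clamp is worth a second look: if the stripped text can be shorter than 100 characters, the template version may need a similar length check before calling `?substring(0,100)`.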
Enjoy!

Hey thanks man :)
Many thx! I’ve used it for Alfresco to send properly formatted mails… It does the trick!
U Rule !
Thanks Eneko for your solution! But it is incomplete in a number of ways:
** You do not catch a trailing “foo bar<”, as ‘+’ requires at least one character after the ‘<’ before the document’s end. (A trailing “foo bar </” is still matched, because the ‘/’ can be consumed by ‘[^>]+’.) If you replace ‘+’ with ‘*’, this is fixed; on the other hand, it won’t hurt if the pattern then matches occurrences of “<>” and “</>”.
** In some cases, such as with compact HTML, there are no spaces when there should be, e.g. “… paragraph</p><h2>Heading …” turns into “… paragraphHeading …”.
** You do not process entities at the document’s end. Nor do you replace them in the case that the result should not be HTML. (I will not cover that replacement here, I think Freemarker has some built-in for that.) Turning “&nbsp;” into a space is probably also a good idea.
My solution (assuming the result will be treated as HTML as it may contain entities):
${Text?replace("</?[^>]*(>|$)|&(nbsp;?|#?[0-9A-Za-z]*$)", " ", "r")?replace("\\s+", " ", "r")?substring(0,100)}…
Caveat: Stuff like “em<span>bed</span>ded” turns into “em bed ded” – but IMHO this case is rarer than compact HTML. A more clever pattern might be able to deal with this.
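To see what the two chained replacements do, here is a rough Java sketch of the same idea, assuming the patterns are copied unchanged from the template line above. The class name `StripTagsImprovedSketch` and the helper `stripAndTrim` are hypothetical names for this example only.

```java
import java.util.regex.Pattern;

final class StripTagsImprovedSketch {
    // Tags (including an unfinished one at the end of the input), "&nbsp;",
    // and an unfinished entity at the very end of the input.
    private static final Pattern TAG_OR_ENTITY =
            Pattern.compile("</?[^>]*(>|$)|&(nbsp;?|#?[0-9A-Za-z]*$)");
    // Runs of whitespace left behind by the first replacement.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: replaces each match with a space, collapses the
    // whitespace, then keeps at most maxLength characters.
    static String stripAndTrim(String html, int maxLength) {
        String spaced = TAG_OR_ENTITY.matcher(html).replaceAll(" ");
        String plain = WHITESPACE.matcher(spaced).replaceAll(" ");
        // Clamp the end index so short inputs do not throw in substring.
        return plain.substring(0, Math.min(maxLength, plain.length()));
    }

    public static void main(String[] args) {
        // Compact HTML keeps its word boundary: prints "paragraph Heading"
        System.out.println(stripAndTrim("paragraph</p><h2>Heading", 100));
        // The caveat from above: prints "em bed ded"
        System.out.println(stripAndTrim("em<span>bed</span>ded", 100));
    }
}
```

Complete entities such as “&amp;amp;” are left alone, which fits the assumption that the result is still treated as HTML. Note that the truncation runs after the replacements, so a cut at 100 characters can still land in the middle of an entity.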