Author Topic: HOWTO: Rewrite HTML body URLs to avoid absolute protocol references

Offline TheOracle

  • Hero Member
  • *****
  • Posts: 152
  • Karma: 16
One of the most common mistakes web designers make is using absolute URLs in HTML.  When a company puts SSL offload in front of an application, the content often stops working properly, because the pages contain mixed protocol references.  The ability to rewrite content is therefore important when doing SSL offload, to account for poor programming.  This guide provides several examples of how to rewrite content to resolve such issues.  Please see the guide on body rewrite best practices in this forum to avoid some common pitfalls.

Example 1:  Rewrite all "http(s)://" strings to "//"
Note:  It isn't widely known, but a URL written in HTML as "//domain/path" behaves like an absolute URL: it inherits the protocol of the page that includes it, yet uses the host after the // to fetch the content.  Credit:  Slashdot HTML source, tested with Firefox 2 and IE7

add rewrite action remove_http_https replace_all "http.RES.BODY(1000000).SET_TEXT_MODE(ignorecase)" "\"//\"" -pattern "re~https?://~"
add rewrite policy remove_http_https true remove_http_https
bind rewrite global remove_http_https 20 NEXT -type RES_DEFAULT
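
To make the effect of the action above easier to see outside the appliance, here is a minimal Python sketch. It is not NetScaler code: it applies the same case-insensitive pattern (https?://) to a sample HTML string, then uses the standard library's urljoin to show how a "//host/path" reference picks up the scheme of the page that contains it. The function name strip_scheme and the example.com hosts are made up for illustration.

```python
import re
from urllib.parse import urljoin

# Same pattern as the rewrite action: "http://" or "https://", any case.
ABSOLUTE_SCHEME = re.compile(r"https?://", re.IGNORECASE)


def strip_scheme(body: str) -> str:
    """Replace every http:// or https:// in the body with // (hypothetical helper)."""
    return ABSOLUTE_SCHEME.sub("//", body)


# Sample body with mixed-case, mixed-protocol absolute URLs (illustrative hosts).
html = (
    '<img src="http://www.example.com/logo.png">'
    '<a href="HTTPS://www.example.com/login">Log in</a>'
)
print(strip_scheme(html))

# A "//host/path" reference is resolved against the scheme of the containing page.
print(urljoin("https://shop.example.com/cart", "//www.example.com/logo.png"))
print(urljoin("http://shop.example.com/cart", "//www.example.com/logo.png"))
```

Note that the pattern matches anywhere in the body, including visible text and inline scripts, not only inside href or src attributes.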

Offline jmelika

  • Administrator
  • Hero Member
  • *****
  • Posts: 284
  • Karma: 4
This is a life saver, Oracle.  Thanks!

Offline evildani

  • Administrator
  • Hero Member
  • *****
  • Posts: 282
  • Karma: 17
In my experience you should use this feature only as a last resort, again LAST RESORT....

It can really, really go bad on you, and screw up mediocre applications and really good applications alike.

My advice would be to try other methods first.  We have found that using the Apache rewrite module with Simulate https and Addcertheader is a better solution, and it is not as intrusive as body rewrite.

If you absolutely have no other way, then remember to bind the rewrite rule to a vserver.  NEVER, under any circumstances, do a global bind on a body rewrite policy.

Offline TheOracle

  • Hero Member
  • *****
  • Posts: 152
  • Karma: 16
I would tend to agree on the "don't bind to global" advice, although I did so in the example.  I've been testing with the 8.1 beta, and the feature seems reasonably well behaved, although in general additional constraints should be put in place to prevent rewrites on pages that don't need them.  In addition, on the back end you want to use HTTP 1.0 and disable server-side compression (examples of both are posted) to ensure that the rewrite is reliable.  Rewrite won't operate if the server delivers the response chunked or compressed.  I still need to put up my rewrite best practices post, but I'm still working on the details.

The Oracle
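
The conditions in the post above (only rewrite pages that need it, and only when the response is neither chunked nor compressed) can be sketched as a small Python check. This is an illustration of the logic only, not NetScaler policy syntax. The function name body_rewrite_candidate is hypothetical, and treating "text/html" as the only content type worth rewriting is an assumption.

```python
def body_rewrite_candidate(headers: dict) -> bool:
    """Return True if a response looks like a sensible body-rewrite target.

    Hypothetical helper: HTML only (an assumption), not compressed, not chunked.
    """
    # HTTP header names are case-insensitive, so normalise before looking them up.
    h = {name.lower(): value.lower() for name, value in headers.items()}

    is_html = h.get("content-type", "").startswith("text/html")
    compressed = h.get("content-encoding", "identity") not in ("", "identity")
    chunked = "chunked" in h.get("transfer-encoding", "")

    return is_html and not compressed and not chunked


# Plain HTML response: a candidate.
print(body_rewrite_candidate({"Content-Type": "text/html; charset=utf-8"}))
# Compressed HTML: skipped, matching the caveat in the post.
print(body_rewrite_candidate({"Content-Type": "text/html", "Content-Encoding": "gzip"}))
# Image: skipped, nothing to rewrite.
print(body_rewrite_candidate({"Content-Type": "image/png"}))
```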

Offline Paul B

  • Hero Member
  • *****
  • Posts: 123
  • Karma: 14
Silly question / comment time:

I always believed that the Netscaler rewrite feature only works on HEADERS. Indeed, the manual says: "The Rewrite feature modifies only the header section of an HTTP request or response, not the data section."

So how come you are able to re-write the body?


Paul Blitz

Offline jmelika

  • Administrator
  • Hero Member
  • *****
  • Posts: 284
  • Karma: 4
Paul,

ver 7 only allows you to rewrite the header.  ver 8 allows you to rewrite body.  I don't have ver 8 running to validate this, but I am almost sure that's the case.

JM

Offline TheOracle

  • Hero Member
  • *****
  • Posts: 152
  • Karma: 16
Correct, 7.0 was headers only; 8.0 added body rewrite.

The Oracle

Offline jmelika

  • Administrator
  • Hero Member
  • *****
  • Posts: 284
  • Karma: 4
I wonder how taxing it is on the NS's CPU when you start rewriting body.  Do you have any metrics on that, Oracle?

Offline TheOracle

  • Hero Member
  • *****
  • Posts: 152
  • Karma: 16
No, although I anticipate at least a 30% hit on the CPU, as the buffering impact will be similar to tcpb, which is generally about 30%.  That hit applies only to the objects that are actually being rewritten, though.  If you write your rules properly, only the HTML pages will incur the overhead, not images, etc., so the actual impact will be smaller than 30%.

The Oracle
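
As a rough back-of-the-envelope version of that estimate, the Python sketch below scales the 30% per-object figure by the share of responses that actually get rewritten. This is a deliberately crude model, not a measurement: it assumes CPU cost scales linearly with the share of rewritten responses, and the 20% HTML share in the example is a made-up number. The function name estimated_cpu_overhead is hypothetical.

```python
def estimated_cpu_overhead(rewritten_share: float, per_object_hit: float = 0.30) -> float:
    """Crude estimate of overall CPU overhead from body rewrite.

    rewritten_share: fraction of responses that match the rewrite policy (0.0 to 1.0).
    per_object_hit:  overhead on a rewritten object; 0.30 is the figure quoted in the post.
    """
    if not 0.0 <= rewritten_share <= 1.0:
        raise ValueError("rewritten_share must be between 0 and 1")
    return rewritten_share * per_object_hit


# Hypothetical site where 20% of responses are HTML pages that get rewritten.
overhead = estimated_cpu_overhead(0.20)
print(f"Estimated overall overhead: {overhead:.0%}")
```

Real numbers depend on response sizes and traffic mix, so treat this only as a way to reason about why well-scoped rules cost less than a blanket policy.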

Offline Paul B

  • Hero Member
  • *****
  • Posts: 123
  • Karma: 14
Quote from: jmelika
"ver 7 only allows you to rewrite the header.  ver 8 allows you to rewrite body.  I don't have ver 8 running to validate this, but I am almost sure that's the case."

Shame they don't update the manual, isn't it? I was quoting from the version 8 ICG!

Paul

Offline oldguy

  • Contributor
  • *
  • Posts: 6
  • Karma: 2
While many people seem to like to use URL rewrite, I've never understood the fascination with it.  To me it is an outage call just waiting to happen.

The Oracle says it in the opening sentences of this thread: "One of the most common mistakes by web designers ... to account for poor programming"

I've got close to 100 pairs of Load Balancers with over 12,000 different lb vservers and over 50,000 real servers, and not a single line of URL rewriting.
Maybe I'm just lucky, but I've always been able to convince the app teams to fix their code instead of having the Load Balancers do it for them.

Having spouted off how successful I've been... I'm now faced with a similar but different issue.  Security is mandating that applications be forbidden to return 4xx or 5xx error pages.  I'm being asked by several apps to fix the issue by trapping all 4xx and 5xx error responses and rewriting them to a 302 (and deleting the content of the earlier error message) to a standard error page.   Some of them want me to rewrite it to a 200 with the standard error message in it.

Any experience with this?  My take is to have the web server catch this with a standard error page, but I may be losing the battle after a successful run of many years...

What is the real hit to the Load Balancer with a Global policy like this?

Offline TheOracle

  • Hero Member
  • *****
  • Posts: 152
  • Karma: 16
You aren't working with apps that people have lost the source for (yes, it happens), or third-party apps that aren't supported anymore (yep, happens too), or just poorly coded third-party apps whose vendors refuse to change their code.

As for the solution, this is a somewhat tricky one, as the NS breaks a response down into separate fields and lets you touch each one separately.  Dropping the 4xx/5xx responses would be easy:

add rewrite policy drop_4xx5xx "http.RES.STATUS.GE(400) && http.RES.STATUS.LE(599)" RESET

This will basically force a RST any time the server replies with a status code between 400 and 599.  This may be your "answer": you aren't rewriting, but you ensure that the web servers don't expose the errors to the outside world.  It seems a fair compromise between your philosophy of not doing rewrites or heavy content management on the LB and ensuring that security policies are met.  The app programmers can then fix their code.

The Oracle
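
The range test described above, every status code from 400 through 599, can be sketched in Python to check the boundaries. This only illustrates the comparison logic and is not NetScaler expression syntax; the function name is_error_status is hypothetical.

```python
def is_error_status(status: int) -> bool:
    """True for any 4xx or 5xx status code, with both ends of the range included."""
    return 400 <= status <= 599


# Boundary checks: 399 and 600 fall outside, 400 and 599 fall inside.
for code in (399, 400, 404, 503, 599, 600):
    print(code, is_error_status(code))

# With a strict upper bound of 599 the last valid code would slip through.
print(400 <= 599 < 599)
```

The upper bound is the easy place to slip: a strict "less than 599" leaves out 599 itself, so either "less than or equal to 599" or "less than 600" is needed to cover the whole 5xx range.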

Offline Paul B

  • Hero Member
  • *****
  • Posts: 123
  • Karma: 14
Quote from: oldguy
The Oracle says it in the opening sentences of this thread "One of the most common mistakes by web designers ... to account for poor programming"

When I went on the Teros (Application Firewall) course, one of the trainer's first comments was to explain the need for the product: "it's there to protect badly written websites"! Pretty similar thing, eh?

And I'm sure there are PLENTY of those (e.g. amateur websites with forums etc. hosted on an ISP's server). It only needs ONE of the websites to do something stupid (or rather, to allow a hacker to do something not so stupid :-) ) to potentially create a "hole" that can affect MANY other websites, properly written or not!


Paul

Offline jmelika

  • Administrator
  • Hero Member
  • *****
  • Posts: 284
  • Karma: 4
My two cents:
We have an infrastructure with about 40 million impressions a day.  Oftentimes we get requests that require immediate adjustments.  It would take days to go through development, QA, etc. before a change is put in production.  So we use URL rewrite to implement the necessary changes until the app is revised.

I'm sure there are many occasions where one feature could be used for many different reasons.  It all depends on your infrastructure, your needs, and how it can help.

JM

Offline oldguy

  • Contributor
  • *
  • Posts: 6
  • Karma: 2
Thanks all for being gentle in your responses  :-)  Some very valid reasons given that I've been able to avoid (so far) but I do understand how it is necessary at times.

About 7 years ago security came to me to do some URL rewriting for a security hole, and I told them the Load Balancer we were using could not do that.
I guess they still remember that day and haven't come back since.  The Load Balancers I have today (Netscalers) are now more than capable of doing it, and I haven't volunteered it as a solution.  I'll see how long I can keep it to myself!

I'm going to play with the RESET option and see how the app community likes that one.

Say I have some Load Balancers (10k's) that are running rather mildly with 4,000+ http requests /sec at about 40% mem and 40% CPU.  What should I expect to see with a Global rewrite policy if I have to make all of the 4xx and 5xx responses to the outside world 200's with some specific text?

I'd just hate to put it in and see the boxes tank when the real world hits them.  I'm still rather timid because of past tragedies with other vendors' products.