Different URI encodings for one Tomcat-based application
2015-12-21
There are cases when you would like to map different URI encodings to different HTTP endpoints. One of them is when your application handles GET requests containing percent-encoded non-ASCII data in different charsets. For example, one HTTP endpoint uses standard UTF-8 while another uses Windows-1251.
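To make the problem concrete, here is a minimal sketch using only the JDK's URLEncoder and URLDecoder. The class name PercentEncodingDemo and the sample word are made up for illustration. It shows that the same text produces different percent-encoded bytes in UTF-8 and Windows-1251, and that decoding with the wrong charset does not give the original text back.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of any application mentioned in this post.
public class PercentEncodingDemo {

    // Percent-encodes a value using the given charset name.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    // Percent-decodes a value, interpreting the bytes in the given charset.
    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // A six-letter Cyrillic word (Russian for "hello"), written with Unicode escapes.
        String word = "\u043f\u0440\u0438\u0432\u0435\u0442";

        String utf8 = encode(word, "UTF-8");
        String cp1251 = encode(word, "windows-1251");

        System.out.println(utf8);   // %D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82
        System.out.println(cp1251); // %EF%F0%E8%E2%E5%F2

        // The right charset restores the word, the wrong one does not.
        System.out.println(decode(cp1251, "windows-1251").equals(word)); // true
        System.out.println(decode(cp1251, "UTF-8").equals(word));        // false
    }
}
```

This is exactly what happens on the server side when a connector configured for one encoding receives a query string produced in the other.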
Plain Tomcat way
According to How do I change how GET parameters are interpreted?, the only way to specify the GET request encoding is the per-connector URIEncoding attribute. For example:
<Connector port="8081" URIEncoding="utf-8"/>
<Connector port="8082" URIEncoding="cp1251"/>
Then you have to map servlets to different connectors somehow.
Spring Boot multiple HTTP connectors way
Spring Boot can help you out here. Although it uses a single URI encoding, specified by the server.tomcat.uri-encoding parameter (“UTF-8” by default, see Appendix A. Common application properties), it can fire up multiple child applications listening on different ports.
The implementation is really simple, as you can see from the Spring Boot Connectors application.
Pros:
- no hacks and workarounds, pure Spring Boot solution :)
- controller unit tests pass.
Cons:
- some mess with controller mappings;
- integration tests (with full context initialization) fail on non-ASCII requests; in fact, there is no way to specify which connector is used in a test;
- say bye-bye to Spring Boot actuators; you’ll need workarounds to plug them in.
Links:
- Spring-Boot : How can I add tomcat connectors to bind to controller
- Multiple HTTP connectors in Spring Boot example
Nginx+Lua way
Nginx built with the Lua module becomes a very fast non-blocking application server. There is even a framework for it! So re-encoding the URI is quite an easy task.
It’s a more complicated way, but if you already use Nginx as a reverse proxy server / balancer / HTTPS terminator in front of our Java application, why not?
Nginx build options, functions and a configuration file example can be found in the docker-nginx project. The function that converts the encoding:
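function M.iconv(cd, args)
    for key, val in pairs(args) do
        if type(val) == "table" then
            -- a repeated parameter: convert every string value
            for k, v in pairs(val) do
                if type(v) == "string" then
                    val[k] = cd:iconv(v)
                end
            end
        elseif type(val) == "string" then
            args[key] = cd:iconv(val)
        end
        -- valueless arguments (decoded as boolean true) are left as is
    end
    return args
end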
It converts only URI parameter values and leaves parameter names untouched. The conversion is performed by the iconv C library through the Lua-iconv binding, so it’s very fast.
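For readers more at home in Java, the same transformation can be sketched with the JDK alone. This is only an illustration of the idea, not the code from the docker-nginx project. The class QueryReencoder and its method are hypothetical names, and URLEncoder writes a space as "+" where ngx.encode_args uses %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the Lua function above in plain Java.
public class QueryReencoder {

    // Re-encodes the values of a raw query string from one charset to another.
    // Parameter names and valueless parameters are copied through unchanged.
    static String reencode(String rawQuery, String fromCharset, String toCharset) {
        if (rawQuery == null || rawQuery.isEmpty()) {
            return rawQuery;
        }
        List<String> parts = new ArrayList<>();
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                parts.add(pair); // no value, nothing to convert
                continue;
            }
            String name = pair.substring(0, eq);
            String value = pair.substring(eq + 1);
            try {
                String decoded = URLDecoder.decode(value, fromCharset);
                parts.add(name + "=" + URLEncoder.encode(decoded, toCharset));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("Unsupported charset", e);
            }
        }
        return String.join("&", parts);
    }

    public static void main(String[] args) {
        // The Windows-1251 bytes of a Cyrillic word become their UTF-8 equivalent.
        System.out.println(reencode("q=%EF%F0%E8%E2%E5%F2&id=42", "windows-1251", "UTF-8"));
        // q=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82&id=42
    }
}
```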
This Nginx config block configures the Lua module, loads the conversion function and initializes iconv:
lua_package_path '/etc/nginx/lua/?.lua;;';
init_by_lua_block {
    functions = require("functions")
    iconv = require("iconv")
    -- iconv.new(to, from): convert cp1251 input to utf8
    cd = iconv.new("utf8", "cp1251")
}
To re-encode URIs before proxying to the Java backend, use the following sample:
location /app {
    rewrite_by_lua_block {
        if ngx.var.args == nil then return end
        local args = ngx.decode_args(ngx.var.args)
        args = functions.iconv(cd, args)
        args = ngx.encode_args(args)
        ngx.var.args = args
    }
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_pass http://localhost:8000;
}
Pros:
- no multiple connectors, listen on single port;
- extremely fast, no performance drawback;
- all tests pass.
Cons:
- extra devops work :)
Java hackish way
The idea is the same: URL-decode the GET request values, convert them to the proper encoding and URL-encode them again. This time it is achieved by using the private API of the org.apache.coyote.Request class to decode the query string conditionally.
The implementation is quite simple, as you can see from the Spring Boot Filters application.
URIs are re-encoded in a servlet filter:
static class CoyoteRequestManipulator extends OncePerRequestFilter {

    private static Field getField(Class<?> clazz, String fieldName) throws NoSuchFieldException {
        ...
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        if (hasText(request.getQueryString()) && request.getQueryString().contains("%")) {
            RequestFacade facade = (RequestFacade) request;
            try {
                // First hack is to get the org.apache.coyote.Request instance
                Field requestField = getField(RequestFacade.class, "request");
                requestField.setAccessible(true);
                Request connRequest = (Request) requestField.get(facade);
                org.apache.coyote.Request coyoteRequest = connRequest.getCoyoteRequest();
                // But it's already filled with decoded query parameters, so the query string
                // has to be re-handled after the URI encoding switch. So, in fact, the query
                // string is processed twice. Also, org.apache.coyote.Request instances are
                // reused, so the query encoding has to be set every time.
                Parameters parameters = coyoteRequest.getParameters();
                parameters.setQueryStringEncoding(request.getServletPath().startsWith("/two")
                        ? "cp1251" : "utf-8");
                parameters.recycle();
                parameters.handleQueryParameters();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        chain.doFilter(request, response);
    }
}
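The core decision in the filter, picking a charset by servlet path and then parsing the query string with it, can also be tried out without Tomcat. The sketch below uses only the JDK. The class PathAwareQueryParser is a hypothetical name, the "/two" prefix is taken from the filter above, and "windows-1251" is the canonical Java name for what the filter calls "cp1251". It does not touch Tomcat internals and is no replacement for the filter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone illustration of conditional query string decoding.
public class PathAwareQueryParser {

    // Same rule as in the filter: paths under /two use Windows-1251, the rest UTF-8.
    static String charsetFor(String servletPath) {
        return servletPath.startsWith("/two") ? "windows-1251" : "UTF-8";
    }

    // Parses a raw query string into decoded parameters using the charset chosen by path.
    static Map<String, List<String>> parse(String servletPath, String rawQuery) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        String charset = charsetFor(servletPath);
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String rawName = eq < 0 ? pair : pair.substring(0, eq);
            String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
            try {
                String name = URLDecoder.decode(rawName, charset);
                String value = URLDecoder.decode(rawValue, charset);
                params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException("Unsupported charset: " + charset, e);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Both requests carry the same Cyrillic word, encoded per endpoint.
        String one = parse("/one/search", "q=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82").get("q").get(0);
        String two = parse("/two/search", "q=%EF%F0%E8%E2%E5%F2").get("q").get(0);
        System.out.println(one.equals(two)); // true
    }
}
```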
Pros:
- no multiple connectors, listen on single port;
- no recompiled Nginx stuff, just Java;
- controller unit tests pass;
- integration tests pass.
Cons:
- reflection and private API usage :)
- the query string may be parsed twice.
The other way
Use the microservices, Luke! But you have to pay for their simplicity and scalability with massive infrastructure changes. See the following articles for consideration: