BCRDF

Born to code, rock ...

Friday, June 12, 2009

Delivering Files From the Database Using HTTP Conditional GET Request

In this article I'll explain in detail how to implement a conditional HTTP GET request using the Servlet API.
It is a common situation that a web application needs to deliver static file content (for example, images) to the client. In such a situation it is very important to decide where to store these files.
There are two basic approaches: in the file system or in the database. Both of them have advantages and disadvantages.
For example, the first one has the advantage that we do not need to track ourselves when the file was created or changed, but it has the disadvantage that we need to take care of file permissions. This approach is outside the scope of this article.
The disadvantage of the second approach is that we need to track when the file was last changed and what its length is.
Here is a class that represents such a file stored in the database.

package org.tsachev.info.conditionalget.domain;

import java.io.InputStream;

public class File {
    private Long id;
    private String fileName;
    private String mimeType;
    private long lastChanged;
    private int length;
    private InputStream inputStream;

    // Getters and setters skipped for simplicity
}


There are different ways to produce objects of the above class depending on your DB access framework. Whether you use pure JDBC, Hibernate, JPA, iBATIS, JDO, etc., you can do it. In this article we assume that you already have a way to find the file in the database by its id and create such a File object.
The next step is to create a servlet that handles GET requests from the client. Let's suppose that the request has the form http://host:port/context/file?file_id=xxxx, where http://host:port/context is the address at which our application context can be accessed and the request for the file is handled by our servlet.
Here is the code for our servlet:


package org.tsachev.info.conditionalget.servlets;

import java.io.IOException;

import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class FileServlet extends HttpServlet {
    private static final String ID_INIT_PARAM_NAME = "org.tsachev.info.conditionalget.FileServlet.FILE_ID_PARAM_NAME";
    private static final String DEFAULT_ID_PARAM_NAME = "id";
    private String idInitParameter = DEFAULT_ID_PARAM_NAME;

    @Override
    public void init(ServletConfig config) throws ServletException {
        super.init(config);
        String idInitParameterValue = config
                .getInitParameter(ID_INIT_PARAM_NAME);
        if (idInitParameterValue != null) {
            idInitParameter = idInitParameterValue;
        }
    }

    protected void doGet(HttpServletRequest request,
            HttpServletResponse response) throws ServletException, IOException {
        String id = request.getParameter(idInitParameter);
        // Here goes the file serving part
    }
}

And its declaration in the web.xml file:

<servlet>
    <description>Handles get requests for files</description>
    <servlet-name>file</servlet-name>
    <servlet-class>org.tsachev.info.conditionalget.servlets.FileServlet</servlet-class>
    <init-param>
        <description>Name of the request parameter containing the id of the file</description>
        <param-name>org.tsachev.info.conditionalget.FileServlet.FILE_ID_PARAM_NAME</param-name>
        <param-value>file_id</param-value>
    </init-param>
</servlet>
<servlet-mapping>
    <servlet-name>file</servlet-name>
    <url-pattern>/file</url-pattern>
</servlet-mapping>


I have parameterized the name of the id parameter to make this code more reusable.
Here comes the most interesting part of this article: implementing the conditional GET.
What is a conditional GET anyway?
Suppose you have your images in the database and in many places on your HTML page there is something like this:
<img src="file?file_id=1" alt="image" />

This makes your browser issue a GET request in the form described above, and on the server side it is served by our servlet. What is the problem? If we do not take care, the image will be delivered to the client again every time the page loads. Browsers can cache GET responses to save bandwidth. To do this, the browser loads the file the first time it is requested and, provided the response carried a Last-Modified header, adds the If-Modified-Since request header on every later request. So the browser actually expects that you won't return the file if it has not changed, but will instead return status code 304 (Not Modified). Standard browsers have this feature by default, so all we need to do is follow the HTTP specification and return the right response.
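To make the header value concrete, here is a small sketch of how a millisecond timestamp turns into an HTTP date and back. The class name HttpDates and its two methods are my own illustration, not part of the Servlet API (inside a servlet, getDateHeader() and setDateHeader() do this conversion for you). The point to notice is that the header only carries whole seconds.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class HttpDates {
    // The preferred HTTP date layout, e.g. "Thu, 01 Jan 1970 00:00:00 GMT"
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    // Formats epoch milliseconds as a header value; milliseconds are dropped.
    static String toHeaderValue(long epochMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(epochMillis));
    }

    // Parses a header value back to epoch milliseconds (always a whole second).
    static long fromHeaderValue(String value) {
        return ZonedDateTime.parse(value, HTTP_DATE).toInstant().toEpochMilli();
    }
}
```

Because of this rounding, a file timestamp stored with millisecond precision should be compared with the header at second granularity.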
My guideline for the logic is the HTTP specification, where we can find:

a) If the request would normally result in anything other than a
200 (OK) status, or if the passed If-Modified-Since date is
invalid, the response is exactly the same as for a normal GET.
A date which is later than the server's current time is
invalid.

b) If the variant has been modified since the If-Modified-Since
date, the response is exactly the same as for a normal GET.

c) If the variant has not been modified since a valid If-
Modified-Since date, the server SHOULD return a 304 (Not
Modified) response.
By a normal GET we mean returning the file content to the client.
I'll add two more cases:
  • If no id parameter is given or we cannot parse it, the response will be 400 - Bad Request.
  • If we cannot find a file by the given id, the response will be 404 - Not Found.
This behavior does not suit all cases, so if you need something else, like delivering a default image, you may implement it yourself.
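The rules above can be condensed into one small decision function. This is only a sketch under my own assumptions: the class ConditionalGetDecision and its parameters are hypothetical, a missing or unparsable header is represented by -1, and a null lastChangedMillis stands for "file not found".

```java
// Hypothetical, servlet-free sketch of the status decision described above.
final class ConditionalGetDecision {
    static final int OK = 200;
    static final int NOT_MODIFIED = 304;
    static final int BAD_REQUEST = 400;
    static final int NOT_FOUND = 404;

    static int status(Long id, Long lastChangedMillis,
            long ifModifiedSinceMillis, long nowMillis) {
        if (id == null) {
            return BAD_REQUEST; // id missing or not parsable
        }
        if (lastChangedMillis == null) {
            return NOT_FOUND; // no file with this id
        }
        // Case a): an absent header or a date in the future is treated as invalid
        boolean validDate = ifModifiedSinceMillis >= 0
                && ifModifiedSinceMillis <= nowMillis;
        // Case c): not modified since a valid date (compared in whole seconds)
        if (validDate && lastChangedMillis / 1000 <= ifModifiedSinceMillis / 1000) {
            return NOT_MODIFIED;
        }
        // Cases a) and b): behave like a normal GET
        return OK;
    }
}
```

Keeping the decision free of servlet types makes it easy to check each case in isolation before wiring it into doGet.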
Here is the full implementation of the doGet method:
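protected void doGet(HttpServletRequest request,
        HttpServletResponse response) throws ServletException, IOException {
    String idValue = request.getParameter(idInitParameter);
    Long id = parseId(idValue);
    if (id != null) {
        File file = find(id);
        if (file != null) {
            long clientModifiedSince;
            try {
                // getDateHeader() returns -1 when the header is absent
                clientModifiedSince = request
                        .getDateHeader(FileServlet.IF_MODIFIED_SINCE);
            } catch (IllegalArgumentException e) {
                // The header value could not be parsed as a date
                clientModifiedSince = -1;
            }
            // Case a): a date later than the server's current time is invalid
            if (clientModifiedSince > System.currentTimeMillis()) {
                clientModifiedSince = -1;
            }
            // HTTP dates carry whole seconds only, so compare in seconds
            if (clientModifiedSince != -1
                    && file.getLastChanged() / 1000 <= clientModifiedSince / 1000) {
                response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
            } else {
                // Without Last-Modified the browser has no date to send back
                response.setDateHeader("Last-Modified", file.getLastChanged());
                response.setContentLength(file.getLength());
                response.setContentType(file.getMimeType());
                OutputStream output = response.getOutputStream();
                InputStream input = file.getInputStream();
                writeFileToResponse(output, input);
            }
        } else {
            response.setStatus(HttpServletResponse.SC_NOT_FOUND);
        }
    } else {
        response.setStatus(HttpServletResponse.SC_BAD_REQUEST);
    }
}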

Well, this is it. I'll just add some comments on the method above.
  • The constant FileServlet.IF_MODIFIED_SINCE contains the name of the If-Modified-Since request header.
  • The method parseId() should transform the value of the request parameter into a value your find() method will understand.
  • The method find() should return a File instance for the parsed id. Note that File here is our domain class, not java.io.File.
  • The method writeFileToResponse() reads your input stream and writes it to the response output stream.
These methods are easy to implement and are not directly linked with conditional GET, so I left them without implementation in this post.
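For completeness, here is one possible shape for two of these helpers. It is a sketch, not the code from my project: the holder class FileServletHelpers is hypothetical, parseId() assumes the id is a plain decimal number, and the 8 KB buffer size is an arbitrary choice. The find() method is left out because it depends entirely on your DB access framework.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class; in the servlet these would be private methods.
final class FileServletHelpers {

    // Returns the id as a Long, or null when it is missing or not a number.
    static Long parseId(String value) {
        if (value == null) {
            return null;
        }
        try {
            return Long.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Copies the file content to the response and closes the input stream.
    static void writeFileToResponse(OutputStream output, InputStream input)
            throws IOException {
        try (InputStream in = input) {
            byte[] buffer = new byte[8192];
            int count;
            while ((count = in.read(buffer)) != -1) {
                output.write(buffer, 0, count);
            }
            output.flush();
        }
    }
}
```

The output stream is only flushed, not closed, because the servlet container owns the response stream.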

This is how I do conditional GET when developing Java web applications. I hope this helps someone.

Friday, May 29, 2009

Skipping Invalid XML Characters with FilterReader

While integrating with a legacy system through XML files I ran into a strange fact. It turned out that, according to the XML 1.0 specification, there are Unicode characters that are not allowed in the content of an XML document.
Naturally, this legacy system produced such invalid characters. So we had no choice but to get rid of them one way or another.

This is because a standard Java XML parser will throw an exception with a message like:
"An invalid XML character (Unicode: 0xXXXX) was found in the element content of the document".
And since our application does not need these symbols, we decided to just skip them.

Here is a sample file that contains the non-printing START OF TEXT character (Unicode: 0x2)

<?xml version="1.0" encoding="UTF-8" ?>
<chars>
<valid>a</valid>
<invalid></invalid>
</chars>

Next is some simple Java code that shows the problem. It uses the XML streaming API (aka StAX).

package xmlchars;

import java.io.FileNotFoundException;
import java.io.FileReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XmlInvalidCharactersDemo {

    public static void main(String[] args) throws FileNotFoundException,
            XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        XMLStreamReader reader = inputFactory
                .createXMLStreamReader(new FileReader(
                        "resources/invalid-chars.xml"));
        // Walk through every event; the invalid character makes next() fail
        while (reader.hasNext()) {
            reader.next();
        }
    }
}
This code just walks through the document and fails with a parse error (an XMLStreamException) when it tries to read the text content of the element that holds the invalid character.

How can we skip these ugly chars?
  1. One solution is to read the XML document into memory, remove all nasty (restricted) chars, and then give the result to the parser. But in this case we read the document twice, which is not what I want.
  2. The other solution that came to my mind was to extend the java.io.FilterReader class. With this we can skip the unwanted characters, or escape or replace them.
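For comparison, the first option could be sketched roughly like this. The class InMemoryXmlCleaner and its methods are hypothetical names of mine; the validity check is written directly from the Char production of the XML 1.0 specification and works on whole code points.

```java
// Hypothetical sketch of option 1: clean the whole document in memory.
final class InMemoryXmlCleaner {

    // XML 1.0 Char production, applied to a full Unicode code point.
    static boolean isXml10Char(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns a copy of the document without the characters XML 1.0 forbids.
    static String strip(String xml) {
        StringBuilder cleaned = new StringBuilder(xml.length());
        xml.codePoints()
                .filter(InMemoryXmlCleaner::isXml10Char)
                .forEach(cleaned::appendCodePoint);
        return cleaned.toString();
    }
}
```

This is simple, but it needs the complete document as a String before parsing can start, which is exactly the cost the second option avoids.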
I wrote a class implementing the second approach. Here it is:


package xmlchars;

import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

import com.sun.org.apache.xerces.internal.util.XMLChar;

/**
 * {@link FilterReader} to skip invalid XML 1.0 characters. Valid
 * Unicode chars for XML 1.0 according to http://www.w3.org/TR/xml are
 * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. In
 * other words - any Unicode character, excluding the surrogate blocks, FFFE,
 * and FFFF.
 * <p>
 * Note: chars are checked one at a time, so characters above #xFFFF, which
 * arrive as surrogate pairs, are skipped as well.
 *
 * @author tsachev
 *
 */
public class Xml10FilterReader extends FilterReader {

    /**
     * Creates filter reader which skips invalid xml characters.
     *
     * @param in
     *            original reader
     */
    public Xml10FilterReader(Reader in) {
        super(in);
    }

    /**
     * {@link FilterReader#read()} goes straight to the wrapped reader, so the
     * single-char variant has to be routed through the filtering method too.
     */
    @Override
    public int read() throws IOException {
        char[] single = new char[1];
        int read;
        do {
            read = read(single, 0, 1);
        } while (read == 0); // the char was invalid, try the next one
        return read == -1 ? -1 : single[0];
    }

    /**
     * The array-based overloads of {@link Reader#read()} end up in this
     * method. <br />
     * To skip invalid characters this method shifts only valid chars to left
     * and returns decreased value of the original read method. So after last
     * valid character there will be some unused chars in the buffer.
     *
     * @return Number of read valid characters (0 if every char in this chunk
     *         was invalid) or <code>-1</code> if end of the underlying reader
     *         was reached.
     */
    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int read = super.read(cbuf, off, len);
        /*
         * If read chars are -1 then we have reached the end of the reader.
         */
        if (read == -1) {
            return -1;
        }
        /*
         * pos will show the index where chars should be moved if there are
         * gaps from invalid characters.
         */
        int pos = off - 1;

        for (int readPos = off; readPos < off + read; readPos++) {
            if (XMLChar.isValid(cbuf[readPos])) {
                pos++;
            } else {
                continue;
            }
            /*
             * If there is gap(s) move current char to its position.
             */
            if (pos < readPos) {
                cbuf[pos] = cbuf[readPos];
            }
        }
        /*
         * Number of read valid characters.
         */
        return pos - off + 1;
    }

}


Note that this solution works with Readers (aka character streams) only. Yes, you can wrap an InputStream in java.io.InputStreamReader, but then you have to name the right charset yourself, so expect encoding problems.
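If the input arrives as bytes, the whole chain might look like the sketch below. Everything named here is an assumption for illustration: PortableXml10Filter is a JDK-only variant of the idea (com.sun.org.apache.xerces.internal is an internal package and may not be accessible on newer JDKs), the charset is assumed to be UTF-8, and surrogate chars are passed through unchecked so that characters above #xFFFF survive.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical JDK-only variant of the filtering reader.
final class PortableXml10Filter extends FilterReader {

    PortableXml10Filter(Reader in) {
        super(in);
    }

    // XML 1.0 check for one char; surrogates are let through without pairing checks.
    static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || Character.isSurrogate(c);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int kept;
        do {
            int read = super.read(cbuf, off, len);
            if (read == -1) {
                return -1;
            }
            kept = 0;
            for (int i = off; i < off + read; i++) {
                if (isAllowed(cbuf[i])) {
                    cbuf[off + kept++] = cbuf[i]; // compact valid chars to the left
                }
            }
        } while (kept == 0); // never report an empty chunk to the caller
        return kept;
    }

    @Override
    public int read() throws IOException {
        char[] single = new char[1];
        return read(single, 0, 1) == -1 ? -1 : single[0];
    }

    // Parses the bytes as UTF-8 through the filter and collects all text content.
    static String textOf(byte[] xmlBytes) throws XMLStreamException {
        Reader chars = new InputStreamReader(
                new ByteArrayInputStream(xmlBytes), StandardCharsets.UTF_8);
        XMLStreamReader xml = XMLInputFactory.newInstance()
                .createXMLStreamReader(new PortableXml10Filter(chars));
        StringBuilder text = new StringBuilder();
        while (xml.hasNext()) {
            if (xml.next() == XMLStreamConstants.CHARACTERS) {
                text.append(xml.getText());
            }
        }
        return text.toString();
    }
}
```

The charset is fixed in code here, so an encoding named in the XML declaration is not consulted; that is the kind of encoding problem mentioned above.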

So that's how I worked around the legacy system's XML content, which we cannot change and which does not follow the standard.

You can download the Eclipse project with the source here.

Wednesday, May 27, 2009

How to post source in blogger

I created this blog to share various things, but mostly to publish interesting and useful source code. For this purpose I had to find a clear and easy way to post source code here.
The strange thing is that Google (as you see, I'm using Blogger) does not provide any meaningful default solution.

I looked at different solutions, and the coolest one was Source Code Highlighting - In Blogger!. It uses the Vim editor and works great for me.

However, I needed to make some modifications.
First, before executing the :TOhtml command I enable XHTML with :let use_xhtml=1. Second, I remove all the <br /> tags, since Blogger's rich editor creates the new lines for me. This can be done with :1,$s/<br \/>//g

Then you can paste the source into the reserved placeholders.
The <pre> tag is needed to protect your code from the Compose editor, which will damage it if <pre> is missing.

Here is the result for a simple Java file.


package source.in.blogger;

import java.util.List;

/**
 * Some comment is here.
 */
@SuppressWarnings("unused")
public class BloggerSource {
    // Simple comment
    public static void main(String[] args) {
        System.out.printf("I'm source code%n");
    }
}



It's not extremely easy, but it works. Try it out and have fun.
I'll surely use it for my next few posts until I find something more useful.