피곤키오 :: Android Development :: 피곤키오

Detecting a File's Encoding

Java Basics, 2010. 11. 7. 00:12

Usage

1. Using CharsetToolkit

import java.io.*;
import java.nio.charset.Charset;
import com.glaforge.i18n.io.CharsetToolkit;

File file = new File("windows-1252.txt");

// Inspect the first 4096 bytes of the file to guess its charset
Charset guessedCharset = CharsetToolkit.guessEncoding(file, 4096);
System.err.println("Charset found: " + guessedCharset.displayName());

// Re-open the file with the guessed charset; try-with-resources closes the reader when done
try (BufferedReader br = new BufferedReader(
		new InputStreamReader(new FileInputStream(file), guessedCharset)))
{
	String line;
	while ((line = br.readLine()) != null)
	{
		System.out.println(line);
	}
}
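CharsetToolkit's internals are not shown in this post. As a rough illustration only, the sketch below shows the kind of heuristic a byte-buffer guesser can apply with nothing but the JDK: look for a byte-order mark, then check for pure 7-bit ASCII, then test whether the high-bit bytes form well-formed UTF-8, and otherwise fall back to an 8-bit charset chosen by the caller. The class `CharsetGuessSketch` and its `guessCharset` method are hypothetical names, and this is not CharsetToolkit's actual source.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating a BOM / ASCII / UTF-8 heuristic.
// This is NOT CharsetToolkit's source code.
class CharsetGuessSketch {

    static Charset guessCharset(byte[] buffer, int length, Charset fallback) {
        // 1. A byte-order mark (BOM) identifies the Unicode encoding directly.
        if (length >= 3 && (buffer[0] & 0xFF) == 0xEF
                && (buffer[1] & 0xFF) == 0xBB && (buffer[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (length >= 2 && (buffer[0] & 0xFF) == 0xFF && (buffer[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        if (length >= 2 && (buffer[0] & 0xFF) == 0xFE && (buffer[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }

        // 2. If no byte has its high bit set, the sample is plain 7-bit US-ASCII.
        boolean onlyAscii = true;
        for (int i = 0; i < length; i++) {
            if ((buffer[i] & 0x80) != 0) {
                onlyAscii = false;
                break;
            }
        }
        if (onlyAscii) {
            return StandardCharsets.US_ASCII;
        }

        // 3. High-bit bytes that all form well-formed UTF-8 sequences suggest UTF-8.
        //    endOfInput = false tolerates a multi-byte sequence cut off by the sample size.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer out = CharBuffer.allocate(length); // UTF-8 never yields more chars than bytes
        boolean validUtf8 = !decoder.decode(ByteBuffer.wrap(buffer, 0, length), out, false).isError();

        // 4. Otherwise assume the caller's 8-bit fallback (e.g. windows-1252 or ISO-8859-1).
        return validUtf8 ? StandardCharsets.UTF_8 : fallback;
    }
}
```

Sampling a fixed prefix (4096 bytes in the example above) keeps the check cheap; the trade-off is that a file whose first non-ASCII character appears after the sample would be reported as US-ASCII by this sketch.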

2. Using SmartEncodingInputStream

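import java.io.*;
import com.glaforge.i18n.io.SmartEncodingInputStream;

FileInputStream fis = new FileInputStream("us-ascii.txt");

// The wrapper reads the beginning of the stream to guess its charset
SmartEncodingInputStream smartIS = new SmartEncodingInputStream(fis);
System.err.println("The charset of this input stream is: " + smartIS.getEncoding().displayName());

// getReader() returns a Reader that decodes the stream with the guessed charset
try (BufferedReader bufReader = new BufferedReader(smartIS.getReader()))
{
	String line;
	while ((line = bufReader.readLine()) != null)
	{
		System.out.println(line);
	}
}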

GuessEncoding

New Home for the Project

Please note that the project moved to Codehaus a while ago; you can find the latest version here:
http://docs.codehaus.org/display/GUESSENC

Origins

At work, I'm developing with IntelliJ IDEA, from JetBrains. Though I've tried Eclipse and JBuilder in the past, I came to love this IDE. It's certainly the best IDE around, and a real pleasure to develop with.

During the summer of 2002, I ran into an issue with file encodings. At work, one of our concerns is localisation/internationalisation: we develop i18n/l10n-aware applications. Our Java source files used to be encoded in ISO-Latin-1 and our XML files in UTF-8 (especially because they contained some language-specific content). At that time, IDEA could read a file in a specified encoding, but it could not detect which encoding a file was using. And as shit happens sometimes ;-) I totally messed up a very important XML file... I then realised that this was because IDEA was not able to guess the encoding.

Charset issues are critical when dealing with l10n/i18n, which is why I filed some feature requests with the IDEA developers. I wrote two simple classes to show them that it was very easy to guess a charset, and I granted them the right to include (and modify) my source code in IDEA. That's what they did, and since then, all IDEA fans can open their files without worrying about messing them up... (who hasn't seen weird boxes or question marks in a mangled file?)

Content

The package com.glaforge.i18n.io consists of two classes: CharsetToolkit and SmartEncodingInputStream. The first is a utility class that guesses the charset used in the byte buffer passed to it as a parameter. The second is a specialised input stream that wraps another input stream, reads the beginning of the data to guess the right charset, and then opens the content with that encoding.
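The wrapping approach described above can be sketched with JDK classes alone: mark a `BufferedInputStream`, read and inspect the first bytes, reset the stream, and build an `InputStreamReader` with the chosen charset. `PeekingCharsetReader`, `open`, `PEEK_SIZE`, and the simple "valid UTF-8, else fallback" rule are assumptions for illustration, not SmartEncodingInputStream's real implementation; `readNBytes` requires Java 9 or later.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a "smart" stream: peek, guess, rewind, then decode.
class PeekingCharsetReader {

    // Number of bytes inspected before deciding; 4096 is an arbitrary choice here.
    static final int PEEK_SIZE = 4096;

    static Reader open(InputStream raw, Charset fallback) throws IOException {
        // BufferedInputStream supports mark/reset, so we can look ahead without losing bytes.
        BufferedInputStream in = new BufferedInputStream(raw, PEEK_SIZE);
        in.mark(PEEK_SIZE);
        byte[] head = new byte[PEEK_SIZE];
        int n = in.readNBytes(head, 0, PEEK_SIZE); // fewer than PEEK_SIZE at end of stream
        in.reset(); // rewind so the Reader sees the data from the first byte

        Charset charset = looksLikeUtf8(head, n) ? StandardCharsets.UTF_8 : fallback;
        return new InputStreamReader(in, charset);
    }

    // Strict UTF-8 check on the peeked prefix; an incomplete trailing sequence is tolerated.
    static boolean looksLikeUtf8(byte[] bytes, int length) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer out = CharBuffer.allocate(length);
        return !decoder.decode(ByteBuffer.wrap(bytes, 0, length), out, false).isError();
    }
}
```

Rewinding with `reset()` is what lets the Reader still see the bytes that were inspected; for that to work, the mark limit must be at least as large as the number of bytes peeked.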

Source code and API

source files in HTML : http://glaforge.free.fr/projects/guessencoding/html/com/glaforge/i18n/io
source code : http://glaforge.free.fr/projects/guessencoding/src
sample files : http://glaforge.free.fr/projects/guessencoding/samples
Javadoc API : http://glaforge.free.fr/projects/guessencoding/api

Usage

For a more detailed explanation of how to use this package, please read the Javadoc.

License: Creative Commons Attribution-NonCommercial-ShareAlike

Posted by 피곤키오
