<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet href="http://rss.egloos.com/style/blog.xsl" type="text/xsl" media="screen"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
	<title>Enterprise Search Consulting</title>
	<link>http://esconsult.egloos.com</link>
	<description>Enterprise search engines and Solr</description>
	<language>ko</language>
	<pubDate>Fri, 23 Oct 2009 01:23:11 GMT</pubDate>
	<generator>Egloos</generator>
	<image>
		<title>Enterprise Search Consulting</title>
		<url>http://pds10.egloos.com/logo/200809/09/30/f0057030.jpg</url>
		<link>http://esconsult.egloos.com</link>
		<width>80</width>
		<height>34</height>
		<description>Enterprise search engines and Solr</description>
	</image>
  	<item>
		<title><![CDATA[ A Solr book has been published ]]> </title>
		<link>http://esconsult.egloos.com/1693297</link>
		<guid>http://esconsult.egloos.com/1693297</guid>
		<description>
			<![CDATA[ 
  <p>Link: <a href="http://www.amazon.com/Solr-1-4-Enterprise-Search-Server/dp/1847195881/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1256260132&amp;sr=8-1" target="_blank">Solr 1.4 Enterprise Search Server</a> <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a href="http://www.packtpub.com/solr-1-4-enterprise-search-server?utm_source=http://lucene.apache.org/solr/&amp;utm_medium=spons&amp;utm_content=pod&amp;utm_campaign=mdb_000275#indetail" target="_blank">Adobe eBook PDF Version</a></p><p><p><a href="https://www.packtpub.com/author_view_profile/id/324"></a></p><img style="DISPLAY: inline; MARGIN-LEFT: 0px; WIDTH: 255px; MARGIN-RIGHT: 0px; HEIGHT: 303px" height="380" src="https://www.packtpub.com/images/full/1847195881.jpg" width="308" align="left"> <br>Enhance your search with faceted navigation, result highlighting, fuzzy queries, ranked scoring, and more <p></p><ul><li>Deploy, embed, and integrate Solr with a host of programming languages </li><li>Implement faceting in e-commerce and other sites to summarize and navigate the results of a text search </li><li>Enhance your search by highlighting search results, offering spell-corrections, auto-suggest, finding “similar” records, boosting records and fields for scoring, phonetic matching </li><li>Informative and practical approach to development with fully working examples of integrating a variety of technologies </li><li>Written and tested for Solr 1.4 pre-release 2009.08</li></ul><blockquote><b>Language</b> English <br><b>Paperback</b> 336 pages [191mm x 235mm] <br><b>Release date</b> August 2009 <br><b>ISBN</b> 1847195881 <br><b>ISBN 13</b> 978-1-847195-88-3 <br><b>Author(s)</b> <a href="https://www.packtpub.com/author_view_profile/id/325">David Smiley</a>, <a href="https://www.packtpub.com/author_view_profile/id/324">Eric Pugh</a> <br><b>Topics and Technologies</b> <a href="https://www.packtpub.com/books/topic/8">Open Source</a></blockquote><hr><a name="indetail">In Detail</a> <blockquote>If you are a 
developer building a high-traffic web site, you need to have a terrific search engine. Sites like Netflix.com and Zappos.com employ Solr, an open source enterprise search server, which uses and extends the Lucene search library. This is the first book in the market on Solr and it will show you how to optimize your web site for high volume web traffic with full-text search capabilities along with loads of customization options. So, let your users gain a terrific search experience. <br>This book is a comprehensive reference guide for every feature Solr has to offer. It serves the reader right from initiation to development to deployment. It also comes with complete running examples to demonstrate its use and show how to integrate it with other languages and frameworks. <br>This book first gives you a quick overview of Solr, and then gradually takes you from basic to advanced features that enhance your search. It starts off by discussing Solr and helping you understand how it fits into your architecture—where all databases and document/web crawlers fall short, and Solr shines. The main part of the book is a thorough exploration of nearly every feature that Solr offers. To keep this interesting and realistic, we use a large open source set of metadata about artists, releases, and tracks courtesy of the MusicBrainz.org project. Using this data as a testing ground for Solr, you will learn how to import this data in various ways from CSV to XML to database access. You will then learn how to search this data in a myriad of ways, including Solr's rich query syntax, "boosting" match scores based on record data and other means, about searching across multiple fields with different boosts, getting facets on the results, auto-complete user queries, spell-correcting searches, highlighting queried text in search results, and so on. 
<br>After this thorough tour, we'll demonstrate working examples of integrating a variety of technologies with Solr such as Java, JavaScript, Drupal, Ruby, XSLT, PHP, and Python. <br>Finally, we'll cover various deployment considerations to include indexing strategies and performance-oriented configuration that will enable you to scale Solr to meet the needs of a high-volume site. </blockquote><br>What you will learn from this book <blockquote><ul><li>Blend structured data with real search features </li><li>Import CSV formatted data, XML, common document formats, and from databases </li><li>Deploy Solr and provide reference to Solr's query syntax from the basics to range queries </li><li>Enhance search results with spell-checking, auto-completing queries, highlighting search results, and more. </li><li>Secure Solr </li><li>Integrate a host of technologies with Solr from the server side to client-side JavaScript, to frameworks like Drupal </li><li>Scale Solr using replication, distributed searches, and tuning</li></ul></blockquote>Approach <blockquote>The book takes a tutorial approach with fully working examples. It will show you how to implement a Solr-based search engine on your intranet or web site.</blockquote>Who this book is written for <blockquote>This book is for developers who would like to use Solr for their applications. You only need to have basic programming skills to use Solr. Knowledge of Lucene is certainly a bonus.</blockquote><hr>Author(s) <blockquote><b>David Smiley</b> <p>Born to code, David Smiley is a senior software developer with 10 years of experience in the defense industry using Java and various web technologies. David is a strong believer in the open-source development model and has made small contributions to various projects over the years.</p><p>David began using Lucene way back in 2000 and was immediately excited by it and its future potential. 
Later on he went to use the Lucene-based "Compass" library to construct a very basic search server similar in spirit to Solr. Since then, David has used Solr for a larger search project and was able to contribute modifications back to the Solr community. Although preferring open-source solutions, David has also been trained on the commercial Endeca search platform and is currently using that product as well as Solr for a different project.</p><p><b>Eric Pugh</b></p><p>Fascinated by the "craft" of software development, Eric Pugh has been heavily involved in the open source world as a developer, committer, and user for the past 5 years. He is a member of the Apache Software Foundation and lately has been mulling over how we move from the read/write web to the read/write/share web.</p><p>In biotech, financial services, and defense IT, he has helped European and American companies develop coherent strategies for embracing open source software. As a speaker he has advocated the advantages of Agile practices in software development.</p><p>Eric became involved in Solr when he submitted the patch SOLR-284 for Parsing Rich Document types such as PDF and MS Office formats that became the single most popular patch as measured by votes! SOLR-284 became part of Solr version 1.4.</p></blockquote><h4>&nbsp;</h4><h4>Table of Contents</h4><p><a name="chapter_0">Preface</a> <br><a name="chapter_1">Chapter 1: Quick Starting Solr</a></p><ul><li>An introduction to Solr <ul><li>Lucene, the underlying engine </li><li>Solr, the Server-ization of Lucene</li></ul></li></ul><ul><li>Comparison to database technology </li><li>Getting started <ul><li>The last official release or fresh code from source control </li><li>Testing and building Solr </li><li>Solr's installation directory structure </li><li>Solr's home directory </li><li>How Solr finds its home </li><li>Deploying and running Solr</li></ul></li></ul><ul><li>A quick tour of Solr! 
<ul><li>Loading sample data </li><li>A simple query </li><li>Some statistics</li></ul></li></ul><ul><li>The schema and configuration files </li><li>Solr resources outside this book </li><li>Summary</li></ul><ul></ul><p><a name="chapter_2">Chapter 2: Schema and Text Analysis</a></p><ul><li>MusicBrainz.org </li><li>One combined index or multiple indices <ul><li>Problems with using a single combined index</li></ul></li></ul><ul><li>Schema design <ul><li>Step 1: Determine which searches are going to be powered by Solr </li><li>Step 2: Determine the entities returned from each search </li><li>Step 3: Denormalize related data <ul><li>Denormalizing—"one-to-one" associated data </li><li>Denormalizing—"one-to-many" associated data</li></ul></li></ul><ul><li>Step 4: (Optional) Omit the inclusion of fields only used in search results</li></ul></li></ul><ul><li>The schema.xml file <ul><li>Field types </li><li>Field options </li><li>Field definitions <ul><li>Sorting </li><li>Dynamic fields </li><li>Using copyField </li><li>Remaining schema.xml settings</li></ul></li></ul></li></ul><ul><li>Text analysis <ul><li>Configuration </li><li>Experimenting with text analysis </li><li>Tokenization </li><li>WorkDelimiterFilterFactory </li><li>Stemming </li><li>Synonyms <ul><li>Index-time versus Query-time, and to expand or not</li></ul></li></ul><ul><li>Stop words </li><li>Phonetic sounds-like analysis </li><li>Partial/Substring indexing <ul><li>N-gramming costs</li></ul></li></ul><ul><li>Miscellaneous analyzers</li></ul></li></ul><ul><li>Summary</li></ul><ul></ul><p><a name="chapter_3">Chapter 3: Indexing Data</a></p><ul><li>Communicating with Solr <ul><li>Direct HTTP or a convenient client API </li><li>Data streamed remotely or from Solr's filesystem </li><li>Data formats</li></ul></li></ul><ul><li>Using curl to interact with Solr </li><li>Remote streaming </li><li>Sending XML to Solr <ul><li>Deleting documents </li><li>Commit, optimize, and rollback</li></ul></li></ul><ul><li>Sending 
CSV to Solr <ul><li>Configuration options</li></ul></li></ul><ul><li>Direct database and XML import <ul><li>Getting started with DIH <ul><li>The DIH development console </li><li>DIH documents, entities </li><li>DIH fields and transformers</li></ul></li></ul><ul><li>Importing with DIH</li></ul></li></ul><ul><li>Indexing documents with Solr Cell <ul><li>Extracting binary content </li><li>Configuring Solr </li><li>Extracting karaoke lyrics </li><li>Indexing richer documents</li></ul></li></ul><ul><li>Summary</li></ul><ul></ul><p><a name="chapter_4">Chapter 4: Basic Searching</a></p><ul><li>Your first search, a walk-through </li><li>Solr's generic XML structured data representation </li><li>Solr's XML response format <ul><li>Parsing the URL</li></ul></li></ul><ul><li>Query parameters <ul><li>Parameters affecting the query </li><li>Result paging </li><li>Output related parameters </li><li>Diagnostic query parameters</li></ul></li></ul><ul><li>Query syntax <ul><li>Matching all the documents </li><li>Mandatory, prohibited, and optional clauses <ul><li>Boolean operators</li></ul></li></ul><ul><li>Sub-expressions (aka sub-queries) <ul><li>Limitations of prohibited clauses in sub-expressions</li></ul></li></ul><ul><li>Field qualifier </li><li>Phrase queries and term proximity </li><li>Wildcard queries <ul><li>Fuzzy queries</li></ul></li></ul><ul><li>Range queries <ul><li>Date math</li></ul></li></ul><ul><li>Score boosting </li><li>Existence (and non-existence) queries </li><li>Escaping special characters</li></ul></li></ul><ul><li>Filtering </li><li>Sorting </li><li>Request handlers </li><li>Scoring <ul><li>Query-time and index-time boosting </li><li>Troubleshooting scoring</li></ul></li></ul><ul><li>Summary</li></ul><ul></ul><p><a name="chapter_5">Chapter 5: Enhanced Searching</a></p><ul><li>Function queries <ul><li>An example: Scores influenced by a lookupcount </li><li>Field references </li><li>Function reference <ul><li>Mathematical primitives </li><li>Miscellaneous math 
</li><li>ord and rord</li></ul></li></ul><ul><li>An example with scale() and lookupcount <ul><li>Using logarithms </li><li>Using inverse reciprocals </li><li>Using reciprocals and rord with dates</li></ul></li></ul><ul><li>Function query tips</li></ul></li></ul><ul><li>Dismax Solr request handler <ul><li>Lucene's DisjunctionMaxQuery <ul><li>Configuring queried fields and boosts</li></ul></li></ul><ul><li>Limited query syntax </li><li>Boosting: Automatic phrase boosting <ul><li>Configuring automatic phrase boosting </li><li>Phrase slop configuration</li></ul></li></ul><ul><li>Boosting: Boost queries </li><li>Boosting: Boost functions </li><li>Min-should-match <ul><li>Basic rules </li><li>Multiple rules </li><li>What to choose</li></ul></li></ul><ul><li>A default search</li></ul></li></ul><ul><li>Faceting <ul><li>A quick example: Faceting release types <ul><li>MusicBrainz schema changes</li></ul></li></ul><ul><li>Field requirements </li><li>Types of faceting </li><li>Faceting text </li><li>Alphabetic range bucketing (A-C, D-F, and so on) </li><li>Faceting dates <ul><li>Date facet parameters</li></ul></li></ul><ul><li>Faceting on arbitrary queries </li><li>Excluding filters <ul><li>The solution: Local Params</li></ul></li></ul><ul><li>Facet prefixing (term suggest)</li></ul></li></ul><ul><li>Summary</li></ul><ul></ul><p><a name="chapter_6">Chapter 6: Search Components</a></p><ul><li>About components </li><li>The highlighting component <ul><li>A highlighting example </li><li>Highlighting configuration</li></ul></li></ul><ul><li>Query elevation <ul><li>Configuration</li></ul></li></ul><ul><li>Spell checking <ul><li>Schema configuration </li><li>Configuration in solrconfig.xml <ul><li>Configuring spellcheckers (dictionaries) </li><li>Processing of the q parameter </li><li>Processing of the spellcheck.q parameter</li></ul></li></ul><ul><li>Building the dictionary from its source </li><li>Issuing spellcheck requests </li><li>Example usage for a misspelled query <ul><li>An 
alternative approach</li></ul></li></ul></li></ul><ul><li>The more-like-this search component <ul><li>Configuration parameters <ul><li>Parameters specific to the MLT search component </li><li>Parameters specific to the MLT request handler </li><li>Common MLT parameters</li></ul></li></ul><ul><li>MLT results example</li></ul></li></ul><ul><li>Stats component <ul><li>Configuring the stats component </li><li>Statistics on track durations</li></ul></li></ul><ul><li>Field collapsing <ul><li>Configuring field collapsing</li></ul></li></ul><ul><li>Other components <ul><li>Terms component </li><li>termVector component </li><li>LocalSolr component</li></ul></li></ul><ul><li>Summary</li></ul><ul></ul><p><a name="chapter_7">Chapter 7: Deployment</a></p><ul><li>Implementation methodology <ul><li>Questions to ask</li></ul></li></ul><ul><li>Installing into a Servlet container <ul><li>Differences between Servlet containers <ul><li>Defining solr.home property</li></ul></li></ul></li></ul><ul><li>Logging <ul><li>HTTP server request access logs </li><li>Solr application logging <ul><li>Configuring logging output </li><li>Logging to Log4j </li><li>Jetty startup integration </li><li>Managing log levels at runtime</li></ul></li></ul></li></ul><ul><li>A SearchHandler per search interface </li><li>Solr cores <ul><li>Configuring solr.xml </li><li>Managing cores </li><li>Why use multicore</li></ul></li></ul><ul><li>JMX <ul><li>Starting Solr with JMX <ul><li>Take a walk on the wild side! 
Use JRuby to extract JMX information</li></ul></li></ul></li></ul><ul><li>Securing Solr <ul><li>Limiting server access <ul><li>Controlling JMX access</li></ul></li></ul><ul><li>Securing index data <ul><li>Controlling document access </li><li>Other things to look at</li></ul></li></ul></li></ul><ul><li>Summary</li></ul><ul></ul><p><a name="chapter_8">Chapter 8: Integrating Solr</a></p><ul><li>Structure of included examples <ul><li>Inventory of examples</li></ul></li></ul><ul><li>SolrJ: Simple Java interface <ul><li>Using Heritrix to download artist pages </li><li>Indexing HTML in Solr </li><li>SolrJ client API <ul><li>Indexing POJOs</li></ul></li></ul><ul><li>When should I use Embedded Solr <ul><li>In-Process streaming </li><li>Rich clients </li><li>Upgrading from legacy Lucene</li></ul></li></ul></li></ul><ul><li>Using JavaScript to integrate Solr <ul><li>Wait, what about security? </li><li>Building a Solr powered artists autocomplete widget with jQuery and JSONP </li><li>SolrJS: JavaScript interface to Solr</li></ul></li></ul><ul><li>Accessing Solr from PHP applications <ul><li>solr-php-client </li><li>Drupal options <ul><li>Apache Solr Search integration module </li><li>Hosted Solr by Acquia</li></ul></li></ul></li></ul><ul><li>Ruby on Rails integrations <ul><li>acts_as_solr <ul><li>Setting up MyFaves project </li><li>Populating MyFaves relational database from Solr </li><li>Build Solr indexes from relational database </li><li>Complete MyFaves web site</li></ul></li></ul><ul><li>Blacklight OPAC <ul><li>Indexing MusicBrainz data</li></ul></li></ul><ul><li>Customizing display </li><li>solr-ruby versus rsolr</li></ul></li></ul><ul><li>Summary</li></ul><ul></ul><p><a name="chapter_9">Chapter 9: Scaling Solr</a></p><ul><li>Tuning complex systems <ul><li>Using Amazon EC2 to practice tuning <ul><li>Firing up Solr on Amazon EC2</li></ul></li></ul></li></ul><ul><li>Optimizing a single Solr server (Scale High) <ul><li>JVM configuration </li><li>HTTP caching </li><li>Solr 
caching <ul><li>Tuning caches</li></ul></li></ul><ul><li>Schema design considerations </li><li>Indexing strategies <ul><li>Disable unique document checking </li><li>Commit/optimize factors</li></ul></li></ul><ul><li>Enhancing faceting performance </li><li>Using term vectors </li><li>Improving phrase search performance <ul><li>The solution: Shingling</li></ul></li></ul></li></ul><ul><li>Moving to multiple Solr servers (Scale Wide) <ul><li>Script versus Java replication </li><li>Starting multiple Solr servers <ul><li>Configuring replication</li></ul></li></ul><ul><li>Distributing searches across slaves <ul><li>Indexing into the master server </li><li>Configuring slaves</li></ul></li></ul><ul><li>Distributing search queries across slaves </li><li>Sharding indexes <ul><li>Assigning documents to shards </li><li>Searching across shards</li></ul></li></ul></li></ul><ul><li>Combining replication and sharding (Scale Deep) </li><li>Summary</li></ul>			 ]]> 
		</description>
		<category>SOLR</category>

		<comments>http://esconsult.egloos.com/1693297#comments</comments>
		<pubDate>Fri, 23 Oct 2009 01:18:43 GMT</pubDate>
		<dc:creator>슈퍼맨</dc:creator>
	</item>
	<item>
		<title><![CDATA[ Seminar and conference news ]]> </title>
		<link>http://esconsult.egloos.com/1623732</link>
		<guid>http://esconsult.egloos.com/1623732</guid>
		<description>
			<![CDATA[ 
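  <p>The seminar and the academic conference that I attend regularly every year are being held this fall.</p><p>1. <a href="http://www.sek.co.kr/kmfall2009/greeting.asp" target="_blank">KM &amp; ECM Conference Fall 2009</a> <br>Theme <br>Vitality for the economy! Vision for business!</p><p>Date and time <br>Thursday, September 24, 2009, 9:00 ~ 17:30 </p><p><strong>Venue <br></strong>COEX Grand Ballroom, Samseong-dong (across from Bongeunsa Temple)</p><p>Purpose <br>- Stimulate the market for KMS, ECM, EP/EIP, EDMS, and search engines, and share success stories <br>- Improve productivity and strengthen competitiveness in government, public institutions, financial institutions, and companies <br>- Provide up-to-date information on recent trends and new technologies in domestic and international KMS/ECM solutions </p><p>Scale <br>2 tracks, 10 sessions, about 15 exhibition booths, about 500 attendees</p><p>Program <br>- Seminar: case studies and new-technology presentations on KMS/EDMS/EP/ECM/archives/records management <br>- Exhibition: product displays for KMS/EDMS/EP/ECM/archives/records management </p><p>Intended audience <br>- IT department heads and staff in government, public, financial, and general companies <br>- CEOs, CIOs, and heads and staff of planning, innovation, and knowledge-management departments <br>- People at government/public institutions and companies that have adopted, or plan to adopt, <br>&nbsp; knowledge management as part of management innovation <br>- People at government/public institutions and companies that use, or plan to use, KMS/ECM </p><p>Hosted by <br>전자신문사 (The Electronic Times), 한국소프트웨어산업협회 (Korea Software Industry Association) </p><p>Organized by <br>KM&amp;ECM협의회 (KM&amp;ECM Council) </p><p><strong>Participating companies</strong> <br>가온아이, 굿센테크날러지, 나눔기술, 날리지큐브, 사이버다임, Samsung SDS, Saltlux, 아이디에스앤트러스트, 온더아이티, 와이즈소프트 </p><p>2. <a href="http://duan.chonbuk.ac.kr/~hclt2009/index.html" target="_blank">[HCLT2009] Conference on Hangul and Korean Language Information Processing</a>&nbsp; <br><strong>October 9 (Venue: Legend Hotel, Yuseong, Daejeon) </strong></p><p>12:30 - 14:00 Registration; welcome address by the president of the Cognitive Science Society <br>13:30 - 14:00 Korean Institute of Information Scientists and Engineers, greetings from the chair of the Language Engineering Research Group, report by the organizing committee chair <br>14:00 - 14:40 Invited talk 1 <br>14:40 - 15:20 Invited talk 2 <br>15:20 - 15:40 Break <br>15:40 - 17:00 Oral presentation session 1 (15 min talk, 5 min Q&amp;A), 4 papers <br>17:00 - 17:10 Break <br>17:10 - 18:30 Oral presentation session 2 (15 min talk, 5 min Q&amp;A), 4 papers <br>18:30 - 20:00 Dinner banquet</p><p><strong>October 10 (Venue: Legend Hotel, Yuseong, Daejeon)</strong></p><p>09:00 - 10:20 Oral presentation session 3 (15 min talk, 5 min Q&amp;A), 4 papers <br>10:20 - 11:10 Poster session <br>11:10 - 12:30 Oral presentation session 4 (15 min talk, 5 min Q&amp;A), 4 papers <br>12:30 - 13:30 Lunch</p><p>------------ <br>With every search vendor except Saltlux absent, the KM&amp;ECM event looks set to revolve entirely around KM&amp;ECM specialist vendors, <br>and because there is a fee (advance registration: 55,000 won), attending will be somewhat difficult without an invitation. <br><br>HCLT2009 is held every year, but I can't shake the feeling that it keeps shrinking compared with earlier years. 
<br>The sense that engineering fields are looked down on seems to show here as well... </p><br/><br/>tag : <a href="/tag/학회" rel="tag">학회</a>,&nbsp;<a href="/tag/컨퍼런스" rel="tag">컨퍼런스</a>			 ]]> 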
		</description>
		<category>기술</category>
		<category>학회</category>
		<category>컨퍼런스</category>

		<comments>http://esconsult.egloos.com/1623732#comments</comments>
		<pubDate>Tue, 15 Sep 2009 04:20:04 GMT</pubDate>
		<dc:creator>슈퍼맨</dc:creator>
	</item>
	<item>
		<title><![CDATA[ STS2009 attendance review ]]> </title>
		<link>http://esconsult.egloos.com/1618460</link>
		<guid>http://esconsult.egloos.com/1618460</guid>
		<description>
			<![CDATA[ 
  <p>I attended STS2009 yesterday, but with so much to do lately I am only now getting around to this review.</p><p>1. Overall impressions </p><p>- The event itself ran fairly smoothly. <br>- A <a href="http://itnews.inews24.com/php/news_view.php?g_menu=020200&amp;g_serial=441517" target="_blank">news article</a> said more than a thousand people attended, but attendance seemed down about 10%~15% from last year. –&gt; just a rough impression <br>- The crowds at the organizers' booths and at the participating companies' booths differed markedly –&gt; not sure why <br>- 쓰리웨어 was listed as a participating company when the event was first announced, but in the end it dropped out.&nbsp; --&gt; Openbase had already dropped out <br>- As expected, a few talks stood out for promoting the presenter's own product far too blatantly. –&gt; especially those from participating companies.. <br>- Roughly, by topic: <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Semantic technology : 3 talks <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Personalization : 1 talk <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Search UX combined with technology : 1 talk <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Text mining : 1 talk <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Search quality technology : 1 talk <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Multimedia search : 1 talk&nbsp; <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Plain product promotion : 3 talks&nbsp; --&gt; and the products were not technically new attempts or concepts either.</p><p>2. Impressions by session</p><ul><li>Keynote – The Future of Search, Discovery: Semantics for Searching and Mining – What, Where, and How? - Prof. 맹성현 / KAIST&nbsp; <br>&nbsp;&nbsp;&nbsp; Attended. <br>&nbsp;&nbsp;&nbsp; This was probably the talk that best fit the seminar's slogan (“Discovery! Meet the future of search”). <br>&nbsp;&nbsp;&nbsp; He divided semantics technology into three types and gave an explanation and examples of each. 
<br></li><li>Implementing practical technology for semantic search and semantic analysis – CEO 이경일 / Saltlux <br>&nbsp;&nbsp;&nbsp; Did not attend <br>&nbsp;&nbsp;&nbsp; Explained Saltlux's overall technical approach <br>&nbsp;&nbsp;&nbsp; Judging from the slides, the derivation of TopicRank and the material that follows it feel somewhat disconnected <br>&nbsp;&nbsp;&nbsp; It would have been better with statistics showing how much TopicRank actually improved search quality <br>&nbsp;&nbsp;&nbsp; Incidentally, it was a bit surprising not to see <a href="http://www.owlim.com/" target="_blank">OWLIM</a> at the Saltlux booth <br></li><li>A strategy for building search services with user modeling – Deputy General Manager 정휘웅 / Diquest <br>&nbsp;&nbsp;&nbsp; Did not attend <br>&nbsp;&nbsp;&nbsp; As befits a leader in shopping-mall search, used a shopping mall as the example to explain a methodology for blending search UX with search technology <br>&nbsp;&nbsp;&nbsp; The search-technology material was what existing vendors have been saying for years, so it was not fresh <br>&nbsp;&nbsp;&nbsp; The search UX angle, however, is something few besides Diquest would think through, so that part was fresh <br></li><li>Sentiment search and text mining – Dr. 이상주 / Daumsoft <br>&nbsp;&nbsp;&nbsp; Attended <br>&nbsp;&nbsp;&nbsp; Explained methods for searching on sentiment <br>&nbsp;&nbsp;&nbsp; Perhaps because they have long run many projects on online buzz, the demo program was almost flawless in its overall construction <br>&nbsp;&nbsp;&nbsp; I can't shake the thought that a lot of magic (????) was used beyond the technical parts <br>&nbsp;&nbsp;&nbsp; Still, magic is also skill and effort, so credit where it is due... <br>&nbsp;&nbsp;&nbsp; </li><li>Personalizing search with profiling – General Manager 윤진섭 / Microsoft-FAST <br>&nbsp;&nbsp;&nbsp; Did not attend <br>&nbsp;&nbsp;&nbsp; Explained the same kind of technology as the personalized search that Verity released 4~5 years ago and Openbase released last year <br>&nbsp;&nbsp;&nbsp; The market seems to have already given its verdict on personalization... 
<br></li><li>Next-generation content-based search implemented in a multimedia environment – Dr. 박만수 / Konan Technology <br>&nbsp;&nbsp;&nbsp; Did not attend <br>&nbsp;&nbsp;&nbsp; Explained the multimedia search that Konan Technology has been pushing for several years <br>&nbsp;&nbsp;&nbsp; Judging from the slides, the technology was left out entirely and the talk centered on case studies, which was a bit of a letdown <br></li><li>Proximity Language Model – Director 윤여걸 / WISEnut <br>&nbsp;&nbsp;&nbsp; Attended <br>&nbsp;&nbsp;&nbsp; Explained a <a href="https://portal.acm.org/poplogin.cfm?dl=GUIDE&amp;coll=GUIDE&amp;comp_id=1571993&amp;want_href=delivery%2Ecfm%3Fid%3D1571993%26type%3Dpdf%26CFID%3D52158163%26CFTOKEN%3D16326121&amp;CFID=52158163&amp;CFTOKEN=16326121" target="_blank">paper</a>, presented at <a href="http://portal.acm.org/citation.cfm?id=1571993&amp;dl=GUIDE&amp;coll=GUIDE&amp;CFID=52158163&amp;CFTOKEN=16326121" target="_blank">SIGIR ‘09</a>, on a method for improving search quality (search ranking) <br>&nbsp;&nbsp;&nbsp; The method computes how close together and how often the query terms appear within a document and reflects that in the search ranking <br>&nbsp;&nbsp;&nbsp; Before the talk started, people streamed out and only about half the audience remained –&gt; as expected, purely academic talks draw less interest <br>&nbsp;&nbsp;&nbsp; </li><li>Knowledge Discovery using an Enterprise Social Network – Deputy General Manager 김승연 / IBM <br>&nbsp;&nbsp;&nbsp; Did not attend <br>&nbsp;&nbsp;&nbsp; Introduced IBM's <a href="http://www-01.ibm.com/software/lotus/products/connections/" target="_blank">SNF (Social Network Framework) product</a> <br>&nbsp;&nbsp;&nbsp; Put simply, a system that analyzes actions among members of an organization (or company), such as commenting, sending mail, and attending meetings, <br>&nbsp;&nbsp;&nbsp; to find the relationships between people and the experts in a given field <br>&nbsp;&nbsp;&nbsp; One might wonder what good this does, but in a very large company, finding the right person also takes considerable resources. <br>&nbsp;&nbsp;&nbsp; I watched a short demo at the booth, but it did not leave much of an impression... 
<br>&nbsp;&nbsp;&nbsp; </li><li>Introduction to the next-generation search technology used to develop outlink vertical integrated search – CEO 류홍진 / 건지소프트 <br>&nbsp;&nbsp;&nbsp; Did not attend <br>&nbsp;&nbsp;&nbsp; No comment <br></li><li>The evolution of semantic information services – Dr. 정한민 / KISTI <br>&nbsp;&nbsp;&nbsp; Attended <br>&nbsp;&nbsp;&nbsp; Explained ontologies and the future Internet <br>&nbsp;&nbsp;&nbsp; Personally I don't care for talk of Web 3.0, but I agreed with the part about mashups <br>&nbsp;&nbsp;&nbsp; Incidentally, <a href="http://ontoframe.kr/2008/2008_new/main.jsp" target="_blank">Ontoframe</a> seemed to be getting explained at the Diquest booth <br></li><li>Oracle Outside-in, a powerful document filter engine – General Manager 장경운 / Oracle <br>&nbsp;&nbsp;&nbsp; Attended <br>&nbsp;&nbsp;&nbsp; A promotion for Outside-in, the document filter product released by Oracle <br>&nbsp;&nbsp;&nbsp; <a href="http://www.oracle.com/technology/products/content-management/oit/ds_oitFiles.pdf" target="_blank">That it can filter even Hangul word processor files</a> is a wow~&nbsp; <br>&nbsp;&nbsp;&nbsp; However, it cannot filter the Hunminjeongeum and Jungum Global documents used within the Samsung Group… <br>&nbsp;&nbsp;&nbsp; Will its price be more attractive than that of <a href="http://www.synap.co.kr/" target="_blank">Synap</a>? </li></ul><br/><br/>tag : <a href="/tag/STS2009" rel="tag">STS2009</a>,&nbsp;<a href="/tag/검색엔진" rel="tag">검색엔진</a>			 ]]> 
		</description>
		<category>기업검색</category>
		<category>STS2009</category>
		<category>검색엔진</category>

		<comments>http://esconsult.egloos.com/1618460#comments</comments>
		<pubDate>Thu, 10 Sep 2009 02:55:03 GMT</pubDate>
		<dc:creator>슈퍼맨</dc:creator>
	</item>
	<item>
		<title><![CDATA[ Search Technology Summit 2009 to be held ]]> </title>
		<link>http://esconsult.egloos.com/1595035</link>
		<guid>http://esconsult.egloos.com/1595035</guid>
		<description>
			<![CDATA[ 
  <p>Following last year's first edition, the second <a href="http://www.stskorea.org/" target="_blank">Search Technology Summit 2009</a> will be held on September 8, 2009 at the Grand InterContinental Hotel. <br><br>A few things have changed from last year, though.</p><ol><li>There is now a managing organizer. <br>&nbsp;&nbsp;&nbsp; Presumably this is to make operations run more smoothly.</li><li>Among the hosts, Openbase is out and Daumsoft is in. <br>&nbsp;&nbsp;&nbsp; I can understand 쓰리소프트 dropping out… but Openbase is gone too. <br>&nbsp;&nbsp;&nbsp; My guess is that it is not about cost and that there is some other issue…</li><li>Besides the hosts, there are now participating companies. <br>&nbsp;&nbsp;&nbsp; Because the conference itself is free, cost is inevitably an issue for the hosts. <br>&nbsp;&nbsp;&nbsp; So they presumably brought in participating companies to lower the cost…</li><li>The share of talks by professors has dropped sharply, and talks by participating companies have been added instead. <br>&nbsp;&nbsp;&nbsp; On this point I can't help feeling that the event has drifted a little from last year's intent. <br>&nbsp;&nbsp;&nbsp; Of course, last year too the professors were invited by the vendors, so the event was not entirely free of vendor influence (?), <br>&nbsp;&nbsp;&nbsp; but this year they openly intend to promote (?) their companies and products. <br>&nbsp;&nbsp;&nbsp; So this year the conference will likely be more useful to working professionals who need to apply this on the job than to university students. <br><br>Well… in any case I plan to attend if I can. <br>I probably won't run into many people there, but finding out which products and features are being promoted is a good learning experience too… </li></ol><table cellspacing="1" cellpadding="0" width="650" align="center" bgcolor="#cccccc" border="0"><tbody><tr><td bgcolor="#ffffff"><table id="Table_01" height="2500" cellspacing="0" cellpadding="0" width="650" border="0"><tbody><tr><td><img height="90" alt="" src="http://www.stskorea.org/edm/2nd_01.jpg" width="650" usemap="#Map" border="0"></td></tr><tr><td><img height="264" alt="" src="http://www.stskorea.org/edm/2nd_02.jpg" width="650"></td></tr><tr><td><img height="353" alt="" src="http://www.stskorea.org/edm/2nd_03.gif" width="650"></td></tr><tr><td><img height="739" alt="" src="http://www.stskorea.org/edm/2nd_04.jpg" width="650" usemap="#Map2" border="0"></td></tr><tr><td><img height="59" alt="" src="http://www.stskorea.org/edm/2nd_05.jpg" width="650" usemap="#Map3" border="0"></td></tr><tr><td><a href="http://www.stskorea.org/event.asp" target="_blank"><img height="491" alt="" src="http://www.stskorea.org/edm/2nd_06.jpg" width="650" border="0"></a></td></tr><tr><td><img height="317" alt="" 
src="http://www.stskorea.org/edm/2nd_07.jpg" width="650"></td></tr><tr><td><a href="mailto:rgst@neodigm.com"><img height="53" alt="" src="http://www.stskorea.org/edm/2nd_08.jpg" width="650" border="0"></a></td></tr><tr><td><img height="134" alt="" src="http://www.stskorea.org/edm/2nd_09.jpg" width="650" usemap="#Map4" border="0"></td></tr></tbody></table></td></tr></tbody></table><br/><br/>tag : <a href="/tag/STS2009" rel="tag">STS2009</a>,&nbsp;<a href="/tag/STS" rel="tag">STS</a>,&nbsp;<a href="/tag/검색" rel="tag">검색</a>,&nbsp;<a href="/tag/기업검색" rel="tag">기업검색</a>			 ]]> 
		</description>
		<category>기업검색</category>
		<category>STS2009</category>
		<category>STS</category>
		<category>검색</category>
		<category>기업검색</category>

		<comments>http://esconsult.egloos.com/1595035#comments</comments>
		<pubDate>Tue, 18 Aug 2009 00:55:45 GMT</pubDate>
		<dc:creator>슈퍼맨</dc:creator>
	</item>
	<item>
		<title><![CDATA[ The enterprise search market becomes a two-leader race ]]> </title>
		<link>http://esconsult.egloos.com/1594374</link>
		<guid>http://esconsult.egloos.com/1594374</guid>
		<description>
			<![CDATA[ 
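  <p>Sources: <a href="http://article.joins.com/article/olink.asp?aid=3427045&amp;serviceday=20090723" target="_blank">Konan Technology's scenario management shows visible results</a> <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a href="http://www.egloos.com/egloo/content/“일단%20한숨은%20돌렸다”%20국산%20SW업계도%20올%20상반기%20어닝%20서프라이즈" target="_blank">“A sigh of relief for now”: the domestic SW industry also posts first-half earnings surprises</a></p><p>The sources say that, despite the worst economic uncertainty, ‘Konan Technology’ and ‘Korea Wisenut’ are having their best year. <br>As the articles note, sound internal management and the government's early first-half procurement are likely factors, but there are probably other causes that the articles do not mention…</p><p>Until last year, the enterprise search market had three leading players. <br>But 쓰리소프트, one of the three, abruptly (?) withdrew from the business, so the other two companies naturally stood to gain. <br>쓰리웨어 is said to have taken over 쓰리소프트's resources, but it inevitably falls short in scale and name recognition.</p><p>I expect this structure (?) to last for the next several years, and business strategy should be planned on that basis.</p><br/><br/>tag : <a href="/tag/코난테크놀로지" rel="tag">코난테크놀로지</a>,&nbsp;<a href="/tag/코리아와이즈넛" rel="tag">코리아와이즈넛</a>			 ]]> 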
		</description>
		<category>기업검색</category>
		<category>코난테크놀로지</category>
		<category>코리아와이즈넛</category>

		<comments>http://esconsult.egloos.com/1594374#comments</comments>
		<pubDate>Mon, 17 Aug 2009 08:58:32 GMT</pubDate>
		<dc:creator>슈퍼맨</dc:creator>
	</item>
	<item>
		<title><![CDATA[ Saltlux Unveils "SearchBox", a Shared-Document Search System ]]> </title>
		<link>http://esconsult.egloos.com/1456715</link>
		<guid>http://esconsult.egloos.com/1456715</guid>
		<description>
			<![CDATA[ 
  <p>Saltlux has unveiled “SearchBox”, a search system for shared documents. <br>It will reportedly also be presented at the <a href="http://in2.saltlux.com/promotion/sb2009/index.html">Saltlux seminar</a> on April 27. <br><br>“SearchBox” is designed to search shared documents stored on external hard drives and file servers, and it ships as an integrated hardware appliance so that installation and management are simple. <br><br>The videos give a quick look at its features. <br><object id="NFPlayer39024" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,0,0" height="408" width="500" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000"><param name="_cx" value="5080"><param name="_cy" value="5080"><param name="FlashVars" value=""><param name="Movie" value="http://serviceapi.nmv.naver.com/flash/NFPlayer.swf?vid=0F7CAE691DD510D1CED0757BD6F9E80411AA&amp;outKey=V1262623b0b29861f6f452a4331b1bd71f1736d8d0757110a45bc2a4331b1bd71f173"><param name="Src" value="http://serviceapi.nmv.naver.com/flash/NFPlayer.swf?vid=0F7CAE691DD510D1CED0757BD6F9E80411AA&amp;outKey=V1262623b0b29861f6f452a4331b1bd71f1736d8d0757110a45bc2a4331b1bd71f173"><param name="WMode" value="Transparent"><param name="Play" value="-1"><param name="Loop" value="-1"><param name="Quality" value="High"><param name="SAlign" value=""><param name="Menu" value="-1"><param name="Base" value=""><param name="AllowScriptAccess" value=""><param name="Scale" value="ShowAll"><param name="DeviceFont" value="0"><param name="EmbedMovie" value="0"><param name="BGColor" value=""><param name="SWRemote" value=""><param name="MovieData" value=""><param name="SeamlessTabbing" value="1"><param name="Profile" value="-1"><param name="ProfileAddress" value=""><param name="ProfilePort" value="598137694"><param name="AllowNetworking" value="all"><param name="AllowFullScreen" value="false"><embed src="http://serviceapi.nmv.naver.com/flash/NFPlayer.swf?vid=0F7CAE691DD510D1CED0757BD6F9E80411AA&amp;outKey=V1262623b0b29861f6f452a4331b1bd71f1736d8d0757110a45bc2a4331b1bd71f173" wmode="transparent" width="500" height="408" allowscriptaccess="always" name="NFPlayer39024" id="NFPlayer39024" allowfullscreen="true" 
type="application/x-shockwave-flash" /></object></p><p><br><object id="NFPlayer30707" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,0,0" height="408" width="500" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000"><param name="_cx" value="5080"><param name="_cy" value="5080"><param name="FlashVars" value=""><param name="Movie" value="http://serviceapi.nmv.naver.com/flash/NFPlayer.swf?vid=491A2A60E5A172F8A2885508428F425CE66C&amp;outKey=V128c1e2b480b4285c737994d9e29bba102c3f6d980f4e377baca994d9e29bba102c3"><param name="Src" value="http://serviceapi.nmv.naver.com/flash/NFPlayer.swf?vid=491A2A60E5A172F8A2885508428F425CE66C&amp;outKey=V128c1e2b480b4285c737994d9e29bba102c3f6d980f4e377baca994d9e29bba102c3"><param name="WMode" value="Transparent"><param name="Play" value="-1"><param name="Loop" value="-1"><param name="Quality" value="High"><param name="SAlign" value=""><param name="Menu" value="-1"><param name="Base" value=""><param name="AllowScriptAccess" value=""><param name="Scale" value="ShowAll"><param name="DeviceFont" value="0"><param name="EmbedMovie" value="0"><param name="BGColor" value=""><param name="SWRemote" value=""><param name="MovieData" value=""><param name="SeamlessTabbing" value="1"><param name="Profile" value="0"><param name="ProfileAddress" value=""><param name="ProfilePort" value="0"><param name="AllowNetworking" value="all"><param name="AllowFullScreen" value="false"><embed src="http://serviceapi.nmv.naver.com/flash/NFPlayer.swf?vid=491A2A60E5A172F8A2885508428F425CE66C&amp;outKey=V128c1e2b480b4285c737994d9e29bba102c3f6d980f4e377baca994d9e29bba102c3" wmode="transparent" width="500" height="408" allowscriptaccess="always" name="NFPlayer30707" id="NFPlayer30707" allowfullscreen="true" type="application/x-shockwave-flash" /></object><br></p><p>Several features that play to Saltlux's strengths stand out, and the attention to detail shows throughout. <br>Overall it looks like a well-designed system. 
(Of course, I would have to actually use it to know for sure.) </p>Notable strengths <ol><li>The text-mining features (clustering, summarization, extraction, similar-document search) can fairly be called a strength. </li><li>The integrated hardware appliance keeps cost, management, and installation simple. </li><li>I'm not sure how much it will be used, but it can search objects inside documents such as “images”, “graphs”, and “tables”. </li><li>It offers document history and version management. </li></ol><p>A few things to consider: </p><ol><li>The key to shared-folder search is surely the ACL (document access permissions). <br>According to the “SearchBox” feature description, however, user permissions are checked “when a document is downloaded”. <br>That means even users without permission can find every document at search time. <br>Isn't that a serious security weakness? <br>I understand the technical difficulty, but this is not something to wave away just because it is hard. </li><li>When “integrated search” is implemented, isn't the “file server” normally made searchable along with everything else? <br>If so, a system that searches only the “file server” seems likely to have limited market appeal. </li><li>Couldn't shared-document search also be implemented with an ordinary desktop search package? </li><li>Isn't it entering the market too late compared with vendor K, which has been selling a similar product as a package for about two years? </li></ol><br/><br/>tag : <a href="/tag/솔트룩스" rel="tag">솔트룩스</a>,&nbsp;<a href="/tag/서치박스" rel="tag">서치박스</a>			 ]]> 
		</description>
		<category>기업검색</category>
		<category>솔트룩스</category>
		<category>서치박스</category>

		<comments>http://esconsult.egloos.com/1456715#comments</comments>
		<pubDate>Tue, 21 Apr 2009 07:55:30 GMT</pubDate>
		<dc:creator>슈퍼맨</dc:creator>
	</item>
	<item>
		<title><![CDATA[ Search Engine Developer Groups ]]> </title>
		<link>http://esconsult.egloos.com/1405903</link>
		<guid>http://esconsult.egloos.com/1405903</guid>
		<description>
			<![CDATA[ 
  <p>One site popular with people who work with search engines, and with students who want to try building one, is the <a href="http://irgroup.org/zbxe/" target="_blank">Search Engine Developer Group (irgroup)</a>. <br>I'm a member too, and I get a lot of information there.</p><p>Is there a similar group(?) outside Korea?</p><p>There is.</p><p><a href="http://tech.groups.yahoo.com/group/search_dev/" target="_blank">search_dev ( Independent Search Engine Developers)</a></p><p>It differs from irgroup in two ways: it is a newsgroup rather than a website, and where irgroup focuses on developing search engines, search_dev focuses on using the major vendors' search engines.</p><p>The discussions are quite lively, so take a look if you're interested.</p><p>Link of interest: <a href="http://tech.groups.yahoo.com/group/search_dev/message/719?threaded=1&amp;p=1" target="_blank">Japanese search in autonomy</a></p><br/><br/>tag : <a href="/tag/검색엔진" rel="tag">검색엔진</a>,&nbsp;<a href="/tag/개발자그룹" rel="tag">개발자그룹</a>,&nbsp;<a href="/tag/irgroup" rel="tag">irgroup</a>,&nbsp;<a href="/tag/search_dev" rel="tag">search_dev</a>			 ]]> 
		</description>
		<category>검색엔진</category>
		<category>개발자그룹</category>
		<category>irgroup</category>
		<category>search_dev</category>

		<comments>http://esconsult.egloos.com/1405903#comments</comments>
		<pubDate>Fri, 13 Mar 2009 08:07:33 GMT</pubDate>
		<dc:creator>슈퍼맨</dc:creator>
	</item>
	<item>
		<title><![CDATA[ Enterprise Search Market News in Brief ]]> </title>
		<link>http://esconsult.egloos.com/1403104</link>
		<guid>http://esconsult.egloos.com/1403104</guid>
		<description>
			<![CDATA[ 
  <p><a href="http://www.wisenut.co.kr/entry/WISE-IF" target="_blank">Korea Wisenut develops a new intelligent information collector</a> <br>--&gt; So what's actually different???</p><p><a href="http://www.dt.co.kr/contents.html?article_no=2009011402010660600001" target="_blank">Konan wins a string of search projects from SK affiliates</a> <br>--&gt; As expected</p><p><a href="http://www.dt.co.kr/contents.html?article_no=2009031102010860600002" target="_blank">Netkiller to supply the 'Google Mini' enterprise search appliance</a> <br>--&gt; Will it do well?</p><p><a href="http://code.google.com/p/xappy/" target="_blank">Python-based open source search engine: Xappy</a> <br>--&gt; Yet another open source search engine has appeared...</p><br/><br/>tag : <a href="/tag/기업검색" rel="tag">기업검색</a>			 ]]> 
		</description>
		<category>기업검색</category>
		<category>기업검색</category>

		<comments>http://esconsult.egloos.com/1403104#comments</comments>
		<pubDate>Wed, 11 Mar 2009 02:05:07 GMT</pubDate>
		<dc:creator>슈퍼맨</dc:creator>
	</item>
	<item>
		<title><![CDATA[ Information Access Cross-Check - 2009 ]]> </title>
		<link>http://esconsult.egloos.com/1393902</link>
		<guid>http://esconsult.egloos.com/1393902</guid>
		<description>
			<![CDATA[ 
  <p>This is this year's evaluation of search engine vendors, <a href="http://www.cmswatch.com/Trends/1517-Search-Marketplace-Stabilizing-After-Years-of-Turbulence?source=RSS" target="_blank">published</a> by <a href="http://www.cmswatch.com/" target="_blank">CMS Watch</a>. <br>Compare it with the <a href="http://esconsult.egloos.com/936275" target="_blank">report</a> that Gartner <a href="http://mediaproducts.gartner.com/reprints/microsoft/vol6/article4/article4.html" target="_blank">published</a> last fall and you will notice several differences.<br></p><img class="image_left" border="0" onmouseover="this.style.cursor='pointer'" alt="" src="http://pds10.egloos.com/pds/200903/04/30/f0057030_49ae2030ae635.jpg" width="400" height="277.6" onclick="Control.Modal.openDialog(this, event, 'http://pds10.egloos.com/pds/200903/04/30/f0057030_49ae2030ae635.jpg');" align="left" /><p></p><br/><br/>tag : <a href="/tag/기업검색" rel="tag">기업검색</a>			 ]]> 
		</description>
		<category>기업검색</category>
		<category>기업검색</category>

		<comments>http://esconsult.egloos.com/1393902#comments</comments>
		<pubDate>Wed, 04 Mar 2009 06:29:01 GMT</pubDate>
		<dc:creator>슈퍼맨</dc:creator>
	</item>
	<item>
		<title><![CDATA[ Open Source Filter - Tika ]]> </title>
		<link>http://esconsult.egloos.com/1393875</link>
		<guid>http://esconsult.egloos.com/1393875</guid>
		<description>
			<![CDATA[ 
  <p>The documents a search engine indexes are generally plain text. <br>Documents stored in binary form, such as MS Word or PDF files, therefore go through a module called a filter, which extracts their content as text.</p><p>Many search engines in Korea use the filter module from <a href="http://www.synap.co.kr/main/main.php" target="_blank">Synapsoft</a> under a recurring license contract. <br>It has shown fairly stable performance over a long period in both portal search and enterprise search.</p><p>Naturally, the open source world needs such a module too.</p><p>That is why <a href="http://lucene.apache.org/tika/" target="_blank">Tika</a> is being developed as a subproject of Apache Lucene; version 0.2 is the latest release so far. <br>(Perhaps unsurprisingly) it does not support Korean word-processor formats such as Hangul (HWP), Hunminjeongeum, or Jungum Global, but if you use Lucene or Solr and only need to filter Microsoft documents and PDFs, it is worth considering.</p><p>Currently <a href="http://lucene.apache.org/tika/formats.html" target="_blank">supported document formats</a> <br>------------- <br></p><h5>Microsoft's OLE 2 Compound Document format</h5><p>A number of Microsoft applications, most notably the Microsoft Office suite, use the generic OLE 2 Compound Document format as the basis of their document formats. Tika uses <a href="http://poi.apache.org/">Apache POI</a> to support a number of these formats.</p><p>The OLE2 Compound Document format is designed for use with random access files, and so the input stream passed to a Tika parser needs to be spooled in memory or in a temporary file depending on the size of the document. See <a href="https://issues.apache.org/jira/browse/TIKA-153">TIKA-153</a> for an effort to avoid this extra temporary file if the input document already comes from a file.</p><p>In addition to the shared base format there's also a shared set of metadata in typical OLE2 documents. 
Tika uses the <a href="http://poi.apache.org/hpsf/">HPSF library</a> from POI to parse these property sets and exposes them as the following document metadata:</p><ul><li><tt>TITLE</tt> Title </li><li><tt>SUBJECT</tt> Subject </li><li><tt>AUTHOR</tt> Author </li><li><tt>KEYWORDS</tt> Keywords </li><li><tt>COMMENTS</tt> Comments </li><li><tt>TEMPLATE</tt> Template </li><li><tt>LAST_SAVED</tt> Last Saved By </li><li><tt>REVISION_NUMBER</tt> Revision Number </li><li><tt>LAST_PRINTED</tt> Last Printed </li><li><tt>LAST_SAVED</tt> Last Saved Time/Date </li><li><tt>PAGE_COUNT</tt> Number of Pages </li><li><tt>WORD_COUNT</tt> Number of Words </li><li><tt>CHARACTER_COUNT</tt> Number of Characters </li><li><tt>APPLICATION_NAME</tt> Name of Creating Application </li></ul><p>Note that in practice the metadata in many documents is either missing, incomplete or even incorrect, so a client application should not rely too much on this information.</p><p>Support for the new Office Open XML format used by Microsoft Office version 2007 is pending a POI upgrade. Current status is recorded in <a href="https://issues.apache.org/jira/browse/TIKA-152">TIKA-152</a> .</p><p>The generic OLE2 Compound Document format is automatically detected using a magic number, and further parsing can automatically determine the more specific document format. Tika also knows a number of common glob patterns like <tt>*.doc</tt> and <tt>*.ppt</tt> for these formats.</p><p>The supported OLE 2 Compound Document formats are:</p><dl><dt>Microsoft Excel (application/vnd.ms-excel) </dt><dd>Excel spreadsheet support is available in all versions of Tika and is based on the <a href="http://poi.apache.org/hssf/">HSSF library</a> from POI. 
<p>The Excel parser in Tika uses the <a href="http://poi.apache.org/hssf/how-to.html#event_api">HSSF event API</a> and is able to extract much of the document structure, including all (non-empty) worksheets and their table structures. Formula results are extracted as stored in the Excel file, and cell links are exposed as XHTML links. These features were added in Tika version 0.2.</p><p>Cell comments and formatting are currently not supported. See <a href="https://issues.apache.org/jira/browse/TIKA-148">TIKA-148</a> and <a href="https://issues.apache.org/jira/browse/TIKA-103">TIKA-103</a> for the respective issues.</p><p>See the <a href="http://lucene.apache.org/xref-test/org/apache/tika/parser/microsoft/ExcelParserTest.html">ExcelParserTest</a> test case for an example of parsing Microsoft Excel files.</p><p></p></dd><dt>Microsoft Word (application/msword) </dt><dd>Word document support is available in all versions of Tika and is based on the <a href="http://poi.apache.org/hwpf/">HWPF library</a> from POI. <p>The Word parser uses the <a href="http://poi.apache.org/apidocs/org/apache/poi/hwpf/extractor/WordExtractor.html">WordExtractor</a> class from HWPF to extract document content as a sequence of paragraphs.</p><p>See the <a href="http://lucene.apache.org/xref-test/org/apache/tika/parser/microsoft/WordParserTest.html">WordParserTest</a> test case for an example of parsing Microsoft Word files.</p><p></p></dd><dt>Microsoft PowerPoint (application/vnd.ms-powerpoint) </dt><dd>PowerPoint presentation support is available in all versions of Tika and is based on the <a href="http://poi.apache.org/hslf/">HSLF library</a> from POI. 
<p>The PowerPoint parser uses the <a href="http://poi.apache.org/apidocs/org/apache/poi/hslf/extractor/PowerPointExtractor.html">PowerPointExtractor</a> class from HSLF to extract presentation content as a single paragraph.</p><p>See the <a href="http://lucene.apache.org/xref-test/org/apache/tika/parser/microsoft/PowerPointParserTest.html">PowerPointParserTest</a> test case for an example of parsing Microsoft PowerPoint files.</p><p></p></dd><dt>Microsoft Visio (application/vnd.visio) </dt><dd>Visio diagram support was added in Tika version 0.2 and is based on the <a href="http://poi.apache.org/hdgf/">HDGF library</a> from POI. <p>The Visio parser uses the <a href="http://poi.apache.org/apidocs/org/apache/poi/hdgf/extractor/VisioTextExtractor.html">VisioTextExtractor</a> class from HDGF to extract diagram content as a sequence of paragraphs.</p><p></p></dd><dt>Microsoft Outlook (application/vnd.ms-outlook) </dt><dd>Outlook message support was added in Tika version 0.2 and is based on the <a href="http://poi.apache.org/hsmf/">HSMF library</a> from POI. <p>The Outlook parser extracts the subject of the message and the From, To, Cc, and Bcc addresses (formatted for display) along with the body text of text/plain messages. The <tt>AUTHOR</tt> , <tt>TITLE</tt> and <tt>SUBJECT</tt> metadata properties are set explicitly, overriding potential generic document metadata retrieved from OLE2 property sets.</p></dd></dl><h5>Compression formats</h5><p>General purpose compression formats are used to reduce the size of any kind of document. Tika uses a parsing pipeline to support general purpose compression: in the first stage the compressed stream is decompressed and the resulting decompressed stream is passed on to a second parsing stage where it will be processed as if the document had never been compressed.</p><p>Tika contains magic numbers and glob patterns for auto-detecting all supported compression formats. 
The glob patterns of compression formats are also used to determine the name of the original uncompressed document. If a client application has supplied a <tt>RESOURCE_NAME_KEY</tt> metadata property that matches such a glob pattern, then the decompressing first parsing stage will replace the <tt>RESOURCE_NAME_KEY</tt> metadata property with the deduced original document name before passing control to the second parsing stage.</p><p>Note that apart from the special handling of the <tt>RESOURCE_NAME_KEY</tt> property, no document metadata is passed to or from the second parsing stage. Only the text content extracted by the second stage parser is returned to the client application.</p><p>The supported compression formats are:</p><dl><dt>gzip compression (application/x-gzip) </dt><dd><a href="http://en.wikipedia.org/wiki/Gzip">Gzip</a> support was added in Tika version 0.2 and is based on the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/zip/GZIPInputStream.html">GZIPInputStream</a> class in the Java 5 class library. <p>The known gzip glob patterns are <tt>*.tgz</tt> , <tt>*.gz</tt> and <tt>*-gz</tt> , and they will respectively be replaced with <tt>*.tar</tt> , <tt>*</tt> and <tt>*</tt> as described above.</p><p></p></dd><dt>bzip2 compression (application/x-bzip) </dt><dd><a href="http://en.wikipedia.org/wiki/Bzip2">Bzip2</a> support was added in Tika version 0.2 and is based on bzip2 parsing code from <a href="http://ant.apache.org/">Apache Ant</a> , which in turn was originally based on work by Keiron Liddle from Aftex Software. <p>The known bzip2 glob patterns are <tt>*.tbz</tt> , <tt>*.tbz2</tt> , <tt>*.bz</tt> and <tt>*.bz2</tt> , and they will respectively be replaced with <tt>*.tar</tt> , <tt>*.tar</tt> , <tt>*</tt> and <tt>*</tt> as described above.</p></dd></dl><h5>Other supported formats</h5><dl><dt>Extensible Markup Language (application/xml) </dt><dd>Tika uses the <tt>javax.xml</tt> classes to parse Extensible Markup Language files. 
Support for Extensible Markup Language files was added in Tika 0.1. </dd><dt>HyperText Markup Language (text/html) </dt><dd>Tika uses the <a href="http://sourceforge.net/projects/nekohtml">CyberNeko</a> library to parse HyperText Markup Language files. Support for HyperText Markup Language files was added in Tika 0.1. </dd><dt>Images (image/*) </dt><dd>Tika uses the <tt>javax.imageio</tt> classes to extract Metadata from Image files. Support for Image files was added in Tika 0.2. </dd><dt>Java class files </dt><dd>The parsing of Java Class files is based on the asm library and work by Dave Brosius in JCR-1522. Support for Java Class files was added in Tika 0.2. </dd><dt>Java jar archives </dt><dd>The parsing of Java JAR archives is performed using a combination of the ZIP and Java class file parsers. Support for Java JAR archives was added in Tika 0.2. </dd><dt>MP3 Audio (audio/mp3) </dt><dd>The parsing of <a href="http://www.id3.org/ID3v1">ID3v1</a> tags from MP3 files was added in Tika version 0.2. If found the following metadata is extracted and set: <ul><li><tt>TITLE</tt> Title </li><li><tt>SUBJECT</tt> Subject </li></ul><p>The above information, as well as the <tt>Album</tt> , <tt>Track</tt> , <tt>Year</tt> , <tt>Genre</tt> and additional <tt>Comment</tt> are extracted when set in the file.</p><p></p></dd><dt>OpenDocument (application/vnd.oasis.opendocument.*) </dt><dd>TODO </dd><dt>Plain text (text/plain) </dt><dd>Tika uses the <a href="http://www.icu-project.org/">International Components for Unicode</a> Java library (ICU4J) to parse plain text. Support for plain text was added in Tika 0.1. 
<p>Extracting text content from plain text files is actually a relatively complex task due to the fact that the character encoding of the text file is often unknown to the parser.</p><p>The text parser in Tika uses the ICU4J <a href="http://www.icu-project.org/apiref/icu4j/com/ibm/icu/text/CharsetDetector.html">CharsetDetector</a> class to automatically detect the character encoding of any text input. As an added benefit, the ICU4J library is in some cases able to detect also the language in which the text is written.</p><p>The character encoding and language of the plain text document are returned as the <tt>Metadata.CONTENT_ENCODING</tt> and <tt>Metadata.LANGUAGE</tt> metadata properties. If the (declared) content encoding of a text document is already known to the client application, then it can be supplied as the <tt>Metadata.CONTENT_ENCODING</tt> metadata property to the parser to simplify encoding detection.</p><p></p></dd><dt>Portable Document Format (application/pdf) </dt><dd>Tika uses the <a href="http://www.pdfbox.org/">PDFBox</a> library to parse Portable Document Format (PDF) documents. Support for PDF was added in Tika 0.1. </dd><dt>Rich Text Format (application/rtf) </dt><dd>Tika uses Java's built-in Swing library to parse Rich Text Format (RTF) documents. Support for RTF was added in Tika 0.1. <p>The RTF parser in Tika uses the Swing <a href="http://java.sun.com/j2se/1.5.0/docs/api/javax/swing/text/rtf/RTFEditorKit.html">RTFEditorKit</a> class to extract all text from an RTF document as a single paragraph. Document metadata extraction is currently not supported.</p><p></p></dd><dt>tar archive (application/x-tar) </dt><dd>Tika uses an adapted version of the tar parsing code from <a href="http://ant.apache.org/">Apache Ant</a> to parse tar archives. The tar code is originally based on work by Timothy Gerard Endres. Support for tar archives was added in Tika 0.2. 
</dd><dt>ZIP archive (application/zip) </dt><dd>Tika uses Java's built-in Zip classes to parse ZIP files. Support for ZIP was added in Tika 0.2. </dd></dl><br/><br/>tag : <a href="/tag/Filter" rel="tag">Filter</a>,&nbsp;<a href="/tag/OpenSource" rel="tag">OpenSource</a>,&nbsp;<a href="/tag/Tika" rel="tag">Tika</a>			 ]]> 
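<![CDATA[
<p>The gzip handling quoted above (decompress first, deduce the original file name from the glob pattern, then parse the decompressed stream) can be sketched with the Java class library alone. This is a minimal sketch and not Tika's implementation: the class <tt>GzipPipelineSketch</tt> and its method names are hypothetical, and the "second stage" here only decodes UTF-8 text where Tika would hand the stream to a real parser.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical illustration class; none of these names belong to Tika's API. */
final class GzipPipelineSketch {

    /** Applies the gzip glob rules quoted above: *.tgz -> *.tar, *.gz -> *, *-gz -> *. */
    static String deduceOriginalName(String name) {
        if (name.endsWith(".tgz")) {
            return name.substring(0, name.length() - 4) + ".tar";
        }
        if (name.endsWith(".gz") || name.endsWith("-gz")) {
            return name.substring(0, name.length() - 3);
        }
        return name; // no known gzip pattern: leave the name unchanged
    }

    /** Stage 1 decompresses; stage 2 (a stand-in for a real parser) decodes UTF-8 text. */
    static String extractText(byte[] gzipped) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream plain = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                plain.write(buffer, 0, n);
            }
            return new String(plain.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper that produces gzip-compressed sample input for the demo. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(compressed)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return compressed.toByteArray();
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("hello tika");
        // prints "notes.txt: hello tika"
        System.out.println(deduceOriginalName("notes.txt.gz") + ": " + extractText(compressed));
    }
}
```

<p>Keeping the name deduction separate from decompression mirrors the description above, where only the resource name is rewritten before the second stage runs; a real filter would pass the deduced name along so the second-stage parser can pick the right format.</p>
]]>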
		</description>
		<category>SOLR</category>
		<category>Filter</category>
		<category>OpenSource</category>
		<category>Tika</category>

		<comments>http://esconsult.egloos.com/1393875#comments</comments>
		<pubDate>Wed, 04 Mar 2009 06:06:36 GMT</pubDate>
		<dc:creator>슈퍼맨</dc:creator>
	</item>
</channel>
</rss>
