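Office 2007 Word: Pictures Not Displaying
Pictures in my Office 2007 Word documents suddenly stopped displaying. Only an empty white frame appeared, although the original image flickered into view while I resized it. I searched online and found the following fix:
1. Click the Office button in the upper-left corner (unique to Office 2007) and go to Word Options -> Advanced.
2. In the third section, "Show document content", clear the check box in front of the third item, "Show picture placeholders", then click OK, and everything is fine again. This default setting in Office 2007 seems quite unreasonable, though perhaps it is meant to boost usage of the Full Screen Reading view.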
Anchor Text
Wikipedia: The anchor text or link label is the visible, clickable text in a hyperlink. The words contained in the anchor text can determine the ranking that page will receive by search engines.
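Anchor text is very important. A simple experiment makes this importance clear.
Search for “click here” on http://www.google.com/ . The first result on the first page is a page from http://www.adobe.com/ , followed by http://www.xe.com/ , http://www.apple.com/ , http://www.microsoft.com/ and others (all of these have a PageRank of 9 or 10; I will check on that later).
None of these pages contains the keyword “click here”, so why do they rank at the top?
The reason: a great many web pages link to these sites using “click here” as the anchor text.
Go take a look now: Google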
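The “click here” effect can be illustrated with a minimal sketch. It assumes a toy index that credits each anchor phrase to the page the link points at, which is only one of many signals a real search engine uses. The link data and the function names `build_anchor_index` and `search_by_anchor` are hypothetical and invented for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (source page, anchor text, target URL).
links = [
    ("a.example/page1", "click here", "http://www.adobe.com/"),
    ("b.example/page2", "Click Here", "http://www.adobe.com/"),
    ("c.example/page3", "click here", "http://www.apple.com/"),
    ("d.example/page4", "currency converter", "http://www.xe.com/"),
]

def build_anchor_index(links):
    """Map each anchor phrase to a count of inbound links per target URL."""
    index = defaultdict(Counter)
    for _source, anchor, target in links:
        index[anchor.lower()][target] += 1
    return index

def search_by_anchor(index, query):
    """Rank targets by how many inbound links use the query as anchor text."""
    counts = index.get(query.lower(), Counter())
    return [url for url, _count in counts.most_common()]

anchor_index = build_anchor_index(links)
print(search_by_anchor(anchor_index, "click here"))
```

Note that the target pages never need to contain the phrase themselves: the ranking comes entirely from the text of the links pointing at them.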
Lucene: An Introduction to a Java-Based Full-Text Search Engine
Click through to read the original article.
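Lucene is a Java-based full-text indexing toolkit.
Introduction to Lucene, a Java-based full-text indexing engine: about the author and the history of Lucene
Implementing full-text search: comparing Lucene full-text indexes with database indexes
Introduction to Chinese word segmentation: comparing dictionary-based and automatic segmentation algorithms
Installation and usage: system structure overview and demonstration
Hacking Lucene: a simplified query parser, implementing deletion, custom sorting, and extending the application interface
What else we can learn from Lucene
References: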
Apache: Lucene Project
http://jakarta.apache.org/lucene/
Lucene developer and user mailing list archives
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/
http://www.mail-archive.com/lucene-user@jakarta.apache.org/
The Lucene search engine: Powerful, flexible, and free
http://www.javaworld.com/javaworld/jw-09-2000/jw-0915-Lucene_p.html
Lucene Tutorial
http://www.darksleep.com/puff/lucene/lucene.html
Notes on distributed searching with Lucene
http://home.clara.net/markharwood/lucene/
Chinese word segmentation
http://www.google.com/search?sourceid=navclient&hl=zh-CN&q=chinese+word+segment
Overview of search engine tools
http://searchtools.com/
Papers and patents by Cutting, the author of Lucene
http://lucene.sourceforge.net/publications.html
A .NET implementation of Lucene: dotLucene
http://sourceforge.net/projects/dotlucene/
Another project by Cutting, the author of Lucene: Nutch, a Java-based search engine
http://www.nutch.org/
http://sourceforge.net/projects/nutch/
A comparison of dictionary-based and N-gram word segmentation
http://china.nikkeibp.co.jp/cgi-bin/china/news/int/int200302100112.html
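2005-01-08: Cutting's lecture on Lucene at the University of Pisa, a very detailed walkthrough of the Lucene architecture
Special thanks to Jack Xu (许良杰), former CTO of NetEase, for his guidance: you are the one who brought me into the search engine industry.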
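Introduction to Nutch [reposted from the Nutch Chinese website]
Nutch is an open-source search engine implemented in Java. It provides all the tools we need to run our own search engine. But why would we need to build our own search engine? After all, we already have Google. Here are three reasons: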
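Transparency: Nutch is open source, so anyone can see how its ranking algorithm works. The ranking algorithms of commercial search engines are secret, and we have no way of knowing how a given result ordering was computed. Furthermore, some search engines, such as Baidu, allow paid ranking, and results ordered that way do not reflect the relevance of a site's content. Nutch is therefore a good choice for academic search and for government sites, where a fair ranking matters a great deal.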
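Understanding search engines: we do not have Google's source code, so studying Nutch is a good alternative. Learning how a large distributed search engine works is very rewarding. Nutch draws heavily on both academia and industry. For example, the core of Nutch has now been reimplemented with MapReduce. Anyone who has watched Kai-Fu Lee's talks probably knows a little about MapReduce: it is a distributed processing model that was first proposed at Google. You can find more information at the links below.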
http://www.domolo.com/bbs/list.asp?boardid=29
http://domolo.oicp.net/bbs/list.asp?boardid=29
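Nutch has also attracted many researchers who are keen to try out new search algorithms, because Nutch is very easy to extend.
Extensibility: do you dislike the way other search engines present their results? Then write your own search engine with Nutch. Nutch is very flexible: it can be customized and integrated into your application, and with its plugin mechanism it can serve as a search platform for many different kinds of content. The simplest use, of course, is to integrate Nutch into your site and offer search to your users.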
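Nutch can be installed at three levels: on a local file system, on an intranet, or on the whole Internet. Each has its own characteristics. For example, indexing a local file system is far more reliable than the other two, because there are no network errors and no need to keep cached copies of files. Internet-wide search is the other extreme: crawling thousands of pages raises many technical problems. Which pages do we start from? How do we divide up the crawl work? When do we need to re-crawl? How do we handle broken links, unresponsive sites, and duplicate content? And how do we serve hundreds of concurrent queries against a large dataset? Building such a search engine is no small investment. In “Building Nutch: Open Source Search”, the authors Mike Cafarella and Doug Cutting sum it up as follows:
… a fully functional search system: a 100-million-page index serving 2 searches per second costs about $800 per month. A 1-billion-page index serving 50 page requests per second costs roughly $30,000 per month.
This article shows how to set up Nutch on a medium-sized site. The first part focuses on crawling: Nutch's crawl architecture, how to run a crawl, and what the crawl produces. The second part looks at search: how to run the Nutch search application and how to customize Nutch.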
Nutch Vs. Lucene
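Nutch is built on Lucene. Lucene provides Nutch with its text indexing and search API. A common question is: should I use Lucene or Nutch? The simplest answer is that if you do not need to crawl data, you should use Lucene. The typical scenario is that you already have a data source and need to provide a search page for it. In that case, the best approach is to read the data directly from the database and index it with the Lucene API. Chinese-language users can refer to WebLucene or the series of articles by Che Dong (车东). If you need help with Chinese word segmentation, you can also contact the author. http://domolo.oicp.net/bbs/list.asp?boardid=24 Lucene in Action by Erik Hatcher and Otis Gospodnetić describes this process in detail. Nutch is suited to sites whose underlying database you cannot access directly, or to widely scattered data sources.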
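Architecture
Overall, Nutch divides into two parts: the crawler and the searcher. The crawler fetches pages and builds an inverted index from the fetched data; the searcher queries the inverted index to answer user requests. The interface between the crawler and the searcher is the index, and both use the fields stored in it.
In practice, the searcher and the crawler can run on separate machines.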
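The split between crawler and searcher can be sketched in a few lines. This is only an illustration of the inverted-index idea under simplifying assumptions (whitespace tokenization, in-memory sets); it is not Nutch or Lucene code, and the names `build_inverted_index` and `search_index` are made up for the example.

```python
from collections import defaultdict

def build_inverted_index(pages):
    """Crawler side: map each term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def search_index(index, query):
    """Searcher side: return the URLs that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical fetched pages: URL -> extracted text.
pages = {
    "http://example.com/a": "open source search engine",
    "http://example.com/b": "open source crawler",
}
inverted = build_inverted_index(pages)
print(sorted(search_index(inverted, "open source")))
print(sorted(search_index(inverted, "search engine")))
```

The searcher only ever touches the index, never the pages themselves, which is why the two halves can live on different machines.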
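Let us look at the crawler part of Nutch first.
The crawler:
The crawler is driven by Nutch's crawl tool. This is a set of tools for building and maintaining several data structures: the web database, a set of segments, and the index. We explain these three data structures one by one below.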
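The web database, or WebDB, is a specialized persistent data structure that mirrors the structure and properties of the web being crawled. It stores all structural data and properties of the sites from the start of the crawl onward, including re-crawls. The WebDB is used only by the crawler; the searcher does not use it. It stores two kinds of entities: pages and links. A page represents a page on the web. It is indexed by its URL and by an MD5 hash of its content. Other information about the page is stored as well, including the number of links in the page (outlinks), fetch information (for pages that are re-fetched), and a score that expresses the page's importance. A link represents a link from one web page to another. The WebDB can therefore be seen as a web graph in which pages are the nodes and links are the edges.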
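To make the page and link entities concrete, here is a minimal in-memory sketch of such a web graph. The class and field names (`Page`, `WebGraph`, `content_md5`, `num_outlinks`, `score`) are hypothetical and chosen for this illustration; the real WebDB is an on-disk structure with its own format.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Page:
    url: str               # identifies the page
    content_md5: str       # MD5 hash of the page content
    num_outlinks: int = 0  # number of links found in the page
    score: float = 1.0     # page-level score (assumed default)

@dataclass
class WebGraph:
    pages: dict = field(default_factory=dict)  # url -> Page (nodes)
    links: set = field(default_factory=set)    # (from_url, to_url) pairs (edges)

    def add_page(self, url, content, outlinks):
        """Record a fetched page and one edge per outgoing link."""
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()
        self.pages[url] = Page(url, digest, len(outlinks))
        for target in outlinks:
            self.links.add((url, target))

    def inlinks(self, url):
        """Return the URLs of pages that link to the given URL."""
        return sorted(src for src, dst in self.links if dst == url)

graph = WebGraph()
graph.add_page("http://example.com/", "<html>home</html>",
               ["http://example.com/a", "http://example.com/b"])
graph.add_page("http://example.com/a", "<html>a</html>",
               ["http://example.com/b"])
print(graph.inlinks("http://example.com/b"))
```

Two pages with identical content produce the same hash, which is one simple way a crawler can spot duplicate content.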
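A segment is a collection of pages that have been fetched and indexed. The segment's fetchlist is the list of URLs for the crawler to fetch, and it is generated from the WebDB. The fetcher output is the set of pages fetched from the fetchlist. The fetcher output is first indexed, and the resulting index is stored in the segment. A segment has a limited lifetime: once the next round of crawling has re-fetched its pages, it is no longer needed. The default re-fetch interval is 30 days, so it is safe to delete segments older than that, which also saves a good deal of disk space. Segments are named by date and time, so their age is easy to see at a glance.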
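Because segment names encode their creation time, finding the ones that are safe to delete is straightforward. The sketch below assumes names of the form `yyyyMMddHHmmss`; that pattern is an assumption made for this example, so check the names in your own segments directory before relying on it. The helper `expired_segments` is hypothetical.

```python
from datetime import datetime, timedelta

SEGMENT_NAME_FORMAT = "%Y%m%d%H%M%S"   # assumed date-plus-time naming pattern
REFETCH_INTERVAL = timedelta(days=30)  # the default re-fetch interval from the text

def expired_segments(segment_names, now):
    """Return the segment names whose timestamp is older than the re-fetch interval."""
    expired = []
    for name in segment_names:
        created = datetime.strptime(name, SEGMENT_NAME_FORMAT)
        if now - created > REFETCH_INTERVAL:
            expired.append(name)
    return expired

names = ["20050101120000", "20050220093000"]
print(expired_segments(names, now=datetime(2005, 3, 1)))
```

Passing `now` explicitly keeps the function easy to test; a cleanup script would call it with `datetime.now()`.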
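The index is the inverted index of all pages the system has fetched. It is not built directly from the pages; it is produced by merging the indexes of many small segments. Nutch uses Lucene for indexing, so all the Lucene tools and APIs can be used to work with the index. Note that Lucene's notion of a segment is completely different from Nutch's notion of a segment, so do not confuse the two. See the related articles by Che Dong (车东) at www.chedong.com. In short, a Lucene segment is a part of a Lucene index, whereas a Nutch segment is the fetched and indexed portion of the WebDB.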
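Merging per-segment indexes into one index can be pictured as a union of posting sets. This is a conceptual sketch only, using plain dictionaries of term to URL set; it does not reflect how Lucene actually merges index files, and `merge_segment_indexes` is a name invented here.

```python
from collections import defaultdict

def merge_segment_indexes(segment_indexes):
    """Combine several term -> set-of-URLs indexes into a single index."""
    merged = defaultdict(set)
    for segment_index in segment_indexes:
        for term, urls in segment_index.items():
            merged[term] |= set(urls)
    return dict(merged)

# Hypothetical indexes from two crawl rounds.
segment_1 = {"search": {"http://example.com/a"}, "open": {"http://example.com/a"}}
segment_2 = {"search": {"http://example.com/c"}, "crawler": {"http://example.com/b"}}
merged_index = merge_segment_indexes([segment_1, segment_2])
print(sorted(merged_index["search"]))
```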
The Best Tools for Visualization[ZZ]
Visualization is a technique to graphically represent sets of data. When data is large or abstract, visualization can help make the data easier to read or understand. There are visualization tools for search, music, networks, online communities, and almost anything else you can think of. Whether you want a desktop application or a web-based tool, many specific tools are available on the web that let you visualize all kinds of data. Here are some of the best:
Visualize Social Networks
Last.Forward: Thanks to Last.fm’s new widget gallery, you can now explore a wide selection of extras to extend your Last.fm experience. The gallery hosts widgets for your desktop, for the web, for social networks, and much more. One of the better tools in the gallery, last.forward, is open source software that lets you map out any last.fm user and their connections. The web site for the software appears to be in German, but the “Download” button still works. And once it was downloaded and installed, I had no trouble using it myself.
Friends Sociomap: Friends Sociomap is another Last.fm tool that generates a map of the music compatibility between you and your Last.fm friends.
Fidg’t: Fidg’t is a desktop application that gives you a way to view your network’s tagging habits. You can see what kind of music your network is into, or what kind of pictures they are taking. The Fidg’t Visualizer allows you to play around with your network. To use Fidg’t, you interface with the Visualizer through Flickr and LastFM tags, using any tag to create what they call a “Magnet.” Once a Tag Magnet is created, members of the network will gravitate towards it if they have photos or music with that same Tag. You can also search through the network for certain users, and see their recent photos and music. The Fidg’t interface is beautiful, too.
The Digg Tools:
Digg.com has some of the best web-based visualization tools on the net, so they’re a must for any visualization list.
- Pics: Digg Pics is the latest tool that tracks the activity of images on the site with images that slide in from the left as people submit them and digg them.
- Arc: Digg Arc displays stories, topics, and containers wrapped around a sphere. The more diggs, the thicker the arcs.
- BigSpy: Digg BigSpy places stories at the top of the screen as they are dugg. Bigger stories have more diggs.
- Stack: Digg Stack shows diggs in real time, with diggs falling from the top of the screen. As stories get more diggs, they’re shown in brighter colors.
- Swarm: Digg Swarm draws circles for stories as they’re dugg. Diggers swarm around stories which makes them grow and get brighter.
One more: Digg Radar. Although this is an unofficial visual aid, Digg Radar is worth a look too. With Digg Radar, you wait and watch for buttons to appear on the map which indicate that a person has Dugg a story. Hover over the button to see their username. Click it to see details about the story, with links to the Digg page or directly to the article.
YouTube:
You can discover related videos using YouTube’s visualizations. To use this feature, go to a YouTube video, click on the full-screen button, and then click on the small button that shows a network. You’ll see a lot of video balloons appear and the configuration will change when you hover over a button.
Visualize Music
- Liveplasma and Musicovery let you discover new music.
- Tuneglue music map is a “relationship explorer,” similar to LivePlasma. Using data from Amazon and Last.fm, Tuneglue explores relationships between musical artists.
- Moody lets you tag your music collection with colors. They also have a color-coded web player. (our coverage)
- The Echo Nest is an audio analysis tool which takes an mp3 file, breaks it up into little segments, and gives pitch, loudness, and high-level timbral descriptions of each one of those segments. The program maps a subset of this audio data onto a visual scale and creates video playback of the song. (more)
- An interactive harmony model of music which geometrically describes relationships in harmony. The model can be a visualization tool for songwriters or students of music.
- Musiclens gives music recommendations and presents your current mood and musical taste as a diagram.
- Shape Of Song: What does music look like?
- Musicmap: connections are represented as connected lines; they create a web.
Last.fm music visual tools:
- Last Graph: Create artist wave graphs from your musical history in PDF and SVG format.
- Extra Stats: Colorful Stats and tag clouds.
Visualize the Internet
- Opte is a project that lets you graphically map the internet. The data represented and collected here serves a multitude of purposes: Modeling the Internet, analyzing wasted IP space, IP space distribution, detecting the result of natural disasters, weather, war, and esthetics/art.
- Akamai Technologies, who deliver 15-20% of all web traffic, offered up some interesting tools last year for viewing their traffic data. (Our coverage) From their flagship app, the Real-time Web Monitor, which shows countries with the most traffic, to the Network Performance Comparison app, Akamai’s tools are an interesting way to see the web in real time. In all, they offer 6 Flash-based apps to the public.
- Other internet traffic visualizations include the Internet Health Report and the Internet Traffic Report.
- MantaRay displays the geographical placement of MBONE infrastructure (Multi-cast backbone) of the internet. Otter displays topological views of the (same) multicast infrastructure.
- Packet Garden is an app that watches your Internet traffic and builds a private world that you can later explore.
- Mapnet is a Java applet to visualize the topologies of backbones of major U.S. Internet Service Providers.
- Websites as graphs. An HTML DOM Visualizer Applet, which displays sites as graphs depending on the amount of links, tables, div tags, images, forms and other tags.
Amazon
- LivePlasma: music discovery (see also music section of this list)
- Flowser is another flash-based Amazon visualization for search.
- BrowseGoods is a visualization that lets you zoom and pan Amazon’s catalog of products.
- Tuneglue music map is a “relationship explorer,” similar to LivePlasma. Using data from Amazon and Last.fm, Tuneglue explores relationships between musical artists. (see also music section of this list)
- Coverpop is more of an art project that lets you browse Amazon via a collage.
- Amaztype, a typographic book search, collects the information from Amazon and presents it in the form of keyword you’ve provided. To get more information about a given book, simply click on it.
Flickr
- Taglines lets you visualize Flickr tags over time.
- Flickrvision: view real-time flickr photos on a map.
- Flickrtime is a tool that uses Flickr API to present the uploaded images in real-time. The images form the clock which shows the current time.
Some details on these: see “Alternative ways to browse Amazon” (our coverage)
Miscellaneous
- Visual Thesaurus: The Visual Thesaurus is an interactive dictionary and thesaurus which creates word maps that blossom with meanings and branch to related words.
- Twittervision: view real-time tweets on a map.
- 17 More Ways to Visualize Twitter
- All the ways to visualize del.icio.us, collected here.
- Three Views shows three views of the earth, in which each country is represented by a circle that shows the amount of money spent on the military (size of circle) and what fraction of the country’s earnings that uses (color).
- We Feel Fine shows human feelings calculated from a large number of weblogs.
- Interactive History Timeline presents the history of Great Britain, divided into interactive data blocks.
- Winning Lotto Numbers shows the frequency of appearance of every number from one year to the next one.
- Language Poster – the history of programming languages
Sites Dedicated to Visualization
- IBM’s Many Eyes (our coverage) is a shared visualization and discovery service offering all kinds of visualizations you can explore or create.
- Informationarchitects.jp presents the 200 most successful websites on the web, ordered by category, proximity, success, popularity and perspective in a mindmap.
- VisualComplexity.com is an online collection of visualizations (our coverage)
- Infosthetics discusses the aesthetics of data visualization
- Blogger Anonymous Professor is into visualization, offering visualizations like the 3D visualization/tour of classical music/composers, Visualization of the StumbleUpon network, the value of a Digg and more.
- Zip Codes visualized
Search
Heatmaps:
Heatmaps site CrazyEgg applies heatmaps to tracking what visitors do on a user’s website. Their software captures user clicks on each page and then presents a summary in the form of a heatmap. Other heatmap sites include Feng-GUI and FuseStats. Summize applies heatmaps to shopping via their search engine (our coverage here, here and here).
Visualizing the Power Struggle in Wikipedia displays the most popular articles and the most frequent search queries in the heatmap.
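The basic idea behind a click heatmap, capturing clicks and summarizing them by location, can be sketched as a grid count. This is a generic illustration, not how CrazyEgg or any other service named above works; the function `click_heatmap`, the 100-pixel cell size, and the sample coordinates are all assumptions.

```python
from collections import Counter

def click_heatmap(clicks, cell_size=100):
    """Count clicks per (column, row) grid cell, each cell_size pixels square."""
    return Counter((x // cell_size, y // cell_size) for x, y in clicks)

# Hypothetical click coordinates in page pixels.
clicks = [(10, 20), (90, 40), (150, 30), (120, 260)]
heat = click_heatmap(clicks)
for cell, count in sorted(heat.items()):
    print(cell, count)
```

A renderer would then map each cell's count to a color, with hotter colors for cells that received more clicks.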
Visual Search Engines:
- Riya’s Like.com: the first true visual search engine; it does visual search for shopping.
- Searchme: upcoming visual search for the web
- Xcavator: A photo search engine which utilizes visual clues that you provide to identify and extract similar pictures from large groups of digital images.
- ManagedQ: A visual search experiment with some built-in semantics. (our coverage)
- oSkope: Visual search engine for finding products that searches Amazon, Ebay, Flickr, Fotolia, Yahoo!Image Search and YouTube.
- Quintura: visual search engine that uses clouds, tags, and highlighting.
- Tafiti: Microsoft’s experimental visual search engine running on Silverlight.
- Retrievr is an experimental service which lets you search and explore in a selection of Flickr images by drawing a rough sketch.
- Mooter: Visual search engine that organizes results in clusters.
- KartOO: visual web search.
- SearchCrystal is a search visualization tool that lets you compare, remix and share results from sources on the web, whether sites, images, videos, blogs, news engines or RSS feeds. (see also KoolTorch)
- Spacetime: search Google, YouTube, RSS, eBay, Amazon, Yahoo!, Flickr and images all in one 3D space.
- grokker: web search or enterprise search offering map views of data.
- Burst Labs suggests similar or connected items to your search queries in a bubble
- UBrowser renders interactive web pages onto geometry using OpenGL and an embedded instance of Gecko
- walk2web – enter a URL, then visually browse web sites linked from it
- TouchGraph’s Amazon Browser, Google Browser, and LiveJournal Browser
News and RSS
- Voyage is an RSS feed reader which displays the latest news in the “gravity area”. News can be zoomed in and out. Navigation is possible with a timeline.
- Newsmap is an application that visually reflects the constantly changing landscape of the Google News news aggregator.
- Universe DayLife displays events, connections and news as circles which gravitate around the topic they are related to.
Data
- Swivel creates pie charts, diagrams and histograms.
- Xtimeline and Circavie let you create your own timelines
- The prefuse visualization toolkit: the beta version of a Java-based toolkit for programming applications with integrated data visualization methods
- Dataesthetics: Eric Blue on unusual data visualization methods.
- Smashing Magazine’s Diagram tool list
- America by the Numbers (Time.com)

