
Why Hypertable chose C++ over Java for its implementation

Why We Chose C++ Over Java

This document is to clarify our position regarding C++ vs. Java for choice of implementation language. There are two fundamental reasons why C++ is superior to Java for this particular application.

  1. Hypertable is memory (malloc) intensive. Hypertable caches all updates in an in-memory data structure (e.g. an STL map). Periodically, these in-memory data structures get spilled to disk. These spilled disk files get merged together to form larger files when their number reaches a certain threshold. The performance of the system is, in large part, dictated by how much memory it has available to it. Less memory means more spilling and merging, which increases load on the network and the underlying DFS. It also increases the CPU work required of the system, in the form of extra heap-merge operations. Java is a poor choice for memory-hungry applications. In particular, in managing a large in-memory map of key/value pairs, Java's memory performance is poor in comparison with C++. It's on the order of two to three times worse (if you don't believe me, try it).
  2. Hypertable is CPU intensive. There are several places where Hypertable is CPU intensive. The first place is the in-memory maps of key/value pairs. Traversing and managing those maps can consume a lot of CPU. Plus, given Java's inefficient use of memory with regard to these maps, the processor caches become much less effective. A recent run of the tool Calibrator (http://monetdb.cwi.nl/Calibrator/) on one of our 2GHz Opterons yields the following statistics:
    caches:
    level   size     linesize    miss-latency          replace-time
    1        64 KB    64 bytes    6.06 ns =  12 cy      5.60 ns =  11 cy
    2       768 KB   128 bytes   74.26 ns = 149 cy     75.90 ns = 152 cy
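
Point 1 above describes spilling the in-memory map to disk and later heap-merging the spilled files. The following is only a minimal C++17 sketch of that idea, not Hypertable's actual code: `KeyValue`, `SortedRun`, `spill` and `heap_merge` are hypothetical names, and a `std::vector` stands in for a spilled disk file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not Hypertable's actual classes.
using KeyValue = std::pair<std::string, std::string>;
using SortedRun = std::vector<KeyValue>;  // stand-in for one spilled file

// "Spill" the in-memory map: copy its entries (already in key order)
// into a run and empty the map so it can accept new updates.
SortedRun spill(std::map<std::string, std::string>& cache) {
    SortedRun run(cache.begin(), cache.end());
    cache.clear();
    return run;
}

// Merge several key-sorted runs into one key-sorted run with a min-heap.
// Entries with equal keys come out in run order (lower run index first).
SortedRun heap_merge(const std::vector<SortedRun>& runs) {
    // Heap entry: (key, run index, position inside that run).
    using Entry = std::tuple<std::string, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    std::size_t total = 0;
    for (std::size_t r = 0; r < runs.size(); ++r) {
        // Precondition: every run is already sorted by key.
        assert(std::is_sorted(runs[r].begin(), runs[r].end(),
                              [](const KeyValue& a, const KeyValue& b) {
                                  return a.first < b.first;
                              }));
        total += runs[r].size();
        if (!runs[r].empty()) {
            heap.emplace(runs[r][0].first, r, std::size_t{0});
        }
    }

    SortedRun merged;
    merged.reserve(total);
    while (!heap.empty()) {
        auto [key, r, i] = heap.top();  // smallest remaining key
        heap.pop();
        merged.push_back(runs[r][i]);
        if (i + 1 < runs[r].size()) {
            heap.emplace(runs[r][i + 1].first, r, i + 1);
        }
    }
    return merged;
}
```

Each pop and push on the heap costs O(log k) for k runs, so the less memory is available, the more runs are spilled and the more of this merge work has to be repeated, which is the CPU cost the text refers to.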


Author: 大恐龙 - Categories: 'C/C++/VC/GNU' 'DataBase' 'System'

Hypertable, a high-performance database (1)

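    Today I had another question for the oracle, so I typed "high performance open source database" into Google. "Hypertable" stood right at the top of the results, and after slogging through the English text for a while I found that it is yet another pleasantly surprising toy.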

    “Hypertable is a high performance distributed data storage system designed to support applications requiring maximum performance, scalability, and reliability.”

     High performance, distributed availability, scalability: in short, they are saying it is awesome... but is it really?

    “This project is for the design and implementation of a high performance, scalable, distributed storage and processing system for structured and unstructured data. It is designed to manage the storage and processing of information on a large cluster of commodity servers, providing resilience to machine and component failures. Data is represented in the system as a multi-dimensional table of information. The data in a table can be transformed and organized at high speed by performing computations in parallel, pushing them to where the data is physically stored. ”

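     So they say: it is designed for structured and unstructured data alike, and it processes information on a large cluster of servers (cloud computing?). No single point of failure, a multi-dimensional table space, fast parallel reads of the data, and independence from the physical layer. It makes my mouth water.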

    “Modeled after Google's well known Bigtable project, Hypertable is designed to manage the storage and processing of information on a large cluster of commodity servers......”

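     Heh, sure enough, it really does aim to become a Bigtable. Very good, very powerful, and enough to dispel any hesitation about checking it out with svn.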

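     It turns out they no longer use SVN but the newer git version control system, which is fancy enough. I have never been fond of perl, but this time I still think git is worth a try.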

[Image: git.JPG]

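It is written in a mix of C++ and Java, and I do not know whether it runs on Windows. It uses hadoop for distributed storage and also includes a simple tokenizer. I have not used MinGW for a long time, but I believe it can still be made to run on Windows. It is not that I dislike Linux (this dinosaur is, after all, an administrator of both AIX and HP-UX); building on Windows is mainly to make follow-on development more convenient.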

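      I have long been sketching out a kind of database environment, inspired by building a search engine with lucene and hadoop. What we really need is an environment like this. First, large capacity, unlimited capacity, because we do not know how big the data will get, how long the tables will be, or how many tables there will be. Second, a cheap runtime environment. AIX? Oracle? Don't be ridiculous; should we hand all our profit to IBM? Finally, the system must be easy to maintain, or rather maintenance-free: run on countless ordinary PCs with only the simplest maintenance operations, just as the machines in The Matrix farm humans for energy.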

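     This dinosaur calls the architecture a data city: farms produce data, warehouses store data, factories process data, and markets sell data. All of these functions run on countless cheap PCs and blend seamlessly into the IT environment, forming a biosphere. Such environments have already appeared in websites and large online services, Bigtable being one example, and they will inevitably be turned into products and enter the enterprise. I have been sketching out such a system myself. My initial idea was to write a series of packages wrapping the lucene API so that it could be accessed through something like HSL, with a middle layer that uses lucene + hadoop to build unstructured storage. Hypertable seems very close to this idea, and discovering it has taught me a good deal.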

 

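12:02, my shift is over, and I am heading home to sleep. Today's lesson: always trust the oracle, bring every question to the oracle, and never forget to keep thinking about how to kick the oracle off its throne.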


Author: 大恐龙 - Categories: 'DataBase' 'System'

Some everyday MySQL operations, for my own reference!

1. Show the details of all current connections:
./mysqladmin -uadmin -p -h10.140.1.1 processlist


2. Show only the current number of connections (Threads is the connection count):
./mysqladmin -uadmin -p -h10.140.1.1 status
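
If you want to pick the Threads value out of that status line in a program, here is a minimal C++17 sketch. It assumes the output is a single line of `Name: value` fields such as `Uptime: 8  Threads: 3  Questions: 2  Slow queries: 0`; `parse_threads` is a hypothetical helper name, not part of MySQL.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Extract the number that follows "Threads:" in one line of
// `mysqladmin status` output (format assumed as in the lead-in).
// Returns std::nullopt if the field is missing or has no digits.
std::optional<long> parse_threads(const std::string& status_line) {
    const std::string label = "Threads:";
    std::size_t pos = status_line.find(label);
    if (pos == std::string::npos) return std::nullopt;

    // Skip the whitespace between the label and the value.
    pos = status_line.find_first_not_of(" \t", pos + label.size());
    if (pos == std::string::npos) return std::nullopt;

    // Find the end of the run of digits.
    std::size_t end = status_line.find_first_not_of("0123456789", pos);
    if (end == pos) return std::nullopt;  // no digits after the label

    std::size_t len =
        (end == std::string::npos) ? std::string::npos : end - pos;
    return std::stol(status_line.substr(pos, len));
}
```

A small monitoring tool could read the output of the `status` command above from standard input and pass each line to `parse_threads`.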


Author: 大恐龙 - Categories: 'MySQL' 'DataBase' 'System'