Weekly Reading 180328

Below are excerpts I have accumulated over the past 2-3 months. I am clearing the backlog, so some of them may no longer look "fresh".

1, Dubbo Source Code Analysis, Part 9: Graceful Shutdown
http://manzhizhen.iteye.com/blog/2404220
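Over the past few days, while working with dubbo-admin, I had a chance to read some of the Dubbo code and got a better feel for the shutdown approach described in the article.
In version 2.5.3 the interaction between dubbo and dubbo-admin arguably has bugs in several places; compare, for example, the implementation of com.alibaba.dubbo.registry.integration.RegistryProtocol in 2.5.3 and in 2.5.8 (the doChangeLocalExport/notify methods).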

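2, Generating requirements.txt for a Python project
Many of the Python projects I look at do not ship a requirements.txt, which makes them time-consuming to set up. Today I learned that pip freeze can be used:

A Python project should include a requirements.txt file that records all dependencies and their exact version numbers, so that a new environment can be deployed easily.
Generate it with pip inside a virtual environment:
(venv) $ pip freeze >requirements.txt
This only works well together with virtualenv; otherwise every package in the whole environment gets listed.
Using pipreqs
The advantage of this tool is that it scans the project directory, discovers which libraries are actually used, and generates the dependency list automatically.
The drawback is that the result may be slightly off, so it needs to be checked and adjusted by hand.
# pip install pipreqs
# Usage is simple as well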
pipreqs ./
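
As a rough illustration of the scanning idea (this is not pipreqs's actual implementation), the sketch below uses Python's standard ast module to collect the top-level names of the modules a piece of source imports. The function name top_level_imports and the sample source are made up for the example, and mapping import names to PyPI package names and versions is deliberately left out.

```python
import ast


def top_level_imports(source: str) -> set:
    """Return the top-level module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as c" contributes "a"
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Relative imports ("from . import x") point inside the project; skip them.
            if node.level == 0 and node.module:
                found.add(node.module.split(".")[0])
    return found


# Hypothetical source text standing in for a project file.
sample = (
    "import os\n"
    "import requests.adapters\n"
    "from flask import Flask\n"
    "from . import local\n"
)
print(sorted(top_level_imports(sample)))
```

Standard-library modules such as os still show up in the result, which is one reason a generated list needs manual review.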

3, JMX Best Practices Explained
http://shift-alt-ctrl.iteye.com/blog/2404103 Listed here, from the article, are several MXBean implementations already built into Java.

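1. BufferPoolMXBean:
   Resource information about "direct" and "mapped" buffers. If your application is a network IO system (for example, built on Netty) or performs a lot of file operations, you should consider watching this MXBean.
2. ClassLoadingMXBean:
   Resource information about JVM class loading. If your application is a serialization component, a script-integration component, or uses many proxy classes (including dynamic loading, OSGI), you should watch this MXBean.
3. GarbageCollectorMXBean:
   Resources related to JVM GC, including GC duration, GC count, and the associated memory state.
4. MemoryPoolMXBean:
   Resource information about the "memory pools" in the JVM; it can be used together with MemoryManagerMXBean. An application may have several memory pool instances; the list of pools can be obtained through MemoryManagerMXBean, and each pool's usage and GC-related information can then be inspected.
5. OperatingSystemMXBean:
   Resource information about the operating system, such as CPU load.
6. PlatformManagedObject:
   The base interface that all Java platform MXBeans, including the ones above, extend; applications normally should not implement it.
7. RuntimeMXBean:
   Information about the runtime, such as VM arguments and version.
8. ThreadMXBean:
   Resource information about runtime thread state, such as "high-CPU threads" and "deadlocked threads"; it can help optimize concurrent operations.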

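In practice I have rarely used MXBeans for metrics or hacks. Kafka, for example, also ships some built-in MXBeans as a reference for the system's running state, and there are tools in the industry that can ship these MXBean values to something like Grafana for display.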

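4, The aesthetics of force: "A single spark can start a prairie fire"
I remember a poet reciting something like "In spring I sow the seeds of hope; in autumn I harvest hope", or a line to that effect.
But Mao's "A single spark can start a prairie fire" (星星之火可以燎原) is only eight characters, and it clearly operates on a higher plane.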

5, Latency Sensitive Microservices
https://www.infoq.com/presentations/microservices-trading-system

6, An architecture comparison of the distributed database middleware TDDL, Amoeba, Cobar, and MyCAT. The article is concise and clear.
https://www.jianshu.com/p/ed54162d720c

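7, Significant Software Development Developments of 2017
You will need your own VPN to reach it. A memo of several major software development events of 2017; I had not realized so much happened last year.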
http://marxsoftware.blogspot.com/2017/12/big-news-2017.html

8, After countless pitfalls: the evolution of Meituan-Dianping's highly available database architecture. A good read.
https://tech.meituan.com/数据库高可用架构的演进与设想.html
MMM (Master-Master replication manager for MySQL) -> MHA (MySQL Master High Availability) -> MHA + Zebra (DAL)

9, How to design a DNS
http://www.infoq.com/cn/articles/how-to-design-dns

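10, Core technology of the 2017 Double 11 revealed: Alibaba databases enter the era of network-wide, second-level real-time monitoring | Alibaba Middleware Team Blog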
http://jm.taobao.org/2017/12/27/20172703/

11, Notes on Linux IO, the disk chapter - 朱小厮's blog - CSDN Blog
A simple analysis of disk performance; worth bookmarking for future reference.
http://blog.csdn.net/u013256816/article/details/78945085

12, Peter Norvig, a master of the field
http://norvig.com/

13, Docker Inc. is dead! | IT瘾
http://itindex.net/detail/57847-docker-%E5%85%AC%E5%8F%B8
Installing kubernetes on mac and running echoserver - 简书
https://www.jianshu.com/p/a42eeb66a19c
I read this a long time ago. I think some later articles disagreed with it, but it is still worth a look.

14, Uber engineering year in review
https://eng.uber.com/2017-highlights/
https://eng.uber.com/2017-open-source/

15, Kafka#3: Distributed design - 程序园
http://www.voidcn.com/article/p-risahvbp-ez.html

16, How is a monitoring platform handling ten billion visits built? - 运维派
http://www.yunweipai.com/archives/24462.html

17, Scenario-based targeting and ranking for Meituan-Dianping alliance ads
https://tech.meituan.com/targeting_agentscore.html

18, Cryptocurrencies and consensus mechanisms in traditional distributed systems | 温国兵的随想录
https://dbarobin.com/2017/12/27/blockchain-consensus/

19, The secrets of Kafka's high throughput - 友盟 blog - SegmentFault
https://segmentfault.com/a/1190000003985468
A discussion of Kafka's log retention policy - huxihx - 博客园
http://www.cnblogs.com/huxi2b/p/8042099.html

20, JVMTI reference
http://blog.caoxudong.info/blog/2017/12/07/jvmti_reference

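21,
If you use the Kafka new-consumer API, i.e. your consumption progress is stored in __consumer_offsets, then when

the kafka client sets props.put("auto.commit.interval.ms", "3000")

the client commits its offsets every 3 seconds whether or not it has consumed any new data, and the server writes every commit to __consumer_offsets. The server does no deduplication, so an offset record is appended to the __consumer_offsets topic every three seconds.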
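
To get a feel for the volume this implies, here is a small arithmetic sketch. It assumes a single consumer that keeps running and polling for a full day; the function name commits_per_day is made up for the example.

```python
def commits_per_day(auto_commit_interval_ms: int) -> int:
    """Number of auto-commits one always-on consumer issues in 24 hours."""
    ms_per_day = 24 * 60 * 60 * 1000
    return ms_per_day // auto_commit_interval_ms


print(commits_per_day(3000))  # 28800
print(commits_per_day(5000))  # 17280 (5000 ms is the client default)
```

Every one of those commits reaches the broker even when the committed offset has not moved.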

22, Several articles on Intel Meltdown & Spectre and their impact on Spark/Elasticsearch. Good news.
http://www.infoq.com/cn/news/2018/01/meltdown-spectre
https://databricks.com/blog/2018/01/13/meltdown-and-spectre-performance-impact-on-big-data-workloads-in-the-cloud.html
https://www.elastic.co/blog/performance-impact-of-meltdown-on-elasticsearch

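23, The newly named Java Champions include a JVM (SUN/OpenJDK) expert whom I have heard speak both on the InfoQ site and in person. The videos on the site are well worth listening to.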
https://www.infoq.com/news/2018/01/JavaChampions2017

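24, The abstract sounds good
After more than 400 days of polishing, what are the new breakthroughs in HSF's architecture and performance?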
http://jm.taobao.org/2018/01/16/post180116/

25, JSON-B and Yasson explained: the new official Java EE API specification, and Yasson, Eclipse's implementation of it
http://blog.csdn.net/chszs/article/details/79059116
In the blink of an eye, it seems the rename to Jakarta EE has already been settled.

26, How Elasticsearch supports deep pagination
http://arganzheng.life/deep-pagination-in-elasticsearch.html

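I learned something here: for result sets beyond the default limit of 10000 hits, besides ES's Scan and scroll API, the article also covers the official Search After mechanism, which requires ES 5.0 or later.

search_after is used much like scroll, but unlike scroll it is stateless and carries no search context overhead.
It is also computed in real time on every request, so there is no consistency problem (conversely, if the index changes, the sort order can change between requests).
Compared with from+size, however, it shares scroll's limitation: you can only page forward sequentially and cannot jump to an arbitrary page.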
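
The cursor idea behind search_after can be mimicked in memory. The sketch below is not an Elasticsearch client: it sorts a list of dicts by (timestamp, id) and returns the hits that come strictly after the previous page's last sort values. The field names timestamp and id, and the function name search_after_page, are assumptions made for the example; the id field plays the role of a unique tiebreaker in the sort.

```python
def search_after_page(docs, size, search_after=None):
    """Return the next `size` docs ordered by (timestamp, id), starting after the cursor."""
    ordered = sorted(docs, key=lambda d: (d["timestamp"], d["id"]))
    if search_after is not None:
        cursor = tuple(search_after)
        ordered = [d for d in ordered if (d["timestamp"], d["id"]) > cursor]
    return ordered[:size]


# Made-up documents; several share a timestamp, so the id tiebreaker matters.
docs = [{"id": i, "timestamp": 100 + i // 2} for i in range(5)]

cursor = None
pages = []
while True:
    page = search_after_page(docs, size=2, search_after=cursor)
    if not page:
        break
    pages.append([d["id"] for d in page])
    # The sort values of the last hit become the cursor for the next request.
    cursor = (page[-1]["timestamp"], page[-1]["id"])

print(pages)
```

Because each request only needs the previous page's last sort values, nothing is held between requests, which is also why there is no way to jump straight to page N.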

27, Elasticsearch Performance Tuning Practice at eBay
https://www.ebayinc.com/stories/blogs/tech/elasticsearch-performance-tuning-practice-at-ebay/
I will not list everything; InfoQ also has a translation. A brief summary:

From our experience, if the index is smaller than 1G, it’s fine to set the shard number to 1. For most scenarios, we can leave the shard number as the default value 5, but if shard size exceeds 30GB, we should increase the shard number to split the index into more shards. The shard number cannot be changed once an index is created, but we can create a new index and use the reindex API to move data.
Increase refresh interval. As we mentioned in the tune indexing performance section, Elasticsearch creates new segment every time a refresh happens. Increasing the refresh interval would help reduce the segment count and reduce the IO cost for search. And, the cache would be invalid once a refresh happens and data is changed. Increasing the refresh interval can make Elasticsearch utilize cache more efficiently.
Use filter context instead of query context if possible. A query clause is used to answer “How well does this document match this clause?” A filter clause is used to answer “Does this document match this clause?” Elasticsearch only needs to answer “Yes” or “No.” It does not need to calculate a relevancy score for a filter clause, and the filter results can be cached. See Query and filter context for details. 
Node query cache. Node query cache only caches queries that are being used in a filter context. Unlike a query clause, a filter clause is a "Yes" or "No" question. Elasticsearch used a bit set mechanism to cache filter results, so that later queries with the same filter will be accelerated. Note that only segments that hold more than 10,000 documents (or 3% of the total documents, whichever is larger) will enable a query cache. For more details, see All about caching.
We can use the following request to check whether a node query cache is having an effect.
GET index_name/_stats?filter_path=indices.**.query_cache
Sort by _doc if you don’t care about the order in which documents are returned
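
As an illustration of the filter-context and _doc-sort advice, here is a minimal sketch that builds a search request body as a plain Python dict. The field names status and @timestamp and the helper name filtered_query are hypothetical; the body would still have to be sent with whichever client you use.

```python
import json


def filtered_query(term_field, term_value, range_field, gte):
    """Build a search body whose clauses all run in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {term_field: term_value}},
                    {"range": {range_field: {"gte": gte}}},
                ]
            }
        },
        # Sorting by _doc avoids relevance ordering when the order does not matter.
        "sort": ["_doc"],
    }


body = filtered_query("status", "error", "@timestamp", "now-1h")
print(json.dumps(body, indent=2))
```

Putting both clauses under bool.filter rather than bool.must is what keeps them out of scoring.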

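28, I am planning to suggest to the Kafka project that the default partition count of __consumer_offsets be changed to 30.
Kafka creates __consumer_offsets with 50 partitions by default. 50 = 5×5×2, so unless the number of Kafka brokers divides 50 (for example 5 or 10), the partitions will not be spread evenly, and the partition assignment rule hash(group_id) % 50 may not give ideal results either. 30 = 2×3×5 offers more choices, although hash(group_id) % 30 would still need to be validated against real data.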
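
The divisibility argument can be checked with a few lines. The sketch below lists the broker counts that split 50 and 30 partitions evenly, and maps a group id to a partition under the assumption that the broker uses abs(hashCode(group_id)) % partition_count with Java's String.hashCode. The helper names and the group ids are made up for the example.

```python
def divisors(n: int) -> list:
    """All positive divisors of n, i.e. the broker counts that split n partitions evenly."""
    return [d for d in range(1, n + 1) if n % d == 0]


def java_string_hash(s: str) -> int:
    """Java's String.hashCode() for BMP-only strings, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def offsets_partition(group_id: str, num_partitions: int) -> int:
    """Assumed mapping: abs(hashCode(group_id)) % num_partitions."""
    h = java_string_hash(group_id)
    # Treat Integer.MIN_VALUE as 0, since its absolute value does not fit in an int.
    h = 0 if h == -(1 << 31) else abs(h)
    return h % num_partitions


print(divisors(50))
print(divisors(30))
for group in ("group-a", "group-b", "group-c"):  # made-up group ids
    print(group, offsets_partition(group, 50), offsets_partition(group, 30))
```

30 has eight divisors against six for 50, which is the "more choices" point; how evenly real group ids spread over the partitions still has to be measured on actual data.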

29, Spring Boot 2.0 New Features: Infrastructure Changes
https://springuni.com/spring-boot-2-infrastructure-changes/
Worth knowing about.

30,
“Go out there and have huge dreams, then show up to work the next morning and relentlessly incrementally achieve them.”
~ from the book, How Google Works

31,
https://www.elastic.co/blog/categorizing-non-english-log-messages-in-machine-learning-for-elasticsearch

32, Amusing. I can already imagine the "guru", if he knows of this in the afterlife, asking: "Surprised? Delighted?"
https://www.theverge.com/2018/2/16/17020246/apple-park-headquarters-employees-injury-glass-doors-design

33, Worth a look
https://rcoh.me/posts/cache-oblivious-datastructures/

34, An intelligent fault localization system based on log traces - technical lead of Baidu's search operations team
http://www.infoq.com/cn/presentations/intelligent-fault-location-system-based-on-log-trace

35, Language design, worth a look
https://www.iravid.com/posts/slick-and-shapeless.html

36, A WAF implementation in Java
https://www.yangguo.info/2018/01/11/Gateway/
With so many open-source API gateway solutions available, why did we still choose to build our own?
https://github.com/chengdedeng/waf

37,
http://blog.llvm.org/2018/03/clang-is-now-used-to-build-chrome-for.html

38, Read critically instead of jumping at every rumor
http://www.flax.co.uk/blog/2018/03/02/no-elastic-x-pack-not-going-open-source-according-elastic/

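39, Miscellaneous:
Native ad recommendation in practice at Meituan Waimai
http://www.infoq.com/cn/presentations/the-recommended-practice-of-meituan-takeout-ad
Alibaba's road to monitoring
http://www.infoq.com/cn/presentations/the-road-of-monitoring-in-alibaba
The architectural evolution of Alibaba's multi-site active-active and same-city dual-active deployments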
http://www.infoq.com/cn/presentations/the-structure-of-alibaba-double-living-in-the-same-city

40, Steve Jobs: Everything in this world… was created by people no smarter than you.
http://muratbuffalo.blogspot.com/2018/02/think-before-you-code.html

41,
https://fosdem.org/2018/schedule/event/unix_evolution/

42, A few interesting Kibana tricks
https://logz.io/blog/kibana-hacks/

43, Quickly locating production faults: JVM process CPU usage above 100%
http://blog.csdn.net/flysqrlboy/article/details/79314521
Key steps:
top -Hbp <pid>    to locate the problem thread
jstack <pid> | grep <thread_id>    to find the offending code
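
One detail the two commands gloss over: top -H prints thread ids in decimal, while jstack prints them as hexadecimal nid=0x... values, so the id has to be converted before grepping. A minimal sketch of that conversion follows; the helper names and the sample dump lines are made up for illustration.

```python
def tid_to_nid(tid: int) -> str:
    """Convert a decimal thread id from top -H into the nid token printed by jstack."""
    return "nid=" + hex(tid)


def find_thread(jstack_output: str, tid: int) -> list:
    """Return the thread header lines of a jstack dump that belong to the given thread id."""
    needle = tid_to_nid(tid)
    # Compare whole tokens so that nid=0x3039 does not also match nid=0x30391.
    return [line for line in jstack_output.splitlines() if needle in line.split()]


# Made-up excerpt in the shape of jstack thread header lines.
sample_dump = (
    '"worker-1" #12 prio=5 os_prio=0 tid=0x00007f0001 nid=0x3039 runnable\n'
    '"worker-2" #13 prio=5 os_prio=0 tid=0x00007f0002 nid=0x30391 waiting on condition\n'
)

print(tid_to_nid(12345))  # nid=0x3039
print(find_thread(sample_dump, 12345))
```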

44, http://blog.codefx.org/java/application-class-data-sharing/

45,
http://colobu.com/2018/03/12/Concurrency-Utilities-Enhancements-in-Java-8-Java-9/

46, From the Stack Overflow survey
https://insights.stackoverflow.com/survey/2018/

47, Antifragility in Kubernetes
http://www.infoq.com/cn/articles/antifragility-in-kubernetes

48, Java 10 is here; take a look at the brand-new JIT compiler released with it
https://mp.weixin.qq.com/s/fNDBX6pxw2Xa5afZpVaBEg
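Compared with other JVM subsystems such as the interpreter and the GC, the JIT compiler does not depend on low-level language features such as direct memory access. It can be viewed as a black box that takes Java bytecode as input and emits machine code, and how it is implemented depends on the developers' requirements for development efficiency, maintainability, and so on. Graal is a compiler for Java bytecode written mainly in Java. Compared with C1 and C2, which are implemented in C++, it is more visibly modular and easier to maintain. Graal can act as a dynamic compiler, compiling hot methods at run time, or as a static compiler for AOT compilation. In Java 10, Graal ships as an experimental JIT compiler (JEP 317). This article covers Graal's use in dynamic compilation; for static compilation, see JEP 295 or Substrate VM.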