2018-05-09

Many Links 0509

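I have been piling up links for a long while, so I am switching to a short-note format. The series is renamed "Many Links", modeled on O'Reilly's Four Short Links.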
1.
Kafka Streams Topology Visualizer
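In its own words: "A tool helps visualizing stream topologies by generating nice looking diagrams from a kafka stream topology descriptions."
If you struggle to explain your Kafka Streams processing logic to other people, consider using this tool to generate a diagram. The output works well as an illustration in architecture documents.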
2.
The world beyond batch: Streaming 101
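An older article, a 101 on stream data processing. As it says of itself, it is "A high-level tour of modern data-processing concepts." Worth a read.
Its emphasis is clearly on understanding the different notions of "time" and on the granularity at which a stream is processed: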
Event time vs. processing time
Data processing patterns
Bounded data/Unbounded data — batch
Fixed windows/Time-agnostic
Filtering/Inner-joins/Windowing
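
To make the event time vs. processing time distinction concrete, here is a minimal sketch, not taken from the article, of fixed (tumbling) windows in Python. The function name, the sample events, and the 10-second window size are all assumptions for illustration.

```python
from collections import defaultdict

def fixed_window_counts(events, window_size):
    """Count events per fixed window, keyed by the window's start time.

    `events` is an iterable of (event_time, processing_time) pairs in
    seconds. Windows are assigned by event time only.
    """
    counts = defaultdict(int)
    for event_time, _processing_time in events:
        window_start = (event_time // window_size) * window_size
        counts[window_start] += 1
    return dict(counts)

# Hypothetical events in arrival order; the third one (event time 7)
# arrives late, at processing time 21.
events = [(3, 4), (12, 13), (7, 21), (15, 22)]

by_event_time = fixed_window_counts(events, window_size=10)
# Windowing by arrival instead: reuse processing time as the key.
by_processing_time = fixed_window_counts(
    [(p, p) for _e, p in events], window_size=10
)
```

The late event lands in its original window when grouped by event time, but in a later window when grouped by processing time, so the two results differ.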
3.

Presenting results with flame graphs
1. The Flame Graph project lives on GitHub: https://github.com/brendangregg/FlameGraph
2. You can clone it with git: git clone https://github.com/brendangregg/FlameGraph.git

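Taking perf as an example, here is how to use FlameGraph:
1. Step one: record samples (28591 is the PID of the target process)
$ sudo perf record -e cpu-clock -g -p 28591
After you stop it with Ctrl+C, the sample data is written to perf.data in the current directory.
2. Step two: parse perf.data with perf script
$ perf script -i perf.data > perf.unfold
3. Step three: fold the stacks in perf.unfold
$ ./stackcollapse-perf.pl perf.unfold > perf.folded
4. Finally, generate the SVG:
$ ./flamegraph.pl perf.folded > perf.svg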
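
The folding step is the least obvious one, so here is a rough Python sketch of the idea. It is not the stackcollapse-perf.pl implementation and does not parse real perf script output. It assumes each sample has already been reduced to a list of frame names ordered from root to leaf, and the sample data is invented.

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse sampled call stacks into 'root;...;leaf count' lines.

    Identical stacks are merged and their occurrences counted, which is
    the one-line-per-stack shape that flamegraph.pl takes as input.
    """
    counts = Counter(";".join(frames) for frames in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

# Hypothetical samples: two identical stacks and one different stack.
samples = [
    ["main", "parse", "read"],
    ["main", "parse", "read"],
    ["main", "render"],
]
folded = fold_stacks(samples)
```

Each folded line becomes one tower in the flame graph, and its count determines the width.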

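Canary, rolling, and blue-green releases: what exactly is the difference, and what are the key points?
Yang Bo (杨波) explains how they differ. I find the summary quite thorough.
Excerpt: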

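Below are some recommendations for choosing a release strategy, for companies at different stages:
1) Brute-force release is generally not recommended, except in dev/test environments, for non-critical applications where user experience is not sensitive, or as a last resort in a startup that is short of everything.
2) If you cannot yet build a fairly complex rolling-release tool and a matching smart LB, feature toggles are a good lightweight release technique: at relatively low cost, they let developers customize release logic flexibly.
3) Canary release validates a new version by letting a small number of new-version servers take production traffic, which significantly reduces risk. It fits most scenarios and is within reach of a typical growth-stage company.
4) For companies that have reached a certain business scale, where user experience is critical to the business, it is worth investing engineering resources in rolling-release tooling and a matching smart LB to get automated, zero-downtime releases. Rolling release is usually combined with canary release: deploy one canary to validate against traffic, then roll out incrementally in batches.
5) With the spread of lightweight virtualization (for example containers), the dual-server-group approach offers faster release and rollback and is an advanced technique worth investing in. Blue-green deployment applies only to dual server groups, while rolling release can be implemented on either a single server group or dual server groups.
6) For new features that touch critical core business, A/B testing can significantly reduce release risk. A/B testing is the only advanced release technique that supports production testing against specific user groups. It is not cheap, though, and is recommended for organizations with some engineering capacity.
7) For migrating or refactoring critical core business, the final big move for making sure nothing goes wrong is shadow testing, which has no impact at all on production traffic or users. Its cost and barrier to entry are both high, so it is recommended for organizations with enough business scale and engineering capacity.
8) These strategies are not mutually exclusive. A company often combines several release techniques as complements to get flexible release capability. For example, the mainstream approach is canary + rolling release, some business lines may use feature toggles where their scenario calls for it, and others may use advanced A/B test releases.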
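
The "canary + rolling" combination from points 4) and 8) can be sketched as a batch plan. This is only an illustration under assumed names (`rollout_batches`, the server IDs, a batch size of 2), not a tool described in the article. A real rollout would also check health between batches.

```python
def rollout_batches(servers, batch_size):
    """Return deployment batches: one canary first, then fixed-size batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not servers:
        return []
    canary, rest = servers[0], servers[1:]
    batches = [[canary]]  # a single canary validates the new version first
    for i in range(0, len(rest), batch_size):
        batches.append(rest[i:i + batch_size])  # then roll out incrementally
    return batches

# Hypothetical six-server group.
plan = rollout_batches(["s1", "s2", "s3", "s4", "s5", "s6"], batch_size=2)
```

Only the first batch is exposed if the canary misbehaves, and every later batch limits the blast radius to `batch_size` servers.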

Figure:
Release strategy comparison chart

https://zhuanlan.zhihu.com/p/35295839
6.
IntelliJ IDEA 2018.1 officially released
From 程序猿DD: what surprises does the latest IDEA 2018.1 bring us? (IDEA 2018.1 最新版本给我们带来哪些惊喜)

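Smarter auto-generation of stream code
while-loop improvements
Cleanup of redundant resource-closing operations
Automatic sorting of string arrays
Completeness hints for copy constructors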
…

https://www.oreilly.com/ideas/four-short-links-2-april-2018

1，Valve's Networking Code – a basic transport layer for games. The features are: connection-oriented protocol (like TCP)…but message-oriented instead of stream-oriented; mix of reliable and unreliable messages; messages can be larger than underlying MTU, the protocol performs fragmentation and reassembly, and retransmission for reliable; bandwidth estimation based on TCP-friendly rate control (RFC 5348); encryption; AES per packet, Ed25519 crypto for key exchange and cert signatures; the details for shared key derivation and per-packet IV are based on Google QUIC; tools for simulating loss and detailed stats measurement.
2，gron – grep JSON from the command line.
3，The Problem With Voting – I don't agree with all of the analysis, but the proposed techniques are interesting. I did like the term "lazy consensus" where consensus is assumed to be the default state (i.e., “default to yes”). The underlying theory is that most proposals are not interesting enough to discuss. But if anyone does object, a consensus seeking process begins. (via Daniel Bachhuber)
4，pix2code – open source code that generates Android, iOS, and web source code for a UI from just a photo. It's not coming for your job any time soon (over 77% of accuracy), but it's still a nifty idea. (via Two Minute Papers)

I have noticed that some blog pages use a cool chat-style widget for comments:
https://cn.wordpress.org/plugins/collectchat/
http://www.phpwechat.com/

https://github.com/tomnomnom/gron/
Parses JSON on the command line. In its own words:
Make JSON greppable!
gron transforms JSON into discrete assignments to make it easier to grep for what you want and see the absolute ‘path’ to it. It eases the exploration of APIs that return large blobs of JSON but have terrible documentation.
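
To show what "discrete assignments" means, here is a small Python sketch that imitates the idea. It is not gron's code, and its output format only approximates gron's. It assumes keys are plain identifiers, so keys that need quoting are not handled. The sample document is invented.

```python
import json

def flatten(value, path="json"):
    """Yield one assignment line per node of a parsed JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars are written back out as JSON literals.
        yield f"{path} = {json.dumps(value)};"

doc = json.loads('{"user": {"name": "ann", "tags": ["a", "b"]}}')
lines = list(flatten(doc))
# A plain substring filter now finds values together with their full path.
matches = [line for line in lines if "tags" in line]
```

Because every line carries its absolute path, an ordinary text search is enough to locate a value inside a deeply nested response.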

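Leveraging Elasticsearch for a 1,000% performance boost: I was a little surprised that the official blog still publishes articles like this, but it is worth a look as an introduction.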

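The opening is really about the mathematics behind lambda. Taking a philosophical angle, it quotes a mathematician's playful definition: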
One of my favorite definitions comes from Dr. Eugenia Cheng who says that Category Theory is the mathematics of mathematics.
Over the course of three months, I was fortunate enough to attend three awesome conferences: Lambda World http://www.lambda.world/ in October, ScalaIO in November, and Scala eXchange in December
https://www.47deg.com/blog/science-behind-functional-programming/
http://www.lambdadays.org/lambdadays2018
http://www.lambdadays.org/static/upload/media/15197229996020philipwadlercategoriesfortheworkinghacker.pdf
https://bartoszmilewski.com/
https://danielasfregola.com/
http://homepages.inf.ed.ac.uk/wadler/
http://eugeniacheng.com/

Understanding monoids, monads, applicatives, functors, and functions
http://www.ccs.neu.edu/home/dherman/research/tutorials/monads-for-schemers.txt
http://dev.stephendiehl.com/hask/#monads
https://www.zhihu.com/question/19635359
http://adit.io/posts/2013-04-17-functors,_applicatives,_and_monads_in_pictures.html
A few very good articles.
13.
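Alibaba's SRE system in an international environment (阿里巴巴国际环境下的SRE体系)
14.
Didi Chuxing: intelligent monitoring and fault localization in practice at massive data scale (滴滴出行海量数据场景下的智能监控与故障定位实践)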
15.
DSL-JSON library

Fastest JVM (Java/Android/Scala/Kotlin) JSON library with advanced compile-time databinding support. Compatible with DSL Platform.
Java JSON library designed for performance. Built for invasive software composition with DSL Platform compiler.

A brief introduction to the Elasticsearch write path
https://zhuanlan.zhihu.com/p/34875310?group_id=960576335035441152
44 tips for using Elasticsearch
http://mp.weixin.qq.com/s/ER70p1edqkScx_DAMSsuVA

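Recommended reading: a series by Wang Huaiyuan (王怀远), a distributed NoSQL developer at Alibaba Cloud. It is detailed and goes deep, and I was very tempted to paste large chunks of it here.
An analysis of Elasticsearch's distributed consistency principles (1): Nodes
https://zhuanlan.zhihu.com/p/34830403
Covers: ES cluster composition, node discovery, master election, failure detection, cluster scaling out and in, and a comparison with implementations such as Zookeeper and Raft.
An analysis of Elasticsearch's distributed consistency principles (2): Meta
https://zhuanlan.zhihu.com/p/35283785
Covers: how the master manages the cluster; the composition, storage, and recovery of Meta; the ClusterState update flow; and how to address the current consistency problems.
An analysis of Elasticsearch's distributed consistency principles (3): Data
https://zhuanlan.zhihu.com/p/35285514
Covers: background, the data write path, the PacificA algorithm, SequenceNumber, Checkpoint and failure recovery, and a comparison of ES with PacificA.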

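A fresh look at the BM25 similarity algorithm. After reading the article you will understand how shard distribution affects search results.
It comes from the official Elasticsearch blog and is thorough yet accessible. The series seems to be up to part three by now, though I have not finished part one.
In Elasticsearch 5.0, we switched to Okapi BM25 as our default similarity algorithm.
So beyond computing similarity, how does this algorithm shape result scores? It turns out that the distribution of documents across ES shards also affects the scores.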
Understanding How Shards Affect Scoring
https://www.elastic.co/blog/practical-bm25-part-1-how-shards-affect-relevance-scoring-in-elasticsearch
https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables
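
The shard effect can be illustrated with the IDF part of BM25 alone. The sketch below assumes that each shard computes IDF from its own local statistics, and it uses made-up document counts. The full BM25 score also includes term-frequency and length-normalization factors, which are left out here.

```python
import math

def bm25_idf(doc_count, doc_freq):
    """IDF term in the form Lucene uses: ln(1 + (N - n + 0.5) / (n + 0.5)).

    doc_count (N): number of documents considered.
    doc_freq  (n): number of those documents containing the term.
    """
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical shards of equal size; the term is rare in A, common in B.
shard_a = (100, 2)
shard_b = (100, 20)

idf_a = bm25_idf(*shard_a)        # statistics local to shard A
idf_b = bm25_idf(*shard_b)        # statistics local to shard B
idf_global = bm25_idf(200, 22)    # statistics over the whole index
```

The same term gets a higher IDF on the shard where it is rare, so otherwise identical documents can score differently depending on which shard holds them.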

A deadpan mock movie, "The Maven"
https://manuel.bernhardt.io/2018/04/19/quick-tour-build-tools-scala/

A performance benchmark that many organizations cite: choosing a compression algorithm for Kafka 0.10.x
http://blog.yaorenjie.com/2017/01/03/Kafka-0-10-Compression-Benchmark/

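Studying Atomix, a distributed consistency framework (分布式一致性框架Atomix学习)
When it comes to correctness, Paxos is the reference point among distributed consensus algorithms. For various reasons, though, such as being so complex that it is hard to understand and that most implementations get it wrong, many systems build on variants or alternatives instead, for example ZAB in Zookeeper, the Paxos-based design of Google Chubby, and Raft. This article gives an overview of Atomix, an open-source implementation of Raft.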

https://blog.softwaremill.com/synchronous-or-asynchronous-and-why-wrestle-with-wrappers-2c5667eb7acf
Arguably a classic discussion of Java's CompletableFuture.
23.
Fewest keystrokes, most code (敲最少的键，编最多的码)

https://www.kaggle.com/creepykoala/study-of-tree-and-forest-algorithms/notebook

http://www.cnblogs.com/kingszelda/p/8886403.html

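From the analysis in this article, HttpClient has a retry mechanism enabled by default. Its retry policy is:
1. Retries happen only when an IOException occurs.
2. InterruptedIOException, UnknownHostException, ConnectException, and SSLException: these 4 exceptions are not retried.
3. GET requests can be retried 3 times. POST requests can be retried 3 times as long as the request has not been successfully written and flushed to the socket's output stream.
4. Read/write timeouts are not retried.
5. A socket that is reset or closed during transfer is retried.
6. Plus some other IOExceptions that cannot be pinned down for now.
5.1 Did our business code retry?
For the GET and POST calls in our application scenario, this boils down to:
Retries happen only when an IOException occurs.
InterruptedIOException, UnknownHostException, ConnectException, and SSLException: these 4 exceptions are not retried.
GET requests can be retried 3 times. POST requests can be retried 3 times as long as the request has not been successfully written and flushed to the socket's output stream.
First, the exceptions that are not retried:
InterruptedIOException: the thread was interrupted.
UnknownHostException: the host could not be resolved.
ConnectException: the host was found but the connection could not be established.
SSLException: an HTTPS authentication error.
In addition, two kinds of timeout come up often, the connect timeout and the read timeout:
java.net.SocketTimeoutException: Read timed out
java.net.SocketTimeoutException: connect timed out
Both are SocketTimeoutException, which extends InterruptedIOException, so they fall under the first category above (thread interruption) and are not retried.
5.2 Which scenarios are retried?
For most systems, much of the interaction with third parties goes through POST.
So we need to know in which cases HttpClient retries for us by default.
The question then becomes: when a POST request writes and flushes to the output stream, which IOExceptions other than InterruptedIOException, UnknownHostException, ConnectException, and SSLException can occur?
The step that can fail is HttpClientConnection.flush(). Following it down, the object it operates on is a SocketOutputStream, whose flush is an empty implementation, so only the write method needs examining.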
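
The rules above can be condensed into a small decision function. This is a Python model of the summary only, not HttpClient's actual code. The function name, the string-based exception names, and the `request_sent` flag are assumptions made for illustration, and the input is assumed to already be an IOException of some kind.

```python
# Exception types that the summary lists as never retried.
NON_RETRIABLE = {
    "InterruptedIOException",
    "UnknownHostException",
    "ConnectException",
    "SSLException",
}
# Inheritance mentioned in the text: timeouts extend InterruptedIOException.
PARENT = {"SocketTimeoutException": "InterruptedIOException"}

def should_retry(exception_name, execution_count, method, request_sent,
                 max_retries=3):
    """Return True if the summarized policy would retry this failure."""
    if execution_count > max_retries:
        return False
    name = exception_name
    while name is not None:  # walk up the modeled class hierarchy
        if name in NON_RETRIABLE:
            return False
        name = PARENT.get(name)
    if method == "GET":
        return True
    # POST: retry only if the request was not fully written and flushed.
    return not request_sent

# "SocketException" stands in for a reset or closed socket (illustrative).
reset_during_post = should_retry("SocketException", 1, "POST", request_sent=False)
read_timeout = should_retry("SocketTimeoutException", 1, "POST", request_sent=False)
```

In this model a read timeout is never retried, while a POST that fails before its body is flushed can be sent again. That is why a non-idempotent endpoint may still see a repeated attempt.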

https://github.com/artix41/awesome-transfer-learning

Java in containers_jdk10
https://mjg123.github.io/2018/01/10/Java-in-containers-jdk10.html

https://code.facebook.com/posts/293371094514305/open-sourcing-racerd-fast-static-race-detection-at-scale

Getting FIREd The tech workers who are engineering a mid-30s retirement
Engineers in pursuit of freedom over their own time
https://story.californiasunday.com/tech-retirees

30. When will traffic hijacking by telecom carriers end?
http://www.52nlp.cn/%E7%94%B5%E4%BF%A1%E8%BF%90%E8%90%A5%E5%95%86%E5%8A%AB%E6%8C%81%E4%BD%95%E6%97%B6%E4%BC%91