(2019-07-26 20:31)

Exporting data with beeline

beeline --showHeader=false --outputformat=dsv --delimiterForDSV=$'\001' -e 'select * from evan_test3' >test.csv

---- Exporting in a clean format

beeline -u 'jdbc:hive2://h7:2181,h6:2181,h5:2181/ods;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hs2_zk' --incremental=true -n 'username' -p 'password' --showHeader=false --outputformat=dsv --delimiterForDSV=$'\t' -e 'select * from db.tbname where bizdate=20190322 limit 10;' >> test.csv

Reference: https://www.cnblogs.com/huaxiaoyao/p/4672316.html

YARN cluster queue parameter reference

(2019-04-01 17:50)

Tags: yarn, hadoop

Source: http://mail-archives.apache.org/mod_mbox/hadoop-common-commits/201602.mbox/<34c274e578b841c3a4d36ccbce9d2d7d@git.apache.org>

Repository: hadoop
Updated Branches: refs/heads/trunk c89a14a8a -> 63c63e298

YARN-4662. Document some newly added metrics. Contributed by Jian He

Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/63c63e29
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/63c63e29
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/63c63e29

Branch: refs/heads/trunk
Commit: 63c63e298cf9ff252532297deedde15e77323809
Parents: c89a14a
Author: Xuan
Authored: Wed Feb 3 20:05:22 2016 -0800
Committer: Xuan
Committed: Wed Feb 3 20:05:22 2016 -0

Dynamically adjusting YARN queues: moving a running job to another queue

(2018-09-23 14:43)

Tags: yarn, movetoqueue, application, hadoop, hdfs

1.3 MapReduce version:

hadoop jar app.jar -D mapreduce.job.queuename=root.etl.distcp -D mapreduce.job.priority=HIGH

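2. Dynamic adjustment

If a job is already running, you can change its queue and its priority on the fly.

2.1 Adjusting the priority

Hadoop 1.0 and earlier: hadoop job -set-priority job_201707060942_6121418 VERY_HIGH

Hadoop 2.0 and later: yarn application -appId application_1478676388082_963529 -updatePriority VERY_HIGH

2.2 Changing the queue dynamically

On Hadoop 2.0 and later, use the following command: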

yarn application -movetoqueue application_1478676388082_963529 -queue root.etl

Here application_1478676388082_963529 is the YARN application ID, and the value after -queue is the queue to move the application to.

Spark Streaming + Kafka 0.10.0 integration guide: a translation of the official documentation

(2018-08-29 00:08)

Tags: sparkstreaming, kafka, kafka0.10, kafka1.0, spark

Translation of: Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher)

Suspending and resuming processes with Ctrl+Z, and process states explained

(2018-07-02 15:17)

Tags: ctrlz, process suspension, process resumption, top, process state

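After suspending a process with Ctrl+Z in a Linux terminal, you can wake it up again as follows.

Note: when testing on this virtual machine with two terminals open, running jobs in the second terminal did not list the process suspended in the first, because each shell keeps its own job table.

Pressing Ctrl+Z while a command is running forces the current process into the background and suspends (pauses) it.

1. Resume the process (in the background)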

Finding the executed Hive SQL statement in YARN's completed-job history

(2018-06-27 16:03)

Tags: conf.xml, job.xml, hive, hive.query.string, hql

Locate the job's conf.xml file in the history directory, then find the executed SQL statement inside it:

hadoop fs -text /user/history/done/2018/06/27/000008/job_1526892856952_8007_conf.xml

-------------------

<property><name>hive.query.string</name><value>select count(1) from tb where load_dt='20180626'</value><source>programatically</source><source>job.xml</source></property>
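Scanning a large conf.xml by eye is tedious, so the lookup can also be scripted. The following is a minimal sketch, assuming the file has already been fetched as text (for example with the hadoop fs -text command above) and that it follows the usual Hadoop configuration layout of property elements with name and value children. The class name HiveQueryFinder and the method findProperty are hypothetical names chosen for this illustration, and the sample XML is made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop or Hive.
final class HiveQueryFinder {

    /** Returns the value of the property named key, or null if it is absent. */
    static String findProperty(String xml, String key) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                NodeList names = prop.getElementsByTagName("name");
                NodeList values = prop.getElementsByTagName("value");
                // Skip malformed entries that lack a name or a value.
                if (names.getLength() == 0 || values.getLength() == 0) {
                    continue;
                }
                if (key.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers stay simple.
            throw new IllegalStateException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample in the assumed layout; a real conf.xml holds many properties.
        String xml = "<configuration><property>"
                + "<name>hive.query.string</name>"
                + "<value>select count(1) from tb where load_dt='20180626'</value>"
                + "</property></configuration>";
        System.out.println(findProperty(xml, "hive.query.string"));
    }
}
```

Parsing the XML rather than searching for the raw string keeps the result correct when the query contains escaped characters such as &lt; or &amp;.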

Java: computing the difference of two collections

(2018-02-01 10:52)

Tags: java, list, set difference, list.contains, map.get()

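I needed the difference between two lists of roughly 400,000 and 200,000 records and used list.contains. That was a trap: it took almost 10 minutes to finish. From what I found online, every call to contains traverses the list, so calling it once per element is very expensive.

Improvement: convert one of the lists into a Map and call map.get(object) to check whether an element exists. The run time drops to about 300 ms, a huge improvement.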

1517453272873

length of success ===>>> 250449

==================== >>> 0

length of all_mobiles ===>>> 430000

length of all_failure ===>>> 211279

1517453273189
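The improvement described above can be sketched as follows. This is an illustration under stated assumptions, not the original code: it uses a HashSet instead of the Map mentioned in the post (a HashSet is backed by a hash table and gives the same constant-time expected lookup), and the class name ListDifference, the method difference, and the sample values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper written for this illustration.
final class ListDifference {

    /** Returns the elements of all that do not occur in exclude, in the order of all. */
    static <T> List<T> difference(List<T> all, List<T> exclude) {
        // Build the hash index once, in time proportional to exclude.size().
        Set<T> excluded = new HashSet<>(exclude);
        List<T> result = new ArrayList<>();
        for (T item : all) {
            // Expected O(1) hash lookup instead of the linear scan done by List.contains.
            if (!excluded.contains(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Tiny made-up sample; the post used lists with hundreds of thousands of entries.
        List<String> allMobiles = Arrays.asList("a", "b", "c", "d");
        List<String> success = Arrays.asList("b", "d");
        System.out.println(difference(allMobiles, success)); // prints [a, c]
    }
}
```

With n elements in one list and m in the other, the list.contains version performs on the order of n × m comparisons, while the hash-based version performs about n + m operations, which is consistent with the drop from minutes to milliseconds reported above.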

there can be only one TIMESTAMP …

(2018-01-24 15:18)

Tags: mysql, solver, ddl, current_timestamp, datetime

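While working with MySQL in two environments, I needed to copy a table from one to the other, so I copied its CREATE TABLE statement, only to find that it would not run.

The error thrown: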

1293 - Incorrect table definition; there can be only one TIMESTAMP column with CURRENT_TIMESTAMP in DEFAULT or ON UPDATE clause

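Some research showed that the cause is a version mismatch between the two MySQL servers. The older version allows only one TIMESTAMP column that uses CURRENT_TIMESTAMP (the restriction was lifted in MySQL 5.6.5). The CREATE TABLE statement was generated on the newer MySQL, while the target server runs the older version.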

MySQL: keeping certain SQL statements out of the binlog

(2017-12-29 18:12)

Tags: mysql, binlog, sql_log_bin, do not log SQL statements

In MySQL 5.1 and later, you can run SET sql_log_bin=0; so that the statements that follow in the current session are not written to the binlog.

SET sql_log_bin=1; turns binlog logging back on.

canal installation and configuration guide: syncing MySQL data in real time to wherever it is needed

(2017-12-15 11:32)

Tags: canal, canal real-time reading, MySQL data, kafka, canal real-time sync

Syncing MySQL data in real time with canal; reading MySQL data changes in real time with canal

Document source: