缓步的骆驼

Learning Common SAS Character Functions

(2012-11-15 11:40:50)
Tags:

Miscellaneous

Category: SAS Programming and Applications

(I) Functions that locate letters, digits, or blanks in a string

(1) ANYALNUM(string, n)

Example 1:

data _null_;/*use the _null_ keyword after DATA when you do not need to create a data set*/
a='321 abc';
b=anyalnum(a,4);/* with b=anyalnum(a,3) instead, the result is b=3 */
put b=;
run;

Result: b=5

Example 2: applying ANYALNUM to scan a string from left to right:

data _null_;

   string='Next = Last + 1';
   j=0;
   do until(j=0);
      j=anyalnum(string,j+1);
      if j=0 then put +3 "That's all";
      else do;
         c=substr(string,j,1);
         put +3 j= c=;
      end;
   end;
run;

The following lines are written to the SAS log:

   j=1 c=N
   j=2 c=e
   j=3 c=x
   j=4 c=t
   j=8 c=L
   j=9 c=a
   j=10 c=s
   j=11 c=t
   j=15 c=1
   That's all

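(1) ANYALNUM(string, n) /* without n, returns the position of the first letter or digit; with n, returns the position of the first letter or digit at or after position n */
(2) ANYALPHA(string, n) /* without n, returns the position of the first letter; with n, returns the position of the first letter at or after position n */
(3) ANYDIGIT(string, n) /* without n, returns the position of the first digit; with n, returns the position of the first digit at or after position n */
(4) ANYSPACE(string, n) /* without n, returns the position of the first blank (whitespace) character; with n, returns the position of the first one at or after position n */
All four functions return 0 when no matching character is found.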
 
Example:
data _null_;
a='123 e st ,#2 ';
R_alpha1=anyalpha(a,3);
R_alpha2=anyalpha(a,10);
R_digit1=anydigit(a,3);
R_digit2=anydigit(a,10);
R_space1=anyspace(a,3);
R_space2=anyspace(a,10);
put
R_alpha1=
R_alpha2=
R_digit1=
R_digit2=
R_space1=
R_space2=
;
run;
Result:
R_alpha1=5 R_alpha2=0
R_digit1=3 R_digit2=12 
R_space1=4 R_space2=13
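The positions above all come from a forward search. As a minimal sketch (the data set name work.any_right and the variable names are hypothetical), a negative n makes these functions start at position |n| and search leftward, which is one way to find the last letter or digit in a string; the SAS documentation describes this negative-start behavior for the ANY* functions.

```sas
/* Hypothetical data set and variable names; the string is the one from the example above */
data work.any_right;
   a='123 e st ,#2 ';
   /* LENGTH ignores the trailing blank, so the leftward search starts at position 12 */
   last_alpha=anyalpha(a,-length(a));  /* 8: the "t" of "st" */
   last_digit=anydigit(a,-length(a));  /* 12: the final "2" */
   put last_alpha= last_digit=;
run;
```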

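(II) Concatenation functions
(1) CAT(arg1, arg2, ..., argn) /* concatenates the strings, keeping their leading and trailing blanks */
(2) CATS(arg1, arg2, ..., argn) /* removes leading and trailing blanks from each argument, then concatenates */
(3) CATX('separator', arg1, arg2, ..., argn) /* removes leading and trailing blanks from each argument and joins the results with the separator */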
 
Example:
data _null_;
a='  Dog';
b='Cat  ';
c='Pig ';
R_cat=cat(a,b,c);
R_cats=cats(a,b,c);
R_catx1=catx('&',a,b,c);
R_catx2=catx('@',a,b,c);
put R_cat=
R_cats=
R_catx1=
R_catx2=
;
run;
 
Result:
R_cat=DogCat  Pig 
R_cats=DogCatPig 
R_catx1=Dog&Cat&Pig 
R_catx2=Dog@Cat@Pig
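CATX has two behaviors that the example above does not show. In this small sketch (work.catx_demo, d1, and d2 are illustrative names), numeric arguments are converted to character and stripped of blanks, and a blank argument does not produce an extra separator.

```sas
/* Illustrative names only */
data work.catx_demo;
   length d1 d2 $20;
   d1=catx('-', 2012, 11, 15);      /* numbers are formatted, then blanks removed: 2012-11-15 */
   d2=catx('/', 'Dog', ' ', 'Pig'); /* the blank item adds no separator: Dog/Pig */
   put d1= d2=;
run;
```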
(III) Other character functions
(1) COMPRESS(arg, 'chars') /* removes the listed characters from the string; with no second argument, removes blanks */
 
Example:
data _null_;
a='  Dog  &cat ';
R1=compress(a);
R2=compress(a,'&');
put R1=
R2=
;
run;
 
Result:
R1=Dog&cat R2=Dog  cat
(2) INDEX(arg, 'string') /* returns the starting position of the specified substring */
(3) LENGTH(str) /* returns the length of the string; trailing blanks are not counted */
 
Example:
data _null_;
a='Dogcat';
b=' Dog cat';
c=' Dog cat   ';
Ra=length(a);
Rb=length(b);
Rc=length(c);
put Ra=
Rb=
Rc=
;
run;
 
Result:
Ra=6 Rb=8 Rc=8
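(4) SUBSTR(str, n, m) /* extracts m characters from str, starting at position n */
(5) TRANSLATE(string, to, from <, ..., to-n, from-n>) /* replaces each character of from with the character in the same position of to */
Example:
data _null_;
A='8/14/2010';
B=translate(A,'-','/');
put
B=
;
run;
Result:
B=8-14-2010
(6) TRANWRD(str, 'from', 'to') /* replaces every occurrence of the substring from with to */
Example:
data _null_;
A='dog cat';
B=tranwrd(A,'cat','pig');
put
B=
;
run;
Result:
B=dog pig
(7) TRIM(string) /* removes trailing blanks from a character expression */
(8) UPCASE(string) /* converts all letters to uppercase */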
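TRIM matters most when concatenating with the || operator, because a character variable is padded with blanks up to its declared length. A minimal sketch, with hypothetical names (work.trim_demo, a, b, c):

```sas
/* Hypothetical names; a is declared longer than its value */
data work.trim_demo;
   length a $6 b c $12;
   a='Dog';             /* stored as 'Dog' plus 3 padding blanks */
   b=a||'cat';          /* padding is kept: 'Dog   cat' */
   c=trim(a)||'cat';    /* padding removed first: 'Dogcat' */
   put b= c=;
run;
```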


Part II. Character Functions
1. COMPBL(X)
   Reduces each run of multiple blanks in a string to a single blank.
The COMPBL function removes multiple blanks in a character string by translating each occurrence of two or more consecutive blanks into a single blank.
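Because COMPBL only shortens runs of blanks, a leading run of blanks also becomes one blank rather than disappearing. A minimal sketch (work.compbl_demo, x, and y are hypothetical names):

```sas
/* Hypothetical names */
data work.compbl_demo;
   x='   a    b  c';
   y=compbl(x);   /* every run of two or more blanks, including the leading one, becomes one blank: ' a b c' */
   put y=;
run;
```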



 
2. COMPRESS(<source><, chars><, modifiers>)
Arguments:
Source
specifies a character constant, variable, or expression from which specified characters will be removed.
Chars
Specifies a character constant, variable, or expression that initializes a list of characters.
By default, the characters in this list are removed from the source argument. If you specify the K modifier in the third argument, then only the characters in this list are kept in the result.
Tip:   You can add more characters to this list by using other modifiers in the third argument.
Tip:   Enclose a literal string of characters in quotation marks.


Modifier
specifies a character constant, variable, or expression in which each non-blank character modifies the action of the COMPRESS function. Blanks are ignored. The following characters can be used as modifiers:
a or A   adds alphabetic characters to the list of characters.
c or C   adds control characters to the list of characters.
d or D   adds digits to the list of characters.
f or F   adds the underscore character and English letters to the list of characters.
g or G   adds graphic characters to the list of characters.
h or H   adds a horizontal tab to the list of characters.
i or I   ignores the case of the characters to be kept or removed.
k or K   keeps the characters in the list instead of removing them.
l or L   adds lowercase letters to the list of characters.
n or N   adds digits, the underscore character, and English letters to the list of characters.
o or O   processes the second and third arguments once rather than every time the COMPRESS function is called. Using the O modifier in the DATA step (excluding WHERE clauses), or in the SQL procedure, can make COMPRESS run much faster when you call it in a loop where the second and third arguments do not change.
p or P   adds punctuation marks to the list of characters.
s or S   adds space characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed) to the list of characters.
t or T   trims trailing blanks from the first and second arguments.
u or U   adds uppercase letters to the list of characters.
w or W   adds printable characters to the list of characters.
x or X   adds hexadecimal characters to the list of characters.
Tip:
If the modifier is a constant, enclose it in quotation marks. Specify multiple constants in a single set of quotation marks. Modifier can also be expressed as a variable or an expression.
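A short sketch of the modifiers in use (work.compress_demo and the variable names are hypothetical). The second argument is left empty with two adjacent commas, because the literal '' in SAS is a single blank and would add a blank to the list.

```sas
/* Hypothetical names */
data work.compress_demo;
   phone='Tel: (555) 123-4567';
   digits=compress(phone,,'kd');      /* k = keep, d = digits: '5551234567' */
   letters=compress(phone,,'ka');     /* k = keep, a = letters: 'Tel' */
   no_a=compress('AbcABC','a','i');   /* i = ignore case, so both a and A are removed: 'bcBC' */
   put digits= letters= no_a=;
run;
```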
 
3. QUOTE(source)
Adds double quotation marks to a character value.


Details
Length of Returned Variable
In a DATA step, if the QUOTE function returns a value to a variable that has not previously been assigned a length, then that variable is given a length of 200 bytes.
The Basics
The QUOTE function adds double quotation marks, the default character, to a character value. If double quotation marks are found within the argument, they are doubled in the output.
The length of the receiving variable must be long enough to contain the argument (including trailing blanks), leading and trailing quotation marks, and any embedded quotation marks that are doubled. For example, if the argument is ABC followed by three trailing blanks, then the receiving variable must have a length of at least eight to hold "ABC###". (The character # represents a blank space.) If the receiving field is not long enough, the QUOTE function returns a blank string, and writes an invalid argument note to the log.
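A minimal sketch of the doubling rule (work.quote_demo, s, and q are hypothetical names). The receiving variable q is declared long enough for the 12-character value, the two enclosing quotation marks, and the two extra embedded ones.

```sas
/* Hypothetical names; s has no trailing blanks because its length equals the literal */
data work.quote_demo;
   length q $30;
   s='He said "hi"';
   q=quote(s);   /* "He said ""hi""" : enclosed in double quotes, embedded quotes doubled */
   put q=;
run;
```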
 
4. DEQUOTE(source)
Removes matching quotation marks from a character string that begins with a quotation mark, and deletes all characters to the right of the closing quotation mark.


Details
Length of Returned Variable
In a DATA step, if the DEQUOTE function returns a value to a variable that has not been previously assigned a length, then that variable is given the length of the argument.
The Basics
The value that is returned by the DEQUOTE function is determined as follows:
·    If the first character of string is not a single or double quotation mark, DEQUOTE returns string unchanged.
·    If the first two characters of string are both single quotation marks or both double quotation marks, and the third character is not the same type of quotation mark, then DEQUOTE returns a result with a length of zero.
·    If the first character of string is a single quotation mark, the DEQUOTE function removes that single quotation mark from the result. DEQUOTE then scans string from left to right, looking for more single quotation marks. Each pair of consecutive, single quotation marks is reduced to one single quotation mark. The first single quotation mark that does not have an ending quotation mark in string is removed and all characters to the right of that quotation mark are also removed.
·    If the first character of string is a double quotation mark, the DEQUOTE function removes that double quotation mark from the result. DEQUOTE then scans string from left to right, looking for more double quotation marks. Each pair of consecutive, double quotation marks is reduced to one double quotation mark. The first double quotation mark that does not have an ending quotation mark in string is removed and all characters to the right of that quotation mark are also removed.
Note:   If string is a constant enclosed by quotation marks, those quotation marks are not part of the value of string. Therefore, you do not need to use DEQUOTE to remove the quotation marks that denote a constant. 
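The rules above can be checked with three small cases (work.dequote_demo and d1 to d3 are hypothetical names). Note that inside a single-quoted SAS literal the double quotation marks are ordinary characters, and vice versa.

```sas
/* Hypothetical names */
data work.dequote_demo;
   length d1 d2 d3 $20;
   d1=dequote('"ab""c" xyz');  /* "" becomes ", text after the closing quote is dropped: ab"c */
   d2=dequote("'It''s'");      /* same rules with single quotes: It's */
   d3=dequote('no quotes');    /* first character is not a quotation mark: unchanged */
   put d1= d2= d3=;
run;
```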
5. INDEX(source, excerpt)
Searches a character expression for a string of characters, and returns the position of the string's first character for the first occurrence of the string.


DETAILS
The INDEX function searches source, from left to right, for the first occurrence of the string specified in excerpt, and returns the position in source of the string's first character. If the string is not found in source, INDEX returns a value of 0. If there are multiple occurrences of the string, INDEX returns only the position of the first occurrence.


Examples
options nodate nostimer ls=78 ps=60;
data _null_;
length a b $14;
a='ABC.DEF (X=Y)';
b='X=Y';
q=index(a,b);
w=index(a,trim(b));
put q= w=;
run;
SAS writes the following output to the log:
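q=0 w=10
Because b is declared $14, its value is X=Y followed by 11 trailing blanks, and INDEX searches for those blanks as well, so q=0. TRIM(b) removes the blanks, so w=10.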
6. LENGTH(X)
Returns the length of a non-blank character string, excluding trailing blanks, and returns 1 for a blank character string.


DETAILS
The LENGTH function returns an integer that represents the position of the rightmost non-blank character in string. If the value of string is blank, LENGTH returns a value of 1. If string is a numeric constant, variable, or expression (either initialized or uninitialized), SAS automatically converts the numeric value to a right-justified character string by using the BEST12. format. In this case, LENGTH returns a value of 12 and writes a note in the SAS log stating that the numeric values have been converted to character values.


Comparisons
·    The LENGTH and LENGTHN functions return the same value for non-blank character strings. LENGTH returns a value of 1 for blank character strings, whereas LENGTHN returns a value of 0.
·    The LENGTH function returns the length of a character string, excluding trailing blanks, whereas the LENGTHC function returns the length of a character string, including trailing blanks.
·    The LENGTH function returns the length of a character string, excluding trailing blanks, whereas the LENGTHM function returns the amount of memory in bytes that is allocated for a character string.
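The comparisons above can be seen side by side in a small sketch (work.length_demo and the variable names are hypothetical).

```sas
/* Hypothetical names */
data work.length_demo;
   length s $8;
   s='ab';              /* stored as 'ab' plus 6 padding blanks */
   blank=' ';
   l1=length(s);        /* 2: trailing blanks excluded */
   l2=lengthc(s);       /* 8: trailing blanks included */
   l3=length(blank);    /* 1 for a blank string */
   l4=lengthn(blank);   /* 0 for a blank string */
   put l1= l2= l3= l4=;
run;
```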



 
7. LOWCASE(X)
Converts all letters in an argument to lowercase.


Details
In a DATA step, if the LOWCASE function returns a value to a variable that has not previously been assigned a length, then that variable is given the length of the argument.
The LOWCASE function copies the character argument, converts all uppercase letters to lowercase letters, and returns the altered value as a result.
The results of the LOWCASE function depend directly on the translation table that is in effect (see TRANTAB System Option) and indirectly on the ENCODING System Option and the LOCALE System Option in SAS National Language Support (NLS): Reference Guide.
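A minimal sketch (work.lowcase_demo, name, and low are hypothetical names); as described above, low takes its length from the argument.

```sas
/* Hypothetical names */
data work.lowcase_demo;
   name='SAS Base PROGRAMMING';
   low=lowcase(name);   /* 'sas base programming' */
   put low=;
run;
```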



8. UPCASE(X)
Converts all letters in an argument to uppercase.


Details
In a DATA step, if the UPCASE function returns a value to a variable that has not previously been assigned a length, then that variable is given the length of the argument.
The UPCASE function copies a character argument, converts all lowercase letters to uppercase letters, and returns the altered value as a result.
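One common use of UPCASE is case-insensitive comparison. This sketch assumes a hypothetical input column answer; a comparison in SAS evaluates to 1 or 0.

```sas
/* Hypothetical data set and variable names */
data work.upcase_demo;
   input answer $;
   is_yes=(upcase(answer)='YES');   /* 1 for yes, Yes, YES, ...; 0 otherwise */
   datalines;
yes
Yes
no
;
run;
```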



 
9. SUBSTR(x, position<, n>)
SUBSTR (right of =) Function
Extracts a substring from an argument.


Syntax
<variable=>SUBSTR(string, position<,length>)


Details
In a DATA step, if the SUBSTR (right of =) function returns a value to a variable that has not previously been assigned a length, then that variable is given the length of the first argument.
The SUBSTR function returns a portion of an expression that you specify in string. The portion begins with the character that you specify by position, and is the number of characters that you specify in length.
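A sketch that splits a date string into parts (work.substr_demo and the variable names are hypothetical). When length is omitted, SUBSTR returns everything from position to the end of the string.

```sas
/* Hypothetical names; each result variable gets the length of d (10) */
data work.substr_demo;
   d='2012-11-15';
   year=substr(d,1,4);    /* '2012' */
   month=substr(d,6,2);   /* '11' */
   day=substr(d,9);       /* length omitted: '15' */
   put year= month= day=;
run;
```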



 
 
10. SUBSTR(x, position<, n>)=characters-to-replace
Replaces character value contents: the characters starting at the specified position are overwritten.


Syntax:
SUBSTR(variable, position<,length>)=characters-to-replace


Details
If you use an undeclared variable, it will be assigned a default length of 8 when the SUBSTR function is compiled.
When you use the SUBSTR function on the left side of an assignment statement, SAS replaces the value of variable with the expression on the right side. SUBSTR replaces length characters starting at the character that you specify in position.
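A minimal sketch of SUBSTR on the left of the equal sign (work.substr_left and s are hypothetical names). Declaring the length first avoids the default length of 8 mentioned above.

```sas
/* Hypothetical names */
data work.substr_left;
   length s $8;
   s='abcdefgh';
   substr(s,3,2)='XY';   /* overwrite characters 3 and 4: 'abXYefgh' */
   put s=;
run;
```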



 
11. REPEAT(X, n)
Repeats a character expression.
12. LEFT(X)
Left-aligns a SAS character string.
13. RIGHT(X)
Right-aligns a SAS character string.
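A combined sketch for these three functions (work.align_demo and the variable names are hypothetical). REPEAT returns the argument followed by n more copies, so n=2 yields three copies; LEFT and RIGHT move blanks to the other end without changing the length.

```sas
/* Hypothetical names */
data work.align_demo;
   x='   abc';
   rep=repeat('ab',2);   /* three copies: 'ababab' */
   l=left(x);            /* leading blanks moved to the end: 'abc   ' */
   r=right(l);           /* trailing blanks moved to the front: '   abc' */
   put rep= l= r=;
run;
```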
 
SAS COMPBL and DEQUOTE Functions (2011-01-14)
COMPBL 
Removes the extra blanks from a string; that is, converts each run of consecutive blanks into a single blank.
For example:
data test;
string='ab  cxy  z        pq  ';
run;
data a;
set test;
x=compbl(string);
run;
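Output: ab cxy z pq; each run of blanks becomes a single blank.
Difference from COMPRESS:
compress(string) returns abcxyzpq. It removes every blank, and it can also remove other specified characters.
COMPBL, by contrast, only turns runs of blanks into a single blank and leaves single blanks unchanged.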
 
DEQUOTE
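Removes the matching quotation marks from a string that begins with a quotation mark, and deletes everything after the matching closing quotation mark.
That is a mouthful; in code it is simply: y=dequote(x);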
