String indexing in SQL and way to extract domain names from email repository

CHARACTER POSITIONING IN SQL:

In SQL databases each character in string are having a location index starting from 1 from left side of the string;

For example if we have a string with value ‘CRACK’ then position of C is 1, R is 2, A is 3 and so on and so forth.

String index functions are sql functions used to identify any substring position in the main string.

CHARINDEX(), PATNDEX(), POSITION() and STRPOS() are examples for such functions

Identifying domain name from email IDs are perfect use case of such functions.

for this case, basically we need substring of the email everything after @ symbol;

Here on we will see how that is achieved using combination of different functions

illustration sql function

But SUBSTRING function seeks 3 inputs arguments : 1 the string to be split, start of substring and end of substring

syntax : SUBSTRING(<string to split>, <start>,<end>)

As we know end of the sub-string is essentially length of the string

here start of the substring in our case is position of character immediately after @

Example1:

we will take hypothetical email ID your_name@domain_name.com as an example

function substring( ‘your_name@domain_name.com’,11,25) will give out put domain_name.com

Example2:

To see application of the function over SQL table column, I have created a table called email_rep in my locally hosted postgres database. I have inserted few example email IDs; we will try to get domain name of all email IDs with above mentioned function.

test_1=# select * from email_rep;
 id |           email
----+----------------------------
  1 | [email protected]
  2 | [email protected]
  3 | [email protected]
  4 | [email protected]
  5 | [email protected]
(5 rows)

Here below I have written position function to fetch position of @ in the all the email IDs and length to get position of last character

test_1=# select id, POSITION('@' in email) as reference_point,length(email) as total_length from email_rep;
 id | reference_point | total_length
----+-----------------+--------------
  1 |               6 |           15
  2 |              10 |           19
  3 |               6 |           17
  4 |              10 |           21
  5 |               4 |           26
(5 rows)

In below, I have passed position of @ to substring function by adding 1 to it

and

for first record it is as below

substring(‘[email protected]’,(5+1),15) this will output gmail.com

below is return value from select query

test_1=# select id, substring(email,POSITION('@' in email)+1,length(email)) as domain_name from email_rep;
 id |      domain_name
----+------------------------
  1 | gmail.com
  2 | gmail.com
  3 | hotmail.com
  4 | hotmail.com
  5 | crackdataanalytics.com
(5 rows)

Below is video explanation of the same scenario

Bonus tip:

CHARINDEX () is sqlserver equivalent of POSITION() function, we can also use STRPOS(str, str) function in postgres to serve same purpose.

PATINDEX()in sqlserver is a similar function used to pull position of any string of any particular pattern in other string. We can also make use of this function in place of POSITION() based on the DB.

example: PATINDEX(‘%data%’ , ‘crackdataanalytics’) => 6