Using REGEXP_REPLACE for cleaning data in BigQuery


Recently I had to clean a column that contained semi-structured data. For example:


<p> <h3>TitlePartA Title Part B</h3> <br> TitlePartA TitlePartB: <b>ValuePartA ValuePartB</b> <br> </p>


Let's work through the example above in BigQuery:


First, create a temporary table with the example data.
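The original statement is not shown here, so the following is only a minimal sketch of what this step could look like. The table name `sample_data` and the column name `raw_html` are hypothetical, chosen for illustration. Note that `CREATE TEMP TABLE` needs to run as part of a multi-statement query (script) or inside a session.

```sql
-- Hypothetical names: `sample_data` and `raw_html` are illustrative only.
-- Temporary tables take no dataset qualifier and are dropped automatically
-- when the script or session ends.
CREATE TEMP TABLE sample_data AS
SELECT
  '<p> <h3>TitlePartA Title Part B</h3> <br> TitlePartA TitlePartB: <b>ValuePartA ValuePartB</b> <br> </p>' AS raw_html;

-- Quick check that the row was loaded.
SELECT raw_html FROM sample_data;
```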


Now we will try to extract ‘ValuePartA ValuePartB’ using the REGEXP_EXTRACT function:
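As a sketch of one way to write this query (not necessarily the exact pattern used originally), the example below captures the text between the `<b>` and `</b>` tags. To keep it runnable on its own, the sample row is supplied through a CTE; the names `sample_data`, `raw_html`, and `extracted_value` are assumptions made for illustration.

```sql
-- Hypothetical names: `sample_data`, `raw_html`, `extracted_value`.
WITH sample_data AS (
  SELECT
    '<p> <h3>TitlePartA Title Part B</h3> <br> TitlePartA TitlePartB: <b>ValuePartA ValuePartB</b> <br> </p>' AS raw_html
)
SELECT
  -- With a single capturing group, REGEXP_EXTRACT returns the group's text
  -- rather than the whole match. [^<]+ stops at the next '<', so the match
  -- cannot run past the closing </b> tag.
  REGEXP_EXTRACT(raw_html, r'<b>([^<]+)</b>') AS extracted_value
FROM sample_data;
```

The raw string prefix `r'...'` avoids having to double any backslashes in the pattern. If a row contains no `<b>...</b>` pair, `REGEXP_EXTRACT` returns NULL, which is worth filtering or defaulting when cleaning a full column.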



The result: