yo what's up this is Patrick from Guyana
Q in this video I'm gonna talk more
about data modelling best practices one
could call this data model in part 2
stay tuned ok data modelling best
practices Adam did a video on it a while
ago I did another video not too long ago
Adam talked about reducing data set size
I talked about it gave you a couple of
tips and I just want to keep going we
had some great conversation in the comp
in the comments for both of those videos
and I just want to talk about it some
more I hope that's okay
alright if you haven't watched those
videos you probably should take some
time and go watch those videos they can
kind of it just kind of builds upon each
other alright if you want to just jump
into it jump into it alright so let's
just get into this let's get into this
sorry so the next tip I want to talk to
talk about and we've kind of beat this
every time I do videos when we talk
about data model and I talked about this
and it just seems people don't pay
attention I don't know in this video the
first thing is only choose the columns
you need you probably go up at you've
heard this a million times well there's
a reason I'm showing this because some
people when they go into the query
editor there's certain wait a minute
too much talking you guys know I like to
do let's head to my laptop alright in
the query editor I've imported a table
and you can see this table that I've
imported it's fact internet sales
there's lots and lots of columns in this
table and I don't really need them all
and what people will tend to do right
what developers data modeler do or data
prep whatever it whatever you want to
call them analysts they'll do this right
and say I don't want these three columns
they'll remove these columns once I
remove these columns let's say I want I
need one of those back what do I need
what what can I do to solve that problem
well I can remove it and then remove the
two I want or I can go into the DAX
alright and you know say I won't I
only want ship date back I can just
delete it out of the code but let's be
honest that's not very efficient and you
guys know how I roll right not lazy
just efficient so let's get rid of this
let's remove this and this is what you
should do right so you can should go
here on the home menu and you see
there's a choose columns option and
let's remove those exact same three
columns
this one this one or this one okay and
then when I do that I realize oh I need
one back well fortunately it's really
easy no code no we're moving that back
right no repeating the steps see where
it says remove other columns there's a
little gear right there I click that
gear and I say I need order date back
and I click OK and bam
so I think I guess we can say this is
not really a best practice just a more
of a tip when you're importing a lot of
data and you only want to pull what you
need in use the choose column option in
the home ribbon as opposed to removing
them the columns individually takes me
to the second point is you should want
one of the best practices that I
recommend and this is a this is honestly
a preference of mine and there's
probably gonna be a lot of debate in the
comments and I'm okay with that is you
should try to abstract yourself away
from the database schema and so this is
really specific today to the database
schema I know some of you guys use Excel
and other data sources but if if you're
going against a relational data source
instead of importing directly from a
table try to get the owner of the
database or if you if you're the owner
of the database create a view abstract
yourself away from the underlying schema
that way if something changes if it's
something in the table changes it's more
likely that the DBA or the person that
owns the database the database developer
will know that hey if I change this
table I need to I need to check the
dependencies for the other objects in
the database like you know views and
store procedures and functions things
like that and so before they roll that
schema change out they'll update the
views and your par bi model should go
unaffected right I'm not saying that's a
guarantee but if there's a good
governance process in place this should
really help because I've had instances
where I've you know inadvertently change
the schema myself and broke some things
but if I would have had a view instead
of you know saying okay I knew I changed
this schema let me go talk to all my
power bi people and make them aware that
I made this change so they can update
their schema if we just create a list of
views and know that those views are for
our data models and Barbie I then it's
easier right they'll recognize that and
say oh let's update my views and let's
try not to affect these data models let
me show you what I'm talking about so
for example whenever
I in adventure works for example you can
see I have these list of views right
here that we use when we do our
workshops and a lot of these videos that
we do that way if I want to add
something or change something in the in
the underlying database schema
I just might update my view keep my view
you know maintain the dependency between
the view and the table make sure they're
in sync and then noting my power bi
models should go unaffected all right so
that's the second thing I want to talk
about I know that's a lot of talking but
you should really consider this and then
so the final thing which kind of goes
right into this for this video the final
thing I want to talk about is where do
you put your calculated columns right
you got to be really careful when you
build out these calculated columns and
so should you do that should you create
the calculated column in Dax should you
do it in the query editor or should you
do it in the database so to be honest
with you my recommendation is you start
in the database if you are the owner of
the database or you have a good
governance process or you know the
person that owns the database
accidentally transform as much as they
possibly can in a database like for
example look at this view right so I had
a request come through let's pretend
let's pretend I had a request come
through where we added needed to add
full name to the data model because we
wanted to report off that well I know
you experienced power bi developers
Patrick we can do this in query editor
using column from examples or we can do
this in X and Dax really easy you guys
are absolutely correct but there's some
implications of using those two and I'll
talk about them in a minute all right so
what I like to do is because I'm the
owner of the database or I know the
person that owns the database we'll do
this we'll run this and then we'll go to
power bi and we'll go to the customer
view the customer query and click
refresh give it a second and bam right
there is my full name column just just
that easy right it's there and it's
simple and it's added to the database
right and so there's no additional work
it's just the query on the back end
needs to efficiently pull that that into
the data model all right I mean into the
query editor all right but okay Patrick
I don't have access to the source I
don't have access to this how can I do
it in the query editor why shouldn't I
do it in the query edit so let's pretend
you don't have access so I'm going to
run
this update this view and then I'm
coming here and refresh my view of the
data it's gonna automatically remove
full name and you're gonna say well so I
need to do it here I need to do it in
the query editor and like I said so you
start in the database the next step is
do I do it in Dax do I do it in the
query editor my recommendation is do it
in the query editor the implication of
that is it could slow your refresh down
a little bit but and it be very careful
when you do it because the more steps
you add the slower the Refresh can be
but in most cases this does not cause
that much of a problem ok I'm not saying
you you won't see a problem but
shouldn't cause too many problems
depending on the complexity of the
transformation ok
especially if query folding kicks in
alright so for example if I wanted to
add full name now it would be really
simple I can go here I do just like that
go to my favorite feature column from
examples from selection and now just
find the most complete item here and I'm
gonna type what I want the pattern I
want set or Jackson just like that and
then I'll call this fool name click OK
and bam right I've added it so remember
start in the database if you can right
start in the database let the database
little work the next step move up to the
query edit to do all the work in the
query editor could slow your refresh
down but you know that's if you don't
own the source you got no choice right I
mean power bi is great at this data prep
and it does if it does do a really good
job of efficiently you know doing the
query folding and running those queries
really fast so I'm gonna go ahead and
click close and apply so let's say there
are cases where you have to do this
index for example for example if the
query if the the two columns you're
trying to combine is across two tables
your only option in the query editor is
to merge them into a single table well
the most common option is that what you
have to merge them into a single table
and then build a calculated column but
in a lot of cases you don't want to
merge your two tables because it's the
reason you created a nice start schema
and so you'll have to do it in Dax using
like a related function or something
like that for example let's say for
example I want to create a column on
this product table it gave me the
product name
and then in parentheses the subcategory
name so I wrote a little Dax don't go
bananas over the Dax this is just a demo
and then I'm gonna add a new column
alright so I add this new column I'm
gonna paste the code in and because it's
across a different table I have to use
related the related functions to go get
that value from another table and that's
why I can't use the query editor so I
have to I don't have any choice I don't
own the source I can't use the query
editor so I need to do it in Dax now I
had the new column got to be really
careful with doing this tail light if a
really good friend of mine wrote a blog
post about this how someone created a
calculated column on a really long a
fact table with lots and lots of columns
and it spiked out the cpu doing
processing because after power bi
processes the data it's got to build
those calculate process those
calculations and that can depending on
the complexity of the calculation can be
a very CPU intensive process and so you
want to avoid those if you possibly can
okay if you do need to add a calculation
try not to put it on the fact table put
it on one of your dimensions just like I
did here so you don't avoid so you kind
of try to avoid that really intense
process you know spiking out your CPU
especially if your own dedicated
capacity okay the second reason that's
the first reason you should try to
really try to avoid the calculated
columns the second reason is beat up
your memory it just can abuse your
memory because when you create that
column right it materializes it and it
uses additional space in or in memory so
let me show you what I'm talking about
before I added that column I have 14
million bytes to my column sizes right
total size if i refresh this so take a
few seconds to refresh but if i refresh
this you'll see it goes from 14 million
bytes to right 15 million not
substantial right not really substantial
but if we go to the product table
we can see that the product sub is it's
not taking a whole bunch of space you
know but imagine if I have a really long
table right a table with a lot of
columns how be how much space that
column can take especially if the
cardinality is really high if it's a lot
of distinct values in it it can consume
a lot of space and I think Adam talked a
little bit about this in the other video
ok
so choose your columns be judiciously
only choose a
columns that you need try to abstract
yourself away from the source and lastly
be careful when you create these
calculated columns I'm not saying don't
create them in Dax but if you can avoid
creating them in Dax if you can avoid
doing them in the query editor if you
can push them to the source push them to
the source okay what do you guys think
got any questions comments you know what
to do post them in the comments below
this is your first time visiting a guy
in a cube channel hit that subscribe
button because you like my video give me
a big thumbs up as always from Adam and
myself thanks for watching and we'll see
you in the next video