May 23, 2023

Full Fledge Customer segmentation, recommendation and deployment

Getting data from multiple sources.

Informatica Power Center Designer :

1. Start Informatica Server -

2. Open informatica Client -

Folder > Create > once the folder is created click on design icon

once the design pad is open you will get something like this.

Now click on sources: if you want to upload data from text file, XML, excel or any local files you need to click on import from file and select the file with certain configuration.

Above selected sources are CSV, Excel and XML

3. For getting data from database -

Click on Sources > Import from database > Set data base configuration like Database name, ID, Password, host, port number. if you don't know the details go to database terminal and type "lsnrctl services" and you will find all the details of database

now we have data from 4 different sources like CSV, XML, Excel, and Oracle database

4. Now to send the data to targeted table or any other format click on Target then select wether you have existing table for export of dataset or create new with configuration once done click on Mapping

Click on Sources in mapping > select all the tables from which you want to extract the data also go to target data and load it to mappings stencil

For filtering specific columns we need to create filter function or any other functions based on requirement from transform dialogue box

Once done with all mapping click on Ctrl+Z to check whether the mapping is valid or not

5. Now click on informatica workflow manager to create workflow click on task and create task enter task name

Load all your data in workflow manager from session and click on connection to check for successful connections.

now create the flow of data

Once all these steps are done time to run the workflow right click on configured workflow and click on start workflow from task it will transform all the data.

Transforming the loaded data to CSV using python

pip install pyodbc

import pyodbc 
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=RON\SQLEXPRESS;'
                      'Database=test_database;'
                      'Trusted_Connection=yes;')

import pandas as pd
import pyodbc 

conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=RON\SQLEXPRESS;'
                      'Database=test_database;'
                      'Trusted_Connection=yes;')

sql_query = pd.read_sql_query(''' 
                              select * from test_database.dbo.product
                              '''
                              ,conn) # here, the 'conn' is the variable that contains your database connection information from step 2

df = pd.DataFrame(sql_query)
df.to_csv (r'C:\Users\Ron\Desktop\exported_data.csv', index = False) # place 'r' before the path name

Search This Blog

All about Machine Learning

Full Fledge Customer segmentation, recommendation and deployment

Getting data from multiple sources.

Transforming the loaded data to CSV using python

Comments

Post a Comment

Popular Posts